networkx-2 support #443

ixcat · 2018-01-31T21:46:15Z

Fixes for networkx version>2

Not heavily tested - unit tests and manual ERD draw for simple single schema set of relations OK.

…ants

coveralls · 2018-01-31T21:51:11Z

Coverage decreased (-0.04%) to 89.315% when pulling ccd1dc4 on ixcat:networkx-2 into a577bb5 on datajoint:master.

eywalker · 2018-02-01T05:52:52Z

datajoint/erd.py

@@ -206,8 +206,8 @@ def _make_graph(self):
                nx.algorithms.boundary.node_boundary(nx.DiGraph(self).reverse(), self.nodes_to_show))
            nodes = self.nodes_to_show.union(a for a in gaps if a.isdigit)
            # construct subgraph and rename nodes to class names
-            graph = nx.DiGraph(self).subgraph(nodes)
-            nx.set_node_attributes(graph, 'node_type', {n: _get_tier(n) for n in graph})
+            graph = nx.DiGraph(nx.DiGraph(self).subgraph(nodes))


what's the reason for the redundancy in DiGraph construction?

The .subgraph call now returns a subgraph 'view', which is immutable and cannot be relabeled later on in line 219's nx.relabel_nodes(graph, mapping, copy=False). I also tried doing a return nx.relabel_nodes(graph, mapping, copy=True) in that line but it didn't work on 1st pass - from here I didn't really investigate the API change further - since copy=True imples making a copy and the line you are mentioning would also make a a copy I viewed these as equivalent and moved on.. That said, I'm not sure why we are calling nx.DiGraph.subgraph specifically in the original case; self.subgraph should ostensibly work I'd imagine..

Yes I'm imagining that self.subgraph should suffice. @dimitri-yatsenko do you recall why we have to do nx.Digraph(self).subgraph? I have a feeling that this is legacy code

working on the speedtest now - it seems the constructor is called internally and the __init___ signatures between the classes vary, though I haven't dug further just yet & theoretically calling via superclass would work around this..

it seems that networkx's digraph.py calls H.__class__() to create an instance copy within the subgraph function - since this refers to Dependencies which requires connection as a parameter, the call fails. Making this an optional argument allows self.subgraph(...) to work, however, this could potentially be undesirable w/r/t the rest of the code depending on opinions...

more related to this this in the descendants performance discussion

We probably should override subgraph so connection gets passed down - although it also makes sense to check if connection is needed in the subgraphs.

Thinking to audit for connection use and proceed or just take the performance hit -

Maintaining a random graph related function in-tree to avoid this issue doesn't seem architecturally correct since this class isn't really a 'graph subclass' in terms of 'providing additional graph functionality' (even though it is technically a subclass) ..

That said it isn't such a big deal to copy the function in, annotate the reasoning and sync when needed with upstream. Networkx is BSD licencsed btw so this wouldn't impact DJ.

@dimitri-yatsenko care to weigh in?

If everything works, then no need to pass the connection

@dimitri-yatsenko apologies if unclear - so does this mean the additional patch to datajoint/dependencies.py mentioned below can be added to this PR after the extra comments/code is cleaned?

eywalker · 2018-02-01T05:53:21Z

requirements.txt

-networkx~=1.11
-pydotplus
+networkx
+pydot


why pydot vs pydotplus?

Networkx apparently flipped in 2.0; my memory from research a few months back was that pydot went unmaintained for a bit, pydotplus was a python 3 + added feature fork, subsequently pydot development picked up and supports python3 and pydotplus appears dormant.

1.11 docs:

https://networkx.github.io/documentation/networkx-1.11/reference/drawing.html#module-networkx.drawing.nx_pydot

Pydot Import and export NetworkX graphs in Graphviz dot format using pydotplus. Either this module or nx_agraph can be used to interface with graphviz. See also PyDotPlus https://github.com/carlos-jenkins/pydotplus

2.1 docs:

https://networkx.github.io/documentation/stable/reference/drawing.html#module-networkx.drawing.nx_pydot

Pydot Import and export NetworkX graphs in Graphviz dot format using pydot. Either this module or nx_agraph can be used to interface with graphviz. See also pydot https://github.com/erocarrera/pydot Graphviz

Thanks for pointing it out - this reminds me I should double check rendering on Windows; my recollection from the initial testing of pydotplus to avoid the graphviz issues is that both seem to work but I'm not 100% on this.

ixcat · 2018-02-01T21:27:36Z

datajoint/dependencies.py

@@ -101,4 +101,5 @@ def descendants(self, full_table_name):
        :return: all dependent tables sorted in topological order.  Self is included.
        """
        nodes = nx.algorithms.dag.descendants(self, full_table_name)
-        return [full_table_name] + nx.algorithms.dag.topological_sort(self, nodes)
+        sort = nx.algorithms.dag.topological_sort(self)
+        return [full_table_name] + [s for s in sort if s in nodes]


alternate/better performing implementation suggestion @eywalker:
create a subgraph from full_table_name and descendants and then sort

Based on a test of calling descendants 10000 times for every class within the graph of 41 classes discussed elsewhere, I was able to get the following results locally:

initial code above: ~250 iterations of the subclasses-for-each-classes per second

Commented code in patch below: ~125 iterations/sec

Live code in patch below: ~450 iterations/sec, however this requires making connection optional in the Dependencies constructor

can provide the test code elsewhere if that might be useful.

Patch:
--8<--

diff --git a/datajoint/dependencies.py b/datajoint/dependencies.py index 6aa169b..2c0167f 100644 --- a/datajoint/dependencies.py +++ b/datajoint/dependencies.py @@ -8,7 +8,7 @@ class Dependencies(nx.DiGraph): """ The graph of dependencies (foreign keys) between loaded tables. """ - def __init__(self, connection): + def __init__(self, connection=None): self._conn = connection self._node_alias_count = itertools.count() super().__init__(self) @@ -95,7 +95,7 @@ class Dependencies(nx.DiGraph): return dict(p[1:3] for p in self.out_edges(table_name, data=True) if primary is None or p[2]['primary'] == primary) - def descendants(self, full_table_name): + def descendants_old(self, full_table_name): """ :param full_table_name: In form `schema`.`table_name` :return: all dependent tables sorted in topological order. Self is included. @@ -103,3 +103,18 @@ class Dependencies(nx.DiGraph): nodes = nx.algorithms.dag.descendants(self, full_table_name) sort = nx.algorithms.dag.topological_sort(self) return [full_table_name] + [s for s in sort if s in nodes] + + def descendants(self, full_table_name): + """ + :param full_table_name: In form `schema`.`table_name` + :return: all dependent tables sorted in topological order. Self is included. + """ + + # nodes = nx.DiGraph(self).subgraph( + # nx.algorithms.dag.descendants(self, full_table_name)) + + nodes = self.subgraph( + nx.algorithms.dag.descendants(self, full_table_name)) + + return [full_table_name] + list( + nx.algorithms.dag.topological_sort(nodes))

this implies that a similar modification to erd.py line 209 discussed in other comments could also improve performance; again, however, this requires making connection optional in the Dependencies code. The previous implementation running at ~250 iterations shouldn't be impacted by a similar change since it is not constructing copied instances within the main method body. As a side note, unit tests still pass with connnection an optional argument. Also, the resulting descendants list sequence (but not count or content) is slightly different in some cases between these two functions, likely a difference in similar-rank nodes resulting from node visitation being different between the two versions. This was not investigated further.

…illitate. Improve descendants performance by sorting the subgraph of nodes rather than sorting the entire graph and selecting a subgraph from it. As part of performance discovered that construction of intermediate graph objects hindered performance; removing this intermediate construction required modificastions to __init__, namely: Since nx's 'topological_sort' method does internal construction of the graph class with no arguments, 'Dependencies' class needed to be updated to allow connetion=None construction for this special case. As a side note: Similar uses of DiGraph constructor within erd.py might also now be removable

ixcat added 7 commits January 31, 2018 12:33

networkx-2: base_relation.py: flatten aliased foreign keys generator

1de2876

networkx-2: dependencies.py: rework descendants to use new nx.descend…

1b91bc5

…ants

networkx-2: erd.py: initial nx2 rework of _make_graph - has issues

5dd95e4

networkx-2: requirements.txt to use current release

b581890

networkx-2: pydotplus->pydot in nx2 requirements.txt

3c55d1b

networkx-2: dependencies.py: fix descendants topological sorting

170b8d5

networkx-2: erd.py: remove dev comment

1e7d69f

ixcat mentioned this pull request Jan 31, 2018

support networkx 2.0+ compatibility #362

Closed

eywalker reviewed Feb 1, 2018

View reviewed changes

ixcat commented Feb 1, 2018

View reviewed changes

dimitri-yatsenko approved these changes Feb 2, 2018

View reviewed changes

dimitri-yatsenko approved these changes Feb 7, 2018

View reviewed changes

eywalker approved these changes Feb 14, 2018

View reviewed changes

eywalker merged commit 07ffd36 into datajoint:master Feb 14, 2018

ixcat deleted the networkx-2 branch October 31, 2018 13:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

networkx-2 support #443

networkx-2 support #443

ixcat commented Jan 31, 2018

coveralls commented Jan 31, 2018 •

edited

eywalker Feb 1, 2018 •

edited

ixcat Feb 1, 2018 •

edited

eywalker Feb 1, 2018

ixcat Feb 1, 2018

ixcat Feb 1, 2018 •

edited

eywalker Feb 2, 2018

ixcat Feb 2, 2018

dimitri-yatsenko Feb 4, 2018

ixcat Feb 5, 2018 •

edited

eywalker Feb 1, 2018

ixcat Feb 1, 2018 •

edited

ixcat Feb 1, 2018

ixcat Feb 1, 2018 •

edited

networkx-2 support #443

networkx-2 support #443

Conversation

ixcat commented Jan 31, 2018

coveralls commented Jan 31, 2018 • edited

eywalker Feb 1, 2018 • edited

Choose a reason for hiding this comment

ixcat Feb 1, 2018 • edited

Choose a reason for hiding this comment

eywalker Feb 1, 2018

Choose a reason for hiding this comment

ixcat Feb 1, 2018

Choose a reason for hiding this comment

ixcat Feb 1, 2018 • edited

Choose a reason for hiding this comment

eywalker Feb 2, 2018

Choose a reason for hiding this comment

ixcat Feb 2, 2018

Choose a reason for hiding this comment

dimitri-yatsenko Feb 4, 2018

Choose a reason for hiding this comment

ixcat Feb 5, 2018 • edited

Choose a reason for hiding this comment

eywalker Feb 1, 2018

Choose a reason for hiding this comment

ixcat Feb 1, 2018 • edited

Choose a reason for hiding this comment

ixcat Feb 1, 2018

Choose a reason for hiding this comment

ixcat Feb 1, 2018 • edited

Choose a reason for hiding this comment

coveralls commented Jan 31, 2018 •

edited

eywalker Feb 1, 2018 •

edited

ixcat Feb 1, 2018 •

edited

ixcat Feb 1, 2018 •

edited

ixcat Feb 5, 2018 •

edited

ixcat Feb 1, 2018 •

edited

ixcat Feb 1, 2018 •

edited