-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
networkx-2 support #443
networkx-2 support #443
Conversation
@@ -206,8 +206,8 @@ def _make_graph(self): | |||
nx.algorithms.boundary.node_boundary(nx.DiGraph(self).reverse(), self.nodes_to_show)) | |||
nodes = self.nodes_to_show.union(a for a in gaps if a.isdigit) | |||
# construct subgraph and rename nodes to class names | |||
graph = nx.DiGraph(self).subgraph(nodes) | |||
nx.set_node_attributes(graph, 'node_type', {n: _get_tier(n) for n in graph}) | |||
graph = nx.DiGraph(nx.DiGraph(self).subgraph(nodes)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the reason for the redundancy in DiGraph
construction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .subgraph
call now returns a subgraph 'view', which is immutable and cannot be relabeled later on in line 219's nx.relabel_nodes(graph, mapping, copy=False)
. I also tried doing a return nx.relabel_nodes(graph, mapping, copy=True)
in that line but it didn't work on 1st pass - from here I didn't really investigate the API change further - since copy=True imples making a copy and the line you are mentioning would also make a a copy I viewed these as equivalent and moved on.. That said, I'm not sure why we are calling nx.DiGraph.subgraph
specifically in the original case; self.subgraph
should ostensibly work I'd imagine..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I'm imagining that self.subgraph
should suffice. @dimitri-yatsenko do you recall why we have to do nx.Digraph(self).subgraph
? I have a feeling that this is legacy code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
working on the speedtest now - it seems the constructor is called internally and the __init___
signatures between the classes vary, though I haven't dug further just yet & theoretically calling via superclass would work around this..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems that networkx's digraph.py
calls H.__class__()
to create an instance copy within the subgraph
function - since this refers to Dependencies
which requires connection
as a parameter, the call fails. Making this an optional argument allows self.subgraph(...)
to work, however, this could potentially be undesirable w/r/t the rest of the code depending on opinions...
more related to this this in the descendants
performance discussion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably should override subgraph
so connection
gets passed down - although it also makes sense to check if connection
is needed in the subgraphs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking to audit for connection use and proceed or just take the performance hit -
Maintaining a random graph related function in-tree to avoid this issue doesn't seem architecturally correct since this class isn't really a 'graph subclass' in terms of 'providing additional graph functionality' (even though it is technically a subclass) ..
That said it isn't such a big deal to copy the function in, annotate the reasoning and sync when needed with upstream. Networkx is BSD licencsed btw so this wouldn't impact DJ.
@dimitri-yatsenko care to weigh in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If everything works, then no need to pass the connection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dimitri-yatsenko apologies if unclear - so does this mean the additional patch to datajoint/dependencies.py
mentioned below can be added to this PR after the extra comments/code is cleaned?
networkx~=1.11 | ||
pydotplus | ||
networkx | ||
pydot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why pydot
vs pydotplus
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Networkx apparently flipped in 2.0; my memory from research a few months back was that pydot went unmaintained for a bit, pydotplus was a python 3 + added feature fork, subsequently pydot development picked up and supports python3 and pydotplus appears dormant.
1.11 docs:
Pydot
Import and export NetworkX graphs in Graphviz dot format using pydotplus.
Either this module or nx_agraph can be used to interface with graphviz.
See also
PyDotPlus
https://github.com/carlos-jenkins/pydotplus
2.1 docs:
Pydot
Import and export NetworkX graphs in Graphviz dot format using pydot.
Either this module or nx_agraph can be used to interface with graphviz.
See also
pydot
https://github.com/erocarrera/pydot
Graphviz
Thanks for pointing it out - this reminds me I should double check rendering on Windows; my recollection from the initial testing of pydotplus to avoid the graphviz issues is that both seem to work but I'm not 100% on this.
datajoint/dependencies.py
Outdated
@@ -101,4 +101,5 @@ def descendants(self, full_table_name): | |||
:return: all dependent tables sorted in topological order. Self is included. | |||
""" | |||
nodes = nx.algorithms.dag.descendants(self, full_table_name) | |||
return [full_table_name] + nx.algorithms.dag.topological_sort(self, nodes) | |||
sort = nx.algorithms.dag.topological_sort(self) | |||
return [full_table_name] + [s for s in sort if s in nodes] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alternate/better performing implementation suggestion @eywalker:
create a subgraph from full_table_name and descendants and then sort
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on a test of calling descendants
10000 times for every class within the graph of 41 classes discussed elsewhere, I was able to get the following results locally:
- initial code above: ~250 iterations of the subclasses-for-each-classes per second
- Commented code in patch below: ~125 iterations/sec
- Live code in patch below: ~450 iterations/sec, however this requires making
connection
optional in theDependencies
constructor
can provide the test code elsewhere if that might be useful.
Patch:
--8<--
diff --git a/datajoint/dependencies.py b/datajoint/dependencies.py
index 6aa169b..2c0167f 100644
--- a/datajoint/dependencies.py
+++ b/datajoint/dependencies.py
@@ -8,7 +8,7 @@ class Dependencies(nx.DiGraph):
"""
The graph of dependencies (foreign keys) between loaded tables.
"""
- def __init__(self, connection):
+ def __init__(self, connection=None):
self._conn = connection
self._node_alias_count = itertools.count()
super().__init__(self)
@@ -95,7 +95,7 @@ class Dependencies(nx.DiGraph):
return dict(p[1:3] for p in self.out_edges(table_name, data=True)
if primary is None or p[2]['primary'] == primary)
- def descendants(self, full_table_name):
+ def descendants_old(self, full_table_name):
"""
:param full_table_name: In form `schema`.`table_name`
:return: all dependent tables sorted in topological order. Self is included.
@@ -103,3 +103,18 @@ class Dependencies(nx.DiGraph):
nodes = nx.algorithms.dag.descendants(self, full_table_name)
sort = nx.algorithms.dag.topological_sort(self)
return [full_table_name] + [s for s in sort if s in nodes]
+
+ def descendants(self, full_table_name):
+ """
+ :param full_table_name: In form `schema`.`table_name`
+ :return: all dependent tables sorted in topological order. Self is included.
+ """
+
+ # nodes = nx.DiGraph(self).subgraph(
+ # nx.algorithms.dag.descendants(self, full_table_name))
+
+ nodes = self.subgraph(
+ nx.algorithms.dag.descendants(self, full_table_name))
+
+ return [full_table_name] + list(
+ nx.algorithms.dag.topological_sort(nodes))
this implies that a similar modification to erd.py
line 209 discussed in other comments could also improve performance; again, however, this requires making connection
optional in the Dependencies
code. The previous implementation running at ~250 iterations shouldn't be impacted by a similar change since it is not constructing copied instances within the main method body. As a side note, unit tests still pass with connnection
an optional argument. Also, the resulting descendants list sequence (but not count or content) is slightly different in some cases between these two functions, likely a difference in similar-rank nodes resulting from node visitation being different between the two versions. This was not investigated further.
…illitate. Improve descendants performance by sorting the subgraph of nodes rather than sorting the entire graph and selecting a subgraph from it. As part of performance discovered that construction of intermediate graph objects hindered performance; removing this intermediate construction required modificastions to __init__, namely: Since nx's 'topological_sort' method does internal construction of the graph class with no arguments, 'Dependencies' class needed to be updated to allow connetion=None construction for this special case. As a side note: Similar uses of DiGraph constructor within erd.py might also now be removable
Fixes for networkx version>2
Not heavily tested - unit tests and manual ERD draw for simple single schema set of relations OK.