Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

networkx-2 support #443

Merged
merged 8 commits into from
Feb 14, 2018
Merged

networkx-2 support #443

merged 8 commits into from
Feb 14, 2018

Conversation

ixcat
Copy link
Contributor

@ixcat ixcat commented Jan 31, 2018

Fixes for networkx version>2

Not heavily tested - unit tests and manual ERD draw for simple single schema set of relations OK.

@coveralls
Copy link

coveralls commented Jan 31, 2018

Coverage Status

Coverage decreased (-0.04%) to 89.315% when pulling ccd1dc4 on ixcat:networkx-2 into a577bb5 on datajoint:master.

@@ -206,8 +206,8 @@ def _make_graph(self):
nx.algorithms.boundary.node_boundary(nx.DiGraph(self).reverse(), self.nodes_to_show))
nodes = self.nodes_to_show.union(a for a in gaps if a.isdigit)
# construct subgraph and rename nodes to class names
graph = nx.DiGraph(self).subgraph(nodes)
nx.set_node_attributes(graph, 'node_type', {n: _get_tier(n) for n in graph})
graph = nx.DiGraph(nx.DiGraph(self).subgraph(nodes))
Copy link
Contributor

@eywalker eywalker Feb 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the reason for the redundancy in DiGraph construction?

Copy link
Contributor Author

@ixcat ixcat Feb 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .subgraph call now returns a subgraph 'view', which is immutable and cannot be relabeled later on in line 219's nx.relabel_nodes(graph, mapping, copy=False). I also tried doing a return nx.relabel_nodes(graph, mapping, copy=True) in that line but it didn't work on 1st pass - from here I didn't really investigate the API change further - since copy=True imples making a copy and the line you are mentioning would also make a a copy I viewed these as equivalent and moved on.. That said, I'm not sure why we are calling nx.DiGraph.subgraph specifically in the original case; self.subgraph should ostensibly work I'd imagine..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I'm imagining that self.subgraph should suffice. @dimitri-yatsenko do you recall why we have to do nx.Digraph(self).subgraph? I have a feeling that this is legacy code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

working on the speedtest now - it seems the constructor is called internally and the __init___ signatures between the classes vary, though I haven't dug further just yet & theoretically calling via superclass would work around this..

Copy link
Contributor Author

@ixcat ixcat Feb 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems that networkx's digraph.py calls H.__class__() to create an instance copy within the subgraph function - since this refers to Dependencies which requires connection as a parameter, the call fails. Making this an optional argument allows self.subgraph(...) to work, however, this could potentially be undesirable w/r/t the rest of the code depending on opinions...

more related to this this in the descendants performance discussion

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should override subgraph so connection gets passed down - although it also makes sense to check if connection is needed in the subgraphs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking to audit for connection use and proceed or just take the performance hit -

Maintaining a random graph related function in-tree to avoid this issue doesn't seem architecturally correct since this class isn't really a 'graph subclass' in terms of 'providing additional graph functionality' (even though it is technically a subclass) ..

That said it isn't such a big deal to copy the function in, annotate the reasoning and sync when needed with upstream. Networkx is BSD licencsed btw so this wouldn't impact DJ.

@dimitri-yatsenko care to weigh in?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If everything works, then no need to pass the connection

Copy link
Contributor Author

@ixcat ixcat Feb 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dimitri-yatsenko apologies if unclear - so does this mean the additional patch to datajoint/dependencies.py mentioned below can be added to this PR after the extra comments/code is cleaned?

networkx~=1.11
pydotplus
networkx
pydot
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why pydot vs pydotplus?

Copy link
Contributor Author

@ixcat ixcat Feb 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Networkx apparently flipped in 2.0; my memory from research a few months back was that pydot went unmaintained for a bit, pydotplus was a python 3 + added feature fork, subsequently pydot development picked up and supports python3 and pydotplus appears dormant.

1.11 docs:

https://networkx.github.io/documentation/networkx-1.11/reference/drawing.html#module-networkx.drawing.nx_pydot

Pydot

Import and export NetworkX graphs in Graphviz dot format using pydotplus.

Either this module or nx_agraph can be used to interface with graphviz.

See also

PyDotPlus
    https://github.com/carlos-jenkins/pydotplus

2.1 docs:

https://networkx.github.io/documentation/stable/reference/drawing.html#module-networkx.drawing.nx_pydot

Pydot

Import and export NetworkX graphs in Graphviz dot format using pydot.

Either this module or nx_agraph can be used to interface with graphviz.

See also

pydot
    https://github.com/erocarrera/pydot
Graphviz

Thanks for pointing it out - this reminds me I should double check rendering on Windows; my recollection from the initial testing of pydotplus to avoid the graphviz issues is that both seem to work but I'm not 100% on this.

@@ -101,4 +101,5 @@ def descendants(self, full_table_name):
:return: all dependent tables sorted in topological order. Self is included.
"""
nodes = nx.algorithms.dag.descendants(self, full_table_name)
return [full_table_name] + nx.algorithms.dag.topological_sort(self, nodes)
sort = nx.algorithms.dag.topological_sort(self)
return [full_table_name] + [s for s in sort if s in nodes]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternate/better performing implementation suggestion @eywalker:
create a subgraph from full_table_name and descendants and then sort

Copy link
Contributor Author

@ixcat ixcat Feb 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on a test of calling descendants 10000 times for every class within the graph of 41 classes discussed elsewhere, I was able to get the following results locally:

  1. initial code above: ~250 iterations of the subclasses-for-each-classes per second
  2. Commented code in patch below: ~125 iterations/sec
  3. Live code in patch below: ~450 iterations/sec, however this requires making connection optional in the Dependencies constructor

can provide the test code elsewhere if that might be useful.

Patch:
--8<--

diff --git a/datajoint/dependencies.py b/datajoint/dependencies.py
index 6aa169b..2c0167f 100644
--- a/datajoint/dependencies.py
+++ b/datajoint/dependencies.py
@@ -8,7 +8,7 @@ class Dependencies(nx.DiGraph):
     """
     The graph of dependencies (foreign keys) between loaded tables.
     """
-    def __init__(self, connection):
+    def __init__(self, connection=None):
         self._conn = connection
         self._node_alias_count = itertools.count()
         super().__init__(self)
@@ -95,7 +95,7 @@ class Dependencies(nx.DiGraph):
         return dict(p[1:3] for p in self.out_edges(table_name, data=True)
                     if primary is None or p[2]['primary'] == primary)
 
-    def descendants(self, full_table_name):
+    def descendants_old(self, full_table_name):
         """
         :param full_table_name:  In form `schema`.`table_name`
         :return: all dependent tables sorted in topological order.  Self is included.
@@ -103,3 +103,18 @@ class Dependencies(nx.DiGraph):
         nodes = nx.algorithms.dag.descendants(self, full_table_name)
         sort = nx.algorithms.dag.topological_sort(self)
         return [full_table_name] + [s for s in sort if s in nodes]
+
+    def descendants(self, full_table_name):
+        """
+        :param full_table_name:  In form `schema`.`table_name`
+        :return: all dependent tables sorted in topological order.  Self is included.
+        """
+
+        # nodes = nx.DiGraph(self).subgraph(
+        #     nx.algorithms.dag.descendants(self, full_table_name))
+
+        nodes = self.subgraph(
+            nx.algorithms.dag.descendants(self, full_table_name))
+
+        return [full_table_name] + list(
+            nx.algorithms.dag.topological_sort(nodes))

this implies that a similar modification to erd.py line 209 discussed in other comments could also improve performance; again, however, this requires making connection optional in the Dependencies code. The previous implementation running at ~250 iterations shouldn't be impacted by a similar change since it is not constructing copied instances within the main method body. As a side note, unit tests still pass with connnection an optional argument. Also, the resulting descendants list sequence (but not count or content) is slightly different in some cases between these two functions, likely a difference in similar-rank nodes resulting from node visitation being different between the two versions. This was not investigated further.

…illitate.

Improve descendants performance by sorting the subgraph of nodes rather than
sorting the entire graph and selecting a subgraph from it.

As part of performance discovered that construction of intermediate graph
objects hindered performance; removing this intermediate construction required
modificastions to __init__, namely:

Since nx's 'topological_sort' method does internal construction of the graph
class with no arguments, 'Dependencies' class needed to be updated to allow
connetion=None construction for this special case.

As a side note: Similar uses of DiGraph constructor within erd.py might also now
be removable
@eywalker eywalker merged commit 07ffd36 into datajoint:master Feb 14, 2018
@ixcat ixcat deleted the networkx-2 branch October 31, 2018 13:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants