Implement biparition_graph_mst and bipartition_tree functions #572

InnovativeInventor · 2022-03-19T04:09:26Z

Note: this is a follow-up to email correspondence that @mtreinish and I had a few months ago. This is a draft PR, so feedback is welcome; it's not ready for merging (yet). It isn't well optimized yet, but it is already fast enough to beat Python/networkx solidly.

This function takes a graph with a population assigned to each node, draws a minimum spanning tree, and finds a cut edge that splits the tree into two partitions whose total populations are within some epsilon of a population target.
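To make the balanced-edge search concrete, here is a minimal pure-Python sketch of the leaf-contraction idea. It is an illustration only, not the Rust code in this PR: the name balanced_cuts, the adjacency-dict input and the dict of populations are assumptions made for the example. It applies the same window test that appears in the diff (population between pop_target * (1 - epsilon) and pop_target * (1 + epsilon)) to the subtree on one side of each edge.

```python
from collections import deque


def balanced_cuts(adj, pops, pop_target, epsilon):
    """Find balanced cut edges of a tree by repeatedly contracting leaves.

    adj maps each node to the set of its neighbors in the tree; pops maps
    each node to its population. Returns a list of (root, nodes) pairs, one
    per balanced edge, where nodes is the subtree on the leaf side of the edge.
    """
    pops = dict(pops)                         # running subtree populations
    members = {v: {v} for v in adj}           # nodes merged into each node so far
    degree = {v: len(adj[v]) for v in adj}    # degree among uncontracted nodes
    removed = set()
    queue = deque(v for v in adj if degree[v] == 1)
    lo = pop_target * (1.0 - epsilon)
    hi = pop_target * (1.0 + epsilon)
    cuts = []

    while queue:
        leaf = queue.popleft()
        remaining = [n for n in adj[leaf] if n not in removed]
        if not remaining:
            continue                          # last node left: nothing to cut
        parent = remaining[0]
        if lo <= pops[leaf] <= hi:
            cuts.append((leaf, frozenset(members[leaf])))
        # Contract the leaf into its only remaining neighbor.
        pops[parent] += pops[leaf]
        members[parent] |= members[leaf]
        removed.add(leaf)
        degree[parent] -= 1
        if degree[parent] == 1:
            queue.append(parent)
    return cuts


# Path 0 - 1 - 2 - 3 with one person per node; the target is half the total.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
ones = {v: 1.0 for v in path}
cuts = balanced_cuts(path, ones, pop_target=2.0, epsilon=0.0)
print(cuts)
```

On this path only the middle edge is balanced, so the sketch reports a single subtree rooted at node 1. Each node is contracted once, so the search is linear in the size of the tree.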

To explain the motivation behind this PR, this function is the main workhorse behind the ReCom algorithm detailed in this paper, which has been used in many litigation and civil rights projects to challenge racial and partisan gerrymandered maps and determine VRA compliance (see the recent cases in Alabama, North Carolina, Pennsylvania, etc.). I'm working on a rewrite of MGGG's main gerrymandering analysis software/engine (GerryChain, written in Python) and have achieved a ~15x speedup by naively rewriting the core graph operations in retworkx.

You can still see my messy debugging, but the rough idea is here.

mtreinish

I know this is still a WIP and there is still some work to go on it, but I took a quick look and there are some easy optimizations you can make to improve the performance, which I commented on inline. I'll wait until it's closer to ready to do a more detailed review.

src/tree.rs

coveralls · 2022-05-17T21:59:38Z

Pull Request Test Coverage Report for Build 2402224991

94 of 96 (97.92%) changed or added relevant lines in 2 files are covered.
1 unchanged line in 1 file lost coverage.
Overall coverage decreased (-0.002%) to 97.159%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/tree.rs	92	94	97.87%

Files with Coverage Reduction	New Missed Lines	%
src/shortest_path/all_pairs_dijkstra.rs	1	98.54%

Totals
Change from base Build 2373810364:	-0.002%
Covered Lines:	12277
Relevant Lines:	12636

💛 - Coveralls

InnovativeInventor · 2022-05-18T02:02:33Z

@mtreinish I just gave this a rebase and cleaned up the code a bit -- should be ready for your review. Let me know what you think!

IvanIsCoding

Thanks for contributing this, we're glad retworkx was useful for your research!

Some things are missing before we are able to merge the PR. An action list would be:

- Write tests for the functions you added in tests/
- Add a release note announcing the new method using reno new bipartition_tree
- Add an entry for the new function in docs/source/api.rst

You can find more details on the steps above in CONTRIBUTING.md. Feel free to ping me or Matthew if you need help.

InnovativeInventor · 2022-05-25T13:59:06Z

@IvanIsCoding @mtreinish Ok, I've added tests, made a release note with reno, and added a new entry of the function in the docs. Is there anything else you'd like me to do?

IvanIsCoding

Left some minor comments but overall the code is looking good

releasenotes/notes/bipartition_tree-4c1ad080b1fab9e8.yaml

tests/graph/test_bipartition.py

mtreinish

Overall this LGTM. I'll have to read the paper in more depth to check that the implementation matches what's expected, but I trust you did that correctly if I don't have time to check it. :) I just have a few more mechanical questions and suggestions inline, but I think this is getting close. Thanks for sticking with this and continuing to push it forward.

src/tree.rs

InnovativeInventor · 2022-05-27T23:11:14Z

@mtreinish @IvanIsCoding I think I addressed all of the feedback you two gave (thanks, by the way!). Let me know if I should re-open any of the comments or address any other concerns/feedback you have.

IvanIsCoding

LGTM

src/tree.rs

georgios-ts · 2022-05-28T10:17:15Z

src/tree.rs

+
+    while balanced_nodes.is_empty() {
+        mst.graph.clear_edges();
+        _minimum_spanning_tree(py, graph, &mut mst, Some(weight_fn.clone()), 1.0)?;


I don't get why we need to recalculate mst every time in the while loop (or even why different iterations would give different results), since the values of all arguments stay the same.

We recalculate mst every time in the while loop because bipartition_graph_mst is a sampling algorithm and reusing mst when _bipartition_tree fails would change the distribution that we're sampling from.

To be clear: weight_fn is intended to sample and be non-deterministic (e.g. sampling uniformly from the range [0, 1]).

We might need to rename weight_fn to something else, because in every other function weight_fn takes an edge and returns a weight, which is not the case here.

I'm not sure what you mean by that. This should be the same weight_fn that minimum_spanning_tree takes (that is, it takes in an edge and returns a random weight). E.g. weight_fn can be: lambda _: random.random() (i.e. it picks uniformly from [0, 1]). However, there are other variants of the ReCom Markov Chain sampling algorithm that require certain edges to be prioritized/deprioritized like so: lambda x: random.random() if edge_crosses_county(x) else random.random() + 0.5. This is used to create redistricting plan sampling methods that respect county or city boundaries.
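As a rough illustration of why a non-deterministic weight_fn turns a minimum spanning tree routine into a sampler, the sketch below runs a plain Kruskal pass in pure Python and calls weight_fn once per edge on every draw. It is not retworkx's implementation; kruskal_mst and the 4-cycle test graph are assumptions made for the example.

```python
import random


def kruskal_mst(num_nodes, edges, weight_fn):
    """Return the edges of a minimum spanning tree under weight_fn."""
    parent = list(range(num_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    # weight_fn is evaluated exactly once per edge for this draw.
    for u, v in sorted(edges, key=weight_fn):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.append((u, v))
    return tree


# A 4-cycle: dropping any one edge leaves a spanning tree.
edges = [(0, 1), (1, 3), (3, 2), (2, 0)]
rng = random.Random(0)

# Fresh random weights on every call, so repeated calls can return different trees.
trees = {
    frozenset(kruskal_mst(4, edges, lambda _edge: rng.random()))
    for _ in range(200)
}
print(len(trees), "distinct spanning trees drawn")
```

With a deterministic weight_fn every iteration would rebuild the same tree, which is why the retry loop in the diff only makes sense when the weights are resampled.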

I totally agree with Ivan. This should be at least documented and given a different name since weight_fn is used in other places in retworkx with a different meaning and might be a source of confusion for users.

georgios-ts · 2022-05-28T10:19:55Z

src/tree.rs

+///     two partitioned subtrees and the set of nodes making up that subtree.
+#[pyfunction]
+#[pyo3(text_signature = "(graph, weight_fn, pop, target_pop, epsilon)")]
+pub fn bipartition_graph_mst(


IMO this function can be omitted from our API since it's just calling minimum_spanning_tree and bipartition_tree, and we already provide these functions as part of our API.

I disagree. This allows reuse of the spanning tree object if there are no balanced edges detected and reduces the memory allocs (i.e. the number of times the graph is cloned to create a new spanning tree object).

That's a fair point, but I think we can avoid compromising performance and still omit this function with a different design here. We can implement a new pyclass SpanningTreeSampler that generates random spanning trees and internally reuses the same memory for storing the MST, avoiding unnecessary allocations. Users can then call:

    sampler = retworkx.SpanningTreeSampler(graph)
    balanced_nodes = []
    while not balanced_nodes:
        tree = sampler.sample()
        balanced_nodes = retworkx.bipartition_tree(tree, ..)

to replicate the output of bipartition_graph_mst (and it'll be marginally slower).

My main motivation for the above design is two-fold:

1. Sampling a spanning tree is an interesting problem on its own, with different algorithms in the literature (which we can implement at a later point), and more users will benefit from it.

2. The output of bipartition_graph_mst feels a bit artificial to me, e.g. cutting an output edge will not necessarily cut the input graph into two connected components; it will only cut the tree that we randomly drew.

src/tree.rs

InnovativeInventor · 2022-05-28T19:38:40Z

FYI, I just rebased this onto main since this PR appeared to be out of date with main.

src/tree.rs

IvanIsCoding · 2022-05-29T20:28:24Z

src/tree.rs

+/// :param weight_fn: A callable object (function, lambda, etc) which
+///     will be passed the edge object and expected to return a ``float``. See
+///     :func:`~minimum_spanning_tree` for details.


This might be me being picky, but I'd rename it to something along the lines of weight_sample_fn and describe it as you described it to me in the previous comment. It would make it clearer to users and maintainers that the intended behavior is non-deterministic.

I'm saying so because in general, we expect weight_fn to always return the same result given the same edge. Moreover, wherever possible we also try to cache the calls to weight_fn to avoid calling Python from Rust more than necessary.
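The tension between caching and sampling can be shown with a small pure-Python sketch. The cached wrapper below is a hypothetical stand-in for the kind of memoization described above, not retworkx's actual caching code: once a weight is cached per edge, a sampling weight function silently stops resampling.

```python
import random


def cached(weight_fn):
    """Memoize weight_fn per edge (hypothetical stand-in for internal caching)."""
    cache = {}

    def wrapper(edge):
        if edge not in cache:
            cache[edge] = weight_fn(edge)
        return cache[edge]

    return wrapper


rng = random.Random(1)


def sample_weight(_edge):
    # A sampling weight function: a fresh uniform draw on every call.
    return rng.random()


cached_weight = cached(sample_weight)
edge = (0, 1)

fresh = [sample_weight(edge) for _ in range(3)]    # a new value each call
frozen = [cached_weight(edge) for _ in range(3)]   # first draw repeated
print(fresh)
print(frozen)
```

A distinct parameter name such as the suggested weight_sample_fn would signal to maintainers that this callable must not be cached.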

georgios-ts · 2022-05-30T10:23:11Z

releasenotes/notes/bipartition_tree-4c1ad080b1fab9e8.yaml

+---
+features:
+  - |
+    Added a new function :func:`~.bipartition_tree` that takes in spanning tree


Suggested change

-    Added a new function :func:`~.bipartition_tree` that takes in spanning tree
+    Added a new function :func:`~.bipartition_tree` that takes in a tree

georgios-ts · 2022-05-30T10:23:47Z

releasenotes/notes/bipartition_tree-4c1ad080b1fab9e8.yaml

+    Added a new function :func:`~.bipartition_tree` that takes in spanning tree
+    and a list of populations assigned to each node in the tree and finds all
+    balanced edges, if they exist. A balanced edge is defined as an edge that,
+    when cut, will split the population of the tree into two connected subtrees


Suggested change

-    when cut, will split the population of the tree into two connected subtrees
+    when cut, will split the tree into two connected subtrees

georgios-ts · 2022-05-30T10:29:16Z

src/tree.rs

+/// Bipartition tree by finding balanced cut edges of a spanning tree using
+/// node contraction. Assumes that the tree is connected and is a spanning tree.


Suggested change

-/// Bipartition tree by finding balanced cut edges of a spanning tree using
-/// node contraction. Assumes that the tree is connected and is a spanning tree.
+/// Find all balanced cut edges of a tree.

georgios-ts · 2022-05-30T10:29:21Z

src/tree.rs

+/// population of the tree into two connected subtrees that have population near
+/// the population target within some epsilon. The function returns a list of
+/// all such possible cuts, represented as the set of nodes in one
+/// partition/subtree. Wraps around ``_bipartition_tree``.


Suggested change

-/// partition/subtree. Wraps around ``_bipartition_tree``.
+/// partition/subtree.

georgios-ts · 2022-05-30T10:30:05Z

src/tree.rs

+/// all such possible cuts, represented as the set of nodes in one
+/// partition/subtree. Wraps around ``_bipartition_tree``.
+///
+/// :param PyGraph graph: Spanning tree. Must be fully connected


Suggested change

-/// :param PyGraph graph: Spanning tree. Must be fully connected
+/// :param PyGraph tree: The input tree.

georgios-ts · 2022-05-30T10:51:53Z

src/tree.rs

+        } else {
+            // Not a leaf yet
+            continue;
+        }


This is not really needed since we'll continue in the iteration anyway.

Suggested change

-        } else {
-            // Not a leaf yet
-            continue;
-        }
+        } // else node is not a leaf yet

georgios-ts · 2022-05-30T10:53:25Z

src/tree.rs

+///     balanced edge that can be cut. The tuple contains the root of one of the
+///     two partitioned subtrees and the set of nodes making up that subtree.
+#[pyfunction]
+#[pyo3(text_signature = "(graph, weight_fn, pops, target_pop, epsilon)")]


Suggested change

-#[pyo3(text_signature = "(graph, weight_fn, pops, target_pop, epsilon)")]
+#[pyo3(text_signature = "(graph, weight_fn, pops, pop_target, epsilon)")]

georgios-ts · 2022-05-30T10:59:40Z

src/tree.rs

+}
+
+/// Internal _bipartition_tree implementation.
+fn _bipartition_tree(


We don't really need to put the code in an internal implementation. We can just define the public bipartition_tree.

georgios-ts · 2022-05-30T11:08:26Z

src/tree.rs

+            pops[neighbor.index()] += pop;
+
+            // Check if balanced; mark as seen
+            if pop >= pop_target * (1.0 - epsilon) && pop <= pop_target * (1.0 + epsilon) {


Reading the docs where a balanced edge is defined, and the linked paper, I'd guess that pop_target should be equal to the total population divided by 2. Is there any reason why we allow users to define a different value of pop_target? I'm asking because, depending on the value of pop_target, this check might fail for the other part of the partition.
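The concern can be checked with a few lines of arithmetic. The helper below, named within for this example only, copies the window test from the diff and applies it to both sides of a cut; the populations are made-up numbers.

```python
def within(pop, pop_target, epsilon):
    """Same window test as in the diff: pop_target * (1 - eps) <= pop <= pop_target * (1 + eps)."""
    return pop_target * (1.0 - epsilon) <= pop <= pop_target * (1.0 + epsilon)


total = 10.0

# pop_target far from total / 2: one side passes, its complement does not.
side = 3.0
one_side_ok = within(side, pop_target=3.0, epsilon=0.1)
other_side_ok = within(total - side, pop_target=3.0, epsilon=0.1)

# pop_target == total / 2: the window is symmetric around the midpoint,
# so a side that passes implies its complement passes as well.
half_side = 4.6
both_ok = (
    within(half_side, pop_target=total / 2, epsilon=0.1)
    and within(total - half_side, pop_target=total / 2, epsilon=0.1)
)
print(one_side_ok, other_side_ok, both_ok)
```

When pop_target is half the total P, a side with population p in [P/2 * (1 - epsilon), P/2 * (1 + epsilon)] leaves P - p in the same interval, which is why the one-sided check is sufficient only in that case.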

InnovativeInventor marked this pull request as draft March 19, 2022 04:10

mtreinish reviewed Mar 24, 2022

InnovativeInventor force-pushed the feat-balanced-cut-edges branch from 9a821f6 to 46d0950 Compare May 17, 2022 20:42

InnovativeInventor marked this pull request as ready for review May 18, 2022 01:49

InnovativeInventor force-pushed the feat-balanced-cut-edges branch from 1d70fab to a33a77c Compare May 18, 2022 01:50

IvanIsCoding reviewed May 22, 2022

InnovativeInventor force-pushed the feat-balanced-cut-edges branch from 27010c9 to f2fc8b4 Compare May 23, 2022 16:11

InnovativeInventor changed the title ~~Implement biparition_tree function~~ Implement biparition_graph function May 23, 2022

InnovativeInventor force-pushed the feat-balanced-cut-edges branch 2 times, most recently from 11f9b9a to c777080 Compare May 24, 2022 22:02

InnovativeInventor changed the title ~~Implement biparition_graph function~~ Implement biparition_graph and bipartition_tree functions May 24, 2022

InnovativeInventor requested a review from IvanIsCoding May 25, 2022 13:57

IvanIsCoding reviewed May 27, 2022

mtreinish reviewed May 27, 2022

mtreinish reviewed May 27, 2022

InnovativeInventor force-pushed the feat-balanced-cut-edges branch from 877ba74 to 889dff5 Compare May 27, 2022 17:29

InnovativeInventor changed the title ~~Implement biparition_graph and bipartition_tree functions~~ Implement biparition_graph_mst and bipartition_tree functions May 27, 2022

InnovativeInventor force-pushed the feat-balanced-cut-edges branch from 4cbcf84 to a025a07 Compare May 27, 2022 23:04

IvanIsCoding approved these changes May 28, 2022

georgios-ts requested changes May 28, 2022

InnovativeInventor force-pushed the feat-balanced-cut-edges branch from 6b811ca to 1edfa62 Compare May 28, 2022 19:31

InnovativeInventor added 5 commits May 28, 2022 15:32

Draft bipartition_tree implementation

1cdf76a

Working bipartition tree impl

20281b0

Release GIL during most of balanced_edge finding code

f85eb52

Ensure that unused vars get gc'ed on each loop

79b3d94

Lint with cargo fmt

d5d464b

InnovativeInventor and others added 20 commits May 28, 2022 15:32

Fix end with blank line linting issue

da1e8a7

Add bipartition tests

99e5257

Fix indent issues in retworkx bipartition docstrings

db87924

Switch to using hashbrown's HashSet impl

f6d807c

Co-authored-by: Matthew Treinish <mtreinish@kortar.org>

Make tests deterministic

46404f3

Reorder imports as per cargo fmt

1ea5cd1

Wrap in rst Python code block

79b5e17

Switch to passing by value for mst

4e00b9b

Make test name more accurate

5ba7cc9

Handle holes in graph node indices

3685720

Revert "Switch to passing by value for mst"

8e513f7

This reverts commit 8e4ffdf.

Remove return reference in _minimum_spanning_tree helper

c1f73a2

Rename bipartition_graph to bipartition_graph_mst

f59f7c2

Create _bipartition_tree internal func

4088b76

Use numpy PyReadonlyArray to avoid one, unnecessary copy

663d54e

Revert "Use numpy PyReadonlyArray to avoid one, unnecessary copy"

a8f2305

This reverts commit 53842ef.

Update pyo3 text_signature to reflect args

dd42700

Co-authored-by: georgios-ts <45130028+georgios-ts@users.noreply.github.com>

Apply suggestions from @georgois-ts

7c131a1

Co-authored-by: georgios-ts <45130028+georgios-ts@users.noreply.github.com>

Update pyo3 text_signature to reflect Rust args

aa24d61

Switch to using LinkedList for cheaper appends and remove unnecessary…

e7fdffe

… queue traversal

InnovativeInventor force-pushed the feat-balanced-cut-edges branch from 1edfa62 to e7fdffe Compare May 28, 2022 19:32

IvanIsCoding reviewed May 28, 2022

Remove LinkedList use

4dc3ba7

IvanIsCoding reviewed May 29, 2022

georgios-ts requested changes May 30, 2022

IvanIsCoding added 3 commits August 1, 2022 15:38

Merge remote-tracking branch 'origin/main' into feat-balanced-cut-edges

90df5ac

Merge remote-tracking branch 'origin/main' into feat-balanced-cut-edges

5fd550a

Move test file to rustworkx tests

930069a

pjrule added a commit to mggg/gerrychain.rs that referenced this pull request Apr 29, 2023

Add ReCom MST functions (formerly intended for upstream)

e00e5b8

See this PR: Qiskit/rustworkx#572 We thank the rustworkx reviewers for their suggestions. Co-authored-by: Max Fan <root@max.fan>
