Feature/graph analysis identification #28

Croydon-Brixton · 2021-02-24T19:18:17Z

Still WIP, but I'm creating a draft PR so that people can already make early comments and check out the work for their branches.

More detailed info & finalized PR coming soon.
This addresses #25

Note:
There are lots of files because I also pushed the test cases for testing node identification (the .gpkg files) - simply ignore these.
Note also that the branch contains merges of @herbiebradley's latest code on feature/graph-analysis-habitat for fixing the rtree. You can also ignore those.

review-notebook-app · 2021-02-24T19:18:21Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

Croydon-Brixton · 2021-03-01T17:31:12Z

This PR consists of the identify_nodes function as well as a couple of test cases - closes #25

We identify nodes using the following strategy:

1. For each node, get all candidates via an RTree query.
2. Remove all candidates that have a different class label (only nodes with the same class label are identified with each other).
3. Compare the geometries of the nodes. Three modes are available, each more stringent than the last:
   - corner: identifies nodes that have at least a 0D overlap (point, line or area)
   - edge: identifies nodes that have at least a 1D overlap (line or area)
   - interior: identifies nodes that have a 2D overlap (area)
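
To make the three modes concrete, here is a minimal, stdlib-only sketch of the same three steps. It is not the code in this PR: the `Node` class, `overlap_dim` and `identify_node` are hypothetical names, nodes are simplified to axis-aligned boxes instead of general polygons, and a plain loop stands in for the RTree query.

```python
from dataclasses import dataclass
from typing import List

# Minimum overlap dimension required by each mode (mode names taken from the PR text).
MIN_DIM = {"corner": 0, "edge": 1, "interior": 2}


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a graph node: an axis-aligned box plus a class label."""
    node_id: int
    label: int
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def overlap_dim(a: Node, b: Node) -> int:
    """Dimension of the overlap of two boxes: -1 (none), 0 (point), 1 (line), 2 (area)."""
    dx = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    dy = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    if dx < 0 or dy < 0:
        return -1
    return int(dx > 0) + int(dy > 0)


def identify_node(node: Node, others: List[Node], mode: str = "corner") -> List[int]:
    """Return the sorted ids of the nodes in `others` that are identified with `node`."""
    matches = []
    for cand in others:  # step 1: a spatial index query would prune this loop
        if cand.label != node.label:  # step 2: keep candidates of the same class only
            continue
        if overlap_dim(node, cand) >= MIN_DIM[mode]:  # step 3: geometry comparison
            matches.append(cand.node_id)
    return sorted(matches)


node = Node(0, 1, 0, 0, 1, 1)
others = [
    Node(10, 1, 0.5, 0.5, 2, 2),  # shares an area with `node`
    Node(11, 1, 1, 0, 2, 1),      # shares only the edge x = 1
    Node(12, 1, 1, 1, 2, 2),      # shares only the corner (1, 1)
    Node(13, 2, 0, 0, 1, 1),      # same shape, different class label
    Node(14, 1, 5, 5, 6, 6),      # disjoint
]
results = {mode: identify_node(node, others, mode) for mode in MIN_DIM}
print(results)
```

Each stricter mode returns a subset of the previous one: the corner mode matches nodes 10, 11 and 12, the edge mode drops 12, and the interior mode keeps only 10.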

Node and graph merging and stacking are not implemented yet. They will come on a new branch and build on the identification. I thought it would be useful to have the identification and some test data on the main graph-analysis branch already, hence this PR.

The relevant files are:

src/models/binary_graph_operations.py and src/models/polygon_utils.py contain the main functionality (together with the legacy functions for now, though I can remove them).
notebooks: include sample test cases and timings (indices 8, 9, 10). Best viewed on ReviewNB.
src/tests includes two scripts: create_test_data, which creates 15 binary test cases of tiny landscapes (cf. below), and test_utils, which is used to plot node identification examples in the notebooks.

Other contents of this PR:

15 binary test cases of tiny landscapes with different nodes and class labels for doing tests of the node identification (let me know if you think I should exclude them from the PR)
Minor fixes of linting
Adding some defaults
Named our index (just for fun 😄 )

Extra notes:
Note 1: GeoPandas access
Takeaway: Do not use .iloc or .loc indexing if you need performance. They can be 10-100x slower than directly accessing the underlying numpy array. A Stack Exchange question reports the same observation.
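
The two access patterns can be compared with a small sketch like the one below. It uses a plain pandas DataFrame as a stand-in for the GeoDataFrame (an assumption; the column name `class_label` and both helper functions are made up for illustration). It makes no timing claim of its own; wrapping the two calls in `timeit` on real data is the way to check the difference.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the graph's dataframe (hypothetical column name).
df = pd.DataFrame({"class_label": np.arange(1_000) % 5})


def labels_via_iloc(frame, positions):
    # One .iloc call per element: every lookup goes through pandas' indexing machinery.
    col = frame.columns.get_loc("class_label")
    return [frame.iloc[pos, col] for pos in positions]


def labels_via_numpy(frame, positions):
    # Pull out the underlying array once, then index it directly.
    values = frame["class_label"].to_numpy()
    return values[positions]


positions = np.array([3, 14, 15, 92, 653])
slow = labels_via_iloc(df, positions)
fast = labels_via_numpy(df, positions)
print(slow, fast)
```

Both functions return the same labels; only the route to the data differs.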

More info here: #34 (comment)

Croydon-Brixton · 2021-03-02T09:16:35Z

src/models/binary_graph_operations.py

+    # Filter candidates according to the same class label
+    # fmt: off
+    candidate_ids = candidate_ids[
+        other_graph._class_label(candidate_ids) == label  # pylint: disable=protected-access


Feature:
Just noticed this now. It'd be useful to also allow identifying nodes with other (custom) classes where we determine the class-to-class mapping. I will add this as an extra small feature.
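
One way such a class-to-class mapping could look is sketched below. This is only an illustration of the idea, not the planned implementation: `class_map`, `filter_by_class` and the fallback to the node's own label are all assumptions.

```python
import numpy as np

# Hypothetical mapping: class label in this graph -> labels it may be
# identified with in the other graph.
class_map = {1: [1, 4], 2: [2]}


def filter_by_class(candidate_ids, candidate_labels, label, mapping=None):
    """Keep the candidates whose label is allowed for `label`.

    Without a mapping (or for labels missing from it) this falls back to
    requiring the same class label.
    """
    allowed = [label] if mapping is None else mapping.get(label, [label])
    return candidate_ids[np.isin(candidate_labels, allowed)]


candidate_ids = np.array([10, 11, 12, 13])
candidate_labels = np.array([1, 2, 4, 1])

default = filter_by_class(candidate_ids, candidate_labels, 1)
custom = filter_by_class(candidate_ids, candidate_labels, 1, class_map)
print(default, custom)
```

With the default behaviour only candidates 10 and 13 survive; with the mapping, candidate 12 (label 4) is kept as well.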

rdnfn

Looks great to me! Just added some very minor comments, some just to start a general discussion about unifying our notation & style, so those definitely don't need to be fully addressed for this to be merged. Happy for it to be merged as is.

In general, excellent to see the progress here. Excited for how this work can be used in the GeoGraphTimeline class and then in our visualisation 🚀

EDIT: GitHub shows the .gpkg files as empty. Are they actually empty?

src/models/binary_graph_operations.py

src/models/geograph.py

src/models/polygon_utils.py

src/tests/utils.py

src/models/binary_graph_operations.py

src/models/geograph.py

herbiebradley · 2021-03-03T11:13:25Z

src/models/binary_graph_operations.py

+            other_graph._geometry(candidate_ids),  # pylint: disable=protected-access
+        )
+    ]
+


Intuitively, it should be slightly faster to combine these two filters into one, right?

Yes, I also think this should be faster if numpy does short-circuit evaluation on these things. @herbiebradley I don't know if it does, do you?

Concretely: do you know how numpy handles these types of cases?
Case: select_from_array[np.logical_or(condition_array1, condition_array2)]
Does it first evaluate both condition_array1 and condition_array2 in full inside the index [ ... ] and then OR the conditions? In that case it would probably be slower, because we would calculate the geometry overlaps for shapes whose class labels don't match.
Or does it calculate the first element of condition_array1 and then short-circuit to decide whether that element of condition_array2 even needs to be calculated? In that case I think it should be slightly faster.

I'm not sure about np.logical_or, but if you use base Python or and a generator to select, then it should short-circuit and may be faster. There are some interesting timing results here on selecting from boolean arrays: https://stackoverflow.com/questions/58422690/filtering-a-numpy-array-what-is-the-best-approach
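
On the short-circuit question: NumPy's logical functions are ordinary functions that receive two arrays which have already been computed, so there is no per-element short-circuiting. The sketch below counts how often a stand-in "expensive" check runs. The `overlaps` function and the toy arrays are hypothetical, and np.logical_and is used because a combined "same class and overlapping" filter is an AND.

```python
import numpy as np

calls = {"expensive": 0}


def overlaps(geom) -> bool:
    """Hypothetical stand-in for an expensive geometry comparison."""
    calls["expensive"] += 1
    return bool(geom % 2 == 0)


labels = np.array([1, 2, 1, 2, 1, 1])
geoms = np.array([0, 1, 2, 3, 4, 5])  # placeholder "geometries"
ids = np.arange(len(labels))
label = 1

# Combined mask: both operand arrays are fully built before they are combined,
# so the expensive check runs for every candidate.
calls["expensive"] = 0
geometry_mask = np.array([overlaps(g) for g in geoms], dtype=bool)
combined = ids[np.logical_and(labels == label, geometry_mask)]
calls_combined = calls["expensive"]

# Two-stage filter: cheap label mask first, expensive check only on the survivors.
calls["expensive"] = 0
same_class = labels == label
survivors = ids[same_class]
survivor_mask = np.array([overlaps(g) for g in geoms[same_class]], dtype=bool)
two_stage = survivors[survivor_mask]
calls_two_stage = calls["expensive"]

print(combined, calls_combined)    # all six candidates were checked
print(two_stage, calls_two_stage)  # only the four same-class candidates were checked
```

Both variants select the same ids, but the two-stage version skips the geometry check for candidates that already failed the class filter, which is the effect a short-circuit would have had.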

src/models/polygon_utils.py

herbiebradley

Looks good, I added all the comments I could think of ;)

src/models/polygon_utils.py

src/models/geograph.py

herbiebradley

I fixed the indexing issue discussed on Slack, added a minor performance improvement to add_habitat, applied chmod 664 to all saved files, and fixed some minor bugs. Looks good to merge now once the return type of identify_nodes is resolved.

Croydon-Brixton added 14 commits February 19, 2021 18:13

Merge branch 'feature/graph-analysis-loading' of github.com:ai4er-cdt…

449c44f

…/gtc-biodiversity into feature/graph-analysis-identification Accepted all incoming changes. Conflicts: src/models/geograph.py

exp: explore node identification and graph

0e51cd6

feature: add concenience func to plot landcover

2f4f78c

Merge branch 'feature/graph-analysis' of github.com:ai4er-cdt/gtc-bio…

1b757ea

…diversity into feature/graph-analysis-identification Accepted all changes from master. Fixed two linelength problems with pylint. Conflicts: src/models/geograph.py

fix: add default values for polygonize

03429a4

Merge branch 'feature/graph-analysis-habitat' of github.com:ai4er-cdt…

b4a7f4b

…/gtc-biodiversity into feature/graph-analysis-identification Conflicts: src/models/geograph.py

feature: add test data for node identification

a376541

feature: add utils for creating GeoGraph test data

29d0340

feature: add polygon utils for overlap computations

ec51cc8

feature: add node identification functionality

37a5f9a

refactor: remove deprecated graph_tools

cdf5e10

feature: add node identification method to GeoGraph

d905ebd

feature: add test data for node identification

8f40d9e

exp: add node identification demo and dev notebook

75081b5

Croydon-Brixton changed the base branch from master to feature/graph-analysis February 24, 2021 19:19

Croydon-Brixton linked an issue Feb 24, 2021 that may be closed by this pull request

Node identification and operations #25

Closed

Croydon-Brixton added this to In progress in GeoGraph Feb 24, 2021

Croydon-Brixton added 7 commits February 24, 2021 22:56

lint: fix lint by turning unneccessary list into generator

8d585b1

Merge branch 'feature/graph-analysis' of github.com:ai4er-cdt/gtc-bio…

4cda25c

…diversity into feature/graph-analysis-identification

exp: simplified notebooks

d9586e7

Merge branch 'feature/graph-analysis' of github.com:ai4er-cdt/gtc-bio…

776d67c

…diversity into feature/graph-analysis-identification Conflicts: src/models/geograph.py

refactor: refactor identify node method for use with underlying geopa…

878e3c0

…ndas datastrcuture Warning: Current implementation is not yet optimized for performance and currently slower than previous method (~10x).

feature: add bulk node identification for new dataframe interface

d223923

Note: old functions kept for now for sanity checks

feature: add node_id sorting for node identification

79bd345

Croydon-Brixton marked this pull request as ready for review March 1, 2021 17:31

Croydon-Brixton requested review from rdnfn and herbiebradley and removed request for rdnfn March 1, 2021 17:32

Croydon-Brixton added concerns: model kind: feature labels Mar 1, 2021

fix: fix self referential nodes in add_habitat

acc7874

More info here: #34 (comment)

Croydon-Brixton commented Mar 2, 2021

rdnfn approved these changes Mar 2, 2021

herbiebradley reviewed Mar 2, 2021

src/models/geograph.py

Croydon-Brixton and others added 2 commits March 3, 2021 10:49

Apply suggestions from code review

2d8b943

doc: fix typos Co-authored-by: rdnfn <75615911+rdnfn@users.noreply.github.com>

refactor: rename poly -> polygon

6803470

herbiebradley reviewed Mar 3, 2021

src/models/polygon_utils.py

herbiebradley approved these changes Mar 3, 2021

refactor: create property geometry in favor of _geometry, add module …

a8af146

…imports Note: This addresses #28 (comment)

Croydon-Brixton mentioned this pull request Mar 3, 2021

Performance-improvement: Speed up de9im pattern matching #37

Open

herbiebradley reviewed Mar 3, 2021

src/models/polygon_utils.py (outdated)

Croydon-Brixton and others added 6 commits March 3, 2021 12:24

fix: correct polygon_utils type hints

52e5378

refactor: isort

636bce4

cleanup: remove old identify_nodes function

b1197bf

fix: Rtree indexing and minor improvements

308a7e9

lint: black

359c710

fix: add_habitat bug and perf. improvement

7a99470

herbiebradley reviewed Mar 3, 2021

src/models/geograph.py

fix: inaccurate comment

1d54eaa

herbiebradley approved these changes Mar 3, 2021

fix: return List from identify_node

29beff4

This was referenced Mar 3, 2021

Feature-request: Allow identification with nodes of custom class label #38

Open

Performance-improvement: Combine boolean masks #39

Open

Croydon-Brixton merged commit 7d937c4 into feature/graph-analysis Mar 3, 2021

Croydon-Brixton deleted the feature/graph-analysis-identification branch March 3, 2021 15:13

Croydon-Brixton moved this from In progress to Done in GeoGraph Mar 3, 2021

Croydon-Brixton mentioned this pull request Mar 3, 2021

Node identification and operations #25

Closed
