Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/graph analysis identification #28

Merged

Conversation

Croydon-Brixton
Copy link
Collaborator

@Croydon-Brixton Croydon-Brixton commented Feb 24, 2021

Still WIP, but I'm creating a draft PR so that people can already make early comments and check out the work for their branches.

More detailed info & finalized PR coming soon.
This addresses #25

Note:
There's lots of files because I also pushed the test-cases for testing node identification (the .gpkg files) - simply ignore these.
Note also that the branch contains merges of @herbiebradley latest code on feature/graph-analysis-habitat for fixing the rtree. You can also ignore those.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@Croydon-Brixton Croydon-Brixton changed the base branch from master to feature/graph-analysis February 24, 2021 19:19
@Croydon-Brixton Croydon-Brixton linked an issue Feb 24, 2021 that may be closed by this pull request
@Croydon-Brixton Croydon-Brixton added this to In progress in GeoGraph Feb 24, 2021
…diversity into feature/graph-analysis-identification
…diversity into feature/graph-analysis-identification

Conflicts:
	src/models/geograph.py
…ndas datastrcuture

Warning: Current implementation is not yet optimized for performance and currently slower than previous method (~10x).
Note: old functions kept for now for sanity checks
@Croydon-Brixton
Copy link
Collaborator Author

Croydon-Brixton commented Mar 1, 2021

This PR consists of the identify_nodes function as well as a couple of test cases - closes #25

We identify nodes based on the following tactic:

  1. For each node get all candidates via an RTree query
  2. Remove all nodes which have a different class label (only labels with same class will be identified with each other)
  3. Compare the geometries of the nodes (3 different modes possible, which are successively more stringent: corner, edge, interior. corner identifies all nodes that have at least a 0D overlap (point, line, area), edge identifies if there's at least a 1D overlap (line or area), interior identifies if there's at least a 2D overlap)

The node and graph merging and stacking is currently not implemented yet. That will come on a new branch and leverage the identification. I thought it might be useful to have the identification and some test data already now on the main graph-analysis branch - hence this PR.

The relevant files are:

  • src/models/binary_graph_operations.py and src/models/polygon_utils.py contain the main functionality (together with the legacy functions for now - can remove them though).
  • notebooks: include sample test cases and timings (indices 8, 9, 10). Best to watch these on ReviewNB
  • src/tests includes two scripts create_test_data (which creates 15 binary test cases of tiny landscapes (c.f. below) and test_utils which are used to plot node identification examples in the notebooks.

Other contents of this PR:

  • 15 binary test cases of tiny landscapes with different nodes and class labels for doing tests of the node identification (let me know if you think I should exclude them from the PR)
  • Minor fixes of linting
  • Adding some defaults
  • Named our index (just for fun 😄 )

Extra notes:
Note 1: Geopandas access
Takeaway: Do not use .iloc or .loc indexing if you need performance. They can be 10-100x slower than directly accessing the underlying numpy array. This stackexchange issue reports the same observation.

image

@Croydon-Brixton Croydon-Brixton marked this pull request as ready for review March 1, 2021 17:31
@Croydon-Brixton Croydon-Brixton requested review from rdnfn and herbiebradley and removed request for rdnfn March 1, 2021 17:32
# Filter candidates according to the same class label
# fmt: off
candidate_ids = candidate_ids[
other_graph._class_label(candidate_ids) == label # pylint: disable=protected-access
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feature:
Just noticed this now. It'd be useful to also allow identifying nodes with other (custom) classes where we determine the class-to-class mapping. I will add this as an extra small feature.

Copy link
Member

@rdnfn rdnfn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me! Just added some very minor comments, some just to start a general discussion about unifying our notation & style, so those definitely don't need to be fully addressed for this to be merged. Happy for it to be merged as is.

In general excellent to see the progress here, excited for how this work can be used in the GeoGraphTimelineclass and then consequently in our visualisation 🚀

EDIT: github shows the .gpkg files as empty, are they actually?

src/models/binary_graph_operations.py Outdated Show resolved Hide resolved
src/models/binary_graph_operations.py Outdated Show resolved Hide resolved
src/models/binary_graph_operations.py Outdated Show resolved Hide resolved
src/models/geograph.py Outdated Show resolved Hide resolved
src/models/geograph.py Outdated Show resolved Hide resolved
src/models/geograph.py Outdated Show resolved Hide resolved
src/models/polygon_utils.py Outdated Show resolved Hide resolved
src/tests/utils.py Show resolved Hide resolved
src/models/binary_graph_operations.py Show resolved Hide resolved
Croydon-Brixton and others added 2 commits March 3, 2021 10:49
doc: fix typos

Co-authored-by: rdnfn <75615911+rdnfn@users.noreply.github.com>
other_graph._geometry(candidate_ids), # pylint: disable=protected-access
)
]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intuitively it should be slightly faster to combine these two filters into one, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I also think that if numpy does short-circuit evaluation on these things this should be faster. @herbiebradley I don't know if it does, do you?

Concretely: Do you know how numpy handles these type of cases?
Case select_from_array[np.logical_or(condition_array1, condition_array2)]
Does it first evaluate both condition_array1 and condition_array2 in the slice [ ... ] and then or the conditions (in which case it'd probably be slower bc we would calculate the geometry overlaps for shapes which won't agree in class label).
Or does it calculate the first element of condition_array1 and then short-circuit decide if that element of condition_array2 even needs to be calculated? (in which case I think it should be slightly faster)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about np.logical_or but if you use base python or and a generator to select then it should short-circuit and may be faster - there are some interesting timing results here on selecting from boolean arrays: https://stackoverflow.com/questions/58422690/filtering-a-numpy-array-what-is-the-best-approach

Copy link
Member

@herbiebradley herbiebradley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I added all the comments I could think of ;)

Copy link
Member

@herbiebradley herbiebradley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed the indexing issue discussed on Slack, and added a minor performance improvement to add_habitat, chmod 664 to all saved files, and some minor bugfixes. Looks good to merge now once the return type of identify_nodes is resolved.

@Croydon-Brixton Croydon-Brixton merged commit 7d937c4 into feature/graph-analysis Mar 3, 2021
@Croydon-Brixton Croydon-Brixton deleted the feature/graph-analysis-identification branch March 3, 2021 15:13
@Croydon-Brixton Croydon-Brixton moved this from In progress to Done in GeoGraph Mar 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
GeoGraph
  
Done
Development

Successfully merging this pull request may close these issues.

Node identification and operations
3 participants