Feature/graph analysis identification #28
…/gtc-biodiversity into feature/graph-analysis-identification Accepted all incoming changes. Conflicts: src/models/geograph.py
…diversity into feature/graph-analysis-identification Accepted all changes from master. Fixed two linelength problems with pylint. Conflicts: src/models/geograph.py
…/gtc-biodiversity into feature/graph-analysis-identification Conflicts: src/models/geograph.py
…diversity into feature/graph-analysis-identification
…diversity into feature/graph-analysis-identification Conflicts: src/models/geograph.py
…ndas datastructure Warning: The current implementation is not yet optimized for performance and is roughly 10x slower than the previous method.
Note: old functions kept for now for sanity checks
This PR consists of the We identify nodes based on the following tactic:
Node and graph merging and stacking are not implemented yet. That will come on a new branch and build on the identification. I thought it would be useful to have the identification and some test data on the main graph-analysis branch already, hence this PR. The relevant files are:
Other contents of this PR:
Extra notes:
More info here: #34 (comment)
# Filter candidates according to the same class label
# fmt: off
candidate_ids = candidate_ids[
    other_graph._class_label(candidate_ids) == label  # pylint: disable=protected-access
Feature:
Just noticed this now. It'd be useful to also allow identifying nodes with other (custom) classes where we determine the class-to-class mapping. I will add this as an extra small feature.
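One way the class-to-class mapping could look is sketched below. This is only an illustration: `class_map`, `filter_by_mapped_class`, and the toy labels are hypothetical names and data, not part of `geograph.py`. The idea is to replace the strict `== label` comparison with a membership test against the set of labels that the mapping declares equivalent.

```python
import numpy as np

# Hypothetical mapping: label in this graph -> labels accepted in the other graph.
class_map = {"forest": ["forest", "woodland"], "water": ["water"]}


def filter_by_mapped_class(candidate_ids, candidate_labels, label, class_map):
    """Keep candidates whose label is accepted for `label` under `class_map`.

    Labels missing from the mapping fall back to strict equality.
    """
    allowed = class_map.get(label, [label])
    mask = np.isin(candidate_labels, allowed)
    return candidate_ids[mask]


# Toy data (assumed for illustration only).
candidate_ids = np.array([10, 11, 12, 13])
candidate_labels = np.array(["forest", "woodland", "water", "urban"])

matched = filter_by_mapped_class(candidate_ids, candidate_labels, "forest", class_map)
unmapped = filter_by_mapped_class(candidate_ids, candidate_labels, "urban", class_map)
```

With an identity mapping (or an empty one) this reduces to the existing same-label behaviour, so it could be offered as an optional argument.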
Looks great to me! Just added some very minor comments, some just to start a general discussion about unifying our notation & style, so those definitely don't need to be fully addressed for this to be merged. Happy for it to be merged as is.
In general, excellent to see the progress here; excited for how this work can be used in the `GeoGraphTimeline` class and then consequently in our visualisation 🚀
EDIT: GitHub shows the `.gpkg` files as empty, are they actually?
doc: fix typos Co-authored-by: rdnfn <75615911+rdnfn@users.noreply.github.com>
        other_graph._geometry(candidate_ids),  # pylint: disable=protected-access
    )
]
Intuitively it should be slightly faster to combine these two filters into one, right?
Yes, I also think that if numpy does short-circuit evaluation on these things this should be faster. @herbiebradley I don't know if it does, do you?
Concretely: do you know how numpy handles these types of cases?
Case: `select_from_array[np.logical_or(condition_array1, condition_array2)]`
Does it first evaluate both `condition_array1` and `condition_array2` in the slice `[ ... ]` and then `or` the conditions (in which case it'd probably be slower because we would calculate the geometry overlaps for shapes which won't agree in class label)?
Or does it calculate the first element of `condition_array1` and then short-circuit to decide whether that element of `condition_array2` even needs to be calculated (in which case I think it should be slightly faster)?
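On the question above: `np.logical_or(a, b)` is an ordinary function call, so both argument arrays are fully computed before NumPy combines them, and there is no per-element short-circuiting. A minimal sketch of the difference between one combined mask and two staged filters follows. The helper `expensive_overlap`, the toy data, and the use of `np.logical_and` (on the assumption that a candidate must pass both the label check and the geometry check) are illustrative and not taken from `geograph.py`.

```python
import numpy as np

# Counts how many elements the "expensive" check had to process.
calls = {"expensive": 0}


def expensive_overlap(ids):
    """Hypothetical stand-in for the geometry overlap check."""
    calls["expensive"] += len(ids)
    return ids % 2 == 0


candidate_ids = np.arange(10)
labels = np.array([1, 1, 2, 2, 1, 2, 2, 2, 1, 2])
label = 1

# Combined mask: both condition arrays are built for all 10 candidates
# before np.logical_and ever sees them.
combined = candidate_ids[
    np.logical_and(labels == label, expensive_overlap(candidate_ids))
]
combined_cost = calls["expensive"]

# Staged filters: the cheap label filter runs first, so the expensive
# check only sees the candidates that survived it.
calls["expensive"] = 0
survivors = candidate_ids[labels == label]
staged = survivors[expensive_overlap(survivors)]
staged_cost = calls["expensive"]
```

Both variants select the same candidates, but in this toy setup the staged version runs the expensive check on 4 elements instead of 10. Under that reading, keeping the two filters separate (cheap one first) is likely the better choice whenever the second check dominates the cost.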
I'm not sure about `np.logical_or`, but if you use base python `or` and a generator to select then it should short-circuit and may be faster - there are some interesting timing results here on selecting from boolean arrays: https://stackoverflow.com/questions/58422690/filtering-a-numpy-array-what-is-the-best-approach
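The generator idea can be sketched as follows. Python's `and`/`or` operators do short-circuit per element, so the second condition is only evaluated when the first one leaves the outcome open. The function `overlaps` and the toy data are hypothetical placeholders for the real geometry check, and `and` is used here on the assumption that both conditions must hold.

```python
import numpy as np

# Counts how often the per-element "expensive" check is actually called.
calls = {"overlaps": 0}


def overlaps(candidate_id):
    """Hypothetical stand-in for a per-element geometry overlap check."""
    calls["overlaps"] += 1
    return candidate_id % 2 == 0


candidate_ids = np.arange(10)
labels = np.array([1, 1, 2, 2, 1, 2, 2, 2, 1, 2])
label = 1

# `lab == label and overlaps(cid)` skips overlaps() whenever the label differs.
selected = np.fromiter(
    (cid for cid, lab in zip(candidate_ids, labels) if lab == label and overlaps(cid)),
    dtype=candidate_ids.dtype,
)
```

Whether this is faster in practice depends on how costly the second check is relative to the overhead of a Python-level loop; for cheap conditions the vectorised mask is usually hard to beat, so it would be worth timing on real data.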
Looks good, I added all the comments I could think of ;)
…imports Note: This addresses #28 (comment)
I fixed the indexing issue discussed on Slack, and added a minor performance improvement to `add_habitat`, chmod 664 to all saved files, and some minor bugfixes. Looks good to merge now once the return type of `identify_nodes` is resolved.
Still WIP, but I'm creating a draft PR so that people can already make early comments and check out the work for their branches.
More detailed info & finalized PR coming soon.
This addresses #25
Note:
There are lots of files because I also pushed the test cases for testing node identification (the `.gpkg` files); simply ignore these. Note also that the branch contains merges of @herbiebradley's latest code on `feature/graph-analysis-habitat` for fixing the `rtree`. You can also ignore those.