Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PBC capability to merge_split_MEST #372

Open
wants to merge 18 commits into
base: RC_v1.5.x
Choose a base branch
from

Conversation

w-k-jones
Copy link
Member

@w-k-jones w-k-jones commented Nov 29, 2023

This turned into a general refactor of merge_split_MEST, with the following changes and additions:

  1. Adds PBC capabilities, using the same BallTree distance search approach as used in feature detection.
  2. Moves filtering by frame_len before calculating the minimum spanning tree, improving the number of merging/splitting cells linked particularly for long tracking time periods.
  3. Changes the cell merging logic to use scipy.sparse.csgraph.connected_components, improving performance
  4. Made sure all output arrays are int dtype, and started the track id from 1 rather than 0

I'm also going to look into adding a flag for whether a cell started with a split or ended with a merge, and which object it was merged into/split from

  • Have you followed our guidelines in CONTRIBUTING.md?
  • Have you self-reviewed your code and corrected any misspellings?
  • Have you written documentation that is easy to understand?
  • Have you written descriptive commit messages?
  • Have you added NumPy docstrings for newly added functions?
  • Have you formatted your code using black?
  • If you have introduced a new functionality, have you added adequate unit tests?
  • Have all tests passed in your local clone?
  • If you have introduced a new functionality, have you added an example notebook?
  • Have you kept your pull request small and limited so that it is easy to review?
  • Have the newest changes from this branch been merged?

@w-k-jones w-k-jones added the enhancement Addition of new features, or improved functionality of existing features label Nov 29, 2023
@w-k-jones w-k-jones added this to the Version 1.5.2 milestone Nov 29, 2023
@w-k-jones
Copy link
Member Author

Keeping as draft until #368 is merged as this contains the same commits

@w-k-jones w-k-jones self-assigned this Nov 29, 2023
@w-k-jones w-k-jones linked an issue Nov 29, 2023 that may be closed by this pull request
@kelcyno
Copy link
Collaborator

kelcyno commented Nov 29, 2023

Oh this looks really interesting @w-k-jones I'll get started on it!

@w-k-jones
Copy link
Member Author

Ok, I ended up going down a bit of a rabbit hole there. I found that because the filtering for frame_len was carried out after the minimum euclidean spanning tree calculation, valid links were being removed because a spatially closer cell at another point in time was being removed and therefore removing the edges between two neighbouring cells. This had more of an impact the longer the time period being tracked. When I swapped to filtering by frame_len before calculating the MEST it massively increased the number of linked cells, which then caused the rest of the merge/split routine to run slower. I've been experimenting with scipy.sparse.csgraph.connected_components recently, and this seemed like a really good application for linking the neighbouring cells into connected tracks

@w-k-jones w-k-jones marked this pull request as ready for review November 29, 2023 21:06
@w-k-jones w-k-jones changed the base branch from RC_v1.5.x to main November 29, 2023 21:07
@w-k-jones w-k-jones changed the base branch from main to RC_v1.5.x November 29, 2023 21:07
Copy link

codecov bot commented Nov 29, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (53df42d) 56.79% compared to head (e9ec119) 56.61%.
Report is 43 commits behind head on RC_v1.5.x.

Additional details and impacted files
@@              Coverage Diff              @@
##           RC_v1.5.x     #372      +/-   ##
=============================================
- Coverage      56.79%   56.61%   -0.18%     
=============================================
  Files             20       20              
  Lines           3435     3407      -28     
=============================================
- Hits            1951     1929      -22     
+ Misses          1484     1478       -6     
Flag Coverage Δ
unittests 56.61% <100.00%> (-0.18%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@freemansw1
Copy link
Member

I haven't started reviewing this yet, but I wonder if we shouldn't try to push this to v1.5.3? I'm starting to get antsy about v1.5.2 coming out.

Copy link
Member

@freemansw1 freemansw1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this, @w-k-jones; a critical improvement to an important piece of code.

I'm not that familiar with this code, but I've added in some thoughts here in a few places. Probably would be good to get a review in from @kelcyno.

@@ -1461,7 +1461,7 @@ def filter_min_distance(
target: {'maximum', 'minimum'}, optional
Flag to determine if tracking is targetting minima or maxima in
the data. Default is 'maximum'.
PBC_flag : str('none', 'hdim_1', 'hdim_2', 'both')
PBC_flag : str('none', 'hdim_1', 'hdim_2', 'both'), optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what style is required here, but I think we should move from optional to say that the default is none.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking through the main functions, I think we need to standardise a way of highlighting optional keywords and their default values. I'll update this for now by stating the default at the end of the description

@@ -22,6 +44,8 @@ def merge_split_MEST(TRACK, dxy, distance=None, frame_len=5):
The x/y grid spacing of the data.
Should be in meters.

dz : float, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the rest of the 3D code, we support both constant vertical grid spacing and user-specified varying vertical grid spacing. We can either require constant vertical grid spacing for this code (less ideal) and check that the user didn't pass in an array-like, or update the code to support both (if it doesn't already). I'm happy enough with either outcome.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I overlooked that functionality in the rest of the code before. I'll see if a similar system to that used in tracking can be used here. I think the same restriction as here may exist in the filtering of features by proximity in feature detection so I'll also look into resolving that

cell_number_unassigned: int, optional
Value given tp unassigned/non-tracked cells by tracking. Default is -1

PBC_flag : str('none', 'hdim_1', 'hdim_2', 'both'), optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment on optional as above.

@w-k-jones
Copy link
Member Author

w-k-jones commented Mar 9, 2024

Reminder to myself: currently the PBC handling requires all of h1_min, h1_max, h2_min and h2_max to be provided even if PBCs are calculated over only one dimension. I should fix this to use default values for the dimension not being used before merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Addition of new features, or improved functionality of existing features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add PBC support to merge_split_MEST
4 participants