Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consistently use uint32 in candidate detection #388

Merged
merged 4 commits into from
Feb 27, 2024

Conversation

alessandrofelder
Copy link
Member

@alessandrofelder alessandrofelder commented Feb 16, 2024

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Why is this PR needed?
Structure detection currently fails silently/crashes when there are more than 2**16-1 temporary connected components.

What does this PR do?

Ensures cell candidate detection uses unsigned 32-bit integers (note this may cause an minor increase in memory usage, but will avoid cellfinder hanging/crashing when there are more than 2**16-1 cells). I've documented some additional bits along the way.

References

Closes #383 and https://forum.image.sc/t/append-error-with-cellfinder/88634/9
Related to #389
There are a few improvements that could be made, see #387 and #386 but it's more critical that we fix this bug quickly now.

How has this PR been tested?

Tested on user data provided by https://forum.image.sc/t/append-error-with-cellfinder/88634/9 and an internal MSc student, on the internal HPC system. cellfinder times out with old code, and completes successfully with reasonably looking cell candidates when using new code.

Two external users not associated with the core BrainGlobe team have tested this development branch and have confirmed this fixes their problems.

I have compared the "old" (pre-numba) code and this development branch on the internal MSc student data, and found the below

- pre-numba this dev branch
total candidates found 52502 52649
candidates not present in other 5977 6124
candidates with nearest candidate in other > 2 microns away 147 7

In summary, for real-life data, this development branch basically preserves all but 7 cell candidates (out of 50000+) found pre-numba, and finds 147 additional ones. I would say this is good enough.

I have added a test for the key function in #385 .

Is this a breaking change?

No

Does this PR require an update to the documentation?

No

Checklist:

  • The code has been tested locally (manually)
  • Tests have been added to cover all new functionality (unit & integration)
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

@alessandrofelder alessandrofelder self-assigned this Feb 16, 2024
@alessandrofelder alessandrofelder marked this pull request as ready for review February 26, 2024 17:43
@alessandrofelder alessandrofelder requested a review from a team February 26, 2024 17:46
Copy link
Collaborator

@willGraham01 willGraham01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥳 Fingers crossed we never encounter any data with 2**32-1 cells in it 😓 Though I did think of some ways that we can adapt the struct_id memory usage given the number of candidates.

Also, we should release a new version after merging this right?

@willGraham01 willGraham01 merged commit 97e28b7 into main Feb 27, 2024
14 checks passed
@willGraham01 willGraham01 deleted the switch-detection-to-uint32 branch February 27, 2024 09:18
@adamltyson
Copy link
Member

I'll release a new version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[BUG] Inappropriate handling of unsigned integer data types
3 participants