Slicing for std calculation #110

Merged (17 commits) on Feb 19, 2024

McHaillet
Collaborator

This PR improves the calculation of the standard deviation over a template matching search. It closes #97.

First, some background on why I updated this. Template matching has a huge search space (N_voxels * rotations) that consists mainly of false positives, with only a tiny fraction of true positives. If we model the background as a Gaussian (with expected mean 0 and some standard deviation), the false alarm rate can be calculated for a given cut-off value, and a sensible target rate depends on the size of the search space. For example, a false alarm rate of (N_voxels * rotations)^(-1) means we expect 1 false positive in the whole search. The cut-off for that rate follows from the complementary error function,

$$N^{-1} = \operatorname{erfc}\left( \frac{\theta}{\sigma \sqrt{2}} \right) / 2$$

where $\theta$ is the cut-off, $\sigma$ the standard deviation of the Gaussian, and $N$ the size of the search space.
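As a rough illustration of the relation above, here is a minimal sketch that solves it for the cut-off using only the Python standard library. The function names, the search-space size, and the value of sigma are hypothetical and chosen for illustration; this is not the code in pytom_tm.

```python
import math
from statistics import NormalDist


def false_alarm_rate(theta: float, sigma: float) -> float:
    """Upper-tail probability of a zero-mean Gaussian: erfc(theta / (sigma * sqrt(2))) / 2."""
    return math.erfc(theta / (sigma * math.sqrt(2))) / 2


def cutoff_for_one_false_positive(n_search: int, sigma: float) -> float:
    """Solve 1 / N = erfc(theta / (sigma * sqrt(2))) / 2 for theta.

    The right-hand side is the upper tail of a Gaussian, so by symmetry
    theta = -sigma * Phi^{-1}(1 / N), with Phi^{-1} the standard normal quantile.
    """
    return -sigma * NormalDist().inv_cdf(1.0 / n_search)


# Hypothetical search: 500 x 500 x 200 voxels times 10,000 rotations.
n_search = 500 * 500 * 200 * 10_000
sigma = 0.02  # hypothetical background standard deviation

theta = cutoff_for_one_false_positive(n_search, sigma)
expected_false_positives = false_alarm_rate(theta, sigma) * n_search
print(f"cut-off: {theta:.4f}, expected false positives: {expected_false_positives:.3f}")
```

Because the cut-off scales linearly with sigma, any bias in the estimated standard deviation shifts the threshold directly, which is why the std needs to be estimated over the right voxels.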

The search space is easily calculated, and the standard deviation can be tracked for each orientation (which is what the custom square_sum_kernel is used for). However, there were two problems:

  • With next_fast_len for FFTs, the volume was padded with zeros, and these zeros were also included in the square_sum_kernel.
  • With subvolume splitting I added overhang between subvolumes, which is needed for accurate scores, but these regions were counted twice in the standard deviation calculation.

I fixed it by passing a search_volume_roi to template matching, which contains the actual region of interest without FFT padding and without template overhang. The indexing is already calculated, so I just created a slicing for the cupy array.
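To show why the slicing matters, here is a small sketch using NumPy as a stand-in for cupy. The shapes, the tuple-of-slices form of the ROI, and the variable names are assumptions for illustration, not the actual pytom_tm code.

```python
import numpy as np

rng = np.random.default_rng(0)

roi_shape = (20, 20, 20)     # hypothetical region that actually holds scores
padded_shape = (32, 32, 32)  # hypothetical FFT-friendly padded shape

# Score volume: unit-variance background inside the ROI, zeros in the padding.
scores = np.zeros(padded_shape, dtype=np.float64)
scores[:20, :20, :20] = rng.normal(0.0, 1.0, roi_shape)

# Assumed form of the region of interest: one slice per axis.
search_volume_roi = tuple(slice(0, n) for n in roi_shape)

# Square sum over the full padded volume: the zeros dilute the estimate.
std_padded = np.sqrt(np.sum(scores**2) / scores.size)

# Square sum restricted to the ROI: only voxels that were really searched.
roi = scores[search_volume_roi]
std_roi = np.sqrt(np.sum(roi**2) / roi.size)

print(f"std with padding: {std_padded:.3f}, std over ROI: {std_roi:.3f}")
```

The same slicing idea applies to overhang: if each subvolume only contributes the voxels inside its own ROI, no voxel is counted twice when the per-subvolume sums are combined.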

I first added tests to check whether the search_space and std are consistent with subvolume splitting (and also rotation splitting), and realised there was a bug in tmjob.py in calculating the start of the subvolume (lines 380 to 384): I did not put brackets around template_size // 2, so the minus sign in front was applied before the integer division and the result was floored in the wrong direction 😩 . This is now fixed.
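The effect of the missing brackets can be reproduced in isolation. This is a minimal sketch with hypothetical values, not the actual expression from tmjob.py:

```python
template_size = 5  # hypothetical odd template size
center = 10        # hypothetical voxel index

# Unary minus binds tighter than //, so this floors -5 / 2 = -2.5 down to -3.
start_without_brackets = -template_size // 2 + center

# With brackets the division happens first: 5 // 2 = 2, which is then negated.
start_with_brackets = -(template_size // 2) + center

print(start_without_brackets, start_with_brackets)  # 7 8
```

For even template sizes both forms give the same result, so this kind of off-by-one only shows up with odd sizes.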

I also added plotting of an extraction graph, which shows a histogram of extracted scores together with the background Gaussian (as this is now properly estimated). I think this is nice for users to see.

Let me know if anything is unclear.

@McHaillet McHaillet requested a review from sroet February 19, 2024 11:17
Collaborator

@sroet sroet left a comment

1 suggestion, and 1 line was added twice; LGTM otherwise

src/pytom_tm/matching.py (outdated)
tests/test_tmjob.py (outdated)
Collaborator

@sroet sroet left a comment

(selected the wrong option)

Co-authored-by: Sander Roet <sanderroet@hotmail.com>
@McHaillet McHaillet requested a review from sroet February 19, 2024 12:23
Collaborator

@sroet sroet left a comment

LGTM, feel free to merge

@McHaillet McHaillet merged commit 9fd8f28 into SBC-Utrecht:main Feb 19, 2024
@McHaillet McHaillet deleted the slicing-for-std-calculation branch February 19, 2024 12:59
Development

Successfully merging this pull request may close these issues.

pass subvolume slicing to TemplateMatchingGPU for improved std
2 participants