Correct 99.9% normalization to 100% normalization for "many-zero" columns #894
This definitely solves the error that arises, but the correct behavior is for the function to ignore 0s when calculating the 99.9th percentile. For example, this is the mirroring code in create_pixel_matrix:
ark-analysis/ark/phenotyping/pixel_cluster_utils.py
Lines 780 to 782 in e6050fc
quant_dat[fov] = fov_full_pixel_data.replace(
    0, np.nan
).quantile(q=0.999, axis=0)
In that function, you are setting all 0's to nan, so the quantile function ignores those numbers. That's what we want here too.
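To make the difference concrete, here is a minimal sketch on made-up data (the DataFrame and its column names are hypothetical, not taken from ark-analysis). It compares the 99.9th percentile computed with zeros included against the same percentile after zeros are replaced by NaN, which pandas skips when computing quantiles.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "sparse" is a many-zero column with one nonzero value.
n_rows = 2000
toy_data = pd.DataFrame({
    "dense": np.arange(1, n_rows + 1, dtype=float),
    "sparse": [0.0] * (n_rows - 1) + [5.0],
})

# Zeros included: the 99.9th percentile of the sparse column collapses to 0.
with_zeros = toy_data.quantile(q=0.999, axis=0)

# Zeros replaced by NaN: quantile() skips NaN, so only nonzero values count.
without_zeros = toy_data.replace(0, np.nan).quantile(q=0.999, axis=0)

print(with_zeros["sparse"], without_zeros["sparse"])
```

The dense column has no zeros, so its percentile is the same either way; only the many-zero column changes.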
Looks good
@HPiyadasa can you please approve this if it looks good? Need it so I can officially merge the changes in.
What is the purpose of this PR?
Closes #890. When an expression column for cell Pixie clustering is mostly zero, its 99.9% quantile value may equal 0, and dividing by that value during normalization causes NaNs to appear. To prevent this, the 100% quantile value is used as the normalization value for those columns instead.
How did you implement your changes
Add a correction to the way the normalization values are set in CellSOMCluster.normalize_data. Define the normalization values prior to applying them to cell_data_sub.
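The fallback described above could be sketched roughly as follows. This is an illustration on made-up data, not the actual body of CellSOMCluster.normalize_data: the helper name compute_norm_values and the toy DataFrame are assumptions, and only the idea (use the 100% quantile wherever the 99.9% quantile is 0, and compute the values before applying them) comes from the PR description.

```python
import numpy as np
import pandas as pd


def compute_norm_values(data: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: per-column normalization values with a fallback."""
    q999 = data.quantile(q=0.999, axis=0)
    q100 = data.quantile(q=1.0, axis=0)  # 100% quantile, i.e. the column maximum
    # Keep the 99.9% value where it is nonzero, otherwise use the 100% value.
    return q999.where(q999 != 0, q100)


# Hypothetical stand-in for the expression columns being normalized.
n_rows = 2000
toy_data = pd.DataFrame({
    "dense": np.arange(1, n_rows + 1, dtype=float),
    "sparse": [0.0] * (n_rows - 1) + [5.0],
})

# Naive normalization: the sparse column is divided by 0, producing NaN and inf.
naive = toy_data.div(toy_data.quantile(q=0.999, axis=0), axis=1)

# Corrected: define the normalization values first, then apply them.
norm_vals = compute_norm_values(toy_data)
normalized = toy_data.div(norm_vals, axis=1)
```

In this sketch a column that is entirely zero would still have a maximum of 0, so it would need separate handling.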