Correct 99.9% normalization to 100% normalization for "many-zero" columns #894

Merged: 6 commits into main from cell_99 on Feb 9, 2023


alex-l-kong

What is the purpose of this PR?

Closes #890. When an expression column used for cell Pixie clustering is mostly zero, its 99.9% quantile may equal 0. Dividing by that value during normalization produces NaNs. To prevent this, the 100% quantile (the column maximum) should be used as the normalization value for those columns.
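To illustrate the failure mode, here is a minimal sketch on made-up data. The DataFrame, its column names, and the variable names are assumptions for illustration only, not the project's actual data or code.

```python
import numpy as np
import pandas as pd

# Hypothetical expression table: "marker_b" is zero for all but one cell.
cell_data = pd.DataFrame({
    "marker_a": np.linspace(0.1, 1.0, 2000),
    "marker_b": [0.0] * 1999 + [5.0],
})

# Per-column 99.9% quantile; for "marker_b" this is 0 because
# nearly every value in the column is 0.
q999 = cell_data.quantile(q=0.999, axis=0)

# Dividing by a 0 quantile turns every 0 entry into NaN (0 / 0)
# and the single nonzero entry into inf (5 / 0).
normalized = cell_data / q999

print(q999["marker_b"])                   # 0.0
print(normalized["marker_b"].isna().sum())  # 1999
```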

How did you implement your changes?

Correct how the normalization values are set in CellSOMCluster.normalize_data: define the normalization values first, then apply them to cell_data_sub.
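The approach described above could look roughly like the following sketch. It is not the code from this PR: `compute_norm_values` is a hypothetical helper name, and the sample data is invented. Only the idea (fall back to the 100% quantile where the 99.9% quantile is 0, and compute the values before applying them) comes from the description.

```python
import numpy as np
import pandas as pd


def compute_norm_values(cell_data_sub: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: per-column normalization values."""
    # Default normalization value: the 99.9% quantile of each column.
    norm_vals = cell_data_sub.quantile(q=0.999, axis=0)
    # 100% quantile, i.e. the column maximum.
    max_vals = cell_data_sub.max(axis=0)
    # Keep the 99.9% value where it is nonzero, otherwise use the maximum.
    return norm_vals.where(norm_vals != 0, max_vals)


# Made-up data: "marker_b" has a 99.9% quantile of 0.
cell_data_sub = pd.DataFrame({
    "marker_a": np.linspace(0.1, 1.0, 2000),
    "marker_b": [0.0] * 1999 + [5.0],
})

# Define the normalization values first, then apply them.
norm_vals = compute_norm_values(cell_data_sub)
normalized = cell_data_sub / norm_vals
```

Note that a column consisting entirely of zeros would still have a normalization value of 0 under this sketch, so it would need separate handling.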

@alex-l-kong alex-l-kong self-assigned this Jan 26, 2023

@cliu72 cliu72 left a comment


This definitely solves the error, but the correct behavior is for the function to ignore 0's when calculating the 99.9th percentile. For example, this is the corresponding code in create_pixel_matrix:

quant_dat[fov] = fov_full_pixel_data.replace(
    0, np.nan
).quantile(q=0.999, axis=0)

That function sets all 0's to NaN, so the quantile function ignores them. That's what we want here too.
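A small sketch of the difference this makes, using invented data (the DataFrame and variable names below are assumptions, not taken from the repository). pandas' `quantile` skips NaN values, so replacing 0 with NaN restricts the quantile to the nonzero entries:

```python
import numpy as np
import pandas as pd

# Hypothetical expression table: "marker_b" has only two nonzero cells.
cell_data = pd.DataFrame({
    "marker_a": np.linspace(0.1, 1.0, 3000),
    "marker_b": [0.0] * 2998 + [2.0, 4.0],
})

# Quantile over all values: the zeros dominate, so "marker_b" gets 0.
with_zeros = cell_data.quantile(q=0.999, axis=0)

# Quantile over nonzero values only: zeros become NaN and are skipped.
without_zeros = cell_data.replace(0, np.nan).quantile(q=0.999, axis=0)

# Normalizing the original data by the nonzero quantile leaves the
# zero entries at 0 instead of producing NaN.
normalized = cell_data / without_zeros
```

Columns without any zeros, such as `marker_a` here, get the same quantile either way.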


@cliu72 cliu72 left a comment


Looks good

@alex-l-kong alex-l-kong requested review from HPiyadasa and removed request for HPiyadasa February 9, 2023 20:35
@alex-l-kong

@HPiyadasa can you please approve this if it looks good? Need it so I can officially merge the changes in.

@alex-l-kong alex-l-kong added this pull request to the merge queue Feb 9, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Feb 9, 2023
@alex-l-kong alex-l-kong merged commit 8d698a0 into main Feb 9, 2023
@alex-l-kong alex-l-kong deleted the cell_99 branch February 9, 2023 22:11
@srivarra srivarra added the bug Something isn't working label Feb 16, 2023
Successfully merging this pull request may close these issues.

In cell clustering, 99th percentile normalization is returning NaNs
4 participants