Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop cell labels before every neighborhood clustering step #319

Merged
merged 9 commits into from
Nov 9, 2020

Conversation

alex-l-kong
Copy link
Contributor

What is the purpose of this PR?

Addresses and closes #316. We were not generating the correct silhouette score visualization because compute_cluster_metrics didn't receive neighborhood_counts with the label column dropped. This PR patches that up.

How did you implement your changes

We move the .drop calls on neighborhood_counts into the arguments into the clustering functions.

@alex-l-kong alex-l-kong self-assigned this Nov 3, 2020
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link
Member

@ngreenwald ngreenwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason we can't move the drop calls inside of compute_cluster_metrics and generate_cluster_matrix_results? Given that we're always going to want to perform that action for any cluster data that's passed in, I think it actually does make sense to do it in the function

@alex-l-kong
Copy link
Contributor Author

@ngreenwald to me, it's a little awkward having to pass in the explicit label name into both functions when the sole purpose of doing so is simply to drop that column. That being said, that's kind of the same thing having to do .drop every time we pass it in.

How about dropping the label column at the end of create_neighborhood_matrix, so that way, both neighbor_counts and neighbor_freqs already have label dropped beforehand? I think that would save us a lot of trouble.

@ackagel
Copy link
Contributor

ackagel commented Nov 4, 2020

so, the output of the create_neighborhood_matrix is basically a cell table yeah? Any reason we can't append (or add in some standardized way) the phenotype data onto the existing cell table, and just configure settings.py to do the proper trimming?

@alex-l-kong
Copy link
Contributor Author

alex-l-kong commented Nov 9, 2020

@ngreenwald so looking back at this problem, I made an explicit call to drop label columns before running the clustering steps in the notebook. The reason is because the user can simply refer to the corresponding label column in all_data if they really need label information for neighbor_counts/neighbor_freqs. I don't think there's a need to really keep a duplicate copy of label in the neighbor matrix.

@alex-l-kong
Copy link
Contributor Author

alex-l-kong commented Nov 9, 2020

@ngreenwald @ackagel while Will works on a fix for the database at Van Valen Lab (which affects Segment_Image_Data.ipynb, I'll request a review on this PR since I only changed a few things.

@ngreenwald
Copy link
Member

okay, so neighbor_counts and neighbor_freqs are created by create_neighborhood_matrix. The only thing we use either of these for is plotting, correct? We're never going to return these to the user? So can we instead modify create_neighborhood_matrix to produce a version of these two arrays that already has the label column dropped? Is there a reason they ever need to be included?

@alex-l-kong
Copy link
Contributor Author

alex-l-kong commented Nov 9, 2020

@ngreenwald yeah, I can do it in the function as well, will update on my next commit and let you know when that's done.

@alex-l-kong
Copy link
Contributor Author

@ngreenwald moved the dropping of the cell label column to inside spatial_analysis, this PR is ready to be reviewed!

@ngreenwald
Copy link
Member

Great, looks good. Can you add a new issue with Adam's suggestion? I think we can probably do some additional refactoring to consolidate neighbor_counts, neighbor_freqs, and all_data, but given that we haven't finished all the steps in this notebook yet I think it's fine to hold off.

@ngreenwald ngreenwald merged commit c3b700b into master Nov 9, 2020
@ngreenwald ngreenwald deleted the cluster_fix branch November 9, 2020 23:11
alex-l-kong added a commit that referenced this pull request Jan 14, 2021
* Make sure cell labels are dropped before every clustering step

* Drop label column before running clustering

* Minor comment fix, mostly to see if DeepCell server is back up

* Move label dropping logic into neighborhood clustering step to hide from user

* Don't need to drop cell label column anymore in tests
y2kbugger pushed a commit that referenced this pull request Jul 29, 2021
* Make sure cell labels are dropped before every clustering step

* Drop label column before running clustering

* Minor comment fix, mostly to see if DeepCell server is back up

* Move label dropping logic into neighborhood clustering step to hide from user

* Don't need to drop cell label column anymore in tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Neighborhood analysis cluster visualization process doesn't drop label column before clustering
3 participants