Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update kmeans neighborhood notebook take 2 #714

Merged
merged 27 commits into from
Sep 26, 2022

Conversation

cliu72
Copy link
Contributor

@cliu72 cliu72 commented Sep 17, 2022

What is the purpose of this PR?

Update and clean-up the k-means neighborhood notebook.

How did you implement your changes

  • Fixed a bug where the wrong cell phenotypes were being counted due to fovs missing certain cell phenotypes.
  • The old notebook required the cell table to include both a column that specified a numeric id for each cell phenotype, and a column that had the cell phenotypes. I simplified this to only require the column with the cell phenotypes. Specifically, in create_neighborhood_matrix, removed cluster_id_col, which was redundant with cluster_name_col.
  • Set default of self_neighbor to False, so a cell is not counting itself as its neighbor.
  • Set cluster numbers to start at 1. Previously was starting at 0, which is confusing because in cell overlays, 0 usually corresponds to empty slide.
  • Add inertia as a metric.
  • Subsample cells for silhouette score calculation.
  • Remove cells that have no neighbors before clustering.
  • Scaled (z-scored) the heatmaps, which makes them more interpretable.
  • Added heatmap that shows the composition of each neighborhood (basically the average feature vector for each cluster).
  • Fixed overlay generation function so correct legend is shown.
  • Added subheadings and more descriptive text to notebook.

Remaining issues

  • Need to talk about whether to delete the part of the spatial LDA notebook that does kmeans

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@cliu72
Copy link
Contributor Author

cliu72 commented Sep 19, 2022

@ngreenwald wtf

@@ -1,18 +1,32 @@
{
Copy link
Member

@ngreenwald ngreenwald Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See if you can use a smaller heading size for this text.


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why it's showing up as huge in ReviewNB. It looks fine on github (https://github.com/angelolab/ark-analysis/blob/kmeans_neighborhood_nb_update/templates_ark/example_neighborhood_analysis_script.ipynb) and looks fine in my jupyter.

@@ -1,18 +1,32 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about adding a min_k as well? Did you ever look at k values below 5?


Reply via ReviewNB

Copy link
Contributor Author

@cliu72 cliu72 Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have looked at k values below 5 when looking at specific regions of the lymph node (i.e. what are the neighborhoods only within the T cell zone?). But I added a min_k anyways for flexibility.

@@ -1,18 +1,32 @@
{
Copy link
Member

@ngreenwald ngreenwald Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grande text size again


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -1,18 +1,32 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the goal to replace all of this with Mantis once we have the single directory saving implemented? Or some of these plots should be kept even once we make that change?


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand it correctly, the mantis function (create_mantis_dir) just copies all the masks over, so the single directory saving will just take these masks and copy them over. This function actually creates the overlays.

Copy link
Member

@ngreenwald ngreenwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great, some additional small comments here

ark/analysis/spatial_analysis.py Outdated Show resolved Hide resolved
ark/analysis/spatial_analysis.py Outdated Show resolved Hide resolved
assert (counts.loc[:9, "Pheno2"] == 8).all()
assert (counts.loc[10:19, "Pheno3"] == 8).all()
# test the counts values
assert (counts[(counts[settings.FOV_ID] == "fov8") &
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were the tests not using two FOVs before? Why are we now indexing for specific FOVs, and specific segmentation labels within those FOVs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly don't know what was going on with the tests before. After fixing the bug in the code, those tests were just straight up wrong. I manually confirmed that this version is correct (literally just counted the values in the data table). Feel free to check my work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sometimes ReviewNB does weird stuff. Once you've addressed all of the comments hit the re-request review button at the top, that way I know you're actually finished, easier for you than responding to each comment individually.


# dot binarized 'is neighbor?' matrix with pheno_has_cell to get counts
counts = pheno_has_cell.dot(cell_dist_mat_bin).T
counts_pd = pd.DataFrame(counts, columns=pheno_names_from_tab)
counts_pd = counts_pd.set_index(current_fov_neighborhood_data.index.copy())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for code readability or is fixing a bug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing a bug. Setting the columns is important - not all FOVs have the same cell types. This was the problem before that was leading to the wrong results. Setting the index is needed to merge the tables for all FOVs.

Copy link
Contributor

@alex-l-kong alex-l-kong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made some additional comments, mostly regarding the notebook.

ark/utils/spatial_analysis_utils.py Outdated Show resolved Hide resolved
@@ -1,18 +1,34 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way we can adjust the colorbar width so it doesn't get too close to the heatmap?


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an issue with all places where visualize.draw_heatmap is called. I noticed it when testing the cell clustering notebook. I believe it is also used in the spatial enrichment notebook. I think we can leave this for another pull request, so whoever makes the fix can confirm the heatmaps look okay in all of those notebooks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also noticed just now that some of the other example notebooks do not have example output visualizations, as you have requested for this one. I think the pixel clustering and cell clustering notebooks, and spatial enrichment notebooks would also benefit from example outputs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cliu72 seems like the sample visualizations I had in the pixel and cell clustering notebooks got removed during some recent PRs, I'll ask @srivarra put those back in while he addresses the example dataset PR on his end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an issue with all places where visualize.draw_heatmap is called. I noticed it when testing the cell clustering notebook. I believe it is also used in the spatial enrichment notebook. I think we can leave this for another pull request, so whoever makes the fix can confirm the heatmaps look okay in all of those notebooks.

Yeah, the underlying heatmap function unfortunately doesn't offer great documentation in order to fix this. I think it'll require looking under the hood to see what internal properties are being set, I had to do that with a different seaborn function somewhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cliu72 @alex-l-kong Are the sample visualizations the plots generated by the notebook? So the notebooks should be saved with their output contained from each cell?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srivarra yes, the overlays for both notebooks as well as the weighted channel heatmaps for the cell clustering notebook.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, Sounds good!

Copy link
Contributor

@alex-l-kong alex-l-kong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a couple more minor comments on the notebook.

Copy link
Contributor

@alex-l-kong alex-l-kong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@srivarra srivarra added the enhancement New feature or request label Sep 26, 2022
@cliu72
Copy link
Contributor Author

cliu72 commented Sep 26, 2022

Addressed all comments. @ngreenwald @ackagel anything to add?

Copy link
Member

@ngreenwald ngreenwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@ngreenwald ngreenwald merged commit d4a246a into main Sep 26, 2022
@ngreenwald ngreenwald deleted the kmeans_neighborhood_nb_update branch September 26, 2022 23:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants