Add JUMP Mitocheck feature analysis comparison with KS tests #42

Merged: @gwaybio merged 3 commits into WayScience:main from add-kstest-fig on Oct 30, 2023

@gwaybio (Member) commented Oct 19, 2023

Related to the analysis in WayScience/JUMP-single-cell#13

[Figure: jump_mitocheck_feature_space_analysis]

Legend:
Comparing JUMP and Mitocheck nuclei feature spaces. (A) Kolmogorov-Smirnov (KS) test results comparing JUMP and Mitocheck for each shared CellProfiler feature, colored by CellProfiler feature group. Each boxplot summarizes the KS statistics from 1,000 permutations, each comparing a random subsample of JUMP single cells from a single plate (JUMP Pilot plate BR00116991) to Mitocheck; boxes show the interquartile range and whiskers extend to 1.5 × IQR. Mitocheck and each JUMP subsample have the same sample size (n = 2,916). We show both raw and z-score normalized comparisons. (B) The same KS test results restricted to AreaShape measurements, which showed the smallest differences in feature distributions across datasets. (C) Variance of every CellProfiler feature in JUMP versus Mitocheck. The dotted lines show y = x (features below the line have higher variance in Mitocheck). Note that low-variance features cluster near zero and obscure the colors. (D) The same variance plot as panel C, restricted to AreaShape features.
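To make the panel A procedure concrete, here is a minimal base-R sketch for a single feature: repeatedly subsample one JUMP plate down to the Mitocheck sample size, run a two-sample KS test, and keep the D statistic (the largest absolute gap between the two empirical CDFs). The data are simulated, the function names are hypothetical, and applying z-scoring to each dataset separately is an assumption; the legend does not say exactly how normalization was done.

```r
# Minimal sketch of a permutation-based two-sample KS comparison for ONE feature.
# All values are simulated placeholders, not JUMP or Mitocheck data.
set.seed(123)

n_mitocheck <- 2916     # matched sample size reported in the legend
n_permutations <- 100   # the legend reports 1,000; fewer here to keep the example quick

# Hypothetical feature values: Mitocheck nuclei vs. all cells from one JUMP plate
mitocheck_values <- rnorm(n_mitocheck, mean = 0, sd = 1)
jump_plate_values <- rnorm(20000, mean = 0.3, sd = 1.2)

# z-score a vector (assumption: each dataset is normalized on its own)
z_score <- function(x) (x - mean(x)) / sd(x)

run_ks_permutations <- function(jump_values, mitocheck_values, n_perm, normalize = FALSE) {
    ks_stats <- numeric(n_perm)
    for (i in seq_len(n_perm)) {
        # Subsample JUMP cells without replacement to match the Mitocheck sample size
        jump_subsample <- sample(jump_values, length(mitocheck_values))
        if (normalize) {
            x <- z_score(jump_subsample)
            y <- z_score(mitocheck_values)
        } else {
            x <- jump_subsample
            y <- mitocheck_values
        }
        # D statistic: maximum absolute gap between the two empirical CDFs
        ks_stats[i] <- unname(ks.test(x, y)$statistic)
    }
    ks_stats
}

raw_ks <- run_ks_permutations(jump_plate_values, mitocheck_values, n_permutations)
zscored_ks <- run_ks_permutations(jump_plate_values, mitocheck_values, n_permutations, normalize = TRUE)

# Each boxplot in panel A summarizes a vector like these for one feature
summary(raw_ks)
summary(zscored_ks)
```

In this simulated case the raw comparison picks up both the mean shift and the scale difference, while z-scoring removes both, so `zscored_ks` sits well below `raw_ks`.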

@jenna-tomkinson (Member) left a comment

Great PR! I left a few comments to address but LGTM!


full_summary_boxplot <- (
    ggplot(ks_test_df, aes(x = feature, y = ks_stat))
    + geom_boxplot(aes(color = feature_group), outlier.size = 0.1)
)
Member:

What does outlier.size do?

@gwaybio (Member Author), Oct 30, 2023:

This resource does a good job explaining it: https://www.geeksforgeeks.org/change-size-of-outlier-labels-on-boxplot-in-r/

ggplot lets you customize many aspects of a plot. In a boxplot (aka box-and-whisker plot), the box means something, as do the lines extending from it (the whiskers). The edges of the box (the hinges) mark the 25th and 75th percentiles, so the box spans the IQR (interquartile range) of the data. By default, the whiskers extend to the most extreme data points within 1.5 × IQR of the hinges, and outliers are the points that fall beyond the whiskers: https://ggplot2.tidyverse.org/reference/geom_boxplot.html

We can control attributes of the outliers using this API (e.g., outlier.size = ...)!
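To make the box, whisker, and outlier definitions concrete, here is a base-R sketch of the rule the geom_boxplot documentation describes: a box from the 25th to the 75th percentile, whiskers to the most extreme points within 1.5 × IQR, and everything beyond drawn as an outlier. The KS values are simulated placeholders, not results from this PR. In the actual plot, `outlier.size = 0.1` only changes how large those outlier points are drawn; it does not change which points count as outliers.

```r
# Simulated KS statistics for one feature (placeholder values, not results from the PR)
set.seed(42)
ks_stats <- c(rnorm(100, mean = 0.2, sd = 0.02), 0.35, 0.05)

# Hinges: 25th and 75th percentiles (type-7 quantiles, R's default)
hinges <- quantile(ks_stats, probs = c(0.25, 0.75), names = FALSE)
iqr <- hinges[2] - hinges[1]

# Points more than 1.5 * IQR beyond the hinges are outliers (geom_boxplot's default coef = 1.5)
lower_limit <- hinges[1] - 1.5 * iqr
upper_limit <- hinges[2] + 1.5 * iqr
is_outlier <- ks_stats < lower_limit | ks_stats > upper_limit

# Whiskers stop at the most extreme points that are NOT outliers
whiskers <- range(ks_stats[!is_outlier])

# These are the points whose size outlier.size controls
outliers <- ks_stats[is_outlier]
print(outliers)
```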

Member:

Some comments I have:

  1. For panel C, can you confirm whether "other" refers to the rest of the AreaShape features or, based on the legend title, to the other feature groups? I think this plot would be more convincing if more than just Zernike features were highlighted, but maybe not.
  2. Can you remind me what the dotted blue lines are in panels C and D? Could you add them to the legend, or are they self-explanatory enough that most audiences won't need it?

Member:

Regarding my second comment, I can see that you addressed this in the legend.

But now that I take a second look, is there a reason why there is no red in the raw data in panel C?

Member Author:

Thank you for this comment!

Based on our recent discussions with @MattsonCam, I think we should shift the focus of this plot slightly. I agree that it would be more convincing to show more than just AreaShape in panel C, and that highlighting Zernike features in panel D would help once we narrow the focus there. Essentially, the updates avoid narrowing the focus as sharply.

The dotted blue lines are not standard, but a reader will likely intuit their meaning. The legend addresses this fully.

> But now that I take a second look, is there a reason why you don't see any red in the raw data in panel C?

All these features have very low variance. I can add this to the legend, thanks!
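As a hypothetical illustration of the panel C/D logic, the sketch below computes each feature's variance in both datasets, marks which side of y = x it falls on, and flags features whose variance is near zero in both datasets; on linear axes those points pile up at the origin and hide the group colors. Which dataset sits on which axis is inferred from the legend's note that points below y = x have higher Mitocheck variance. The feature names, simulated values, and the 1e-3 cutoff are all made up for illustration.

```r
# Hypothetical per-cell feature tables (columns are shared CellProfiler-style features).
# Feature names and values are made up for illustration.
set.seed(7)
n_cells <- 500
mitocheck <- data.frame(
    Nuclei_AreaShape_Area = rnorm(n_cells, sd = 3),
    Nuclei_AreaShape_Zernike_0_0 = rnorm(n_cells, sd = 0.01),
    Nuclei_Intensity_MeanIntensity = rnorm(n_cells, sd = 1)
)
jump <- data.frame(
    Nuclei_AreaShape_Area = rnorm(n_cells, sd = 1),
    Nuclei_AreaShape_Zernike_0_0 = rnorm(n_cells, sd = 0.02),
    Nuclei_Intensity_MeanIntensity = rnorm(n_cells, sd = 2)
)

# Per-feature variance in each dataset
shared_features <- intersect(colnames(mitocheck), colnames(jump))
variance_df <- data.frame(
    feature = shared_features,
    mitocheck_variance = sapply(mitocheck[shared_features], var),
    jump_variance = sapply(jump[shared_features], var),
    row.names = NULL
)

# Assumed axes: Mitocheck variance on x, JUMP variance on y, so points below y = x
# (jump_variance < mitocheck_variance) are features with higher variance in Mitocheck
variance_df$higher_in_mitocheck <- variance_df$jump_variance < variance_df$mitocheck_variance

# Hypothetical cutoff: features this close to zero in both datasets overlap near the origin
low_variance_cutoff <- 1e-3
variance_df$low_variance <-
    pmax(variance_df$mitocheck_variance, variance_df$jump_variance) < low_variance_cutoff

print(variance_df)
```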

# Define function for loading data
load_process_data <- function(file, normalized_or_raw) {
    ks_test_df <- readr::read_tsv(
        results_file,
Member:

This is probably just my inexperience with R functions, but I am confused why you define function(file, normalized_or_raw) and then use results_file on this line. I would assume that if the function loads a file, it would use the argument name file here rather than a new name, results_file.

I just don't see the variable file used anywhere in this function, so I am a bit confused, but please let me know if I just missed it!

Member Author:

This is a major typo! Thanks for catching this! Kudos!

I've fixed it in the next commit. (Note: it doesn't actually affect the present script, but it could be devastating if the function were used elsewhere.)
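One plausible reason the typo had no effect in the present script, assuming the script defines results_file at the top level before calling the function, is R's lexical scoping: a name that is neither an argument nor a local variable is looked up in the environment where the function was defined, so the function silently reads the global results_file and ignores file. Because R evaluates arguments lazily, the unused file argument does not raise an error either. The sketch below reproduces that failure mode with temporary files. It uses base R's utils::read.delim in place of readr::read_tsv to stay dependency-free, and the data_type column is a hypothetical processing step, not the PR's actual code.

```r
# Temporary TSV files with made-up KS results
results_file <- tempfile(fileext = ".tsv")  # a global, as the script presumably defines
writeLines(c("feature\tks_stat", "Nuclei_AreaShape_Area\t0.12"), results_file)
other_file <- tempfile(fileext = ".tsv")
writeLines(c("feature\tks_stat", "Nuclei_Intensity_MeanIntensity\t0.40"), other_file)

# Buggy version: ignores `file` and silently finds the global `results_file`
load_process_data_buggy <- function(file, normalized_or_raw) {
    utils::read.delim(results_file)
}

# Fixed version: reads whichever file the caller passes in
load_process_data_fixed <- function(file, normalized_or_raw) {
    ks_test_df <- utils::read.delim(file)
    ks_test_df$data_type <- normalized_or_raw  # hypothetical processing step
    ks_test_df
}

buggy_df <- load_process_data_buggy(other_file, "raw")  # still returns results_file's contents
fixed_df <- load_process_data_fixed(other_file, "raw")  # returns other_file's contents
```

Passing the path explicitly and reading only from the argument keeps the function's behavior independent of whatever globals happen to exist where it is called.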

@gwaybio (Member Author) commented Oct 30, 2023

Thanks again for the review, @jenna-tomkinson! I will go ahead and merge now.

@gwaybio merged commit 6e76b8c into WayScience:main on Oct 30, 2023
@gwaybio deleted the add-kstest-fig branch on October 30, 2023 at 19:31
2 participants