Equal cell numbers between conditions? #208

Closed
asgerjakobsen opened this issue Feb 23, 2022 · 6 comments

Comments

@asgerjakobsen

Hi, thank you for developing this great tool - I have found it really useful for my analysis.

I have a question about the differential abundance testing. I am comparing two conditions where there are significantly more cells in one condition than the other (about twice as many in the control (WT) condition versus the test (mutant) condition). I am interested in knowing whether there is a relative increase in cells in particular neighbourhoods in the test condition, compared to controls (i.e. are mutant cells distributed evenly or are there cell states with a relative increase/decrease in abundance, compared to what is seen in controls?).

I have tried running Milo with all cells in the dataset and do detect neighbourhoods with differential abundance, but I am unsure how to interpret the log fold change in this case. Is it calculated from the ratio of test vs control cells in a neighbourhood, or does it somehow take into account the difference in total cell number by normalising? Would it be better to downsample so that I have equal numbers of cells between conditions for testing?

Thanks!
Asger

@MikeDMorgan
Member

Hi @asgerjakobsen, Milo does account for differences in the number of cells across samples by normalising in the GLM using the trimmed mean of M-values (TMM): https://genomebiology.biomedcentral.com/articles/10.1186/gb-2010-11-3-r25

How strongly these differences are confounded with your variable of interest will affect the quality of the normalisation. If there are global differences in the numbers of cells in and across nhoods, this will be visible as an off-horizontal distribution in an MA plot. You can plot the results of your analysis using plotNhoodMA(); you should expect to see the data points distributed around 0 on the vertical axis. If they are drastically diagonal (up or down), this indicates very strong confounding between the numbers of cells and your experimental conditions, in which case downsampling would be the solution.
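To make the normalisation point concrete, here is a minimal Python sketch. It is not miloR code and it is deliberately not TMM: it swaps in plain library-size scaling per condition as a simplified stand-in. The function name `ma_point`, the pseudo-count and the toy counts are all hypothetical. It shows that a neighbourhood where WT contributes twice as many cells as Mut, only because WT has twice as many cells overall, sits at M = 0 on the vertical axis once totals are taken into account.

```python
import math

def ma_point(counts, totals, groups, test, ref, pseudo=0.5):
    """Return (A, M) for one neighbourhood.

    counts: cells from each sample that fall in this neighbourhood
    totals: total number of cells per sample (the scaling factor;
            a simplified stand-in for TMM normalisation factors)
    groups: condition label per sample
    """
    mean_total = sum(totals) / len(totals)

    def log_abundance(label):
        c = sum(x for x, g in zip(counts, groups) if g == label)
        n = sum(t for t, g in zip(totals, groups) if g == label)
        # Proportion of this condition's cells found in the neighbourhood,
        # plus a small pseudo-proportion so that zero counts stay finite.
        return math.log2(c / n + pseudo / mean_total)

    log_test = log_abundance(test)
    log_ref = log_abundance(ref)
    m = log_test - log_ref          # vertical axis: log fold change
    a = 0.5 * (log_test + log_ref)  # horizontal axis: mean log abundance
    return a, m

# Toy data: two WT samples with 2000 cells each, two Mut samples with 1000 each.
groups = ["WT", "WT", "Mut", "Mut"]
totals = [2000, 2000, 1000, 1000]

# Neighbourhood 1: same proportion of cells in both conditions.
a_even, m_even = ma_point([40, 40, 20, 20], totals, groups, "Mut", "WT")

# Neighbourhood 2: Mut proportion is doubled relative to WT.
a_up, m_up = ma_point([40, 40, 40, 40], totals, groups, "Mut", "WT")

print(f"even nhood:     M = {m_even:.3f}")
print(f"enriched nhood: M = {m_up:.3f}")
```

Without scaling, the first neighbourhood would show a raw ratio of log2(40/80) = -1 even though the conditions are distributed identically. If most neighbourhoods still drift away from 0 after normalisation, that is the diagonal pattern described above.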

@asgerjakobsen
Author

Thank you for such a quick response!

Yes, that makes sense now. I should have mentioned that I am doing a paired analysis, so from each sample I have both WT and Mut. I have attached an image of my design matrix. So for testing, my formula is: design = ~ Sample + clone.
Screenshot 2022-02-23 at 10 48 54
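For readers who cannot see the screenshot, the following Python sketch illustrates how a paired formula such as `~ Sample + clone` expands into design-matrix columns under treatment coding (the default contrast scheme in R). The sample names `P1` to `P3`, the helper name `paired_design` and the explicit choice of WT as the reference level are assumptions for illustration; they are not taken from the attached image.

```python
def paired_design(samples, clones, ref_clone):
    """Expand '~ Sample + clone' into treatment-coded columns.

    samples:   sample (donor) label for each observation
    clones:    condition label for each observation
    ref_clone: condition used as the baseline (assumed, e.g. "WT")
    """
    sample_levels = sorted(set(samples))
    clone_levels = [c for c in sorted(set(clones)) if c != ref_clone]

    # The first sample level and the reference clone are absorbed by the intercept.
    columns = (
        ["(Intercept)"]
        + [f"Sample{s}" for s in sample_levels[1:]]
        + [f"clone{c}" for c in clone_levels]
    )

    rows = []
    for s, c in zip(samples, clones):
        row = [1]
        row += [1 if s == level else 0 for level in sample_levels[1:]]
        row += [1 if c == level else 0 for level in clone_levels]
        rows.append(row)
    return columns, rows

# Hypothetical layout: three samples, each contributing a WT and a Mut population.
samples = ["P1", "P1", "P2", "P2", "P3", "P3"]
clones = ["WT", "Mut", "WT", "Mut", "WT", "Mut"]

columns, rows = paired_design(samples, clones, ref_clone="WT")
print(columns)
for label, row in zip(zip(samples, clones), rows):
    print(label, row)
```

The `Sample` columns absorb differences between samples, so the `cloneMut` coefficient reflects the Mut versus WT difference within each sample.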

Here is an image of the MA plot, which looks reasonably OK to me:
image

Do you think I can go ahead without downsampling?

@MikeDMorgan
Member

Hi @asgerjakobsen, that MA plot looks fine - I think you're good to go ahead.

@asgerjakobsen
Author

Thank you for your help!

@alitinet

Hi @MikeDMorgan,

I had the same situation in my analysis and, as you suggested, looked at the MA plots for neighborhoods. The first is what I get when running on unbalanced classes (5k vs 15k cells in the two conditions), and the second is what I get after downsampling the second condition to 5k cells too. The MA plots don't look that great to me. Do you think it's OK to run Milo here or not? Could you also help me interpret these MA plots, e.g. what do these straight lines indicate? Thanks!
Screenshot 2022-05-27 at 13 41 04
Screenshot 2022-05-27 at 13 57 31

@MikeDMorgan
Member

Hi @alitinet, these straight lines usually occur when you have nhoods with mostly 0's in them. We strongly recommend inspecting the histogram of nhood counts and making sure that the minimum nhood size is ~N x S, where N is at least 5 and S is the total number of experimental samples.

How many samples do you have in your experiment? This looks suspiciously like you are under-powered.
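The size check recommended above can be sketched in a few lines of Python. This is a rough illustration rather than part of Milo: the helper names `check_nhood_sizes` and `zero_fraction`, the default of 5 cells per sample and the toy numbers are assumptions. It compares neighbourhood sizes against the N x S rule of thumb and reports how much of the count matrix is zero.

```python
from statistics import median

def check_nhood_sizes(nhood_sizes, n_samples, n_per_sample=5):
    """Compare neighbourhood sizes with the N x S rule of thumb."""
    threshold = n_per_sample * n_samples
    below = [size for size in nhood_sizes if size < threshold]
    return {
        "threshold": threshold,
        "min_size": min(nhood_sizes),
        "median_size": median(nhood_sizes),
        "fraction_below": len(below) / len(nhood_sizes),
    }

def zero_fraction(count_matrix):
    """Fraction of zero entries in a neighbourhood-by-sample count matrix."""
    entries = [count for row in count_matrix for count in row]
    return sum(1 for count in entries if count == 0) / len(entries)

# Hypothetical example: 5 neighbourhoods in an experiment with 6 samples.
sizes = [12, 25, 40, 18, 60]
report = check_nhood_sizes(sizes, n_samples=6)
print(report)

# Two hypothetical neighbourhoods counted across the same 6 samples.
toy_counts = [
    [0, 0, 3, 0, 1, 0],
    [2, 0, 0, 0, 0, 4],
]
zeros = zero_fraction(toy_counts)
print(f"zero entries: {zeros:.0%}")
```

With 6 samples the threshold is 30 cells, so three of the five toy neighbourhoods fall short. A large share of zero entries is the situation in which the straight lines described above tend to appear.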


3 participants