correcting in multiple steps #21

Closed
shokohirosue opened this issue Nov 9, 2020 · 9 comments

@shokohirosue

Hi Aaron, thank you for developing and maintaining a great package!

I have time course data from 4 different time points, and each time point has multiple batches. There are biological differences between time points, but there are common cells too.

How would you recommend that I select HVGs, run multiBatchNorm() and perform batch correction for such a dataset?

If I batch correct within each time point first then batch correct across time points, can I use MNN corrected matrix as an input for MNN correction?

Thank you so much for your help!

@LTLA
Owner

LTLA commented Nov 10, 2020

I would create a blocking factor where each batch in each time point is its own level, e.g., T1-B1, T1-B2, etc.
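
As a minimal sketch of that blocking factor in base R: the `timepoint` and `batch` vectors below are hypothetical per-cell annotations invented for illustration, and in a real analysis they would presumably come from the column metadata of the SingleCellExperiment.

```r
# Hypothetical per-cell annotations (one entry per cell); these names and
# values are assumptions, not taken from the original dataset.
timepoint <- c("T1", "T1", "T1", "T2", "T2", "T3", "T3")
batch     <- c("B1", "B2", "B3", "B1", "B2", "B1", "B2")

# One level per batch-within-time-point combination, e.g. "T1-B1".
block <- factor(paste(timepoint, batch, sep = "-"))
levels(block)
```

The resulting factor can then be reused for every downstream `block=` or `batch=` argument.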

Then I would run it through my variance modelling function of choice with block= set to that blocking factor. Technically, this will ignore HVGs driven by genuine biological differences between time points, but if you're planning on merging the time points together, then you've already committed to ignoring those differences (at least for the merged analyses).
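
A rough sketch of the blocked variance modelling, assuming modelGeneVar() from scran is the variance modelling function of choice. The data are simulated with mockSCE() from scuttle purely as a stand-in, and the `timepoint_batch` column name and its levels are assumptions for illustration.

```r
library(scuttle)
library(scran)

set.seed(100)
# Simulated stand-in for the real experiment (an assumption for illustration).
sce <- mockSCE(ncells = 700, ngenes = 1000)
levs <- c("T1-B1", "T1-B2", "T1-B3", "T2-B1", "T2-B2", "T3-B1", "T3-B2")
sce$timepoint_batch <- factor(rep(levs, each = 100))
sce <- logNormCounts(sce)

# Variance is modelled within each level of the blocking factor, so
# differences between batches or time points do not inflate the estimates.
dec <- modelGeneVar(sce, block = sce$timepoint_batch)

# The per-level statistics are kept alongside the combined ones.
ncol(dec$per.block)
```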

Next is multiBatchNorm() with batch= set to the blocking factor. If you're dealing with datasets generated by the same technology and as part of the same study, you might not need to do this step at all, but I usually keep it in "just in case".
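
The normalization step might look something like the following. This is a sketch on simulated data, with the `timepoint_batch` column again being a hypothetical name, and library size factors are used only as a simple placeholder for whatever size factors the real analysis computes.

```r
library(scuttle)
library(batchelor)

set.seed(100)
# Simulated stand-in for the real experiment (an assumption for illustration).
sce <- mockSCE(ncells = 700, ngenes = 1000)
levs <- c("T1-B1", "T1-B2", "T1-B3", "T2-B1", "T2-B2", "T3-B1", "T3-B2")
sce$timepoint_batch <- factor(rep(levs, each = 100))

# Placeholder size factors; deconvolution size factors could be used instead.
sce <- computeLibraryFactors(sce)

# Rescale the size factors to adjust for coverage differences between batches
# and recompute the log-normalized expression values.
normed <- multiBatchNorm(sce, batch = sce$timepoint_batch)
assayNames(normed)
```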

Finally, onto the fastMNN() step. You can accommodate complex merges by setting the merge.order= argument. For example:

merge.order = list(c("T1-B1", "T1-B2", "T1-B3"), c("T2-B1", "T2-B2"), c("T3-B1", "T3-B2"))

will instruct fastMNN() to merge batches 1, 2 and 3 for time point 1; then batches 1 and 2 for time point 2; then batches 1 and 2 for time point 3; and then finally, to merge the results of all of those initial merges together, across time points. This is, in theory, the best way to do these merges - check out the "Controlling the merge order" section of ?fastMNN.
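
For illustration, here is how that merge.order= might be passed in a full call. The data are simulated, the `timepoint_batch` column name is hypothetical, and feature selection is skipped to keep the sketch short.

```r
library(scuttle)
library(batchelor)

set.seed(100)
# Simulated stand-in for the real experiment (an assumption for illustration).
sce <- mockSCE(ncells = 700, ngenes = 1000)
levs <- c("T1-B1", "T1-B2", "T1-B3", "T2-B1", "T2-B2", "T3-B1", "T3-B2")
sce$timepoint_batch <- factor(rep(levs, each = 100))
sce <- logNormCounts(sce)

# Merge batches within each time point first, then merge across time points.
merge.order <- list(
    c("T1-B1", "T1-B2", "T1-B3"),
    c("T2-B1", "T2-B2"),
    c("T3-B1", "T3-B2")
)

mnn.out <- fastMNN(sce, batch = sce$timepoint_batch, merge.order = merge.order)

# Corrected low-dimensional coordinates for all cells.
dim(reducedDim(mnn.out, "corrected"))
```

Every level of the blocking factor needs to appear somewhere in the merge order; the names must match the levels exactly.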

Alternatively - and not really recommended - you can call multiBatchPCA() to do the PCA step manually, and then perform repeated calls to reducedMNN() to assemble the hierarchical merge of batches-then-timepoints. For various reasons, this won't give quite the same results as allowing fastMNN() to handle it for you, so you should just do the above instead.

@shokohirosue
Author

shokohirosue commented Nov 10, 2020

Thank you! This is very helpful.

With the output of modelGeneVar() with block=sce$timepoint_batch, how would you suggest setting the subset.row argument for fastMNN()? Can it take a different set of features for different stages of MNN, or am I supposed to combine the variable features from each level in some way (such as with combineVar())?

Thank you very much for your help.

@LTLA
Owner

LTLA commented Nov 11, 2020

You'll need a single set of features - you can just call getTopHVGs() on the output of modelGeneVar() without any extra work, as modelGeneVar() will automatically combine the statistics across the batches. I would suggest setting n=5000 in getTopHVGs(), but you can play around with those settings if you like.
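
A short sketch of that feature selection, again on simulated data with the `timepoint_batch` column filled in with made-up levels. The mock dataset only has 1000 genes, so n=5000 cannot be reached here and the call simply returns fewer genes.

```r
library(scuttle)
library(scran)

set.seed(100)
# Simulated stand-in for the real experiment (an assumption for illustration).
sce <- mockSCE(ncells = 700, ngenes = 1000)
levs <- c("T1-B1", "T1-B2", "T1-B3", "T2-B1", "T2-B2", "T3-B1", "T3-B2")
sce$timepoint_batch <- factor(rep(levs, each = 100))
sce <- logNormCounts(sce)

# Blocked variance modelling; the top-level statistics are already combined
# across the levels of the blocking factor.
dec <- modelGeneVar(sce, block = sce$timepoint_batch)

# A single set of features, chosen on the combined statistics.
hvgs <- getTopHVGs(dec, n = 5000)
length(hvgs)
```

The resulting character vector is what would be passed as subset.row= to fastMNN().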

@shokohirosue
Author

Thank you very much!

@shokohirosue
Author

shokohirosue commented Nov 11, 2020

I closed the issue, but would you mind if I ask you one more question?

I have treatment samples and control samples from each time point, and controls are supposed to be the same in every time point. (If there are changes between different time points' controls, I am not interested in those and ideally want to remove them.)

Is it possible to compare biological differences between time points using this control information?
Is it possible to correct batch effects for the whole dataset using only a subset of the cells in each batch (the cells from the control samples)?

Thank you very much for your help.

@LTLA
Owner

LTLA commented Nov 12, 2020

Sure, check out the restrict= option. The idea is to compute the correction using only the cells in the control group, and then to extrapolate the same correction to every cell in the batch.
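
A tentative sketch of restrict= with a single object and batch=. It assumes that, in this mode, restrict= takes a list of length 1 holding a subsetting vector for the columns (worth confirming against the package documentation on restriction). The data are simulated, and the `timepoint_batch` and `condition` columns are hypothetical.

```r
library(scuttle)
library(batchelor)

set.seed(100)
# Simulated stand-in for the real experiment (an assumption for illustration).
sce <- mockSCE(ncells = 700, ngenes = 1000)
levs <- c("T1-B1", "T1-B2", "T1-B3", "T2-B1", "T2-B2", "T3-B1", "T3-B2")
sce$timepoint_batch <- factor(rep(levs, each = 100))

# Hypothetical treatment labels: alternating cells, so each batch is half control.
sce$condition <- rep(c("control", "treatment"), length.out = ncol(sce))
sce <- logNormCounts(sce)

# Only control cells are used to compute the correction;
# the correction is then applied to every cell in each batch.
is.control <- sce$condition == "control"
mnn.out <- fastMNN(sce, batch = sce$timepoint_batch, restrict = list(is.control))

# All cells are still present in the output, not just the controls.
ncol(mnn.out)
```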

Personally, I have found this mode to be rather disappointing in practice, because the batch effects in the control are often not the same as the batch effects in the treatment. I suspect it only works if your controls are really similar to the treatments (e.g., same cell types, similar expression profiles). But you can give it a go and see how it works for you.

Also keep in mind that, while you will preserve differences between treatments, this may not be desirable for annotation of your dataset, e.g., if you have to manually re-identify the same cell type across each of your treatments. The main - and maybe only - purpose of merging datasets is so that we only have to cluster and annotate once; if you have a strong treatment effect, the same cell type might be scattered across different clusters (one per treatment), which would be annoying to match up.

@shokohirosue
Author

That makes sense, thank you very much.

@shokohirosue
Author

Hi Aaron,

Sorry for reopening this issue. May I ask you a few more questions re. this experimental design?

  1. Which would you recommend, normalising within a batch or normalising within a sample?

  2. In this context, at which point should I filter out lowly expressed genes? (calculateAverage(sce) > 0.1)

I am setting min.mean=0.1 for computeSumFactors() and using the top 5000 highly variable genes for multiBatchNorm()/MNN. Do I still have to worry about filtering out genes before within-batch normalization or batch correction?

Thank you very much for your help.

@LTLA
Owner

LTLA commented Jan 21, 2021

Would you mind opening a new issue? This should make it easier to keep track of.
