correcting in multiple steps #21

shokohirosue · 2020-11-09T17:25:08Z

Hi Aaron, thank you for developing and maintaining a great package!

I have time-course data from 4 different time points, and each time point has multiple batches. There are biological differences between time points, but some cell types are also shared across them.

How would you recommend selecting HVGs, running multiBatchNorm() and performing batch correction for such a dataset?

If I batch-correct within each time point first and then across time points, can I use the MNN-corrected matrix as input for a second round of MNN correction?

Thank you so much for your help!


LTLA · 2020-11-10T08:32:53Z

I would create a blocking factor where each batch in each time point is its own level, e.g., T1-B1, T1-B2, etc.

Then I would run it through my variance modelling function of choice with block= set to that blocking factor. Technically, this will ignore HVGs driven by genuine biological differences between time points, but if you're planning on merging the time points together, then you've already committed to ignoring those differences (at least for the merged analyses).
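A minimal sketch of the blocking factor and the blocked variance modelling, assuming scran's modelGeneVar() is the "variance modelling function of choice". The data are simulated with scuttle::mockSCE(), and the column names (timepoint, batch, timepoint_batch) and the 3-time-point design are hypothetical, chosen only to mirror the merge.order example in the reply.

```r
library(scuttle)    # mockSCE(), logNormCounts()
library(scran)      # modelGeneVar()

# Simulated data standing in for a real dataset (assumption).
set.seed(100)
sce <- mockSCE(ncells = 700, ngenes = 2000)
sce <- logNormCounts(sce)

# Hypothetical design: 3 time points with 3, 2 and 2 batches respectively.
sce$timepoint <- rep(c("T1", "T1", "T1", "T2", "T2", "T3", "T3"), length.out = ncol(sce))
sce$batch     <- rep(c("B1", "B2", "B3", "B1", "B2", "B1", "B2"), length.out = ncol(sce))

# One level per batch within each time point: "T1-B1", "T1-B2", ...
sce$timepoint_batch <- paste(sce$timepoint, sce$batch, sep = "-")

# Model per-gene variance within each level; statistics are combined across levels.
dec <- modelGeneVar(sce, block = sce$timepoint_batch)
head(dec[order(dec$bio, decreasing = TRUE), ])
```

Because each level gets its own mean-variance trend, differences between time points (and between batches) do not inflate the per-gene variance estimates.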

Next is multiBatchNorm() with batch= set to the blocking factor. If you're dealing with datasets generated by the same technology and as part of the same study, you might not need to do this step at all, but I usually keep it in "just in case".
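A minimal sketch of this step on simulated data (scuttle::mockSCE()); the timepoint_batch labels are hypothetical. Size factors are set with librarySizeFactors() purely for brevity; in a real analysis they would more likely come from computeSumFactors() or similar.

```r
library(scuttle)    # mockSCE(), librarySizeFactors()
library(batchelor)  # multiBatchNorm()

set.seed(100)
sce <- mockSCE(ncells = 700, ngenes = 2000)
sce$timepoint_batch <- rep(c("T1-B1", "T1-B2", "T1-B3", "T2-B1", "T2-B2", "T3-B1", "T3-B2"),
                           length.out = ncol(sce))

# multiBatchNorm() rescales existing size factors, so make sure they are present.
sizeFactors(sce) <- librarySizeFactors(sce)

# Adjust size factors so that coverage is comparable across all blocking levels,
# then recompute the log-normalized expression values ("logcounts").
sce <- multiBatchNorm(sce, batch = sce$timepoint_batch)
```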

Finally, onto the fastMNN() step. You can accommodate complex merges by setting the merge.order= argument. For example:

```r
merge.order = list(c("T1-B1", "T1-B2", "T1-B3"), c("T2-B1", "T2-B2"), c("T3-B1", "T3-B2"))
```

will instruct fastMNN() to merge batches 1, 2 and 3 for time point 1; then batches 1 and 2 for time point 2; then batches 1 and 2 for time point 3; and then finally, to merge the results of all of those initial merges together, across time points. This is, in theory, the best way to do these merges - check out the "Controlling the merge order" section of ?fastMNN.
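A minimal sketch of the full fastMNN() call with this merge order, on simulated data (scuttle::mockSCE()) with hypothetical timepoint_batch labels. d = 20 is used only to keep the example fast (the default is 50).

```r
library(scuttle)    # mockSCE(), logNormCounts()
library(batchelor)  # fastMNN()

set.seed(100)
sce <- mockSCE(ncells = 700, ngenes = 2000)
sce <- logNormCounts(sce)
sce$timepoint_batch <- rep(c("T1-B1", "T1-B2", "T1-B3", "T2-B1", "T2-B2", "T3-B1", "T3-B2"),
                           length.out = ncol(sce))

# Batches within each time point are merged first; the three per-time-point
# results are then merged with each other.
merge.order <- list(c("T1-B1", "T1-B2", "T1-B3"), c("T2-B1", "T2-B2"), c("T3-B1", "T3-B2"))

mnn.out <- fastMNN(sce, batch = sce$timepoint_batch, merge.order = merge.order, d = 20)

# Corrected low-dimensional coordinates for all cells.
reducedDim(mnn.out, "corrected")[1:3, 1:5]
```

Because the hierarchy is expressed in a single call, there is no need to feed one MNN-corrected matrix back into another round of correction by hand.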

Alternatively - and not really recommended - you can call multiBatchPCA() to do the PCA step manually, and then perform repeated calls to reducedMNN() to assemble the hierarchical merge of batches-then-timepoints. For various reasons, this won't give quite the same results as allowing fastMNN() to handle it for you, so you should just do the above instead.

shokohirosue · 2020-11-10T13:40:45Z

Thank you! This is very helpful.

With the output of modelGeneVar() with block=sce$timepoint_batch, how would you suggest setting the subset.row= argument for fastMNN()? Can it take a different set of features for different stages of the MNN merge, or am I supposed to combine the variable features from each level in some way (e.g., with combineVar())?

Thank you very much for your help.

LTLA · 2020-11-11T08:40:42Z

You'll need a single set of features - you can just call getTopHVGs() on the output of modelGeneVar() without any extra work, as modelGeneVar() will automatically combine the statistics across the batches. I would suggest setting n=5000 in getTopHVGs(), but you can play around with those settings if you like.
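A minimal sketch of passing one HVG set to every stage of the hierarchical merge, on simulated data (scuttle::mockSCE()) with hypothetical timepoint_batch labels. The simulated data only has 2000 genes, so n = 5000 simply returns every gene with positive biological variance; d = 20 is for speed only.

```r
library(scuttle)    # mockSCE(), logNormCounts()
library(scran)      # modelGeneVar(), getTopHVGs()
library(batchelor)  # fastMNN()

set.seed(100)
sce <- mockSCE(ncells = 700, ngenes = 2000)
sce <- logNormCounts(sce)
sce$timepoint_batch <- rep(c("T1-B1", "T1-B2", "T1-B3", "T2-B1", "T2-B2", "T3-B1", "T3-B2"),
                           length.out = ncol(sce))

# Blocked variance modelling; the per-level statistics are already combined in 'dec'.
dec <- modelGeneVar(sce, block = sce$timepoint_batch)

# A single feature set, used for every step of the merge.
hvgs <- getTopHVGs(dec, n = 5000)

mnn.out <- fastMNN(sce, batch = sce$timepoint_batch, subset.row = hvgs, d = 20,
                   merge.order = list(c("T1-B1", "T1-B2", "T1-B3"),
                                      c("T2-B1", "T2-B2"),
                                      c("T3-B1", "T3-B2")))
```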

shokohirosue · 2020-11-11T16:01:24Z

Thank you very much!

shokohirosue · 2020-11-11T16:51:47Z

I closed the issue, but would you mind if I asked one more question?

I have treatment samples and control samples from each time point, and the controls are supposed to be the same at every time point. (If there are changes between the controls of different time points, I am not interested in those and would ideally like to remove them.)

Is it possible to compare biological differences between time points using this control information?
Is it possible to correct batch effects for the whole dataset using only a subset of cells in each batch (i.e., cells from the control samples)?

Thank you very much for your help.

LTLA · 2020-11-12T08:02:27Z

Sure, check out the restrict= option. The idea is to compute the correction using only the cells in the control group, and then to extrapolate the same correction to every cell in the batch.
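A minimal sketch of restrict= on simulated data (scuttle::mockSCE()). The condition column and the 50/50 split of controls within each level are hypothetical. It assumes, per ?fastMNN, that when a single object is supplied with batch=, restrict= is a list of length 1 whose entry indexes cells of that object.

```r
library(scuttle)    # mockSCE(), logNormCounts()
library(batchelor)  # fastMNN()

set.seed(100)
sce <- mockSCE(ncells = 700, ngenes = 2000)
sce <- logNormCounts(sce)
sce$timepoint_batch <- rep(c("T1-B1", "T1-B2", "T1-B3", "T2-B1", "T2-B2", "T3-B1", "T3-B2"),
                           length.out = ncol(sce))

# Hypothetical labels: half of the cells in every level are controls.
sce$condition <- rep(c("control", "treated"), length.out = ncol(sce))
is.control <- sce$condition == "control"

# The correction is computed from control cells only (one list entry for the
# single input object), then applied to all cells in each batch.
mnn.out <- fastMNN(sce, batch = sce$timepoint_batch, restrict = list(is.control), d = 20)

# Every cell, including treated ones, receives corrected coordinates.
dim(reducedDim(mnn.out, "corrected"))
```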

Personally, I have found this mode to be rather disappointing in practice, because the batch effects in the control are often not the same as the batch effects in the treatment. I suspect it only works if your controls are really similar to the treatments (e.g., same cell types, similar expression profiles). But you can give it a go and see how it works for you.

Also keep in mind that, while you will preserve differences between treatments, this may not be desirable for annotation of your dataset, e.g., if you have to manually re-identify the same cell type across each of your treatments. The main - and maybe only - purpose of merging datasets is so that we only have to cluster and annotate once; if you have a strong treatment effect, the same cell type might be scattered across different clusters (one per treatment), which would be annoying to match up.

shokohirosue · 2020-11-12T16:23:35Z

That makes sense, thank you very much.

shokohirosue · 2021-01-20T12:13:18Z

Hi Aaron,

Sorry for reopening this issue. May I ask you a few more questions re. this experimental design?

Which would you recommend: normalising within a batch or normalising within a sample?
In this context, at which point should I filter out lowly expressed genes (e.g., calculateAverage(sce) > 0.1)?

I am setting min.mean=0.1 for computeSumFactors and using top 5000 highly variable genes for multiBatchNorm/MNN. Do I still have to worry about filtering out genes before within-batch normalization or batch correction?

Thank you very much for your help.

LTLA · 2021-01-21T07:57:27Z

Would you mind opening a new issue? This should make it easier to keep track of.

shokohirosue closed this as completed Nov 11, 2020

shokohirosue mentioned this issue Jan 21, 2021 in "pre-processing for MNN #24" (Closed)
