not able to estimate from scRNAseq package #82

Closed
asifzubair opened this issue Sep 23, 2019 · 7 comments


asifzubair commented Sep 23, 2019

I'm trying to estimate parameters from datasets available in the scRNAseq package. Since these datasets are already of type SingleCellExperiment,

> allen <- scRNAseq::ReprocessedAllenData()
> allen
class: SingleCellExperiment 
dim: 20816 379 
metadata(2): SuppInfo which_qc
assays(4): tophat_counts cufflinks_fpkm rsem_counts rsem_tpm
rownames(20816): 0610007P14Rik 0610009B22Rik ... Zzef1 Zzz3
rowData names(0):
colnames(379): SRR2140028 SRR2140022 ... SRR2139341 SRR2139336
colData names(22): NREADS NALIGNED ... Animal.ID passes_qc_checks_s
reducedDimNames(0):
spikeNames(0):
altExpNames(1): ERCC

I thought I should be able to do something like this:

> params <- splatter::splatEstimate(allen)

However, I get an error when I do that:

Error in assay(object, i = exprs_values, ...) : 
  'assay(<SingleCellExperiment>, i="character", ...)' invalid subscript 'i'
'counts' not in names(assays(<SingleCellExperiment>))

Any ideas on what's going on and how to fix it?

Thanks!

@asifzubair
Author

Ok, I was able to estimate params using this:

> counts <- (assay(allen))
> params <- splatEstimate(counts)
<simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data,     gr = gradient, pdistnam = pdistname, hessian = TRUE, method = meth,     lower = lower, upper = upper, ...): non-finite finite-difference value [2]>

The counts function doesn't seem to work with allen. But I'm still confused about why I couldn't simply use allen directly.

Also, is the simpleError something to worry about?

@lazappi
Collaborator

lazappi commented Sep 23, 2019

Hi @asifzubair

The first error is because this dataset doesn't have anything in the counts slot. If you look at the assays above there is tophat_counts, cufflinks_fpkm, rsem_counts and rsem_tpm but no counts. This is fairly unusual but there should probably be a better error message (or fall back to whatever the first assay is).

The simpleError is from one of the fitting stages. It's probably not something to worry about but ideally you shouldn't see that message. It's something else to look into.
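For the missing counts assay, one possible workaround is to copy an existing count-like assay into the conventional counts slot before estimating. The sketch below uses a small toy SingleCellExperiment (an assumption, standing in for the Allen data so that nothing needs to be downloaded) whose only assay is named tophat_counts. Which assay is appropriate for the real dataset is a judgment call, and newer splatter versions may handle this case differently.

```r
library(SingleCellExperiment)

# Toy stand-in for the Allen data (hypothetical values): a count matrix
# stored under a non-standard assay name, so there is no "counts" assay.
set.seed(1)
mat <- matrix(rpois(200, lambda = 5), nrow = 20,
              dimnames = list(paste0("gene", 1:20), paste0("cell", 1:10)))
sce <- SingleCellExperiment(assays = list(tophat_counts = mat))

# Check which assays exist; "counts" is not among them at this point.
assayNames(sce)
has_counts_before <- "counts" %in% assayNames(sce)

# Copy the chosen assay into the conventional "counts" slot.
counts(sce) <- assay(sce, "tophat_counts")

# With the real object one would then call, for example:
# params <- splatter::splatEstimate(sce)
```

The same assignment should work on the real object, e.g. `counts(allen) <- assay(allen, "tophat_counts")`, after which `counts(allen)` no longer errors.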

@asifzubair
Author

Yes, it is kinda odd that the counts slot is missing. However, the scRNAseq package is from the authors of SingleCellExperiment, so they might have a reason for doing that. BTW, the same error is thrown when I call counts(allen) - because of the reason you stated.

@asifzubair
Author

Hi @lazappi

Thank you for addressing this issue in the codebase.

Also, I was wondering if you have any suggestions on how to pick appropriate de.facLoc and de.facScale values? I find that when testing my method it is sensitive to how these values are set and I don't want to set them manually. Is there some way I could estimate them as well?

Thank you,
Best,

Asif

@lazappi
Collaborator

lazappi commented Nov 19, 2019

If you have a dataset with known groups in it, you can perform a differential expression analysis between two groups and fit a log-normal distribution to the fold-change (not log-fold-change) estimates. This is approximately what Splat is trying to simulate. I usually use the fitdistrplus package for that kind of fitting: https://cran.r-project.org/web/packages/fitdistrplus/index.html.
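A minimal sketch of that fitting step, assuming a vector of fold-change estimates is already available. Here `fc` is simulated (hypothetical data, not from any real analysis). The code uses `fitdistrplus::fitdist()` when the package is installed and otherwise falls back to the closed-form maximum-likelihood estimates for a log-normal, which should give essentially the same values.

```r
# Hypothetical fold-change estimates; in practice these would come from a
# differential expression analysis between known groups.
set.seed(1)
fc <- rlnorm(500, meanlog = 0.1, sdlog = 0.4)

if (requireNamespace("fitdistrplus", quietly = TRUE)) {
  # Maximum-likelihood fit of a log-normal to the fold changes.
  fit <- fitdistrplus::fitdist(fc, "lnorm")
  est <- fit$estimate
} else {
  # Closed-form log-normal MLE: mean and (population) sd of log(fc).
  log_fc <- log(fc)
  est <- c(meanlog = mean(log_fc),
           sdlog = sqrt(mean((log_fc - mean(log_fc))^2)))
}

est

# The estimates could then be used as Splat parameters, for example:
# params <- splatter::setParams(params,
#                               de.facLoc = est[["meanlog"]],
#                               de.facScale = est[["sdlog"]])
```

Which genes to include in the fit (for example only those called differentially expressed) is left open here and would likely affect the estimates.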

@asifzubair
Author

Hi @lazappi - would you recommend fitting the log-normal to the FC between two groups or between one group and the rest?

Also - I just wanted your thoughts on this: recently, in the CellAssign paper they claimed that the splatter model needs to be augmented because it doesn't model logFC well in all cases. They have some (very draft) experiments to demonstrate this here. Do you think their claims hold water and that perhaps the augmented model should be incorporated into splatter? Thanks!

@lazappi
Collaborator

lazappi commented Mar 9, 2020

The generated factors are relative to a fictional base cell so I think fitting one group compared to everything else would be closest to that. This is a very approximate process though so I'm not sure it would make too much difference.
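As a rough illustration of the one-group-versus-rest comparison, the sketch below computes per-gene fold changes from a toy count matrix. The helper `one_vs_rest_fc()`, the pseudocount, and the simulated data are all assumptions made for illustration; a ratio of raw means is only a crude stand-in for a proper differential expression analysis with normalisation.

```r
# Toy data (hypothetical): 50 genes x 30 cells in three known groups.
set.seed(42)
counts <- matrix(rpois(50 * 30, lambda = 10), nrow = 50)
groups <- rep(c("A", "B", "C"), each = 10)

# Hypothetical helper: fold change of mean expression in one group
# relative to all other cells. The pseudocount avoids division by zero.
one_vs_rest_fc <- function(counts, groups, target, pseudo = 1) {
  in_group <- groups == target
  mean_in <- rowMeans(counts[, in_group, drop = FALSE])
  mean_rest <- rowMeans(counts[, !in_group, drop = FALSE])
  (mean_in + pseudo) / (mean_rest + pseudo)
}

# One fold-change vector per group, each compared with everything else.
fc_A <- one_vs_rest_fc(counts, groups, "A")
summary(fc_A)
```

The resulting vector is on the fold-change (not log) scale, so it could be passed to a log-normal fit as described earlier in the thread.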

Thanks for letting me know about the comments in the CellAssign paper! 😸 I hadn't seen those before. I haven't looked at it in a lot of detail, but it is definitely possible that another distribution is a better fit. It's something to consider adding as an option to the Splat model.
