Stochastic rounding to integers for downstream use in TotalVI/SCVI #47

Closed
mdmanurung opened this issue May 17, 2022 · 9 comments · Fixed by #50
Labels
enhancement New feature or request

Comments

@mdmanurung
Contributor

Hi Caibin,

I tried using scar's output as input for TotalVI/SCVI. As expected, both raised an error because the denoised output is no longer integer-valued.
I would suggest implementing stochastic rounding to integers as done in SoupX.
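
For readers unfamiliar with the idea, here is a minimal sketch of stochastic rounding in plain Python. It is only an illustration of the technique, not SoupX's or scar's actual code, and the names `stochastic_round` and `stochastic_round_matrix` are made up for this example. Each value is rounded down, then bumped up by one with probability equal to its fractional part, so the expected value of the result equals the original value.

```python
import math
import random


def stochastic_round(value, rng):
    """Round a non-negative float to an int whose expectation equals `value`."""
    lower = math.floor(value)
    fractional_part = value - lower
    # Round up with probability equal to the fractional part.
    return lower + (1 if rng.random() < fractional_part else 0)


def stochastic_round_matrix(matrix, seed=0):
    """Apply stochastic rounding to every entry of a list-of-lists matrix."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    return [[stochastic_round(v, rng) for v in row] for row in matrix]


# Hypothetical denoised counts (cells x features), for illustration only.
denoised = [[0.2, 3.7, 10.0], [1.5, 0.0, 2.25]]
rounded = stochastic_round_matrix(denoised, seed=42)
```

Values that are already whole numbers are left unchanged, and every other entry ends up at either its floor or its ceiling, which is what integer-only models such as TotalVI/SCVI need.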

Let me know if you're interested and I can find the time to implement it.

Regards,
Mikhael

@CaibinSh
Collaborator

Hi @mdmanurung ,

Thanks very much for your feedback.

No, we haven't tried combining scar with TotalVI/scVI yet. It is good to know that they require integers as input.

Sure, it is a great idea to implement stochastic rounding to integers, and I am really thankful for your offer. Let's do it together. I am going to create a branch for this.

Best,
Caibin

@CaibinSh CaibinSh added the enhancement New feature or request label May 18, 2022
@CaibinSh
Collaborator

As a start, we probably could add a parameter 'rounding' in the following lines:

  1. https://github.com/Novartis/scar/blob/47-stochastic-rounding/scar/main/_vae.py: line 97
  2. https://github.com/Novartis/scar/blob/47-stochastic-rounding/scar/main/_scar.py: line 532

Then we can add an if statement before lines 116 and 117 of https://github.com/Novartis/scar/blob/47-stochastic-rounding/scar/main/_vae.py.

Please feel free to tell me what you think.

Best, Caibin

@mdmanurung
Contributor Author

Hi Caibin, I made PR #48.

Please feel free to adapt it to your code style.

@mdmanurung
Contributor Author

Closing because I saw that the merging is in the works now.

Any further plans for the package? I use it quite often these days and would love to contribute more.

@CaibinSh
Collaborator

Hi @mdmanurung ,

Yes, I already made a new release.

I am more than happy to welcome contributors. There are definitely many things on the list.

E.g., automating the calculation of the ambient profile. This will make it easier for newcomers to use scAR.

My idea is to leverage existing methods to define the subsets of cell-free droplets and quantify the ambient profile. I find the idea of DropletQC (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02547-0) brilliant, but I haven't tested it yet. It would be great if somebody could re-implement it in Python and integrate it into scAR. Please tell me what you think.

Best, Caibin

@mdmanurung
Contributor Author

mdmanurung commented May 19, 2022

That's a great step forward. IIRC, DropletQC requires spliced/unspliced counts. I suspect that this is a rather non-standard workflow: the users would have to re-run the counting with e.g. velocyto, which can be non-trivial! But if such matrices are already available, then the calculation would be quite straightforward.

Perhaps cellbender's idea can be adapted here: the users feed in the unfiltered matrix, the algorithm probabilistically identifies the empty droplets, and scAR then does the decontamination. That being said, this option would be non-trivial for the coders.

@mdmanurung
Contributor Author

mdmanurung commented May 19, 2022

Oh wait, I was late to realize that they have the nuclear_fraction_tags function.

EDIT: after a brief tour through their codebase, I would say that making an rpy2 wrapper would be more feasible.

@CaibinSh
Collaborator

CaibinSh commented Jun 7, 2022

Hi @mdmanurung , sorry for the late response. Last month was a busy month for me due to the job interviews.

I agree with you on DropletQC. After some research, I also found that BAM files might be required, which may not be very convenient in some cases. In addition, this may create a barrier to applying scAR to snRNAseq.

I found that the idea of EmptyDrops fits the ambient signal hypothesis well. It tests each droplet against a multinomial distribution over genes (with the ambient profile as the probabilities) to distinguish cells from empty droplets.
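
To make the multinomial idea concrete, here is a rough sketch in plain Python. It is not a re-implementation of EmptyDrops or of scAR's setup_anndata; the function name `multinomial_log_likelihood` and the toy numbers are assumptions made up for illustration. It scores how well a droplet's counts are explained by the ambient profile, so a droplet that looks like ambient RNA gets a higher log-likelihood than one with a distinct expression pattern.

```python
import math


def multinomial_log_likelihood(counts, ambient_profile):
    """Log-probability of `counts` under a multinomial with probs `ambient_profile`."""
    total = sum(counts)
    # log(total!) via the log-gamma function
    log_lik = math.lgamma(total + 1)
    for n, p in zip(counts, ambient_profile):
        if n == 0:
            continue  # zero counts contribute nothing
        if p == 0.0:
            return float("-inf")  # observed a gene the ambient profile rules out
        log_lik += n * math.log(p) - math.lgamma(n + 1)
    return log_lik


# Toy ambient profile over three genes (sums to 1); purely illustrative.
ambient_profile = [0.5, 0.3, 0.2]
empty_like = [5, 3, 2]  # proportions match the ambient profile
cell_like = [0, 1, 9]   # dominated by a gene that is rare in the ambient pool

ll_empty = multinomial_log_likelihood(empty_like, ambient_profile)
ll_cell = multinomial_log_likelihood(cell_like, ambient_profile)
```

In a real test the likelihood would still have to be turned into a significance call per droplet; the sketch only shows the scoring step.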

I have incorporated a method called setup_anndata to facilitate the calculation of ambient profile. Please also see a tutorial here. Happy to hear your feedback.

Best, Caibin

@mdmanurung
Contributor Author

Sorry for the late response, @CaibinSh! Everything looks great, including the revamped documentation pages. I hope more and more people are using this awesome module.

Regards,
Mikhael
