Gene fusions #52

Open
schelhorn opened this issue Mar 28, 2016 · 26 comments

Comments

@schelhorn

Wicked fast indeed! Are there any plans to extend salmon to also detect gene fusion events? There isn't a fast and accurate way to do that yet, only approaches requiring full alignments. Most often a base-perfect breakpoint isn't required; an estimate within a hash length is fine. We are heavy users of bcbio and are also running the full STAR alignment just for gene fusions, which really sucks. Any ideas would be much appreciated.
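
To illustrate what "an estimate within a hash length" could mean, here is a toy sketch, not salmon's or any published tool's method. It assumes two hypothetical transcript sequences and a read that starts in one and ends in the other, and it brackets the junction by looking at where the read's k-mers stop matching the first transcript and start matching the second. The function names, sequences, and `k` are all made up for the example.

```python
def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def estimate_breakpoint(read, kmers_a, kmers_b, k):
    """Bracket the A->B junction inside a read using k-mer membership only.

    Returns (lo, hi), an interval of read positions expected to contain the
    junction, or None if the read does not look like an A-then-B chimera.
    """
    last_a = None   # start of the last k-mer found only in transcript A
    first_b = None  # start of the first k-mer found only in transcript B
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        in_a = kmer in kmers_a
        in_b = kmer in kmers_b
        if in_a and not in_b:
            last_a = i
        elif in_b and not in_a and first_b is None:
            first_b = i
    if last_a is None or first_b is None or first_b <= last_a:
        return None
    # End of the last A-only k-mer and start of the first B-only k-mer
    # bound the junction; they can disagree when junction bases are shared.
    lo, hi = sorted((last_a + k, first_b))
    return lo, hi


# Hypothetical sequences, chosen so that their 4-mers do not overlap.
K = 4
TX_A = "AAACCCGGGTTT"
TX_B = "TCAGTCAGATCG"
CHIMERIC_READ = TX_A[:8] + TX_B[4:]  # true junction at read position 8

print(estimate_breakpoint(CHIMERIC_READ, kmers(TX_A, K), kmers(TX_B, K), K))
```

In this example the returned interval brackets the true junction but does not pin it to a single base, because a base next to the junction happens to be compatible with both sides. That is the kind of k-mer-scale uncertainty the comment above says is usually acceptable.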

@rob-p
Collaborator

rob-p commented Mar 28, 2016

Hi @schelhorn,

Yes; we are actively looking at fusion prediction based on quasi-mapping. The initial results are promising, but we're still working on improving and refining the method. I'll be sure to let you know when we have something that is ready to test :).

Best,
Rob

@schelhorn
Author

Excellent. May I point out that tools such as Oncofuse https://github.com/mikessh/oncofuse/ and Pegasus https://github.com/RabadanLab/Pegasus have a particular, additional value since they provide functional annotation of fusion events identified by other approaches? Also, these resources may prove helpful wrt validation data: bcbio/bcbio-nextgen#210 and http://m.genome.cshlp.org/content/early/2015/11/10/gr.186114.114 Adding @roryk here for highlighting this feature request in bcbio.

@rob-p
Collaborator

rob-p commented Mar 29, 2016

Awesome; thanks for the pointers! We'll definitely take a look at these.

@schelhorn
Author

Hello @rob-p, may I ask whether there is any news concerning gene fusion detection in Salmon?

@rob-p
Collaborator

rob-p commented Oct 20, 2016

Hi @schelhorn,

Yes, we have built a pipeline atop salmon and quasi-mapping. At this point, what we see is that it is very fast with high sensitivity. Our main focus has been on improving the specificity, which is currently better than some, but not all, methods. I realize, of course, that false positives are a very difficult (and key) problem in this domain, so I'd really like to make sure they are well handled.

@schelhorn
Author

Great; would you like help testing the pipeline, and integrating it into bcbio? We could help with both :)

@schelhorn
Author

Also, do you know if the Salmon pseudo-BAM is suitable for fusion calling by standard (alignment-based) fusion calling tools, i.e., does the BAM include information on mate pairs mapped across transcripts, or reads spanning breakpoints?

@rob-p
Collaborator

rob-p commented Oct 25, 2016

Hi @schelhorn,

Sorry for the uncharacteristically slow response on this. We're going full steam ahead for the RECOMB deadline, so I've been less responsive than usual. Anyway, I've invited you to the repository for the fusion project (it's currently private). Feel free to poke around, but it's probably not useful until we can send you a short writeup describing the current pipeline (since things are still very "alpha").

Regarding calling fusions from the SAM output of Salmon, one can't do this directly because there are, by default, no encompassing reads (i.e. individual reads split between transcripts) and, to improve abundance estimation, salmon is conservative with its use of spanning reads. However, we can get at this information from quasi-mapping, so I can definitely consider adding some flags to provide this info (this is the type of thing we output in the fusion pipeline currently, and then we have to postprocess it).
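
To make the two kinds of evidence concrete, here is a minimal sketch of how a fragment might be labeled once per-segment mapping information is available. It is an illustration only, not code from the fusion pipeline mentioned above. It follows the comment's use of "encompassing" for a single read split between transcripts, and assumes "spanning" means a pair whose two mates map to different genes. The transcript IDs, gene names, and the input layout (each mate as a list of segments, each segment a list of compatible transcript IDs) are hypothetical.

```python
def genes_of(transcripts, tx2gene):
    """Map a list of transcript IDs to the set of genes they belong to."""
    return {tx2gene[t] for t in transcripts}


def classify_fragment(mate1, mate2, tx2gene):
    """Label a read pair as 'encompassing', 'spanning', or 'concordant'.

    mate1, mate2: lists of segments; each segment is a list of the
    transcript IDs that this part of the read is compatible with.
    """
    # A single read whose segments hit disjoint gene sets is split
    # between transcripts ("encompassing" in the thread's terminology).
    for mate in (mate1, mate2):
        gene_sets = [genes_of(seg, tx2gene) for seg in mate]
        for i in range(len(gene_sets)):
            for j in range(i + 1, len(gene_sets)):
                gi, gj = gene_sets[i], gene_sets[j]
                if gi and gj and gi.isdisjoint(gj):
                    return "encompassing"

    # Otherwise compare the mates: no gene in common means the pair
    # links two genes (assumed meaning of "spanning" here).
    genes1 = set().union(*(genes_of(seg, tx2gene) for seg in mate1))
    genes2 = set().union(*(genes_of(seg, tx2gene) for seg in mate2))
    if genes1 and genes2 and genes1.isdisjoint(genes2):
        return "spanning"
    return "concordant"


# Hypothetical transcript-to-gene table for the example.
TX2GENE = {"txA1": "geneA", "txA2": "geneA", "txB1": "geneB"}

print(classify_fragment([["txA1"]], [["txA2"]], TX2GENE))
print(classify_fragment([["txA1"]], [["txB1"]], TX2GENE))
print(classify_fragment([["txA1"], ["txB1"]], [["txB1"]], TX2GENE))
```

Comparing gene sets rather than transcript names keeps multi-mapping reads and isoforms of the same gene from being flagged, which matters because false positives are the main concern raised in this thread.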

@schelhorn
Author

Excellent; thank you. We'll have a look and see what we can contribute.

@schelhorn
Author

Hello @rob-p, could you please invite @tetianakh to the repo as well? She'll do the development on our end. Thanks!

@rob-p
Collaborator

rob-p commented Oct 28, 2016

Hi @schelhorn,

Sure, I'll add her now. We'll send you a small write-up about the state of the codebase and how to run the current pipeline next week (once my student is back from the current CSHL meeting with all of the cool kids ;P).

@schelhorn
Author

Sweet!

@roryk
Contributor

roryk commented Nov 4, 2016

Hi Rob,

Could I get in on this? We have a couple of projects needing to call fusions on a large number of samples, and it would be great to have something speedy to iterate on.

@schelhorn
Author

FYI, I also asked in the kallisto project: pachterlab/kallisto#122

@tetianakh

Hi @rob-p, I haven't received an invitation to the private repo. Could you please invite me? Thanks!

@rob-p
Collaborator

rob-p commented Nov 7, 2016

Hi @tetianakh, I've re-sent the invitation. If you don't get it, please send me an e-mail, and I'll reply with the link to join directly.

@tetianakh

Thanks, I've received it now.

@rob-p
Collaborator

rob-p commented Nov 7, 2016

Great :). I'll have @hiraksarkar write up a brief overview of the current state of the codebase (including which branch contains the latest stuff) this week. We can either share that information in the issues over at that repo, or we can e-mail the write-up to you, @schelhorn, @tetianakh and @roryk. Let me know if one method is preferable to the other.

@schelhorn
Author

Great; directly in the repo is preferred.

@kellrott

This sounds cool. Have you looked at submitting your method for the DREAM RNA-Seq analysis challenge ( https://synapse.org/SMC_RNA ) ?

@nellore

nellore commented Feb 17, 2017

And any status updates? I'd be interested to test drive a quasi-mapping-based fusion caller!

@schelhorn
Author

One fast way using pseudo-alignments should be Kallisto+[Manta|Pizzly], but I haven't tried that myself. We decided to go with full transcriptome alignments instead and integrated EricScript into bcbio. We'd still be interested in something more modern, though.

@rob-p
Collaborator

rob-p commented Feb 18, 2017

If you have a downstream fusion pipeline that uses transcriptome mappings, you have been able to get those for a while via the -z=<output.sam> option. The real challenge is how to properly control the false positive rate. That's the main thing special-purpose downstream software must solve.
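
As a rough sketch of what a downstream tool might do with such a file, the snippet below tallies read pairs whose mates are assigned to different reference transcripts. It assumes only that the output is standard SAM text and uses the SAM columns FLAG, RNAME, and RNEXT. The function name and the sample records are made up, and per the earlier comment in this thread the default output may contain few or no such pairs, so treat this as an illustration rather than a working fusion caller.

```python
from collections import Counter


def cross_transcript_pairs(sam_lines):
    """Count paired records whose mate maps to a different reference.

    Counts SAM records, not unique fragments: a multi-mapping read
    contributes one count per reported mapping.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):       # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:           # not a complete alignment record
            continue
        flag = int(fields[1])
        rname, rnext = fields[2], fields[6]
        is_paired = flag & 0x1
        any_unmapped = (flag & 0x4) or (flag & 0x8)
        is_first = flag & 0x40         # count each pair once, via mate 1
        if not is_paired or any_unmapped or not is_first:
            continue
        if rnext in ("=", "*") or rnext == rname:
            continue                   # mate on the same reference
        counts[tuple(sorted((rname, rnext)))] += 1
    return counts


# Hypothetical records: r1 links txA1 and txB1, r2 is an ordinary pair.
EXAMPLE_SAM = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t65\ttxA1\t10\t255\t50M\ttxB1\t40\t0\t*\t*",
    "r1\t129\ttxB1\t40\t255\t50M\ttxA1\t10\t0\t*\t*",
    "r2\t99\ttxA1\t5\t255\t50M\t=\t120\t165\t*\t*",
    "r2\t147\ttxA1\t120\t255\t50M\t=\t5\t-165\t*\t*",
]

print(cross_transcript_pairs(EXAMPLE_SAM))
```

Raw counts like these are only candidates. Pairs linking isoforms of the same gene would still need to be collapsed with a transcript-to-gene table, and controlling the false positive rate remains the hard part, as noted above.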

@nellore

nellore commented Feb 18, 2017

Thanks for the tips; I'll experiment.

@erprateek

Hi @rob-p,
We are working toward creating a fusion-calling pipeline based on Salmon/Pizzly. It would be helpful to see the current state of the repository and try to replicate some of the experiments we have done with it. We seem to be hitting good specificity but falling a bit short on sensitivity.
Thanks,
Prateek

@taylorreiter

Hello @rob-p! I was wondering if there have been any updates on the fusion/detection of spanning reads problem. I'm about to embark on a project to process many bacterial transcriptomes from many different genomes/species and plan to use salmon. I would love to be able to detect polycistronic transcripts through the identification of spanning reads.
