wish: fusion gene for RNA-Seq #210

Closed
tanglingfung opened this issue Dec 9, 2013 · 23 comments

@tanglingfung
Contributor

I am exploring different options for fusion gene detection from RNA-Seq data. Does anyone have experience with the different tools?
I found FusionQ at https://sites.google.com/site/fusionq1/home/ and FusionCatcher https://code.google.com/p/fusioncatcher/ with a bit of googling.

@tanglingfung
Contributor Author

@mjafin
Contributor

mjafin commented Dec 9, 2013

Tophat fusion (poor usability), http://tophat.cbcb.umd.edu/fusion_tutorial.html (looks like it's entirely free)
FusionMap, http://www.omicsoft.com/fusionmap/ (not free for for-profits)

Most of the fusion callers I've seen in use are half-baked, produce massive numbers of false positives, and have unclear licensing. Take deFuse, where the authors themselves are confused about which license it's under. Further, deFuse and probably other fusion callers depend on BLAT, which costs $$$$$$ for for-profits.

A few review papers:
http://www.biomedcentral.com/1471-2105/14/S7/S2
http://www.sciencedirect.com/science/article/pii/S0304383513000360

@tanglingfung
Contributor Author

Thanks! STAR can identify fusion transcripts as well. Oncofuse seems to be a well-trusted framework, but I cannot locate its license.

@mjafin
Contributor

mjafin commented Dec 16, 2013

Oncofuse seems to be under the Apache 2.0 open source license! I can confirm that we have seen positive results using tophat in fusion mode and feeding the output to Oncofuse to find a well-known fusion in prostate cancer.

A simplified pipeline would be tophat (fusion mode) + Oncofuse or STAR + Oncofuse. This would be a completely open source solution and wouldn't depend on the extremely clumsy tophat-fusion-post (which also depends on BLAST and manually downloading data files).

I'm not 100% certain, but Oncofuse might be limited to human data.

@tanglingfung
Contributor Author

Great! Is there any reason you prefer tophat over STAR? Just curious.

@mjafin
Contributor

mjafin commented Dec 16, 2013

Sorry, what I meant was that tophat-fusion-post (the post processing step for tophat run in fusion mode) is very clumsy. I haven't actually tried STAR yet.

@roryk
Collaborator

roryk commented Dec 20, 2013

Hi Miika and Paul,

Good discussion. I have never done any fusion gene detection, so I don't have any suggestions on that front. I can fix the Tophat support so it supports mapping in fusion mode, and I'll fix the STAR support so it actually works; looking at it, I checked in something half-done. Nice.

Would that be enough to at least get you to the point where you can run the tools on the output?

@roryk
Collaborator

roryk commented Dec 21, 2013

I added Tophat fusion support with dd4fb1a. If you add fusion_mode: True to the algorithm field in your sample YAML file it will run Tophat in fusion mode, using Bowtie1. I'm leaving it undocumented right now because I don't know if it is good or not.
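For illustration, a minimal Python sketch of switching this on for every sample programmatically. It assumes the sample YAML has already been loaded into a dict (e.g. with PyYAML) with a top-level `details` list whose entries each carry an `algorithm` mapping. Only `fusion_mode` comes from the comment above; the function name and the other keys in the example (`aligner: tophat2`, `description`, `analysis`) are hypothetical.

```python
from typing import Any, Dict


def enable_fusion_mode(sample_config: Dict[str, Any]) -> Dict[str, Any]:
    """Set algorithm.fusion_mode = True on every sample entry, creating the section if absent."""
    for sample in sample_config.get("details", []):
        algorithm = sample.setdefault("algorithm", {})
        algorithm["fusion_mode"] = True
    return sample_config


if __name__ == "__main__":
    # Hypothetical sample entry; "aligner": "tophat2" is an assumed companion setting.
    config = {
        "details": [
            {"description": "sample1", "analysis": "RNA-seq",
             "algorithm": {"aligner": "tophat2"}},
        ]
    }
    print(enable_fusion_mode(config)["details"][0]["algorithm"])
    # prints {'aligner': 'tophat2', 'fusion_mode': True}
```

In the YAML file itself this is just the one extra `fusion_mode: True` line under each sample's `algorithm` section.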

@roryk
Collaborator

roryk commented Dec 21, 2013

I re-enabled STAR support via b9630e2, but I haven't tested it out on anything real, just our unit tests. I have a nice set of real test data that I'll report on when it is finished running.

@tanglingfung
Contributor Author

Thanks, Rory. I can draft the initial Oncofuse implementation, and Miika can evaluate it.

@mjafin
Contributor

mjafin commented Dec 22, 2013

Hi Paul and Rory,
Thanks, sounds like excellent progress! I was about to say I can look into this too, but only early next year, as I'm away from access to our systems until 3rd Jan (greetings from Finland).

Happy to evaluate all of this on our data. Oncofuse is simple to use, at least on Tophat output, as it's just a Java package. (The Oncofuse documentation is a bit, well, funny, as one of the outputs is a Bayesian probability interpreted as a p-value and corrected for multiple testing using Bonferroni correction... heh)
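For reference, Bonferroni correction itself is simple: with m tests, each p-value p becomes min(1, m·p). A minimal Python sketch of that generic adjustment (not Oncofuse's own code) shows why applying it to a posterior probability is odd: it just scales the number by the count of candidates.

```python
from typing import List


def bonferroni(p_values: List[float]) -> List[float]:
    """Generic Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Three hypothetical candidate fusions.
    print(bonferroni([0.001, 0.02, 0.4]))
```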

Rory, regarding disambiguation and fusions, they are kind of mutually exclusive in the same run of bcbio. What I've done with explant fusion detection is to first run the disambiguation pipeline and then extract the disambiguated reads (and any unaligned mates of reads that survived disambiguation), drop them into fastq files and run fusion detection.
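The "drop them into fastq files" step could look roughly like the Python sketch below. It is hypothetical, not bcbio's implementation: it assumes the reads are already available as `(name, mate1, mate2)` tuples, each mate being `(sequence, quality)`, and that `kept_names` holds the read names that survived disambiguation. Writing both mates of every surviving pair also carries along any unaligned mate.

```python
import io
from typing import Iterable, Set, TextIO, Tuple

Mate = Tuple[str, str]  # (sequence, quality string)


def write_fastq_record(out: TextIO, name: str, mate: Mate, read_number: int) -> None:
    """Write one four-line FASTQ record with a /1 or /2 suffix on the read name."""
    seq, qual = mate
    out.write(f"@{name}/{read_number}\n{seq}\n+\n{qual}\n")


def write_surviving_pairs(pairs: Iterable[Tuple[str, Mate, Mate]],
                          kept_names: Set[str],
                          out1: TextIO, out2: TextIO) -> int:
    """Write both mates of each pair whose name survived disambiguation; return pairs written."""
    written = 0
    for name, mate1, mate2 in pairs:
        if name not in kept_names:
            continue  # this pair was assigned to the other species (or dropped)
        write_fastq_record(out1, name, mate1, 1)
        write_fastq_record(out2, name, mate2, 2)
        written += 1
    return written


if __name__ == "__main__":
    # Hypothetical reads: only r1 survived disambiguation.
    pairs = [("r1", ("ACGT", "IIII"), ("TTGA", "IIII")),
             ("r2", ("GGCC", "IIII"), ("CCAA", "IIII"))]
    fq1, fq2 = io.StringIO(), io.StringIO()
    print(write_surviving_pairs(pairs, {"r1"}, fq1, fq2))
    print(fq1.getvalue(), end="")
```

In practice the two streams would be real `_1.fastq`/`_2.fastq` files handed to the fusion caller.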

@tanglingfung
Contributor Author

Rory, with the new installation script, what's the best way to test the new code without installing? (sorry, I am still not very comfortable with git)

@roryk
Collaborator

roryk commented Dec 30, 2013

Hi Paul,

Happy holidays, thanks for all of your contributions this past year. Brad describes his approach to testing here: https://bcbio-nextgen.readthedocs.org/en/latest/contents/code.html#development-infrastructure. I describe what I do here: #147 (comment)

The idea is not to reinstall everything, but to have a separate installation of the bcbio-nextgen Python code that runs in its own Python virtual environment. You can then edit and run that code as much as you want. This is useful if other people are using your bcbio-nextgen installation and you want to hack on it without breaking it for them. One gotcha: when you do a run with the development version, you need to point explicitly to the bcbio_system.yaml file in the galaxy directory, since it won't find it on its own. So the invocation becomes:

bcbio_nextgen.py --your-options /path/to/bcbio_system.yaml /path/to/your/project.yaml
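If you script your test runs, a small Python sketch of building that invocation keeps the argument order explicit: options first, then the system YAML, then the project YAML. The paths are placeholders, the helper name is hypothetical, and `-n 8` (number of cores) is only an assumed example option; check `bcbio_nextgen.py --help` for what your version accepts.

```python
import shlex
from typing import List


def dev_bcbio_command(system_yaml: str, project_yaml: str, options: List[str]) -> List[str]:
    """Build argv for a development bcbio run with bcbio_system.yaml passed explicitly."""
    return ["bcbio_nextgen.py", *options, system_yaml, project_yaml]


if __name__ == "__main__":
    cmd = dev_bcbio_command("/path/to/bcbio_system.yaml",
                            "/path/to/your/project.yaml",
                            ["-n", "8"])  # assumed option: number of cores
    print(shlex.join(cmd))
```

To actually launch it, pass the list to `subprocess.run(cmd, check=True)` from inside the development virtual environment.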

Let me know if you run into any issues and thanks again for everything.

@tanglingfung
Contributor Author

Thanks for the guidelines, that is super helpful. I submitted pull request #237 to check whether that's the right way to implement Oncofuse. It may not run properly at first; I had some issues installing bcbio and will test it once that's fixed.

@tanglingfung
Contributor Author

I hope this pull request works fine; I'll close this issue for now.

#407

@ndaniel

ndaniel commented Dec 19, 2014

Hi Miika,

The situation with fusion finders is actually quite good: there are very good fusion finders out there with low false-positive rates!

Here is a more up to date comparison of fusion finders:

  • http://code.google.com/p/fusioncatcher/wiki/comparison
  • D. Nicorici, M. Satalan, H. Edgren, S. Kangaspeska, A. Murumagi, O. Kallioniemi, S. Virtanen, O. Kilkku, FusionCatcher – a tool for finding somatic fusion genes in paired-end RNA-sequencing data, bioRxiv, Nov. 2014, DOI:10.1101/011650

@mjafin
Contributor

mjafin commented Dec 19, 2014

Thanks Daniel,
The current approach in bcbio relies on STAR alignments and Oncofuse interpretation of the results. It adds almost no overhead to standard RNA-seq quantification and has no license requirements. We've seen it pick up validated fusions in our data, but it very likely reports false positives as well.

I think there is room for improvement in terms of including more fusion callers, and Brad would probably be happy to accept pull requests. I'd be wary of incorporating anything that relies on BLAT, though, because of the $$$$$ license.
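To give a feel for the raw material in the STAR-based approach, here is a minimal Python sketch that tallies supporting reads per junction in STAR's `Chimeric.out.junction`. It assumes the standard tab-separated layout: donor chromosome, donor intron base, donor strand, acceptor chromosome, acceptor intron base, acceptor strand, junction type (-1 for a mate pair encompassing the junction, 0 or higher for a read spanning it), two repeat lengths, then the read name. It groups by exact coordinates, which is naive, and it is not how bcbio hands STAR output to Oncofuse.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Junction = Tuple[str, int, str, str, int, str]


def count_chimeric_support(lines: Iterable[str]) -> Dict[Junction, Tuple[int, int]]:
    """Return {junction: (spanning_reads, encompassing_pairs)} from Chimeric.out.junction lines."""
    spanning = Counter()
    encompassing = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/summary lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 or not fields[1].isdigit():
            continue  # skip malformed lines or a column-header line, if present
        junction = (fields[0], int(fields[1]), fields[2],
                    fields[3], int(fields[4]), fields[5])
        if int(fields[6]) == -1:
            encompassing[junction] += 1
        else:
            spanning[junction] += 1
    return {j: (spanning[j], encompassing[j]) for j in set(spanning) | set(encompassing)}


if __name__ == "__main__":
    # Synthetic lines with arbitrary coordinates, not real data.
    example = [
        "chr1\t1000\t+\tchr2\t5000\t-\t1\t0\t0\treadA\n",
        "chr1\t1000\t+\tchr2\t5000\t-\t1\t0\t0\treadB\n",
        "chr1\t1000\t+\tchr2\t5000\t-\t-1\t0\t0\treadC\n",
    ]
    for junction, (span, encomp) in count_chimeric_support(example).items():
        print(junction, "spanning:", span, "encompassing:", encomp)
```

A caller or annotator such as Oncofuse would then map these breakpoints to genes and score them; this sketch stops at counting.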

@ndaniel

ndaniel commented Dec 19, 2014

FusionCatcher uses four aligners: Bowtie, Bowtie2, STAR, and BLAT. It is very easy to disable the BLAT aligner in FusionCatcher with the command line option "--skip-blat" (FusionCatcher then uses only three aligners instead of four)! Therefore the BLAT license is not an issue!
Regarding STAR, we have found that it misses known fusion genes, for example FGFR3-TACC3 and EML4-ALK.
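A minimal Python sketch of assembling such a BLAT-free FusionCatcher command. Only `--skip-blat` comes from the comment above; the `fusioncatcher` executable name, the `-d` (data directory), `-i` (input FASTQ directory) and `-o` (output directory) options, and all paths are assumptions to check against `fusioncatcher --help` for your version.

```python
import shlex
from typing import List


def fusioncatcher_cmd(data_dir: str, input_dir: str, output_dir: str,
                      skip_blat: bool = True) -> List[str]:
    """Build a FusionCatcher argv; with skip_blat=True only Bowtie, Bowtie2 and STAR are used."""
    cmd = ["fusioncatcher", "-d", data_dir, "-i", input_dir, "-o", output_dir]
    if skip_blat:
        cmd.append("--skip-blat")  # avoid the BLAT aligner and its license terms
    return cmd


if __name__ == "__main__":
    print(shlex.join(fusioncatcher_cmd("/path/to/fusioncatcher/data",
                                       "/path/to/fastq_dir",
                                       "/path/to/output")))
```

Running it is then a matter of `subprocess.run(cmd, check=True)`; keeping the flag behind a boolean makes it easy to compare results with and without BLAT where the license allows it.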

@roryk
Collaborator

roryk commented Dec 19, 2014

Hi @ndaniel,

Thanks for the awesome comments; at HSPH we don't have much experience at all with fusion genes. Do you have example data available with known fusion genes that Oncofuse and STAR are missing? We'd love to improve the fusion gene calling, and having a known set to work with would be really helpful.

@ndaniel

ndaniel commented Dec 19, 2014

@roryk
Collaborator

roryk commented Dec 19, 2014

Thanks @ndaniel,

Is there a way we could skip a bunch of the aligning and whatnot and start from just the STAR alignments?

@ndaniel

ndaniel commented Dec 20, 2014

@roryk

If you refer to FusionCatcher, then the answer is no. FusionCatcher is a fully automatic pipeline in itself and needs raw FASTQ files as input. There are no shortcuts.

@ndaniel

ndaniel commented Dec 20, 2014

Hi @roryk

here is a very small test case for quickly checking a pipeline for missed fusions.

These two small FASTQ files (paired-end reads, less than 2 MB in size):

contain reads for 9 known spike-in fusion genes:

  • EWS-ATF1
  • TMPRSS2-ETV1
  • EWS-FLI1
  • NTRK3-ETV6
  • CD74-ROS1
  • HOOK3-RET
  • EML4-ALK
  • AKAP9-BRAF
  • BRD4-NUT

from the open-access synthetic spike-in mRNA-seq data for cancer gene fusions (SRA).

Here is more info about this small case. I estimate that analyzing these FASTQ files should take less than 5 minutes for your test!

Running FusionCatcher on exactly these two input FASTQ files detects all 9 fusions, and the results are here.
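Scoring a pipeline against this spike-in set can be as simple as the Python sketch below. The nine expected fusions are copied from the list above; the example "called" set is hypothetical. Matching ignores orientation and case, an assumption made for a quick check (a stricter test would require the 5'-3' order), and it splits on the first hyphen, which is fine for these gene names.

```python
from typing import FrozenSet, Iterable, List, Set

EXPECTED = ["EWS-ATF1", "TMPRSS2-ETV1", "EWS-FLI1", "NTRK3-ETV6", "CD74-ROS1",
            "HOOK3-RET", "EML4-ALK", "AKAP9-BRAF", "BRD4-NUT"]


def as_pair(fusion: str) -> FrozenSet[str]:
    """Turn 'GENEA-GENEB' into an unordered, upper-cased gene pair."""
    gene_a, gene_b = fusion.upper().split("-", 1)
    return frozenset((gene_a, gene_b))


def missed_fusions(called: Iterable[str], expected: List[str] = EXPECTED) -> List[str]:
    """Return the expected fusions that do not appear (in either orientation) among the calls."""
    called_pairs: Set[FrozenSet[str]] = {as_pair(f) for f in called}
    return [f for f in expected if as_pair(f) not in called_pairs]


if __name__ == "__main__":
    called = ["ALK-EML4", "EWS-FLI1", "ETV6-NTRK3"]  # hypothetical caller output
    missed = missed_fusions(called)
    print(f"detected {len(EXPECTED) - len(missed)}/{len(EXPECTED)}; missed: {missed}")
```

One caveat: some tools report current gene symbols (for example EWSR1 rather than EWS, NUTM1 rather than NUT), so the caller's names may need mapping before comparison.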
