oncofuse #237

tanglingfung · 2014-01-01T01:24:43Z

haven't tested the script yet but want to check if the style is consistent

happy new year

roryk · 2014-01-01T20:25:55Z

Hi Paul,

Looks good-- you might want to look at bcbio.providance.do.run function instead of using subprocess directly; if you do that you will get all the cool logging of the command line stuff for free.

tanglingfung · 2014-01-01T22:06:11Z

thanks. I just followed the implementation of snpEff. if it's implemented appropriately, I can change snpEff's as well.

now, I am trapped here chapmanb/cloudbiolinux#139. I need to install everything to my new machine to test the script

…t via the shell instead of subprocess #237

chapmanb · 2014-01-02T11:22:10Z

Paul;
Thanks much for this work. I pushed a fix to snpEff to use provenance do.run so it would be a better example. Apologies, I missed this one when updating to the newer framework.

In terms of installation, bcbio-nextgen is not well tested with MacOSX installation. We have some of the pieces in place but really haven't gone through a full installation. I expect this to be a bit of work due to making things compile. If you want to test on a Mac machine, it might be worth using a virtual machine. Rory put together instructions here:

https://bcbio-nextgen.readthedocs.org/en/latest/contents/installation.html#on-a-virtual-machine

We're currently moving towards a VM and docker container approach to try to avoid needing to get all this software cleanly compiling on multiple systems. Hope this helps with testing things out.

tanglingfung · 2014-01-02T14:01:44Z

Thanks Brad. Just that my linux machine is not ready yet, I will move back to centos soon. Appreciate all the helps. The docker container approach looks very nice.

tanglingfung · 2014-01-05T19:56:38Z

sorry that i cannot test till Monday

tanglingfung · 2014-01-08T21:12:41Z

@roryk, I just checked, tophat2+oncofuse should run ok now; I put fusion detection as part of transcript counts, not sure if it's ok with you

roryk · 2014-01-09T03:25:49Z

Hi Paul,

Thanks for the pull request-- do you have anything like a very small test dataset that we could add that should have a fusion event in it so I can add a unit test? We haven't really used this type of data much so we have nothing.

tanglingfung · 2014-01-10T00:29:11Z

@roryk , I personally don't. I was looking into the SRA, and we may use the following study as test. Let me know if it's ok, and I can prepare a template. We may add the sra toolkit such that the pipeline can start with .sra file

http://sra.dnanexus.com/studies/SRP028528

roryk · 2014-01-10T00:49:03Z

Sure-- sounds good! Thanks Paul for taking the lead on this.

mjafin · 2014-01-10T10:41:00Z

Hi Paul,
At least EBI offers most of these files as fastq.gz directly, see e.g. ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR948/SRR948091
(first of the samples in the study you quote). I'm not a big fan of the sra toolkit personally as it used to be very unreliable and buggy.

tanglingfung · 2014-01-10T13:10:49Z

Thanks Miika! I wanted to find it at EBI but couldn't find it at first. Thanks for the link. Should you have a better dataset, please let me know, and you may test if I implemented oncofuse correctly. Thanks.

mjafin · 2014-01-10T14:00:15Z

There is also a simulated data set available from the authors of FusionMap, mentioned in this publication http://www.biomedcentral.com/content/pdf/1471-2105-14-S7-S2.pdf on page 9.
I think it's somewhere here: http://www.arrayserver.com/wiki/index.php?title=FusionMap but I'll check a bit later..

mjafin · 2014-01-11T10:08:59Z

There is some test data in the FusionMap bundle:
http://www.omicsoft.com/fusionmap/Software/FusionMap_2014-01-01.zip

roryk · 2014-01-12T22:58:02Z

Hi Paul and Miika,

Thanks for all your work; let us know how the test data looks. Having a nice unit test and a good idea if it is working properly for this would be great before the merge.

tanglingfung · 2014-01-12T23:44:22Z

Sure Rory.

I have been trying the simulated dataset in fusionmap, but there is an
except caught at FastqToSam (message copied below). I am trying to put the
provided bam as input and see if that makes things easier.

Paul

Exception in thread "main" net.sf.picard.PicardException: In paired mode,
read name 1 (R_1:1) does not match read name 2 (R_1:2)

at net.sf.picard.sam.FastqToSam.getBaseName(FastqToSam.java:291)

at net.sf.picard.sam.FastqToSam.doPaired(FastqToSam.java:187)

at net.sf.picard.sam.FastqToSam.doWork(FastqToSam.java:140)

at
net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)

at net.sf.picard.sam.FastqToSam.main(FastqToSam.java:118)

On Sun, Jan 12, 2014 at 2:58 PM, Rory Kirchner notifications@github.comwrote:

Hi Paul and Miika,

Thanks for all your work; let us know how the test data looks. Having a
nice unit test and a good idea if it is working properly for this would be
great before the merge.

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/237#issuecomment-32137184
.

roryk · 2014-01-12T23:58:27Z

Hi Paul,

Looks like the :1 and :2 identifiers are throwing it off, they should either be /1 and /2 to denote which member of the pair it is, or the Illumina style 1:Y:18:ATCACG style:

http://en.wikipedia.org/wiki/FASTQ_format

tanglingfung · 2014-01-13T09:13:36Z

Thanks for the hint. It works, but oncofuse didn't report the right answer,
looking into it.

Paul

On Sun, Jan 12, 2014 at 3:58 PM, Rory Kirchner notifications@github.comwrote:

Hi Paul,

Looks like the :1 and :2 identifiers are throwing it off, they should
either be /1 and /2 to denote which member of the pair it is, or the
Illumina style 1:Y:18:ATCACG style:

http://en.wikipedia.org/wiki/FASTQ_format

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/237#issuecomment-32138706
.

tanglingfung · 2014-01-13T22:31:35Z

@roryk , the script runs ok (using the test dataset in fusionMap) on hg19 reference after changing the :1 and :2 identifiers to /1 and /2. It doesn't work on GRCh37 unless I take out 'chr' from refseq annotation in oncofuse package: oncofuse/common/refseq_full.txt

what would you recommend in fixing these two issues?

P.S. the other example takes too long as a unit test

tanglingfung · 2014-01-13T22:32:51Z

I copied my sample config here: https://gist.github.com/tanglingfung/8409280

roryk · 2014-01-14T22:15:11Z

Hi Paul,

This is great! Thank you so much for all of your work. We're going to roll a release tomorrow morning and after that I'll work on testing this and merging it in as part of the next release.

tanglingfung · 2014-01-14T22:58:45Z

@roryk , no problem, I haven't tested with the latest updates, and star index is generated every time when i debug my code

roryk · 2014-01-16T20:30:01Z

Hi Paul,

I merged this in here: 1502ebd. I set you as the author since it was mostly your code, I'm sorry I'm not good enough at git to know how to add my commits on top of yours and also clean up the hhistory, so I merged them all together and set you to be the author.

There is some work that needs to be done to get this production ready. Some help on these items would be awesome.

One problem is the unit test needs to be pointed at the system YAML file and hg19 needs to be installed. The hg19 unit test data only has GTF entries for items in chrMT and oncofuse does not have chrMT as one of the chromosomes. @chapmanb might have a suggestion for how to handle this better.
Ask the oncofuse authors if there is any plan on supporting anything other than hg19 or asking what has to happen to support a new genome.
If no fusions are detected, oncofuse will error out. So some checks for an empty fusion file from STAR or Tophat has to get added.
oncofuse will only work with hg19, so there needs to be some logic early on in the pipeline to detect if fusion_mode and something other than hg19 is turned on, and if it is, throw an error.
The oncofuse fusion gene calls should be saved in the upload directory.

At any rate the unit test for this passes for me but I thought I'd commit this and merge it in so we have a point to jump off of to smooth this all out.

tanglingfung · 2014-01-18T14:28:50Z

@roryk , sorry for my late response.

thanks for taking care of this. I am not very familiar with git either, let me try though

yes, suggestion on how to handle this will be good
I think only they intend to support human genome only, but I will ask
ok, i didn't seem to come across this case, but I will check
yes, and we may have more than 1 tool for transcript fusion annotation.
ok

Thanks for everything

tanglingfung · 2014-04-21T20:20:58Z

let me close this for now, and create a new PR with a few bugs fixed

tanglingfung added 4 commits December 31, 2013 14:34

initial oncofuse support

aeddcd5

add oncofuse.py

f4d3175

add oncofuse to bcbio_system.yaml

e5b9d6b

filled tissue input

a26d819

tanglingfung mentioned this pull request Jan 1, 2014

wish: fusion gene for RNA-Seq #210

Closed

tanglingfung added 2 commits January 1, 2014 13:41

use bcbio.providance.do.run function instead of using subprocess

e906b49

add oncofuse to provenance/program.py

f9c3cf2

chapmanb added a commit that referenced this pull request Jan 2, 2014

Use provenance do.run for snpEff effect prediction, redirecting stdou…

825d0f5

…t via the shell instead of subprocess #237

fixed cmd string input. Thanks Brad Chapman

d25fdbf

tanglingfung added 2 commits January 8, 2014 13:03

bug fixed, tophat2+oncofuse runs fine

8e1dcfa

run fusion detection as part of transcript counts

bd9d8e3

bug fixed, star+oncofuse runs ok now

fde02c9

typo

ed122fa

tanglingfung closed this Apr 21, 2014

tanglingfung mentioned this pull request Apr 22, 2014

bugs fixed for oncofuse #407

Merged

tanglingfung deleted the fusion branch April 27, 2014 18:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

oncofuse #237

oncofuse #237

tanglingfung commented Jan 1, 2014

roryk commented Jan 1, 2014

tanglingfung commented Jan 1, 2014

chapmanb commented Jan 2, 2014

tanglingfung commented Jan 2, 2014

tanglingfung commented Jan 5, 2014

tanglingfung commented Jan 8, 2014

roryk commented Jan 9, 2014

tanglingfung commented Jan 10, 2014

roryk commented Jan 10, 2014

mjafin commented Jan 10, 2014

tanglingfung commented Jan 10, 2014

mjafin commented Jan 10, 2014

mjafin commented Jan 11, 2014

roryk commented Jan 12, 2014

tanglingfung commented Jan 12, 2014

roryk commented Jan 12, 2014

tanglingfung commented Jan 13, 2014

tanglingfung commented Jan 13, 2014

tanglingfung commented Jan 13, 2014

roryk commented Jan 14, 2014

tanglingfung commented Jan 14, 2014

roryk commented Jan 16, 2014

tanglingfung commented Jan 18, 2014

tanglingfung commented Apr 21, 2014

oncofuse #237

oncofuse #237

Conversation

tanglingfung commented Jan 1, 2014

roryk commented Jan 1, 2014

tanglingfung commented Jan 1, 2014

chapmanb commented Jan 2, 2014

tanglingfung commented Jan 2, 2014

tanglingfung commented Jan 5, 2014

tanglingfung commented Jan 8, 2014

roryk commented Jan 9, 2014

tanglingfung commented Jan 10, 2014

roryk commented Jan 10, 2014

mjafin commented Jan 10, 2014

tanglingfung commented Jan 10, 2014

mjafin commented Jan 10, 2014

mjafin commented Jan 11, 2014

roryk commented Jan 12, 2014

tanglingfung commented Jan 12, 2014

roryk commented Jan 12, 2014

tanglingfung commented Jan 13, 2014

tanglingfung commented Jan 13, 2014

tanglingfung commented Jan 13, 2014

roryk commented Jan 14, 2014

tanglingfung commented Jan 14, 2014

roryk commented Jan 16, 2014

tanglingfung commented Jan 18, 2014

tanglingfung commented Apr 21, 2014