Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

oncofuse #237

Closed
wants to merge 11 commits into from
Closed

oncofuse #237

wants to merge 11 commits into from

Conversation

tanglingfung
Copy link
Contributor

haven't tested the script yet but want to check if the style is consistent

happy new year

@roryk
Copy link
Collaborator

roryk commented Jan 1, 2014

Hi Paul,

Looks good-- you might want to look at bcbio.providance.do.run function instead of using subprocess directly; if you do that you will get all the cool logging of the command line stuff for free.

@tanglingfung
Copy link
Contributor Author

thanks. I just followed the implementation of snpEff. if it's implemented appropriately, I can change snpEff's as well.

now, I am trapped here chapmanb/cloudbiolinux#139. I need to install everything to my new machine to test the script

chapmanb added a commit that referenced this pull request Jan 2, 2014
@chapmanb
Copy link
Member

chapmanb commented Jan 2, 2014

Paul;
Thanks much for this work. I pushed a fix to snpEff to use provenance do.run so it would be a better example. Apologies, I missed this one when updating to the newer framework.

In terms of installation, bcbio-nextgen is not well tested with MacOSX installation. We have some of the pieces in place but really haven't gone through a full installation. I expect this to be a bit of work due to making things compile. If you want to test on a Mac machine, it might be worth using a virtual machine. Rory put together instructions here:

https://bcbio-nextgen.readthedocs.org/en/latest/contents/installation.html#on-a-virtual-machine

We're currently moving towards a VM and docker container approach to try to avoid needing to get all this software cleanly compiling on multiple systems. Hope this helps with testing things out.

@tanglingfung
Copy link
Contributor Author

Thanks Brad. Just that my linux machine is not ready yet, I will move back to centos soon. Appreciate all the helps. The docker container approach looks very nice.

@tanglingfung
Copy link
Contributor Author

sorry that i cannot test till Monday

@tanglingfung
Copy link
Contributor Author

@roryk, I just checked, tophat2+oncofuse should run ok now; I put fusion detection as part of transcript counts, not sure if it's ok with you

@roryk
Copy link
Collaborator

roryk commented Jan 9, 2014

Hi Paul,

Thanks for the pull request-- do you have anything like a very small test dataset that we could add that should have a fusion event in it so I can add a unit test? We haven't really used this type of data much so we have nothing.

@tanglingfung
Copy link
Contributor Author

@roryk , I personally don't. I was looking into the SRA, and we may use the following study as test. Let me know if it's ok, and I can prepare a template. We may add the sra toolkit such that the pipeline can start with .sra file

http://sra.dnanexus.com/studies/SRP028528

@roryk
Copy link
Collaborator

roryk commented Jan 10, 2014

Sure-- sounds good! Thanks Paul for taking the lead on this.

@mjafin
Copy link
Contributor

mjafin commented Jan 10, 2014

Hi Paul,
At least EBI offers most of these files as fastq.gz directly, see e.g. ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR948/SRR948091
(first of the samples in the study you quote). I'm not a big fan of the sra toolkit personally as it used to be very unreliable and buggy.

@tanglingfung
Copy link
Contributor Author

Thanks Miika! I wanted to find it at EBI but couldn't find it at first. Thanks for the link. Should you have a better dataset, please let me know, and you may test if I implemented oncofuse correctly. Thanks.

@mjafin
Copy link
Contributor

mjafin commented Jan 10, 2014

There is also a simulated data set available from the authors of FusionMap, mentioned in this publication http://www.biomedcentral.com/content/pdf/1471-2105-14-S7-S2.pdf on page 9.
I think it's somewhere here: http://www.arrayserver.com/wiki/index.php?title=FusionMap but I'll check a bit later..

@mjafin
Copy link
Contributor

mjafin commented Jan 11, 2014

There is some test data in the FusionMap bundle:
http://www.omicsoft.com/fusionmap/Software/FusionMap_2014-01-01.zip

@roryk
Copy link
Collaborator

roryk commented Jan 12, 2014

Hi Paul and Miika,

Thanks for all your work; let us know how the test data looks. Having a nice unit test and a good idea if it is working properly for this would be great before the merge.

@tanglingfung
Copy link
Contributor Author

Sure Rory.

I have been trying the simulated dataset in fusionmap, but there is an
except caught at FastqToSam (message copied below). I am trying to put the
provided bam as input and see if that makes things easier.

Paul

Exception in thread "main" net.sf.picard.PicardException: In paired mode,
read name 1 (R_1:1) does not match read name 2 (R_1:2)

at net.sf.picard.sam.FastqToSam.getBaseName(FastqToSam.java:291)

at net.sf.picard.sam.FastqToSam.doPaired(FastqToSam.java:187)

at net.sf.picard.sam.FastqToSam.doWork(FastqToSam.java:140)

at
net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)

at net.sf.picard.sam.FastqToSam.main(FastqToSam.java:118)

On Sun, Jan 12, 2014 at 2:58 PM, Rory Kirchner notifications@github.comwrote:

Hi Paul and Miika,

Thanks for all your work; let us know how the test data looks. Having a
nice unit test and a good idea if it is working properly for this would be
great before the merge.


Reply to this email directly or view it on GitHubhttps://github.com//pull/237#issuecomment-32137184
.

@roryk
Copy link
Collaborator

roryk commented Jan 12, 2014

Hi Paul,

Looks like the :1 and :2 identifiers are throwing it off, they should either be /1 and /2 to denote which member of the pair it is, or the Illumina style 1:Y:18:ATCACG style:

http://en.wikipedia.org/wiki/FASTQ_format

@tanglingfung
Copy link
Contributor Author

Thanks for the hint. It works, but oncofuse didn't report the right answer,
looking into it.

Paul

On Sun, Jan 12, 2014 at 3:58 PM, Rory Kirchner notifications@github.comwrote:

Hi Paul,

Looks like the :1 and :2 identifiers are throwing it off, they should
either be /1 and /2 to denote which member of the pair it is, or the
Illumina style 1:Y:18:ATCACG style:

http://en.wikipedia.org/wiki/FASTQ_format


Reply to this email directly or view it on GitHubhttps://github.com//pull/237#issuecomment-32138706
.

@tanglingfung
Copy link
Contributor Author

@roryk , the script runs ok (using the test dataset in fusionMap) on hg19 reference after changing the :1 and :2 identifiers to /1 and /2. It doesn't work on GRCh37 unless I take out 'chr' from refseq annotation in oncofuse package: oncofuse/common/refseq_full.txt

what would you recommend in fixing these two issues?

P.S. the other example takes too long as a unit test

@tanglingfung
Copy link
Contributor Author

I copied my sample config here: https://gist.github.com/tanglingfung/8409280

@roryk
Copy link
Collaborator

roryk commented Jan 14, 2014

Hi Paul,

This is great! Thank you so much for all of your work. We're going to roll a release tomorrow morning and after that I'll work on testing this and merging it in as part of the next release.

@tanglingfung
Copy link
Contributor Author

@roryk , no problem, I haven't tested with the latest updates, and star index is generated every time when i debug my code

@roryk
Copy link
Collaborator

roryk commented Jan 16, 2014

Hi Paul,

I merged this in here: 1502ebd. I set you as the author since it was mostly your code, I'm sorry I'm not good enough at git to know how to add my commits on top of yours and also clean up the hhistory, so I merged them all together and set you to be the author.

There is some work that needs to be done to get this production ready. Some help on these items would be awesome.

  1. One problem is the unit test needs to be pointed at the system YAML file and hg19 needs to be installed. The hg19 unit test data only has GTF entries for items in chrMT and oncofuse does not have chrMT as one of the chromosomes. @chapmanb might have a suggestion for how to handle this better.
  2. Ask the oncofuse authors if there is any plan on supporting anything other than hg19 or asking what has to happen to support a new genome.
  3. If no fusions are detected, oncofuse will error out. So some checks for an empty fusion file from STAR or Tophat has to get added.
  4. oncofuse will only work with hg19, so there needs to be some logic early on in the pipeline to detect if fusion_mode and something other than hg19 is turned on, and if it is, throw an error.
  5. The oncofuse fusion gene calls should be saved in the upload directory.

At any rate the unit test for this passes for me but I thought I'd commit this and merge it in so we have a point to jump off of to smooth this all out.

@tanglingfung
Copy link
Contributor Author

@roryk , sorry for my late response.

thanks for taking care of this. I am not very familiar with git either, let me try though

  1. yes, suggestion on how to handle this will be good
  2. I think only they intend to support human genome only, but I will ask
  3. ok, i didn't seem to come across this case, but I will check
  4. yes, and we may have more than 1 tool for transcript fusion annotation.
  5. ok

Thanks for everything

@tanglingfung
Copy link
Contributor Author

let me close this for now, and create a new PR with a few bugs fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants