Support loading multiple indexed read files #787

Closed
fnothaft opened this Issue Aug 20, 2015 · 2 comments

Member

fnothaft commented Aug 20, 2015

#732 added support for using index files to load a subset of a single BAM file. We should generalize this to allow parallel loads from multiple BAM files.
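A minimal sketch of the generalization being proposed, in Python rather than ADAM's own code: fan out one indexed query per (file, region) pair and merge the results. Everything here is hypothetical, including `load_indexed_regions`, the `Region` tuple, and the `stand_in_loader` placeholder; a real loader would run an index-backed query against each BAM file instead.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Callable, Iterable, List, Tuple

# Hypothetical region type: (contig, start, end)
Region = Tuple[str, int, int]


def load_indexed_regions(
    paths: Iterable[str],
    regions: Iterable[Region],
    load_region: Callable[[str, Region], List[str]],
    max_workers: int = 4,
) -> List[str]:
    """Run one indexed query per (file, region) pair and concatenate the reads."""
    tasks = list(product(paths, regions))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in task order, so the output is deterministic.
        chunks = pool.map(lambda task: load_region(*task), tasks)
        return [read for chunk in chunks for read in chunk]


def stand_in_loader(path: str, region: Region) -> List[str]:
    """Placeholder for a real index-backed BAM query (assumption, not a real API)."""
    contig, start, end = region
    return [f"{path}:{contig}:{start}-{end}"]


reads = load_indexed_regions(
    ["a.bam", "b.bam"], [("chr1", 0, 100)], stand_in_loader
)
```

Passing the loader in as a function keeps the fan-out logic independent of whichever BAM library performs the indexed read.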

jstjohn commented Aug 21, 2015

@fnothaft quick question about this, and about whether I should open a new issue: would this allow me to supply a tumor BAM file and a normal BAM file (separate files) to avocado? One really nice thing you can do with GATK, which I would really want here as a minor addition, is to pass a user-supplied tag with each input file; the tag gets appended to the reads from that file. For example, in GATK (and MuTect) you can do this, which allows for my desired usage pattern:

[some args and other things] ... \
--input:normal some_file_normal.bam \
--input:tumor some_file_tumor_1.bam \
--input:tumor some_file_tumor_2.bam \
[other args] ...\

As you can see, the same tag can be given to multiple files. MuTect treats the entire pool of files labeled tumor as one big tumor sample for the purpose of generating the noise file, which is used to filter downstream sites (when they are not also COSMIC sites).

It would be super lame if we had to first combine all BAM files into a single file to do that analysis.

Anyhow, let me know whether I should make the labeling idea above a separate issue, or whether it is OK to append it to this issue.
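To make the tagging idea concrete, here is a rough sketch of how tagged inputs could be pooled by label. It only parses the `--input:TAG path` form shown above; the function name `group_inputs_by_tag` and the `untagged` default are assumptions for illustration, not an existing GATK, MuTect, avocado, or ADAM API.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def group_inputs_by_tag(
    argv: Sequence[str], default_tag: str = "untagged"
) -> Dict[str, List[str]]:
    """Group input paths by the tag in `--input:TAG path` (hypothetical helper)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--input" or arg.startswith("--input:"):
            if i + 1 >= len(argv):
                raise ValueError(f"{arg} requires a file path")
            # Everything after the first ':' is the tag; empty means untagged.
            _, _, tag = arg.partition(":")
            groups[tag or default_tag].append(argv[i + 1])
            i += 2  # skip the path we just consumed
        else:
            i += 1  # ignore unrelated arguments
    return dict(groups)


groups = group_inputs_by_tag([
    "--input:normal", "some_file_normal.bam",
    "--input:tumor", "some_file_tumor_1.bam",
    "--input:tumor", "some_file_tumor_2.bam",
])
```

With the files grouped this way, every file under the `tumor` key could be loaded and treated as one pooled sample without merging the BAM files on disk first.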

Member

fnothaft commented Jul 20, 2016

Closing as dupe of #993

@fnothaft fnothaft closed this Jul 20, 2016
