Support bam's as input to treeshop #8

Closed
rcurrie opened this issue Jan 9, 2018 · 5 comments

@rcurrie
Contributor

rcurrie commented Jan 9, 2018

@hbeale I've added basic bam to fastq conversion:

                    docker run --rm \
                      -v /mnt/samples:/samples \
                      quay.io/ucsc_cgl/samtools:1.5--98b58ba05641ee98fa98414ed28b53ac3048bc09 \
                      fastq -1 /samples/{0}.R1.fq.gz -2 /samples/{0}.R2.fq.gz /samples/{1}

(Same method as used in cgl-rnaseq)

Treeshop will run the conversion, copy the resulting fastqs back to derived under archive for posterity, and then proceed with rnaseq etc.
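For illustration, here is a minimal sketch of how the `{0}` and `{1}` placeholders in the command above might be filled in. It assumes `{0}` is the sample id and `{1}` is the bam file name; the function name `bam_to_fastq_command` and the `samples_dir` parameter are hypothetical and not taken from treeshop itself.

```python
import shlex

# Image tag copied from the command in this comment.
SAMTOOLS_IMAGE = (
    "quay.io/ucsc_cgl/samtools:1.5--98b58ba05641ee98fa98414ed28b53ac3048bc09"
)


def bam_to_fastq_command(sample_id, bam_name, samples_dir="/mnt/samples"):
    """Build the docker argv for a bam to paired fastq conversion.

    Assumption: sample_id fills {0} and bam_name fills {1}.
    """
    return [
        "docker", "run", "--rm",
        "-v", "{}:/samples".format(samples_dir),
        SAMTOOLS_IMAGE,
        "fastq",
        "-1", "/samples/{}.R1.fq.gz".format(sample_id),
        "-2", "/samples/{}.R2.fq.gz".format(sample_id),
        "/samples/{}".format(bam_name),
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    argv = bam_to_fastq_command("TEST", "TEST.bam")
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Returning a list rather than a single string means the result could be passed to `subprocess.run` without shell quoting concerns.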

@hbeale
Contributor

hbeale commented Jan 9, 2018 via email

@rcurrie
Contributor Author

rcurrie commented Jan 9, 2018

Hmmm...output isn't matching, but maybe it's my fastq -> bam:

docker run -it --rm -v $(pwd)/samples:/data broadinstitute/picard FastqToSam F1=/data/TEST_R1.fastq.gz F2=/data/TEST_R2.fastq.gz O=/data/TEST.bam SM=TEST001 RG=rg0000

Converting this bam back to fastq via samtools, then running it through rnaseq and then umend, gives a readDist.txt that differs.

@hbeale is it reasonable that these should be identical:

fastqs -> rnaseq sorted.bam output -> umend

fastqs -> picard bam -> samtools to fastq -> rnaseq sorted.bam -> umend

?

< Total Reads                   3416
< Total Tags                    4133
< Total Assigned Tags           3922
---
> Total Reads                   1626
> Total Tags                    2050
> Total Assigned Tags           1978
6,15c6,15
< CDS_Exons           37671772            2792                0.07
< 5'UTR_Exons         18392664            219                 0.01
< 3'UTR_Exons         46333687            734                 0.02
< Introns             1419121300          155                 0.00
< TSS_up_1kb          26926674            2                   0.00
< TSS_up_5kb          121398195           9                   0.00
< TSS_up_10kb         221886368           18                  0.00
< TES_down_1kb        28738628            0                   0.00
< TES_down_5kb        125348902           2                   0.00
< TES_down_10kb       224262488           4                   0.00
---
> CDS_Exons           37671772            1230                0.03
> 5'UTR_Exons         18392664            58                  0.00
> 3'UTR_Exons         46333687            486                 0.01
> Introns             1419121300          167                 0.00
> TSS_up_1kb          26926674            3                   0.00
> TSS_up_5kb          121398195           3                   0.00
> TSS_up_10kb         221886368           3                   0.00
> TES_down_1kb        28738628            8                   0.00
> TES_down_5kb        125348902           30                  0.00
> TES_down_10kb       224262488           34                  0.00
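To put a number on the gap in the diff above, the summary lines of the two readDist.txt files can be parsed and compared. This is only a sketch: it assumes the three summary lines have the `label, whitespace, integer` layout shown in the diff, and the helper names `parse_summary` and `count_ratios` are made up for this example.

```python
import re

SUMMARY_KEYS = ("Total Reads", "Total Tags", "Total Assigned Tags")


def parse_summary(text):
    """Extract the three summary counts from readDist.txt content."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        for key in SUMMARY_KEYS:
            # fullmatch keeps "Total Tags" from matching "Total Assigned Tags".
            match = re.fullmatch(re.escape(key) + r"\s+(\d+)", line)
            if match:
                counts[key] = int(match.group(1))
    return counts


def count_ratios(original, roundtrip):
    """Return roundtrip/original for every key present in both dicts."""
    return {
        key: roundtrip[key] / original[key]
        for key in original
        if key in roundtrip and original[key]
    }


if __name__ == "__main__":
    # Values copied from the diff posted in this comment.
    original = parse_summary(
        "Total Reads                   3416\n"
        "Total Tags                    4133\n"
        "Total Assigned Tags           3922\n"
    )
    roundtrip = parse_summary(
        "Total Reads                   1626\n"
        "Total Tags                    2050\n"
        "Total Assigned Tags           1978\n"
    )
    for key, ratio in count_ratios(original, roundtrip).items():
        print("{}: {:.3f}".format(key, ratio))
```

With the numbers posted above, each ratio comes out close to 0.5, i.e. the round-tripped run sees roughly half the reads of the original.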

@rcurrie
Contributor Author

rcurrie commented Jan 9, 2018

The converted bam is in the develop branch:

https://github.com/UCSC-Treehouse/pipelines/tree/develop/samples

@hbeale
Contributor

hbeale commented Jan 9, 2018 via email

@rcurrie
Contributor Author

rcurrie commented Jan 16, 2018

Verified via notebook that the round trip fastq -> bam -> fastq matches at the read level. Also verified that bam -> fastq generates exactly the same secondary output as the original fastqs. Samtools from the cgl docker image was used for the actual conversion. Will run a CHOC sample to finalize this improvement.
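A read-level comparison of this kind could look roughly like the sketch below. It is not the notebook referred to above. It assumes plain four-line FASTQ records, ignores read order, and treats a trailing `/1` or `/2` on the read name as insignificant (an assumption about what "matches" should mean). The helper names are hypothetical.

```python
import gzip
from collections import Counter


def open_text(path):
    """Open a FASTQ file as text, handling .gz transparently."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def normalize_name(header):
    """Drop '@', anything after whitespace, and a trailing /1 or /2."""
    fields = header[1:].split()
    name = fields[0] if fields else ""
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def fastq_records(path):
    """Yield (name, sequence, quality) for each four-line record."""
    with open_text(path) as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break
            seq = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line, ignored
            qual = handle.readline().rstrip("\n")
            yield normalize_name(header), seq, qual


def same_reads(path_a, path_b):
    """True if both files hold the same multiset of reads, in any order."""
    return Counter(fastq_records(path_a)) == Counter(fastq_records(path_b))
```

Comparing as a multiset rather than line by line is a deliberate choice here, since a bam round trip does not necessarily preserve the original read order.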

rcurrie closed this as completed Apr 19, 2018