Determine whether the BQSR plotting workflow still works #322

Closed
droazen opened this issue Mar 20, 2015 · 4 comments

@droazen droazen commented Mar 20, 2015

In the old GATK, when you wanted to produce pre/post recalibration plots, you would run BQSR twice, first normally and then a second time with the -BQSR table_from_first_run engine argument to produce a post-recalibration table, then feed both tables into AnalyzeCovariates.

Since the -BQSR engine-level argument is not present in hellbender (no on-the-fly recalibration), the equivalent hellbender workflow would seem to be "run BQSR, run ApplyBQSR, run BQSR on the recalibrated bam to produce a post-recalibration table, then feed both tables into AnalyzeCovariates". We need to verify that this workflow is equivalent to the old workflow described above.

@akiezun akiezun commented Apr 20, 2015

For reference, here is the doc from the GATK3 best practices (https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_bqsr_AnalyzeCovariates.php):

 # Generate the first pass recalibration table file.
 # (The -knownSites arguments are optional but recommended.)
 java -jar GenomeAnalysisTK.jar \
      -T BaseRecalibrator \
      -R myreference.fasta \
      -I myinput.bam \
      -knownSites bundle/my-trusted-snps.vcf \
      -knownSites bundle/my-trusted-indels.vcf \
      ... other options \
      -o firstpass.table

 # Generate the second pass recalibration table file.
 java -jar GenomeAnalysisTK.jar \
      -T BaseRecalibrator \
      -BQSR firstpass.table \
      -R myreference.fasta \
      -I myinput.bam \
      -knownSites bundle/my-trusted-snps.vcf \
      -knownSites bundle/my-trusted-indels.vcf \
      ... other options \
      -o secondpass.table

 # Finally generate the plots and also keep a copy of the csv (the -csv argument is optional).
 java -jar GenomeAnalysisTK.jar \
      -T AnalyzeCovariates \
      -R myreference.fasta \
      -before firstpass.table \
      -after secondpass.table \
      -csv BQSR.csv \
      -plots BQSR.pdf

@akiezun akiezun commented Apr 20, 2015

Because AnalyzeCovariates just takes two tables and doesn't care who made them, all we need to test is that the GATK3 pipeline (BaseRecalibrator --BQSR) produces the same table as the GATK4 pipeline (ApplyBQSR --bqsr followed by BaseRecalibrator on the resulting bam).

@akiezun akiezun commented Apr 21, 2015

This is confirmed: the resulting tables are the same. Here's what I ran:

BAM="./src/test/resources/org/broadinstitute/hellbender/tools/BQSR/NA12878.chr17_69k_70k.dictFix.bam"
REF="./src/test/resources/human_g1k_v37.chr17_1Mb.fasta"
VCF="./src/test/resources/org/broadinstitute/hellbender/tools/BQSR/dbsnp_132.b37.excluding_sites_after_129.chr17_69k_70k.vcf"

java -jar ~/bin/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar -T BaseRecalibrator -R $REF --knownSites $VCF -I $BAM -o gatk3.pre.cols.table --sort_by_all_columns
java -jar ~/bin/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar -T BaseRecalibrator -R $REF --knownSites $VCF -I $BAM -BQSR gatk3.pre.cols.table -o gatk3.post.cols.table --sort_by_all_columns

bamOut=gatk4.recalibrated.bam
build/install/hellbender/bin/hellbender BaseRecalibrator -R $REF --knownSites $VCF -I $BAM -RECAL_TABLE_FILE gatk4.pre.cols.table --sort_by_all_columns true
build/install/hellbender/bin/hellbender ApplyBQSR -I $BAM --bqsr_recal_file gatk4.pre.cols.table -O $bamOut
build/install/hellbender/bin/hellbender BaseRecalibrator -R $REF --knownSites $VCF -I $bamOut -RECAL_TABLE_FILE gatk4.post.cols.table --sort_by_all_columns true

diff gatk3.post.cols.table gatk4.post.cols.table

The result is

18c18
< recalibration_report        /Users/akiezun/IdeaProjects/hellbender/gatk3.pre.cols.table
---
> recalibration_report        null

This difference is expected: GATK4 runs BaseRecalibrator on the already-recalibrated bam, so it does not know that a pre-recalibration table was used. Integration test coming shortly.
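
Since the only expected difference is the recalibration_report row, a comparison that filters that row out gives a clean pass/fail signal. Here is a minimal sketch, assuming the row always starts with the literal key recalibration_report as in the diff output above. The helper name tables_match_ignoring_report is hypothetical and not part of GATK or hellbender.

```shell
# Hypothetical helper (not part of GATK or hellbender): compare two
# recalibration tables, ignoring the "recalibration_report" argument row.
# Returns 0 if the tables otherwise match, 1 if they differ, 2 on error.
tables_match_ignoring_report() {
    # Both inputs must be readable files.
    [ -r "$1" ] && [ -r "$2" ] || return 2
    _tm_a=$(mktemp) || return 2
    _tm_b=$(mktemp) || { rm -f "$_tm_a"; return 2; }
    # Drop the one row that is expected to differ. "|| true" covers the
    # case where grep filters out every line and exits non-zero.
    grep -v '^recalibration_report[[:space:]]' "$1" > "$_tm_a" || true
    grep -v '^recalibration_report[[:space:]]' "$2" > "$_tm_b" || true
    # Capture diff's exit status without tripping "set -e" in callers.
    _tm_status=0
    diff "$_tm_a" "$_tm_b" || _tm_status=$?
    rm -f "$_tm_a" "$_tm_b"
    return "$_tm_status"
}
```

With the files produced above, this could be called as `tables_match_ignoring_report gatk3.post.cols.table gatk4.post.cols.table && echo "tables match"`. Any remaining differences are printed in the usual diff format.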

akiezun added a commit that referenced this issue Apr 21, 2015
@akiezun akiezun added the in_review label Apr 24, 2015
akiezun added a commit that referenced this issue May 1, 2015

@akiezun akiezun commented May 3, 2015

done by #420

@akiezun akiezun closed this May 3, 2015
lbergelson pushed a commit that referenced this issue May 31, 2017
Updating to latest gatk to get bugfixes regarding spark cluster.