Error due to interval.list instead of interval_list suffix #1479

Open
GATKSupportTeam opened this issue Mar 2, 2020 · 5 comments

@GATKSupportTeam
Collaborator

GATKSupportTeam commented Mar 2, 2020

Error due to interval.list instead of interval_list suffix

Link: https://gatk.broadinstitute.org/hc/en-us/community/posts/360056676692-HS-PENALTY-20X-is-1-on-one-version-of-GATK-crashes-on-newest-version-

--

When running GATK 4.0.0.0, this works fine, but I get an HS_PENALTY_20X of -1.

It errors out on GATK v4.1.4.1.

I assume a -1 for HS_PENALTY_20X is incorrect?

 

# CONFIRMING FILES EXIST

=================================

3.8G /data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/R3f_MARK_DUPLICATES_FALSE/19065WBC_fixmate_novosort_dupsrmFalse.bam
54M /data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/R3d_INTERVALS/19065WBC_R1.bait.interval.list
45M /data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/R3d_INTERVALS/19065WBC_R1.target.interval.list

# GATK VERSION

=================================

The Genome Analysis Toolkit (GATK) v4.1.4.1
HTSJDK Version: 2.21.0
Picard Version: 2.21.2
Using GATK jar /data1/BIOINFORMATICS/SOFTWARE/ANACONDA_JN/MINI-CONDA/envs/gatk-newest/share/gatk4-4.1.4.1-1/gatk-package-4.1.4.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /data1/BIOINFORMATICS/SOFTWARE/ANACONDA_JN/MINI-CONDA/envs/gatk-newest/share/gatk4-4.1.4.1-1/gatk-package-4.1.4.1-local.jar --version

 

# GATK COMMAND

gatk CollectHsMetrics --INPUT /data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/R3f_MARK_DUPLICATES_FALSE/19065WBC_fixmate_novosort_dupsrmFalse.bam --OUTPUT TEMP_NEW/19065WBC_fixmate_novosort_dupsrm.bam_hs_metrics.txt --BAIT_INTERVALS /data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/R3d_INTERVALS/19065WBC_R1.bait.interval.list --TARGET_INTERVALS /data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/R3d_INTERVALS/19065WBC_R1.target.interval.list

=================================

22:29:20.348 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/data1/BIOINFORMATICS/SOFTWARE/ANACONDA_JN/MINI-CONDA/envs/gatk-newest/share/gatk4-4.1.4.1-1/gatk-package-4.1.4.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Mon Feb 03 22:29:20 EST 2020] CollectHsMetrics --BAIT_INTERVALS /data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/R3d_INTERVALS/19065WBC_R1.bait.interval.list --TARGET_INTERVALS /data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/R3d_INTERVALS/19065WBC_R1.target.interval.list --INPUT /data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/R3f_MARK_DUPLICATES_FALSE/19065WBC_fixmate_novosort_dupsrmFalse.bam --OUTPUT TEMP_NEW/19065WBC_fixmate_novosort_dupsrm.bam_hs_metrics.txt --METRIC_ACCUMULATION_LEVEL ALL_READS --NEAR_DISTANCE 250 --MINIMUM_MAPPING_QUALITY 20 --MINIMUM_BASE_QUALITY 20 --CLIP_OVERLAPPING_READS true --INCLUDE_INDELS false --COVERAGE_CAP 200 --SAMPLE_SIZE 10000 --ALLELE_FRACTION 0.001 --ALLELE_FRACTION 0.005 --ALLELE_FRACTION 0.01 --ALLELE_FRACTION 0.02 --ALLELE_FRACTION 0.05 --ALLELE_FRACTION 0.1 --ALLELE_FRACTION 0.2 --ALLELE_FRACTION 0.3 --ALLELE_FRACTION 0.5 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
Feb 03, 2020 10:29:20 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
[Mon Feb 03 22:29:20 EST 2020] Executing as nowackj1@ridus004.ind.roche.com on Linux 3.10.0-1062.1.2.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_152-release-1056-b12; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.4.1
[Mon Feb 03 22:29:20 EST 2020] picard.analysis.directed.CollectHsMetrics done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2972712960
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
htsjdk.samtools.SAMException: Cannot read non-existent file: file:///data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/@hd%09VN:1.4%09SO:unsorted
at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:498)
at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:485)
at picard.analysis.directed.CollectTargetedMetrics.doWork(CollectTargetedMetrics.java:115)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
Using GATK jar /data1/BIOINFORMATICS/SOFTWARE/ANACONDA_JN/MINI-CONDA/envs/gatk-newest/share/gatk4-4.1.4.1-1/gatk-package-4.1.4.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /data1/BIOINFORMATICS/SOFTWARE/ANACONDA_JN/MINI-CONDA/envs/gatk-newest/share/gatk4-4.1.4.1-1/gatk-package-4.1.4.1-local.jar CollectHsMetrics --INPUT /data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/R3f_MARK_DUPLICATES_FALSE/19065WBC_fixmate_novosort_dupsrmFalse.bam --OUTPUT TEMP_NEW/19065WBC_fixmate_novosort_dupsrm.bam_hs_metrics.txt --BAIT_INTERVALS /data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/R3d_INTERVALS/19065WBC_R1.bait.interval.list --TARGET_INTERVALS /data/BIOINFORMATICS/PROJECT_PROD_JN/CAS-0010688777/snakemake-ez-dpops/R3d_INTERVALS/19065WBC_R1.target.interval.list

(created from Zendesk ticket #4552)
gz#4552

@bhanugandham bhanugandham changed the title Error due to interval.list instead of interval_list suffix Error due to interval.list instead of interval_list suffix Mar 2, 2020
@bhanugandham
Contributor

Hi @lbergelson, as discussed during office hours, I created this issue to brainstorm ideas for how to check for either "@"/"#" identifiers in the interval list file.
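
As a rough illustration of the kind of check being proposed here, the sketch below peeks at the first non-blank line of a file and reports whether it starts with `@`, as a SAM-style header line such as `@HD` would. It is written in Python rather than the Java used by GATK, and the function name `looks_like_interval_list` is hypothetical; it is not part of GATK, Picard, or Barclay. Only the `@` case is covered, since the thread does not say how a leading `#` should be treated.

```python
import os
import tempfile


def looks_like_interval_list(path):
    """Return True if the first non-blank line of `path` starts with '@'.

    Hypothetical helper: a SAM-style header line such as '@HD' suggests an
    interval list, whatever suffix the file name carries.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.strip():
                return line.startswith("@")
    return False  # empty file: nothing suggests an interval list


# Small demonstration on a throwaway file with a misleading '.list' suffix.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.bait.interval.list")
    with open(demo, "w", encoding="utf-8") as out:
        out.write("@HD\tVN:1.4\tSO:unsorted\n")
    print(looks_like_interval_list(demo))
```

A check like this reads only the start of the file, so it stays cheap even for interval lists of tens of megabytes like the ones in the report.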

@yfarjoun
Contributor

I think that this is a GATK (not Picard) problem.

@whaleberg

It's a Barclay problem. We patched Barclay to add a warning in the case of an incorrectly labelled interval.list file, which should mitigate it. Waiting on a Barclay release, though.
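
Until such a release is available, the issue title points at a possible workaround: give the files the `.interval_list` suffix instead of `.interval.list`. The Python sketch below computes the corrected name and can create a symlink under it, so the original files stay untouched. Both helper names are hypothetical, and whether the renamed files resolve the crash is an assumption drawn from the title, not something confirmed in this thread.

```python
import os


def with_interval_list_suffix(path):
    """Return `path` with a trailing '.interval.list' changed to '.interval_list'.

    Hypothetical helper; paths with any other ending are returned unchanged.
    """
    wrong, right = ".interval.list", ".interval_list"
    if path.endswith(wrong):
        return path[: -len(wrong)] + right
    return path


def link_with_fixed_suffix(path):
    """Create a symlink to `path` under the corrected name and return that name."""
    fixed = with_interval_list_suffix(path)
    if fixed != path and not os.path.lexists(fixed):
        os.symlink(os.path.abspath(path), fixed)
    return fixed


print(with_interval_list_suffix("19065WBC_R1.bait.interval.list"))
# prints: 19065WBC_R1.bait.interval_list
```

The returned path would then be passed to `--BAIT_INTERVALS` or `--TARGET_INTERVALS` in place of the original one.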

@lbergelson
Member

Huh, a mysterious stranger with insight into the problem. Let's all forget about whoever that person may be. I'm pretty sure they're correct in their assessment, though.

@yfarjoun
Contributor

yfarjoun commented Mar 16, 2020 via email
