Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with BQSR maximum cycle value #129

Open
irzamsarfraz opened this issue Nov 16, 2023 · 1 comment
Open

Issue with BQSR maximum cycle value #129

irzamsarfraz opened this issue Nov 16, 2023 · 1 comment

Comments

@irzamsarfraz
Copy link

Hi,

Thanks for creating this useful pipeline!

I am running the pipeline using a singularity container on some long read sequencing data. It seems to work well with smaller input fastq files but as soon as the files tend to become larger, the workflow abruptly ends.

Below is the ending portion of the output:

01:22:26.587 INFO PrintReads - ------------------------------------------------------------
01:22:26.588 INFO PrintReads - The Genome Analysis Toolkit (GATK) v4.1.9.0
01:22:26.588 INFO PrintReads - For support and documentation go to https://software.broadinstitute.org/gatk/
01:22:26.588 INFO PrintReads - Executing as isarfraz@scc-gn3 on Linux v4.18.0-477.15.1.el8_8.x86_64 amd64
01:22:26.588 INFO PrintReads - Java runtime: OpenJDK 64-Bit Server VM v11.0.20.1+1-post-Ubuntu-0ubuntu120.04
01:22:26.588 INFO PrintReads - Start Date/Time: November 16, 2023 at 1:22:26 AM GMT
01:22:26.588 INFO PrintReads - ------------------------------------------------------------
01:22:26.588 INFO PrintReads - ------------------------------------------------------------
01:22:26.588 INFO PrintReads - HTSJDK Version: 2.23.0
01:22:26.589 INFO PrintReads - Picard Version: 2.23.3
01:22:26.589 INFO PrintReads - HTSJDK Defaults.COMPRESSION_LEVEL : 2
01:22:26.589 INFO PrintReads - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
01:22:26.589 INFO PrintReads - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
01:22:26.589 INFO PrintReads - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
01:22:26.589 INFO PrintReads - Deflater: IntelDeflater
01:22:26.589 INFO PrintReads - Inflater: IntelInflater
01:22:26.589 INFO PrintReads - GCS max retries/reopens: 20
01:22:26.589 INFO PrintReads - Requester pays: disabled
01:22:26.589 INFO PrintReads - Initializing engine
01:22:26.671 INFO PrintReads - Done initializing engine
01:22:26.685 INFO ProgressMeter - Starting traversal
01:22:26.685 INFO ProgressMeter - Current Locu
[2023-11-16 01:22:34,58] [info] WorkflowManagerActor WorkflowActor-1318276e-f6f8-460f-bd69-6fd7a1ef3d3b is in a terminal state: WorkflowFailedState
[2023-11-16 01:23:01,22] [info] SingleWorkflowRunnerActor workflow finished with status 'Failed'.
[2023-11-16 01:23:06,05] [info] SingleWorkflowRunnerActor writing metadata to /tmp/tmple2zpd52.json
[2023-11-16 01:23:06,08] [info] Workflow polling stopped
[2023-11-16 01:23:06,09] [info] 0 workflows released by cromid-0f089e9
[2023-11-16 01:23:06,09] [info] Shutting down WorkflowStoreActor - Timeout = 5 seconds
[2023-11-16 01:23:06,10] [info] Shutting down WorkflowLogCopyRouter - Timeout = 5 seconds
[2023-11-16 01:23:06,10] [info] Shutting down JobExecutionTokenDispenser - Timeout = 5 seconds
[2023-11-16 01:23:06,10] [info] Aborting all running workflows.
[2023-11-16 01:23:06,10] [info] JobExecutionTokenDispenser stopped
[2023-11-16 01:23:06,10] [info] WorkflowStoreActor stopped
[2023-11-16 01:23:06,11] [info] WorkflowLogCopyRouter stopped
[2023-11-16 01:23:06,11] [info] Shutting down WorkflowManagerActor - Timeout = 3600 seconds
[2023-11-16 01:23:06,11] [info] WorkflowManagerActor All workflows finished
[2023-11-16 01:23:06,11] [info] WorkflowManagerActor stopped
[2023-11-16 01:23:06,27] [info] Connection pools shut down
[2023-11-16 01:23:06,28] [info] Shutting down SubWorkflowStoreActor - Timeout = 1800 seconds
[2023-11-16 01:23:06,28] [info] Shutting down JobStoreActor - Timeout = 1800 seconds
[2023-11-16 01:23:06,28] [info] Shutting down CallCacheWriteActor - Timeout = 1800 seconds
[2023-11-16 01:23:06,28] [info] Shutting down ServiceRegistryActor - Timeout = 1800 seconds
[2023-11-16 01:23:06,28] [info] Shutting down DockerHashActor - Timeout = 1800 seconds
[2023-11-16 01:23:06,28] [info] Shutting down IoProxy - Timeout = 1800 seconds
[2023-11-16 01:23:06,28] [info] SubWorkflowStoreActor stopped
[2023-11-16 01:23:06,28] [info] CallCacheWriteActor Shutting down: 0 queued messages to process
[2023-11-16 01:23:06,28] [info] JobStoreActor stopped
[2023-11-16 01:23:06,28] [info] WriteMetadataActor Shutting down: 0 queued messages to process
[2023-11-16 01:23:06,28] [info] KvWriteActor Shutting down: 0 queued messages to process
[2023-11-16 01:23:06,28] [info] CallCacheWriteActor stopped
[2023-11-16 01:23:06,28] [info] IoProxy stopped
[2023-11-16 01:23:06,28] [info] ServiceRegistryActor stopped
[2023-11-16 01:23:06,29] [info] DockerHashActor stopped
[2023-11-16 01:23:06,30] [info] Database closed
[2023-11-16 01:23:06,30] [info] Stream materializer shut down
[2023-11-16 01:23:06,31] [info] WDL HTTP import resolver closed
Workflow 1318276e-f6f8-460f-bd69-6fd7a1ef3d3b transitioned to state Failed

Investigating a little further, I found that it is the BQSR call that causes the workflow to fail. Looking at the stderr of this call, it seems that as the input data becomes larger, the default value of 20000 for the argument --maximum-cycle-value is not enough. Please see below, the output from the stderr:

01:22:29.268 INFO ApplyBQSR - ------------------------------------------------------------
01:22:29.269 INFO ApplyBQSR - The Genome Analysis Toolkit (GATK) v4.1.9.0
01:22:29.269 INFO ApplyBQSR - For support and documentation go to https://software.broadinstitute.org/gatk/
01:22:29.269 INFO ApplyBQSR - Executing as isarfraz@scc-gn3 on Linux v4.18.0-477.15.1.el8_8.x86_64 amd64
01:22:29.269 INFO ApplyBQSR - Java runtime: OpenJDK 64-Bit Server VM v11.0.20.1+1-post-Ubuntu-0ubuntu120.04
01:22:29.269 INFO ApplyBQSR - Start Date/Time: November 16, 2023 at 1:22:29 AM GMT
01:22:29.269 INFO ApplyBQSR - ------------------------------------------------------------
01:22:29.269 INFO ApplyBQSR - ------------------------------------------------------------
01:22:29.270 INFO ApplyBQSR - HTSJDK Version: 2.23.0
01:22:29.270 INFO ApplyBQSR - Picard Version: 2.23.3
01:22:29.270 INFO ApplyBQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 2
01:22:29.270 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
01:22:29.270 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
01:22:29.270 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
01:22:29.270 INFO ApplyBQSR - Deflater: IntelDeflater
01:22:29.270 INFO ApplyBQSR - Inflater: IntelInflater
01:22:29.270 INFO ApplyBQSR - GCS max retries/reopens: 20
01:22:29.270 INFO ApplyBQSR - Requester pays: disabled
01:22:29.270 INFO ApplyBQSR - Initializing engine
01:22:29.382 INFO ApplyBQSR - Done initializing engine
01:22:29.395 INFO ProgressMeter - Starting traversal
01:22:29.396 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
01:22:31.784 INFO ApplyBQSR - Shutting down engine
[November 16, 2023 at 1:22:31 AM GMT] org.broadinstitute.hellbender.tools.walkers.bqsr.ApplyBQSR done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=2147483648


A USER ERROR has occurred: The maximum allowed value for the cycle is 20000, but a larger cycle (20001) was detected. Please use the --maximum-cycle-value argument (when creating the recalibration table in BaseRecalibrator) to increase this value (at the expense of requiring more memory to run)

However, I can't really find a way to use this argument to manually specify a larger value. Using the --no_bqsr argument with the singularity image execution does seem to eliminate this issue and produces the output. I am not fully sure if excluding BQSR would have significant effect on the output produced. Also, using smaller fastq files does not seem to produce this issue.

I am hoping if there is a potential hack/workaround to manually specify this param while using the singularity image.

I use the following code for execution:

singularity exec -e -B ${DATA_FOLDER}:${DATA_FOLDER} -B /tmp:/tmp \
          -B genome_path/ctat_genome_lib_build_dir:/ctat_genome_lib_dir:ro  \
            ctat_mutations_latest.sif \
           /usr/local/src/ctat-mutations/ctat_mutations \
           --left ${DATA_FOLDER}/some_reads.fastq.gz \
           --is_long_reads \
           --sample_id test \
           --output ${DATA_FOLDER}/example.HC.singularity \
           --cpu 10 \
           --genome_lib_dir /ctat_genome_lib_dir

Thanks for looking into this!

Best,
Irzam

@brianjohnhaas
Copy link
Collaborator

brianjohnhaas commented Nov 16, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants