matched_variants.txt.gz empty when pgsc_calc is run with many scores, resulting in KeyError: 'CHR:POS:A0:A1' and pgscatalog-intersect failure. #361

Fiwx · 2024-08-13T22:12:07Z

Description of the bug

reference_variants.txt.gz is empty, containing only the header. This problem did not occur when running ~30 scores, but it does occur when running ~100 scores. This causes the pgscatalog-intersect step to crash.

Command used and terminal output

Aug-13 16:22:29.056 [main] DEBUG nextflow.script.ScriptRunner - > Awaiting termination
Aug-13 16:22:29.056 [main] DEBUG nextflow.Session - Session await
Aug-13 16:22:29.738 [Actor Thread 45] DEBUG nextflow.container.SingularityCache - Singularity found local store for image=oras://ghcr.io/pgscatalog/zstd:2-beta-singularity; path=/home/user/singularity_containers/ghcr.io-pgscatalog-zstd-2-beta-singularity.img
Aug-13 16:22:29.740 [Actor Thread 23] INFO  nextflow.container.SingularityCache - Pulling Singularity image oras://ghcr.io/pgscatalog/pygscatalog:pgscatalog-utils-1.3.1-singularity [cache /home/user/singularity_containers/ghcr.io-pgscatalog-pygscatalog-pgscatalog-utils-1.3.1-singularity.img]
Aug-13 16:22:29.753 [Actor Thread 45] DEBUG nextflow.container.SingularityCache - Singularity found local store for image=oras://ghcr.io/pgscatalog/plink2:2.00a5.10-singularity; path=/home/user/singularity_containers/ghcr.io-pgscatalog-plink2-2.00a5.10-singularity.img
Aug-13 16:22:30.248 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Aug-13 16:22:30.263 [Task submitter] INFO  nextflow.Session - [33/dec25d] Submitted process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_VCF (file127 chromosome ALL)
Aug-13 16:22:30.288 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Aug-13 16:22:30.295 [Task submitter] INFO  nextflow.Session - [66/1acfd2] Submitted process > PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:EXTRACT_DATABASE(1)
Aug-13 16:22:44.159 [Actor Thread 23] DEBUG nextflow.container.SingularityCache - Singularity pull complete image=oras://ghcr.io/pgscatalog/pygscatalog:pgscatalog-utils-1.3.1-singularity path=/home/user/singularity_containers/ghcr.io-pgscatalog-pygscatalog-pgscatalog-utils-1.3.1-singularity.img
Aug-13 16:23:52.176 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1; name: PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:EXTRACT_DATABASE (1); status: COMPLETED; exit: 0; error: -; workDir: /home/user/runner/core/test/file_127.ancestry/work/66/1acfd293fb7055df2f78d51cb2c65c]
Aug-13 16:23:52.180 [Task monitor] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'TaskFinalizer' minSize=10; maxSize=12; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
Aug-13 16:23:52.668 [TaskFinalizer-1] DEBUG nextflow.processor.TaskProcessor - Process PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:EXTRACT_DATABASE > Skipping output binding because one or more optional files are missing: fileoutparam<0:3>
Aug-13 16:23:52.669 [TaskFinalizer-1] DEBUG nextflow.processor.TaskProcessor - Process PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:EXTRACT_DATABASE > Skipping output binding because one or more optional files are missing: fileoutparam<1:1>
Aug-13 16:24:19.838 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 3; name: PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_VCF (file127 chromosome ALL); status: COMPLETED; exit: 0; error: -; workDir: /home/user/runner/core/test/file_127.ancestry/work/33/dec25de8d4611e2a7081fff57c3fe6]
Aug-13 16:24:19.921 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Aug-13 16:24:19.929 [Task submitter] INFO  nextflow.Session - [45/5a701e] Submitted process > PGSCATALOG_PGSCCALC:PGSCCALC:INPUT_CHECK:COMBINE_SCOREFILES (1)
Aug-13 16:24:20.184 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Aug-13 16:24:20.199 [Task submitter] INFO  nextflow.Session - [a1/025735] Submitted process > PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:INTERSECT_VARIANTS (file127 chromosome ALL)
Aug-13 16:27:26.583 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 2 -- submitted tasks are shown below
~> TaskHandler[id: 2; name: PGSCATALOG_PGSCCALC:PGSCCALC:INPUT_CHECK:COMBINE_SCOREFILES (1); status: RUNNING; exit: -; error: -; workDir: /home/user/runner/core/test/file_127.ancestry/work/45/5a701ed3fa296084261a1f939cc266]
~> TaskHandler[id: 4; name: PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:INTERSECT_VARIANTS (file127 chromosome ALL);status: RUNNING; exit: -; error: -; workDir: /home/user/runner/core/test/file_127.ancestry/work/a1/0257359fca6824f76e34134b849344]
Aug-13 16:32:26.659 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 2 -- submitted tasks are shown below
~> TaskHandler[id: 2; name: PGSCATALOG_PGSCCALC:PGSCCALC:INPUT_CHECK:COMBINE_SCOREFILES (1); status: RUNNING; exit: -; error: -; workDir: /home/user/runner/core/test/file_127.ancestry/work/45/5a701ed3fa296084261a1f939cc266]
~> TaskHandler[id: 4; name: PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:INTERSECT_VARIANTS (file127 chromosome ALL);status: RUNNING; exit: -; error: -; workDir: /home/user/runner/core/test/file_127.ancestry/work/a1/0257359fca6824f76e34134b849344]
Aug-13 16:37:26.755 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 2 -- submitted tasks are shown below
~> TaskHandler[id: 2; name: PGSCATALOG_PGSCCALC:PGSCCALC:INPUT_CHECK:COMBINE_SCOREFILES (1); status: RUNNING; exit: -; error: -; workDir: /home/user/runner/core/test/file_127.ancestry/work/45/5a701ed3fa296084261a1f939cc266]
~> TaskHandler[id: 4; name: PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:INTERSECT_VARIANTS (file127 chromosome ALL);status: RUNNING; exit: -; error: -; workDir: /home/user/runner/core/test/file_127.ancestry/work/a1/0257359fca6824f76e34134b849344]
Aug-13 16:42:26.779 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 2 -- submitted tasks are shown below
~> TaskHandler[id: 2; name: PGSCATALOG_PGSCCALC:PGSCCALC:INPUT_CHECK:COMBINE_SCOREFILES (1); status: RUNNING; exit: -; error: -; workDir: /home/user/runner/core/test/file_127.ancestry/work/45/5a701ed3fa296084261a1f939cc266]
~> TaskHandler[id: 4; name: PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:INTERSECT_VARIANTS (file127 chromosome ALL);status: RUNNING; exit: -; error: -; workDir: /home/user/runner/core/test/file_127.ancestry/work/a1/0257359fca6824f76e34134b849344]
Aug-13 16:44:17.528 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 4; name: PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:INTERSECT_VARIANTS (file127 chromosome ALL); status: COMPLETED; exit: 1; error: -; workDir: /home/user/runner/core/test/file_127.ancestry/work/a1/0257359fca6824f76e34134b849344]
Aug-13 16:44:17.539 [TaskFinalizer-3] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:INTERSECT_VARIANTS (file127 chromosome ALL); work-dir=/home/user/runner/core/test/file_127.ancestry/work/a1/0257359fca6824f76e34134b849344
  error [nextflow.exception.ProcessFailedException]: Process `PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:INTERSECT_VARIANTS (file127 chromosome ALL)` terminated with an error exit status (1)
Aug-13 16:44:17.620 [TaskFinalizer-3] ERROR nextflow.processor.TaskProcessor - Error executing process > 'PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:INTERSECT_VARIANTS (file127 chromosome ALL)'

Caused by:
  Process `PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:INTERSECT_VARIANTS (file127 chromosome ALL)` terminated with an error exit status (1)


Command executed:

  pgscatalog-intersect --ref GRCh37_1000G_ALL.pvar.zst         --target GRCh37_file127_ALL.pvar.zst         --chrom ALL         --maf_target 0.0         --geno_miss 0.1         --outdir .         -v

  n_matched=$(sed -n '3p' intersect_counts_ALL.txt)

  if [ $n_matched == "0" ]
  then
      echo "ERROR: No variants in intersection"
      exit 1
  else
      mv matched_variants.txt.gz file127_ALL_matched.txt.gz
  fi

  cat <<-END_VERSIONS > versions.yml
  INTERSECT_VARIANTS:
      pgscatalog.match: $(echo $(python -c 'import pgscatalog.match; print(pgscatalog.match.__version__)'))
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:40:38 INFO     Processed 69000000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:40:45 INFO     Processed 69500000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:40:52 INFO     Processed 70000000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:40:59 INFO     Processed 70500000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:41:07 INFO     Processed 71000000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:41:13 INFO     Processed 71500000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:41:20 INFO     Processed 72000000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:41:27 INFO     Processed 72500000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:41:33 INFO     Processed 73000000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:41:40 INFO     Processed 73500000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:41:47 INFO     Processed 74000000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:41:53 INFO     Processed 74500000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:42:00 INFO     Processed 75000000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:42:07 INFO     Processed 75500000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:42:15 INFO     Processed 76000000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:42:21 INFO     Processed 76500000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:42:28 INFO     Processed 77000000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:42:35 INFO     Processed 77500000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:42:42 INFO     Processed 78000000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:42:49 INFO     Processed 78500000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:42:55 INFO     Processed 79000000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:43:03 INFO     Processed 79500000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:43:09 INFO     Processed 80000000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:43:17 INFO     Processed 80500000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:43:24 INFO     Processed 81000000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:43:31 INFO     Processed 81500000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:43:38 INFO     Processed 82000000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:43:45 INFO     Processed 82500000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:43:52 INFO     Processed 83000000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:43:58 INFO     Processed 83500000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:44:05 INFO     Processed 84000000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:44:13 INFO     Processed 84500000 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:44:17 INFO     Processed 84805772 REFERENCE variants
  pgscatalog.match.cli.intersect_cli: 2024-08-13 16:44:17 INFO     Outputting REFERNCE variants -> reference_variants.txt.gz
  Traceback (most recent call last):
    File "/app/pgscatalog.utils/.venv/bin/pgscatalog-intersect", line 8, in <module>
      sys.exit(run_intersect())
               ^^^^^^^^^^^^^^^
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/cli/intersect_cli.py", line 85, in run_intersect
      for v in heapq.merge(
    File "/usr/local/lib/python3.11/heapq.py", line 376, in merge
      h_append([key(value), order * direction, value, next])
                ^^^^^^^^^^
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/cli/intersect_cli.py", line 87, in <lambda>
      key=lambda v: (v["CHR:POS:A0:A1"], v["ID_REF"], v["REF_REF"]),
                     ~^^^^^^^^^^^^^^^^^
  KeyError: 'CHR:POS:A0:A1'
  cp: '.command.out' and '.command.out' are the same file
  cp: '.command.err' and '.command.err' are the same file
  cp: '.command.trace' and '.command.trace' are the same file

Work dir:
  /home/user/runner/core/test/file_127.ancestry/work/a1/0257359fca6824f76e34134b849344

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
Aug-13 16:44:17.692 [TaskFinalizer-3] INFO  nextflow.Session - Execution cancelled -- Finishing pending tasks before exit
Aug-13 16:44:17.753 [main] DEBUG nextflow.Session - Session await > all processes finished
Aug-13 16:44:17.867 [Actor Thread 56] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:PLINK2_MAKEBED_TARGET; work-dir=null
  error [java.lang.InterruptedException]: java.lang.InterruptedException
Aug-13 16:44:17.895 [Actor Thread 52] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=PGSCATALOG_PGSCCALC:PGSCCALC:REPORT:ANCESTRY_ANALYSIS; work-dir=null
  error [java.lang.InterruptedException]: java.lang.InterruptedException
Aug-13 16:44:17.981 [Actor Thread 60] DEBUG nextflow.sort.BigSort - Sort completed -- entries: 2; slices: 1; internal sort time: 0.064 s; external sort time: 0.002 s; total time: 0.066 s
Aug-13 16:44:17.993 [Actor Thread 60] DEBUG nextflow.file.FileCollector - Saved collect-files list to: /home/user/runner/core/test/file_127.ancestry/work/collect-file/a952058c44a348251f3ab72a876aec7b
Aug-13 16:44:18.016 [Actor Thread 60] DEBUG nextflow.file.FileCollector - Deleting file collector temp dir: /tmp/nxf-10103040858170102137
Aug-13 16:47:26.843 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 1 -- submitted tasks are shown below
~> TaskHandler[id: 2; name: PGSCATALOG_PGSCCALC:PGSCCALC:INPUT_CHECK:COMBINE_SCOREFILES (1); status: RUNNING; exit: -; error: -; workDir: /home/user/runner/core/test/file_127.ancestry/work/45/5a701ed3fa296084261a1f939cc266]
Aug-13 16:52:26.939 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 1 -- submitted tasks are shown below
~> TaskHandler[id: 2; name: PGSCATALOG_PGSCCALC:PGSCCALC:INPUT_CHECK:COMBINE_SCOREFILES (1); status: RUNNING; exit: -; error: -; workDir: /home/user/runner/core/test/file_127.ancestry/work/45/5a701ed3fa296084261a1f939cc266]
Aug-13 16:57:26.979 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 1 -- submitted tasks are shown below
~> TaskHandler[id: 2; name: PGSCATALOG_PGSCCALC:PGSCCALC:INPUT_CHECK:COMBINE_SCOREFILES (1); status: RUNNING; exit: -; error: -; workDir: /home/user/runner/core/test/file_127.ancestry/work/45/5a701ed3fa296084261a1f939cc266]
Aug-13 17:00:08.884 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 2; name: PGSCATALOG_PGSCCALC:PGSCCALC:INPUT_CHECK:COMBINE_SCOREFILES (1); status: COMPLETED; exit: 0; error: -; workDir: /home/user/runner/core/test/file_127.ancestry/work/45/5a701ed3fa296084261a1f939cc266]
Aug-13 17:00:08.887 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: local) - terminating tasks monitor poll loop
Aug-13 17:00:08.889 [main] DEBUG nextflow.Session - Session await > all barriers passed
Aug-13 17:00:08.919 [main] DEBUG nextflow.util.ThreadPoolManager - Thread pool 'TaskFinalizer' shutdown completed (hard=false)
Aug-13 17:00:08.988 [main] INFO  nextflow.Nextflow - -[pgscatalog/pgsc_calc] Pipeline completed with errors-
Aug-13 17:00:09.021 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=3; failedCount=1; ignoredCount=0; cachedCount=0; pendingCount=0; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=0; succeedDuration=1h 17m 52s; failedDuration=19m 57s; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=2; peakCpus=4; peakMemory=24 GB; ]
Aug-13 17:00:09.021 [main] DEBUG nextflow.trace.TraceFileObserver - Workflow completed -- saving trace file
Aug-13 17:00:09.027 [main] DEBUG nextflow.trace.ReportObserver - Workflow completed -- rendering execution report
Aug-13 17:00:10.088 [main] DEBUG nextflow.trace.TimelineObserver - Workflow completed -- rendering execution timeline
Aug-13 17:00:10.480 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
Aug-13 17:00:10.532 [main] INFO  org.pf4j.AbstractPluginManager - Stop plugin 'nf-prov@1.2.2'
Aug-13 17:00:10.532 [main] DEBUG nextflow.plugin.BasePlugin - Plugin stopped nf-prov
Aug-13 17:00:10.532 [main] INFO  org.pf4j.AbstractPluginManager - Stop plugin 'nf-schema@2.0.0'
Aug-13 17:00:10.532 [main] DEBUG nextflow.plugin.BasePlugin - Plugin stopped nf-schema
Aug-13 17:00:10.533 [main] DEBUG nextflow.util.ThreadPoolManager - Thread pool 'FileTransfer' shutdown completed (hard=false)
Aug-13 17:00:10.555 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye

Code:
https://github.com/PGScatalog/pygscatalog/blob/main/pgscatalog.match/src/pgscatalog/match/cli/intersect_cli.py

    with xopen(outdir / "reference_variants.txt.gz", "wt") as outf:
        outf.write("CHR:POS:A0:A1\tID_REF\tREF_REF\tIS_INDEL\tSTRANDAMB\tIS_MA_REF\n")
        for v in heapq.merge(
            *[read_var_general(x) for x in o_tmp_r],
            key=lambda v: (v["CHR:POS:A0:A1"], v["ID_REF"], v["REF_REF"]),
        ):
            outf.write("\t".join(v.values()) + "\n")

Since reference_variants is empty, the error makes sense.
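
For reference, here is a minimal sketch of how this pattern behaves when one input lacks the key column. It is not the pygscatalog code: `read_rows` is a hypothetical stand-in for `read_var_general`, assumed here to yield one dict per row keyed by the file's header line. `heapq.merge` pulls the first row from every input and applies `key` to it before yielding anything, so a single chunk with a different header raises the KeyError after the header has been written but before any data row is.

```python
import csv
import heapq
import io

HEADER = "CHR:POS:A0:A1\tID_REF\tREF_REF\n"


def read_rows(text):
    """Yield one dict per row, keyed by the header line (hypothetical helper)."""
    yield from csv.DictReader(io.StringIO(text), delimiter="\t")


def merge_chunks(chunks, out):
    """Write the header, then merge the sorted chunks the way the issue's snippet does."""
    out.write(HEADER)
    for v in heapq.merge(
        *[read_rows(c) for c in chunks],
        key=lambda v: (v["CHR:POS:A0:A1"], v["ID_REF"], v["REF_REF"]),
    ):
        out.write("\t".join(v.values()) + "\n")


good = HEADER + "1:100:A:G\t1:100:A:G\tA\n"
bad = "ID\tREF\n1:200:C:T\tC\n"  # assumed: a chunk whose header differs

out = io.StringIO()
error = None
try:
    merge_chunks([good, bad], out)
except KeyError as exc:
    error = exc

print(repr(out.getvalue()), repr(error))
```

In this sketch the output holds only the header line, which matches the symptom reported above. Whether a mismatched chunk is what actually happened in this run is not established by the logs.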

Data Processing: The script successfully processed 84,805,772 reference variants and wrote them to temporary files. The temporary files were correctly written with the expected structure, including the "CHR:POS:A0:A1" column.

Idea:
When attempting to merge these temporary files and write to the final output, the system ran out of available RAM. This memory exhaustion caused the heapq.merge operation to fail silently, right after the file is opened and the header is written. As a result, only the header was written to the reference_variants.txt.gz file before the process was interrupted. Inspecting the Nextflow run did not confirm or reject this idea. Perhaps it needs more than 4 GB?
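
One way to weigh the memory idea is to compare exit statuses. On Linux, a process killed by the kernel's out-of-memory killer receives SIGKILL, which a shell reports as status 137 (128 + 9), while an uncaught Python exception exits with status 1 and a traceback. The sketch below only demonstrates that difference with two throwaway child processes on a POSIX system. The helper names are illustrative and it does not reproduce the pipeline.

```python
import signal
import subprocess
import sys


def run_child(code):
    """Run a Python snippet in a child process and return its raw return code."""
    result = subprocess.run([sys.executable, "-c", code], stderr=subprocess.DEVNULL)
    return result.returncode


def shell_status(returncode):
    """Convert subprocess's negative signal code into the 128+N status a shell reports."""
    return 128 - returncode if returncode < 0 else returncode


# Child that is killed by SIGKILL, as an OOM-killed task would be.
killed = run_child("import os, signal; os.kill(os.getpid(), signal.SIGKILL)")
# Child that dies from an uncaught exception, as in the traceback above.
raised = run_child("raise KeyError('CHR:POS:A0:A1')")

print(shell_status(killed), shell_status(raised))
```

The task above exited with status 1 and a KeyError traceback. A Python MemoryError would also exit with status 1, but it would normally show up as MemoryError in the traceback.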

PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_VCF task:
- Allocated memory: 16 GB (17179869184 bytes)
- Peak memory usage: 2.5 GB (2715578368 bytes) RSS, 16.3 GB (17425510400 bytes) VMEM
PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:EXTRACT_DATABASE task:
- Allocated memory: 8 GB (8589934592 bytes)
- Peak memory usage: 9.6 MB (10066944 bytes) RSS
PGSCATALOG_PGSCCALC:PGSCCALC:INPUT_CHECK:COMBINE_SCOREFILES task:
- Allocated memory: 16 GB (17179869184 bytes)
- Peak memory usage: 9.5 GB (10200547328 bytes) RSS, 9.7 GB (10415800320 bytes) VMEM
PGSCATALOG_PGSCCALC:PGSCCALC:ANCESTRY_PROJECT:INTERSECT_VARIANTS task:
- Allocated memory: 4 GB (4294967296 bytes)
- Memory usage: Not available (task failed)
- This task failed with exit code 1, so no memory usage was recorded.



### Relevant files

reference_variants.txt contains only the following:

CHR:POS:A0:A1 ID_REF REF_REF IS_INDEL STRANDAMB IS_MA_REF



=== Contents of reference_variants.txt ===
CHR:POS:A0:A1   ID_REF  REF_REF IS_INDEL        STRANDAMB       IS_MA_REF

=== Contents of GRCh37_1000G_ALL.psam ===
#IID    PAT     MAT     SEX     SuperPop        Population
HG00096 0       0       1       EUR     GBR
HG00097 0       0       2       EUR     GBR
HG00099 0       0       2       EUR     GBR
HG00100 0       0       2       EUR     GBR

=== Contents of GRCh37_file127_ALL.afreq.gz ===
#CHROM  ID      REF     ALT     ALT_FREQS       OBS_CT
1       1:10642:G:A     G       A       0       2
1       1:11008:C:G     C       G       0       2
1       1:11012:C:G     C       G       0       2
1       1:11063:T:G     T       G       0       2

=== Contents of GRCh37_file127_ALL.vmiss.gz ===
#ID     F_MISS_DOSAGE   F_MISS
1:10642:G:A     0       0
1:11008:C:G     0       0
1:11012:C:G     0       0
1:11063:T:G     0       0

=== Contents of GRCh37_1000G_ALL.pvar.zst ===
##reference=ftp://ftp.1000genomes.ebi.ac.uk//vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
##contig=<ID=1,assembly=b37,length=249250621>
##contig=<ID=2,assembly=b37,length=243199373>
##contig=<ID=3,assembly=b37,length=198022430>
##contig=<ID=4,assembly=b37,length=191154276>

=== Contents of GRCh37_file127_ALL.pvar.zst ===
#CHROM  POS     ID      REF     ALT
1       10642   1:10642:G:A     G       A
1       11008   1:11008:C:G     C       G
1       11012   1:11012:C:G     C       G
1       11063   1:11063:T:G     T       G

--- Contents of GRCh37_file127_ALL.psam ---
#IID    SEX
file_127.ancestry.txt NA

=== Contents of /tmp/tmpjo6asoom/tmpchbm3n41 ===
CHR:POS:A0:A1   ID_REF  REF_REF IS_INDEL        STRANDAMB       IS_MA_REF
1:10000006:A:G  1:10000006:G:A  G       False   False   False
1:10000020:A:T  1:10000020:T:A  T       False   True    False
1:10000072:C:T  1:10000072:C:T  C       False   False   False
1:10000143:C:T  1:10000143:C:T  C       False   False   False
1:10000160:C:G  1:10000160:G:C  G       False   True    False
1:10000179:A:AAAAAAAC   1:10000179:AAAAAAAC:A   AAAAAAAC        True    False   False
1:10000185:A:C  1:10000185:A:C  A       False   False   False
1:10000186:C:G  1:10000186:C:G  C       False   True    False
1:10000228:C:T  1:10000228:T:C  T       False   False   False
1:10000236:C:T  1:10000236:T:C  T       False   False   False
1:10000283:A:G  1:10000283:G:A  G       False   False   False
1:10000302:A:T  1:10000302:T:A  T       False   True    False
1:10000320:C:T  1:10000320:C:T  C       False   False   False
1:10000327:C:T  1:10000327:C:T  C       False   False   False
1:10000354:C:T  1:10000354:C:T  C       False   False   False
1:10000371:A:T  1:10000371:A:T  A       False   True    False
1:1000037:A:G   1:1000037:A:G   A       False   False   False
1:10000396:A:G  1:10000396:A:G  A       False   False   False
1:10000400:A:T  1:10000400:T:A  T       False   True    False
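
The listing above shows one temporary chunk with the expected header. A check along the following lines could confirm whether every chunk carries the `CHR:POS:A0:A1` column before the merge runs. This is a sketch under assumptions: the chunks are taken to be plain tab-separated text files, and `chunks_missing_column` is a hypothetical helper, not part of pgscatalog-utils.

```python
import csv
import tempfile
from pathlib import Path

EXPECTED = "CHR:POS:A0:A1"


def chunks_missing_column(paths, column=EXPECTED):
    """Return the paths whose first (header) line lacks `column`, including empty files."""
    missing = []
    for path in paths:
        with open(path, newline="") as handle:
            header = next(csv.reader(handle, delimiter="\t"), [])
        if column not in header:
            missing.append(path)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    # One chunk shaped like the listing above, one with a different header.
    good = Path(tmp) / "chunk_ok"
    good.write_text("CHR:POS:A0:A1\tID_REF\tREF_REF\n1:10000006:A:G\t1:10000006:G:A\tG\n")
    other = Path(tmp) / "chunk_other"
    other.write_text("ID\tREF\n1:10642:G:A\tG\n")
    flagged = [p.name for p in chunks_missing_column([good, other])]

print(flagged)
```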

### System information

Information:
  pgscatalog/pgsc_calc v2.0.0-beta.3
  profile        : singularity
  CPUs: 4 - Mem: 31 GB (3.3 GB) - Swap: 0 (0)
  Nextflow version: 24.04.4


Fiwx · 2024-08-14T06:23:15Z

I believe there may be a problem with heapq or something similar in intersect_cli.py. Running the same command with ~20 scores worked fine.

smlmbrt · 2024-08-14T14:15:30Z

> I believe there may be a problem with heapq or something similar in intersect_cli.py. Running the same command with ~20 scores worked fine.

This step with intersect_cli.py shouldn't be dependent on the number of scores; it sounds like it was either a random error or an out-of-memory bug?

> When attempting to merge these temporary files and write to the final output, the system ran out of available RAM. This memory exhaustion caused the heapq.merge operation to fail silently, right after the file is opened and the header is written. As a result, only the header was written to the reference_variants.txt.gz file before the process was interrupted. Inspecting the Nextflow run did not confirm or reject this idea. Perhaps it needs more than 4 GB?

We will look into that (cc @nebfield)

Fiwx · 2024-08-16T01:20:24Z

I don't know why, but I am able to replicate the error when running many scores, and it goes away with fewer scores. If this step is truly not dependent on the number of scores, perhaps some other step is using more resources in the background and this task fails as a result?

smlmbrt · 2024-08-16T10:21:40Z

If you run the .command.run script on its own in the failed job's work directory, does it run to completion or fail?

Fiwx · 2024-08-19T03:42:56Z

The same error/failure occurs with .command.run.

smlmbrt · 2024-08-19T09:16:38Z

> The same error/failure occurs with .command.run.

If you edit that script to request more memory does it solve the problem?

Fiwx · 2024-08-19T13:48:02Z

Is there a way to do this at the beginning of the run? Such as a configuration file that can be modified?

smlmbrt · 2024-08-19T16:02:21Z

pgsc_calc/modules/local/ancestry/intersect_variants.nf

Lines 1 to 5 in 96fbb23

    
           process INTERSECT_VARIANTS { 
        
               // labels are defined in conf/modules.config 
        
               label 'process_single' 
        
               label 'pgscatalog_utils' // controls conda, docker, + singularity options

You could replace the process_single label with process_high_memory.

Fiwx · 2024-08-19T16:16:44Z

Thank you; I will try that. I see the memory label in conf/base.config.

pgsc_calc/conf/base.config

Line 43 in 96fbb23

withLabel:process_high_memory {

Then, in modules/local/ancestry/intersect_variants.nf, I will change:

process INTERSECT_VARIANTS {
    // labels are defined in conf/modules.config
    label 'process_single'
    label 'pgscatalog_utils' // controls conda, docker, + singularity options

to:

process INTERSECT_VARIANTS {
    // labels are defined in conf/modules.config
    label 'process_single'
    label 'pgscatalog_utils' // controls conda, docker, + singularity options
    label 'process_high_memory'

Fiwx added the bug Something isn't working label Aug 13, 2024

smlmbrt assigned nebfield Aug 14, 2024

smlmbrt added the user-query User queries & requests label Aug 15, 2024

Fiwx mentioned this issue Aug 26, 2024

v2.0.0-beta.3 run: failed to find loop device, then --resume results in Failed to invoke observer completion handler (NullPointerException in BcoRenderer) #362

Open

nebfield closed this as not planned Nov 20, 2024
