resources assignment when perform parallel jobs #3727

Open
wangpenhok opened this issue Nov 15, 2023 · 1 comment

Comments

@wangpenhok

wangpenhok commented Nov 15, 2023

Version info

  • bcbio version (bcbio_nextgen.py --version): 1.2.9
  • OS name and version (lsb_release -ds): Ubuntu 20.04.5 LTS

To Reproduce
Exact bcbio command you have used:

 nohup bcbio_nextgen.py ../config/my_project.yaml  -n 64 &

Your yaml configuration file:

- algorithm:
    align_split_size: false
    aligner: bwa
    coverage_interval: regional
    ensemble:
      numpass: 2
    exclude_regions: lcr
    svcaller:
    - manta
    - cnvkit
    variant_regions: /home/data/bcbio/genomes/Hsapiens/hg38/coverage/capture_regions/Exome-Agilent_V6.bed
    variantcaller:
      germline:
      - freebayes
      - gatk-haplotype
      - strelka2
      somatic:
      - vardict
      - mutect2
      - strelka2
  analysis: variant2
  description: sample8
  files:
  - /home/data/bcbio/projects/input/S10_L4_518_R1.fastq.gz
  - /home/data/bcbio/projects/input/S10_L4_518_R2.fastq.gz
  genome_build: hg38
  metadata:
    batch: MatchWith_sample8
    phenotype: tumor
    prep_method: 300x
    tissue: tissue
resources:
    bamsormadup:
      cores: 8
      memory: 2G
    bwa:
      cores: 8
      memory: 2G
    gatk:
      jvm_opts:
      - -Xms2g
      - -Xmx4g
    genome:
      dir: /home/data/bcbio/genomes/Hsapiens/hg38
    samtools:
      cores: 16
      memory: 2G

With the total number of available cores set to -n 64 and the resources section in the YAML file above, I expected each job to use only 8 cores for bwa mem. However, both the debug log and the command log show that the resources were not allocated as I intended. In addition, the pipeline repeatedly failed with "Segmentation fault (core dumped)", as shown below.
I have no idea how this happened or how to fix it. Could you please help me with this problem? Thanks~

Log files (could be found in work/log)

debug-log

[2023-11-15T06:04Z] System YAML configuration: /home/data/bcbio/galaxy/bcbio_system.yaml.
[2023-11-15T06:04Z] Locale set to C.UTF-8.
[2023-11-15T06:04Z] Resource requests: bwa, sambamba, samtools; memory: 2.00, 6.00, 2.00; cores: 8, 32, 16
[2023-11-15T06:04Z] Configuring 1 jobs to run, using 32 cores each with 192.1g of memory reserved for each job
[2023-11-15T06:04Z] Timing: organize samples
[2023-11-15T06:04Z] multiprocessing: organize_samples

command-log

[2023-11-15T06:05Z] unset JAVA_HOME && /home/data/bcbio/galaxy/../anaconda/bin/bwa mem   -c 250 -M -t 32  -R '@RG\tID: sample8\tPL:illumina\tPU:sample8\tSM:sample8' -v 1 /home/data/bcbio/genomes/Hsapiens/hg38/bwa/hg38.fa /home/data/bcbio/projects/work/align_prep/sample8_S38_L3_543_R1.fastq.gz /home/data/bcbio/projects/work/align_prep/sample8_S38_L3_543_R2.fastq.gz  | /home/data/bcbio/galaxy/../anaconda/bin/bamsormadup inputformat=sam threads=24 tmpfile=/home/data/bcbio/projects/work/bcbiotx/tmpeva3dfj4/sample8-sort-sorttmp-markdup SO=coordinate indexfilename=/home/data/bcbio/projects/twin_somatic/twin_somatic/work/bcbiotx/tmpeva3dfj4/sample8-sort.bam.bai > /home/data/bcbio/projects/work/bcbiotx/tmpeva3dfj4/sample8-sort.bam

Segmentation fault error

     2397570 Segmentation fault      (core dumped) | /home/data/bcbio/galaxy/../anaconda/bin/bamsormadup inputformat=sam threads=12 tmpfile=/home/data/bcbio/projects/work/bcbiotx/tmp0716fu54/sample8-sort-sorttmp-markdup SO=coordinate indexfilename=/home/data/bcbio/projects/work/bcbiotx/tmp0716fu54/sample8-sort.bam.bai > /home/data/bcbio/projects/work/bcbiotx/tmp0716fu54/sample8-sort.bam
@naumenko-sa
Contributor

naumenko-sa commented Dec 11, 2023

Hi @wangpenhok !

I suspect you have an indentation issue here: there are 4 spaces instead of 2 under resources, so your specifications have not been parsed.

For a one-node non-distributed run, bcbio's logic in allocating resources with (-n 64) is

After these calculations, bcbio uses 32 cores per job, with 192.1g of memory reserved for each job.
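
As a rough illustration of how the 32 could come out of the "Resource requests" line in the debug log, here is a minimal Python sketch. It assumes that the largest per-program core request wins and is then capped by -n; that rule is an assumption that happens to match the logged numbers, not a copy of bcbio's actual code. The helper names parse_resource_requests and cores_per_job are hypothetical.

```python
import re

# Line copied from the debug log above.
LOG_LINE = ("Resource requests: bwa, sambamba, samtools; "
            "memory: 2.00, 6.00, 2.00; cores: 8, 32, 16")


def parse_resource_requests(line):
    """Turn the debug-log line into {program: {"memory": GB, "cores": n}}."""
    match = re.search(r"Resource requests: (.+); memory: (.+); cores: (.+)", line)
    if match is None:
        raise ValueError("not a 'Resource requests' log line")
    programs = [p.strip() for p in match.group(1).split(",")]
    memory = [float(x) for x in match.group(2).split(",")]
    cores = [int(x) for x in match.group(3).split(",")]
    return {p: {"memory": m, "cores": c}
            for p, m, c in zip(programs, memory, cores)}


def cores_per_job(requests, total_cores):
    """ASSUMPTION: the largest per-program request wins, capped by -n."""
    largest = max(r["cores"] for r in requests.values())
    return min(largest, total_cores)


requests = parse_resource_requests(LOG_LINE)
print(requests["bwa"])              # the bwa request as logged: 8 cores, 2 GB
print(cores_per_job(requests, 64))  # 32, driven by the sambamba request
```

Under this assumption, the 8-core bwa setting is not what decides the job size: the sambamba entry asks for 32 cores and 6 GB per core, and 32 x 6 GB is roughly the 192.1g reported in the log.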

When bcbio runs a pipe, it accounts for the fact that every command in the pipe consumes RAM, so it has to reduce the number of cores to fit into the available RAM. That is what happened in this command:

bwa mem -t 32 | bamsormadup threads=24

Still, these values are very high for this server. Memory is also consumed by the I/O buffers.
Try running bcbio with -n 7 or -n 10, and at most -n 20.

Large -n core counts only make sense in distributed bcbio runs, where the cores are requested across many servers.

SN
