Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

`bwa-mem2': free(): invalid next size (normal): 0x00007f3a1d66fb5 #49

Closed
yihchii opened this issue Feb 20, 2020 · 2 comments
Closed

`bwa-mem2': free(): invalid next size (normal): 0x00007f3a1d66fb5 #49

yihchii opened this issue Feb 20, 2020 · 2 comments
Assignees

Comments

@yihchii
Copy link

yihchii commented Feb 20, 2020

Setting:

  • Binary built using commit 091cf8a
  • Ran on AWS c5d.18xlarge with Linux 4.4.0-1099-aws and Ubuntu 16.04.6 LTS
  • Executing in AVX512 mode

Command:

⟫ bwa-mem2 mem -t 8 genome/hs38DH.fa /home/dnanexus/in/reads_fastqgzs/0/ERR3242672_1.fastq.gz /home/dnanexus/in/reads2_fastqgzs/0/ERR3242672_2.fastq.gz -K 100000000 -Y -R '@RG\tID:ERR3242672_1\tPL:ILLUMINA\tPU:ERR3242672_1\tLB:ERR3242672\tSM:ER^C242672' > /dev/null

Error message:

...
[M::mem_pestat] (25, 50, 75) percentile: (827, 2626, 3427)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 8627)
[M::mem_pestat] mean and std.dev: (2383.79, 1436.55)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 11227)
[M::mem_pestat] skip orientation RR as there are not enough pairs
[0000] 10. Calling kt_for - worker_sam
*** Error in `bwa-mem2': free(): invalid next size (normal): 0x00007f3a1d66fb50 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f48411137e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7f484111c37a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f484112053c]
bwa-mem2[0x44cdf8]
bwa-mem2[0x445869]
bwa-mem2[0x42658c]
bwa-mem2[0x46a631]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f4841f286ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f48411a341d]
======= Memory map: ========
00400000-0049d000 r-xp 00000000 00:2a 75543                              /bwa-mem2/bwa-mem2.avx512bw
0069c000-006b0000 rw-p 0009c000 00:2a 75543                              /bwa-mem2/bwa-mem2.avx512bw
006b0000-006d1000 rw-p 00000000 00:00 0 
025dd000-03a99000 rw-p 00000000 00:00 0                                  [heap]
7f38f4000000-7f38f8000000 rw-p 00000000 00:00 0 
7f38f8000000-7f38fbffd000 rw-p 00000000 00:00 0 
7f38fbffd000-7f38fc000000 ---p 00000000 00:00 0 
7f390c000000-7f3910000000 rw-p 00000000 00:00 0
...

Below are the files we used:
Reference genome (hs38DH.fa as a tar file, built after 6743183): https://dl.dnanex.us/F/D/19XXQfV51qBYpP4jgY46YzpX7ZB6GQYv78Zg6B4x/hs38DH.bwa-index.avx512.v2.tar.gz
ERR3242672_1.fastq.gz (12.03 GiB): https://dl.dnanex.us/F/D/FXyYZk9kqk8Y0qVGPV1VkBpY0byfQyB1Q2zf7v84/ERR3242672_1.fastq.gz
ERR3242672_2.fastq.gz (13.77 GiB): https://dl.dnanex.us/F/D/8y5qbZbkbVYyJJZXqB1PPg1YVZ5JFK5J7QjYx82j/ERR3242672_2.fastq.gz

@yuk12 yuk12 self-assigned this Feb 20, 2020
@yuk12
Copy link
Member

yuk12 commented Apr 20, 2020

I tried with latest commit on the given input. It executes to completion.
Could you try with the latest commit.

Run info:
Compiler used: icc-18, gcc.
Command line same as mentioned above.

make CXX=icpc multi
./bwa-mem2 mem -t 8 /scratch/omics/genomes-indexes-reads/human/ref/hs38DH/hs38DH.fa /scratch/omics/genomes-indexes-reads/human/reads/ERR3242672_1.fastq.gz /scratch/omics/genomes-indexes-reads/human/reads/ERR3242672_2.fastq.gz -K 100000000 -Y -R '@RG\tID:ERR3242672_1\tPL:ILLUMINA\tPU:ERR3242672_1\tLB:ERR3242672\tSM:ER^C242672' > /scratch/mvasimud/sam_out/tmpv.sa

Here is the log (icpc-18):
-----------------------------
Executing in AVX512 mode!!
-----------------------------
* Ref file: /scratch/omics/genomes-indexes-reads/human/ref/hs38DH/hs38DH.fa
* Entering FMI_search
* Reference seq len for bi-index = 6418915857
* Count:
0, 1
1, 1877984092
2, 3209457929
3, 4540931766
4, 6418915857

* Reading other elements of the index from files /scratch/omics/genomes-indexes-reads/human/ref/hs38DH/hs38DH.fa
* Index prefix: /scratch/omics/genomes-indexes-reads/human/ref/hs38DH/hs38DH.fa
* Read 0 ALT contigs
* Done reading Index!!
* Reading reference genome..
* Binary seq file = /scratch/omics/genomes-indexes-reads/human/ref/hs38DH/hs38DH.fa.0123
* Reference genome size: 6418915856 bp
* Done reading reference genome !!


  1. Memory pre-allocation for Chaining: 1393.3971 MB
  2. Memory pre-allocation for BSW: 1916.9362 MB
  3. Memory pre-allocation for BWT: 618.5134 MB

* Threads used (compute): 8
* No. of pipeline threads: 2

[0000] read_chunk: 100000000, work_chunk_size: 100000200, nseq: 666668
[0000][ M::kt_pipeline] read 666668 sequences (100000200 bp)...
[0000] Reallocating initial memory allocations!!
[0000] Calling mem_process_seqs.., task: 0
[0000] 1. Calling kt_for - worker_bwt
[0000] read_chunk: 100000000, work_chunk_size: 100000200, nseq: 666668
[0000][ M::kt_pipeline] read 666668 sequences (100000200 bp)...
[0000] 2. Calling kt_for - worker_aln
[0000] Inferring insert size distribution of PE reads from data, l_pac: 3209457928, n: 666668
[0000][PE] # candidate unique pairs for (FF, FR, RF, RR): (0, 0, 0, 0)
[0000][PE] skip orientation FF as there are not enough pairs
[0000][PE] skip orientation FR as there are not enough pairs
[0000][PE] skip orientation RF as there are not enough pairs
[0000][PE] skip orientation RR as there are not enough pairs
[0000] 3. Calling kt_for - worker_sam
[0000][ M::mem_process_seqs] Processed 666668 reads in 82.258 CPU sec, 10.655 real sec

[[Skipped the intermediate logs]]

[0000] Calling mem_process_seqs.., task: 1140
[0000] 1. Calling kt_for - worker_bwt
[0000] read_chunk: 100000000, work_chunk_size: 42700500, nseq: 284670
[0000][ M::kt_pipeline] read 284670 sequences (42700500 bp)...
[0000] 2. Calling kt_for - worker_aln
[0000] Inferring insert size distribution of PE reads from data, l_pac: 3209457928, n: 666668
[0000][PE] # candidate unique pairs for (FF, FR, RF, RR): (0, 0, 0, 0)
[0000][PE] skip orientation FF as there are not enough pairs
[0000][PE] skip orientation FR as there are not enough pairs
[0000][PE] skip orientation RF as there are not enough pairs
[0000][PE] skip orientation RR as there are not enough pairs
[0000] 3. Calling kt_for - worker_sam
[0000][ M::mem_process_seqs] Processed 666668 reads in 51.979 CPU sec, 6.602 real sec
[0000] Calling mem_process_seqs.., task: 1141
[0000] 1. Calling kt_for - worker_bwt
[0000] read_chunk: 100000000, work_chunk_size: 0, nseq: 0
[0000] 2. Calling kt_for - worker_aln
[0000] Inferring insert size distribution of PE reads from data, l_pac: 3209457928, n: 284670
[0000][PE] # candidate unique pairs for (FF, FR, RF, RR): (0, 0, 0, 0)
[0000][PE] skip orientation FF as there are not enough pairs
[0000][PE] skip orientation FR as there are not enough pairs
[0000][PE] skip orientation RF as there are not enough pairs
[0000][PE] skip orientation RR as there are not enough pairs
[0000] 3. Calling kt_for - worker_sam
[0000][ M::mem_process_seqs] Processed 284670 reads in 21.638 CPU sec, 2.706 real sec
[0000] read_chunk: 100000000, work_chunk_size: 0, nseq: 0
[0000] Computation ends..
No. of OMP threads: 8
Processor is runnig @2693.609926 MHz

@yihchii
Copy link
Author

yihchii commented Jun 11, 2020

Hi @yuk12, with the latest binary (20 days old by now) we have, this job failed with Segmentation fault when I used up all threads on the instance.

  • Ran on AWS c5d.18xlarge (72 cores) with Linux 4.4.0-1099-aws and Ubuntu 16.04.6 LTS
  • Executing in AVX512 mode

See log and error message below:

$ bwa-mem2 mem -t 72 genome/hs38DH.fa /home/dnanexus/in/reads_fastqgzs/0/ERR3242672_1.fastq.gz /home/dnanexus/in/reads2_fastqgzs/0/ERR3242672_2.fastq.gz -K 100000000 -Y -R '@RG\tID:ERR3242672_1\tPL:ILLUMINA\tPU:ERR3242672_1\tLB:ERR3242672\tSM:ERR3242672'
...
-----------------------------
Executing in AVX512 mode!!
-----------------------------
* Ref file: genome/hs38DH.fa
* Entering FMI_search
* Index file found. Loading index from genome/hs38DH.fa.bwt.8bit.32
* Reference seq len for bi-index = 6434693835
* Count:
0,	1
1,	1882204624
2,	3217346918
3,	4552489212
4,	6434693835
* Reading other elements of the index from files genome/hs38DH.fa
* Index prefix: genome/hs38DH.fa
* Read 3171 ALT contigs
* Done reading Index!!
* Reading reference genome..
* Binary seq file = genome/hs38DH.fa.0123
* Reference genome size: 6434693834 bp
* Done reading reference genome !!
...

[V] 97061139 59:98552699
[V] 97552442 01:00:59557999
[V] 98707864 01:01:36115999
[V] 99033930
[D]	md5	a35c658a7220c05b2c16255584710479
[V] checksum ok
[V] blocks merged in time 01:05:77533499
[V] run time 13:02:54596900 (782.546 s)	MemUsage(size=15814.6,rss=5763.01,peak=20143.8)
1550 Segmentation fault

When I lowered the number of threads o use to 36, the job finished sucessfully:

$ bwa-mem2 mem -t 36 genome/hs38DH.fa /home/dnanexus/in/reads_fastqgzs/0/ERR3242672_1.fastq.gz /home/dnanexus/in/reads2_fastqgzs/0/ERR3242672_2.fastq.gz -K 100000000 -Y -R '@RG\tID:ERR3242672_1\tPL:ILLUMINA\tPU:ERR3242672_1\tLB:ERR3242672\tSM:ERR3242672' > test.sam
...
[0000][PE] (25, 50, 75) percentile: (361, 419, 490)
[0000][PE] low and high boundaries for computing mean and std.dev: (103, 748)
[0000][PE] mean and std.dev: (427.63, 96.85)
[0000][PE] low and high boundaries for proper pairs: (1, 877)
[0000][PE] skip orientation RF as there are not enough pairs
[0000][PE] skip orientation RR as there are not enough pairs
[0000][PE] skip orientation FF
[0000] 3. Calling kt_for - worker_sam
        [0000][ M::mem_process_seqs] Processed 284670 reads in 51.176 CPU sec, 1.977 real sec
[0000] read_chunk: 100000000, work_chunk_size: 0, nseq: 0
[0000] Computation ends..
No. of OMP threads: 36
Processor is runnig @3000.173222 MHz
Runtime profile:

        Time taken for main_mem function: 6739.18 sec

        IO times (sec) :
        Reading IO time (reads) avg: 1270.00, (1270.00, 1270.00)
        Writing IO time (SAM) avg: 1055.70, (1055.70, 1055.70)
        Reading IO time (Reference Genome) avg: 4.14, (4.14, 4.14)
        Index read time avg: 31.98, (31.98, 31.98)

        Overall time (sec) (Excluding Index reading time):
        PROCESS() (Total compute time + (read + SAM) IO time) : 6698.43
        MEM_PROCESS_SEQ() (Total compute time (Kernel + SAM)), avg: 6695.87, (6695.87, 6695.87)

         SAM Processing time (sec):
        --WORKER_SAM avg: 2357.22, (2357.22, 2357.22)

        Kernels' compute time (sec):
        Total kernel (smem+sal+bsw) time avg: 4209.77, (4209.77, 4209.77)
                SMEM compute avg: 1487.40, (1505.20, 1480.82)
                SAL compute avg: 552.52, (558.28, 539.82)
                BSW time, avg: 1625.21, (1638.39, 1612.70)

        Total re-allocs: -1939826418 out of total requests: 1906125222, Rate: -1.02

Important parameter settings: 
        BATCH_SIZE: 512
        MAX_SEQ_LEN_REF: 256
        MAX_SEQ_LEN_QER: 128
        MAX_SEQ_LEN8: 128
        SEEDS_PER_READ: 500
        SIMD_WIDTH8 X: 64
        SIMD_WIDTH16 X: 32
        AVG_SEEDS_PER_READ: 64

You may close this ticket as there are workaround with the Segmentation fault, by lowering the number of threads to use.

@yuk12 yuk12 closed this as completed Jun 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants