Support for an arbitrary number of CIGAR ops (was: convert2bed can't find sort-bed binary) #157

Closed
adamfreedman opened this issue Jul 20, 2016 · 34 comments

@adamfreedman

When trying to convert a BAM file to a BED file, I get the same error whether I add the bedops dir to my PATH, point specifically to convert2bed, or use the pre-built binaries or a build from source:

Error: Cannot find sort-bed binary required for sorting BED output, and then the usage help string.

cmd used is ~/software/bedops_distr/convert2bed --input=BAM < sorted_mappedonly_mus_dendtritic_stranded_Trinity.GmapToGenome.bam > test.bed

Also tried opening the bam file and piping to convert2bed...same error.

@alexpreynolds
Collaborator

This error results from not having sort-bed and other binaries findable via the PATH environment variable. I suggest double-checking that the bedops binary directory is correctly entered in your environment's PATH, or copying the binaries to a directory already in the PATH.
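
To illustrate what "findable via the PATH environment variable" means, here is a minimal C sketch of a PATH lookup. It is not convert2bed's actual code, and the function name `find_in_path` is hypothetical. It also shows why running a tool from the directory that holds the binaries does not help: the working directory is only searched if it is listed in PATH.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not from the BEDOPS source): search each directory
 * listed in PATH for an executable called `name`. On success, write the
 * full path into `out` and return 1; otherwise set `out` to "" and return 0.
 * Simplifications: empty PATH entries (which POSIX treats as the current
 * directory) are skipped, and access() also succeeds for directories. */
int find_in_path(const char *name, char *out, size_t out_size)
{
    const char *path = getenv("PATH");
    if (path == NULL || name == NULL || out == NULL || out_size == 0) {
        return 0;
    }

    /* strtok() modifies its input, so work on a private copy of PATH. */
    char *copy = malloc(strlen(path) + 1);
    if (copy == NULL) {
        return 0;
    }
    strcpy(copy, path);

    int found = 0;
    for (char *dir = strtok(copy, ":"); dir != NULL; dir = strtok(NULL, ":")) {
        int n = snprintf(out, out_size, "%s/%s", dir, name);
        if (n < 0 || (size_t)n >= out_size) {
            continue; /* candidate path would not fit; skip this directory */
        }
        if (access(out, X_OK) == 0) {
            found = 1;
            break;
        }
    }

    free(copy);
    if (!found) {
        out[0] = '\0';
    }
    return found;
}
```

Calling `find_in_path("sort-bed", buf, sizeof buf)` returns 0 when no PATH directory contains an executable `sort-bed`, which is the situation the error message describes.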

@adamfreedman
Author

Yes, I just got this to work with the built-from-source version. With the distributed package, I had added its directory to PATH, but that wasn't working. I'd also run it from the bedops dir with all the executables in it, assuming that the working directory would be searched when executing convert2bed. But that was yesterday... maybe I needed more coffee and did something bone-headed. At any rate, it's chugging along. Thanks -Adam

@adamfreedman
Author

now that that's resolved, another thing happening:
*** glibc detected *** /n/home_rc/afreedman/software/bedops-2.4.19/applications/bed/conversion/bin/convert2bed: free(): invalid next size (normal): 0x00000000008e0dd0 ***
======= Backtrace: =========
[0x4244ca]
[0x4271ac]
[0x408751]
[0x41927b]
[0x4003a9]
======= Memory map: ========
00400000-004d7000 r-xp 00000000 00:13 11655828501 /n/home_rc/afreedman/software/bedops-2.4.19/applications/bed/conversion/bin/convert2bed
006d7000-006d9000 rw-p 000d7000 00:13 11655828501 /n/home_rc/afreedman/software/bedops-2.4.19/applications/bed/conversion/bin/convert2bed
006d9000-006e0000 rw-p 00000000 00:00 0
008cf000-008f2000 rw-p 00000000 00:00 0 [heap]
2b165de3b000-2b165de3c000 ---p 00000000 00:00 0
2b165de3c000-2b165e03c000 rw-p 00000000 00:00 0
2b165e03c000-2b165e03d000 ---p 00000000 00:00 0
2b165e03d000-2b165e23d000 rw-p 00000000 00:00 0
2b165e23d000-2b165e23e000 ---p 00000000 00:00 0
2b165e23e000-2b165e43e000 rw-p 00000000 00:00 0
2b165e43e000-2b165e43f000 ---p 00000000 00:00 0
2b165e43f000-2b165e63f000 rw-p 00000000 00:00 0
2b1660000000-2b1660121000 rw-p 00000000 00:00 0
2b1660121000-2b1664000000 ---p 00000000 00:00 0
2b1664000000-2b1664001000 rw-p 00000000 00:00 0
2b1668000000-2b1668024000 rw-p 00000000 00:00 0
2b1668024000-2b166c000000 ---p 00000000 00:00 0
7fff76ba0000-7fff76c24000 rw-p 00000000 00:00 0 [stack]
7fff76d68000-7fff76d69000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Aborted (core dumped)

@alexpreynolds
Collaborator

That's a different issue, perhaps a bug. If you can make your BAM file available somewhere, I can use that to figure out if an internal buffer needs resizing.

@alexpreynolds
Collaborator

(That is, accessible where I can download it and run it for local debugging.)

@adamfreedman
Author

Yep. Working on that. Seeing if I can make something accessible through our 2-factor authentication iron curtain :)

@alexpreynolds
Collaborator

If you can split off a chromosome's worth of the original BAM, and converting that smaller file results in a core dump, that would perhaps be enough to throw onto a public share in a Dropbox folder or similar, unless you need to keep it protected.

@adamfreedman
Author

It's just the Trinity contigs mapped back to a reference genome with GMAP, so it's only 23 MB. You can find it here:
https://www.dropbox.com/sh/kv334a3mbdsk2o6/AACinRf_LhgXZkYhG_2u_A2ya?dl=0

@alexpreynolds
Collaborator

Thanks for the report. I get a similar error.

$ bam2bed < sorted_mappedonly_mus_dendtritic_stranded_Trinity.GmapToGenome.bam > sorted_mappedonly_mus_dendtritic_stranded_Trinity.GmapToGenome.bam.bed
*** glibc detected *** convert2bed: free(): invalid next size (normal): 0x00000000012647a0 ***
======= Backtrace: =========
[0x41b02b]
[0x41f1f6]
[0x408870]
[0x412de0]
[0x4001b9]
======= Memory map: ========
00400000-004a6000 r-xp 00000000 00:2f 78998897                           /net/module/sw/bedops/2.4.19/bin/convert2bed
006a5000-006a7000 rw-p 000a5000 00:2f 78998897                           /net/module/sw/bedops/2.4.19/bin/convert2bed
006a7000-006ae000 rw-p 00000000 00:00 0 
01253000-01275000 rw-p 00000000 00:00 0                                  [heap]
7f8b3204a000-7f8b3204b000 ---p 00000000 00:00 0 
7f8b3204b000-7f8b3284b000 rw-p 00000000 00:00 0 
7f8b3284b000-7f8b3284c000 ---p 00000000 00:00 0 
7f8b3284c000-7f8b3304c000 rw-p 00000000 00:00 0 
7f8b3304c000-7f8b3304d000 ---p 00000000 00:00 0 
7f8b3304d000-7f8b3384d000 rw-p 00000000 00:00 0 
7f8b3384d000-7f8b3384e000 ---p 00000000 00:00 0 
7f8b3384e000-7f8b3404e000 rw-p 00000000 00:00 0 
7fff0460f000-7fff04692000 rw-p 00000000 00:00 0                          [stack]
7fff046d4000-7fff046d6000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
/net/module/sw/bedops/2.4.19/bin/bam2bed: line 148: 130565 Aborted                 (core dumped) ${cmd} ${options} - 0<&0

I'll take a look soon and follow up when I know more.

@alexpreynolds
Collaborator

It looks like I did not allocate enough memory to store large numbers of CIGAR operations. I bumped up the maximum number of operations:

https://github.com/bedops/bedops/blob/master/applications/bed/conversion/src/convert2bed.h#L69

This patch is in the v2p4p20 branch, if you want a quick fix. I'll probably push out a master release in the next day or two.
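
As an illustration of this class of bug, the sketch below parses a CIGAR string into a fixed-size array and checks capacity before every write, so an oversized CIGAR produces an error code instead of corrupting the heap. It is not the convert2bed code; `cigar_op`, `MAX_CIGAR_OPS`, and `parse_cigar_fixed` are hypothetical names, and the limit is set very low on purpose.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_CIGAR_OPS 4 /* hypothetical limit, deliberately tiny */

typedef struct {
    unsigned int length; /* run length, e.g. 76 in "76M" */
    char op;             /* operation code, e.g. 'M', 'I', 'D', 'N', 'S' */
} cigar_op;

/* Parse `cigar` into a fixed-capacity array.
 * Returns the number of operations parsed, -1 if the string is malformed,
 * or -2 if it holds more than MAX_CIGAR_OPS operations.
 * Simplification: run lengths are not checked for integer overflow. */
int parse_cigar_fixed(const char *cigar, cigar_op ops[MAX_CIGAR_OPS])
{
    int count = 0;
    const char *p = cigar;

    while (*p != '\0') {
        if (!isdigit((unsigned char)*p)) {
            return -1; /* every operation must start with a run length */
        }
        unsigned int len = 0;
        while (isdigit((unsigned char)*p)) {
            len = len * 10 + (unsigned int)(*p - '0');
            p++;
        }
        if (*p == '\0' || strchr("MIDNSHP=X", *p) == NULL) {
            return -1; /* missing or unknown operation code */
        }
        if (count == MAX_CIGAR_OPS) {
            return -2; /* refuse to write past the end of the array */
        }
        ops[count].length = len;
        ops[count].op = *p;
        count++;
        p++;
    }
    return count;
}
```

With this limit, `"10M2I5M"` parses to three operations, while `"1M1I1M1D1M"` returns -2. Raising the constant moves the failure point but does not remove it, which is why very long reads can still hit the limit.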

@dvanic

dvanic commented Oct 17, 2016

Hi Alex! I'm afraid I'm also having the latter issue with the latest release, bedops_2.4.20.v2 (albeit looking at some 3rd generation mapping data, so my reads and hence CIGARs are looong: over 1.5 million nt in raw read length). Is there any possibility for such monster CIGARs to be handled? Thanks in advance!

@alexpreynolds
Collaborator

Sorry about this. I will need to look at doing something a bit more dynamic with this, it seems, so I'll add this to the to-do pile. In the meantime, if you have a chromosome of your BAM file you can post somewhere for me to download, that would be very useful for testing.
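
One common way to make such storage "more dynamic" is a growable array that doubles its capacity with `realloc` whenever it fills up. The following is a minimal sketch under that assumption, not the approach actually taken in BEDOPS; `cigar_list`, `cigar_list_push`, and `cigar_list_free` are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    unsigned int length; /* run length of the operation */
    char op;             /* operation code, e.g. 'M' or 'I' */
} cigar_op;

typedef struct {
    cigar_op *ops;   /* heap-allocated storage, NULL while empty */
    size_t size;     /* number of operations stored */
    size_t capacity; /* number of operations the storage can hold */
} cigar_list;

/* Append one operation, doubling the storage when it is full.
 * Returns 1 on success, 0 if memory could not be allocated
 * (in which case the existing contents are left untouched). */
int cigar_list_push(cigar_list *list, unsigned int length, char op)
{
    if (list->size == list->capacity) {
        size_t new_capacity = list->capacity ? list->capacity * 2 : 8;
        cigar_op *grown = realloc(list->ops, new_capacity * sizeof *grown);
        if (grown == NULL) {
            return 0;
        }
        list->ops = grown;
        list->capacity = new_capacity;
    }
    list->ops[list->size].length = length;
    list->ops[list->size].op = op;
    list->size++;
    return 1;
}

/* Release the storage and reset the list to its empty state. */
void cigar_list_free(cigar_list *list)
{
    free(list->ops);
    list->ops = NULL;
    list->size = 0;
    list->capacity = 0;
}
```

A list starts out zero-initialized (`cigar_list list = {0};`). Because capacity doubles, appending n operations triggers only about log2(n) reallocations, so reads with very many CIGAR operations stay cheap to store.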

@alexpreynolds reopened this Oct 17, 2016
@alexpreynolds changed the title from "convert2bed can't find sort-bed binary" to "Support for an arbitrary number of CIGAR ops (was: convert2bed can't find sort-bed binary)" Oct 17, 2016
@alexpreynolds
Collaborator

Investigating https://bitbucket.org/genomeinformatics/simlord/ to simulate long sequencing reads.

@alexpreynolds
Collaborator

Added patch in commit 502726e that includes support for third-generation sequencer reads.

Testing was done against simulated reads from SimLoRD, up to 100 kbases in length.

If others have test inputs they wish to try, the v2p4p21 branch has the relevant code to compile.

@aechchiki

Hey @alexpreynolds, I was testing the code on your v2p4p21 branch to convert BAM files from long-read alignments (Nanopore and PacBio) and I got a core dump:

For nanopore:

/home/aechchik/software/bedops/bin/bam2bed < gmap_r7.bam > gmap_r7_bedops.bed
*** Error in `convert2bed': double free or corruption (!prev): 0x000000000088fe10 ***
======= Backtrace: =========
[0x425ab3]
[0x42c8a8]
[0x431487]
[0x4019d3]
[0x41ac73]
[0x40279c]
======= Memory map: ========
00400000-004ef000 r-xp 00000000 00:13 3992981018                         /Home/aechchik/bin/convert2bed
006ef000-006f2000 rw-p 000ef000 00:13 3992981018                         /Home/aechchik/bin/convert2bed
006f2000-006f8000 rw-p 00000000 00:00 0
0087e000-008a1000 rw-p 00000000 00:00 0                                  [heap]
7fbb28000000-7fbb28023000 rw-p 00000000 00:00 0
7fbb28023000-7fbb2c000000 ---p 00000000 00:00 0
7fbb30000000-7fbb30121000 rw-p 00000000 00:00 0
7fbb30121000-7fbb34000000 ---p 00000000 00:00 0
7fbb360b0000-7fbb360b1000 ---p 00000000 00:00 0
7fbb360b1000-7fbb36ab1000 rw-p 00000000 00:00 0
7fbb36ab1000-7fbb36ab2000 ---p 00000000 00:00 0
7fbb36ab2000-7fbb374b2000 rw-p 00000000 00:00 0
7fbb374b2000-7fbb374b3000 ---p 00000000 00:00 0
7fbb374b3000-7fbb37eb3000 rw-p 00000000 00:00 0
7fbb388b3000-7fbb388b4000 rw-p 00000000 00:00 0
7fff2acf4000-7fff2ad77000 rw-p 00000000 00:00 0                          [stack]
7fff2ad95000-7fff2ad96000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
/home/aechchik/software/bedops/bin/bam2bed: line 148: 32652 Aborted                 (core dumped) ${cmd} ${options} - 0<&0

For pacbio:

/home/aechchik/software/bedops/bin/bam2bed < gmap_isoseq_sort.bam > gmap_isoseq_bedops.bed
*** Error in `convert2bed': free(): invalid next size (normal): 0x0000000001fe3e10 ***
======= Backtrace: =========
[0x425ab3]
[0x42c8a8]
[0x431487]
[0x4019d3]
[0x41ac73]
[0x40279c]
======= Memory map: ========
00400000-004ef000 r-xp 00000000 00:19 3992981018                         /Home/aechchik/bin/convert2bed
006ef000-006f2000 rw-p 000ef000 00:19 3992981018                         /Home/aechchik/bin/convert2bed
006f2000-006f8000 rw-p 00000000 00:00 0
01fd2000-01ff5000 rw-p 00000000 00:00 0                                  [heap]
7f68ec000000-7f68ec023000 rw-p 00000000 00:00 0
7f68ec023000-7f68f0000000 ---p 00000000 00:00 0
7f68f4000000-7f68f4121000 rw-p 00000000 00:00 0
7f68f4121000-7f68f8000000 ---p 00000000 00:00 0
7f68f85bf000-7f68f85c0000 ---p 00000000 00:00 0
7f68f85c0000-7f68f8fc0000 rw-p 00000000 00:00 0
7f68f8fc0000-7f68f8fc1000 ---p 00000000 00:00 0
7f68f8fc1000-7f68f99c1000 rw-p 00000000 00:00 0
7f68f99c1000-7f68f99c2000 ---p 00000000 00:00 0
7f68f99c2000-7f68fa3c2000 rw-p 00000000 00:00 0
7f68fadc2000-7f68fadc3000 rw-p 00000000 00:00 0
7ffc5ee8a000-7ffc5ef0e000 rw-p 00000000 00:00 0                          [stack]
7ffc5ef37000-7ffc5ef38000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
/home/aechchik/software/bedops/bin/bam2bed: line 148: 17257 Aborted                 (core dumped) ${cmd} ${options} - 0<&0

The installation works perfectly on illumina data (101 nt per read), though.

@alexpreynolds
Collaborator

alexpreynolds commented May 4, 2017

Thanks for the report. I am using sample Nanopore BAM reads from here to help debug: https://github.com/nanopore-wgs-consortium/NA12878

I did a brief test on OS X 10.12.4 with convert2bed 2.4.26, using the chrY Nanopore alignment as input, and it looks like an intermediate buffer may need more memory allocated to it.

I'll do some more testing and a patch will probably get put into 2.4.27.

If you have PacBio reads on hand, that would be useful for further testing.

@alexpreynolds reopened this May 4, 2017
@aechchiki

Many thanks @alexpreynolds for the reply. Please e-mail me (a.echchiki@gmail.com) to let me know what kind of PacBio data you need, and in which format. I would be happy to help! Cheers

@alexpreynolds
Collaborator

Provisional fix in commit d45974c

@alexpreynolds
Collaborator

Commit 8f8a43c includes fixes for a new splice processing option and the option to reduce output conversion and sorting overhead.

I'm going to close this up, but please feel free to follow up if there are further problems.

@ma-diroma

ma-diroma commented Mar 6, 2018

Hi,

I got the same error using bedops v.2.4.30 on Nanopore data

Error in `convert2bed': free(): invalid next size (fast): 0x0000000001a36340 ***
======= Backtrace: =========
[0x42b74c]
[0x402f2c]
[0x41ed24]
[0x41ee4e]
[0x403a96]
======= Memory map: ========
00400000-004f2000 r-xp 00000000 08:14 1431533                            /home/madiroma/bin/bin/convert2bed-typical
006f1000-006f4000 rw-p 000f1000 08:14 1431533                            /home/madiroma/bin/bin/convert2bed-typical
006f4000-006fb000 rw-p 00000000 00:00 0 
01a2f000-01a52000 rw-p 00000000 00:00 0                                  [heap]
7f6200000000-7f62004e6000 rw-p 00000000 00:00 0 
7f62004e6000-7f6204000000 ---p 00000000 00:00 0 
7f6208000000-7f62084e6000 rw-p 00000000 00:00 0 
7f62084e6000-7f620c000000 ---p 00000000 00:00 0 
7f6210000000-7f62104e6000 rw-p 00000000 00:00 0 
7f62104e6000-7f6214000000 ---p 00000000 00:00 0 
7f6217959000-7f6217b3b000 rw-p 00000000 00:00 0 
7f6218000000-7f62184e6000 rw-p 00000000 00:00 0 
7f62184e6000-7f621c000000 ---p 00000000 00:00 0 
7f621c31e000-7f621c31f000 ---p 00000000 00:00 0 
7f621c31f000-7f621cd1f000 rw-p 00000000 00:00 0 
7f621cd1f000-7f621cd20000 ---p 00000000 00:00 0 
7f621cd20000-7f621d720000 rw-p 00000000 00:00 0 
7f621d720000-7f621d721000 ---p 00000000 00:00 0 
7f621d721000-7f621e121000 rw-p 00000000 00:00 0 
7f621eb21000-7f621eb22000 rw-p 00000000 00:00 0 
7fff7fc75000-7fff7fcb8000 rw-p 00000000 00:00 0                          [stack]
7fff7fccc000-7fff7fccd000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

I am trying to convert wig to bed format.
Any solution?
Thanks

@sjneph
Collaborator

sjneph commented Mar 6, 2018

The Nanopore data may require larger fields. If you have downloaded the binary version of BEDOPS, then:

> /path-to-bedops/switch-BEDOPS-binary-type --megarow

This will symlink in a build that can support far larger fields, and you can call convert2bed as normal.

@ma-diroma

I followed your suggestion

/home/madiroma/bin/bin/switch-BEDOPS-binary-type --megarow

but I got the same error

wig2bed < myfile.wig --multisplit=something > out.bed

*** Error in `convert2bed': free(): invalid next size (fast): 0x00000000024a6340 ***
======= Backtrace: =========
[0x42b73c]
[0x402f2c]
[0x41ed14]
[0x41ee3e]
[0x403a96]
======= Memory map: ========
00400000-004f2000 r-xp 00000000 08:14 1431486                            /home/madiroma/bin/bin/convert2bed-megarow
006f1000-006f4000 rw-p 000f1000 08:14 1431486                            /home/madiroma/bin/bin/convert2bed-megarow
006f4000-006fb000 rw-p 00000000 00:00 0 
0249f000-024c2000 rw-p 00000000 00:00 0                                  [heap]
7fa13b3fc000-7fa148000000 rw-p 00000000 00:00 0 
7fa148000000-7fa1484e6000 rw-p 00000000 00:00 0 
7fa1484e6000-7fa14c000000 ---p 00000000 00:00 0 
7fa14c000000-7fa14c4e6000 rw-p 00000000 00:00 0 
7fa14c4e6000-7fa150000000 ---p 00000000 00:00 0 
7fa150000000-7fa1504e6000 rw-p 00000000 00:00 0 
7fa1504e6000-7fa154000000 ---p 00000000 00:00 0 
7fa1575ff000-7fa157600000 ---p 00000000 00:00 0 
7fa157600000-7fa158000000 rw-p 00000000 00:00 0 
7fa158000000-7fa1584e6000 rw-p 00000000 00:00 0 
7fa1584e6000-7fa15c000000 ---p 00000000 00:00 0 
7fa15d56b000-7fa15d56c000 ---p 00000000 00:00 0 
7fa15d56c000-7fa15df6c000 rw-p 00000000 00:00 0 
7fa15df6c000-7fa15df6d000 ---p 00000000 00:00 0 
7fa15df6d000-7fa15e96d000 rw-p 00000000 00:00 0 
7fa15f36d000-7fa15f76f000 rw-p 00000000 00:00 0 
7ffd34969000-7ffd349ab000 rw-p 00000000 00:00 0                          [stack]
7ffd349bf000-7ffd349c0000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

@alexpreynolds
Collaborator

Hi Maria,

If possible, I'd like to test out what you're running locally. Would you be able to post your WIG file where I can download it?

Regards,
Alex

@ma-diroma

ma-diroma commented Mar 6, 2018 via email

@alexpreynolds
Collaborator

Thanks, I'll investigate as soon as I can and follow up here.

@alexpreynolds
Collaborator

Thanks for the test input. There are chromosome names in your sample WIG file that are longer than what I allot to the buffer for storing chromosome names. The wig2bed command fails with a segmentation fault when trying to fit the longer name into the too-small buffer.

I resized this buffer in v2.4.31 (commit 97bfc2b) to the suite-wide constant and this resolves the segmentation fault for your input, at least from tests on a Mac OS X host.
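
The failure mode described here, copying a name into a buffer that is too small for it, can be avoided by checking the length before copying. The sketch below is illustrative only and is not the patch in commit 97bfc2b; `CHROM_NAME_MAX` and `copy_chrom_name` are hypothetical names, and the value 32 mirrors the `C2B_MAX_CHROMOSOME_LENGTH 32` definition quoted elsewhere in this thread.

```c
#include <stddef.h>
#include <string.h>

#define CHROM_NAME_MAX 32 /* hypothetical capacity, including the trailing NUL */

/* Copy `src` into `dst` only if it fits, terminator included.
 * Returns 1 on success, or 0 if the name is too long, in which case
 * `dst` is set to the empty string and nothing is written past its end. */
int copy_chrom_name(char dst[CHROM_NAME_MAX], const char *src)
{
    size_t len = strlen(src);
    if (len >= CHROM_NAME_MAX) {
        dst[0] = '\0';
        return 0;
    }
    memcpy(dst, src, len + 1); /* len + 1 copies the terminating NUL too */
    return 1;
}
```

A caller can then report an over-long chromosome name as an input error, rather than overwriting adjacent heap memory and aborting later inside `free()`.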

After some more tests, I will push out prebuilt binaries of this new version of BEDOPS in the next day or two, but you could build v2.4.31 from source, if you need new binaries sooner; cf. http://bedops.readthedocs.io/en/latest/content/installation.html#via-source-code

Thanks for the report and please let me know if you have any questions.

@ma-diroma

Thanks for your help. I changed this line
#define C2B_MAX_CHROMOSOME_LENGTH 32
to
#define C2B_MAX_CHROMOSOME_LENGTH TOKEN_CHR_MAX_LENGTH
within bedops/applications/bed/conversion/src/convert2bed.h
Now I got a new error

*** glibc detected *** convert2bed: free(): invalid next size (fast): 0x0000000001f4bdf0 ***
======= Backtrace: =========
[0x4289fa]
[0x42b7cc]
[0x40d40b]
[0x41d93b]
[0x400425]
======= Memory map: ========
00400000-004df000 r-xp 00000000 08:14 3899030                            /home/madiroma/bin/bedops/bin/convert2bed
006df000-006e1000 rw-p 000df000 08:14 3899030                            /home/madiroma/bin/bedops/bin/convert2bed
006e1000-006e8000 rw-p 00000000 00:00 0 
01f45000-01f68000 rw-p 00000000 00:00 0                                  [heap]
7f9398000000-7f93984e6000 rw-p 00000000 00:00 0 
7f93984e6000-7f939c000000 ---p 00000000 00:00 0 
7f93a0000000-7f93a04e6000 rw-p 00000000 00:00 0 
7f93a04e6000-7f93a4000000 ---p 00000000 00:00 0 
7f93a4000000-7f93a44e6000 rw-p 00000000 00:00 0 
7f93a44e6000-7f93a8000000 ---p 00000000 00:00 0 
7f93a8000000-7f93a84e6000 rw-p 00000000 00:00 0 
7f93a84e6000-7f93ac000000 ---p 00000000 00:00 0 
7f93ac6bc000-7f93ac89e000 rw-p 00000000 00:00 0 
7f93acd63000-7f93acd64000 ---p 00000000 00:00 0 
7f93acd64000-7f93ad764000 rw-p 00000000 00:00 0 
7f93ad764000-7f93ad765000 ---p 00000000 00:00 0 
7f93ad765000-7f93ae165000 rw-p 00000000 00:00 0 
7f93ae165000-7f93ae166000 ---p 00000000 00:00 0 
7f93ae166000-7f93aeb66000 rw-p 00000000 00:00 0 
7f93af566000-7f93af567000 rw-p 00000000 00:00 0 
7ffe6c9f4000-7ffe6ca37000 rw-p 00000000 00:00 0                          [stack]
7ffe6cbc6000-7ffe6cbc7000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

@alexpreynolds
Collaborator

alexpreynolds commented Mar 7, 2018 via email

@ma-diroma

I followed your link 97bfc2b and then used git clone to download the package https://github.com/bedops/bedops/tree/97bfc2b22b74a1b7e7c1c96eed82b047e8fc1489, but the version installed is 2.4.30.

@bedops
Owner

bedops commented Mar 7, 2018 via email

@ma-diroma

ma-diroma commented Mar 7, 2018 via email

@alexpreynolds
Collaborator

You might try something like the following:

$ cd /tmp
$ git clone https://github.com/bedops/bedops.git
$ cd bedops
$ git checkout v2p4p31
$ make && make install
...
$ /tmp/bedops/bin/convert2bed --input=wig < myfile.wig > myfile.bed

The git checkout v2p4p31 step checks out the v2.4.31 branch in its current state, so that the subsequent make && make install step builds and installs v2.4.31 within a subfolder in /tmp.

After I have published an "official" version, you won't need to do any of this; however, these steps may be useful for conversion work until that is available.

@ma-diroma

Perfect! Thank you very much. Now it works.

Best wishes,
Maria Angela

@bedops
Owner

bedops commented Mar 7, 2018 via email
