terminate called after throwing an instance of 'std::invalid_argument' #917

Closed
Shicheng-Guo opened this issue May 24, 2021 · 7 comments

Comments

@Shicheng-Guo

Any idea what is causing this issue?

terminate called after throwing an instance of 'std::invalid_argument'

(base) [sguo2@login02 regulomedb]$ bedtools sort -i hg38_RegulomeDB.bed > hg38_RegulomeDB.sort.bed
terminate called after throwing an instance of 'std::invalid_argument'
  what():  stoll
Aborted (core dumped)

(base) [sguo2@login02 regulomedb]$ bedtools
bedtools is a powerful toolset for genome arithmetic.

Version: v2.30.0
About: developed in the quinlanlab.org and by many contributors worldwide.
Docs: http://bedtools.readthedocs.io/
Code: https://github.com/arq5x/bedtools2
Mail: https://groups.google.com/forum/#!forum/bedtools-discuss

Usage: bedtools <subcommand> [options]

The bedtools sub-commands include:

@arq5x
Owner

arq5x commented May 24, 2021

Could you share the first few lines of the input file?

@Shicheng-Guo
Author

Thanks, Aaron. Here are the top 10 lines:

(base) [sguo2@login01 regulomedb]$ head hg38_RegulomeDB.bed
chr1    11011   11012   rs544419019     False   False   True    True    False   False   False   6
chr1    13109   13110   rs540538026     False   False   False   False   False   False   False   7
chr1    13115   13116   rs62635286      False   False   False   False   False   False   False   7
chr1    13117   13118   rs62028691      False   False   False   False   False   False   False   7
chr1    13272   13273   rs531730856     False   False   False   False   False   False   False   7
chr1    13667   13668   rs2691328       False   False   True    False   False   False   False   6
chr1    14463   14464   rs546169444     False   False   True    False   False   False   False   6
chr1    14929   14930   rs6682385       False   False   False   False   False   False   False   7
chr1    14932   14933   rs199856693     False   False   False   False   False   False   False   7
chr1    15773   15774   rs374029747     False   False   False   False   False   False   False   7

@arq5x
Owner

arq5x commented May 24, 2021

The error is raised by the "False" and "True" values in the file. Because the file has exactly 12 columns, bedtools interprets it as BED12 format, where some columns strictly expect a number, so False and True are invalid there. I would recommend replacing True with 1 and False with 0:

sed -e 's/False/0/g' -e 's/True/1/g' hg38_RegulomeDB.bed > hg38_RegulomeDB.new.bed
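
Note that the sed command above replaces every occurrence of the substrings "False" and "True", including any that happen to sit inside another field such as the name in column 4. As a more conservative sketch (not part of the maintainer's answer), the awk-based helper below rewrites only fields whose entire value is True or False. The function name `bools_to_ints` is hypothetical, and the input is assumed to be tab-separated.

```shell
# Hypothetical helper: convert fields that are exactly "True"/"False" to 1/0.
# Assumes tab-separated input; fields that merely contain those words are left alone.
bools_to_ints() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        for (i = 1; i <= NF; i++) {
            if ($i == "True") $i = 1
            else if ($i == "False") $i = 0
        }
        print
    }' "$1"
}
```

It would be used the same way as the sed command, for example `bools_to_ints hg38_RegulomeDB.bed > hg38_RegulomeDB.new.bed`.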

@Shicheng-Guo
Author

Thanks, Aaron! The solution works perfectly.

@arq5x arq5x closed this as completed Jun 3, 2021
tgstoecker added a commit to ncbench/ncbench-workflow that referenced this issue Mar 14, 2023
@hepcat72

I'm running into this same issue, though I'm using the output of a bedtools intersect with -loj on 2 bed6 files... I didn't get this error on a different dataset and I'm not sure why... So my input looks like this:

chr1	259023	259024	ENST00000450734.1:ENSG00000228463	.	-	chr1	259023	259024	ENST00000450734.1:ENSG00000228463	.	-
chr1	264732	264733	ENST00000442116.1:ENSG00000228463	.	-	chr1	264732	264733	ENST00000442116.1:ENSG00000228463	.	-
chr1	266854	266855	ENST00000669836.1:ENSG00000286448	.	+	chr1	266854	266855	ENST00000669836.1:ENSG00000286448	.	+
chr1	268815	268816	ENST00000448958.2:ENSG00000228463,ENST00000634344.2:ENSG00000228463	.	-	chr1	268815	268816	ENST00000448958.2:ENSG00000228463,ENST00000634344.2:ENSG00000228463	.	-
chr1	297501	297502	ENST00000424587.7:ENSG00000228463	.	-	chr1	297501	297502	ENST00000424587.7:ENSG00000228463	.	-
chr1	348365	348366	ENST00000458203.2:RPL23AP24	.	-	chr1	348365	348366	ENST00000458203.2:RPL23AP24	.	-
chr1	358856	358857	ENST00000450983.1:ENSG00000236601	.	+	chr1	358856	358857	ENST00000450983.1:ENSG00000236601	.	+
chr1	358871	358872	ENST00000412666.1:ENSG00000236601	.	+	chr1	358871	358872	ENST00000412666.1:ENSG00000236601	.	+
chr1	359680	359681	ENST00000441866.2:ENSG00000228463	.	-	chr1	359680	359681	ENST00000441866.2:ENSG00000228463	.	-
chr1	360056	360057	ENST00000635159.1:ENSG00000236601	.	+	chr1	360056	360057	ENST00000635159.1:ENSG00000236601	.	+

And the series of commands is (in a snakemake rule):

        # Duplicate the columns to retain the actual tss coordinates
        intersectBed -a {input.tss:q} -b {input.tss:q} -loj 2> {log.dupe:q} | \
        # Expand the size of the tss position by the promoter distance
        slopBed -b {params.promoter_distance:q} -g {input.lengths:q} -i stdin 2> {log.slop:q} | \
        # Find all tss records that are within promoter_distance of each peak
        intersectBed -a {input.peaks:q} -b stdin -loj 2> {log.get_tss:q} | \
        # Calculate the distance between each peak and its tss records
        awk -v promoter_distance={params.promoter_distance:q} -F $'\\t' -f {input.awk_tss_distance_script:q} 2> {log.awk:q} | \
        # Group all tss distances together
        groupBy -grp 1-5 -c 6,7,8,9 -o collapse 2> {log.group_tss:q} 1> {output:q}

It's the slopBed command that issues the error.

@hepcat72

Well, it looks like if I append a 13th column, it works...

intersectBed -a {input.tss:q} -b {input.tss:q} -loj 2> {log.dupe:q} | \
perl -e 'while(<>){chomp;print;print("\t13thcolumn\n")}' | \
slopBed -b {params.promoter_distance:q} -g {input.lengths:q} -i stdin 2> {log.slop:q} | \
...
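
The 13th-column workaround fits the explanation given earlier in the thread: only input with exactly 12 columns is read as BED12. A quick way to check whether a file falls into that case is to tally its column counts. The sketch below assumes tab-separated input, and the function name `column_counts` is hypothetical.

```shell
# Hypothetical helper: list each distinct tab-separated column count in a file.
# Output format: <number of columns><TAB><number of lines with that many columns>
column_counts() {
    awk -F '\t' '
        { count[NF]++ }
        END { for (n in count) printf "%d\t%d\n", n, count[n] }
    ' "$1" | sort -n
}
```

If every line reports 12 columns, the file is a candidate for being parsed as BED12.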

Still, I'm not sure what the issue is. I just ran my test data by manually executing this same series of commands, without adding the 13th column, and that did not produce the error. The output of intersect looks exactly the same to me: the data is clearly different, but the types of the data are the same. I would just like it to make sense. Is there a way to know which row or value is causing the error? Could it have to do with the size of the data?

@hepcat72

Oh wait. The chromosome names are different. The test data is just "19" (smallest chromosome) and the full data is "chr", e.g. "chr19". I bet that's it.
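
On the question of which row or value triggers the error: bedtools does not report it in the message shown in this thread, but a pre-check can narrow it down. The sketch below lists fields that are not plain integers in a chosen set of columns of any 12-column line. Which columns bedtools actually converts with `stoll` is not stated in this thread. The default of columns 7 and 8 (thickStart and thickEnd in the UCSC BED12 layout) is an assumption, and the function name `find_non_integer_fields` is hypothetical.

```shell
# Hypothetical helper: report non-integer values in selected columns of 12-column lines.
# $1: tab-separated input file
# $2: comma-separated column numbers to check (assumed default: 7,8)
find_non_integer_fields() {
    awk -F '\t' -v cols="${2:-7,8}" '
        BEGIN { n = split(cols, want, ",") }
        NF == 12 {
            for (i = 1; i <= n; i++) {
                c = want[i] + 0
                if ($c !~ /^-?[0-9]+$/)
                    printf "line %d, column %d: %s\n", NR, c, $c
            }
        }
    ' "$1"
}
```

In the `-loj` output shown above, column 7 holds the chromosome name of the second record. A value such as "chr1" would be flagged by this check, while "19" would pass as an integer. That is consistent with the chromosome-name observation, though it does not confirm the cause.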


3 participants