
Segfault from bedtools in prepare_results.R #40

Closed
davidaknowles opened this issue Sep 19, 2017 · 7 comments

@davidaknowles
Owner

I'm trying to run prepare_results.R on the YRI vs EU example in /worked_out_example. At the line

all.introns_intersect <- fread(all.introns.cmd)

bedtools is segfaulting:

***** WARNING: File ../leafviz/annotation_codes/gencode_hg19/gencode_hg19_all_introns.bed.gz has inconsistent naming convention for record:
GL000220.1	154617	154624	"CH507-513H4.5"	"ENSG00000281383.1_2"	-	"ENST00000629969.1_1"	1	"lincRNA"	"OTTHUMG00000189717.1_2"

sh: line 1: 17448 Segmentation fault: 11  ( bedtools intersect -a for_leafviz/all_junctions.bed -b ../leafviz/annotation_codes/gencode_hg19/gencode_hg19_all_introns.bed.gz -wa -wb -loj -f 1 ) 

I think the "has inconsistent naming convention for record" warning is to do with checking for, e.g., chr1 vs 1 chromosome naming, but nothing like that is going on here. Maybe bedtools doesn't like the GL000220.1 chromosome name for some reason? This is my all_junctions.bed:
https://www.dropbox.com/s/43eq0zdtv27owh7/all_junctions.bed?dl=0
and gencode_hg19_all_introns.bed.gz is here:
https://www.dropbox.com/s/pt1pbh5r40pjjs8/gencode_hg19_all_introns.bed.gz?dl=0

@jackhump
Collaborator

This is pretty odd. Every time I've run prepare_results.R with the hg19 bed files I get two warnings about inconsistent records, but I've never had a segfault. Have you updated to the latest version of bedtools?

If these weird chromosome names are the cause, maybe we could remove them when creating the bed files?

@davidaknowles
Owner Author

That warning may just be a coincidence, not sure. I'll try updating and see if it helps!

@davidaknowles
Owner Author

Hmm, so upgrading bedtools from v2.25.0 to v2.26.0 didn't fix this.

@davidaknowles
Owner Author

I removed the "weird" chromosome names from gencode_hg19_all_introns.bed.gz. I no longer get the warning, but I still get the segfault, so I don't think the naming was actually the root of the problem. @jackhump, can you try running this on your machine and see if you get the same problem?
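A minimal sketch of that filtering step follows. It is written in Python rather than the R/shell used in the project. It assumes that "weird" means anything other than chromosomes 1-22, X, Y and M/MT, with or without a `chr` prefix. The names `CANONICAL`, `is_canonical` and `filter_bed_gz` are hypothetical and are not part of leafviz.

```python
import gzip
import re
from typing import Tuple

# Assumption: "canonical" chromosomes are 1-22, X, Y and M/MT, with or without
# a "chr" prefix; anything else (e.g. GL000220.1, chr1_gl000191_random) is dropped.
CANONICAL = re.compile(r"^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y|M|MT)$")


def is_canonical(bed_line: str) -> bool:
    """Return True if the first (chromosome) column of a BED line is canonical."""
    chrom = bed_line.split("\t", 1)[0]
    return CANONICAL.match(chrom) is not None


def filter_bed_gz(in_path: str, out_path: str) -> Tuple[int, int]:
    """Copy a gzipped BED file, keeping only canonical-chromosome records.

    Returns (records kept, records dropped).
    """
    kept = dropped = 0
    with gzip.open(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        for line in fin:
            if is_canonical(line):
                fout.write(line)
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

For example, `filter_bed_gz("gencode_hg19_all_introns.bed.gz", "gencode_hg19_all_introns.canonical.bed.gz")` would write a filtered copy; the output filename is only illustrative. As the comment above notes, removing these records made the warning go away but did not stop the segfault.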

@davidaknowles
Owner Author

I tried with a new annotation_code generated with wrangle_annotation.sh and still have the same issue.

@davidaknowles
Owner Author

So I'm starting to think this is a bug in bedtools; I've raised an issue here: arq5x/bedtools2#576

@davidaknowles
Owner Author

I've reworked prepare_results.R to use dplyr joins instead of bedtools now so we avoid these issues. I've also changed the CLI to be a bit simpler to document.
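The sketch below shows what a join-based replacement has to reproduce. It implements the semantics of the original `bedtools intersect -wa -wb -loj -f 1` call in Python rather than dplyr:

* Each junction is paired with every intron on the same chromosome that fully contains it, since `-f 1` requires 100% of the junction to be overlapped.
* Junctions with no such intron are kept once with an empty match (`-loj`).
* Strand is ignored because `-s` was not given.

`Interval` and `contained_left_join` are hypothetical names. This is not the code that went into prepare_results.R, and the actual dplyr rewrite may join on different keys (for example exact intron coordinates).

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive
    name: str = "."


def contained_left_join(
    junctions: List[Interval], introns: List[Interval]
) -> List[Tuple[Interval, Optional[Interval]]]:
    """Mimic `bedtools intersect -a junctions -b introns -wa -wb -loj -f 1`.

    Each junction is paired with every intron on the same chromosome that
    contains it entirely (-f 1); junctions with no such intron are kept once,
    paired with None (-loj). Strand is ignored, as without -s.
    """
    # Group introns by chromosome so each junction is only compared
    # against introns on its own chromosome.
    by_chrom: Dict[str, List[Interval]] = {}
    for intron in introns:
        by_chrom.setdefault(intron.chrom, []).append(intron)

    result: List[Tuple[Interval, Optional[Interval]]] = []
    for j in junctions:
        hits = [
            i for i in by_chrom.get(j.chrom, [])
            if i.start <= j.start and j.end <= i.end
        ]
        if hits:
            result.extend((j, i) for i in hits)
        else:
            result.append((j, None))
    return result
```

The per-chromosome scan is O(junctions × introns). That is fine for illustration, but for genome-wide files a sorted sweep or an interval tree would be the practical choice.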
