Most of the time when I want to use subtractBed, the b file is very big (e.g. 2 GiB, 66 million lines) because it contains all known SNP positions.
The command below is then very slow, even though file a contains only 10000 positions, and sometimes it cannot run at all because there is not enough memory to load the b file:
subtractBed -a sample_snps.bed -b allknownsnps.bed
This is the error I get when I run it on a machine with 32GiB of RAM:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
When running the command with strace:
strace -o subtractBed.strace subtractBed -a sample_snps.bed -b allknownsnps.bed
This is the end of the log:
# Line number close to where the PC runs out of memory: 58469306
$ grep -n -m1 '^chr17'$'\t''67999172' allknownsnps.bed
58469306:chr17 67999172 67999173 T G PASS
# Total number of lines in the file: 66007044
$ wc -l allknownsnps.bed
66007044
It would be nice to use subtractBed like this:
subtractBed -loada -a sample_snps.bed -b allknownsnps.bed
Here -loada would load file a into memory and read file b from disk.
If file b is read line by line, subtractBed just needs to remove every entry of file a (held in memory) that is found in file b.
I know it is possible to mimic this behaviour with the following command, but it would be much easier if it were implemented in subtractBed directly:
intersectBed -wb -a allknownsnps.bed -b sample_snps.bed | subtractBed -a sample_snps.bed -b stdin
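To make the requested behaviour concrete, here is a rough Python sketch of the idea: keep file a in memory and stream file b one line at a time. It is only an illustration, not how subtractBed is implemented, and the names `parse_bed` and `subtract_streaming` are made up for this example. It assumes tab-separated BED lines with no header or track lines, and it drops an a entry entirely as soon as any b entry overlaps it, which for single-base SNP records gives the same result as subtracting the overlap.

```python
import bisect
from collections import defaultdict


def parse_bed(line):
    """Return (chrom, start, end) from one tab-separated BED line."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[1]), int(fields[2])


def subtract_streaming(a_lines, b_lines):
    """Return the entries of a that overlap no entry of b.

    a_lines is loaded completely into memory (it is the small file);
    b_lines can be any iterable, e.g. an open file, and is read line by line.
    """
    a_records = [line.rstrip("\n") for line in a_lines if line.strip()]

    # chrom -> list of (start, end, index into a_records), sorted by start
    by_chrom = defaultdict(list)
    for idx, rec in enumerate(a_records):
        chrom, start, end = parse_bed(rec)
        by_chrom[chrom].append((start, end, idx))

    starts = {}   # chrom -> sorted start coordinates, used for bisect
    max_len = {}  # chrom -> longest a entry, bounds how far back to look
    for chrom, feats in by_chrom.items():
        feats.sort()
        starts[chrom] = [f[0] for f in feats]
        max_len[chrom] = max(end - start for start, end, _ in feats)

    removed = set()
    for line in b_lines:  # only one b line is held in memory at a time
        if not line.strip():
            continue
        chrom, b_start, b_end = parse_bed(line)
        if chrom not in starts:
            continue
        # Half-open intervals overlap when a_start < b_end and a_end > b_start.
        lo = bisect.bisect_left(starts[chrom], b_start - max_len[chrom] + 1)
        hi = bisect.bisect_left(starts[chrom], b_end)
        for _a_start, a_end, idx in by_chrom[chrom][lo:hi]:
            if a_end > b_start:
                removed.add(idx)

    return [rec for i, rec in enumerate(a_records) if i not in removed]
```

With this approach memory use depends on the size of file a (10000 positions in my case) rather than on the 66 million lines of file b, for example:

```python
with open("sample_snps.bed") as a, open("allknownsnps.bed") as b:
    for record in subtract_streaming(a, b):
        print(record)
```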