Add filterBySequenceDictionary to GenomicRDD #1575
referenced this issue
Jun 21, 2017
So, I didn't quite understand what was going on back on #1557 (comment), and I still don't quite understand what this method would do. Is it "filter items mapped to contigs present in a sequence dictionary"? If so, we should be able to mash
I was mistaken earlier in that I thought the sequence dictionary was used in region joins. The question is really what to do when reading in features:
When reading in a BED file:
When reading in a BED file with a genome file:
The thinking behind this suggestion was to load records without regard for sequence dictionary and then filter after the fact, if desired.
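As a rough sketch of the "load first, filter afterwards" idea: the snippet below is not ADAM's API. `Feature`, `filter_by_sequence_dictionary`, and the plain dict standing in for a sequence dictionary are all hypothetical names used only for illustration. It assumes "filter" means keeping the records whose contig name appears in the dictionary.

```python
from typing import Dict, Iterable, List, NamedTuple


class Feature(NamedTuple):
    """Hypothetical stand-in for a feature record loaded from a BED file."""
    contig: str
    start: int
    end: int


def filter_by_sequence_dictionary(
    features: Iterable[Feature],
    sequence_lengths: Dict[str, int],
) -> List[Feature]:
    """Keep only features whose contig is present in the dictionary.

    sequence_lengths is an assumed contig name -> length mapping, e.g. what
    a genome file accompanying a BED file would provide.
    """
    return [f for f in features if f.contig in sequence_lengths]


# Records are loaded without regard for any sequence dictionary...
features = [
    Feature("chr1", 100, 200),
    Feature("chrUn", 5, 50),
    Feature("chr2", 10, 20),
]

# ...and filtered after the fact, only if the caller asks for it.
sequence_lengths = {"chr1": 248956422, "chr2": 242193529}
kept = filter_by_sequence_dictionary(features, sequence_lengths)
```

Under this reading, filtering against an empty dictionary would drop every record, so the filter would have to stay opt-in rather than run at load time.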
It used to be.
WRT BED, my feeling is: use an empty sequence dictionary in the first case, and load records without regard for the sequence dictionary in the second case. We don't currently validate reads or variants/genotypes against a sequence dictionary, so why add checking specific to feature file formats?