Add filterBySequenceDictionary to GenomicRDD #1575
So, I didn't quite understand what was going on back on #1557 (comment), and I still don't quite understand what this method would do. Is it "filter items mapped to contigs present in a sequence dictionary"? If so, we should be able to mash
I was mistaken earlier in that I thought the sequence dictionary was used in region joins. The question is really what to do when reading in features:
When reading in a BED file:
When reading in a BED file with a genome file:
The thinking behind this suggestion was to load records without regard for the sequence dictionary and then filter after the fact, if desired.
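As a rough illustration of the "load first, filter after the fact" semantics discussed here (keep only items mapped to contigs present in a sequence dictionary), here is a minimal Python sketch. It is not ADAM's API: the `Feature` class, the `filter_by_sequence_dictionary` function, and modeling the sequence dictionary as a plain `dict` of contig name to length are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Feature:
    # Hypothetical feature record: contig name plus 0-based, half-open interval.
    contig: str
    start: int
    end: int


def filter_by_sequence_dictionary(
    features: Iterable[Feature], seq_dict: Dict[str, int]
) -> List[Feature]:
    # Keep only features whose contig name appears in the dictionary.
    # Contig lengths are not checked here; only membership by name.
    return [f for f in features if f.contig in seq_dict]


# Records are loaded without regard for any dictionary...
features = [
    Feature("chr1", 0, 100),
    Feature("chrUn_gl000220", 5, 50),
    Feature("chr2", 10, 20),
]
# ...and filtered afterward, only if the caller asks for it.
seq_dict = {"chr1": 248956422, "chr2": 242193529}
kept = filter_by_sequence_dictionary(features, seq_dict)
```

Keeping the filter as a separate, opt-in step means loading never fails or silently drops records just because a contig is missing from the dictionary.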
It used to be.
WRT BED, my feeling is: an empty sequence dictionary in the first case, and loading records without regard for the sequence dictionary in the second case. We don't currently validate reads or variants/genotypes against a sequence dictionary, so why add checking specific to feature file formats?
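A minimal Python sketch of the two BED cases proposed above, assuming a bedtools-style genome file (one `name<TAB>length` pair per line). The function names `parse_genome_file` and `load_bed` are hypothetical, and features are simplified to `(contig, start, end)` tuples. Without a genome file the dictionary is empty. With one, the dictionary comes from the genome file, but every BED record is still kept, even those on contigs the genome file does not list.

```python
from typing import Dict, Iterable, List, Optional, Tuple

BedRecord = Tuple[str, int, int]  # (contig, start, end), 0-based half-open


def parse_genome_file(lines: Iterable[str]) -> Dict[str, int]:
    # Assumed format: tab-separated contig name and length per line.
    seq_dict: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, length = line.split("\t")[:2]
        seq_dict[name] = int(length)
    return seq_dict


def load_bed(
    bed_lines: Iterable[str], genome_lines: Optional[Iterable[str]] = None
) -> Tuple[List[BedRecord], Dict[str, int]]:
    features: List[BedRecord] = []
    for line in bed_lines:
        # Skip blank lines and UCSC header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        features.append((fields[0], int(fields[1]), int(fields[2])))
    # First case: no genome file, so the sequence dictionary is empty.
    # Second case: dictionary from the genome file; records are NOT checked against it.
    seq_dict = parse_genome_file(genome_lines) if genome_lines is not None else {}
    return features, seq_dict
```

Any validation against the dictionary would then be the caller's choice, applied afterward, which matches how reads and variants/genotypes are handled.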