Can't merge BAM files containing the same sample #1663
We outlawed this in our glob loader code. However, it comes up if you want to load BAM files that are split out by sample. In a benchmarking workflow, I had split a single aligned dataset into one file per chromosome for debugging, and got this on load:
I may have mentioned this before, but I ran into the other case, where I have BAM files for 450 samples that all re-use the same read group names. I didn't come up with a good way to handle this. While including all reads from all samples in the same RDD perhaps doesn't make a lot of sense, including reads overlapping a region of interest across all samples certainly does.
Resolves #1663. When unioning two read groups together, calls distinct after appending the two RecordGroupDictionaries.
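To illustrate the append-then-distinct idea described in the fix, here is a minimal Python sketch. It is not ADAM's code or API: `ReadGroup` and `merge_read_groups` are hypothetical names, and the sample and read group values are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadGroup:
    """Hypothetical stand-in for a read group record (not ADAM's class)."""
    sample: str
    name: str


def merge_read_groups(left, right):
    """Append two read group lists, then drop exact duplicates.

    dict.fromkeys preserves first-seen order, so the result is the
    order-preserving "distinct" of the concatenated lists.
    """
    return list(dict.fromkeys(left + right))


# Two per-chromosome files split from one dataset carry the same read group.
chr1_groups = [ReadGroup(sample="sampleA", name="rg1")]
chr2_groups = [ReadGroup(sample="sampleA", name="rg1")]
merged = merge_read_groups(chr1_groups, chr2_groups)

# Different samples that re-use the same read group name are NOT collapsed,
# because the records differ in their sample field.
mixed = merge_read_groups(
    [ReadGroup(sample="sampleA", name="rg1")],
    [ReadGroup(sample="sampleB", name="rg1")],
)
```

In this sketch, `merged` holds a single read group, which corresponds to the per-chromosome case from the original report. `mixed` still holds two entries that share a name, so deduplication alone would not disambiguate the 450-sample case mentioned above, where distinct samples re-use the same read group names.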