Can't merge BAM files containing the same sample #1663
We outlawed this in our glob loader code. However, it comes up if you want to do something like load BAM files that are split out by sample. I had a benchmarking workflow where I had split a single aligned dataset into one file per chromosome for debugging, and got this on load:
I may have mentioned this before, but I ran into the other case, where I have BAM files for 450 samples that all re-use the same read group names. I didn't come up with a good way to handle this. While including all reads from all samples in the same RDD perhaps doesn't make a lot of sense, including reads overlapping a region of interest across all samples certainly does.
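To make the two cases concrete, here is a minimal sketch of one possible way to reconcile read group names when merging. It is not ADAM's API: `merge_read_groups`, the file labels, and the plain dicts standing in for the `@RG` header entries of each BAM file are all hypothetical. It assumes a read group ID that maps to the same sample in every file can be merged as-is (the per-chromosome split), while an ID reused by different samples gets qualified with a per-file label (the 450-sample case).

```python
from collections import defaultdict


def merge_read_groups(per_file_groups):
    """Merge read group dictionaries from several files.

    per_file_groups: {file_label: {read_group_id: sample_name}}
    (hypothetical stand-in for the @RG header lines of each BAM file).

    Returns (merged, renames), where merged maps the final read group ID to
    its sample and renames maps (file_label, old_id) to the final ID.
    """
    # First pass: collect every sample that each read group ID refers to.
    samples_by_id = defaultdict(set)
    for groups in per_file_groups.values():
        for rg_id, sample in groups.items():
            samples_by_id[rg_id].add(sample)

    merged = {}
    renames = {}
    # Second pass: keep unambiguous IDs, qualify the ambiguous ones.
    for label, groups in per_file_groups.items():
        for rg_id, sample in groups.items():
            ambiguous = len(samples_by_id[rg_id]) > 1
            new_id = f"{label}:{rg_id}" if ambiguous else rg_id
            merged[new_id] = sample
            renames[(label, rg_id)] = new_id
    return merged, renames


# Same sample split per chromosome: the shared read group is kept once.
per_chrom = {"chr1.bam": {"rg1": "NA12878"}, "chr2.bam": {"rg1": "NA12878"}}
print(merge_read_groups(per_chrom)[0])

# Different samples re-using a read group name: IDs are qualified per file.
cohort = {"a.bam": {"rg1": "S1"}, "b.bam": {"rg1": "S2"}}
print(merge_read_groups(cohort)[0])
```

The `renames` map would then be applied to each read's read group reference on load, so reads stay attached to the right sample after the merge. Whether qualifying by file label is the right policy is an open design question; it is only one option.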