adamGetReferenceString doesn't reduce pairs correctly #967
Throws the error
This issue manifests when reducing multiple pairs of NucleotideContigFragments. Spark's reduce operation requires the function to be commutative and associative over the data, which isn't always the case for the function applied here. For example, if we have 10 fragments, each covering a 10 kb ReferenceRegion, Spark will sometimes end up reducing nonadjacent regions.
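To make the failure mode concrete, here is a minimal sketch in plain Scala with no Spark dependency. `Frag` and `merge` are hypothetical stand-ins, not ADAM's actual types or merge function; the only assumption is that merging is defined solely for adjacent regions. It shows that the same three fragments merge fine in one grouping and fail in another, which is the kind of grouping a distributed reduce may pick.

```scala
import scala.util.Try

object AdjacentMergeDemo {
  // Hypothetical stand-in for a fragment: half-open interval [start, end) plus its bases.
  final case class Frag(start: Long, end: Long, bases: String)

  // Merge that is only defined for adjacent fragments, given in either order.
  def merge(a: Frag, b: Frag): Frag = {
    val (l, r) = if (a.start <= b.start) (a, b) else (b, a)
    require(l.end == r.start, s"regions not adjacent: $l, $r")
    Frag(l.start, r.end, l.bases + r.bases)
  }

  val f0 = Frag(0L, 4L, "ACGT")
  val f1 = Frag(4L, 8L, "TTGA")
  val f2 = Frag(8L, 12L, "CCAT")

  // Pairing neighbours first succeeds.
  val inOrder: Frag = merge(merge(f0, f1), f2)

  // Pairing f0 with f2 first hits the adjacency check and fails.
  val outOfOrder: Try[Frag] = Try(merge(merge(f0, f2), f1))
}
```

Here `inOrder` is the fully merged fragment, while `outOfOrder` is a `Failure` wrapping the `IllegalArgumentException` thrown by `require`.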
To solve this issue, I implemented a solution that first collects the data and then applies Scala's Array reduceLeft method, so that the regions being merged are always adjacent (shown below in NucleotideContigFragmentRDDFunctions). I'll submit a PR for it soon.
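As a rough illustration of the collect-then-reduceLeft idea (this is not the actual change to NucleotideContigFragmentRDDFunctions), the sketch below uses a plain `Array` in place of the result of `rdd.collect()`. `Frag` and `mergeAdjacent` are hypothetical names, and sorting by start position before reducing is an assumption of this sketch, not something stated above.

```scala
object ReduceLeftDemo {
  // Hypothetical stand-in for a fragment: half-open interval [start, end) plus its bases.
  final case class Frag(start: Long, end: Long, bases: String)

  // Merge that expects `left` to end exactly where `right` begins.
  def mergeAdjacent(left: Frag, right: Frag): Frag = {
    require(left.end == right.start, s"regions not adjacent: $left, $right")
    Frag(left.start, right.end, left.bases + right.bases)
  }

  // Stand-in for the collected data; the order is deliberately shuffled.
  val collected: Array[Frag] = Array(
    Frag(8L, 12L, "CCAT"),
    Frag(0L, 4L, "ACGT"),
    Frag(4L, 8L, "TTGA")
  )

  // Assumption: sort by start so that reduceLeft, which folds strictly
  // left to right, only ever merges the running result with its neighbour.
  val merged: Frag = collected.sortBy(_.start).reduceLeft(mergeAdjacent)
}
```

Because `reduceLeft` walks the array in order, every merge step sees adjacent regions. The trade-off is that collecting brings all fragments for the region onto the driver.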