Do reference partitioners restrict a partition to contain keys from a single contig? #573

Closed
fnothaft opened this Issue Feb 8, 2015 · 1 comment

@fnothaft
Member

fnothaft commented Feb 8, 2015

I wasn't sure about this and couldn't tell for certain from looking at the code. When the two reference partitioners map keys to a partition, do they ensure that a partition only contains keys from a single reference contig?
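To make the question concrete, here is a minimal sketch of how a position-based partitioner *could* let a partition span a contig boundary. It is not ADAM's code: `Pos` is a hypothetical stand-in for `ReferencePosition`, and `FlatGenomePartitioner` is a hypothetical partitioner that lays contigs end to end and cuts the concatenated coordinate space into equal-width bins. A real Spark version would extend `org.apache.spark.Partitioner` (`numPartitions`, `getPartition(key: Any): Int`). It is left standalone here so it runs without Spark.

```scala
// Hypothetical stand-in for a genomic position key (not ADAM's ReferencePosition).
final case class Pos(contig: String, pos: Long)

// Hypothetical partitioner: concatenates contigs in the given order and splits the
// resulting linear coordinate space into `numParts` equal-width bins.
final class FlatGenomePartitioner(contigLengths: Seq[(String, Long)], numParts: Int) {
  require(numParts > 0, "need at least one partition")

  private val totalLength: Long = contigLengths.map(_._2).sum

  // Offset of each contig's first base in the concatenated coordinate space.
  private val offsets: Map[String, Long] =
    contigLengths.map(_._1).zip(contigLengths.scanLeft(0L)(_ + _._2)).toMap

  // Ceiling division so that numParts bins cover the whole genome.
  private val binWidth: Long = math.max(1L, (totalLength + numParts - 1) / numParts)

  def getPartition(key: Pos): Int = {
    val flat = offsets(key.contig) + key.pos
    math.min(numParts - 1, (flat / binWidth).toInt)
  }
}

// Two 100 bp contigs in 3 partitions: bins are 67 bp wide in flat coordinates,
// so the tail of chr1 and the head of chr2 both fall into partition 1.
val p = new FlatGenomePartitioner(Seq("chr1" -> 100L, "chr2" -> 100L), 3)
println(p.getPartition(Pos("chr1", 90L)))
println(p.getPartition(Pos("chr2", 10L)))
```

With a scheme like this, sorting within partition 1 yields chr1:90 followed by chr2:10, which is exactly the "wrap" the question asks about.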

This is relevant to my interests because I've got code that currently assumes that after a:

rdd.keyBy(ReferencePosition(_))
  .repartitionAndSortWithinPartitions(new GenomicPositionPartitioner(...))

The _.pos of all keys within each partition will be monotonically increasing (i.e., I'll never wrap from the end of one contig to the start of the next contig). I don't think it's a big deal either way (i.e., it would be easy to make my code handle wrapping), but the behavior isn't documented in the partitioners file, and it determines whether my code works.
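Handling the wrap case could look roughly like the sketch below: split a partition's keys (already sorted by contig, then position) into per-contig runs, and only rely on position ordering inside each run. `Pos`, `splitByContig`, and `monotoneWithinContigs` are hypothetical names for illustration, not part of ADAM. The check uses non-decreasing order, since several keys can share a position.

```scala
// Hypothetical stand-in for a genomic position key (not ADAM's ReferencePosition).
final case class Pos(contig: String, pos: Long)

// Split one partition's sorted keys into consecutive runs that each cover a single
// contig, preserving the original order within and across runs.
def splitByContig(sorted: Seq[Pos]): List[List[Pos]] =
  sorted.foldRight(List.empty[List[Pos]]) {
    // Same contig as the run that follows: prepend to that run.
    case (p, (run @ (head :: _)) :: rest) if head.contig == p.contig => (p :: run) :: rest
    // Different contig (or first element seen): start a new run.
    case (p, acc) => List(p) :: acc
  }

// The assumption from the issue, weakened to "positions never decrease within a contig".
def monotoneWithinContigs(sorted: Seq[Pos]): Boolean =
  splitByContig(sorted).forall { run =>
    run.zip(run.drop(1)).forall { case (a, b) => a.pos <= b.pos }
  }

// A partition that wraps from chr1 to chr2: not monotone overall, but monotone per contig.
val part = Seq(Pos("chr1", 90L), Pos("chr1", 95L), Pos("chr2", 3L), Pos("chr2", 10L))
println(splitByContig(part).map(_.map(_.contig).distinct))
println(monotoneWithinContigs(part))
```

Processing each run independently keeps code that assumes increasing positions correct whether or not the partitioner ever mixes contigs.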

@fnothaft fnothaft added the question label Feb 8, 2015

@laserson

Contributor

laserson commented Feb 27, 2015

Looks to me like a single partition can have multiple contigs. I believe GenomicRegionPartitioner will not, though.
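For contrast, a partitioner can guarantee the single-contig property by construction if every contig owns its own block of partition IDs. The sketch below is hypothetical (`Pos` and `PerContigPartitioner` are illustrative names, and this is not how `GenomicRegionPartitioner` is claimed to work); it assumes a fixed number of bins per contig.

```scala
// Hypothetical stand-in for a genomic position key (not ADAM's ReferencePosition).
final case class Pos(contig: String, pos: Long)

// Hypothetical partitioner: each contig owns `binsPerContig` consecutive partition
// IDs, so no partition ever mixes keys from two contigs.
final class PerContigPartitioner(contigLengths: Seq[(String, Long)], binsPerContig: Int) {
  require(binsPerContig > 0, "need at least one bin per contig")

  private val index: Map[String, Int] = contigLengths.map(_._1).zipWithIndex.toMap
  private val lengths: Map[String, Long] = contigLengths.toMap

  val numPartitions: Int = contigLengths.size * binsPerContig

  def getPartition(key: Pos): Int = {
    val len = lengths(key.contig)
    // Ceiling division so the bins cover the whole contig.
    val binWidth = math.max(1L, (len + binsPerContig - 1) / binsPerContig)
    val bin = math.min(binsPerContig - 1L, key.pos / binWidth).toInt
    index(key.contig) * binsPerContig + bin
  }
}

// Two 100 bp contigs, 2 bins each: chr1 maps to partitions 0-1, chr2 to 2-3.
val p = new PerContigPartitioner(Seq("chr1" -> 100L, "chr2" -> 100L), 2)
println(p.getPartition(Pos("chr1", 90L)))
println(p.getPartition(Pos("chr2", 10L)))
```

The trade-off in this sketch is balance: a short contig gets as many partitions as a long one, so partition sizes can be very uneven.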

@fnothaft fnothaft closed this Jul 20, 2016
