@ms7s ms7s commented Dec 6, 2016

Current partitioners assume that the partitions are sorted according
to partition ID in all_partitions. However, this is not guaranteed
in the KafkaProducer implementation as the values that are passed
come from a set. Sets are not guaranteed to iterate values in any
particular order, so we need to sort the values before passing
them further along.

Before this change, the code depended on an implementation detail of
the Python interpreter. In CPython 3.5 and lower it seems that integers
are returned in sorted order from sets, so the code appears to work.
In PyPy and CPython 3.6, dictionaries preserve insertion order [1], and
the iteration order of sets remains an implementation detail, which
means that the code may not work in these environments (I have not
tested this). As far as I could find, the partitions in this case
arrive in the order returned by the broker, but the documentation does
not say anything about partition order.

[1] https://docs.python.org/3.6/whatsnew/3.6.html#whatsnew36-compactdict
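
To illustrate why list order matters here, the following is a minimal sketch, not the actual kafka-python partitioner. It assumes a hypothetical `pick_partition` helper that hashes the key with `zlib.crc32` and uses the result as an index into `all_partitions`. Under that assumption, the same key maps to a different partition whenever the list order changes.

```python
import zlib


def pick_partition(key, all_partitions):
    """Hypothetical hashed partitioner: index into the list by key hash."""
    idx = zlib.crc32(key) % len(all_partitions)
    return all_partitions[idx]


key = b"user-42"

# The same partition IDs in an arbitrary order, as iteration over a set might yield them.
unsorted_ids = [2, 0, 1]
sorted_ids = sorted(unsorted_ids)

chosen_unsorted = pick_partition(key, unsorted_ids)
chosen_sorted = pick_partition(key, sorted_ids)

# With IDs 0..n-1 in sorted order, the chosen index equals the partition ID,
# so the result no longer depends on how the IDs happened to be ordered.
print(chosen_unsorted, chosen_sorted)
```

Sorting before handing the list to the partitioner makes the key-to-partition mapping independent of set iteration order.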

return partition

- all_partitions = list(self._metadata.partitions_for_topic(topic))
+ all_partitions = sorted(list(self._metadata.partitions_for_topic(topic)))
Owner


sorted will operate on a set, so I think we can drop the explicit list conversion
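
A quick check of the reviewer's point, using a made-up set of partition IDs: the built-in `sorted` accepts any iterable and always returns a new list, so wrapping the set in `list()` first does not change the result.

```python
# Hypothetical partition IDs; the real values come from the topic metadata.
partition_ids = {3, 0, 2, 1}

with_list = sorted(list(partition_ids))
without_list = sorted(partition_ids)

# Both forms yield the same sorted list.
print(with_list == without_list, type(without_list).__name__)
```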

@dpkp dpkp merged commit 46f9b1f into dpkp:master Dec 19, 2016