
Added max_bytes option and FetchRequest_v3 usage. #962

Merged
4 commits merged on Mar 6, 2017

Conversation

tvoinarovskyi
Collaborator

@tvoinarovskyi tvoinarovskyi commented Jan 30, 2017

Also added a check for api_version=(0, 10, 1) and MetadataRequest_v2
KIP-74
NOTE: Depends on PR #974

@tvoinarovskyi
Collaborator Author

Took me long enough =) Fixes #870

# they are requested, so to avoid starvation with
# `fetch_max_bytes` option we need this shuffle
partition_data = list(partition_data.items())
random.shuffle(partition_data)
Collaborator Author

A note on this shuffle. As Python dicts and sets are not ordered, it currently works properly even without the shuffle (the order is effectively shuffled by `_create_fetch_requests`, since it uses sets and dicts), but I still prefer to have it here, because:

  • Python 3.6 has ordered dicts, which can make the distribution much less even
  • While older dicts do not retain insertion order, they do preserve a key order based on hashes, which can (probably) result in unequal consumption.

Please correct me if I'm wrong here. My tests run OK even without this shuffle, so I'm a bit concerned.
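
To make the concern concrete, here is a small standalone sketch (not the actual kafka-python code) of why the explicit shuffle matters on Python >= 3.6, where dicts keep insertion order:

```python
import random

# On Python >= 3.6 dicts preserve insertion order, so without an
# explicit shuffle the same partitions would be requested first on
# every fetch. With FetchRequest v3 the broker fills partitions in
# request order until fetch_max_bytes is reached, so the partitions
# at the end of the request could starve.
partition_data = {('my-topic', p): b'fetch-payload' for p in range(4)}

# Iteration order is deterministic: always partition 0 first.
assert list(partition_data) == [('my-topic', p) for p in range(4)]

# The fix: materialize and shuffle before building the request.
items = list(partition_data.items())
random.shuffle(items)  # request order now varies between fetches
```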

Collaborator

Weird, GitHub swallowed my earlier comment.

+1 for explicitly shuffling to avoid py3.6 issues.

However, since the protocol explicitly responds in the order requested, maybe this should be exposed to the user? Default to shuffling the order to avoid starvation, but allow the user to override it to always request a particular order... I can see someone using this as a sort of poor-man's priority queue. Not sure if this is a good idea, or if it'd require reworking a lot of kafka-python internals... just a thought.
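
If that idea were ever pursued, it might look something like this sketch; the `shuffle_fetch_order` flag is hypothetical and not part of kafka-python:

```python
import random

def order_partitions(partition_data, shuffle_fetch_order=True):
    """Decide the partition order for a v3 fetch request.

    The broker fills partitions in the order requested, so shuffling
    (the default) avoids starvation under fetch_max_bytes. Turning
    the hypothetical flag off keeps the caller's order, making
    earlier partitions behave like higher-priority ones.
    """
    items = list(partition_data.items())
    if shuffle_fetch_order:
        random.shuffle(items)
    return items

# Deterministic "priority" order: partition 0 is always requested first.
prioritized = order_partitions({0: b'a', 1: b'b', 2: b'c'},
                               shuffle_fetch_order=False)
```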

Collaborator Author

I did have the same idea, but could not find any applications to this. And it's probably not this PR's responsibility, as it's rather a feature.

first message in the first non-empty partition of the fetch is
larger than this value, the message will still be returned to
ensure that the consumer can make progress. NOTE: consumer performs
multiple fetches in parallel so memory usage will be higher.
Collaborator

This note is a great idea, but could you tweak the wording slightly? It's unclear 1) what the memory usage will be higher than, and 2) how much higher... are we talking 20%, 2x, 5x, etc.?

@@ -47,6 +47,30 @@ class MetadataResponse_v1(Struct):
)


class MetadataResponse_v2(Struct):
Collaborator

Maybe this should be in a dedicated commit? It just feels logically separate, and I'm unclear why it's needed for the FetchRequest/Response stuff. I personally prefer a commit that says "Here's this new struct, here's the JIRA ticket KAFKA-XXXX where it was added. Requires broker version x.xx.xx.x. Here's what it changes compared to the old one", just because when spelunking in code I'll often look up the commit history to understand changes like this.

Collaborator

Broken out in #974.

else:
# As of version == 3 partitions will be returned in order as
# they are requested, so to avoid starvation with
# `fetch_max_bytes` option we need this shuffle
Collaborator

Maybe also add to the code comment something about Python 3.6 consistent ordering of dicts...

...this shuffle. Otherwise, in Python >= 3.6 dicts have consistent ordering, so we can't rely on hashing into a dict/set to randomize the order of partitions.

larger than this value, the message will still be returned to
ensure that the consumer can make progress. NOTE: consumer performs
fetches to multiple nodes in parallel so memory usage will depend
on the number of nodes containing partitions for the topic.
Collaborator

Can we call them 'brokers' rather than 'nodes'?

@jeffwidman
Collaborator

LGTM with the caveat that I didn't have time to try the code, just read through it.

Owner

@dpkp dpkp left a comment

Tests are failing on py2 for some reason -- can you investigate?

kafka/conn.py Outdated
@@ -778,7 +778,8 @@ def filter(self, record):
log.addFilter(log_filter)

test_cases = [
((0, 10), ApiVersionRequest[0]()),
((0, 10, 1), MetadataRequest[2]([])),
Owner

note to self: for any broker after 0.10 we should just use the results of ApiVersions api request to set api version support.

Collaborator Author

Should I implement it?

Owner

ah yes. forgot about this little wart of kafka brokers. so it turns out that kafka < 0.10 will silently ignore incorrect metadata api version requests (or at least not close the socket as expected). The socket behavior is different between py2 and py3, causing KafkaConnection._recv() to block indefinitely on py2.

So we can't use MetadataRequest[2] to check version and will need to implement the smarter ApiVersion approach to avoid breaking auto version checks on older brokers.
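
A rough sketch of that ApiVersions-based approach (the marker table below is illustrative, not the exact one kafka-python ended up with):

```python
def infer_broker_version(api_versions):
    """Guess the broker version from ApiVersionResponse data.

    api_versions maps api_key -> (min_version, max_version) as
    reported by the broker. Markers must be checked newest-first.
    """
    test_cases = [
        # (broker_version, api_key, required max_version), newest first.
        # FetchRequest v3 (api_key 1) was added in 0.10.1 via KIP-74.
        ((0, 10, 1), 1, 3),
    ]
    for broker_version, api_key, required in test_cases:
        versions = api_versions.get(api_key)
        if versions is not None and versions[1] >= required:
            return broker_version
    # The broker answered ApiVersions at all, so it is at least 0.10.
    return (0, 10)
```

Brokers older than 0.10 never answer ApiVersions, so they would still need the old probe-request fallback.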

Owner

test via KAFKA_VERSION=0.8.0 tox -e py27 -- test/test_consumer_group.py::test_paused -- this hangs indefinitely on my laptop

Collaborator Author

ok, will try it.

@@ -47,6 +47,30 @@ class MetadataResponse_v1(Struct):
)


class MetadataResponse_v2(Struct):
API_KEY = 3
API_VERSION = 1
Owner

should be API_VERSION = 2 ?

Collaborator Author

Ou, yea =)

@@ -47,6 +47,30 @@ class MetadataResponse_v1(Struct):
)


class MetadataResponse_v2(Struct):
Owner

agree -- would prefer to separate this into separate PR.

Collaborator Author

It's separated, see #974

@tvoinarovskyi
Collaborator Author

As for the failing tests, I do suspect something strange is going on: they fail in an odd way on py27 and py26 with KAFKA < 0.10.1, but the failure message makes no sense:

ERROR: InvocationError: '/home/travis/build/dpkp/kafka-python/.tox/py27/bin/py.test --pylint --pylint-rcfile=pylint.rc --pylint-error-types=EF --cov=kafka --cov-config=.covrc'

And the tests run OK on my machine, so I have no clue.

@tvoinarovskyi
Collaborator Author

Changed check_version to use the ApiVersionResponse result. Works quite well for me.

@tvoinarovskyi
Collaborator Author

@dpkp Is there anything else for this PR, that needs to be addressed?

@@ -830,6 +849,10 @@ def connect():
self._sock.setblocking(False)

if f.succeeded():
if version == (0, 10):
Owner

better would be if isinstance(request, ApiVersionRequest[0]):

@@ -752,6 +753,24 @@ def _next_correlation_id(self):
self._correlation_id = (self._correlation_id + 1) % 2**31
return self._correlation_id

def _check_version_above_0_10(self, response):
test_cases = [
# format (<broker version>, <needed struct>)
Owner

might note that the order here matters, and/or make sure we reverse sort it before checking for best match below
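
One way to make the ordering concern explicit, sketched here with placeholder struct names rather than the real request classes:

```python
# Hypothetical version-marker table; request structs replaced by
# strings for brevity. Written in arbitrary order on purpose.
test_cases = [
    ((0, 9), 'GroupCoordinatorRequest_v0'),
    ((0, 10), 'ApiVersionRequest_v0'),
    ((0, 8, 2), 'OffsetFetchRequest_v1'),
]

# Reverse-sort by version tuple so the newest marker is always
# checked first, regardless of how the table is written.
ordered = sorted(test_cases, key=lambda case: case[0], reverse=True)
assert [version for version, _ in ordered] == [(0, 10), (0, 9), (0, 8, 2)]
```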

@dpkp
Owner

dpkp commented Mar 3, 2017

Looks great. I added two very minor points that we can fix later (or you can address now if you have time). I'll merge this before next release.

@dpkp dpkp merged commit 9c19ea7 into dpkp:master Mar 6, 2017
dpkp added a commit that referenced this pull request Mar 6, 2017
@tvoinarovskyi
Collaborator Author

Thanks for the cleanup, looks good =) I missed it over the weekend.

88manpreet pushed a commit to Yelp/kafka-python that referenced this pull request Apr 7, 2017
* Added `max_bytes` option and FetchRequest_v3 usage.
* Add checks for versions above 0.10 based on ApiVersionResponse