
[SPARK-19185][DSTREAM] Make Kafka consumer cache configurable #18234

Closed
wants to merge 2 commits

Conversation

markgrover (Member)

What changes were proposed in this pull request?

Add a new property `spark.streaming.kafka.consumer.cache.enabled` that allows users to enable or disable the cache for Kafka consumers. This property can be especially handy in cases where issues like SPARK-19185 get hit, for which there isn't a solution committed yet. By default, the cache is still on, so this change doesn't alter any out-of-the-box behavior.
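
For illustration only (not part of the patch): a minimal sketch of how the new flag might be set. The property name comes from this PR; the app name and batch interval below are placeholder assumptions.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical app setup; only the cache.enabled property is from this PR.
val conf = new SparkConf()
  .setAppName("KafkaCacheToggleExample") // placeholder app name
  // New flag added by this PR; the cache stays enabled unless set to "false".
  .set("spark.streaming.kafka.consumer.cache.enabled", "false")

val ssc = new StreamingContext(conf, Seconds(10)) // assumed batch interval
```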

How was this patch tested?

Running unit tests

@markgrover (Member Author)

This is related to, but is a stripped-down version of, #16629.

@SparkQA

SparkQA commented Jun 7, 2017

Test build #77799 has finished for PR 18234 at commit 68ca3f3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -91,7 +91,7 @@ The new Kafka consumer API will pre-fetch messages into buffers. Therefore it i

In most cases, you should use `LocationStrategies.PreferConsistent` as shown above. This will distribute partitions evenly across available executors. If your executors are on the same hosts as your Kafka brokers, use `PreferBrokers`, which will prefer to schedule partitions on the Kafka leader for that partition. Finally, if you have a significant skew in load among partitions, use `PreferFixed`. This allows you to specify an explicit mapping of partitions to hosts (any unspecified partitions will use a consistent location).

-The cache for consumers has a default maximum size of 64. If you expect to be handling more than (64 * number of executors) Kafka partitions, you can change this setting via `spark.streaming.kafka.consumer.cache.maxCapacity`
+The cache for consumers has a default maximum size of 64. If you expect to be handling more than (64 * number of executors) Kafka partitions, you can change this setting via `spark.streaming.kafka.consumer.cache.maxCapacity`. If you would like to disable the caching for Kafka consumers, you can set `spark.streaming.kafka.consumer.cache.enabled` to `false`.
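
For illustration only (not part of the patch): a minimal sketch of how the cache properties and `LocationStrategies.PreferConsistent` fit together. The broker address, group id, topic name, app name, and the capacity value of 128 are placeholder assumptions; the two `spark.streaming.kafka.consumer.cache.*` properties are the ones this documentation describes.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

val conf = new SparkConf()
  .setAppName("KafkaLocationStrategyExample") // placeholder app name
  // Raise the consumer cache capacity above the default of 64.
  .set("spark.streaming.kafka.consumer.cache.maxCapacity", "128")
  // Or bypass the cache entirely with the flag from this PR:
  // .set("spark.streaming.kafka.consumer.cache.enabled", "false")

val ssc = new StreamingContext(conf, Seconds(5))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092", // placeholder broker address
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group" // placeholder group id
)

// PreferConsistent distributes partitions evenly across available executors.
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("topicA"), kafkaParams)
)
```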
Member

It might be more conservative to not even publicly document this option, if it's intended as a fairly temporary safety valve. But I'm not against it. At least it should be clear that it isn't guaranteed to remain available going forward.

Member Author

I am open to either option. I'd slightly prefer documenting the option and saying something about there being no guarantee.

Thanks for reviewing, Sean. @koeninger would appreciate your review as well, thanks.

Contributor

Code change LGTM.

I'd prefer clarifying / adding caveats to the documentation, rather than leaving it undocumented.

Member Author

OK, thanks. I have updated the doc to add a little more context. If you could review it and help get this committed, I'd really appreciate that. Thanks again!

@SparkQA

SparkQA commented Jun 8, 2017

Test build #77802 has finished for PR 18234 at commit 2f60741.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin (Contributor)

vanzin commented Jun 8, 2017

If there are no more comments I'll push this in the morning.

I'd have filed a separate bug, since SPARK-19185 will now forever be "in progress" (unless an admin sees this and changes its status), but it's too late now.

@koeninger (Contributor)

LGTM, thanks Mark

@markgrover (Member Author)

markgrover commented Jun 8, 2017 via email

@srowen (Member)

srowen commented Jun 8, 2017

@vanzin I think it's appropriate to attach this to the existing issue, because it's inherently connected to any other changes that follow. We can definitely un-mark it as In Progress.

@vanzin (Contributor)

vanzin commented Jun 8, 2017

Merging to master / 2.2.

asfgit pushed a commit that referenced this pull request Jun 8, 2017
## What changes were proposed in this pull request?

Add a new property `spark.streaming.kafka.consumer.cache.enabled` that allows users to enable or disable the cache for Kafka consumers. This property can be especially handy in cases where issues like SPARK-19185 get hit, for which there isn't a solution committed yet. By default, the cache is still on, so this change doesn't alter any out-of-the-box behavior.

## How was this patch tested?
Running unit tests

Author: Mark Grover <mark@apache.org>
Author: Mark Grover <grover.markgrover@gmail.com>

Closes #18234 from markgrover/spark-19185.

(cherry picked from commit 55b8cfe)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
@asfgit closed this in 55b8cfe on Jun 8, 2017
@markgrover (Member Author)

Thanks @vanzin
