[FLINK-4022] [kafka] Partition and topic pattern discovery for FlinkKafkaConsumer #3476

tzulitai · 2017-03-06T09:43:31Z

This PR adds the required internals to allow partition and topic regex pattern discovery in the FlinkKafkaConsumer.

It doesn't expose a new constructor that accepts regex topic patterns yet. I propose exposing that as part of https://issues.apache.org/jira/browse/FLINK-5704 (deprecate the original FlinkKafkaConsumer constructors in favor of new ones with new offset behaviours). For the same reason, I propose updating the Kafka documentation once the new constructors are added.

Design

Some description to ease review:

AbstractPartitionDiscoverer:
An AbstractPartitionDiscoverer is a stateful utility that remembers which partitions have already been discovered. It also encapsulates the partition-to-subtask assignment logic. The main run() method now has a discovery loop that calls AbstractPartitionDiscoverer#discoverPartitions() at a fixed interval. This method returns only the new partitions that this subtask should subscribe to.
The returned partitions are used to invoke AbstractFetcher#addDiscoveredPartitions(...) on the fetcher.
On a fresh startup, AbstractPartitionDiscoverer#discoverPartitions() is also used to fetch the initial seed startup partitions in open().
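To make the "stateful discoverer" idea concrete, here is a minimal sketch, assuming a hypothetical `DiscoveredPartition` record (standing in for Kafka's `TopicPartition`), a hypothetical `SketchPartitionDiscoverer` class, and an assumed `Supplier` hook that lists all partitions currently on the brokers. The hash-based assignment is illustrative only; it is not the exact scheme used by the PR.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical partition identifier standing in for Kafka's TopicPartition (Java 16+ record). */
record DiscoveredPartition(String topic, int partition) {}

/**
 * Minimal sketch of a stateful discoverer. The class name and constructor are
 * hypothetical and do not reproduce Flink's AbstractPartitionDiscoverer.
 */
class SketchPartitionDiscoverer {
    private final int subtaskIndex;
    private final int numSubtasks;
    // Assumed hook listing all partitions currently visible on the brokers
    // (for a topic pattern: all partitions of all matching topics).
    private final Supplier<List<DiscoveredPartition>> fetchAllPartitions;
    // Every partition seen so far, including ones owned by other subtasks.
    private final Set<DiscoveredPartition> seen = new HashSet<>();

    SketchPartitionDiscoverer(int subtaskIndex, int numSubtasks,
                              Supplier<List<DiscoveredPartition>> fetchAllPartitions) {
        this.subtaskIndex = subtaskIndex;
        this.numSubtasks = numSubtasks;
        this.fetchAllPartitions = fetchAllPartitions;
    }

    /** Returns only partitions that are new since the last call AND belong to this subtask. */
    List<DiscoveredPartition> discoverPartitions() {
        List<DiscoveredPartition> newForThisSubtask = new ArrayList<>();
        for (DiscoveredPartition p : fetchAllPartitions.get()) {
            // Set.add returns false for partitions already seen in an earlier round.
            if (seen.add(p) && assignSubtask(p, numSubtasks) == subtaskIndex) {
                newForThisSubtask.add(p);
            }
        }
        return newForThisSubtask;
    }

    /** Deterministic assignment, so all subtasks agree on each partition's owner without coordination. */
    static int assignSubtask(DiscoveredPartition p, int numSubtasks) {
        // String.hashCode() has a documented formula, so the result is stable across JVMs.
        return Math.floorMod(p.topic().hashCode() * 31 + p.partition(), numSubtasks);
    }
}
```

In this sketch, the run() loop would call `discoverPartitions()` once per interval and hand any non-empty result to the fetcher. Because the owner is a pure function of the partition and the number of subtasks, no coordination between subtasks is needed, but the mapping changes whenever the parallelism changes.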
AbstractFetcher#addDiscoveredPartitions(...):
The fetcher now has an unassignedPartitionsQueue that holds discovered partitions not yet consumed by the concrete Kafka clients. Whenever addDiscoveredPartitions(...) is called, the fetcher creates the state holders for the new partitions and adds them to the queue.
Concrete fetcher implementations should continuously poll this queue in their fetch loop; any partitions found in the queue should then be assigned to the Kafka client for consumption.
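The hand-off between discovery and the fetch loop can be sketched as below. This is a minimal illustration, assuming a hypothetical `PartitionStateHolder` (tracking only an offset) and a hypothetical `QueueingFetcher`; it is not Flink's AbstractFetcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical per-partition state holder; only tracks the current offset. */
final class PartitionStateHolder {
    final String topic;
    final int partition;
    volatile long offset = -1L; // assumed sentinel: nothing consumed yet

    PartitionStateHolder(String topic, int partition) {
        this.topic = topic;
        this.partition = partition;
    }

    String key() {
        return topic + "-" + partition;
    }
}

/** Sketch of the queue hand-off between partition discovery and the fetch loop. */
class QueueingFetcher {
    // State for every partition this fetcher is responsible for (e.g. for checkpoints).
    final Map<String, PartitionStateHolder> subscribedStates = new ConcurrentHashMap<>();
    // Discovered partitions that no Kafka client has started consuming yet.
    final ConcurrentLinkedQueue<PartitionStateHolder> unassignedPartitionsQueue =
            new ConcurrentLinkedQueue<>();

    /** Called from the discovery loop; assumes the discoverer passes each partition only once. */
    void addDiscoveredPartitions(String topic, int... partitions) {
        for (int p : partitions) {
            PartitionStateHolder state = new PartitionStateHolder(topic, p);
            subscribedStates.put(state.key(), state); // register the state holder first...
            unassignedPartitionsQueue.add(state);     // ...then expose it to the fetch loop
        }
    }

    /** Called from the fetch loop: removes and returns everything queued so far, in FIFO order. */
    List<PartitionStateHolder> pollUnassignedPartitions() {
        List<PartitionStateHolder> out = new ArrayList<>();
        PartitionStateHolder next;
        while ((next = unassignedPartitionsQueue.poll()) != null) {
            out.add(next);
        }
        return out;
    }
}
```

One plausible reason to register the state holder before enqueuing it is that anything reading `subscribedStates` (such as a checkpoint) already sees the partition even before a client starts consuming it. A concurrent queue is used because discovery and fetching run on different threads in this sketch.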
Concrete fetchers continuously poll the queue in runFetchLoop():
For 0.8, this simply means that the original unassignedPartitionsQueue in Kafka08Fetcher is moved to the base abstract fetcher class. Nothing else is touched.
For 0.9+, queue polling and partition reassignment for the high-level consumer happens in KafkaConsumerThread.
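For the 0.9+ path, one detail matters: `KafkaConsumer#assign(...)` replaces the consumer's entire assignment rather than appending to it, so newly discovered partitions have to be merged with the current assignment. The sketch below assumes a hypothetical `AssignableClient` interface with those replace semantics and an in-memory stand-in, so it runs without kafka-clients; `SketchConsumerThread` is hypothetical and not Flink's KafkaConsumerThread.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;

/** Hypothetical client whose assignment is replaced wholesale, like KafkaConsumer#assign. */
interface AssignableClient {
    Set<String> assignment();

    void assign(Collection<String> partitions);
}

/** In-memory stand-in used instead of a real KafkaConsumer. */
class InMemoryClient implements AssignableClient {
    private final Set<String> assigned = new HashSet<>();

    @Override
    public Set<String> assignment() {
        return new HashSet<>(assigned); // defensive copy
    }

    @Override
    public void assign(Collection<String> partitions) {
        assigned.clear(); // replace, do not append
        assigned.addAll(partitions);
    }
}

/** Sketch of a consumer-thread loop body that picks up newly discovered partitions. */
class SketchConsumerThread {
    private final BlockingQueue<String> unassignedPartitionsQueue;
    private final AssignableClient client;

    SketchConsumerThread(BlockingQueue<String> unassignedPartitionsQueue, AssignableClient client) {
        this.unassignedPartitionsQueue = unassignedPartitionsQueue;
        this.client = client;
    }

    /** One iteration of the fetch loop. */
    void runOnce() {
        List<String> newPartitions = new ArrayList<>();
        unassignedPartitionsQueue.drainTo(newPartitions); // non-blocking drain
        if (!newPartitions.isEmpty()) {
            // assign() replaces the whole assignment, so merge old and new partitions.
            Set<String> merged = client.assignment();
            merged.addAll(newPartitions);
            client.assign(merged);
            // A real fetcher would also need to seek each new partition to its start offset here.
        }
        // ... then poll the client for records and emit them (omitted in this sketch)
    }
}
```

Draining the queue at the top of each iteration keeps reassignment on the same thread that owns the client, which matters because `KafkaConsumer` is not safe for multi-threaded access.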

TODOs

This PR serves as a preview for the new functionality and additional internals. Below are some pending TODOs.

Currently, partition discovery does not work correctly after a restore. The reason is explained in TODO comments within the FlinkKafkaConsumerBase#open() method. To fix this, @rmetzger and I are considering two options: 1) use broadcast state, or 2) assign partitions using maxParallelism and assignedKeyGroupIds instead of the subtask index / number of subtasks.
The PR still lacks exactly-once integration tests with Kafka repartitioning / dynamic topics.
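Option 2 can be illustrated with a small sketch, assuming a hypothetical `KeyGroupPartitionAssigner` and an illustrative hash. Each partition is first mapped to a key group in `[0, maxParallelism)`, which does not depend on the current parallelism. That key group is then mapped to a subtask by splitting the key groups into contiguous ranges, similar in spirit to Flink's key-group ranges. This only illustrates the idea in the TODO and is not what the PR implements.

```java
/**
 * Sketch of option 2: partition -> key group -> subtask.
 * Class name, method names and the hash are hypothetical.
 */
final class KeyGroupPartitionAssigner {
    private KeyGroupPartitionAssigner() {}

    /** Depends only on maxParallelism, so it stays stable when the job is rescaled. */
    static int keyGroupFor(String topic, int partition, int maxParallelism) {
        return Math.floorMod(topic.hashCode() * 31 + partition, maxParallelism);
    }

    /** Splits [0, maxParallelism) into contiguous ranges, one per subtask. */
    static int subtaskForKeyGroup(int keyGroup, int maxParallelism, int parallelism) {
        // long arithmetic avoids overflow for large maxParallelism values
        return (int) ((long) keyGroup * parallelism / maxParallelism);
    }

    static int subtaskFor(String topic, int partition, int maxParallelism, int parallelism) {
        return subtaskForKeyGroup(keyGroupFor(topic, partition, maxParallelism), maxParallelism, parallelism);
    }
}
```

The intended benefit is that a restored subtask could decide, from the key-group range it is assigned, which partitions it owns, including ones discovered later. It would not need to rely on the subtask index / number of subtasks from before the restore.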

tzulitai · 2017-03-06T12:21:29Z

Seems like some Kafka tests are failing... looking into it.

tzulitai · 2017-04-20T15:15:06Z

Closing this PR in favor of an updated version...

tzulitai mentioned this pull request Apr 6, 2017

[FLINK-6079] [kafka] Provide meaningful error message if TopicPartitions are null #3685

Closed

tzulitai added 2 commits April 20, 2017 23:09

[FLINK-4022] [kafka] Partition / topic discovery for FlinkKafkaConsumer

67c3b87

[FLINK-4022] Migrate to union list state

bf2dd78

tzulitai force-pushed the FLINK-4022 branch from 469a3c9 to bf2dd78 Compare April 20, 2017 15:15

tzulitai closed this Apr 20, 2017

tzulitai deleted the FLINK-4022 branch April 20, 2017 15:17

tzulitai restored the FLINK-4022 branch April 20, 2017 15:17

rmetzger added component=Connectors/Kafka component=Connectors/Common labels Mar 14, 2019
