[BEAM-2134] expose AUTO_COMMIT to KafkaIO.read()#2837

Closed
mingmxu wants to merge 3 commits intoapache:masterfrom
mingmxu:BEAM-2134

Conversation

@mingmxu

@mingmxu mingmxu commented May 2, 2017

…e visible.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

  • Make sure the PR title is formatted like:
    [BEAM-<Jira issue #>] Description of pull request
  • Make sure tests pass via mvn clean verify.
  • Replace <Jira issue #> in the title with the actual Jira issue
    number, if there is one.
  • If this contribution is large, please file an Apache
    Individual Contributor License Agreement.

@mingmxu
Author

mingmxu commented May 2, 2017

R: @rangadi

@rangadi
Contributor

rangadi commented May 2, 2017

I agree with updating JavaDoc.

I am not so sure about adding another method. For example, auto_commit alone is not enough; the user also needs to set a group.id.

@mingmxu
Author

mingmxu commented May 2, 2017

I see; without a group.id, different consumers will impact each other. I think the updated doc should be clear enough, thoughts?
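The point in this exchange is that enable.auto.commit is only meaningful together with a group.id. A minimal sketch of the property map a user could build and pass to KafkaIO's updateConsumerProperties(Map); the helper name and group id below are hypothetical, not part of Beam:

```java
import java.util.HashMap;
import java.util.Map;

public class KafkaAutoCommitConfig {
    // Hypothetical helper: builds the consumer properties needed to resume from
    // committed offsets, as discussed in the review. Both settings are required;
    // enable.auto.commit alone does nothing useful without a group.id.
    static Map<String, Object> autoCommitProperties(String groupId) {
        Map<String, Object> props = new HashMap<>();
        props.put("group.id", groupId);        // consumer group whose offsets are committed
        props.put("enable.auto.commit", true); // commit offsets in the background
        return props;
    }

    public static void main(String[] args) {
        System.out.println(autoCommitProperties("my-beam-app"));
    }
}
```

The resulting map would then be merged into the consumer configuration, e.g. `KafkaIO.read().updateConsumerProperties(autoCommitProperties("my-beam-app"))`.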

@coveralls

Coverage Status

Coverage increased (+0.03%) to 69.881% when pulling e2a8d1c on XuMingmin:BEAM-2134 into c2c89ed on apache:master.

@coveralls

Coverage Status

Coverage increased (+0.05%) to 69.905% when pulling 0896331 on XuMingmin:BEAM-2134 into c2c89ed on apache:master.

* also set the initial offset of next run as last committed when checkpoint is not
* provided by runners.
*/
public Read<K, V> withOffsetAutoCommit(boolean autoCommit) {
Contributor


return withUpdatedConfig(...); would be simpler.

* consuming from the <em>latest</em> offsets. You can override this behavior to consume from the
* beginning by setting appropriate properties in {@link ConsumerConfig}, through
* {@link Read#updateConsumerProperties(Map)}.
* In this case, you can also enable offset auto_commit in Kafka to resume from last committed,
Contributor


Remove 'In this case,'?

* <p>In summary, KafkaIO.read follows below sequence to set initial offset:<br>
* 1. {@link KafkaCheckpointMark} provided by runner;<br>
* 2. Consumer offset stored in Kafka when {@code withOffsetAutoCommit(true)};<br>
* 3. Start from <em>latest</em> offset by default;
Contributor


Note that the last two are not explicitly enforced by KafkaIO. It is the KafkaConsumer that behaves this way. I think we should make it explicit. Could be something like:

Starts from KafkaCheckpointMark if it is provided by the runner (most users don't know when a runner would provide this; a link about this would be useful). Otherwise, the initial offset is dictated by the KafkaConsumer configuration (link to the KafkaConsumer description of offset initialization). The users have complete control over this configuration, including enabling `auto_commit` of offsets. "auto.offset.reset" is set to "latest" by default, and can be overridden by the user (see withUpdatedConfigProperties()).
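The precedence being documented here (runner-provided checkpoint mark first, then Kafka's committed offset, then the consumer's auto.offset.reset default) can be sketched in plain Java. The names and the resolution helper below are hypothetical, not Beam's actual implementation:

```java
import java.util.Optional;

public class OffsetResolution {
    // Hypothetical sketch of the initial-offset precedence discussed above.
    // checkpointOffset: from KafkaCheckpointMark, if the runner restored one.
    // committedOffset:  the group's committed offset in Kafka, present only if
    //                   auto-commit (plus group.id) was enabled on a prior run.
    // -1L stands in for "defer to auto.offset.reset" (latest by default),
    // which the Kafka consumer itself applies, not KafkaIO.
    static long initialOffset(Optional<Long> checkpointOffset,
                              Optional<Long> committedOffset) {
        if (checkpointOffset.isPresent()) {
            return checkpointOffset.get();   // 1. runner-provided checkpoint wins
        }
        if (committedOffset.isPresent()) {
            return committedOffset.get();    // 2. offset committed in Kafka
        }
        return -1L;                          // 3. defer to auto.offset.reset
    }

    public static void main(String[] args) {
        System.out.println(initialOffset(Optional.of(42L), Optional.of(7L)));
    }
}
```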

* committed in the background. This information can be used by external monitoring system,
* also set the initial offset of next run as last committed when checkpoint is not
* provided by runners.
*/
Contributor


This config also requires users to set group.id. Should we enforce it here? We should certainly mention it in the documentation.
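If KafkaIO were to enforce the pairing the reviewer asks about, a precondition check along these lines would do it (hypothetical sketch, not Beam's code):

```java
import java.util.Map;

public class AutoCommitCheck {
    // Hypothetical enforcement of the constraint discussed above: enabling
    // offset auto-commit without a group.id is a configuration error, since
    // Kafka has no consumer group under which to store the offsets.
    static void checkAutoCommitConfig(Map<String, Object> consumerConfig) {
        Object autoCommit = consumerConfig.get("enable.auto.commit");
        if (Boolean.TRUE.equals(autoCommit)
                && consumerConfig.get("group.id") == null) {
            throw new IllegalArgumentException(
                "enable.auto.commit requires group.id to be set");
        }
    }
}
```

Failing fast at pipeline construction time would surface the misconfiguration earlier than a silent no-op at runtime.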

* {@link UnboundedKafkaSource#split(int, PipelineOptions)} for more details on
*
* <p>Checkpointing is fully supported and each split can resume from previous checkpoint,
* if it's supported and enabled in runner configuration.
Contributor


This could be rephrased [Optional]:
"Checkpointing is fully supported and each split can resume from previous checkpoint (to the extent supported a runner)"...

Author


to the extent supported by runner?

Contributor

@rangadi rangadi left a comment


Overall LGTM. Two minor suggestions above.

* consuming from the <em>latest</em> offsets. You can override this behavior to consume from the
* beginning by setting appropriate properties in {@link ConsumerConfig}, through
* {@link Read#updateConsumerProperties(Map)}.
* In this case, you can also enable offset auto_commit in Kafka to resume from last committed.
Contributor


Can remove 'In this case,'.

@rangadi
Contributor

rangadi commented May 3, 2017

Thanks @xumingmin! LGTM.

@mingmxu
Author

mingmxu commented May 3, 2017

Thank you @rangadi !

@coveralls

Coverage Status

Coverage increased (+0.1%) to 69.954% when pulling ffcff7f on XuMingmin:BEAM-2134 into c2c89ed on apache:master.

@mingmxu
Author

mingmxu commented May 4, 2017

retest this please

@coveralls

Coverage Status

Coverage increased (+0.7%) to 70.552% when pulling ffcff7f on XuMingmin:BEAM-2134 into c2c89ed on apache:master.

@asfgit asfgit closed this in f8ae118 May 4, 2017
@mingmxu mingmxu deleted the BEAM-2134 branch May 5, 2017 18:02