[BEAM-9439] Return split instead of total backlog size in KinesisIO #11377

sgraca · 2020-04-10T11:21:50Z

This pull request changes how KinesisReader reports backlog bytes.

It now reports the split backlog instead of the total backlog;
reporting the total could result in Dataflow pipelines not scaling up.

The split backlog size can be over-estimated because it reports the size
of the records across all shards (and also assumes that all shards
in the split have made the same progress).
This can lead to unnecessary decisions to scale up the number of workers,
but it will never fail to scale up when that is necessary.
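
To illustrate why the estimate errs only on the high side, here is a minimal, self-contained sketch. It is not the code from this pull request: `Rec`, `estimatedSplitBacklogBytes`, `exactSplitBacklogBytes` and the sample data are hypothetical. It also assumes that "all shards" means every shard of the stream and that progress is measured by record arrival time.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SplitBacklogSketch {

  /** Hypothetical stand-in for a Kinesis record: shard, arrival time, payload size. */
  static final class Rec {
    final String shardId;
    final long arrivalMillis;
    final long sizeBytes;

    Rec(String shardId, long arrivalMillis, long sizeBytes) {
      this.shardId = shardId;
      this.arrivalMillis = arrivalMillis;
      this.sizeBytes = sizeBytes;
    }
  }

  /**
   * Over-estimate: bytes that arrived on any shard of the stream after the
   * oldest progress timestamp among the split's shards.
   */
  static long estimatedSplitBacklogBytes(List<Rec> stream, Map<String, Long> splitProgress) {
    long oldestProgress = Long.MAX_VALUE;
    for (long progress : splitProgress.values()) {
      oldestProgress = Math.min(oldestProgress, progress);
    }
    long total = 0;
    for (Rec r : stream) {
      if (r.arrivalMillis > oldestProgress) {
        total += r.sizeBytes;
      }
    }
    return total;
  }

  /** Exact value for comparison: unread bytes on the split's own shards only. */
  static long exactSplitBacklogBytes(List<Rec> stream, Map<String, Long> splitProgress) {
    long total = 0;
    for (Rec r : stream) {
      Long progress = splitProgress.get(r.shardId);
      if (progress != null && r.arrivalMillis > progress) {
        total += r.sizeBytes;
      }
    }
    return total;
  }

  /** Made-up stream with three shards; shard-2 is not part of the split. */
  static List<Rec> sampleStream() {
    return Arrays.asList(
        new Rec("shard-0", 100, 10),
        new Rec("shard-0", 200, 20),
        new Rec("shard-0", 300, 30),
        new Rec("shard-1", 150, 5),
        new Rec("shard-1", 250, 15),
        new Rec("shard-2", 120, 7),
        new Rec("shard-2", 320, 9));
  }

  /** Arrival time of the last processed record for each shard in the split. */
  static Map<String, Long> sampleSplit() {
    Map<String, Long> progress = new HashMap<>();
    progress.put("shard-0", 200L);
    progress.put("shard-1", 150L);
    return progress;
  }

  public static void main(String[] args) {
    List<Rec> stream = sampleStream();
    Map<String, Long> split = sampleSplit();
    System.out.println("estimated = " + estimatedSplitBacklogBytes(stream, split));
    System.out.println("exact     = " + exactSplitBacklogBytes(stream, split));
  }
}
```

With the sample data the estimate is 74 bytes while the exact split backlog is 45 bytes. The estimate also counts the 9-byte record on `shard-2`, which is outside the split, and the 20-byte record on `shard-0` that has already been read. Every unread record of the split arrived after the oldest progress timestamp, so the estimate can never be smaller than the exact value.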

Also, the reader's watermark policy is user-configurable and we
have no control over it. The calculation of backlog bytes should
always take into account the time at which a record was inserted into
Kinesis, as this gives accurate results that are independent of the
current processing time or any other condition the reader watermark
may depend on. Therefore, a separate, fixed policy is used to track
processed records for the backlog calculation.
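
The separation described above can be sketched roughly as follows. This is an illustration under assumptions, not the implementation in this pull request: `ArrivalTimeTracker`, `ShardReader` and their methods are hypothetical names, the user policy is modelled as a plain function from arrival time to watermark, and the fixed policy is assumed to keep the newest arrival time seen so far.

```java
import java.util.function.LongUnaryOperator;

final class BacklogTrackingSketch {

  /** Fixed policy: remembers the newest arrival time among processed records. */
  static final class ArrivalTimeTracker {
    private long newestArrivalMillis = Long.MIN_VALUE;

    void onRecordProcessed(long arrivalMillis) {
      newestArrivalMillis = Math.max(newestArrivalMillis, arrivalMillis);
    }

    long newestArrivalMillis() {
      return newestArrivalMillis;
    }
  }

  /** Hypothetical shard reader that keeps the user watermark and the tracker apart. */
  static final class ShardReader {
    // User-configurable: maps a record's arrival time to a watermark value.
    private final LongUnaryOperator userWatermarkPolicy;
    // Not configurable: used only for the backlog calculation.
    private final ArrivalTimeTracker backlogTracker = new ArrivalTimeTracker();
    private long userWatermarkMillis = Long.MIN_VALUE;

    ShardReader(LongUnaryOperator userWatermarkPolicy) {
      this.userWatermarkPolicy = userWatermarkPolicy;
    }

    void process(long arrivalMillis) {
      userWatermarkMillis = userWatermarkPolicy.applyAsLong(arrivalMillis);
      backlogTracker.onRecordProcessed(arrivalMillis);
    }

    /** Watermark as defined by the user's policy. */
    long userWatermarkMillis() {
      return userWatermarkMillis;
    }

    /** Timestamp from which backlog bytes would be counted. */
    long backlogSinceMillis() {
      return backlogTracker.newestArrivalMillis();
    }
  }

  public static void main(String[] args) {
    ShardReader lagging = new ShardReader(arrival -> arrival - 60_000L);
    ShardReader constant = new ShardReader(arrival -> 999_999L);
    for (long arrival : new long[] {1_000L, 3_000L, 2_000L}) {
      lagging.process(arrival);
      constant.process(arrival);
    }
    System.out.println(lagging.backlogSinceMillis() == constant.backlogSinceMillis());
  }
}
```

Both readers in `main` end up with the same `backlogSinceMillis()` even though their user watermarks differ, which is the property the fixed policy is meant to provide.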

sgraca · 2020-04-10T11:22:12Z

R: @aromanenko-dev

aromanenko-dev · 2020-04-10T15:36:19Z

@sgraca Thank you for your contribution! I'll take a look at this next week.

sgraca · 2020-05-06T11:32:01Z

Hi @aromanenko-dev, have you been able to make any progress with the review?

aromanenko-dev · 2020-05-06T12:17:48Z

Run Java PreCommit

aromanenko-dev · 2020-05-06T12:23:51Z

@sgraca Thank you for pinging me, and sorry for the delay. I'll try to review this ASAP.

aromanenko-dev

Thanks, it LGTM in general, I just left one minor question.

sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/ShardRecordsIterator.java

aromanenko-dev

LGTM, thank you @sgraca for working on this.

sgraca · 2020-05-11T18:58:41Z

@aromanenko-dev Thank you for the review and the merge.

probot-autolabeler bot added io java kinesis labels Apr 10, 2020

aromanenko-dev self-requested a review April 10, 2020 15:33

aromanenko-dev approved these changes May 7, 2020

aromanenko-dev approved these changes May 11, 2020

aromanenko-dev merged commit 8fdc9ce into apache:master May 11, 2020

aromanenko-dev changed the title ~~[BEAM-9439] Return split instead of total backlog size~~ [BEAM-9439] KinesisIOreturn split instead of total backlog size in KinesisIO May 11, 2020

aromanenko-dev changed the title ~~[BEAM-9439] KinesisIOreturn split instead of total backlog size in KinesisIO~~ [BEAM-9439] Return split instead of total backlog size in KinesisIO May 11, 2020

sgraca deleted the BEAM-9439-fix-backlog-estimate branch May 11, 2020 18:59
