Dynamic auto scale Kinesis-Stream ingest tasks #10985

zhangyue19921010 · 2021-03-12T08:25:37Z

Follow-up PR for #10524.
The core logic is the same; the difference lies in how computeLagStats is implemented for Kafka versus Kinesis.

Description

This PR implements and documents an autoscaler for the Kinesis indexing service, based on the ingest/kinesis/lag/time metric.
Only the LagBased autoScalerStrategy is supported for now.
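
For illustration only, here is a minimal Java sketch of how per-shard time lag could be reduced to max/total/average values for an autoscaler to consume. It is not the code from this PR: the class name `TimeLagStatsSketch`, the nested `LagStats` holder, and the shard-id-to-milliseconds map are assumptions made for the example.

```java
import java.util.Map;

// Hypothetical sketch; names and shapes are assumptions, not the PR's classes.
final class TimeLagStatsSketch {

    /** Plain holder for the three aggregates, all in milliseconds. */
    static final class LagStats {
        final long maxLagMillis;
        final long totalLagMillis;
        final long avgLagMillis;

        LagStats(long maxLagMillis, long totalLagMillis, long avgLagMillis) {
            this.maxLagMillis = maxLagMillis;
            this.totalLagMillis = totalLagMillis;
            this.avgLagMillis = avgLagMillis;
        }
    }

    /**
     * Aggregates time lag (milliseconds behind the latest record) per shard.
     * Returns null when no lag has been reported yet.
     */
    static LagStats computeLagStats(Map<String, Long> timeLagMillisByShard) {
        if (timeLagMillisByShard == null || timeLagMillisByShard.isEmpty()) {
            return null;
        }
        long max = 0;
        long total = 0;
        for (long lag : timeLagMillisByShard.values()) {
            max = Math.max(max, lag);
            total += lag;
        }
        long avg = total / timeLagMillisByShard.size();
        return new LagStats(max, total, avg);
    }

    public static void main(String[] args) {
        // Shard ids and lag values below are made up for the demo.
        LagStats stats = computeLagStats(
            Map.of("shardId-000000000000", 1500L, "shardId-000000000001", 4500L));
        System.out.println(
            stats.maxLagMillis + " " + stats.totalLagMillis + " " + stats.avgLagMillis);
    }
}
```

Returning null for the empty case is just one way to signal "no data yet"; a real implementation may choose differently.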

Key changed/added classes in this PR

KinesisSupervisor.java
KinesisSupervisorIOConfig.java

This PR has:

been self-reviewed.
- using the concurrency checklist (Remove this item if the PR doesn't have any relation to concurrency.)
added documentation for new or modified features or behaviors.
added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
added or updated version, license, or notice information in licenses.yaml
added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
added integration tests.
been tested in a test Druid cluster.

zhangyue19921010 · 2021-03-12T10:51:34Z

The failed CI jobs are not related to this PR. A retry may pass :)

suneet-s · 2021-03-12T18:13:13Z

#10691 documents a few known flaky tests. Any ideas on how to make them less flaky will be much appreciated. I've re-triggered the failing tests - just to see if that fixes it

zhangyue19921010 · 2021-03-13T05:06:00Z

> #10691 documents a few known flaky tests. Any ideas on how to make them less flaky will be much appreciated. I've re-triggered the failing tests - just to see if that fixes it

Thanks for your help. Sure, I will keep an eye on these jobs and try to do some tuning work if I can.

zhangyue19921010 · 2021-04-01T03:35:27Z

This change has been running in our Dev cluster for 2 weeks and works fine, so I believe it is ready to be reviewed.

Hi @pjain1, sorry to bother you. Are you available to help review this code? You reviewed the original design and may be more familiar with the Stream Tasks autoscaler.

I would appreciate it very much if you could lend me a hand :)

zhangyue19921010 · 2021-04-07T10:01:56Z

Also, kinesis-index and kinesis-data-format pass on my laptop.

kinesis-index: (screenshot not preserved)

kinesis-data-format: (screenshot not preserved)

suneet-s · 2021-07-13T01:35:49Z

@zhangyue19921010 any learnings from running in your dev cluster over the last few months? I've just started reviewing the change now. But hearing feedback first hand of anything you've noticed is very helpful! Thanks for your patience on this review.

zhangyue19921010 · 2021-07-13T05:57:02Z

Hi @suneet-s, thanks a lot for your attention. As far as I know, this patch works fine on our dev/stg clusters and will be deployed to our PRD cluster soon.

pjain1

The code changes look good to me. I have never used the Kinesis indexing service, so I'm not sure about the time-based lag and how it behaves. If someone is using the Kinesis service, it would be good if they could try it on their cluster.

techdocsmith

I added some suggestions on language and style. The scaleOutThreshold and scaleInThreshold logic is confusing to me.

techdocsmith · 2021-08-10T00:08:18Z

docs/development/extensions-core/kinesis-ingestion.md

+| `lagCollectionRangeMillis` | The total time window of lag collection, Use with `lagCollectionIntervalMillis`，it means that in the recent `lagCollectionRangeMillis`, collect lag metric points every `lagCollectionIntervalMillis`. | no (default == 600000) |
+| `scaleOutThreshold` | The Threshold of scale out action | no (default == 6000000) |
+| `triggerScaleOutFractionThreshold` | If `triggerScaleOutFractionThreshold` percent of lag points are higher than `scaleOutThreshold`, then do scale out action. | no (default == 0.3) |
+| `scaleInThreshold` | The Threshold of scale in action | no (default == 1000000) |


same comments as for scale out
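
Since the threshold logic was called out as confusing, here is a minimal Java sketch of one reading of the table rows above: lag points are collected over a window, and a scale-out fires when the fraction of points higher than `scaleOutThreshold` reaches `triggerScaleOutFractionThreshold`. This is an interpretation, not the merged implementation. The class and method names, the `triggerScaleInFraction` parameter (assumed to mirror the scale-out one), the collection interval, and the sample values are all assumptions.

```java
import java.util.List;

// Hypothetical sketch of the lag-based decision; not the code from this PR.
final class LagBasedScalingSketch {

    enum Action { SCALE_OUT, SCALE_IN, NONE }

    /** Number of lag points kept: one every interval across the whole range. */
    static int samplePoints(long lagCollectionRangeMillis, long lagCollectionIntervalMillis) {
        return (int) (lagCollectionRangeMillis / lagCollectionIntervalMillis);
    }

    static Action decide(
        List<Long> lagPoints,
        long scaleOutThreshold,
        double triggerScaleOutFractionThreshold,
        long scaleInThreshold,
        double triggerScaleInFraction // assumed counterpart, not shown in the diff above
    ) {
        if (lagPoints.isEmpty()) {
            return Action.NONE;
        }
        int above = 0;
        int below = 0;
        for (long lag : lagPoints) {
            if (lag > scaleOutThreshold) {
                above++;
            }
            if (lag < scaleInThreshold) {
                below++;
            }
        }
        double aboveFraction = (double) above / lagPoints.size();
        double belowFraction = (double) below / lagPoints.size();
        // Scale-out is checked first so that sustained high lag wins over low lag.
        if (aboveFraction >= triggerScaleOutFractionThreshold) {
            return Action.SCALE_OUT;
        }
        if (belowFraction >= triggerScaleInFraction) {
            return Action.SCALE_IN;
        }
        return Action.NONE;
    }

    public static void main(String[] args) {
        // 600000 is the documented default range; the 30000 interval is assumed.
        System.out.println(samplePoints(600_000L, 30_000L));
        // 4 of 10 made-up points exceed the default scaleOutThreshold of 6000000.
        List<Long> lags = List.of(
            7_000_000L, 7_000_000L, 7_000_000L, 7_000_000L, 2_000_000L,
            2_000_000L, 2_000_000L, 2_000_000L, 2_000_000L, 2_000_000L);
        System.out.println(decide(lags, 6_000_000L, 0.3, 1_000_000L, 0.9));
    }
}
```

Whether the comparisons are strict or inclusive, and which check takes priority, are design choices made for this sketch.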

clintropolis

code changes lgtm 👍

@techdocsmith do the doc changes need to be fixed or is it ok to do as a follow-up?

techdocsmith · 2021-08-26T20:16:21Z

@clintropolis, assuming I didn't break anything w/ my suggested edits, docs changes LGTM.

yuezhang added 4 commits March 11, 2021 13:50

ready to test

f8834d4

revert misc.xml

2d70cc3

document kinesis md

fc89da1

Merge branch 'master' into kinesis-dynamic-scale-ingest-task

712ff53

zhangyue19921010 changed the title ~~Kinesis Dynamic Scale Ingest Task~~ Dynamic auto scale Kinesis-Stream ingest tasks Mar 12, 2021

suneet-s added Area - Streaming Ingestion Release Notes labels Mar 12, 2021

zhangyue19921010 requested a review from pjain1 April 1, 2021 03:35

suneet-s added the Area - Documentation label Jul 13, 2021

pjain1 approved these changes Jul 13, 2021

techdocsmith reviewed Aug 10, 2021

clintropolis approved these changes Aug 26, 2021

techdocsmith added 11 commits August 26, 2021 12:39

Update docs/development/extensions-core/kafka-ingestion.md

830aa8c

Update docs/development/extensions-core/kinesis-ingestion.md

0de88ae

Update docs/development/extensions-core/kinesis-ingestion.md

56ab0ff

Update docs/development/extensions-core/kinesis-ingestion.md

7f00511

Update docs/development/extensions-core/kinesis-ingestion.md

6ab73a4

Update docs/development/extensions-core/kinesis-ingestion.md

ac721c9

Update docs/development/extensions-core/kinesis-ingestion.md

c402e9e

Update docs/development/extensions-core/kinesis-ingestion.md

47cb43a

Update docs/development/extensions-core/kinesis-ingestion.md

ba0ba3c

Update docs/development/extensions-core/kinesis-ingestion.md

dbe6c2d

Update docs/development/extensions-core/kinesis-ingestion.md

101a802

techdocsmith added 2 commits August 26, 2021 16:59

Update kafka-ingestion.md

9a667c3

remove leading `

Update kinesis-ingestion.md

96ab362

add missing `

clintropolis merged commit 6d14ea2 into apache:master Aug 30, 2021

clintropolis added this to the 0.22.0 milestone Sep 3, 2021

clintropolis mentioned this pull request Sep 3, 2021

[Draft] 0.22.0 Release Notes #11657

Closed
