Dynamic auto scale Kinesis-Stream ingest tasks #10985
The failed CI jobs are not related to this PR. A retry may pass :)
#10691 documents a few known flaky tests. Any ideas on how to make them less flaky will be much appreciated. I've re-triggered the failing tests, just to see if that fixes it.
Thanks for your help. Sure, I will keep an eye on these jobs and try to do some tuning work if I can.
This change has been running in our dev cluster for 2 weeks and works fine, so I believe it is ready to be reviewed. Hi @pjain1, sorry to bother you. Are you available to help me review this code? You reviewed the original design and may be more familiar with the stream tasks autoscaler. I would appreciate it very much if you could lend me a hand :)
@zhangyue19921010 any learnings from running in your dev cluster over the last few months? I've just started reviewing the change now. But hearing feedback first hand of anything you've noticed is very helpful! Thanks for your patience on this review.
Hi @suneet-s, thanks a lot for your attention. As far as I know, this patch works fine on our dev/stg cluster and will be deployed to the PRD cluster soon.
Code changes look good to me. I have never used the Kinesis indexing service, so I am not sure about the time-based lag and how it behaves. If someone is using the Kinesis service, it would be good if they could try it on their cluster.
I added some suggestions on language and style. The `scaleOutThreshold` and `scaleInThreshold` logic is confusing to me.
| `lagCollectionRangeMillis` | The total time window of lag collection. Use with `lagCollectionIntervalMillis`: within the most recent `lagCollectionRangeMillis`, lag metric points are collected every `lagCollectionIntervalMillis`. | no (default == 600000) |
| `scaleOutThreshold` | The threshold for the scale out action. | no (default == 6000000) |
| `triggerScaleOutFractionThreshold` | If `triggerScaleOutFractionThreshold` (expressed as a fraction) of lag points are higher than `scaleOutThreshold`, then do the scale out action. | no (default == 0.3) |
| `scaleInThreshold` | The threshold for the scale in action. | no (default == 1000000) |
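To make the threshold logic in these rows concrete, here is a minimal sketch of how a fraction-of-lag-points decision could work. It is not the code from this PR: the class `LagScaleSketch`, the `Action` enum, the `decide` method, and the `triggerScaleInFractionThreshold` parameter are names assumed for illustration, and the exact comparison operators (`>`, `<`, `>=`) are assumptions too.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; names and comparison operators are assumptions,
// not the implementation in this PR.
final class LagScaleSketch {
    enum Action { SCALE_OUT, SCALE_IN, NONE }

    // lagPoints: lag samples collected over the most recent collection window.
    static Action decide(
        List<Long> lagPoints,
        long scaleOutThreshold,
        double triggerScaleOutFractionThreshold,
        long scaleInThreshold,
        double triggerScaleInFractionThreshold // assumed counterpart for scale in
    ) {
        if (lagPoints.isEmpty()) {
            return Action.NONE; // nothing collected, so no decision
        }
        int above = 0;
        int below = 0;
        for (long lag : lagPoints) {
            if (lag > scaleOutThreshold) {
                above++;
            }
            if (lag < scaleInThreshold) {
                below++;
            }
        }
        double total = lagPoints.size();
        // Scale out when enough of the window sits above scaleOutThreshold.
        if (above / total >= triggerScaleOutFractionThreshold) {
            return Action.SCALE_OUT;
        }
        // Scale in when enough of the window sits below scaleInThreshold.
        if (below / total >= triggerScaleInFractionThreshold) {
            return Action.SCALE_IN;
        }
        return Action.NONE;
    }

    public static void main(String[] args) {
        // 4 of 10 points exceed 6000000, which is above the 0.3 fraction.
        List<Long> window = Arrays.asList(
            7000000L, 7000000L, 7000000L, 7000000L, 2000000L,
            2000000L, 2000000L, 2000000L, 2000000L, 2000000L);
        System.out.println(decide(window, 6000000L, 0.3, 1000000L, 0.9));
    }
}
```

Read this way, `scaleOutThreshold` and `scaleInThreshold` are lag values that each sample is compared against, while the fraction thresholds control how much of the collection window must cross them before any action is taken.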
Same comments as for scale out.
Code changes LGTM 👍
@techdocsmith do the doc changes need to be fixed, or is it OK to do that as a follow-up?
@clintropolis, assuming I didn't break anything with my suggested edits, the docs changes LGTM.
remove leading `
add missing `
Follow-up PR for #10524.
The core logic is the same; the difference lies in the implementation of `computeLagStats` between Kafka and Kinesis.
Description
This PR implements and documents the autoscaler based on the `ingest/kinesis/lag/time` metric for the Kinesis indexing service. Only the LagBased autoScalerStrategy is supported for now.
Key changed/added classes in this PR
KinesisSupervisor.java
KinesisSupervisorIOConfig.java
This PR has: