
[FR] Kinesis based service scaling #673

Open
smazurov opened this issue Jul 13, 2023 · 7 comments
Assignees
Labels
autoscaling enhancement New feature or request kinesis AWS Kinesis

Comments

@smazurov

Is your feature request related to a problem? Please describe.
I noticed there is the ability to define scaling based on SQS backlog. Unfortunately, that is currently missing from the Kinesis implementation, and thus the dream is incomplete.

Describe the solution you'd like
Ideally, we could define a scaling target similarly to SQS, based on how far behind we are in processing or whether read/write throughput limits are hit.

Describe alternatives you've considered
Slogging it myself, uphill both ways (custom CloudFormation/API calls).

@smazurov smazurov added the enhancement New feature or request label Jul 13, 2023
@JohnPreston
Member

Hello @smazurov
Thanks for opening this issue.

I haven't used Kinesis as much as I have used SQS, and SQS was one of the very first services supported, so sorry about that.
That said, you still have the ability to create alarms with x-alarms, which you could in theory point at your Kinesis data stream, and x-alarms does allow you to create scaling rules for the services. We use Kafka a lot, which is similar to Kinesis, and scale services based on consumer lag, so I know this temporary solution would work.

See https://docs.compose-x.io/syntax/compose_x/common.html#x-resource-service-scaling-def
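To illustrate the workaround, here is a hedged sketch of the kind of CloudWatch alarm an x-alarms entry would ultimately resolve to, expressed as the parameter dict that boto3's `cloudwatch.put_metric_alarm()` accepts. The stream name, alarm name, and threshold are hypothetical placeholders, not compose-x output.

```python
# Sketch only: parameters for an alarm on a Kinesis stream's consumer lag.
# All names/thresholds are illustrative assumptions, not compose-x syntax.

def iterator_age_alarm_params(stream_name: str, threshold_ms: int = 60_000) -> dict:
    """Build put_metric_alarm parameters watching a stream's
    GetRecords.IteratorAgeMilliseconds (how far behind consumers are)."""
    return {
        "AlarmName": f"{stream_name}-iterator-age-high",   # hypothetical name
        "Namespace": "AWS/Kinesis",
        "MetricName": "GetRecords.IteratorAgeMilliseconds",
        "Dimensions": [{"Name": "StreamName", "Value": stream_name}],
        "Statistic": "Maximum",
        "Period": 60,                # seconds per datapoint
        "EvaluationPeriods": 3,      # 3 consecutive breaches before alarming
        "Threshold": threshold_ms,   # alarm when consumers lag > 1 minute
        "ComparisonOperator": "GreaterThanThreshold",
    }

params = iterator_age_alarm_params("my-data-stream")
# boto3.client("cloudwatch").put_metric_alarm(**params)  # uncomment to create
```

The same dimensions (`StreamName`) are what you would fill into the x-alarms definition per the docs above.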

What I can certainly do, though, is create a Scaling section for Kinesis streams which would automatically create the alarms and scaling steps for you, just like SQS.
To make sure the feature answers your needs, can you confirm for me the metrics you would like to scale the containers from?

Let me know!

@smazurov
Author

Since that's supported, maybe just a doc can cover it. The relevant metrics are probably GetRecords.IteratorAgeMilliseconds, ReadProvisionedThroughputExceeded, and WriteProvisionedThroughputExceeded.
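For reference, the three metrics above all live in the `AWS/Kinesis` namespace with a `StreamName` dimension. A hedged sketch of how you might query them together, as the `MetricDataQueries` list that boto3's `cloudwatch.get_metric_data()` takes (stream name is a placeholder):

```python
# Sketch: one GetMetricData query per metric of interest.
METRICS = [
    ("GetRecords.IteratorAgeMilliseconds", "Maximum"),  # consumer lag
    ("ReadProvisionedThroughputExceeded", "Sum"),       # read throttles
    ("WriteProvisionedThroughputExceeded", "Sum"),      # write throttles
]

def kinesis_metric_queries(stream_name: str, period: int = 60) -> list:
    """Build MetricDataQueries; Ids must be lowercase alphanumeric."""
    return [
        {
            "Id": f"m{i}",
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Kinesis",
                    "MetricName": name,
                    "Dimensions": [{"Name": "StreamName", "Value": stream_name}],
                },
                "Period": period,
                "Stat": stat,
            },
        }
        for i, (name, stat) in enumerate(METRICS)
    ]

queries = kinesis_metric_queries("my-data-stream")
# boto3.client("cloudwatch").get_metric_data(MetricDataQueries=queries, ...)
```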

@JohnPreston
Member

Well, if you find the time to test with x-alarms and report whether it's working or not, that would be very helpful.
Also, do you think there should be a math function to aggregate these metrics?

For WriteProvisionedThroughputExceeded, my understanding of "what to do" when you hit this limit is to autoscale the data stream itself and add shards. Same for ReadProvisionedThroughputExceeded, I suppose, but you can't read from shards that haven't been written to, so the producer needs to have more shards to write to.

GetRecords.IteratorAgeMilliseconds indeed looks like the closest equivalent to Kafka's consumer lag, which represents the number of messages left for a consumer to read. Here, I suppose, a lower value means your consumers are keeping up with the volume of data to consume, if I am correct?

So here I see 2 potential features:

  • scale the ECS Services based on the volume of messages to read
  • scale the data stream to add shards when the throughput is exceeded.

Have you got autoscaling on your data stream already?
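For the second feature (adding shards on throughput breaches), a hedged sketch of what a provisioned-mode scale-out could look like, as parameters for boto3's `kinesis.update_shard_count()`. The stream name, doubling factor, and cap are illustrative assumptions:

```python
# Sketch: grow a provisioned stream's shard count when write throughput
# is exceeded. Values are placeholders, not a recommended policy.

def shard_scale_out_params(stream_name: str, current_shards: int,
                           factor: float = 2.0, max_shards: int = 64) -> dict:
    """Double the shard count (capped); UNIFORM_SCALING rebalances evenly."""
    target = min(max_shards, max(current_shards + 1, int(current_shards * factor)))
    return {
        "StreamName": stream_name,
        "TargetShardCount": target,
        "ScalingType": "UNIFORM_SCALING",
    }

params = shard_scale_out_params("my-data-stream", current_shards=4)
# boto3.client("kinesis").update_shard_count(**params)  # uncomment to apply
```

(As noted below, on-demand capacity mode makes this unnecessary since AWS manages shards itself.)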

@smazurov
Author

Scaling of streams is taken care of by using on-demand Kinesis. I've seen some autoscaling solutions that involve Lambdas on "provisioned" mode streams.

For WriteProvisionedThroughputExceeded, it is per shard, but I think we'd still potentially want to add producers, so that whatever generates the traffic scales out horizontally as new shards are created. Same for read. This is a bit theoretical; the one I would use immediately is iterator age.

@JohnPreston
Member

Okay, great. I will work on adding scaling based on GetRecords.IteratorAgeMilliseconds as the default way to set up scaling for a data stream, then.

I am going to aim for integrating StepScaling, as that would be my default, but do you think TargetTracking might be more appropriate here?

StepScaling is when you want X containers given the range the metric value falls into.
TargetTracking is when you want AWS to auto-compute how many containers are needed to achieve the target.
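A hedged sketch of the contrast, as the two policy-configuration dicts that the Application Auto Scaling `put_scaling_policy` API accepts for an ECS service. All thresholds, step sizes, and the stream name are illustrative assumptions:

```python
# Sketch: the two Application Auto Scaling policy shapes, side by side.
# Values are examples only, not a recommended configuration.

# StepScaling: you choose the adjustment for each alarm-breach range
# (breach is measured from the alarm threshold; units here are ms of lag).
STEP_SCALING = {
    "AdjustmentType": "ChangeInCapacity",
    "Cooldown": 60,
    "StepAdjustments": [
        {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 60000,
         "ScalingAdjustment": 1},   # a bit behind: add 1 task
        {"MetricIntervalLowerBound": 60000,
         "ScalingAdjustment": 3},   # far behind: add 3 tasks
    ],
}

# TargetTracking: AWS computes the task count needed to hold the metric
# near the target value.
TARGET_TRACKING = {
    "TargetValue": 30000.0,  # aim to keep iterator age around 30 s
    "CustomizedMetricSpecification": {
        "Namespace": "AWS/Kinesis",
        "MetricName": "GetRecords.IteratorAgeMilliseconds",
        "Dimensions": [{"Name": "StreamName", "Value": "my-data-stream"}],
        "Statistic": "Maximum",
    },
    "ScaleOutCooldown": 60,
    "ScaleInCooldown": 300,  # scale in more slowly than out
}
```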

@smazurov
Author

Hmmm, how would it work against IteratorAge? Inverse (so the older the age, the more "loaded" the service is)? That would work great, I think.
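The inverse relationship in question can be sketched as a pure step function (purely illustrative, with made-up thresholds, not the library's behavior): the older the iterator age, the more tasks the service should run.

```python
# Sketch: map consumer lag (ms) to a desired task count. Thresholds and
# counts are illustrative assumptions only.

def desired_tasks(iterator_age_ms: float, min_tasks: int = 1,
                  max_tasks: int = 10) -> int:
    """More lag -> more tasks; clamped to [min_tasks, max_tasks]."""
    if iterator_age_ms < 10_000:       # under 10 s: consumers keeping up
        wanted = min_tasks
    elif iterator_age_ms < 60_000:     # under 1 min: moderately behind
        wanted = min_tasks + 2
    else:                              # over 1 min: heavily loaded
        wanted = max_tasks
    return max(min_tasks, min(max_tasks, wanted))
```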

@JohnPreston
Member

Yeah, that's what I think makes the most sense from reading the docs.
I might try it myself when I get around to it, but with x-alarms, if you fill in the dimensions of your data stream and set up Scaling for it to scale your service, that should work.

I will try to share a feature branch during the week.
