
Autoscale of hosts and containers based on thresholds on the metrics #3893

Closed
LRancez opened this issue Mar 9, 2016 · 68 comments
Assignees
Labels
kind/enhancement Issues that improve or augment existing functionality status/autoclosed

Comments

@LRancez

LRancez commented Mar 9, 2016

This enhancement request is based on:
https://forums.rancher.com/t/rancher-host-autoscaling/1098

The idea is that we could configure some thresholds on the metrics to scale up or scale down. This could apply to containers, and also to hosts using the cloud providers' host-creation APIs.
We would probably need to be able to configure a minimum number of containers and/or hosts when autoscaling is enabled, so it doesn't kill everything under a very low workload, and also a maximum, so memory leaks or internal errors don't make everything grow indefinitely.

Thinking about the current structure of Rancher, I imagine that host autoscaling could live at the environment level, with monitoring thresholds that apply to the host metrics. Note that autoscaling the hosts does not necessarily mean autoscaling the containers on those hosts; this should probably be combined with container autoscaling to move or scale the containers around to balance the workload.
Container autoscaling could simply sit at the service level, monitoring thresholds on the containers that have it enabled.

The flow that I imagine is:

  • The user configures a min threshold and a max threshold for one or more metrics they want to react to.
  • The user configures a scale activation group of thresholds.
  • When a metric passes its min or max threshold, a warning signal is activated.
  • When the metric stabilizes, the warning signal is deactivated.
  • When all the metrics in the scale activation group are in a warning state, Rancher reacts by scaling up or down according to the configuration, and it sets the scale activation group itself to a warning state. Only one scale up or down should run at a time to maintain consistency.
  • When the scale operation completes, the metrics' warning signals should be reviewed. If all the thresholds are still in a warning state, and no minimum or maximum scale has been reached, Rancher should scale again.
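
The flow above could be sketched roughly like this. Everything here is illustrative (class names, metric names, and thresholds are assumptions, not Rancher APIs):

```python
from dataclasses import dataclass

@dataclass
class Threshold:
    metric: str
    min_value: float
    max_value: float
    warning: bool = False  # per-metric warning signal

    def update(self, value: float) -> None:
        # A metric past its min or max threshold raises a warning;
        # a stabilized metric clears it.
        self.warning = value < self.min_value or value > self.max_value

@dataclass
class ScaleActivationGroup:
    thresholds: list
    min_scale: int = 1    # never scale below this
    max_scale: int = 10   # never scale above this

    def evaluate(self, metrics: dict, current_scale: int) -> int:
        """Return the new scale after one evaluation pass."""
        for t in self.thresholds:
            t.update(metrics[t.metric])
        # Only react when *all* metrics in the group are in a warning state.
        if not all(t.warning for t in self.thresholds):
            return current_scale
        # One scale step at a time, to maintain consistency.
        if any(metrics[t.metric] > t.max_value for t in self.thresholds):
            return min(current_scale + 1, self.max_scale)
        return max(current_scale - 1, self.min_scale)
```

After each step, the caller would re-evaluate and, if all thresholds are still warning and neither bound has been reached, scale again.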

I personally believe that this feature will be extremely useful for cloud environments. What do you think?

Best,

@Ilya-Kuchaev

+1

@will-chan will-chan added kind/enhancement Issues that improve or augment existing functionality release/future labels Mar 9, 2016
@gdurand-globallogic

+1

9 similar comments
@CrystalMethod

👍

@sbehrends

+1

@patodk

patodk commented Mar 10, 2016

+1

@fernandoneto

+1

@dbones

dbones commented Mar 11, 2016

+1

@Snake4life

+1

@hwinkel

hwinkel commented Mar 15, 2016

👍

@pabloval

+1

@dahendel

+1

@boedy

boedy commented Mar 16, 2016

Thanks for opening this request, LRancez. Any ideas on how this could best be implemented in Rancher? Could this be a catalog entry, or should it be part of the core? Currently Rancher doesn't store any tokens or credentials related to cloud providers.

I did some small tests with the websockets the Rancher API exposes, using Digital Ocean. It's pretty straightforward to create and destroy hosts through the API. I'll have another look at this.

@LRancez
Author

LRancez commented Mar 17, 2016

Hi @boedy, thanks for the response.
It could well be a catalog entry instead of messing with the core, an approach like the one used for Kubernetes. It depends on each implementation whether you want to scale automatically, manually, or not at all.

As for the credentials, I personally like and trust the way containers handle internal connections and only expose the things that need to be exposed. So maybe simply adding a small database as part of the stack would be more than enough for this.

@oggthemiffed

+1

1 similar comment
@xlight

xlight commented Apr 5, 2016

+1

@webwurst

webwurst commented Apr 5, 2016

Please use the Add Reactions feature to "+1". This way not everyone gets notified. Thanks :)

@LRancez
Author

LRancez commented Apr 13, 2016

I just realized that I didn't mention it, but the ability to scale up and down on a predefined schedule, in parallel to the metrics-based mechanism, would also be very useful.
If you want, I could open a new issue for this schedule-based scaling.

@frekele

frekele commented May 26, 2016

The best feature. \o/

Rebalancing containers between nodes would also be important.

As a helpful example, the closest I found to these features is:

Did you see this @alena1108 @vincent99?
Adding this functionality to the roadmap would be important. :)

@marsanla

+1

3 similar comments
@djaccedo

+1

@silviupanaite

+1

@jhelbling

+1

@mrserverless

Release v1.2.0-pre1 has experimental support for Kubernetes 1.3, so does that mean we get autoscaling out of the box if we spin up a Rancher Kubernetes environment? I'll have a go when I get some time and report back.

@mrserverless

Kubernetes Horizontal Pod Autoscaling doesn't work either due to #5578

@arkka

arkka commented Aug 5, 2016

+1

@ecliptik

ecliptik commented Jan 9, 2017

@xaka we're mainly scaling up a cluster automatically if we load up so many stacks that the CPU comes under pressure. This happens rarely, since we've tuned each cluster's instance type to accommodate the number of containers we're using. We don't scale out hundreds of the same container, so it's either one container per cluster instance (global) or just a single container that gets deployed somewhere onto the cluster. Right now it's just a much easier way of adding/removing hosts to a cluster automatically, and it saves on cost since they're all Spot Instances.

For an application that needs more dedicated resources, we create an application-specific cluster, since Rancher labels currently don't have a way for a host to ONLY run containers with a specific label.

This way, if SpotInst sees high CPU usage it auto-scales the application cluster, and since the stack is set up globally, it brings up more hosts/containers to support it. It's like a modified AWS auto-scaling group, but instead of using an AMI for the application it uses containers.

There are probably much better ways to do this, and as we progress with our use of Rancher and how others are doing things I'm sure it will change.

@mrajashree mrajashree self-assigned this Jan 10, 2017
@borntorock

borntorock commented Mar 31, 2017

How are we doing on this? Have we implemented some parameters for automated scaling of containers in Rancher Cattle?

@nittikkin

+1.
Would be neat if the metrics and thresholds could be configured as a property/label of some external object. Basically, provide rules to Rancher externally on how/when to scale?
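
The label idea above could look roughly like this. The `autoscale.*` label keys are hypothetical, invented for illustration; they are not real Rancher labels:

```python
def parse_autoscale_labels(labels: dict) -> dict:
    """Extract autoscale rules from service labels such as
    'autoscale.cpu.max=80' or 'autoscale.cpu.min=10' (hypothetical scheme)."""
    rules = {}
    prefix = "autoscale."
    for key, value in labels.items():
        if not key.startswith(prefix):
            continue  # ignore unrelated labels
        metric, _, bound = key[len(prefix):].rpartition(".")
        if bound in ("min", "max"):
            rules.setdefault(metric, {})[bound] = float(value)
    return rules

# Example: mixing autoscale rules with an ordinary label.
rules = parse_autoscale_labels({
    "autoscale.cpu.max": "80",
    "autoscale.cpu.min": "10",
    "io.rancher.stack_service.name": "web",
})
# rules == {"cpu": {"max": 80.0, "min": 10.0}}
```

An external controller could then watch services through the API and apply whatever rules it finds on their labels.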

@gregkeys

gregkeys commented Jun 9, 2017

+1 has this been added yet?

@VamshiChaitanya

+1

1 similar comment
@devopsairtrumpet

+1

@hugodopradofernandes

I'm using Prometheus+Grafana: I set a webhook on Rancher to scale up my webserver, and Grafana sends the webhooks according to the CPU value.
It could be a Rancher service that includes a simple CPU/memory monitor and performs a curl to send the webhooks. It shouldn't be hard to set up.
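
A minimal sketch of that monitor, assuming a Rancher webhook receiver that is triggered by a plain POST (the URL, key, and threshold below are placeholders, not real endpoints):

```python
import urllib.request

# Placeholder: a Rancher 1.6 webhook receiver URL would look similar to this.
SCALE_UP_WEBHOOK = "http://rancher.example.com/v1-webhooks/endpoint?key=PLACEHOLDER"

def should_scale_up(cpu_percent: float, threshold: float = 80.0) -> bool:
    """Decide whether the CPU reading crosses the scale-up threshold."""
    return cpu_percent > threshold

def fire_webhook(url: str) -> int:
    """POST to the webhook receiver and return the HTTP status code.
    Assumes the receiver accepts an empty-body POST."""
    req = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Usage (not executed here; read_cpu_percent() is whatever your
# monitoring system provides):
#   if should_scale_up(read_cpu_percent()):
#       fire_webhook(SCALE_UP_WEBHOOK)
```

Grafana alerting can play the role of `should_scale_up` entirely, as described above, leaving only the receiver to configure on the Rancher side.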

@hristovpln

+1

3 similar comments
@0xVasconcelos

+1

@rsdomingues

+1

@vingov

vingov commented Mar 9, 2018

+1

@vainkop

vainkop commented Mar 9, 2018

@hugodopradofernandes sounds good, need to try that!

+1 for integrating Prometheus+Grafana in a similar way out of the box!

@intrasenze-app

+1

1 similar comment
@aandac

aandac commented Apr 9, 2018

+1

@benyanke

benyanke commented Apr 13, 2018

+1

Also, I would mention that one of the most underrated metrics for autoscaling is response time, if you're dealing with a typical web app.

I don't care (that much!) if my cluster is running at 97% CPU usage if the response time stays within healthy limits. Similarly, if the response time spikes every time CPU goes above 25%, you still want to scale up, even though 25% CPU wouldn't be the scale-up point in most situations.

In some situations, it makes sense to scale based on what actually matters, the speed of your app, not some proxy symptom like CPU.

Just 2c
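
Scaling on response time as described above could be sketched like this (the SLO numbers and the scale-down factor are made-up examples):

```python
import math

def percentile(samples: list, pct: float) -> float:
    """Nearest-rank percentile of a list of latency samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

def latency_verdict(samples_ms: list, slo_ms: float = 250.0) -> str:
    """Decide scaling from p95 latency against a service-level objective."""
    p95 = percentile(samples_ms, 95)
    if p95 > slo_ms:
        return "scale-up"    # users are feeling it, regardless of CPU
    if p95 < slo_ms * 0.3:
        return "scale-down"  # comfortably under the SLO
    return "hold"
```

The same warning-signal machinery from the original proposal would apply; only the metric changes.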

@cwrau

cwrau commented Apr 14, 2018

+1

4 similar comments
@sst1xx

sst1xx commented Apr 25, 2018

+1

@paivaric

+1

@OdinLin

OdinLin commented Apr 29, 2018

+1

@mitchellmaler

+1

@vainkop

vainkop commented May 18, 2018

@benyanke

Prometheus metrics plus a webhook trigger based on them should work, but it would be nice to have that functionality out of the box: a webhook for pod autoscaling plus a webhook for EC2 instance/host/worker node autoscaling.
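
Reading a metric from Prometheus before firing such a webhook could look like this. The Prometheus address and the metric expression are placeholders; the response shape is the standard Prometheus `/api/v1/query` instant-vector format:

```python
import json
import urllib.parse
import urllib.request

def prom_query_url(base: str, expr: str) -> str:
    """Build an instant-query URL against the Prometheus HTTP API."""
    return base + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})

def extract_first_value(response: dict) -> float:
    """Pull the scalar out of a Prometheus instant-query response.
    Each result entry is {"metric": {...}, "value": [timestamp, "value"]}."""
    result = response["data"]["result"]
    return float(result[0]["value"][1])

# Usage (not executed here):
#   url = prom_query_url("http://prometheus:9090",
#                        'avg(rate(container_cpu_usage_seconds_total[5m]))')
#   with urllib.request.urlopen(url) as resp:
#       cpu = extract_first_value(json.load(resp))
```

The extracted value would then feed the same threshold check that decides whether to hit the scale-up or scale-down webhook.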

@chrisingenhaag

+1

@ItsReddi

ItsReddi commented Jun 7, 2018

Webhooks seem like a nice idea and could be a valid workaround for this missing feature.
It would be nice if the webhooks were fixed first, though. There are some issues around them, like overriding configurations from other services, upgrade processes that never finish, and drain timeouts that are lost while upgrading via webhook.

Well @rancherdev, this issue has been open for two years and I'm notified about a +1 at least twice a week.
What should the community do to get this feature or a working workaround? There are so many people that seem to need this.

@michael-henderson

This is obviously never going to happen; it's been ignored for 2 years. It's sad, but this is why Rancher is falling out of people's comparison lists when looking at container platforms. It was nice knowing you, Rancher.

@empinator

empinator commented Jul 3, 2018

@michael-henderson
AFAIK this is not entirely true. Rancher 2.x is focusing on Kubernetes, which already has autoscaling built in.
From that standpoint I would understand not prioritizing this feature at the moment, even though I'm as frustrated as you are, since I'm still running on Rancher 1.6.
I haven't fully dived into it, but I assume this should work with 2.x. Maybe I'm wrong?

@benmag

benmag commented Jul 9, 2018

Since there are a lot of people wanting this, I built a little side project to act as the missing autoscale functionality for Rancher v1.6: https://autoscale.co

I had already implemented autoscaling with Rancher for my own project, Codemason, and figured I should spin it off as a separate service for anyone else who might need it. Hope it helps!

@deniseschannon

With the release of Rancher 2.0, development on v1.6 is limited to critical bug fixes and security patches.
