
Autoscale of hosts and containers based on thresholds on the metrics #3893

Closed
LRancez opened this issue Mar 9, 2016 · 68 comments
Assignees
Labels
kind/enhancement Issues that improve or augment existing functionality status/autoclosed

Comments

@LRancez

LRancez commented Mar 9, 2016

This enhancement request is based on:
https://forums.rancher.com/t/rancher-host-autoscaling/1098

The idea is that we could configure some thresholds on the metrics to scale up or scale down. This could apply to containers, and also to hosts using the cloud providers' host-creation APIs.
We would probably need to be able to configure a minimum number of containers and/or hosts when autoscaling is enabled, so it doesn't kill everything under a very low workload, and also a maximum, so memory leaks or internal errors don't make everything grow indefinitely.

Thinking about the current structure of Rancher, I imagine that host autoscaling could live at the environment level, with monitoring thresholds that apply to the host metrics. Note that autoscaling the hosts does not necessarily mean autoscaling the containers on those hosts; this should probably be combined with container autoscaling to move or scale the containers around to balance the workload.
Container autoscaling could simply sit at the service level, monitoring thresholds on the containers that have it enabled.

The flow that I imagine is:

  • The user configures a min threshold and a max threshold for one or more metrics they want to react to.
  • The user configures a scale activation group of thresholds.
  • When a metric passes its min or max threshold, a warning signal is activated.
  • When the metric stabilizes, the warning signal is deactivated.
  • When all the metrics in the scale activation group are in a warning state, Rancher reacts by scaling up or down according to the configuration, and it sets the scale activation group itself to a warning state. Only one scale up or down should run at a time to maintain consistency.
  • When the scale operation completes, the metrics' warning signals should be reviewed. If all the thresholds are still in a warning state, and no minimum or maximum scale has been reached, Rancher should scale again.
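
The flow above could be sketched roughly like this. Everything here is illustrative (class names, metric names, and thresholds are assumptions, not Rancher APIs):

```python
from dataclasses import dataclass

@dataclass
class Threshold:
    metric: str
    min_value: float
    max_value: float
    warning: bool = False  # per-metric warning signal

    def update(self, value: float) -> None:
        # A metric past its min or max threshold raises a warning;
        # a stabilized metric clears it.
        self.warning = value < self.min_value or value > self.max_value

@dataclass
class ScaleActivationGroup:
    thresholds: list
    min_scale: int = 1    # never scale below this
    max_scale: int = 10   # never scale above this

    def evaluate(self, metrics: dict, current_scale: int) -> int:
        """Return the new scale after one evaluation pass."""
        for t in self.thresholds:
            t.update(metrics[t.metric])
        # Only react when *all* metrics in the group are in a warning state.
        if not all(t.warning for t in self.thresholds):
            return current_scale
        # One scale step at a time, to maintain consistency.
        if any(metrics[t.metric] > t.max_value for t in self.thresholds):
            return min(current_scale + 1, self.max_scale)
        return max(current_scale - 1, self.min_scale)
```

After each step, the caller would re-evaluate and, if all thresholds are still warning and neither bound has been reached, scale again.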

I personally believe that this feature will be extremely useful for cloud environments. What do you think?

Best,

@Ilya-Kuchaev

+1

@will-chan will-chan added kind/enhancement Issues that improve or augment existing functionality release/future labels Mar 9, 2016
@gdurand-globallogic

+1

9 similar comments
@CrystalMethod

👍

@sbehrends

+1

@patodk

patodk commented Mar 10, 2016

+1

@fernandoneto

+1

@dbones

dbones commented Mar 11, 2016

+1

@Snake4life

+1

@hwinkel

hwinkel commented Mar 15, 2016

👍

@pabloval

+1

@dahendel

+1

@boedy

boedy commented Mar 16, 2016

Thanks for opening this request, LRancez. Any ideas on how this could best be implemented in Rancher? Could this be a catalog entry, or should it be part of the core? Currently Rancher doesn't store any tokens or credentials related to cloud providers.

I did some small tests with the websockets the Rancher API exposes, using Digital Ocean. It's pretty straightforward to create and destroy hosts through the API. I'll have another look at this.

@LRancez
Author

LRancez commented Mar 17, 2016

Hi @boedy, thanks for the response.
It could well be a catalog entry instead of messing with the core, an approach like the one used for Kubernetes. It depends on each implementation whether you want to scale automatically, manually, or not at all.

As for the credentials, I personally like and trust the way containers handle internal connections and only expose the things that need to be exposed. So maybe simply adding a small database as part of the stack would be more than enough for this.

@oggthemiffed

+1

1 similar comment
@xlight

xlight commented Apr 5, 2016

+1

@webwurst

webwurst commented Apr 5, 2016

Please use the Add Reactions feature to "+1". This way not everyone gets notified. Thanks :)

@LRancez
Author

LRancez commented Apr 13, 2016

I just realized that I didn't mention it, but the ability to scale up and down on a predefined schedule, in parallel to the metrics-based mechanism, would also be very useful.
If you want, I could open a new issue for this schedule-based scaling.

@frekele

frekele commented May 26, 2016

The best feature. \o/

Rebalancing containers between nodes would also be important.

As a helpful example, the closest I found to these features is:

Did you see this @alena1108 @vincent99?
Adding this functionality to the roadmap would be important. :)

@marsanla

+1

3 similar comments
@djaccedo

+1

@silviupanaite

+1

@jhelbling

+1

@mrserverless

Release v1.2.0-pre1 has experimental support for Kubernetes 1.3, so does that mean we get autoscaling out of the box if we spin up a Rancher Kubernetes environment? I'll have a go when I get some time and report back.

@mrserverless

Kubernetes Horizontal Pod Autoscaling doesn't work either due to #5578

@arkka

arkka commented Aug 5, 2016

+1

@ecliptik

ecliptik commented Jan 9, 2017

@xaka we're mainly scaling up a cluster automatically if we load up so many stacks that the CPU comes under pressure. This happens rarely, since we've tuned each cluster's instance type to accommodate the number of containers we're using. We don't scale out hundreds of the same container, so it's either one container per cluster instance (global) or just a single container that gets deployed somewhere onto the cluster. Right now it's just a much easier way of adding/removing hosts to a cluster automatically, and it saves on cost since they're all Spot Instances.

For an application that needs more dedicated resources, we create an application-specific cluster, since Rancher labels currently don't have a way for a host to ONLY run containers with a specific label.

This way, if SpotInst sees high CPU usage it auto-scales the application cluster, and since the stack is set up globally, it brings up more hosts/containers to support it. It's like a modified AWS auto-scaling group, but instead of using an AMI for the application it uses containers.

There are probably much better ways to do this, and as we progress with our use of Rancher and how others are doing things I'm sure it will change.

@mrajashree mrajashree self-assigned this Jan 10, 2017
@borntorock

borntorock commented Mar 31, 2017

How are we doing on this? Have we implemented some parameters for automated scaling of containers in Rancher Cattle?

@nittikkin

+1.
Would be neat if the metrics and thresholds could be configured as a property/label of some external object. Basically, provide rules to Rancher externally on how/when to scale?
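
The label idea above could look roughly like this. The `autoscale.*` label keys are hypothetical, invented for illustration; they are not real Rancher labels:

```python
def parse_autoscale_labels(labels: dict) -> dict:
    """Extract autoscale rules from service labels such as
    'autoscale.cpu.max=80' or 'autoscale.cpu.min=10' (hypothetical scheme)."""
    rules = {}
    prefix = "autoscale."
    for key, value in labels.items():
        if not key.startswith(prefix):
            continue  # ignore unrelated labels
        metric, _, bound = key[len(prefix):].rpartition(".")
        if bound in ("min", "max"):
            rules.setdefault(metric, {})[bound] = float(value)
    return rules

# Example: mixing autoscale rules with an ordinary label.
rules = parse_autoscale_labels({
    "autoscale.cpu.max": "80",
    "autoscale.cpu.min": "10",
    "io.rancher.stack_service.name": "web",
})
# rules == {"cpu": {"max": 80.0, "min": 10.0}}
```

An external controller could then watch services through the API and apply whatever rules it finds on their labels.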

@gregkeys

gregkeys commented Jun 9, 2017

+1 has this been added yet?

@VamshiChaitanya

+1

1 similar comment
@devopsairtrumpet

+1

@hugodopradofernandes

I'm using Prometheus+Grafana: I set a webhook on Rancher to scale up my webserver, and Grafana sends the webhooks according to the CPU value.
It could be a Rancher service that includes a simple CPU/memory monitor and performs a curl to send the webhooks. It shouldn't be hard to set up.
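
A minimal sketch of that monitor, assuming a Rancher webhook receiver that is triggered by a plain POST (the URL, key, and threshold below are placeholders, not real endpoints):

```python
import urllib.request

# Placeholder: a Rancher 1.6 webhook receiver URL would look similar to this.
SCALE_UP_WEBHOOK = "http://rancher.example.com/v1-webhooks/endpoint?key=PLACEHOLDER"

def should_scale_up(cpu_percent: float, threshold: float = 80.0) -> bool:
    """Decide whether the CPU reading crosses the scale-up threshold."""
    return cpu_percent > threshold

def fire_webhook(url: str) -> int:
    """POST to the webhook receiver and return the HTTP status code.
    Assumes the receiver accepts an empty-body POST."""
    req = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Usage (not executed here; read_cpu_percent() is whatever your
# monitoring system provides):
#   if should_scale_up(read_cpu_percent()):
#       fire_webhook(SCALE_UP_WEBHOOK)
```

Grafana alerting can play the role of `should_scale_up` entirely, as described above, leaving only the receiver to configure on the Rancher side.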

@hristovpln

+1

3 similar comments
@0xVasconcelos

+1

@rsdomingues

+1

@vingov

vingov commented Mar 9, 2018

+1

@vainkop

vainkop commented Mar 9, 2018

@hugodopradofernandes sounds good, need to try that!

+1 for integrating Prometheus+Grafana in a similar way out of the box!

@intrasenze-app

+1

1 similar comment
@aandac

aandac commented Apr 9, 2018

+1

@benyanke

benyanke commented Apr 13, 2018

+1

Also, I would mention that one of the most underrated metrics for autoscaling is response time, if you're dealing with a typical web app.

I don't care (that much!) if my cluster is running at 97% CPU usage if the response time stays within healthy limits. Similarly, if the response time spikes every time CPU goes above 25%, you still want to scale up, even though 25% CPU wouldn't be the scale-up point in most situations.

In some situations, it makes sense to scale based on what actually matters, the speed of your app, not some proxy symptom like CPU.

Just 2c
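
Scaling on response time as described above could be sketched like this (the SLO numbers and the scale-down factor are made-up examples):

```python
import math

def percentile(samples: list, pct: float) -> float:
    """Nearest-rank percentile of a list of latency samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

def latency_verdict(samples_ms: list, slo_ms: float = 250.0) -> str:
    """Decide scaling from p95 latency against a service-level objective."""
    p95 = percentile(samples_ms, 95)
    if p95 > slo_ms:
        return "scale-up"    # users are feeling it, regardless of CPU
    if p95 < slo_ms * 0.3:
        return "scale-down"  # comfortably under the SLO
    return "hold"
```

The same warning-signal machinery from the original proposal would apply; only the metric changes.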

@cwrau

cwrau commented Apr 14, 2018

+1

4 similar comments
@sst1xx

sst1xx commented Apr 25, 2018

+1

@paivaric

+1

@OdinLin

OdinLin commented Apr 29, 2018

+1

@mitchellmaler

+1

@vainkop

vainkop commented May 18, 2018

@benyanke

Prometheus metrics plus a webhook trigger based on them should work, but it would be nice to have that functionality out of the box: a webhook for pod autoscaling plus a webhook for EC2 instance/host/worker node autoscaling.
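
Reading a metric from Prometheus before firing such a webhook could look like this. The Prometheus address and the metric expression are placeholders; the response shape is the standard Prometheus `/api/v1/query` instant-vector format:

```python
import json
import urllib.parse
import urllib.request

def prom_query_url(base: str, expr: str) -> str:
    """Build an instant-query URL against the Prometheus HTTP API."""
    return base + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})

def extract_first_value(response: dict) -> float:
    """Pull the scalar out of a Prometheus instant-query response.
    Each result entry is {"metric": {...}, "value": [timestamp, "value"]}."""
    result = response["data"]["result"]
    return float(result[0]["value"][1])

# Usage (not executed here):
#   url = prom_query_url("http://prometheus:9090",
#                        'avg(rate(container_cpu_usage_seconds_total[5m]))')
#   with urllib.request.urlopen(url) as resp:
#       cpu = extract_first_value(json.load(resp))
```

The extracted value would then feed the same threshold check that decides whether to hit the scale-up or scale-down webhook.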

@chrisingenhaag

+1

@ItsReddi

ItsReddi commented Jun 7, 2018

Webhooks seem like a nice idea and could be a valid workaround for this missing feature.
It would be nice if the webhooks were fixed first, though. There are some issues around them, like overriding configurations from other services, upgrade processes that never finish, and drain timeouts that are lost while upgrading via webhook.

Well @rancherdev, this issue has been open for two years and I'm notified about a +1 at least twice a week.
What should the community do to get this feature or a working workaround? There are so many people that seem to need this.

@michael-henderson

This is obviously never going to happen; it's been ignored for 2 years. It's sad, but this is why Rancher is falling out of people's comparison lists when looking at container platforms. It was nice knowing you, Rancher.

@empinator

empinator commented Jul 3, 2018

@michael-henderson
AFAIK this is not entirely true. Rancher 2.x is focusing on Kubernetes, which already has autoscaling built in.
From that standpoint I would understand not prioritizing this feature at the moment, even though I'm as frustrated as you are, since I'm still running on Rancher 1.6.
I haven't fully dived into it, but I assume this should work with 2.x. Maybe I'm wrong?

@benmag

benmag commented Jul 9, 2018

Since there are a lot of people wanting this, I built a little side project to act as the missing autoscale functionality for Rancher v1.6: https://autoscale.co

I had already implemented autoscaling with Rancher for my own project, Codemason, and figured I should spin it off as a separate service for anyone else who might need it. Hope it helps!

@deniseschannon

With the release of Rancher 2.0, development on v1.6 is limited to critical bug fixes and security patches.
