Using the Cluster Autoscaler with Agones #368
The overall design to use the open source cluster autoscaler revolves around using the cluster autoscaler
This gives us the following benefits:
Since the autoscaler can be implemented with Agones GameServers, scaling and autoscaling can essentially be managed at the Fleet level.
Write a scheduler that bin packs all our pods into as tight a cluster as possible
A custom scheduler will be built that will prioritise the scheduling of
(Unless there is a way to do this with the default scheduler, but I've not found one so far -- best I could find was PreferredDuringSchedulingIgnoredDuringExecution on HostName)
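As a rough illustration of the bin-packing idea, here is a minimal sketch in Python. It is not the Agones scheduler: the `Node` type, its `capacity` and `used` fields, and `pick_node` are hypothetical names made up for this example, and real scheduling would also have to account for CPU, memory, and other constraints.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical model of a node: how many pods it can hold and how many it has.
    name: str
    capacity: int
    used: int


def pick_node(nodes: List[Node]) -> Optional[Node]:
    """Return the fullest node that still has room, or None if all are full."""
    candidates = [n for n in nodes if n.used < n.capacity]
    if not candidates:
        return None
    # Prefer the highest utilisation so pods pack tightly; break ties by name
    # so the choice is deterministic.
    return min(candidates, key=lambda n: (-(n.used / n.capacity), n.name))


if __name__ == "__main__":
    cluster = [Node("node-a", 10, 2), Node("node-b", 10, 7), Node("node-c", 10, 10)]
    chosen = pick_node(cluster)
    print(chosen.name if chosen else "no node has room")
```

Picking the fullest node that still fits leaves lightly used nodes to drain naturally, which is what lets the cluster autoscaler remove them.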
Prioritise Allocating GameServers from Nodes that already have Allocated GameServers
To make it easier to scale down, we also want to bin-pack as many Allocated game servers onto a single node as possible.
To that end, the
This ensures, as far as possible, that we don't end up with a "swiss cheese" problem, with Allocated game servers spread out across the cluster. Bin packing Allocated
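The allocation preference described in this section could be sketched roughly as follows. This is an assumption-laden illustration, not the Agones allocation code: the `GameServer` type and `pick_for_allocation` are hypothetical, and only the `Ready` and `Allocated` state names come from Agones itself.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GameServer:
    # Hypothetical, simplified view of a GameServer and the node its pod runs on.
    name: str
    node: str
    state: str  # "Ready" or "Allocated"


def pick_for_allocation(servers: List[GameServer]) -> Optional[GameServer]:
    """Pick a Ready GameServer from the node with the most Allocated ones."""
    allocated_per_node = Counter(s.node for s in servers if s.state == "Allocated")
    ready = [s for s in servers if s.state == "Ready"]
    if not ready:
        return None
    # Most Allocated neighbours first; break ties by node and name
    # so the result is deterministic.
    return min(ready, key=lambda s: (-allocated_per_node[s.node], s.node, s.name))
```

A `Counter` returns 0 for nodes with no Allocated game servers, so Ready game servers on otherwise idle nodes are only chosen last.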
When Fleets get shrunk, prioritise removal from Nodes with the least number of GameServer Pods
Again, to make it easier to create empty nodes when scaling down Fleets, prioritise removing un-allocated
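A minimal sketch of this removal ordering, under stated assumptions: `GameServer` and `pick_for_removal` are hypothetical names for this example, Allocated game servers are never candidates, and per-node counts are taken once up front rather than recomputed after each removal.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class GameServer:
    # Hypothetical, simplified view of a GameServer and the node its pod runs on.
    name: str
    node: str
    state: str  # "Ready" or "Allocated"


def pick_for_removal(servers: List[GameServer], count: int) -> List[GameServer]:
    """Choose up to `count` non-Allocated GameServers, emptiest nodes first."""
    pods_per_node = Counter(s.node for s in servers)
    removable = [s for s in servers if s.state != "Allocated"]
    # Nodes with the fewest GameServer pods come first, so they empty out soonest.
    removable.sort(key=lambda s: (pods_per_node[s.node], s.node, s.name))
    return removable[:count]
```

Removing from the least populated nodes first makes it more likely that a whole node ends up empty and can be reclaimed by the cluster autoscaler.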
Mark All GameServer Pods as not "safe-to-evict"
If a Pod has the annotation
Since we are bin packing through our custom scheduler, we won't actually have a need to move existing
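For illustration only, marking a Pod as not safe to evict could look like the sketch below. The annotation name is not given in the text above; the sketch assumes the Cluster Autoscaler's documented `cluster-autoscaler.kubernetes.io/safe-to-evict` annotation, and `mark_not_safe_to_evict` is a hypothetical helper operating on a plain dictionary rather than a real Kubernetes client object.

```python
import copy

# Assumption: the annotation key documented by the Kubernetes Cluster Autoscaler.
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def mark_not_safe_to_evict(pod: dict) -> dict:
    """Return a copy of a Pod manifest annotated so the autoscaler won't evict it."""
    annotated = copy.deepcopy(pod)  # leave the caller's manifest untouched
    metadata = annotated.setdefault("metadata", {})
    annotations = metadata.setdefault("annotations", {})
    annotations[SAFE_TO_EVICT] = "false"  # annotation values must be strings
    return annotated


if __name__ == "__main__":
    pod = {"apiVersion": "v1", "kind": "Pod", "metadata": {"name": "gameserver-pod"}}
    print(mark_not_safe_to_evict(pod)["metadata"]["annotations"])
```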
Mark the Agones controller as
This is great. I think one potential issue is when to scale up. For our servers, we use stateful Unity game servers, and having an option to always leave one empty server running would be desirable. I'm wondering what the best way is for demand to be fed into the autoscaler.
Regarding scale down, the default of 10 minutes is fine for me, given that it will ensure that spin-up times are reasonably fast.
@GabeBigBoxVR you would essentially control scale up/down by scaling your Fleets up and down - your cluster would then adjust to the space the Fleet was taking up accordingly. It's actually quite a nice model, as you only need to influence one part of your config - the rest happens automatically.
You can see some work on a Fleet autoscaler in #340
referenced this issue
Oct 2, 2018
Updated with what I think is the working design - much simpler details, and much less room for error.
See history for changes if you want to see previous versions.
The only leftover question I have is about how the scheduler will work, but I think that's a solvable problem, and just requires some research.