Using the Cluster Autoscaler with Agones #368

Closed
markmandel opened this Issue Oct 2, 2018 · 5 comments

2 participants
@markmandel
Collaborator

markmandel commented Oct 2, 2018

Problem

  • It is desirable to autoscale the Kubernetes cluster to account for increases in player count, and the need for more game servers, in a way that works across cloud providers
  • There is already a generic cluster autoscaler project, but it is mostly targeted at stateless workloads

Design

The overall design uses the open source cluster autoscaler to remove empty Nodes as they appear, but specifically prevents the autoscaler from evicting GameServer Pods in an attempt to move them to a new Node.

This gives us the following benefits:

  • Implementations for multiple cloud providers are already written and community tested
  • Load testing (on GKE) has already been done for us
  • SLOs already exist for the autoscaler
  • It removes any possibility of the autoscaler causing race conditions when allocating GameServers.

Since the autoscaler can be used with Agones GameServers, scaling and autoscaling can essentially be managed at the Fleet level.
If you want a bigger cluster, increase the size of your Fleet, and your cluster will adjust. If you want a smaller cluster, shrink your Fleet, and the cluster will adjust.
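A rough way to picture the Fleet-level model: with tight bin packing, the number of Nodes a Fleet occupies follows directly from its replica count. The sketch below assumes every GameServer Pod requests the same resources and that a Node fits a fixed number of them (`per_node` is a hypothetical parameter). The real cluster autoscaler works from actual Pod resource requests, so treat this only as an approximation.

```python
def nodes_needed(fleet_replicas: int, per_node: int) -> int:
    """Estimate the Nodes a Fleet occupies under perfect bin packing.

    Assumes every GameServer Pod requests the same resources, so a Node
    holds at most `per_node` of them (a hypothetical, simplifying parameter).
    """
    if per_node <= 0:
        raise ValueError("per_node must be positive")
    if fleet_replicas < 0:
        raise ValueError("fleet_replicas must be non-negative")
    # Ceiling division: a partially filled Node still needs a whole Node.
    return (fleet_replicas + per_node - 1) // per_node


# Growing the Fleet grows the cluster; shrinking it frees whole Nodes.
for replicas in (0, 10, 25, 40):
    print(f"{replicas} replicas -> {nodes_needed(replicas, per_node=10)} nodes")
```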

Implementation

Write a scheduler that bin packs all our pods into as tight a cluster as possible

A custom scheduler will be built that prioritises scheduling GameServer Pods onto Nodes that already have the most GameServer Pods.
This will ease scaling down, as the game servers won't be spread out across many Nodes, leaving resources wasted on each.

(There may be a way to do this with the default scheduler, but I've not found one so far. The best I could find was a PreferredDuringSchedulingIgnoredDuringExecution Pod affinity on HostName.)
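The scoring rule such a custom scheduler might apply can be sketched as follows. This is a hypothetical, simplified model rather than a Kubernetes scheduler plugin: `Node`, `free_slots` and `pick_node` are illustrative names, and Node capacity is reduced to a slot count instead of real CPU/memory requests.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    gameserver_pods: int  # GameServer Pods already running on this Node
    free_slots: int       # hypothetical: how many more GameServer Pods fit


def pick_node(nodes: list[Node]) -> Optional[Node]:
    """Pick a Node for a new GameServer Pod, preferring the fullest Nodes."""
    candidates = [n for n in nodes if n.free_slots > 0]
    if not candidates:
        # Nothing fits: the Pod stays Pending, which is what prompts the
        # cluster autoscaler to add a Node.
        return None
    # Most GameServer Pods first; the Node name breaks ties deterministically.
    return min(candidates, key=lambda n: (-n.gameserver_pods, n.name))


nodes = [Node("node-a", 3, 2), Node("node-b", 7, 1), Node("node-c", 9, 0)]
print(pick_node(nodes).name)  # node-c is full, so node-b wins
```

Preferring the fullest Node that still has room is the classic "most allocated" bin packing heuristic; it keeps the least-used Nodes draining so they can become empty.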

Prioritise Allocating GameServers from Nodes that already have Allocated GameServers

To further ease scaling down, we want to bin pack Allocated game servers onto as few Nodes as possible.

To that end, the allocate() function will order the GameServers it is considering by the number of other GameServers on the same Node.

This avoids, as much as possible, a "swiss cheese" problem where Allocated game servers are spread out across the cluster. Bin packing Allocated GameServers
makes it much easier for Fleets to scale down in a way that leaves empty Nodes for the autoscaler to delete.
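A minimal sketch of that ordering, using the refinement from the later "Prioritise Allocation from Nodes with Allocated/Ready GameServers" commit: Nodes with the most Allocated GameServers first, then the most Ready GameServers as a tie-breaker. `GameServer`, `allocation_order` and `allocate` are hypothetical names, not the Agones Go API, and GameServer state is reduced to two strings.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class GameServer:
    name: str
    node: str
    state: str  # simplified to "Ready" or "Allocated"


def allocation_order(gameservers: list[GameServer]) -> list[GameServer]:
    """Order Ready GameServers so allocation packs onto already-busy Nodes."""
    allocated = Counter(gs.node for gs in gameservers if gs.state == "Allocated")
    ready = Counter(gs.node for gs in gameservers if gs.state == "Ready")
    candidates = [gs for gs in gameservers if gs.state == "Ready"]
    # Nodes with the most Allocated GameServers first; on a tie, the Node with
    # the most Ready GameServers; the name keeps the order deterministic.
    return sorted(candidates, key=lambda gs: (-allocated[gs.node], -ready[gs.node], gs.name))


def allocate(gameservers: list[GameServer]) -> Optional[GameServer]:
    """Allocate the best-placed Ready GameServer, or None if none are Ready."""
    order = allocation_order(gameservers)
    if not order:
        return None
    order[0].state = "Allocated"
    return order[0]


fleet = [
    GameServer("gs-1", "node-a", "Allocated"),
    GameServer("gs-2", "node-a", "Ready"),
    GameServer("gs-3", "node-b", "Ready"),
    GameServer("gs-4", "node-b", "Ready"),
    GameServer("gs-5", "node-c", "Ready"),
]
print(allocate(fleet).name)  # gs-2: node-a already has an Allocated GameServer
print(allocate(fleet).name)  # gs-3: tie on Allocated, node-b has the most Ready
```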

When Fleets get shrunk, prioritise removal from Nodes with the least number of GameServer Pods

Again, to make it easier to create empty Nodes when scaling down Fleets, prioritise removing un-allocated GameServer Pods from Nodes with the fewest
GameServer Pods currently on them.
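The scale-down side can be sketched the same way. This assumes a simplified, hypothetical `GameServer` model (not the Agones API) and counts Pods per Node once before choosing, which is a simplification. Only Ready GameServers are ever candidates for deletion.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class GameServer:
    name: str
    node: str
    state: str  # simplified to "Ready" or "Allocated"


def pick_for_removal(gameservers: list[GameServer], count: int) -> list[GameServer]:
    """Choose up to `count` un-allocated GameServers to delete when a Fleet shrinks."""
    # GameServer Pods per Node, counted once up front (a simplification).
    per_node = Counter(gs.node for gs in gameservers)
    # Allocated GameServers are never candidates: they are in use.
    ready = [gs for gs in gameservers if gs.state == "Ready"]
    # Emptiest Nodes first, so whole Nodes drain and can be removed.
    ready.sort(key=lambda gs: (per_node[gs.node], gs.node, gs.name))
    return ready[:count]


fleet = [
    GameServer("gs-1", "node-a", "Allocated"),
    GameServer("gs-2", "node-a", "Ready"),
    GameServer("gs-3", "node-a", "Ready"),
    GameServer("gs-4", "node-b", "Ready"),
    GameServer("gs-5", "node-c", "Ready"),
    GameServer("gs-6", "node-c", "Allocated"),
]
print([gs.name for gs in pick_for_removal(fleet, 2)])  # ['gs-4', 'gs-5']
```

Removing gs-4 leaves node-b empty, which is exactly the kind of Node the cluster autoscaler can then delete.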

Mark All GameServer Pods as not "safe-to-evict"

If a Pod has the annotation cluster-autoscaler.kubernetes.io/safe-to-evict: false, then the Node that the Pod is on cannot be removed.

Therefore, all GameServer Pods should have the annotation cluster-autoscaler.kubernetes.io/safe-to-evict: false, so the autoscaler will not attempt to evict them.

Since we are bin packing through our custom scheduler, we shouldn't need to move existing GameServer Pods when the cluster shrinks, as we will only be leaving behind empty Nodes.
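The annotation is an ordinary Pod annotation, and its value must be the string "false". The sketch below works on Pod manifests as plain dicts; `node_blocks_scale_down` is a deliberately simplified, hypothetical check that captures only the rule above. The real cluster autoscaler applies further conditions (utilization thresholds, PodDisruptionBudgets, and so on) before removing a Node.

```python
from __future__ import annotations

SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def mark_not_safe_to_evict(pod: dict) -> dict:
    """Annotate a Pod manifest (as a plain dict) so the autoscaler won't evict it."""
    annotations = pod.setdefault("metadata", {}).setdefault("annotations", {})
    # Kubernetes annotation values are strings, so this is "false", not False.
    annotations[SAFE_TO_EVICT] = "false"
    return pod


def node_blocks_scale_down(pods_on_node: list[dict]) -> bool:
    """Simplified check: any Pod annotated safe-to-evict "false" pins its Node."""
    return any(
        pod.get("metadata", {}).get("annotations", {}).get(SAFE_TO_EVICT) == "false"
        for pod in pods_on_node
    )


gameserver_pod = mark_not_safe_to_evict({"metadata": {"name": "gameserver-1"}})
print(node_blocks_scale_down([gameserver_pod]))  # True: Node cannot be removed
print(node_blocks_scale_down([]))                # False: empty Node is a candidate
```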

Mark the Agones controller as safe-to-evict: false

The Agones controller should have the annotation cluster-autoscaler.kubernetes.io/safe-to-evict: false, to ensure the autoscaler
doesn't try to move the controller around.
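For the controller, one detail matters: the autoscaler inspects Pods, so on a Deployment the annotation has to go on the Pod template (`spec.template.metadata.annotations`) rather than on the Deployment's own metadata. The sketch below illustrates this with a minimal Deployment as a dict; the Deployment name, labels and the `enabled` toggle are assumptions for illustration.

```python
import json

SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def pin_pod_template(deployment: dict, enabled: bool = True) -> dict:
    """Add safe-to-evict "false" to a Deployment's Pod template.

    The autoscaler looks at Pods, so the annotation belongs on
    spec.template.metadata, not on the Deployment's own metadata.
    `enabled` is a hypothetical stand-in for a Helm chart toggle.
    """
    if enabled:
        template_meta = deployment["spec"]["template"].setdefault("metadata", {})
        template_meta.setdefault("annotations", {})[SAFE_TO_EVICT] = "false"
    return deployment


# Minimal, illustrative Deployment; names and labels are assumptions,
# and the container list is left empty for brevity.
controller = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "agones-controller"},
    "spec": {
        "selector": {"matchLabels": {"app": "agones-controller"}},
        "template": {
            "metadata": {"labels": {"app": "agones-controller"}},
            "spec": {"containers": []},
        },
    },
}
print(json.dumps(pin_pod_template(controller)["spec"]["template"]["metadata"], indent=2))
```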

Documentation

The above needs to be documented, along with pointers on how to set up the autoscaler on different cloud providers, etc.

Research


@markmandel

Collaborator

markmandel commented Oct 2, 2018

Updated with exact details on the Taint being applied to the Node. Worked it out by digging through the code.

@GabeBigBoxVR

Contributor

GabeBigBoxVR commented Oct 2, 2018

This is great. I think one potential issue is when to scale up. We use stateful Unity game servers, and an option to always leave one empty server running would be desirable. I'm wondering what the best way is to feed demand into the autoscaler.

Regarding scale down, the default of 10 minutes is fine for me, as it should keep spin up times reasonably fast.
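The 10 minutes here corresponds to the cluster autoscaler's `--scale-down-unneeded-time` default: a Node has to stay unneeded for that long before it is removed, so a Node that empties out remains available for new GameServers for a while. Below is a simplified, hypothetical model of that timer for the Agones case, where only Nodes with no GameServer Pods are candidates.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from typing import Optional

# Default of the cluster autoscaler's --scale-down-unneeded-time flag.
SCALE_DOWN_UNNEEDED_TIME = timedelta(minutes=10)


def should_remove(empty_since: Optional[datetime], now: datetime,
                  unneeded_time: timedelta = SCALE_DOWN_UNNEEDED_TIME) -> bool:
    """Simplified: remove a Node only once it has had no GameServers for `unneeded_time`.

    `empty_since` is None while the Node still hosts GameServer Pods; a new
    GameServer landing on the Node resets it to None.
    """
    if empty_since is None:
        return False
    return now - empty_since >= unneeded_time


emptied = datetime(2018, 10, 2, 12, 0)
print(should_remove(emptied, emptied + timedelta(minutes=5)))   # False: kept warm
print(should_remove(emptied, emptied + timedelta(minutes=10)))  # True
```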

@markmandel

Collaborator

markmandel commented Oct 2, 2018

@GabeBigBoxVR you would essentially control scale up/down by scaling your Fleets up and down - your cluster would then adjust to the space the Fleet takes up. It's actually quite a nice model, as you only need to influence one part of your config - the rest happens automatically.

You can see some work on a Fleet autoscaler in #340
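For the "always leave one empty server running" case, one option is a buffer-style rule on the Fleet: size it to the number of Allocated GameServers plus a spare buffer, and let the cluster autoscaler fit Nodes to the result. The function below is a hypothetical sketch of such a rule, not the API proposed in #340; `buffer_size`, `min_replicas` and `max_replicas` are illustrative parameters.

```python
def buffer_replicas(allocated: int, buffer_size: int,
                    min_replicas: int, max_replicas: int) -> int:
    """Fleet size that keeps `buffer_size` Ready GameServers spare.

    Hypothetical buffer-style policy: Allocated count plus a buffer,
    clamped to [min_replicas, max_replicas].
    """
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    desired = allocated + buffer_size
    return max(min_replicas, min(desired, max_replicas))


# "Always leave one empty server running": a buffer of 1.
for allocated in (0, 4, 12):
    print(allocated, "allocated ->", buffer_replicas(allocated, 1, 1, 10), "replicas")
```

The upper clamp caps cost, so with 12 Allocated GameServers and a maximum of 10 the buffer cannot be honoured; that trade-off would need to be tuned per game.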

@markmandel

Collaborator

markmandel commented Oct 2, 2018

Updated with some more thoughts on Fleet scale down - see history for changes.

markmandel added a commit to markmandel/agones that referenced this issue Oct 2, 2018

Prioritise Allocation from Nodes with Allocated/Ready GameServers
One of the first parts for Node autoscaling (GoogleCloudPlatform#368) - make sure we essentially
bin pack our allocated game servers.

This change makes allocation first prioritise allocation from `Nodes` that
already have the most `Allocated` `GameServers`, and then in the case of a tie,
to the `Nodes` that have the most `Ready` `GameServers`.

This sets us up for the next part, such that when we scale down a Fleet,
it removes `GameServers` from `Nodes` that have the least `GameServers` on
them.

@markmandel markmandel self-assigned this Oct 2, 2018

@markmandel

Collaborator

markmandel commented Oct 3, 2018

Updated with what I think is the working design - much simpler details, and much less room for error.

See history for changes if you want to see previous versions.

The only leftover question I have is about how the scheduler will work, but I think that's a solvable problem, and just requires some research.

@markmandel markmandel added this to the 0.5.0 milestone Oct 3, 2018

@markmandel markmandel modified the milestones: 0.5.0, 0.6.0 Oct 9, 2018

markmandel added a commit that referenced this issue Oct 16, 2018

Prioritise Allocation from Nodes with Allocated/Ready GameServers
One of the first parts for Node autoscaling (#368) - make sure we essentially
bin pack our allocated game servers.

This change makes allocation first prioritise allocation from `Nodes` that
already have the most `Allocated` `GameServers`, and then in the case of a tie,
to the `Nodes` that have the most `Ready` `GameServers`.

This sets us up for the next part, such that when we scale down a Fleet,
it removes `GameServers` from `Nodes` that have the least `GameServers` on
them.

markmandel added a commit to markmandel/agones that referenced this issue Oct 22, 2018

This PR sets a preferredDuringSchedulingIgnoredDuringExecution PodAffinity with a HostName topology.

This does a pretty decent job of grouping together GameServer Pods. It does
tend to distribute more widely when large groups of GameServer Pods get created,
but it's worth experimenting with this first, before taking the riskier
route of a custom scheduler (in which we've already found some issues).

We may also find as GameServers shut down at the end of sessions, they start
to group together when they reschedule, as at lower load, the scheduler tends
to do a better job of packing.

Working towards GoogleCloudPlatform#368

markmandel added a commit that referenced this issue Oct 25, 2018

This PR sets a preferredDuringSchedulingIgnoredDuringExecution PodAffinity with a HostName topology.

This does a pretty decent job of grouping together GameServer Pods. It does
tend to distribute more widely when large groups of GameServer Pods get created,
but it's worth experimenting with this first, before taking the riskier
route of a custom scheduler (in which we've already found some issues).

We may also find as GameServers shut down at the end of sessions, they start
to group together when they reschedule, as at lower load, the scheduler tends
to do a better job of packing.

Working towards #368
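The affinity described in this commit can be sketched as the Pod spec fragment below, built as a Python dict and printed as JSON. The field names follow the Kubernetes Pod affinity API; the label selector (`agones.dev/role: gameserver`) is an assumption for illustration and may not match the labels Agones actually applies.

```python
import json

# Assumed label that identifies GameServer Pods; the key is illustrative only.
GAMESERVER_SELECTOR = {"matchLabels": {"agones.dev/role": "gameserver"}}


def hostname_pod_affinity(weight: int = 100) -> dict:
    """Soft Pod affinity preferring Nodes that already run GameServer Pods."""
    if not 1 <= weight <= 100:
        raise ValueError("weight must be between 1 and 100")
    return {
        "podAffinity": {
            # "Preferred" makes this a scoring hint, not a hard requirement;
            # "IgnoredDuringExecution" means running Pods are never moved.
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": weight,
                    "podAffinityTerm": {
                        "labelSelector": GAMESERVER_SELECTOR,
                        # One topology domain per Node, i.e. group by host.
                        "topologyKey": "kubernetes.io/hostname",
                    },
                }
            ]
        }
    }


print(json.dumps({"affinity": hostname_pod_affinity()}, indent=2))
```

Because the term is only preferred, the scheduler weighs it against its other scoring rules, which is consistent with the commit's observation that Pods spread more widely when many are created at once.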

markmandel added a commit to markmandel/agones that referenced this issue Nov 6, 2018

Cluster Autoscaling: safe-to-evict=false annotations for GameServer Pods
This is the final piece for ensuring that the Kubernetes Autoscaler
works with Agones.

This ensures that `GameServer` Pods cannot be evicted from the cluster, via
annotations that the autoscaler uses to determine that `GameServer` Pods
are unsafe to be evicted.

This annotation has also been placed on the controller, but can be turned off
via Helm chart variables.

I expect that cluster autoscaling and the backing strategies will get tweaked
for performance and resource usage as we get more real world experience with it,
but this is working relatively nicely right now.

Closes GoogleCloudPlatform#368

markmandel added a commit that referenced this issue Nov 6, 2018

Cluster Autoscaling: safe-to-evict=false annotations for GameServer Pods
This is the final piece for ensuring that the Kubernetes Autoscaler
works with Agones.

This ensures that `GameServer` Pods cannot be evicted from the cluster, via
annotations that the autoscaler uses to determine that `GameServer` Pods
are unsafe to be evicted.

This annotation has also been placed on the controller, but can be turned off
via Helm chart variables.

I expect that cluster autoscaling and the backing strategies will get tweaked
for performance and resource usage as we get more real world experience with it,
but this is working relatively nicely right now.

Closes #368