GameServer Fleets #70

Closed
markmandel opened this Issue Feb 4, 2018 · 20 comments

Collaborator

markmandel commented Feb 4, 2018

Design

Description

Fleets are a group of warm servers that are available to be allocated to players when needed.

In Kubernetes parlance, a Fleet is to GameServers what a Deployment/ReplicaSet is to Pods.

Features

  • Be able to define a Fleet, with an attached GameServerTemplate (much like a PodTemplate)
  • The Fleet ensures that the configured replicas number of Healthy GameServers is always available (assuming resources exist)
  • If a GameServer becomes Unhealthy, then delete it and create it anew (we may add more options at a later date).
  • If you delete a Fleet, this also deletes the backing GameServers (this should be by default in Kubernetes now anyway)
  • A mechanism (TBD) to be able to get an allocated GameServer out of the pool.
    • If a warm server is not available, then a cold one should be started (assuming resources)
    • An allocated GameServer is moved to an Allocated state on allocation.
  • If the replicas are increased in the Fleet, the number of GameServers is increased to match that number (assuming resources)
  • If the replicas are decreased in the Fleet, the number of GameServers is decreased to match that number.
    • GameServers that are in an Allocated state will never be deleted during the decrease
  • If the GameServer template is changed, we will mimic a Deployment in that we can do either a Recreate or a RollingUpdate to switch out the waiting warm servers.
    • GameServers that are in an Allocated state will never be deleted during the update
  • Validation of the configuration and changes.
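
The scaling rules above can be sketched as a single reconcile pass. This is only an illustration, not the Fleet controller: the `GameServer` class, the state names and the `reconcile` function are hypothetical, and it assumes that replicas counts Ready and Allocated GameServers together.

```python
from dataclasses import dataclass


@dataclass
class GameServer:
    name: str
    state: str  # assumed state names: "Ready", "Allocated", "Unhealthy"


def reconcile(servers, replicas):
    """Return (number_to_create, names_to_delete) for one pass over a Fleet."""
    # Unhealthy GameServers are always deleted so they can be created anew.
    to_delete = [gs.name for gs in servers if gs.state == "Unhealthy"]
    live = [gs for gs in servers if gs.state != "Unhealthy"]

    # Scale up (or replace deleted servers) until replicas is reached.
    if len(live) < replicas:
        return replicas - len(live), to_delete

    # Scale down, but never pick a GameServer that is Allocated.
    surplus = len(live) - replicas
    deletable = [gs.name for gs in live if gs.state != "Allocated"]
    to_delete += deletable[:surplus]
    return 0, to_delete
```

If every surplus GameServer is Allocated, the sketch deletes nothing and the Fleet simply stays above replicas until those sessions end.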

Configuration

apiVersion: "stable.agon.io/v1alpha1"
kind: Fleet
metadata:
  name: "fleet-example"
spec:
  # number of GameServers
  replicas: 10
  # deployment strategy for updating the image
  # Lifted directly from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.9/#deploymentstrategy-v1-apps
  strategy:
    # Recreate or RollingUpdate. Default to RollingUpdate
    type: RollingUpdate
    rollingUpdateDeployment: # optional rolling update config
      maxSurge: "25%"
      maxUnavailable: "25%"
  # A GameServer template
  template:
    # Standard ObjectMeta
    metadata:
      labels:
        mylabel: myvalue
    # GameServer spec
    spec:
      containerPort: 7654
      template:
        spec:
          containers:
          - name: cpp-simple
            image: gcr.io/agon-images/cpp-simple-server:0.1
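
As a rough illustration of what the percentage values in the strategy block mean, the sketch below resolves them against replicas. It assumes a Fleet would follow the Deployment convention the strategy is lifted from, where maxSurge rounds up and maxUnavailable rounds down; the `resolve` helper is hypothetical.

```python
def resolve(value, replicas, round_up):
    """Turn an absolute count or a percentage string into a GameServer count."""
    if isinstance(value, int):
        return value
    percent = int(value.rstrip("%"))
    total = percent * replicas
    # Integer arithmetic avoids floating point rounding surprises.
    return -(-total // 100) if round_up else total // 100


replicas = 10
max_surge = resolve("25%", replicas, round_up=True)          # 3
max_unavailable = resolve("25%", replicas, round_up=False)   # 2
```

With the example configuration, a rolling update could therefore run up to 3 extra GameServers and take up to 2 out of service at a time.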

Allocation

Allocation is done through creating a FleetAllocation record via kubectl or the API.

For example:

apiVersion: "stable.agon.io/v1alpha1"
kind: FleetAllocation
metadata:
  name: "sample-allocation"
spec:
  fleetName: "fleet-example"

The value returned from creating a FleetAllocation has the details of the allocated server, and creating it moves the GameServer to the Allocated state.

For example:

apiVersion: "stable.agon.io/v1alpha1"
kind: FleetAllocation
metadata:
  name: "sample-allocation"
spec:
  fleetName: "fleet-example"
status:
  gameserver:
    metadata:
      name: "allocated-game-server"
    spec:
      containerPort: 7654
      template:
        spec:
          containers:
          - name: cpp-simple
            image: gcr.io/agones-images/cpp-simple-server:0.1
    status:
      address: 192.168.99.100
      nodeName: agones
      port: 7373
      state: Allocated

  • Allocations can only be created; the only way to remove an allocation (at this stage) is to delete the backing GameServer.
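
The allocation flow described here could look roughly like the sketch below. It is an in-memory illustration only: the `allocate` function, the "fleet" key and the "Ready" state name are assumptions, and the real mechanism would go through the Kubernetes API rather than a Python list.

```python
def allocate(fleet_name, game_servers):
    """Mark the first Ready GameServer of a fleet as Allocated and return a
    FleetAllocation-shaped dict, or None when no warm server is available."""
    for gs in game_servers:
        if gs["fleet"] == fleet_name and gs["status"]["state"] == "Ready":
            gs["status"]["state"] = "Allocated"
            return {
                "kind": "FleetAllocation",
                "spec": {"fleetName": fleet_name},
                "status": {"gameserver": gs},
            }
    # No warm server: per the design, a cold one would be started here.
    return None


pool = [
    {"fleet": "fleet-example", "name": "gs-1", "status": {"state": "Ready"}},
    {"fleet": "fleet-example", "name": "gs-2", "status": {"state": "Ready"}},
]
first = allocate("fleet-example", pool)
```

Because an allocation is never edited after creation, the sketch only ever moves a GameServer from Ready to Allocated and offers no way back.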

Work Breakdown

  • GameServerSet (a GameServerSet is to a Fleet as a ReplicaSet is to a Deployment) (#156)
  • Creating a Fleet creates a GameServerSet (#174)
  • Create an Allocation from a fleet (#193)
  • Recreate update strategy (#199)
  • Rolling update strategy

Out of scope

  • Pre-caching of images. We'll see how testing works without this for now.

Research

Collaborator

markmandel commented Feb 4, 2018

@EricFortin - I added a section on what to do when the GameServer template changes on a Fleet. I think it should mimic what happens with a Deployment - but we didn't talk explicitly about this. WDYT?

Collaborator

EricFortin commented Feb 5, 2018

@markmandel I think it makes sense and is what I would expect of something running on a Kubernetes cluster. We just need to adapt it to take the new state into consideration. It should recreate/update only game servers that aren't allocated yet. Moreover, once a non-updated game server becomes free again, it should be updated before being reallocated. At the beginning, we might solve this issue by recreating any container where games have ended, but at some point, games will want persistence for their game server process.

Collaborator

markmandel commented Feb 5, 2018

SGTM. Adjusted the above to make sure it was clear that Allocated GameServers will not get deleted in the rolling update (but we were thinking the same thing)

I had a question about your statement here:

Moreover, once a non-updated game server becomes free again, it should be updated before being reallocated. At the beginning, we might solve this issue by recreating any container where games have ended, but at some point, games will want persistence for their game server process.

My thought here was very much that when a GameServer shuts down, it gets totally deleted, and a new one replaces it. Thus agreeing with you - the update would occur when a replacement gets created in the Fleet's pool.

From my understanding - we were all working from the assumption/opinion that a GameServer will exit when it has completed a session (sounds like this also needs to be documented).

When you said "but at some point, games will want persistence for their game server process" - do you mean the GameServer is going to reset itself back to a zero state rather than exit, or do you mean something else?

Collaborator

EricFortin commented Feb 5, 2018

From my understanding - we were all working from the assumption/opinion that a GameServer will exit when it has completed a session (sounds like this also needs to be documented).

TL;DR. The assumption holds

My point was mainly that games might want to control the server life cycle themselves and not tie it to a game session. It could mean having the server become available again to host another "session", which is a concept that is missing at the moment. Should we add it?

Having a server process die at the end of a match applies well to an FPS but not necessarily to a PvE game. Still, I guess we can consider that the server process will exit when it is ready, or instructed to become so, to be swapped out by an updated container. It might just take a long time (hours, not weeks).

Collaborator

Kuqd commented Feb 5, 2018

I'm wondering if the Allocated concept is the same as currently hosting a session?

alexandrem commented Feb 5, 2018

@EricFortin We most likely want to create new game sessions in a clean environment every time and not reuse the previous process.

In a PvE, we wouldn't want cheaters to leave any bad states in the engine memory or on disk.

I suppose there's a trade-off to be made between good isolation and initialization time with "hot" server processes. I'd rather have better isolation. What do you think?

Collaborator

markmandel commented Feb 5, 2018

@EricFortin Ah yes - good point (thanks for the explanation). I think when we start looking more at PvE - there are several strategies we could look at - maybe an SDK method of Deallocate() rather than Shutdown() - or maybe a configuration option on Shutdown() so that the GameServer returns to the pool rather than getting deleted. But I would defer this to a later date for the moment.

My current thought for getting an Allocated GameServer - is to essentially follow the Deployment paradigm:

  • We have a GameServerAllocation CRD that you can then create via YAML or the Kubernetes API.
  • When the GameServerAllocation is created, the GameServerAllocation that is returned is populated with the details of the GameServer that was allocated (this needs 1.9 webhooks).
  • The GameServer state is also moved to Allocated at the same time.
  • GameServerAllocations can't be edited - only created and deleted (current thought being that if you delete either the GameServerAllocation or the GameServer, they are both deleted?)

This stays nicely within the K8s paradigm. WDYT? (I can also write it up above with some examples if that helps)

Collaborator

markmandel commented Feb 5, 2018

@alexandrem I'm also more inclined towards isolation - but I feel like we could try with better isolation (processes get killed), see what the actual implications are with a real system, and make pragmatic decisions once we have more data (i.e. then add options to move out of Allocation if needed), rather than decide at this stage. WDYT?

alexandrem commented Feb 5, 2018

It's probably too early to decide on the fate of this feature. We certainly can have this discussion about recycling terminated game session resources later.

I'd focus on keeping things really simple at this stage though. Meaning that a game server allocation represents a newly scheduled pod; then, when players leave (or the game terminates), we delete the GS pod (the fleet will ensure it is replaced with a fresh one).

Collaborator

EricFortin commented Feb 5, 2018

Like I said, I am OK with using the simple approach for now. I just wanted to mention some use cases where I know it might cause some problems so we don't make an early mistake. I am all for isolation, but in some cases this goes against certain game modes where the players come and go and the game "lives" in between interactions with players, so recreating it anew might defeat the feature. It all depends on the level of statefulness of game servers, which I do not control.

alexandrem commented Feb 5, 2018

@EricFortin Those longer lived game sessions where players come and go (and remain non populated for some time) are an interesting use case.

I suppose in the end this is controlled solely by the SDK: whether the game is terminated or not. If not terminated, then we definitely want to keep the game session in "allocated" and not touch it. Actually, they probably should have a different state so we can track those that are empty and possibly remove them when the fleet size is being decreased.

Collaborator

markmandel commented Feb 5, 2018

Actually, they probably should have a different state so we can track those that are empty and possibly remove them when the fleet size is being decreased.

This is an interesting question - one could argue that if the GameServer becomes empty (maybe for a certain time), how to handle it is implementation specific, and the actual game server code should just call SDK.Shutdown() when it's been empty for a while - and then the Fleet will just do what it needs to do from that point forward.
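
To make the "shut down after being empty for a while" idea concrete, here is a minimal sketch of what the game server side might do. The `IdleShutdown` class and its `tick` method are hypothetical, and the `shutdown` callback is only a stand-in for whatever the SDK's Shutdown() call ends up looking like.

```python
class IdleShutdown:
    """Calls `shutdown` once the server has been empty for `timeout_seconds`."""

    def __init__(self, timeout_seconds, shutdown):
        self.timeout = timeout_seconds
        self.shutdown = shutdown  # stand-in for SDK.Shutdown()
        self.empty_since = None
        self.done = False

    def tick(self, now, player_count):
        """Call periodically from the game loop with the current time in seconds."""
        if self.done:
            return
        if player_count > 0:
            # Someone is connected, so reset the idle timer.
            self.empty_since = None
            return
        if self.empty_since is None:
            self.empty_since = now
        if now - self.empty_since >= self.timeout:
            self.done = True
            self.shutdown()
```

The timeout would be a per-game decision, which keeps this policy in game server code rather than in the Fleet.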

Collaborator

EricFortin commented Feb 6, 2018

For those cases where the life cycle is not totally dependent on game sessions, there will most probably be a "god" service that manages it all and can replace an instance by directing players to the new instance and ultimately instructing the old instance to shut down. This use case also affects scale-down events, since even if we taint the node, it still can't be recycled if it's hosting a game server. This is out of scope for now, but I think we will need a way to provide this feedback so something can react.

Collaborator

markmandel commented Feb 7, 2018

Added some notes about how I think an Allocation should occur. These will definitely need 1.9 features.

markmandel added a commit that referenced this issue Feb 12, 2018

Move Agones resources into an `agones-system` namespace.
This is setup work for #10 and #70, since the service for
webhooks needs to be namespaced.

Closes #89

markmandel added a commit that referenced this issue Feb 16, 2018

MutatingWebHookConfiguration for GameServer creation.
Webhook library to make k8s webhooks easy(er) to use,
as well as setting default values on GameServers via it
for when they are first created.

Some refactoring of GameServer sync in the controller
was required and a new PortAllocation state was created.

This also makes #70 and #10 possible to implement.

markmandel added a commit that referenced this issue Feb 16, 2018

MutatingWebHookConfiguration for GameServer creation & Validation.
Webhook library to make k8s webhooks easy(er) to use,
as well as setting default values on GameServers via it
for when they are first created.

Some refactoring of GameServer sync in the controller
was required and a new PortAllocation state was created.

This also makes #70 and #10 possible to implement.

markmandel added a commit to markmandel/agones that referenced this issue Apr 4, 2018

GameServerSet Implementation
GameServerSets are the basic building block for Fleets.
GameServerSets will allow Fleet migrations to occur, similarly
to how ReplicaSets allow Deployments to migrate one image type
to another.

This has not been formally documented, as this will likely be an
internal CRD, and not (widely) used externally.

Parent ticket: GoogleCloudPlatform#70

Collaborator

markmandel commented Apr 11, 2018

Work on the first PR, "Creating a Fleet creates a GameServerSet", has started in this repository and branch

markmandel added a commit to markmandel/agones that referenced this issue Apr 12, 2018

Creating a Fleet creates a GameServerSet
All this commit does is create a GameServerSet
when a Fleet is created. Also, if the Replicas in
a Fleet are updated, then the owned GameServerSet's
Replicas are also updated.

For this commit, we are not worrying about handling
updates to the Spec (e.g. new image) to the Fleet,
and doing updates to a live Fleet, we are just concerned
with scaling up and down.

Parent ticket: GoogleCloudPlatform#70

Collaborator

markmandel commented Apr 18, 2018

Fleet Allocation work started over here

Collaborator

markmandel commented May 8, 2018

Work on doing a straight replace when updating a Fleet's GameServer details can be found over here:
https://github.com/markmandel/agones/tree/feature/fleet-replace

Collaborator

markmandel commented May 14, 2018

markmandel added a commit to markmandel/agones that referenced this issue May 18, 2018

Rolling updates for Fleets
This implements a configurable rolling update strategy
for Fleets that also ensures that Allocated GameServers
are not interrupted.

Also includes updates to documentation.

Closes GoogleCloudPlatform#70
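
For readers following the thread, one step of such a rolling update might be planned as in the sketch below. It is a deliberately simplified illustration and not the code from this commit: `rolling_step` and its parameters are hypothetical, and it ignores whether newly created GameServers are Ready yet.

```python
def rolling_step(old_ready, old_allocated, new_total,
                 replicas, max_surge, max_unavailable):
    """Return (create_new, delete_old_ready) for one rolling update step."""
    total = old_ready + old_allocated + new_total
    # Never exceed replicas + max_surge overall, and never create more
    # GameServers from the new template than replicas.
    create_new = max(0, min(replicas + max_surge - total, replicas - new_total))
    # Only Ready GameServers from the old template are removed;
    # Allocated ones are left untouched until their sessions end.
    delete_old_ready = min(old_ready, max_unavailable)
    return create_new, delete_old_ready
```

With 10 replicas, a surge of 3 and 2 unavailable, a Fleet holding 7 Ready and 3 Allocated old GameServers would create 3 new ones and delete 2 old Ready ones in the first step.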


@enocom enocom closed this in #213 May 23, 2018

enocom added a commit that referenced this issue May 23, 2018

Rolling updates for Fleets
This implements a configurable rolling update strategy
for Fleets that also ensures that Allocated GameServers
are not interrupted.

Also includes updates to documentation.

Closes #70