Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add controller performance metrics #391

Merged

Conversation

dsmith111
Copy link
Contributor

@dsmith111 dsmith111 commented Sep 15, 2022

Problem:
Currently we do not have any metrics in place to track the performance of controlling/reconciling GameServers: #361

Solution:
This PR adds in 2 new Prometheus metrics:

  • GameServerReachedInitializingDuration
    The 5-minute average time to reach Initialization from all new GameServers.

  • GameServerReachedStandingByDuration
    The 5-minute average time to reach StandBy from all new GameServers.

These new metrics can potentially show any issues in the controller itself (time taken to begin GameServer creation) as well as issues related to the servers themselves, or performance of the cluster (time taken to complete GameServer initialization).

Testing used to verify the change:
I created a temporary custom Docker image to build the thundernetes controller manager; using the netcore sample GameServerBuild, I proceeded to:

  1. Scale StandBy GS up
  2. Allocated a server
  3. Gracefully terminated an active server
  4. Scaled StandBy GS back down

All of these events were monitored within the modified Grafana dashboard:
image

@dsmith111
Copy link
Contributor Author

@dgkanatsios In the excel file the description for this issue was:

We should add some performance metrics to controllers, like how much time it takes to reconcile, how many servers were processed etc. We should also include the relevant Geneva charts.

Did you mean Grafana charts, or should this connect to an existing (or to be built) Geneva/Jarvis system?

@dgkanatsios dgkanatsios self-requested a review September 15, 2022 22:37
@dgkanatsios
Copy link
Collaborator

dgkanatsios commented Sep 15, 2022

@dsmith111 thank you for the PR! The charts were mean to be Grafana, which aligns with your PR, thanks!

@dsmith111 dsmith111 marked this pull request as ready for review September 19, 2022 00:48
cmd/nodeagent/types.go Outdated Show resolved Hide resolved
@dgkanatsios
Copy link
Collaborator

A couple of comments, then I think we're good to merge! Appreciate all the work and the communication, thank you!

@dgkanatsios
Copy link
Collaborator

Also FYI @abbasahmed and @ghov since they have #414 open with changes to the Grafana dashboards. You'll need to rebase after we merge this one.

@dsmith111
Copy link
Contributor Author

@dgkanatsios I believe that's all of the current comments wrapped up

Copy link
Collaborator

@dgkanatsios dgkanatsios left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@dgkanatsios
Copy link
Collaborator

Really appreciate all the effort and the discussion, will merge as soon as tests pass! Thank you!

@dgkanatsios dgkanatsios merged commit 15336f7 into PlayFab:main Sep 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants