Refactor pool manager loop #112

Merged (4 commits) on Jun 23, 2023

gabriel-samfira (Member)

This PR brings in a number of changes, the largest of which is the refactor of the pool manager loop. The single large loop that launched various operations at different intervals has been replaced with more granular, individual loops for those same operations.

The consolidate loop has been split into 4 individual loops, and the rest of the operations also have their own loops. Each operation is now self-contained and will not be delayed by other operations that may take a long time.

To address potential race conditions, there is now a global lock per runner that is acquired before a potentially long operation is executed. For example, adding pending instances or removing instances that are in pending_delete will first acquire a lock for that instance.

Given that we now have locks for such operations, we no longer wait for instance creation or deletion to finish before returning from addPendingInstances() or deletePendingInstances().

Other changes include:

  • changed curl options to accommodate older versions of curl
  • updated the GitHub client (go-github) to the latest version

This commit:

  * replaces WaitGroups with errgroups
  * wraps errgroup.Wait() in a select so that an operation that takes a
    long time cannot prevent garm from being restarted.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>

This commit:

  * adds more granular loops for various operations
  * updates go-github to the latest version
  * skips trying to fetch runner info for canceled or skipped jobs
  * makes loops use WaitGroups to signal exit
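Skipping the runner lookup for canceled or skipped jobs amounts to a check on the job's conclusion before calling the API. The sketch assumes the conclusion strings from GitHub's `workflow_job` payload (GitHub spells it `cancelled`); `workflowJob` is a hypothetical, trimmed-down type rather than go-github's struct, and `shouldFetchRunnerInfo` is a hypothetical helper.

```go
package main

import "fmt"

// workflowJob is a hypothetical, trimmed-down view of a workflow_job
// webhook payload; only the fields used here are included.
type workflowJob struct {
	ID         int64
	Conclusion string // e.g. "success", "failure", "cancelled", "skipped"
}

// shouldFetchRunnerInfo reports whether it is worth asking the API which
// runner handled the job. Cancelled or skipped jobs may never have been
// picked up by a runner, so the lookup is skipped for them.
func shouldFetchRunnerInfo(job workflowJob) bool {
	switch job.Conclusion {
	case "cancelled", "skipped":
		return false
	default:
		return true
	}
}

func main() {
	for _, c := range []string{"success", "cancelled", "skipped"} {
		fmt.Printf("%s -> fetch runner info: %v\n", c, shouldFetchRunnerInfo(workflowJob{Conclusion: c}))
	}
}
```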

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>

Lock operations per instance name. This should prevent goroutines from
trying to update the same instance concurrently when operations are slow.

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
gabriel-samfira merged commit 442e76e into cloudbase:main on Jun 23, 2023