51 changes: 49 additions & 2 deletions doc/config.md
@@ -374,13 +374,60 @@ This is one of the features in GARM that I really love having. For one thing, it

| Metric name | Type | Labels | Description |
|---------------------------------------|---------|------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------|
-| `garm_github_operations_total` | Counter | `operation`=&lt;ListRunners\|CreateRegistrationToken\|...&gt; <br>`scope`=&lt;Organization\|Repository\|Enterprise&gt; | This is a counter that increments every time a github operation is performed |
-| `garm_github_errors_total` | Counter | `operation`=&lt;ListRunners\|CreateRegistrationToken\|...&gt; <br>`scope`=&lt;Organization\|Repository\|Enterprise&gt; | This is a counter that increments every time a github operation errored |
+| `garm_github_operations_total` | Counter | `operation`=&lt;see [operations list](#github-operations) below&gt; <br>`scope`=&lt;Organization\|Repository\|Enterprise&gt; | Incremented every time a GitHub operation is performed |
+| `garm_github_errors_total` | Counter | `operation`=&lt;see [operations list](#github-operations) below&gt; <br>`scope`=&lt;Organization\|Repository\|Enterprise&gt; | Incremented every time a GitHub operation returns an error |
| `garm_github_rate_limit_limit` | Gauge | `credential_name`=&lt;credential name&gt; <br>`credential_id`=&lt;credential id&gt; <br>`endpoint`=&lt;endpoint name&gt; | The maximum number of requests allowed per hour for GitHub API |
| `garm_github_rate_limit_remaining` | Gauge | `credential_name`=&lt;credential name&gt; <br>`credential_id`=&lt;credential id&gt; <br>`endpoint`=&lt;endpoint name&gt; | The number of requests remaining in the current rate limit window |
| `garm_github_rate_limit_used` | Gauge | `credential_name`=&lt;credential name&gt; <br>`credential_id`=&lt;credential id&gt; <br>`endpoint`=&lt;endpoint name&gt; | The number of requests used in the current rate limit window |
| `garm_github_rate_limit_reset_timestamp` | Gauge | `credential_name`=&lt;credential name&gt; <br>`credential_id`=&lt;credential id&gt; <br>`endpoint`=&lt;endpoint name&gt; | Unix timestamp when the rate limit resets |

#### GitHub operations

The following operation names are used as values for the `operation` label in `garm_github_operations_total` and `garm_github_errors_total`:

**GitHub client operations:**

| Operation | Description |
|---|---|
| `ListHooks` | List webhooks on an entity |
| `GetHook` | Get a single webhook |
| `CreateHook` | Create a webhook |
| `DeleteHook` | Delete a webhook |
| `PingHook` | Ping a webhook |
| `ListEntityRunners` | List runners for an entity |
| `ListEntityRunnerApplicationDownloads` | List runner application downloads |
| `RemoveEntityRunner` | Remove a runner |
| `CreateEntityRegistrationToken` | Create a runner registration token |
| `ListOrganizationRunnerGroups` | List organization runner groups |
| `ListRunnerGroups` | List enterprise runner groups |
| `GetEntityJITConfig` | Generate a JIT runner configuration |
| `GetRateLimit` | Get API rate limit information |

**Scale set operations:**

| Operation | Description |
|---|---|
| `GetRunnerScaleSetByNameAndRunnerGroup` | Get a runner scale set by name and runner group |
| `GetRunnerScaleSetByID` | Get a runner scale set by ID |
| `ListRunnerScaleSets` | List all runner scale sets |
| `CreateRunnerScaleSet` | Create a runner scale set |
| `UpdateRunnerScaleSet` | Update a runner scale set |
| `DeleteRunnerScaleSet` | Delete a runner scale set |
| `GetRunnerGroupByName` | Get a runner group by name |
| `GenerateJitRunnerConfig` | Generate a JIT runner config for a scale set |
| `GetRunner` | Get a runner by ID |
| `ListAllRunners` | List all runners |
| `GetRunnerByName` | Get a runner by name |
| `RemoveRunner` | Remove a runner |
| `AcquireJobs` | Acquire jobs for a scale set |
| `GetAcquirableJobs` | Get acquirable jobs for a scale set |
| `GetActionServiceInfo` | Get GitHub Actions service admin info |
| `CreateMessageSession` | Create a message queue session |
| `DeleteMessageSession` | Delete a message queue session |
| `RefreshMessageSession` | Refresh a message queue session token |
| `GetMessage` | Get a message from the message queue |
| `DeleteMessage` | Delete a message from the message queue |

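Because `garm_github_operations_total` and `garm_github_errors_total` carry the same `operation` and `scope` labels, dividing one by the other gives a per-operation error ratio. In Prometheus this is normally done on `rate()` of both counters rather than on their raw cumulative values; the stdlib-only sketch below only illustrates the label matching and division on two snapshots. The `counterKey` type and `errorRatios` function are hypothetical stand-ins for a scraped label pair and the query logic.

```go
package main

import (
	"fmt"
	"sort"
)

// counterKey mirrors the label pair shared by both counters.
type counterKey struct {
	Operation string // e.g. "AcquireJobs"
	Scope     string // "Organization", "Repository" or "Enterprise"
}

// errorRatios divides each error count by the matching operation count.
// Keys with zero operations are skipped to avoid dividing by zero; a key
// with no recorded errors reads as 0 from the errs map.
func errorRatios(ops, errs map[counterKey]float64) map[counterKey]float64 {
	out := make(map[counterKey]float64, len(ops))
	for k, total := range ops {
		if total == 0 {
			continue
		}
		out[k] = errs[k] / total
	}
	return out
}

func main() {
	ops := map[counterKey]float64{
		{"GetMessage", "Organization"}:  200,
		{"AcquireJobs", "Organization"}: 40,
	}
	errs := map[counterKey]float64{
		{"AcquireJobs", "Organization"}: 2,
	}
	ratios := errorRatios(ops, errs)

	// Sort by operation, then scope, so the output order is deterministic.
	keys := make([]counterKey, 0, len(ratios))
	for k := range ratios {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if keys[i].Operation != keys[j].Operation {
			return keys[i].Operation < keys[j].Operation
		}
		return keys[i].Scope < keys[j].Scope
	})
	for _, k := range keys {
		fmt.Printf("%s/%s: %.1f%%\n", k.Operation, k.Scope, ratios[k]*100)
	}
}
```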
### Enabling metrics

Metrics are disabled by default. To enable them, add the following to your config file:
15 changes: 15 additions & 0 deletions util/github/scalesets/client.go
@@ -23,6 +23,7 @@ import (
"github.com/google/go-github/v72/github"

runnerErrors "github.com/cloudbase/garm-provider-common/errors"
"github.com/cloudbase/garm/metrics"
"github.com/cloudbase/garm/params"
"github.com/cloudbase/garm/runner/common"
)
@@ -51,6 +52,20 @@ type ScaleSetClient struct {
mux sync.Mutex
}

func (s *ScaleSetClient) recordOperation(operation string) {
metrics.GithubOperationCount.WithLabelValues(
operation,
s.ghCli.GetEntity().LabelScope(),
).Inc()
}

func (s *ScaleSetClient) recordFailedOperation(operation string) {
metrics.GithubOperationFailedCount.WithLabelValues(
operation,
s.ghCli.GetEntity().LabelScope(),
).Inc()
}

func (s *ScaleSetClient) SetGithubClient(cli common.GithubClient) {
s.mux.Lock()
defer s.mux.Unlock()
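`recordOperation` and `recordFailedOperation` are used by every instrumented method in the same shape: count the attempt first, then count a failure from a deferred closure that inspects the named `err` result. The stdlib-only sketch below reproduces that shape with a mutex-guarded map standing in for the Prometheus `CounterVec`; `counterVec`, `doRequest` and the `GetRunner` label value are illustrative only.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// counterVec is a stand-in for a Prometheus CounterVec keyed by operation name.
type counterVec struct {
	mu     sync.Mutex
	counts map[string]int
}

func newCounterVec() *counterVec { return &counterVec{counts: map[string]int{}} }

func (c *counterVec) inc(op string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[op]++
}

func (c *counterVec) get(op string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[op]
}

var (
	operations = newCounterVec() // plays the role of GithubOperationCount
	failures   = newCounterVec() // plays the role of GithubOperationFailedCount
)

// doRequest counts the attempt up front, then counts a failure from a
// deferred closure that reads the named result err.
func doRequest(fail bool) (err error) {
	operations.inc("GetRunner")
	defer func() {
		if err != nil {
			failures.inc("GetRunner")
		}
	}()

	if fail {
		return errors.New("simulated API error")
	}
	return nil
}

func main() {
	_ = doRequest(false)
	_ = doRequest(true)
	fmt.Println(operations.get("GetRunner"), failures.get("GetRunner")) // 2 1
}
```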
18 changes: 16 additions & 2 deletions util/github/scalesets/jobs.go
@@ -29,7 +29,14 @@ type acquireJobsResult struct {
Value []int64 `json:"value"`
}

-func (s *ScaleSetClient) AcquireJobs(ctx context.Context, runnerScaleSetID int, messageQueueAccessToken string, requestIDs []int64) ([]int64, error) {
+func (s *ScaleSetClient) AcquireJobs(ctx context.Context, runnerScaleSetID int, messageQueueAccessToken string, requestIDs []int64) (_ []int64, err error) {
s.recordOperation("AcquireJobs")
defer func() {
if err != nil {
s.recordFailedOperation("AcquireJobs")
}
}()

u := fmt.Sprintf("%s/%d/acquirejobs?api-version=6.0-preview", scaleSetEndpoint, runnerScaleSetID)

body, err := json.Marshal(requestIDs)
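The signatures switch from unnamed to named results (for example `(_ []int64, err error)`) because a deferred closure can only observe the returned error through a named result. A related detail: a `return x, err` statement assigns to the named results before deferred functions run, so the failure is still counted even when an inner `:=` shadows `err`. The Go compiler rejects a bare `return` in that shadowed situation. The sketch below shows this with a hypothetical `parseIDs` function and a plain `failed` variable standing in for the failure metric.

```go
package main

import (
	"fmt"
	"strconv"
)

var failed int // hypothetical stand-in for the failure counter

// parseIDs shadows err inside the loop, as `x, err := ...` in an inner block
// would. The deferred check still sees the failure because `return nil, err`
// copies the inner err into the named result before deferred calls run.
func parseIDs(raw []string) (_ []int64, err error) {
	defer func() {
		if err != nil {
			failed++
		}
	}()

	ids := make([]int64, 0, len(raw))
	for _, r := range raw {
		id, err := strconv.ParseInt(r, 10, 64) // shadows the named result
		if err != nil {
			return nil, err // explicit return assigns the named result
		}
		ids = append(ids, id)
	}
	return ids, nil
}

func main() {
	ids, err := parseIDs([]string{"1", "2"})
	fmt.Println(ids, err, failed) // [1 2] <nil> 0

	_, err = parseIDs([]string{"1", "oops"})
	fmt.Println(err != nil, failed) // true 1
}
```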
@@ -60,7 +67,14 @@ func (s *ScaleSetClient) AcquireJobs(ctx context.Context, runnerScaleSetID int,
return acquiredJobs.Value, nil
}

-func (s *ScaleSetClient) GetAcquirableJobs(ctx context.Context, runnerScaleSetID int) (params.AcquirableJobList, error) {
+func (s *ScaleSetClient) GetAcquirableJobs(ctx context.Context, runnerScaleSetID int) (_ params.AcquirableJobList, err error) {
s.recordOperation("GetAcquirableJobs")
defer func() {
if err != nil {
s.recordFailedOperation("GetAcquirableJobs")
}
}()

path := fmt.Sprintf("%d/acquirablejobs", runnerScaleSetID)

req, err := s.newActionsRequest(ctx, http.MethodGet, path, nil)
45 changes: 40 additions & 5 deletions util/github/scalesets/message_sessions.go
@@ -110,7 +110,14 @@ func (m *MessageSession) SessionsRelativeURL() (string, error) {
return relativePath, nil
}

-func (m *MessageSession) Refresh(ctx context.Context) error {
+func (m *MessageSession) Refresh(ctx context.Context) (err error) {
m.ssCli.recordOperation("RefreshMessageSession")
defer func() {
if err != nil {
m.ssCli.recordFailedOperation("RefreshMessageSession")
}
}()

slog.DebugContext(ctx, "refreshing message session token", "session_id", m.session.SessionID.String())
m.mux.Lock()
defer m.mux.Unlock()
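In `Refresh`, the metrics closure is deferred before `m.mux.Lock()` and its matching `defer m.mux.Unlock()`. Deferred calls run last-in, first-out, so the mutex is released before the failure is recorded, and the metrics call does not extend the critical section. The sketch below traces that order; the `session` type and the trace strings are illustrative only, and it runs on a single goroutine so the unguarded `trace` appends are safe.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type session struct {
	mu    sync.Mutex
	trace []string // records the order of events for illustration
}

// refresh mirrors the ordering in MessageSession.Refresh: the metrics defer is
// registered before the mutex is taken, so it runs after the unlock (LIFO).
func (s *session) refresh() (err error) {
	s.trace = append(s.trace, "record operation")
	defer func() {
		if err != nil {
			s.trace = append(s.trace, "record failure")
		}
	}()

	s.mu.Lock()
	defer func() {
		s.mu.Unlock()
		s.trace = append(s.trace, "unlock")
	}()

	return errors.New("token refresh failed")
}

func main() {
	s := &session{}
	fmt.Println(s.refresh()) // token refresh failed
	fmt.Println(s.trace)     // [record operation unlock record failure]
}
```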
@@ -163,7 +170,14 @@ func (m *MessageSession) maybeRefreshToken(ctx context.Context) error {
return nil
}

-func (m *MessageSession) GetMessage(ctx context.Context, lastMessageID int64, maxCapacity uint) (params.RunnerScaleSetMessage, error) {
+func (m *MessageSession) GetMessage(ctx context.Context, lastMessageID int64, maxCapacity uint) (_ params.RunnerScaleSetMessage, err error) {
m.ssCli.recordOperation("GetMessage")
defer func() {
if err != nil {
m.ssCli.recordFailedOperation("GetMessage")
}
}()

u, err := url.Parse(m.session.MessageQueueURL)
if err != nil {
return params.RunnerScaleSetMessage{}, err
@@ -206,7 +220,14 @@ func (m *MessageSession) GetMessage(ctx context.Context, lastMessageID int64, ma
return message, nil
}

-func (m *MessageSession) DeleteMessage(ctx context.Context, messageID int64) error {
+func (m *MessageSession) DeleteMessage(ctx context.Context, messageID int64) (err error) {
m.ssCli.recordOperation("DeleteMessage")
defer func() {
if err != nil {
m.ssCli.recordFailedOperation("DeleteMessage")
}
}()

u, err := url.Parse(m.session.MessageQueueURL)
if err != nil {
return err
@@ -231,7 +252,14 @@ func (m *MessageSession) DeleteMessage(ctx context.Context, messageID int64) err
return nil
}

-func (s *ScaleSetClient) CreateMessageSession(ctx context.Context, runnerScaleSetID int, owner string) (*MessageSession, error) {
+func (s *ScaleSetClient) CreateMessageSession(ctx context.Context, runnerScaleSetID int, owner string) (_ *MessageSession, err error) {
s.recordOperation("CreateMessageSession")
defer func() {
if err != nil {
s.recordFailedOperation("CreateMessageSession")
}
}()

path := fmt.Sprintf("%s/%d/sessions", scaleSetEndpoint, runnerScaleSetID)

newSession := params.RunnerScaleSetSession{
@@ -274,7 +302,14 @@ func (s *ScaleSetClient) CreateMessageSession(ctx context.Context, runnerScaleSe
return sess, nil
}

-func (s *ScaleSetClient) DeleteMessageSession(ctx context.Context, session *MessageSession) error {
+func (s *ScaleSetClient) DeleteMessageSession(ctx context.Context, session *MessageSession) (err error) {
s.recordOperation("DeleteMessageSession")
defer func() {
if err != nil {
s.recordFailedOperation("DeleteMessageSession")
}
}()

path, err := session.SessionsRelativeURL()
if err != nil {
return fmt.Errorf("failed to delete session: %w", err)
45 changes: 40 additions & 5 deletions util/github/scalesets/runners.go
@@ -30,7 +30,14 @@ type scaleSetJitRunnerConfig struct {
WorkFolder string `json:"workFolder"`
}

-func (s *ScaleSetClient) GenerateJitRunnerConfig(ctx context.Context, runnerName string, scaleSetID int) (params.RunnerScaleSetJitRunnerConfig, error) {
+func (s *ScaleSetClient) GenerateJitRunnerConfig(ctx context.Context, runnerName string, scaleSetID int) (_ params.RunnerScaleSetJitRunnerConfig, err error) {
s.recordOperation("GenerateJitRunnerConfig")
defer func() {
if err != nil {
s.recordFailedOperation("GenerateJitRunnerConfig")
}
}()

runnerSettings := scaleSetJitRunnerConfig{
Name: runnerName,
WorkFolder: "_work",
@@ -64,7 +71,14 @@ func (s *ScaleSetClient) GenerateJitRunnerConfig(ctx context.Context, runnerName
return runnerJitConfig, nil
}

-func (s *ScaleSetClient) GetRunner(ctx context.Context, runnerID int64) (params.RunnerReference, error) {
+func (s *ScaleSetClient) GetRunner(ctx context.Context, runnerID int64) (_ params.RunnerReference, err error) {
s.recordOperation("GetRunner")
defer func() {
if err != nil {
s.recordFailedOperation("GetRunner")
}
}()

path := fmt.Sprintf("%s/%d", runnerEndpoint, runnerID)

req, err := s.newActionsRequest(ctx, http.MethodGet, path, nil)
@@ -86,7 +100,14 @@ func (s *ScaleSetClient) GetRunner(ctx context.Context, runnerID int64) (params.
return runnerReference, nil
}

-func (s *ScaleSetClient) ListAllRunners(ctx context.Context) (params.RunnerReferenceList, error) {
+func (s *ScaleSetClient) ListAllRunners(ctx context.Context) (_ params.RunnerReferenceList, err error) {
s.recordOperation("ListAllRunners")
defer func() {
if err != nil {
s.recordFailedOperation("ListAllRunners")
}
}()

req, err := s.newActionsRequest(ctx, http.MethodGet, runnerEndpoint, nil)
if err != nil {
return params.RunnerReferenceList{}, fmt.Errorf("failed to construct request: %w", err)
@@ -106,7 +127,14 @@ func (s *ScaleSetClient) ListAllRunners(ctx context.Context) (params.RunnerRefer
return runnerList, nil
}

-func (s *ScaleSetClient) GetRunnerByName(ctx context.Context, runnerName string) (params.RunnerReference, error) {
+func (s *ScaleSetClient) GetRunnerByName(ctx context.Context, runnerName string) (_ params.RunnerReference, err error) {
s.recordOperation("GetRunnerByName")
defer func() {
if err != nil {
s.recordFailedOperation("GetRunnerByName")
}
}()

path := fmt.Sprintf("%s?agentName=%s", runnerEndpoint, runnerName)

req, err := s.newActionsRequest(ctx, http.MethodGet, path, nil)
@@ -136,7 +164,14 @@ func (s *ScaleSetClient) GetRunnerByName(ctx context.Context, runnerName string)
return runnerList.RunnerReferences[0], nil
}

-func (s *ScaleSetClient) RemoveRunner(ctx context.Context, runnerID int64) error {
+func (s *ScaleSetClient) RemoveRunner(ctx context.Context, runnerID int64) (err error) {
s.recordOperation("RemoveRunner")
defer func() {
if err != nil {
s.recordFailedOperation("RemoveRunner")
}
}()

path := fmt.Sprintf("%s/%d", runnerEndpoint, runnerID)

req, err := s.newActionsRequest(ctx, http.MethodDelete, path, nil)
63 changes: 56 additions & 7 deletions util/github/scalesets/scalesets.go
@@ -36,7 +36,14 @@ const (
HeaderGitHubRequestID = "X-GitHub-Request-Id"
)

-func (s *ScaleSetClient) GetRunnerScaleSetByNameAndRunnerGroup(ctx context.Context, runnerGroupID int, name string) (params.RunnerScaleSet, error) {
+func (s *ScaleSetClient) GetRunnerScaleSetByNameAndRunnerGroup(ctx context.Context, runnerGroupID int, name string) (_ params.RunnerScaleSet, err error) {
s.recordOperation("GetRunnerScaleSetByNameAndRunnerGroup")
defer func() {
if err != nil {
s.recordFailedOperation("GetRunnerScaleSetByNameAndRunnerGroup")
}
}()

path := fmt.Sprintf("%s?runnerGroupId=%d&name=%s", scaleSetEndpoint, runnerGroupID, name)
req, err := s.newActionsRequest(ctx, http.MethodGet, path, nil)
if err != nil {
@@ -62,7 +69,14 @@ func (s *ScaleSetClient) GetRunnerScaleSetByNameAndRunnerGroup(ctx context.Conte
return runnerScaleSetList.RunnerScaleSets[0], nil
}

-func (s *ScaleSetClient) GetRunnerScaleSetByID(ctx context.Context, runnerScaleSetID int) (params.RunnerScaleSet, error) {
+func (s *ScaleSetClient) GetRunnerScaleSetByID(ctx context.Context, runnerScaleSetID int) (_ params.RunnerScaleSet, err error) {
s.recordOperation("GetRunnerScaleSetByID")
defer func() {
if err != nil {
s.recordFailedOperation("GetRunnerScaleSetByID")
}
}()

path := fmt.Sprintf("%s/%d", scaleSetEndpoint, runnerScaleSetID)
req, err := s.newActionsRequest(ctx, http.MethodGet, path, nil)
if err != nil {
@@ -83,7 +97,14 @@ func (s *ScaleSetClient) GetRunnerScaleSetByID(ctx context.Context, runnerScaleS
}

// ListRunnerScaleSets lists all runner scale sets in a github entity.
-func (s *ScaleSetClient) ListRunnerScaleSets(ctx context.Context) (*params.RunnerScaleSetsResponse, error) {
+func (s *ScaleSetClient) ListRunnerScaleSets(ctx context.Context) (_ *params.RunnerScaleSetsResponse, err error) {
s.recordOperation("ListRunnerScaleSets")
defer func() {
if err != nil {
s.recordFailedOperation("ListRunnerScaleSets")
}
}()

req, err := s.newActionsRequest(ctx, http.MethodGet, scaleSetEndpoint, nil)
if err != nil {
return nil, err
@@ -107,7 +128,14 @@ func (s *ScaleSetClient) ListRunnerScaleSets(ctx context.Context) (*params.Runne
}

// CreateRunnerScaleSet creates a new runner scale set in the target GitHub entity.
-func (s *ScaleSetClient) CreateRunnerScaleSet(ctx context.Context, runnerScaleSet *params.RunnerScaleSet) (params.RunnerScaleSet, error) {
+func (s *ScaleSetClient) CreateRunnerScaleSet(ctx context.Context, runnerScaleSet *params.RunnerScaleSet) (_ params.RunnerScaleSet, err error) {
s.recordOperation("CreateRunnerScaleSet")
defer func() {
if err != nil {
s.recordFailedOperation("CreateRunnerScaleSet")
}
}()

body, err := json.Marshal(runnerScaleSet)
if err != nil {
return params.RunnerScaleSet{}, err
@@ -131,7 +159,14 @@ func (s *ScaleSetClient) CreateRunnerScaleSet(ctx context.Context, runnerScaleSe
return createdRunnerScaleSet, nil
}

-func (s *ScaleSetClient) UpdateRunnerScaleSet(ctx context.Context, runnerScaleSetID int, runnerScaleSet params.RunnerScaleSet) (params.RunnerScaleSet, error) {
+func (s *ScaleSetClient) UpdateRunnerScaleSet(ctx context.Context, runnerScaleSetID int, runnerScaleSet params.RunnerScaleSet) (_ params.RunnerScaleSet, err error) {
s.recordOperation("UpdateRunnerScaleSet")
defer func() {
if err != nil {
s.recordFailedOperation("UpdateRunnerScaleSet")
}
}()

path := fmt.Sprintf("%s/%d", scaleSetEndpoint, runnerScaleSetID)

body, err := json.Marshal(runnerScaleSet)
@@ -157,7 +192,14 @@ func (s *ScaleSetClient) UpdateRunnerScaleSet(ctx context.Context, runnerScaleSe
return ret, nil
}

-func (s *ScaleSetClient) DeleteRunnerScaleSet(ctx context.Context, runnerScaleSetID int) error {
+func (s *ScaleSetClient) DeleteRunnerScaleSet(ctx context.Context, runnerScaleSetID int) (err error) {
s.recordOperation("DeleteRunnerScaleSet")
defer func() {
if err != nil {
s.recordFailedOperation("DeleteRunnerScaleSet")
}
}()

path := fmt.Sprintf("%s/%d", scaleSetEndpoint, runnerScaleSetID)
req, err := s.newActionsRequest(ctx, http.MethodDelete, path, nil)
if err != nil {
@@ -178,7 +220,14 @@ func (s *ScaleSetClient) DeleteRunnerScaleSet(ctx context.Context, runnerScaleSe
return nil
}

-func (s *ScaleSetClient) GetRunnerGroupByName(ctx context.Context, runnerGroup string) (params.RunnerGroup, error) {
+func (s *ScaleSetClient) GetRunnerGroupByName(ctx context.Context, runnerGroup string) (_ params.RunnerGroup, err error) {
s.recordOperation("GetRunnerGroupByName")
defer func() {
if err != nil {
s.recordFailedOperation("GetRunnerGroupByName")
}
}()

path := fmt.Sprintf("_apis/runtime/runnergroups/?groupName=%s", runnerGroup)
req, err := s.newActionsRequest(ctx, http.MethodGet, path, nil)
if err != nil {
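Each instrumented method repeats the same seven lines of record-and-defer boilerplate. One possible way to shrink that, offered here as a hypothetical alternative and not what this change does, is a helper that counts the attempt and returns a closure to defer with a pointer to the named error. Go evaluates the deferred function value and its arguments when the `defer` statement executes, so `instrument(...)` runs immediately while the returned closure runs at return time. The `client` type and `instrument` method are illustrative stand-ins, with plain maps in place of the Prometheus counters.

```go
package main

import (
	"errors"
	"fmt"
)

// client is a hypothetical stand-in for ScaleSetClient with simple counters
// in place of the Prometheus metrics.
type client struct {
	ops, failures map[string]int
}

// instrument counts the attempt immediately and returns a function meant to be
// deferred with a pointer to the caller's named error result.
func (c *client) instrument(op string) func(*error) {
	c.ops[op]++
	return func(errp *error) {
		if *errp != nil {
			c.failures[op]++
		}
	}
}

func (c *client) DeleteRunnerScaleSet(id int) (err error) {
	// instrument runs now; the returned closure runs when the method returns.
	defer c.instrument("DeleteRunnerScaleSet")(&err)

	if id <= 0 {
		return errors.New("invalid scale set ID")
	}
	return nil
}

func main() {
	c := &client{ops: map[string]int{}, failures: map[string]int{}}
	_ = c.DeleteRunnerScaleSet(1)
	_ = c.DeleteRunnerScaleSet(0)
	fmt.Println(c.ops["DeleteRunnerScaleSet"], c.failures["DeleteRunnerScaleSet"]) // 2 1
}
```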