feat: make agent stats' cardinality configurable #12468

Merged
merged 24 commits into from
Mar 11, 2024

Conversation

dannykopping

Implements #12221

When stats from agents are collected, they are aggregated by 4 dimensions: agent_name, template_name, workspace_name, and username. This can result in some very high cardinality metrics being scraped by Prometheus in large environments.

This PR adds the ability to tune which labels are included in this aggregation, thereby reducing the cardinality.

For example:

agent_sessions_total{agent_name="main",magic_type="ssh",pty="yes",template_name="docker",username="danny",workspace_name="test1"} 6
agent_sessions_total{agent_name="main",magic_type="ssh",pty="yes",template_name="docker",username="danny",workspace_name="test2"} 5

With hundreds of active workspaces, each having a unique name, the cardinality may be unacceptable. In this case the operator may choose to set CODER_PROMETHEUS_AGGREGATE_AGENT_STATS_BY=username to aggregate by user instead, summing all the metrics' values and producing a single metric series per user:

agent_sessions_total{magic_type="ssh",pty="yes",username="danny"} 11

Multiple labels can be provided as a comma-separated list, e.g. CODER_PROMETHEUS_AGGREGATE_AGENT_STATS_BY=username,agent_name:

agent_sessions_total{agent_name="main",magic_type="ssh",pty="yes",username="danny"} 11

The current behaviour remains the default; if no value is passed to CODER_PROMETHEUS_AGGREGATE_AGENT_STATS_BY there will be no change.
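
To illustrate the effect described above, here is a minimal, self-contained sketch of aggregating series by a configurable subset of labels. It is not the code in this PR: the `sample` type and the `aggregate` and `exampleSamples` functions are hypothetical names chosen for illustration, and the assumption that labels such as magic_type and pty are always retained is inferred from the example output above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sample is a hypothetical stand-in for one scraped metric series.
type sample struct {
	name   string
	labels map[string]string
	value  float64
}

// aggregatable holds the four dimensions named in the PR description.
// Any other label (e.g. magic_type, pty) is assumed to always be kept.
var aggregatable = map[string]struct{}{
	"agent_name": {}, "template_name": {}, "workspace_name": {}, "username": {},
}

// aggregate sums sample values after dropping every aggregatable label
// that is not listed in by. An empty by keeps all labels, mirroring the
// "no change by default" behaviour described in the PR.
func aggregate(samples []sample, by []string) map[string]float64 {
	keep := make(map[string]struct{}, len(by))
	for _, l := range by {
		keep[l] = struct{}{}
	}
	out := make(map[string]float64)
	for _, s := range samples {
		pairs := make([]string, 0, len(s.labels))
		for k, v := range s.labels {
			_, isDimension := aggregatable[k]
			_, isKept := keep[k]
			if len(by) > 0 && isDimension && !isKept {
				continue // drop this label, merging series that differ only by it
			}
			pairs = append(pairs, fmt.Sprintf("%s=%q", k, v))
		}
		sort.Strings(pairs) // stable series key regardless of map iteration order
		out[s.name+"{"+strings.Join(pairs, ",")+"}"] += s.value
	}
	return out
}

// exampleSamples reproduces the two series from the PR description.
func exampleSamples() []sample {
	mk := func(workspace string, v float64) sample {
		return sample{
			name: "agent_sessions_total",
			labels: map[string]string{
				"agent_name": "main", "magic_type": "ssh", "pty": "yes",
				"template_name": "docker", "username": "danny",
				"workspace_name": workspace,
			},
			value: v,
		}
	}
	return []sample{mk("test1", 6), mk("test2", 5)}
}

func main() {
	for series, v := range aggregate(exampleSamples(), []string{"username"}) {
		fmt.Printf("%s %g\n", series, v)
	}
}
```

Under these assumptions, aggregating the two example series by username yields the single series agent_sessions_total{magic_type="ssh",pty="yes",username="danny"} with value 11, matching the example above.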

Note to reviewer: I made a sweeping refactor to define the label names in a single location instead of duplicating them, which may make the PR larger than it seems.

flake.nix Show resolved Hide resolved
@dannykopping dannykopping changed the title feat: configurable agent stats cardinality feat: make agent stats cardinality configurable Mar 8, 2024
@dannykopping dannykopping changed the title feat: make agent stats cardinality configurable feat: make agent stats' cardinality configurable Mar 8, 2024

@mafredri mafredri left a comment

Noticed some minor nits, but in general looks good, nice work!

"golang.org/x/xerrors"

"github.com/coder/coder/v2/coderd/agentmetrics"

Unfortunately our formatter doesn't handle merging import groups and leaves things in a messy state (depending on what program injected them). 😔

If you notice these, please feel free to fix them. We try our best, but sometimes these slip through, so don't worry too much.

coderd/prometheusmetrics/aggregator.go Outdated Show resolved Hide resolved
coderd/prometheusmetrics/aggregator.go Outdated Show resolved Hide resolved
return nil
}

acceptable := make(map[string]any, len(AcceptedMetricAggregationLabels))

I think using map[string]bool would be preferable/clearer here, also simplifies the map lookup later. Typically I'd use either bool or if we're looking for space savings, I'd use the zero struct (map[string]struct{}).
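
For reference, a minimal sketch of the two set idioms mentioned in this comment. The `acceptedLabels` slice is a hypothetical stand-in for AcceptedMetricAggregationLabels, and `validateWithBool` and `validateWithStruct` are illustrative names, not functions from this PR.

```go
package main

import "fmt"

// acceptedLabels is a hypothetical stand-in for AcceptedMetricAggregationLabels.
var acceptedLabels = []string{"agent_name", "template_name", "username", "workspace_name"}

// validateWithBool uses map[string]bool: a missing key reads as false,
// so the lookup is a plain boolean expression.
func validateWithBool(labels []string) error {
	acceptable := make(map[string]bool, len(acceptedLabels))
	for _, l := range acceptedLabels {
		acceptable[l] = true
	}
	for _, l := range labels {
		if !acceptable[l] {
			return fmt.Errorf("%q is not a valid aggregation label", l)
		}
	}
	return nil
}

// validateWithStruct uses map[string]struct{}: the values occupy no space,
// but membership needs the two-value "comma ok" lookup.
func validateWithStruct(labels []string) error {
	acceptable := make(map[string]struct{}, len(acceptedLabels))
	for _, l := range acceptedLabels {
		acceptable[l] = struct{}{}
	}
	for _, l := range labels {
		if _, ok := acceptable[l]; !ok {
			return fmt.Errorf("%q is not a valid aggregation label", l)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateWithBool([]string{"username", "agent_name"}))
	fmt.Println(validateWithStruct([]string{"bogus"}))
}
```

With map[string]any, by contrast, the stored value carries no meaning, so every lookup needs the two-value form and readers have to infer that the map is being used as a set.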

dannykopping and others added 8 commits March 8, 2024 14:33
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
…ebug works

Signed-off-by: Danny Kopping <danny@coder.com>
codersdk/deployment.go Outdated Show resolved Hide resolved
coderd/agentmetrics/labels.go Outdated Show resolved Hide resolved
coderd/prometheusmetrics/aggregator.go Show resolved Hide resolved
codersdk/deployment.go Show resolved Hide resolved
codersdk/deployment.go Outdated Show resolved Hide resolved
coderd/prometheusmetrics/prometheusmetrics.go Outdated Show resolved Hide resolved
coderd/prometheusmetrics/aggregator.go Show resolved Hide resolved

@mafredri mafredri left a comment

LGTM! I know tests aren't passing now but looks unrelated so I don't need to re-review, nice work!

@mtojek mtojek left a comment

Nice!

I know tests aren't passing now but looks unrelated so I don't need to re-review, nice work!

Agree, feel free to t.Skip them for now.

@dannykopping dannykopping enabled auto-merge (squash) March 11, 2024 13:43
@dannykopping dannykopping merged commit 21d1873 into coder:main Mar 11, 2024
28 checks passed
@dannykopping dannykopping deleted the dk/configurable-cardinality branch March 11, 2024 14:04
@github-actions github-actions bot locked and limited conversation to collaborators Mar 11, 2024
4 participants