Add OTEL MetricsV2 standard cluster integration tests #671
Merged
d27798c to 1b0028f
Contributor
I know these tests run in us-west-2, but it could be worth adding a region guard. If anyone adds a region to the matrix that doesn't support Metrics V2, everything will just silently fail.
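A region guard along these lines could make the failure explicit. This is only a sketch: `metricsV2Regions` and `checkRegion` are hypothetical names, and the allowlist contains just us-west-2 because that is the only region this thread confirms.

```go
package main

import "fmt"

// metricsV2Regions is a hypothetical allowlist. Only us-west-2 is
// confirmed by this thread; extend it as more regions are verified.
var metricsV2Regions = map[string]bool{
	"us-west-2": true,
}

// checkRegion returns an error when the region is not in the allowlist,
// so a test can skip or fail loudly instead of silently returning no data.
func checkRegion(region string) error {
	if !metricsV2Regions[region] {
		return fmt.Errorf("region %q is not in the Metrics V2 allowlist", region)
	}
	return nil
}

func main() {
	for _, r := range []string{"us-west-2", "eu-north-1"} {
		if err := checkRegion(r); err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Println("ok:", r)
	}
}
```

In a test, the error would typically be passed to t.Skip or t.Fatal at the top of the suite, depending on whether an unsupported region should be tolerated.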
louisall
reviewed
Apr 21, 2026
089afbe to d9cba8f
sky333999
reviewed
Apr 23, 2026
d6e91d4 to 6af180b
sky333999
reviewed
Apr 28, 2026
    }
    qc.mu.RUnlock()

    qc.mu.Lock()
Contributor
🤖 :
The write lock is held for the entire miss path, including the blocking HTTP calls. The tests run sequentially today (no t.Parallel()), but the design is fragile: if anyone enables t.Parallel() on these subtests, the cache serializes every miss across all metrics, not just misses for the same metric.
The intra-metric fan-out via qc.sem also executes inside the held lock, so the semaphore's inter-metric budget is effectively unused.
Fix options:
- Guard the miss path with per-metric sentinels (e.g. golang.org/x/sync/singleflight) so different metric names proceed concurrently.
- Or release the lock after adding a placeholder entry, do the HTTP work, then re-acquire to store.
Either approach keeps the RLock fast path on cache hits intact.
Contributor
🤖 :
Once the comment above is addressed, adding t.Parallel() at the outer TestXxx level would significantly cut wall-clock time: almost every test is a cache-backed read of the same metric set, and the ground-truth client is already guarded by sync.Once.
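For reference, the sync.Once guard mentioned here generally takes the following shape. The names `groundTruth`, `getGroundTruth`, and `gtInits` are hypothetical and the `nodes` value is illustrative; the goroutines stand in for tests that would each call t.Parallel().

```go
package main

import (
	"fmt"
	"sync"
)

// groundTruth is a hypothetical stand-in for the K8s ground-truth client.
type groundTruth struct{ nodes int }

var (
	gtOnce   sync.Once
	gtClient *groundTruth
	gtInits  int // counts constructor runs, for illustration only
)

// getGroundTruth lazily builds the shared client. sync.Once guarantees the
// constructor runs a single time even when callers race.
func getGroundTruth() *groundTruth {
	gtOnce.Do(func() {
		gtInits++
		gtClient = &groundTruth{nodes: 2} // illustrative value
	})
	return gtClient
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = getGroundTruth()
		}()
	}
	wg.Wait()
	fmt.Println("constructor runs:", gtInits)
}
```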
Contributor
Author
Fixed, lemme know if this works
b3f595c to a129439
Port OTEL integration tests into the agent test framework for the
standard EKS cluster (2x t3.medium). Tests validate metric correctness
by querying the CloudWatch PromQL API with SigV4 auth and cross-
validating against K8s API ground truth.
Components:
- util/otelmetrics/: Shared PromQL client library with SigV4 signing,
  query cache, ZIP-0006 label parsing, and rate limiting
- terraform/eks/daemon/otel/: Ephemeral EKS cluster with Helm chart
  (release-6.1.0), Pod Identity, and nginx test workload
- test/otel/standard/: 145 integration tests covering cadvisor,
  node_exporter, kubeletstats, KSM, control plane, dedup, resolution,
  host validation, and cross-source label correctness
- generator/test_case_generator.go: Added to eks_daemon matrix
Validated in CI: 145/145 PASS via existing EKSIntegrationTest workflow.
a129439 to e39240b
sky333999
approved these changes
Apr 28, 2026
Workflow Run: https://github.com/aws/amazon-cloudwatch-agent/actions/runs/24726297813