Description
Describe the bug
When publishing a detailed metric into CloudWatch using CloudWatchMetricPublisher, cloudWatchClient.putMetricData API throws a request validation exception:
WARN cloudwatch:117 - Failed while publishing some or all AWS SDK client-side metrics to CloudWatch.
java.util.concurrent.CompletionException: software.amazon.awssdk.services.cloudwatch.model.InvalidParameterValueException: The collection MetricData.member.13.Values must not have a size greater than 150. (Service: CloudWatch, Status Code: 400, Request ID: a84e1dcb-bfba-495b-97f5-60d7316c1800)
I believe the cause is that a single metric carries more than 150 values, though I can't confirm this because the validation happens on the AWS service side.
This limitation seems to be documented here:
For example, a single PutMetricData call can include 20 metrics and 150 data points.
Expected behavior
The client library should build the request with this limit in mind, either by sending the additional data points in later requests or by dropping them.
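As a rough illustration of the "send in later requests" option, the sketch below splits a list of values into chunks of at most 150. It is not the SDK's implementation: `ValueChunker`, `chunk`, and `MAX_VALUES_PER_DATUM` are hypothetical names, and the 150 cap is taken from the error message above.

```java
import java.util.ArrayList;
import java.util.List;

public final class ValueChunker {
    // Assumed per-datum cap, taken from the InvalidParameterValueException message.
    public static final int MAX_VALUES_PER_DATUM = 150;

    private ValueChunker() {
    }

    // Splits values into consecutive chunks of at most maxSize elements each.
    public static <T> List<List<T>> chunk(List<T> values, int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < values.size(); start += maxSize) {
            int end = Math.min(start + maxSize, values.size());
            // Copy the sub-list so each chunk is independent of the input list.
            chunks.add(new ArrayList<>(values.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            values.add(i);
        }
        List<List<Integer>> chunks = chunk(values, MAX_VALUES_PER_DATUM);
        System.out.println(chunks.size() + " chunks, last chunk has "
                + chunks.get(chunks.size() - 1).size() + " values");
    }
}
```

Each chunk could then go into its own metric datum, so no single `Values` collection exceeds the cap.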
Current behavior
cloudWatchClient.putMetricData fails and a logging message is recorded.
All metric values for this high QPS service are lost.
WARN cloudwatch:117 - Failed while publishing some or all AWS SDK client-side metrics to CloudWatch.
java.util.concurrent.CompletionException: software.amazon.awssdk.services.cloudwatch.model.InvalidParameterValueException: The collection MetricData.member.13.Values must not have a size greater than 150. (Service: CloudWatch, Status Code: 400, Request ID: a84e1dcb-bfba-495b-97f5-60d7316c1800)
Steps to Reproduce
I haven't validated this, but I believe the following should trigger the error:
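import java.time.Duration;
import software.amazon.awssdk.core.metrics.CoreMetric;
import software.amazon.awssdk.metrics.MetricCollector;
import software.amazon.awssdk.metrics.publishers.cloudwatch.CloudWatchMetricPublisher;

// asyncClient is an existing CloudWatchAsyncClient.
var publisher = CloudWatchMetricPublisher.builder()
        .namespace("Test")
        .cloudWatchClient(asyncClient)
        .detailedMetrics(CoreMetric.API_CALL_DURATION)
        .dimensions(CoreMetric.OPERATION_NAME)
        .build();
for (int i = 0; i < 1000; i++) {
    MetricCollector methodCollector = MetricCollector.create("RPC");
    // API_CALL_DURATION is an SdkMetric<Duration>, so report a Duration rather than an int.
    methodCollector.reportMetric(CoreMetric.API_CALL_DURATION, Duration.ofMillis(i));
    methodCollector.reportMetric(CoreMetric.OPERATION_NAME, "YourRPC");
    publisher.publish(methodCollector.collect());
}
// Wait for the periodic metric flush
Thread.sleep(120 * 1000);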
Possible Solution
software/amazon/awssdk/metrics/publishers/cloudwatch/internal/transform/MetricCollectionAggregator.java:128
could be updated from:
MetricDatum data = detailedMetricDatum(timeBucket, detailedAggregator, startIndex, MAX_VALUES_PER_REQUEST - valuesInRequestCounter.get());
To something like:
MetricDatum data = detailedMetricDatum(timeBucket, detailedAggregator, startIndex, Math.min(MAX_VALUES_PER_REQUEST - valuesInRequestCounter.get(), 150));
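To make the arithmetic of the proposed cap concrete, here is a small standalone sketch. `DatumValueBudget` and `valuesAllowed` are hypothetical names, not SDK code. The per-request limit is passed in as a parameter because this report does not state the actual value of `MAX_VALUES_PER_REQUEST`. The sketch also clamps the remaining budget at zero, which the one-line change above does not do.

```java
public final class DatumValueBudget {
    // Assumed per-datum cap, taken from the error message in this report.
    public static final int MAX_VALUES_PER_DATUM = 150;

    private DatumValueBudget() {
    }

    // Returns how many values the next datum may carry: the remaining
    // per-request budget, capped at the per-datum limit and never negative.
    public static int valuesAllowed(int maxValuesPerRequest, int valuesAlreadyInRequest) {
        int remainingInRequest = Math.max(0, maxValuesPerRequest - valuesAlreadyInRequest);
        return Math.min(remainingInRequest, MAX_VALUES_PER_DATUM);
    }
}
```

With a hypothetical per-request limit of 300, an empty request would allow 150 values in the next datum, and a request that already holds 200 values would allow 100.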
Context
Trying to implement metrics to monitor the performance of my API server using custom CloudWatch metrics.
AWS Java SDK version used
17
JDK version used
Whatever is in docker image openjdk:17-oracle
Operating System and version
Docker image: openjdk:17-oracle linux-oracle?