
Metrics #23

Closed
millems opened this issue Jul 3, 2017 · 27 comments
Labels
1.x Parity feature-request A feature should be added or improved.

Comments

@millems
Contributor

millems commented Jul 3, 2017

Review the inherited state of V1 metrics and determine which changes are necessary for V2.

(Feel free to comment on this issue with desired changes).

@brharrington
Contributor

I'm curious whether this has gotten any more thought. Is there a current recommendation?

For our use cases, we mostly care about request volume and latency. I found that I can get most of what I care about using the pluggable HTTP client support: I'm creating a wrapper for an SdkHttpClient or SdkAsyncHttpClient that captures basic metrics about the requests. The main gap is that I don't see a good way to get the following information, which was easily accessible from the V1 metrics collector:

  • Service name (e.g., EC2)
  • Request type (e.g., DescribeInstances)
  • AWS error code (we get the HTTP status code, but it doesn't always map to the AWS error as we would expect)

For synchronous clients we have a hacky way to get the first two by looking at the stacktrace and seeing what the entry point to the SDK was. For async clients that is not viable. Is there a better way to access that type of information at the HTTP client layer?

More generally, is there a better integration point that could be used?

@millems
Contributor Author

millems commented Feb 22, 2019

@brharrington

For metrics like request volume and latency, the ExecutionInterceptor is probably the best integration point for a custom metrics implementation.

@brharrington
Contributor

Thanks for the pointer @millems. Looks like it solves most of my problems.

@millems
Contributor Author

millems commented Feb 23, 2019

@brharrington LMK if you have any questions.

@justnance justnance added feature-request A feature should be added or improved. and removed Feature Request labels Apr 19, 2019
@millems millems added this to Backlog (Not Ordered) in New Features (Public) via automation Jul 8, 2019
@millems millems changed the title Refactor: Metrics Metrics Jul 8, 2019
@millems millems moved this from Backlog (Not Ordered) to Backlog (Ordered) in New Features (Public) Jul 8, 2019
@millems millems moved this from Backlog (Ordered) to In Progress in New Features (Public) Jul 8, 2019
@swaranga

@millems What is the right way to log latencies for a request using the interceptor API? One way could be to store the start timestamp in the ExecutionAttributes instance during a beforeExecution call, then in the afterExecution method get the start time back from the attributes, calculate the elapsed time, and log the latency.

Is that the recommended approach?

@millems
Contributor Author

millems commented Sep 10, 2019

@swaranga That's the recommended way to log the latency for the entire execution (which includes marshalling and retries), yes.
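
A minimal sketch of that approach, using an ExecutionAttribute to carry the start time between callbacks (the class and attribute names here are illustrative, not part of the SDK):

import java.time.Duration;
import java.time.Instant;
import software.amazon.awssdk.core.interceptor.Context;
import software.amazon.awssdk.core.interceptor.ExecutionAttribute;
import software.amazon.awssdk.core.interceptor.ExecutionAttributes;
import software.amazon.awssdk.core.interceptor.ExecutionInterceptor;

public class LatencyLoggingInterceptor implements ExecutionInterceptor {
    // Attribute key used to carry the start time between callbacks.
    private static final ExecutionAttribute<Instant> START_TIME =
            new ExecutionAttribute<>("LatencyLoggingStartTime");

    @Override
    public void beforeExecution(Context.BeforeExecution context,
                                ExecutionAttributes executionAttributes) {
        // Record the start of the entire execution (marshalling and retries included).
        executionAttributes.putAttribute(START_TIME, Instant.now());
    }

    @Override
    public void afterExecution(Context.AfterExecution context,
                               ExecutionAttributes executionAttributes) {
        Instant start = executionAttributes.getAttribute(START_TIME);
        Duration elapsed = Duration.between(start, Instant.now());
        // Swap in your metrics library of choice here.
        System.out.println("SDK execution took " + elapsed.toMillis() + " ms");
    }
}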

@swaranga

swaranga commented Sep 10, 2019

@millems One problem with this approach is that if I want to execute a different interceptor per request, so that I can tie the metrics logging to my parent unit of work, I cannot do so, since there is no way to add an interceptor to a request object.

This is useful, for instance, when I am serving a request for my callers and, as part of that, I need to call an AWS API; I would like to log metrics for that particular call along with the other metrics for the incoming request to my service. I could probably achieve this using hacky ThreadLocals, but that makes the whole code brittle to refactoring.

This could be addressed by adding support for per-request interceptors, but assuming both the SDK-level and request-level interceptors are executed for a given request, that may not be the most optimal API design for metrics logging. Here is why:

Consider an application that processes messages from a Kinesis stream using the KCL library, transforms them, and writes them to a different Kinesis stream. In such a case, I will want to use the same SDK client instance for both the read and the write, as recommended by the SDK docs. Since I have no way of assigning a request-level interceptor to the GetRecords call made by the KCL (the call is made directly from KCL code), I will have to add an SDK-level interceptor. At the same time, I will also want to add a request-level interceptor for the part of my code that performs the Kinesis write, so that the write call's metrics are tied to my unit of work: the processing of each message.

In such a setup, if both interceptors are executed, I will have incorrect metrics logged (call counts, failures, average latency, etc.). I thus think the interceptor API is too low-level to be useful for flexible metrics logging, and I ask that you reconsider the design you had in the V1 SDK, which executed only one RequestMetricCollector even if collectors were configured at multiple levels (HTTP client, SDK, request instance, in order of preference).

A dedicated API, purpose-built for metrics logging, is definitely needed, and it should be addressed before the V2 SDK becomes more ubiquitous.

@millems
Contributor Author

millems commented Sep 13, 2019

We agree that the interceptor API isn't optimal for someone who just cares about metrics. There will be a separate interface for metrics. Here are our latest thoughts: #1304 #1362

@swaranga

Thank you, I have added my comments on the design PR

@jeffsteinmetz

jeffsteinmetz commented Oct 11, 2019

An incremental implementation that at least adds JVM memory, JVM thread, and OS file descriptor metrics would be beneficial while the interfaces to the AWS-specific metrics are worked out.

@swaranga

@jeffsteinmetz Why should the SDK provide metrics on your application's JVM memory usage, file descriptors, and threads? Do you mean memory, file descriptors, and threads used by the SDK itself? If so, threads might be doable, but I am not sure about memory usage. For file descriptors, maybe it can provide the number of open HTTP connections from the SDK, but beyond that I am not sure this should be in scope.

@jeffsteinmetz

@swaranga, feature parity with the 1.x Java SDK, which provides JVM and machine metrics (see the sketch at the end of this comment). We use this in the 1.x Java SDK;

see
https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/metrics/package-summary.html

and

https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/generating-sdk-metrics.html

Machine Metrics

Memory

  • TotalMemory - Total amount of memory currently available to the JVM for current and future objects, measured in bytes. This value may vary over time, depending on the host environment.
  • FreeMemory - An approximation of the total amount of memory currently available to the JVM for future allocated objects, measured in bytes.
  • UsedMemory - TotalMemory minus FreeMemory.
  • SpareMemory - The maximum amount of memory that the JVM will attempt to use, measured in bytes, minus UsedMemory.

Threads

  • ThreadCount - The current number of live threads, including both daemon and non-daemon threads.
  • DeadLockThreadCount - The number of threads that are deadlocked waiting for object monitors or ownable synchronizers, if any. Threads are deadlocked in a cycle waiting for a lock of these two types if each thread owns one lock while trying to acquire another lock already held by another thread in the cycle. No metric is generated when the value is zero.
  • DaemonThreadCount - The current number of live daemon threads. No metric is generated when the value is zero.
  • PeakThreadCount - The peak live thread count since the JVM started or since the peak was reset.
  • TotalStartedThreadCount - The total number of threads created and started since the JVM started.

File Descriptors

  • OpenFileDescriptorCount - Number of open file descriptors of the operating system.
  • SpareFileDescriptorCount - Maximum number of file descriptors of the operating system minus OpenFileDescriptorCount.

and from the 1.x docs

The AWS SDK for Java can generate metrics for visualization and monitoring with CloudWatch that measure:

  • your application's performance when accessing AWS
  • the performance of your JVMs when used with AWS
  • runtime environment details such as heap memory, number of threads, and open file descriptors
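
All of the machine metrics listed above are readable from the standard JVM management APIs, so they can be collected today even without SDK support. A rough sketch (not SDK code; the class name is illustrative):

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.ThreadMXBean;

public class MachineMetricsSnapshot {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long totalMemory = rt.totalMemory();                      // TotalMemory
        long freeMemory = rt.freeMemory();                        // FreeMemory
        long usedMemory = totalMemory - freeMemory;               // UsedMemory
        long spareMemory = rt.maxMemory() - usedMemory;           // SpareMemory

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        int threadCount = threads.getThreadCount();               // ThreadCount
        long[] deadlocked = threads.findDeadlockedThreads();      // DeadLockThreadCount (null if none)
        int daemonThreads = threads.getDaemonThreadCount();       // DaemonThreadCount
        int peakThreads = threads.getPeakThreadCount();           // PeakThreadCount
        long totalStarted = threads.getTotalStartedThreadCount(); // TotalStartedThreadCount

        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            long openFds = unix.getOpenFileDescriptorCount();          // OpenFileDescriptorCount
            long spareFds = unix.getMaxFileDescriptorCount() - openFds; // SpareFileDescriptorCount
        }
    }
}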

@millems
Contributor Author

millems commented Oct 14, 2019

+1, I could see it being nice to get JVM metrics into CloudWatch. It's more of a "Java on AWS" feature than a "Java SDK" feature, but it was included in the Java SDK 1.11.x, so it makes sense as a feature request.

@swaranga

Understood, although there seems to be no new development on the related PRs. Is this still a priority?

@millems
Contributor Author

millems commented Oct 15, 2019

We recognize that this feature is a blocker for many customers migrating from 1.x to 2.x. We still believe it is important and intend to implement it.

@debora-ito
Member

Feature request from v1: Allow a user to add a JVM shutdown hook to drain metrics queue - aws/aws-sdk-java#1296

@gigiigig

Hello! Do you have any ETA for this? Thanks.

@millems
Contributor Author

millems commented May 15, 2020

@gigiigig Sorry, we unfortunately can't provide ETAs, but we will keep this issue up to date as we progress.

@dagnir
Contributor

dagnir commented Jul 7, 2020

Hi all,

We're happy to announce that the preview of the client-side metrics feature is now released! You can try it out starting with version 2.13.52. Please note that if you want to use the cloudwatch-metric-publisher module, the version to use is 2.13.52-PREVIEW. For more information, please take a look at our announcement blog post.

@swaranga

swaranga commented Jul 8, 2020

From the blog post, it seems the metrics publisher is configured at the SDK client level. Can I also attach one at the per-request level? The V1 SDK had the option to override it per request, which allowed us to publish the metrics for the request along with the current unit of work.

Is that feasible?

@dagnir
Contributor

dagnir commented Jul 8, 2020

@swaranga Yep, absolutely. You can set publishers at the request level as well, using the RequestOverrideConfiguration:

GetObjectRequest.builder()
                ...
                .overrideConfiguration(o -> o.addMetricPublisher(myPublisher))
                .build();
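
For comparison, a publisher configured at the client level (the setup the blog post describes) looks roughly like this sketch, assuming an S3 client and a publisher named myPublisher:

S3Client s3 = S3Client.builder()
                      .overrideConfiguration(o -> o.addMetricPublisher(myPublisher))
                      .build();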

@gmariotti

Is there any plan to provide a Micrometer-based implementation of the metric publisher?

@dagnir
Contributor

dagnir commented Jul 8, 2020

Hi @gmariotti we don't have any plans currently for a Micrometer implementation.

@dagnir
Contributor

dagnir commented Sep 2, 2020

Hi everyone, the Metrics feature is now GA, so I'm going to go ahead and close this issue. Please check it out, and feel free to open an issue for any problems or feedback you have. https://aws.amazon.com/blogs/developer/client-side-metrics-for-the-aws-sdk-for-java-v2-is-now-generally-available/

@dagnir dagnir closed this as completed Sep 2, 2020
New Features (Public) automation moved this from In Progress to Done Sep 2, 2020
@rehevkor5

It's possible to implement your own MetricPublisher to produce metrics in Micrometer format. However, there are some wrinkles due to differences between CloudWatch metrics and the approach used by, e.g., Prometheus. Here are some details that people might find helpful.

Duration type metrics are directly convertible to Timer.

Boolean type metrics can be converted into two Counters: for example one which increments for true and one which increments for false.

Number type metrics can be tricky. Some of them (such as RetryCount) can be recorded as a Counter (which increments with each given value). However, metrics such as HttpMetric#AVAILABLE_CONCURRENCY, HttpMetric#LEASED_CONCURRENCY, and HttpMetric#PENDING_CONCURRENCY_ACQUIRES behave more like gauges. They are also difficult because they are only sent (via the MetricCollection passed to MetricPublisher#publish()) if a request was made. Therefore, if you treat those metrics as a Gauge, they will become "stuck" at their last value until another request occurs.

CloudWatch aggregates metrics like this once per minute, recording the sum of the values and the count of how many values were given (as well as the min and max value). If the metric isn't reported, CloudWatch seems to assume that the value was zero during that time. Therefore, what you can get from CloudWatch is the average value (by dividing sum / count) or the min/max of the metric in each time period. We could record these as a DistributionSummary if we need more detail (though it might be somewhat misleading). However, a simpler and more reliable way is to imitate the CloudWatch approach and record the value sum as a Counter along with the value count as its own Counter. This allows us to perform calculations such as rate(...sum_count{...}[...]) / rate(...sample_count{...}[...]) (or, equivalently, increase() / increase()), which basically give us the average value for the time range. It's not necessarily 100% accurate when no requests are happening, but at least it will show zero instead of the last value. I'm about to implement this approach; hopefully it works as expected.
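
A minimal sketch of that sum-plus-count strategy as a MetricPublisher backed by Micrometer (the class name and metric-name prefix are illustrative; tagging and error handling are omitted):

import java.time.Duration;
import io.micrometer.core.instrument.MeterRegistry;
import software.amazon.awssdk.metrics.MetricCollection;
import software.amazon.awssdk.metrics.MetricPublisher;
import software.amazon.awssdk.metrics.MetricRecord;

public class MicrometerMetricPublisher implements MetricPublisher {
    private final MeterRegistry registry;

    public MicrometerMetricPublisher(MeterRegistry registry) {
        this.registry = registry;
    }

    @Override
    public void publish(MetricCollection metrics) {
        for (MetricRecord<?> record : metrics) {
            String name = "aws.sdk." + record.metric().name();
            Object value = record.value();
            if (value instanceof Duration) {
                // Duration metrics map directly onto a Micrometer Timer.
                registry.timer(name).record((Duration) value);
            } else if (value instanceof Boolean) {
                // Boolean metrics become two counters (true/false), as noted above.
                registry.counter(name + "." + value).increment();
            } else if (value instanceof Number) {
                // Record sum and sample count as two counters, mirroring
                // the CloudWatch-style aggregation described above.
                registry.counter(name + ".sum").increment(((Number) value).doubleValue());
                registry.counter(name + ".count").increment();
            }
        }
        // Per-attempt metrics live in child collections; walk them too.
        metrics.children().forEach(this::publish);
    }

    @Override
    public void close() {
    }
}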

@ginkel

ginkel commented Dec 15, 2022

@rehevkor5 Has your attempt to expose metrics to Micrometer been successful? Thanks!

@PatrykGala

@rehevkor5 @ginkel this is my solution: https://gist.github.com/PatrykGala/e4aec004eb55cd8cbdee328f217771c7
The code creates both a gauge and a counter for Integer metrics; for a given metric, one of the two may be unnecessary.
