Adding a new metric: Application-level MillisBehindLatest #868

QAQJ · 2021-11-23T01:06:54Z

Feature request discussion: #249

According to the documentation, the MillisBehindLatest metric monitors how far the KCL application is behind the latest record in the shard it is working with, and it is a shard-level metric. However, a KCL application may work with multiple shards (or even streams). Per a customer request, an application-level MillisBehindLatest metric would be helpful so that they only need to set up one alarm to monitor the latency of their KCL application, instead of setting an alarm for each shard.

This PR implements the application-level MillisBehindLatest metric by creating a new application-level metrics scope in ProcessTask, alongside the existing shard-level metrics scope, and using it to collect and publish the MillisBehindLatest data. This scope has neither ShardId nor StreamId as a dimension, so all MillisBehindLatest values are published to a single general metric in CloudWatch.
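
The two-scope idea can be illustrated with a minimal, self-contained sketch. It does not use the KCL or CloudWatch APIs: `ScopeSketch`, `publish`, and `recordMillisBehindLatest` are hypothetical names, the shard IDs are made up, and the map only stands in for a metrics backend. The operation names `ProcessTask` and `ApplicationTracker` are the ones discussed in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a metrics backend; this is NOT the KCL or CloudWatch API.
class ScopeSketch {
    // Metric key -> recorded values. The key is the operation plus any dimensions.
    final Map<String, List<Long>> published = new TreeMap<>();

    // Records one data point; a null shardId means the scope has no ShardId dimension.
    void publish(String operation, String shardId, String metricName, long value) {
        String dimension = (shardId == null) ? "" : "/ShardId=" + shardId;
        String key = operation + dimension + "/" + metricName;
        published.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Each reading goes to the shard-level scope and to the dimensionless application-level scope.
    void recordMillisBehindLatest(String shardId, long millisBehindLatest) {
        publish("ProcessTask", shardId, "MillisBehindLatest", millisBehindLatest);
        publish("ApplicationTracker", null, "MillisBehindLatest", millisBehindLatest);
    }

    public static void main(String[] args) {
        ScopeSketch sketch = new ScopeSketch();
        sketch.recordMillisBehindLatest("shardId-000000000000", 1200L); // illustrative values
        sketch.recordMillisBehindLatest("shardId-000000000001", 300L);
        sketch.published.forEach((key, values) -> System.out.println(key + " -> " + values));
    }
}
```

Because the application-level key carries no shard dimension, readings from every shard land in the same series, which is what makes a single alarm possible.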

To view this metric, customers can log into the AWS console -> CloudWatch -> All Metrics -> the name of their application under the Custom Namespaces section -> Operation, where they should see the metric.

The metric level is set to SUMMARY, the same as the shard-level MillisBehindLatest metric.

avahuang0429 · 2021-12-01T19:51:39Z

amazon-kinesis-client/src/main/java/software/amazon/kinesis/lifecycle/ProcessTask.java

@@ -45,6 +45,7 @@
 @KinesisClientInternalApi
 public class ProcessTask implements ConsumerTask {
    private static final String PROCESS_TASK_OPERATION = "ProcessTask";
+    private static final String APPLICATION_LEVEL_METRICS = "ApplicationLevelMetrics";


Naming: Read through the scope of ProcessTask.java and MetricsUtil.java. For consistency, let's keep the *_METRIC naming reserved for data fields. What we want here is a different scope, so it should not be categorized as *_METRIC.
Suggestion: rename it to use either OPERATION or SCOPE.

I read the public documentation and found "ApplicationLevelMetrics" not descriptive enough. The new metric we are adding belongs under https://docs.aws.amazon.com/streams/latest/dev/monitoring-with-kcl.html#kcl-metrics-per-app. Please follow the naming convention used for the Operation field there, and get feedback from the PM on the name.

I showed a demo to the PM previously, but I agree. I will talk with the PM to confirm the names.

After talking with Nihar, we decided to use ApplicationTracker as the operation, and the code is updated now.

avahuang0429 · 2021-12-01T20:43:05Z

General comment: can we test the correctness of the metrics? I looked for existing test cases but found none. Setting up a brand-new test case could be an option; please evaluate that against manual testing.

QAQJ · 2021-12-02T19:07:01Z

@avahuang0429 I do have a test script in my personal repo: https://github.com/QAQJ/amazon-kinesis-client/blob/dev-jiqilin/amazon-kinesis-client/src/test/java/software/amazon/kinesis/ApplicationTest.java, which runs a 2-stream application and publishes the metric to CloudWatch. However, my concern is that I wasn't able to create any latency in that application, so all MillisBehindLatest values (for every shard and at the app level) are 0 in my test. In theory, if latency occurs in any shard, the customer should be able to see it by going to Graphed Metrics and choosing "Maximum" as the Statistic, as in the second screenshot.

avahuang0429

LGTM. Make sure to paste the offline comments and retry the build until it passes before merging.

QAQJ · 2021-12-06T19:46:47Z

In my personal branch I've manually created some MillisBehindLatest latency: https://github.com/QAQJ/amazon-kinesis-client/tree/dev-jiqilin
When we set the Statistic to Average, the metrics appear as in the first image: ApplicationTracker.MillisBehindLatest is the average of the 2 shards.

When we set the Statistic to Maximum, ApplicationTracker.MillisBehindLatest is the same as the value for the shard with the largest latency.

Thus, KCL users can decide which statistic they want for their alarms.
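
As a small worked example of the difference between the two statistics, here is a sketch that summarizes the data points of one period. The values are made up, `StatisticSketch` and `summarize` are hypothetical names, and it assumes each of the 2 shards contributes exactly one data point to the application-level metric in that period.

```java
import java.util.LongSummaryStatistics;
import java.util.stream.LongStream;

// Hypothetical illustration only; the values below are not taken from the real test run.
class StatisticSketch {
    // Summarizes all data points published to the dimensionless metric in one period.
    static LongSummaryStatistics summarize(long... dataPoints) {
        return LongStream.of(dataPoints).summaryStatistics();
    }

    public static void main(String[] args) {
        // One MillisBehindLatest reading from each of two shards (assumed values).
        LongSummaryStatistics stats = summarize(4000L, 1000L);
        System.out.println("Average: " + stats.getAverage()); // mean across both shards
        System.out.println("Maximum: " + stats.getMax());     // the slowest shard
    }
}
```

Under these assumptions, an alarm on Maximum tracks the worst shard, while an alarm on Average can hide one lagging shard behind several healthy ones.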

QAQJ added 2 commits November 22, 2021 16:10

first commit for app-level mills_behind_latest metric

b43c144

fix some naming convention issue

66e5dfe

avahuang0429 reviewed Dec 1, 2021

add some comments to explain the metric scopes

a2e0269

change the operation name for the metric

e220791

avahuang0429 approved these changes Dec 6, 2021

QAQJ closed this Dec 6, 2021

QAQJ reopened this Dec 6, 2021

avahuang0429 merged commit bedae95 into awslabs:master Dec 6, 2021
