Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding a new metric: Application-level MillisBehindLatest #868

Merged
merged 4 commits into from
Dec 6, 2021

Conversation

QAQJ
Copy link
Contributor

@QAQJ QAQJ commented Nov 23, 2021

Feature request discussion: #249

According the document, MillisBehindLatest metric is used for monitoring how far the KCL application is behind to the latest record in the shard it's working with, and this is a shard-level metric. However a KCL application may work with multiple shards (or even streams), and as per customer's request, an application-level MillisBehindLatest metric would be helpful so that they only need to setup 1 alarm for monitoring the latencies of their KCL application, instead of setting an alarm for each shard.

This PR implements the application-level MillisBehindLatest metric, simply by creating a new application-level metrics scope, along with the existing shard-level metrics scope, in ProcessTask, and use it to collect and publish the MillisBehindLatest data. This scope doesn't have ShardId or StreamId as its dimensions, so all the MillisBehindLatest will be published to a general metric in CloudWatch.

To view this metric, customer can log into their AWS console -> CloudWatch -> All Metrics -> name-of-their-application under Custom Namespaces section -> Operation -> then they should see the metric like below.
Screen Shot 2021-11-22 at 4 54 38 PM

The metric level is set to SUMMARY, same as shard-level MillisBehindLatest metric.

@@ -45,6 +45,7 @@
@KinesisClientInternalApi
public class ProcessTask implements ConsumerTask {
private static final String PROCESS_TASK_OPERATION = "ProcessTask";
private static final String APPLICATION_LEVEL_METRICS = "ApplicationLevelMetrics";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming: Read through the scope of ProcessTask.java and MetricsUtil.java. For consistency, let's keep the naming *_METRIC reserved for data fields. What we want to achieve here is a different scope hence it cannot be categorized as *_METRIC. You could either rename to operation or scope.
Suggestion: either use OPERATION or SCOPE

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the public documentation and found "ApplicationLevelMetrics" not descriptive enough. This new metric we are adding belongs to https://docs.aws.amazon.com/streams/latest/dev/monitoring-with-kcl.html#kcl-metrics-per-app. Refer to this naming convention on Operation field, and get feedback from PM on the naming.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I showed a demo to the PM previously, but yeah I agree, will talk with the PM to confirm the names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After talking with Nihar, we decided to use ApplicationTracker as the operation, and the code is updated now.

@avahuang0429
Copy link
Contributor

General comment, can we try test the correctness of the metrics? I tried to look for test cases but there's none. Setting up brand new test case could be an option, but evaluate either that or manual testing

@QAQJ
Copy link
Contributor Author

QAQJ commented Dec 2, 2021

@avahuang0429 I do have a testing script in my personal repo: https://github.com/QAQJ/amazon-kinesis-client/blob/dev-jiqilin/amazon-kinesis-client/src/test/java/software/amazon/kinesis/ApplicationTest.java, which runs a 2-stream application and the metric can be published to the cloudwatch. However the concern is that I wasn't able to create any latency to that application, so all MillsBehindLatest (including all shards & app-level) are all 0 my test. But theoretically, if any latency happens in any shard, by go to Graphed Metrics and choose "Maximum" in the Statistic dimension, like the second screenshot, customer should be able to see it.
Screen Shot 2021-12-02 at 10 58 08 AM
Screen Shot 2021-12-02 at 11 04 22 AM
o

Copy link
Contributor

@avahuang0429 avahuang0429 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, make sure to paste offline comments + retry build till pass before merging

@QAQJ
Copy link
Contributor Author

QAQJ commented Dec 6, 2021

In my personal branch I've manually created some MillisBehindLatest latency: https://github.com/QAQJ/amazon-kinesis-client/tree/dev-jiqilin
When we set the Statistics to Average the metrics will show as the first image, we can see the ApplicationTracker.MillisBehindLatest is the average of the 2 shards.
Screen Shot 2021-12-06 at 8 47 13 AM
When we set the Statistics to Maximum, ApplicationTracker.MillisBehindLatest will be the same as the shard with largest latency.
Screen Shot 2021-12-06 at 8 47 27 AM
Thus, KCL users can decide which statistic they want for their alarms.

@QAQJ QAQJ closed this Dec 6, 2021
@QAQJ QAQJ reopened this Dec 6, 2021
@avahuang0429 avahuang0429 merged commit bedae95 into awslabs:master Dec 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants