Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-7255. Add metrics for container reports events #3936

Merged
merged 5 commits into from Nov 16, 2022

Conversation

sumitagrawl
Copy link
Contributor

What changes were proposed in this pull request?

Added metrics for

  • FCR/ICR report waiting in queue crosses threshold
  • FCR/ICR End to End execution time crosses threshold

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7255

How was this patch tested?

Tested using Unit case and count is verified

@kerneltime
Copy link
Contributor

Will these metrics be exposed via prometheus?

@sumitagrawl
Copy link
Contributor Author

sumitagrawl commented Nov 7, 2022

Will these metrics be exposed via prometheus?

Using JMX endpoint, its exposed as below,
image

Using prometheus, its exposed, but coming only for FCR, But not coming for ICR as metric name is created based on class name, this is old problem and will be tracked as part of JIRA https://issues.apache.org/jira/browse/HDDS-7464

@sumitagrawl
Copy link
Contributor Author

@ChenSammi @nandakumar131 Plz review
@devmadhuu Plz review

@@ -490,7 +490,8 @@ public final class ScmConfigKeys {
public static final String OZONE_SCM_EVENT_CONTAINER_REPORT_THREAD_POOL_SIZE =
OZONE_SCM_EVENT_PREFIX + "ContainerReport.thread.pool.size";
public static final int OZONE_SCM_EVENT_THREAD_POOL_SIZE_DEFAULT = 10;

public static final int OZONE_SCM_EVENT_REPORT_QUEUE_WAIT_DEFAULT = 60000;
public static final int OZONE_SCM_EVENT_REPORT_EXEC_WAIT_DEFAULT = 120000;
Copy link
Contributor

@ChenSammi ChenSammi Nov 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OZONE_SCM_EVENT_REPORT_EXEC_WAIT_DEFAULT -> OZONE_SCM_EVENT_REPORT_LONG_EXEC_DEFAULT?
Also please add some comment about the time unit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment, its in milli second

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest to add threshold in the name, that is

OZONE_SCM_EVENT_REPORT_QUEUE_WAIT_DEFAULT -> OZONE_SCM_EVENT_REPORT_QUEUE_WAIT_THRESHOLD_DEFAULT

OZONE_SCM_EVENT_REPORT_EXEC_WAIT_DEFAULT -> OZONE_SCM_EVENT_REPORT_EXEC_WAIT_THRESHOLD_DEFAULT

OZONE_SCM_EVENT_REPORT_QUEUE_WAIT_DEFAULT);
long execWaitThreshold = ozoneConfiguration.getInt(
ScmUtils.getContainerReportConfPrefix() + ".execute.wait.threshold",
OZONE_SCM_EVENT_REPORT_EXEC_WAIT_DEFAULT);
Copy link
Contributor

@ChenSammi ChenSammi Nov 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will be the full name of these two property?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added in comment

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add the comments in StorageContainerManager.java too?

@ChenSammi
Copy link
Contributor

LGTM + 1. Thanks @sumitagrawl.

@ChenSammi ChenSammi merged commit b46f961 into apache:master Nov 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants