[AMORO-1766] Implement table self-optimizing metric collection by huyuanfeng2018 · Pull Request #1913 · apache/amoro

huyuanfeng2018 · 2023-09-04T11:48:50Z

Why are the changes needed?

Close #1766

Brief change log

Add self-optimizing-metric indicator collection logic

How was this patch tested?

Add some test cases that check the changes thoroughly including negative and positive cases if possible
Add screenshots for manual tests if appropriate
Run test locally before making a pull request

Documentation

Does this pull request introduce a new feature? (yes / no)
If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

codecov · 2023-09-04T11:58:07Z

Codecov Report

Attention: 20 lines in your changes are missing coverage. Please review.

Comparison is base (d48c0e0) 52.94% compared to head (fe40238) 52.75%.
Report is 8 commits behind head on master.

Files	Patch %	Lines
...metrics/SelfOptimizingStatusDurationMsContent.java	60.71%	11 Missing ⚠️
...rver/metrics/SelfOptimizingTotalCostMsContent.java	52.63%	9 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##             master    #1913      +/-   ##
============================================
- Coverage     52.94%   52.75%   -0.20%     
- Complexity     3612     4235     +623     
============================================
  Files           465      513      +48     
  Lines         24502    29412    +4910     
  Branches       2340     2853     +513     
============================================
+ Hits          12973    15515    +2542     
- Misses        10510    12653    +2143     
- Partials       1019     1244     +225

Flag	Coverage Δ
core	`53.10% <76.19%> (+0.16%)`	⬆️
trino	`50.87% <ø> (?)`


zhongqishang · 2023-09-05T05:49:47Z

ams/server/src/main/java/com/netease/arctic/server/optimizing/OptimizingQueue.java

          TableOptimizingProcess optimizingProcess = new TableOptimizingProcess(planner);
          LOG.info("{} after plan get {} tasks", tableRuntime.getTableIdentifier(),
              optimizingProcess.getTaskMap().size());
+          SelfOptimizingReport selfOptimizingReport =


This only reports when planner.isNecessary() == true. I think the == false case should also be reported.

Do we need to keep the cost value the same as the one printed in com.netease.arctic.server.optimizing.plan.OptimizingPlanner#planTasks?

LOG.info("{} finish plan, type = {}, get {} tasks, cost {} ns, {} ms",
    tableRuntime.getTableIdentifier(), getOptimizingType(), tasks.size(),
    endTime - startTime, (endTime - startTime) / 1_000_000);

Makes sense.

I'm also not sure; let's see whether anyone else has an opinion on this issue.

I agree: the metric should be reported even when planner.isNecessary() == false.
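The two points raised in this thread (report even when no plan is necessary, and keep the reported cost identical to the logged one) can be sketched as follows. This is a minimal JDK-only illustration, not the PR's code: `PlanOutcome`, `PlanReportingSketch`, and the `emitted` list are hypothetical names standing in for the real report and emitter types. The elapsed time is measured once and the same value feeds both the log line and the report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

// Hypothetical value object; field names are illustrative, not taken from the PR.
final class PlanOutcome {
  final boolean necessary;
  final int taskCount;
  final long costNanos;

  PlanOutcome(boolean necessary, int taskCount, long costNanos) {
    this.necessary = necessary;
    this.taskCount = taskCount;
    this.costNanos = costNanos;
  }

  // Same truncating conversion as (endTime - startTime) / 1_000_000 in the log line.
  long costMs() {
    return TimeUnit.NANOSECONDS.toMillis(costNanos);
  }
}

final class PlanReportingSketch {
  // Stands in for a metrics emitter; every outcome is recorded here.
  final List<PlanOutcome> emitted = new ArrayList<>();

  PlanOutcome planAndReport(boolean necessary, IntSupplier planTasks) {
    long start = System.nanoTime();
    int taskCount = necessary ? planTasks.getAsInt() : 0;
    long costNanos = System.nanoTime() - start; // measured once, reused below

    PlanOutcome outcome = new PlanOutcome(necessary, taskCount, costNanos);
    System.out.printf(
        "finish plan, necessary = %s, get %d tasks, cost %d ns, %d ms%n",
        necessary, taskCount, outcome.costNanos, outcome.costMs());
    emitted.add(outcome); // reported in both the necessary and the unnecessary case
    return outcome;
  }

  public static void main(String[] args) {
    PlanReportingSketch sketch = new PlanReportingSketch();
    sketch.planAndReport(false, () -> 3);
    sketch.planAndReport(true, () -> 3);
    System.out.println("reports emitted: " + sketch.emitted.size());
  }
}
```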

hameizi · 2023-09-05T05:51:38Z

ams/api/src/main/java/com/netease/arctic/ams/api/metrics/SelfOptimizingReport.java

+  @TaggedMetrics.Metric(name = TABLE_OPTIMIZING_COMMIT_DURATION_SECOND)
+  public Counter tableOptimizingCommitDurationSecond() {
+    return this.tableOptimizingCommitDurationSecond;
+  }


I think we should use a timer to record the cost time and duration.

hameizi · 2023-09-05T06:43:05Z

ams/server/src/main/java/com/netease/arctic/server/table/TableRuntime.java

+        selfOptimizingReport.setTargetSnapshotId(optimizingProcess.getTargetSnapshotId());
+        selfOptimizingReport.setOptimizingStatus(optimizingProcess.getStatus().name());
+        selfOptimizingReport.tableOptimizingCommitDurationSecond()
+            .inc(currentStatesDurationSecond());


We can use Timer.time(() -> {//do something}) to calculate the cost time of a certain operation.
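The pattern suggested here, wrapping an operation so that its elapsed time is recorded around the call, can be sketched without any metrics library. `SimpleTimer` below is a hypothetical stand-in written for illustration; it is not the `com.codahale.metrics.Timer` class that the PR imports, and its method names are assumptions.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical stand-in for a metrics timer; not the Dropwizard Timer class.
final class SimpleTimer {
  private final AtomicLong count = new AtomicLong();
  private final AtomicLong totalNanos = new AtomicLong();

  // Runs the operation and records its elapsed time, even if it throws.
  <T> T time(Supplier<T> operation) {
    long start = System.nanoTime();
    try {
      return operation.get();
    } finally {
      totalNanos.addAndGet(System.nanoTime() - start);
      count.incrementAndGet();
    }
  }

  long getCount() {
    return count.get();
  }

  long getTotalNanos() {
    return totalNanos.get();
  }
}

final class SimpleTimerDemo {
  public static void main(String[] args) {
    SimpleTimer commitTimer = new SimpleTimer();
    // The lambda is a placeholder for the commit work being measured.
    String status = commitTimer.time(() -> "committed");
    System.out.println(status + " in " + commitTimer.getTotalNanos() + " ns");
  }
}
```

Compared with incrementing a counter by a precomputed duration, this keeps the measurement next to the operation it describes, so the caller cannot report a stale or mismatched value.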

hameizi · 2023-09-08T03:11:18Z

ams/server/src/main/java/com/netease/arctic/server/optimizing/plan/OptimizingPlanner.java

+    SelfOptimizingPlanDurationReport selfOptimizingPlanDurationReport =
+        new SelfOptimizingPlanDurationReport(tableRuntime.getTableIdentifier().toString());
+    Timer timer = selfOptimizingPlanDurationReport.tableOptimizingPlanDuration();
+    Timer.Context context = timer.time();


I think we can use selfOptimizingPlanDurationReport.tableOptimizingPlanDuration().time(planTasks()) to collect the statistics. That way we don't have to intrude on the internal logic of the planTasks() method. What do you think?

How should we treat this branch? Is it also counted in the plan cost?

if (this.tasks != null) { return this.tasks; }

Perhaps we should separate the logic of return this.tasks; from actually planning the tasks. For example:

public List<TaskDescriptor> findTask() {
  if (this.tasks != null) {
    return this.tasks;
  } else {
    Timer timer = selfOptimizingPlanDurationReport.tableOptimizingPlanDuration();
    timer.time(planTasks());
    metricsReporters.report(selfOptimizingPlanDurationReport);
    return this.tasks;
  }
}

+1, solved. PTAL.
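A minimal JDK-only sketch of the split discussed in this thread: the cached return path is not timed, and only a real planning run is measured. One detail worth noting is that Java evaluates arguments eagerly, so an expression such as `timer.time(planTasks())` would run `planTasks()` before the timer is involved; passing a lambda or method reference defers the call. `CachedPlanner`, `timed`, and the task names are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical planner used only to illustrate the caching-versus-timing split.
final class CachedPlanner {
  private List<String> tasks;   // cached plan result, null until planning has run
  private long planNanos;       // time spent in real planning only
  private int planInvocations;  // how many times planning actually ran

  List<String> findTasks() {
    if (tasks != null) {
      return tasks; // cache hit: nothing is timed or reported
    }
    tasks = timed(this::planTasks); // method reference defers the call until timing starts
    return tasks;
  }

  // Measures the wrapped work and records the elapsed time even if it throws.
  private <T> T timed(Supplier<T> work) {
    long start = System.nanoTime();
    try {
      return work.get();
    } finally {
      planNanos += System.nanoTime() - start;
      planInvocations++;
    }
  }

  // Placeholder for the real planning logic.
  private List<String> planTasks() {
    List<String> planned = new ArrayList<>();
    planned.add("task-1");
    planned.add("task-2");
    return planned;
  }

  long planNanos() {
    return planNanos;
  }

  int planInvocations() {
    return planInvocations;
  }

  public static void main(String[] args) {
    CachedPlanner planner = new CachedPlanner();
    planner.findTasks();
    planner.findTasks(); // served from the cache, not timed again
    System.out.println("planning ran " + planner.planInvocations() + " time(s)");
  }
}
```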

huyuanfeng2018 · 2023-09-08T06:24:51Z

ams/api/src/main/java/com/netease/arctic/ams/api/metrics/SelfOptimizingPlanDurationReport.java

+
+import com.codahale.metrics.Timer;
+
+public class SelfOptimizingPlanDurationReport implements MetricReport {


Well, these interface definitions were all decided in another issue, #1765.

I also feel that these names are somewhat ambiguous; some changes may be needed.

WDYT? @hameizi

ams/server/src/main/java/com/netease/arctic/server/ArcticServiceContainer.java

hameizi · 2023-11-07T11:21:20Z

...erver/src/main/java/com/netease/arctic/server/metrics/SelfOptimizingPlanDurationContent.java

+  public SelfOptimizingPlanDurationContent data() {
+    return this;
+  }
+}


To support printing the metric content with LoggingMetricsEmitter, we need to override the toString() of all metrics. This can be achieved by using MoreObjects.toStringHelper.
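To illustrate the kind of `toString()` override this comment asks for, here is a JDK-only sketch that builds a `ClassName{field=value, ...}` string with `StringJoiner`. The comment itself suggests Guava's `MoreObjects.toStringHelper`, which is not used here so that the example has no dependencies; the class name and fields below are hypothetical and do not mirror the PR's actual content classes.

```java
import java.util.StringJoiner;

// Hypothetical metric content class; the fields are assumptions for illustration.
final class PlanDurationContentSketch {
  private final String tableName;
  private final long planDurationMs;

  PlanDurationContentSketch(String tableName, long planDurationMs) {
    this.tableName = tableName;
    this.planDurationMs = planDurationMs;
  }

  // Readable representation so that a logging emitter prints the fields,
  // not the default ClassName@hash form.
  @Override
  public String toString() {
    return new StringJoiner(", ", getClass().getSimpleName() + "{", "}")
        .add("tableName=" + tableName)
        .add("planDurationMs=" + planDurationMs)
        .toString();
  }

  public static void main(String[] args) {
    System.out.println(new PlanDurationContentSketch("db.tbl", 12));
  }
}
```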

hameizi · 2023-11-07T11:48:28Z

ams/server/src/main/java/com/netease/arctic/server/table/TableRuntime.java

+      selfOptimizingStatusDurationMsContent.setTargetSnapshotId(
+          optimizingProcess.getTargetSnapshotId());
+    }
+    metricsManager.emit(selfOptimizingStatusDurationMsContent);


Should we calculate the relevant metrics for the optimizing process in completeProcess()?

I think I should filter out the transition from optimizing to COMMITTING when the state changes, since the optimizing cost is calculated separately in the SelfOptimizingTotalCostMsContent metric.
WDYT, @hameizi?

I think that the duration of the previous state can be calculated for every state transition, including the transition from "optimizing" to "COMMITTING". The total duration of "optimizing" should include the duration of the "COMMITTING" state, so I previously suggested calculating the total duration of the optimizing process in the completeProcess() method.
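The approach described in the last reply (record the previous state's duration on every transition, and treat the total optimizing time as including the committing phase) might look like the sketch below. It assumes a simplified status enum and explicit timestamps so the arithmetic is easy to follow; `SketchStatus`, `StatusDurationTracker`, and all method names are hypothetical, not the PR's `TableRuntime` API.

```java
import java.util.EnumMap;
import java.util.Map;

// Simplified, hypothetical status set; the real enum may differ.
enum SketchStatus { IDLE, PENDING, PLANNING, OPTIMIZING, COMMITTING }

final class StatusDurationTracker {
  private final Map<SketchStatus, Long> durationsMs = new EnumMap<>(SketchStatus.class);
  private SketchStatus current;
  private long enteredAtMs;

  StatusDurationTracker(SketchStatus initial, long nowMs) {
    this.current = initial;
    this.enteredAtMs = nowMs;
  }

  // On every transition, record how long the previous status lasted.
  void transitionTo(SketchStatus next, long nowMs) {
    durationsMs.merge(current, nowMs - enteredAtMs, Long::sum);
    current = next;
    enteredAtMs = nowMs;
  }

  long durationMs(SketchStatus status) {
    return durationsMs.getOrDefault(status, 0L);
  }

  // Total process time as proposed in the thread: optimizing plus committing.
  long totalProcessMs() {
    return durationMs(SketchStatus.OPTIMIZING) + durationMs(SketchStatus.COMMITTING);
  }

  public static void main(String[] args) {
    StatusDurationTracker tracker = new StatusDurationTracker(SketchStatus.PENDING, 0L);
    tracker.transitionTo(SketchStatus.PLANNING, 100L);
    tracker.transitionTo(SketchStatus.OPTIMIZING, 150L);
    tracker.transitionTo(SketchStatus.COMMITTING, 1150L);
    tracker.transitionTo(SketchStatus.IDLE, 1350L); // would correspond to completing the process
    System.out.println("optimizing ms = " + tracker.durationMs(SketchStatus.OPTIMIZING));
    System.out.println("total process ms = " + tracker.totalProcessMs());
  }
}
```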

baiyangtx · 2023-11-09T03:57:14Z

ams/server/src/main/java/com/netease/arctic/server/optimizing/OptimizingQueue.java

+      SelfOptimizingTotalCostMsContent selfOptimizingTotalCostMsContent =
+          new SelfOptimizingTotalCostMsContent(
+              tableRuntime.getTableIdentifier().toString(), processId, optimizingType.name());
+      taskMap
+          .values()
+          .forEach(
+              taskRuntime ->
+                  selfOptimizingTotalCostMsContent
+                      .tableOptimizingTotalCostMs()
+                      .inc(taskRuntime.getCostTime()));
+      tableRuntime.getMetricsManager().emit(selfOptimizingTotalCostMsContent);


For the self-optimizing total cost metric, only the sum of task execution times is counted, without accounting for concurrency. What does this metric mean for users? Shouldn't the process execution time be reported instead?

In the previous design document, this metric is defined as the sum of the time consumed by all tasks. I think it expresses the amount of resources consumed by a single optimizing run. SelfOptimizingStatusDurationMsContent can provide the execution time of optimizing.
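The distinction debated here, summed task cost versus elapsed process time, can be made concrete with a small sketch. Under the assumption that each task has a start and end timestamp, the sum of task durations approximates resources consumed, while the span from the earliest start to the latest end approximates wall-clock time; with concurrent tasks the two diverge. `TaskSpan` and `OptimizingCostSketch` are hypothetical names for illustration only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical task interval in milliseconds.
final class TaskSpan {
  final long startMs;
  final long endMs;

  TaskSpan(long startMs, long endMs) {
    this.startMs = startMs;
    this.endMs = endMs;
  }

  long costMs() {
    return endMs - startMs;
  }
}

final class OptimizingCostSketch {
  // Sum of per-task durations: a proxy for resources consumed.
  static long totalCostMs(List<TaskSpan> tasks) {
    return tasks.stream().mapToLong(TaskSpan::costMs).sum();
  }

  // Earliest start to latest end: a proxy for elapsed process time.
  static long wallClockMs(List<TaskSpan> tasks) {
    if (tasks.isEmpty()) {
      return 0L;
    }
    long firstStart = tasks.stream().mapToLong(t -> t.startMs).min().getAsLong();
    long lastEnd = tasks.stream().mapToLong(t -> t.endMs).max().getAsLong();
    return lastEnd - firstStart;
  }

  public static void main(String[] args) {
    // Three overlapping tasks of 100 ms each.
    List<TaskSpan> tasks =
        Arrays.asList(new TaskSpan(0, 100), new TaskSpan(0, 100), new TaskSpan(50, 150));
    System.out.println("total cost ms = " + totalCostMs(tasks)); // 300
    System.out.println("wall clock ms = " + wallClockMs(tasks)); // 150
  }
}
```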

...server/src/main/java/com/netease/arctic/server/metrics/SelfOptimizingTotalCostMsContent.java

...erver/src/main/java/com/netease/arctic/server/metrics/SelfOptimizingPlanDurationContent.java

Aireed

I left some comments.

...erver/src/main/java/com/netease/arctic/server/metrics/SelfOptimizingPlanDurationContent.java

...r/src/main/java/com/netease/arctic/server/metrics/SelfOptimizingStatusDurationMsContent.java

hameizi · 2023-11-21T11:58:00Z

ams/server/src/main/java/com/netease/arctic/server/optimizing/OptimizingQueue.java

+                      .tableOptimizingTotalCostMs()
+                      .inc(taskRuntime.getCostTime()));
+
+      MetricsManager.instance().emit(selfOptimizingTotalCostMsContent);


I think we should calculate the total optimization time after the commit is completed.

CLAassistant · 2023-11-22T08:42:46Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

huyuanfeng seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

baiyangtx · 2024-04-27T05:33:30Z

I think this PR can be closed now that #2674 is merged.

@zhoujinsong @huyuanfeng2018

baiyangtx · 2024-05-09T01:53:34Z

Closed due to #2674

github-actions bot added the module:ams-server and module:ams-dashboard labels Sep 4, 2023

huyuanfeng2018 changed the title ~~init~~ [AMORO-1766]Implement table self-optimizing metric collection Sep 4, 2023

huyuanfeng2018 requested review from hameizi, zhongqishang and zhoujinsong September 4, 2023 12:18

huyuanfeng2018 changed the title ~~[AMORO-1766]Implement table self-optimizing metric collection~~ [AMORO-1766] Implement table self-optimizing metric collection Sep 4, 2023

zhongqishang reviewed Sep 5, 2023

ams/api/src/main/java/com/netease/arctic/ams/api/metrics/SelfOptimizingReport.java (outdated, resolved)

hameizi reviewed Sep 5, 2023

huyuanfeng2018 changed the title ~~[AMORO-1766] Implement table self-optimizing metric collection~~ [AMORO-1766] [WIP] Implement table self-optimizing metric collection Sep 5, 2023

huyuanfeng2018 force-pushed the self-optimizing_metric branch from 732c4de to 5fb2a7e Compare September 7, 2023 10:45

huyuanfeng2018 changed the title ~~[AMORO-1766] [WIP] Implement table self-optimizing metric collection~~ [AMORO-1766] Implement table self-optimizing metric collection Sep 7, 2023

huyuanfeng2018 requested review from hameizi, shidayang and zhongqishang September 8, 2023 02:14

hameizi reviewed Sep 8, 2023

huyuanfeng2018 commented Sep 8, 2023

huyuanfeng2018 force-pushed the self-optimizing_metric branch from 4a671ee to b09c58e Compare October 24, 2023 13:10

github-actions bot added the type:build label and removed the module:ams-server label Oct 30, 2023

huyuanfeng2018 force-pushed the self-optimizing_metric branch from c0de775 to 0ef14b3 Compare October 30, 2023 11:54

github-actions bot removed the type:build label Oct 30, 2023

huyuanfeng2018 requested a review from hameizi October 30, 2023 11:54

zhongqishang reviewed Nov 7, 2023

ams/server/src/main/java/com/netease/arctic/server/ArcticServiceContainer.java (outdated, resolved)

hameizi reviewed Nov 7, 2023

huyuanfeng2018 force-pushed the self-optimizing_metric branch from aa2fb9d to af38280 Compare November 8, 2023 12:33

baiyangtx reviewed Nov 9, 2023

chenyuzhi459 reviewed Nov 10, 2023

...server/src/main/java/com/netease/arctic/server/metrics/SelfOptimizingTotalCostMsContent.java (outdated, resolved)

chenyuzhi459 reviewed Nov 10, 2023

...erver/src/main/java/com/netease/arctic/server/metrics/SelfOptimizingPlanDurationContent.java (outdated, resolved)

Aireed reviewed Nov 10, 2023

...erver/src/main/java/com/netease/arctic/server/metrics/SelfOptimizingPlanDurationContent.java (outdated, resolved)

...r/src/main/java/com/netease/arctic/server/metrics/SelfOptimizingStatusDurationMsContent.java (resolved)

huyuanfeng added 7 commits November 19, 2023 19:45

init

884de4d

fixed

81607db

add test

8084dff

Unified naming

3140e4a

fix

031dec0

Add the total Optimizing processing time

52e0c69

remove plan

30038da

huyuanfeng2018 force-pushed the self-optimizing_metric branch from 24f7684 to 30038da Compare November 19, 2023 11:57

hameizi reviewed Nov 21, 2023

fixed

fe40238

hameizi approved these changes Nov 22, 2023

baiyangtx closed this May 9, 2024

