Skip to content

[AMORO-1766] Implement table self-optimizing metric collection#1913

Closed
huyuanfeng2018 wants to merge 8 commits intoapache:masterfrom
huyuanfeng2018:self-optimizing_metric
Closed

[AMORO-1766] Implement table self-optimizing metric collection#1913
huyuanfeng2018 wants to merge 8 commits intoapache:masterfrom
huyuanfeng2018:self-optimizing_metric

Conversation

@huyuanfeng2018
Copy link
Copy Markdown
Contributor

@huyuanfeng2018 huyuanfeng2018 commented Sep 4, 2023

Why are the changes needed?

Close #1766

Brief change log

Add self-optimizing-metric indicator collection logic

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before making a pull request

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@github-actions github-actions bot added module:ams-server Ams server module module:ams-dashboard Ams dashboard module labels Sep 4, 2023
@huyuanfeng2018 huyuanfeng2018 changed the title init [AMORO-1766]Implement table self-optimizing metric collection Sep 4, 2023
@codecov
Copy link
Copy Markdown

codecov bot commented Sep 4, 2023

Codecov Report

Attention: 20 lines in your changes are missing coverage. Please review.

Comparison is base (d48c0e0) 52.94% compared to head (fe40238) 52.75%.
Report is 8 commits behind head on master.

Files Patch % Lines
...metrics/SelfOptimizingStatusDurationMsContent.java 60.71% 11 Missing ⚠️
...rver/metrics/SelfOptimizingTotalCostMsContent.java 52.63% 9 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #1913      +/-   ##
============================================
- Coverage     52.94%   52.75%   -0.20%     
- Complexity     3612     4235     +623     
============================================
  Files           465      513      +48     
  Lines         24502    29412    +4910     
  Branches       2340     2853     +513     
============================================
+ Hits          12973    15515    +2542     
- Misses        10510    12653    +2143     
- Partials       1019     1244     +225     
Flag Coverage Δ
core 53.10% <76.19%> (+0.16%) ⬆️
trino 50.87% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@huyuanfeng2018 huyuanfeng2018 changed the title [AMORO-1766]Implement table self-optimizing metric collection [AMORO-1766] Implement table self-optimizing metric collection Sep 4, 2023
TableOptimizingProcess optimizingProcess = new TableOptimizingProcess(planner);
LOG.info("{} after plan get {} tasks", tableRuntime.getTableIdentifier(),
optimizingProcess.getTaskMap().size());
SelfOptimizingReport selfOptimizingReport =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Only report planner.isNecessary() == true. I think == false should also be reported.
  2. Do we need keep the cost value same with printed in the com.netease.arctic.server.optimizing.plan.OptimizingPlanner#planTasks ?
    LOG.info("{} finish plan, type = {}, get {} tasks, cost {} ns, {} ms", tableRuntime.getTableIdentifier(),
        getOptimizingType(), tasks.size(), endTime - startTime, (endTime - startTime) / 1_000_000);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. make sence
  2. I'm also not sure, maybe see if anyone else has an opinion on this issue

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree report metric even when planner.isNecessary() == false

@TaggedMetrics.Metric(name = TABLE_OPTIMIZING_COMMIT_DURATION_SECOND)
public Counter tableOptimizingCommitDurationSecond() {
return this.tableOptimizingCommitDurationSecond;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use a timer to record the cost time and duration.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

selfOptimizingReport.setTargetSnapshotId(optimizingProcess.getTargetSnapshotId());
selfOptimizingReport.setOptimizingStatus(optimizingProcess.getStatus().name());
selfOptimizingReport.tableOptimizingCommitDurationSecond()
.inc(currentStatesDurationSecond());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use Timer.time(() -> {//do something}) to calculate the cost time of a certain operation.

Copy link
Copy Markdown
Contributor Author

@huyuanfeng2018 huyuanfeng2018 Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@huyuanfeng2018 huyuanfeng2018 changed the title [AMORO-1766] Implement table self-optimizing metric collection [AMORO-1766] [WIP] Implement table self-optimizing metric collection Sep 5, 2023
@huyuanfeng2018 huyuanfeng2018 changed the title [AMORO-1766] [WIP] Implement table self-optimizing metric collection [AMORO-1766] Implement table self-optimizing metric collection Sep 7, 2023
SelfOptimizingPlanDurationReport selfOptimizingPlanDurationReport =
new SelfOptimizingPlanDurationReport(tableRuntime.getTableIdentifier().toString());
Timer timer = selfOptimizingPlanDurationReport.tableOptimizingPlanDuration();
Timer.Context context = timer.time();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use selfOptimizingPlanDurationReport.tableOptimizingPlanDuration().time(planTasks()) to collect statistics. This way, we don't have to invade the internal logic of the planTasks() method. What do you think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use selfOptimizingPlanDurationReport.tableOptimizingPlanDuration().time(planTasks()) to collect statistics. This way, we don't have to invade the internal logic of the planTasks() method. What do you think?

How to understand this logic, is it also counted in the plan cost?

    if (this.tasks != null) {
      return this.tasks;
    }

Copy link
Copy Markdown
Contributor

@hameizi hameizi Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should separate the logic of return this.tasks; and actually planting the task. For example:

public List<TaskDescriptor> findTask() {
    if (this.tasks != null) {
      return this.tasks;
    } else {
      Timer timer = selfOptimizingPlanDurationReport.tableOptimizingPlanDuration();
      timer.time(planTasks());
      metricsReporters.report(selfOptimizingPlanDurationReport);
      return this.tasks;
   }
}

Copy link
Copy Markdown
Contributor Author

@huyuanfeng2018 huyuanfeng2018 Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should separate the logic of return this.tasks; and actually planting the task. For example:

public List<TaskDescriptor> findTask() {
    if (this.tasks != null) {
      return this.tasks;
    } else {
      Timer timer = selfOptimizingPlanDurationReport.tableOptimizingPlanDuration();
      timer.time(planTasks());
      metricsReporters.report(selfOptimizingPlanDurationReport);
      return this.tasks;
   }
}

+1 ,solevd PTAL


import com.codahale.metrics.Timer;

public class SelfOptimizingPlanDurationReport implements MetricReport {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, these interface definitions are all determined in another issus #1765

I also feel that these namings cause some ambiguity, maybe some changes are needed.

WDYT? @hameizi

@github-actions github-actions bot added type:build and removed module:ams-server Ams server module labels Oct 30, 2023
public SelfOptimizingPlanDurationContent data() {
return this;
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To support printing the metric content with LoggingMetricsEmitter, we need to override the toString() of all metrics. This can be achieved by using MoreObjects.toStringHelper.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

selfOptimizingStatusDurationMsContent.setTargetSnapshotId(
optimizingProcess.getTargetSnapshotId());
}
metricsManager.emit(selfOptimizingStatusDurationMsContent);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we should calculate the relevant metrics for the optimizing process in completeProcess()?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I should filter out the logic from optimizing to COMMITTING when the state changes. The optimizing cost is calculated separately in the SelfOptimizingTotalCostMsContent indicator.
WDYT @hameizi

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the duration of the previous state can be calculated for every state transition, including the transition from "optimizing" to "COMMITTING". The total duration of "optimizing" should include the duration of the "COMMITTING" state, so I previously suggested calculating the total duration of the optimizing process in the completeProcess() method.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

solved.

Comment on lines +588 to +598
SelfOptimizingTotalCostMsContent selfOptimizingTotalCostMsContent =
new SelfOptimizingTotalCostMsContent(
tableRuntime.getTableIdentifier().toString(), processId, optimizingType.name());
taskMap
.values()
.forEach(
taskRuntime ->
selfOptimizingTotalCostMsContent
.tableOptimizingTotalCostMs()
.inc(taskRuntime.getCostTime()));
tableRuntime.getMetricsManager().emit(selfOptimizingTotalCostMsContent);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the self-optimizing total cost project, when it comes to statistics, only the sum of task execution times is considered, without considering concurrency. What does this metric mean for users? Shouldn't the process execution time be reported instead?

Copy link
Copy Markdown
Contributor Author

@huyuanfeng2018 huyuanfeng2018 Nov 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the previous design document, this metrics is the sum of the time consumption of all tasks that define statistics. I think This metrics can express the amount of resources consumed by optimizing at one time. SelfOptimizingStatusDurationMsContent can get the execution time of optimizing.

Copy link
Copy Markdown
Contributor

@Aireed Aireed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i left some comment

.tableOptimizingTotalCostMs()
.inc(taskRuntime.getCostTime()));

MetricsManager.instance().emit(selfOptimizingTotalCostMsContent);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should calculate the total optimization time after the commit is completed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


huyuanfeng seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

1 similar comment
@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


huyuanfeng seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@baiyangtx
Copy link
Copy Markdown
Contributor

baiyangtx commented Apr 27, 2024

I think this PR could be closed due to #2674 is merged.

@zhoujinsong @huyuanfeng2018

@baiyangtx
Copy link
Copy Markdown
Contributor

Closed due to #2674

@baiyangtx baiyangtx closed this May 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

module:ams-dashboard Ams dashboard module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Subtask]: Implement table self-optimizing metric collection

8 participants