[AMORO-1766] Implement table self-optimizing metric collection #1913
huyuanfeng2018 wants to merge 8 commits into apache:master from
Codecov Report
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #1913      +/-   ##
============================================
- Coverage     52.94%   52.75%   -0.20%
- Complexity     3612     4235     +623
============================================
  Files           465      513      +48
  Lines         24502    29412    +4910
  Branches       2340     2853     +513
============================================
+ Hits          12973    15515    +2542
- Misses        10510    12653    +2143
- Partials       1019     1244     +225
Flags with carried forward coverage won't be shown.
☔ View full report in Codecov by Sentry.
TableOptimizingProcess optimizingProcess = new TableOptimizingProcess(planner);
LOG.info("{} after plan get {} tasks", tableRuntime.getTableIdentifier(),
    optimizingProcess.getTaskMap().size());
SelfOptimizingReport selfOptimizingReport =
- The report is currently emitted only when planner.isNecessary() == true. I think it should also be reported when it is false.
- Do we need to keep the cost value the same as the one printed in com.netease.arctic.server.optimizing.plan.OptimizingPlanner#planTasks?
  LOG.info("{} finish plan, type = {}, get {} tasks, cost {} ns, {} ms", tableRuntime.getTableIdentifier(),
      getOptimizingType(), tasks.size(), endTime - startTime, (endTime - startTime) / 1_000_000);
- Makes sense.
- I'm also not sure; let's see whether anyone else has an opinion on this.
I agree: report the metric even when planner.isNecessary() == false.
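The two points in this thread (report even when planning is not necessary, and keep the metric consistent with the logged cost) can be illustrated with a minimal, stdlib-only sketch. PlanCostSketch, PlanResult, and the LongConsumer standing in for the metrics reporter are all hypothetical, not the Amoro API: the elapsed time is measured once, the same value feeds both the log line and the report, and there is no guard on the "necessary" flag.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;
import java.util.function.Supplier;

// Hypothetical sketch (not the Amoro API): measure the plan cost once and reuse it.
final class PlanCostSketch {

  // Hypothetical outcome of one planning round.
  static final class PlanResult {
    final boolean necessary;
    final int taskCount;

    PlanResult(boolean necessary, int taskCount) {
      this.necessary = necessary;
      this.taskCount = taskCount;
    }
  }

  // Runs the planner, logs the cost in ns and ms derived from a single measurement,
  // and reports the same nanosecond value whether or not planning was necessary.
  static PlanResult planAndReport(Supplier<PlanResult> planner, LongConsumer costNanosSink) {
    long startTime = System.nanoTime();
    PlanResult result = planner.get();
    long costNanos = System.nanoTime() - startTime;

    System.out.printf("finish plan, necessary = %s, get %d tasks, cost %d ns, %d ms%n",
        result.necessary, result.taskCount, costNanos, TimeUnit.NANOSECONDS.toMillis(costNanos));

    // No "if (result.necessary)" guard: skipped plans show up in the metric as well.
    costNanosSink.accept(costNanos);
    return result;
  }

  private PlanCostSketch() {}
}
```

Because the log line and the reported value come from the same costNanos, the metric and the "cost {} ns, {} ms" output cannot drift apart.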
ams/api/src/main/java/com/netease/arctic/ams/api/metrics/SelfOptimizingReport.java
Outdated
@TaggedMetrics.Metric(name = TABLE_OPTIMIZING_COMMIT_DURATION_SECOND)
public Counter tableOptimizingCommitDurationSecond() {
  return this.tableOptimizingCommitDurationSecond;
}
I think we should use a timer to record the cost time and duration.
selfOptimizingReport.setTargetSnapshotId(optimizingProcess.getTargetSnapshotId());
selfOptimizingReport.setOptimizingStatus(optimizingProcess.getStatus().name());
selfOptimizingReport.tableOptimizingCommitDurationSecond()
    .inc(currentStatesDurationSecond());
We can use Timer.time(() -> { /* do something */ }) to measure how long an operation takes.
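This thread argues for a timer instead of the Counter-based commit duration. The difference is that a counter keeps only a running sum, while a timer records every measurement, so count, min, max, and mean remain available (Dropwizard's Timer additionally tracks rates and percentiles). The class below is a hypothetical, stdlib-only stand-in written to show that difference. It is not com.codahale.metrics.Timer, although its update(long, TimeUnit) method has the same shape as Timer.update.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stdlib-only stand-in for a timer; not com.codahale.metrics.Timer.
final class DurationRecorderSketch {
  private long count;
  private long totalNanos;
  private long minNanos = Long.MAX_VALUE;
  private long maxNanos = Long.MIN_VALUE;

  // Records a single duration, similar in shape to Timer.update(long, TimeUnit).
  void update(long duration, TimeUnit unit) {
    long nanos = unit.toNanos(duration);
    count++;
    totalNanos += nanos;
    minNanos = Math.min(minNanos, nanos);
    maxNanos = Math.max(maxNanos, nanos);
  }

  long count() {
    return count;
  }

  // The running total is all that Counter.inc(...) would have preserved.
  long totalMillis() {
    return TimeUnit.NANOSECONDS.toMillis(totalNanos);
  }

  long minMillis() {
    return count == 0 ? 0 : TimeUnit.NANOSECONDS.toMillis(minNanos);
  }

  long maxMillis() {
    return count == 0 ? 0 : TimeUnit.NANOSECONDS.toMillis(maxNanos);
  }

  double meanMillis() {
    return count == 0 ? 0.0 : totalNanos / (double) count / 1_000_000.0;
  }
}
```

With two commits of 100 ms and 300 ms, a counter would only report 400, while the recorder above still reports that the slowest one took 300 ms.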
Force-pushed from 732c4de to 5fb2a7e.
SelfOptimizingPlanDurationReport selfOptimizingPlanDurationReport =
    new SelfOptimizingPlanDurationReport(tableRuntime.getTableIdentifier().toString());
Timer timer = selfOptimizingPlanDurationReport.tableOptimizingPlanDuration();
Timer.Context context = timer.time();
I think we can use selfOptimizingPlanDurationReport.tableOptimizingPlanDuration().time(() -> planTasks()) to collect statistics. This way, we don't have to intrude into the internal logic of the planTasks() method. What do you think?
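For this to measure anything, the planning call has to be handed to the timer unevaluated. In Java, time(planTasks()) evaluates planTasks() before time(...) is entered, so the planning work falls outside the measured window (and with Dropwizard's Timer it would not compile, because time expects a Callable rather than a List). A stdlib-only sketch of the deferred form; the TimedCallSketch helper and its LongConsumer sink are hypothetical and only follow the same idea as Dropwizard's Timer.time(Callable).

```java
import java.util.concurrent.Callable;
import java.util.function.LongConsumer;

// Hypothetical helper following the same idea as Dropwizard's Timer.time(Callable):
// the work is passed in unevaluated and runs between the two clock reads.
final class TimedCallSketch {

  static <T> T time(Callable<T> work, LongConsumer elapsedNanosSink) throws Exception {
    long start = System.nanoTime();
    try {
      return work.call();
    } finally {
      // Recorded even if the work throws.
      elapsedNanosSink.accept(System.nanoTime() - start);
    }
  }

  private TimedCallSketch() {}
}
```

In the planner this would be called as TimedCallSketch.time(() -> planTasks(), sink) or with the method reference this::planTasks; either way the lambda, not its result, is what gets passed.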
How should this logic be understood? Is this early return also counted in the plan cost?
if (this.tasks != null) {
return this.tasks;
}
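Perhaps we should separate returning the cached this.tasks from actually planning the tasks. For example:
public List<TaskDescriptor> findTask() {
  // Cached result: return it without timing or reporting anything.
  if (this.tasks != null) {
    return this.tasks;
  }
  // Only an actual planning run is timed; planTasks() (assumed to populate this.tasks)
  // must run between timer.time() and context.stop() to be measured.
  Timer timer = selfOptimizingPlanDurationReport.tableOptimizingPlanDuration();
  Timer.Context context = timer.time();
  try {
    planTasks();
  } finally {
    context.stop();
  }
  metricsReporters.report(selfOptimizingPlanDurationReport);
  return this.tasks;
}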
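With the cached return separated out, only a real planning run is timed and reported; under this split, a call that returns the cached tasks does not count toward the plan cost. A minimal stdlib-only sketch of that behavior, assuming a planner that produces a list of task names (CachedPlannerSketch, its Supplier, and its LongConsumer sink are hypothetical).

```java
import java.util.List;
import java.util.function.LongConsumer;
import java.util.function.Supplier;

// Hypothetical sketch: plan once, cache the result, and only time the real planning run.
final class CachedPlannerSketch {
  private final Supplier<List<String>> planTasks;
  private final LongConsumer planDurationSink;
  private List<String> tasks;

  CachedPlannerSketch(Supplier<List<String>> planTasks, LongConsumer planDurationSink) {
    this.planTasks = planTasks;
    this.planDurationSink = planDurationSink;
  }

  List<String> findTasks() {
    if (tasks != null) {
      // Cache hit: nothing is planned, so nothing is timed or reported.
      return tasks;
    }
    long start = System.nanoTime();
    tasks = planTasks.get();
    planDurationSink.accept(System.nanoTime() - start);
    return tasks;
  }
}
```

Calling findTasks() repeatedly therefore produces exactly one plan-duration report per planner instance.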
+1, resolved. PTAL.
import com.codahale.metrics.Timer;

public class SelfOptimizingPlanDurationReport implements MetricReport {
Force-pushed from 4a671ee to b09c58e.
Force-pushed from c0de775 to 0ef14b3.
ams/server/src/main/java/com/netease/arctic/server/ArcticServiceContainer.java
Outdated
  public SelfOptimizingPlanDurationContent data() {
    return this;
  }
}
To support printing the metric content with LoggingMetricsEmitter, we need to override the toString() of all metrics. This can be achieved by using MoreObjects.toStringHelper.
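A sketch of such a toString() for one metric content class, with hypothetical field names. With Guava, MoreObjects.toStringHelper(this).add("tableName", tableName).add("planDurationMs", planDurationMs).toString() produces a string of the form PlanDurationContentSketch{tableName=..., planDurationMs=...}. The version below builds the same shape with plain concatenation so that it runs without Guava on the classpath.

```java
// Hypothetical metric content class; field names are illustrative only.
final class PlanDurationContentSketch {
  private final String tableName;
  private final long planDurationMs;

  PlanDurationContentSketch(String tableName, long planDurationMs) {
    this.tableName = tableName;
    this.planDurationMs = planDurationMs;
  }

  // Same output shape as MoreObjects.toStringHelper(this).add(...).toString():
  // SimpleClassName{key=value, key=value}
  @Override
  public String toString() {
    return getClass().getSimpleName()
        + "{tableName=" + tableName
        + ", planDurationMs=" + planDurationMs
        + "}";
  }
}
```

A logging emitter can then simply log content.toString() for every metric it receives.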
    selfOptimizingStatusDurationMsContent.setTargetSnapshotId(
        optimizingProcess.getTargetSnapshotId());
  }
  metricsManager.emit(selfOptimizingStatusDurationMsContent);
Should we calculate the relevant metrics for the optimizing process in completeProcess()?
I think I should skip the transition from OPTIMIZING to COMMITTING when recording state changes, because the optimizing cost is calculated separately by the SelfOptimizingTotalCostMsContent metric.
WDYT @hameizi?
I think the duration of the previous state can be calculated on every state transition, including the transition from OPTIMIZING to COMMITTING. The total duration of optimizing should include the duration of the COMMITTING state, which is why I previously suggested calculating the total duration of the optimizing process in the completeProcess() method.
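The approach described in this reply (close the previous state on every transition, including OPTIMIZING to COMMITTING, and compute the total only when the process completes, so that COMMITTING is included) could look like the following stdlib-only sketch. The status names are simplified and the class is hypothetical, not Amoro's TableRuntime or OptimizingStatus; timestamps are passed in explicitly to keep it deterministic.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: record the duration of the previous state on every transition
// (including OPTIMIZING -> COMMITTING) and the total process duration on completion.
final class ProcessDurationSketch {
  enum Status { PLANNING, OPTIMIZING, COMMITTING, IDLE }

  private final Map<Status, Long> stateDurationsMs = new EnumMap<>(Status.class);
  private final long processStartMs;
  private Status currentStatus;
  private long currentStatusStartMs;
  private long totalDurationMs = -1;

  ProcessDurationSketch(Status initial, long nowMs) {
    this.processStartMs = nowMs;
    this.currentStatus = initial;
    this.currentStatusStartMs = nowMs;
  }

  // Called on every status change; attributes the elapsed time to the previous status.
  void transitionTo(Status next, long nowMs) {
    stateDurationsMs.merge(currentStatus, nowMs - currentStatusStartMs, Long::sum);
    currentStatus = next;
    currentStatusStartMs = nowMs;
  }

  // Plays the role of completeProcess(): closes the last state and computes the total,
  // which therefore includes the time spent in COMMITTING.
  void complete(long nowMs) {
    transitionTo(Status.IDLE, nowMs);
    totalDurationMs = nowMs - processStartMs;
  }

  long durationMs(Status status) {
    return stateDurationsMs.getOrDefault(status, 0L);
  }

  long totalDurationMs() {
    return totalDurationMs;
  }
}
```

For a process that optimizes from t = 1000 ms to 5000 ms and commits until 6500 ms, this yields 4000 ms for OPTIMIZING, 1500 ms for COMMITTING, and a total of 5500 ms.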
Force-pushed from aa2fb9d to af38280.
SelfOptimizingTotalCostMsContent selfOptimizingTotalCostMsContent =
    new SelfOptimizingTotalCostMsContent(
        tableRuntime.getTableIdentifier().toString(), processId, optimizingType.name());
taskMap
    .values()
    .forEach(
        taskRuntime ->
            selfOptimizingTotalCostMsContent
                .tableOptimizingTotalCostMs()
                .inc(taskRuntime.getCostTime()));
tableRuntime.getMetricsManager().emit(selfOptimizingTotalCostMsContent);
For the self-optimizing total cost metric, only the sum of task execution times is counted, without taking concurrency into account. What does this metric mean for users? Shouldn't the execution time of the whole process be reported instead?
In the previous design document, this metric is defined as the sum of the time consumed by all tasks. I think it expresses the amount of resources consumed by a single optimizing run, while SelfOptimizingStatusDurationMsContent provides the execution time of optimizing.
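The two readings discussed here can differ substantially when tasks run concurrently. A hypothetical sketch (OptimizingCostSketch, TaskRun, and its millisecond timestamps are assumptions, not Amoro types) comparing the sum of task costs, which approximates the resources consumed, with the wall-clock span of the process.

```java
import java.util.List;

// Hypothetical sketch contrasting two ways to summarize task cost for one process.
final class OptimizingCostSketch {

  // Hypothetical task record with start/end timestamps in milliseconds.
  static final class TaskRun {
    final long startMs;
    final long endMs;

    TaskRun(long startMs, long endMs) {
      this.startMs = startMs;
      this.endMs = endMs;
    }

    long costMs() {
      return endMs - startMs;
    }
  }

  // Resource view: sum of all task costs, i.e. what a total-cost counter accumulates.
  static long totalTaskCostMs(List<TaskRun> tasks) {
    return tasks.stream().mapToLong(TaskRun::costMs).sum();
  }

  // Wall-clock view: from the earliest task start to the latest task end.
  static long wallClockMs(List<TaskRun> tasks) {
    long start = tasks.stream().mapToLong(t -> t.startMs).min().orElse(0L);
    long end = tasks.stream().mapToLong(t -> t.endMs).max().orElse(0L);
    return end - start;
  }

  private OptimizingCostSketch() {}
}
```

For example, two 100 ms tasks running in parallel followed by one 50 ms task give a total cost of 250 ms but a wall-clock span of only 150 ms, which is why the two metrics answer different questions.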
...server/src/main/java/com/netease/arctic/server/metrics/SelfOptimizingTotalCostMsContent.java
Outdated
...erver/src/main/java/com/netease/arctic/server/metrics/SelfOptimizingPlanDurationContent.java
Outdated
...erver/src/main/java/com/netease/arctic/server/metrics/SelfOptimizingPlanDurationContent.java
Outdated
...r/src/main/java/com/netease/arctic/server/metrics/SelfOptimizingStatusDurationMsContent.java
Force-pushed from 24f7684 to 30038da.
                .tableOptimizingTotalCostMs()
                .inc(taskRuntime.getCostTime()));

MetricsManager.instance().emit(selfOptimizingTotalCostMsContent);
I think we should calculate the total optimizing time after the commit has completed.
huyuanfeng seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
I think this PR can be closed now that #2674 has been merged.
Closed due to #2674.
Why are the changes needed?
Close #1766
Brief change log
Add collection logic for the self-optimizing metrics.
How was this patch tested?
Add some test cases that check the changes thoroughly including negative and positive cases if possible
Add screenshots for manual tests if appropriate
Run test locally before making a pull request
Documentation