Skip to content

[HUDI-867]: fixed IllegalArgumentException from graphite metrics in deltaStreamer continuous mode#1647

Closed
pratyakshsharma wants to merge 1 commit intoapache:masterfrom
pratyakshsharma:hudi-867
Closed

[HUDI-867]: fixed IllegalArgumentException from graphite metrics in deltaStreamer continuous mode#1647
pratyakshsharma wants to merge 1 commit intoapache:masterfrom
pratyakshsharma:hudi-867

Conversation

@pratyakshsharma
Copy link
Contributor

Tips

What is the purpose of the pull request

Added a way to create metric names with updated table name in every iteration so that IllegalArgumentException does not comes up.

Brief change log

(for example:)

  • Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@codecov-commenter
Copy link

codecov-commenter commented May 20, 2020

Codecov Report

Merging #1647 into master will increase coverage by 1.76%.
The diff coverage is 0.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #1647      +/-   ##
============================================
+ Coverage     16.60%   18.37%   +1.76%     
- Complexity      800      855      +55     
============================================
  Files           344      344              
  Lines         15172    15165       -7     
  Branches       1512     1512              
============================================
+ Hits           2520     2786     +266     
+ Misses        12320    12026     -294     
- Partials        332      353      +21     
Impacted Files Coverage Δ Complexity Δ
...java/org/apache/hudi/config/HoodieWriteConfig.java 43.91% <ø> (+1.65%) 48.00 <0.00> (ø)
...in/java/org/apache/hudi/metrics/HoodieMetrics.java 18.86% <0.00%> (-0.37%) 6.00 <0.00> (ø)
...le/view/IncrementalTimelineSyncFileSystemView.java 4.51% <0.00%> (+0.56%) 4.00% <0.00%> (+1.00%)
...common/table/view/AbstractTableFileSystemView.java 8.59% <0.00%> (+2.34%) 6.00% <0.00%> (+1.00%)
...i/common/table/timeline/HoodieDefaultTimeline.java 52.30% <0.00%> (+4.61%) 28.00% <0.00%> (+4.00%)
.../main/java/org/apache/hudi/common/util/Option.java 51.35% <0.00%> (+5.40%) 18.00% <0.00%> (+1.00%)
...udi/timeline/service/handlers/BaseFileHandler.java 11.11% <0.00%> (+11.11%) 1.00% <0.00%> (+1.00%)
...common/table/view/PriorityBasedFileSystemView.java 11.94% <0.00%> (+11.94%) 4.00% <0.00%> (+4.00%)
...i/common/table/view/HoodieTableFileSystemView.java 37.93% <0.00%> (+13.79%) 9.00% <0.00%> (+2.00%)
...common/table/view/FileSystemViewStorageConfig.java 63.49% <0.00%> (+14.28%) 8.00% <0.00%> (+3.00%)
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 244d474...464aa36. Read the comment docs.

try {
long start = System.currentTimeMillis();
Option<String> scheduledCompactionInstant = deltaSync.syncOnce();
HoodieMetrics.setTableName(cfg.metricsTableName + "_" + iteration);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I don't quite get why we need to set table name here? Next line will in turn rely on the arg passed for tablename. So, don't really understand why we need static fix (i.e. setTableName). From the diff, I see that cfg.tableName is set passed into DeltaSync.syncOnce(tblName) and HoodieDeltaStreamerMetrics(HoodieWriteConfig tableName). Can you help me understand the case where the static set method for table name is required.

Copy link
Contributor Author

@pratyakshsharma pratyakshsharma May 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IllegalArgumentException is happening because the metrics name are generated in same way using tableName in each run. So we need some way of differentiating metrics names for every run and the easiest way to do that is altering the table name like "tableName_iteration". We need to do this change at 2 places, for HoodieDeltaStreamerMetrics and for HoodieMetrics.
The table name getting passed with syncOnce() method takes care of HoodieDeltaStreamerMetrics only. For all the other metrics, we need to reset table name in HoodieMetrics class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pratyakshsharma Thanks for the fix, I have a small question, when using Spark Streaming writing to Hudi (which seems like the continuous mode of deltastreamer), the exception will happen again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leesf yes. This PR only fixes this for continuous mode of HoodieDeltaStreamer. If you can point me to relevant code of spark datasource from where this can be executed, I can try fixing there as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pratyakshsharma just a raw idea, how about adding a static variable (iterator) in HoodieMetrics, and ++iterator after updateCommitMetrics. And back to the PR, if we only fix the HoodieDeltaStreamer, I think we would simply pass the iteration_time to syncOnce method to create a new metric name instead of adding a static tableName to HoodieMetrics, I think it is a little bit weird.

@nsivabalan
Copy link
Contributor

I will let @leesf reivew this patch.

@leesf leesf assigned leesf and unassigned nsivabalan May 30, 2020
@xushiyan
Copy link
Member

xushiyan commented Jun 5, 2020

@pratyakshsharma IIUC, this will introduce infinite number of metrics being sent to monitoring system in theory when it's set to continuous mode? Normally for a Hudi table, we'd like to monitor all kinds of metrics named like <prefix>.<hudi table>.*, and aggregate over <prefix>.<hudi table>.<a fixed hudi metric> for sum, avg, p95, etc. If metric name becomes <prefix>.<hudi table>_i.<a fixed hudi metric>, then the aggregation is not possible, which is undesirable.

Haven't thought about the solution yet. Just trying to raise the concern.
cc @leesf

@leesf
Copy link
Contributor

leesf commented Jun 6, 2020

@pratyakshsharma IIUC, this will introduce infinite number of metrics being sent to monitoring system in theory when it's set to continuous mode? Normally for a Hudi table, we'd like to monitor all kinds of metrics named like <prefix>.<hudi table>.*, and aggregate over <prefix>.<hudi table>.<a fixed hudi metric> for sum, avg, p95, etc. If metric name becomes <prefix>.<hudi table>_i.<a fixed hudi metric>, then the aggregation is not possible, which is undesirable.

Haven't thought about the solution yet. Just trying to raise the concern.
cc @leesf

Thanks @xushiyan for your thoughts, yes, current solution will send metrics like <prefix>.<hudi table>_i.<a fixed hudi metric>, is it suitable to change to <prefix>.<hudi table>.<i>.<a fixed hudi metric>?

@xushiyan
Copy link
Member

xushiyan commented Jun 7, 2020

@leesf There is some markdown format issue with your typings...not sure what is suggested.

Just to clarify from user perspective. Say a user runs a delta streamer, he denotes the job with prefix foo, and the job writes to a Hudi table bar. He wants to monitor a metric named deltastreamer.duration. So the final metric name is foo.bar.deltastreamer.duration. A monitoring system normally aggregates the emitted values for the same metric name. Aggregation like avg() won't work if delta streamer emits values for different metric named like foo.bar_1.deltastreamer.duration, foo.bar_2.deltastreamer.duration, and so on.

@xushiyan
Copy link
Member

xushiyan commented Jun 7, 2020

is it suitable to change to <prefix>.<hudi table>.<i>.<a fixed hudi metric>

@leesf This is also problematic as _i since the metric name is changed. Aggregation works on the same metric name.

@sbernauer
Copy link
Contributor

If i read https://stackoverflow.com/a/55753138 correctly, normally you register an gauge only at startup (or first metric write) and than just update the value in every loop. Currently Deltastreamer tries to register the gauge in every loop. (

Metrics.registerGauge(getMetricsName("deltastreamer", "duration"), getDurationInMs(durationInNs));
)
It would be necessary to close the gauge at shutdown of the Deltastreamer, so if the Deltastreamer gets restarted the metric can be registered again.

@shenh062326
Copy link
Contributor

@pratyakshsharma I will add PR to support update metrics this weekend.

@pratyakshsharma
Copy link
Contributor Author

I guess this issue is already solved. Can we close it now @xushiyan ?

@xushiyan
Copy link
Member

@pratyakshsharma yes I think so. Thanks for taking on this!

@vinothchandar
Copy link
Member

@pratyakshsharma @xushiyan should this PR be closed?

@xushiyan
Copy link
Member

yup closing.

@xushiyan xushiyan closed this Aug 31, 2020
nsivabalan pushed a commit to nsivabalan/hudi that referenced this pull request Dec 14, 2025
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants