Skip to content

[HUDI-8647] Support Clustering Timeline Instant Metrics in HoodieMetrics#12466

Merged
zhangyue19921010 merged 13 commits intoapache:masterfrom
fhan688:support-timeline-metrics
Jan 3, 2025
Merged

[HUDI-8647] Support Clustering Timeline Instant Metrics in HoodieMetrics#12466
zhangyue19921010 merged 13 commits intoapache:masterfrom
fhan688:support-timeline-metrics

Conversation

@fhan688
Copy link
Contributor

@fhan688 fhan688 commented Dec 11, 2024

Change Logs

add hudi clustering timeline instant related metrics:
earliest_inflight_clustering_instant
latest_completed_clustering_instant
clustering_pendingClusteringCount

Impact

hudi-flink-datasource

Risk level (write none, low medium or high below)

Low

Documentation Update

None

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Dec 11, 2024
@fhan688 fhan688 changed the title Support Timeline Metrics [HUDI-8647] Support Timeline Metrics Dec 11, 2024
import org.apache.flink.runtime.operators.coordination.CoordinatorStore;
import org.apache.flink.runtime.operators.coordination.OperatorCoordinator;

public class ContextAdapter implements OperatorCoordinator.Context {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the coordinator always reference this class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OperatorCoordinator.Context is OK for flink version 1.18、1.19、1.20, define ContextAdapter for adapting flink version 1.14、1.15、1.16、1.17.

metricGroup.gauge(PENDING_CLUSTERING_COUNT, () -> clusteringPendingClusteringCount);
}

public void updateTimeLineMetrics(HoodieActiveTimeline activeTimeline) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so when the metrics is updated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for Teacher Danny's reminding. forget to call update metrics in StreamWriteOperatorCoordinator, metrics will be updated after do commit success.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danny0405 PTAL

reset();
this.ckpMetadata.commitInstant(instant);
LOG.info("Commit instant [{}] success!", instant);
this.flinkTimeLineMetrics.updateTimeLineMetrics(this.writeClient.getHoodieTable().getActiveTimeline());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BaseHoodieClient already got a HoodieMetrics instance, we can just enhance it with new items or add the FlinkTimeLineMetrics in HoodieFlinkWriteClient, and update it in HoodieFlinkWriteClient.postCommit, you need to override the #postCommit method so that we can eliminate the redundant timeline loading(file listing) for each commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for advising, finally enhanced in HoodieMetrics . @danny0405

@github-actions github-actions bot added size:S PR with lines of changes in (10, 100] and removed size:L PR with lines of changes in (300, 1000] labels Dec 18, 2024
public static final String FINALIZE_ACTION = "finalize";
public static final String INDEX_ACTION = "index";
public static final String SOURCE_READ_AND_INDEX_ACTION = "source_read_and_index";
public static final String CLUSTERTING_INSTANT_ACTION = "replacecommit";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we need to take care of clustering commit instead of replace commit

// Delete the marker directory for the instant.
WriteMarkersFactory.get(config.getMarkersType(), table, instantTime)
.quietDeleteMarkerDir(context, config.getMarkersDeleteParallelism());
if (metrics.getClusteringInstantTimerCtx() != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check Config config.isMetricsOn() instead of trying to create a Timer.
What we want is to monitor clustering and compaction instant here rat

Comment on lines +575 to +592
Set<String> validActions = CollectionUtils.createSet(HoodieMetrics.CLUSTERTING_INSTANT_ACTION);
HoodieActiveTimeline activeTimeline = table.getActiveTimeline();
HoodieTimeline inflightAndRequested = activeTimeline.filterInflightsAndRequested()
.filter(instant -> validActions.contains(instant.getAction()));
long pendingClusteringInstantCount = Long.valueOf(inflightAndRequested.countInstants());
long earliestInflightClusteringInstant = 0L;
Option<HoodieInstant> firstInstant = inflightAndRequested.firstInstant();
if (firstInstant.isPresent()) {
earliestInflightClusteringInstant = Long.valueOf(firstInstant.get().requestedTime());
}

HoodieTimeline completed = activeTimeline.filterCompletedInstants()
.filter(instant -> validActions.contains(instant.getAction()));
long latestCompletedClusteringInstant = 0L;
Option<HoodieInstant> lastInstant = completed.lastInstant();
if (lastInstant.isPresent()) {
latestCompletedClusteringInstant = Long.valueOf(lastInstant.get().requestedTime());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this logic into metrics.updateTimeLineClusteringInstantMetrics ? it's a little confusing that we do metrics computing in post commit here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for advising, updated according to all above comments.

@fhan688 fhan688 changed the title [HUDI-8647] Support Timeline Metrics [HUDI-8647] Support Clustering Timeline Instant Metrics in HoodieMetrics Dec 18, 2024
Copy link
Contributor

@zhangyue19921010 zhangyue19921010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fhan688
Copy link
Contributor Author

fhan688 commented Dec 30, 2024

Teacher Danny, PTAL @danny0405

@danny0405
Copy link
Contributor

@fhan688 Can you resolve the conflicts.

@zhangyue19921010
Copy link
Contributor

@hudi-bot run azure

@hudi-bot
Copy link
Collaborator

hudi-bot commented Jan 3, 2025

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@zhangyue19921010 zhangyue19921010 merged commit 72d17e5 into apache:master Jan 3, 2025
42 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:S PR with lines of changes in (10, 100]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants