Skip to content

Add lastCommittedSnapshotId commit metric and document missing metrics#7589

Open
junmuz wants to merge 3 commits into
apache:masterfrom
junmuz:adding_additional_metrics
Open

Add lastCommittedSnapshotId commit metric and document missing metrics#7589
junmuz wants to merge 3 commits into
apache:masterfrom
junmuz:adding_additional_metrics

Conversation

@junmuz
Copy link
Copy Markdown
Contributor

@junmuz junmuz commented Apr 2, 2026

Purpose

  • Add lastCommittedSnapshotId gauge to commit metrics, exposing the snapshot ID produced by the most recent commit (defaults to -1 before any commit)
  • Capture the snapshot ID directly from FileStoreCommitImpl after successful commit, avoiding race conditions with concurrent writers
  • Document lastScannedSnapshotId and lastCommittedSnapshotId in the metrics reference page

Tests

The metrics have been verified to be emitted correctly from Flink Jobs. Also new test cases have been added.

  • CommitStatsTest verifies lastCommittedSnapshotId is correctly stored for empty, non-empty, and compaction commits
  • CommitMetricsTest verifies the gauge registers with default -1, then updates correctly across multiple report cycles

@junmuz junmuz force-pushed the adding_additional_metrics branch from 87cc332 to 08324fc Compare April 2, 2026 15:02
@junmuz junmuz marked this pull request as ready for review April 2, 2026 17:44
@junmuz
Copy link
Copy Markdown
Contributor Author

junmuz commented Apr 2, 2026

@JingsongLi Can I get a review please.

<td>Distributions of the time taken by the last few scans.</td>
</tr>
<tr>
<td>lastScannedSnapshotId</td>
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This metric has been added previously, but the docs doesn't reference it.

@junmuz
Copy link
Copy Markdown
Contributor Author

junmuz commented Apr 28, 2026

@JingsongLi @XiaoHongbo-Hope I have rebased the branch and resolved the conflicts. Can I get a review over it? Thanks.

@XiaoHongbo-Hope
Copy link
Copy Markdown
Contributor

@JingsongLi @XiaoHongbo-Hope I have rebased the branch and resolved the conflicts. Can I get a review over it? Thanks.

sure

if (strictModeChecker != null) {
strictModeChecker.update(newSnapshotId);
}
lastCommittedSnapshotId = newSnapshotId;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This misses the duplicate-success retry path. In false-success retry, the earlier duplicate-check branch returns before this assignment, so the gauge can stay -1.Should we also set lastCommittedSnapshotId = snapshot.id() there?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I have made changes to handle that.

@junmuz junmuz force-pushed the adding_additional_metrics branch from b76be0f to c57ef2f Compare April 28, 2026 13:49
junmuz added 3 commits April 28, 2026 08:04
…ommit

Read the actual latest snapshot ID from SnapshotManager after commit
instead of computing it from a pre-commit baseline, avoiding race
conditions with concurrent writers. Also adds additional commit and
source reader metrics.
- Fix lastCommittedSnapshotId to default to -1 and read actual snapshot
  ID post-commit instead of querying snapshotManager
- Remove currentConsumerId metric from ConsumerProgressCalculator and
  ContinuousFileSplitEnumerator as lastScannedSnapshotId already exists
- Add lastScannedSnapshotId and lastCommittedSnapshotId to metrics docs
@junmuz junmuz force-pushed the adding_additional_metrics branch from c57ef2f to 9f0d40a Compare April 28, 2026 15:04
@junmuz junmuz requested a review from XiaoHongbo-Hope April 28, 2026 16:24
Copy link
Copy Markdown
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good observability improvement. lastCommittedSnapshotId is useful for monitoring commit progress and detecting lag.

Review:

  1. Capturing snapshot ID directly from FileStoreCommitImpl after successful commit avoids race conditions with concurrent writers. Good.

  2. Default -1 before any commit is reasonable — distinguishes "no commit yet" from snapshot 0.

  3. Documentation: Adding lastScannedSnapshotId and lastCommittedSnapshotId to the metrics reference page is valuable for operators.

  4. +54/-11 is clean. Tests cover empty, non-empty, and compaction commits.

  5. Minor: The Flink source reader metrics file is also touched — is lastCommittedSnapshotId exposed at both the source reader and commit sides? Ensure the semantics are clear (source tracks what it has read vs. commit tracks what was written).

LGTM.

@JingsongLi
Copy link
Copy Markdown
Contributor

Please rebase master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants