HDDS-15413. Recon and SCM Container Sync Metrics addition. #10384

Open
devmadhuu wants to merge 2 commits into apache:master from devmadhuu:HDDS-15413

@devmadhuu
Contributor

What changes were proposed in this pull request?

This PR extends ReconScmContainerSyncMetrics to expose per-state metrics for the four actively reconciled container states:

  • OPEN
  • QUASI_CLOSED
  • CLOSED
  • DELETED

For each state, Recon now reports:

  • Last sync-pass duration in milliseconds.
  • Last pre-sync observed container-count drift, computed as SCM count - Recon count.
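The per-state bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ReconciledState` is a hypothetical stand-in for `HddsProtos.LifeCycleState`, and `PerStateSyncSketch` with its method names is invented for the example. It only shows the idea of keeping one "last duration" and one "last drift" value per state, with drift computed as SCM count minus Recon count.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for HddsProtos.LifeCycleState,
// limited to the four reconciled states named in the description.
enum ReconciledState { OPEN, QUASI_CLOSED, CLOSED, DELETED }

// Illustrative holder only; not the actual ReconScmContainerSyncMetrics class.
final class PerStateSyncSketch {
  private final Map<ReconciledState, AtomicLong> lastDurationMs =
      new EnumMap<>(ReconciledState.class);
  private final Map<ReconciledState, AtomicLong> lastCountDrift =
      new EnumMap<>(ReconciledState.class);

  PerStateSyncSketch() {
    // Pre-populate every state so lookups never return null.
    for (ReconciledState state : ReconciledState.values()) {
      lastDurationMs.put(state, new AtomicLong());
      lastCountDrift.put(state, new AtomicLong());
    }
  }

  // Drift observed before the sync pass, computed as SCM count - Recon count.
  // Positive: SCM has more containers in this state; negative: Recon has more.
  void recordDrift(ReconciledState state, long scmCount, long reconCount) {
    lastCountDrift.get(state).set(scmCount - reconCount);
  }

  // Duration of the most recent sync pass for this state, in milliseconds.
  void recordDurationMs(ReconciledState state, long durationMs) {
    lastDurationMs.get(state).set(durationMs);
  }

  long getDrift(ReconciledState state) {
    return lastCountDrift.get(state).get();
  }

  long getDurationMs(ReconciledState state) {
    return lastDurationMs.get(state).get();
  }
}
```

In this sketch each state overwrites its previous value on every pass, so a reader of the metric sees only the most recent cycle; any history would have to come from the time-series store that scrapes it.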

The existing overall targeted sync metrics remain unchanged:

  • targetedSyncStatus
  • lastTargetedSyncDurationMs

Why are the changes needed?

Recon periodically syncs container state from SCM, every 6 hours by default. Before this change, metrics only showed the overall targeted sync status and total duration. Admins could not tell:

  • How long the sync pass for each state took.
  • Whether the latest cycle observed count drift for a specific state.
  • Whether SCM had more or fewer containers than Recon for a given reconciled state.

The new metrics make this visible in Hadoop metrics and downstream Prometheus time-series data.

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-15413

How was this patch tested?

Ran the following tests:

TestReconStorageContainerSyncHelper, TestReconScmContainerSyncMetrics

@devmadhuu marked this pull request as ready for review May 29, 2026 11:26
Contributor

@sumitagrawl left a comment

@devmadhuu Thanks for working on this. I have left a few comments.

};

private static final MetricsInfo TARGETED_SYNC_STATUS = Interns.info(
"targetedSyncStatus",
Contributor

This can be renamed to scmContainerSyncStatus.

Contributor Author

Done.

private final Map<HddsProtos.LifeCycleState, AtomicLong>
lastContainerSyncDurationMs;
private final Map<HddsProtos.LifeCycleState, AtomicLong>
lastContainerCountDrift;
Contributor

We do not need to capture the last container sync count and duration as separate metrics, since Grafana will already capture them.

Contributor Author

Done.

@devmadhuu requested a review from sumitagrawl June 5, 2026 06:57