Persist written records' latency as histogram in commit metadata #14766

@hudi-bot

Description

As a follow-up enhancement to the latency and freshness metrics, persist the latencies of each batch of written records as a histogram in the commit metadata. This would help implement watermarks and facilitate stream-stream joins.

Comments

04/Mar/21 02:39;xushiyan;Implementation notes carried over from the PR for HUDI-1587:

  • Consider supporting this feature in OverwriteWithLatestAvroPayload; currently it is only available when the writer is configured to use DefaultHoodieRecordPayload.
  • To persist the histogram, the Avro schema for commit metadata needs to be updated, along with its corresponding Java class.
  • It is better to reclassify this as commit metadata rather than metrics; commit metadata can then optionally be emitted as metrics.
  • Notes from the [email discussion|https://lists.apache.org/thread.html/r328a6ad2e51ed936dfd955d65809ea09232ad47044497d04d8c751ea%40%3Cdev.hudi.apache.org%3E] by [~vinoth]:
    • "If we can keep the time interval (i.e. the 1 min) configurable and also encode it along with the histogram, we can control the storage footprint better. Maybe also consider using something like t-digest for histogram?"
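
To make the "configurable interval encoded along with the histogram" idea concrete, here is a minimal sketch in plain Java. It is an illustration under assumptions, not Hudi code: the class name LatencyHistogram, its methods, and the "width;bucket:count,..." string encoding are all hypothetical, and a real implementation would extend the Avro commit metadata schema instead of using a string. It uses fixed-width buckets; a t-digest would be a different, more compact technique and is not shown.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a fixed-width latency histogram; not part of Hudi's API. */
final class LatencyHistogram {
  // Configurable bucket width in milliseconds, e.g. 60_000 for the 1-minute interval.
  private final long bucketWidthMs;
  // Bucket index -> number of records whose latency fell into that bucket.
  private final TreeMap<Long, Long> counts = new TreeMap<>();

  LatencyHistogram(long bucketWidthMs) {
    if (bucketWidthMs <= 0) {
      throw new IllegalArgumentException("bucketWidthMs must be positive");
    }
    this.bucketWidthMs = bucketWidthMs;
  }

  /** Records one written record's latency (commit time minus event time, floored at 0). */
  void record(long eventTimeMs, long commitTimeMs) {
    long latencyMs = Math.max(0L, commitTimeMs - eventTimeMs);
    counts.merge(latencyMs / bucketWidthMs, 1L, Long::sum);
  }

  /** Encodes the bucket width together with the sparse counts: "width;idx:count,idx:count". */
  String encode() {
    StringBuilder sb = new StringBuilder().append(bucketWidthMs).append(';');
    boolean first = true;
    for (Map.Entry<Long, Long> e : counts.entrySet()) {
      if (!first) {
        sb.append(',');
      }
      sb.append(e.getKey()).append(':').append(e.getValue());
      first = false;
    }
    return sb.toString();
  }

  /** Rebuilds a histogram from the string produced by encode(). */
  static LatencyHistogram decode(String encoded) {
    int sep = encoded.indexOf(';');
    LatencyHistogram h = new LatencyHistogram(Long.parseLong(encoded.substring(0, sep)));
    String body = encoded.substring(sep + 1);
    if (!body.isEmpty()) {
      for (String pair : body.split(",")) {
        String[] kv = pair.split(":");
        h.counts.put(Long.parseLong(kv[0]), Long.parseLong(kv[1]));
      }
    }
    return h;
  }

  /** Upper bound (ms) of the highest non-empty bucket, or 0 if nothing was recorded. */
  long maxLatencyUpperBoundMs() {
    return counts.isEmpty() ? 0L : (counts.lastKey() + 1) * bucketWidthMs;
  }
}
```

Because the bucket width travels with the counts, a reader can interpret the histogram without knowing the writer's configuration, and a wider interval directly reduces the number of buckets stored per commit. A watermark consumer could, for example, use maxLatencyUpperBoundMs() as a conservative bound on how late records in that commit arrived.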

Metadata

    Labels

    • area:writer (Write client and core write operations)
    • from-jira
    • priority:high (Significant impact; potential bugs)
    • type:improvement (Improvements to existing functionality)