As a follow-up enhancement to the latency and freshness metrics, persist the latencies of a batch of records as a histogram in the commit metadata. This helps implement watermarks and facilitates stream-stream joins.
JIRA info
Comments
04/Mar/21 02:39;xushiyan;Some previous implementation notes from PR for HUDI-1587
- consider supporting this feature in OverwriteWithLatestAvroPayload; currently it is only available when configured to use DefaultHoodieRecordPayload
- to persist the histogram, the Avro schema for commit metadata needs to be updated, as well as its corresponding Java class.
- it is better to reclassify this as commit metadata instead of metrics. Commit metadata can optionally be emitted as metrics.
- some notes from the [email discussion|https://lists.apache.org/thread.html/r328a6ad2e51ed936dfd955d65809ea09232ad47044497d04d8c751ea%40%3Cdev.hudi.apache.org%3E] by [~vinoth]
** If we can keep the time interval (i.e., the 1 min) configurable and also encode it along with the histogram, we can control the storage footprint better. Maybe also consider using something like t-digest for the histogram?;;;
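The suggestion above (a configurable bucket interval that is encoded together with the histogram) could be sketched roughly as follows. This is only an illustration in Python, not Hudi code: the class `LatencyHistogram`, the field names `intervalMs` and `buckets`, and the JSON encoding are all assumptions made for the example. The actual commit metadata would go through its Avro schema and Java class.

```python
import json
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LatencyHistogram:
    """Hypothetical fixed-width latency histogram (not a Hudi class)."""

    interval_ms: int = 60_000  # bucket width; assumed default of 1 min, configurable
    counts: Counter = field(default_factory=Counter)

    def record(self, latency_ms: int) -> None:
        # The bucket index is the number of whole intervals in the latency.
        self.counts[latency_ms // self.interval_ms] += 1

    def to_json(self) -> str:
        # Store the interval next to the counts so a reader can map
        # bucket indexes back to latency ranges without external config.
        return json.dumps({
            "intervalMs": self.interval_ms,
            "buckets": {str(k): v for k, v in sorted(self.counts.items())},
        })

    @classmethod
    def from_json(cls, payload: str) -> "LatencyHistogram":
        data = json.loads(payload)
        hist = cls(interval_ms=data["intervalMs"])
        hist.counts.update({int(k): v for k, v in data["buckets"].items()})
        return hist
```

Only non-empty buckets are stored, so a wider interval yields fewer entries, which is the storage-footprint knob mentioned above. A t-digest would replace the fixed-width buckets with adaptive centroids and is not shown here.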