
changefeedccl: add changefeed.emitted_batch_sizes metric #115537

Merged
merged 1 commit into cockroachdb:master on Dec 5, 2023


jayshrivastava
Contributor

This change introduces a new histogram metric for emitted batch sizes, which will help debug batching issues when they arise.

This change also introduces a new histogram bucket configuration, `DataCount16MBuckets`, with 24 exponentially distributed buckets spanning 1 to 16M. Currently, the largest batch observed in changefeeds is 670k rows (parquet with 16MB file sizes), and no existing histogram bucket configuration was suitable for measuring batches of that scale. Larger file sizes can also be configured (e.g., customers have used 128MB), so 16M is a generous upper bound on how large changefeed batches may be.
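To make "24 exponentially distributed buckets from 1 to 16M" concrete, here is a minimal Go sketch that computes geometrically spaced bucket upper bounds and looks up which bucket a given batch size falls into. The helpers `exponentialBuckets` and `bucketIndex` are hypothetical and written only for illustration; this is not the `DataCount16MBuckets` implementation, which may use a different growth factor or boundary rule. The sketch assumes the top bound is exactly 2^24 (16,777,216), matching `MaxVal: 1 << 24` in the diff.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// exponentialBuckets returns count bucket upper bounds spaced geometrically
// from minVal to maxVal (both inclusive). Requires count >= 2 and 0 < minVal < maxVal.
// Hypothetical helper for illustration; not CockroachDB's DataCount16MBuckets code.
func exponentialBuckets(minVal, maxVal float64, count int) []float64 {
	factor := math.Pow(maxVal/minVal, 1/float64(count-1))
	bounds := make([]float64, count)
	for i := range bounds {
		bounds[i] = minVal * math.Pow(factor, float64(i))
	}
	// Pin the last bound to maxVal so floating-point error cannot shift it.
	bounds[count-1] = maxVal
	return bounds
}

// bucketIndex returns the index of the first bucket whose upper bound is >= v.
// It returns len(bounds) when v exceeds every bound (an overflow bucket).
func bucketIndex(bounds []float64, v float64) int {
	return sort.SearchFloat64s(bounds, v)
}

func main() {
	bounds := exponentialBuckets(1, 1<<24, 24)
	fmt.Printf("growth factor per bucket: %.3f\n", bounds[1]/bounds[0])
	for _, rows := range []float64{1, 500, 670_000, 16_000_000, 20_000_000} {
		i := bucketIndex(bounds, rows)
		if i == len(bounds) {
			fmt.Printf("batch of %.0f rows -> overflow (above %.0f)\n", rows, bounds[len(bounds)-1])
			continue
		}
		fmt.Printf("batch of %.0f rows -> bucket %d (upper bound %.0f)\n", rows, i, bounds[i])
	}
}
```

With 24 bounds between 1 and 2^24, each bucket is about 2.06 times wider than the previous one, so the 670k-row parquet batch lands in bucket 19, whose upper bound is roughly 930k rows. Geometric spacing keeps relative resolution roughly constant across six orders of magnitude, which a linear layout with the same bucket count could not do.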

Release note (ops change): This change introduces the `changefeed.emitted_batch_sizes` histogram metric, which measures the batch sizes used when emitting data to sinks. This metric supports metrics labels.

Closes: #114141


@jayshrivastava jayshrivastava marked this pull request as ready for review December 4, 2023 15:57
@jayshrivastava jayshrivastava requested review from a team as code owners December 4, 2023 15:57
@jayshrivastava jayshrivastava requested review from abarganier and miretskiy and removed request for a team December 4, 2023 15:57
```go
EmittedBatchSizes: b.Histogram(metric.HistogramOptions{
	Metadata: metaChangefeedEmittedBatchSizes,
	Duration: histogramWindow,
	MaxVal:   1 << 24, /* 16M max batch size */
	// ... (remainder of the diff hunk not shown)
}),
```
Contributor


Want to use 16e6?
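For context on the review question: `1 << 24` and `16e6` differ by 777,216, i.e. under 5%. The small Go program below only illustrates that arithmetic; it assumes nothing about `metric.HistogramOptions` beyond `MaxVal` being an integer field. Either spelling compiles for an integer field, because `16e6` is an untyped constant that is exactly representable as an integer.

```go
package main

import "fmt"

// Candidate upper bounds for the batch-size histogram discussed in review.
const (
	maxBatchPow2    int64 = 1 << 24 // 16,777,216 (2^24, "16Mi")
	maxBatchDecimal int64 = 16e6    // 16,000,000; legal because 16e6 is an exact integer constant
)

func main() {
	diff := maxBatchPow2 - maxBatchDecimal
	fmt.Printf("1<<24 = %d, 16e6 = %d, difference = %d\n", maxBatchPow2, maxBatchDecimal, diff)
}
```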

@jayshrivastava jayshrivastava force-pushed the pubsub-obs branch 2 times, most recently from 810d668 to 0a271ad Compare December 4, 2023 18:49
@jayshrivastava
Contributor Author

bors r=miretskiy

@craig
Contributor

craig bot commented Dec 5, 2023

Build succeeded:

@craig craig bot merged commit 8bcaae9 into cockroachdb:master Dec 5, 2023
9 checks passed
Development

Successfully merging this pull request may close these issues.

changefeedccl: add observability for pubsub batch sizes