Changefeed monitoring guide #19296
f41db60 to 8db2635
edad6e5 to 1cd9186
rohan-joshi
left a comment
Made some comments - really appreciate all of this!
- [Sink errors over time](#sink-errors)
- [Retry counts](#downstream-delivery)

## Common troubleshooting scenarios
After some thinking, let's remove the "Common troubleshooting scenarios" section, including all of its subsections (high end-to-end latency, rangefeed pressure, sink performance issues).
- Scoped by `changefeed_job_id`
- Supported Versions: v23.2.13+, v24.1.6+, v24.2.4+, v24.3.0+

## Suggested dashboards
Let's remove this section entirely
- Resource usage during catch-up after restarts.
- Supported Versions: v23.2.3+, v24.1.0+

### End-to-end performance metrics
Because we want to limit the verbosity of metrics for readers, let's remove the entire end-to-end performance metrics section.
@rohan-joshi I've removed the requested sections. When we chatted, you mentioned keeping the diagram as is. Is that still the case? I suppose the end-to-end component of that could be a source of confusion if we're not calling it out anywhere. Happy to update the diagram, if you think best. Let me know! (p.s. I do think we actually mention most of the end-to-end metrics elsewhere in the cdc docs)

@rohan-joshi Just a friendly ping on the question above!

@kathancox yikes. Sorry for not responding. I'd rather not hide the subcomponents

OK thanks! @rohan-joshi, @asg0451 any other changes you want me to make, before I push to a docs writer for a final review?
rohan-joshi
left a comment
lgtm
### High-level performance metrics

- Metric: `(now() - changefeed.checkpoint_progress)`
Replace this with `changefeed.max_behind_nanos`, which will be scoped in the next backport releases.
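For context, the expression `(now() - changefeed.checkpoint_progress)` quoted in the diff above amounts to a simple lag calculation. Here is a minimal sketch, assuming the checkpoint value is a Unix timestamp in nanoseconds (an assumption; the thread does not state the unit). The function name `checkpoint_lag_seconds` is hypothetical and used only for illustration.

```python
import time
from typing import Optional


def checkpoint_lag_seconds(checkpoint_progress_ns: int,
                           now_ns: Optional[int] = None) -> float:
    """Return how far the checkpoint trails the current time, in seconds.

    ASSUMPTION: checkpoint_progress_ns is a Unix timestamp in nanoseconds.
    now_ns can be passed explicitly to make the result reproducible.
    """
    if now_ns is None:
        now_ns = time.time_ns()
    return (now_ns - checkpoint_progress_ns) / 1e9


# Example with fixed inputs: a checkpoint 2.5 seconds behind "now".
lag = checkpoint_lag_seconds(1_000_000_000, now_ns=3_500_000_000)
```

A growing value would suggest the changefeed is falling behind, which is why the review suggests a dedicated metric rather than a derived expression.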
#### Batch latency

- Metric: `changefeed.sink_batch_hist_nanos`
We may need to note that some metric names differ in the Datadog integration (e.g., they lose the `_nanos` suffix), and that Prometheus replaces dots with underscores. This applies to all metrics.
Done!
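The naming differences raised in this thread can be sketched as two small transformations. This is only an illustration of the two rules mentioned in the comment (dots become underscores in Prometheus; the `_nanos` suffix may be dropped in the Datadog integration). The helper names are hypothetical, and the actual integration names may differ in other ways, so check the relevant integration docs before relying on them.

```python
def to_prometheus_name(metric: str) -> str:
    """Replace dots with underscores, per the rule described in the review."""
    return metric.replace(".", "_")


def drop_nanos_suffix(metric: str) -> str:
    """Drop a trailing `_nanos` suffix if present.

    ASSUMPTION: this models only the single Datadog difference mentioned
    in the review comment, not the integration's full naming scheme.
    """
    suffix = "_nanos"
    return metric[: -len(suffix)] if metric.endswith(suffix) else metric


name = "changefeed.sink_batch_hist_nanos"
prometheus_name = to_prometheus_name(name)   # "changefeed_sink_batch_hist_nanos"
without_suffix = drop_nanos_suffix(name)     # "changefeed.sink_batch_hist"
```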
I also just realized the "Supported versions" bullet for each metric doesn't make sense in versioned docs. I'm going to backport the page to the relevant versions and then have that bullet point list the correct version.

@asg0451 I've worked on some of your feedback in the latest commit. However, it looks like your feedback preceded the edits I made for Rohan, which removed a lot of content. Can you PTAL and see if the change to

In a Slack conversation, I suggested + confirmed the following changes, which would resolve any discrepancy in feedback @rohan-joshi / @asg0451:
@asg0451 @rohan-joshi These are now implemented. Kept both I'll push to a docs review later this afternoon (3/13) Thanks both!
rmloveland
left a comment
LGTM. Non-blocking comment: I don't think the v25.1 docs are the right place to litigate all this per-metric old supported-versions info, but I'm guessing there are $reasons, so... LGTM
Yes, you're right! My mistake, I removed it from the v24.3 docs and then forgot to remove it from the v25.1 docs.
849c82e to c81d7f0
yay
Fixes DOC-11998
This PR adds a guide for monitoring changefeeds, particularly as a pipeline. Includes recommended metrics, information on potential impact of high values, and suggested dashboards.
Preview
https://deploy-preview-19296--cockroachdb-docs.netlify.app/docs/v25.1/changefeed-monitoring-guide.html