Add new section on read from standby feature #20502
Added new pages and added them to the TOC
✅ Deploy Preview for cockroachdb-interactivetutorials-docs canceled.
✅ Deploy Preview for cockroachdb-api-docs canceled.
❌ Deploy Preview for cockroachdb-docs failed.
Fixed broken links
…om:cockroachdb/docs into 2025-10-01-doc-13854-add-read-from-standby
Build was failing because summary was too short
nice start!
SELECT region, SUM(amount) FROM orders GROUP BY region;

The results of queries on the standby cluster reflect the state of the primary cluster as of the replicated time.
nit: i think a more accurate way to say this is: "reads are always served at a historical time that approaches the replicated time." (it's not quite the replicated time currently)
Would it be accurate to word this as "The results of queries on the standby cluster reflect the state of the primary cluster as of a historical time that approaches the replicated time"?
@msbutler is this due to the lag between primary and standby AND readervc?
@peachdawnleach yeah i like that language.
@alicia-l2 the lag between the AOST the user provides and the AOST actually used is due to a very annoying technical bug i really want to address. It has to do with the fact that the reader tenant descriptors are updated after the replicated data comes in on the replicating tenant.
@msbutler do we have a github item for this bug? imo we should get this in for 26.1 if we can.
i would market this as a "known limitation" :D
Added this line to the doc
The output provides the following information:
- the replication status of the standby cluster
- the timestamp of the most recently applied event on the standby cluster
- any lag relative to the primary cluster
to see the replicated time used by reader queries specifically, we could point users to the resolved time posted by the standby poller job on the reader vc. maybe that's a follow up item for this page cc @alicia-l2
"resolved time posted by the standby poller job on the reader vc."
How does one get to this?
run SHOW JOBS on the reader tenant. Or view the job on the db console of the reader tenant.
Does this language work here?
"For the actual replicated time of a specific query on the ReaderVC, find the resolved time posted by the standby poller job on the ReaderVC. You can find this information by viewing the job on the DB Console, or by running SHOW JOBS on the ReaderVC."
"For the actual replicated time of a specific query on the ReaderVC, find the resolved time posted by the standby poller job on the ReaderVC. You can find this information by viewing the job on the DB Console, or by running
SHOW JOBS on the ReaderVC."
i don't think that's quite right either.
I would be in favor of removing this "Monitor replication lag" section altogether, because monitoring pcr lag is a separate user flow from monitoring the reader tenant workflow. It feels a bit redundant to explain SHOW VIRTUAL CLUSTER WITH REPLICATION STATUS here and over here.
Above, where you write "historical time that approaches the replicated time", you could instead link "replicated time" to the show tenant with replication status page here.
I will sync with alicia on the correct ux for monitoring reader tenant queries.
Looks good to me! Can we remove 'poller' though? That's an internal term.
Can we also say "Standby cluster's DB Console"?
@alicia-l2 what do you think of my point about removing the "monitor replication lag" section?
Oh yeah, tbh I think that once we address that bug you were talking about we can then fix this. I'm fine removing. @peachdawnleach
great start! one thing to note is that we probably also have to edit the sql syntax pages for ALTER VIRTUAL CLUSTER: https://deploy-preview-20502--cockroachdb-docs.netlify.app/docs/v25.3/alter-virtual-cluster
## How the read from standby feature works

PCR utilizes cluster virtualization to separate clusters' control planes from their data planes. A cluster always has one control plane, called a _system virtual cluster (SystemVC)_, and at least one data plane, called an _App Virtual Cluster (AppVC)_. A cluster's SystemVC manages PCR jobs and cluster metadata, and is not used for application queries. All data tables, system tables, and cluster settings in the standby cluster's AppVC are identical to the primary cluster's AppVC. The standby cluster's AppVC itself remains offline during replication.
"The standby cluster's AppVC itself remains offline during replication."
we should say the reason, @msbutler can you help here?
Small changes based on tech review
Additional changes from review
Small wording changes
lgtm just some tiny nits - thanks!
Removed monitor replication lag
LGTM, non-blocking comments to take or leave
Added link
Added links
Added see also links
Addresses: DOC-13854
Adding information about the new read from standby feature in PCR