Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

streamingccl: add replication lag to SHOW VIRTUAL CLUSTER results #120782

Merged
merged 2 commits into from Mar 25, 2024

Conversation

msbutler
Copy link
Collaborator

With this patch, SHOW VIRTUAL CLUSTER ... WITH REPLICATION STATUS will display the replication lag.

Fixes #120647

Release note (sql change): SHOW VIRTUAL CLUSTER ... WITH REPLICATION STATUS now displays the PCR replication lag.

@msbutler msbutler self-assigned this Mar 20, 2024
@msbutler msbutler requested review from a team as code owners March 20, 2024 20:10
@msbutler msbutler requested review from dt and stevendanna and removed request for a team March 20, 2024 20:10
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@@ -305,6 +305,7 @@ var TenantColumnsWithReplication = ResultColumns{
{Name: "source_tenant_name", Typ: types.String},
{Name: "source_cluster_uri", Typ: types.String},
{Name: "replication_job_id", Typ: types.Int},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chatted with @alicia-l2 yesterday and we were we would remove replication_job_id to make space for the new lag column. She also thought it would be more intuitive to order it as retained_time, replicated_time, replication_lag, cutover_time

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm. Are we sure about this? For instance if you find your replication job PAUSED or FAILED, how confident are we that in this command we always show the relevant error output? I suppose SELECT * FROM [SHOW JOBS] WHERE job_type = 'STREAM INGESTION' isn't too hard. but it is a bit more than copy/pasting the job ID and doing SHOW JOB.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how confident are we that in this command we always show the relevant error output

This sounds like something we should be on the hook to find/fix. Needing to go find the job ID in show jobs or the db console is slightly annoying sure, but if it is just something that we would be doing when debugging a bug and then we go fix the status update code, that seems like the right trade for the end-user UX being that we show them just the things that are relevant to them, which is the replication status?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One observation is that this table is much easier to read when it fits on one line. Once it wraps it is much harder to line up which timestamp your'e looking at with the headers. The job_id's cost, in line width and noise, to most users seems higher than the benefit give our not-job-centric UX here, so if it is just there for debugging by CRL engineers, I think we can say we're on the hook for finding it the slightly harder way to reclaim the line width.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move the status column all the way to the right while we're here in case there is a long error in it?

I think we should. We'll need to reach out to preview users who have queries against this already, but that is fine.

(... in the with replication status version) and rename data_state to status and remove service_mode from this version while we're at it?

I think in the original command we were thinking that WITH would only add columns. But, if we are making breaking changes we shouldn't be shy.

Copy link
Member

@dt dt Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WITH would only add columns.

Right, we started with "just add columns", but I think we can see now in hands-on usage that just adding more columns doesn't actually get us the best UX.

Maybe we should just have our own SHOW command instead of WITH?

sadly I suspect SHOW VIRTUAL CLUSTER REPLICATION STATUS won't work since REPLICATION isn't reserved and that position can be a name?

I think we can just keep WITH ... but change the whole result cols, rather than just appending to the right.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I'll go ahead and rename data_state to status and remove service_mode only in with replication status

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I'll go ahead and rename data_state to status and remove service_mode only in with replication status

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just pushed up another commit. if CI is pleased, I'll squash and merge.

pkg/sql/show_tenant.go Outdated Show resolved Hide resolved
@msbutler msbutler force-pushed the butler-show-lag branch 2 times, most recently from 855bef3 to 81ba7e7 Compare March 20, 2024 21:14
pkg/sql/show_tenant.go Outdated Show resolved Hide resolved
With this patch, SHOW VIRTUAL CLUSTER ... WITH REPLICATION STATUS will display
the replication lag.

Fixes cockroachdb#120647

Release note (sql change): SHOW VIRTUAL CLUSTER ... WITH REPLICATION STATUS now
displays the PCR replication lag.
@msbutler msbutler force-pushed the butler-show-lag branch 2 times, most recently from 7143d60 to 68c3493 Compare March 21, 2024 14:08
@msbutler msbutler added the do-not-merge bors won't merge a PR with this label. label Mar 21, 2024
@dt
Copy link
Member

dt commented Mar 25, 2024

this looks like it needs a ./dev testlogic base --files tenant --rewrite

Epic: none

Release note (sql change): This patch refactors the output of SHOW VC WITH
REPLICATION STATUS to 1) remove the replication_job_id, service_mode return
fields; 2) reorder to following columns to: retained_time, replicated_time,
replication_lag, cutover_time; 3) rename the data_state field status and moves
to the last return column.
@msbutler
Copy link
Collaborator Author

fixed

@msbutler msbutler removed the do-not-merge bors won't merge a PR with this label. label Mar 25, 2024
@msbutler
Copy link
Collaborator Author

TFTRs!

bors r=dt

@craig
Copy link
Contributor

craig bot commented Mar 25, 2024

@craig craig bot merged commit 4cc2bbd into cockroachdb:master Mar 25, 2024
21 of 22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PCR: add replication lag to SHOW ... STATUS
4 participants