[DNM] kv: add environment variable to inject retries #54695
base: master
…ptors Before this patch, if a restart occurred while trying to publish descriptors, we would short-circuit publishing the descriptors on the subsequent attempt. The approach was validated by stressing `TestBackupRestoreResume` with the patch in cockroachdb#54695 which previously reproduced the failure immediately. Release note (bug fix): Fixed a bug whereby a transaction restart at the wrong moment during a restore could leave descriptors offline after the restore completed successfully.
This patch extends the testserver to inject retry errors in calls to send batch requests at a rate indicated by an environment variable. This should help us uncover bugs due to bad handling of transaction retries. Release note: None
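The mechanism described above can be sketched roughly as follows. This is a minimal illustration, not the code from this patch: the environment variable name `EXAMPLE_INJECT_RETRY_RATE`, the `SendFunc` type, and `ErrInjectedRetry` are all hypothetical names chosen for the example, and the real patch hooks into the testserver's batch-sending path rather than a plain function.

```go
package main

import (
	"errors"
	"math/rand"
	"os"
	"strconv"
)

// ErrInjectedRetry is a hypothetical stand-in for a retryable transaction error.
var ErrInjectedRetry = errors.New("injected retry error")

// retryRateEnvVar is a made-up name; the patch's actual variable is not shown here.
const retryRateEnvVar = "EXAMPLE_INJECT_RETRY_RATE"

// SendFunc models sending one batch request and receiving a response.
type SendFunc func(req string) (string, error)

// parseRetryRate converts the variable's value into a probability in [0, 1].
// Empty, malformed, NaN, or out-of-range values disable injection.
func parseRetryRate(s string) float64 {
	rate, err := strconv.ParseFloat(s, 64)
	if err != nil || !(rate >= 0 && rate <= 1) {
		return 0
	}
	return rate
}

// WithInjectedRetries wraps send so that roughly a fraction `rate` of calls
// fail with ErrInjectedRetry before the request reaches send. The sample
// function must return values in [0, 1); it is a parameter so that tests
// can make the behavior deterministic.
func WithInjectedRetries(send SendFunc, rate float64, sample func() float64) SendFunc {
	return func(req string) (string, error) {
		if rate > 0 && sample() < rate {
			return "", ErrInjectedRetry
		}
		return send(req)
	}
}

// WithInjectedRetriesFromEnv reads the rate from the environment once and
// uses the global math/rand source for sampling.
func WithInjectedRetriesFromEnv(send SendFunc) SendFunc {
	rate := parseRetryRate(os.Getenv(retryRateEnvVar))
	return WithInjectedRetries(send, rate, rand.Float64)
}
```

Under these assumptions, a test run with the variable set to, say, `0.1` would see about one in ten sends fail with a retryable error, which forces every caller's retry path to be exercised.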
54748: backupccl: fix bug due to transaction restart while publishing descri… r=ajwerner a=ajwerner …ptors Before this patch, if a restart occurred while trying to publish descriptors, we would short-circuit publishing the descriptors on the subsequent attempt. The approach was validated by stressing `TestBackupRestoreResume` with the patch in #54695 which previously reproduced the failure immediately. Release note (bug fix): Fixed a bug whereby a transaction restart at the wrong moment during a restore could leave descriptors offline after the restore completed successfully. Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
This bug was caught by testing with cockroachdb#54695. Before that change, it would fail almost immediately; now it does not fail under stress. I'm open to suggestions on how to more generally test this. Release note (bug fix): Fixed a rare bug which can lead to index backfills failing in the face of transaction restarts.
54261: ui: add transactions page to Admin UI r=dhartunian a=elkmaster
This commit adds the new Transactions Page to the Admin UI. This page is much like the Statements Page but differs in that it shows Transaction-level statistics in tabular and detail form. Every Transaction is able to display its contained Statements for more detailed analysis. The page and its components are imported from the `admin-ui-components` library.
Depends on: cockroachdb/yarn-vendored#38
Release justification: low-risk high impact addition to Admin UI
Release note (admin ui change): add Transactions and Transactions details pages. These pages allow for viewing stats at the transaction level.

54749: cli/demo: provision an initial non-`root` "demo" user r=irfansharif a=knz
Fixes #54557. Requested by @jseldess
First 2 commits from #54741 (can be ignored during review)
Release note (cli change): `cockroach demo` now pre-creates a `demo` user account with a random password, instead of letting (and encouraging) the user to use the `root` account directly. The `demo` account is currently granted the `admin` role.

54755: sql: fix bug whereby backfiller would drop spans on txn restart r=ajwerner a=ajwerner
This bug was caught by testing with #54695. Before that change, it would fail almost immediately; now it does not fail under stress. I'm open to suggestions on how to more generally test this.
Release note (bug fix): Fixed a rare bug which can lead to index backfills failing in the face of transaction restarts.

Co-authored-by: Vlad <carrott9@gmail.com>
Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
…lishing tables Before this patch, if a restart occurred while trying to publish descriptors, we would short-circuit publishing the descriptors on the subsequent attempt. The approach was validated by stressing TestBackupRestoreResume with the patch in cockroachdb#54695 which previously reproduced the failure immediately. Release note (bug fix): Fixed a bug whereby a transaction restart at the wrong moment during a restore could leave descriptors offline after the restore completed successfully.
I've been playing around with improving bulk jobs' testing around these txn retries and am curious whether there's a path forward to getting something like this merged, and what can be done to help our testing in this area. I think that a setting like this is useful. Do we have any worries about introducing this setting generally? I played around with a very early prototype of introducing these kinds of retries specifically when we create new txns in bulk jobs: #56074, but it might be nice to get the more general testing that this provides.