Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DNM] kv: add environment variable to inject retries #54695

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ajwerner
Copy link
Contributor

This patch extends the testserver to inject retry errors in calls to send
batch requests at a rate indicated by an environment variable. This should
help us uncover bugs due to bad handling of transaction retries.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@ajwerner ajwerner force-pushed the ajwerner/inject-random-restart-errors branch from e0a3c10 to e76fbc3 Compare September 23, 2020 14:16
@ajwerner ajwerner marked this pull request as draft September 23, 2020 14:19
@ajwerner ajwerner force-pushed the ajwerner/inject-random-restart-errors branch from e76fbc3 to 0a68a74 Compare September 23, 2020 15:37
@ajwerner ajwerner changed the title [DNM] testserver: add environment variable to inject retries [DNM] kv: add environment variable to inject retries Sep 23, 2020
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Sep 23, 2020
…ptors

Before this patch, if a restart occurred while trying to publish descriptors,
we would short-circuit publishing the descriptors on the subsequent attempt.

The approach was validated by stressing `TestBackupRestoreResume` with the
patch in cockroachdb#54695 which previously
reproduced the failure immediately.

Release note (bug fix): Fixed a bug whereby a transaction restart at the
wrong moment during a restore could leave descriptors offline after the
restore completed successfully.
This patch extends the testserver to inject retry errors in calls to send
batch requests at a rate indicated by an environment variable. This should
help us uncover bugs due to bad handling of transaction retries.

Release note: None
@ajwerner ajwerner force-pushed the ajwerner/inject-random-restart-errors branch from 0a68a74 to a53eb00 Compare September 23, 2020 21:14
craig bot pushed a commit that referenced this pull request Sep 24, 2020
54748: backupccl: fix bug due to transaction restart while publishing descri… r=ajwerner a=ajwerner

…ptors

Before this patch, if a restart occurred while trying to publish descriptors,
we would short-circuit publishing the descriptors on the subsequent attempt.

The approach was validated by stressing `TestBackupRestoreResume` with the
patch in #54695 which previously
reproduced the failure immediately.

Release note (bug fix): Fixed a bug whereby a transaction restart at the
wrong moment during a restore could leave descriptors offline after the
restore completed successfully.

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Sep 24, 2020
This bug was caught by testing with cockroachdb#54695. Before that change, it would fail
almost immediately, now it does not fail under stress. I'm open to suggestions
on how to more generally test this.

Release note (bug fix): Fixed a rare bug which can lead to index backfills
failing in the face of transaction restarts.
craig bot pushed a commit that referenced this pull request Sep 24, 2020
54261: ui: add transactions page to Admin UI r=dhartunian a=elkmaster

This commit adds the new Transactions Page to the Admin UI.

This page is much like the Statements Page but differs in that
it shows Transaction-level statistics in tabular and detail form.

Every Transaction is able to display its contained Statements
for more detailed analysis.

The page and its components are imported from the
`admin-ui-components` library.

Depends on: cockroachdb/yarn-vendored#38

Release justification: low-risk high impact addition to Admin UI

Release note (admin ui change): add Transactions and Transactions
details pages. These pages allow for viewing stats at the
transaction level.

54749: cli/demo: provision an initial non-`root` "demo" user  r=irfansharif a=knz

Fixes #54557.
Requested by @jseldess 

First 2 commits from   #54741 (can be ignored during review)

Release note (cli change): `cockroach demo` now pre-creates a `demo`
user account with a random password, instead of letting (and
encouraging) the user to use the `root` account directly.
The `demo` account is currently granted the `admin` role.

54755: sql: fix bug whereby backfiller would drop spans on txn restart r=ajwerner a=ajwerner

This bug was caught by testing with #54695. Before that change, it would fail
almost immediately, now it does not fail under stress. I'm open to suggestions
on how to more generally test this.

Release note (bug fix): Fixed a rare bug which can lead to index backfills
failing in the face of transaction restarts.

Co-authored-by: Vlad <carrott9@gmail.com>
Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Sep 28, 2020
This bug was caught by testing with cockroachdb#54695. Before that change, it would fail
almost immediately, now it does not fail under stress. I'm open to suggestions
on how to more generally test this.

Release note (bug fix): Fixed a rare bug which can lead to index backfills
failing in the face of transaction restarts.
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Sep 28, 2020
This bug was caught by testing with cockroachdb#54695. Before that change, it would fail
almost immediately, now it does not fail under stress. I'm open to suggestions
on how to more generally test this.

Release note (bug fix): Fixed a rare bug which can lead to index backfills
failing in the face of transaction restarts.
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Sep 29, 2020
This bug was caught by testing with cockroachdb#54695. Before that change, it would fail
almost immediately, now it does not fail under stress. I'm open to suggestions
on how to more generally test this.

Release note (bug fix): Fixed a rare bug which can lead to index backfills
failing in the face of transaction restarts.
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Sep 29, 2020
This bug was caught by testing with cockroachdb#54695. Before that change, it would fail
almost immediately, now it does not fail under stress. I'm open to suggestions
on how to more generally test this.

Release note (bug fix): Fixed a rare bug which can lead to index backfills
failing in the face of transaction restarts.
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Sep 29, 2020
This bug was caught by testing with cockroachdb#54695. Before that change, it would fail
almost immediately, now it does not fail under stress. I'm open to suggestions
on how to more generally test this.

Release note (bug fix): Fixed a rare bug which can lead to index backfills
failing in the face of transaction restarts.
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Sep 29, 2020
…ptors

Before this patch, if a restart occurred while trying to publish descriptors,
we would short-circuit publishing the descriptors on the subsequent attempt.

The approach was validated by stressing `TestBackupRestoreResume` with the
patch in cockroachdb#54695 which previously
reproduced the failure immediately.

Release note (bug fix): Fixed a bug whereby a transaction restart at the
wrong moment during a restore could leave descriptors offline after the
restore completed successfully.
pbardea added a commit to pbardea/cockroach that referenced this pull request Oct 19, 2020
…lishing tables

Before this patch, if a restart occurred while trying to publish
descriptors, we would short-circuit publishing the descriptors on the
subsequent attempt.

The approach was validated by stressing TestBackupRestoreResume with the
patch in cockroachdb#54695 which previously reproduced the failure immediately.

Release note (bug fix): Fixed a bug whereby a transaction restart at the
wrong moment during a restore could leave descriptors offline after the
restore completed successfully.
@pbardea
Copy link
Contributor

pbardea commented Oct 30, 2020

I've been playing around with improving bulk jobs' testing around these txn retries and am curious if there's a path forward to getting something like this merged/what can be done to help our testing in this area. I think that a setting like this is useful. Do we have any worries about introducing this setting generally?

I played around with a very early prototype of introducing these kinds of retires specifically when we create new txns in bulk jobs: #56074, but it might be nice to get the more general testing that this provides.

@tbg tbg added the X-noremind Bots won't notify about PRs with X-noremind label May 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
X-noremind Bots won't notify about PRs with X-noremind
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants