Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cdctest: perform txn retries in validator #43215

Merged
merged 4 commits into from
Dec 17, 2019

Conversation

andreimatei
Copy link
Contributor

Before this patch, fingerprintValidator.applyRowUpdate() was missing a
retry loop around its sql txn. But turns out that the txn was not needed
at all since we're running a single statement.

Fixes #42773

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First commit is #43200

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/cmd/roachtest/test.go, line 277 at r4 (raw file):

		// get any stack traces.
		if err, ok := args[0].(error); ok {
			msg = fmt.Sprintf("%+v", err)

@knz help me out here

I really want stack traces in these errors, but currently they look pretty bad (see example below). Without doing something about it, the output I think would be worse than the current one without the stack traces. Or perhaps you'll teach me to do something else.

For example, below is a pq: restart transaction: TransactionRetryWithProtoRefreshError wrapped a couple of times.
I'd like to see the messages of all the wrapping errors plus the message of the original TransactionRetryWithProtoRefreshError one on the first line, and then all the stack traces (perhaps with the respective messages again between them). I'd also get rid of the "error with attached stack trace:" lines; they add little.

printed with %s:

 monitor failure: monitor task failed: CDC failed: fingerprintValidator failed: pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_TIMESTAMP_CACHE_REJECTED_POSSIBLE_REPLAY): "sql txn" meta={id=4b231b10 key=/Table/54/1/500/0 pri=0.01371671 epo=0 ts=1576532273.889446872,1 min=1576532273.877883066,0 seq=2} rw=true stat=PENDING rts=1576532273.877883066,0 wto=false max=1576532273.946122576,0

printed with %+v:

error with attached stack trace:
		    main.(*monitor).WaitE
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2157
		    main.(*monitor).Wait
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2165
		    main.runCDCBank
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cdc.go:378
		    main.registerCDC.func8
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cdc.go:649
		    main.(*testRunner).runTest.func2
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:700
		    runtime.goexit
		    	/Users/andrei/goroot/src/runtime/asm_amd64.s:1357
		  - monitor failure:
		  - error with attached stack trace:
		    main.(*monitor).wait.func2
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2210
		    runtime.goexit
		    	/Users/andrei/goroot/src/runtime/asm_amd64.s:1357
		  - monitor task failed:
		  - error with attached stack trace:
		    main.runCDCBank.func2.1
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cdc.go:307
		    main.runCDCBank.func2
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cdc.go:363
		    main.(*monitor).Go.func1
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2145
		    github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup.(*Group).Go.func1
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
		    runtime.goexit
		    	/Users/andrei/goroot/src/runtime/asm_amd64.s:1357
		  - CDC failed:
		  - pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_TIMESTAMP_CACHE_REJECTED_POSSIBLE_REPLAY): "sql txn" meta={id=33aa1def key=/Table/54/1/994/0 pri=0.02000490 epo=0 ts=1576533090.201346685,1 min=1576533090.155239137,0 seq=2} rw=true stat=PENDING rts=1576533090.155239137,0 wto=false max=1576533090.201346685,0
		    fingerprintValidator failed
		    github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/cdctest.(*fingerprintValidator).applyRowUpdate.func1
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/cdctest/validator.go:351
		    github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/cdctest.(*fingerprintValidator).applyRowUpdate
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/cdctest/validator.go:430
		    github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/cdctest.(*fingerprintValidator).NoteResolved
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/cdctest/validator.go:484
		    github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/cdctest.Validators.NoteResolved
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/cdctest/validator.go:548
		    github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/cdctest.(*CountValidator).NoteResolved
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/cdctest/validator.go:594
		    main.runCDCBank.func2
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cdc.go:362
		    main.(*monitor).Go.func1
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2145
		    github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup.(*Group).Go.func1
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
		    runtime.goexit
		    	/Users/andrei/goroot/src/runtime/asm_amd64.s:1357

Can you help a brother out? Or would you rather I figure it out myself?

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're going to need this:

diff --git a/pkg/cmd/roachtest/cluster_test.go b/pkg/cmd/roachtest/cluster_test.go
index 2d04694bd7..cc4075210e 100644
--- a/pkg/cmd/roachtest/cluster_test.go
+++ b/pkg/cmd/roachtest/cluster_test.go
@@ -20,7 +20,7 @@ import (
        "time"

        "github.com/cockroachdb/cockroach/pkg/testutils"
-       "github.com/pkg/errors"
+       "github.com/cockroachdb/errors"
        "github.com/stretchr/testify/require"
 )

@@ -148,7 +148,7 @@ func TestClusterMonitor(t *testing.T) {

                // If wait terminates, context gets canceled.
                err := m.wait(`true`)
-               if err != context.Canceled {
+               if !errors.Is(err, context.Canceled) {
                        t.Errorf(`expected context canceled, got: %+v`, err)
                }
        })

Reviewed 1 of 1 files at r1, 2 of 2 files at r2, 2 of 2 files at r3, 1 of 1 files at r4, 1 of 1 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


pkg/cmd/roachtest/test.go, line 277 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

@knz help me out here

I really want stack traces in these errors, but currently they look pretty bad (see example below). Without doing something about it, the output I think would be worse than the current one without the stack traces. Or perhaps you'll teach me to do something else.

For example, below is a pq: restart transaction: TransactionRetryWithProtoRefreshError wrapped a couple of times.
I'd like to see the messages of all the wrapping errors plus the message of the original TransactionRetryWithProtoRefreshError one on the first line, and then all the stack traces (perhaps with the respective messages again between them). I'd also get rid of the "error with attached stack trace:" lines; they add little.

printed with %s:

 monitor failure: monitor task failed: CDC failed: fingerprintValidator failed: pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_TIMESTAMP_CACHE_REJECTED_POSSIBLE_REPLAY): "sql txn" meta={id=4b231b10 key=/Table/54/1/500/0 pri=0.01371671 epo=0 ts=1576532273.889446872,1 min=1576532273.877883066,0 seq=2} rw=true stat=PENDING rts=1576532273.877883066,0 wto=false max=1576532273.946122576,0

printed with %+v:

error with attached stack trace:
		    main.(*monitor).WaitE
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2157
		    main.(*monitor).Wait
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2165
		    main.runCDCBank
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cdc.go:378
		    main.registerCDC.func8
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cdc.go:649
		    main.(*testRunner).runTest.func2
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:700
		    runtime.goexit
		    	/Users/andrei/goroot/src/runtime/asm_amd64.s:1357
		  - monitor failure:
		  - error with attached stack trace:
		    main.(*monitor).wait.func2
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2210
		    runtime.goexit
		    	/Users/andrei/goroot/src/runtime/asm_amd64.s:1357
		  - monitor task failed:
		  - error with attached stack trace:
		    main.runCDCBank.func2.1
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cdc.go:307
		    main.runCDCBank.func2
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cdc.go:363
		    main.(*monitor).Go.func1
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2145
		    github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup.(*Group).Go.func1
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
		    runtime.goexit
		    	/Users/andrei/goroot/src/runtime/asm_amd64.s:1357
		  - CDC failed:
		  - pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_TIMESTAMP_CACHE_REJECTED_POSSIBLE_REPLAY): "sql txn" meta={id=33aa1def key=/Table/54/1/994/0 pri=0.02000490 epo=0 ts=1576533090.201346685,1 min=1576533090.155239137,0 seq=2} rw=true stat=PENDING rts=1576533090.155239137,0 wto=false max=1576533090.201346685,0
		    fingerprintValidator failed
		    github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/cdctest.(*fingerprintValidator).applyRowUpdate.func1
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/cdctest/validator.go:351
		    github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/cdctest.(*fingerprintValidator).applyRowUpdate
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/cdctest/validator.go:430
		    github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/cdctest.(*fingerprintValidator).NoteResolved
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/cdctest/validator.go:484
		    github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/cdctest.Validators.NoteResolved
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/cdctest/validator.go:548
		    github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/cdctest.(*CountValidator).NoteResolved
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl/cdctest/validator.go:594
		    main.runCDCBank.func2
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cdc.go:362
		    main.(*monitor).Go.func1
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2145
		    github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup.(*Group).Go.func1
		    	/Users/andrei/work/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
		    runtime.goexit
		    	/Users/andrei/goroot/src/runtime/asm_amd64.s:1357

Can you help a brother out? Or would you rather I figure it out myself?

my 2c is that I want the order of the errors reversed but I don't need the message grouping.

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're going to need this:

Done, thanks.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @knz)

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @andreimatei)


pkg/cmd/roachtest/test.go, line 277 at r4 (raw file):

I'd like to see [...]

If you file an issue on cockroachdb/errors with an example of your desired output (like, copy-paste the error above and re-arrange it in the way you want it to look like) I can see what I can do. Ben suggested that we (I) invest in improving this anyway.

One way I'm seeing this move forward is to have %+v print a "summary" that includes the things that you prefer to see in the common case (like your suggestion above) and keep %#v for the more verbose report, or something like that.

Wrap and attach stack traces to various CDC testing code.

Release note: None
It takes a while and it's nice to see in the UI.

Release note: None
Before this patch, when an error was passed to t.Fatal(err), it was
formatted with "%s". Now it will be formatted with "%+v", which includes
stack traces. Without the stack trace, it's often hard to say where an
error was generated.
Note that a succinct stack for the t.Fatal() call was already printed,
but that's different from the error's origin.

Release note: None
Before this patch, fingerprintValidator.applyRowUpdate() was missing a
retry loop around its sql txn. But turns out that the txn was not needed
at all since we're running a single statement.

Fixes cockroachdb#42773

Release note: None
Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/cmd/roachtest/test.go, line 277 at r4 (raw file):

Previously, knz (kena) wrote…

I'd like to see [...]

If you file an issue on cockroachdb/errors with an example of your desired output (like, copy-paste the error above and re-arrange it in the way you want it to look like) I can see what I can do. Ben suggested that we (I) invest in improving this anyway.

One way I'm seeing this move forward is to have %+v print a "summary" that includes the things that you prefer to see in the common case (like your suggestion above) and keep %#v for the more verbose report, or something like that.

ok, we'll take it to a separate issue

craig bot pushed a commit that referenced this pull request Dec 17, 2019
43215: cdctest: perform txn retries in validator r=andreimatei a=andreimatei

Before this patch, fingerprintValidator.applyRowUpdate() was missing a
retry loop around its sql txn. But turns out that the txn was not needed
at all since we're running a single statement.

Fixes #42773

43250: cmd/roachtest: improve sqlsmith r=mjibson a=mjibson



Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Co-authored-by: Matt Jibson <matt.jibson@gmail.com>
@craig
Copy link
Contributor

craig bot commented Dec 17, 2019

Build succeeded

@craig craig bot merged commit 4627894 into cockroachdb:master Dec 17, 2019
craig bot pushed a commit that referenced this pull request Dec 18, 2019
43304: release-19.2: cdctest: perform txn retries in validator r=andreimatei a=andreimatei

Backport 4/4 commits from #43215.

/cc @cockroachdb/release

---

Before this patch, fingerprintValidator.applyRowUpdate() was missing a
retry loop around its sql txn. But turns out that the txn was not needed
at all since we're running a single statement.

Fixes #42773


Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
@andreimatei andreimatei deleted the cdc.validator branch January 7, 2020 21:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: cdc/bank failed
4 participants