changefeedccl: add nil checking for avroDataRecord.refreshTypeMetadata #119639
Should we add the nil check before calling refreshTypeMetadata instead? That seems to be the conventional way of handling this.
I'm not sure nil checking is the best approach here. I believe the bug in the issue was caused by regional-by-row tables and the hidden region column getting encoded. We should investigate more in depth why this happens and, if possible, avoid the code path that causes the NPE.
pkg/ccl/changefeedccl/avro.go
line 1055 at r1 (raw file):
// The only user-defined type is enum, so this is usually a no-op.
func (r *avroDataRecord) refreshTypeMetadata(row cdcevent.Row) error {
	if r == nil {
I would do the alternative you identified and move the nil check to the call site.
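For context, a Go method with a pointer receiver can be called on a nil pointer, so the guard can live either inside the method or at the call site. A minimal sketch of the two options, using a hypothetical record type (not the real avroDataRecord) and hypothetical helper names:

```go
package main

import "fmt"

// record is a hypothetical stand-in for avroDataRecord.
type record struct {
	refreshed int
}

// refreshGuarded puts the guard inside the method: calling it on a nil
// *record is a no-op instead of a nil pointer dereference.
func (r *record) refreshGuarded() error {
	if r == nil {
		return nil
	}
	r.refreshed++
	return nil
}

// refresh has no guard; it dereferences r, so it panics if r is nil.
func (r *record) refresh() error {
	r.refreshed++
	return nil
}

// refreshAtCallSite shows the call-site style: the caller checks for nil
// before invoking the unguarded method.
func refreshAtCallSite(r *record) error {
	if r != nil {
		if err := r.refresh(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	var missing *record // e.g. a schema field that was never populated
	fmt.Println(missing.refreshGuarded(), refreshAtCallSite(missing)) // prints "<nil> <nil>"

	present := &record{}
	if err := refreshAtCallSite(present); err != nil {
		fmt.Println("unexpected error:", err)
	}
	fmt.Println(present.refreshed) // prints "1"
}
```

Keeping the check at the call site makes it explicit that the field may legitimately be unset, while the method itself keeps assuming a valid receiver.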
Previously, rharding6373 (Rachael Harding) wrote…
Done.
I agree we should keep investigating. My patch here just adds the nil check, and the issue being closed here is only about adding the nil check. I think registered.schema.after is expected to be nil here, since it is only set for the wrapped envelope.
I looked into this more with @andyyang890. The reason we hit this cache is that three parallel encoders were started in
We thought it would be useful to add another test case that forces a single encoder, to make sure we trigger the code path involving the cache.
I agree with Wenyi that after being nil seems to be expected here, since it's only expected to be non-nil when the envelope type is wrapped. From some git archaeology, it looks like the after field was initially populated for all envelope types, and when the change was made to move the data from after into the record field for non-wrapped envelope types, that assumption wasn't corrected. Here's the relevant commit: c00bf0f.
I think adding the nil check should be enough to resolve this issue.
Thanks for getting to the bottom of this issue, Wenyi and Andy! And thanks for fixing it so quickly. Added backport labels.
pkg/ccl/changefeedccl/encoder_avro.go
line 223 at r2 (raw file):
if ok {
	registered = v.(confluentRegisteredEnvelopeSchema)
	if registered.schema.after != nil {
I think we'll also want to add a check for registered.schema.record, since that seems to match the original intent of this code. Something like:
- if err := registered.schema.after.refreshTypeMetadata(updatedRow); err != nil {
- return nil, err
+ if registered.schema.after != nil {
+ if err := registered.schema.after.refreshTypeMetadata(updatedRow); err != nil {
+ return nil, err
+ }
+ }
+ if registered.schema.record != nil {
+ if err := registered.schema.record.refreshTypeMetadata(updatedRow); err != nil {
+ return nil, err
+ }
}
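To make the suggestion concrete, a minimal Go sketch. The types and helpers (dataRecord, envelopeSchema, newEnvelopeSchema, refreshAll) are hypothetical stand-ins for avroDataRecord and confluentRegisteredEnvelopeSchema, and the envelope-to-field mapping is an assumption based on this thread (only wrapped sets after; the non-wrapped envelopes modelled here set record). It loops over both fields instead of writing two if blocks, but the effect is the same: whichever field is set gets refreshed, and a nil one is skipped rather than dereferenced.

```go
package main

import "fmt"

// dataRecord is a hypothetical stand-in for avroDataRecord.
type dataRecord struct {
	refreshCount int
}

// refreshTypeMetadata dereferences r, so calling it on a nil *dataRecord panics.
func (r *dataRecord) refreshTypeMetadata() error {
	r.refreshCount++
	return nil
}

// envelopeSchema is a hypothetical stand-in for the registered envelope schema.
type envelopeSchema struct {
	after  *dataRecord
	record *dataRecord
}

// newEnvelopeSchema assumes the wrapped envelope populates after while the
// other envelopes populate record.
func newEnvelopeSchema(envelope string) envelopeSchema {
	if envelope == "wrapped" {
		return envelopeSchema{after: &dataRecord{}}
	}
	return envelopeSchema{record: &dataRecord{}}
}

// refreshAll refreshes whichever of after/record is set and skips nil ones,
// mirroring the two nil-guarded blocks in the suggested diff.
func refreshAll(s envelopeSchema) error {
	for _, r := range []*dataRecord{s.after, s.record} {
		if r == nil {
			continue
		}
		if err := r.refreshTypeMetadata(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	for _, env := range []string{"wrapped", "bare"} {
		s := newEnvelopeSchema(env)
		err := refreshAll(s)
		fmt.Printf("%s: err=%v after=%t record=%t\n", env, err, s.after != nil, s.record != nil)
	}
}
```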
pkg/ccl/changefeedccl/encoder_test.go
line 1269 at r3 (raw file):
cluster, db, cleanup := startTestCluster(t)
defer cleanup()
if rand.Intn(2) == 0 {
I think I'd prefer to see something like testutils.RunTrueAndFalse instead, so that we always run both cases.
Branch updated from cd8f043 to 8df6cd3.
Previously, andyyang890 (Andy Yang) wrote…
Done.
Previously, andyyang890 (Andy Yang) wrote…
Done.
pkg/ccl/changefeedccl/encoder_avro.go
line 150 at r4 (raw file):
if ok {
	registered = v.(confluentRegisteredKeySchema)
	if registered.schema != nil {
I'm not sure we should have this check. This field should never be nil and if it somehow was, the code later in this function will panic anyway.
pkg/ccl/changefeedccl/encoder_test.go
line 1266 at r4 (raw file):
}
for _, test := range tests {
	testutils.RunTrueAndFalse(t, test.format, func(t *testing.T, overrideWithSingleWorker bool) {
You probably still want an outer t.Run(test.format, ...), and then this inner call should be testutils.RunTrueAndFalse(t, "overrideWithSingleWorker", ...).
Previously, andyyang890 (Andy Yang) wrote…
Ack. I added it as a safety check. I will remove it.
Previously, andyyang890 (Andy Yang) wrote…
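I think RunTrueAndFalse lets you pass the test name here? Do you want something other than test.format here? (cockroach/pkg/testutils/subtest.go, line 17 in ac63a65)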
Previously, wenyihu6 (Wenyi Hu) wrote…
Done.
pkg/ccl/changefeedccl/encoder_test.go
line 1266 at r4 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
I think RunTrueAndFalse lets you pass the test name here? Do you want something other than test.format here?
cockroach/pkg/testutils/subtest.go, line 17 in ac63a65:
func RunTrueAndFalse[T testingTB[T]](t T, name string, fn func(t T, b bool)) {
The subtest name passed to RunTrueAndFalse should be the thing you're toggling. You're not toggling wrapped to true and false. If you do what I suggested, the subtest names will be something like TestAvroWithRegionalTable/wrapped/overrideWithSingleWorker=false and TestAvroWithRegionalTable/wrapped/overrideWithSingleWorker=true.
Previously, andyyang890 (Andy Yang) wrote…
Ack. I changed the names to the following. Is this better?
pkg/ccl/changefeedccl/encoder_test.go
line 1266 at r4 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
Ack. I changed it to the following. Is this better?
--- PASS: TestAvroWithRegionalTable (10.57s)
--- PASS: TestAvroWithRegionalTable/overrideWithSingleWorker=false (5.21s)
--- PASS: TestAvroWithRegionalTable/overrideWithSingleWorker=false/wrapped (1.69s)
--- PASS: TestAvroWithRegionalTable/overrideWithSingleWorker=false/bare (1.76s)
--- PASS: TestAvroWithRegionalTable/overrideWithSingleWorker=false/key_only (1.76s)
--- PASS: TestAvroWithRegionalTable/overrideWithSingleWorker=true (5.25s)
--- PASS: TestAvroWithRegionalTable/overrideWithSingleWorker=true/wrapped (1.72s)
--- PASS: TestAvroWithRegionalTable/overrideWithSingleWorker=true/bare (1.80s)
--- PASS: TestAvroWithRegionalTable/overrideWithSingleWorker=true/key_only (1.73s)
At the risk of excessive bikeshedding, I'd prefer to see the envelope type first (e.g. TestAvroWithRegionalTable/wrapped/overrideWithSingleWorker=false), because to me that feels like the natural hierarchical order of things (i.e. each test case is testing a type of envelope, and the overrideWithSingleWorker argument is a modifier of that test).
This also has the positive side effect of allowing the for loop to be closer (in nesting depth) to the data structure it's iterating over. In fact, you could have something like:
for _, test := range []struct {
	envelope string
	payload  []string
}{
	// test cases
} {
	t.Run(test.envelope, ...
}
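A runnable sketch of this structure. recorder, run, and runTrueAndFalse are hypothetical stand-ins for *testing.T, t.Run, and testutils.RunTrueAndFalse that just record the "/"-joined subtest names; the envelope names and the overrideWithSingleWorker toggle come from this thread, and the payload field is omitted. The names it produces follow the same envelope-first order as the test output posted later in the thread.

```go
package main

import "fmt"

// recorder is a hypothetical stand-in for *testing.T that only tracks the
// full subtest path, joining nested names with "/" as go test does.
type recorder struct {
	path  string
	names *[]string
}

// run mimics t.Run: it records the child's full name and calls fn with it.
func (r recorder) run(name string, fn func(r recorder)) {
	child := recorder{path: r.path + "/" + name, names: r.names}
	*r.names = append(*r.names, child.path)
	fn(child)
}

// runTrueAndFalse mimics testutils.RunTrueAndFalse on top of run.
func runTrueAndFalse(r recorder, name string, fn func(r recorder, b bool)) {
	for _, b := range []bool{false, true} {
		b := b
		r.run(fmt.Sprintf("%s=%t", name, b), func(r recorder) { fn(r, b) })
	}
}

// subtestNames builds the envelope-first hierarchy suggested in review.
func subtestNames() []string {
	var names []string
	root := recorder{path: "TestAvroWithRegionalTable", names: &names}
	for _, test := range []struct {
		envelope string
	}{
		{envelope: "wrapped"},
		{envelope: "bare"},
		{envelope: "key_only"},
	} {
		root.run(test.envelope, func(r recorder) {
			runTrueAndFalse(r, "overrideWithSingleWorker", func(r recorder, override bool) {
				// The real test would start a cluster and run a changefeed here.
			})
		})
	}
	return names
}

func main() {
	for _, n := range subtestNames() {
		fmt.Println(n)
	}
}
```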
pkg/ccl/changefeedccl/encoder_test.go
line 1236 at r6 (raw file):
tests := []struct {
	format string
nit: rename this to envelope/envelopeType to match the name of the option it's setting.
pkg/ccl/changefeedccl/encoder_test.go
line 1270 at r6 (raw file):
defer cleanup()
if overrideWithSingleWorker {
	// Run the test with one and three(default) workers to test both the
nit: this comment should be moved above the testutils.RunTrueAndFalse line.
Previously, the avro encoder could call `refreshTypeMetadata` on an `avroDataRecord` without a nil check. This could lead to node panics because the `avroDataRecord` can be nil. For example, `registered.schema.after` is set only when using the wrapped envelope, so the avro encoder could panic when used with other envelope formats. This patch addresses the issue by adding a defensive nil check when invoking `refreshTypeMetadata`.
Fixes: cockroachdb#119428
Release note: None
pkg/ccl/changefeedccl/encoder_test.go
line 1266 at r4 (raw file):
Previously, andyyang890 (Andy Yang) wrote…
Ack. It now prints the following. Is this better?
--- PASS: TestAvroWithRegionalTable (10.83s)
--- PASS: TestAvroWithRegionalTable/wrapped (3.69s)
--- PASS: TestAvroWithRegionalTable/wrapped/overrideWithSingleWorker=false (1.94s)
--- PASS: TestAvroWithRegionalTable/wrapped/overrideWithSingleWorker=true (1.75s)
--- PASS: TestAvroWithRegionalTable/bare (3.50s)
--- PASS: TestAvroWithRegionalTable/bare/overrideWithSingleWorker=false (1.79s)
--- PASS: TestAvroWithRegionalTable/bare/overrideWithSingleWorker=true (1.71s)
--- PASS: TestAvroWithRegionalTable/key_only (3.53s)
--- PASS: TestAvroWithRegionalTable/key_only/overrideWithSingleWorker=false (1.78s)
--- PASS: TestAvroWithRegionalTable/key_only/overrideWithSingleWorker=true (1.75s)
pkg/ccl/changefeedccl/encoder_test.go
line 1236 at r6 (raw file):
Previously, andyyang890 (Andy Yang) wrote…
Done.
pkg/ccl/changefeedccl/encoder_test.go
line 1270 at r6 (raw file):
Previously, andyyang890 (Andy Yang) wrote…
Done.
I think I've resolved all comments above. I will go ahead and merge this, but lmk if you have any other comments. Thanks for the thorough PR review!
bors r=andyyang890, rharding6373
Build failed (retrying...)
Build succeeded
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool.
error creating merge commit from e33ed5d to blathers/backport-release-23.1-119639: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []
You may need to manually resolve merge conflicts with the backport tool.
Backport to branch 23.1.x failed. See errors above.
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.