sql: error out when control characters are detected in IDENT types#70627
sql: error out when control characters are detected in IDENT types#70627jackcwu wants to merge 1 commit intocockroachdb:masterfrom
Conversation
|
Making this pull request for now - still have some questions about backwards compatibility to ask in standup |
rafiss
left a comment
There was a problem hiding this comment.
just some small comments
pkg/sql/scanner/scan.go
Outdated
| } | ||
| } | ||
|
|
||
| for _, r := range buf { |
There was a problem hiding this comment.
hm why do we want this here instead of in scanIdent?
| {`$0`, "placeholder index must be between 1 and 65536"}, | ||
| {`$9223372036854775809`, "placeholder index must be between 1 and 65536"}, | ||
| {`B'123'`, `"2" is not a valid binary digit`}, | ||
| {string([]byte{34, 103, 111, 108, 97, 110, 103, 0x0, 34}), `control character found`}, |
There was a problem hiding this comment.
can we add a few test cases that don't use raw bytes, so it's a bit more readable?
for example:
{string("cat\ndog"), `control character found`},
There was a problem hiding this comment.
The string has to include the actual double quotes (representing the IDENT) so it'll have to be something like
"cat\ndog" actually not sure if the \n is still considered a control character inside the ````
Maybe"cat` + "\n" + `dog"
There was a problem hiding this comment.
then we could do either
{"\"cat\ndog\"", `control character found`},
or
{`"cat
dog"`), `control character found`},
RichardJCai
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @jackcwu, @rafiss, and @RichardJCai)
pkg/sql/scanner/scan.go, line 552 at r1 (raw file):
for { ch := s.peek() //fmt.Println(ch, ch >= utf8.RuneSelf, ch >= 'A' && ch <= 'Z')
Unrelated but while you're touching this file can you remove this println?
pkg/sql/scanner/scan.go, line 566 at r1 (raw file):
s.pos++ } //fmt.Println("parsed: ", s.in[start:s.pos], isASCII, isLower)
Here too
|
Leaving this PR for later |
|
This is a nice bugfix but we've run into some issues about maintaining backwards compatibility if a user has already created an object with a control character in it's name. The only way to access that object would to be using the IDENT with the control character inside, but we want to make sure the user has an escape hatch to use that table even after this change. Using a cluster setting is tricky since we don't have access to Perhaps we could create a builtin to rename these kinds of objects after doing this change. |
|
We've had some discussion on this. Let's go ahead and do this change and not add any settings/gates for it, since it's such an obscure edge case and users will immediately be aware if they are affected. |
45499be to
7206e33
Compare
jackcwu
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @jackcwu, @rafiss, and @RichardJCai)
pkg/sql/parser/scanner_test.go, line 358 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
then we could do either
{"\"cat\ndog\"", `control character found`},or
{`"cat dog"`), `control character found`},
added
Release note: None Release note (<category, see below>): <what> <show> <why>
7206e33 to
ac4c4d0
Compare
jackcwu
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @RichardJCai)
pkg/sql/scanner/scan.go, line 552 at r1 (raw file):
Previously, RichardJCai (Richard Cai) wrote…
Unrelated but while you're touching this file can you remove this println?
Done.
pkg/sql/scanner/scan.go, line 566 at r1 (raw file):
Previously, RichardJCai (Richard Cai) wrote…
Here too
Done.
pkg/sql/scanner/scan.go, line 917 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
hm why do we want this here instead of in
scanIdent?
So @RichardJCai and I spent some time digging around and found that scanIdent() is only ever called on the cases where tokens are interpreted as idents without quotes, and there is already a check in that function to validate all characters. The change I made (since you made this comment, I added an if statement around it so prevent erroring out on single quotes) handles the cases where idents involving double quotes are parsed and have an invalid character inside
|
it looks like the test failure is from this indicates that it's checking for control characters in non-identifiers. hm, i'm not sure how to correctly detect whether the input is an identifer or not though. |
Release note: None
Resolves #26583