Skip to content

sql: error out when control characters are detected in IDENT types#70627

Open
jackcwu wants to merge 1 commit intocockroachdb:masterfrom
jackcwu:ctrl_char_error
Open

sql: error out when control characters are detected in IDENT types#70627
jackcwu wants to merge 1 commit intocockroachdb:masterfrom
jackcwu:ctrl_char_error

Conversation

@jackcwu
Copy link
Copy Markdown
Contributor

@jackcwu jackcwu commented Sep 23, 2021

Release note: None

Resolves #26583

@jackcwu jackcwu requested review from a team, RichardJCai and rafiss September 23, 2021 17:02
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@jackcwu
Copy link
Copy Markdown
Contributor Author

jackcwu commented Sep 23, 2021

Making this pull request for now - still have some questions about backwards compatibility to ask in standup

Copy link
Copy Markdown
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just some small comments

}
}

for _, r := range buf {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm why do we want this here instead of in scanIdent?

{`$0`, "placeholder index must be between 1 and 65536"},
{`$9223372036854775809`, "placeholder index must be between 1 and 65536"},
{`B'123'`, `"2" is not a valid binary digit`},
{string([]byte{34, 103, 111, 108, 97, 110, 103, 0x0, 34}), `control character found`},
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a few test cases that don't use raw bytes, so it's a bit more readable?

for example:

{string("cat\ndog"), `control character found`},

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string has to include the actual double quotes (representing the IDENT) so it'll have to be something like
"cat\ndog" actually not sure if the \n is still considered a control character inside the ````

Maybe"cat` + "\n" + `dog"

Copy link
Copy Markdown
Collaborator

@rafiss rafiss Sep 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then we could do either

{"\"cat\ndog\"", `control character found`},

or

{`"cat
dog"`), `control character found`},

Copy link
Copy Markdown
Contributor

@RichardJCai RichardJCai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jackcwu, @rafiss, and @RichardJCai)


pkg/sql/scanner/scan.go, line 552 at r1 (raw file):

	for {
		ch := s.peek()
		//fmt.Println(ch, ch >= utf8.RuneSelf, ch >= 'A' && ch <= 'Z')

Unrelated but while you're touching this file can you remove this println?


pkg/sql/scanner/scan.go, line 566 at r1 (raw file):

		s.pos++
	}
	//fmt.Println("parsed: ", s.in[start:s.pos], isASCII, isLower)

Here too

@jackcwu
Copy link
Copy Markdown
Contributor Author

jackcwu commented Sep 23, 2021

Leaving this PR for later

@RichardJCai
Copy link
Copy Markdown
Contributor

This is a nice bugfix but we've run into some issues about maintaining backwards compatibility if a user has already created an object with a control character in it's name.

The only way to access that object would to be using the IDENT with the control character inside, but we want to make sure the user has an escape hatch to use that table even after this change.

Using a cluster setting is tricky since we don't have access to SETTINGS right now in scan.go where we produce the error. We would have to refactor a lot to make this possible.

Perhaps we could create a builtin to rename these kinds of objects after doing this change.

@rafiss
Copy link
Copy Markdown
Collaborator

rafiss commented Sep 29, 2021

We've had some discussion on this. Let's go ahead and do this change and not add any settings/gates for it, since it's such an obscure edge case and users will immediately be aware if they are affected.

Copy link
Copy Markdown
Contributor Author

@jackcwu jackcwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jackcwu, @rafiss, and @RichardJCai)


pkg/sql/parser/scanner_test.go, line 358 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

then we could do either

{"\"cat\ndog\"", `control character found`},

or

{`"cat
dog"`), `control character found`},

added

Release note: None

Release note (<category, see below>): <what> <show> <why>
Copy link
Copy Markdown
Contributor Author

@jackcwu jackcwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @RichardJCai)


pkg/sql/scanner/scan.go, line 552 at r1 (raw file):

Previously, RichardJCai (Richard Cai) wrote…

Unrelated but while you're touching this file can you remove this println?

Done.


pkg/sql/scanner/scan.go, line 566 at r1 (raw file):

Previously, RichardJCai (Richard Cai) wrote…

Here too

Done.


pkg/sql/scanner/scan.go, line 917 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

hm why do we want this here instead of in scanIdent?

So @RichardJCai and I spent some time digging around and found that scanIdent() is only ever called on the cases where tokens are interpreted as idents without quotes, and there is already a check in that function to validate all characters. The change I made (since you made this comment, I added an if statement around it so prevent erroring out on single quotes) handles the cases where idents involving double quotes are parsed and have an invalid character inside

@rafiss
Copy link
Copy Markdown
Collaborator

rafiss commented Oct 5, 2021

it looks like the test failure is from

../../sql/logictest/testdata/logic_test/computed:987:
expected success, but found
(42601) lexical error: control character found


statement ok
SET experimental_computed_column_rewrites = "
  (ts::STRING) -> ((ts AT TIME ZONE 'utc')::STRING),
  (mod(fnv32(b::STRING), 4)) -> (mod(fnv32(b), 4)),
  (str::TIMESTAMP) -> (parse_timestamp(str))
"

this indicates that it's checking for control characters in non-identifiers. hm, i'm not sure how to correctly detect whether the input is an identifer or not though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql: CockroachDB accepts invalid characters in identifiers

4 participants