New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
plpgsql: misc fixes for parsing and formatting identifiers and constants #119034
Conversation
Found these while working on #106368 |
e559395
to
ea953ab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 9 files at r1, 3 of 3 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @yuzefovich)
pkg/sql/plpgsql/parser/plpgsql.y
line 666 at r1 (raw file):
| stmt_while { $$.val = $1.statement()
Do we not have tests for this case?
pkg/sql/scanner/plpgsql_scan.go
line 147 at r2 (raw file):
} // scanString is similar to Scanner.scanString, but uses PL/pgSQL tokens.
Are the tokens the only difference? I wonder if there's a clever way we can combine the two functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 9 files at r1, 3 of 3 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @mgartner)
pkg/sql/scanner/plpgsql_scan.go
line 26 at r2 (raw file):
// PLpgSQLScanner is a scanner with a PLPGSQL specific scan function type PLpgSQLScanner struct { Scanner
nit: do you think it'd be worth making the currently-embedded Scanner
object actually a named variable so that we could see more easily what s.scanString
actually refers to (i.e. it's PLpgSQLScanner.scanString
or Scanner.scanString
)?
pkg/sql/scanner/plpgsql_scan.go
line 147 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Are the tokens the only difference? I wonder if there's a clever way we can combine the two functions.
I'm confused - I just copied the contents of scanString
from this PR and replaced Scanner.scanString
with them, and there were effectively no differences. What are the different parts? Or if the code exactly the same, and the only difference is other s.scan*
methods that get called by this code that was copied?
pkg/sql/scanner/plpgsql_scan.go
line 73 at r3 (raw file):
return case 'e', 'E':
Interesting that Scanner.Scan
has this code block, so I have the same question as Marcus: can both scanners share the core of its logic? This would remove a lot of code duplication, and I think we can do so without incurring a performance penalty of interfaces. For example, if both scanner objects have scanFoo
function that behaves differently, we could imagine a config struct that would provide this function, i.e. something like
type ScannerConfig struct {
scanFoo func(...)
scanBar func(...)
}
NewScanner(cfg ScannerConfig) Scanner {
return Scanner{...}
}
func (s *Scanner) Scan() {
...
switch ch {
case 'foo':
s.cfg.scanFoo(...)
case 'bar':
s.cfg.scanBar(...)
// all other cases have `scan` functions that behave the same
...
}
Am I missing something that this wouldn't work?
pkg/sql/plpgsql/parser/parser_test.go
line 23 at r1 (raw file):
) func TestParseDatadriven(t *testing.T) {
nit: why this change? We have plenty of TestDataDriven
and almost none TestDatadriven
(I think we should instead do s/TestParseDatadriven/TestParseDataDriven
for an already existing test).
This commit fixes formatting for variable and cursor declarations to use `FmtCtx.FormatNode` instead of `FormatString`. This ensures that variable names are correctly redacted when applicable, and that complex identifiers round-trip through the parser. Informs cockroachdb#106368 Release note (bug fix): Fixed a bug that prevented the use of PL/pgSQL routines with complex variable names that require double quotes. This bug has existed since v23.2.
This commit fixes handling for string constants in the PL/pgSQL parser. Now, the PL/pgSQL scanner handles string scanning itself instead of delegating to the SQL scanner. This ensures that the correct token ID is used when an error occurs (e.g. unterminated string). This commit also adds a check for the ERROR token during PL/pgSQL lexing, so that a scanning error is noticed and propagated back to the user. Informs cockroachdb#106368 Release note (bug fix): Fixed a bug that could cause creation of a syntactically invalid PL/pgSQL routine to return the wrong error. This bug has existed since v23.2.
This commit adds support for scanning an escaped string constant in a PL/pgSQL routine. This is necessary because escaped strings have imbalanced quotes (like `e'foobar''`), which previously caused the scanner to return an "unterminated string" error. Informs cockroachdb#106368 Release note (bug fix): Fixed a bug that could result in a syntax error if a PL/pgSQL routine was created with an escaped string constant in the routine body. This bug has existed since v23.2.
ea953ab
to
2a423d8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @yuzefovich)
pkg/sql/plpgsql/parser/plpgsql.y
line 666 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Do we not have tests for this case?
We do, but I believe the way stmt_while
is referenced in the parser just happens to work out. If it was something like | foo stmt_while bar
where stmt_while
wasn't the only symbol, it might have been broken.
pkg/sql/scanner/plpgsql_scan.go
line 26 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: do you think it'd be worth making the currently-embedded
Scanner
object actually a named variable so that we could see more easily whats.scanString
actually refers to (i.e. it'sPLpgSQLScanner.scanString
orScanner.scanString
)?
Good idea, done.
pkg/sql/scanner/plpgsql_scan.go
line 147 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I'm confused - I just copied the contents of
scanString
from this PR and replacedScanner.scanString
with them, and there were effectively no differences. What are the different parts? Or if the code exactly the same, and the only difference is others.scan*
methods that get called by this code that was copied?
The tokens are pretty much it. There are some places where the plpgsql scanner has removed some logic from the sql scanner, but from my reading of postgres source I think it's safe to combine them. I'd rather do that in a different PR though, if that's alright.
pkg/sql/plpgsql/parser/parser_test.go
line 23 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: why this change? We have plenty of
TestDataDriven
and almost noneTestDatadriven
(I think we should instead dos/TestParseDatadriven/TestParseDataDriven
for an already existing test).
I was changing it to fit the SQL test, but I'll just change that one instead.
Opened an issue to de-duplicate the scanning logic here: #119755 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r4, 4 of 4 files at r5, 3 of 3 files at r6, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @mgartner)
pkg/sql/scanner/plpgsql_scan.go
line 147 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
The tokens are pretty much it. There are some places where the plpgsql scanner has removed some logic from the sql scanner, but from my reading of postgres source I think it's safe to combine them. I'd rather do that in a different PR though, if that's alright.
SGTM
TFTRs! bors r+ |
Build failed (retrying...): |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from fa63b60 to blathers/backport-release-23.2-119034: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.2.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
plpgsql: correctly format variable names in declarations
This commit fixes formatting for variable and cursor declarations to
use
FmtCtx.FormatNode
instead ofFormatString
. This ensures thatvariable names are correctly redacted when applicable, and that
complex identifiers round-trip through the parser.
Informs #106368
Release note (bug fix): Fixed a bug that prevented the use of PL/pgSQL
routines with complex variable names that require double quotes. This
bug has existed since v23.2.
plpgsql/parser: propagate errors during string scanning
This commit fixes handling for string constants in the PL/pgSQL parser.
Now, the PL/pgSQL scanner handles string scanning itself instead of delegating
to the SQL scanner. This ensures that the correct token ID is used when an
error occurs (e.g. unterminated string). This commit also adds a check for the
ERROR token during PL/pgSQL lexing, so that a scanning error is noticed and
propagated back to the user.
Informs #106368
Release note (bug fix): Fixed a bug that could cause creation of a
syntactically invalid PL/pgSQL routine to return the wrong error. This bug has
existed since v23.2.
plpgsql/parser: fix parsing for escaped strings
This commit adds support for scanning an escaped string constant in a
PL/pgSQL routine. This is necessary because escaped strings have imbalanced
quotes (like
e'foobar''
), which previously caused the scanner to returnan "unterminated string" error.
Informs #106368
Release note (bug fix): Fixed a bug that could result in a syntax error
if a PL/pgSQL routine was created with an escaped string constant in
the routine body. This bug has existed since v23.2.