Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plpgsql: misc fixes for parsing and formatting identifiers and constants #119034

Merged
merged 3 commits into from Feb 29, 2024

Conversation

DrewKimball
Copy link
Collaborator

plpgsql: correctly format variable names in declarations

This commit fixes formatting for variable and cursor declarations to
use FmtCtx.FormatNode instead of FormatString. This ensures that
variable names are correctly redacted when applicable, and that
complex identifiers round-trip through the parser.

Informs #106368

Release note (bug fix): Fixed a bug that prevented the use of PL/pgSQL
routines with complex variable names that require double quotes. This
bug has existed since v23.2.

plpgsql/parser: propagate errors during string scanning

This commit fixes handling for string constants in the PL/pgSQL parser.
Now, the PL/pgSQL scanner handles string scanning itself instead of delegating
to the SQL scanner. This ensures that the correct token ID is used when an
error occurs (e.g. unterminated string). This commit also adds a check for the
ERROR token during PL/pgSQL lexing, so that a scanning error is noticed and
propagated back to the user.

Informs #106368

Release note (bug fix): Fixed a bug that could cause creation of a
syntactically invalid PL/pgSQL routine to return the wrong error. This bug has
existed since v23.2.

plpgsql/parser: fix parsing for escaped strings

This commit adds support for scanning an escaped string constant in a
PL/pgSQL routine. This is necessary because escaped strings have imbalanced
quotes (like e'foobar''), which previously caused the scanner to return
an "unterminated string" error.

Informs #106368

Release note (bug fix): Fixed a bug that could result in a syntax error
if a PL/pgSQL routine was created with an escaped string constant in
the routine body. This bug has existed since v23.2.

@DrewKimball DrewKimball added the backport-23.2.x Flags PRs that need to be backported to 23.2. label Feb 9, 2024
@DrewKimball DrewKimball requested review from mgartner, yuzefovich and a team February 9, 2024 21:57
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@DrewKimball
Copy link
Collaborator Author

Found these while working on #106368

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 9 of 9 files at r1, 3 of 3 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @yuzefovich)


pkg/sql/plpgsql/parser/plpgsql.y line 666 at r1 (raw file):

| stmt_while
  {
    $$.val = $1.statement()

Do we not have tests for this case?


pkg/sql/scanner/plpgsql_scan.go line 147 at r2 (raw file):

}

// scanString is similar to Scanner.scanString, but uses PL/pgSQL tokens.

Are the tokens the only difference? I wonder if there's a clever way we can combine the two functions.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 9 files at r1, 3 of 3 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @mgartner)


pkg/sql/scanner/plpgsql_scan.go line 26 at r2 (raw file):

// PLpgSQLScanner is a scanner with a PLPGSQL specific scan function
type PLpgSQLScanner struct {
	Scanner

nit: do you think it'd be worth making the currently-embedded Scanner object actually a named variable so that we could see more easily what s.scanString actually refers to (i.e. it's PLpgSQLScanner.scanString or Scanner.scanString)?


pkg/sql/scanner/plpgsql_scan.go line 147 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Are the tokens the only difference? I wonder if there's a clever way we can combine the two functions.

I'm confused - I just copied the contents of scanString from this PR and replaced Scanner.scanString with them, and there were effectively no differences. What are the different parts? Or if the code exactly the same, and the only difference is other s.scan* methods that get called by this code that was copied?


pkg/sql/scanner/plpgsql_scan.go line 73 at r3 (raw file):

		return

	case 'e', 'E':

Interesting that Scanner.Scan has this code block, so I have the same question as Marcus: can both scanners share the core of its logic? This would remove a lot of code duplication, and I think we can do so without incurring a performance penalty of interfaces. For example, if both scanner objects have scanFoo function that behaves differently, we could imagine a config struct that would provide this function, i.e. something like

type ScannerConfig struct {
  scanFoo func(...)
  scanBar func(...)
}

NewScanner(cfg ScannerConfig) Scanner {
  return Scanner{...}
}

func (s *Scanner) Scan() {
...
  switch ch {
  case 'foo':
    s.cfg.scanFoo(...)
  case 'bar':
    s.cfg.scanBar(...)
  // all other cases have `scan` functions that behave the same
...
}

Am I missing something that this wouldn't work?


pkg/sql/plpgsql/parser/parser_test.go line 23 at r1 (raw file):

)

func TestParseDatadriven(t *testing.T) {

nit: why this change? We have plenty of TestDataDriven and almost none TestDatadriven (I think we should instead do s/TestParseDatadriven/TestParseDataDriven for an already existing test).

This commit fixes formatting for variable and cursor declarations to
use `FmtCtx.FormatNode` instead of `FormatString`. This ensures that
variable names are correctly redacted when applicable, and that
complex identifiers round-trip through the parser.

Informs cockroachdb#106368

Release note (bug fix): Fixed a bug that prevented the use of PL/pgSQL
routines with complex variable names that require double quotes. This
bug has existed since v23.2.
This commit fixes handling for string constants in the PL/pgSQL parser.
Now, the PL/pgSQL scanner handles string scanning itself instead of delegating
to the SQL scanner. This ensures that the correct token ID is used when an
error occurs (e.g. unterminated string). This commit also adds a check for the
ERROR token during PL/pgSQL lexing, so that a scanning error is noticed and
propagated back to the user.

Informs cockroachdb#106368

Release note (bug fix): Fixed a bug that could cause creation of a
syntactically invalid PL/pgSQL routine to return the wrong error. This bug has
existed since v23.2.
This commit adds support for scanning an escaped string constant in a
PL/pgSQL routine. This is necessary because escaped strings have imbalanced
quotes (like `e'foobar''`), which previously caused the scanner to return
an "unterminated string" error.

Informs cockroachdb#106368

Release note (bug fix): Fixed a bug that could result in a syntax error
if a PL/pgSQL routine was created with an escaped string constant in
the routine body. This bug has existed since v23.2.
Copy link
Collaborator Author

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @yuzefovich)


pkg/sql/plpgsql/parser/plpgsql.y line 666 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Do we not have tests for this case?

We do, but I believe the way stmt_while is referenced in the parser just happens to work out. If it was something like | foo stmt_while bar where stmt_while wasn't the only symbol, it might have been broken.


pkg/sql/scanner/plpgsql_scan.go line 26 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: do you think it'd be worth making the currently-embedded Scanner object actually a named variable so that we could see more easily what s.scanString actually refers to (i.e. it's PLpgSQLScanner.scanString or Scanner.scanString)?

Good idea, done.


pkg/sql/scanner/plpgsql_scan.go line 147 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I'm confused - I just copied the contents of scanString from this PR and replaced Scanner.scanString with them, and there were effectively no differences. What are the different parts? Or if the code exactly the same, and the only difference is other s.scan* methods that get called by this code that was copied?

The tokens are pretty much it. There are some places where the plpgsql scanner has removed some logic from the sql scanner, but from my reading of postgres source I think it's safe to combine them. I'd rather do that in a different PR though, if that's alright.


pkg/sql/plpgsql/parser/parser_test.go line 23 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: why this change? We have plenty of TestDataDriven and almost none TestDatadriven (I think we should instead do s/TestParseDatadriven/TestParseDataDriven for an already existing test).

I was changing it to fit the SQL test, but I'll just change that one instead.

@DrewKimball
Copy link
Collaborator Author

Opened an issue to de-duplicate the scanning logic here: #119755

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 5 of 5 files at r4, 4 of 4 files at r5, 3 of 3 files at r6, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @mgartner)


pkg/sql/scanner/plpgsql_scan.go line 147 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

The tokens are pretty much it. There are some places where the plpgsql scanner has removed some logic from the sql scanner, but from my reading of postgres source I think it's safe to combine them. I'd rather do that in a different PR though, if that's alright.

SGTM

@DrewKimball
Copy link
Collaborator Author

TFTRs!

bors r+

@craig
Copy link
Contributor

craig bot commented Feb 29, 2024

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Feb 29, 2024

Build succeeded:

@craig craig bot merged commit fcbcbee into cockroachdb:master Feb 29, 2024
18 checks passed
Copy link

blathers-crl bot commented Feb 29, 2024

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from fa63b60 to blathers/backport-release-23.2-119034: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.2.x Flags PRs that need to be backported to 23.2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants