Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: Allow restart savepoint to be renamed #31971

Merged
merged 1 commit into from
Nov 13, 2018
Merged

sql: Allow restart savepoint to be renamed #31971

merged 1 commit into from
Nov 13, 2018

Conversation

bobvawter
Copy link
Member

@bobvawter bobvawter commented Oct 29, 2018

Currently, the only allowable savepoint names are those beginning with
cockroach_restart which triggers a Cockroach-specific transaction-retry flow.
This creates compatibility issues with certain integration scenarios (e.g. apps
using Spring ORM) that can take advantage of restartable transactions, but
which assume than an arbitrary savepoint identifier may be used.

This patch introduces a new session variable force_savepoint_restart which
allows the user to customize the savepoint name to any valid identifier, while
always triggering a restartable transaction.

Telemetry:

Resolves #30588 (customize SAVEPOINT name)
Relates to #15012 (possible change for detecting "empty" transactions)

Release note (sql change): Users can customize the auto-retry savepoint name
used by the SAVEPOINT command by setting the force_savepoint_restart
session variable. For example: SET force_savepoint_restart=true; BEGIN; SAVEPOINT foo will now function as desired. This session variable may also be
supplied as part of a connection string to support existing code that assumes
that arbitrary savepoint names may be used.

Release note (sql change): The names supplied to a SAVEPOINT command are now
properly treated as SQL identifiers. That is SAVEPOINT foo and SAVEPOINT FOO are now equivalent statements.

@bobvawter bobvawter requested review from andreimatei, knz and a team October 29, 2018 15:56
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@bobvawter
Copy link
Member Author

bobvawter commented Oct 29, 2018

@andreimatei, can you review this since @knz is occupied this week?

@bobvawter bobvawter requested a review from a team as a code owner October 29, 2018 17:52
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General architecture :lgtm:

However you missed something: the postgres savepoint name follows regular identifier naming rules.

For example it's possible to use mixed case identifiers with double quotes.

Therefore it is not correct to upper case the in-memory copy in the session variable, nor it is correct to case-insensitively compare the provided name with the stored name.

Please modify that part of the code and add mixed-case tests to exercise.

Thanks

Reviewed 10 of 10 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)

@andreimatei
Copy link
Contributor

Is this patch about accepting a customizable prefix on cockroach_restart or about fully customizing the savepoint name?
Because the former seems a lot more palatable to me than the former and also in that case I think I'd suggest making an optional prefix (or a list of prefixes) a cluster setting or maybe even baking some into crdb. Is there a fixed prefix we need to support for Spring?

@andreimatei
Copy link
Contributor

Or rather, I think we should accept any prefix to cockroach_restart without asking many questions.

@knz
Copy link
Contributor

knz commented Oct 29, 2018

What is this distinction "prefix" vs "all name" you're thinking about?

The PR (and the original request) was about customizing the entire savepoint name. Not just the prefix.

@andreimatei
Copy link
Contributor

The message starts with "apps using Spring ORM that assume than an arbitrary savepoint prefix may be used". So perhaps we can just accept any prefix and be done with it.

Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it useful to set a pattern on the name here, or should it just be a boolean variable treat_all_savepoints_as_cockroach_restart? That is, in the future when/if we support real savepoints, would Spring (and others) support using one set of savepoint names for retries and another for regular savepoints? My guess is no, and in practice supporting any name other than cockroach_restart will be equivalent to having no restrictions at all.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@knz
Copy link
Contributor

knz commented Oct 30, 2018

Is it useful to set a pattern on the name here, or should it just be a boolean variable treat_all_savepoints_as_cockroach_restart?

I very strongly disagree with the idea of "all savepoints should be retry savepoints".

We need to delineate those savepoints which are used for the purpose of retries, so that we can add telemetry for all the others used for the purpose of savepointing. Otherwise we will lose the opportunity to measure demand for actual savepoints.

@bdarnell
Copy link
Contributor

To be clear, I'm not saying that we should remove the restriction entirely and automatically treat all savepoints as retry savepoints. I'm saying that applications should be able to opt in to this mode, and that opting in to this mode doesn't seem to be substantially worse than opting into a prefix-based whitelist (and it's easier - I doubt the naming scheme used by your ORM is well-documented or even guaranteed to be stable across releases). The prefix-based whitelist only makes sense if there are ORMs that don't allow enough control to use cockroach_restart, but do allow enough control to use two separate prefixes. This seems unlikely to me.

@knz
Copy link
Contributor

knz commented Oct 30, 2018

Ben, Andrei, why are both of you referring to "prefixes"?

This PR (and the linked issue) are about opting in into an alternate identifier for the savepoint name. I don't see the specification/request to recognize a "prefix". Where did you get that idea?

@bobvawter
Copy link
Member Author

bobvawter commented Oct 30, 2018 via email

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my part, I was confused about what would be sufficient to support Spring - I thought Spring puts its own prefix in front of a user-configurable savepoint name - so we end up with savepoint xxx-cockroach_restart. That's apparently not the case.

I've talked offline with @bobvawter and suggested that we just make a cluster setting that disables the savepoint name validation entirely (as Ben was also suggesting, although he didn't say cluster setting vs session var). The argument is that, since we reject savepoints coming after the beginning of the txn regardless of the name anyway, technically it's not incorrect to accept any name as the savepoint immediately after the begin. Like, even if the app doesn't know what it's doing, savepoints immediately after begin technically behave like proper SQL savepoints.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@knz
Copy link
Contributor

knz commented Oct 30, 2018

  1. Andrei, Ben, a cluster setting is also not the good choice. This behavior should be controllable per app.

Bob, our design pattern for this kind of feature so far has been a session var, with a default value configurable via a cluster settings. See for example how the distsql and optimizer session vars are handled and used. We could (should, probably) handle the savepoint naming in that way.

  1. I reiterate that postgres savepoints are identifiers and the identifier rules apply. So regardless of whether the comparison is done by prefix or using the full savepoint name, there must not be any case folding involved in the code.

  2. I want to know what Spring expects to be able to do. We should provide enough flexibility that it works with spring without shutting off any future situation where we would support both retry savepoints and regular savepoints. A "large hammer" solution in the shape of "all savepoints become retry savepoints" does not seem appropriate, unless nothing else works for Spring.

@bdarnell
Copy link
Contributor

The argument is that, since we reject savepoints coming after the beginning of the txn regardless of the name anyway, technically it's not incorrect to accept any name as the savepoint immediately after the begin. Like, even if the app doesn't know what it's doing, savepoints immediately after begin technically behave like proper SQL savepoints.

That's not true. Once we accept a savepoint as a retry signal, we change the semantics of RELEASE SAVEPOINT, treating it as the real commit. That's why it's important that this be opt in, and as narrowly-scoped as possible (so I don't think a cluster setting is a good idea even if it's just setting a default for a session variable).

@andreimatei
Copy link
Contributor

andreimatei commented Oct 30, 2018 via email

@bobvawter
Copy link
Member Author

To summarize the discussion so far:

  • We have multiple customers for whom SAVEPOINT cockroach_restart would improve their ability to restart transactions, but, for various reasons, using this exact savepoint name is a non-starter.
  • It could be the case that savepoint names do not include any stable prefix that we could match against.
  • Cockroach's restart savepoint has semantics that are different from a standard savepoint in that RELEASE SAVEPOINT is analogous to a COMMIT and affect subsequent transaction priorities.
  • If we were to implement full savepoint support, we'd still need to support this restartable flow into the future.
  • Savepoint names are, properly, SQL identifiers and the existing case-folding isn't quite right.
  • In general, cluster settings should be used to provide default values for session variables.
  • We want telemetry to measure the use of restart savepoints vs. unimplemented savepoints.

Based on the above, I'd like to rework the PR to:

  • Add a boolean session variable, force_savepoint_restart, which clearly indicates that you're painting with a wide brush.
  • If the variable is false, preserve current behavior by case-insensitively matching for cockroach_restart.
  • If the variable is true, we accept any identifier for the savepoint name and always trigger the restart behavior.
  • In either case, store the ident used to ensure that the RELEASE and ROLLBACK commands use an exactly-matching ident.
    • This would break clients who SAVEPOINT cockroach_restart but then RELEASE COCKROACH_RESTART.
    • We can call this out as a breaking-change release note which users can trivially fix.

I have a preference against exposing a cluster-wide setting to make force_savepoint_restart to default to true, since it seems like clients really ought to know that they're getting a Cockroach-specific behavior. If a random, non-restart aware, savepoint-using client were to connect to that cluster, the observed behavior would be quite unexpected.

Future work:

  • If we were to add full support for savepoints in the future, we would certainly retain SAVEPOINT cockroach_restart as a special case to allow restart and regular savepoints to be mixed in a session.
  • If we want to allow clients that use sessions with force_savepoint_restart=true to be able to use regular savepoints, we could use a minor syntax extension like SAVEPOINT foo NO RESTART allowing them to gradually transition.

@knz
Copy link
Contributor

knz commented Oct 31, 2018

It could be the case that savepoint names do not include any stable prefix that we could match against.

What's the evidence for that?

This would break clients who SAVEPOINT cockroach_restart but then RELEASE COCKROACH_RESTART.

nope -- the syntax cockroach_restart and COCKROACH_RESTART are equivalent already, and also equivalent when using proper identifier semantics. That's because they are both unquoted and thus get normalized during lexing to lowercase cockroach_restart.

A difference would exist between SAVEPOINT Foo and SAVEPOINT "Foo", or SAVEPOINT COCKROACH_RESTART and SAVEPOINT "COCKROACH_RESTART".


No comment on the rest. I personally support the proposal to have an opt-in session var.

I still would like to double-check the intended semantics by finding out actual evidence of what is needed by clients (see first point above).

@bdarnell
Copy link
Contributor

We have multiple customers for whom SAVEPOINT cockroach_restart would improve their ability to restart transactions, but, for various reasons, using this exact savepoint name is a non-starter.

Have we explored the option of just executing the SAVEPOINT command directly with these customers? The ORM's nested transaction framework may use fixed savepoint names, but if they're not getting any benefit from integrating with the ORM then db.execute("SAVEPOINT cockroach_restart") might work. There may be enough reason to integrate with the ORM (that's why we made our sqlalchemy patch), but it may not be strictly required.

It could be the case that savepoint names do not include any stable prefix that we could match against.

What's the evidence for that?

SQLAlchemy (without our intrusive patch), for example, will always use a savepoint named sa_savepoint_%s (with %s replaced by a per-client counter). We could match on the prefix sa_savepoint_, but it tells us nothing but "this savepoint was created by sqlalchemy", and doesn't tell us whether the client is prepared for our "commit on release" semantic shift.

Based on the above, I'd like to rework the PR to:

I agree with this plan.

@bobvawter bobvawter requested a review from a team October 31, 2018 18:55
@bobvawter
Copy link
Member Author

PTAL: This PR has been reworked with the boolean-flag approach.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM on principle -- I am personally a bit cold to the idea of "painting with a large brush" but as long as the setting is opt-in, we're good.

However before this merges this PR needs to clarify the identifier case sensitivity in tests -- I think there is a bug remaining (see my comment below) and an accompanying test should be added.

Reviewed 13 of 13 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/sql/conn_executor_exec.go, line 1401 at r2 (raw file):

		return errors.Errorf(`SAVEPOINT "%s" is in use`, ex.state.activeSavepointName)
	}
	if !ex.sessionData.ForceSavepointRestart && !strings.HasPrefix(savepoint.Normalize(), RestartSavepointName) {

Remove the Normalize() call. This has been done by the lexer already.

Add tests that confirm the identifier casing rules are applied properly.


pkg/sql/conn_executor_exec.go, line 1403 at r2 (raw file):

	if !ex.sessionData.ForceSavepointRestart && !strings.HasPrefix(savepoint.Normalize(), RestartSavepointName) {
		return pgerror.UnimplementedWithIssueError(10735,
			"SAVEPOINT not supported except for "+RestartSavepointName)

Consider adding a hint between parentheses that if the user is trying to implement client retries they can change the name of the savepoint with the new setting.

@bobvawter bobvawter requested a review from a team November 12, 2018 14:40
Copy link
Member Author

@bobvawter bobvawter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests for case-sensitivity.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/sql/conn_executor_exec.go, line 1401 at r2 (raw file):

Previously, knz (kena) wrote…

Remove the Normalize() call. This has been done by the lexer already.

Add tests that confirm the identifier casing rules are applied properly.

Done.


pkg/sql/conn_executor_exec.go, line 1403 at r2 (raw file):

Previously, knz (kena) wrote…

Consider adding a hint between parentheses that if the user is trying to implement client retries they can change the name of the savepoint with the new setting.

Done.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/sql/conn_executor_exec.go, line 1401 at r2 (raw file):

Previously, bobvawter (Bob Vawter) wrote…

Done.

The new code is not correct either. You need string(savepoint).

.String() happens to work because the current value of RestartSavepointName happens to be lowercase and not a keyword. Should any of these two properties change, String() would fail to do what you want.


pkg/sql/conn_executor_exec.go, line 1399 at r3 (raw file):

			return nil
		}
		return errors.Errorf(`SAVEPOINT "%s" is in use`, ex.state.activeSavepointName)

Use this instead:

`SAVEPOINT %q is in use`, tree.ErrString(&ex.state.activeSavepointName)

Also, consider using pgerror.NewErrorf with a suitable error code (perhaps the same as in postgres - 3B001 CodeInvalidSavepointSpecificationError)


pkg/sql/conn_executor_exec.go, line 1405 at r3 (raw file):

			"SAVEPOINT not supported except for "+RestartSavepointName,
			"Retryable transactions with arbitrary SAVEPOINT names can be enabled "+
				"with SET force_savepoint_restart=true;")

nit: no need for a semicolon at the end. You can write ... can be enabled with "SET force_savepoint_restart = true".

Currently, the only allowable savepoint names are those beginning with
`cockroach_restart` which triggers a Cockroach-specific transaction-retry flow.
This creates compatibility issues with certain integration scenarios (e.g. apps
using Spring ORM) that can take advantage of restartable transactions, but
which assume than an arbitrary savepoint identifier may be used.

This patch introduces a new session variable `force_savepoint_restart` which
allows the user to customize the savepoint name to any valid identifier, while
always triggering a restartable transaction.

Telemetry:
* Unimplemented-feature error now includes reference to #10735.
* `sql.force_savepoint_restart` event recorded when enabled.

Resolves #30588 (customize `SAVEPOINT` name)
Relates to #15012 (possible change for detecting "empty" transactions)

Release note (sql change): Users can customize the auto-retry savepoint name
used by the `SAVEPOINT` command by setting the `force_savepoint_restart`
session variable.  For example: `SET force_savepoint_restart=true; BEGIN;
SAVEPOINT foo` will now function as desired. This session variable may also be
supplied as part of a connection string to support existing code that assumes
that arbitrary savepoint names may be used.

Release note (sql change): The names supplied to a `SAVEPOINT` command are now
properly treated as SQL identifiers.  That is `SAVEPOINT foo` and `SAVEPOINT
FOO` are now equivalent statements.
Copy link
Member Author

@bobvawter bobvawter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/sql/conn_executor_exec.go, line 1401 at r2 (raw file):

Previously, knz (kena) wrote…

The new code is not correct either. You need string(savepoint).

.String() happens to work because the current value of RestartSavepointName happens to be lowercase and not a keyword. Should any of these two properties change, String() would fail to do what you want.

Done. It just feels shady to cast a type alias back into a string.


pkg/sql/conn_executor_exec.go, line 1399 at r3 (raw file):

Previously, knz (kena) wrote…

Use this instead:

`SAVEPOINT %q is in use`, tree.ErrString(&ex.state.activeSavepointName)

Also, consider using pgerror.NewErrorf with a suitable error code (perhaps the same as in postgres - 3B001 CodeInvalidSavepointSpecificationError)

Done.


pkg/sql/conn_executor_exec.go, line 1405 at r3 (raw file):

Previously, knz (kena) wrote…

nit: no need for a semicolon at the end. You can write ... can be enabled with "SET force_savepoint_restart = true".

Removed the semicolon. I suppose the user will be able to determine if they need it or not.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/sql/conn_executor_exec.go, line 1401 at r2 (raw file):

Previously, bobvawter (Bob Vawter) wrote…

Done. It just feels shady to cast a type alias back into a string.

Feel free to propose an additional method on Name that does the right thing.

@bobvawter
Copy link
Member Author

bors r+

craig bot pushed a commit that referenced this pull request Nov 13, 2018
31971: sql: Allow restart savepoint to be renamed r=bobvawter a=bobvawter

Currently, the only allowable savepoint names are those beginning with
`cockroach_restart` which triggers a Cockroach-specific transaction-retry flow.
This creates compatibility issues with certain integration scenarios (e.g. apps
using Spring ORM) that can take advantage of restartable transactions, but
which assume than an arbitrary savepoint identifier may be used.

This patch introduces a new session variable `force_savepoint_restart` which
allows the user to customize the savepoint name to any valid identifier, while
always triggering a restartable transaction.

Telemetry:
* Unimplemented-feature error now includes reference to #10735.
* `sql.force_savepoint_restart` event recorded when enabled.

Resolves #30588 (customize `SAVEPOINT` name)
Relates to #15012 (possible change for detecting "empty" transactions)

Release note (sql change): Users can customize the auto-retry savepoint name
used by the `SAVEPOINT` command by setting the `force_savepoint_restart`
session variable.  For example: `SET force_savepoint_restart=true; BEGIN;
SAVEPOINT foo` will now function as desired. This session variable may also be
supplied as part of a connection string to support existing code that assumes
that arbitrary savepoint names may be used.

Release note (sql change): The names supplied to a `SAVEPOINT` command are now
properly treated as SQL identifiers.  That is `SAVEPOINT foo` and `SAVEPOINT
FOO` are now equivalent statements.


Co-authored-by: Bob Vawter <bob@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Nov 13, 2018

Build succeeded

@craig craig bot merged commit 1ed61fc into cockroachdb:master Nov 13, 2018
@knz knz added this to Triage in (DEPRECATED) SQL Front-end, Lang & Semantics via automation Nov 21, 2018
@knz knz moved this from Triage to Finished (m2.2-1) in (DEPRECATED) SQL Front-end, Lang & Semantics Nov 21, 2018
@bobvawter bobvawter deleted the custom-savepoint branch December 10, 2018 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants