Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-22.2: cli: compute --drain-wait based on cluster setting values #98577

Merged
merged 3 commits into from
Mar 14, 2023

Conversation

rafiss
Copy link
Collaborator

@rafiss rafiss commented Mar 14, 2023

Backport 3/3 commits from #98390.

/cc @cockroachdb/release

Release justification: low risk improvement to an important area


This includes a few commits related to draining


cli: compute --drain-wait based on cluster setting values

fixes #98388

Release note (cli change): The --drain-wait argument for the drain command will be automatically increased if the command detects that it is smaller than the sum of server.shutdown.drain_wait, server.shutdown.connection_wait, server.shutdown.query_wait times two, and server.shutdown.lease_transfer_wait.

This recommendation was already documented, but now the advice will be applied automatically.


sql: fix check for closing connExecutor during draining

This fixes a minor bug in which the connection would not get closed at
the earliest possible time during server shutdown.

The connection is supposed to be closed as soon as we handle a Sync
message when the conn_executor is in the draining state and not in a
transaction. Since the transaction state was checked before state
transitions occurred, this would cause the connection to remain open for
an extra bit of time. This was particularly a problem because the Sync
message is also the command that auto-commits an implicit transaction.
So before this commit, it was actually impossible for the check to work
as it was supposed to.

Now we check the txn state after state transitions occur.


roachtest: enhance drain test

The test now does much more:

  • Checks that --drain-wait is automatically increased if it is set lower
    than the cluster settings require.
  • Check that the /health?ready=1 endpoint fails during the drain_wait
    period.
  • Check for the proper error message during the connection_wait phase.
  • Check for the proper error message when trying to begin a new
    query/transaction during the query_wait phase.
  • Check that an open transaction is allowed to continue during the
    query_wait phase.
  • Check for the proper error message when a query is canceled during
    shutdown.

Release note (cli change): The --drain-wait argument for the `drain`
command will be automatically increased if the command detects that it
is smaller than the sum of server.shutdown.drain_wait,
server.shutdown.connection_wait, server.shutdown.query_wait times two,
and server.shutdown.lease_transfer_wait. If the --drain-wait argument is
0, then no timeout is used.

This recommendation was already documented, but now the advice will be
applied automatically.
This fixes a minor bug in which the connection would not get closed at
the earliest possible time during server shutdown.

The connection is supposed to be closed as soon as we handle a Sync
message when the conn_executor is in the draining state and not in a
transaction. Since the transaction state was checked before state
transitions occurred, this would cause the connection to remain open for
an extra bit of time. This was particularly a problem because the Sync
message is also the command that auto-commits an implicit transaction.
So before this commit, it was actually impossible for the check to work
as it was supposed to.

Now we check the txn state after state transitions occur.

Release note: None
@rafiss rafiss requested a review from knz March 14, 2023 14:20
@rafiss rafiss requested a review from a team as a code owner March 14, 2023 14:20
@rafiss rafiss requested a review from a team March 14, 2023 14:20
@rafiss rafiss requested a review from a team as a code owner March 14, 2023 14:20
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz knz changed the title cli: compute --drain-wait based on cluster setting values release-22.2: cli: compute --drain-wait based on cluster setting values Mar 14, 2023
The test now does much more:
- Checks that --drain-wait is automatically increased if it is set lower
  than the cluster settings require.
- Check that the /health?ready=1 endpoint fails during the drain_wait
  period.
- Check for the proper error message during the connection_wait phase.
- Check for the proper error message when trying to begin a new
  query/transaction  during the query_wait phase.
- Check that an open transaction is allowed to continue during the
  query_wait phase.
- Check for the proper error message when a query is canceled during
  shutdown.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants