Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: treat memory monitor errors as retriable #120806

Merged
merged 1 commit into from Mar 22, 2024

Conversation

fqazi
Copy link
Collaborator

@fqazi fqazi commented Mar 21, 2024

Release note (bug fix): Memory exhaustion errors which can occur during schema changes are no longer considered to be permanent failures. These schema changes will now be retried rather than reverted.

This is just rebase of a PR by @ajwerner: #71816, with a minor change to improve test reliability.

Fixes: #120807

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@fqazi fqazi force-pushed the memory-monitor-errors branch 2 times, most recently from 6ed0b14 to 8dc92b1 Compare March 21, 2024 18:38
@fqazi fqazi marked this pull request as ready for review March 21, 2024 18:38
@fqazi fqazi requested a review from a team as a code owner March 21, 2024 18:38
Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just had minor nits. thanks for reviving this!

@@ -254,7 +254,7 @@ func IsPermanentSchemaChangeError(err error) bool {
}

switch pgerror.GetPGCode(err) {
case pgcode.SerializationFailure, pgcode.InternalConnectionFailure:
case pgcode.SerializationFailure, pgcode.InternalConnectionFailure, pgcode.OutOfMemory, pgcode.DiskFull:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm for DiskFull does retrying also make sense? i'm wondering if retrying could make the issue worse.

// Run across both nodes to make sure that the error makes it across distsql
// boundaries.
testutils.RunTrueAndFalse(t, "local", func(t *testing.T, local bool) {
var shouldFail int64
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: make this a atomic.Int64 for slightly better readability

tdb.Exec(t, "INSERT INTO foo VALUES (1)")
tdb.Exec(t, `ALTER TABLE foo EXPERIMENTAL_RELOCATE SELECT ARRAY[$1], 1`,
tc.Server(dataNode).GetFirstStoreID())
tdb.Exec(t, `ALTER TABLE foo ADD COLUMN j INT NOT NULL DEFAULT 42`)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could you add one more assertion here to verify that shouldFail is > 2? that way we are sure the error was injected.

Copy link
Collaborator Author

@fqazi fqazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rafiss TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rafiss)


pkg/sql/schema_changer.go line 257 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

hm for DiskFull does retrying also make sense? i'm wondering if retrying could make the issue worse.

Done.

Good point, and I reverted this part since that is riskier.

@fqazi
Copy link
Collaborator Author

fqazi commented Mar 22, 2024

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 22, 2024

Build failed:

Release note (bug fix): Memory exhaustion errors which can occur during schema
changes are no longer considered to be permanent failures. These schema changes
will now be retried rather than reverted.

Fixes: cockroachdb#120807
@fqazi
Copy link
Collaborator Author

fqazi commented Mar 22, 2024

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 22, 2024

@craig craig bot merged commit 620aa02 into cockroachdb:master Mar 22, 2024
20 of 22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: treat memory monitor errors as retriable
4 participants