Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schema: pause schema change job when running out of disk space #125513

Closed
rimadeodhar opened this issue Jun 11, 2024 · 4 comments · Fixed by #126047
Closed

schema: pause schema change job when running out of disk space #125513

rimadeodhar opened this issue Jun 11, 2024 · 4 comments · Fixed by #126047
Assignees
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs P-1 Issues/test failures with a fix SLA of 1 month T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@rimadeodhar
Copy link
Collaborator

rimadeodhar commented Jun 11, 2024

Currently, we pause a schema change job when its running out of disk space during an operation involving an index backfill.

Ask: Can we expand this functionality to column backfill as well?

This will help prevent a schema change from consuming all disk space and causing a cluster outage particularly on underprovisioned systems. See this support ticket for an example of such a case.

Jira issue: CRDB-39494

@rimadeodhar rimadeodhar added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Jun 11, 2024
@rimadeodhar
Copy link
Collaborator Author

@fqazi - what do you think of the above approach? We should be able to expand this for other schema changes involving backfill as well right?

@fqazi
Copy link
Collaborator

fqazi commented Jun 11, 2024

I think its a reasonable approach, and I think was probably not there because it seems far less likely. We probably didn't expect the column backfill to balloon the size too much with an in place backfill.

@rimadeodhar
Copy link
Collaborator Author

rimadeodhar commented Jun 17, 2024

I think there is a very real possibility of column backfill leading to this scenario. In the legacy case, we will rewrite all the rows containing the new column which will easily 2x the table size until GC runs. With the declarative schema changer, we simply recreate the index so once again the space usage is high. I think this is worth prioritizing as it will at least prevent cluster outages due to the disk filling up.

@rafiss rafiss added the P-1 Issues/test failures with a fix SLA of 1 month label Jun 18, 2024
@rafiss
Copy link
Collaborator

rafiss commented Jun 18, 2024

Let's add this check for the column backfiller here:

cockroach/pkg/sql/backfill.go

Lines 2393 to 2397 in d7f4220

uint64(columnBackfillUpdateChunkSizeThresholdBytes.Get(&sc.settings.SV)),
backfill.ColumnMutationFilter,
); err != nil {
return err
}

And also in the declarative schema changer here:

return runBackfiller(ctx, deps, tracker, backfillProgresses, mergeProgresses, deps.TransactionalJobRegistry().CurrentJob(), tables)

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs P-1 Issues/test failures with a fix SLA of 1 month T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
No open projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants