Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: DROP ... CASCADE can be slow #26373

Closed
ParkMyCar opened this issue Apr 2, 2024 · 1 comment · Fixed by #27169
Closed

sql: DROP ... CASCADE can be slow #26373

ParkMyCar opened this issue Apr 2, 2024 · 1 comment · Fixed by #27169
Labels
C-bug Category: something is broken

Comments

@ParkMyCar
Copy link
Member

What version of Materialize are you using?

v0.94.2

What is the issue?

Running a DROP ... CASCADE can be relatively slow because we collect all of the items to drop at the planning stage and then is subject to a similar issue as seen in #26361

@ParkMyCar ParkMyCar added the C-bug Category: something is broken label Apr 2, 2024
@chaas
Copy link
Contributor

chaas commented Apr 10, 2024

Looked at stats for current blue/green use cases and the drop can be as slow as 7s, which should be improved, but is not as severe as #26361.
@ParkMyCar ran some tests that are consistent with this number.
See https://materializeinc.slack.com/archives/C02FWJ94HME/p1712178602536649 for more details.
Moving this to the backlog for now.

ParkMyCar added a commit that referenced this issue May 21, 2024
)

This PR refactors `mz_catalog::Op::DropObject` to take a `Vec<ObjectId>`
instead of a single `ObjectId`.

Previously a `DROP ... CASCADE` would result in a `Op::DropObject` for
every object we needed to drop. This had poor performance because
removing a batch of items from a durable catalog transaction is `O(n)`,
where `n` is the number of objects in the catalog. But because we would
issue an `Op::DropObject` for every item we're dropping, we would have a
batch size of 1 and the runtime of `DROP ... CASCADE` became `O(m * n)`,
where `m` is the number of objects we're dropping. With this refactoring
the runtime of `DROP ... CASCADE` is `O(n)`.

Concretely I tested creating a schema with 1 table and 1,000 views then
running `DROP SCHEMA a CASCADE`:

* `main`: ~28 seconds
* This PR: ~700ms

### Motivation

Fixes #26373

### Tips for reviewer

Hiding white space helps, a large code block got surrounded by `for id
in object_ids { ... }` and indented

### Checklist

- [ ] This PR has adequate test coverage / QA involvement has been duly
considered. ([trigger-ci for additional test/nightly
runs](https://trigger-ci.dev.materialize.com/))
- [ ] This PR has an associated up-to-date [design
doc](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/README.md),
is a design doc
([template](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/00000000_template.md)),
or is sufficiently small to not require a design.
  <!-- Reference the design in the description. -->
- [ ] If this PR evolves [an existing `$T ⇔ Proto$T`
mapping](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/command-and-response-binary-encoding.md)
(possibly in a backwards-incompatible way), then it is tagged with a
`T-proto` label.
- [ ] If this PR will require changes to cloud orchestration or tests,
there is a companion cloud PR to account for those changes that is
tagged with the release-blocker label
([example](MaterializeInc/cloud#5021)).
<!-- Ask in #team-cloud on Slack if you need help preparing the cloud
PR. -->
- [x] This PR includes the following [user-facing behavior
changes](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/guide-changes.md#what-changes-require-a-release-note):
  - Improves the performance of dropping large amount of items at once.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: something is broken
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants