Best practices to avoid dangerous mutations #47227

Open · FedericoCeratto opened this issue Mar 3, 2023 · 2 comments
Labels: question

@FedericoCeratto (Contributor)

Sometimes it is necessary to update the value of a column across a large number of rows.
The ALTER TABLE ... UPDATE command presents some risks, and often the update needs to run on tables that are receiving new inserts while it is in progress.

Are there recommended ways to do this safely while minimizing I/O load? Could you please comment on alternatives, e.g.:
a) Copy the whole table, run the mutation on the copy, then exchange the tables (a rough sketch follows below)
b) Select+insert the affected rows into a new table, then copy them back and run OPTIMIZE TABLE ... FINAL, assuming deduplication can be used
c) Select+insert the affected rows into a new table, run a DELETE mutation on the original table, and copy the rows back
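
For concreteness, option (a) might look roughly like the sketch below, assuming a hypothetical table `events` with a column `status` (the names and the filter are illustrative, and EXCHANGE TABLES requires an Atomic database):

```sql
-- Rough sketch of option (a): mutate a copy, then swap it in.
-- Table/column names and the WHERE clause are illustrative only.

-- 1. Create an empty table with the same structure.
CREATE TABLE events_copy AS events;

-- 2. Copy the data. Rows inserted into `events` after this point are not
--    included and would have to be reconciled separately before the swap.
INSERT INTO events_copy SELECT * FROM events;

-- 3. Run the mutation on the copy, where a stuck or failed mutation does
--    not affect the live table.
ALTER TABLE events_copy UPDATE status = 'archived' WHERE event_date < '2023-01-01';

-- 4. Once the mutation has finished (check system.mutations),
--    swap the tables atomically.
EXCHANGE TABLES events AND events_copy;
```

The main open question with this approach is how to handle inserts that arrive between steps 2 and 4.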

Thanks!

@FedericoCeratto added the question label on Mar 3, 2023
@den-crane (Contributor)

I test all updates/deletes in a staging environment with the same tables (DDL) and the same version of ClickHouse.

@FedericoCeratto (Contributor, Author)

@den-crane FWIW, we do the same, but we are not confident that data corruption bugs or stuck mutations are 100% reproducible across testbeds and production, hence my question.
