add option to force deduplicate children #9070

Merged
merged 5 commits into from Feb 28, 2020

Contributor

@urykhy urykhy commented Feb 11, 2020

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Reduce the chance of data loss when inserting into a table with a materialized view (MV).

Detailed description / Documentation draft:

When you insert data into a replicated table that has an MV, a ZooKeeper exception can leave the data in the main table only, because ClickHouse (CH) does not try to insert a block into the MV if the main table has already deduplicated it. Usually this works well, but if a ZooKeeper error occurs after the insert into the main table, a retry of that block is deduplicated by the main table and the block never reaches the MV. If you set force_deduplicate_childrens=1, CH will try to insert the block into the MV even if it was deduplicated by the parent table.
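
The behaviour described above can be sketched as a toy model. This is not ClickHouse code: `ToyTable`, `insert_with_mv`, `run`, and the `fail_before_mv` switch are hypothetical names used only for illustration, and the model assumes the MV target keeps its own block-level deduplication.

```python
import hashlib


class ToyTable:
    """Hypothetical table that remembers block hashes so a retried block is skipped."""

    def __init__(self):
        self.rows = []
        self.seen_blocks = set()

    def insert(self, block):
        """Return True if the block was written, False if it was deduplicated."""
        block_id = hashlib.sha256(repr(block).encode()).hexdigest()
        if block_id in self.seen_blocks:
            return False
        self.seen_blocks.add(block_id)
        self.rows.extend(block)
        return True


def insert_with_mv(main, mv, block, force_deduplicate_childrens=False, fail_before_mv=False):
    """Insert into the main table, then push the block to the MV."""
    written = main.insert(block)
    if fail_before_mv:
        # Stand-in for a ZooKeeper error that happens after the main-table insert.
        raise RuntimeError("simulated ZooKeeper error")
    # Without the setting, a block deduplicated by the main table is not pushed to the MV.
    if written or force_deduplicate_childrens:
        mv.insert(block)


def run(force):
    """First attempt fails after the main-table insert; the client then retries the same block."""
    main, mv = ToyTable(), ToyTable()
    block = [1, 2, 3]
    try:
        insert_with_mv(main, mv, block, force, fail_before_mv=True)
    except RuntimeError:
        pass
    insert_with_mv(main, mv, block, force)
    return len(main.rows), len(mv.rows)
```

In this model `run(False)` leaves the MV empty while the main table holds the block, and `run(True)` delivers the block to the MV on the retry.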

partial fix for #2621

@akuzm
Contributor

akuzm commented Feb 12, 2020

Is it possible to add some kind of integration test for this?

@urykhy
Contributor Author

urykhy commented Feb 14, 2020

@akuzm I'm not sure.

We have arranged a test environment that triggers this kind of error, but:

  • it's built from closed components (uploader, tests, test data)
  • it's not 100% accurate: this patch only reduces the error rate. It's still possible that data will not reach the MV or will reach the MV twice.
  • it takes at least 15 minutes to run.

@alexey-milovidov alexey-milovidov changed the title from "add option to force deduplicate childrens" to "add option to force deduplicate children" Feb 14, 2020
@den-crane
Contributor

den-crane commented Feb 15, 2020

@urykhy at least you can emulate an MV insert failure:
https://gist.github.com/den-crane/bc35a15a8d71899fe15e5ed36668d387

As far as I understand, with your setting the second insert (the retry) should insert rows into test12345mv, and SELECT count() FROM test12345mv should return a non-zero count.

You can implement two test cases: one with your setting and one without.
I assume these tests will be pretty stable and fast enough.
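
The shape of those two test cases might look roughly like the sketch below. It is a pure-Python stand-in rather than a ClickHouse test and does not reproduce the gist: `FlakyMV`, `insert`, and `mv_count_after_retry` are hypothetical names, and the emulated failure is simply an exception raised on the first MV insert.

```python
class FlakyMV:
    """Hypothetical MV target whose first insert fails, standing in for the emulated failure."""

    def __init__(self):
        self.rows = []
        self.fail_next = True

    def insert(self, block):
        if self.fail_next:
            self.fail_next = False
            raise RuntimeError("emulated MV insert failure")
        self.rows.extend(block)


def insert(main_rows, main_seen, mv, block, force):
    """Toy insert path: the main table deduplicates by block content."""
    key = tuple(block)
    duplicate = key in main_seen
    if not duplicate:
        main_seen.add(key)
        main_rows.extend(block)
    # A deduplicated block is pushed to the MV only when the setting is on.
    if not duplicate or force:
        mv.insert(block)


def mv_count_after_retry(force):
    """Insert the same block twice (first attempt plus one retry) and count rows in the MV."""
    main_rows, main_seen, mv = [], set(), FlakyMV()
    block = [10, 20]
    for _ in range(2):
        try:
            insert(main_rows, main_seen, mv, block, force)
        except RuntimeError:
            pass
    return len(mv.rows)


def test_without_setting():
    # The retry is deduplicated by the main table, so the MV stays empty.
    assert mv_count_after_retry(force=False) == 0


def test_with_setting():
    # The retry is pushed to the MV even though the main table deduplicated it.
    assert mv_count_after_retry(force=True) == 2
```

A real test would replace the toy objects with the tables from the gist and compare SELECT count() FROM test12345mv after the retry in both cases.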

@alexey-milovidov
Member

01083_expressions_in_engine_arguments

  • well-known issue.

@alexey-milovidov alexey-milovidov merged commit 7b511b2 into ClickHouse:master Feb 28, 2020
@urykhy urykhy deleted the fix-mv-insert-2621 branch February 28, 2020 06:31
@nikitamikhaylov nikitamikhaylov added the pr-improvement Pull request with some product improvements label Feb 28, 2020
nikitamikhaylov pushed a commit that referenced this pull request Mar 5, 2020
add option to force deduplicate children

(cherry picked from commit 7b511b2)
nikitamikhaylov pushed a commit that referenced this pull request Mar 5, 2020
add option to force deduplicate children

(cherry picked from commit 7b511b2)