02362_part_log_merge_algorithm is flaky #63633

Closed
mstetsyuk opened this issue May 10, 2024 · 2 comments

@mstetsyuk

mstetsyuk commented May 10, 2024

02362_part_log_merge_algorithm is flaky: one, two, three, four.

After two INSERTs followed by an OPTIMIZE, the test expects the part log to contain only the following events:

NewPart
NewPart
MergeParts

However, because the test has a race condition with ReplicatedMergeTreeCleanupThread removing stale parts, the part log also includes RemovePart events.

The test can be fixed by adding WHERE event_type IN ('NewPart', 'MergeParts') to filter out stale part removals (as well as any other potential background events). Alternatively, we can use WHERE event_type != 'RemovePart' to filter out only removals, so that the test still fails if other background events happen.
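
To illustrate the difference between the two filters, here is a minimal sketch that uses an in-memory SQLite table as a stand-in for the part log. The table name, its single column, and the extra MutatePart event are assumptions made for illustration only; the real test queries ClickHouse and its query is not reproduced here.

```python
import sqlite3

# Hypothetical stand-in for the part log: only event_type matters for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part_log (event_type TEXT)")

# The three events the test expects, plus a RemovePart from background cleanup
# and one more background event (MutatePart), added purely for illustration.
events = ["NewPart", "NewPart", "MergeParts", "RemovePart", "MutatePart"]
conn.executemany("INSERT INTO part_log VALUES (?)", [(e,) for e in events])


def select_events(where_clause):
    """Return event types matching the given WHERE clause, in insertion order."""
    rows = conn.execute(
        f"SELECT event_type FROM part_log WHERE {where_clause} ORDER BY rowid"
    )
    return [row[0] for row in rows]


# Allow-list: every background event is hidden from the test.
allow_list = select_events("event_type IN ('NewPart', 'MergeParts')")

# Deny-list: only removals are hidden, so other background events still surface.
deny_list = select_events("event_type != 'RemovePart'")

print(allow_list)
print(deny_list)
```

With the allow-list, the result matches the expected three events regardless of what happens in the background. With the deny-list, the extra MutatePart row remains in the result, so a test comparing against the expected output would still fail on it.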

@tavplubix

Well done @mstetsyuk, thanks! However, one thing is not clear:

> However, because the test has a race condition with ReplicatedMergeTreeCleanupThread removing stale parts, the part log also includes RemovePart events.

There's an old_parts_lifetime setting (8 minutes by default) which delays the removal of stale parts. How is the race condition possible if the test execution time is about 30-40 seconds, and it's much less than 8 minutes? Do we have a bug that causes premature removal of data parts?

@mstetsyuk

mstetsyuk commented May 13, 2024

> There's an old_parts_lifetime setting (8 minutes by default) which delays the removal of stale parts. How is the race condition possible if the test execution time is about 30-40 seconds, and it's much less than 8 minutes?

@tavplubix, the value of old_parts_lifetime is randomized. In fact, in all the examples that I listed, old_parts_lifetime is set to 10 seconds, which is shorter than the test's execution time and makes it possible for the race condition to manifest itself.
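
The arithmetic can be sketched as follows. This is a deliberately simplified model, not how the cleanup thread is implemented: it assumes that old_parts_lifetime is measured in seconds and that a stale part can only be removed once that many seconds have elapsed, and it ignores how often the cleanup thread actually runs. The function name and constants are hypothetical.

```python
def removal_can_race(old_parts_lifetime_s: int, test_duration_s: int) -> bool:
    """Necessary condition for a RemovePart event to appear during the test:
    the test must run longer than the stale-part lifetime (simplified model)."""
    return test_duration_s > old_parts_lifetime_s


DEFAULT_LIFETIME_S = 8 * 60   # default quoted in this thread (8 minutes)
RANDOMIZED_LIFETIME_S = 10    # value reported for the failing runs
TEST_DURATION_S = 30          # lower end of the 30-40 second estimate

default_races = removal_can_race(DEFAULT_LIFETIME_S, TEST_DURATION_S)
randomized_races = removal_can_race(RANDOMIZED_LIFETIME_S, TEST_DURATION_S)

print(default_races, randomized_races)
```

Under the default lifetime the condition is never met within the test's run time, while with the randomized value of 10 it is, which is consistent with the flakiness appearing only in runs where the setting was randomized down.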
