Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix SYSTEM UNFREEZE for ordinary database #38262

Merged

Conversation

PolyProgrammist
Copy link
Contributor

@PolyProgrammist PolyProgrammist commented Jun 21, 2022

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in official stable or prestable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix SYSTEM UNFREEZE query for Ordinary database. Fix for #36424

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

if (!disk->exists(store_path))
continue;
for (auto prefix_it = disk->iterateDirectory(store_path); prefix_it->isValid(); prefix_it->next())
for (auto store_path: store_paths)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for loop added

@robot-ch-test-poll robot-ch-test-poll added the pr-bugfix Pull request with bugfix, not backported by default label Jun 21, 2022
@antonio2368 antonio2368 added the can be tested Allows running workflows for external contributors label Jun 21, 2022
@tavplubix tavplubix self-assigned this Jun 21, 2022
@tavplubix
Copy link
Member

I looked through the changes in previous PR (just to understand changes in this one) and left some comments in #36424, please take a look. You can address the comments in this PR.

Comment on lines 27 to 34
if (version == 1) {
/// is_replicated and is_remote are not used
bool is_replicated = true;
writeBoolText(is_replicated, buffer);
buffer.write("\n", 1);
bool is_remote = true;
writeBoolText(is_remote, buffer);
buffer.write("\n", 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to completely remove these flags without changing version. It will break compatibility with 22.6.1, but I think it's not a problem, because 22.6 was just released recently and I doubt that someone has already started to use 22.6.1 with this freeze-metadata-stuff in production. And previous serialization format was invalid anyway (I mean issues with escaping).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is, flags were introduced before this PR. See history of removed code for details

writeBoolText(is_remote, buffer);
buffer.write("\n", 1);
}
writeString(escapeForFileName(replica_name), buffer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also check that other strings cannot contain unexpected chars

}
}
if (disk->exists(backup_path))
{
/// After unfreezing we need to clear revision.txt file and empty directories
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not clear why we cannot just remove everything recursively. Why do we need complex "unfreeze" logic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// After unfreezing we need to clear revision.txt file and empty directories.
/// revision.txt file shouldn't be unfreezed, it should just be deleted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that clear?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I mean that FREEZE PARTITION only creates hardlinks. So UNFREEZE should only remove hardlinks. It's not clear why do we need some complex logic for "unfreezing", it was supposed to do just rm -rf. I understand that all this stuff is needed for "s3 zero copy replication", but I do not understand how "s3 zero copy replication" (and especially "unfreezing" with "s3 zero copy replication") work because of lack of comments. A comment that describes how all this stuff works is required.

@tavplubix
Copy link
Member

See also #36424 (comment)

@tavplubix
Copy link
Member

Issue with zk session should be fixed as well

@PolyProgrammist
Copy link
Contributor Author

PolyProgrammist commented Jun 24, 2022

Also: It does not work correctly if table was created with old syntax. It's ok to drop support for UNFREEZE in this case, but we should throw an exception at least.

Freeze.cpp: 178

@alexey-milovidov
Copy link
Member

Ordinary databases are deprecated in 22.7.
Is this PR worth merging?

@tavplubix
Copy link
Member

Ordinary databases are deprecated in 22.7. Is this PR worth merging?

Yes, because it also addresses review comments from #36424. Previous PR was merged by mistake and it has to be fixed. We can drop support for Ordinary, but we need the rest fixes.

@PolyProgrammist
Copy link
Contributor Author

@tavplubix please check zookeeper and partition matching. do you mean something like that? 85bf022

@@ -179,8 +184,13 @@ PartitionCommandsResultInfo Unfreezer::unfreezePartitionsFromTableDirectory(Merg
for (auto it = disk->iterateDirectory(table_directory); it->isValid(); it->next())
{
const auto & partition_directory = it->name();

int count_underscores = std::count_if(partition_directory.begin(), partition_directory.end(), []( char c ){return c =='_';});
if ((format_version.has_value() && format_version == 0) || count_underscores == 4) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Number of underscores can be 4 with new syntax as well (if mutation version is not 0)

Copy link
Member

@tavplubix tavplubix Jul 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also we don't need to parse partition_id at all if matcher always returns true (when it called from Unfreezer::unfreeze or similar place). Currently it can parse some garbage, but simply does not throw exception in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. Parsing partition_id is needed in order to output it. If I remove check with underscores,

  • ALTER TABLE UNFREEZE will delete data only in case if format_version is more than zero.
  • SYSTEM UNFREEZE will delete any data despite of format_version

So not checking this leads to different behaviour of these queries. Am I right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed that SYSTEM UNFREEZE prints partition ids in query result, I though ids are needed only for filtering. So currently it prints invalid ids if table was created with old syntax. It's not clear how to fix it, I see the following options:

  • Do not return list of partitions for SYSTEM UNFREEZE
  • Autodetect format_version somehow for SYSTEM UNFREEZE (not sure if it's possible, but maybe checking some files inside frozen data part can help)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think not returning list is easier and for sure possible. But still what about deleting data? Different data will be deleted for alter and system queries if not doing autodetect

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because if format_version is 0, there will be an exception and since then the partition will not be removed with ALTER query. But as soon as there is no exception when called from Unfreezer::unfreeze, the partition will be removed with SYSTEM query

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's ok, SYSTEM UNFREEZE will simply remove everything, because it's impossible to distinguish partitions in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, it seems like throwing an exception may break some user queries since this logic was introduced for alter unfreeze. May it? Is it critical?

if (!matcher(partition_id))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather mark it as TODO and / or create a new issue about this. If I touch this code, the only solution that doesn't break any query I see is to autodetect format version. Otherwise, it may break query and behave differently for alter and system queries. I would create a separate issue for autodetection as soon as it is not about just SYSTEM UNFREEZE feature. It is about fixing small bug for both ALTER UNFREEZE and SYSTEM UNFREEZE which may introduce bigger bug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/Storages/Freeze.cpp Outdated Show resolved Hide resolved
src/Storages/Freeze.h Outdated Show resolved Hide resolved
Copy link
Member

@tavplubix tavplubix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but a few things left:

  • fix tests (see tests/config/install.sh)
  • add a comment describing how all this stuff with hardlinks and unfreezing works

@ianton-ru
Copy link
Contributor

@tavplubix Can you see now? TestsBugfixCheck failed with "nothing to run" for intergation tests - is this mean that integration test required too?

@tavplubix
Copy link
Member

TestsBugfixCheck failed with "nothing to run" for intergation tests - is this mean that integration test required too?

No, it means that TestsBugfixCheck works incorrectly, cc: @vdimir

@vdimir
Copy link
Member

vdimir commented Aug 18, 2022

Notice: bugfix check: stateless - error: fail: 0, passed: 2 Report url: https://s3.amazonaws.com/clickhouse-test-reports/38262/46de6997dd7ef0da596a6c2dfa07197ca7355b65/stateless_tests_bugfix_validate_check.html
Notice: bugfix check: integration - error: Nothing to run Report url: https://s3.amazonaws.com/clickhouse-test-reports/38262/46de6997dd7ef0da596a6c2dfa07197ca7355b65/integration_tests_bugfix_validate_check.html

It expects that added test would fail on the previous stable release (either functional or integration), but it's passed. In some cases, if the test is not stably reproduced, it can be fine.

But "error: Nothing to run" looks confusing, I'll fix the message.

@tavplubix
Copy link
Member

Also it should post check status with link to the report (as other checks do) and actions task should not fail

@vdimir vdimir mentioned this pull request Aug 18, 2022
3 tasks
@tavplubix
Copy link
Member

Integration tests failures look suspicious

@ianton-ru
Copy link
Contributor

Yes, we will fix it.

@tavplubix
Copy link
Member

Integration tests still fail

@tavplubix tavplubix marked this pull request as draft September 5, 2022 12:04
@alexey-milovidov
Copy link
Member

Integration tests have failed.

@tavplubix tavplubix marked this pull request as ready for review September 9, 2022 13:09
src/Storages/Freeze.cpp Outdated Show resolved Hide resolved
@tavplubix tavplubix merged commit 4d146b0 into ClickHouse:master Sep 13, 2022
@alesapin alesapin added the pr-must-backport Pull request should be backported intentionally. Use this label with great care! label Sep 20, 2022
@alesapin alesapin removed the pr-must-backport Pull request should be backported intentionally. Use this label with great care! label Sep 20, 2022
aalexfvk pushed a commit to aalexfvk/ClickHouse that referenced this pull request Jan 20, 2023
…-system-unfreeze

Fix SYSTEM UNFREEZE for ordinary database
alesapin added a commit that referenced this pull request Jan 27, 2023
…tem-unfreeze-for-ordinary-database

Merge pull request #38262 from PolyProgrammist/fix-ordinary-system-un…
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
can be tested Allows running workflows for external contributors pr-bugfix Pull request with bugfix, not backported by default
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants