Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid Keeper crash on shutdown (fix test_keeper_snapshot_on_exit) #44908

Merged
merged 1 commit into from Jan 5, 2023

Conversation

antonio2368
Copy link
Member

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Fix the issue found by test_keeper_snapshot_on_exit after #44881 was merged.

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-not-for-changelog This PR should not be mentioned in the changelog label Jan 4, 2023
Copy link
Collaborator

@azat azat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antonio2368 thank you for the fix!

@@ -523,7 +523,7 @@ void KeeperStateMachine::processReadRequest(const KeeperStorage::RequestForSessi
true /*check_acl*/,
true /*is_local*/);
for (const auto & response : responses)
if (!responses_queue.push(response))
if (!responses_queue.push(response) && !responses_queue.isFinished())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is better to move this check out from loop?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question: is it possible that the queue is finished between the loop iterations? If so it's better to add:

if (responses_queue.isFinished())
    break;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To address this and the comment above, the biggest problem is the NuRaft.
If I throw an exception from one of its threads, std::abort happens (that's what caused the problem), that lib doesn't handle exceptions nicely as CH. Assume all methods here are called by NuRaft and not CH.

The queue can be finished in between the loop iterations but it can also be finished between the checks:

if finished -> finished is false
  break;
if (!push(...)) -> finished is true, i.e we have same problem

I can add the check for break to not do unnecessary iterations as @novikd suggested, but because I would still need to keep the check after an unsuccessful push I wouldn't change much code-wise.

So I would like to merge this to fix CI ASAP and then polish it.

@@ -248,7 +248,7 @@ nuraft::ptr<nuraft::buffer> KeeperStateMachine::commit(const uint64_t log_idx, n
session_id = storage->getSessionID(session_id_request.session_timeout_ms);
LOG_DEBUG(log, "Session ID response {} with timeout {}", session_id, session_id_request.session_timeout_ms);
response->session_id = session_id;
if (!responses_queue.push(response_for_session))
if (!responses_queue.push(response_for_session) && !responses_queue.isFinished())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is better to move this check to the beginning of the function?
Also what will be if exception will be thrown if queue is finished? Or at least some log message maybe?
@antonio2368 what do you think?

@@ -523,7 +523,7 @@ void KeeperStateMachine::processReadRequest(const KeeperStorage::RequestForSessi
true /*check_acl*/,
true /*is_local*/);
for (const auto & response : responses)
if (!responses_queue.push(response))
if (!responses_queue.push(response) && !responses_queue.isFinished())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question: is it possible that the queue is finished between the loop iterations? If so it's better to add:

if (responses_queue.isFinished())
    break;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-not-for-changelog This PR should not be mentioned in the changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants