Found bugs mainly caused by snapshot #49

Closed · tangruize opened this issue Aug 28, 2021 · 5 comments
@tangruize

tangruize commented Aug 28, 2021

Hello! I found some issues that may cause catastrophic consequences, and I created PR #118. Some of the issues seem to have been fixed already; here are the ones that have not. They may be bugs, or improper use of the library on my part.

  1. raft_recv_appendentries#L746: when ae->prev_log_idx == 0, entries are appended unconditionally even if the log has been compacted, causing the corresponding compacted AE entries to be appended twice (see the sketch after this list).
  2. raft_begin_load_snapshot#L1553: the log may mismatch, and it is necessary to load the snapshot in this case. This causes the server to lag behind until the next snapshot is received.
  3. raft_recv_appendentries#L768: the log at prev_log_idx may have been compacted. In this case, the compacted logs in the AE entries should be treated as committed logs, and the remaining matching logs should be appended. This causes performance degradation.
  4. raft_send_appendentries_all#L1303 returns immediately if raft_send_appendentries returns a non-zero value. However, the return value may be RAFT_ERR_NEEDS_SNAPSHOT, and sending AE to different servers should be independent of one another. Combined with issue 2 above, this can cause the entire cluster to fail to make progress.
  5. raft_recv_appendentries_response#L626: next_idx can be decreased to equal match_idx, causing already-matched logs to be retransmitted.
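
For concreteness, here is a minimal sketch of the kind of guard issue 1 calls for, assuming willemt/raft-style names (snapshot_last_idx, prev_log_idx). The types and the helper function are hypothetical simplifications, not the library's actual code; the key point is that entries covered by the local snapshot are already committed (and therefore identical to what the leader is sending), so they can be skipped safely:

    typedef long raft_index_t;
    typedef long raft_term_t;

    /* Hypothetical, simplified shapes of the structures involved. */
    typedef struct {
        raft_index_t prev_log_idx;
        raft_term_t prev_log_term;
        int n_entries;
    } msg_ae_t;

    typedef struct {
        raft_index_t snapshot_last_idx;
        raft_term_t snapshot_last_term;
    } raft_server_t;

    /* Move prev_log_idx/term up to the snapshot boundary and return how many
     * leading AE entries the caller should drop before appending. */
    static int skip_snapshotted_entries(const raft_server_t *me, msg_ae_t *ae)
    {
        if (ae->prev_log_idx >= me->snapshot_last_idx)
            return 0;   /* nothing at or before prev_log_idx was compacted */

        raft_index_t skip = me->snapshot_last_idx - ae->prev_log_idx;
        if (skip > ae->n_entries)
            skip = ae->n_entries;   /* the whole batch is already applied */

        ae->prev_log_idx = me->snapshot_last_idx;
        ae->prev_log_term = me->snapshot_last_term;
        ae->n_entries -= (int)skip;
        return (int)skip;
    }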

I would be happy to help, and I am looking forward to your reply!

@tangruize tangruize changed the title Issues found by TLA+ model checking Fix bugs mainly caused by snapshot Sep 2, 2021
@tangruize tangruize changed the title Fix bugs mainly caused by snapshot Found bugs mainly caused by snapshot Sep 2, 2021
@sjpotter
Contributor

Trying to understand the bugs before I dive into the PR, as it's not against our code.

  1. I'm a bit unsure how ae->prev_log_idx == 0 would happen in practice. I can see it happening in the context of a test case, but I don't see how it can happen during regular usage, except perhaps if the node literally has no entries, and then appending should be correct?

  2. Regarding this check:

    /* snapshot was unnecessary */
    if (last_included_index < raft_get_current_idx(me_))
        return -1;

I think perhaps it should be raft_get_commit_idx(me_), not current_idx: snapshots should be compared to the commit idx, not the current idx. That might solve the problem? (See the sketch at the end of this comment.)

  3. I agree with this (assuming a non-leader can snapshot): it can cause worse performance, though it is technically correct. So we should check whether our own snapshot_idx > prev_log_idx and, if so, adjust prev_log_idx to it. We could perhaps optimize a bit further and adjust prev_log_idx to the follower's last_commit_idx when prev_log_idx is less than it.

  4. I agree; there's no reason raft_send_appendentries_all() should ever fail, except as an assert that kills the leader.

  5. next_idx is no longer decremented; it's always set to the current_idx + 1 we get from the follower (or our own current_idx, whichever is less). So I'm unsure this is still an issue (except perhaps if we get messages out of order?).

Thoughts on my analysis / understanding of the issues as of now? I will try to dig into the PR to see if it gives me further insight (the problem is that it seems to cover a lot of issues, which makes it difficult to tease out the individual ones).
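
Regarding point 2 above, a minimal sketch of that commit-idx comparison, with hypothetical simplified names (in the library the check lives inline in raft_begin_load_snapshot):

    typedef long raft_index_t;

    /* Mirrors the shape of the original "snapshot was unnecessary" check,
     * but compares against the commit index instead of the current index. */
    static int snapshot_is_unnecessary(raft_index_t last_included_index,
                                       raft_index_t commit_idx)
    {
        /* Entries up to commit_idx are durable and agreed on a quorum, so a
         * snapshot ending before them adds nothing. Entries past commit_idx
         * may diverge from the leader and are fair game to replace. */
        return last_included_index < commit_idx;
    }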

@sjpotter
Contributor

To go back to issue 1: in raft_send_appendentries we do

    raft_index_t next_idx = raft_node_get_next_idx(node);
    if (next_idx > 1)
    {
        /* Look up the entry preceding next_idx to fill in prev_log_idx/term. */
        raft_entry_t* prev_ety = raft_get_entry_from_idx(me_, next_idx - 1);
        if (!prev_ety)
        {
            /* The entry has been compacted; fall back to the snapshot boundary. */
            ae.prev_log_idx = me->snapshot_last_idx;
            ae.prev_log_term = me->snapshot_last_term;
        }
        else
        {
            ae.prev_log_idx = next_idx - 1;
            ae.prev_log_term = prev_ety->term;
            raft_entry_release(prev_ety);
        }
    }

The only way for prev_log_idx to be 0 is if we have literally never sent any entries to the node. Basically, I think issue 1 is not a bug; if it happens on a non-empty log, the user is using the library incorrectly and there's little we can do.

So, in summary, I see a simple fix for issue 4 (ignore the error), and for issue 3 perhaps we can optimize the case where the follower has snapshotted. That said, I'm not sure it's really necessary: what will happen is that we just return an error, give the leader our current_idx, and it then replies to us with the correct data. I'm unsure there's significant value in optimizing this case, but I'm willing to be told we should. (A sketch of the issue-4 control flow follows.)
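
A self-contained sketch of that issue-4 control flow: treat "needs snapshot" as a per-node condition rather than a reason to stop replicating to the remaining followers. All names below are hypothetical stand-ins, not the library's API, and whether other errors should still abort is a design choice:

    enum { OK = 0, ERR_NEEDS_SNAPSHOT = -100 };

    typedef int (*send_ae_fn)(int node_id);

    static int send_appendentries_all(send_ae_fn send, const int *nodes, int n)
    {
        for (int i = 0; i < n; i++)
        {
            int e = send(nodes[i]);
            if (e == ERR_NEEDS_SNAPSHOT)
                continue;   /* this follower lags behind; skip it, keep going */
            if (e != OK)
                return e;   /* hard errors still abort */
        }
        return OK;
    }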

@tangruize
Author

I think your understanding is correct. Let me explain in more detail.

  1. This happens when the follower can take a snapshot and the leader retransmits an AE with ae->prev_log_idx == 0. The AE's first log index should be 1, but the follower thinks it is snapshot_last_idx + 1. It may not be easy to trigger under normal circumstances, because users often restrict when snapshots are taken (e.g. only after a period of operation), while an AE with ae->prev_log_idx == 0 should appear only in the early stage of system startup. But that doesn't mean it won't happen. Since the upstream repo does not impose any usage restrictions (e.g. UDP, a minimum snapshot size, ...), it is easy to trigger this error within allowed usage. I don't know whether the author has considered this situation; I think it should at least be documented. Here is my fix for this bug: tangruize/willemt-raft@2665f2b
  2. Yes, but I think the author has considered this situation. If we compared against raft_get_commit_idx(), then according to the Raft paper, matching log entries should be retained (see the sketch after this list). Maybe the author doesn't want to take a snapshot of only part of the log, so he chose a different way. But the follower's log may not match the leader's (the follower may have been leader before), so it is possible that the follower will keep refusing to take a snapshot until the snapshot covers more entries than its own log. If the leader cannot commit more log entries, the entire cluster may be unable to make progress. This interacts with the possibility mentioned in issue 4: raft_send_appendentries_all#L1303 returns immediately if raft_send_appendentries returns a non-zero value, and that non-zero value may be the RAFT_ERR_NEEDS_SNAPSHOT returned by the follower whose logs do not match. Here is my fix for this bug: tangruize/willemt-raft@37a6564
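
For reference, a sketch of the Raft-paper InstallSnapshot receiver rule referred to above: if an existing entry has the same index and term as the snapshot's last included entry, retain the log suffix that follows it; otherwise discard the entire log. The types here are hypothetical simplifications:

    typedef long raft_index_t;
    typedef long raft_term_t;

    typedef struct {
        raft_index_t idx;
        raft_term_t term;
    } entry_id_t;

    /* Returns 1 if the log suffix after the snapshot's last included entry
     * may be retained, 0 if the whole log must be discarded. */
    static int may_retain_suffix(entry_id_t snapshot_last,
                                 const entry_id_t *log, int n_entries)
    {
        for (int i = 0; i < n_entries; i++)
            if (log[i].idx == snapshot_last.idx)
                return log[i].term == snapshot_last.term;
        return 0;   /* no entry at that index: nothing provably matches */
    }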

@tangruize
Author

  3. Yes, prev_log_idx can be adjusted to the follower's last_commit_idx when it is less than that. Here is my fix: tangruize/willemt-raft@2665f2b
  4. This problem is not serious on its own, but combined with the bug mentioned in issue 2, it can leave the whole cluster unable to make progress.
  5. Yes, this problem may be caused by network reordering or leader retransmission. It is not serious: the worst case is that next_idx is decreased to match_idx, causing an already-matched log entry to be retransmitted. The process is monotonic (next_idx won't be decreased below match_idx) and will not enter an endless loop. If issue 3 above is fixed, this problem will not appear: the follower will reply false if there is no log entry at prev_log_idx, but if the entry at prev_log_idx has been snapshotted, the follower should reply true. I found that this modification comes from this PR: "AE error response not properly handled", willemt/raft#97. I don't understand the reason for the modification and think the original code was correct. (A sketch of the clamping invariant follows.)
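
A sketch of the clamping invariant from point 5, with hypothetical names: next_idx may be lowered by a stale or reordered response, but never below match_idx + 1, since everything up to match_idx is already known to be replicated on the follower:

    typedef long raft_index_t;

    static raft_index_t clamp_next_idx(raft_index_t proposed_next_idx,
                                       raft_index_t match_idx)
    {
        /* Keep next_idx strictly above match_idx so an already-matched
         * entry is never retransmitted. */
        return proposed_next_idx > match_idx + 1 ? proposed_next_idx
                                                 : match_idx + 1;
    }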

@fadidahanna
Contributor

@tangruize, over the past ~2 years we've done a lot of refactoring and stabilization in this repo. If you think these issues are still relevant, please re-open.
