Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix lock re-entrancy in process_append_entry_request_as_follower() #43

Merged
merged 1 commit into from Sep 3, 2022

Conversation

penberg
Copy link
Collaborator

@penberg penberg commented Sep 2, 2022

In my project, I have a struct that implements both the Cluster and
StateMachine traits, and, therefore, also share the same Mutex
that's passed to Replica.

However, process_append_entry_request_as_follower is holding on to the
state machine lock while attempting to acquire the cluster lock, which
results in re-entrancy that causes a deadlock (as reported by
no_deadlocks crate):

A reentrance has been attempted, but `std::sync`'s locks are not reentrant. This results in a deadlock. dependence cycle: [Thread(ThreadId(11)), Lock(20)]
Lock taken at:
   0: little_raft::replica::Replica<C,M,T,D>::process_append_entry_request_as_follower
             at /Users/penberg/src/database/little-raft/little_raft/src/replica.rs:709:33
   1: little_raft::replica::Replica<C,M,T,D>::process_message_as_follower
             at /Users/penberg/src/database/little-raft/little_raft/src/replica.rs:766:18
   2: little_raft::replica::Replica<C,M,T,D>::process_message
             at /Users/penberg/src/database/little-raft/little_raft/src/replica.rs:352:32
   3: little_raft::replica::Replica<C,M,T,D>::poll_as_follower
             at /Users/penberg/src/database/little-raft/little_raft/src/replica.rs:332:21
   4: little_raft::replica::Replica<C,M,T,D>::start
             at /Users/penberg/src/database/little-raft/little_raft/src/replica.rs:233:36

Reentrace at:
   0: little_raft::replica::Replica<C,M,T,D>::process_append_entry_request_as_follower
             at /Users/penberg/src/database/little-raft/little_raft/src/replica.rs:735:27
   1: little_raft::replica::Replica<C,M,T,D>::process_message_as_follower
             at /Users/penberg/src/database/little-raft/little_raft/src/replica.rs:766:18
   2: little_raft::replica::Replica<C,M,T,D>::process_message
             at /Users/penberg/src/database/little-raft/little_raft/src/replica.rs:352:32
   3: little_raft::replica::Replica<C,M,T,D>::poll_as_follower
             at /Users/penberg/src/database/little-raft/little_raft/src/replica.rs:332:21
   4: little_raft::replica::Replica<C,M,T,D>::start
             at /Users/penberg/src/database/little-raft/little_raft/src/replica.rs:233:36

Fix the deadlock by moving entry processing to a separate function,
which reduces the scope of the state machine lock so that it is no
longer held when we attempt to acquire the cluster lock.

Fixes #42.

In my project, I have a struct that implements both the `Cluster` and
`StateMachine` traits, and, therefore, also share the same `Mutex`
that's passed to `Replica`.

However, `process_append_entry_request_as_follower` is holding on to the
state machine lock while attempting to acquire the cluster lock, which
results in re-entrancy that causes a deadlock (as reported by
`no_deadlocks` crate):

```
A reentrance has been attempted, but `std::sync`'s locks are not reentrant. This results in a deadlock. dependence cycle: [Thread(ThreadId(11)), Lock(20)]
Lock taken at:
   0: little_raft::replica::Replica<C,M,T,D>::process_append_entry_request_as_follower
             at /Users/penberg/src/database/little-raft/little_raft/src/replica.rs:709:33
   1: little_raft::replica::Replica<C,M,T,D>::process_message_as_follower
             at /Users/penberg/src/database/little-raft/little_raft/src/replica.rs:766:18
   2: little_raft::replica::Replica<C,M,T,D>::process_message
             at /Users/penberg/src/database/little-raft/little_raft/src/replica.rs:352:32
   3: little_raft::replica::Replica<C,M,T,D>::poll_as_follower
             at /Users/penberg/src/database/little-raft/little_raft/src/replica.rs:332:21
   4: little_raft::replica::Replica<C,M,T,D>::start
             at /Users/penberg/src/database/little-raft/little_raft/src/replica.rs:233:36

Reentrace at:
   0: little_raft::replica::Replica<C,M,T,D>::process_append_entry_request_as_follower
             at /Users/penberg/src/database/little-raft/little_raft/src/replica.rs:735:27
   1: little_raft::replica::Replica<C,M,T,D>::process_message_as_follower
             at /Users/penberg/src/database/little-raft/little_raft/src/replica.rs:766:18
   2: little_raft::replica::Replica<C,M,T,D>::process_message
             at /Users/penberg/src/database/little-raft/little_raft/src/replica.rs:352:32
   3: little_raft::replica::Replica<C,M,T,D>::poll_as_follower
             at /Users/penberg/src/database/little-raft/little_raft/src/replica.rs:332:21
   4: little_raft::replica::Replica<C,M,T,D>::start
             at /Users/penberg/src/database/little-raft/little_raft/src/replica.rs:233:36
```

Fix the deadlock by moving entry processing to a separate function,
which reduces the scope of the state machine lock so that it is no
longer held when we attempt to acquire the cluster lock.

Fixes andreev-io#42.
penberg added a commit to chiselstrike/chiselstore that referenced this pull request Sep 2, 2022
Let's switch to our own fork of Little Raft until the following deadlock
fix is merged upstream:

andreev-io/little-raft#43
@andreev-io andreev-io merged commit f0c8d4f into andreev-io:master Sep 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deadlock when acquiring cluster lock while appending entry as follower
2 participants