
refactor: consensus state to fix deadlock #491

Merged
shotonoff merged 112 commits into v0.12-dev from td-115-fix-consensus-deadlock
Mar 17, 2023



shotonoff (Collaborator) commented Oct 21, 2022

Issue being fixed or feature implemented

The consensus state is designed as a single component that holds both the state data and the behavior that modifies it. Since this data is shared among several goroutines, access to it must be guarded by a mutex.
Every time the state receives an event, it locks the data until the operation is completed. This approach can lead to a deadlock.
This refactoring aims to eliminate the deadlock and to split the modification logic into independent handlers (state transitions, or state commands).

What was done?

Key design changes:

  1. Reading messages for state update
    Previously, the events for state-update operations were read from two queues, an internal queue and a peer queue (each queue is a Go channel). This was replaced with the “Fan In” concurrency pattern: messages are read from both channels concurrently and forwarded into a single output queue, from which the state reads.
  • chanQueue[T any] is a generic wrapper around a Go channel.
  • chanMsgSender sends a message through either the peer queue or the internal queue. The message is routed by peerID: if a peerID is provided, the peer queue is used; otherwise, the internal queue is used.
  • chanMsgReader[T any] is a generic message reader that implements the “Fan In” pattern: each queue is read in its own goroutine, and every message is forwarded into a shared "output" channel from which the state reads.
  • msgInfoQueue combines the reader and sender components and provides high-level methods (such as “send” or “read”) to interact with them.
  2. Introduced a new component, msgInfoDispatcher
    Each received message is dispatched to the appropriate handler based on its type.
    The dispatcher has handlers for the following messages:
  • ProposalMessage
  • BlockPartMessage
  • VoteMessage
  • CommitMessage

The dispatcher is responsible for routing messages and executing their handlers.

  3. New approach to state data management
    To make the application state data independent of the state implementation, a few new units were introduced:
  • AppStateStore is used to store and load data. All operations are concurrency-safe and guarded by a mutex.
  • AppState is a copy of the data stored in AppStateStore. As in the original implementation, AppState holds a copy of the consensus.RoundState and state.State data. In addition to the data, AppState provides some methods (behavior), such as updateToState and isValidator. Another key point is the version field, which records the version of the data at the time it was read. When an AppState is written back, AppStateStore expects AppState.version to match its own current version number. Each time the data is updated, AppStateStore increments its version number, so the next load returns an AppState with the new version. Since AppState can be updated concurrently, only an AppState carrying the current version may be written back; an update based on a stale copy must not be applied.
  4. State Machine
    The biggest part of this refactoring is the introduction of a new state machine component.
    The state machine consists of the following components:
  • FMS is a registry of all possible state commands (state transitions) and dispatches each event to the correct state handler. It is used to register a command for an event type and to execute that command with an event (StateEvent).
    StateEvent is used to execute a state transition. It consists of an event type, a copy of AppState, and arbitrary data.

  • StateEvent provides the event data: a pointer to StateData, the FMS (so that a handler can dispatch an event to the next state), and Data.

  • CommandHandler is an interface that every state transition must implement. To execute a state transition, you must provide a StateEvent (described above).
    Supported state transitions:

    • EnterNewRound
    • EnterPropose
    • SetProposal
    • DecideProposal
    • AddProposalBlockPart
    • ProposalCompletedType
    • DoPrevote
    • TryAddVote
    • EnterCommit
    • EnterPrevote
    • EnterPrecommit
    • TryAddCommit
    • AddCommit
    • ApplyCommit
    • TryFinalizeCommit
    • EnterPrevoteWait
    • EnterPrecommitWait
  • VoteSigner contains methods to sign a vote, and to sign a vote and add it.

  • Observer is a simple implementation of the Observer design pattern. It is used to notify subscribers (state-dependent components) when fields in the state are updated.
    Supported subscriptions:

    • SetProposedAppVersion
    • SetPrivValidator
    • SetTimeoutTicker
    • SetMetrics
    • SetReplayMode
  • blockExecutor is a wrapper around state.BlockExecutor that guarantees the prepare-proposal or process-proposal call is made only once.

How it works

  1. All components described above are initialized in the NewState function.
  2. After a successful catch-up operation, the state starts a goroutine that reads messages and updates the state. Instead of holding a mutex for the whole operation, for each received message the state gets an AppState from AppStateStore and handles the message with that AppState. AppStateStore.Get returns a copy of the data, and that copy is passed as a pointer to the dispatcher and the commands. Using a pointer here is safe, because it points to a private copy, and it is necessary so that the existing code can mutate the data in place.
  3. Messages received from the msgInfoQueue are dispatched to a message handler. The dispatcher routes each message based on its type, and the handler executes the state transition.
  4. After a message is handled successfully, the AppState is written back to AppStateStore. In some cases, AppState is also updated inside the state-transition handler; this is required by unit tests that check the AppState data on an event (this should be eliminated, and the tests should instead get the data to check from the event).

As a result, the mutex is used only to get a copy of the AppState and to replace the current version of the AppState with the new one.
AppState is changed only by state commands (state transitions), and because messages are consumed one at a time from a Go channel, these changes happen sequentially.
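The flow above (get a copy, handle it without a lock, write it back) can be sketched as a single consumer loop. This is a hypothetical, self-contained illustration. handleMsg stands in for the dispatcher and the state commands. The names receiveRoutine and Votes are invented for the example, and error handling is reduced to returning the first error.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// AppState and AppStateStore are trimmed-down, hypothetical versions of
// the components described above.
type AppState struct {
	Votes   int
	version int
}

type AppStateStore struct {
	mtx sync.Mutex
	cur AppState
}

// Get returns a copy of the current state (mutex held only here).
func (s *AppStateStore) Get() AppState {
	s.mtx.Lock()
	defer s.mtx.Unlock()
	return s.cur
}

// Update replaces the state if st was read at the current version.
func (s *AppStateStore) Update(st AppState) error {
	s.mtx.Lock()
	defer s.mtx.Unlock()
	if st.version != s.cur.version {
		return errors.New("stale app state")
	}
	st.version++
	s.cur = st
	return nil
}

// handleMsg stands in for the dispatcher and state commands: it mutates
// the private copy through a pointer, with no lock held.
func handleMsg(st *AppState, msg string) error {
	if msg == "vote" {
		st.Votes++
	}
	return nil
}

// receiveRoutine is the single consumer: messages are handled one at a
// time, so state changes are applied sequentially.
func receiveRoutine(store *AppStateStore, msgs <-chan string) error {
	for msg := range msgs {
		st := store.Get()
		if err := handleMsg(&st, msg); err != nil {
			return err
		}
		if err := store.Update(st); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	store := &AppStateStore{}
	msgs := make(chan string, 3)
	msgs <- "vote"
	msgs <- "proposal"
	msgs <- "vote"
	close(msgs)

	if err := receiveRoutine(store, msgs); err != nil {
		fmt.Println("error:", err)
	}
	st := store.Get()
	fmt.Println(st.Votes, st.version) // 2 3
}
```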

How Has This Been Tested?

Unit and E2E tests.
Unit tests for the new components still need to be added.

Breaking Changes

The consensus protocol has not changed; only the way the code is executed has changed.

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have made corresponding changes to the documentation

For repository code-owners and collaborators only

  • I have assigned this pull request to a milestone

@github-actions

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the Stale label Nov 14, 2022
@github-actions github-actions bot removed the Stale label Nov 15, 2022

github-actions bot commented Dec 6, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the Stale label Dec 6, 2022
@github-actions github-actions bot removed the Stale label Dec 7, 2022
shotonoff and others added 6 commits February 27, 2023 14:34
* refactor: set and decide proposal operations move to the separated component proposaler
* refactor: separate a gossip logic on implementation and a standard way of iterative handling a gossip condition

* chore: remove unless commented line

* refactor: remove custom "peerSyncErr" error

* refactor: add Gossiper interface, generate Gossiper mock struct, add unit test to queryMaj23GossipHandler

* refactor: revert changes in types.MakeBlock

* refactor: remove unused methods/functions, change a way how a vote should be picked for a gossip

* refactor: peerGossipWorker implements service.Service

* refactor: add synchronization for blockExecutor.committedState
@shotonoff shotonoff changed the base branch from v0.10-dev to v0.11-dev March 2, 2023 13:15
shotonoff and others added 6 commits March 2, 2023 15:12
* refactor: blocksync module is refactored on using worker-pool

* test: add unit test for worker-pool

* test: replace custom assertError on tmrequire.Error

* refactor: add Now function into flowrate package, to obtain the current time

* fix: TestHandshakeErrorsIfAppReturnsWrongAppHash in replay_test.go

* refactor: move the logic wait-for blockchain sync from reactor to BlockPool

* fix: TestReactor_SyncTime

* test: change timing for test TestBlockPoolBasic

* chore: replace clock on clockwork

* refactor: replace reactor's consuming on separate consumer function in Channel, rename SetPeerRange on AddPeer

* test: add TestConsumeError

* chore: remove unused mustHexToBytes

* chore: add examples in promise/example_test.go and docs comments

* chore: change a package for MockValidatorSet from types on factory

* Update internal/blocksync/channel.go
@shotonoff shotonoff changed the base branch from v0.11-dev to v0.12-dev March 14, 2023 13:43
shotonoff and others added 2 commits March 14, 2023 14:44
* refactor: add internal/p2p/client implementation
* refactor: change blocksync module to support p2p client
lklimek previously approved these changes Mar 15, 2023
@shotonoff shotonoff changed the base branch from v0.12-dev to v0.11-dev March 15, 2023 15:10
@shotonoff shotonoff dismissed lklimek’s stale review March 15, 2023 15:10

The base branch was changed.

@shotonoff shotonoff changed the base branch from v0.11-dev to v0.12-dev March 15, 2023 15:11
shotonoff and others added 2 commits March 16, 2023 10:51
* test: fix TestMakeHTTPDialerURL and rename on TestDialParamsFromURL
lklimek previously approved these changes Mar 16, 2023
* refactor: introduce p2p client in mempool module
@shotonoff shotonoff requested a review from lklimek March 17, 2023 10:14
@shotonoff shotonoff merged commit 84b4a09 into v0.12-dev Mar 17, 2023
@shotonoff shotonoff deleted the td-115-fix-consensus-deadlock branch March 17, 2023 11:08