feat(docs): ADR on the consensus Write-Ahead Log (WAL)#1408
feat(docs): ADR on the consensus Write-Ahead Log (WAL)#1408romac merged 31 commits intocirclefin:mainfrom
Conversation
This comment was marked as off-topic.
This comment was marked as off-topic.
| So, ideally, it should be the right layer to invoke the WAL primitive for the | ||
| logging of inputs and for replaying WAL entries retrieved from persistent | ||
| storage upon recovery. |
There was a problem hiding this comment.
It's not clear to me why that is, given all the downsides listed below?
What is the actual upside that you see of making the driver aware of the WAL versus leaving it in core-consensus?
ancazamfir
left a comment
There was a problem hiding this comment.
Very nice doc, reads very well!
We should update the other adrs, especially adr-003 which was written more than 2 years ago.
| 2. When a consensus message is broadcast, via the `PublishConsensusMsg` effect; | ||
| 3. When a value is finalized by the consensus, via the `Decide` effect. | ||
|
|
||
| > FIXME: is there any particular reason for item 1, i.e. starting a round? |
There was a problem hiding this comment.
I can't recall the exact reasoning. StartedRound notifies the host, which might be considered an "output" that could trigger external actions (e.g., proposer starts building a block). Flushing makes sure that the state leading to this round is persisted before the host acts on it. Maybe @romac knows.
There was a problem hiding this comment.
It is listed here: #469 (comment), so it was a requirement. But it is not documented why.
There was a problem hiding this comment.
Flushing makes sure that the state leading to this round is persisted before the host acts on it.
Yes, I think the reasoning was that StartRound is a state machine output and therefore needs to trigger a WAL flush.
Co-authored-by: Anca Zamfir <ancazamfir@users.noreply.github.com>
It is actually 4 months ago and mostly up-to-date. But I agree that ADR 001 is historical right now. |
oh, i meant adr-001 indeed. |
|
Can someone make the checks to run? They are always canceled for some reason. |
nenadmilosevic95
left a comment
There was a problem hiding this comment.
Wonderfull ADR! Great job! Thanks @cason, very well written and extremely useful!!! I left some comments!
| | [003](./adr-003-values-propagation.md) | Propagation of Proposed Values | Accepted | | ||
| | [004](./adr-004-coroutine-effect-system.md) | Coroutine-Based Effect System for Consensus | Accepted | No newline at end of file | ||
| | [004](./adr-004-coroutine-effect-system.md) | Coroutine-Based Effect System for Consensus | Accepted | | ||
| | [006](./adr-006-write-ahead-log.md) | Consensus Write-Ahead Log (WAL) | Accepted | |
There was a problem hiding this comment.
The number conflicts with PoV ADR, can you use 007 instead?
| > process is the locking mechanism present in multiple consensus algorithms. | ||
| > In Tendermint, a process that emits a `Precommit` for a value `v` also | ||
| > _locks_ `v` at that round. | ||
| > The lock is a promise to not accept a value different than `v` in future |
There was a problem hiding this comment.
Unless we have L28-30 but I think for the purpose of this example this is precise enough
| > recovered, then the process can receive a proposal for a value `v'` in round | ||
| > `r` and misbehave in two ways: | ||
| > | ||
| > * Amnesia: by "forgetting" about the promise associated to the pre-crash |
There was a problem hiding this comment.
Amnesia can also be when process doesn't re-propose its valid_value this can cause equivocation.
There was a problem hiding this comment.
Yes, there are multiple cases, I am just considering the ignoring lock misbehaviour as an example. This block was not in the initial version, then from some comments I realized that the rationale of replaying was not straightforward as I thought. So I am giving here an example.
There was a problem hiding this comment.
I would then just write it slightly different, something like "misbehave in different ways:", so it doesn't seem there are only two ways. Anyway, this is just nitpicking so please ignore if not needed
| > | ||
| > * Amnesia: by "forgetting" about the promise associated to the pre-crash | ||
| > lock on `v`, accept the proposed value `v' != v`; | ||
| > * Equivocation: for accepting `v'`, to emit a `Prevote` for `id(v')` in |
There was a problem hiding this comment.
Minor again but to me this is more a double-signing than equivocation
| `process_input` method, which is the same used to process ordinary inputs. | ||
| The `Phase::Recovering` flag is actually only used to block the `WalAppend` | ||
| effect from appending again to the WAL inputs that are being replayed, | ||
| as long as blocking calls to the associated WAL's `flush()` method. |
There was a problem hiding this comment.
I don't understand this sentence?
| As a result, replayed inputs produce outputs, `Effect`s in the consensus engine | ||
| parlance, in the same way as ordinary inputs, the exception being only the | ||
| effects related to persisting inputs to the WAL. | ||
| In any case, the existence of the `Phase::Recovering` flag allows filtering out | ||
| behaviors, effects, an inputs that are not needed or redundant | ||
| in recovery mode - although it should be used very carefully. | ||
| Once replaying is done, the `Phase::Recovering` flag is cleared and processed | ||
| inputs are appended to the existing WAL, which is not [reset](#reset-1) in this case. |
There was a problem hiding this comment.
I would polish this part a bit
| externalized, either to the application (1 and 3) or the network (2). | ||
|
|
||
| A current limitation of the persistence approach, however, is the fact that | ||
| calls to `wal_append` are also blocking, while they do not have to. |
There was a problem hiding this comment.
This means that we at the moment we don't do asynchronous writes?
There was a problem hiding this comment.
We do, but they are blocking.
| | [004](./adr-004-coroutine-effect-system.md) | Coroutine-Based Effect System for Consensus | Accepted | | ||
| | [007](./adr-007-write-ahead-log.md) | Consensus Write-Ahead Log (WAL) | Accepted | |
There was a problem hiding this comment.
Thanks for changing to 007, I wanted to let you use 006 since you are older :)
Co-authored-by: nenadmilosevic95 <nenad.milosevic@circle.com>
| > recovered, then the process can receive a proposal for a value `v'` in round | ||
| > `r` and misbehave in two ways: | ||
| > | ||
| > * Amnesia: by "forgetting" about the promise associated to the pre-crash |
There was a problem hiding this comment.
I would then just write it slightly different, something like "misbehave in different ways:", so it doesn't seem there are only two ways. Anyway, this is just nitpicking so please ignore if not needed
Closes #469.
Rendered