feat(docs): ADR on the consensus Write-Ahead Log (WAL)#1408

Merged: romac merged 31 commits into circlefin:main from cason:adr-wal on Feb 5, 2026

Conversation

@cason
Contributor

@cason cason commented Jan 27, 2026

Closes #469.

Rendered

@github-actions github-actions Bot added the need-triage This issue needs to be triaged label Jan 27, 2026
@github-actions github-actions Bot closed this Jan 27, 2026
@cason cason reopened this Jan 27, 2026
@cason cason removed the need-triage This issue needs to be triaged label Jan 27, 2026
Comment on lines +71 to +73
So, ideally, it should be the right layer to invoke the WAL primitive for the
logging of inputs and for replaying WAL entries retrieved from persistent
storage upon recovery.
Contributor

It's not clear to me why that is, given all the downsides listed below?

What is the actual upside that you see of making the driver aware of the WAL versus leaving it in core-consensus?

Contributor Author

Testing.

@cason cason marked this pull request as ready for review February 2, 2026 13:09
@cason cason requested a review from ancazamfir as a code owner February 2, 2026 13:09
Contributor

@ancazamfir ancazamfir left a comment

Very nice doc, reads very well!
We should update the other adrs, especially adr-003 which was written more than 2 years ago.

Comment thread docs/architecture/adr-006-write-ahead-log.md Outdated
2. When a consensus message is broadcast, via the `PublishConsensusMsg` effect;
3. When a value is finalized by the consensus, via the `Decide` effect.

> FIXME: is there any particular reason for item 1, i.e. starting a round?
Contributor

I can't recall the exact reasoning. StartedRound notifies the host, which might be considered an "output" that could trigger external actions (e.g., proposer starts building a block). Flushing makes sure that the state leading to this round is persisted before the host acts on it. Maybe @romac knows.

Contributor Author

@cason cason Feb 3, 2026

It is listed here: #469 (comment), so it was a requirement. But it is not documented why.

Contributor

> Flushing makes sure that the state leading to this round is persisted before the host acts on it.

Yes, I think the reasoning was that StartRound is a state machine output and therefore needs to trigger a WAL flush.
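
To illustrate the rule discussed in this thread (persist everything logged so far before any output leaves the state machine), here is a minimal sketch. It assumes a buffered WAL with separate `append` and `flush` steps. `Wal`, `Output`, and `emit` are hypothetical names for this example, not the actual Malachite types.

```rust
// Hypothetical sketch: none of these types are the actual Malachite API.

/// In-memory stand-in for a write-ahead log with buffered appends.
#[derive(Default)]
struct Wal {
    buffered: Vec<String>,  // appended but not yet durable
    persisted: Vec<String>, // durable once `flush` has run
}

impl Wal {
    /// Logs an input; it only becomes durable on the next `flush`.
    fn append(&mut self, entry: &str) {
        self.buffered.push(entry.to_string());
    }

    /// Moves all buffered entries to persistent storage (simulated here).
    fn flush(&mut self) {
        self.persisted.append(&mut self.buffered);
    }
}

/// Outputs of the state machine that are visible outside consensus.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Output {
    StartedRound(u32),
    PublishConsensusMsg(String),
    Decide(String),
}

/// Flushes the WAL first, then hands the output to the outside world.
fn emit(wal: &mut Wal, sent: &mut Vec<Output>, output: Output) {
    wal.flush();
    sent.push(output);
}

/// Two inputs are logged, a round start is emitted, then one more input arrives.
fn example() -> (Wal, Vec<Output>) {
    let mut wal = Wal::default();
    let mut sent = Vec::new();
    wal.append("vote from p1");
    wal.append("vote from p2");
    emit(&mut wal, &mut sent, Output::StartedRound(1));
    wal.append("proposal for round 1");
    (wal, sent)
}
```

In this sketch, the two votes are durable by the time the host sees `StartedRound(1)`, while the later proposal stays buffered until the next output forces a flush.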

Comment thread docs/architecture/adr-006-write-ahead-log.md Outdated
@cason
Contributor Author

cason commented Feb 3, 2026

> We should update the other adrs, especially adr-003 which was written more than 2 years ago.

It is actually 4 months ago and mostly up-to-date. But I agree that ADR 001 is historical right now.

Contributor

@adizere adizere left a comment

thanks Daniel, lgtm!

@ancazamfir
Contributor

> > We should update the other adrs, especially adr-003 which was written more than 2 years ago.
>
> It is actually 4 months ago and mostly up-to-date. But I agree that ADR 001 is historical right now.

oh, i meant adr-001 indeed.

@cason
Contributor Author

cason commented Feb 3, 2026

Can someone make the checks run? They are always canceled for some reason.

@cason cason requested review from ancazamfir and romac February 4, 2026 08:36
ancazamfir previously approved these changes Feb 4, 2026
Contributor

@nenadmilosevic95 nenadmilosevic95 left a comment

Wonderful ADR! Great job! Thanks @cason, very well written and extremely useful!!! I left some comments!

Comment thread docs/architecture/readme.md Outdated
| [003](./adr-003-values-propagation.md) | Propagation of Proposed Values | Accepted |
| [004](./adr-004-coroutine-effect-system.md) | Coroutine-Based Effect System for Consensus | Accepted |
| [006](./adr-006-write-ahead-log.md) | Consensus Write-Ahead Log (WAL) | Accepted |
Contributor

The number conflicts with PoV ADR, can you use 007 instead?

> process is the locking mechanism present in multiple consensus algorithms.
> In Tendermint, a process that emits a `Precommit` for a value `v` also
> _locks_ `v` at that round.
> The lock is a promise to not accept a value different than `v` in future
Contributor

Unless we have L28-30 but I think for the purpose of this example this is precise enough

> recovered, then the process can receive a proposal for a value `v'` in round
> `r` and misbehave in two ways:
>
> * Amnesia: by "forgetting" about the promise associated to the pre-crash
Contributor

Amnesia can also occur when a process doesn't re-propose its valid_value; this can cause equivocation.

Contributor Author

Yes, there are multiple cases; I am just using the misbehaviour of ignoring the lock as an example. This block was not in the initial version, but from some comments I realized that the rationale for replaying was not as straightforward as I thought. So I am giving an example here.

Contributor

I would then just write it slightly differently, something like "misbehave in different ways:", so it doesn't seem that there are only two ways. Anyway, this is just nitpicking, so please ignore it if not needed.

>
> * Amnesia: by "forgetting" about the promise associated to the pre-crash
> lock on `v`, accept the proposed value `v' != v`;
> * Equivocation: for accepting `v'`, to emit a `Prevote` for `id(v')` in
Contributor

Minor again but to me this is more a double-signing than equivocation

Comment thread docs/architecture/adr-007-write-ahead-log.md Outdated
`process_input` method, which is the same used to process ordinary inputs.
The `Phase::Recovering` flag is actually only used to block the `WalAppend`
effect from appending again to the WAL inputs that are being replayed,
as long as blocking calls to the associated WAL's `flush()` method.
Contributor

I don't understand this sentence?

Comment on lines +470 to +477
As a result, replayed inputs produce outputs, `Effect`s in the consensus engine
parlance, in the same way as ordinary inputs, the exception being only the
effects related to persisting inputs to the WAL.
In any case, the existence of the `Phase::Recovering` flag allows filtering out
behaviors, effects, and inputs that are not needed or redundant
in recovery mode - although it should be used very carefully.
Once replaying is done, the `Phase::Recovering` flag is cleared and processed
inputs are appended to the existing WAL, which is not [reset](#reset-1) in this case.
Contributor

I would polish this part a bit
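
For readers following the two excerpts above, the replay behavior they describe can be sketched as follows. This is a simplified illustration under stated assumptions: `Engine`, the `Effect::Other` variant, and `recover` are hypothetical. Only `Phase::Recovering`, `WalAppend`, and `process_input` are names taken from the quoted ADR text, and their real definitions may differ.

```rust
// Hypothetical sketch: simplified stand-ins, not the actual Malachite types.

#[derive(Clone, Copy, PartialEq)]
enum Phase {
    Running,
    Recovering,
}

#[derive(Debug, PartialEq)]
enum Effect {
    WalAppend(String),
    Other(String), // any effect not related to WAL persistence
}

struct Engine {
    phase: Phase,
    wal: Vec<String>,
    effects: Vec<Effect>,
}

impl Engine {
    /// Single entry point for both ordinary and replayed inputs.
    fn process_input(&mut self, input: &str) {
        // Replayed inputs are already in the WAL, so they are not appended again.
        if self.phase != Phase::Recovering {
            self.wal.push(input.to_string());
            self.effects.push(Effect::WalAppend(input.to_string()));
        }
        // Every other effect is produced the same way in both phases.
        self.effects.push(Effect::Other(format!("handled {input}")));
    }

    /// Replays the entries already in the WAL, then resumes normal operation.
    fn recover(&mut self) {
        self.phase = Phase::Recovering;
        let entries = self.wal.clone();
        for entry in &entries {
            self.process_input(entry);
        }
        self.phase = Phase::Running; // the existing WAL is kept, not reset
    }
}

/// Recovers from a WAL holding two entries, then processes one new input.
fn example() -> Engine {
    let mut engine = Engine {
        phase: Phase::Running,
        wal: vec!["a".to_string(), "b".to_string()],
        effects: Vec::new(),
    };
    engine.recover();
    engine.process_input("c");
    engine
}
```

After `example()`, the WAL holds `a`, `b`, `c`: the replayed entries are not duplicated, and the new input is appended to the existing log.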

externalized, either to the application (1 and 3) or the network (2).

A current limitation of the persistence approach, however, is the fact that
calls to `wal_append` are also blocking, while they do not have to.
Contributor

Does this mean that at the moment we don't do asynchronous writes?

Contributor Author

We do, but they are blocking.
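
One way to picture the distinction being drawn here: appends could be handed to a background writer without waiting, with only `flush` waiting for confirmation. The sketch below illustrates that idea using standard-library channels. It is not how the current implementation works, and `WalHandle` and `Cmd` are hypothetical names.

```rust
// Hypothetical sketch of non-blocking appends with a blocking flush.
use std::sync::mpsc;
use std::thread;

enum Cmd {
    Append(String),
    Flush(mpsc::Sender<usize>), // replies with the number of persisted entries
}

struct WalHandle {
    tx: mpsc::Sender<Cmd>,
}

impl WalHandle {
    /// Starts a writer thread that owns the (simulated) persistent storage.
    fn spawn() -> Self {
        let (tx, rx) = mpsc::channel::<Cmd>();
        thread::spawn(move || {
            let mut persisted: Vec<String> = Vec::new();
            for cmd in rx {
                match cmd {
                    Cmd::Append(entry) => persisted.push(entry),
                    Cmd::Flush(ack) => {
                        // Commands arrive in order, so all earlier appends are done.
                        let _ = ack.send(persisted.len());
                    }
                }
            }
        });
        WalHandle { tx }
    }

    /// Queues an entry and returns immediately, without waiting for the write.
    fn append(&self, entry: &str) {
        self.tx
            .send(Cmd::Append(entry.to_string()))
            .expect("WAL writer stopped");
    }

    /// Blocks until every previously queued entry has been written.
    fn flush(&self) -> usize {
        let (ack_tx, ack_rx) = mpsc::channel();
        self.tx.send(Cmd::Flush(ack_tx)).expect("WAL writer stopped");
        ack_rx.recv().expect("WAL writer stopped")
    }
}
```

Because the channel preserves ordering, a `flush` issued after two `append` calls only returns once both entries have reached the writer.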

Comment thread docs/architecture/adr-007-write-ahead-log.md Outdated
| [004](./adr-004-coroutine-effect-system.md) | Coroutine-Based Effect System for Consensus | Accepted |
| [007](./adr-007-write-ahead-log.md) | Consensus Write-Ahead Log (WAL) | Accepted |
Contributor

Thanks for changing to 007, I wanted to let you use 006 since you are older :)

romac previously approved these changes Feb 4, 2026
Contributor

@romac romac left a comment

Amazing work 🎉

@romac romac self-requested a review February 4, 2026 11:07
Co-authored-by: nenadmilosevic95 <nenad.milosevic@circle.com>
@cason cason dismissed stale reviews from romac and ancazamfir via 5dc17ed February 4, 2026 11:21
romac previously approved these changes Feb 5, 2026
@romac romac added this pull request to the merge queue Feb 5, 2026
Merged via the queue into circlefin:main with commit 1cab18f Feb 5, 2026
16 checks passed

Development

Successfully merging this pull request may close these issues.

spec: Consensus Write-Ahead Log (WAL)

5 participants