Status
This epic is captured for planning purposes only. No commitment to deliver
any of the sketches below at this time. The work is recognised as
valuable, particularly given SolidSyslog's regulated-industry audience,
but is not currently scheduled. Sketches are described at a level
sufficient to remember the intent and rationale; they will be expanded
into stories when prioritised against other work.
Do not start any implementation work from this epic without explicit
direction.
Context and rationale
SolidSyslog's file store and recovery logic handle scenarios — power loss
mid-write, partial writes, corrupted records, flash block failures — that
are difficult to cover comprehensively with hand-written BDD scenarios.
The state space of "what the on-disk file looks like at the moment
recovery begins" is too large to enumerate by hand, and the bugs that
matter are typically at the intersection of "rare event" and "specific
moment in the code."
Fuzzing and property-based testing provide adversarial coverage: rather
than testing cases we thought of, they search for cases we didn't. This
complements the existing BDD suite — BDD documents intent and provides
readable regression coverage; fuzzing finds the long tail.
For the IEC 62443 positioning specifically, evidence of fuzz testing and
abnormal-condition testing strengthens the compliance story under FR3
(system integrity). Auditors and procurement engineers in regulated
industries increasingly expect to see this in mature embedded libraries.
Treating fuzzing as part of the compliance evidence — not just developer
hygiene — is part of why it earns its keep.
Failure modes addressed
The work is organised around four classes of bug, because different
techniques target different failure modes:
- Parser/format robustness — corrupted record headers, truncated
bodies, bad CRCs, bit flips. Does the store reject these cleanly
without crashing or accepting garbage?
- Recovery state-machine correctness — given an arbitrary file
state at startup, does the store resume consistently with no
duplicates, no losses beyond the documented uncommitted-tail boundary,
and correct sequenceId continuity?
- Crash-consistency — does the on-disk state remain recoverable
across power loss at any point during a write?
- Flash wear/failure semantics — bad blocks, partial-page writes,
blocks that report success but contain garbage. Does the layer above
the flash driver degrade gracefully?
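The invariants in the second class are concrete enough to capture as a small checker. A minimal sketch, assuming a hypothetical RecoveredView type and field names (not SolidSyslog's actual API):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the store after recovery: the committed records
 * in the order the store will replay them. Names are illustrative, not
 * SolidSyslog's real types. */
typedef struct {
    const uint32_t *sequenceIds; /* committed records, replay order */
    size_t count;
} RecoveredView;

/* The invariants from the list above: no duplicates, no gaps within
 * the committed region, strictly increasing sequenceIds. */
static bool recovery_invariants_hold(const RecoveredView *v)
{
    for (size_t i = 1; i < v->count; i++) {
        /* A step of exactly 1 rules out both duplicates and losses
         * inside the committed region. Losses past the documented
         * uncommitted-tail boundary never appear in this view. */
        if (v->sequenceIds[i] != v->sequenceIds[i - 1] + 1) {
            return false;
        }
    }
    return true;
}
```

A checker of this shape is the shared oracle for the fuzzing and property-based sketches below: each technique generates adversarial states; this predicate decides pass or fail.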
Constraints (when this work happens)
- C99, OO-in-C with vtables and dependency injection, null object
pattern, no dynamic memory.
- Fakes/spies as preferred test doubles, consistent with existing test
architecture.
- CppUTest for unit-level tests; fuzzing harnesses are kept separate
from the unit test suite.
- All CI work uses GitHub Actions.
- Sanitizers (ASan, UBSan, MSan) are a precondition for fuzzing to be
effective — already supported in the template.
- DEVLOG.md to be updated when this work is eventually picked up.
Candidate sketches
These are sketches, not committed stories. Each becomes a story (added to
the Stories section below) only when scope is clear and the work is
scheduled. The right tools and approaches may have shifted by the time
any of these is picked up — revisit before implementation. New ideas are
added as comments before promotion.
A. libFuzzer harness for the record parser
Build a libFuzzer (or AFL++) harness that takes an arbitrary byte buffer
and feeds it to the file store's record-reading code. Run under ASan +
UBSan. Two harnesses anticipated: a single-record parse harness (asserts
no crash, no UB, no read past buffer) and a whole-store recovery harness
(asserts recovery invariants — no duplicate sequenceIds, no skipped
sequenceIds within the committed region, recovery terminates).
CI integration anticipated: short fuzz session (~60 seconds per harness)
on every PR as a smoke test; longer scheduled run nightly. Corpus stored
as a release artefact or in a separate corpus repo.
This is the highest-leverage starting point and the lowest-effort sketch
to set up; likely the first to promote if the epic is scheduled.
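A harness of this shape is small. The sketch below shows the libFuzzer entry point against a stand-in parser; record_parse, the Record type, and the 6-byte header layout are all illustrative assumptions, not the file store's real format:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parser under test: stands in for the file store's real
 * record-reading entry point, which this sketch does not know. */
typedef struct { uint32_t sequenceId; uint16_t bodyLen; } Record;

static int record_parse(const uint8_t *buf, size_t len, Record *out)
{
    if (len < 6) return -1;                         /* header too short */
    memcpy(&out->sequenceId, buf, 4);
    memcpy(&out->bodyLen, buf + 4, 2);
    if ((size_t)out->bodyLen > len - 6) return -1;  /* truncated body */
    return 0;
}

/* libFuzzer entry point: the fuzzer calls this with arbitrary bytes.
 * The harness asserts nothing itself; "no crash, no UB, no read past
 * the buffer" is enforced by ASan/UBSan while the code is exercised. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    Record r;
    (void)record_parse(data, size, &r); /* reject or accept, never crash */
    return 0;
}
```

Built with something like `clang -g -O1 -fsanitize=fuzzer,address,undefined harness.c -o harness`; libFuzzer supplies main and drives the entry point.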
B. Property-based crash-consistency tests
Build a property-based test using theft (silentbicycle/theft) or
rapidcheck that generates random sequences of operations (raise, send,
mark-sent, crash-at-point-N) and asserts recovery invariants hold for any
generated sequence and any crash point. The test uses a fake file backend
that can simulate crashes by truncating the file to arbitrary
intermediate states, including flash-realistic partial-write outcomes
(last byte missing, last sector zeroed, last sector containing arbitrary
bytes).
Directly tests the designed recovery strategy and is the test most likely
to find bugs not anticipated by hand-written scenarios. Lives entirely in
user-space with no special tooling required.
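The shape of the property is easy to sketch without either library. The toy below uses an invented 3-byte record format and enumerates every truncation point exhaustively; theft or rapidcheck would instead generate the operation sequences and crash points randomly, and the real store's recovery code would replace the toy recover():

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy on-disk format, for illustration only: each record is
 * [sequenceId][payload][0xC3 commit marker]. The real store's format
 * and recovery logic are not reproduced here. */
#define REC_SIZE 3
#define COMMIT   0xC3

/* Toy recovery: count the leading fully committed records. Anything
 * after the first incomplete or unmarked record is uncommitted tail. */
static size_t recover(const uint8_t *disk, size_t len)
{
    size_t n = 0;
    while ((n + 1) * REC_SIZE <= len && disk[n * REC_SIZE + 2] == COMMIT)
        n++;
    return n;
}

/* The property: for EVERY crash point, recovery yields a clean prefix
 * of what was written -- no duplicates, no gaps, losses only at the
 * uncommitted tail. */
static bool crash_consistency_holds(void)
{
    uint8_t disk[10 * REC_SIZE];
    for (uint8_t i = 0; i < 10; i++) {      /* write 10 records */
        disk[i * REC_SIZE + 0] = i;         /* sequenceId */
        disk[i * REC_SIZE + 1] = 0xAB;      /* payload */
        disk[i * REC_SIZE + 2] = COMMIT;
    }
    for (size_t cut = 0; cut <= sizeof disk; cut++) { /* every crash point */
        size_t n = recover(disk, cut);
        if (n > cut / REC_SIZE) return false;         /* phantom record */
        for (size_t i = 0; i < n; i++)                /* prefix intact */
            if (disk[i * REC_SIZE] != (uint8_t)i) return false;
    }
    return true;
}
```

The fake file backend described above extends the same idea: instead of pure truncation, the crash step also rewrites the final sector to zeros or arbitrary bytes before recovery runs.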
C. OSS-Fuzz application
If and when the repository is public, apply for OSS-Fuzz integration.
OSS-Fuzz provides free continuous fuzzing for open-source projects,
generates real bug reports, and is a quality signal that procurement-side
engineers in regulated industries recognise. Builds on the harnesses
from A.
Blocked on the repository visibility decision.
D. Filesystem-level crash testing with dm-flakey
Add a scheduled CI job using Linux dm-flakey to test recovery against
real filesystem crash semantics (ext4 with fsync, FAT, etc.). Catches
bugs that depend on real filesystem reordering and fsync behaviour,
which the in-process fake from B will not surface. Heavier CI setup
(privileged container or self-hosted runner with root); appropriate as
a nightly or weekly job rather than per-PR.
Lower priority than A and B — the fake catches most bugs; this catches
the residual class.
E. LittleFS fault-injection integration tests
When a LittleFS-backed flash port is added (under E18: Flash Storage
Support), build integration tests that exercise SolidSyslog's behaviour
when LittleFS reports errors (bad block detected, write failure, mount
failure on corrupted state). Lean on LittleFS's existing fault-injection
infrastructure where possible — the goal is to test SolidSyslog's
response to filesystem errors, not to re-test LittleFS itself.
Blocked on E18 producing a LittleFS port.
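The test double involved would follow the project's existing OO-in-C conventions. A sketch of a fault-injecting flash fake, assuming a hypothetical FlashDevice vtable and error codes (neither SolidSyslog's nor LittleFS's real interface):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flash-device interface in the project's OO-in-C style:
 * a vtable of function pointers, injected into the layer under test. */
typedef struct FlashDevice FlashDevice;
struct FlashDevice {
    int (*write)(FlashDevice *self, uint32_t block, const uint8_t *buf);
};
#define FLASH_OK        0
#define FLASH_BAD_BLOCK (-1)

/* Fake that reports a bad block on a chosen call, then recovers --
 * enough to drive the layer above the flash driver through its
 * error-handling paths without real hardware. */
typedef struct {
    FlashDevice base;   /* must be first: allows the downcast below */
    int failOnCall;     /* which call number fails; 0 means never */
    int callCount;
} FaultyFlash;

static int FaultyFlash_write(FlashDevice *self, uint32_t block,
                             const uint8_t *buf)
{
    (void)block; (void)buf;
    FaultyFlash *f = (FaultyFlash *)self;
    f->callCount++;
    return (f->callCount == f->failOnCall) ? FLASH_BAD_BLOCK : FLASH_OK;
}

static void FaultyFlash_init(FaultyFlash *f, int failOnCall)
{
    f->base.write = FaultyFlash_write;
    f->failOnCall = failOnCall;
    f->callCount  = 0;
}
```

Where LittleFS's own fault-injection infrastructure can produce the same errors at the filesystem boundary, prefer it; this fake only covers the case where SolidSyslog talks to the flash layer directly.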
F. Compliance positioning of fuzzing evidence
Document the fuzzing strategy and its outcomes as part of the IEC 62443
evidence pack (overlapping with E19: CRA-Ready Cybersecurity Reporting
and Response). Map fuzz testing and property-based crash-consistency
testing to the relevant FR3 (system integrity) requirements and any
applicable CR/EDR controls. Documentation work, not test work — but it's
the part that converts the technical investment into procurement-relevant
evidence.
Depends on A and B having produced results worth documenting.
Suggested ordering when scheduled
A → B → F → C → D → E. A and B are the technical foundation; F converts
that into compliance evidence; C, D, E are extensions that depend on
external factors (repo visibility, real filesystem testing infrastructure,
LittleFS port existing).
A and B are independently valuable and can be delivered without committing
to the rest. Don't treat the epic as all-or-nothing; pick up individual
sketches as they earn priority.
Epic-level acceptance (when complete)
- Parser and recovery code have continuous fuzzing coverage with
sanitizers, both in CI and (if applicable) via OSS-Fuzz.
- Property-based tests assert recovery invariants across randomised
operation sequences and crash points.
- Crash-consistency has been tested against real filesystem semantics,
not just the in-process fake.
- Flash failure modes are exercised through the LittleFS fault-injection
path.
- The fuzzing strategy is documented as IEC 62443 evidence, mapping
technique to requirement.
Related epics
- E10: Static Analysis and MISRA — different toolset (lint vs
adversarial input); F's compliance documentation should cross-reference
E10's rule coverage table.
- E18: Flash Storage Support — E is downstream; cannot be scheduled
before E18 lands a LittleFS port.
- E19: CRA-Ready Cybersecurity Reporting and Response — F produces
evidence that lives in E19's deliverable.
Stories
None yet. Sketches above are promoted to stories (and added here) when
scheduled.