Skip to content

agent flash: size CMD_FLASH_STREAM sector bitmap from actual num_sectors#108

Merged
widgetii merged 1 commit into
masterfrom
fix/flash-stream-bitmap-dynamic-size
May 16, 2026
Merged

agent flash: size CMD_FLASH_STREAM sector bitmap from actual num_sectors#108
widgetii merged 1 commit into
masterfrom
fix/flash-stream-bitmap-dynamic-size

Conversation

@widgetii
Copy link
Copy Markdown
Member

Summary

Both sides of CMD_FLASH_STREAM hardcoded the sector bitmap at 32 bytes (= 256 sectors max). Fine for an 8 MiB-or-smaller flash, silently bricking above that. This PR makes the bitmap length follow ceil(num_sectors / 8) on both sides, with regression tests that lock in the bug.

Bugs

Host (src/defib/agent/client.py write_flash)

bitmap = bytearray(32). With a 32 MiB NOR (512 sectors at 64 KiB), the per-sector iteration IndexErrors at s=256. Crucially, this happens after the host has shipped the (truncated) bitmap to the agent and the agent has begun erasing sectors per that bitmap. The host crash leaves the device with whatever partial state the agent reached — in my repro on hi3520dv200 that meant a wiped boot partition (U-Boot gone, device fell back to bootrom fastboot markers on power-on; recovered by re-flashing).

Failure mode in the user-visible log:

30% (9856KB / 32768KB) 18916 B/s
...
IndexError: bytearray index out of range
  File ".../defib/agent/client.py", line 603, in write_flash
    if bitmap[s // 8] & (1 << (s % 8)):

Agent (agent/main.c handle_flash_stream)

if (len < 44) { proto_send_ack(ACK_CRC_ERROR); return; } rejected anything shorter than the old fixed 12-byte header + 32-byte bitmap. Once the host became size-aware (the change above), small-flash writes that legitimately send a 12 + ceil(num_sectors/8)-byte payload (16-byte bitmap for 8 MiB → 28-byte payload) get rejected as malformed:

Flash stream rejected: 0x01
Flash write failed after 3.1s

Replaced with a two-step check: validate header presence first (≥12 B), then once size is known compute the expected bitmap length and check the full payload again.

Wire format compatibility

The agent already indexes bitmap[s / 8] off the wire payload for any s < num_sectors, so the agent's bitmap-consumer logic doesn't change — any host-side length the new code sends works against every existing agent build of this protocol version. The agent change here is purely the length validation.

Regression tests

tests/test_flash_stream_regression.py adds two test classes:

TestWriteFlashBitmapSizing — locks in the host-side IndexError and the fix:

  • test_old_buggy_crashes_at_sector_256 — reproduces the 32-byte-bitmap IndexError at s=256
  • test_old_buggy_crashes_for_32mb_flash — same at the actual 32 MiB / 512 sector case
  • test_fixed_works_at_256_sector_boundary — 257 sectors → 33 B bitmap, all bits set
  • test_fixed_works_for_full_32mb_flash — 512 sectors → 64 B bitmap, all bits set
  • test_fixed_sizes_bitmap_tightly — bitmap is exactly (n + 7) // 8 bytes for n ∈ {1,7,8,9,64,127,128,129,255,256,257,512,1024}
  • test_real_write_flash_constructs_correct_bitmap — end-to-end through FlashAgentClient.write_flash on a 17 MiB blob with data in sectors 0 and 256; intercepts the CMD_FLASH_STREAM payload and asserts the bitmap is 33 bytes with both bits set

TestFlashStreamVariableBitmap — locks in the wire format for the variable-length payload:

  • test_8mb_flash_uses_16_byte_bitmap / test_32mb_flash_uses_64_byte_bitmap — the bitmap-byte math
  • test_payload_size_8mb / test_payload_size_32mb — 28 B and 76 B payload round-trips through COBS framing

Test plan

  • uv run pytest tests/test_flash_stream_regression.py -x -q — 28 passed (was 18 before this PR)
  • uv run pytest tests/ -x -q --ignore=tests/fuzz — 540 passed, 2 skipped
  • uv run ruff check src/defib/agent/client.py tests/test_flash_stream_regression.py — clean
  • uv run mypy src/defib/agent/client.py --ignore-missing-imports — no issues
  • make -C agent test HOST_CC=gcc — 5412/5412 passed
  • make -C agent clean && make -C agent SOC=hi3520dv200 — builds (15009 B)
  • Hardware re-verified end-to-end: defib agent flash -c hi3520dv200 -p /dev/ttyUSB1 -i recovery.bin on real hi3520dv200 / MX25L25635E NOR — 8 MiB image flashed in 543 s, CRC32 verified OK, device rebooted.

🤖 Generated with Claude Code

Both sides of CMD_FLASH_STREAM hardcoded the bitmap at 32 bytes (256
sectors max) — fine for an 8 MiB-or-smaller flash, silently bricking
above that:

  - Host (client.py write_flash): bytearray(32). With a 32 MiB NOR (=
    512 sectors at 64 KiB), the sector loop IndexErrors at s=256 partway
    through the per-sector iteration. By the time the host raises, the
    agent has already started erasing sectors per the (short) bitmap
    it received — the host crash leaves the boot partition wiped (U-Boot
    gone, device falls back to bootrom fastboot markers on power-on).
    Verified bricking hi3520dv200 + recovery on real hardware 2026-05-16.

  - Agent (handle_flash_stream): `if (len < 44)` rejected payloads
    shorter than the old fixed 12 B header + 32 B bitmap. Once the host
    became size-aware (this PR), sub-16 MiB writes that legitimately
    send a 12 + ceil(num_sectors/8) B payload (16-byte bitmap for 8 MiB)
    started failing with ACK_CRC_ERROR / "Flash stream rejected: 0x01".

Wire format stays compatible: the agent already indexes bitmap[s/8] off
the wire payload for any s < num_sectors, so any host-side length the
new code sends works on every existing agent build of this protocol
version. The agent change is purely the length validation.

Regression tests in tests/test_flash_stream_regression.py
(TestWriteFlashBitmapSizing + TestFlashStreamVariableBitmap):

  - test_old_buggy_crashes_at_sector_256        locks in the bug
  - test_old_buggy_crashes_for_32mb_flash       locks in the bug at 32 MiB
  - test_fixed_works_at_256_sector_boundary     +1 sector past old cap
  - test_fixed_works_for_full_32mb_flash        the actual bricking case
  - test_fixed_sizes_bitmap_tightly             ceil(n/8) bytes for 1..1024
  - test_real_write_flash_constructs_correct_bitmap
        end-to-end through FlashAgentClient.write_flash on a 17 MiB blob,
        asserts the wire payload's 33-byte bitmap has bit set for both
        sector 0 and sector 256
  - test_8mb_flash_uses_16_byte_bitmap          payload-size math
  - test_32mb_flash_uses_64_byte_bitmap         payload-size math
  - test_payload_size_8mb                       round-trip 28 B payload
  - test_payload_size_32mb                      round-trip 76 B payload

Agent test_agent.c (5412/5412) still passes — its bitmap tests cover the
COBS framing and bit ordering, not the length check, so no agent C-side
test changes were needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@widgetii widgetii merged commit a7b1edb into master May 16, 2026
13 checks passed
@widgetii widgetii deleted the fix/flash-stream-bitmap-dynamic-size branch May 16, 2026 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant