Skip to content

fix(firmware): load STACKCHAN.RON via long-filename lookup#577

Merged
andymai merged 1 commit into
mainfrom
fix/sd-config-lfn-read
May 29, 2026
Merged

fix(firmware): load STACKCHAN.RON via long-filename lookup#577
andymai merged 1 commit into
mainfrom
fix/sd-config-lfn-read

Conversation

@andymai
Copy link
Copy Markdown
Owner

@andymai andymai commented May 29, 2026

Summary

  • STACKCHAN is a 9-character base name, which is not a valid FAT 8.3 short name. embedded-sdmmc's open_file_in_dir only accepts 8.3 names, so it rejected "STACKCHAN.RON" with FilenameError(NameTooLong) — which read_config masked as FileNotFound. Boot config silently never loaded, even with the file present on the card (stored as short name STACKC~1.RON + a long-filename entry, as any OS creates it).
  • Resolve the short name by matching the long name via iterate_dir_lfn, then open by that short name. Preserves the documented /sd/STACKCHAN.RON path and existing cards.

Test plan

  • just clippy-firmware (-D warnings) — clean.
  • On-device (CoreS3, real STACKCHAN.RON on the card): flashed --release; boot log went from SD: read /sd/STACKCHAN.RON failed (FileNotFound) to net config: ssid=Interweb country=US hostname=stackchan.local — the 241-byte config now parses and applies. Clean boot, LCD renders.

Known follow-ups (same root cause, out of scope here)

  • Writeback (PUT /settings, atomic STACKCHAN.NEWSTACKCHAN.RON): embedded-sdmmc cannot create long-named files, so writeback to the 9-char-base name is still impossible. Needs a separate decision (8.3 staging name, or rename).
  • WAKE_WORD.tflite has the same long-name (and >3-char extension) issue on the read path.

Greptile: please review.

"STACKCHAN" is a 9-char base name, which is not a valid FAT 8.3 short
name, so embedded-sdmmc's open_file_in_dir rejected "STACKCHAN.RON" with
NameTooLong — masked as FileNotFound — and the boot config never loaded
even though the file was on the card (short name STACKC~1.RON + an LFN
entry). Resolve the short name by matching the long name via
iterate_dir_lfn, then open by that.
@andymai andymai merged commit a7e1176 into main May 29, 2026
11 of 12 checks passed
@andymai andymai deleted the fix/sd-config-lfn-read branch May 29, 2026 18:22
@release-kun release-kun Bot mentioned this pull request May 29, 2026
release-kun Bot added a commit that referenced this pull request May 29, 2026
🤖 I have created a release *beep* *boop*
---


<details><summary>stackchan-firmware: 0.113.3</summary>

##
[0.113.3](stackchan-firmware-v0.113.2...stackchan-firmware-v0.113.3)
(2026-05-29)


### Bug Fixes

* **firmware:** load STACKCHAN.RON via long-filename lookup
([#577](#577))
([a7e1176](a7e1176))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: release-kun[bot] <276042328+release-kun[bot]@users.noreply.github.com>
@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 29, 2026

Greptile Summary

This PR fixes the long-standing silent boot failure where STACKCHAN.RON (9-char base name) was rejected by open_file_in_dir with NameTooLong, masked as FileNotFound, causing config to never load even with the file present on the card.

  • Adds iterate_dir_lfn in read_config to match the long filename case-insensitively and retrieve the FAT 8.3 alias (STACKC~1.RON), then opens the file by that alias — correct approach for embedded-sdmmc's API constraints.
  • factory_reset still passes CONFIG_FILE (\"STACKCHAN.RON\") directly to delete_file_in_dir, which hits the same NameTooLong error and propagates StorageError::Write on every call — it needs the same LFN-resolve treatment.

Confidence Score: 3/5

The read-config fix is correct and on-device tested, but factory_reset always reports a write failure due to the same long-filename root cause.

The read_config fix is well-implemented and on-device verified. factory_reset passes the long filename directly to delete_file_in_dir, receives NameTooLong rather than NotFound, and unconditionally propagates StorageError::Write — meaning POST /factory-reset is broken on every call.

crates/stackchan-firmware/src/storage.rs — specifically the factory_reset method's use of CONFIG_FILE in delete_file_in_dir

Important Files Changed

Filename Overview
crates/stackchan-firmware/src/storage.rs Adds LFN directory scan in read_config to resolve STACKCHAN.RON's 9-char base name to its 8.3 alias before opening — correctly fixes the silent FileNotFound on boot. factory_reset still passes CONFIG_FILE directly to delete_file_in_dir, which returns NameTooLong (not NotFound) and always propagates StorageError::Write.

Comments Outside Diff (1)

  1. crates/stackchan-firmware/src/storage.rs, line 766-789 (link)

    P1 factory_reset always returns StorageError::Write due to long name

    delete_file_in_dir(CONFIG_FILE) receives "STACKCHAN.RON", which has the same 9-char base name that open_file_in_dir rejected before this fix. The driver returns NameTooLong before it even looks up the file on disk. Because the guard only suppresses embedded_sdmmc::Error::NotFound, a NameTooLong result sets first_error to Write unconditionally — so every POST /factory-reset call reports failure regardless of card state. factory_reset needs the same LFN-resolve-then-delete-by-short-name treatment applied to read_config in this PR.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: crates/stackchan-firmware/src/storage.rs
    Line: 766-789
    
    Comment:
    **`factory_reset` always returns `StorageError::Write` due to long name**
    
    `delete_file_in_dir(CONFIG_FILE)` receives `"STACKCHAN.RON"`, which has the same 9-char base name that `open_file_in_dir` rejected before this fix. The driver returns `NameTooLong` before it even looks up the file on disk. Because the guard only suppresses `embedded_sdmmc::Error::NotFound`, a `NameTooLong` result sets `first_error` to `Write` unconditionally — so every `POST /factory-reset` call reports failure regardless of card state. `factory_reset` needs the same LFN-resolve-then-delete-by-short-name treatment applied to `read_config` in this PR.
    
    How can I resolve this? If you propose a fix, please make it concise.

    Fix in Claude Code

Fix All in Claude Code

Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
crates/stackchan-firmware/src/storage.rs:766-789
**`factory_reset` always returns `StorageError::Write` due to long name**

`delete_file_in_dir(CONFIG_FILE)` receives `"STACKCHAN.RON"`, which has the same 9-char base name that `open_file_in_dir` rejected before this fix. The driver returns `NameTooLong` before it even looks up the file on disk. Because the guard only suppresses `embedded_sdmmc::Error::NotFound`, a `NameTooLong` result sets `first_error` to `Write` unconditionally — so every `POST /factory-reset` call reports failure regardless of card state. `factory_reset` needs the same LFN-resolve-then-delete-by-short-name treatment applied to `read_config` in this PR.

Reviews (1): Last reviewed commit: "fix(firmware): load STACKCHAN.RON via lo..." | Re-trigger Greptile

andymai added a commit that referenced this pull request May 29, 2026
…rite (#579)

## Summary

Completes the SD config feature: it could mount (#575) and read (#577)
but not save. `PUT /settings` writeback was broken the same way the read
path was — the staging file `STACKCHAN.NEW` (9-char base) couldn't be
*created* by embedded-sdmmc, and the final copy targeted the
un-creatable `STACKCHAN.RON`.

embedded-sdmmc cannot create long-named files at all, but it **can
overwrite an existing file opened by its short name**. So:
- Stage to a valid 8.3 name (`STKCFG.NEW`) the firmware can create.
- Resolve the existing `STACKCHAN.RON` short name via the shared
`lfn_short_name` helper (factored out of the read path).
- Truncate-write the rendered config onto it — which leaves the
long-name directory entry intact, so the file stays readable as
`STACKCHAN.RON` next boot.

Writeback updates an **existing** config (the realistic lifecycle — the
file must already be present for Wi-Fi/auth to come up). If
`STACKCHAN.RON` is absent, `write_config` returns `FileNotFound` with a
clear log rather than silently writing a differently-named file.

## Test plan

- `just clippy-firmware` (`-D warnings`) — clean.
- **On-device (CoreS3, real `STACKCHAN.RON` on the card):** a temporary
idempotent self-test (rewrite the just-read config, then re-read) —
removed before commit — logged:
  - `dbg writeback: write_config OK`
- `dbg writeback: re-read OK ssid=Interweb hostname=stackchan` —
round-trip intact, long name preserved.

## Surfaced by config now loading (separate, not addressed here)

- `sidecar session: persist failed (Write)` — a different persisted
file; needs its own look.
- `wifi: start_async failed (InternalError(NoMem))` — Wi-Fi now actually
starts (there's an SSID); a pre-existing init path that was never
exercised while config never loaded.

Greptile: please review.
andymai added a commit that referenced this pull request May 29, 2026
… pool sizing (#581)

## Summary

Once a real `STACKCHAN.RON` loaded (after #575/#577), Wi-Fi never came
up — `start_async` failed with `InternalError(NoMem)`. This fixes the
bring-up end to end.

**Root cause (measured on-device):** only **~1.5 KiB** of internal SRAM
was free when the Wi-Fi MAC started. PSRAM (8 MiB free) can't back Wi-Fi
DMA buffers; the bootloader-reclaimed heap is hard-capped at **~72 KiB**
(already full); and the only other internal heap shares the DRAM segment
with the **stack** (growing it overflowed the stack in the BLE+Wi-Fi
coex path).

**Fix 1 — BLE/Wi-Fi mutual exclusion.** With a Wi-Fi SSID configured,
skip BLE bring-up entirely. Frees **~36 KiB** internal (1.5 KiB → 37
KiB) for the Wi-Fi stack and removes coex stack pressure. BLE (Hardware
Buddy) still runs when no SSID is set, so provisioning over BLE is
unaffected.

**Fix 2 — socket pool.** Wi-Fi connecting then exposed a second latent
bug: `STACK_SOCKETS` was 6, predating the 4-worker HTTP pool, so
`smoltcp` panicked (`adding a socket to a full SocketSet`) the instant
the link came up. Sized to real demand (`HTTP_WORKER_COUNT` + mDNS +
SNTP + DHCP/DNS + optional outbound clients).

Also reflows one `write_config` line that merged slightly un-formatted
in #579.

## Test plan

- `just clippy-firmware` (`-D warnings`) — clean.
- **On-device (CoreS3, `STACKCHAN.RON` with `ssid=Interweb`):** flashed
`--release`; defmt log shows:
- `ble: skipped — Wi-Fi SSID configured` and free internal **37 KiB**
before start (was 1.5 KiB).
- `wifi: connecting → wifi: connected`, HTTP workers listening, **DHCP
lease acquired**, and `sntp: synced via pool.ntp.org` — i.e. full
internet reachability.
- **Boots once, no reboot loop;** zero `NoMem`, stack-guard, or
`SocketSet` panics (all three were reproduced before the fix).

## Follow-ups (noted, out of scope)

- `coex` feature is now dead weight when an SSID is set (BLE never
co-runs); dropping it would reclaim more internal RAM and could be
revisited.
- `sidecar session: persist failed (Write)` (separate file) and the
BLE-only path's behavior are unchanged here.

Greptile: please review.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant