Atome on real silicon — ESP32-WROOM-32 (proof-of-execution)

The 944K-parameter Atome model runs on a physical ESP32-WROOM-32 (ESP32-D0WD-V3,
4 MB flash, no PSRAM) and generates coherent text fully offline at ~1.0 tok/s
(240 MHz core, 80 MHz flash). This is the same engine that passes the host unit tests
and the QEMU parity test, now verified on real hardware.

Honest scope: this is a proof-of-execution + reproducibility artifact, not a
benchmark win or a moat. ~1 tok/s for a sub-1M LM on an MCU is known territory; no
head-to-head vs alternatives (e.g. llama2.c-on-MCU) on the same chip has been run.
Throughput is flash-bound (~270 KB of ternary weights read per token).

Verify it yourself in ~2 minutes (no ESP-IDF needed)

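Hardware: any ESP32-WROOM-32 + a data USB cable. Host: pip install esptool (this also
installs pyserial, which provides the miniterm used in step 2). The commands assume the
board enumerates as /dev/ttyUSB0; substitute your own port (e.g. COM3 on Windows).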

# 1) flash the single merged image at 0x0
esptool.py --chip esp32 --port /dev/ttyUSB0 --baud 460800 write_flash 0x0 atome_esp32_merged.bin

# 2) watch it generate (115200 baud), press the board's EN button to (re)run
python3 -m serial.tools.miniterm /dev/ttyUSB0 115200

Expected: a boot banner showing cpu freq: 240000000 Hz and [state] 159 KB in internal
SRAM, then Once -> " upon a time, there" etc. with a tok/s line, then an atome> prompt
you can type into. See evidence/serial_boot_log_esp32_wroom32.txt.
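
To compare a captured run against the expected output without reading it by eye, the markers listed above can be grepped for. This is a minimal sketch: the function name check_boot_log is illustrative and not part of the release, and the marker strings are copied from these notes, so the real log may format them slightly differently.

```shell
# check_boot_log FILE: succeed only if FILE contains every marker named in
# the "Expected" paragraph. Marker strings are taken from the release notes
# (assumption: the firmware prints them verbatim).
check_boot_log() {
    log="$1"
    for marker in 'cpu freq: 240000000 Hz' '[state] 159 KB' 'tok/s' 'atome>'; do
        # -F matches the marker literally, so "[state]" is not a bracket expression
        if ! grep -qF -- "$marker" "$log"; then
            echo "missing: $marker" >&2
            return 1
        fi
    done
    echo "all expected markers found"
}
```

Usage: save the serial output to a file (for example by copying it out of the miniterm window) and run check_boot_log on that file.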

Files

atome_esp32_merged.bin — single image, flash at 0x0 (easiest).
bootloader.bin / partition-table.bin / atome_esp32.bin — individual images
(flash at 0x1000 / 0x8000 / 0x10000).
SHA256SUMS — checksums for all of the above.
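
Before flashing, it can be worth checking the downloads against SHA256SUMS. If the merged image is not used, the three individual images go to the offsets listed above. This is a minimal sketch, assuming SHA256SUMS is in the standard sha256sum format, GNU coreutils is installed, and the board is on /dev/ttyUSB0. The function names verify_images and flash_split_images are illustrative and not part of the release.

```shell
# verify_images [DIR]: check every file listed in DIR/SHA256SUMS (default: .).
# Assumes the standard "<hex digest>  <filename>" format written by sha256sum.
# The body runs in a subshell so the caller's working directory is unchanged.
verify_images() (
    cd "${1:-.}" && sha256sum -c SHA256SUMS
)

# flash_split_images [PORT]: write the three individual images at the
# offsets given in the Files list. PORT defaults to /dev/ttyUSB0 (assumption).
flash_split_images() {
    esptool.py --chip esp32 --port "${1:-/dev/ttyUSB0}" --baud 460800 \
        write_flash 0x1000 bootloader.bin \
                    0x8000 partition-table.bin \
                    0x10000 atome_esp32.bin
}
```

Usage: verify_images . && flash_split_images /dev/ttyUSB0. Note that sha256sum -c reports a failure for any listed file that is missing, so either download all four images or add --ignore-missing to the sha256sum call.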

Build config (wroom profile)

944K weights, d=256 / L=8 / d_head=64, seq=24 (the 159 KB state fits the classic-ESP32
~168 KB largest contiguous block), CPU 240 MHz, SPI flash 80 MHz, task watchdog off.
A PSRAM board (WROVER / ESP32-S3-R8) runs the full seq=128 profile; that is the next
data point.
