Atome on real silicon — ESP32-WROOM-32 (proof-of-execution)
The 944K-param Atome model running on a physical ESP32-WROOM-32 (ESP32-D0WD-V3,
4 MB flash, no PSRAM), generating coherent text fully offline at ~1.0 tok/s
(240 MHz core, 80 MHz flash). This is the same engine that passes the host unit tests
and the QEMU parity test — now verified on real hardware.
Honest scope: this is a proof-of-execution + reproducibility artifact, not a
benchmark win or a moat. ~1 tok/s for a sub-1M LM on an MCU is known territory; no
head-to-head vs alternatives (e.g. llama2.c-on-MCU) on the same chip has been run.
Throughput is flash-bound (~270 KB of ternary weights read per token).
Verify it yourself in ~2 minutes (no ESP-IDF needed)
Hardware: any ESP32-WROOM-32 + a data USB cable. Host: pip install esptool (this also installs pyserial, which provides the miniterm used in step 2).
# 1) flash the single merged image at 0x0
esptool.py --chip esp32 --port /dev/ttyUSB0 --baud 460800 write_flash 0x0 atome_esp32_merged.bin
# 2) watch it generate (115200 baud), press the board's EN button to (re)run
python3 -m serial.tools.miniterm /dev/ttyUSB0 115200
Expected: a boot banner showing cpu freq: 240000000 Hz, [state] 159 KB in internal SRAM, then Once -> " upon a time, there" etc. with a tok/s line, then an atome>
prompt you can type into. See evidence/serial_boot_log_esp32_wroom32.txt.
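To compare a saved serial capture against the expected output described above, a small check like the following may help. It is only a sketch: the marker strings are copied from the description in this README, and the exact formatting in a real log could differ, so treat the list as an assumption. The function name `missing_markers` is hypothetical and not part of the Atome repo.

```python
from pathlib import Path

# Marker strings quoted in the "Expected" description of this README.
# ASSUMPTION: a real boot log contains them verbatim; adjust if it does not.
EXPECTED_MARKERS = [
    "cpu freq: 240000000 Hz",
    "[state] 159 KB in internal SRAM",
    "tok/s",
    "atome>",
]


def missing_markers(log_text: str) -> list[str]:
    """Return the expected markers that do not appear in the log text."""
    return [marker for marker in EXPECTED_MARKERS if marker not in log_text]


def check_log_file(path: str) -> list[str]:
    """Read a saved serial capture and report which markers are missing."""
    text = Path(path).read_text(errors="replace")
    return missing_markers(text)
```

For example, `check_log_file("evidence/serial_boot_log_esp32_wroom32.txt")` should return an empty list if the reference log matches the description, and running it on your own capture shows which parts of the boot sequence did not appear.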
Files
- atome_esp32_merged.bin: single image, flash at 0x0 (easiest).
- bootloader.bin / partition-table.bin / atome_esp32.bin: individual images (flash at 0x1000 / 0x8000 / 0x10000).
- SHA256SUMS: checksums for all of the above.
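Before flashing, the images can be checked against SHA256SUMS. The sketch below assumes that file uses the common `sha256sum` layout (one `<hex digest>  <filename>` pair per line), which this README does not state explicitly; if that holds, `sha256sum -c SHA256SUMS` does the same job on Linux. The helper names `sha256_of` and `verify_sums` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large images are not loaded at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_sums(sums_file: str) -> dict[str, bool]:
    """Check each '<hex digest>  <filename>' line against the file on disk.

    ASSUMPTION: sums_file follows the sha256sum text format and the listed
    files sit next to it. Returns {filename: True if the digest matches}.
    """
    sums_path = Path(sums_file)
    results = {}
    for line in sums_path.read_text().splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum prefixes binary-mode names with '*'
        target = sums_path.parent / name
        results[name] = target.is_file() and sha256_of(target) == digest.lower()
    return results
```

Calling `verify_sums("SHA256SUMS")` from the directory holding the images returns a per-file result; any `False` entry means the file is missing or does not match its recorded digest.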
Build config (wroom profile)
944K weights, d=256 / L=8 / d_head=64, seq=24 (159 KB state fits the classic-ESP32
~168 KB largest contiguous block), CPU 240 MHz, SPI flash 80 MHz, task watchdog off.
A PSRAM board (WROVER / ESP32-S3-R8) runs the full seq=128 profile — next data point.
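The memory figures above can be turned into a quick back-of-envelope check. This is a rough sketch, not a measurement: the 159 KB and ~168 KB values come from this README, but the linear scaling of state with seq is purely an assumption made here for illustration (the README does not give a breakdown of the state buffer).

```python
# Figures taken from the build config in this README.
STATE_KB_SEQ24 = 159      # state size at seq=24
LARGEST_BLOCK_KB = 168    # approx. largest contiguous internal SRAM block
SEQ_WROOM = 24
SEQ_FULL = 128


def headroom_kb(state_kb: float, block_kb: float = LARGEST_BLOCK_KB) -> float:
    """Free space left in the largest block after placing the state."""
    return block_kb - state_kb


def scaled_state_kb(seq: int) -> float:
    """Estimate state size at another seq.

    ASSUMPTION: the whole state grows linearly with seq. Any part that is
    independent of seq would make the real number smaller than this.
    """
    return STATE_KB_SEQ24 * seq / SEQ_WROOM


print(headroom_kb(STATE_KB_SEQ24))   # 9
print(scaled_state_kb(SEQ_FULL))     # 848.0
```

Under that assumption the wroom profile leaves about 9 KB of headroom, while seq=128 would need several times the largest internal block, which is consistent with the full profile being aimed at PSRAM boards.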