This project is in active development. The system has been:
- ✅ Designed with correct KV260 memory constraints (single 512KB BRAM buffer)
- ✅ Compiled through brief-compiler to valid SystemVerilog
- ✅ Verified with Verilator lint and simulation
- ❌ NOT YET tested on actual KV260 hardware
Do not deploy to production hardware without physical validation.
The IMP project is a response to the consolidation of AI into subscription-based cloud services. I created a custom language called Brief that lets me write the same syntax for both hardware and software, and I realised I could map neural-network operations directly to FPGA gate logic. By combining 1.58-bit ternary quantization with Gated Delta Networks, the system is designed to run a 9B-parameter model on a standard $250 Kria KV260 board. Executing bare-metal on the ARM processor removes the overhead of a traditional operating system, maximizing available memory and easing pressure on the 19.2 GB/s DDR4 bandwidth bottleneck.
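Mapping neural-network operations to gate logic works because ternary weights eliminate multiplication entirely: a weight of +1 adds the activation, -1 subtracts it, and 0 skips it. A minimal illustrative sketch of that idea in software (not the project's actual kernel code):

```rust
/// Illustrative ternary dot product: with weights restricted to
/// {-1, 0, +1}, every multiply-accumulate reduces to an add, a
/// subtract, or a skip -- the same simplification the FPGA core
/// can exploit directly in gate logic.
fn ternary_dot(weights: &[i8], activations: &[i32]) -> i32 {
    weights
        .iter()
        .zip(activations)
        .map(|(&w, &a)| match w {
            1 => a,   // +1: add the activation
            -1 => -a, // -1: subtract it
            _ => 0,   //  0: contribute nothing
        })
        .sum()
}
```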
I am open-sourcing the IMP engine and the Brief compiler under the GPLv2 license to ensure the logic remains a public, reciprocal resource. This architecture can substantially lower the environmental footprint of AI by replacing data-center power requirements with efficient, edge-native silicon logic. The goal is to provide individuals with the hardware design tools and model access usually reserved for large corporations. By moving AI from a rented service to locally-owned hardware, we ensure that the ability to process and generate information remains a permanent utility under the user's direct control.
I am actively developing this project, and the known limitations listed below will likely be resolved in the near future.
```
┌─────────────────┐
│ ARM Cortex-A53  │
│ (Brief Kernel)  │
└────────┬────────┘
         │ AXI4-Lite MMIO
         ▼
┌─────────────────┐
│  FPGA Neural    │
│  Core Engine    │
│ (SystemVerilog) │
└────────┬────────┘
         │
┌────────▼────────┐
│      DDR4       │
│ (Model Weights) │
└─────────────────┘
```
Key Features:
- Ternary computation (-1, 0, +1) for 1.58-bit quantization
- Single 512KB BRAM buffer streamed from DDR4
- Bare-metal ARM execution (no OS overhead)
- Brief language compiles to both ARM and FPGA
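The "1.58-bit" figure is the information content of a ternary digit, log2(3) ≈ 1.585 bits. One way to approach that density in storage is base-3 packing: five trits fit in one byte (3^5 = 243 ≤ 256), giving 1.6 bits per weight. This packing scheme is an illustration of the arithmetic, not necessarily the layout IMP uses:

```rust
/// Hypothetical base-3 packing: five ternary weights per byte
/// (3^5 = 243 <= 256), i.e. 1.6 bits per weight -- close to the
/// log2(3) ~= 1.585-bit information content of a trit.
fn pack_trits(trits: [i8; 5]) -> u8 {
    // Map each trit from {-1, 0, +1} to {0, 1, 2}, accumulate base-3
    // with the first element as the least-significant digit.
    trits.iter().rev().fold(0u8, |acc, &t| acc * 3 + (t + 1) as u8)
}

fn unpack_trits(mut packed: u8) -> [i8; 5] {
    let mut trits = [0i8; 5];
    for t in trits.iter_mut() {
        *t = (packed % 3) as i8 - 1; // back to {-1, 0, +1}
        packed /= 3;
    }
    trits
}
```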
Prerequisites:
- Kria KV260 board
- Rust toolchain for ARM (`thumbv7neon-none-eabihf`)
- Xilinx Vivado (for FPGA synthesis)
- Verilator (for simulation)
```sh
# Generate hardware from Brief specification
./brief-compiler verilog neuralcore.ebv --hw hardware.toml -o generated/

# Run Verilator simulation
./run_sim.sh

# View waveforms
gtkwave sim_build/waveform.vcd

# Build the SD card image
./build_sdcard.sh /path/to/sdcard
```

This project builds on established research in binary neural networks and edge AI acceleration:
- Ternary Quantization: {-1, 0, +1} weights at ~1.58 bits per parameter (log2 3 ≈ 1.585)
- Binary Neural Networks: BinaryNet (Courbariaux et al., 2016) and XNOR-Net (Rastegari et al., 2016)
- Gated Delta Networks: Low-bandwidth recurrent architectures
- Edge AI Compilation: FPGA-accelerated inference on resource-constrained devices
See RESEARCH_BIBLIOGRAPHY.md for full citation list.
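For orientation, the recurrence behind delta-rule networks can be reduced to scalars. This is an illustrative scalar sketch of a gated delta update; real Gated Delta Networks use matrix-valued state with vector keys and values, and the symbol names here are my own:

```rust
/// Scalar reduction of a gated delta-rule state update:
///   s_t = alpha * s * (1 - beta * k^2) + beta * v * k
/// where alpha is a decay gate and beta the write strength.
/// Illustrative only; the actual recurrence is matrix-valued.
fn gated_delta_step(s: f32, k: f32, v: f32, alpha: f32, beta: f32) -> f32 {
    alpha * s * (1.0 - beta * k * k) + beta * v * k
}
```

Because the state is carried forward step by step, bandwidth needs stay low: only the state, not a growing attention cache, is kept resident.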
```
imp/
├── neuralcore.ebv     # FPGA neural engine (Brief)
├── kernel.ebv         # ARM kernel specification (Brief)
├── hardware.toml      # KV260 memory map & constraints
├── arm/
│   ├── kernel.rs      # ARM bare-metal implementation
│   └── memory.ld      # Linker script (DDR4 at 0x0)
├── generated/         # Compiled SystemVerilog
├── run_sim.sh         # Verilator simulation
├── build_sdcard.sh    # SD card builder
├── SPEC_v0.1.md       # Full architecture specification
└── Imp.svg            # Project logo
```
| Document | Purpose |
|---|---|
| SPEC_v0.1.md | Full architecture specification |
| INFERENCE_GUIDE.md | Usage and TCP protocol |
| Build_Workflow.md | Build pipeline |
| FLASH_GUIDE.md | SD card deployment |
| Hardware_Config_Guide.md | hardware.toml reference |
- DMA Transfer: Weights currently stream via AXI4-Lite MMIO (CPU-bound); high-speed AXI-DMA (AXI4-Full) is required for full performance.
- Tokenizer: Simple fallback (no BPE vocabulary loaded)
- TCP Stack: Stub (needs LwIP integration)
- FPGA Synthesis: Not yet run through Vivado
- Physical Testing: No hardware validation performed
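The AXI4-Lite limitation exists because every weight word crosses the bus as an individual CPU-issued store. A hedged sketch of what that path looks like from the ARM side; the FIFO pointer here is a placeholder, and the real register offsets live in hardware.toml:

```rust
/// Illustrative CPU-bound weight streaming over memory-mapped I/O:
/// one volatile 32-bit store per weight word. `fifo` stands in for
/// a hypothetical AXI4-Lite register address. AXI-DMA would replace
/// this loop with a single descriptor-driven burst transfer.
unsafe fn stream_weights(fifo: *mut u32, words: &[u32]) {
    for &w in words {
        // write_volatile prevents the compiler from coalescing or
        // eliding the stores: one bus transaction per word.
        core::ptr::write_volatile(fifo, w);
    }
}
```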
Copyright (C) 2026 Randy Smits-Schreuder Goedheijt
See LICENSE file for details.
This project was made possible by the foundational research of many intelligent individuals. See RESEARCH_BIBLIOGRAPHY.md for the papers and resources that informed this work.