CNN accelerator IP core for Zynq-7000 series FPGAs with AXI4 interface.
14x14 PE array @ 80 MHz: SoC build complete on Zedboard (2025-12-18)
- Unit tests: 680+ passing (Verilator + Vivado XSim)
- Hardware tests: 17/17 passing on Zedboard via JTAG
- SoC post-place-and-route: WNS +1.025 ns @ 80 MHz (timing met)
- Peak performance: 31.36 GOP/s INT8 (15.68 GMAC/s = 196 PEs × 80 MHz)
- Outputs: bitstream (.bit), hardware platform (.xsa), PYNQ handoff (.hwh)
Multi-target Support (via rtl/target_config.vh):
| Target | PE array | DSPs (array/available) | Peak GOP/s (INT8) |
|---|---|---|---|
| Zybo Z7-10 | 7x7 | 49/80 | 9.8 |
| Zybo Z7-20 | 10x10 | 100/220 | 20.0 |
| Zedboard | 14x14 | 196/220 | 39.2 |
| ZCU104 | 24x24 | 576/1728 | 230.4 |
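The Peak GOP/s column follows from one INT8 MAC (2 ops) per PE per cycle. The sketch below reproduces it; note that the clock rates passed in (100 MHz for the Zynq-7000 boards, 200 MHz for the ZCU104) are inferred from the table's numbers rather than documented targets, and `peak_gops` is a hypothetical helper, not part of the driver.

```python
def peak_gops(rows: int, cols: int, clock_mhz: float) -> float:
    """Peak INT8 throughput in GOP/s, assuming every PE retires one MAC per cycle."""
    macs_per_cycle = rows * cols                  # one DSP-based MAC per PE
    gmac_per_s = macs_per_cycle * clock_mhz / 1000.0
    return 2.0 * gmac_per_s                       # 1 MAC = multiply + add = 2 ops


# (target, array side, clock in MHz); clocks are inferred from the table, not documented
targets = [
    ("Zybo Z7-10", 7, 100),
    ("Zybo Z7-20", 10, 100),
    ("Zedboard", 14, 100),
    ("ZCU104", 24, 200),
]

for name, n, mhz in targets:
    print(f"{name:<11} {n}x{n} @ {mhz} MHz: {peak_gops(n, n, mhz):.1f} GOP/s")

# The Zedboard SoC build closes timing at 80 MHz rather than 100 MHz
print(f"Zedboard 14x14 @ 80 MHz: {peak_gops(14, 14, 80):.2f} GOP/s")
```

This is why the same 14x14 array is quoted at 39.2 GOP/s in the table but 31.36 GOP/s for the 80 MHz Zedboard build.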
Processing Element Array:
- Row-stationary dataflow
- 1 DSP48E1 per PE (single-DSP MAC)
- 3-cycle pipeline latency
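Row-stationary dataflow breaks a 2-D convolution into 1-D row convolutions: each PE keeps one filter row resident, slides it over one input row, and the partial-sum rows from all PEs contributing to the same output row are added together. The behavioral sketch below illustrates that decomposition in general (single channel, stride 1, no padding). It does not model this core's actual PE-to-row mapping, pipeline timing, or accumulator width, and the function names are hypothetical.

```python
def pe_row_conv(filter_row, input_row):
    """Work of one PE: slide a stationary filter row over an input row (stride 1)."""
    k = len(filter_row)
    return [
        sum(w * x for w, x in zip(filter_row, input_row[i:i + k]))
        for i in range(len(input_row) - k + 1)
    ]


def row_stationary_conv2d(filt, ifmap):
    """2-D valid convolution assembled from per-PE row convolutions.

    Output row o pairs filter rows 0..R-1 with input rows o..o+R-1. In hardware
    those R PEs would pass partial sums along the array; here the partial-sum
    rows are simply added element-wise.
    """
    R = len(filt)
    out = []
    for o in range(len(ifmap) - R + 1):
        psums = [pe_row_conv(filt[r], ifmap[o + r]) for r in range(R)]
        out.append([sum(col) for col in zip(*psums)])
    return out


def direct_conv2d(filt, ifmap):
    """Reference: nested-loop 2-D valid convolution (cross-correlation, as in CNNs)."""
    R, S = len(filt), len(filt[0])
    H, W = len(ifmap), len(ifmap[0])
    return [
        [sum(filt[r][s] * ifmap[y + r][x + s] for r in range(R) for s in range(S))
         for x in range(W - S + 1)]
        for y in range(H - R + 1)
    ]


filt = [[1, 0], [0, -1]]
ifmap = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
assert row_stationary_conv2d(filt, ifmap) == direct_conv2d(filt, ifmap)
print(row_stationary_conv2d(filt, ifmap))  # [[-4, -4], [-4, -4]]
```

The payoff in hardware is reuse: a filter row is fetched once into a PE's register file and used for every window position along the input row.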
Memory Hierarchy:
- PE register files: Distributed LUTRAM (14 entries/PE)
- Weight buffer: 128KB BRAM dual-port
- Input buffer: 128KB BRAM 4-bank
- Output buffer: 160KB BRAM 8-bank
- Instruction memory: 12KB BRAM
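Splitting a buffer into banks lets several readers access it in the same cycle, provided their accesses land in different banks. The sketch below computes the per-bank capacity implied by the sizes above and shows a low-order interleaved address map. The interleaving scheme, word addressing, and KB = 1024 bytes are assumptions for illustration and may differ from the core's actual address map.

```python
KB = 1024  # assumption: binary kilobytes


def bank_capacity(total_bytes: int, banks: int) -> int:
    """Bytes per bank when a buffer is split evenly across `banks` banks."""
    if total_bytes % banks:
        raise ValueError("buffer does not divide evenly into banks")
    return total_bytes // banks


def interleave(word_addr: int, banks: int) -> tuple:
    """Low-order interleaving (assumed): consecutive words rotate across banks.

    Returns (bank index, word offset within that bank).
    """
    return word_addr % banks, word_addr // banks


print(bank_capacity(128 * KB, 4) // KB, "KB per input-buffer bank")    # 32
print(bank_capacity(160 * KB, 8) // KB, "KB per output-buffer bank")   # 20

# Four consecutive words of a 4-bank buffer map to four different banks,
# so they can be fetched in parallel without a bank conflict.
print([interleave(a, 4) for a in range(100, 104)])
# [(0, 25), (1, 25), (2, 25), (3, 25)]
```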
Control Path:
- 32-bit uniform ISA (13 instruction types)
- 3-level nested hardware loop controller
- AXI4-Lite register interface (control)
- AXI4 master DMA engine (DDR3 transfers)
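A hardware loop controller removes per-iteration branch instructions: the loop bounds are loaded once, and a chain of counters steps through the nest, rippling a carry outward whenever an inner counter wraps. The Python model below sketches that behavior for three levels. The `bounds` argument and the one-index-tuple-per-step timing are hypothetical and do not describe the core's register-level interface.

```python
from typing import Iterator, Tuple


def loop_controller(bounds: Tuple[int, int, int]) -> Iterator[Tuple[int, int, int]]:
    """Model of a 3-level nested loop counter (hypothetical interface).

    Each step emits the current (outer, middle, inner) indices, then increments
    the innermost counter and ripples a carry outward when a counter wraps.
    The nest finishes when the outermost counter wraps.
    """
    if len(bounds) != 3 or min(bounds) <= 0:
        return  # malformed or zero-trip nest: no iterations
    idx = [0, 0, 0]
    while True:
        yield (idx[0], idx[1], idx[2])
        level = 2
        while level >= 0:
            idx[level] += 1
            if idx[level] < bounds[level]:
                break          # no carry: this counter is still in range
            idx[level] = 0     # wrap and carry into the next-outer counter
            level -= 1
        if level < 0:
            return             # outermost counter wrapped: nest complete


steps = list(loop_controller((2, 2, 3)))
print(len(steps))     # 12
print(steps[:4])      # [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 0)]
```

The visiting order matches three ordinary nested `for` loops, which is the property a hardware implementation has to preserve.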
Resource Utilization (Zedboard xc7z020):
- DSP48E1: 197/220 (90%)
- LUT: 8,472/53,200 (16%)
- FF: 34,897/106,400 (33%)
- BRAM: 37/140 (26%)
rtl/
├── pe/ # processing element array
├── memory/ # global buffers + pe array controller
├── compute/ # activation + pooling units
├── control/ # instruction fetch + decode + execution
├── dma/ # axi4 master dma engine
├── accsys_top.v # top-level axi ip core
└── target_config.vh # multi-target device configuration
tb/
├── unit/ # unit tests (verilator + xsim)
└── integration/ # integration tests
sw/
├── driver/ # c driver for arm (accsys.h, accsys.c)
├── pynq/ # pynq python driver and demos
├── demo/ # c demo applications
└── test/ # hardware test programs
vivado/
└── scripts/ # tcl synthesis + simulation scripts
docs/
├── specs/ # architecture specifications
├── research/ # architecture research and analysis
└── *.md # verification, synthesis, known issues
# all unit tests (verilator)
just sim-all
# specific module groups
just sim-pe # pe array + components
just sim-buffers # memory buffers
just sim-control # instruction pipeline
just sim-compute # activation + pooling
just sim-dma # dma engine
# integration test (vivado xsim project mode)
orb -s "just sim-vivado-accsys-top"
# view waveforms
just waves-<module>  # e.g., waves-pe-array
# single pe synthesis
orb -s "just synth-pe"
# full system synthesis
orb -s "just synth-accsys-top"
# place and route
orb -s "vivado -mode batch -source scripts/place_route_accsys_top.tcl"
Hardware Deployment (ready now)
- Deploy bitstream to Zedboard via JTAG: just program-fpga
- Build Vitis test application using the .xsa
- Run PYNQ demos using the .hwh: uv run demo_basic.py
Software Stack (optional)
- Model compiler (PyTorch/TensorFlow -> instruction streams)
- Performance profiling and optimization
Scaling (optional)
- 90 MHz timing closure optimization
- 16x16 array on xc7z045 (900 DSPs)
- docs/specs/01_architecture_specification.md - hardware architecture
- docs/specs/02_isa_specification.md - instruction set
- docs/specs/03_memory_hierarchy_specification.md - memory system
- docs/specs/04_ps_pl_integration_specification.md - arm-fpga interface
- docs/specs/05_verification_specification.md - testing strategy
- docs/specs/06_implementation_roadmap.md - implementation plan
- CHANGELOG.md - detailed development history
- docs/verification_test_coverage.md - test coverage report
- docs/hardware_deployment_requirements.md - ps integration guide
- docs/VIVADO_VERIFICATION.md - synthesis verification checklist