Bavan2002/SIMD_CNN_Processor

CNN Accelerator RTL Implementation

CNN accelerator IP core for Zynq-7000 series FPGAs with AXI4 interface.

Status: SoC Build Complete

14x14 PE array @ 80 MHz: SoC build complete on ZedBoard (2025-12-18)

  • Unit tests: 680+ passing (Verilator + Vivado XSim)
  • Hardware tests: 17/17 passing on ZedBoard via JTAG
  • SoC post-P&R: WNS +1.025 ns @ 80 MHz (timing met)
  • Performance: 31.36 GOP/s INT8 (15.68 GMAC/s)
  • Outputs: bitstream (.bit), hardware platform (.xsa), PYNQ handoff (.hwh)
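
The positive slack reported above can be turned into a rough upper bound on clock frequency. The sketch below uses the common first-order estimate (critical path = clock period minus WNS). The function name is illustrative, and the result is only an estimate: a real frequency increase still needs a fresh place-and-route run.

```python
def estimated_fmax_mhz(clock_mhz: float, wns_ns: float) -> float:
    """First-order Fmax estimate from post-P&R worst negative slack."""
    period_ns = 1000.0 / clock_mhz          # 80 MHz -> 12.5 ns
    critical_path_ns = period_ns - wns_ns   # delay of the slowest path
    return 1000.0 / critical_path_ns


# Figures taken from the status list: 80 MHz constraint, WNS +1.025 ns
fmax = estimated_fmax_mhz(80.0, 1.025)
print(f"estimated Fmax: {fmax:.1f} MHz")
```

With the reported numbers this lands a little above 87 MHz, which suggests the current build has some headroom but would not meet a 90 MHz constraint without further optimization.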

Architecture

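Multi-target Support (via rtl/target_config.vh):

| Target     | PE array | DSPs (used/available) | Peak GOP/s |
|------------|----------|-----------------------|------------|
| Zybo Z7-10 | 7x7      | 49/80                 | 9.8        |
| Zybo Z7-20 | 10x10    | 100/220               | 20.0       |
| ZedBoard   | 14x14    | 196/220               | 39.2       |
| ZCU104     | 24x24    | 576/1728              | 230.4      |

Peak GOP/s counts 2 operations per MAC. The listed values are consistent with a 100 MHz clock (200 MHz for ZCU104); the ZedBoard build that closed timing at 80 MHz delivers 31.36 GOP/s.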
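
The throughput figures follow from PE count, clock frequency, and 2 operations (multiply + add) per MAC. The short sketch below reproduces them. The 100 MHz and 200 MHz clocks are an assumption inferred from the table values, not something the repository states here, and the function name is illustrative.

```python
def peak_gops(rows: int, cols: int, clock_mhz: float, ops_per_mac: int = 2) -> float:
    """Peak throughput in GOP/s for a rows x cols array with one MAC per PE per cycle."""
    return rows * cols * ops_per_mac * clock_mhz / 1000.0


# (target, array dimension, assumed clock in MHz)
targets = [
    ("Zybo Z7-10", 7, 100.0),
    ("Zybo Z7-20", 10, 100.0),
    ("Zedboard", 14, 100.0),
    ("ZCU104", 24, 200.0),
]
for name, n, mhz in targets:
    print(f"{name:11s} {n}x{n} @ {mhz:.0f} MHz: {peak_gops(n, n, mhz):.1f} GOP/s")

# The implemented build runs at 80 MHz rather than the assumed 100 MHz
achieved = peak_gops(14, 14, 80.0)
print(f"Zedboard 14x14 @ 80 MHz: {achieved:.2f} GOP/s")
```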

Processing Element Array:

  • Row-stationary dataflow
  • 1 DSP48E1 per PE (single-DSP MAC)
  • 3-cycle pipeline latency
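
To illustrate the general row-stationary idea, here is a small behavioral sketch in Python: each PE computes a 1-D convolution of one filter row against one input row, and partial sums from the PEs holding different filter rows are accumulated to form one output row. This is a functional model only. It does not model the 3-cycle pipeline, and the actual row-to-PE mapping used by the RTL is not described here, so the function names and the mapping are assumptions.

```python
def pe_row_conv(filter_row, input_row):
    """One PE: slide a filter row across an input row (1-D correlation)."""
    k = len(filter_row)
    out_w = len(input_row) - k + 1
    return [sum(filter_row[j] * input_row[x + j] for j in range(k))
            for x in range(out_w)]


def row_stationary_conv2d(image, kernel):
    """2-D convolution (stride 1, no padding) built from per-PE row convolutions."""
    kh = len(kernel)
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - len(kernel[0]) + 1
    output = []
    for oy in range(out_h):          # one group of PEs per output row
        psum = [0] * out_w
        for ky in range(kh):         # one PE per filter row; psums are accumulated
            partial = pe_row_conv(kernel[ky], image[oy + ky])
            psum = [a + b for a, b in zip(psum, partial)]
        output.append(psum)
    return output


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0], [0, 1]]
result = row_stationary_conv2d(image, kernel)
print(result)
```

In this scheme a filter row stays resident in a PE while input rows stream past it, which is why a small per-PE register file is enough to hold the working set.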

Memory Hierarchy:

  • PE register files: Distributed LUTRAM (14 entries/PE)
  • Weight buffer: 128KB BRAM dual-port
  • Input buffer: 128KB BRAM 4-bank
  • Output buffer: 160KB BRAM 8-bank
  • Instruction memory: 12KB BRAM
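
As a rough aid for reasoning about the banked buffers, the sketch below totals the listed capacities, derives the per-bank size, and shows one possible way to split a word address into bank and offset. Low-order (word-interleaved) banking is an assumption made for illustration; the actual address mapping is defined by the RTL and the memory hierarchy specification.

```python
# Capacities and bank counts as listed in the memory hierarchy
BUFFER_KB = {"weight": 128, "input": 128, "output": 160, "instruction": 12}
BANKS = {"input": 4, "output": 8}

total_kb = sum(BUFFER_KB.values())
per_bank_kb = {name: BUFFER_KB[name] // banks for name, banks in BANKS.items()}


def bank_and_offset(word_addr: int, num_banks: int):
    """Hypothetical low-order interleaving: consecutive words hit consecutive banks."""
    return word_addr % num_banks, word_addr // num_banks


print(f"total on-chip buffer capacity: {total_kb} KB")
print(f"per-bank size: {per_bank_kb}")
print("output word 13 ->", bank_and_offset(13, BANKS["output"]))
```

With interleaving of this kind, a burst of consecutive words is spread across all banks, so several words can be read or written in the same cycle.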

Control Path:

  • 32-bit uniform ISA (13 instruction types)
  • 3-level nested hardware loop controller
  • AXI4-Lite register interface (control)
  • AXI4 master DMA engine (DDR3 transfers)
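
A 3-level nested hardware loop controller can be thought of as three chained counters, where each counter wraps to zero and bumps the next level up. The Python generator below is a behavioral sketch of that idea only. The counter names, the zero-trip handling, and the iteration order are assumptions, not the repository's actual controller or ISA semantics.

```python
from typing import Iterator, Tuple


def hw_loop_indices(outer: int, middle: int, inner: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (i, j, k) the way three chained wrap-around counters would."""
    if min(outer, middle, inner) <= 0:
        return                      # assumed: a zero trip count executes nothing
    i = j = k = 0
    while True:
        yield (i, j, k)
        k += 1                      # innermost counter advances every iteration
        if k == inner:
            k = 0
            j += 1                  # inner wrap increments the middle counter
            if j == middle:
                j = 0
                i += 1              # middle wrap increments the outer counter
                if i == outer:
                    return          # all three levels exhausted


sequence = list(hw_loop_indices(2, 1, 3))
print(sequence)
```

Keeping the loop state in hardware counters like this means the instruction stream does not need explicit compare-and-branch instructions for each iteration.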

Resource Utilization (ZedBoard xc7z020):

  • DSP48E1: 197/220 (90%)
  • LUT: 8,472/53,200 (16%)
  • FF: 34,897/106,400 (33%)
  • BRAM: 37/140 (26%)
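
The percentages above can be re-derived from the used/available counts, which is handy when comparing against a fresh Vivado utilization report. A minimal sketch, with the figures copied from the list above:

```python
# (used, available) on the xc7z020, as listed above
USAGE = {
    "DSP48E1": (197, 220),
    "LUT": (8472, 53200),
    "FF": (34897, 106400),
    "BRAM": (37, 140),
}


def utilization_pct(used: int, available: int) -> float:
    """Utilization as a percentage of the available resources."""
    return 100.0 * used / available


rounded = {name: round(utilization_pct(u, a)) for name, (u, a) in USAGE.items()}
for name, pct in rounded.items():
    used, available = USAGE[name]
    print(f"{name:8s} {used}/{available} ({pct}%)")
```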

Project Structure

rtl/
├── pe/              # processing element array
├── memory/          # global buffers + pe array controller
├── compute/         # activation + pooling units
├── control/         # instruction fetch + decode + execution
├── dma/             # axi4 master dma engine
├── accsys_top.v     # top-level axi ip core
└── target_config.vh # multi-target device configuration

tb/
├── unit/            # unit tests (verilator + xsim)
└── integration/     # integration tests

sw/
├── driver/          # c driver for arm (accsys.h, accsys.c)
├── pynq/            # pynq python driver and demos
├── demo/            # c demo applications
└── test/            # hardware test programs

vivado/
└── scripts/         # tcl synthesis + simulation scripts

docs/
├── specs/           # architecture specifications
├── research/        # architecture research and analysis
└── *.md             # verification, synthesis, known issues

Running Tests

# all unit tests (verilator)
just sim-all

# specific module groups
just sim-pe          # pe array + components
just sim-buffers     # memory buffers
just sim-control     # instruction pipeline
just sim-compute     # activation + pooling
just sim-dma         # dma engine

# integration test (vivado xsim project mode)
orb -s "just sim-vivado-accsys-top"

# view waveforms
just waves-<module>  # e.g., waves-pe-array

Synthesis

# single pe synthesis
orb -s "just synth-pe"

# full system synthesis
orb -s "just synth-accsys-top"

# place and route
orb -s "vivado -mode batch -source scripts/place_route_accsys_top.tcl"

Next Steps

Hardware Deployment (ready now)

  1. Deploy bitstream to ZedBoard via JTAG: just program-fpga
  2. Build Vitis test application using .xsa
  3. Run PYNQ demos using .hwh: uv run demo_basic.py

Software Stack (optional)

  • Model compiler (PyTorch/TensorFlow -> instruction streams)
  • Performance profiling and optimization

Scaling (optional)

  • 90 MHz timing closure optimization
  • 16x16 array on xc7z045 (900 DSPs)
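
To put the two scaling options in perspective, the sketch below estimates how much the critical path would need to shrink for 90 MHz and what each option would yield in peak throughput. It reuses the reported 80 MHz / WNS +1.025 ns result and the one-DSP-per-PE design. The 80 MHz clock for the 16x16 array is an assumption, and all names are illustrative.

```python
def required_path_cut_ns(current_mhz: float, wns_ns: float, target_mhz: float) -> float:
    """How much the current critical path must shrink to meet the target clock."""
    critical_path_ns = 1000.0 / current_mhz - wns_ns
    return critical_path_ns - 1000.0 / target_mhz


def peak_gops(n: int, clock_mhz: float) -> float:
    """Peak GOP/s for an n x n array, 2 ops per MAC, one MAC per PE per cycle."""
    return n * n * 2 * clock_mhz / 1000.0


cut_ns = required_path_cut_ns(80.0, 1.025, 90.0)
gops_90mhz = peak_gops(14, 90.0)      # same 14x14 array, faster clock
gops_16x16 = peak_gops(16, 80.0)      # larger array, assumed 80 MHz clock
dsp_needed = 16 * 16                  # one DSP per PE
fits_xc7z045 = dsp_needed <= 900      # DSP count quoted above for xc7z045

print(f"critical path must shrink by about {cut_ns:.3f} ns for 90 MHz")
print(f"14x14 @ 90 MHz: {gops_90mhz:.2f} GOP/s")
print(f"16x16 @ 80 MHz: {gops_16x16:.2f} GOP/s ({dsp_needed} DSPs, fits: {fits_xc7z045})")
```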

Specifications

  • docs/specs/01_architecture_specification.md - hardware architecture
  • docs/specs/02_isa_specification.md - instruction set
  • docs/specs/03_memory_hierarchy_specification.md - memory system
  • docs/specs/04_ps_pl_integration_specification.md - arm-fpga interface
  • docs/specs/05_verification_specification.md - testing strategy
  • docs/specs/06_implementation_roadmap.md - implementation plan

Documentation

  • CHANGELOG.md - detailed development history
  • docs/verification_test_coverage.md - test coverage report
  • docs/hardware_deployment_requirements.md - ps integration guide
  • docs/VIVADO_VERIFICATION.md - synthesis verification checklist
