A synthesizable direct-mapped cache controller implemented in VHDL and targeting a Xilinx FPGA. The design sits between a simulated CPU and a simulated SDRAM, managing cache hits, read misses, and dirty-line writebacks through a 7-state FSM. A Xilinx ChipScope ILA and VIO are included for real-time on-board debug.
The top level connects three functional blocks: CPU_gen, cache, and SDRAM_controller.
Top-level data flow:
ADD_in[15:0] WR_RD_in cs Din_0[7:0]
│ │ │ │
└────────────┴──────┴───────┘
│
┌─────────┐ ┌──────▼──────┐ ┌──────────────────┐
│ CPU_gen │─────────►│ cache │─────────►│ SDRAM_controller │
│ │◄─────────│ │◄─────────│ │
└─────────┘ └─────────────┘ └──────────────────┘
RDY, Dout_1[7:0] ADD_out[15:0] Din_1[7:0]
WR_RD_out
MSTRB
Inside cache — control path:
ADD_in[15:0]
│
├─ [15:8] tag ──────┐
├─ [7:5] index ───►│ cache_controller
└─ [4:0] offset ───│ ┌─────────────┐ ┌───────────┐
│ │ cache_reg │◄──►│ fsm │
│ │ 8×10-bit │ │ 7-state │
│ │ tag/vld/dty │ │ machine │
└─►└─────────────┘ └───────────┘
│ │
stored_tag DIN_sel, DOUT_sel,
valid, dirty wen, cache_addr,
ADD_out, MSTRB, rdy
Inside cache — data path:
Din_0 (from CPU) ─┐
├─► mux2to1 ──► cache_sram ──► demux2to1 ─┬─► Dout_0 (to SDRAM)
Din_1 (from SDRAM)─┘ (DIN_sel) (8-bit wide) (DOUT_sel) └─► Dout_1 (to CPU)
project1 (top-level)
├── sys_icon [icon] — ChipScope ICON (debug only)
├── sys_ila [ila] — ChipScope ILA 100-bit capture (debug only)
├── sys_vio [vio] — ChipScope VIO 26-bit async output (debug only)
├── Inst_CPU_gen [CPU_gen] — Behavioral simulated CPU
├── Inst_SDRAM_controller [SDRAM_controller] — Behavioral simulated SDRAM
└── Inst_cache [cache]
├── mux [mux2to1] — Selects write data: CPU (Din_0) or SDRAM (Din_1)
├── demux [demux2to1] — Routes read data: to SDRAM (Dout_0) or CPU (Dout_1)
├── Inst_cache_sram [cache_sram] — SRAM storage for cached data
└── Inst_controller [cache_controller]
├── inst_cacheReg [cache_reg] — 8-entry × 10-bit tag/valid/dirty register file
└── inst_fsm [fsm] — 7-state cache control FSM
All 16-bit CPU addresses are split inside cache_controller:
ADD_in[15:0]
├── [15:8] → tag (8 bits) — compared against stored tag in cache_reg
├── [7:5] → index (3 bits) — selects one of 8 cache lines
└── [4:0] → offset (5 bits) — byte offset within a 32-byte cache line
cache_reg entry (10 bits per line):
├── [9:2] → stored tag (8 bits)
├── [1] → dirty flag
└── [0] → valid flag
A cache hit is detected when: stored_tag == incoming_tag AND valid == 1.
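
The address split and hit condition above can be sketched in VHDL as follows. This is a minimal illustration, not the project's cache_controller source: the entity name `hit_detect_sketch` and the port `reg_entry` are hypothetical, and the sketch assumes the cache_reg entry for the selected index is already available as a 10-bit vector.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity for illustration; only ADD_in is a name from the design.
entity hit_detect_sketch is
  port (
    ADD_in    : in  std_logic_vector(15 downto 0); -- CPU address
    reg_entry : in  std_logic_vector(9 downto 0);  -- assumed: cache_reg entry for this index
    tag       : out std_logic_vector(7 downto 0);
    index     : out std_logic_vector(2 downto 0);
    offset    : out std_logic_vector(4 downto 0);
    hit       : out std_logic
  );
end entity hit_detect_sketch;

architecture rtl of hit_detect_sketch is
  signal stored_tag : std_logic_vector(7 downto 0);
  signal valid      : std_logic;
begin
  -- Address fields: [15:8] tag, [7:5] index, [4:0] offset
  tag    <= ADD_in(15 downto 8);
  index  <= ADD_in(7 downto 5);
  offset <= ADD_in(4 downto 0);

  -- cache_reg entry layout: [9:2] stored tag, [1] dirty, [0] valid
  stored_tag <= reg_entry(9 downto 2);
  valid      <= reg_entry(0);

  -- Hit: stored tag matches the incoming tag AND the line is valid
  hit <= '1' when (stored_tag = ADD_in(15 downto 8)) and (valid = '1') else '0';
end architecture rtl;
```

For example, address 0xA5C3 splits into tag 0xA5, index 110, and offset 00011; it hits only if line 6 holds tag 0xA5 with its valid bit set.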
The fsm entity implements a 7-state Moore/Mealy machine:
┌─────────────────────────────────────────────────┐
│ │
┌──▼──┐ cs=1 ┌───────┐ │
──► │ IDLE├──────────►│ CHECK │ │
└─────┘ └───┬───┘ │
hit=1,wr=1 │ hit=1,wr=0 │
┌───────────┘ └──────────┐ │
▼ ▼ │
┌───────────┐ ┌──────────┐ │
│ WRITE_HIT │ │ READ_HIT │ │
└─────┬─────┘ └────┬─────┘ │
│ │ │
└──────────┬────────────┘ │
│ │
▼ miss, dirty=0 │
┌────────┐ ─────────────► ┌──────────────┐
│ (miss) │ │ MISS_DIRTY_0 │
└────────┘ ─────────────► └──────┬───────┘
miss, dirty=1 │
┌──────────────┐ │
│ MISS_DIRTY_1 │ │
└──────┬───────┘ │
│ │
┌──────▼───────────────┐ │
│ WAIT_FOR_MEMORY │ │
│ (31 clock cycles) │ │
└──────────┬────────────┘ │
└───► CHECK ─────┘
| State | Code | Action |
|---|---|---|
| IDLE | 000 | Assert rdy=1; wait for cs=1 |
| CHECK | 001 | Compare tag; evaluate hit/dirty |
| WRITE_HIT | 010 | Write to SRAM (wen=1); update tag reg; assert DoutSel=1 |
| READ_HIT | 011 | Read from SRAM; assert DoutSel=0 |
| MISS_DIRTY_0 | 100 | Assert MSTRB for clean miss — fetch from SDRAM |
| MISS_DIRTY_1 | 101 | Assert MSTRB + WR_RDout — write dirty line back to SDRAM, then fetch |
| WAIT_FOR_MEMORY | — | Hold for 31 cycles (simulating SDRAM latency), then return to CHECK |
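
The fixed wait in WAIT_FOR_MEMORY could be implemented with a small counter along these lines. This is a sketch under assumptions, not the code in fsm.vhd: the entity name, the `waiting` and `done` signals, and the exact cycle in which the FSM leaves the state are all hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical helper; the real FSM may fold this counter into its state process.
entity wait_counter_sketch is
  port (
    clk     : in  std_logic;
    waiting : in  std_logic;  -- assumed: '1' while the FSM is in WAIT_FOR_MEMORY
    done    : out std_logic   -- '1' during the 31st consecutive waiting cycle
  );
end entity wait_counter_sketch;

architecture rtl of wait_counter_sketch is
  constant WAIT_CYCLES : natural := 31;  -- hard-coded, as in the design
  signal count : unsigned(4 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if waiting = '0' then
        count <= (others => '0');        -- re-arm whenever the FSM is elsewhere
      elsif count /= WAIT_CYCLES - 1 then
        count <= count + 1;              -- count cycles spent waiting
      end if;
    end if;
  end process;

  done <= '1' when (waiting = '1') and (count = WAIT_CYCLES - 1) else '0';
end architecture rtl;
```

The FSM would return to CHECK when `done` is high, which re-evaluates the tag compare after the line has been filled.
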
Two selectors controlled by the FSM route 8-bit data through the SRAM:
- mux2to1 (DIN_sel): selects what is written into the SRAM. 0 = CPU write data, 1 = SDRAM read data (cache fill).
- demux2to1 (DOUT_sel): routes SRAM read data. 0 → Dout_0 (to SDRAM for writeback), 1 → Dout_1 (to CPU for read hit).
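
The two selectors can be expressed as concurrent assignments. The sketch below combines both in one hypothetical entity for brevity (the project keeps them in mux2to1.vhd and demux2to1.vhd); `sram_d` and `sram_q` are assumed names for the SRAM write and read data, and driving the unselected demux output to zero is an assumption, since the source does not state what the idle output carries.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical combined view of the data-path selectors.
entity datapath_select_sketch is
  port (
    DIN_sel  : in  std_logic;
    DOUT_sel : in  std_logic;
    Din_0    : in  std_logic_vector(7 downto 0); -- write data from CPU
    Din_1    : in  std_logic_vector(7 downto 0); -- fill data from SDRAM
    sram_q   : in  std_logic_vector(7 downto 0); -- assumed: SRAM read data
    sram_d   : out std_logic_vector(7 downto 0); -- assumed: SRAM write data
    Dout_0   : out std_logic_vector(7 downto 0); -- to SDRAM (writeback)
    Dout_1   : out std_logic_vector(7 downto 0)  -- to CPU (read hit)
  );
end entity datapath_select_sketch;

architecture rtl of datapath_select_sketch is
begin
  -- mux2to1: DIN_sel = 0 takes CPU data, 1 takes SDRAM data
  sram_d <= Din_0 when DIN_sel = '0' else Din_1;

  -- demux2to1: DOUT_sel = 0 routes to Dout_0, 1 routes to Dout_1
  -- (assumption: the unselected output is driven to zero)
  Dout_0 <= sram_q when DOUT_sel = '0' else (others => '0');
  Dout_1 <= sram_q when DOUT_sel = '1' else (others => '0');
end architecture rtl;
```
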
| Port | Dir | Width | Description |
|---|---|---|---|
| clk | in | 1 | System clock |
| cpu_trig | in | 1 | Triggers CPU_gen to issue a transaction |
| cpu_rst | in | 1 | Resets CPU_gen |
| mode | in | 1 | Passed to CPU_gen (purpose not defined in source) |
| rdy | out | 1 | Cache ready / transaction complete |
| state | out | 3 | FSM state code (debug) |
| addr_cpu | out | 16 | CPU-generated address (debug) |
| din_cpu / dout_cpu | out | 8 | Data to/from CPU (debug) |
| addr_cache | out | 16 | Cache-to-SDRAM address (debug) |
| din_cache / dout_cache | out | 8 | Data to/from SDRAM (debug) |
| memstrb | out | 1 | Memory strobe (debug) |
| Port | Dir | Width | Description |
|---|---|---|---|
| ADD_in | in | 16 | Address from CPU |
| WR_RD_in | in | 1 | Write(1) / Read(0) from CPU |
| cs | in | 1 | Chip select; initiates a transaction |
| Din_0 | in | 8 | Write data from CPU |
| Din_1 | in | 8 | Read data from SDRAM (cache fill) |
| ADD_out | out | 16 | Address forwarded to SDRAM |
| WR_RD_out | out | 1 | Direction forwarded to SDRAM |
| MSTRB | out | 1 | Pulse to initiate SDRAM access |
| Dout_0 | out | 8 | Data to SDRAM (dirty writeback) |
| Dout_1 | out | 8 | Data to CPU (read result) |
| RDY | out | 1 | Transaction complete |
| File | Description |
|---|---|
| project1.vhd | Top-level: connects CPU_gen, cache, SDRAM_controller, ChipScope |
| cache.vhd | Cache structural wrapper: instantiates controller, SRAM, mux, demux |
| cache_controller.vhd | Controller: splits address, detects hit, drives all control signals |
| fsm.vhd | 7-state cache FSM |
| cache_reg.vhd | 8-entry × 10-bit tag/valid/dirty register file |
| cache_sram.vhd | SRAM data storage (instantiates Xilinx BRAM or behavioural) |
| CPU_gen.vhd | Behavioural simulated CPU: generates address/data patterns |
| SDRAM_controller.vhd | Behavioural simulated SDRAM |
| mux2to1.vhd | 8-bit 2-to-1 MUX |
| demux2to1.vhd | 8-bit 1-to-2 DEMUX |
| decoder.vhd | Address decoder (utility) |
| ipcore_dir/bram.vhd | Xilinx block RAM IP (auto-generated) |
| ipcore_dir/ila.vhd | ChipScope ILA (auto-generated) |
| ipcore_dir/icon.vhd | ChipScope ICON (auto-generated) |
| ipcore_dir/vio.vhd | ChipScope VIO (auto-generated) |
Target device: xc3s500e-5-fg320 (Xilinx Spartan-3E)
| Resource | Used | Available | Utilization |
|---|---|---|---|
| Slices | 439 | 4,656 | 9% |
| Slice Flip-Flops | 513 | 9,312 | 5% |
| 4-input LUTs | 486 | 9,312 | 5% |
| RAMB16s | 5 | 20 | 25% |
- Synthesis estimate: 89.268 MHz (11.202 ns)
- Note: No UCF clock period constraint applied — Trace analyzed zero user-logic timing paths.
Prerequisites: Xilinx ISE 14.7. Vivado does not support Spartan-3E devices, so opening the .vhd files there requires manual project setup and retargeting to a supported device.
1. Open the .xise project file in ISE Design Suite
2. Simulate: right-click testbench → Simulate Behavioral Model (ISim)
3. Synthesize: double-click Synthesize-XST
4. Implement: double-click Implement Design
5. Program: double-click Generate Programming File → iMPACT → program the .bit file
Pin assignments: see the .ucf constraint file in src/
- Direct-mapped cache (simplest associativity) — fast hit path with no associativity lookup, but susceptible to conflict misses with adversarial access patterns that alias to the same index.
- 7-state FSM (IDLE → CHECK → WRITE_HIT / READ_HIT / MISS_DIRTY_0 / MISS_DIRTY_1 → WAIT_FOR_MEMORY) — fine-grained control of SDRAM latency states; adding more states later is straightforward.
- Fixed 31-cycle SDRAM wait modelled as a counter: a simple fixed-latency stand-in for SDRAM access time. The value is hard-coded rather than parameterized.
- 25% BRAM utilization reflects realistic cache SRAM storage using Xilinx block RAM rather than distributed LUT RAM, giving predictable timing and higher density.
- No write-allocate on write-miss (inferred from FSM): avoids fetching a full line from SDRAM on a write miss, at the cost of a read miss if the written line is read later.
- Parameterize cache size, line size, and SDRAM latency via VHDL generics so the design can be reused across different memory hierarchies.
- Upgrade from direct-mapped to 2-way or 4-way set-associative with LRU replacement to reduce conflict miss rates.
- Implement write-allocate on write-miss to improve locality for write-heavy workloads.
- Add a formal UCF clock period constraint and close timing at a target frequency (currently unconstrained).
- Connect a performance counter to measure and report real-time hit/miss rate via ChipScope VIO.
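
As a rough sketch of the first item above, the address split could be driven by generics instead of fixed bit ranges. Everything here is hypothetical: the entity name and the generic names (`ADDR_WIDTH`, `INDEX_BITS`, `OFFSET_BITS`, `MEM_WAIT_CYCLES`) do not exist in the current sources, and the defaults simply mirror the present 16-bit address, 8-line, 32-byte-line, 31-cycle configuration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical parameterized address splitter; defaults match the current design.
entity cache_params_sketch is
  generic (
    ADDR_WIDTH      : positive := 16;  -- CPU address width
    INDEX_BITS      : positive := 3;   -- 2**3 = 8 cache lines
    OFFSET_BITS     : positive := 5;   -- 2**5 = 32 bytes per line
    MEM_WAIT_CYCLES : positive := 31   -- would replace the hard-coded wait (unused here)
  );
  port (
    ADD_in : in  std_logic_vector(ADDR_WIDTH - 1 downto 0);
    tag    : out std_logic_vector(ADDR_WIDTH - INDEX_BITS - OFFSET_BITS - 1 downto 0);
    index  : out std_logic_vector(INDEX_BITS - 1 downto 0);
    offset : out std_logic_vector(OFFSET_BITS - 1 downto 0)
  );
end entity cache_params_sketch;

architecture rtl of cache_params_sketch is
begin
  -- Field boundaries are derived from the generics rather than fixed literals
  tag    <= ADD_in(ADDR_WIDTH - 1 downto INDEX_BITS + OFFSET_BITS);
  index  <= ADD_in(INDEX_BITS + OFFSET_BITS - 1 downto OFFSET_BITS);
  offset <= ADD_in(OFFSET_BITS - 1 downto 0);
end architecture rtl;
```

With the defaults, the fields resolve to the same [15:8], [7:5], and [4:0] ranges used today; the tag store and data SRAM depth would need matching generics for the change to be complete.
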
VHDL · memory hierarchy · cache controller FSM · direct-mapped cache · SDRAM interface · block RAM · Xilinx ISE · ChipScope ILA/VIO · Spartan-3E FPGA
MIT License — see LICENSE for details.