Krisham17/cache-controller-vhdl


Direct-Mapped Cache Controller — COE758 Tutorial 1 / Project 1

A synthesizable direct-mapped cache controller implemented in VHDL and targeting a Xilinx FPGA. The design sits between a simulated CPU and a simulated SDRAM, managing cache hits, read misses, and dirty-line writebacks through a 7-state FSM. A Xilinx ChipScope ILA and VIO are included for real-time on-board debug.

System Architecture

The top level connects three functional blocks, with the cache sitting between the simulated CPU and the simulated SDRAM.

Top-level data flow:

                   ADD_in[15:0]  WR_RD_in  cs  Din_0[7:0]
                        │            │      │       │
                        └────────────┴──────┴───────┘
                                          │
              ┌─────────┐          ┌──────▼──────┐          ┌──────────────────┐
              │ CPU_gen │─────────►│    cache    │─────────►│ SDRAM_controller │
              │         │◄─────────│             │◄─────────│                  │
              └─────────┘          └─────────────┘          └──────────────────┘
               RDY, Dout_1[7:0]    ADD_out[15:0]             Din_1[7:0]
                                   WR_RD_out
                                   MSTRB

Inside cache — control path:

ADD_in[15:0]
  │
  ├─ [15:8] tag ──────┐
  ├─ [7:5]  index ───►│  cache_controller
  └─ [4:0]  offset ───│  ┌─────────────┐    ┌───────────┐
                       │  │  cache_reg  │◄──►│    fsm    │
                       │  │ 8×10-bit    │    │ 7-state   │
                       │  │ tag/vld/dty │    │ machine   │
                       └─►└─────────────┘    └───────────┘
                              │                    │
                          stored_tag            DIN_sel, DOUT_sel,
                          valid, dirty          wen, cache_addr,
                                                ADD_out, MSTRB, rdy

Inside cache — data path:

Din_0 (from CPU)  ─┐
                   ├─► mux2to1 ──► cache_sram ──► demux2to1 ─┬─► Dout_0 (to SDRAM)
Din_1 (from SDRAM)─┘  (DIN_sel)   (8-bit wide)   (DOUT_sel) └─► Dout_1 (to CPU)

Module Hierarchy

project1  (top-level)
├── sys_icon     [icon]              — ChipScope ICON (debug only)
├── sys_ila      [ila]               — ChipScope ILA 100-bit capture (debug only)
├── sys_vio      [vio]               — ChipScope VIO 26-bit async output (debug only)
├── Inst_CPU_gen [CPU_gen]           — Behavioral simulated CPU
├── Inst_SDRAM_controller [SDRAM_controller]  — Behavioral simulated SDRAM
└── Inst_cache   [cache]
    ├── mux         [mux2to1]        — Selects write data: CPU (Din_0) or SDRAM (Din_1)
    ├── demux       [demux2to1]      — Routes read data: to SDRAM (Dout_0) or CPU (Dout_1)
    ├── Inst_cache_sram [cache_sram] — SRAM storage for cached data
    └── Inst_controller [cache_controller]
        ├── inst_cacheReg [cache_reg] — 8-entry × 10-bit tag/valid/dirty register file
        └── inst_fsm      [fsm]       — 7-state cache control FSM

Address Decomposition

All 16-bit CPU addresses are split inside cache_controller:

ADD_in[15:0]
  ├── [15:8]  →  tag    (8 bits)  — compared against stored tag in cache_reg
  ├── [7:5]   →  index  (3 bits)  — selects one of 8 cache lines
  └── [4:0]   →  offset (5 bits)  — byte offset within a 32-byte cache line

cache_reg entry (10 bits per line):
  ├── [9:2]  →  stored tag   (8 bits)
  ├── [1]    →  dirty flag
  └── [0]    →  valid flag

A cache hit is detected when: stored_tag == incoming_tag AND valid == 1.
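
The address split and hit test above can be modelled in a few lines of Python. This is a behavioural sketch only, not the VHDL in cache_controller.vhd: the function names (`split_address`, `unpack_entry`, `is_hit`) are made up for illustration, and the bit positions are taken directly from the two layouts listed above.

```python
def split_address(addr):
    """Split a 16-bit CPU address into (tag, index, offset)."""
    addr &= 0xFFFF
    tag = (addr >> 8) & 0xFF      # bits [15:8]
    index = (addr >> 5) & 0x7     # bits [7:5], selects one of 8 lines
    offset = addr & 0x1F          # bits [4:0], byte within the 32-byte line
    return tag, index, offset


def unpack_entry(entry):
    """Split a 10-bit cache_reg entry into (stored_tag, dirty, valid)."""
    stored_tag = (entry >> 2) & 0xFF   # bits [9:2]
    dirty = (entry >> 1) & 0x1         # bit [1]
    valid = entry & 0x1                # bit [0]
    return stored_tag, dirty, valid


def is_hit(addr, cache_reg):
    """Hit when the indexed line is valid and its stored tag matches."""
    tag, index, _ = split_address(addr)
    stored_tag, _, valid = unpack_entry(cache_reg[index])
    return valid == 1 and stored_tag == tag
```

For example, address 0x12A7 splits into tag 0x12, index 5, and offset 7, so it hits only if line 5 is valid and holds tag 0x12.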

Cache FSM

The fsm entity implements a 7-state Moore/Mealy machine:

          ┌─────────────────────────────────────────────────┐
          │                                                 │
       ┌──▼──┐   cs=1    ┌───────┐                         │
  ──►  │ IDLE├──────────►│ CHECK │                         │
       └─────┘           └───┬───┘                         │
                 hit=1,wr=1  │  hit=1,wr=0                 │
                 ┌───────────┘  └──────────┐               │
                 ▼                         ▼               │
          ┌───────────┐            ┌──────────┐            │
          │ WRITE_HIT │            │ READ_HIT │            │
          └─────┬─────┘            └────┬─────┘            │
                │                       │                  │
                └──────────┬────────────┘                  │
                           │                               │
                           ▼     miss, dirty=0             │
                      ┌────────┐ ─────────────► ┌──────────────┐
                      │ (miss) │                │ MISS_DIRTY_0 │
                      └────────┘ ─────────────► └──────┬───────┘
                                 miss, dirty=1          │
                              ┌──────────────┐          │
                              │ MISS_DIRTY_1 │          │
                              └──────┬───────┘          │
                                     │                  │
                              ┌──────▼───────────────┐  │
                              │   WAIT_FOR_MEMORY     │  │
                              │   (31 clock cycles)   │  │
                              └──────────┬────────────┘  │
                                         └───► CHECK ─────┘

| State | Code | Action |
| --- | --- | --- |
| IDLE | 000 | Assert rdy=1; wait for cs=1 |
| CHECK | 001 | Compare tag; evaluate hit/dirty |
| WRITE_HIT | 010 | Write to SRAM (wen=1); update tag reg; assert DoutSel=1 |
| READ_HIT | 011 | Read from SRAM; assert DoutSel=0 |
| MISS_DIRTY_0 | 100 | Assert MSTRB for clean miss; fetch from SDRAM |
| MISS_DIRTY_1 | 101 | Assert MSTRB + WR_RDout; write dirty line back to SDRAM, then fetch |
| WAIT_FOR_MEMORY | (not listed) | Hold for 31 cycles (simulating SDRAM latency), then return to CHECK |
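
The transitions described above can be sketched as a next-state function in Python. This is a rough behavioural model, not a translation of fsm.vhd. Several details are assumptions: the hit states are taken to return to IDLE after one cycle, each miss state is taken to last one cycle before WAIT_FOR_MEMORY, and the argument names (`cs`, `hit`, `wr`, `dirty`, `wait_count`) are illustrative. States are identified by name rather than by code.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    CHECK = "CHECK"
    WRITE_HIT = "WRITE_HIT"
    READ_HIT = "READ_HIT"
    MISS_DIRTY_0 = "MISS_DIRTY_0"
    MISS_DIRTY_1 = "MISS_DIRTY_1"
    WAIT_FOR_MEMORY = "WAIT_FOR_MEMORY"


WAIT_CYCLES = 31  # fixed wait taken from the state table


def next_state(state, cs, hit, wr, dirty, wait_count):
    """Return (next_state, next_wait_count) for one clock edge."""
    if state is State.IDLE:
        return (State.CHECK if cs else State.IDLE), 0
    if state is State.CHECK:
        if hit:
            return (State.WRITE_HIT if wr else State.READ_HIT), 0
        return (State.MISS_DIRTY_1 if dirty else State.MISS_DIRTY_0), 0
    if state in (State.WRITE_HIT, State.READ_HIT):
        return State.IDLE, 0              # assumption: a hit ends in IDLE
    if state in (State.MISS_DIRTY_0, State.MISS_DIRTY_1):
        return State.WAIT_FOR_MEMORY, 0   # assumption: one cycle per miss state
    # WAIT_FOR_MEMORY: hold for WAIT_CYCLES clock edges, then re-check the tag
    if wait_count + 1 >= WAIT_CYCLES:
        return State.CHECK, 0
    return State.WAIT_FOR_MEMORY, wait_count + 1
```

Stepping this model through a clean read miss gives IDLE, CHECK, MISS_DIRTY_0, 31 cycles of WAIT_FOR_MEMORY, then CHECK again, where the refilled line is expected to hit.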

Data Routing (MUX / DEMUX)

Two selectors controlled by the FSM route 8-bit data through the SRAM:

  • mux2to1 (DIN_sel): selects what gets written into the SRAM — 0 = CPU write data, 1 = SDRAM read data (cache fill).
  • demux2to1 (DOUT_sel): routes SRAM read data, with 0 → Dout_0 (to SDRAM for writeback) and 1 → Dout_1 (to CPU for read hit).
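
As a quick illustration of the two select lines, the routing can be modelled in Python as below. This is a sketch under one stated assumption: the unselected demux output is driven to 0 here, whereas the actual demux2to1.vhd may hold or otherwise drive it differently.

```python
def mux2to1(din_sel, din_0, din_1):
    """Return the byte written into cache_sram: CPU data (0) or SDRAM data (1)."""
    return (din_1 if din_sel else din_0) & 0xFF


def demux2to1(dout_sel, sram_out):
    """Return (Dout_0, Dout_1) for one SRAM read byte.

    Assumption: the output that is not selected is driven to 0.
    """
    sram_out &= 0xFF
    return (0, sram_out) if dout_sel else (sram_out, 0)
```

With DIN_sel = 1 the SRAM is filled from SDRAM, and with DOUT_sel = 0 the SRAM byte goes out on Dout_0 for a writeback.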

Key Ports

project1 (top-level)

| Port | Dir | Width | Description |
| --- | --- | --- | --- |
| clk | in | 1 | System clock |
| cpu_trig | in | 1 | Triggers CPU_gen to issue a transaction |
| cpu_rst | in | 1 | Resets CPU_gen |
| mode | in | 1 | Passed to CPU_gen (purpose not defined in source) |
| rdy | out | 1 | Cache ready / transaction complete |
| state | out | 3 | FSM state code (debug) |
| addr_cpu | out | 16 | CPU-generated address (debug) |
| din_cpu / dout_cpu | out | 8 | Data to/from CPU (debug) |
| addr_cache | out | 16 | Cache-to-SDRAM address (debug) |
| din_cache / dout_cache | out | 8 | Data to/from SDRAM (debug) |
| memstrb | out | 1 | Memory strobe (debug) |

cache (mid-level)

| Port | Dir | Width | Description |
| --- | --- | --- | --- |
| ADD_in | in | 16 | Address from CPU |
| WR_RD_in | in | 1 | Write(1) / Read(0) from CPU |
| cs | in | 1 | Chip select; initiates a transaction |
| Din_0 | in | 8 | Write data from CPU |
| Din_1 | in | 8 | Read data from SDRAM (cache fill) |
| ADD_out | out | 16 | Address forwarded to SDRAM |
| WR_RD_out | out | 1 | Direction forwarded to SDRAM |
| MSTRB | out | 1 | Pulse to initiate SDRAM access |
| Dout_0 | out | 8 | Data to SDRAM (dirty writeback) |
| Dout_1 | out | 8 | Data to CPU (read result) |
| RDY | out | 1 | Transaction complete |

Files

| File | Description |
| --- | --- |
| project1.vhd | Top-level: connects CPU_gen, cache, SDRAM_controller, ChipScope |
| cache.vhd | Cache structural wrapper: instantiates controller, SRAM, mux, demux |
| cache_controller.vhd | Controller: splits address, detects hit, drives all control signals |
| fsm.vhd | 7-state cache FSM |
| cache_reg.vhd | 8-entry × 10-bit tag/valid/dirty register file |
| cache_sram.vhd | SRAM data storage (instantiates Xilinx BRAM or behavioural) |
| CPU_gen.vhd | Behavioural simulated CPU: generates address/data patterns |
| SDRAM_controller.vhd | Behavioural simulated SDRAM |
| mux2to1.vhd | 8-bit 2-to-1 MUX |
| demux2to1.vhd | 8-bit 1-to-2 DEMUX |
| decoder.vhd | Address decoder (utility) |
| ipcore_dir/bram.vhd | Xilinx block RAM IP (auto-generated) |
| ipcore_dir/ila.vhd | ChipScope ILA (auto-generated) |
| ipcore_dir/icon.vhd | ChipScope ICON (auto-generated) |
| ipcore_dir/vio.vhd | ChipScope VIO (auto-generated) |

Synthesis Results

Target device: xc3s500e-5-fg320 (Xilinx Spartan-3E)

| Resource | Used | Available | Utilization |
| --- | --- | --- | --- |
| Slices | 439 | 4,656 | 9% |
| Slice Flip-Flops | 513 | 9,312 | 5% |
| 4-input LUTs | 486 | 9,312 | 5% |
| RAMB16s | 5 | 20 | 25% |

  • Synthesis estimate: 89.268 MHz (11.202 ns)
  • Note: No UCF clock period constraint was applied, so Trace analyzed zero user-logic timing paths.
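
The figures above can be cross-checked with simple arithmetic. The short Python sketch below recomputes each utilization percentage and converts the reported period to a frequency. The helper names are illustrative only, and the numbers are copied from the table.

```python
# (used, available) pairs copied from the synthesis table
resources = {
    "Slices": (439, 4656),
    "Slice Flip-Flops": (513, 9312),
    "4-input LUTs": (486, 9312),
    "RAMB16s": (5, 20),
}


def utilization_percent(used, available):
    """Utilization as a percentage of the available resource."""
    return 100.0 * used / available


def period_ns_to_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


for name, (used, available) in resources.items():
    print(f"{name}: {utilization_percent(used, available):.1f}%")
print(f"Clock estimate: {period_ns_to_mhz(11.202):.2f} MHz")
```

The integer percentages in the table match these values rounded down, and the 11.202 ns period agrees with 89.268 MHz to within the rounding of the period.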

How to Run

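Prerequisites: Xilinx ISE 14.7. Vivado does not support Spartan-3E devices, so the .vhd files can only be reused there with a manual project setup that targets a newer device and replaces the ChipScope cores.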

1. Open the .xise project file in ISE Design Suite
2. Simulate: right-click testbench → Simulate Behavioral Model (ISim)
3. Synthesize: double-click Synthesize-XST
4. Implement: double-click Implement Design
5. Program: double-click Generate Programming File → iMPACT → program the .bit file
Pin assignments: see the .ucf constraint file in src/

Design Decisions & Tradeoffs

  • Direct-mapped cache (simplest associativity) — fast hit path with no associativity lookup, but susceptible to conflict misses with adversarial access patterns that alias to the same index.
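  • 7-state FSM (IDLE → CHECK → WRITE_HIT / READ_HIT / MISS_DIRTY_0 / MISS_DIRTY_1 → WAIT_FOR_MEMORY): each phase of a transaction gets its own state, which keeps the control outputs easy to trace on the 3-bit state debug port. The 3-bit code leaves only one spare encoding, so adding more than one state would require widening that port.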
  • Fixed 31-cycle SDRAM wait modelled as a counter: gives the simulated SDRAM a simple, deterministic latency, but the value is hard-coded rather than parameterized.
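  • Cache data is stored in Xilinx block RAM rather than distributed LUT RAM, giving predictable timing and higher density. The 8-line × 32-byte data array is only 256 bytes and fits in a single RAMB16, so most of the 25% BRAM utilization (5 RAMB16s) presumably comes from the debug cores, such as the ILA capture buffer, rather than from the cache itself.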
  • No write-allocate on write-miss (inferred from FSM): a write miss does not trigger a line fill from SDRAM, which saves fetch traffic, at the cost of a read miss if the written line is read later.
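
The conflict-miss weakness of a direct-mapped cache is easy to demonstrate with a tag-only model. The Python sketch below is a simplification, not the repository's logic: it treats every access as a read that allocates on a miss, ignores dirty state, and uses a hypothetical helper name (`count_hits`). The defaults match the 8-line, 32-byte-line geometry described earlier.

```python
def count_hits(addresses, num_lines=8, line_bytes=32):
    """Count hits for a read-only access stream in a direct-mapped cache."""
    tags = [None] * num_lines          # None marks an invalid line
    hits = 0
    for addr in addresses:
        line = addr // line_bytes      # memory line number
        index = line % num_lines       # which cache line it maps to
        tag = line // num_lines        # remaining upper address bits
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag          # miss: replace whatever was there
    return hits


# 0x0040 and 0x0140 differ only in the tag, so they evict each other
aliasing = count_hits([0x0040, 0x0140] * 5)
# 0x0040 and 0x0060 map to different lines and can coexist
friendly = count_hits([0x0040, 0x0060] * 5)
```

Alternating between the two aliasing addresses never hits, while the second pattern misses only on the first touch of each line.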

Future Improvements

  • Parameterize cache size, line size, and SDRAM latency via VHDL generics so the design can be reused across different memory hierarchies.
  • Upgrade from direct-mapped to 2-way or 4-way set-associative with LRU replacement to reduce conflict miss rates.
  • Implement write-allocate on write-miss to improve locality for write-heavy workloads.
  • Add a formal UCF clock period constraint and close timing at a target frequency (currently unconstrained).
  • Connect a performance counter to measure and report real-time hit/miss rate via ChipScope VIO.
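
To make the first item concrete, the field widths that are currently fixed could be derived from the cache geometry. The Python sketch below shows the arithmetic that a set of VHDL generics would need to perform. The function name and parameters are hypothetical, and it assumes power-of-two line counts and line sizes.

```python
def field_widths(addr_bits, num_lines, line_bytes):
    """Return (tag_bits, index_bits, offset_bits) for a direct-mapped cache."""
    for name, value in (("num_lines", num_lines), ("line_bytes", line_bytes)):
        if value < 1 or value & (value - 1):
            raise ValueError(f"{name} must be a power of two")
    index_bits = num_lines.bit_length() - 1    # log2(num_lines)
    offset_bits = line_bytes.bit_length() - 1  # log2(line_bytes)
    tag_bits = addr_bits - index_bits - offset_bits
    if tag_bits < 0:
        raise ValueError("cache is larger than the address space")
    return tag_bits, index_bits, offset_bits


# Current design: 16-bit addresses, 8 lines of 32 bytes
tag_bits, index_bits, offset_bits = field_widths(16, 8, 32)
entry_bits = tag_bits + 2   # tag plus dirty and valid flags
```

For the current geometry this reproduces the 8/3/5 split and the 10-bit cache_reg entry. Doubling the line count to 16 would move one bit from the tag to the index.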

Skills Demonstrated

VHDL · memory hierarchy · cache controller FSM · direct-mapped cache · SDRAM interface · block RAM · Xilinx ISE · ChipScope ILA/VIO · Spartan-3E FPGA

License

MIT License — see LICENSE for details.
