# **IOB-CACHE**

User Guide, 0.1, Build f9d8890



May 1, 2022





## **Contents**

| No.  | Title                                   |
|------|-----------------------------------------|
| 1    | Introduction                            |
| 2    | Symbol                                  |
| 3    | Features                                |
| 4    | Benefits                                |
| 5    | Deliverables                            |
| 6    | Block Diagram and Description           |
| 7    | Interface Signals                       |
| 8    | Instantiation and External Circuitry    |
| 9    | Simulation                              |
| 10   | Synthesis                               |
| 10.1 | Synthesis Macros                        |
| 10.2 | Synthesis Parameters                    |
| 10.3 | Synthesis Script and Timing Constraints |

## **List of Tables**

| No. | Title                             |
|-----|-----------------------------------|
| 1   | General Interface Signals         |
| 2   | IObundle Master Interface Signals |
| 3   | Control-Status Interface Signals  |
| 4   | IObundle Slave Interface Signals  |
| 5   | Synthesis Macros                  |
| 6   | Synthesis Parameters              |



## **List of Figures**

| No. | Title                                         |
|-----|-----------------------------------------------|
| 1   | IP Core Symbol                                |
| 2   | High-Level Block Diagram                      |
| 3   | Core Instance and Required Surrounding Blocks |
| 4   | Testbench Block Diagram                       |



## 1 Introduction

The IObundle CACHE is an open-source, pipelined-memory cache. It is a performance-oriented, highly configurable IP core. The cache core is isolated from the processor and memory interfaces, which makes it easy to adopt new processors or memory controllers while keeping the core functionality intact. It implements a simple native front-end interface and an AXI4 interface with configurable data width, which allows maximum use of the available memory bandwidth. The IObundle CACHE can be implemented as a Direct-Mapped cache or as a K-Way Set-Associative cache. It supports both a fixed write-through not-allocate policy and a write-back policy.

## 2 Symbol



Figure 1: IP Core Symbol.

www.iobundle.com

## 3 Features

- AXI4 interface with configurable data width
- Simple front-end native interface
- Direct-Mapped or K-Way Set-Associative
- Fixed write-through not-allocate policy
- Write-back policy
- Pipelined-memory (1 request/clock-cycle)

## 4 Benefits

- Compact and easy to integrate hardware and software implementation
- Can fit many instances in low cost FPGAs and ASICs
- Low power consumption



## 5 Deliverables

- ASIC or FPGA synthesized netlist or Verilog source code, and respective synthesis and implementation scripts
- ASIC or FPGA verification environment by simulation and emulation
- Bare-metal software driver and example user software
- User documentation for easy system integration
- Example integration in IOb-SoC (optional)

## 6 Block Diagram and Description

Figure 2 presents a high-level block diagram of the core, followed by a brief description of each block.



Figure 2: High-Level Block Diagram.

**FRONT-END** Front-end block.

**CACHE MEMORY** Cache memory block.

**BACK-END** Back-end block.

**CACHE CONTROL** Cache control block.



## 7 Interface Signals

| Name  | Direction | Width | Description                                |
|-------|-----------|-------|--------------------------------------------|
| clk   | INPUT     | 1     | System clock input                         |
| reset | INPUT     | 1     | System reset, asynchronous and active high |

Table 1: General Interface Signals

| Name  | Direction | Width                              | Description                              |
|-------|-----------|------------------------------------|------------------------------------------|
| valid | INPUT     | 1                                  | Native CPU interface valid signal        |
| addr  | INPUT     | CTRL_CACHE + FE_ADDR_W - FE_BYTE_W | Native CPU interface address signal      |
| addr  | INPUT     | CTRL_CACHE + FE_ADDR_W             | Native CPU interface address signal      |
| wdata | INPUT     | FE_DATA_W                          | Native CPU interface data write signal   |
| wstrb | INPUT     | FE_NBYTES                          | Native CPU interface write strobe signal |
| rdata | OUTPUT    | FE_DATA_W                          | Native CPU interface read data signal    |
| ready | OUTPUT    | 1                                  | Native CPU interface ready signal        |

Table 2: IObundle Master Interface Signals
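The widths in Table 2 depend on the synthesis parameters listed in Table 6. As a rough aid, the following Python sketch evaluates the width expressions of Table 2 for a given parameter set. The function name `front_end_widths` and the dictionary keys are hypothetical and chosen only for this illustration; `clog2` is assumed to be the ceiling of the base-2 logarithm, and the defaults are the typical values from Table 6.

```python
def clog2(n: int) -> int:
    """Ceiling of log2(n) for n >= 1 (assumed meaning of clog2 in Table 6)."""
    return (n - 1).bit_length()


def front_end_widths(FE_ADDR_W: int = 32, FE_DATA_W: int = 32, CTRL_CACHE: int = 0) -> dict:
    """Evaluate the width expressions of Table 2 (hypothetical helper)."""
    FE_NBYTES = FE_DATA_W // 8        # Table 6: FE_DATA_W/8
    FE_BYTE_W = clog2(FE_NBYTES)      # Table 6: clog2(FE_NBYTES)
    return {
        # Table 2 lists two addr rows; both expressions are evaluated here.
        "addr_minus_byte_w": CTRL_CACHE + FE_ADDR_W - FE_BYTE_W,
        "addr_full": CTRL_CACHE + FE_ADDR_W,
        "wdata": FE_DATA_W,
        "wstrb": FE_NBYTES,
        "rdata": FE_DATA_W,
    }


if __name__ == "__main__":
    for signal, width in front_end_widths().items():
        print(f"{signal}: {width} bit(s)")
```

With the typical values (FE_ADDR_W = 32, FE_DATA_W = 32, CTRL_CACHE = 0), the two `addr` expressions evaluate to 30 and 32 bits, and `wstrb` is 4 bits wide.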

| Name          | Direction | Width | Description                       |
|---------------|-----------|-------|-----------------------------------|
| force_inv_in  | INPUT     | 1     | force 1'b0 if unused              |
| force_inv_out | OUTPUT    | 1     | cache invalidate signal           |
| wtb_empty_in  | INPUT     | 1     | force 1'b1 if unused              |
| wtb_empty_out | OUTPUT    | 1     | write-through buffer empty signal |

Table 3: Control-Status Interface Signals

| Name      | Direction | Width     | Description                              |
|-----------|-----------|-----------|------------------------------------------|
| mem_valid | OUTPUT    | 1         | Native CPU interface valid signal        |
| mem_addr  | OUTPUT    | BE_ADDR_W | Native CPU interface address signal      |
| mem_wdata | OUTPUT    | BE_DATA_W | Native CPU interface data write signal   |
| mem_wstrb | OUTPUT    | BE_NBYTES | Native CPU interface write strobe signal |
| mem_rdata | INPUT     | BE_DATA_W | Native CPU interface read data signal    |
| mem_ready | INPUT     | 1         | Native CPU interface ready signal        |

Table 4: IObundle Slave Interface Signals

## 8 Instantiation and External Circuitry

Figure 3 illustrates how to instantiate the IP core and, if applicable, the required external blocks. A Verilog instantiation template is provided for convenience.






Figure 3: Core Instance and Required Surrounding Blocks



## 9 Simulation

The provided testbench uses the core instance described in Section 8. A high-level block diagram of the testbench is shown in Figure 4. The testbench is organized in a modular fashion, with each test described in a separate file. The test suite comprises all the test case files, which makes adding, modifying, or removing tests easy.



Figure 4: Testbench Block Diagram

In this preliminary version, simulation is not yet fully functional. The provided testbench only allows compilation for simulation and drives the clock and reset signals. Behavioural memory models that allow pre-synthesis simulation are already included. In the case of ROMs, their programming data is also included in the form of .hex files.



## 10 Synthesis

#### 10.1 Synthesis Macros

The synthesis macros apply to all instances of the core, and are listed in Table 5.

| Parameter | Min | Typ           | Max | Description                                     |
|-----------|-----|---------------|-----|-------------------------------------------------|
| WRITE_POL | ?   | WRITE_THROUGH | ?   | write policy: write-through (0), write-back (1) |

Table 5: Synthesis Macros.

#### 10.2 Synthesis Parameters

The generic synthesis parameters of the core are presented in Table 6. Generic parameters can vary from instance to instance.



| Parameter     | Min | Typ                                   | Max | Description |
|---------------|-----|---------------------------------------|-----|-------------|
| FE_ADDR_W     | ?   | 32                                    | ?   | Address width - width of the Master's entire access address (including the LSBs that are discarded, but discard…) |
| FE_DATA_W     | ?   | 32                                    | ?   | Data width - word size used for the cache |
| N_WAYS        | ?   | 2                                     | ?   | Number of cache ways (must be a power of 2: 1, 2, 4, 8, ...) |
| LINE_OFF_W    | ?   | 7                                     | ?   | Line-Offset Width - 2**NLINE_W total cache lines |
| WORD_OFF_W    | ?   | 3                                     | ?   | Word-Offset Width - 2**OFFSET_W total FE_DATA_W words per line - WARNING about LINE2MEM_W (can cause word_counter [-1:0]) |
| WTBUF_DEPTH_W | ?   | 5                                     | ?   | Depth Width of Write-Through Buffer |
| REP_POLICY    | ?   | PLRU_mru                              | ?   | LRU - Least Recently Used; PLRU_mru (1) - mru-based pseudoLRU; PLRU_tree (3) - tree-based pseudoLRU |
| NWAY_W        | ?   | clog2(N_WAYS)                         | ?   | Cache Ways Width |
| FE_NBYTES     | ?   | FE_DATA_W/8                           | ?   | Number of Bytes per Word |
| FE_BYTE_W     | ?   | clog2(FE_NBYTES)                      | ?   | Byte Offset |
| BE_ADDR_W     | ?   | FE_ADDR_W                             | ?   | Address width of the higher hierarchy memory |
| BE_DATA_W     | ?   | FE_DATA_W                             | ?   | Data width of the memory |
| BE_NBYTES     | ?   | BE_DATA_W/8                           | ?   | Number of bytes |
| BE_BYTE_W     | ?   | clog2(BE_NBYTES)                      | ?   | Offset of Number of Bytes |
| LINE2MEM_W    | ?   | WORD_OFF_W-clog2(BE_DATA_W/FE_DATA_W) | ?   | Logarithm Ratio between the size of the cache-line and the BE's data width |
| CTRL_CACHE    | ?   | 0                                     | ?   | Adds a Controller to the cache, to use functions sent by the master or count the hits and misses |
| CTRL_CNT      | ?   | 0                                     | ?   | Counters for Cache Hits and Misses - when this and the previous parameter are disabled, the Controller only stores the buffer states and allows cache invalidation |

Table 6: Synthesis Parameters.
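Several entries in Table 6 are derived from other parameters. The Python sketch below evaluates those expressions so that a candidate configuration can be checked before synthesis. It is an illustration only: the function name `derived_params` is hypothetical, `clog2` is assumed to be the ceiling of the base-2 logarithm, and the capacity estimate assumes that each way holds 2**LINE_OFF_W lines of 2**WORD_OFF_W words, which this guide does not state explicitly.

```python
def clog2(n: int) -> int:
    """Ceiling of log2(n) for n >= 1 (assumed meaning of clog2 in Table 6)."""
    return (n - 1).bit_length()


def derived_params(FE_ADDR_W: int = 32, FE_DATA_W: int = 32, N_WAYS: int = 2,
                   LINE_OFF_W: int = 7, WORD_OFF_W: int = 3,
                   BE_DATA_W=None) -> dict:
    """Evaluate the derived entries of Table 6 (hypothetical helper)."""
    if BE_DATA_W is None:
        BE_DATA_W = FE_DATA_W  # typical value in Table 6
    FE_NBYTES = FE_DATA_W // 8
    BE_NBYTES = BE_DATA_W // 8
    params = {
        "NWAY_W": clog2(N_WAYS),
        "FE_NBYTES": FE_NBYTES,
        "FE_BYTE_W": clog2(FE_NBYTES),
        "BE_ADDR_W": FE_ADDR_W,  # typical value in Table 6
        "BE_DATA_W": BE_DATA_W,
        "BE_NBYTES": BE_NBYTES,
        "BE_BYTE_W": clog2(BE_NBYTES),
        # Table 6 carries a warning about this value; here it is only reported.
        "LINE2MEM_W": WORD_OFF_W - clog2(BE_DATA_W // FE_DATA_W),
    }
    # ASSUMPTION (not stated in this guide): each way holds 2**LINE_OFF_W
    # lines, and each line holds 2**WORD_OFF_W words of FE_NBYTES bytes.
    params["capacity_bytes"] = N_WAYS * 2**LINE_OFF_W * 2**WORD_OFF_W * FE_NBYTES
    return params


if __name__ == "__main__":
    for name, value in derived_params().items():
        print(f"{name} = {value}")
```

With the typical values from Table 6, the sketch yields NWAY_W = 1, FE_BYTE_W = 2 and LINE2MEM_W = 3. Widening the back-end to BE_DATA_W = 256 reduces LINE2MEM_W to 0, which may be the kind of configuration the LINE2MEM_W warning in Table 6 refers to.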



#### 10.3 Synthesis Script and Timing Constraints

A simple .tcl script is provided for the Cadence Genus synthesis tool. The script reads the technology files, compiles and elaborates the design, and proceeds to synthesise it. The timing constraints are either contained within the provided constraints file or supplied in a separate file.

After synthesis, reports on silicon area usage, power consumption, and timing closure are generated. A post-synthesis Verilog file is created, to be used in post-synthesis simulation.

In this preliminary version, synthesis of the IP core without the memories is functional. The memories are for now treated as black boxes.

The memory models provided in the simulation directory must not be included in synthesis, unless they are to be synthesised as logic. In a future version, the memories will be generated for the target technology and included.