# Computer Architectures Session 1

Single-cycle RISC-V Processor

#### TAs:

Jun Yin (jun.yin@kuleuven.be)
Yuanyang Guo (yuanyang.guo@imec.be)
Xiaoling Yi (xiaoling.yi@kuleuven.be)
Yunzhu Chen (yunzhu.chen@imec.be)

## Objective

- RTL design of a RISC-V microprocessor
  - Simple implementation
  - Pipelined implementation
  - Data hazard resolution
  - Advanced acceleration
    - Branch prediction
    - SIMD

### **Chapter 4**





### Session requirements and criteria:

- The assignments must be completed in groups of ≤2 people.
- After completing each scenario:
  1. Copy-paste your RTL source code into the corresponding SOLUTION folder.
  2. Complete the corresponding part of the short report to record the performance readings and answer the related questions (see report.docx).

- This project counts for 4 points of the final course grade, based on:
  1. The performance check in each design scenario.
  2. The report.

| ITEM                                                           | Points |
|----------------------------------------------------------------|--------|
| Functional pipelined MULT2                                     | 0.5    |
| Functional pipelined MULT3                                     | 0.5    |
| Functional MULT4                                               | 0.4    |
| Functional MULT4<br>#cycles ≤ baseline implementation (1636 cc)  | 0.8    |
| Advanced MULT4<br>#cycles ≤ advanced implementation (828 cc)     | 0.8    |
| Report                                                         | 1.0    |
| Total                                                          | 4.0    |

Project handover

Deadline: May 2nd

### Session requirements and criteria:

- We release the same grading script that will be used for the final grading.
  - Command: python3 prep\_submission.py

- CA\_Documents
- CA Exercises
  - Backend
  - SIM
  - Verilog
  - prep\_submission.py

- The script works in standalone folders, but it is always wise to back up your ongoing /RTL code before running it.
- Do not forget to perform at least one dry run before the final submission.
  - Make sure you have put the right version of source code into the right folder!

```
    Verilog
    RTL
    RTL_SOLUTION1_simple_program_and_MULT1
    RTL_SOLUTION2_multiplication_support_MULT2
    RTL_SOLUTION3_pipeline_basic_MULT2
    RTL_SOLUTION4_pipeline_hazard_MULT3
    RTL_SOLUTION5_pipeline_hazard_advanced_MULT4
```



```
Checking simpleprogram->simple_program ...
Checking mult1->MULT1 ....
Checking mult2->multiplication support MULT2 ...
Checking mult2->pipeline basic MULT2 ...
Checking mult3->pipeline_hazard_MULT3 ...
Checking mult4->pipeline hazard advanced MULT4 ...
Done
Scoreboard:
       Funtional pipelined MULT2:
                                        0.5 pts (9 cc, 2300 cc, 40 cc, 40 cc)
        Funtional pipelined MULT3:
                                        0.5 pts (25 cc)
        Funtional pipelined MULT4:
                                        1.2 pts (1636 cc)
        Total Score: 2.2 pts
Do not forget to complete the GROUP_X folder before your submission!
```

# prep\_submission.py

- Evaluates your RTL\_SOLUTION(s)
- Provides basic debug tracing if a program fails
- Creates the folder structure for submission

```
Checking simpleprogram->simple program and MULT1 ...
Checking mult1->simple program and MULT1
Checking mult2->pipeline basic MULT2 ...
Verilog src folder not found. Please check file structures.
Checking mult3->pipeline hazard MULT3 ...
       rm -rf xcelium.d xrun.* xmverilog.* *.vcd *.shm
xrun +sv -f files_verilog_grading.f -64bit -timescale 1ns/10ps -access +rwc -allowredefinition
                              23.03-s002: Started on Jan 31, 2024 at 11:10:57 CET
       TOOL: xrun(64)
       xrun(64): 23.03-s002: (c) Copyright 1995-2023 Cadence Design Systems, Inc.
       file: ../Verilog/cpu_tb.v
               module worklib.cpu_tb:v
                       errors: 0, warnings: 0
       file: ../Verilog/sky130_sram_2rw.v
               module worklib.sky130_sram_2rw_32x128_32:v
                       errors: 0, warnings: 0
               module worklib.sky130_sram_2rw_64x128_64:v
                       errors: 0, warnings: 0
                       Caching library 'worklib' ...... Done
               Elaborating the design hierarchy:
                       Caching library 'worklib' ..... Done
       cpu dut(
       xmelab: *E,CUVMUR (../Verilog/cpu_tb.v,67|6): instance 'cpu_tb.dut' of design unit 'cp
  is unresolved in 'worklib.cpu_tb:v'.
       xrun: *E,ELBERR: Error (*E) or soft error (*SE) occurred during elaboration (status 1)
 exiting.
                               23.03-s002: Exiting on Jan 31, 2024 at 11:10:57 CET (total: 0
       TOOL: xrun(64)
make: *** [sim_grading] Error 1
Errors occurred when simulating. Please check your source code structure.
Checking mult4->pipeline hazard advanced MULT4
Scoreboard:
       Funtional pipelined MULT2:
                                       0.0 pts (9 cc, 2300 cc,
                                                               False cc)
       Funtional pipelined MULT3:
                                       0.0 pts (False cc)
       Funtional pipelined MULT4:
                                       1.0 pts (844 cc)
       Total Score: 1.0 pts
```

Debug tracing example.

### Session requirements and criteria:

- The **report** and your **final processor design** must be handed in through Toledo, following the file structure below:

```
GROUP_X/
    MULT4_content/
        mult4_imem_content.txt
    README.txt
    report.docx
    RTL_SOLUTIONS/
        RTL_SOLUTION1_simple_program_and_MULT1/
        RTL_SOLUTION2_multiplication_support_MULT2/
        RTL_SOLUTION3_pipeline_basic_MULT2/
        RTL_SOLUTION4_pipeline_hazard_MULT3/
        RTL_SOLUTION5_pipeline_hazard_advanced_MULT4/
```

The MULT4 session (session 3) is special, so additional files are allowed there.

1) Name your project folder **Group\_X** (X is your group number).
2) Put the names and student numbers (R-numbers) of all group members in README.txt.
3) You can also put anything that you think we need to know before running your project in README.txt.
4) Pack your group folder into a zip file and submit it.

- The **deadline** of the project is **May 2nd, 2025**.
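Before zipping, a quick local check that the folder matches the tree above can save a resubmission. The sketch below is hypothetical. `REQUIRED` is transcribed from the GROUP_X tree in this section, the helper names `missing_entries` and `pack` are made up for illustration, and the script does not replace the dry run with prep\_submission.py.

```python
import shutil
from pathlib import Path

# Entries transcribed from the GROUP_X tree above (hypothetical checker;
# it does not replace the dry run with prep_submission.py).
REQUIRED = [
    "README.txt",
    "report.docx",
    "MULT4_content/mult4_imem_content.txt",
    "RTL_SOLUTIONS/RTL_SOLUTION1_simple_program_and_MULT1",
    "RTL_SOLUTIONS/RTL_SOLUTION2_multiplication_support_MULT2",
    "RTL_SOLUTIONS/RTL_SOLUTION3_pipeline_basic_MULT2",
    "RTL_SOLUTIONS/RTL_SOLUTION4_pipeline_hazard_MULT3",
    "RTL_SOLUTIONS/RTL_SOLUTION5_pipeline_hazard_advanced_MULT4",
]


def missing_entries(group_dir: Path) -> list[str]:
    """Return the required entries that do not exist under group_dir."""
    return [rel for rel in REQUIRED if not (group_dir / rel).exists()]


def pack(group_dir: Path) -> Path:
    """Create <group_dir>.zip next to the folder, with the folder at its root."""
    missing = missing_entries(group_dir)
    if missing:
        raise FileNotFoundError(f"missing in {group_dir}: {missing}")
    archive = shutil.make_archive(str(group_dir), "zip",
                                  root_dir=group_dir.parent,
                                  base_dir=group_dir.name)
    return Path(archive)
```

For example, running `pack(Path("Group_7"))` next to the folder produces Group_7.zip with `Group_7/` at the top level of the archive.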

### Course material

Project resources



#### Extra resources

Computer Organization and Design, RISC-V Edition (see Toledo)

RISC-V specification

https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf

RISC-V green card

https://dejazzer.com/coen2710/lectures/RISC-V-Reference-Data-Green-Card.pdf

Verilog tutorial

https://www.asic-world.com/verilog/veritut.html

RISC-V machine code generator \*

https://github.com/Kritagya-Agarwal/Assembly-To-Machine-Code-RISC-V

\* LLMs are also good at explaining these open-source "facts" now.
Feedback is welcome if you have tried them out!









### Project resources



### Please refer to the CA\_Documents for more information:

- session\_guide
- Jupyter\_VNC\_Manual
- reading\_backend\_report

- Verilog
  - RTL: RTL source code
  - sky130\_sram\_2rw.v: memory behaviour model
  - cpu\_tb.v: testbench (modification not allowed)
- Backend
  - CA\_RISCV.ipynb: Jupyter notebook (Python) to run the backend flow
  - Figs
  - OpenRAM\_output
  - setup.sh
- SIM: simulation-related resources
  - data: instruction and data memory sources
    - testcode\_m: imem and dmem contents used for the different scenarios
    - dmem\_content.txt: data memory to be loaded in the testbench
    - imem\_content.txt: instruction memory to be loaded in the testbench
  - files\_verilog.f: RTL simulation filelist (file structure needs sync)
  - Makefile: simulation-related commands
  - xcelium\_23.03.rc: simulator environment setup
- prep\_submission.py: Python script to dry-run the grading before submission

### Remarks: Keep your project safe!



Please keep all 5 versions of your processor (RTL); we will grade based on them!

We highly recommend using git for version control.

### It helps you to:

- keep track of modifications
- automatically sync the project between group members
- allow TAs to support you more easily

Please have a look at the dedicated guide.

# Today's session: Single-cycle processor



### Basic architecture

- Testbench
  - cpu\_tb.v
- RTL
  - alu\_control.v
  - alu.v
  - branch\_unit.v
  - control\_unit.v
  - cpu.v
  - immediate\_extend\_unit.v
  - mux\_2.v
  - pc.v
  - register\_file.v
  - sram.v
- IP Library
  - sky130\_sram\_2rw.v



### **Simulation Tool**

- Simulation (Cadence Xcelium)
  - Tool to simulate the behaviour of the RTL
- In the ./SIM folder, set up the environment: source xcelium\_23.03.rc
- Run the simulation with make

#### make sim

Uses cpu\_tb.v and the contents of SIM/data/imem\_content.txt and SIM/data/dmem\_content.txt to verify that the current RTL is able to run the program.

#### make sim\_gui

Same as make sim, but also opens the waveform viewer for interactive debugging.



(Screenshot: make sim\_gui, with the module instance names annotated.)

**To terminate: close all three GUI windows.**






# Obj-1: simple program

Single-cycle processor skeleton → functional single-cycle processor

- Functional single-cycle processor
  - Complete RTL/control\_unit.v to support BEQ, JUMP, LD, SD, ADDI and R-type ALU instructions (ALU\_R)
  - Follow RUN CYCLE-ACCURATE SIMULATION in session\_guide.pdf
  - Run SIMPLE\_PROGRAM
    - Go to the /SIM folder
    - Overwrite the testbench's instruction and data memory files (data/imem\_content.txt and data/dmem\_content.txt) with the program's contents:

```
cat data/testcode_m/simpleprogram_imem_content.txt > data/imem_content.txt
cat data/testcode_m/simpleprogram_dmem_content.txt > data/dmem_content.txt
```
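The same copy step can be scripted and reused for the other scenarios. The Python sketch below assumes every scenario follows the `<name>_imem_content.txt` / `<name>_dmem_content.txt` naming that simpleprogram uses; only the simpleprogram names are confirmed here. `load_scenario` is a hypothetical helper and is not part of the provided Makefile.

```python
import shutil
from pathlib import Path


def load_scenario(sim_dir: Path, scenario: str) -> None:
    """Copy data/testcode_m/<scenario>_{imem,dmem}_content.txt into data/.

    Same effect as the two cat commands above (hypothetical helper).
    """
    data = sim_dir / "data"
    for mem in ("imem", "dmem"):
        src = data / "testcode_m" / f"{scenario}_{mem}_content.txt"
        shutil.copyfile(src, data / f"{mem}_content.txt")  # overwrites the target
```

Run from the SIM folder, `load_scenario(Path("."), "simpleprogram")` replaces the two cat commands. A missing source file raises `FileNotFoundError` instead of silently creating an empty memory file.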

Run the simulation with make sim or make all.

Simple Program

```
addi x8, x0, 7
addi x9, x8, 2
sd x9, 0(x0)
ld x17, 0(x0)
ld x18, 8(x0)
add x19, x17, x18
beq x9, x17, FINAL
add x20, x8, x9
FINAL: add x20, x18, x19
sll x21, x18, x8
STOP
```
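When checking the waveform, it helps to know the expected register values in advance. The Python sketch below works them out under two assumptions. First, it uses RV64 semantics: ld/sd move 64-bit doublewords, and sll uses the low 6 bits of rs2. Second, the doubleword at address 8 of dmem is a parameter, because its value comes from simpleprogram\_dmem\_content.txt and is not given here.

```python
MASK64 = (1 << 64) - 1  # RV64 registers are 64 bits wide


def simple_program(dmem_at_8: int) -> dict[str, int]:
    """Expected register values after SIMPLE_PROGRAM, assuming RV64 semantics.

    dmem_at_8 is the doubleword at address 8 of the data memory; its actual
    value comes from simpleprogram_dmem_content.txt.
    """
    mem = {8: dmem_at_8 & MASK64}
    r: dict[str, int] = {}
    r["x8"] = 7                                  # addi x8, x0, 7
    r["x9"] = r["x8"] + 2                        # addi x9, x8, 2
    mem[0] = r["x9"]                             # sd   x9, 0(x0)
    r["x17"] = mem[0]                            # ld   x17, 0(x0)
    r["x18"] = mem[8]                            # ld   x18, 8(x0)
    r["x19"] = (r["x17"] + r["x18"]) & MASK64    # add  x19, x17, x18
    if r["x9"] != r["x17"]:                      # beq  x9, x17, FINAL
        r["x20"] = (r["x8"] + r["x9"]) & MASK64  # add  x20, x8, x9 (skipped if taken)
    r["x20"] = (r["x18"] + r["x19"]) & MASK64    # FINAL: add x20, x18, x19
    shamt = r["x8"] & 0x3F                       # RV64 sll uses rs2[5:0]
    r["x21"] = (r["x18"] << shamt) & MASK64      # sll  x21, x18, x8
    return r
```

For example, with 5 stored at address 8 the model gives x19 = 14, x20 = 19 and x21 = 640. Because the beq is always taken, 9 of the 10 listed instructions execute before STOP.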

STOP is a "fake" instruction that cpu\_tb uses to recognize the end of the program (see line 305 of cpu\_tb.v).

# Obj-2: mult1

Multiplication realized by addition loops

#### Algorithm 1:

1. Acc = 0
2. N = operand1
3. Traverse each bit of operand2 (LSB → MSB):
   - a. If the bit is 1, accumulate: Acc = Acc + N.
   - b. If the bit is 0, Acc = Acc.
   - c. Shift N left by 1 position.

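Example: 110 × 101 = 11110 in binary (6 × 5 = 30).

| Row                    | bit 4 | bit 3 | bit 2 | bit 1 | bit 0 |
|------------------------|-------|-------|-------|-------|-------|
| operand1 (N)           |       |       | 1     | 1     | 0     |
| operand2               |       |       | 1     | 0     | 1     |
| bit 0 = 1: add N       |       |       | 1     | 1     | 0     |
| bit 1 = 0: add 0       |       | 0     | 0     | 0     |       |
| bit 2 = 1: add N << 2  | 1     | 1     | 0     |       |       |
| Acc (sum)              | 1     | 1     | 1     | 1     | 0     |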
### MULT1

- Multiply adjacent operands in pairs (5 multiplications) using Algorithm 1
- Sum the results together
- Run MULT1 on your processor


```
# Register usage:
#   x16: total size of data memory (in bytes)
#   x8:  data memory pointer
#   x10: constant 1 (shift amount)
#   x17, x18: operand pair; x19: bit mask; x22: shifted operand1 (N)
#   x23: iteration counter over the number of bits
#   x20: accumulated result of each multiplication
#   x9:  final result
# init
addi x16, x0, 80
addi x8, x0, 0
addi x9, x0, 0
addi x10, x0, 1
# Looping over operands
M1: ld x17, 0(x8)
ld x18, 8(x8)
addi x23, x0, 64
addi x19, x0, 1
addi x20, x0, 0
add x22, x0, x17
# Multiplication
LOOP_0: and x21, x18, x19
beq x21, x0, SHIFTING_0
add x20, x20, x22
SHIFTING_0: sll x22, x22, x10
sll x19, x19, x10
addi x23, x23, -1
beq x23, x0, M2
jal LOOP_0
# Summing results
M2: add x9, x9, x20
addi x16, x16, -16
addi x8, x8, 16
beq x16, x0, FINISH
jal M1
FINISH:
```
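A single-cycle processor completes one instruction per clock cycle. Counting the instructions that the MULT1 listing executes therefore gives a rough estimate of its cycle count, and the accumulated sum gives the value to expect in x9. The Python model below walks the listing's control flow. The memory values are hypothetical inputs (the real ones are in the MULT1 dmem file), and any extra testbench cycles are not modeled.

```python
MASK64 = (1 << 64) - 1


def mult1_reference(dmem: list[int]) -> tuple[int, int]:
    """Return (x9, executed) for the MULT1 listing above.

    dmem holds the 10 doublewords at addresses 0, 8, ..., 72 (hypothetical
    values). 'executed' counts instructions by following the listing.
    """
    if len(dmem) != 10:
        raise ValueError("MULT1 reads 10 doublewords (x16 = 80 bytes)")
    x9, executed = 0, 4                       # 4 init instructions
    for pair in range(5):                     # 80 bytes / 16 bytes per pair
        x17, x18 = dmem[2 * pair] & MASK64, dmem[2 * pair + 1] & MASK64
        executed += 6                         # ld, ld, addi, addi, addi, add
        x20, x22 = 0, x17
        for bit in range(64):                 # x23 counts down from 64
            executed += 2                     # and, beq
            if (x18 >> bit) & 1:
                x20 = (x20 + x22) & MASK64
                executed += 1                 # add x20, x20, x22
            x22 = (x22 << 1) & MASK64
            executed += 4                     # sll, sll, addi, beq x23
            if bit < 63:
                executed += 1                 # jal LOOP_0 (not after last bit)
        x9 = (x9 + x20) & MASK64
        executed += 4                         # add, addi, addi, beq x16
        if pair < 4:
            executed += 1                     # jal M1 (not after last pair)
    return x9, executed
```

With all operands zero, the model returns (0, 2293). That count is 4 init instructions, 457 per pair and 4 extra jal M1. Each set bit in the second operand of a pair adds one more executed add.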

# Obj-3: mult2

Single-cycle processor with multiplication support

> Check whether you need to modify alu\_control.v, alu.v and cpu.v to support the mul instruction.

Run MULT2 on your processor.

Functional single-cycle processor → single-cycle processor with multiplication (MULT2)



#### The MUL instruction in RISC-V.

| [31:25] funct7 | [24:20] rs2 | [19:15] rs1 | [14:12] funct3 | [11:7] rd | [6:0] opcode |
|----------------|-------------|-------------|----------------|-----------|--------------|
| 0000001        | xxxxx       | xxxxx       | 000            | xxxxx     | 0110011      |
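To double-check a hand-assembled mul (or the output of a machine code generator), the fields in the table can be packed into a 32-bit word. This is a minimal Python sketch; the function names are illustrative only, and the format of imem\_content.txt is not assumed.

```python
def encode_r_type(funct7: int, rs2: int, rs1: int, funct3: int, rd: int,
                  opcode: int = 0b0110011) -> int:
    """Pack R-type fields into a 32-bit word (layout as in the table above)."""
    fields = (("funct7", funct7, 7), ("rs2", rs2, 5), ("rs1", rs1, 5),
              ("funct3", funct3, 3), ("rd", rd, 5), ("opcode", opcode, 7))
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


def encode_mul(rd: int, rs1: int, rs2: int) -> int:
    """mul rd, rs1, rs2: funct7 = 0000001, funct3 = 000, opcode = 0110011."""
    return encode_r_type(0b0000001, rs2, rs1, 0b000, rd)


print(f"0x{encode_mul(20, 17, 18):08X}")  # mul x20, x17, x18 -> 0x03288A33
```

For example, `mul a0, a1, a2` (x10, x11, x12) encodes to 0x02C58533, while the matching add encodes to 0x00C58533. The only difference between the two is funct7[0], which is instruction bit 25.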

# Obj-3: hints




- The RISC-V MUL operation needs more control bits than alu\_control.v currently provides.
- Refer to the RISC-V ISA and start from the module IO definition.

(Figures: **Module Definition** and **Module Instance**.)

- Do not forget to update the module instance (in cpu.v) after any modification of the IO definitions.
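Grouping R-type operations by the bits an ALU control unit sees shows why the extra control bit is needed. The sketch below assumes, hypothetically, a textbook-style interface that forwards only funct3 and instruction bit 30 (funct7[5]). Check what your alu\_control.v actually receives. Under that assumption, ADD and MUL collide. Forwarding instruction bit 25 (funct7[0]) as well separates them.

```python
from collections.abc import Callable

# (funct7, funct3) -> operation, for a subset of R-type instructions
# (opcode 0110011) taken from the RISC-V spec.
R_TYPE_OPS = {
    (0b0000000, 0b000): "ADD",
    (0b0100000, 0b000): "SUB",
    (0b0000000, 0b001): "SLL",
    (0b0000000, 0b110): "OR",
    (0b0000000, 0b111): "AND",
    (0b0000001, 0b000): "MUL",
}


def reduced_key(funct7: int, funct3: int) -> tuple[int, ...]:
    """Hypothetical narrow interface: instruction bit 30 (funct7[5]) and funct3."""
    return ((funct7 >> 5) & 1, funct3)


def full_key(funct7: int, funct3: int) -> tuple[int, ...]:
    """Widened interface: also forwards instruction bit 25 (funct7[0])."""
    return ((funct7 >> 5) & 1, funct7 & 1, funct3)


def collisions(key_fn: Callable[[int, int], tuple[int, ...]]) -> list[list[str]]:
    """List the groups of operations that key_fn maps to the same key."""
    groups: dict[tuple[int, ...], list[str]] = {}
    for (funct7, funct3), name in R_TYPE_OPS.items():
        groups.setdefault(key_fn(funct7, funct3), []).append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


print(collisions(reduced_key))  # [['ADD', 'MUL']]
print(collisions(full_key))     # []
```

In RTL terms, this suggests widening the alu\_control.v inputs so that funct7[0] reaches the decoder, then updating the instance in cpu.v to match.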

# Today's session: Single-cycle processor



### Digital Backend Flow



### Backend Flow in this exercise

Three key components:

1. Flow tool: OpenLane

2. PDK library: SKY130

3. Memory macros: OpenRAM

### 1. Automated Backend Flow: OpenLane



OpenLane is an automated RTL-to-GDSII flow built from several open-source components (e.g., Yosys for synthesis and OpenROAD for placement and routing).

### 2. Related: open-source PDK -- SKY130



PDK: Process Design Kit - information about the process technology

#### PDK includes

- Process documentation
- Device models
- Layout design rules
- Design libraries: Pre-designed circuit blocks, such as standard cells, memories, and IO cells
- ...

### 3. Related: open-source SRAM Compiler: OpenRAM



Create SRAM for ASIC design:

- Layout
- Netlists
- Timing and power models
- Placement and routing models





16 Kb (2 banks x 128 words x 64 bits)


In this exercise, the memory macros used for the instruction and data memories are already generated for you.



### Environment settings + Backend tools installation

### **Backend environment setup:**

- Enter the folder: Backend/
- In that folder, run the command source setup.sh. This sets up the backend tools and Jupyter.
- In the terminal, run the command jupyter-lab. This opens the JupyterLab web interface.

PS1: Every time you want to use the backend tools and Jupyter, run source setup.sh first to check the integrity of the toolchain.

PS2: If you want to use JupyterLab at home, see the manual on how to open the Jupyter notebook remotely (CA\_Documents/Jupyter\_VNC\_Manual.pdf).

## Synthesis: a critical step in the backend flow





- Pass the RUN CYCLE-ACCURATE SIMULATION and prepare your source code in RTL\_SOLUTION\* folders.
- Follow the RUN BACKEND FLOW in session\_guide.pdf and the instructions in CA\_RISCV.ipynb
- In **Session 1** we only do the **synthesis stage**.

# Today's session: task summary

### With session\_guide.pdf

- Study the RUN CYCLE-ACCURATE SIMULATION and RUN BACKEND FLOW
- Follow the TASKS TO BE DONE and fill in the report.docx

Copy-paste your finished /RTL/\*.v into the SOLUTION folders.

- Obj-1&2 → RTL\_SOLUTION1\_simple\_program\_and\_MULT1
- Obj-3 → RTL\_SOLUTION2\_multiplication\_support\_MULT2

#### Note:

1. We use universal test patterns for fair grading.
2. Do not modify cpu\_tb.v or sky130\_sram\_2rw.v.
3. Do not modify \*mem\_content.txt.