This project implements a 32-bit RISC-V RV32I pipelined processor in Verilog, targeting simulation in ModelSim. It is a classic 5-stage in-order pipeline with forwarding and hazard detection.
The design is split into the following source files:
| File | Description |
|---|---|
| `dut.v` | Top-level pipeline datapath: connects all stages |
| `control.v` | Main decoder: generates control signals from opcode |
| `alu.v` | 32-bit ALU with 10 operations |
| `alu_control.v` | ALU decoder: maps alu_op + funct3 + funct7 to alu_ctrl |
| `imm_gen.v` | Immediate generator: sign-extends all 5 encoding types |
| `register_bank.v` | 32x32 register file, write on negedge, read combinational |
| `data_memory.v` | Byte-addressed data memory with funct3-controlled access width |
| `forwarding_unit.v` | EX-EX and MEM-EX forwarding |
| `hazard_detection.v` | Load-use hazard detection and stall generation |
| `PC.v` | Program counter register |
| `instruction_memory.v` | Instruction memory, loaded from .mem file |

The processor is instantiated inside top.sv which provides clock, reset, and memory file loading for simulation.
The pipeline follows the classic 5-stage RISC-V structure. Each stage is separated by a set of pipeline registers that latch values on the rising clock edge.
IF → IF/ID → ID → ID/EX → EX → EX/MEM → MEM → MEM/WB → WB
The PC register outputs the current instruction address. The instruction memory reads the instruction at that address combinationally. On each rising clock edge, the PC advances to mux_to_pc, which selects between PC+4, a branch target, or a jump target.
The IF/ID pipeline register latches the instruction and PC address. It is frozen when a load-use stall is detected (if_id_stall) and flushed when a branch is taken or a jump resolves (pipeline_flush).
Signals latched into IF/ID:
- `if_id_instruction`: raw 32-bit instruction word
- `if_id_pc_addr`: address of the fetched instruction

The instruction word is decoded by the control unit using the opcode field ([6:0]). The register file is read using rs1 and rs2 fields from the instruction. The immediate generator sign-extends the immediate value according to the instruction encoding type.
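As a rough illustration of what the immediate generator does for the five encoding types, here is a behavioural sketch in Python. It is not the project's `imm_gen.v`; the function names are hypothetical and the opcode values are the standard RV32I ones.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def imm_gen(instr: int) -> int:
    """Return the sign-extended immediate for a 32-bit RV32I instruction word."""
    opcode = instr & 0x7F
    if opcode in (0x13, 0x03, 0x67):            # I-type: OP-IMM, LOAD, JALR
        return sign_extend(instr >> 20, 12)
    if opcode == 0x23:                          # S-type: imm[11:5] | imm[4:0]
        imm = ((instr >> 25) << 5) | ((instr >> 7) & 0x1F)
        return sign_extend(imm, 12)
    if opcode == 0x63:                          # B-type: scrambled, bit 0 is implicit 0
        imm = (((instr >> 31) & 0x1) << 12) | (((instr >> 7) & 0x1) << 11) \
            | (((instr >> 25) & 0x3F) << 5) | (((instr >> 8) & 0xF) << 1)
        return sign_extend(imm, 13)
    if opcode in (0x37, 0x17):                  # U-type: LUI, AUIPC
        return sign_extend(instr & 0xFFFFF000, 32)
    if opcode == 0x6F:                          # J-type: scrambled, bit 0 is implicit 0
        imm = (((instr >> 31) & 0x1) << 20) | (((instr >> 12) & 0xFF) << 12) \
            | (((instr >> 20) & 0x1) << 11) | (((instr >> 21) & 0x3FF) << 1)
        return sign_extend(imm, 21)
    return 0                                    # R-type and others carry no immediate
```

For example, `imm_gen(0xFE000EE3)` (the word for `beq x0, x0, -4`) yields `-4`.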
The ID/EX pipeline register latches all decoded values. It is flushed on reset, load-use hazard (id_ex_flush), or branch/jump taken (pipeline_flush), inserting a NOP bubble into the pipeline.
Signals latched into ID/EX:
- Register data: `id_ex_rs1`, `id_ex_rs2`
- Register addresses: `id_ex_rs1_addr`, `id_ex_rs2_addr` (for forwarding)
- Destination register: `id_ex_rd`
- Instruction fields: `id_ex_funct3`, `id_ex_funct7`
- Immediate: `id_ex_imm`
- PC values: `id_ex_pc`, `id_ex_pc_plus4`
- Control signals: `id_ex_alu_op`, `id_ex_alu_src_a`, `id_ex_alu_src_b`, `id_ex_reg_write`, `id_ex_mem_to_reg`, `id_ex_mem_read`, `id_ex_mem_write`, `id_ex_branch`, `id_ex_jump`

The ALU decoder produces a 4-bit alu_ctrl signal from alu_op, funct3, and funct7. The forwarding unit selects the correct ALU inputs — either from the ID/EX registers or forwarded from EX/MEM or MEM/WB. The ALU computes the result, zero flag, and signed/unsigned comparison flags (lt, ltu). The branch target address is computed as PC + imm.
Forwarding encoding (fwd_a, fwd_b):
- `2'b00`: no forwarding, use ID/EX register data
- `2'b10`: forward from EX/MEM ALU result
- `2'b01`: forward from MEM/WB write data

ALU source mux encoding:
- `alu_src_a`: `00` = RS1, `01` = PC, `10` = zero (LUI)
- `alu_src_b`: `00` = RS2, `01` = immediate, `10` = 4 (PC+4)

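The two source muxes can be modelled in a few lines of Python. This is only a sketch of the encoding listed above; the function name is hypothetical, and the AUIPC case is an inference from the encoding rather than something confirmed by the HDL.

```python
MASK32 = 0xFFFFFFFF


def select_alu_operands(alu_src_a, alu_src_b, rs1, rs2, pc, imm):
    """Pick ALU inputs A and B according to the 2-bit mux selects."""
    a = {0b00: rs1, 0b01: pc, 0b10: 0}[alu_src_a]
    b = {0b00: rs2, 0b01: imm, 0b10: 4}[alu_src_b]
    return a & MASK32, b & MASK32


# LUI: A = zero, B = immediate, so the ALU add passes the immediate through.
a, b = select_alu_operands(0b10, 0b01, rs1=7, rs2=9, pc=0x40, imm=0x12345000)
lui_result = (a + b) & MASK32

# AUIPC (assumed): A = PC, B = immediate.
a, b = select_alu_operands(0b01, 0b01, rs1=7, rs2=9, pc=0x40, imm=0x1000)
auipc_result = (a + b) & MASK32
```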
Signals latched into EX/MEM:
- `ex_mem_alu_result`, `ex_mem_zero`, `ex_mem_lt`, `ex_mem_ltu`
- `ex_mem_rs2_data`: forwarded RS2 value for stores
- `ex_mem_branch_target`: computed branch destination
- `ex_mem_pc_plus4`: for JAL/JALR writeback
- `ex_mem_rd`, `ex_mem_funct3`
- Control signals: `ex_mem_reg_write`, `ex_mem_mem_to_reg`, `ex_mem_mem_read`, `ex_mem_mem_write`, `ex_mem_branch`, `ex_mem_jump`

The data memory is accessed for load and store instructions. The funct3 field controls access width (byte, halfword, word) and sign extension. The PC mux is also resolved here — branch condition is evaluated using the ALU flags and funct3, and the correct next PC is selected.
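The funct3-controlled width and sign extension can be sketched as below. This is a Python model, not `data_memory.v`; it assumes little-endian byte order (the RISC-V default) and uses the standard RV32I funct3 values for loads and stores. The function names are hypothetical.

```python
def store(mem: bytearray, addr: int, funct3: int, value: int) -> None:
    """SB (000), SH (001), SW (010): write the low 1, 2 or 4 bytes of value."""
    size = 1 << (funct3 & 0b011)
    mem[addr:addr + size] = (value & ((1 << (8 * size)) - 1)).to_bytes(size, "little")


def load(mem: bytearray, addr: int, funct3: int) -> int:
    """LB (000), LH (001), LW (010), LBU (100), LHU (101)."""
    size = 1 << (funct3 & 0b011)            # low two bits give the access width
    signed = (funct3 & 0b100) == 0          # bit 2 set means zero-extend
    raw = int.from_bytes(mem[addr:addr + size], "little", signed=signed)
    return raw & 0xFFFFFFFF                 # value as seen on the 32-bit read bus


mem = bytearray(16)
store(mem, 0, 0b010, 0xFFFF8080)            # sw
lb_value = load(mem, 0, 0b000)              # sign-extended byte
lbu_value = load(mem, 0, 0b100)             # zero-extended byte
```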
Branch condition logic:
| funct3 | Instruction | Condition |
|---|---|---|
| `000` | BEQ | `zero == 1` |
| `001` | BNE | `zero == 0` |
| `100` | BLT | `lt == 1` |
| `101` | BGE | `lt == 0` |
| `110` | BLTU | `ltu == 1` |
| `111` | BGEU | `ltu == 0` |

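The table maps directly onto a small lookup. The Python sketch below models the three ALU flags and the branch decision; the helper names are hypothetical and the flag computation is a plausible model rather than a transcription of `alu.v`.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Reinterpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu_flags(a: int, b: int):
    """Return (zero, lt, ltu) for the comparison of a against b."""
    a &= MASK32
    b &= MASK32
    zero = ((a - b) & MASK32) == 0
    lt = to_signed(a) < to_signed(b)        # signed comparison
    ltu = a < b                             # unsigned comparison
    return zero, lt, ltu


def branch_taken(funct3: int, zero: bool, lt: bool, ltu: bool) -> bool:
    """Evaluate the branch condition selected by funct3."""
    conditions = {
        0b000: zero,        # BEQ
        0b001: not zero,    # BNE
        0b100: lt,          # BLT
        0b101: not lt,      # BGE
        0b110: ltu,         # BLTU
        0b111: not ltu,     # BGEU
    }
    return conditions.get(funct3, False)    # unused funct3 values never branch
```

With `a = 0xFFFFFFFF` and `b = 1`, the signed view is -1 < 1, so BLT is taken while BLTU is not.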
PC mux priority:
- Jump (`ex_mem_jump`) → `ex_mem_alu_result` (computed target)
- Branch taken → `ex_mem_branch_target`
- Default → `pc_addr + 4`

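Expressed as code, the priority order is a simple if-chain. The sketch below is a Python model with a hypothetical function name; `taken` stands for the result of the branch condition logic.

```python
def next_pc(pc_addr, ex_mem_jump, ex_mem_branch, taken,
            ex_mem_alu_result, ex_mem_branch_target):
    """Select the next PC: jump first, then taken branch, then PC + 4."""
    if ex_mem_jump:
        return ex_mem_alu_result & 0xFFFFFFFF
    if ex_mem_branch and taken:
        return ex_mem_branch_target & 0xFFFFFFFF
    return (pc_addr + 4) & 0xFFFFFFFF
```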
Signals latched into MEM/WB:
- `mem_wb_alu_result`, `mem_wb_read_data`, `mem_wb_pc_plus4`
- `mem_wb_rd`, `mem_wb_reg_write`, `mem_wb_mem_to_reg`

The writeback mux selects the value to write back to the register file based on mem_to_reg:
| `mem_to_reg` | Source |
|---|---|
| `2'b00` | ALU result |
| `2'b01` | Memory read data |
| `2'b10` | PC+4 (JAL/JALR link) |

The register file write occurs on the falling clock edge, one half-cycle after the MEM/WB latch updates on the rising edge.
Load-use hazards are detected by hazard_detection.v when a load instruction is in EX and the instruction immediately behind it (in ID) needs its result. Resolution: freeze the PC and the IF/ID register for one cycle, and flush ID/EX to insert a NOP bubble.
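A minimal Python sketch of the load-use check, assuming the textbook condition. The `if_id_rs1` / `if_id_rs2` argument names and the exclusion of x0 are assumptions, not taken from `hazard_detection.v`.

```python
def load_use_stall(id_ex_mem_read, id_ex_rd, if_id_rs1, if_id_rs2):
    """True when the instruction in ID reads the register a load in EX will write.

    When this is True, the PC and IF/ID are frozen (if_id_stall) and
    ID/EX is flushed (id_ex_flush) for one cycle.
    """
    return (bool(id_ex_mem_read)
            and id_ex_rd != 0                       # assumed: x0 never causes a stall
            and id_ex_rd in (if_id_rs1, if_id_rs2))
```

For `lw x5, 0(x1)` followed by `add x6, x5, x2`, the call `load_use_stall(1, 5, 5, 2)` returns `True`. Note that this simple form also stalls when the rs2 field matches but the instruction does not actually read rs2 (for example an I-type), which costs a cycle but is functionally safe.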
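Data hazards between an instruction and its producers one or two instructions ahead are handled by forwarding_unit.v. EX/MEM forwarding takes priority over MEM/WB forwarding when both match the same destination register, because the EX/MEM value is the more recent one.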
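The forwarding priority can be sketched in Python as follows. The return values use the `fwd_a` / `fwd_b` encoding given earlier (`10` for EX/MEM, `01` for MEM/WB); the function name and the x0 exclusion are assumptions following the textbook formulation, not a copy of `forwarding_unit.v`.

```python
def forward_select(rs_addr, ex_mem_reg_write, ex_mem_rd,
                   mem_wb_reg_write, mem_wb_rd):
    """Return the 2-bit forwarding select for one ALU source register."""
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == rs_addr:
        return 0b10     # newest value: EX/MEM ALU result
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == rs_addr:
        return 0b01     # older value: MEM/WB write data
    return 0b00         # no hazard: use the ID/EX register data


# Both stages write x3: the EX/MEM match must win.
fwd_a = forward_select(3, True, 3, True, 3)
```

The same function is evaluated twice per cycle, once with `id_ex_rs1_addr` for `fwd_a` and once with `id_ex_rs2_addr` for `fwd_b`.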
Branches resolve in the MEM stage. When a branch is taken or a jump executes, two incorrectly fetched instructions are in IF and ID. These are flushed by zeroing IF/ID and ID/EX (pipeline_flush).
Branch penalty: 2 cycles on every taken branch or jump.
R-type instructions that depend on the result of a previous R-type instruction in the immediately preceding cycle do not forward correctly in all cases. The fwd_a signal (ALU input A) does not always assert when it should, causing the instruction to read a stale value from the register file. The root cause is a 1-cycle misalignment between when id_ex_rs1_addr is valid after a branch flush and when the result is available in the forwarding network. This affects loops and any sequence of dependent R-type instructions.
Workaround: Insert an independent instruction between dependent R-type instructions, or use immediate instructions (addi, etc.) instead.
Branch Penalty Not Hidden
The 2-cycle branch penalty is not mitigated. No branch prediction or delayed branching is implemented. Every taken branch wastes 2 cycles.
There is no CSR (Control and Status Register) support, no trap mechanism, and no privilege levels. ecall, ebreak, and all CSR instructions are unsupported.
Instruction and data memory are separate modules (Harvard architecture). Data memory is byte-addressed starting at address 0. There is no memory-mapped I/O.
The following base RV32I instructions are implemented and tested:
- I-type: `addi`, `slti`, `sltiu`, `xori`, `ori`, `andi`, `slli`, `srli`, `srai`, `lw`, `lh`, `lb`, `lhu`, `lbu`, `jalr`
- S-type: `sw`, `sh`, `sb`
- B-type: `beq`, `bne`, `blt`, `bge`, `bltu`, `bgeu`
- R-type: `add`, `sub`, `sll`, `slt`, `sltu`, `xor`, `srl`, `sra`, `or`, `and` (forwarding issues, see above)
- U-type: `lui`, `auipc`
- J-type: `jal`

The following are not implemented: fence, ecall, ebreak, all CSR instructions, and the M/A/F/D extensions.
A two-pass Perl assembler (rvc.pl) is provided that translates RISC-V assembly to the .mem format expected by the simulation. It supports all base RV32I instructions and the standard pseudoinstructions (li, mv, la, not, neg, j, ret, call, tail, beqz, bnez, blez, bgez, bltz, bgtz, bgt, ble, bgtu, bleu, seqz, snez, sltz, sgtz).
Usage:
perl rvc.pl <input.asm> <output.mem>
Note: Immediate values must be in decimal. Hex literals (0x...) are not currently supported and will be silently misassembled.
A key focus of this project was building a repeatable, automated simulation flow using Perl and ModelSim, reducing manual steps and enabling consistent regression testing across design iterations.
A two-pass Perl assembler translates RISC-V assembly source files into the .mem binary format consumed by the ModelSim simulation environment.
Design decisions:
- Two-pass architecture: Pass 1 strips comments, resolves label addresses, and computes instruction sizes (accounting for pseudoinstructions that expand to 2 words such as `li`, `la`, `call`, and `tail`). Pass 2 performs the actual encoding with all label addresses known, enabling correct forward references in branches and jumps.
- Recursive pseudo expansion: Pseudoinstructions are expanded by recursively calling the `assemble` subroutine with rewritten operands, keeping encoding logic in a single place and avoiding duplication.
- Immediate encoding correctness: B-type and J-type immediates use the RISC-V scrambled bit layout. The assembler correctly handles the upper/lower split for 32-bit `li`, accounting for sign extension of the lower 12 bits when constructing the `lui` upper immediate.
- Padding: Output is zero-padded to fill the full memory size, matching the array dimensions declared in the HDL memory module.

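The upper/lower split for a 32-bit `li` is the subtle part: `addi` sign-extends its 12-bit immediate, so whenever bit 11 of the constant is set the `lui` immediate has to be one higher to compensate. The Python sketch below shows the arithmetic; it illustrates the technique and is not the Perl code in rvc.pl.

```python
def split_li(value: int):
    """Split a 32-bit constant into (lui_imm20, addi_imm12) for `lui` + `addi`."""
    value &= 0xFFFFFFFF
    lo = value & 0xFFF
    if lo >= 0x800:
        lo -= 0x1000                         # addi will see this as a negative immediate
    hi = ((value - lo) >> 12) & 0xFFFFF      # lui immediate, compensated for lo's sign
    return hi, lo


def rebuild(hi: int, lo: int) -> int:
    """What the lui/addi pair actually produces in the register."""
    return ((hi << 12) + lo) & 0xFFFFFFFF
```

For `0x12345FFF` the split is `lui 0x12346` followed by `addi -1`, not `lui 0x12345`, because the `addi` subtracts 1 from the upper value.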
Supported pseudoinstructions: nop, mv, li (full 32-bit), la, not, neg, seqz, snez, sltz, sgtz, beqz, bnez, blez, bgez, bltz, bgtz, bgt, ble, bgtu, bleu, j, jr, ret, call, tail.
Known limitation: Immediate values must be decimal. Hex literals (0x...) are not currently parsed and will be misassembled silently.
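Until hex parsing is added to the assembler, one possible workaround is to rewrite hex literals as decimal before assembling. The Python sketch below is a hypothetical preprocessing step, not part of this repository, and it assumes `#` starts a comment in the assembly source.

```python
import re

# Optional minus sign, then 0x / 0X and hex digits, not glued to a preceding identifier.
HEX_LITERAL = re.compile(r"(?<![\w.])(-?)0[xX]([0-9a-fA-F]+)\b")


def hex_to_decimal(line: str) -> str:
    """Replace hex literals in the code part of a line with decimal equivalents."""
    code, sep, comment = line.partition("#")     # leave comment text untouched
    code = HEX_LITERAL.sub(lambda m: m.group(1) + str(int(m.group(2), 16)), code)
    return code + sep + comment
```

Applied line by line to a source file, this turns `addi x1, x0, 0x10` into `addi x1, x0, 16` while leaving register names and labels alone.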
A Perl script automates the full ModelSim compile and simulate flow, replacing a static .do file with a dynamically generated one. This was motivated by the need to support multiple source file configurations without manually editing the simulation script each time.
What it does:
- Reads a `sources.txt` file listing all HDL source paths to compile, allowing the file list to be updated without touching the script itself.
- Dynamically constructs a `vlog` compile command with all source files, then a `vsim` invocation with the `.mem` file passed as a plusarg (`+MEMFILE=`).
- Generates a `.do` script on the fly that includes wave group definitions, a `run -all`, and `wave zoom full`, then invokes ModelSim with it via `system()`.
- Integrates with VS Code via a `tasks.json` build task, allowing the simulation to be launched with a single keypress.

Design benefit: The source list (sources.txt), the simulation parameters (passed as arguments), and the script logic are kept separate, so the runner can be reused across different test programs and design configurations without modification. This is the kind of multi-project compatibility expected of automation tooling.
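To make the idea concrete, here is a sketch of the script generation step. It is written in Python for illustration, whereas the actual runner is Perl; the function name, the wave group contents, and the signal path are hypothetical. The emitted commands follow the compile and simulate flow listed in this README.

```python
def build_do_script(sources, memfile, wave_groups):
    """Return the text of a ModelSim .do script for one simulation run."""
    lines = [
        "vdel -all",                            # clean previous compilation
        "vlib work",                            # create work library
        "vmap work work",                       # map library
        "vlog -sv " + " ".join(sources),        # compile all HDL sources
        f"vsim top +MEMFILE={memfile}",         # load the test program via plusarg
    ]
    for group, signals in wave_groups.items():
        lines.append(f"add wave -group {group} " + " ".join(signals))
    lines += ["run -all", "wave zoom full"]
    return "\n".join(lines) + "\n"


script = build_do_script(
    sources=["dut.v", "alu.v", "top.sv"],
    memfile="scripts/p1.mem",
    wave_groups={"alu": ["/top/dut/alu_ctrl"]},  # hypothetical hierarchy path
)
```

Because the source list and the memory file are plain arguments, the same function serves any test program or file configuration.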
Usage:
perl run_waveform.pl scripts/p1.mem
The HDL is compiled and simulated using ModelSim (Intel FPGA Edition). The flow is:
vdel -all ← clean previous compilation
vlib work ← create work library
vmap work work ← map library
vlog -sv <files> ← compile all HDL sources
vsim top +MEMFILE= ← launch simulation with memory initialisation
Wave groups are organised by pipeline stage and module (control, alu, dut_signals, reg_bank) for efficient signal navigation during debug. The +MEMFILE plusarg mechanism allows different test programs to be loaded without recompiling the HDL, which is important for regression testing where the same design is exercised with many different stimulus files.
Each test program is a hand-written RISC-V assembly file that exercises a specific subset of the instruction set. Tests follow a common convention: on completion, register x10 holds 1 for pass or 0 for fail, and the processor halts in an infinite loop (j end). This makes result extraction straightforward — reading x10 from the register file at end of simulation gives an unambiguous pass/fail indicator without requiring a complex self-checking testbench.
This approach mirrors directed stimulus methodology used in gate-level verification, where targeted sequences are written to exercise specific datapath conditions (hazards, forwarding paths, branch penalties) and verify conformance to the ISA specification.