# IITB-RISC: The Pipelined Implementation Design Report

# **Submitted by**

**Shashank OV**

14D070021 IIT Bombay

shashankov@ee.iitb.ac.in

**Yogesh Mahajan**

14D070022 IIT Bombay

y.mahajan456@gmail.com

**Pratik Brahma**

14D070003 IIT Bombay

pratikbrahma96@gmail.com

**Avineil Jain**

14D170002 IIT Bombay

avineil96@gmail.com

Note: This document describes the work completed before our presentation to Sir. Later edits and the code written afterwards will be submitted separately.

# THE DATA PATH



## **DATA PATH COMPONENTS**

#### Instruction Fetch

- 1. PC: Stores the PC value of the current instruction.
- 2. IM: Instruction memory from which the instruction is fetched.
- 3. PC++: Increments the PC value by 1 to give the address of the next sequential instruction.
- 4. Branch Prediction Table Block: Stores the PCs of previously encountered BEQ instructions and their branch target addresses, along with only the most recent history of each instruction (whether the branch was taken or not).
- 5. MUX: Chooses between the taken address given by the branch prediction table and PC++. The table block also generates the control signal for this mux.
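The behaviour described for the branch prediction table can be sketched as a one-bit history table. The Python model below is a minimal sketch, not the VHDL block: the class and method names are hypothetical, and it assumes 16 entries (what a 4-bit BLUT index can address) with first-in, first-out replacement, neither of which this report specifies.

```python
class BranchPredictionTable:
    """One-bit branch history table (hypothetical model of the BPT block)."""

    def __init__(self, size=16):
        self.size = size      # assumed: 16 entries, addressable by a 4-bit index
        self.entries = []     # each entry is [beq_pc, branch_target, last_taken]

    def lookup(self, pc):
        """Return (next_pc, predicted_taken) for the instruction fetched at pc."""
        for entry_pc, target, taken in self.entries:
            if entry_pc == pc:
                # Predict whatever the branch did the last time it executed.
                return (target if taken else (pc + 1) & 0xFFFF), taken
        # Not in the table: predict not taken and fall through to PC+1.
        return (pc + 1) & 0xFFFF, False

    def update(self, pc, target, taken):
        """Record the latest outcome of the BEQ at pc, adding an entry if needed."""
        for entry in self.entries:
            if entry[0] == pc:
                entry[1], entry[2] = target, taken
                return
        if len(self.entries) == self.size:
            self.entries.pop(0)   # assumed FIFO replacement: evict the oldest entry
        self.entries.append([pc, target, taken])
```

For example, `lookup(10)` on an empty table returns `(11, False)`; after `update(10, 4, True)` the same lookup returns `(4, True)`.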

#### Instruction Decoder

- 1. SE: Sign-extends the 6-bit or 9-bit immediate data to 16 bits. It also observes the extra LLI bit, in which case it only pads the immediate data with zeros.
- 2. SE+PC: Adds the SE value to the PC value of the current instruction.
- 3. Decoder: Generates the control signals for the current instruction.
- 4. MUX: Steers either PC+1 or the (SE+PC) value to the PC. (SE+PC) is chosen for the JAL instruction.
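The SE unit and the SE+PC adder amount to a few bit operations. A minimal Python sketch, assuming the immediate arrives as an unsigned bit field and that all values are 16-bit two's complement; `sign_extend`, `se_block` and `se_plus_pc` are hypothetical names:

```python
MASK16 = 0xFFFF

def sign_extend(value, width):
    """Sign-extend a width-bit field to 16 bits (result as an unsigned 16-bit int)."""
    value &= (1 << width) - 1
    if value & (1 << (width - 1)):   # sign bit set: the field is negative
        value -= 1 << width
    return value & MASK16

def se_block(imm, nine_bit, lli):
    """6- or 9-bit sign extension, or zero padding when the LLI bit is set."""
    width = 9 if nine_bit else 6
    if lli:
        return imm & ((1 << width) - 1)   # LLI: upper bits are simply zeros
    return sign_extend(imm, width)

def se_plus_pc(pc, imm, nine_bit):
    """The SE+PC adder output, wrapped to 16 bits."""
    return (pc + se_block(imm, nine_bit, lli=False)) & MASK16
```

For example, `se_block(0b111110, nine_bit=False, lli=False)` gives `0xFFFE` (that is, -2), so `se_plus_pc(10, 0b111110, False)` gives 8, a target two instructions back.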

#### Register Read

- 1. LS: Left-shifts the sign-extended value (used for LHI).
- 2. RF: Register file containing the registers R0 to R7.
- 3. MUX1: Steers inputs between (SE+PC) and LS. SE+PC is chosen during the JAL instruction and LS during the LHI instruction.
- 4. MUX2: The mux before AR2. It steers inputs between AR2 from the ID\_RR register and the register address output by the LM\_SM block.
- 5. MUX3: The mux after DO1. It steers inputs between DO1 and the ALU output, which supplies the incremented value DO1+1 during LM and SM instructions.
- 6. Hazard Mux: Selects the value loaded into the PC; controlled by Hazard\_RR.
- 7. Hazard\_RR: The hazard block in the RR stage. It selects the input to the PC register according to the instruction:
  - a. SE during LLI when R7 is the destination.
  - b. Left-shifted SE (LS) during LHI when R7 is the destination.
  - c. DO1 during the JLR instruction.
- 8. LM\_SM mux: Chooses the input to the LM\_SM block, either the data coming directly from the decoder or the data from the ID\_RR pipeline register.
- 9. LM\_SM block: Carries out the LM and SM operations by generating the register addresses and control signals they need.
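The selection made by Hazard\_RR can be written as a small priority function. This is a hypothetical Python model: the function name, the use of instruction names as strings and the PC+1 default for every other instruction are illustrative assumptions.

```python
R7 = 0b111

def hazard_rr_next_pc(instr, ar3, pc_plus_1, se, ls, do1):
    """Value the RR-stage hazard mux sends to the PC register (hypothetical model).

    instr: decoded instruction name, e.g. "LLI", "LHI", "JLR", "ADD".
    ar3:   destination register address of the instruction in the RR stage.
    """
    if instr == "LLI" and ar3 == R7:
        return se        # LLI writes R7: the zero-padded immediate is the new PC
    if instr == "LHI" and ar3 == R7:
        return ls        # LHI writes R7: the left-shifted immediate is the new PC
    if instr == "JLR":
        return do1       # JLR jumps to the address read from the register file
    return pc_plus_1     # no RR-stage redirect: normal sequential flow
```

Arithmetic instructions that write R7 fall through to the default here, since they are redirected later by Hazard\_EX.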

#### Execution

- 1. ALU: Performs ADD, NAND and comparison operations.
- 2. Flags: Contains the carry, zero and overflow flags.
- 3. Forwarding Blocks: Check whether the instruction in the RR\_EX pipeline register depends on an instruction held in a later pipeline register, and control the forwarding muxes accordingly.
- 4. MUX1: The upstream of the two muxes feeding the ALU's second input. It selects DO2 for arithmetic operations or the sign-extended immediate for address calculation.
- 5. MUX2: The mux directly before the ALU's second input. Controlled by the LM\_SM block, it supplies the constant 1 during LM and SM so that the address is incremented.
- 6. Hazard Mux: Mux controlled by the hazard logic block in the execution stage.
- 7. Staller: Stalls the pipeline registers when the instruction immediately after a load depends on the loaded value.
- 8. Hazard\_EX: Handles arithmetic instructions whose destination is R7. Through the hazard mux it loads the PC with the required value (the ALU result) and flushes the pipeline.
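A behavioural sketch of the ALU with its carry and zero flags, in Python. The operation names are hypothetical stand-ins for the 2-bit ALU control code, the comparison is assumed to be an equality test for BEQ, and the overflow flag is omitted for brevity.

```python
MASK16 = 0xFFFF

def alu(op, a, b):
    """Return (result, carry, zero) for 16-bit operands a and b."""
    if op == "ADD":
        total = a + b
        result, carry = total & MASK16, total > MASK16   # carry out of bit 15
    elif op == "NAND":
        # No carry is produced; the flag control bits are assumed to block the write.
        result, carry = ~(a & b) & MASK16, False
    elif op == "CMP":
        # Assumed equality compare for BEQ: the zero flag is set when a == b.
        result, carry = (a - b) & MASK16, False
    else:
        raise ValueError(f"unknown ALU operation {op!r}")
    return result, carry, result == 0
```

During LM and SM the same adder increments the address: with MUX2 selecting 1, `alu("ADD", addr, 1)` yields the next address.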

#### Memory Write/Read

- 1. MEM: Data memory that load instructions read from and store instructions write to.
- 2. MUX1: Mux on the data memory address input. Steers inputs between the ALU output (address calculation) and DO1 (during LM or SM).
- 3. Hazard MUX: Mux controlled by the hazard block in the MM stage.
- 4. Hazard\_MM: Handles load instructions whose destination is R7. It loads the PC with the correct value by controlling the hazard mux.

#### Write Back

- 1. Flags\_user: The flags visible to the user.
- 2. WB mux: Decides the data to be written to the register file. The inputs, by instruction, are:
  - a. ALU output: arithmetic instructions.
  - b. LS\_PC: LHI instruction.
  - c. SE: LLI instruction.
  - d. PC+1: JAL and JLR instructions.
  - e. Mem\_out: LW and LM instructions.
- 3. R7 mux: Decides the input to R7 in the register file. The inputs, by instruction, are:
  - a. DO1: JLR instruction.
  - b. PC+1: normal program flow.
  - c. LS\_PC: BEQ instruction when the branch is taken.
- 4. Conditional and Hazard Control: Handles the following cases:
  - a. A conditional arithmetic instruction is not executed and a later instruction in the pipeline depends on it.
  - b. A conditional arithmetic instruction whose destination is R7.
  - c. A JLR instruction whose destination is R7.
  - d. A JAL instruction whose destination is R7.
  - e. Checks whether the branch of a BEQ instruction is taken.
  - f. Flushes the pipeline if any of the above conditions holds.
- 5. Hazard Mux: Controlled by the above logic block; decides the input to the PC register.
  - a. (SE+PC) when the branch of a BEQ is taken.
  - b. PC+1 when a JLR or JAL hazard is seen.
  - c. Otherwise, the default value on input 0.
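Because the WB mux has five data sources, its select field in the pipeline registers is 3 bits wide. The Python sketch below shows the selection; the code assigned to each source is an assumption, since this report does not list the encodings, and `WB_SEL` and `wb_mux` are hypothetical names.

```python
# Hypothetical 3-bit WB_mux encodings (the report does not give the real codes).
WB_SEL = {"arith": 0, "LHI": 1, "LLI": 2, "JAL": 3, "JLR": 3, "LW": 4, "LM": 4}

def wb_mux(sel, alu_out, ls_pc, se, pc_plus_1, mem_out):
    """Return the data written to the register file for a given select code."""
    inputs = (alu_out, ls_pc, se, pc_plus_1, mem_out)
    if not 0 <= sel < len(inputs):
        raise ValueError(f"unused WB_mux code {sel}")
    return inputs[sel]
```

For example, `wb_mux(WB_SEL["LW"], 5, 6, 7, 8, 9)` returns 9, the memory output.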

## **CONTROL SIGNALS FLOW**

#### Instruction Fetch

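- 1. PC is incremented and the result is sent to the PC register
- 2. The instruction is fetched from instruction memory
- 3. The Branch Prediction Table (BPT) supplies the predicted next PC when it holds an entry for the current instruction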

#### Instruction Decode

- 1. Creation of Signals:
  - a. Control signals for the multiplexers in the other pipeline stages
  - b. Sign-extended value of the immediate data in the instruction, if present
  - c. The sum of the sign-extended value and the current PC value (SE + PC)
  - d. Control line for the mux that decides whether PC+1 or (SE+PC) is sent to the PC register; (SE+PC) is sent when the decoder encounters a JAL instruction
  - e. Clear bit that clears the pipeline register (IF-ID) when a JAL instruction arrives
  - f. Control signal for SE that decides whether to sign-extend 6-bit or 9-bit immediate data
  - g. LLI bit for SE, which instead pads the immediate data with zeros to make it 16 bits
  - h. Addresses for the register file (AR1, AR2, AR3)
  - i. The LM or SM register list for the LM\_SM block, marking the registers whose data is to be stored or loaded
  - j. The ALU control bits, which select the ALU operation
  - k. Flag control bits, which enable or disable the flag registers
  - l. The condition bits, which indicate a conditional arithmetic instruction
  - m. Register write and memory write signals
  - n. Control signal for the BPT indicating whether the instruction is of BEQ type
  - o. The index of the instruction in the Branch Prediction Table and whether its branch was taken, both produced by the table

#### Register Read

- 1. Consumption of Signals:
  - a. The LM or SM register list, together with the LM or SM bit that activates the block, is consumed by the LM\_SM block
  - b. AR1 and AR2 addresses for the register file
  - c. The control signal for the mux that steers inputs between (SE+PC) and the left-shifted value of the sign-extended immediate data
  - d. The LM\_SM block creates the control signal for the mux before AR2, which steers the input between the original AR2 and the AR2 generated by the LM\_SM block (the block clears the register-list bits one by one as it proceeds)
  - e. The LM\_SM block also creates the signal for the mux that decides the data sent to the DO1 register: the original DO1 value or the auto-incremented value of DO1 used during LM and SM
  - f. RR\_PC, created by the hazard block in the RR stage, controls the mux that decides whether the input to PC is PC+1, the SE value (when R7 is AR3 in LLI), the left-shifted SE (when R7 is AR3 in LHI) or DO1 (when the instruction is JLR)
- 2. Creation of Signals: DO1 and DO2, read from the register file

#### Execution

- 1. Consumption of Signals:
  - a. Two control signals for the forwarding muxes, created by the forwarding logic block
  - b. The control signal for the mux that steers input between DO2 and the sign-extended value. DO2 is used for arithmetic operations, whereas the sign-extended value is used for address calculation
  - c. The control signal, created by the LM\_SM block, for the mux that selects 1 so that the address is incremented by 1 on every step of an LM or SM instruction
  - d. The control signal for the hazard mux
  - e. The control signal, created by the LM\_SM block, for the mux that selects the register address stored as AR3, so that data loaded from memory during LM is written to the correct register
- 2. Creation of Signals:
  - a. The flag register values after the ALU operation
  - b. The ALU output data after the ALU operation

#### Data Memory Write/Read

- 1. Consumption of Signals:
  - a. Memory write control for the data memory during instructions such as SW or SM
  - b. The control signal, created by the LM\_SM block, for the mux that steers inputs between the ALU output and DO1, the address maintained during LM or SM
  - c. The control signal for the hazard mux in the Memory Write stage
- 2. Creation of Signals:
  - a. The data read from the data memory

#### Write Back

- 1. Consumption of Signals:
  - a. Flag values to be written in the user flag registers
  - b. The control signals for the mux which decides the data to be written in the register file according to the instruction (ALU output, left shifted sign extended value, (SE+PC), SE, PC+1, Memory out data)
  - c. Flag control bits, condition bits and control signals are sent to the hazard logic block in the WB stage
  - d. The control signal for the mux that decides the input to R7 in the register file. This signal is created by the hazard block

The data after the WB mux, AR3 and some of the control signals (the valid bits and the WB mux control signals) are also sent to a temporary register, which holds this data for the forwarding logic (the [WBT] entry in the forwarding priority).

## The LM and SM

The LM\_SM block is responsible for generating the control signals that run the Load Multiple and Store Multiple instructions. These are inherently multi-cycle instructions, so they are performed by halting the rest of the pipeline.

The LM SM block has the following inputs and outputs:

#### Inputs

- 8-bit data (from the instruction to be fed to the priority encoder)
  - A mux is connected to this input because the register list for LM and SM (which has the same format) reaches the block in different clock cycles; the details are explained later.
- LM bit and SM bit
  - The LM and SM bits indicate that the instruction is LM or SM and activate the block, so that the priority encoder can start producing addresses
- clk and reset

#### Outputs

- Register Address: AR2 in case of SM, AR3 in case of LM
- Clear and disable for the pipeline registers
- RF\_DO1\_mux: Controls the mux connected to the input of DO1 register in RR\_EX
- ALU2\_mux: Controls the mux connected to the second input of ALU, decides when +1 should be the input to the ALU
- AR3\_mux: Used in case of LM, this mux controls the data that is stored in AR3 in EX\_MM
- AR2\_mux: Used in case of SM, this mux controls the input to the register file for AR2
- mem\_in\_mux: Controls the input to the address of the memory which is usually the output of the ALU except in LM or SM where it is DO1

Note: The LM\_SM block starts outputting addresses one cycle after it is activated.

The block has been built using an FSM which goes into different states, depending on whether the instruction is LM or SM.

#### FSM Logic for LM

- The block gets activated when the current instruction is in the RR (register read) stage, where it is in state S1. The control bits for the memory address input, AR3 and the ALU input are set to '1' and passed through the pipeline register.
- It then moves to state S2, where the mux connected to the input of DO1 starts accepting the output of the ALU (DO1 + 1), and the block starts outputting AR3 addresses, which are written into the register file in the write back stage. The disable signal is high, so RR\_EX, ID\_RR, IF\_ID and PC are disabled.

- DO1 in RR\_EX has its own enable signal, since it must keep updating during the LM\_SM process while the rest of RR\_EX is disabled.
- The cycle continues until the last set bit in the priority encoder (PE) input is cleared. The valid signal then goes low, and the one instruction currently in the RR\_EX register has to be cleared. The disable signal also goes low, meaning the pipeline flow resumes.

#### FSM Logic for SM

- The block gets activated when the current instruction is in the decode stage.
- In the next clock cycle, the LM\_SM block starts outputting the AR2 address and thus, SM begins. The control bits follow the same pattern as LM, except the mux controlling the input to DO1 starts accepting the ALU output one cycle later. When the valid signal goes low, the last set of data is in the RR\_EX register and so, in SM, no clear signal is required. The disable signal goes low and pipeline resumes.
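The register addresses produced by the LM\_SM block's priority encoder can be modelled in Python as below. This is a minimal sketch, not the FSM itself: it assumes bit i of the 8-bit register list selects register Ri and that the lowest-numbered register has priority, and it pairs each register with consecutive memory addresses starting from the base address (the DO1+1 increments described above). The function names are hypothetical.

```python
def lm_sm_registers(reg_list):
    """Yield register addresses in priority-encoder order (lowest set bit first)."""
    remaining = reg_list & 0xFF
    while remaining:
        lowest = remaining & -remaining       # isolate the lowest set bit
        yield lowest.bit_length() - 1         # its position is the register address
        remaining &= remaining - 1            # clear it, as the block does each cycle

def lm_sm_schedule(reg_list, base_addr):
    """Pair each selected register with its memory address: base, base+1, ..."""
    return [(reg, (base_addr + k) & 0xFFFF)
            for k, reg in enumerate(lm_sm_registers(reg_list))]
```

For example, `lm_sm_schedule(0b00000110, 100)` gives `[(1, 100), (2, 101)]`; the loop ends when no set bits remain, which corresponds to the valid signal going low.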

# PIPELINE REGISTER CONTENTS

| Pipeline Register | Components          | Length (bits) |
|-------------------|---------------------|---------------|
| IFID              | PC                  | 16            |
|                   | Instruction         | 16            |
|                   | PC++                | 16            |
| IDRR              | PC                  | 16            |
|                   | SEPC                | 16            |
|                   | SE                  | 16            |
|                   | CL (Control Lines)  | 11            |
|                   | • LS_PC             | 1             |
|                   | • BEQ               | 1             |
|                   | • LM                | 1             |
|                   | • LW                | 1             |
|                   | • SE_DO2            | 1             |
|                   | • WB_mux            | 3             |
|                   | • valid             | 3             |
|                   | Opcode              | 4             |
|                   | ALU Control         | 2             |
|                   | Flag Control        | 3             |
|                   | Condition bits      | 2             |
|                   | Write Bits          | 2             |
|                   | AR1                 | 3             |
|                   | AR2                 | 3             |
|                   | AR3                 | 3             |
|                   | PC++                | 16            |
|                   | LM input            | 8             |
|                   | BLUT (Branch Table) | 4             |
| RREX              | PC                  | 16            |
|                   | LSPC                | 16            |
|                   | SE                  | 16            |
|                   | CL (Control Lines)  | 12            |
|                   | Г 250               | 4             |
|                   | • BEQ               | 1             |
|                   | • LW                | 1             |
|                   | • SE_DO2            | 1             |
|                   | • WB_mux            | 3             |
|                   | • Valid             | 3             |
|                   | • LM_SM control     | 3             |
|                   | ALU Control         | 2             |
|                   | Flag Control        | 3             |
|                   | Condition           | 2             |
|                   | Write               | 2             |
|                   | BLUT                | 4             |
|                   | DO1                 | 16            |
|                   | DO2                 | 16            |
|                   | AR1                 | 3             |
|                   | AR2                 | 3             |
|                   | AR3                 | 3             |
|                   | PC++                | 16            |
| EXMM              | LSPC                | 16            |
|                   | SE                  | 16            |
|                   | CL (Control Lines)  | 8             |
|                   | • BEQ               | 1             |
|                   | • WB_mux            | 3             |
|                   | • Valid             | 3             |
|                   | • LM_SM control     | 1             |
|                   | Flag Control        | 3             |
|                   | Condition           | 2             |
|                   | Write               | 2             |
|                   | AR1                 | 3             |
|                   | AR2                 | 3             |
|                   | AR3                 | 3             |
|                   | Flags               | 3             |
|                   | DO1                 | 16            |
|                   | DO2                 | 16            |
|                   | ALU output          | 16            |
| MMWB              | LSPC                | 16            |
|                   | SE                  | 16            |
|                   | CL (Control Lines)  | 7             |
|                   | • BEQ               | 1             |
|                   | • WB_mux            | 3             |
|                   | • valid             | 3             |
|                   | Flag Control        | 3             |
|                   | Condition           | 2             |
|                   | Write               | 2             |
|                   | AR1                 | 3             |
|                   | AR2                 | 3             |
|                   | AR3                 | 3             |
|                   | ALU output          | 16            |
|                   | Memory output       | 16            |
|                   | DO1                 | 16            |
|                   | 1 001               | 10            |
|                   | PC++                | 16            |

# HAZARD DETECTION AND MITIGATION



Data is forwarded through two multiplexers.

Each mux is controlled by a separate forwarding unit, which checks whether either operand of the instruction in the execution stage depends on the output of an instruction in a later stage, or on the current value of PC, and provides the corresponding data to the ALU input.

Each pipeline register holds valid bits for the operand and destination register addresses of its instruction. The forwarding block uses these bits to determine dependencies among instructions.

- 1. Forwarding Priority: R7 > [EX-MM] > [MM-WB] > [WBT]
- 2. Pseudocode

```
if [RR-EX](ARi) is valid:
    if [RR-EX](ARi) = "111":                                        -- R7: forward the PC
        Di <= [RR-EX](PC)
    elsif [EX-MM](AR3) is valid and [EX-MM](AR3) = [RR-EX](ARi):
        Di <= [EX-MM]{Write_Data}
    elsif [MM-WB](AR3) is valid and [MM-WB](AR3) = [RR-EX](ARi):
        Di <= [MM-WB]{Write_Data}
    elsif [WBT](AR3) is valid and [WBT](AR3) = [RR-EX](ARi):
        Di <= [WBT](D3)
    else:                                                           -- no dependency
        Di <= [RR-EX](DOi)
```

3. Write\_Data selection: Edit
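The forwarding pseudocode maps directly onto a priority check. A runnable Python sketch, assuming each later pipeline register is summarised as an `(AR3, valid, write_data)` tuple listed from [EX-MM] to [WBT]; `forward_operand` and its parameters are hypothetical names:

```python
R7 = 0b111

def forward_operand(ar, ar_valid, pc, later_stages, reg_value):
    """Choose ALU operand Di with priority R7 > [EX-MM] > [MM-WB] > [WBT].

    later_stages: (ar3, ar3_valid, write_data) tuples ordered [EX-MM], [MM-WB], [WBT].
    reg_value:    DOi as read from the register file in the RR stage.
    """
    if not ar_valid:
        return reg_value          # operand unused: no forwarding needed
    if ar == R7:
        return pc                 # R7 always reads the instruction's own PC
    for ar3, ar3_valid, write_data in later_stages:
        if ar3_valid and ar3 == ar:
            return write_data     # nearest producing instruction wins
    return reg_value              # no dependency: keep the register file value
```

With `later_stages = [(2, True, 100), (2, True, 200), (3, True, 300)]`, operand R2 gets 100 from [EX-MM] even though [MM-WB] also writes R2.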

## Instruction-Wise Hazards and Mitigation

#### Arithmetic Instructions

- 1. Arithmetic instructions not having R7 as destination
  - a. If the instruction is unconditional, data forwarding alone is sufficient.
  - b. If the instruction is conditional, assume it executes and check its condition in the writeback stage. If the condition turns out false and later instructions in the pipeline depend on it, flush the pipeline. Dependencies are found from the AR1/AR2 fields and valid bits in the pipeline registers.
- 2. Arithmetic instructions having R7 as destination
  - a. If the instruction is unconditional, update PC in the execution stage and R7 in the writeback stage. Flush the [IF-ID] and [ID-RR] registers.
  - b. If the instruction is conditional, assume it executes, i.e. update PC in the execution stage. If the condition turns out false (checked in the writeback stage), flush the pipeline.

#### Load Instructions

- 1. LW: Stall only if the instruction in [RR-EX] is LW and the instruction in [ID-RR] depends on its output.
- 2. If the instruction in the MM stage is a load (LM or LW) and its destination register is R7, flush the earlier pipeline registers and update PC.

#### Branch and Jump Instructions

- 1. JLR: Update PC in the RR stage and clear the [IF-ID] and [ID-RR] registers.
- 2. JAL: Update PC in the ID stage and clear the [IF-ID] register.
- 3. BEQ: BEQ instructions are predicted taken or not taken using the branch prediction table.
  - a. If an instruction is not found in the table, it is predicted not taken, since its branch address is not yet known. The condition is checked in the WB stage.
  - b. If the outcome is the opposite of the prediction, the table entry is updated, the PC is corrected and the pipeline is flushed.
  - c. New entries are added to the table in the decode stage.
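The WB-stage check of a BEQ prediction reduces to comparing the predicted and actual outcomes. A minimal Python sketch, assuming that a misprediction redirects the PC to the branch target (SE+PC) when the branch is actually taken and to PC+1 otherwise; `resolve_beq` is a hypothetical name:

```python
def resolve_beq(pc, target, predicted_taken, actual_taken):
    """Return (flush, next_pc) for a BEQ resolved in the WB stage.

    next_pc is None when the prediction was correct and the PC is left alone.
    """
    if predicted_taken == actual_taken:
        return False, None
    correct_pc = target if actual_taken else (pc + 1) & 0xFFFF
    return True, correct_pc   # mispredicted: flush and redirect the PC
```

On a misprediction, the table entry for `pc` would also be updated with `actual_taken`.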

## Hazard Mitigation Blocks

#### Stalling Block

- 1. If the instruction in [ID-RR] is LW and the instruction in [IF-ID] depends on its output, stall for one cycle.
- 2. Deassert the enables of PC, [IF-ID] and [ID-RR]. This creates two copies of the LW instruction, one in [ID-RR] and one in [RR-EX], so pass the same signal through a register to clear [ID-RR] in the next cycle.
- 3. If the instruction in [IF-ID] is SM, delay the SM start bit through a register and a mux.
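The stall condition and the resulting register controls can be sketched in Python as follows. This is a hypothetical model: the source operands of the [IF-ID] instruction are assumed to be available as (address, valid) pairs, and the control-signal names are illustrative, not the VHDL port names.

```python
def needs_load_stall(id_rr_is_lw, id_rr_ar3, if_id_sources):
    """True when the LW in [ID-RR] feeds an operand of the instruction in [IF-ID].

    if_id_sources: (address, valid) pairs for the [IF-ID] instruction's AR1/AR2;
    the valid bits mark which operands the instruction really uses.
    """
    if not id_rr_is_lw:
        return False
    return any(valid and addr == id_rr_ar3 for addr, valid in if_id_sources)

def stall_controls(stall_now, stall_prev):
    """Register controls for one cycle (signal names are hypothetical)."""
    return {
        "pc_en": not stall_now,      # hold PC during the stall cycle
        "if_id_en": not stall_now,   # hold [IF-ID]
        "id_rr_en": not stall_now,   # hold [ID-RR]
        "id_rr_clear": stall_prev,   # delayed stall clears the duplicate LW in [ID-RR]
    }
```

A stall detected in one cycle therefore freezes the front of the pipeline for that cycle, and the registered copy of the same signal removes the duplicate LW in the following cycle.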

#### EX Stage Block

- 1. This block updates PC in case of Arithmetic Instructions with R7 as destination.
- 2. It sends the output to the PC and clears the [IF-ID] and [ID-RR] registers.

#### MM Stage Block

- 1. If any load-type instruction has R7 as its destination, update the PC in the MM stage.
- 2. Clear the [IF-ID], [ID-RR] and [RR-EX] registers.

#### Condition Control Block

| Condition | Action |
|-----------|--------|
| Conditional arithmetic instruction not having R7 as output destination becomes false and later instructions depend on it | Clear register write signal and flush pipeline. |
| Conditional arithmetic instruction having R7 as output destination becomes false | Clear register write signal and flush pipeline. Write PC and R7 with PC+1. |
| (JLR R7 R7) | Flush pipeline and write PC and R7 with PC+1. |
| BEQ becomes true | Flush pipeline and write PC and R7 with (PC+SE). |
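The two conditional-arithmetic rows of the condition control table can be modelled as below. A minimal Python sketch: the two condition bits are represented as `None` (unconditional), `"C"` (execute if the carry flag is set) or `"Z"` (execute if the zero flag is set), which is an assumed encoding, and `condition_control` is a hypothetical name.

```python
def condition_control(cond, carry, zero, dest_is_r7, dependents_in_flight, pc):
    """WB-stage handling of a conditional arithmetic instruction (hypothetical model).

    Returns (reg_write, flush, new_pc); new_pc is None when PC is left unchanged.
    """
    executes = cond is None or (cond == "C" and carry) or (cond == "Z" and zero)
    if executes:
        return True, False, None
    if dest_is_r7:
        # EX already redirected PC assuming execution: undo by writing PC and R7 with PC+1.
        return False, True, (pc + 1) & 0xFFFF
    # Condition false: drop the register write; flush only if later instructions used it.
    return False, dependents_in_flight, None
```

Keeping the flush conditional on `dependents_in_flight` mirrors the first row of the table: a false condition alone only suppresses the register write.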

# VHDL COMPONENTS

Individual components have been created in VHDL.