# Computer Organization 5-stage RISCV32I Processor Phase 7: Branch and Jump Instructions Spring 2023

#### Summary of the work you will be doing in this phase:

- Update the diagrams.net schematics from phase 6 to include all the components necessary for the branch and jump instructions.
- Update all the specified Codasip files to implement all branch and jump (jal and jalr) instructions. The files that we will be modifying in this phase are ca\_decoder.codal, ca\_defines.hcodal, ca\_resources.codal, ca\_pipe\_stage2\_id.codal, ca\_pipe\_stage3\_ex.codal, ca\_pipe\_stage4\_me.codal, and ca\_pipe\_control.codal.

#### **Schematic Instructions:**

- Copy the schematic standardname6.xml to standardname7.xml.
- Open the schematic in diagrams.net.
- Adding the conditional branch instructions follows the logic shown in Figure 4.49 of the textbook, reproduced below as Figure 1. The functions in the three red rectangles must be added.
- In the EX stage, add the Adder component, which adds the PC value to the correct immediate value. The "Shift Left 1" function in Figure 1 is not necessary because it is handled in the IMMGEN logic. The output of the adder (s\_ex\_target\_address) will be used below. Also add the zero signal coming out of the ALU.
- The JAL instruction jumps to the address which is a J-type immediate added to the current PC. This path already exists in the schematic for the branch instructions, although IMMGEN will create a different immediate value based on s\_id\_immsel. No schematic changes are necessary for this piece of the instruction.
- The JALR instruction jumps to the address which is the sum of a register and an I-type immediate. The ALU will be used in this case to compute the branch target address. The ALU already has the correct inputs for this operation to work as expected, so no additions are needed on the schematic for this piece of the instruction.
- Add a 2-1 multiplexor in the EX stage which selects between the output of the address adder and the output of the ALU and produces the final s\_ex\_target\_address, which is pipelined to the ME stage. Use the signal s\_id\_jump\_instr, produced by the decoder and pipelined to the EX stage, to control this mux.
- The jump logic is not included in any of the block diagrams in the textbook because the mini RISC-V processor of the book doesn't have these instructions.
- Both JAL and JALR write the current PC value plus 4 into the destination register of the instruction. Implementing this requires a multiplexor that selects the write data for the Register File, but the selection needs to be made at the ALU output in the EX stage to make sure that forwarding works as expected.
  - Add a 2-1 Multiplexor component in the EX stage after the ALU. One input is the current ALU output (s\_ex\_alu\_result). Add an Adder component which adds 4 to the current PC value (r\_id\_pc) and connect it to the second input of the 2-1 mux (this addition is for the jump instructions).
  - Add a new Decoder output s\_id\_rfwtsel to control the write data select. Pipeline this signal to the 2-1 mux select.
- Logic in the ME stage determines whether the conditional branch is taken or not. The zero output from the ALU (s\_ex\_zero) is asserted when the ALU output s\_ex\_alu is zero. This signal needs to be pipelined to the ME stage, and is the only signal required to determine whether a conditional branch is taken or not.
- A new control signal s\_id\_branchop is required from the Decoder in the ID stage to determine how the zero indicator is used. Pipeline this signal to the ME stage. Add the control block BRNCH, which creates the s\_me\_take\_branch signal that selects the next PC source based on branchop and zero.
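The EX-stage additions listed above can be sanity-checked with a small behavioural model before drawing or coding them. The sketch below is plain C rather than CodAL, and the type, field, and function names (`ex_inputs`, `ex_outputs`, `ex_stage`) are illustrative assumptions, not names from the project files. It models the address adder, the target-address mux, the PC + 4 write-data mux, and the zero flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the values arriving in the EX stage. */
typedef struct {
    uint32_t pc;          /* PC of the instruction in EX (r_id_pc)       */
    uint32_t imm;         /* sign-extended immediate from IMMGEN         */
    uint32_t alu_result;  /* raw ALU output for this instruction         */
    bool     jump_instr;  /* true for JALR: target comes from the ALU    */
    bool     write_pc4;   /* true for JAL/JALR: write PC + 4 to rd       */
} ex_inputs;

typedef struct {
    uint32_t target_address; /* models s_ex_target_address               */
    uint32_t rf_write_data;  /* value pipelined toward the register file */
    bool     zero;           /* models s_ex_zero                         */
} ex_outputs;

ex_outputs ex_stage(ex_inputs in)
{
    ex_outputs out;
    uint32_t adder = in.pc + in.imm;  /* address adder: PC + immediate */

    /* Target-address mux: ALU output for JALR, adder output otherwise. */
    out.target_address = in.jump_instr ? in.alu_result : adder;

    /* Write-data mux: PC + 4 for jumps, ALU result otherwise. */
    out.rf_write_data = in.write_pc4 ? in.pc + 4u : in.alu_result;

    /* Zero flag: asserted when the ALU output is zero. */
    out.zero = (in.alu_result == 0u);
    return out;
}
```

Note that the RISC-V specification also clears bit 0 of the JALR target address; this handout does not mention that step, so it is left out of the sketch.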



Figure 1

Although not required, it is recommended that you submit the updated schematic as soon as it is complete, as standardname7.xml. This will allow for feedback prior to creating the Codasip code. The scoring will reward early submission of the schematic.

#### Instructions for Codasip:

- 1) You will continue to build upon your phase 6 project. If required, import your phase 6 processor CodAL project.
- 2) The following branch and jump instructions will be implemented: BEQ, BNE, BLT, BGE, BLTU, BGEU, JAL, JALR.

- 3) Changes to the ca defines.codal file
  - O Add the enum values and width defines for the new control signal, s\_id\_brnchop,, as shown in the figure below. The four values required are as follows:

BRNCH\_FALSE – never branch (which must be the default and thus first in the list)

BRNCH\_TRUE – always branch (used for jump instructions)

BRNCH COND TRUE – branch if the ALU output is zero (for conditional branches)

BRNCH COND FALSE – branch if the ALU output is not zero (for conditional branches)

```
// -----
// ME stage
// -----

// Branch Command
enum brnchop
{
    BRANCH_FALSE,
    BRANCH_TRUE,
    BRANCH_COND_FALSE,
    BRANCH_COND_TRUE,
};
```

- Create the enum and width defines for the EX stage multiplexor that selects which value to write to the register file (r\_ex\_rfwtsel).
  - WB\_ALU: selects the output of the ALU to write into the register file
  - WB\_PC: selects the current PC + 4 value to write into the register file (this is for the JAL and JALR instructions)
- The branch/jump target address unit must be updated to provide the calculated target address for jump instructions
  - Create an enum and corresponding width definition in the EX section for the EX stage mux that selects how to calculate the branch target address. The mux options are:
    - BRADD\_ADDR: calculate the branch target address by adding the immediate to the PC (all branch instructions and the JAL)
    - BRADD\_ALU: get the branch target address from the output of the ALU (JALR)
- Changes to ca\_resources.codal:
  - Add definitions for the branch operation (both signals and registers) to pipeline this value all the way through to the memory stage.
  - Add the definition for the ALU zero output (both signal and registers) to pipeline the value all the way through to the memory stage. s\_ex\_zero is a 1-bit control signal and can be defined with BOOLEAN BIT.
  - Add the signal definition to keep the branch target address. The corresponding pipeline register has already been provided.
  - Add the definition for the control signal for the register file write select value (create both signals and registers to be able to pipeline this value from the decode stage to the execute stage).
  - Add a new signal in the execute stage to hold the value that will be written into the register file. Note that this will be different from the existing alu\_result value, since jal and jalr do not write the alu\_result value to the register file and instead use PC + 4.
- Changes to ca\_pipe2\_id.codal
  - Pipeline the new control signals from the decoder
- Changes to ca\_pipe3\_ex.codal
  - Inside the alu\_operate event, after the aluop switch statement, add the multiplexor which selects among the branch target address calculation choices based on the control value for r\_ex\_jump\_inst. Inside each case statement add the corresponding logic to compute the address, s\_ex\_target\_address, for both branches and jumps accordingly. If r\_ex\_jump\_instr is set then the target address is taken from the ALU output, otherwise add the immediate value to the current pc.
  - Inside the alu\_operate event, after the aluop switch statement, add the generation of s\_ex\_zero, which is asserted when the ALU output s\_ex\_alu is zero.
  - Inside the alu\_operate event, after the aluop switch statement, add a mux that selects between the alu\_result and PC + 4 to pass down to write into the register file. You can save this value into a new signal that will be pipelined to the existing r\_ex\_alu\_result pipeline register.
  - Pipeline all required values in ex\_output.
- Changes to ca\_pipe4\_me.codal
  - Create a new event, branch\_logic, and ensure that you declare this event (as
    done with the me\_output event) inside the me event and call the event before
    me\_output()

```
event branch_logic : pipeline(pipe.ME)
{
}
```

- A branch will occur based on the following logic:
  - r\_ex\_branchop == BRANCH\_COND\_TRUE, then if r\_ex\_zero == 1
  - r\_ex\_branchop == BRANCH\_COND\_FALSE, then if r\_ex\_zero == 0
  - r\_ex\_branchop == BRANCH\_TRUE
- Based on the logic above, assign the corresponding value to the signal s\_me\_take\_branch. The result will depend not only on the value of r\_ex\_branchop but also on the value of the r\_ex\_zero register.
  - You can use a switch statement as follows:

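```
switch (r_ex_branchop)
{
    case BRANCH_COND_TRUE:
        s_me_take_branch = r_ex_zero;
        break;
    // Remaining cases (BRANCH_COND_FALSE, BRANCH_TRUE, BRANCH_FALSE)
    // follow the logic listed above.
}
```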
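The complete decision can be checked against a truth table before it is written in CodAL. The following is a minimal sketch in plain C, not CodAL; `take_branch` and `brnchop_t` are hypothetical names that stand in for the s\_me\_take\_branch logic and the brnchop enum.

```c
#include <stdbool.h>

/* Mirrors the brnchop enum; BRANCH_FALSE is first so it is the default (0). */
typedef enum {
    BRANCH_FALSE,
    BRANCH_TRUE,
    BRANCH_COND_FALSE,
    BRANCH_COND_TRUE
} brnchop_t;

/* Hypothetical helper modelling s_me_take_branch. */
bool take_branch(brnchop_t branchop, bool zero)
{
    switch (branchop) {
    case BRANCH_TRUE:       return true;   /* JAL and JALR always redirect     */
    case BRANCH_COND_TRUE:  return zero;   /* taken when the ALU result is 0   */
    case BRANCH_COND_FALSE: return !zero;  /* taken when the ALU result is not 0 */
    case BRANCH_FALSE:
    default:                return false;  /* every non-control instruction    */
    }
}
```

Keeping a default that returns false means an unexpected branchop value can never redirect the PC.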

- Pipeline all required values in me\_output.
- 4) Changes to ca\_pipe5\_wb.codal:
  - No changes required for this phase.
- 5) Changes to ca\_decoder.codal:
    - Add the assignment of any new control signals (s\_id\_brnchop, s\_id\_rfwtsel) to every existing instruction group. Since none of the existing groups ever branch, assign BRANCH\_FALSE for s\_id\_brnchop in each case.
    - Search for btype until you find the element i\_hw\_btype\_branches and uncomment the complete i\_hw\_btype\_branches element
    - Set all control signals:
      - s\_id\_regwrite: branch instructions do not write to the register file
      - s\_id\_alusrc1 and s\_id\_alusrc2: branch instructions use registers for both inputs.
      - s\_id\_imm\_gen\_sel: use the corresponding branch immediate value.
      - s\_id\_branch\_inst: this signal indicates if we need to use the PC to calculate the target address, so set it to true for all branch instructions.
      - s\_id\_jump\_inst: this signal indicates if we need to use a register to compute the target address. Set this value to false for all branch instructions.
      - s\_id\_mem\_ops: since these are not memory instructions set this value to MEM\_NOP as with all the prior elements.
      - s\_id\_memread: since these instructions are not memory instructions set this value to false.
      - s\_id\_halt: set this value to false.
      - s\_id\_rfwtsel: since write enable is false, this value is a don't-care for branches (DONT\_CARE)
      - s\_id\_branchop: assign the corresponding branch operation based on the following logic:
        - BEQ: branch taken if the result of the ALU is zero.
        - BNE: branch taken if the result of the ALU is not zero.
        - BLT/BLTU: branch taken if the result of the ALU is not zero.
        - BGE/BGEU: branch taken if the result of the ALU is zero.
    - The i\_hw\_itype\_ilink and i\_hw\_itype\_ilreg elements
      - First, you will need to **uncomment** these element frameworks
      - For these elements, s\_id\_branchop will be set to BRANCH\_TRUE
      - For the **jal** instruction, set s\_id\_branch\_inst = true and the s\_id\_jump\_instr = false since the target address will be calculated in a similar manner as the branch instructions (PC+IMMED).
      - For the **jalr** instruction, set s\_id\_branch\_inst = false and the s\_id\_jump\_instr = true to indicate that the target address will come from the register file plus the immed (SRC1+IMMED).
      - For the aluop control signal, set it to ALU\_ADD since we are going to be adding the two values for jalr. The jal instruction won't be using the ALU so this signal's value does not matter.
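The branchop assignments above only produce the right answer if the decoder pairs each branch with a suitable ALU operation. The C sketch below checks one such pairing. It assumes, and this is not stated in the handout, that BEQ/BNE use a subtraction, BLT/BGE use a signed set-less-than, and BLTU/BGEU use an unsigned set-less-than. All type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { BEQ, BNE, BLT, BGE, BLTU, BGEU } branch_t;
typedef enum { BRANCH_FALSE, BRANCH_TRUE, BRANCH_COND_FALSE, BRANCH_COND_TRUE } brnchop_t;

/* Assumed ALU behaviour: subtract for BEQ/BNE, set-less-than otherwise. */
static uint32_t alu_for_branch(branch_t op, uint32_t a, uint32_t b)
{
    switch (op) {
    case BEQ: case BNE: return a - b;
    case BLT: case BGE: return (int32_t)a < (int32_t)b;  /* signed compare   */
    default:            return a < b;                    /* BLTU, BGEU       */
    }
}

/* Branch operation per the list above. */
static brnchop_t branchop_for(branch_t op)
{
    switch (op) {
    case BEQ: case BGE: case BGEU: return BRANCH_COND_TRUE;   /* taken when ALU == 0 */
    default:                       return BRANCH_COND_FALSE;  /* BNE, BLT, BLTU      */
    }
}

/* Combines the two: would this branch be taken for the given operands? */
bool branch_taken(branch_t op, uint32_t rs1, uint32_t rs2)
{
    bool zero = (alu_for_branch(op, rs1, rs2) == 0u);
    return branchop_for(op) == BRANCH_COND_TRUE ? zero : !zero;
}
```

With operands such as 0xFFFFFFFF and 1, the signed and unsigned variants give opposite answers, which makes them useful values for a regression test.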

- You will need to assign a value to the remaining control signals defined in the instruction decoder.
- Add all of these elements to the inst\_decode set at the top of ca\_decoder.codal (line 36).
- 6) All the updates to the model so far determine whether a branch should occur, based on an ALU compare operation, and calculate the branch target address that changes the program flow. The changes to the IF stage have already been provided.
- 7) The last piece to implement is the flushing of pipeline stages in case of a branch misprediction (we are doing static branch prediction that always predicts not taken). It is performed in the ca\_pipe\_control.codal file.
  - Changes to the ca\_pipe\_control.codal file:
  - Inside the pipeline\_control event, add an if statement based on the signal that identifies when a branch has been taken (s\_me\_take\_branch).

#### Standard 5-stage pipeline flow with no branches taken or jumps



[Figure: Branch taken / Jump flushing pipeline registers. Pipeline diagram of a taken branch followed by instructions that are converted to NOPs.]

With a branch taken or a jump, the instructions in the pipeline that came after the branch or jump instruction must be flushed. Clearing the pipeline registers to 0 equates to a NOP instruction, because control lines are only active if true (value = 1). Whether a branch is taken or a jump occurs is decided in the MEM stage, at which point the fetch address changes from pc+4 to the target address.

#### If true:

- pipe.ID.clear(): This command sets the clear input signal of the ID pipeline register to true for this clock cycle, transforming the instruction in the ID pipeline stage into a NOP operation.
- pipe.EX.clear(): This command sets the clear input signal of the EX pipeline register to true for this clock cycle, transforming the instruction in the EX pipeline stage into a NOP operation.

- else false:
  - No requirement to disable the pipeline clear. CodAL disables the clear input to a pipeline register in any clock cycle in which a clear is not explicitly requested.
- pipe.MEM.clear() is not included in the flushing of instructions because the instruction in the MEM (memory) stage did not enter the pipeline after the branch or jump. It is the branch or jump instruction itself.
- The pipeline control is now complete for the branch operations
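The flush behaviour described above can be modelled in a few lines. This is a sketch in plain C under simplifying assumptions: each pipeline register is reduced to three illustrative fields, and `pipe_reg`, `pipeline`, `clear_reg`, and `pipeline_control` are hypothetical names rather than CodAL constructs. Clearing a register to all zeros stands in for pipe.ID.clear() and pipe.EX.clear().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pipeline register: only the fields needed to show a flush. */
typedef struct {
    bool     regwrite;
    bool     memwrite;
    uint32_t aluop;
} pipe_reg;

typedef struct {
    pipe_reg id;  /* register cleared by pipe.ID.clear()                */
    pipe_reg ex;  /* register cleared by pipe.EX.clear()                */
    pipe_reg me;  /* holds the branch/jump itself; never cleared here   */
} pipeline;

/* All-zero control bits behave as a NOP because controls are active-high. */
static void clear_reg(pipe_reg *r)
{
    r->regwrite = false;
    r->memwrite = false;
    r->aluop    = 0u;
}

void pipeline_control(pipeline *p, bool take_branch)
{
    if (take_branch) {
        clear_reg(&p->id);
        clear_reg(&p->ex);
    }
    /* No else branch: nothing is cleared when the branch is not taken. */
}
```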

## Validating branch operations

- Import the control\_inst\_assembly\_test into your Codasip workspace
  - git clone https://github.com/CompOrg-RISCV/control\_inst\_assembly\_test.git
  - After building both the IA and CA SDK, right-click Assign SDK for this project in the workflow perspective and assign your project (ia) model
- Set the standard assembly Compiler Configuration to:
  - No Startup or Default Lib (-nostdlib)
- After a successful compile, assign the CA SDK to the control\_inst\_assembly\_test
- NOTE: If you need to recompile the test after assigning the CA model to your project, you will first need to reassign the IA model to the software project
- Set the standard assembly **Debug Configuration** to Stop at Startup after 0 instructions and set a breakpoint on the second instruction of the test source file.
- Launch the debugger for control\_inst\_assembly\_test
- Debug control\_inst\_assembly\_test
  - Place breakpoints on the following lines:
    - 35 and 161: If the program reaches these breakpoints/halts, the branch operations have failed
    - 52: If the program reaches this breakpoint, the BEQ branch appears to be working
      - Confirm successful branches by comparing your register results with the program's comment statements
  - Step through the program
    - If you reach the halt after the FAIL label, a branch occurred when it should not have
      - Time to debug
    - If you reach the halt after PASS2 label, your branch, BEQ, successfully completed
      - Verify that your NOP bubbles were correctly implemented
        - x3 = 1, a branch not taken did not create a NOP bubble
        - x4 = 0, EX stage NOP bubble occurred as required
        - x5 = 0, ID stage NOP bubble occurred as required

- For validating the jal and jalr instructions, you will continue to use the control\_inst\_assembly\_test project.
- Comment out the halt instruction after you branched to the PASS2 label, line 52. This will enable the code to reach the jump test code
  - You will need to reassign your IA project to the control\_inst\_assembly\_test to compile the updated test with the halt statement on line 52 commented out
  - No need to reassign the project to CA after the compile. Once the software project is aware of the CA model, it does not need to be reassigned
- Place breakpoints at:
  - 35, 104, 124, 161: These halt statements will indicate a jump or branch failure
  - 86: Reaching this breakpoint with register x10 = 1 indicates that the branches and jumps in this routine successfully executed
- Step through the code to see that the jump operations performed as programmed. Refer
  to the comments to understand the flow of the code and whether your project
  successfully completed the test
- For the last validation step, you will use your regression test.
- Comment out the **halt** statements after the immediate instructions, the r-type instructions, shift-immediates, and the data-hazard detection / forwarding test code
- Your code should now run up to and through your regression test's branch and jump test code and halt before the load and store set of instructions
- Debug your regression test and step through your code until it fails or it reaches the halt statement after your branch and jump test sequences
  - If your test fails an earlier test sequence that had passed, it most likely is an error in your processor's data-path or control-signals since these tests had passed in an earlier phase
  - If your test fails within your branch and jump test sequence, you will need to evaluate whether the failure is in your regression test sequence and/or in your branch/jump CodAL implementation
- After your regression test passes all the way to the halt after your branch and jump sequence with correct results through all the tests, you are ready to submit your project for grading
- Complete the phase
  - Clean your SDK
  - Export your SDK
  - Rename the file to the standardname?
  - Submit both your xml schematics and your compressed file into the Canvas assignment for phase 7.

IF YOU PREFER THE OLD STYLE OF PROJECT DESCRIPTION, SEE BELOW. OTHERWISE, THE ABOVE DESCRIBES THE COMPLETE PROJECT.

# Computer Organization 5-stage RISCV32I Processor Assignment 7: Branch and Jump Instructions Spring 2023

Objective: For a program to make decisions, such as an if-then-else, or to perform a loop, a set of control instructions needs to be implemented. These control instructions are divided into branch instructions, which are conditional operations, and jump instructions, which are unconditional operations. A common use of branch instructions is in if statements, to perform a set of instructions based on a condition, or to jump back to the top of a loop until a condition is met. Jump instructions change the flow of the program in the same way as a branch, but unconditionally. In this assignment, you will implement both the branch and jump instructions for a RISCV32I processor. Since these instructions change the flow of the program and require the Execute (EX) stage, any instruction that entered the processor's pipeline after them must be flushed if a branch or jump is taken. This is because those instructions would not have entered the pipeline if the change of program flow could be performed immediately after the control instruction entered the processor.

For this assignment, additional ALU ops (operations) will need to be added to the ALU to decide whether a branch should be taken. In the EX stage, the branch and jump target addresses must be calculated by implementing a specific ADD execution unit to calculate these addresses. There are jump instructions that use a register to determine their target address, which creates the need to perform data hazard detection and data forwarding for this branch/jump ADD unit, as you did for the standard ALU operands. If a change of flow occurs (a branch is taken or a jump executes), all the instructions in the earlier stages of the pipeline must be flushed (become NOPs), since the change of flow means these instructions are not executed.

#### Key Learning Outcomes of this assignment:

• Control Instruction (Branch & Jump): Without control operations, programs would only be able to perform a single pass through the program, going from the first line of code straight to the end. Control operations change the flow of a program based on either a test condition, such as whether a variable is true or false, or an unconditional jump, such as making a function call. In a single-cycle processor with no pipeline, the next instruction would either be the Program Counter + 4 or the new branch/jump address. For pipelined processors, there are opportunities for Control Hazards. A Control Hazard occurs when the instructions in the pipeline are not the instructions to be executed. For a pipelined processor that is processing instructions in parallel, one in each stage, if the branch/jump is taken, the instruction in the Instruction Fetch (IF) stage would be the one at the branch/jump target address, but the pipeline would have already started processing the Program Counter + 4 instruction. With the control operation's change of address, the Program Counter + 4 and any other instruction that has entered the pipeline would need to be flushed (a NOP bubble), since they would not have occurred in a single-cycle processor model.

#### Instructions:

- You will continue to build upon your assignment 6 project. If required, import your assignment 6 processor CodAL project.
- To get started, you will need to add the Branch and Jump instructions into the Cycle Accurate (CA) model.
  - Open up isa.codal in "project"/model/share/isa/ and search for "btype" until you find a list of DEF\_OPC of btype instructions. It should be located around line 500 in the file.
    - Make a list of all the different btype instructions which are after "BTYPE XXX"
  - Now open up ca\_defines.hcodal in "project"/model/ca/includes to add all the "microarchitecture" alu opcodes for these branch instructions
    - Go to the aluop enum and add an enum for <u>each btype instruction</u> using the same nomenclature as the existing aluops, **ALU\_**"btype instruction"
    - An example branch aluop enum is ALU\_BGE



- The next step is to add these instructions into the Instruction Decoder. Go to ca\_decoder.codal under "project"/model/decoders and search for btype until you find the element i\_hw\_btype\_branches
  - Uncomment the complete i\_hw\_btype\_branches element
- The first thing to add in this btype element is the assignment of the s\_id\_aluop signal, by adding a case statement for each of the btype opcodes in the "switch (opc)" statement. Please refer to the current decoder itype and rtype elements as an example. In each case statement, s\_id\_aluop should be assigned one of the new branch aluop enums that you have just added.
- The decision of what each stage of a pipeline must do or operate on to complete the instruction is made in the ID stage within the instruction decoder. This results in the decoder turning on the required portions of the processor and disabling others. Turning these functions on or off is done by the processor's control bits. These control bits are passed along to the next pipeline stage until they reach their respective function.
  - For example, the regwrite (register write) control bit is passed to the EX, MEM, and then to the Write Back (WB) stage so that in the WB stage, a decision can be made whether to update the Register File
- With the s\_id\_aluop signal assigned, the remaining control signals must be assigned using the appropriate define from ca\_defines.hcodal or the appropriate boolean value (true/false)
  - s\_id\_regwrite: Do the btype operations update rd, the destination register?
  - s\_id\_alusrc1: What is the source of the src1 (rs1) operand?
  - s\_id\_alusrc2: What is the source of the src2 (rs2) operand?
  - s\_id\_imm\_gen\_sel: What immediate value should be passed to the EX stage?
  - s\_id\_branch\_inst: Is it a branch operation?
  - s\_id\_jump\_inst: Is it a jump operation?
  - s\_id\_mem\_ops: What memory operation should be performed?
  - s\_id\_memread: Is it a load memory operation?
  - s\_id\_halt: Is it a halt operation?
  - For additional details on the btype of instructions, please refer to the RISC-V Instruction Set Manual (Unprivileged ISA)
    - https://riscv.org/technical/specifications/
    - The Conditional Branch information details are within the "Control Transfer Instruction" section
- With the i\_hw\_btype\_branches element fully defined, it is time to include it in the list of allowable instructions to decode. At the top of ca\_decoder.codal, add i\_hw\_btype\_branches to the "inst\_decode set"
- With the parsing of the instruction completed in assignment 5 and the data hazard detection and data forwarding completed in assignment 6, the branch instructions have been added, parsed, and are ready to proceed to the EX stage
- The ALU will be enhanced by adding a signal whose value is true if a branch is to be taken
  - Go to ca\_resources.codal and in the EX stage pipeline section, create the Boolean bit-wide signal s\_ex\_branch\_true
  - Now go to ca\_pipe3\_ex.codal's alu\_operate event, and for each of the current ALU case statements in the ALU, assign s\_ex\_branch\_true to false. The current ALU operations are not branch operations, so the branch taken signal must be set to false.

```
event alu_operate : pipeline(pipe.EXMEM)
{
    semantics
    {
        // A switch statement will be used to determine and evaluate the ALU operation using
        // the aluop code provided by the IDEX pipeline register whose value is determined
        // by the instruction decoder.
        // Hint: Using the standard naming protocols for this project, what prefix would you add
        // to aluop to build the complete name of input value for this switch statement?
        switch (r_idex_aluop) {
            case ALU_ADD:
            default:
                s_ex_alu_result = s_ex_soperand1 + s_ex_soperand2;
                s_ex_branch_true = false;
                break;
```

- Next, add a case statement for each of the new <u>branch</u> aluops for the branch operations.
  - s\_ex\_branch\_true will be set true based on the result of the respective branch compare condition
    - Best Coding Practice: For comparison operations, use the proper type cast for each operand. For unsigned operations, use an unsigned type cast such as (uint32); for signed operations (the default), use an integer type cast such as (int32)
    - Example: BGEU, Branch if Greater than or Equal Unsigned
       s\_ex\_branch\_true = ((uint32) src1 >= (uint32) src2) ? true : false;
  - It is **Best Coding Practice** to assign the s\_ex\_branch\_true signal in each of the case statements in the ALU as well as the default statement
  - For these branch cases, s\_ex\_alu\_result can simply be assigned 0.
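The effect of the signed and unsigned casts is easiest to see with operands whose top bit is set. The sketch below is plain C, not CodAL; `branch_true` is a hypothetical stand-in for the s\_ex\_branch\_true assignments, and the enum names follow the ALU\_"btype instruction" convention described earlier.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { ALU_BEQ, ALU_BNE, ALU_BLT, ALU_BGE, ALU_BLTU, ALU_BGEU } branch_aluop;

/* Hypothetical model of s_ex_branch_true for each branch aluop. */
bool branch_true(branch_aluop op, uint32_t src1, uint32_t src2)
{
    switch (op) {
    case ALU_BEQ:  return src1 == src2;
    case ALU_BNE:  return src1 != src2;
    case ALU_BLT:  return (int32_t)src1 <  (int32_t)src2;  /* signed   */
    case ALU_BGE:  return (int32_t)src1 >= (int32_t)src2;  /* signed   */
    case ALU_BLTU: return src1 <  src2;                    /* unsigned */
    case ALU_BGEU: return src1 >= src2;                    /* unsigned */
    default:       return false;  /* non-branch aluops never set the flag */
    }
}
```

For src1 = 0xFFFFFFFF and src2 = 1, BLT is true (the value is -1 when signed) while BLTU is false (the value is the largest unsigned number), so this operand pair exposes a missing or wrong cast.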
- Should the decision of taking a branch, branch\_instr & branch\_true, be decided in the EX or MEM pipeline stage? From the previous assignment, the assumption is that the EX pipeline stage will define the clock period of the processor due to the complexity of the ALU processing element. Adding any additional logic before or after the ALU will increase the overall processor's clock period resulting in a lower clock frequency for all operations. To maximize the performance of the majority of operations, the 3rd logic block in the below block diagram, Branch Logic, will be moved to the MEM pipeline stage. In the branch timing path, there is actually a 4th element which is located in the Instruction Fetch (IF) stage. This additional element takes the control signal branch\_taken to either request the next instruction pc + 4 or the branch calculated address, a 2:1 mux.



- Moving the Branch Logic to the MEM stage will reduce the worst case timing
  path from three elements to two. The s\_ex\_branch\_true must be passed to the
  next stage through the pipeline register
  - Go to ca\_resources.codal and add a new EX pipeline register to pass
     s\_ex\_branch\_true along to the MEM stage using the proper pipeline register nomenclature
  - Go to ex\_output() event and assign the s\_ex\_branch\_true signal to your new pipeline register
- The ALU will now provide a logical output signal indicating whether a branch is to be taken
- In the EX stage, the target address (the address to branch to) must be calculated. Per the RISC-V Instruction Set Manual, the branch target address is the Program Counter + the sign-extended btype immediate.
  - In the instruction decoder, the BTYPE\_IMM\_SEL must be assigned to the s\_id\_imm\_gen\_sel signal even though the source operands to the ALU are both from the Register File. Selecting the btype immediate will mux this immediate value into the ID pipeline register to pass it into the EX stage.
  - The branch and jump will be performed in its own event (analogous to a function).
     Below is the syntax for an event.
    - event: declares the following as an event
    - branch\_operation: the event name
    - pipeline: specifies that it is associated with a pipeline stage
    - pipe.EX: specifies which pipeline stage

```
event branch_operation : pipeline(pipe.EX)
{
    semantics
    {
    };
};
```

- Add this branch\_operation event somewhere below the ex event and outside of any other event
- For the ex event to access or call this new event, it must be made aware of it.
  - Events are declared inside an event, before its semantics section.
    - Add "use branch\_operation;" to declare your new event at the top of the ex event

```
// Execute Stage
// -----
event ex : pipeline(pipe.EXMEM)
{
    use alu_operate;
    use branch_operation;
    use ex_output;

    semantics
    {
```

- Add the call to this event after the call to the alu\_operate event, but before the
  call to the ex\_output() event. If you call this event after the ex\_output() event, the
  changes to the signals in branch\_operations() that are inputs into setting the EX
  pipeline registers will have not been updated before the registers inputs are
  determined.
  - For simulation purposes, the code is run linearly from top to bottom
- Add a new signal in ca\_resources.codal which is required for the target address
  - s\_ex\_target\_address: calculated branch address
    - Being an address, the signal width is the same as the Program Counter
      - What definition will you use in the bit field?
  - r\_ex\_target\_address, the pipeline register that passes the address into the MEM stage, is provided in the starting assignment
- The code for an event/function, is within the semantics statement
  - In the branch\_operation() event, assign the s\_ex\_target\_address signal with the value of the branch instruction which is PC + Branch Offset, the btype immediate value
    - What two r\_id pipeline registers will be used for this addition?
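To make the PC + branch offset calculation concrete, the sketch below shows what the immediate generator is expected to deliver for a btype instruction, following the B-type field layout in the RISC-V unprivileged specification, and how the target address is formed from it. It is plain C, not CodAL. The function names `btype_imm` and `branch_target` are hypothetical, and the project's own IMMGEN already performs the immediate extraction.

```c
#include <stdint.h>

/* Extract the sign-extended B-type immediate from a 32-bit instruction.
 * Field layout: imm[12] = bit 31, imm[10:5] = bits 30:25,
 *               imm[4:1] = bits 11:8, imm[11] = bit 7, imm[0] = 0. */
int32_t btype_imm(uint32_t insn)
{
    uint32_t imm = (((insn >> 31) & 0x1u)  << 12)
                 | (((insn >> 7)  & 0x1u)  << 11)
                 | (((insn >> 25) & 0x3Fu) << 5)
                 | (((insn >> 8)  & 0xFu)  << 1);
    /* Sign-extend from bit 12. */
    return (int32_t)(imm ^ 0x1000u) - 0x1000;
}

/* Target address: PC of the branch plus the signed offset, wrapping at 32 bits. */
uint32_t branch_target(uint32_t pc, int32_t imm)
{
    return pc + (uint32_t)imm;
}
```

Because bit 0 of the immediate is always zero, the offset is already a byte offset in multiples of 2 and no separate shift is needed in the EX stage. For example, the encoding 0xFE000EE3 (beq x0, x0, -4) at PC 0x100 yields a target of 0xFC.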



- Next, go to ex\_output() event and assign the calculated target address to the pipeline register r\_ex\_target\_address to pass the target address to the MEM stage
- The updates to the EX stage for the branch operation are now complete
- The addition of the branch circuitry is almost complete. The memory stage will determine if the target address will become the next PC address based on whether the instruction in the MEM stage is a branch instruction and the branch instruction's ALU compare result, s\_ex\_branch\_true, is true.
  - Create a new event, branch\_logic, in ca\_pipe\_stage4\_me.codal and ensure that you
    declare this event in the me event and call the event before me\_output()
  - A branch will occur based on the logical equation r\_ex\_branch\_true &&
     r\_ex\_branch\_inst. The output of this logic must be assigned to a signal
  - Use the signal already provided to change the program flow if required.
     Assign the signal s\_me\_take\_branch the value of this logic expression



- The updates to the MEM stage are now complete
- All the updates to the model so far determine whether a branch should occur, based on an ALU compare operation, and calculate the branch target address. The remaining step is to change the program flow.
- To change the program flow, the next instruction address to fetch must be updated to
  the branch target address if a branch is to be taken. The current Instruction Fetch (IF)
  stage only requests the previous Program Counter (PC) + 4. With the branch and jump
  control instructions, the next instruction address to fetch must be selected using a 2:1
  mux
  - Inputs to the mux
    - r\_pc: previous PC address + 4
    - r\_ex\_target\_address: Address if a branch or jump is taken
  - Mux output
    - s\_if\_nextpc: address used to request the next instruction memory fetch
  - Mux select line(s):
    - The signal that you assigned in the branch\_logic event of ca\_pipe\_stage4\_me.codal, which evaluates whether a branch should be taken
      - If false, no branch and use r\_pc
      - If true, branch/jump taken and use r\_ex\_target\_address

### Instruction Fetch (IF) Stage Block Diagram





 This mux has been provided in the base project. It is implemented using the following if/else statement:

```
// If branch, the next PC is not the previous pc + 4, but the calculated
// branch or jump address
if (s_me_take_branch)
    s_if_nextpc = r_exmem_target_address;
else
    s_if_nextpc = r_pc;
```

- With this mux already implemented, no changes to the Instruction Fetch (IF) stage are required
- In a single-stage processor, all instructions complete in one cycle (a single stage), but performance is reduced due to the lack of parallelism. If a branch is taken, the instructions immediately following the branch instruction are not executed. A pipelined processor increases performance by executing multiple instructions simultaneously, one in each stage of the pipeline. When a branch is taken in a pipelined processor, the instructions already in the pipeline that would not have been executed in a single-stage design must be flushed (turned into NOPs). This is a Control Hazard.
  - The branch or jump is taken in the MEM stage, which implies that the instructions in the Instruction Decode (ID) and Execute (EX) stages must be flushed to become NOP operations.
    - A flush occurs by "clearing" all the pipeline registers holding an instruction that would not have been executed in a single-stage processor.
    - The control signals are defined so that true (1) asserts their control function, such as taking a branch or writing to the register file. Clearing these control signals to 0 (false) therefore disables their operation.
    - The pipeline registers are defined to have a clear control input in addition to the clock signal
      - If the clear input is true, instead of latching the input signals on the positive CPU clock edge, all the pipeline registers effectively latch 0s. The instruction that moves to the next stage is replaced with all zeros, which is equivalent to a NOP operation



- The Instruction Fetch (IF) stage does not need to be flushed because the 2:1 next-PC mux effectively changes the IF stage's instruction request to the branch address
- The flushing of pipeline stages is performed in the ca\_pipe\_control.codal file
  - Inside the pipe\_control event, add an if statement based on the signal signifying that a branch has been taken. It is the same signal used as the select line for the instruction memory address mux






    - If true:
      - pipe.ID.clear(): This command sets the clear input signal of the ID pipeline register to true for this clock cycle, transforming the instruction in the ID pipeline stage into a NOP operation.
      - pipe.EX.clear(): This command sets the clear input signal of the EX pipeline register to true for this clock cycle, transforming the instruction in the EX pipeline stage into a NOP operation.
    - Else (false):
      - There is no requirement to disable the pipeline clear. CodAL disables the clear input to a pipeline register in any clock cycle in which it is not actively set
    - pipe.MEM.clear() is not included in the flushing of instructions because the instruction in the MEM (memory) stage is not one of the instructions that follow the branch or jump. It is the branch or jump instruction itself.
- The pipeline control is now complete for the branch operation
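
The flush behavior described above can be pictured with a small plain-C model. This is a sketch under stated assumptions, not CodAL and not the Codasip implementation: the `PipeReg` struct, its fields, and the functions `clock_edge` and `pipe_control` are hypothetical names chosen for illustration. It only shows the idea that a register with its clear input asserted latches zeros (a NOP) instead of its input.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pipeline register: an instruction word plus two control bits. */
typedef struct {
    uint32_t instr;
    bool     regwrite;
    bool     branch_inst;
} PipeReg;

/* Model of one rising clock edge: latch the input, or all zeros when
 * the clear input is asserted for this cycle. */
static PipeReg clock_edge(PipeReg input, bool clear)
{
    const PipeReg nop = {0u, false, false};
    return clear ? nop : input;
}

/* Model of the flush decision: the ID and EX pipeline registers both use
 * the take-branch signal as their clear input. */
static void pipe_control(PipeReg *id, PipeReg *ex,
                         PipeReg id_in, PipeReg ex_in, bool take_branch)
{
    *id = clock_edge(id_in, take_branch);
    *ex = clock_edge(ex_in, take_branch);
}
```

With `take_branch` false, both registers latch their inputs unchanged. With it true, every control bit becomes 0, so neither flushed instruction can write the register file or trigger another branch.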

# **Checkpoint 1: Validating branch operations**

#### Assignment 7: Checkpoint 1: Validating branch instructions

To validate a branch instruction, you need to validate its conditional statement, whether it branches to the target address correctly (both forward and backwards), and whether it correctly handles the associated Control Hazard.

This checkpoint will be used to validate that one branch operation, beq, branches correctly and handles its associated Control Hazard. If an error is indicated, the video comments will help indicate what the failure could be such as a lack of NOP bubbles inserted in the ID and/or EX pipeline registers.

- Import the control\_inst\_assembly\_test project into your Codasip workspace
  - git clone https://github.com/CompOrg-RISCV/control\_inst\_assembly\_test.git
  - Right-click Assign SDK for this project in the workflow perspective and assign your project (IA) model
- Set the standard assembly Compiler Configuration to:
  - No Startup or Default Lib (-nostdlib)
- Set the standard assembly Debug Configuration to Stop at Startup after 0 instructions
- Launch the debugger for the control\_inst\_assembly\_test assembly program using your CA model
- Debug control\_inst\_assembly\_test
  - Place breakpoints on the following lines:
    - 35 and 161: If the program reaches these breakpoints/halts, the branch operations have failed
    - 52: If the program reaches this breakpoint, the branches appear to be working
      - Confirm successful branches by comparing your register results with the program's comment statements
  - Step through the program
    - If you reach the halt after the FAIL label, a branch occurred when it should not have
      - Time to debug
    - If you reach the halt after PASS2 label, your branches successfully completed
      - Verify that your NOP bubbles were correctly implemented
        - x3 = 1, a branch not taken did not create a NOP bubble
        - x4 = 0, EX stage NOP bubble occurred as required
        - x5 = 0, ID stage NOP bubble occurred as required
- Note: This test only covers the beq (branch on equal) instruction. It is not a
  comprehensive test that evaluates all the branch instructions. A comprehensive test
  should be included in your regression test, which will be covered in Checkpoint 3
- Once you have completely passed control\_inst\_assembly\_test, you have completed
   Checkpoint 1 and are ready to proceed to the next phase of this assignment

- The conditional branch instructions implement the decisions of IF statements and FOR/WHILE loops. Branch instructions use both the rs1 and rs2 fields of the binary instruction for the comparison and the branch immediate field for the offset, which overlaps the rd (destination) field. These instructions therefore cannot store the return address required by a call to a function. Jump instructions are used for function calls.
  - The return address is the current PC (pointing to the jump instruction) plus one instruction word, which is 4 bytes for a 32-bit instruction word. PC + 4 is the address of the instruction following the jump instruction
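
To see why a branch has no room for a destination register, it can help to decode a B-type immediate by hand. The plain-C sketch below follows the RV32I B-type bit layout, in which instruction bits 11:7 (the rd position in other formats) hold imm[4:1] and imm[11]. The function name `btype_imm` is hypothetical, and this is not how IMMGEN is written in your CodAL model.

```c
#include <stdint.h>

/* Extract the sign-extended B-type immediate from a 32-bit RV32I instruction.
 * Bits 11:7, which hold rd in R/I/U/J-type instructions, carry immediate bits
 * here, so a branch cannot name a destination register. */
static int32_t btype_imm(uint32_t instr)
{
    uint32_t imm = (((instr >> 31) & 0x1u)  << 12)   /* imm[12]   from bit 31     */
                 | (((instr >> 7)  & 0x1u)  << 11)   /* imm[11]   from bit 7      */
                 | (((instr >> 25) & 0x3Fu) << 5)    /* imm[10:5] from bits 30:25 */
                 | (((instr >> 8)  & 0xFu)  << 1);   /* imm[4:1]  from bits 11:8  */

    /* Sign-extend the 13-bit value: flip the sign bit, then subtract its weight. */
    return (int32_t)(imm ^ 0x1000u) - 0x1000;
}
```

As a check, the encoding 0x00208463 (beq x1, x2 with a forward offset) decodes to 8, and 0xFE000EE3 (beq x0, x0 with a backward offset) decodes to -4.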



- Go to ca\_defines.hcodal and add the enum ALU\_SRC1\_SEL\_FOUR for the EX stage src1 mux to select the constant 4
- Add the enum ALU\_SRC2\_PC for the EX stage src2 mux to select r\_id\_pc
- Next, go to the ex event in the EX stage and add case statements for both ALU operand muxes using these new enums.
  - The src1 constant enum assigns the constant 4 to the ALU src1 signal
  - The src2 PC enum assigns the ID pipeline PC register to the ALU src2 signal
- The updates to enable the return address calculation through the EX stage ALU are now complete
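
The two new mux cases can be pictured with the following plain-C sketch (not CodAL). ALU\_SRC1\_SEL\_FOUR and ALU\_SRC2\_PC are the enum names from the steps above; the other enum entries, the function names, and the parameter names are hypothetical placeholders for whatever your model already contains.

```c
#include <stdint.h>

/* ALU_SRC1_SEL_FOUR and ALU_SRC2_PC come from the assignment text.
 * ALU_SRC1_SEL_RS1 and ALU_SRC2_RS2 are hypothetical stand-ins for the
 * register-operand entries that already exist in your enums. */
enum alu_src1_sel { ALU_SRC1_SEL_RS1, ALU_SRC1_SEL_FOUR };
enum alu_src2_sel { ALU_SRC2_RS2, ALU_SRC2_PC };

/* src1 operand mux: the new case drives the constant 4. */
static uint32_t alu_src1_mux(enum alu_src1_sel sel, uint32_t rs1)
{
    switch (sel) {
    case ALU_SRC1_SEL_FOUR: return 4u;
    default:                return rs1;
    }
}

/* src2 operand mux: the new case drives the PC held in the ID pipeline register. */
static uint32_t alu_src2_mux(enum alu_src2_sel sel, uint32_t rs2, uint32_t pc)
{
    switch (sel) {
    case ALU_SRC2_PC: return pc;
    default:          return rs2;
    }
}

/* Return address for a jump: an ALU add of the two selected operands. */
static uint32_t return_address(uint32_t rs1, uint32_t rs2, uint32_t pc)
{
    return alu_src1_mux(ALU_SRC1_SEL_FOUR, rs1)
         + alu_src2_mux(ALU_SRC2_PC, rs2, pc);
}
```

In this sketch, a jump located at 0x200 produces the return address 0x204 regardless of the register operand values, because both muxes bypass the register inputs.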
- To perform this return address calculation, make the appropriate changes in ca\_decoder.codal for the i\_hw\_jtype\_jlink and i\_hw\_itype\_jlreg elements
  - First, you will need to **uncomment** these element frameworks
  - For these elements, s\_id\_jump\_inst is set to true to pass this control signal to the EX and MEM stages
  - For the jal instruction, set s\_id\_branch\_inst = true. For the jalr instruction, set s\_id\_branch\_inst = false.
    - s\_id\_branch\_inst = true indicates that the register used for the address of either a branch or jump instruction is the PC register
    - false indicates that the register comes from the Register File and must be checked for data hazard conditions and, where appropriate, use data forwarding
  - s\_id\_jump\_inst indicates that the jump is unconditional, not conditional as in a branch instruction. If s\_id\_jump\_inst is true in the MEM stage, a jump to the new target address will occur
  - For the aluop control signal, set both elements to ALU\_ADD since we are adding the PC value (source 2) and 4 (source 1)
  - You will need to assign a value to the remaining control signals defined in the instruction decoder
  - As you did for the branch decoder element, add both of these elements to the inst\_decode set at the top of ca\_decoder.codal
- With these changes, the decoder will set all the jump control signals for the EX and MEM stages
- The branch/jump target address unit must be updated to provide the calculated jump address for jump instructions
  - With the jump address equation including a register file value and an immediate, the input element for the register operand must include data hazard detection in the ID stage and data forwarding in the EX stage
  - Go to ca\_defines.hcodal and create an enum for a new EX stage mux, the jump register operand mux. The mux input options include:
    - src1 register value
    - EX alu\_result data forwarding
    - MEM alu\_result data forwarding
    - ID PC



We need to potentially forward values to this mux because the jalr instruction uses the value in rs1. If that value is updated in prior instructions, we need to use the updated value and not the previous one for the jalr calculations.

You will need to create a #define that auto selects the number of bits required for this mux select using the bitsizeof() function. You can refer to examples at the top of ca\_defines.hcodal

FAQ: bitsizeof() macro: Automation enables increased efficiency with fewer errors. The bitsizeof() macro can be used to automate the declaration of the signal width of the multiplexer select lines. This video describes what the bitsizeof() macro returns and how it is used to declare the number of multiplexer select lines. The video walks through an example where bitsizeof() is used in a processor's Cycle Accurate (CA) project files: ca\_defines.hcodal and ca\_resources.codal.
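
The quantity being automated is the number of select lines an N-input mux needs, which is the smallest number of bits that can distinguish N inputs. The plain-C sketch below computes that width. It is not the CodAL bitsizeof() macro, and the function name `select_bits` is hypothetical; refer to the examples in ca\_defines.hcodal and the FAQ video for how bitsizeof() itself is used.

```c
/* Smallest number of select bits that can address n_inputs mux inputs,
 * that is, ceil(log2(n_inputs)). Hypothetical helper for illustration. */
static unsigned select_bits(unsigned n_inputs)
{
    unsigned bits = 0;
    while ((1u << bits) < n_inputs) {
        bits++;
    }
    return bits;
}
```

A four-input mux such as the jump register operand mux needs 2 select bits under this calculation. Adding a fifth input would raise that to 3, which is the kind of change that an automatically derived width absorbs without manual edits.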

- Using bitsizeof() is a best coding practice for mux select lines because the number of mux inputs may change. The grading rubric includes using best coding practices
- You will need to go to ca\_resources.codal to create any signals and ID pipeline registers required
- In the ID stage, the data hazard detection logic was implemented in the previous assignment for the previous instruction types. Within this logic, set the control signal for the jump\_df mux accordingly. This control signal will be used in the EX stage to properly set the value to calculate the s\_ex\_target\_address.
- The ID stage is now updated for the jump instructions
- To complete the EX stage, go to the branch\_operation() event
  - To save time, signals that are isolated to a single event can be declared in the event and not in ca\_resources.codal. View these local signals as analogous to a C program's locally declared variables for a function. The ca\_resources.codal signals are analogous to a C program's global variables, which can be used by multiple functions without being passed as input arguments.
  - Declare a local 32-bit signal in branch\_operation() for the output of the jump reg operand mux. These local signals are declared inside the semantics statement before any code
    - ex: int32 jump\_reg\_value;

  - Using the jump reg operand enums, the local output signal jump\_reg\_value, and the select lines from the ID pipeline register, create the jump reg operand mux before the branch target address assignment
    - You can use either a switch statement or an if/else statement
  - Now, the target address can be either a branch target address or a jump target address. Because of that, we can't simply say s\_ex\_target\_address = PC + branch offset
  - Instead, use the jump\_reg\_value set in the mux. This value changes according to the instruction being executed.
  - So, now it should be s\_ex\_target\_address = jump\_reg\_value + branch/jump offset
- The EX stage is now updated
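
The shape of the jump reg operand mux and the updated target address calculation is sketched below in plain C (not CodAL). All enum entry names, the `JumpRegInputs` struct, and the function name `target_address` are hypothetical; only the four mux inputs and the local name jump\_reg\_value come from the steps above.

```c
#include <stdint.h>

/* Hypothetical select values for the four mux inputs listed in the assignment. */
enum jump_reg_sel {
    JUMP_REG_SRC1,     /* src1 register value                 */
    JUMP_REG_FWD_EX,   /* EX alu_result data forwarding       */
    JUMP_REG_FWD_MEM,  /* MEM alu_result data forwarding      */
    JUMP_REG_PC        /* PC from the ID pipeline register    */
};

/* Hypothetical bundle of the values feeding the mux. */
typedef struct {
    uint32_t src1;
    uint32_t ex_alu_result;
    uint32_t mem_alu_result;
    uint32_t pc;
} JumpRegInputs;

/* Select the base value, then add the branch/jump offset. */
static uint32_t target_address(enum jump_reg_sel sel,
                               const JumpRegInputs *in, int32_t offset)
{
    uint32_t jump_reg_value;

    switch (sel) {
    case JUMP_REG_FWD_EX:  jump_reg_value = in->ex_alu_result;  break;
    case JUMP_REG_FWD_MEM: jump_reg_value = in->mem_alu_result; break;
    case JUMP_REG_PC:      jump_reg_value = in->pc;             break;
    default:               jump_reg_value = in->src1;           break;
    }

    /* Wrapping 32-bit add, so negative offsets are handled. */
    return jump_reg_value + (uint32_t)offset;
}
```

In this sketch, branches and jal select the PC input, while jalr selects src1 or one of the forwarded results, depending on what the ID stage hazard logic detected.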
- For the Memory (MEM) stage, the decision to take a branch/jump must be updated to include jump instructions
  - Branch instructions are conditional, while jump instructions are unconditional
  - The s\_me\_take\_branch signal is used in the Instruction Fetch (IF) stage as the select for the next PC address mux, to output the branch target address, and in ca\_pipe\_control.codal to clear the ID and EX pipeline registers. To reuse all this code (or circuitry), update s\_me\_take\_branch to be true if a branch instruction's condition is true OR if the instruction is a jump. Jump instructions are unconditional, which means that if r\_ex\_jump\_inst is true, then s\_me\_take\_branch must be true
- The MEM stage has now been updated
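
The combined decision can be written as a single Boolean expression. The plain-C sketch below (not CodAL) uses the hypothetical function name `take_branch`; its three parameters stand for the r\_ex\_branch\_inst, r\_ex\_branch\_true, and r\_ex\_jump\_inst pipeline registers named above.

```c
#include <stdbool.h>

/* Program flow changes for a taken conditional branch or for any jump. */
static bool take_branch(bool branch_inst, bool branch_true, bool jump_inst)
{
    return (branch_inst && branch_true) || jump_inst;
}
```

Because a flushed (NOP) instruction has all three inputs cleared to 0, this expression evaluates to false for it, so a bubble can never redirect the PC.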
- It is time to revalidate your branch instructions and to validate your jump instructions

# **Checkpoint 2: Validating jump operations**

- For Checkpoint 2, you will continue to use the control\_inst\_assembly\_test project.
- Comment out the halt instruction that follows the branch to the PASS2 label, line 52. This
  will enable the code to reach the jump test code

- You will need to reassign your IA project to control\_inst\_assembly\_test to compile the updated test with the halt statement on line 52 commented out
- There is no need to reassign the project to CA after the compile. Once the software project is aware of the CA model, it does not need to be reassigned
- Place breakpoints at:
  - 35, 104, 124, 161: These halt statements will indicate a jump or branch failure
  - 86: Reaching this breakpoint with register x10 = 1 indicates that the branches and jumps in this routine successfully executed
- Step through the code to see that the jump operations performed as programmed. Refer
  to the comments to understand the flow of the code and whether your project
  successfully completed the test
- Note: The Checkpoint 2 test is not an exhaustive test, but covers the majority of cases including both negative and positive offset calculations

# **Checkpoint 3: Running your regression test**

- For Checkpoint 3, you will use your regression test
- Comment out the halt statements after the immediate instructions, the r-type instructions, shift-immediates, and the data-hazard detection / forwarding test code
- Your code should now run up to and through your regression test's branch and jump test code and halt before the load and store set of instructions
- Debug your regression test and step through your code until it fails or it reaches the halt statement after your branch and jump test sequences
  - If your test fails an earlier test sequence that had passed, it most likely is an error in your processor's data-path or control-signals since these tests had passed in an earlier assignment
  - If your test fails within your branch and jump test sequence, you will need to evaluate whether the failure is in your regression test sequence and/or in your branch/jump CodAL implementation
- After your regression test passes all the way to the halt after your branch and jump sequence with correct results through all the tests, you are ready to submit your project for grading

Complete the phase

- Clean your SDK
- Export your SDK
- Rename the file to the standardname?
- Submit both your xml schematics and your compressed file into the Canvas assignment for phase 7.

# Appendix A: YouTube videos for Assignment 7

#### Assignment Videos:

#### Assignment 7: Branch and Jump Instructions

We need our software programs to make decisions based on input that could be a stimulus from a touchscreen or based on a particular data set. These decisions occur at the assembly programming level through branch instructions. Branches are conditional changes of program flow. Jumps, on the other hand, are unconditional changes of program flow. They support programming best practices such as modularity and encapsulation through the use of function calls and their associated returns.

These instructions are very useful instructions, but when program flow changes in a pipeline processor, it introduces a control hazard. In this assignment, you will learn how to properly change program flow as well as handle the associated control hazard.

 https://www.youtube.com/watch?v=OWc-RzXpy38&list=PLTUn6Ox9e6q2ienoql3KClFtRPMqO28uj&index=14

#### Assignment 7: Checkpoint 1: Validating branch instructions

 To validate a branch instruction, you need to validate its conditional statement, whether it branches to the target address correctly (both forward and backwards), and whether it correctly handles the associated Control Hazard.

This checkpoint will be used to validate that one branch operation, beq, branches correctly and handles its associated Control Hazard. If an error is indicated, the video comments will help indicate what the failure could be such as a lack of NOP bubbles inserted in the ID and/or EX pipeline registers.

 https://www.youtube.com/watch?v=YJbDHQDGKQ4&list=PLTUn6Ox9e6q2ienoql3KCIFtRPMqO28uj&index=15

#### Frequently Asked Questions (FAQs) Videos

- These videos are intended to provide **real-time support** to students who have been assigned a project-based learning assignment based on the Codasip Curriculum.
- FAQ: bitsizeof() macro
  - Automation enables increased efficiency with less errors. The bitsizeof()
    macro can be used to automate the declaration of the signal width of the
    multiplexer select lines. This video describes what the bitsizeof() macro
    returns and how it is used to declare the number of multiplexer select lines.
    The video walks through an example where bitsizeof() is used in a
    processor's Cycle Accurate (CA) project files; ca\_defines.hcodal and
    ca\_resources.codal.
  - https://www.youtube.com/watch?v=SF6edheACHk&list=PLTUn6Ox9e6q1ii0fp-N\_GDPZjAtkDZmqe&index=13

- FAQ: enums and switch statements
  - An enum is a level of abstraction that makes programming easier and less error prone. The enum is a list of symbols where each symbol represents a distinct number, which can be used by a switch statement's case statements. A switch statement executes the case whose value matches the switch statement variable. An important benefit of the enum is not just the abstraction of a number, but the automation of any change of an enumeration's value. For example, if the enum's symbol changes value from 0 to 1, all locations throughout the design will now be evaluated as 1 instead of 0. This automation improves efficiency while reducing errors.
  - https://www.youtube.com/watch?v=UNbDe-XOCWY&list=PLTUn6Ox9e6q1ii0fp-N\_GDPZjAtkDZmge&index=14