## Chapter 7 - Microarchitecture 




## Exercises:

#### Exercise 7.1

(a) <br> 
Based on table 7.5 (p.387), `RegWrite` stuck at 0 leaves `sw`, `beq` and `j` functional, since none of these instructions writes to a register. On the other hand, all R-type instructions, along with `lw` and `addi`, will no longer work since they must write their result to a register. 

(b) <br> 
Based on table 7.5, `ALUOp_1` stuck at 0 will cause R-type instructions to malfunction. The Control Unit sets `ALUOp` to `00` or `01` when the instruction is not R-type (i.e. `opcode` is not all zeros), so those instructions are unaffected. For an R-type instruction, the Control Unit sets `ALUOp` to `10`; the <i>stuck-at-0</i> fault turns this into `00`, which forces an addition regardless of `funct` and corrupts the result of every R-type instruction except `add`.     <br>

(c) <br> 
Based on table 7.5, `MemWrite` stuck at 0 will only cause the `sw` instruction to fail. This result is expected since `sw` is the only instruction that writes to the data memory.   <br>




#### Exercise 7.2


(a) <br> 
Based on table 7.5, `RegWrite` stuck at 1 will cause `sw`, `beq` and `j` to malfunction. This is expected since those instructions are not supposed to write to a register, yet the fault forces a register write on every instruction. 

(b) <br> 
Based on table 7.5, `ALUOp_1` stuck at 1 will cause the `lw`, `sw`, `beq` and `addi` instructions to malfunction, since they rely on `ALUOp = 00` or `01` and the fault turns these values into `10` or `11`.   <br>

(c) <br> 
Based on table 7.5, `MemWrite` stuck at 1 will cause R-type, `lw`, `beq`, `addi` and `j` instructions to malfunction, since each of them would then also write to the data memory.   <br>
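
The stuck-at answers of Exercises 7.1 and 7.2 can be cross-checked with a small table lookup. The sketch below is only a first-order check: it flags an instruction when the value it needs differs from the stuck value, and it does not model cases where a wrong control value happens to give the right result. The control values are transcribed from the `maindec` module listed under Exercise 7.11, and treating the `ALUOp` of `j` as a don't-care is an assumption.

```python
# Control values per instruction for the three signals examined in
# Exercises 7.1 and 7.2. None marks a don't-care (assumed for j's ALUOp).
CONTROL = {
    "R-type": {"RegWrite": 1, "ALUOp_1": 1, "MemWrite": 0},
    "lw":     {"RegWrite": 1, "ALUOp_1": 0, "MemWrite": 0},
    "sw":     {"RegWrite": 0, "ALUOp_1": 0, "MemWrite": 1},
    "beq":    {"RegWrite": 0, "ALUOp_1": 0, "MemWrite": 0},
    "addi":   {"RegWrite": 1, "ALUOp_1": 0, "MemWrite": 0},
    "j":      {"RegWrite": 0, "ALUOp_1": None, "MemWrite": 0},
}


def broken_by(signal, stuck_value):
    """Return the instructions that need `signal` to differ from `stuck_value`."""
    return [
        name
        for name, ctrl in CONTROL.items()
        if ctrl[signal] is not None and ctrl[signal] != stuck_value
    ]


for stuck in (0, 1):
    for sig in ("RegWrite", "ALUOp_1", "MemWrite"):
        print(f"{sig} stuck at {stuck}: {broken_by(sig, stuck)}")
```
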




#### Exercise 7.3

To make things more interesting, all instructions are added together. Additionally, the starting single-cycle MIPS processor also includes the hardware for `j` and `addi` instructions: <br> 

(a) <br>
`sll` requires a logical left shifter whose shift amount (`shamt`) comes from instruction bits 10:6 and whose data input is the `SrcB` signal. A 4:1 multiplexer, controlled by the signal `DataMemSrc`, is added to control the Data memory input. When `DataMemSrc = 01`, `MemToReg = 0`, `RegWrite = 1` and `RegDst = 1`, the shifted value is saved in register `rd`. Lastly, since `sll` and the other R-type instructions share the same opcode, the funct field ($\text{instr}_{5:0}$) is used to differentiate `sll` from them.    <br>

(b)<br>
`lui` requires a logical left shifter with a constant `shamt` value of 16. When `DataMemSrc = 10`, `MemToReg = 0`, `RegWrite = 1` and `RegDst = 0`, the 16 bits of the instruction's immediate are saved to the 16 most significant bits of register `rt`, with its 16 least significant bits set to 0. 

(c)<br> 
`slti` takes the existing `ALUResult` signal obtained after an ALU subtraction (`ALUOp = 01`) and logically shifts it right by 31, such that `ALUResultNeg = 1` when `SignImm` > `SrcA`. When `DataMemSrc = 11`, `MemToReg = 0`, `RegWrite = 1` and `RegDst = 0`, the value of `ALUResultNeg` is saved to register `rt`. 


(d)<br>
`blez` uses the LSB of `ALUResultNeg` following an ALU subtraction, such that if `Branch = 1` and `BranchLessThan = 1`, then `ALUResult <= 0` will trigger `PCSrc = 1`, which loads the program counter with the branch target computed from the instruction's immediate value. 
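
The values these four instructions are expected to produce can be sketched as a small reference model. This is only an illustration of the intended results, not of the circuit: the function names are hypothetical, registers are modelled as 32-bit words, and `slti` ignores subtraction overflow just as the sign-bit approach described above does.

```python
MASK32 = 0xFFFFFFFF


def sll(src_b, shamt):
    """Logical left shift of SrcB by shamt (instruction bits 10:6)."""
    return (src_b << shamt) & MASK32


def lui(imm16):
    """Immediate placed in the upper halfword, lower halfword cleared."""
    return (imm16 << 16) & MASK32


def slti(src_a, sign_imm):
    """Sign bit of SrcA - SignImm (overflow is ignored in this sketch)."""
    diff = (src_a - sign_imm) & MASK32
    return diff >> 31


def blez_taken(src_a):
    """Branch is taken when the register value is zero or negative."""
    word = src_a & MASK32
    return word == 0 or (word >> 31) == 1


print(sll(7, 3), hex(lui(0x1234)), slti(3, 5), blez_taken(-3))
```
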

<img src="images\P7_3_table.PNG" />
<img src="images\P7_3_Circuit.PNG" />



#### Exercise 7.4




(a) `jal` can be implemented by: 
1. Setting `MemToReg = 10` such that `PCPlus4` is used as `Result`, which can then be stored in the `$ra` register by setting `RegDst = 10`. 
2. Simultaneously setting `JumpR = 1` and `Jump = 1`, which sets `PC` to the JTA, as in the normal `j` instruction. 

(b) `lh` can be implemented by setting `ALUSrc = 01`, `ALUOp = 00` and `MemToReg = 11` such that only the sign-extended least significant halfword is loaded. <br> 

(c) `jr` is tricky to implement since it is an R-type instruction that does not write to a register (unlike the other R-type instructions implemented) and it also overwrites the value of `PC`. A Control Unit update is necessary. The chosen update here is to add 2 outputs to the ALU decoder truth table: 
1. `JumpR` allows the content of the `PC` register to be overwritten directly with the content of `$rs`. 
2. `RegWriteALU` differentiates R-type instructions that need to write a register from those that do not. A RegWrite Truth Table is added to control the final value of `RegWrite`. 

(d) `srl` requires a modification of the ALU such that it can handle shift right logical. 
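
As a reference for parts (b) and (d), the values `lh` and `srl` should produce can be sketched as follows. The helper names are hypothetical and the model only describes the expected register contents, not the datapath wiring.

```python
MASK32 = 0xFFFFFFFF


def lh_result(read_data):
    """Sign-extend the least significant halfword of a 32-bit memory word."""
    half = read_data & 0xFFFF
    if half & 0x8000:
        return half | 0xFFFF0000
    return half


def srl(value, shamt):
    """Logical right shift: zeros are shifted in from the left."""
    return (value & MASK32) >> shamt


print(lh_result(0x0000000B), hex(lh_result(0x1234FFFE)), srl(40, 3))
```
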

<img src="images\P7_4_MIPS.PNG" />
<img src="images\P7_4_ALU.PNG" />


<img src="images\P7_4_ALUDecTruthTable.PNG" />



#### Exercise 7.5

The register file box must be modified, since `lwinc` requires two register write operations in a single cycle on a single-cycle processor. 

Extra hardware (highlighted in blue) can be added in order to increment `$rs` by 4 while performing the `lw $rt, imm($rs)` using the base hardware: <br> 

<img src="images\P7_5.PNG" />
<img src="images\P7_5_table.PNG" />


#### Exercise 7.6

The data and control paths must be changed such that they also include the floating-point registers that are part of <i>coprocessor 1</i> (refer to section 6.7.4). The following control signals must also be added: <br> 
1. `FloatFlip`: Flips the input of the floating-point adder to obtain a floating-point subtractor. 
2. `FloatWriteSrc`: Selects whether the output of the floating-point adder or of the multiplier is saved. 
3. `RegWriteFloat`: Allows the floating-point registers to be written when `add.s`, `sub.s` and `mul.s` are executed. 

Note that the `cop` field does not need to be taken into account in this problem since we are only dealing with single-precision floating-point values. 

<img src="images\P7_6.PNG" />
<img src="images\P7_6_table.PNG" />

#### Exercise 7.7

Based on equation 7.3 (p.388), halving the memory read delay (`t_mem`) reduces the minimum cycle time the most, since `t_mem` is the largest delay and appears twice in the critical path. The baseline cycle time of a single-cycle processor using the delays of table 7.6 is calculated in example 7.4 and is 925 ps (assuming `lw` is the slowest instruction). Halving `t_mem` reduces this value to 675 ps. 
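
The comparison can be reproduced with a short calculation. The individual element delays below are an assumption recalled from table 7.6 of the textbook (they are not listed in these notes), chosen because they reproduce the 925 ps baseline; the formula follows the `lw` critical path of equation 7.3.

```python
# Element delays in ps (assumed values from table 7.6 of the textbook).
DELAYS = {
    "t_pcq": 30,
    "t_mem": 250,
    "t_RFread": 150,
    "t_ALU": 200,
    "t_mux": 25,
    "t_RFsetup": 20,
}


def cycle_time(d):
    """Single-cycle critical path for lw: memory is traversed twice."""
    return (d["t_pcq"] + 2 * d["t_mem"] + d["t_RFread"]
            + d["t_ALU"] + d["t_mux"] + d["t_RFsetup"])


def halved(name):
    """Cycle time when the delay of one element is cut in half."""
    modified = dict(DELAYS)
    modified[name] = DELAYS[name] / 2
    return cycle_time(modified)


print("baseline:", cycle_time(DELAYS))
for element in DELAYS:
    print(f"halving {element}: {halved(element)} ps")
best = min(DELAYS, key=halved)
print("best element to halve:", best)
```
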





#### Exercise 7.8


Based on the baseline cycle time of 925 ps obtained in example 7.4, a 20 ps reduction in the ALU delay results in a 905 ps cycle time. 

100 billion instructions should take: <br> 
$(100 \cdot 10^9 \text{ instructions})(905\cdot 10^{-12}\dfrac{\text{sec}}{\text{instruction}}) = 90.5\text{ seconds}$
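
A quick unit check of this execution time, taking 1 ps = 10^-12 s and one cycle per instruction on the single-cycle processor. Integer picoseconds are used to avoid rounding noise; the variable names are only illustrative.

```python
PS_PER_SECOND = 10**12

cycle_ps = 925 - 20                # baseline cycle time minus the ALU speed-up
instructions = 100 * 10**9         # 100 billion instructions, CPI = 1

total_ps = instructions * cycle_ps
total_seconds = total_ps / PS_PER_SECOND
print(cycle_ps, "ps per cycle,", total_seconds, "s in total")
```
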




#### Exercise 7.9

In this problem, it is assumed the FSM in Figure 7.42 is used. 

(a) `MemtoReg` stuck at zero would cause only the `lw` instruction to fail because it is the only instruction that writes a register with a value from data memory. 

(b) `ALUOp_0` stuck at zero would cause only the `beq` instruction to fail since `ALUOp = X1` selects the subtraction used by I-type instructions, and `beq` is the instruction that performs this subtraction. 

(c) `PCSrc` stuck at zero would also cause only the `beq` instruction to fail since it is needed in state 8 to pass the new `PC` value to the `PC` register if `PCEn = 1`.  


#### Exercise 7.10

In this problem, it is assumed the FSM in Figure 7.42 is used. 

(a) `MemtoReg` stuck at one would cause only R-type instructions to fail because the signal `ALUout` holding the result of the R-type operation could not be written to the register file. 

(b) `ALUOp_0` stuck at one would cause every instruction to fail because the first state of the multicycle processor FSM must increment the PC by 4, which requires the ALU to perform an addition (`ALUOp = 00`). 

(c) `PCSrc` stuck at one would also cause every instruction to fail for the same reason as stated in (b): `PCSrc = 0` is required to store the result of `PC+4` into the PC register.  


#### Exercise 7.11
The operation `sll` is added to the HDL code. The instruction `sll $2, $2, 3` (000210C0) is added as the second-to-last instruction of the test input, which results in register `$2` being shifted left by 3 (i.e. multiplied by 8) before being stored in `mem[84]`. Thus `mem[84] = 7 * 8 = 56`.
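
The encoding `000210C0` and the expected result can be double-checked by splitting the word into its R-type fields. `decode_rtype` is a hypothetical helper written for this check; it is not part of the HDL.

```python
def decode_rtype(word):
    """Split a 32-bit MIPS R-type instruction word into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


fields = decode_rtype(0x000210C0)
print(fields)

# $2 holds 7 when the sll executes, so the stored value is 7 << shamt.
stored = 7 << fields["shamt"]
print(stored)
```
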

<b>memfile.dat</b>
```
20020005
2003000c
2067fff7
00e22025
00642824
00a42820
10a7000a
0064202a
10800001
20050000
00e2202a
00853820
00e23822
ac670044
8c020050
08000011
20020001
000210C0
ac020054
```
<b>Verilog code</b>
```
// ========== CHANGED ========== 
// added signal dataMemSrc
module mips(input logic clk, reset,
            output logic [31:0] pc,
            input logic [31:0] instr,
            output logic memwrite,
            output logic [31:0] aluout, writedata,
            input logic [31:0] readdata);
  logic memtoreg, alusrc, regdst, regwrite, jump, pcsrc, zero, dataMemSrc; 
  logic [2:0] alucontrol;
  controller c(instr[31:26], instr[5:0], zero, memtoreg, memwrite, pcsrc, alusrc, regdst, regwrite, jump, alucontrol, dataMemSrc);
  datapath dp(clk, reset, memtoreg, pcsrc, alusrc, regdst, regwrite, jump, alucontrol, zero, pc, instr, aluout, writedata, readdata, dataMemSrc);
endmodule


// ========== CHANGED ========== 
// added signal dataMemSrc
module controller(input logic [5:0] op, funct,
                  input logic zero,
                  output logic memtoreg, memwrite,
                  output logic pcsrc, alusrc,
                  output logic regdst, regwrite,
                  output logic jump,
                  output logic [2:0] alucontrol,
                  output logic dataMemSrc);
  logic [1:0] aluop;
  logic branch;
  maindec md(op, memtoreg, memwrite, branch, alusrc, regdst, regwrite, jump, aluop);
  aludec ad(funct, aluop, dataMemSrc, alucontrol);
  assign pcsrc = branch & zero;
endmodule


module maindec(input logic [5:0] op,
               output logic memtoreg, memwrite,
               output logic branch, alusrc,
               output logic regdst, regwrite,
               output logic jump,
               output logic [1:0] aluop);

  logic [8:0] controls;
  assign {regwrite, regdst, alusrc, branch, memwrite, memtoreg, jump, aluop} = controls;
  always @(*) // replacement for always_comb
    case(op)
      6'b000000: controls <= 9'b110000010; // RTYPE
      6'b100011: controls <= 9'b101001000; // LW
      6'b101011: controls <= 9'b001010000; // SW
      6'b000100: controls <= 9'b000100001; // BEQ
      6'b001000: controls <= 9'b101000000; // ADDI
      6'b000010: controls <= 9'b000000100; // J
      default: controls <= 9'bxxxxxxxxx; // illegal op
    endcase
endmodule


// ========== CHANGED ========== 
// Added instruction sll 
module aludec(input logic [5:0] funct,
              input logic [1:0] aluop,
              output logic dataMemSrc,           // added output for 7.11  
              output logic [2:0] alucontrol);
  
  logic[3:0] control; 
  assign {alucontrol, dataMemSrc} = control;   
  
  always @(*) // replacement for always_comb
        case(aluop)
        2'b00: control <= 4'b0100; // add (for lw/sw/addi)
        2'b01: control <= 4'b1100; // sub (for beq)
        default: case(funct) // R-type instructions
            6'b100000: control <= 4'b0100; // add
            6'b100010: control <= 4'b1100; // sub
            6'b100100: control <= 4'b0000; // and
            6'b100101: control <= 4'b0010; // or
            6'b101010: control <= 4'b1110; // slt
            6'b000000: control <= 4'b1111; // sll   (added for 7.11) 
            default: control <= 4'bxxxx; // ???
        endcase
    endcase
endmodule

// ========== CHANGED ========== 
// added signal dataMemSrc + shift logic operation bypassing ALU 
module datapath(input logic clk, reset,
                input logic memtoreg, pcsrc,
                input logic alusrc, regdst,
                input logic regwrite, jump,
                input logic [2:0] alucontrol,
                output logic zero,
                output logic [31:0] pc,
                input logic [31:0] instr,
                output logic [31:0] aluout, writedata,
                input logic [31:0] readdata, 
                input logic dataMemSrc);
  
    logic [4:0] writereg;
    logic [31:0] pcnext, pcnextbr, pcplus4, pcbranch;
    logic [31:0] signimm, signimmsh;
    logic [31:0] srca, srcb;
    logic [31:0] result;
    logic [31:0] aluout_post; // added for 7.11  
  
    // next PC logic
    flopr #(32) pcreg(clk, reset, pcnext, pc);
    adder pcadd1(pc, 32'b100, pcplus4);
    sl2 immsh(signimm, signimmsh);
    adder pcadd2(pcplus4, signimmsh, pcbranch);
    mux2 #(32) pcbrmux(pcplus4, pcbranch, pcsrc, pcnextbr);
    mux2 #(32) pcmux(pcnextbr, {pcplus4[31:28],
    instr[25:0], 2'b00}, jump, pcnext);

    // register file logic
    regfile rf(clk, regwrite, instr[25:21], instr[20:16], writereg, result, srca, writedata);
    mux2 #(5) wrmux(instr[20:16], instr[15:11], regdst, writereg);
    mux2 #(32) resmux(aluout_post, readdata, memtoreg, result); // changed for 7.11 
    signext se(instr[15:0], signimm); 
  
    // ALU logic
    mux2 #(32) srcbmux(writedata, signimm, alusrc, srcb);
    alu32 alu(srca, srcb, alucontrol, aluout, zero);

    // added sll operation (shifter bypasses the ALU)
    logic [31:0] sll_out;
    assign sll_out = srcb << instr[10:6];
    mux2 #(32) sll_mux(aluout, sll_out, dataMemSrc, aluout_post);
   
endmodule



module regfile(input logic clk,
               input logic we3,
               input logic [4:0] ra1, ra2, wa3,
               input logic [31:0] wd3,
               output logic [31:0] rd1, rd2);
  
    logic [31:0] rf[31:0];
    // three ported register file
    // read two ports combinationally
    // write third port on rising edge of clk
    // register 0 hardwired to 0
    // note: for pipelined processor, write third port
    // on falling edge of clk
    always @(posedge clk) // replacement for always_ff
        if (we3) rf[wa3] <= wd3;

    assign rd1 = (ra1 != 0) ? rf[ra1] : 0;
    assign rd2 = (ra2 != 0) ? rf[ra2] : 0;

    // debug logic (uncomment to trace the registers used by the test)
    //always @(posedge clk)
    //  $display("[%d, %d, %d, %d, %d]",rf[2], rf[3], rf[4], rf[5], rf[7]); // debug logic
  
endmodule


module adder(input logic [31:0] a, b,
             output logic [31:0] y);
    assign y = a + b;
endmodule

module sl2(input logic [31:0] a,
           output logic [31:0] y);
    // shift left by 2
    assign y = {a[29:0], 2'b00};
endmodule

module signext(input logic [15:0] a,
               output logic [31:0] y);
    assign y = {{16{a[15]}}, a};
endmodule


module flopr #(parameter WIDTH = 8)
              (input logic clk, reset,
              input logic [WIDTH-1:0] d,
              output logic [WIDTH-1:0] q);
  always @(posedge clk, posedge reset) // replaced for always_ff
        if (reset) q <= 0;
        else q <= d;
endmodule


module mux2 #(parameter WIDTH = 8)
             (input logic [WIDTH-1:0] d0, d1,
             input logic s,
             output logic [WIDTH-1:0] y);
    assign y = s ? d1 : d0;
endmodule

// taken from 5.9 
module alu32(input logic [31:0] A, B,
             input logic [2:0] F,
             output logic [31:0] Y, 
             output logic zero);

  logic [31:0] S, Bout;
  assign Bout = F[2] ? ~B : B;
  assign S = A + Bout + F[2];
  assign zero = (Y==32'b0); 
  always@(*)
    case (F[1:0])
      2'b00: Y <= A & Bout;
      2'b01: Y <= A | Bout;
      2'b10: Y <= S;
      2'b11: Y <= S[31];
    endcase

endmodule
```
<b>Test Code</b>

```
module testbench();
  logic clk;
  logic reset;
  logic [31:0] writedata, dataadr;
  logic memwrite;

  // instantiate device to be tested
  top dut (clk, reset, writedata, dataadr, memwrite);

  // initialize test
  initial
    begin
    reset <= 1; # 22; reset <= 0;
    end

  // generate clock to sequence tests
  always
    begin
    clk <= 1; # 5; clk <= 0; # 5;
    end

  // check results
  always @(negedge clk)
    begin
      if (memwrite) 
        begin
          if (dataadr===84 & writedata===56) begin
              $display("Simulation succeeded");
              $stop;
          end 
      else 
        if (dataadr !==80) 
          begin
              $display("Simulation failed");
              $stop;
            end
      end
    end
endmodule



module top(input logic clk, reset,
           output logic [31:0] writedata, dataadr,
           output logic memwrite);
  logic [31:0] pc, instr, readdata;

  // instantiate processor and memories
  mips mips(clk, reset, pc, instr, memwrite, dataadr, writedata, readdata);
  imem imem(pc[7:2], instr);
  dmem dmem(clk, memwrite, dataadr, writedata, readdata);
endmodule


module dmem(input logic clk, we,
            input logic [31:0] a, wd,
            output logic [31:0] rd);
  logic [31:0] RAM[63:0];
  assign rd = RAM[a[31:2]]; // word aligned
  always @(posedge clk) // replaced always_ff
  if (we) RAM[a[31:2]] <= wd;
endmodule



module imem(input logic [5:0] a,
            output logic [31:0] rd);
  logic [31:0] RAM[63:0];
  initial
    $readmemh("memfile.dat", RAM);
    assign rd = RAM[a]; // word aligned
endmodule
```

#### Exercise 7.12

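`jal`, `lh` and `jr` are functional, but not `srl`. Judging from the code below, `srl` likely fails for three reasons: the ALU decoder gives it the same control word as `or` (`6'b000111`), the main decoder keeps `alusrc = 00` for R-type instructions so `shamt` is never selected as `SrcB`, and the ALU shift case computes `A >> B` with `A = $rs` rather than shifting `$rt`. 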

<b> Verilog Code</b> 
```
// ========== UPDATED ==========
// added control signal jumpR
// Generalized control signals  regdst, alusrc and memtoreg to 2 bits. 
module mips(input logic clk, reset,
            output logic [31:0] pc,
            input logic [31:0] instr,
            output logic memwrite,
            output logic [31:0] aluout, writedata,
            input logic [31:0] readdata);
  logic regwrite, jump, pcsrc, zero, jumpR; 
  logic [1:0] regdst, alusrc, memtoreg;
  logic [5:0] aluCtrlOutput;
  
  logic [3:0] alucontrol;
  controller c(instr[31:26], instr[5:0], zero, memtoreg, memwrite, pcsrc, alusrc, regdst, regwrite, jump, alucontrol, jumpR, aluCtrlOutput);
  datapath dp(clk, reset, memtoreg, pcsrc, alusrc, regdst, regwrite, jump, alucontrol, zero, pc, instr, aluout, writedata, readdata, jumpR, aluCtrlOutput);
endmodule


// ========== UPDATED ==========
// added output control signal jumpR
// Generalized control signals  regdst, alusrc and memtoreg to 2 bits while preserving argument order
module controller(input logic [5:0] op, funct,
                  input logic zero,
                  output logic [1:0] memtoreg, 
				  output logic memwrite, pcsrc, 
				  output logic [1:0] alusrc,
                  output logic [1:0] regdst, 
				  output logic regwrite,
                  output logic jump,
                  output logic [3:0] alucontrol,
				  output logic jumpR, 
				  output logic [5:0] aluCtrlOutput
);
				
			
  logic [1:0] aluop;
  logic branch;
  logic RegWriteMainDec; 
  //logic [5:0] aluCtrlOutput;
  logic RegWriteALU; 
  
  maindec md(op, memtoreg, memwrite, branch, alusrc, regdst, RegWriteMainDec, jump, aluop);
  aludec ad(funct, aluop, aluCtrlOutput);
  
  assign jumpR = aluCtrlOutput[1]; 
  assign RegWriteALU = aluCtrlOutput[0];
  assign alucontrol = aluCtrlOutput[5:2];
  
  //always @(RegWriteALU, RegWriteMainDec)
  //  $display(" RegWriteALU, RegWriteMainDec = [%b, %b]",RegWriteALU, RegWriteMainDec); // debug logic
  
  assign regwrite = RegWriteALU | RegWriteMainDec; // based on third truth table added in 7.
  assign pcsrc = branch & zero;
endmodule

// ========== UPDATED ==========
// updated truth table and control vector 
module maindec(input logic [5:0] op,
               output logic [1:0] memtoreg, 
			   output logic memwrite,
               output logic branch, 
               output logic [1:0] alusrc, regdst, 
			   output logic RegWriteMainDec,
               output logic jump,
               output logic [1:0] aluop);

  logic [11:0] controls;
  assign {RegWriteMainDec, regdst, alusrc, branch, memwrite, memtoreg, jump, aluop} = controls;
  always @(*) // replacement for always_comb
    case(op)
      6'b000000: controls <= 12'b001000000010; // RTYPE (+SRL, JR)
      6'b100011: controls <= 12'b100010001000; // LW
      6'b101011: controls <= 12'b000010100000; // SW
      6'b000100: controls <= 12'b000001000001; // BEQ
      6'b001000: controls <= 12'b100010000000; // ADDI
      6'b000010: controls <= 12'b000000000100; // J
	  6'b000011: controls <= 12'b110000010100; // JAL
      6'b100001: controls <= 12'b100010011000; // LH
      default: controls <= 12'bxxxxxxxxxxxx; // illegal op
    endcase
endmodule

// ========== UPDATED ==========
// updated truth table and aluCtrlOutput vector 
module aludec(input logic [5:0] funct,
              input logic [1:0] aluop,
              output logic [5:0] aluCtrlOutput);
    always @(*) // replacement for always_comb
        case(aluop)
        2'b00: aluCtrlOutput <= 6'b001010; // add (for lw/sw/addi)
        2'b01: aluCtrlOutput <= 6'b101010; // sub (for beq)
        default: case(funct) // R-type instructions
            6'b100000: aluCtrlOutput <= 6'b001011; // add
            6'b100010: aluCtrlOutput <= 6'b101011; // sub
            6'b100100: aluCtrlOutput <= 6'b000011; // and
            6'b100101: aluCtrlOutput <= 6'b000111; // or
            6'b101010: aluCtrlOutput <= 6'b101111; // slt
			6'b000010: aluCtrlOutput <= 6'b000111; // srl (ADDED)
            6'b001000: aluCtrlOutput <= 6'b101100; // jr (ADDED) 
            default: aluCtrlOutput <= 6'bxxxxxx; // ???
        endcase
    endcase
	
	
	  //always @(aluCtrlOutput, aluop)
	//	$display(" aluCtrlOutput, funct = [%d, %d]",aluCtrlOutput, funct); // debug logic
endmodule

// ========== UPDATED ==========
// Updated input signal size, added jumpR, added 2:1 multiplexer + updated 3 other multiplexers 
module datapath(input logic clk, reset,
                input logic [1:0] memtoreg, 
				input logic pcsrc,
                input logic [1:0] alusrc, regdst,
                input logic regwrite, jump,
                input logic [3:0] alucontrol,
                output logic zero,
                output logic [31:0] pc,
                input logic [31:0] instr,
                output logic [31:0] aluout, writedata,
                input logic [31:0] readdata, 
				input logic jumpR,
				input logic [5:0] aluCtrlOutput
				);
				
	logic [4:0] writereg;
	logic [31:0] pcnext, pcnextbr, pcplus4, pcbranch;
	logic [31:0] signimm, signimmsh;
	logic [31:0] srca, srcb;
	logic [31:0] result;
	logic [4:0] shamt; 
	logic [31:0] pcnextnext; 
  	logic[15:0] RCLower; 
  	assign RCLower = readdata[15:0];
	
	// Update: changed pcnext for pcnextnext
    // next PC logic
	flopr #(32) pcreg(clk, reset, pcnextnext, pc);
	 
    adder pcadd1(pc, 32'b100, pcplus4);
    sl2 immsh(signimm, signimmsh);
    adder pcadd2(pcplus4, signimmsh, pcbranch);
    mux2 #(32) pcbrmux(pcplus4, pcbranch, pcsrc, pcnextbr);
    mux2 #(32) pcmux(pcnextbr, {pcplus4[31:28], instr[25:0], 2'b00}, jump, pcnext);
	

	// Update: Added 2:1 multiplexor controlled by jumpR between PC register and jump multiplexor
	mux2 #(32) pcmux2(srca, pcnext, jumpR, pcnextnext);

    // register file logic
    regfile rf(clk, regwrite, instr[25:21], instr[20:16], writereg, result, srca, writedata, aluCtrlOutput);
    
	
	// Update regdst multiplexor
	// mux2 #(5) wrmux(instr[20:16], instr[15:11], regdst, writereg);
  mux4 #(5) wrmux(instr[20:16], instr[15:11], 5'b11111, 5'bxxxxx, regdst, writereg);
    
	
	// Update MemToReg multiplexor 
	// mux2 #(32) resmux(aluout, readdata, memtoreg, result);
  mux4 #(32) resmux(aluout, readdata, pcplus4, {{16{RCLower[15]}}, RCLower}, memtoreg, result); 
	
    signext se(instr[15:0], signimm);

    // ALU logic
    // Update ALU logic multiplexor 
	// mux2 #(32) srcbmux(writedata, signimm, alusrc, srcb);
  assign shamt = instr[10:6];
	mux4 #(32) srcbmux(writedata, signimm, 32'b0, {{27{shamt[4]}}, shamt}, alusrc, srcb); 


    alu32 alu(srca, srcb, alucontrol, aluout, zero);
endmodule



module regfile(input logic clk,
               input logic we3,
               input logic [4:0] ra1, ra2, wa3,
               input logic [31:0] wd3,
               output logic [31:0] rd1, rd2,
			   input logic [5:0] aluCtrlOutput
			   );
    
	
	logic [31:0] rf[31:0];
    // three   ported register file
    // read two ports combinationally
    // write third port on rising edge of clk
    // register 0 hardwired to 0
    // note: for pipelined processor, write third port
    // on falling edge of clk
   always @(posedge clk)// replaced for always_ff
        if (we3) rf[wa3] <= wd3;
              assign rd1 = (ra1 != 0) ? rf[ra1] : 0;
            assign rd2 = (ra2 != 0) ? rf[ra2] : 0;
    
   always @(posedge clk)
     $display("[%d, %d, %d, %d, %d], we3 = %b, aluCtrlOutput = %b",rf[2], rf[3], rf[4], rf[5], rf[7], we3, aluCtrlOutput); // debug logic
  
endmodule


module adder(input logic [31:0] a, b,
             output logic [31:0] y);
    assign y = a + b;
endmodule

module sl2(input logic [31:0] a,
           output logic [31:0] y);
    // shift left by 2
    assign y = {a[29:0], 2'b00};
endmodule

module signext(input logic [15:0] a,
               output logic [31:0] y);
    assign y = {{16{a[15]}}, a};
endmodule


module flopr #(parameter WIDTH = 8)
              (input logic clk, reset,
              input logic [WIDTH-1:0] d,
              output logic [WIDTH-1:0] q);
  always @(posedge clk, posedge reset) // replaced for always_ff
        if (reset) q <= 0;
        else q <= d;
endmodule


module mux2 #(parameter WIDTH = 8)
             (input logic [WIDTH-1:0] d0, d1,
             input logic s,
             output logic [WIDTH-1:0] y);
    assign y = s ? d1 : d0;
endmodule

// ADDED mux4
module mux4 #(parameter WIDTH = 8)
			(input logic [WIDTH-1:0] d0, d1, d2, d3,
			input logic [1:0] s,
			output logic [WIDTH-1:0] y);
	assign y = s[1] ? (s[0] ? d3 : d2): (s[0] ? d1 : d0);
endmodule

// taken from 5.9 
module alu32(input logic [31:0] A, B,
             input logic [3:0] F, // updated input control vector to 4 digits
             output logic [31:0] Y, 
             output logic zero);

  logic [31:0] S, Bout;
  assign Bout = F[3] ? ~B : B;
  assign S = A + Bout + F[3];
  assign zero = (Y==32'b0); 
  always@(*)
    case (F[2:0])
      3'b000: Y <= A & Bout;
      3'b001: Y <= A | Bout;
      3'b010: Y <= S;
      3'b011: Y <= S[31];
	  3'b100: Y <= A >> B; // added shift right operation   
    endcase

endmodule
```

<b>Test Module</b> 
```
module testbench();
  logic clk;
  logic reset;
  logic [31:0] writedata, dataadr;
  logic memwrite;

  // instantiate device to be tested
  top dut (clk, reset, writedata, dataadr, memwrite);

  // initialize test
  initial
    begin
    reset <= 1; # 22; reset <= 0;
    end

  // generate clock to sequence tests
  always
    begin
    clk <= 1; # 5; clk <= 0; # 5;
    end

  // check results
  always @(negedge clk)
    
    
    if (memwrite & dataadr===88) 
      begin
      $stop;
      end
        //begin
        //      if (dataadr===84 & writedata===7) begin
        //      $display("Simulation succeeded");
        //      $stop;
        //  end 
      //else 
        //if (dataadr !==80) 
        //  begin
        //      $display("Simulation failed");
        //      $stop;
        //    end
      // end
    //end
endmodule



module top(input logic clk, reset,
           output logic [31:0] writedata, dataadr,
           output logic memwrite);
  logic [31:0] pc, instr, readdata;

  // instantiate processor and memories
  mips mips(clk, reset, pc, instr, memwrite, dataadr, writedata, readdata);
  imem imem(pc[7:2], instr);
  dmem dmem(clk, memwrite, dataadr, writedata, readdata);
endmodule


module dmem(input logic clk, we,
            input logic [31:0] a, wd,
            output logic [31:0] rd);
  logic [31:0] RAM[63:0];
  assign rd = RAM[a[31:2]]; // word aligned
  always @(posedge clk) // replaced always_ff
  if (we) RAM[a[31:2]] <= wd;
endmodule



module imem(input logic [5:0] a,
            output logic [31:0] rd);
  logic [31:0] RAM[63:0];
  initial
    $readmemh("memfile.dat", RAM);
    assign rd = RAM[a]; // word aligned
endmodule

```


<b> Test Input</b> 
```
# Assembly                    Description               Address    Machine
main:   addi  $2, $0, 5       # initialize $2 = 5       0          20020005  
        addi  $3, $0, 12      # initialize $3 = 12      4          2003000c  
        addi  $7, $3, −9      # initialize $7 = 3       8          2067fff7  
        or    $4, $7, $2      # $4 = (3 OR 5) = 7       c          00e22025  
        and   $5, $3, $4      # $5 = (12 AND 7) = 4     10         00642824  
        add   $5, $5, $4      # $5 = 4 + 7 = 11         14         00a42820  
        beq   $5, $7, end     # shouldn't be taken      18         10a7000a  
        slt   $4, $3, $4      # $4 = 12 < 7 = 0         1c         0064202a  
        beq   $4, $0, around  # should be taken         20         10800001  
        addi  $5, $0, 0       # shouldn’t happen        24         20050000  
around: slt   $4, $7, $2      # $4 = 3 < 5 = 1          28         00e2202a  
		add   $7, $4, $5      # $7 = 1 + 11 = 12        2c         00853820  
		sub   $7, $7, $2      # $7 = 12 − 5 = 7         30         00e23822  
		sw    $7, 68($3)      # [80] = 7                34         ac670044  
		lw    $2, 80($0)      # $2 = [80] = 7           38         8c020050  
		j end                 # should be taken         3c         08000011  
		addi  $2, $0, 1       # shouldn't happen        40         20020001  
end:    sw    $2, 84($0)      # write mem[84] = 7       44         ac020054  
============================================================================== ADDED 
		addi  $2, $2, 33      # $2 = 7 + 33 = 40        48         20420021
		srl   $2, $2, 3       # $2 = 40 >> 3 = 5        4c         000210C2 // did not work
        jal   point           # should be taken         50         0C000016 // worked 
		j     final           # should be taken         54         08000019
point: 	sw    $5, 80($0)      # [80] = 11               58         AC050050
		lh    $2, 80($0)      # $2 = 11                 5c         84020050 // worked 
		jr    $31             # jump to return address  60         03E00008 // worked
final:  addi  $5, $5, 1       # $5 = 11 + 1 = 12        64         20A50001
        addi  $5, $5, 1       # $5 = 12 + 1 = 13        68         20A50001
		sw    $2, 88($0)      # write mem[88] = 11      6c         AC020058  
```

<b>memfile.dat</b>
```
20020005
2003000c
2067fff7
00e22025
00642824
00a42820
10a7000a
0064202a
10800001
20050000
00e2202a
00853820
00e23822
ac670044
8c020050
08000011
20020001
ac020054
20420021
000210C2
0C000016
08000019
AC050050
84020050
03E00008
20A50001
20A50001
AC020058
```
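
The jump encodings in the test input above can be cross-checked against the label addresses. The sketch below applies the usual MIPS rule for the jump target address (upper four bits of `PC + 4`, followed by the 26-bit address field shifted left by 2); `jump_target` is a hypothetical helper, not part of the HDL.

```python
def jump_target(pc, word):
    """Jump target address for a j/jal instruction word located at address pc."""
    addr26 = word & 0x03FFFFFF
    return ((pc + 4) & 0xF0000000) | (addr26 << 2)


# (address of the jump, machine code, address of the target label)
checks = [
    (0x3C, 0x08000011, 0x44),  # j end
    (0x50, 0x0C000016, 0x58),  # jal point
    (0x54, 0x08000019, 0x64),  # j final
]
for pc, word, label in checks:
    print(hex(pc), hex(jump_target(pc, word)), hex(label))

# jal at 0x50 saves PC + 4 in $ra, so jr $31 returns to the `j final` at 0x54.
return_address = 0x50 + 4
print(hex(return_address))
```
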

#### Exercise 7.13

To make things more interesting, all instructions are added together. All modifications are highlighted in blue. <br> 

(a) The `srlv` instruction can be implemented by updating the ALU hardware such that it can handle the logical right shift operation (see bottom ALU schematic). It does not require an update of the state machine since R-type instructions are already handled, although it does require an update of the `ALU Decoder` since a new `funct` value must be handled (see ALU decoding truth table below). 


(b) The `ori` instruction can be implemented by adding state 9 to the control unit's FSM (see updated control unit's FSM schematic below). It also requires an update of the `ALU Decoder` truth table. 


(c) The `xori` instruction can be implemented by adding state 10 to the control unit's FSM. This instruction also requires an update of the ALU hardware such that it can handle the xor operation (see bottom ALU schematic). `ALUOp` needs another bit to encode the request for the ALU's `xor` operator since `xori` is an I-type instruction.  

(d) The `jr` instruction can be implemented in multiple ways. The method chosen here is to add state 11 to the FSM while updating the multi-cycle processor such that `SrcA` can bypass the ALU through the 2-bit `PCSrc` signal. 

<img src="images\P7_13_ALUDecTruthTable.png" />

<img src="images\P7_13_ALU.PNG" />

<img src="images\P7_13_MIPS.PNG" />

<img src="images\P7_13_FSM.PNG" />

#### Exercise 7.14

#### Exercise 7.15

The base multi-cycle processor (figure 7.27) does not allow writing to `$rs`. This can be corrected by updating the `RegDst` multiplexer such that it can select `instr[25:21]` as the write address. Since the multi-cycle processor allows an arbitrary number of register writes per instruction, there is no need to alter the register file.  


<img src="images\P7_15.PNG"/>

#### Exercise 7.16

1. <b>Data path changes needed:</b> A floating-point register file must be added, along with a single-precision multiplier and adder. A register must also be added at the output of the floating-point register file in order to create state `S9` in the control FSM, where the floating-point register file is written. 

2. <b>Control path changes needed:</b> 
The only additional control signal needed is `regWriteFloat`, which must be set to 1 in state `S9`. `FloatFlip` and `FloatWriteSrc` could also be added as control signals, but it is more convenient to derive them from bits of the `instr` signal. 



<img src="images\P7_16_table.PNG"/>
<img src="images\P7_16_MIPS.PNG"/>

#### Exercise 7.17
Only the control FSM from Exercise 7.16 needs to change. State `S9` must no longer perform any action (nothing is written during the first cycle of a floating-point multiplication or addition) and must transition to a new state `S10` where `regWriteFloat = 1`. 



#### Exercise 7.18

The critical path of the multicycle MIPS processor has a delay of `325 ps` and runs from the `PC` register to the `instr` register. The second-longest path has a delay of `300 ps` and runs from the register file output registers to the `PC` register. 


The biggest contributor to the critical path delay is the memory read, which accounts for `250 ps` of the `325 ps`. Reducing the memory read delay by `25 ps` is just enough to bring the critical path down to the `300 ps` of the second-longest path; any further reduction would not shorten the cycle time. 
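
The comparison of the two paths can be reproduced with a few lines of arithmetic. This is a minimal sketch: only the `250 ps` memory read and the `325 ps` / `300 ps` totals are stated above, while the remaining element delays (clock-to-Q, setup, multiplexer, ALU) are assumed from the textbook's timing table, and all names are illustrative.

```python
# Element delays in ps. Only T_MEM is quoted in the text above; the other
# values are assumptions taken from the textbook's timing table.
T_PCQ = 30     # register clock-to-Q
T_SETUP = 20   # register setup
T_MUX = 25     # multiplexer
T_ALU = 200    # ALU
T_MEM = 250    # memory read


def memory_path(t_mem=T_MEM):
    """PC register -> address mux -> memory -> instr register."""
    return T_PCQ + T_MUX + t_mem + T_SETUP


def alu_path(t_alu=T_ALU):
    """Register file output register -> source mux -> ALU -> PCSrc mux -> PC."""
    return T_PCQ + T_MUX + t_alu + T_MUX + T_SETUP


def cycle_time(t_mem=T_MEM, t_alu=T_ALU):
    """The clock period is set by the slower of the two paths."""
    return max(memory_path(t_mem), alu_path(t_alu))


print(memory_path(), alu_path())     # 325 300
print(cycle_time(t_mem=T_MEM - 25))  # 300
```

Under these assumptions, shrinking `t_alu` instead leaves `cycle_time` unchanged, because the memory path still dominates.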


#### Exercise 7.19

The ALU is not on the critical path of the multicycle processor. Therefore, reducing the ALU delay by `20 ps` would not reduce the baseline `325 ps` cycle time of the MIPS multicycle processor. 




#### Exercise 7.20


$0.25(5) + 0.1(4) + 0.11(3) + 0.02(3) + 0.52(4) = 4.12$ CPI on average


$(4.12 \text{ cycles/instruction})(100 \cdot 10^9 \text{ instructions})(325 \cdot 10^{-12} \text{ s/cycle}) = 133.9 \text{ s} \approx 2.23 \text{ min}$
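
The same calculation can be scripted as a sanity check. This is a small sketch: the instruction-class labels are inferred from the per-instruction cycle counts in the CPI sum and are an assumption, as are the variable names. The cycle time is the `325 ps` from Exercise 7.18.

```python
# (fraction of instructions, cycles per instruction) for each class.
# The class labels are assumed; the numbers come from the CPI sum above.
MIX = {
    "lw":     (0.25, 5),
    "sw":     (0.10, 4),
    "beq":    (0.11, 3),
    "j":      (0.02, 3),
    "r-type": (0.52, 4),
}

CYCLE_TIME_S = 325e-12   # 325 ps cycle time from Exercise 7.18
INSTRUCTIONS = 100e9     # 100 billion instructions


def average_cpi(mix):
    """Weighted average of cycles per instruction."""
    return sum(fraction * cycles for fraction, cycles in mix.values())


cpi = average_cpi(MIX)
seconds = cpi * INSTRUCTIONS * CYCLE_TIME_S
print(f"CPI = {cpi:.2f}, time = {seconds:.1f} s")
```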

#### Exercise 7.21

Below is a suggested multicycle MIPS architecture that accommodates a single input/output port for the register file. Note that: 
1. State `S1` is modified such that it only stores the value of the first register. 
2. State `Sx` is added. It is used to store the value of the second register. It is also the only state where the register `AluResult` is momentarily disabled, in order to preserve the information needed if the FSM transitions to state `S8` after `Sx`. 
3. The new state `Sx` is needed only for `r-type` instructions and `beq`, since they use the second register read of the register file. 
4. The control signal `RegDst` is removed and replaced with a more general control signal `RegId`, which is used to pick the register to read or write. 

<img src="images\P7_21_MIPS.PNG"/>
<img src="images\P7_21_FSM.PNG"/>

#### Exercise 7.22


$0.25(5) + 0.1(4) + 0.11(3 + 1) + 0.02(3) + 0.52(4 + 1) = 4.75$ CPI on average
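
As a quick cross-check against the `4.12` CPI of Exercise 7.20, and assuming (as in the sum above) that only the 11% and 52% instruction classes gain one cycle each, the increase in CPI is:

$\Delta\text{CPI} = 0.11(1) + 0.52(1) = 0.63, \qquad 4.12 + 0.63 = 4.75$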


#### Exercise 7.23

In this program: 
1. `beq` has a CPI of 3 and is executed 6 times
2. `addi` has a CPI of 4 and is executed 6 times
3. `j` has a CPI of 3 and is executed 5 times

Hence, the program takes $6(3) + 6(4) + 5(3) = 57$ cycles to run. 
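
The instruction counts can be checked by stepping through the loop. The exercise's program is not reproduced in this document, so the sketch below assumes a countdown loop (shown in the docstring) that matches the counts listed above; the function and variable names are illustrative.

```python
# Cycles per instruction on the multicycle processor, as listed above.
CYCLES = {"beq": 3, "addi": 4, "j": 3}


def run(initial_count=5):
    """Count executed instructions for the assumed countdown loop:

            addi $s0, $0, initial_count
    while:  beq  $s0, $0, done
            addi $s0, $s0, -1
            j    while
    done:
    """
    executed = {"beq": 0, "addi": 0, "j": 0}
    executed["addi"] += 1          # initialise the counter
    s0 = initial_count
    while True:
        executed["beq"] += 1       # the loop test runs on every pass
        if s0 == 0:
            break                  # branch taken: leave the loop
        s0 -= 1
        executed["addi"] += 1      # decrement
        executed["j"] += 1         # jump back to the test
    return executed


counts = run()
total_cycles = sum(CYCLES[op] * n for op, n in counts.items())
print(counts, total_cycles)
```

Under that assumption the script reports 6 `beq`, 6 `addi` and 5 `j` instructions, for 57 cycles in total.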


#### Exercise 7.24

#### Exercise 7.25

#### Exercise 7.26

### Verilog Code
```
module top(input logic clk, reset,
			output logic [31:0] writedata, adr,
			output logic memwrite);

logic [31:0] readdata;
// instantiate processor and memories
mips mips(clk, reset, adr, writedata, memwrite, readdata);
mem mem(clk, memwrite, adr, writedata, readdata);
endmodule


module mips(input  logic clk, reset,
			output logic [31:0] adr, writedata,
			output logic memwrite,
			input  logic [31:0] readdata);
			
	logic memtoreg, regDst, RegWrite, IorD, AluSrcA, ir_write, branch, pcwrite;
	logic [1:0] AluSrcB, pc_src; 
	logic [2:0] ALUCtrl;
	logic [31:0] instr; 
	
  controller c(clk, instr[31:26], instr[5:0], reset, IorD, memwrite, ir_write, pcwrite, branch, pc_src, ALUCtrl, AluSrcB, AluSrcA, RegWrite, regDst, memtoreg);
  datapath dp(clk, reset, memtoreg, regDst, RegWrite, ALUCtrl, writedata, IorD, AluSrcB, pc_src, AluSrcA, ir_write, branch, pcwrite, adr, readdata, instr);
  
endmodule



module controller(input logic clk,
				  input logic [5:0] Op, 
                  input logic [5:0] Funct,
                  input logic reset,
				  output logic IorD, MemWrite, IRWrite, PCWrite, Branch, 
				  output logic [1:0] PCSrc, 
                  output logic [2:0] ALUCtrl,
				  output logic [1:0] AluSrcB, 
				  output logic AluSrcA, RegWrite, regDst, memtoreg); 
  
  
  typedef enum logic [3:0] {S0, S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11, S12} statetype; 
  statetype state, nextstate;
    	 
	// state register 
	always @(posedge clk, posedge reset)
		if (reset) state <= S0; 
		else state <= nextstate; 
	
  logic [1:0] ALUOp; 
		
  logic[14:0] out; 
  assign {IorD, MemWrite, IRWrite, PCWrite, Branch, PCSrc, ALUOp, AluSrcB, AluSrcA, RegWrite, regDst, memtoreg} = out;
	
// next state logic 
  always @(*)
		case (state) 
			S0: 
              begin
                out  = 15'b001100000010000; 
                nextstate = S1; 
              end
			S1:
              begin
                out  = 15'b000000000110000;
                casez(Op)
                  6'b10?011: nextstate = S2; // LW or SW
                  6'b000000: nextstate = S6; // R-type
                  6'b000100: nextstate = S8; // BEQ
                  6'b001000: nextstate = S9; // ADDI
                  6'b000010: nextstate = S11; // JUMP
                  default: nextstate = S12; // error
                endcase
              end
          S2: 
              begin
                out  = 15'b000000000101000;
                if (Op == 6'b101011) nextstate = S5; 
                else nextstate = S3; 
              end
			S3: 
              begin
                out  = 15'b100000000000000;
                nextstate = S4; 
              end
			S4:
              begin
                out  = 15'b000000000000101; 
                nextstate = S0; 
              end
			S5: 
              begin
                out  = 15'b110000000000000;
                nextstate = S0; 
              end
			S6: 
              begin
                out  = 15'b000000010001000;
                nextstate = S7; 
              end
			S7: 
              begin
                out  = 15'b000000000000110;
                nextstate = S0; 
              end
			S8: 
              begin
                out  = 15'b000010101001000;
                nextstate = S0; 
              end
			S9: 
              begin
                out  = 15'b000000000101000;
                nextstate = S10; 
              end
			S10: 
              begin
                out = 15'b000000000000100;
                nextstate = S0; 
              end
			S11: 
              begin
                out = 15'b000101000000000;
                nextstate = S0; 
              end
         
			default: // S12 (error) and unused encodings: deassert every control signal
              begin
                out = 15'b0;
                nextstate = S12; 
              end
        endcase

  // ALUControl Logic 
  always@(*) 
    casez({ALUOp,Funct}) 
      8'b00??????: ALUCtrl = 3'b010; // add
      8'b?1??????: ALUCtrl = 3'b110; // subtract 
      8'b1?100000: ALUCtrl = 3'b010; // add
      8'b1?100010: ALUCtrl = 3'b110; // subtract
      8'b1?100100: ALUCtrl = 3'b000; // and 
      8'b1?100101: ALUCtrl = 3'b001; // or 
      8'b1?101010: ALUCtrl = 3'b111; // set less than 
      default:     ALUCtrl = 3'bxxx; // unsupported funct code
    endcase
  
endmodule




// DATAPATH ========================================
module datapath(input logic clk, reset,
				input logic memtoreg,
				input logic regdst,
				input logic regwrite,
				input logic [2:0] alucontrol,
				output logic [31:0] writedata,
				input logic IorD, 
				input logic [1:0] alusrc_b, pc_src, 
				input logic alusrc_a,
				input logic ir_write,
				input logic branch, pcwrite,
				output logic [31:0] adr, 
				input logic [31:0] rd,
				output logic [31:0] instr);
		
  
  // debug 
  //    always@(negedge clk)
  //      $display("IorD, alu_out = %b, %h", IorD, alu_out);
  // debug 
//      always@(negedge clk)
//        $display("alusrc_a, srca, srcb, pc, pcen, reset, alu_result, pc_src, pcnext = %b, %h, %h, %h, %b, %b, %h, %h, %h", alusrc_a, srca, srcb, pc , pcen, reset, alu_result, pc_src, pcnext);
     // always@(negedge clk)
     //   $display("srca, srca, alu_result, alucontrol =  %h, %h, %h, %h", srca, srcb, alu_result, alucontrol); 

  
	logic [31:0] pcnext, pc;
	logic pcen, zero; 
	logic [31:0] data, wd3, rd1, rd2;
	logic [4:0] wa3; 
	logic [31:0] signimm;
	logic [31:0] srca, srcb;
	logic [31:0] a_sig, b_sig;
	logic [31:0] alu_result, alu_out; 
	logic [31:0] pc_jump; 

	
	assign pcen = (branch & zero) | pcwrite; 
	
  
  
	// PC register and mux
	flopenr #(32) adr_reg(clk, reset, pcen, pcnext, pc);
	mux2 #(32) adr_mux(pc, alu_out, IorD, adr);
	

	// memory should fit somewhere here 
	
	// Pre register file 
	flopenr #(32) reg_instr(clk, reset, ir_write, rd, instr);	
	flopr #(32) reg_data(clk, reset, rd, data);
	mux2 #(5) reg_a3(instr[20:16], instr[15:11], regdst, wa3);
	mux2 #(32) reg_wd3(alu_out, data, memtoreg, wd3);

	
	// register file + output registers 
	regfile rf(clk, regwrite, instr[25:21], instr[20:16], wa3, wd3, rd1, rd2);
	flopr #(32) reg_asig(clk, reset, rd1, a_sig);
	flopr #(32) reg_bsig(clk, reset, rd2, b_sig);
	assign writedata = b_sig; 
	
	// SrcA and SrcB Mux 
	signext se(instr[15:0], signimm);
  mux2 #(32) srcA_mux(pc, a_sig, alusrc_a, srca);
  mux4 #(32) srcB_mux(b_sig, 32'd4, signimm, {signimm[29:0], 2'b00}, alusrc_b, srcb);

  
	// ALU logic
	alu32 alu(srca, srcb, alucontrol, alu_result, zero);


	// Alu Out register
	flopr #(32) reg_aluout(clk, reset, alu_result, alu_out);
	
	//  PCSrc multiplexor 
	assign pc_jump = {pc[31:28], instr[25:0], 2'b0};
  mux4 #(32) mux_pcsrc(alu_result, alu_out, pc_jump, 32'b0, pc_src, pcnext);	
endmodule






// ADDED FROM PROBLEM STATEMENT 
module mem(input logic clk, we,
		   input logic [31:0] a, wd,
		   output logic [31:0] rd);
  
  logic [31:0] RAM[20:0]; // used to be logic [31:0] RAM[63:0];
  
  initial
    begin
      $readmemh("memfile.dat", RAM);
    end
  
  assign rd = RAM[a[31:2]]; // word aligned
  
  always @(posedge clk)
    if (we)
      RAM[a[31:2]] <= wd;
    
endmodule



// REGFILE, UNCHANGED =================================
module regfile(input logic clk,
				input logic we3,
				input logic [4:0] ra1, ra2, wa3,
				input logic [31:0] wd3,
				output logic [31:0] rd1, rd2);
  
  
	logic [31:0] rf[31:0];
	// three ported register file
	// read two ports combinationally
	// write third port on rising edge of clk
	// register 0 hardwired to 0
	// note: for pipelined processor, write third port
	// on falling edge of clk
	always @(posedge clk)
	if (we3) rf[wa3] <= wd3;
	assign rd1 = (ra1 != 0) ? rf[ra1] : 0;
	assign rd2 = (ra2 != 0) ? rf[ra2] : 0;
    
endmodule



module flopenr #(parameter WIDTH = 8)
			(input logic clk, reset, en,
			input logic [WIDTH-1:0] d,
			output logic [WIDTH-1:0] q);
	always @(posedge clk, posedge reset)
	if (reset) q <= 0;
	else if (en) q <= d;
endmodule

module flopr #(parameter WIDTH = 8)
              (input logic clk, reset,
              input logic [WIDTH-1:0] d,
              output logic [WIDTH-1:0] q);
  always @(posedge clk, posedge reset) // replaced for always_ff
        if (reset) q <= 0;
        else q <= d;
endmodule




module mux4 #(parameter WIDTH = 8)
            (input logic [WIDTH-1:0] d0, d1, d2, d3,
            input logic [1:0] s,
            output logic [WIDTH-1:0] y);
    assign y = s[1] ? (s[0] ? d3 : d2): (s[0] ? d1 : d0);
endmodule


module adder(input logic [31:0] a, b,
			 output logic [31:0] y);
	assign y = a + b;
endmodule


module mux2 #(parameter WIDTH = 8)
			(input logic [WIDTH-1:0] d0, d1,
			input logic s,
			output logic [WIDTH-1:0] y);
	assign y = s ? d1 : d0;
endmodule

module sl2(input logic [31:0] a,
           output logic [31:0] y);
    // shift left by 2
    assign y = {a[29:0], 2'b00};
endmodule

module signext(input logic [15:0] a,
               output logic [31:0] y);
    assign y = {{16{a[15]}}, a};
endmodule


module alu32(input logic [31:0] A, B,
             input logic [2:0] F, // F[2] inverts B (subtract), F[1:0] selects the operation
             output logic [31:0] Y, 
             output logic zero);

  logic [31:0] S, Bout;
  assign Bout = F[2] ? ~B : B;
  assign S = A + Bout + F[2];
  assign zero = (Y==32'b0); 
  always@(*)
    case (F[1:0])
      2'b00: Y = A & Bout;
      2'b01: Y = A | Bout;
      2'b10: Y = S;
      2'b11: Y = {31'b0, S[31]}; // slt: sign bit of A - B
    endcase

endmodule
```
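
The 15-bit control words in the `controller` module are easy to mistype, so it can help to unpack them into named fields. The helper below is only an illustrative sketch (the function names are made up here); the field order and widths follow the concatenation assigned from `out` in the listing above.

```python
# Field names and widths, most significant first, in the same order as the
# concatenation that `out` is assigned to in the controller.
FIELDS = [
    ("IorD", 1), ("MemWrite", 1), ("IRWrite", 1), ("PCWrite", 1),
    ("Branch", 1), ("PCSrc", 2), ("ALUOp", 2), ("AluSrcB", 2),
    ("AluSrcA", 1), ("RegWrite", 1), ("regDst", 1), ("memtoreg", 1),
]


def unpack(word):
    """Split a 15-character bit string into named control fields."""
    assert len(word) == sum(width for _, width in FIELDS)
    fields, pos = {}, 0
    for name, width in FIELDS:
        fields[name] = word[pos:pos + width]
        pos += width
    return fields


def active(word):
    """Return only the fields that are not all zeros."""
    return {k: v for k, v in unpack(word).items() if set(v) != {"0"}}


print(active("001100000010000"))  # control word of S0 (fetch)
print(active("000010101001000"))  # control word of S8 (beq)
```

For `S0` this shows `IRWrite`, `PCWrite` and `AluSrcB = 01` asserted; for `S8` it shows `Branch`, `PCSrc = 01`, `ALUOp = 01` and `AluSrcA`.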

#### Exercise 7.27

#### Exercise 7.28
On the fifth cycle: <br> 
- The `addi` instruction is in the writeback stage. The result of `$s2` + 5 is written into `$s1`. 
- The `sub` instruction is in the memory stage and is neither writing nor reading a register. 
- The `lw` instruction is in the execute stage and is neither writing nor reading a register.
- The `sw` instruction is in the decode stage and is reading the registers `$t0`. 
- The `or` instruction is in the fetch stage and is neither writing nor reading a register. 
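
The stage occupied by each instruction follows directly from its position in the program. The sketch below assumes one instruction enters the pipeline per cycle with no stalls or flushes; the helper name `stage_at` is hypothetical.

```python
STAGES = ["fetch", "decode", "execute", "memory", "writeback"]


def stage_at(cycle, issue_index):
    """Stage occupied on `cycle` (1-based) by the instruction fetched in
    cycle `issue_index + 1`, assuming no stalls or flushes.
    Returns None when the instruction is not in the pipeline."""
    offset = cycle - 1 - issue_index
    return STAGES[offset] if 0 <= offset < len(STAGES) else None


program = ["addi", "sub", "lw", "sw", "or"]   # program order used above
snapshot = {op: stage_at(5, i) for i, op in enumerate(program)}
print(snapshot)
```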

#### Exercise 7.29
On the fifth cycle: <br> 
- `$s0` is being written by the `add` instruction
- `$t4` and `$t5` are being read by the `or` instruction

Notice that `$s0`, currently in the writeback stage, is being forwarded to the `and` instruction in the fifth cycle. 

<img src="images\P7_29.PNG"/>

#### Exercise 7.30

Red and black arrows represent forwards and stalls respectively. 

<img src="images\P7_30.PNG"/>



#### Exercise 7.31

#### Exercise 7.32

Note: Figure 7.58, which illustrates the pipelined MIPS processor with full hazard handling, does not include the hardware for jump instructions (first introduced in fig. 7.14) needed in this exercise. As explained in section 7.5.4, it is assumed that jump instructions are resolved in the decode stage, meaning they always have a CPI of 2. 

The table below shows the progression of the program through the pipelined MIPS processor. 
In total, 22 clock cycles are needed to execute the 17 instructions of the program, which corresponds to a CPI of 22/17 = 1.29.

On cycle 4, the red arrow represents a forward of the register `$s0`. <br>
On cycles 5, 7, 11, 15 and 19, the blue boxes represent a register being written and read in the same cycle.  

<img src="images\P7_32.PNG"/>


#### Exercise 7.33
#### Exercise 7.34
The data path of the presented pipelined MIPS processor is already capable of handling `addi` instructions. Only the control unit's decoder needs an update (see table 7.4). Recall that the pipelined and single-cycle MIPS processors use the same control decoder.  

#### Exercise 7.35
The `j` instruction can be added to the decode stage by: <br> 
- Adding the signal $\text{PCJump} = [\text{PCPlus4}_{31:28}, \text{instr}_{25:0} << 2]$
- Adding the control output signal `Jump`
- Adding a 2:1 multiplexor right upstream of the decode register such that `PCJump` is saved into the decode register when `Jump = 1` and `PC'` is saved into the decode register when `Jump = 0`. 

Refer to figure 7.14 for an example. Table 7.5 also describes the logic behind the `Jump` signal. 

Control path modifications: 
- $\text{DecodeReg}_{CLR} = \text{PCSrcD } | \text{ JumpD}$ 

Hazard handling modification: <br> 
- $\text{FlushE} = \text{lwstall } | \text{ branchstall } | \text{ JumpD}$
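
As a worked example of how `PCJump` is formed, the sketch below concatenates the upper four bits of `PCPlus4` with the 26-bit address field shifted left by two. The instruction word `0x08000011` is a `j` word; the `PCPlus4` value and the function name are illustrative assumptions.

```python
def pc_jump(pc_plus4, instr):
    """Jump target: upper 4 bits of PC+4, then the 26-bit address field
    shifted left by 2 (word aligned)."""
    return (pc_plus4 & 0xF0000000) | ((instr & 0x03FFFFFF) << 2)


# 0x08000011 has opcode 000010 (j) and address field 0x11; the PC+4 value
# is an arbitrary choice for illustration.
target = pc_jump(0x00000040, 0x08000011)
print(hex(target))  # 0x44
```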

#### Exercise 7.36
#### Exercise 7.37
#### Exercise 7.38
#### Exercise 7.39
#### Exercise 7.40
#### Exercise 7.41
#### Exercise 7.42



# Appendix 

### MIPS Verilog Code

```
module mips(input logic clk, reset,
			output logic [31:0] pc,
			input logic [31:0] instr,
            output logic memwrite,
            output logic [31:0] aluout, writedata,
            input logic [31:0] readdata);
  logic memtoreg, alusrc, regdst, regwrite, jump, pcsrc, zero;
  logic [2:0] alucontrol;
  controller c(instr[31:26], instr[5:0], zero, memtoreg, memwrite, pcsrc, alusrc, regdst, regwrite, jump, alucontrol);
  datapath dp(clk, reset, memtoreg, pcsrc, alusrc, regdst, regwrite, jump, alucontrol, zero, pc, instr, aluout, writedata, readdata);
endmodule



module controller(input logic [5:0] op, funct,
                  input logic zero,
                  output logic memtoreg, memwrite,
                  output logic pcsrc, alusrc,
                  output logic regdst, regwrite,
                  output logic jump,
                  output logic [2:0] alucontrol);
  logic [1:0] aluop;
  logic branch;
  maindec md(op, memtoreg, memwrite, branch, alusrc, regdst, regwrite, jump, aluop);
  aludec ad(funct, aluop, alucontrol);
  assign pcsrc = branch & zero;
endmodule

module maindec(input logic [5:0] op,
               output logic memtoreg, memwrite,
               output logic branch, alusrc,
               output logic regdst, regwrite,
               output logic jump,
               output logic [1:0] aluop);
  
  logic [8:0] controls;
  assign {regwrite, regdst, alusrc, branch, memwrite, memtoreg, jump, aluop} = controls;
  always @(*) // replacement for always_comb
    case(op)
      6'b000000: controls <= 9'b110000010; // RTYPE
      6'b100011: controls <= 9'b101001000; // LW
      6'b101011: controls <= 9'b001010000; // SW
      6'b000100: controls <= 9'b000100001; // BEQ
      6'b001000: controls <= 9'b101000000; // ADDI
      6'b000010: controls <= 9'b000000100; // J
      default: controls <= 9'bxxxxxxxxx; // illegal op
    endcase
endmodule


module aludec(input logic [5:0] funct,
			  input logic [1:0] aluop,
			  output logic [2:0] alucontrol);
	always @(*) // replacement for always_comb
		case(aluop)
		2'b00: alucontrol <= 3'b010; // add (for lw/sw/addi)
		2'b01: alucontrol <= 3'b110; // sub (for beq)
		default: case(funct) // R-type instructions
			6'b100000: alucontrol <= 3'b010; // add
			6'b100010: alucontrol <= 3'b110; // sub
			6'b100100: alucontrol <= 3'b000; // and
			6'b100101: alucontrol <= 3'b001; // or
			6'b101010: alucontrol <= 3'b111; // slt
			default: alucontrol <= 3'bxxx; // ???
		endcase
	endcase
endmodule

module datapath(input logic clk, reset,
				input logic memtoreg, pcsrc,
				input logic alusrc, regdst,
				input logic regwrite, jump,
				input logic [2:0] alucontrol,
				output logic zero,
				output logic [31:0] pc,
				input logic [31:0] instr,
				output logic [31:0] aluout, writedata,
				input logic [31:0] readdata);
				logic [4:0] writereg;
				logic [31:0] pcnext, pcnextbr, pcplus4, pcbranch;
				logic [31:0] signimm, signimmsh;
				logic [31:0] srca, srcb;
				logic [31:0] result;
	// next PC logic
	flopr #(32) pcreg(clk, reset, pcnext, pc);
	adder pcadd1(pc, 32'b100, pcplus4);
	sl2 immsh(signimm, signimmsh);
	adder pcadd2(pcplus4, signimmsh, pcbranch);
	mux2 #(32) pcbrmux(pcplus4, pcbranch, pcsrc, pcnextbr);
	mux2 #(32) pcmux(pcnextbr, {pcplus4[31:28],
	instr[25:0], 2'b00}, jump, pcnext);
	
	// register file logic
	regfile rf(clk, regwrite, instr[25:21], instr[20:16], writereg, result, srca, writedata);
	mux2 #(5) wrmux(instr[20:16], instr[15:11], regdst, writereg);
	mux2 #(32) resmux(aluout, readdata, memtoreg, result);
	signext se(instr[15:0], signimm);
	
	// ALU logic
	mux2 #(32) srcbmux(writedata, signimm, alusrc, srcb);
	alu32 alu(srca, srcb, alucontrol, aluout, zero);
endmodule



module regfile(input logic clk,
			   input logic we3,
			   input logic [4:0] ra1, ra2, wa3,
			   input logic [31:0] wd3,
			   output logic [31:0] rd1, rd2);
			   logic [31:0] rf[31:0];
	// three ported register file
	// read two ports combinationally
	// write third port on rising edge of clk
	// register 0 hardwired to 0
	// note: for pipelined processor, write third port
	// on falling edge of clk
	always @(posedge clk) // replaced for always_ff
		if (we3) rf[wa3] <= wd3;
	assign rd1 = (ra1 != 0) ? rf[ra1] : 0;
	assign rd2 = (ra2 != 0) ? rf[ra2] : 0;
endmodule


module adder(input logic [31:0] a, b,
			 output logic [31:0] y);
	assign y = a + b;
endmodule

module sl2(input logic [31:0] a,
		   output logic [31:0] y);
	// shift left by 2
	assign y = {a[29:0], 2'b00};
endmodule

module signext(input logic [15:0] a,
			   output logic [31:0] y);
	assign y = {{16{a[15]}}, a};
endmodule


module flopr #(parameter WIDTH = 8)
			  (input logic clk, reset,
			  input logic [WIDTH-1:0] d,
			  output logic [WIDTH-1:0] q);
  always @(posedge clk, posedge reset) // replaced for always_ff
		if (reset) q <= 0;
		else q <= d;
endmodule


module mux2 #(parameter WIDTH = 8)
			 (input logic [WIDTH-1:0] d0, d1,
			 input logic s,
			 output logic [WIDTH-1:0] y);
	assign y = s ? d1 : d0;
endmodule

// taken from 5.9 
module alu32(input logic [31:0] A, B,
             input logic [2:0] F,
             output logic [31:0] Y, 
             output logic zero);
  
  logic [31:0] S, Bout;
  assign Bout = F[2] ? ~B : B;
  assign S = A + Bout + F[2];
  assign zero = (Y==32'b0); 
  always@(*)
    case (F[1:0])
      2'b00: Y <= A & Bout;
      2'b01: Y <= A | Bout;
      2'b10: Y <= S;
      2'b11: Y <= S[31];
    endcase
      
endmodule
```

### MIPS Verilog testbench function 

```
module testbench();
  logic clk;
  logic reset;
  logic [31:0] writedata, dataadr;
  logic memwrite;
  
  // instantiate device to be tested
  top dut (clk, reset, writedata, dataadr, memwrite);
  
  // initialize test
  initial
    begin
    reset <= 1; # 22; reset <= 0;
    end
  
  // generate clock to sequence tests
  always
    begin
    clk <= 1; # 5; clk <= 0; # 5;
    end
  
  // check results
  always @(negedge clk)
    begin
      if (memwrite) 
        begin
          if (dataadr===84 & writedata===7) 
            begin
              $display("Simulation succeeded");
              $stop;
            end 
          else if (dataadr !==80) 
            begin
              $display("Simulation failed");
              $stop;
            end
        end
    end
endmodule



module top(input logic clk, reset,
           output logic [31:0] writedata, dataadr,
           output logic memwrite);
  logic [31:0] pc, instr, readdata;
  
  // instantiate processor and memories
  mips mips(clk, reset, pc, instr, memwrite, dataadr, writedata, readdata);
  imem imem(pc[7:2], instr);
  dmem dmem(clk, memwrite, dataadr, writedata, readdata);
endmodule


module dmem(input logic clk, we,
            input logic [31:0] a, wd,
            output logic [31:0] rd);
  logic [31:0] RAM[63:0];
  assign rd = RAM[a[31:2]]; // word aligned
  always @(posedge clk) // replaced always_ff
  if (we) RAM[a[31:2]] <= wd;
endmodule



module imem(input logic [5:0] a,
            output logic [31:0] rd);
  logic [31:0] RAM[63:0];
  initial
    $readmemh("memfile.dat", RAM);
    assign rd = RAM[a]; // word aligned
endmodule
```

### Input file 

memfile.dat
```
20020005
2003000c
2067fff7
00e22025
00642824
00a42820
10a7000a
0064202a
10800001
20050000
00e2202a
00853820
00e23822
ac670044
8c020050
08000011
20020001
ac020054
```
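
To read the hex words in `memfile.dat`, each one can be split into its instruction fields. The snippet below is a minimal sketch that decodes the I-type layout only (opcode, `rs`, `rt`, 16-bit immediate); the helper name is hypothetical.

```python
def decode_itype(word):
    """Split a 32-bit MIPS word into I-type fields."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    if imm & 0x8000:          # sign-extend the 16-bit immediate
        imm -= 0x10000
    return opcode, rs, rt, imm


for line in ["20020005", "2003000c", "2067fff7"]:   # first three words of the file
    opcode, rs, rt, imm = decode_itype(int(line, 16))
    print(f"{line}: opcode={opcode:06b} rs=${rs} rt=${rt} imm={imm}")
```

All three words carry opcode `001000`, which the `maindec` module above maps to `ADDI`.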