#### **CS223: Computer Architecture & Organization**

Lecture 28 [18.04.2022]

#### MIPS Pipeline for Multi-Cycle Operations



**Associate Professor** 

Department of Computer Science & Engineering Indian Institute of Technology Guwahati, Assam.

## Pipelined RISC Data path



# **Visualizing Pipelining**

|                    | Clock number |    |    |     |     |     |     |     |  |  |  |  |  |  |  |
|--------------------|--------------|----|----|-----|-----|-----|-----|-----|--|--|--|--|--|--|--|
| Instruction number | 1            | 2  | 3  | 4   | 5   | 6   | 7   | 8   |  |  |  |  |  |  |  |
| i                  | IF           | ID | EX | MEM | WB  |     |     |     |  |  |  |  |  |  |  |
| i+1                |              | IF | ID | EX  | MEM | WB  |     |     |  |  |  |  |  |  |  |
| i+2                |              |    | IF | ID  | EX  | MEM | WB  |     |  |  |  |  |  |  |  |
| i+3                |              |    |    | IF  | ID  | EX  | MEM | WB  |  |  |  |  |  |  |  |
| i+4                |              |    |    |     | IF  | ID  | EX  | MEM |  |  |  |  |  |  |  |

- Can EXE stage complete the operation in 1 cycle?
- ❖ Some operations require more than 1 clock cycle to complete.
  - Floating Point/Integer Multiply
  - Floating Point/Integer Divide
  - Floating Point Add/Sub
- ❖ Dedicated hardware units are available on the processor for performing these operations.





| Functional unit                     | Latency | Initiation interval |
|-------------------------------------|---------|---------------------|
| Integer ALU                         | 0       | 1                   |
| Data memory (integer and FP loads)  | 1       | 1                   |
| FP add                              | 3       | 1                   |
| FP multiply (also integer multiply) | 6       | 1                   |
| FP divide (also integer divide)     | 24      | 25                  |

Latencies and initiation intervals for functional units.



| Functional unit                     | Latency | Initiation interval |
|-------------------------------------|---------|---------------------|
| Integer ALU                         | 0       | 1                   |
| Data memory (integer and FP loads)  | 1       | 1                   |
| FP add                              | 3       | 1                   |
| FP multiply (also integer multiply) | 6       | 1                   |
| FP divide (also integer divide)     | 24      | 25                  |

- ❖ Latency: The number of intervening cycles between an instruction that produces a result and an instruction that uses the result.
- ❖ Initiation / Repeat Interval: The number of cycles that must elapse between issuing two operations of a given type.



| MUL.D | IF | ID | M1 | M2 | M3 | M4  | M5  | M6  | M7 | MEM | WB |
|-------|----|----|----|----|----|-----|-----|-----|----|-----|----|
| ADD.D |    | IF | ID | AI | A2 | A3  | A4  | MEM | WB |     |    |
| L.D   |    |    | IF | ID | EX | MEM | WB  |     |    |     |    |
| S.D   |    |    |    | IF | ID | EX  | MEM | WB  |    |     |    |

The pipeline timing of a set of independent FP operations. The stages in italics show where data are needed, while the stages in bold show where a result is available. The ".D" extension on the instruction mnemonic indicates double-precision (64-bit) floating-point operations. FP loads and stores use a 64-bit path to memory so that the pipelining timing is just like an integer load or store.

### **Issues in Longer Latency Pipelines**

- Since divide unit is not-pipelined structural hazard
- Instruction have varying runtimes—more register writes/cycle
- WAW hazards possible
- Out of order completion imprecise exceptions
- Stalls for RAW hazards will be more.

|        |          | Clock cycle number |    |    |       |    |       |       |       |       |       |       |     |    |       |       |       |     |
|--------|----------|--------------------|----|----|-------|----|-------|-------|-------|-------|-------|-------|-----|----|-------|-------|-------|-----|
| Instru | ction    | 1                  | 2  | 3  | 4     | 5  | 6     | 7     | 8     | 9     | 10    | 11    | 12  | 13 | 14    | 15    | 16    | 17  |
| L.D    | F4,0(R2) | IF                 | ID | EX | MEM   | WB |       |       |       |       |       |       |     |    |       |       |       |     |
| MUL.D  | F0,F4,F6 |                    | IF | ID | stall | M1 | M2    | M3    | M4    | M5    | M6    | M7    | MEM | WB |       |       |       |     |
| ADD.D  | F2,F0,F8 |                    |    | IF | stall | ID | stall | stall | stall | stall | stall | stall | A1  | A2 | A3    | A4    | MEM   | WB  |
| S.D    | F2,0(R2) |                    |    |    |       | IF | stall | stall | stall | stall | stall | stall | ID  | EX | stall | stall | stall | MEM |

## **Issues in Longer Latency Pipelines**

| Instruction    | Clock cycle number |    |    |    |     |     |    |     |     |     |    |  |  |
|----------------|--------------------|----|----|----|-----|-----|----|-----|-----|-----|----|--|--|
|                | 1                  | 2  | 3  | 4  | 5   | 6   | 7  | 8   | 9   | 10  | 11 |  |  |
| MUL.D F0,F4,F6 | IF                 | ID | M1 | M2 | M3  | M4  | M5 | M6  | M7  | MEM | WB |  |  |
|                |                    | IF | ID | EX | MEM | WB  |    |     |     |     |    |  |  |
|                |                    |    | IF | ID | EX  | MEM | WB |     |     |     |    |  |  |
| ADD.D F2,F4,F6 |                    |    |    | IF | ID  | A1  | A2 | A3  | A4  | MEM | WB |  |  |
|                |                    |    |    |    | IF  | ID  | EX | MEM | WB  |     |    |  |  |
|                |                    |    |    |    |     | IF  | ID | EX  | MEM | WB  |    |  |  |
| L.D F2,0(R2)   |                    |    |    |    |     |     | IF | ID  | EX  | MEM | WB |  |  |

- Single write port (Serialize completion) vs multiple write ports
- \*Resolve write port conflicts in ID stage and stall issue by 1
- Stall either of the instruction (priority basis) at MEM / WB stage
- ❖ Stall at MEM will force a stall to trickle at EX, M7, A4 stages

### **Issues in Longer Latency Pipelines**

| Instruction    | Clock cycle number |    |    |    |     |     |    |     |     |     |    |  |  |
|----------------|--------------------|----|----|----|-----|-----|----|-----|-----|-----|----|--|--|
|                | 1                  | 2  | 3  | 4  | 5   | 6   | 7  | 8   | 9   | 10  | 11 |  |  |
| MUL.D F0,F4,F6 | IF                 | ID | M1 | M2 | M3  | M4  | M5 | M6  | M7  | MEM | WB |  |  |
|                |                    | IF | ID | EX | MEM | WB  |    |     |     |     |    |  |  |
|                |                    |    | IF | ID | EX  | MEM | WB |     |     |     |    |  |  |
| ADD.D F2,F4,F6 |                    |    |    | IF | ID  | A1  | A2 | A3  | A4  | MEM | WB |  |  |
|                |                    |    |    |    | IF  | ID  | EX | MEM | WB  |     |    |  |  |
| L.D F2.0(R2)   |                    |    |    |    |     | IF  | ID | EX  | MEM | WB  |    |  |  |

- ❖ WAW hazard at register F2
- ❖ Delay issue (ID→ EX) of L.D until ADD.D enters MEM stage
- \*Keep the result of ADD.D and give it to needed instruction.
- ❖ Hence only L.D will write on F2.

#### How to handle Issues in Longer Pipelines

- Check for structural hazard in DIV.D and write ports
- ❖ Check for RAW data hazard at ID stage: If the source of an instruction in ID is Fi then Fi should be there as the name of destination of instruction in ID/A1, A1/A2, A2/A3 and ID/M1, M1/M2,... M6/M7



#### How to handle Issues in Longer Pipelines

- ❖ Check for WAW data hazard: If any instruction in A1,...A4, M1,..M7 has the same destination as the instruction in ID and the time at which they reach WB is same, delay issue by 1 cycle and repeat.
- ❖ Perform operand forwarding from EX/MEM, A4/MEM, M7/MEM, D/MEM, MEM/WB



#### Reference

- ❖ Computer Architecture-A Quantitative Approach (5th edition),
  - John L. Hennessy, David A. Patterson, Morgan Kaufman.
- ❖ Appendix C: Pipelining: Basic and Intermediate Concepts
  - **❖** Section C5: **Extending MIPS Pipeline to Multicycle Operations**

**❖ NPTEL Video Link: https://tinyurl.com/y9comntt** 



johnjose@iitg.ac.in http://www.iitg.ac.in/johnjose/