Computer Organization
MIPS Architecture Overview
高小鹏 (Gao Xiaopeng)
School of Computer Science and Engineering, Beihang University

# Outline

- Input/Output
- Exceptions/Interrupts
- Coprocessor

# Motivation for Input/Output


- I/O is how humans interact with computers
- I/O gives computers long-term memory.
- I/O lets computers do amazing things
  - Example: MIT Media Lab "Sixth Sense"
- A computer without I/O is like a car without wheels: great technology, but it gets you nowhere

7/31/2012, Summer 2012 -- Lecture #25

# I/O Device Examples and Speeds

- I/O speeds: 7 orders of magnitude between mouse and LAN

| Device            | Behavior        | Partner | Data Rate (KB/s) |
|-------------------|-----------------|---------|------------------|
| Keyboard          | Input           | Human   | 0.01             |
| Mouse             | Input           | Human   | 0.02             |
| Voice output      | Output          | Human   | 5.00             |
| Floppy disk       | Storage         | Machine | 50.00            |
| Laser printer     | Output          | Human   | 100.00           |
| Magnetic disk     | Storage         | Machine | 10,000.00        |
| Wireless network  | Input or Output | Machine | 10,000.00        |
| Graphics display  | Output          | Human   | 30,000.00        |
| Wired LAN network | Input or Output | Machine | 125,000.00       |

When discussing transfer rates, use SI prefixes (10<sup>x</sup>)


# What do we need for I/O to work?

1) A way to connect many types of devices
2) A way to control these devices, respond to them, and transfer data
3) A way to present them to user programs so they are useful

# Instruction Set Architecture for I/O

- What must the processor do for I/O?
  - Input: reads a sequence of bytes
  - Output: writes a sequence of bytes
- Some processors have special input and output instructions
- Alternative model (used by MIPS):
  - Use loads for input, stores for output (in small pieces)
  - Called Memory Mapped Input/Output
  - A portion of the address space dedicated to communication paths to Input or Output devices (no memory there)


# Memory Mapped I/O

- Certain addresses are not regular memory
- Instead, they correspond to registers in I/O devices
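
The idea can be sketched as a small behavioral model in Python (not MIPS code). The `Bus` class, the `0xFFFF0000` base address, and the register offsets are illustrative assumptions, not values taken from the slides: the point is only that the same load/store operations reach either memory or a device register depending on the address.

```python
# Behavioral sketch of memory-mapped I/O (all names and addresses are assumptions).
IO_BASE = 0xFFFF0000  # hypothetical start of the I/O region


class Bus:
    """Routes loads/stores either to RAM or to device registers."""

    def __init__(self):
        self.ram = {}          # sparse word-addressed RAM
        self.device_regs = {}  # device registers, keyed by address

    def store(self, addr, value):
        if addr >= IO_BASE:
            self.device_regs[addr] = value  # reaches a device, not memory
        else:
            self.ram[addr] = value

    def load(self, addr):
        if addr >= IO_BASE:
            return self.device_regs.get(addr, 0)
        return self.ram.get(addr, 0)


bus = Bus()
bus.store(0x00001000, 42)      # ordinary store to memory
bus.store(IO_BASE + 12, 0x41)  # same operation, but it targets a device register
```

Nothing in the "instruction" itself distinguishes the two stores; only the address decode does.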



# Processor-I/O Speed Mismatch

- A 1 GHz microprocessor can execute 1 billion load or store instructions per second (4,000,000 KB/s data rate at 4 bytes per access)
  - Recall: I/O device data rates range from 0.01 KB/s to 125,000 KB/s
- Input: Device may not be ready to send data as fast as the processor loads it
  - Also, might be waiting for human to act
- Output: Device may not be ready to accept data as fast as the processor stores it
- What can we do?
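
To make the size of the mismatch concrete, the arithmetic can be checked with a few lines of Python. The 4-bytes-per-access figure is an assumption (a 32-bit word per load or store); the device rates are the two extremes from the device table.

```python
# Compare processor load/store bandwidth with device data rates (SI prefixes).
CLOCK_HZ = 1_000_000_000   # 1 GHz, one load/store per cycle as the slide assumes
BYTES_PER_ACCESS = 4       # assumption: one 32-bit word per load/store

cpu_rate_kb_s = CLOCK_HZ * BYTES_PER_ACCESS / 1000  # KB/s

# Slowest and fastest devices from the table, in KB/s
device_rates_kb_s = {"keyboard": 0.01, "wired LAN": 125_000.0}

# How many times faster the processor is than each device
ratios = {name: cpu_rate_kb_s / rate for name, rate in device_rates_kb_s.items()}
```

Even the fastest device in the table is well over an order of magnitude slower than the processor, and the keyboard is slower by roughly eight orders of magnitude.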


# Processor Checks Status Before Acting

- Path to a device generally has 2 registers:
  - Control Register says it's OK to read/write (I/O ready)
  - Data Register contains data
- 1) Processor reads from control register in a loop, waiting for device to set *Ready bit*  $(0 \rightarrow 1)$
- 2) Processor then loads from (input) or writes to (output) data register
  - Resets Ready bit of control register  $(1 \rightarrow 0)$
- This process is called "Polling"


# I/O Example (Polling in MIPS)

- Input: Read from keyboard into \$v0
- Output: Write to display from \$a0
- "Ready" bit is from processor's point of view!

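
The shape of a polling loop can be modeled in Python as a rough sketch. This is a behavioral model, not MIPS assembly; the `Keyboard` class, `poll_read`, and the `delay` countdown are hypothetical names introduced only for illustration.

```python
# Behavioral model of polled input (names are illustrative assumptions).
class Keyboard:
    """Device with a control register (Ready bit) and a data register."""

    def __init__(self, data, delay):
        self._data = data
        self._delay = delay  # number of polls that still see Ready = 0

    def read_control(self):
        if self._delay > 0:
            self._delay -= 1
            return 0            # Ready bit = 0: nothing to read yet
        return 1                # Ready bit = 1: data register is valid

    def read_data(self):
        self._delay = 1         # reading the data clears Ready (1 -> 0) for a while
        return self._data


def poll_read(dev):
    """Spin on the Ready bit, then load the data register."""
    polls = 0
    while True:
        polls += 1
        if dev.read_control() & 0x1:   # mask out the Ready bit
            break
    return dev.read_data(), polls


value, polls = poll_read(Keyboard(data=ord("a"), delay=3))
```

Every iteration of the `while` loop is processor time spent doing no useful work, which is the cost examined next.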

# Cost of Polling?

- Processor specs: 1 GHz clock, 400 clock cycles for a polling operation (call polling routine, accessing the device, and returning)
- Determine % of processor time for polling:
  - Mouse: Polled 30 times/sec so as not to miss user movement
  - Floppy disk: Transferred data in 2-Byte units with data rate of 50 KB/sec. No data transfer can be missed.
  - Hard disk: Transfers data in 16-Byte chunks and can transfer at 16 MB/second. Again, no transfer can be missed.


# % Processor time to poll

- Mouse polling:
  - Time taken: 30 [polls/s] × 400 [clocks/poll] = 12K [clocks/s]
  - % Time: $1.2 \times 10^4$ [clocks/s] / $10^9$ [clocks/s] = 0.0012%
  - Polling the mouse has little impact on the processor
- Disk polling:
  - Freq: 16 [MB/s] / 16 [B/poll] = 1M [polls/s]
  - Time taken: 1M [polls/s] × 400 [clocks/poll] = 400M [clocks/s]
  - % Time: $4 \times 10^8$ [clocks/s] / $10^9$ [clocks/s] = 40%
  - Unacceptable!
- Problems: polling, accessing small chunks
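
The same arithmetic can be reproduced for all three devices from the problem statement. The floppy disk figure does not appear in the worked solution above; it is computed here in the same way, so treat it as a derived value rather than a number from the slides.

```python
# Fraction of a 1 GHz processor spent polling, at 400 clock cycles per poll.
CLOCK_HZ = 1_000_000_000
CYCLES_PER_POLL = 400


def polling_overhead_percent(polls_per_second):
    """Percent of all clock cycles consumed by polling at the given rate."""
    return 100 * polls_per_second * CYCLES_PER_POLL / CLOCK_HZ


mouse = polling_overhead_percent(30)                 # 30 polls/s
floppy = polling_overhead_percent(50_000 // 2)       # 50 KB/s in 2-byte units
disk = polling_overhead_percent(16_000_000 // 16)    # 16 MB/s in 16-byte chunks
```

The overhead scales with the poll rate, so the smaller the transfer unit relative to the data rate, the worse polling becomes.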


# Alternatives to Polling?

- Wasteful to have processor spend most of its time "spin-waiting" for I/O to be ready
- Would like an unplanned procedure call that would be invoked only when I/O device is ready
- Solution: Use exception mechanism to help trigger I/O, then interrupt program when I/O is done with data transfer
  - This method is discussed next


# Outline

- Input/Output
- Exceptions/Interrupts
- Coprocessor

# Exceptions and Interrupts

- "Unexpected" events requiring change in flow of control
  - Different ISAs use the terms differently
- Exception
  - Arises within the CPU (e.g. undefined opcode, overflow, syscall, TLB miss)
- Interrupt
  - From an external I/O controller
- Dealing with these without sacrificing performance is difficult!


# Handling Exceptions (1/2)

- In MIPS, exceptions are managed by a System Control Coprocessor (CP0)
- Save PC of offending (or interrupted) instruction
  - In MIPS: saved in a special register called the Exception Program Counter (EPC)
- Save indication of the problem
  - In MIPS: saved in a special register called the *Cause* register
  - In a simple implementation, might only need 1 bit (0 for undefined opcode, 1 for overflow)
- Jump to exception handler code at address 0x80000180
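
These three steps can be sketched as a tiny Python model. `CP0State` and `take_exception` are hypothetical names used only for illustration; the handler address and the 1-bit cause encoding are the ones given in the list above.

```python
# Behavioral sketch of exception entry (class and function names are assumptions).
HANDLER_ADDR = 0x80000180  # handler address given on the slide


class CP0State:
    def __init__(self):
        self.epc = 0    # Exception Program Counter
        self.cause = 0  # Cause register


def take_exception(cp0, pc, cause_code):
    """Record where and why, then return the next PC (the handler)."""
    cp0.epc = pc            # PC of the offending/interrupted instruction
    cp0.cause = cause_code  # indication of the problem
    return HANDLER_ADDR


cp0 = CP0State()
# cause_code 1 = overflow in the simple 1-bit scheme
next_pc = take_exception(cp0, pc=0x00400020, cause_code=1)
```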


# Handling Exceptions (2/2)

- Operating system is also notified
  - Can kill program (e.g. segfault)
  - For I/O device request or syscall, often switch to another process in meantime
    - This is what happens on TLB misses and page faults


# Exception Properties

- Re-startable exceptions
  - Pipeline can flush the instruction
  - Handler executes, then returns to the instruction
    - Re-fetched and executed from scratch
- PC+4 saved in EPC register
  - Identifies causing instruction
  - PC+4 because it is the available signal in a pipelined implementation
    - Handler must adjust this value to get right address


# Handler Actions

- Read Cause register, and transfer to relevant handler
- OS determines action required:
  - If restartable exception, take corrective action and then use EPC to return to program
  - Otherwise, terminate program and report error using EPC, Cause register, etc. (e.g. our best friend the segfault)
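
A minimal Python sketch of this decision, under stated assumptions: the cause codes reuse the 1-bit example encoding (0 for undefined opcode, 1 for overflow), EPC is taken to hold PC+4 as described under Exception Properties, and the choice of which cause counts as restartable is made up purely for illustration.

```python
# Sketch of handler dispatch; which causes are restartable is an assumption.
UNDEFINED_OPCODE, OVERFLOW = 0, 1   # simple 1-bit Cause encoding
RESTARTABLE = {OVERFLOW}            # hypothetical choice, for illustration only


def handle(cause, epc):
    """Return ("resume", address) or ("terminate", report)."""
    faulting_pc = epc - 4           # EPC holds PC+4, so step back one instruction
    if cause in RESTARTABLE:
        # Corrective action would go here before returning to the program.
        return ("resume", faulting_pc)
    return ("terminate", f"cause={cause} at 0x{faulting_pc:08x}")


resumed = handle(OVERFLOW, 0x00400024)
killed = handle(UNDEFINED_OPCODE, 0x00400024)
```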


# I/O Interrupt

- An I/O interrupt is like an exception except:
  - An I/O interrupt is "asynchronous"
  - More information needs to be conveyed
- "Asynchronous" with respect to instruction execution:
  - I/O interrupt is not associated with any instruction, but it can happen in the middle of any given instruction
  - I/O interrupt does not prevent any instruction from running to completion


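# Software Implementation: Interrupt Service Routine

- Overall structure: save context, handle the interrupt, restore context, return from interrupt
- Step 1: Save context
  - Save all registers on the stack
- Step 2: Handle the interrupt
  - Read the special registers to find out which hardware interrupt occurred
  - Run the corresponding handling policy (e.g. read/write device registers, memory, etc.)
- Step 3: Restore context
  - Restore all registers from the stack
- Step 4: Return from interrupt
  - Execute the eret instruction

Steps 1, 3, and 4 are generic; step 2 is device-specific.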
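
The four-step skeleton can be sketched in Python. This is a behavioral model only: registers are a dict, the stack is a list, and `interrupt_service_routine`, `pending`, and `handlers` are hypothetical names, not part of the original design.

```python
# Sketch of the four-step service routine; registers and devices are modeled as dicts.
def interrupt_service_routine(regs, stack, pending, handlers):
    stack.append(dict(regs))             # 1. save context: push all registers
    for line, is_pending in enumerate(pending):
        if is_pending:                   # 2. find out which interrupt fired
            handlers[line](regs)         #    and run its device-specific code
    regs.clear()
    regs.update(stack.pop())             # 3. restore context from the stack
    return "eret"                        # 4. return from interrupt


log = []
regs = {"t0": 1, "t1": 2}


def timer_handler(r):
    r["t0"] = 99          # the handler is free to clobber registers
    log.append("timer")


result = interrupt_service_routine(regs, [], [False, True], {1: timer_handler})
```

Only the `handlers` table changes from device to device; the save, restore, and return steps stay the same.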


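# Interrupt Response Mechanism: Detecting Exceptions and Interrupts (1)

- Exceptions and interrupts are checked in the W stage of every instruction
  - Final exception: the exception flag carried down the pipeline from earlier stages
  - Whether an interrupt is pending
- Interrupt detection must also check the interrupt enable bits
  - Solution: generate the interrupt request from HWInt/IM/IE/EXL

assign IntReq = |(HWInt[7:2] & IM[7:2]) & IE & !EXL ;

- Note: interrupts take priority over exceptions
  - Q: How can this be implemented?
  - A: When flushing the instructions in each pipeline stage, check for an interrupt first, then check the pipelined exception flag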
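
The `IntReq` expression can be checked with a small Python model. It assumes `hwint` and `im` are passed as 6-bit masks, one bit per hardware interrupt line; the function name `int_req` is hypothetical.

```python
# Python model of: IntReq = |(HWInt & IM) & IE & !EXL
def int_req(hwint, im, ie, exl):
    """hwint and im are 6-bit masks; ie and exl are single bits."""
    any_enabled_pending = (hwint & im & 0x3F) != 0   # reduction OR of the masked lines
    return int(any_enabled_pending and ie == 1 and exl == 0)


request = int_req(hwint=0b000100, im=0b000100, ie=1, exl=0)   # line 2 pending and unmasked
masked = int_req(hwint=0b000100, im=0b000000, ie=1, exl=0)    # line 2 masked off by IM
in_handler = int_req(hwint=0b000100, im=0b000100, ie=1, exl=1)  # already at exception level
```

A request is raised only when some pending line is unmasked, interrupts are globally enabled, and the processor is not already in exception level.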


# Interrupt-Driven I/O Example (1/2)

- Assume the following system properties:
  - 500 clock cycle overhead for each transfer, including interrupt
  - Disk throughput of 16 MB/s
  - Disk interrupts after transferring 16 B
  - Processor running at 1 GHz
- If disk is active 5% of program, what % of processor is consumed by the disk?
  - 5% × 16 [MB/s] / 16 [B/inter] = 50,000 [inter/s]
  - 50,000 [inter/s] × 500 [clocks/inter] = $2.5 \times 10^7$ [clocks/s]
  - $2.5 \times 10^7$ [clocks/s] / $10^9$ [clocks/s] = 2.5% busy
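
The calculation can be reproduced in a few lines of Python using the system properties listed above; the variable names are illustrative.

```python
# Processor time consumed by interrupt-driven disk I/O.
CLOCK_HZ = 1_000_000_000        # 1 GHz
CYCLES_PER_INTERRUPT = 500      # overhead per transfer, including the interrupt
DISK_BYTES_PER_S = 16_000_000   # 16 MB/s
BYTES_PER_INTERRUPT = 16
ACTIVE_PERCENT = 5              # disk is active 5% of the time

# Interrupts only occur while the disk is actually transferring
interrupts_per_s = ACTIVE_PERCENT * DISK_BYTES_PER_S // (100 * BYTES_PER_INTERRUPT)
busy_percent = 100 * interrupts_per_s * CYCLES_PER_INTERRUPT / CLOCK_HZ
```

The per-transfer overhead is higher than with polling (500 vs. 400 cycles), but the processor pays it only when the disk is active.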

# Interrupt-Driven I/O Example (2/2)

- 2.5% busy (interrupts) much better than 40% (polling)
- Real Solution: Direct Memory Access (DMA) mechanism
  - Device controller transfers data directly to/from memory without involving the processor
  - Only interrupts once per page (large!) once transfer is done


# Outline

- Input/Output
- Exceptions/Interrupts
- Coprocessor

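# Coprocessor Instructions and Their Uses

- Instructions: MFC0, MTC0
  - CP0 registers cannot be modified directly; values must pass through a general-purpose register
- MFC0: read a CP0 register into a general-purpose register
  - SR: obtain the processor's control information
  - Cause: obtain the processor's current state
  - EPC: obtain the address of the instruction that took the exception/interrupt
  - PRId: read the processor ID (you can read back your own personal signature)
- MTC0: write a general-purpose register value into a CP0 register
  - SR: control the processor, e.g. disable interrupts
  - EPC: used by the operating system for multitasking (task switching)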


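# Designing CP0: Module Interface

| Signal       | Direction | Purpose                                          | Source and mechanism                                       |
|--------------|-----------|--------------------------------------------------|------------------------------------------------------------|
| A1[4:0]      | I         | CP0 register number to read                      | Generated when executing MFC0                              |
| A2[4:0]      | I         | CP0 register number to write                     | Generated when executing MTC0                              |
| DIn[31:0]    | I         | Data to write to the CP0 register                | Generated when executing MTC0; data comes from the GPR     |
| PC[31:2]     | I         | PC at the time of the interrupt/exception        | PC                                                         |
| ExcCode[6:2] | I         | Type of interrupt/exception                      | Exception-detecting functional units                       |
| HWInt[5:0]   | I         | 6 device interrupts                              | External hardware devices (e.g. mouse, keyboard)           |
| We           | I         | CP0 register write enable                        | Generated when executing MTC0                              |
| EXLSet       | I         | Sets EXL in SR (EXL becomes 1)                   | Generated by the pipeline in the W stage                   |
| EXLClr       | I         | Clears EXL in SR (EXL becomes 0)                 | Generated when executing ERET                              |
| clk          | I         | Clock                                            |                                                            |
| rst          | I         | Reset                                            |                                                            |
| IntReq       | O         | Interrupt request, output to the CPU controller  | Function of HWInt/IM/IE/EXL                                |
| EPC[31:2]    | O         | EPC register, output to the NPC                  |                                                            |
| DOut[31:0]   | O         | CP0 register read data                           | Generated when executing MFC0; output data goes to the GPR |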

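# Designing CP0: SR

- Many bits are unused, so only the useful bits are defined
  - reg [15:10] im ;
  - reg exl, ie ;
- The full SR is represented as: {16'b0, im, 8'b0, exl, ie}
- The behavior of im and ie is simple
  - if (Wen is asserted and Sel is the corresponding register number)

{im, exl, ie} <= {DIn[15:10], DIn[1], DIn[0]};

reg [5:0] im and reg [15:10] im are equivalent, but the latter is better coding style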

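# Designing CP0: SR (continued)

exl is slightly more complex: besides behaving like im/ie, it must also support being set and cleared. Taking the set case as an example:

if (EXLSet)

exl <= 1'b1 ;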
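
The SR behavior described above can be modeled in Python as a rough sketch. The `StatusRegister` class and its method names are hypothetical; the bit positions follow the `{16'b0, im, 8'b0, exl, ie}` layout, and the relative priority of a simultaneous write and set/clear is not modeled.

```python
# Behavioral model of SR: only im (6 bits), exl, and ie are stored.
class StatusRegister:
    def __init__(self):
        self.im, self.exl, self.ie = 0, 0, 0

    def write(self, din):
        # MTC0 write: im from bits 15:10, exl from bit 1, ie from bit 0
        self.im = (din >> 10) & 0x3F
        self.exl = (din >> 1) & 1
        self.ie = din & 1

    def set_exl(self):     # EXLSet: an exception or interrupt is being taken
        self.exl = 1

    def clear_exl(self):   # EXLClr: eret is executing
        self.exl = 0

    def value(self):
        # Reassemble {16'b0, im, 8'b0, exl, ie}
        return (self.im << 10) | (self.exl << 1) | self.ie


sr = StatusRegister()
sr.write(0xFFFFFFFF)       # bits outside im/exl/ie are simply dropped
after_write = sr.value()
sr.clear_exl()
after_clear = sr.value()
```

Writing all ones shows which bits actually exist: everything outside bits 15:10, 1, and 0 reads back as zero.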


# Designing CP0: Cause

- Cause: only a 6-bit register needs to be defined; it continuously latches the 6 external interrupts (HWInt[5:0])
  - reg [15:10] hwint_pend ;
- The full Cause is represented as:
  - {16'b0, hwint_pend, 10'b0}
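
As a quick check of the layout, the following Python sketch packs the 6 latched lines into bits 15:10 and decodes them again. `cause_value` and `pending_lines` are hypothetical helper names.

```python
# Pack and unpack the Cause layout {16'b0, hwint_pend, 10'b0}.
def cause_value(hwint_pend):
    """Place the 6 pending bits in Cause[15:10]; all other bits are 0."""
    return (hwint_pend & 0x3F) << 10


def pending_lines(cause):
    """Indices (0..5) of the hardware interrupt lines currently latched."""
    field = (cause >> 10) & 0x3F
    return [i for i in range(6) if (field >> i) & 1]


cause = cause_value(0b000101)   # lines 0 and 2 pending
lines = pending_lines(cause)
```

This is the decoding a handler would do after reading Cause with MFC0 to find out which device raised the interrupt.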


# Designing CP0: EPC

- Define a 30-bit register
  - reg [31:2] epc ;
- Why are 32 bits not needed?
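
One likely answer, assuming every instruction is 4 bytes and word-aligned as in MIPS32: the two low bits of any instruction address are always 0, so storing them carries no information. A short Python sketch (with hypothetical helper names) shows that dropping and re-appending them is lossless.

```python
# Storing PC[31:2] loses nothing when instruction addresses are word-aligned.
def latch_epc(pc):
    """Keep only PC[31:2]; the two low bits are dropped."""
    return (pc >> 2) & 0x3FFFFFFF


def epc_read(epc30):
    """Rebuild the 32-bit address by appending 2'b00."""
    return epc30 << 2


pc = 0x00003014
stored = latch_epc(pc)          # 30-bit value held in the register
restored = epc_read(stored)     # 32-bit value seen by software
```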


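# Designing CP0: PRId

- Identifies the company, instruction set version, etc.
  - Intel processors also have an ID, which CPU-Z can read

| Bits 31:24      | Bits 23:16 | Bits 15:8    | Bits 7:0 |
|-----------------|------------|--------------|----------|
| Company Options | Company ID | Processor ID | Revision |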


# Designing CP0: Outputting CP0 Registers

- All registers other than SR/Cause/EPC/PRId read as 0.
- A 5-to-1 MUX can be used.
- It can also be written as a behavioral description.
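
The read path can be sketched behaviorally in Python (this is a model of the selection logic, not the Verilog from the original slide). The register numbers 12 to 15 are the standard MIPS CP0 numbers for SR, Cause, EPC, and PRId; the function name `cp0_read` and the sample values are illustrative.

```python
# Behavioral model of the CP0 read mux: four real registers, everything else reads 0.
SR, CAUSE, EPC, PRID = 12, 13, 14, 15   # standard MIPS CP0 register numbers


def cp0_read(a1, sr, cause, epc, prid):
    """Value driven on DOut for the register number a1 given by MFC0."""
    regs = {SR: sr, CAUSE: cause, EPC: epc, PRID: prid}
    return regs.get(a1, 0)   # every other register number reads as 0


dout_epc = cp0_read(14, sr=0xFC01, cause=0x1400, epc=0x3014, prid=0x12345678)
dout_other = cp0_read(9, sr=0xFC01, cause=0x1400, epc=0x3014, prid=0x12345678)
```

The default value of 0 plays the role of the fifth mux input.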


GXP, © COCOA