

# DCS Lab 8 Floating Point Computation

Arithmetic

### Floating Point – bfloat16

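#### IEEE half-precision 16-bit float

Layout: 1 sign bit, 5 exponent bits, 10 fraction bits.

#### IEEE 754 single-precision 32-bit float

Layout: 1 sign bit, 8 exponent bits, 23 fraction bits.

This lab uses bfloat16: 1 sign bit, 8 exponent bits, 7 fraction bits, i.e. the upper 16 bits of the single-precision layout.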

### Floating Point – bfloat16

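- Sign: 0 means positive, 1 means negative
- Biased Exponent: the power of 2, stored with a bias of 127
- Fraction: the bits after the binary point (the significand is 1.Fraction)
- Formula: $(-1)^{Sign} \times 2^{Exponent-127} \times 1.Fraction$
- Example:
- $1 = 0\ 01111111\ 0000000 = (-1)^{0} \times 2^{127-127} \times 1.0$
- $-2 = 1\ 10000000\ 0000000 = (-1)^{1} \times 2^{128-127} \times 1.0$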
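The formula can be checked with a small Python reference model. This is a minimal sketch, not part of the lab files: the helper names `bf16_fields` and `bf16_to_float` are made up for illustration, and only normal numbers are handled (exponent 0 and 255 are ignored).

```python
def bf16_fields(bits: int):
    """Split a 16-bit bfloat16 pattern into (sign, exponent, fraction)."""
    sign = (bits >> 15) & 0x1       # 1 bit
    exponent = (bits >> 7) & 0xFF   # 8 bits, biased by 127
    fraction = bits & 0x7F          # 7 bits
    return sign, exponent, fraction


def bf16_to_float(bits: int) -> float:
    """Evaluate (-1)^Sign * 2^(Exponent-127) * 1.Fraction (normal numbers only)."""
    sign, exponent, fraction = bf16_fields(bits)
    significand = 1 + fraction / 128  # 1.Fraction with 7 fraction bits
    return (-1) ** sign * 2.0 ** (exponent - 127) * significand


print(bf16_to_float(0b0_01111111_0000000))  # 1.0
print(bf16_to_float(0b1_10000000_0000000))  # -2.0
```

The two calls reproduce the two examples above.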

# Floating Point – Addition

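- 1. Restore the hidden bit to get 1.Fraction (8-bit)
- 2. Align the binary points (still 8-bit if the smaller number is right-shifted to match the larger one)
- 3. If a number is negative (Sign == 1), take its two's complement (~a + 1) (signed 9-bit)
- 4. sum = a + b (signed 10-bit)
- 5. Convert the result back to bfloat16 format
  - The sign bit sum[9] gives the Sign
  - Find the position of the first 1 and round (optional) to get the Fraction
  - Add to or subtract from the larger number's Exponent by however many positions the binary point moved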
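The five steps above can be sketched as a bit-level Python reference model, which may be handy for generating expected values before writing RTL. This is only a sketch under stated assumptions: `bf16_add` and `shift_right_round` are hypothetical names, rounding is applied both when aligning and when normalizing, an exact cancellation returns +0, and exponent overflow or underflow is not handled (the lab's exponent range should keep results normal).

```python
def shift_right_round(value: int, shift: int) -> int:
    """Right-shift, rounding on the last dropped bit (drop a 0, round up on a 1)."""
    if shift == 0:
        return value
    return (value >> shift) + ((value >> (shift - 1)) & 1)


def bf16_add(a: int, b: int) -> int:
    sa, ea, fa = a >> 15, (a >> 7) & 0xFF, a & 0x7F
    sb, eb, fb = b >> 15, (b >> 7) & 0xFF, b & 0x7F

    # Step 1: restore the hidden bit -> 8-bit 1.Fraction
    ma, mb = 0x80 | fa, 0x80 | fb

    # Step 2: align binary points by right-shifting the smaller number
    exp = max(ea, eb)
    ma = shift_right_round(ma, exp - ea)
    mb = shift_right_round(mb, exp - eb)

    # Step 3: negative operands (Python ints are signed, so negation
    # plays the role of the two's complement ~a + 1 in hardware)
    if sa:
        ma = -ma
    if sb:
        mb = -mb

    # Step 4: signed sum
    total = ma + mb
    if total == 0:
        return 0  # assumption: exact cancellation encodes as +0

    # Step 5: convert back to bfloat16
    sign = 1 if total < 0 else 0
    mag = abs(total)
    msb = mag.bit_length() - 1      # position of the first 1 (7 = already normalized)
    if msb > 7:                     # binary point moves left
        mag = shift_right_round(mag, msb - 7)
        if mag >> 8:                # rounding carried into a new bit
            mag >>= 1
            msb += 1
    else:                           # binary point moves right
        mag <<= 7 - msb
    exp += msb - 7
    return (sign << 15) | (exp << 7) | (mag & 0x7F)


result = bf16_add(0b0_01111111_0101101, 0b0_10000000_1010101)
print(f"{result:016b}")  # 0100000010010110
```

The printed pattern is 0 10000001 0010110, the same result as example 1.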

## Floating Point – example 1

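- Ex: a + b
- = 0 01111111 0101101 + 0 10000000 1010101
- = 1.0101101 \* 2^0 + 1.1010101 \* 2^1
- $\approx$ 0.1010111 \* 2^1 + 1.1010101 \* 2^1 (a is shifted right by one position; the dropped bit is 1, so round up)
- = 10.0101100 \* 2^1
- = 1.0010110 \* 2^2
- = 0 10000001 0010110

out exponent = Max(a exponent, b exponent) + 1 = 2 (stored as 2 + 127 = 129 = 10000001)

Rounding rule: look at the first dropped bit; drop a 0, round up on a 1.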

## Floating Point – example 2

- Ex: a + b
- The binary point moves right by one position.

# Lab – FP + block diagram



# Floating Point – Multiplication

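- 1. Restore the hidden bit to get 1.Fraction (8-bit)
- 2. a \* b (16-bit)
- 3. Convert the result back to bfloat16 format
  - The Sign follows from a Sign and b Sign
  - Find the position of the first 1 and round (optional) to get the Fraction
  - Exponent = a Exponent + b Exponent - 127 (+1 if the binary point moves left by 1)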
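The multiplication steps can be sketched the same way in Python. Again this is a hedged reference model rather than the lab's solution: `bf16_mul` is a hypothetical name, the Sign is taken as the XOR of the two input signs, rounding is on the first dropped bit, and exponent overflow or underflow is not handled.

```python
def bf16_mul(a: int, b: int) -> int:
    sa, ea, fa = a >> 15, (a >> 7) & 0xFF, a & 0x7F
    sb, eb, fb = b >> 15, (b >> 7) & 0xFF, b & 0x7F

    # Step 1: restore the hidden bit -> 8-bit 1.Fraction
    ma, mb = 0x80 | fa, 0x80 | fb

    # Step 2: 8-bit x 8-bit -> 16-bit product with 14 fraction bits (xx.xxxxxxxxxxxxxx)
    prod = ma * mb

    # Step 3: convert back to bfloat16
    sign = sa ^ sb                  # signs differ -> negative
    exp = ea + eb - 127             # remove the doubled bias
    if prod >> 15:                  # product in [2, 4): binary point moves left by 1
        shift = 8
        exp += 1
    else:                           # product in [1, 2): already normalized
        shift = 7
    # keep 8 bits (hidden bit + 7 fraction bits), rounding on the first dropped bit
    mant = (prod >> shift) + ((prod >> (shift - 1)) & 1)
    if mant >> 8:                   # rounding carried into a new bit
        mant >>= 1
        exp += 1
    return (sign << 15) | (exp << 7) | (mant & 0x7F)


result = bf16_mul(0b0_01111111_0101101, 0b0_10000000_1010101)
print(f"{result:016b}")  # 0100000010010000
```

The printed pattern is 0 10000001 0010000, i.e. 1.0010000 \* 2^2.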

### Floating Point – example

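- Ex: a \* b
- = 1.0101101 \* 2^0 x 1.1010101 \* 2^1
- $= 10.00111111110001 * 2^{(0+1)} \approx 10.0100000 * 2^{1}$
- = 1.0010000 \* 2<sup>2</sup>
- = 0 10000001 0010000

The first 1 of the product 10.00111111110001 sits one position left of the normalized position, so the binary point moves left by one position. The Fraction is the next 7 bits, rounded on the first dropped bit (drop a 0, round up on a 1): 0001111 followed by a dropped 1 rounds up to 0010000.

out exponent = 127 + 128 - 127 + 1 = 129 = 2 + 127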

# Lab – FP \* block diagram



#### Lab

- mode == 0, out = a + b
- mode == 1, out = a \* b
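- The Exponent of a is in the range 120 to 135
- The Exponent of b is within ±3 of the Exponent of a
- An error of |your\_out - correct\_out| < |correct\_out| \* 0.1 is tolerated (so a design that skips rounding still passes)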
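The tolerance rule can be written out as a small Python check. This sketch applies the inequality literally to the decoded values; the function names are hypothetical, only normal numbers are decoded, and how the actual pattern treats a correct answer of exactly zero is not stated in the lab, so that case is left as the formula gives it.

```python
def bf16_to_float(bits: int) -> float:
    """Decode a normal bfloat16 pattern: (-1)^Sign * 2^(Exponent-127) * 1.Fraction."""
    sign = (bits >> 15) & 0x1
    exponent = (bits >> 7) & 0xFF
    fraction = bits & 0x7F
    return (-1) ** sign * 2.0 ** (exponent - 127) * (1 + fraction / 128)


def within_tolerance(your_out: int, correct_out: int, rel: float = 0.1) -> bool:
    """True if |your_out - correct_out| < |correct_out| * rel."""
    y = bf16_to_float(your_out)
    c = bf16_to_float(correct_out)
    return abs(y - c) < abs(c) * rel


# A truncated fraction (0010101 instead of the rounded 0010110) is still accepted.
print(within_tolerance(0b0_10000001_0010101, 0b0_10000001_0010110))  # True
# An exponent that is off by one is not.
print(within_tolerance(0b0_10000000_0010110, 0b0_10000001_0010110))  # False
```

Since one unit in the last place of a 7-bit fraction is less than 1% of the value, truncation alone stays well inside the 10% bound; an exponent error does not.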

# Fpc.sv

| Input Signal | Bit width | Definition                              |
|--------------|-----------|-----------------------------------------|
| clk          | 1         | 20ns clock signal                       |
| rst_n        | 1         | Asynchronous negative edge reset signal |
| in_valid     | 1         | Pulled high during inputs               |
| mode         | 1         | 0 means a+b, 1 means a*b                |
| in_a         | 16        | bfloat16 input                          |
| in_b         | 16        | bfloat16 input                          |

| Output Signal | Bit width | Definition                        |
|---------------|-----------|-----------------------------------|
| out_valid     | 1         | Pulled high 1 cycle during output |
| out           | 16        | bfloat16 output                   |

\* All output signals should be reset so that none of them is unknown (X)

#### Spec

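- IP is not allowed; using IP counts as FAIL
- The design must not go more than 30 cycles without an output (counted from in\_valid)
- out\_valid may be high for only one cycle, during which the pattern checks the value of out
- After out\_valid falls, out must be reset
- The next input arrives 2~5 cycles after out\_valid
- All outputs must have an asynchronous active-low reset.
- 01\_RTL must PASS.
- 02\_SYN must have no errors and no latches.
- The 02\_SYN timing slack must be MET.
- 03\_GATE must PASS.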
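The handshake rules in this list can be expressed as a tiny trace checker in Python, which may help when reading waveforms. This is a sketch under several assumptions that the lab does not spell out: `check_trace` and its per-cycle `(in_valid, out_valid, out)` tuples are hypothetical, "reset" is taken to mean out == 0, and the 30-cycle limit is counted from the last cycle in which in\_valid was high.

```python
def check_trace(trace, max_latency=30):
    """trace: list of (in_valid, out_valid, out) tuples, one per clock cycle.

    Returns "ok" or a message describing the first rule that is broken.
    """
    waiting_since = None   # cycle of the pending input, if any
    prev_out_valid = 0
    for cycle, (in_valid, out_valid, out) in enumerate(trace):
        if in_valid:
            waiting_since = cycle
        if out_valid:
            if prev_out_valid:
                return f"cycle {cycle}: out_valid high for more than one cycle"
            waiting_since = None   # the pending input has been answered
        else:
            if out != 0:
                return f"cycle {cycle}: out not reset while out_valid is low"
            if waiting_since is not None and cycle - waiting_since > max_latency:
                return f"cycle {cycle}: no output within {max_latency} cycles"
        prev_out_valid = out_valid
    return "ok"


good = [(1, 0, 0), (0, 0, 0), (0, 1, 0x4096), (0, 0, 0)]
print(check_trace(good))  # ok
```

The real pattern file remains the authority; this only mirrors the rules as written above.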



negative trigger asynchronous reset





out\_valid & out pull high 1 cycle



#### Command

- tar -xvf ~dcsta01/Lab06.tar
- cd Lab06/01\_RTL/
- Need 02\_SYN
  - No Latch
  - No error
  - No timing violation (MET)
- Need 03\_GATE

Demo1: 5/4 (Thu), 16:25:00

Demo2: 5/4 (Thu), 23:59:59