#### **Computer-Aided VLSI System Design**

### **Homework 1: Arithmetic Logic Unit**

Graduate Institute of Electronics Engineering, National Taiwan University



#### Goal



- In this homework, you will learn
  - How to read spec
  - How to design ALU with simple operations
  - How to implement various operations by Verilog
  - How to separate combinational circuit and sequential circuit
  - How to write testbench (Optional)

#### Introduction



- An arithmetic logic unit (ALU) is one of the components of a computer processor
- ALU performs arithmetic and bit-level logical operations in a computer
- In this homework, you are going to design an ALU with some special instructions

### **Block Diagram**





# Input/Output



| Signal Name | I/O | Width | Simple Description |
|-------------|-----|-------|--------------------|
| i_clk       | I   | 1     | Clock signal in the system |
| i_rst_n     | I   | 1     | Active low asynchronous reset |
| i_in_valid  | I   | 1     | The signal is <b>high</b> if input data is ready |
| o_busy      | O   | 1     | Set <b>low</b> if ready for next input data. Set <b>high</b> to pause input sequence. |
| i_inst      | I   | 4     | Instruction for ALU to perform |
| i_data_a    | I   | 16    | <ul> <li>Signed input data with 2's complement representation</li> <li>1. For instructions 0000~0100, fixed point number (6-bit signed integer + 10-bit fraction)</li> <li>2. For instructions 0101~1001, integer</li> </ul> |
| i_data_b    | I   | 16    | <ul> <li>Signed input data with 2's complement representation</li> <li>1. For instructions 0000~0100, fixed point number (6-bit signed integer + 10-bit fraction)</li> <li>2. For instructions 0101~1001, integer</li> </ul> |
| o_out_valid | O   | 1     | Set <b>high</b> if ready to output result |
| o_data      | O   | 16    | <ul> <li>Signed output data with 2's complement representation</li> <li>1. For instructions 0000~0100, fixed point number (6-bit signed integer + 10-bit fraction)</li> <li>2. For instructions 0101~1001, integer</li> </ul> |

# Specification (1/2)



- Active low asynchronous reset is used only once.
- All inputs are synchronized with the negative clock edge.
- All outputs should be synchronized with the positive clock edge.
  - Flip-flops should be added before all outputs.
- New pattern (i\_inst, i\_data\_a and i\_data\_b) is ready only when i\_in\_valid is high.
- i\_in\_valid will be randomly pulled high only if o\_busy is low.
- o\_out\_valid should be high for only one cycle for each o\_data.
- The testbench will sample o\_data at negative clock edge if o\_out\_valid is high.
- You can raise o\_out\_valid at any moment.
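The output rules above can be sketched as a small registered output stage. This is only a minimal illustration under stated assumptions: the module name `out_reg_sketch` and the port `i_result` (a combinational result computed elsewhere) are hypothetical, and the sketch assumes every pattern finishes in one cycle so `o_busy` never needs to rise.

```verilog
// Hypothetical sketch: registered outputs with a one-cycle o_out_valid pulse.
module out_reg_sketch #(
    parameter DATA_W = 16
) (
    input  wire              i_clk,
    input  wire              i_rst_n,
    input  wire              i_in_valid,
    input  wire [DATA_W-1:0] i_result,    // assumed: combinational result of the current pattern
    output wire              o_busy,
    output wire              o_out_valid,
    output wire [DATA_W-1:0] o_data
);
    reg              busy_r;
    reg              out_valid_r;
    reg [DATA_W-1:0] data_r;

    // Every output is driven directly by a flip-flop.
    assign o_busy      = busy_r;
    assign o_out_valid = out_valid_r;
    assign o_data      = data_r;

    always @(posedge i_clk or negedge i_rst_n) begin
        if (!i_rst_n) begin
            busy_r      <= 1'b0;
            out_valid_r <= 1'b0;
            data_r      <= {DATA_W{1'b0}};
        end else begin
            busy_r      <= 1'b0;          // single-cycle design: never stalls the input
            out_valid_r <= i_in_valid;    // high for exactly one cycle per accepted pattern
            if (i_in_valid)
                data_r <= i_result;
        end
    end
endmodule
```

A multi-cycle implementation would instead raise `busy_r` while it is working and pulse `out_valid_r` when the result register is written.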

# Specification (2/2)



- t < 0: ALU reset
- t = 0.5: o\_busy=0  $\rightarrow$  new pattern is presented, i\_in\_valid=1
- t = 2.5: o\_busy=1  $\rightarrow$  no pattern is presented, i\_in\_valid=0
- t = 2.5: o\_out\_valid=1  $\rightarrow$  o\_data is sampled



#### **Instructions**



## Fixed-Point Number (1/2)



- Representation used for instruction 0000~0100
  - 6-bit signed integer + 10-bit fraction = 16-bit fixed-point
- Saturation (for instruction 0000~0100)
  - If the final result exceeds the maximum (minimum) representable value of 16-bit representation, use the maximum (minimum) value as output.
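As a minimal sketch of the saturation rule, the block below clamps a wider signed intermediate value to the 16-bit range. The module name `sat16_sketch` and the parameter `IN_W` are hypothetical names chosen for illustration.

```verilog
// Hypothetical sketch: clamp a wide signed value to the 16-bit two's complement range.
module sat16_sketch #(
    parameter IN_W = 20                 // assumed width of the unsaturated result
) (
    input  wire signed [IN_W-1:0] i_val,
    output reg  signed [15:0]     o_sat
);
    localparam signed [15:0] MAX16 = 16'sh7FFF;   // largest 16-bit value
    localparam signed [15:0] MIN16 = 16'sh8000;   // smallest 16-bit value

    always @(*) begin
        if (i_val > MAX16)
            o_sat = MAX16;
        else if (i_val < MIN16)
            o_sat = MIN16;
        else
            o_sat = i_val[15:0];        // in range: the low 16 bits hold the value
    end
endmodule
```

Because both sides of each comparison are declared signed, the limits are sign-extended to `IN_W` bits before comparing.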



### Fixed-Point Number (2/2)



- Rounding (for instruction 0010 and 0100)
  - The result must be rounded to the nearest[2] representable number with 10-bit fraction first
  - For tie-breaking, round half toward positive infinity
  - Then, apply saturation to ensure the final output is a valid 16-bit fixed-point number
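One way to realize "round half toward positive infinity, then saturate" for a product of two values with 10 fraction bits is to add half an output LSB and then drop the extra fraction bits, which is a floor. The sketch below assumes that approach; the module name `mul_round_sat_sketch` is hypothetical.

```verilog
// Hypothetical sketch: fixed-point multiply (10 fraction bits per operand),
// round half toward +infinity, then saturate to 16 bits.
module mul_round_sat_sketch (
    input  wire signed [15:0] i_a,
    input  wire signed [15:0] i_b,
    output reg  signed [15:0] o_y
);
    // Full-precision product has 20 fraction bits.
    wire signed [31:0] prod    = i_a * i_b;
    // Adding 2^9 (half of the 2^-10 output LSB) before truncation gives floor(x + 0.5).
    wire signed [32:0] biased  = prod + 33'sd512;
    // Dropping the low 10 bits is an arithmetic shift right by 10.
    wire signed [22:0] rounded = biased[32:10];

    always @(*) begin
        if (rounded > 23'sd32767)
            o_y = 16'sh7FFF;
        else if (rounded < -23'sd32768)
            o_y = 16'sh8000;
        else
            o_y = rounded[15:0];
    end
endmodule
```

With this scheme an exact tie such as +0.5 LSB rounds up to 1 LSB, and -0.5 LSB rounds up to 0.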



## Signed Add/Sub/Mul



- Topic: basic operator, sizing and signing
- i\_data\_a is the first operand
- i\_data\_b is the second operand
- o\_data is the final result
- Rounding and saturation must be applied to the output

### **Signed Accumulation**



- Topic: vector array
- Implement an accumulator with 16 independent memory units (initialized to 0 during reset)
- i\_data\_a is the index of the chosen memory unit, and is guaranteed to be from 0 to 15 (inclusive)
- i\_data\_b is the value to be accumulated
- o\_data is the current accumulated result
- Intermediate values are guaranteed not to exceed 20 bits
- Saturation must be applied to the output

```
index = i_data_a

data_acc[index]_new = data_acc[index]_old + i_data_b

o_data = SAT(data_acc[index]_new)
```
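The pseudocode above might map to a register array as in the sketch below. It assumes that "not exceed 20 bits" means each unsaturated sum fits in a 20-bit signed register, and that only the output is saturated. The module name `acc_sketch` and the enable port `i_en` are hypothetical.

```verilog
// Hypothetical sketch: 16 independent accumulators with a saturated output.
module acc_sketch (
    input  wire               i_clk,
    input  wire               i_rst_n,
    input  wire               i_en,      // assumed: high when an accumulate pattern is accepted
    input  wire        [3:0]  i_index,   // low 4 bits of i_data_a
    input  wire signed [15:0] i_value,   // i_data_b
    output reg  signed [15:0] o_sat      // SAT(data_acc[index]_new)
);
    reg  signed [19:0] acc [0:15];       // unsaturated running sums
    wire signed [19:0] acc_new = acc[i_index] + i_value;
    integer k;

    always @(posedge i_clk or negedge i_rst_n) begin
        if (!i_rst_n) begin
            for (k = 0; k < 16; k = k + 1)
                acc[k] <= 20'sd0;        // all memory units start at 0
        end else if (i_en) begin
            acc[i_index] <= acc_new;     // store the unsaturated sum
        end
    end

    always @(*) begin
        if (acc_new > 20'sd32767)
            o_sat = 16'sh7FFF;
        else if (acc_new < -20'sd32768)
            o_sat = 16'sh8000;
        else
            o_sat = acc_new[15:0];
    end
endmodule
```

In a complete design `o_sat` would still pass through an output flip-flop before reaching `o_data`.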

# Softplus Function (1/2)



- Topic: piecewise linear approximation, constant division
- Implement an activation function, softplus[3], which is a smooth approximation of ReLU
- Use the following piecewise linear approximation to compute softplus(i\_data\_a):

$$\operatorname{softplus}(x) = \ln(1 + e^{x}) \approx f(x) = \begin{cases} x, & x \ge 2\\ (2x+2)/3, & 0 \le x \le 2\\ (x+2)/3, & -1 \le x \le 0\\ (2x+5)/9, & -2 \le x \le -1\\ (x+3)/9, & -3 \le x \le -2\\ 0, & x \le -3 \end{cases}$$



# Softplus Function (2/2)



- Topic: piecewise linear approximation, constant division
- You are not allowed to use the division operator (/) in Verilog code.
- Fixed-point constant division algorithms
  - Long division → Too slow
  - DesignWare → Banned in this homework
  - By multiplication?
  - ...
- The output (after rounding and saturation) must be exactly the same as the golden for any possible input value
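The "by multiplication" idea can be sketched as follows. On every segment that divides, the numerator (in units of 2^-10) is a non-negative integer, so rounding half up reduces to floor((n + 1) / 3) or floor((n + 4) / 9), and each floor is computed with a reciprocal constant and a shift. The module name, the constants `RECIP3` and `RECIP9`, and the assumption that the golden result is the exact quotient rounded half up are all this sketch's own; it has not been checked against the released patterns.

```verilog
// Hypothetical sketch: piecewise-linear softplus without the '/' operator.
module softplus_pwl_sketch (
    input  wire signed [15:0] i_x,   // 6-bit integer + 10-bit fraction
    output reg  signed [15:0] o_y
);
    // floor(m / 3) == (m * 43691) >> 17 and floor(m / 9) == (m * 58255) >> 19
    // hold for every m below 2^16, which covers all numerators used here.
    localparam [31:0] RECIP3 = 32'd43691;
    localparam [31:0] RECIP9 = 32'd58255;

    reg [15:0] num;    // non-negative numerator in units of 2^-10
    reg [31:0] prod;

    always @(*) begin
        num  = 16'd0;
        prod = 32'd0;
        if (i_x >= 16'sd2048) begin               // x >= 2: f(x) = x
            o_y  = i_x;
        end else if (i_x >= 16'sd0) begin         // 0 <= x < 2: (2x + 2) / 3
            num  = (i_x <<< 1) + 16'sd2048;
            prod = (num + 16'd1) * RECIP3;
            o_y  = {1'b0, prod[31:17]};
        end else if (i_x >= -16'sd1024) begin     // -1 <= x < 0: (x + 2) / 3
            num  = i_x + 16'sd2048;
            prod = (num + 16'd1) * RECIP3;
            o_y  = {1'b0, prod[31:17]};
        end else if (i_x >= -16'sd2048) begin     // -2 <= x < -1: (2x + 5) / 9
            num  = (i_x <<< 1) + 16'sd5120;
            prod = (num + 16'd4) * RECIP9;
            o_y  = {3'b000, prod[31:19]};
        end else if (i_x >= -16'sd3072) begin     // -3 <= x < -2: (x + 3) / 9
            num  = i_x + 16'sd3072;
            prod = (num + 16'd4) * RECIP9;
            o_y  = {3'b000, prod[31:19]};
        end else begin                            // x < -3: f(x) = 0
            o_y  = 16'sd0;
        end
    end
endmodule
```

The offsets 1 and 4 work because a quotient by 3 or 9 of an integer never lands exactly on a half, so "round to nearest" and "add the offset, then floor" agree. No saturation step appears because every segment's output already fits in 16 bits.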

### **XOR, Arithmetic Shift Right**



- Topic: bit-level operation, shifting
- XOR: o\_data = i\_data\_a ⊕ i\_data\_b
- ASR: Arithmetically shift i\_data\_a right by i\_data\_b bits
  - i\_data\_b is guaranteed to be from 0 to 16 (inclusive)
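A minimal sketch of both operations follows, assuming the shift amount is taken from the low 5 bits of i\_data\_b. The module and port names are hypothetical. The `>>>` operator only shifts arithmetically when its left operand is signed, which is the sizing and signing point worth checking in simulation.

```verilog
// Hypothetical sketch: bitwise XOR and arithmetic shift right.
module xor_asr_sketch (
    input  wire signed [15:0] i_a,
    input  wire        [15:0] i_b,
    output wire        [15:0] o_xor,
    output wire signed [15:0] o_asr
);
    assign o_xor = i_a ^ i_b;

    // i_a is declared signed, so >>> fills the vacated bits with the sign bit.
    // A shift amount of 16 leaves 16 copies of the sign bit.
    assign o_asr = i_a >>> i_b[4:0];
endmodule
```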

#### **Left Rotation**



- Topic: vector concatenation, vector part select (optional)
- Left rotation, also called left circular shift, inserts the bit that got shifted out at one end back to the other end
- i\_data\_a is the original pattern
- i\_data\_b is the shift amount, and is guaranteed to be from 0 to 16 (inclusive)





(Figure: left rotation example, shift amount = 3)
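One concatenation-based way to rotate is to place two copies of the pattern side by side, shift the pair left, and keep the upper half. The sketch below assumes that approach; `rotl_sketch` and `i_shamt` are hypothetical names.

```verilog
// Hypothetical sketch: left rotation through vector concatenation.
module rotl_sketch (
    input  wire [15:0] i_a,       // original pattern
    input  wire [4:0]  i_shamt,   // shift amount, 0 to 16
    output wire [15:0] o_rot
);
    // Bits shifted out of the upper copy are replaced by bits from the lower copy.
    wire [31:0] doubled = {i_a, i_a} << i_shamt;

    assign o_rot = doubled[31:16];
endmodule
```

Shift amounts of 0 and 16 both return the original pattern, so the full guaranteed range needs no special case.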

### **Count Leading Zeros**

- Topic: for loop, combinational loop, generate block (optional)
- Count the number of consecutive 0's from MSB
- For example, if a = 8'b0010\_0000, then CLZ(a)=2
- It is recommended to use for loops instead of hand crafting everything
- Be aware of combinational loops, where static timing analysis (STA) cannot be applied

(Figure: leading zeros of 0010\_0000 counted from the MSB; an error example.)
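A for-loop version might look like the sketch below, which scans from the LSB upward so that the last assignment belongs to the highest set bit. The module name `clz16_sketch` is hypothetical, and returning 16 for an all-zero input is an assumption of this sketch.

```verilog
// Hypothetical sketch: count leading zeros of a 16-bit value with a for loop.
module clz16_sketch (
    input  wire [15:0] i_a,
    output reg  [4:0]  o_clz
);
    integer i;

    always @(*) begin
        o_clz = 5'd16;                      // default: no bit set (also avoids a latch)
        for (i = 0; i < 16; i = i + 1) begin
            if (i_a[i])
                o_clz = 5'd15 - i[4:0];     // later (higher) set bits overwrite earlier ones
        end
    end
endmodule
```

The loop body never reads `o_clz`, so the unrolled logic is a priority chain with no feedback path, which is one way to stay clear of a combinational loop.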

#### **Reverse Match4**



- Topic: for loop, vector part select, generate block (optional)
- This is a custom bit-level operation that matches 4 bits of i\_data\_a and i\_data\_b at a time in reverse order

$$\text{o\_data}[i] = \begin{cases} (\text{i\_data\_a}[i+3:i] == \text{i\_data\_b}[15-i:12-i]), & i = 0 \sim 12\\ 0, & i = 13 \sim 15 \end{cases}$$

For example:
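As one hedged illustration of the rule above, a generate loop with indexed part selects could be written as below. The module name `match4_sketch` is hypothetical; `i_b[(12 - i) +: 4]` selects the same bits as `i_b[15-i:12-i]`.

```verilog
// Hypothetical sketch: Reverse Match4 with a generate loop and indexed part selects.
module match4_sketch (
    input  wire [15:0] i_a,
    input  wire [15:0] i_b,
    output wire [15:0] o_match
);
    genvar i;
    generate
        for (i = 0; i < 13; i = i + 1) begin : g_match
            // Compare 4 bits of i_a starting at i with 4 bits of i_b ending at 15 - i.
            assign o_match[i] = (i_a[i +: 4] == i_b[(12 - i) +: 4]);
        end
    endgenerate

    assign o_match[15:13] = 3'b000;   // positions 13 to 15 are defined as 0
endmodule
```

With i\_data\_a = 16'h000A and i\_data\_b = 16'hA000, bit 0 matches (1010 against 1010) and bits 4 to 12 match because both windows are all zeros, giving 16'h1FF1.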



### **Other Requirements**



- Check your code with SpyGlass
  - Goal setup: lint\_rtl and lint\_rtl\_enhanced
  - List of waivable errors will be updated on NTU Cool
  - If you encounter any error that seems waivable, check with TA on NTU Cool
- You CANNOT implement any operation except piecewise linear approximation by look up tables, unless there are good reasons and you have checked with TA
- You are NOT allowed to use DesignWare

#### alu.v



```
module alu #(
    parameter INST_W = 4,
    parameter INT_W  = 6,
    parameter FRAC_W = 10,
    parameter DATA_W = INT_W + FRAC_W
) (
    input                      i_clk,
    input                      i_rst_n,
    input                      i_in_valid,
    output                     o_busy,
    input         [INST_W-1:0] i_inst,
    input  signed [DATA_W-1:0] i_data_a,
    input  signed [DATA_W-1:0] i_data_b,
    output                     o_out_valid,
    output        [DATA_W-1:0] o_data
);

    // Local Parameters
    // Wires and Regs
    // Continuous Assignments
    // Combinatorial Blocks
    // Sequential Blocks

endmodule
```

#### testbench.v



```
`timescale 1ns/10ps
`define PERIOD    10.0
`define MAX_CYCLE 100000
`define RST_DELAY 2.0
`define SEQ_LEN   60

`ifdef I0
    `define IDATA "../00_TESTBED/pattern/INST0_I.dat"
    `define ODATA "../00_TESTBED/pattern/INST0_O.dat"
    `define PAT_LEN 40
`elsif I1
    `define IDATA "../00_TESTBED/pattern/INST1_I.dat"
    `define ODATA "../00_TESTBED/pattern/INST1_O.dat"
    `define PAT_LEN 40
`elsif I2
    `define IDATA "../00_TESTBED/pattern/INST2_I.dat"
    `define ODATA "../00_TESTBED/pattern/INST2_O.dat"
    `define PAT_LEN 40
`elsif I3
    `define IDATA "../00_TESTBED/pattern/INST3_I.dat"
    `define ODATA "../00_TESTBED/pattern/INST3_O.dat"
    `define PAT_LEN 40
`elsif I4
    `define IDATA "../00_TESTBED/pattern/INST4_I.dat"
    `define ODATA "../00_TESTBED/pattern/INST4_O.dat"
    `define PAT_LEN 40
`elsif I5
    `define IDATA "../00_TESTBED/pattern/INST5_I.dat"
    `define ODATA "../00_TESTBED/pattern/INST5_O.dat"
    `define PAT_LEN 40
`elsif I6
    `define IDATA "../00_TESTBED/pattern/INST6_I.dat"
    `define ODATA "../00_TESTBED/pattern/INST6_O.dat"
    `define PAT_LEN 40
`elsif I7
    `define IDATA "../00_TESTBED/pattern/INST7_I.dat"
    `define ODATA "../00_TESTBED/pattern/INST7_O.dat"
    `define PAT_LEN 40
`elsif I8
    `define IDATA "../00_TESTBED/pattern/INST8_I.dat"
    `define ODATA "../00_TESTBED/pattern/INST8_O.dat"
    `define PAT_LEN 40
`elsif I9
    `define IDATA "../00_TESTBED/pattern/INST9_I.dat"
    `define ODATA "../00_TESTBED/pattern/INST9_O.dat"
    `define PAT_LEN 40
`else
    `define IDATA "../00_TESTBED/pattern/INST0_I.dat"
    `define ODATA "../00_TESTBED/pattern/INST0_O.dat"
    `define PAT_LEN 40
`endif
```

#### **Commands**



- ./01\_run <arg1>
  - vcs -full64 -R -f rtl.f +v2k -sverilog -debug\_access+all +define+\$1
  - For example: ./01\_run I0 (arg1 = I0 ~ I9)
- ./99\_clean
  - Remove all temporary files
- Before you execute the shell script, change the permission of the file by chmod +x <script filename>

### **Pattern (Input Data)**



Each line of the input pattern file holds three binary fields: i\_inst (4 bits), i\_data\_a (16 bits) and i\_data\_b (16 bits). Sample lines:

| i\_inst | i\_data\_a       | i\_data\_b       |
|---------|------------------|------------------|
| 0000    | 0101011001100001 | 0011101000100011 |
| 0000    | 0110101001000011 | 1011111000101101 |
| 0000    | 0001001011011101 | 1001010100001010 |
| 0000    | 0111100100110101 | 0011011111100110 |
| 0000    | 0101101000010111 | 0111011110001010 |
| 0000    | 0001101001011110 | 0100000111001100 |

### Pattern (Golden Output)



(Figure: golden output file with a single o\_data column.)

#### **Submission**



 Create a folder named studentID\_hw1 and follow the hierarchy below (\*.sv is allowed if you use SystemVerilog)

- Pack the folder studentID\_hw1 into a tar file named studentID\_hw1\_vk.tar (k is the number of version, k =1,2,...)
  - tar -cvf studentID\_hw1\_vk.tar studentID\_hw1
  - Use lowercase for all the letters. (e.g. r13943000\_hw1\_v1.tar)
  - Pack the folder on IC Design LAB server to avoid OS related problems
- Submit to NTU Cool

# **Grading Policy (1/3)**



- Grading command
  - vcs -full64 -R -f rtl.f +v2k -sverilog -debug\_access+all +define+\$1
- Released patterns: 75%

| i_inst[3:0] | Operation              | Score |
|-------------|------------------------|-------|
| 4'b0000     | Signed Addition        | 5%    |
| 4'b0001     | Signed Subtraction     | 5%    |
| 4'b0010     | Signed Multiplication  | 10%   |
| 4'b0011     | Signed Accumulation    | 10%   |
| 4'b0100     | Softplus               | 10%   |
| 4'b0101     | XOR                    | 5%    |
| 4'b0110     | Arithmetic Right Shift | 5%    |
| 4'b0111     | Left Rotation          | 5%    |
| 4'b1000     | Count Leading Zeros    | 10%   |
| 4'b1001     | Reverse Match4         | 10%   |

# **Grading Policy (2/3)**



- Hidden patterns: 25%
  - Mixture of all instructions
  - Only if you pass all patterns will you get the score
- SpyGlass check with error: -20%
  - Check **Discussion** on NTU Cool for waivable errors
- All your code has to be synthesizable, or you will get 0 points
- Lose 5 points for any incorrect naming or format
  - Make sure all your files can be correctly unpacked and executed on IC Design LAB server

# **Grading Policy (3/3)**



- No late submission
  - 0 point for this homework
- No plagiarism
  - Plagiarism in any form, including copying from online sources, is strictly prohibited

#### **Discussion**



- NTU Cool Discussion Forum
  - For any questions not related to assignment answers or privacy concerns, please use the NTU Cool discussion forum.
  - TAs will prioritize answering questions on the NTU Cool discussion forum
- Email: r13943005@ntu.edu.tw
  - Title should start with [CVSD 2024 Fall HW1]
  - Email with wrong title will be moved to trash automatically

### **Discussion**

- NTU Cool thread: Computer-Aided VLSI System Design (EEE5022) > Discussions > [HW1]Discussion (all sections)
  - Discuss HW1-related questions in this thread, and number your questions in the following format so that the TAs can answer each one:
    - 1. Question 1
    - 2. Question 2
    - ...
  - If you need to post a screenshot, do not upload screenshots or text of your own code, since it would become a reference answer for everyone. Violations will lose 10 points from the total score of this homework.
  - Best wishes for your studies. (by TA)

#### References



- [1] Reference for fixed-point representation
  - Fixed-Point Representation
- [2] Reference for rounding to the nearest
  - Rounding MATLAB & Simulink
- [3] Reference for softplus function
  - Softplus Function
- [4] Reference for reciprocal multiplication
  - Reciprocal Multiplication