

# AI ASIC: Design and Practice (ADaP) Fall 2023 Advanced Verilog HDL

燕博南

#### **Outline**



- Part 1
  - Verilog HDL grammar, operators, synthesizable design
- Part 2
  - Finite State Machine
  - Pitfalls
- Part 3
  - Timing Disaster

HDL: Hardware Description Language

LUT: Look-up table Stdcell: standard cell



## Part 1

Verilog HDL grammar, operators, synthesizable design



#### **Digital Circuit Design Flow**



- Using Verilog you can write an executable functional specification that
- documents exact behavior of all the modules and their interfaces
- can be tested & refined until it does what you want

An HDL description is the first step in a mostly automated process to build an implementation directly from the behavioral model



- HDL -> Logic
- Map to target lib (stdcell/LUTs)
- Optimize speed, area

- Create floorplan blocks
- Place cells in blocks
- Route interconnect
- Optimize iteratively



#### Basic Building - Module



```
// single-line comments
    /* multi-line
    comments
    module name(
        input a,b,
 6
      input [31:0] c,
      output z,
       output reg [3:0] s
 9
                          — Don't forget ";" here
        10
    // declarations of internal signals, registers
    // combinational logic: assign
    // sequential logic: always @ (posedge clock)
    // module instances
    endmodule
15
```

In Verilog we design modules, one of which will be identified as our top-level module. Modules usually have named, directional ports (specified as input, output) which are used to communicate with the module.

 Format: HDL ignores space "", it only recognize ";"



Wires

Regs

#### Wires & Registers



```
// 2-to-1 multiplexer with dual-polarity outputs
module mux2(
input a,b,sel,
output z,zbar
);
// again order doesn't matter (concurrent execution!)
// syntax is "assign LHS = RHS" where LHS is a wire/bus
// and RHS is an expression
sassign z = sel ? b : a;
assign zbar = ~z;
multiplexer with dual-polarity outputs
the polarity outputs

purple description

purple description

purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple description
purple descr
```

```
module

a for zbor

sel zbor
```

```
module weird mux2(
        input a,b,sel,
 4
        output z,
 5
        output reg zbar
       again order doesn't matter (concurrent execution!)
    // syntax is "assign LHS = RHS" where LHS is a wire/bus
    // and RHS is an expression
    assign z = sel ? b : a;
11
    always @(posedge clk) begin
13
        zbar = \sim z;
14
    end
15
    endmodule
```

// 2-to-1 multiplexer with dual-polarity outputs







#### **Secrets of Wires & Regs**



- Without "reg" declaration, variables are always wires
- Reg can only be output and inside signals

- Caution!
  - Wires can only be changed using "assign" outside "always"
  - Regs can only be changed inside "always"

```
10 assign zbar = ~z;
```

```
12 always @(posedge clk) begin
13  zbar = ~z;
14 end
```





| Arithmetic | *        | Multiply              |  |
|------------|----------|-----------------------|--|
|            | /        | Division              |  |
|            | ,        |                       |  |
|            | +        | Add                   |  |
|            | -        | Subtract              |  |
|            | %        | Modulus               |  |
|            | +        | Unary plus            |  |
|            | -        | Unary minus           |  |
| Logical    | į        | Logical negation      |  |
|            | &&       | Logical and           |  |
|            |          | Logical or            |  |
| Relational | >        | Greater than          |  |
|            | <        | Less than             |  |
|            | >=       | Greater than or equal |  |
|            | <=       | Less than or equal    |  |
| Equality   | ==       | Equality              |  |
|            | !=       | inequality            |  |
| Shift      | >>       | Right shift           |  |
|            | <<       | Left shift            |  |
|            | <<<, >>> | Arithmetic shift      |  |

| Reduction | ~  | Bitwise negation |
|-----------|----|------------------|
|           | ~& | nand             |
|           |    | or               |
|           | ~  | nor              |
|           | ٨  | xor              |
|           | ^~ | xnor             |
|           | ~^ | xnor             |

| Concatenation | {} | Concatenation |
|---------------|----|---------------|
| Conditional   | ?  | conditional   |

What are the difference between (logic) shift and arithmetic shift?



#### **Numeric Constants**



Constant values can be specified with a specific width and radix:

```
123 // default: decimal radix, 32-bit width
'd123 // 'd = decimal radix
'h7B // 'h = hex radix
'o173 // 'o = octal radix
'b111_1011 // 'b = binary radix, "_" are ignored
'hxx // can include X, Z or ? in non-decimal constants
16'd5 // 16-bit constant 'b0000_0000_0101
11'h1X? // 11-bit constant 'b001_XXXX_ZZZZ
```

By default constants are unsigned and will be extended with 0's on left if need be (if high-order bit is X or Z, the extended bits will be X or Z too).

You can specify a signed constant as follows:

8'shFF // 8-bit twos-complement representation of -1

To be absolutely clear in your intent it's usually best to explicitly specify the width and radix.



#### Hierarchy: module instances

```
北京大学
PEKING UNIVERSITY
```

```
// 4-to-1 multiplexer
module mux4(input d0,d1,d2,d3, input [1:0] sel, output z);
wire z1,z2;
// instances must have unique names within current module.
// connections are made using .portname(expression) syntax.
// once again order doesn't matter...
mux2 m1(.sel(sel[0]),.a(d0),.b(d1),.z(z1)); // not using zbar
mux2 m2(.sel(sel[0]),.a(d2),.b(d3),.z(z2));
mux2 m3(.sel(sel[1]),.a(z1),.b(z2),.z(z));
// could also write "mux2 m3(z1,z2,sel[1],z,)" NOT A GOOD IDEA!
endmodule
```



- Write all original names (style requirement)
- Connection are concurrently executed





#### Example 1: A counter



```
module counter (
    out
            , // Output of the counter
    enable , // enable for counter
    clk
           , // clock Input
          // reset Input
    reset
    );
    output [1:0] out; //Output Ports
    input enable, clk, reset; //Input Ports
10
    reg [1:0] out; //Internal Variables
12
    always @(posedge clk)
    if (reset) begin
14
15
      out <= 2'b0;
    end else if (enable) begin
16
      out <= out + 1;
18
    end
19
    endmodule
```

#### What is the timing diagram?



Beware of "before" & "after" clock edge

#### **Verification: Simulation & Testbench**



```
Design:
    module counter (
            , // Output of the counter
    out
    enable
            . // enable for counter
    clk
            , // clock Input
               // reset Input
    reset
 6
    );
    output [1:0] out; //Output Ports
    input enable, clk, reset; //Input Ports
10
    reg [1:0] out; //Internal Variables
12
    always @(posedge clk)
    if (reset) begin
14
      out <= 2'b0 ;
15
    end else if (enable) begin
16
      out <= out + 1;
18
    end
19
```

endmodule

```
Testbench:
    `timescale 1ns/1ps
    module Testbench; _
 3
    wire [1:0] OUT;
                                 Testbench has no ports
    reg EN, CLK, RST;
    initial CLK = 0;
    always #2 CLK=~CLK;
 9
    initial begin
10
11
        #1
12
        EN = 0;
13
        RST = 1;
14
        #4
15
        EN = 1;
16
        RST = 0;
        #(4*7)
17
        EN = 0;
18
19
    end
20
    counter u1(
    .out (OUT)
                 , // Output of the counter
    .enable (EN) , // enable for counter
    .clk (CLK) , // clock Input
    .reset (RST)
                   // reset Input
26
27
    endmodule
```





#### Not synthesizable



Cannot easily converted to circuits

Initial #

```
`timescale 1ns/1ps
    module Testbench;
    wire [1:0] OUT;
    reg EN, CLK, RST;
    initial CLK = 0;
    always #2 CLK=~CLK;
    initial begin
11
        #1
        EN = ∅;
12
13
        RST = 1;
14
        #4
15
        EN = 1;
16
        RST = 0;
17
        #(4*7)
18
        EN = 0;
19
    end
20
    counter u1(
    .out (OUT)
                  , // Output of the counter
    .enable (EN) , // enable for counter
    .clk (CLK) , // clock Input
    .reset (RST) // reset Input
26
    );
27
    endmodule
```

#### **Blocking and Non-blocking**





```
1  // shift register
2  reg q1,q2,out;
3  always @(posedge clk) q1 <= in;
4  always @(posedge clk) q2 <= q1;
5  always @(posedge clk) out <= q2;</pre>
```



## Part 2

Finite State Machine





Hardware/Circuits





#### Design Methodologies & Templates





Output=1

State 1

A=0

State 2

Output=0

A=0

State 3

Output=1



Step 1: define states

Step 2: draw state-transfer diagram

Step 3: fill in the template



#### FSM Example 1 Passcode Detector



Question: build circuits that outputs 1 pulse, whenever receives "110"







#### FSM Example 1 – Step 1 & 2



Question: build circuits that outputs 1 pulse, whenever receives "110"





Question: build circuits that outputs 1 pulse, whenever receives "110"

```
module PasscodeDetector (
         input clk, data in, rstb,
 2
         output reg data out
 3
 4
     );
 5
 6
         reg [1:0] state;
         // Declare states
 7
         parameter STAT_IDLE = 0,
 8
                      STAT R1 = 1,
 9
                      STAT R2 = 2,
10
                      STAT R3 = 3;
11
12
         // Output depends only on the state
         always @ (state) begin
13
             if(state == STAT R3) begin
14
15
                  data out <= 1; // alarming!</pre>
16
             end
17
             else begin
18
                  data out <= 0;
19
             end
20
         end
```

```
// Determine the next state
22
          always @ (posedge clk) begin
23
               if (~rstb)
24
25
                   state <= STAT IDLE;</pre>
26
               else
27
                   case (state)
28
                        STAT IDLE: begin
29
                            if(data_in==1) begin
                                 state <= STAT R1;</pre>
30
31
                            end
32
                        end
                        STAT_R1: begin
33
                            if (data_in==1)
34
35
                                 state <= STAT R2;</pre>
                            else
36
37
                                 state <= STAT IDLE;</pre>
38
                        end
                        STAT_R2: begin
39
40
                            if (data_in==0)
41
                                 state <= STAT R3;</pre>
42
                            else
                                 state <= STAT R2;</pre>
43
44
                        end
                        STAT_R3: begin
45
                            if(data_in==1) begin
46
47
                                 state <= STAT R1;</pre>
48
                             end
                            else begin
49
                                 state <= STAT_IDLE;</pre>
50
51
                             end
52
                        end
                   endcase
53
54
          end
     endmodule
```



#### FSM Example 2 Auto Chip Testing Environment



Push button to test Function 1, Function 2, Function 3, ... in series





#### FSM Example 2 Step 1&2





Step 3 is so easy... that you can fill it yourself.



Philosophy behind: Use "state variable" to label timing

Another Q: How to generate custom waveform after entering some state?

Solution: setup an counter variable Do everything at its pace [! DO NOT ABUSE. This may cause large comparing logic.]



Possibility of Nested State Machine!

```
always @ (posedge clk or posedge reset) begin
              if (reset)
                  state <= S0;
              else begin
 4
                  case (state)
                      S0:
 6
                          data_out = 2'b01;
 8
                      S1: begin
                          if(cnt==10'd1) begin
 9
10
                               //do what you want
                          end
11
                          else if (cnt<=10'd5) begin
12
13
                               //do what you want
14
                          end
                          else if (cnt<=10'd20) begin
15
                               //do what you want
16
17
                          end
                          else begin
18
                               //do what you want
19
                          end
20
21
22
                      end
                      S2:
23
24
                          data_out = 2'b11;
25
                      S3:
                          data out = 2'b00;
26
                      default:
27
28
                          data out = 2'b00;
                  endcase
29
30
              end
31
     end
```



#### A practical application - Vending Machine



All selections are ¥ 0.30

The machine make changes

#### Inputs:

- ¥ 0.25
- ¥ 0.10
- ¥ 0.05

#### Outputs

- Dispense can
- Dispense ¥ 0.10
- Dispense ¥ 0.05





#### A practical application - Vending Machine



■ A starting (idle) state:



■ A state for each possible amount of money captured:



■ What's the maximum amount of money captured before purchase?

25 cents (just shy of a purchase) + one quarter (largest coin)



States to dispense change (one per coin dispensed):



#### A practical application - Vending Machine









- State Machine vs. Al
  - State machines exhaustively cover all cases, which is impossible in Al agent design.
- State machine is appropriate for small scale designs.
- Nested state machines:
  - normally, do not nest inside FSM in >2 folds.
  - Otherwise, too complicated timing path dependency.



## Part 3

Timing





- Central Question: How to make faster computer? (performance)
- Q1: What are the limitations for timing?
  - Timing & Delay Mechanism
- Q2: How to Analysis Timing in VLSI?
  - Static Timing Analysis (STA)
- Q3: How to improve the design?
  - Retiming in HDL



# Timing Part: Q1

What are the limitations for timing?







#### **Combinational Logic (CL) Delay**







Flip-flops need time to a ready & adequate data inputs



### 2 - Combinational Logic Delay









Application Example: load with unit "pF"



#### Things to Do in One Cycle





Clock Period T:

T>Time(Clk-to-q)+T(mux)+T(setup)





#### Some Thoughts on FSM Timing







#### Some Thoughts on FSM Timing





Time point: t1, t2, t3



#### 3 - Wire Delay







#### 4 - Clock Skew







## Timing Quality to Blame



|                           | Foundry             | Library Developer                   | CAD Tool       | Designer (You!)                     |
|---------------------------|---------------------|-------------------------------------|----------------|-------------------------------------|
| Gate Delay                | Physical parameters | Cell topology,<br>Transistor sizing | Cell selection | Choose Design<br>Corner to Consider |
| Wire Delay                | Physical parameters |                                     | Place & Route  | Layout                              |
| Cell Input<br>Capacitance | Physical parameters | Cell topology,<br>Transistor sizing | Cell selection |                                     |
| Cell Fanout               |                     |                                     | synthesis      | HDL                                 |
| Cell Drive Strength       | Physical parameters | Transistor sizing                   | Cell selection |                                     |



## Timing Part: Q2

How to Analysis Timing in VLSI? ... STA



### Timing Analysis for VLSI





**Definition of Paths** 







STA: Check gate-level netlist to find the timing for all paths



#### Timing Analysis for VLSI







# Timing Part: Q3

How to improve the design?

#### Technique 1 - Pipelining



Figure 1: A small graph before retiming. The nodes represent logic delays, with the inputs and outputs passing through mandatory, fixed registers. The critical path is 5.



Figure 2: The example in Figure 2 after retiming. The critical path is reduced from 5 to 4.



end

for faster cycles



#### Technique 2 – Floor planning





- 1. Make module connection natural:
  - Find good neighbors
- 2. Leave bus channels
- 3. Make bus wave guides!





Die photo of Intel first 8b processor 8008