# Intro to HW Design & Externs for P4→NetFPGA

**CS344** – **Lecture 5** 

#### **Announcements**

## Updated deliverable description for next Tuesday

- Implement most of the required functionality
- Make sure baseline tests are passing
- Add your own!
- Out of town: May 3<sup>rd</sup> to 6<sup>th</sup> (Interoperability test May 9<sup>th</sup>)
  - Office hours on May 3<sup>rd</sup> cancelled
  - Office hours on May 7<sup>th</sup> added

#### **Outline**

#### Goal 1:

Build our own stateful extern for P4→NetFPGA

## Approach:

- Intro to HW design
- Finite State Machines: a recipe for success
- Build our stateful extern
- Test it out

#### Goal 2:

Packet parsing in HDL

# **Logic Gates**

#### **AND Gate**

| A | B | Y |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

#### **OR Gate**

| A | B | Y |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |

#### **XOR Gate**

| A | B | Y |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
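
The three truth tables above can be reproduced with Python's bitwise operators. This is only a software check of the tables, not a hardware description, and the helper name `truth_table` is made up for this sketch.

```python
from itertools import product

def truth_table(gate):
    """Return rows (A, B, Y) for a two-input gate given as a function."""
    return [(a, b, gate(a, b)) for a, b in product((0, 1), repeat=2)]

and_rows = truth_table(lambda a, b: a & b)
or_rows = truth_table(lambda a, b: a | b)
xor_rows = truth_table(lambda a, b: a ^ b)

for name, rows in (("AND", and_rows), ("OR", or_rows), ("XOR", xor_rows)):
    print(name, rows)
```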







# Software vs Hardware Design

#### Software Design

- Functionality as sequence of instructions for CPU
- Language: C, C++, Python, etc.

#### Hardware Design

- Functionality as digital circuit
- Language: Verilog, VHDL

## A simple example:

```
if (a > b) {
    res = a;
}
else {
    res = b;
}
return res;
```

# **A Simple Example**

#### Software (C)

```
if (a > b) {
    res = a;
} else {
    res = b;
}
return res;

// compiled to pseudo-assembly:
      cmp a b tmp
      beqz tmp else
      store r0 a
      return
else: store r0 b
      return
```

#### Hardware (Verilog)

```
wire a, b;
reg res;
always @(*)
    if (a > b) begin
        res = a;
    end
    else begin
        res = b;
    end
```

# Verilog

#### Basic data types:

- reg
  - Example: reg [7:0] A;
  - Can be used to hold state
- wire
  - Example: wire [15:0] B;
  - Used for combinational (stateless) logic only

#### Example Usage:

```
assign B = {A[7:0], A[7:0]}; // Assignment of bits (concatenation of two copies of A)
reg [31:0] Mem [0:1023];     // 1K word memory
```

# **Combinational vs Sequential Logic**

## Combinational Logic

- Made of logic gates
- No memory elements
- Outputs settle to stable values after "short" logic delay

## Sequential Logic

- Combinational circuits and memory elements
- Used to store state
- Output depends on inputs and current state

# **Adding State: Flip-Flop**



```
reg res;
always @(posedge clk)
    if (rst)
        res <= 0;
    else
        res <= res_next;
```

# **Register the Output**

```
// Combinational logic
wire a, b;
reg res_next;
always @(*)
    if (a > b) begin
        res_next = a;
    end
    else begin
        res_next = b;
    end

// Sequential logic
reg res;
always @(posedge clk)
    if (rst)
        res <= 0;
    else
        res <= res_next;
```
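
To see what registering the output does to timing, here is a small cycle-level Python model of the combinational block feeding a flip-flop. It is a sketch under simple assumptions: one input pair per clock cycle, and reset held for the first cycle only. The names `comb_max` and `simulate` are made up for illustration.

```python
def comb_max(a, b):
    """Combinational part: res_next is the larger input (mirrors the always @(*) block)."""
    return a if a > b else b

def simulate(inputs, rst_cycles=1):
    """Return the registered output `res` as seen during each cycle."""
    res = 0                                # value held by the flip-flop
    seen = []
    for cycle, (a, b) in enumerate(inputs):
        seen.append(res)                   # output visible during this cycle
        res_next = comb_max(a, b)          # settles within the cycle
        # clock edge: reset wins, otherwise the flip-flop captures res_next
        res = 0 if cycle < rst_cycles else res_next
    return seen

trace = simulate([(0, 1), (1, 0), (0, 0), (1, 1)])
print(trace)
```

Each value of `res` reflects the inputs from the previous cycle: the register adds one cycle of latency in exchange for a clean, clock-aligned output.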

## **Avoid Latches!**



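A latch is inferred when a combinational `always @(*)` block leaves a signal unassigned on some path, for example `if (enable) a = b;` with no `else`. Either add the missing branch (`else a = c;`) or assign a default before the `if`: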



```
always @(*) begin
    a = c;
    if (enable)
        a = b;
end
```



# **Set Up Time Constraints**



Assumes clock is perfectly synchronized at all flip-flops!
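
As a reminder of the usual form of the constraint (stated here from standard digital-design practice, not taken from the slide figures): the clock period must cover the clock-to-Q delay of the launching flip-flop, the worst-case combinational delay, and the setup time of the capturing flip-flop. The sketch below checks this for made-up delay values; the function name and all numbers are illustrative assumptions.

```python
def setup_slack_ns(t_clk, t_clk_to_q, t_logic_max, t_setup):
    """Slack = clock period minus the longest register-to-register path.

    Non-negative slack means the setup constraint is met (zero clock skew assumed).
    """
    return t_clk - (t_clk_to_q + t_logic_max + t_setup)

# Illustrative numbers only: a 5 ns clock period (200 MHz).
slack = setup_slack_ns(t_clk=5.0, t_clk_to_q=0.5, t_logic_max=3.5, t_setup=0.25)
print(f"slack = {slack:.2f} ns, constraint met: {slack >= 0}")
```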


# **Finite State Machine (FSM)**



# **FSMs** Are Everywhere...



# What is an FPGA?




# **FPGA Design Flow**

1. RTL Design: describe the design in HDL
2. RTL Simulation
3. Synthesis: decompose the design into well-defined logic blocks that are available on the FPGA
4. Place and Route: figure out exactly which logic blocks to use and how to route between them
5. HW Testing

# P4-NetFPGA Extern Function library

HDL modules invoked from within P4 programs

Stateful Atoms [1]

| Atom      | Description                                            |
|-----------|--------------------------------------------------------|
| R/W       | Read or write state                                    |
| RAW       | Read, add to, or overwrite state                       |
| PRAW      | Predicated version of RAW                              |
| IfElseRAW | Two RAWs, one each for when predicate is true or false |
| Sub       | IfElseRAW with stateful subtraction capability         |

Stateless Externs

| Atom        | Description                                          |
|-------------|------------------------------------------------------|
| IP Checksum | Given an IP header, compute IP checksum              |
| LRC         | Longitudinal redundancy check, simple hash function  |
| timestamp   | Generate timestamp (granularity of 5 ns)             |
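
For reference when writing expected values in test scripts, here is a Python sketch of two of the stateless functions. `ip_checksum` is the standard 16-bit one's-complement Internet checksum. `lrc8` is one common definition of an LRC (XOR of all bytes); whether the NetFPGA extern folds its input exactly this way is an assumption. Both function names are made up for this sketch.

```python
def ip_checksum(header: bytes) -> int:
    """16-bit one's-complement checksum over a header whose checksum field is zero."""
    if len(header) % 2:
        header += b"\x00"                     # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:                        # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def lrc8(data: bytes) -> int:
    """XOR all bytes together (assumed LRC definition)."""
    result = 0
    for byte in data:
        result ^= byte
    return result

# A 20-byte IPv4 header with the checksum field set to zero.
hdr = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(ip_checksum(hdr)), hex(lrc8(b"\x0f\xf0\x01")))
```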

Add your own!

# Build a new extern: reg\_srw

## Specifications:

- Single state variable
- Can either read or write state
- Produces result in 1 clock cycle
- Not accessible by control-plane

#### P4 API:

# reg\_srw Next State Logic

#### opcodes:

- READ = 0
- WRITE = 1



```
wire valid, opcode;
wire [7:0] newVal;
reg [7:0] state_next;
always @(*)
    if (valid & opcode)
        state_next = newVal;
    else
        state_next = state;
```

# reg\_srw Finite State Machine

#### opcodes:

- READ = 0
- WRITE = 1



```
// next state logic
wire valid, opcode;
wire [7:0] newVal;
reg [7:0] state_next;
always @(*)
    if (valid & opcode)
        state_next = newVal;
    else
        state_next = state;

// state update
reg [7:0] state;
always @(posedge clk)
    if (rst)
        state <= 0;
    else
        state <= state_next;
```
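
A cycle-level Python model of this state machine can be handy for predicting what the hardware should return. The sketch below mirrors the two `always` blocks for an 8-bit register; the class name `RegSrw` and method name `clock` are made up for illustration.

```python
READ, WRITE = 0, 1

class RegSrw:
    """Cycle-level model of the 8-bit reg_srw state machine."""

    def __init__(self):
        self.state = 0                          # value after reset

    def clock(self, valid, opcode, new_val=0):
        """Apply one rising clock edge and return the state after it."""
        # next-state logic (the always @(*) block)
        if valid and opcode == WRITE:
            state_next = new_val & 0xFF
        else:
            state_next = self.state
        # state update (the always @(posedge clk) block)
        self.state = state_next
        return self.state

reg = RegSrw()
results = [
    reg.clock(valid=1, opcode=WRITE, new_val=42),  # write 42
    reg.clock(valid=1, opcode=READ),               # read it back
    reg.clock(valid=0, opcode=WRITE, new_val=7),   # ignored: valid is low
    reg.clock(valid=1, opcode=READ),
]
print(results)
```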

# **Register the Outputs**




#### **More Advanced State Machines**

```
localparam STATE_1 = 0;
localparam STATE_2 = 1;

reg state, state_next;
reg [1:0] output_1;

always @(*) begin
    // defaults
    state_next = state;
    output_1 = 1;
    case (state)
        STATE_1: begin
            output_1 = 1;
            state_next = STATE_2;
        end
        STATE_2: begin
            output_1 = 2;
            state_next = STATE_1;
        end
    endcase
end

always @(posedge clk) begin
    if (rst)
        state <= STATE_1;
    else
        state <= state_next;
end
```
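
The same two-block structure (combinational next-state/output logic plus a clocked state update) can be traced in Python. This is a sketch of the machine above, with the hypothetical helper names `next_state_and_output` and `run`.

```python
STATE_1, STATE_2 = 0, 1

def next_state_and_output(state):
    """Combinational block: returns (state_next, output_1) for the current state."""
    state_next, output_1 = state, 1        # defaults, as in the Verilog
    if state == STATE_1:
        output_1, state_next = 1, STATE_2
    elif state == STATE_2:
        output_1, state_next = 2, STATE_1
    return state_next, output_1

def run(cycles):
    """Return output_1 for each cycle, starting from the reset state."""
    state = STATE_1                        # reset value
    outputs = []
    for _ in range(cycles):
        state_next, output_1 = next_state_and_output(state)
        outputs.append(output_1)
        state = state_next                 # clock edge
    return outputs

print(run(5))
```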



#### SDNet Extern API

#### P4 API:

#### **HDL Interface:**

```
module my_reg_srw (
  input                          clk,
  input                          rst,
  input                          input_VALID,
  input  [REG_WIDTH+OP_WIDTH:0]  input_DATA,
  output                         output_VALID,
  output [REG_WIDTH-1:0]         output_DATA
);

wire valid, stateful_valid, opcode;
wire [REG_WIDTH-1:0] newVal;

assign valid = input_VALID;
assign {stateful_valid, newVal, opcode} = input_DATA;
```

# Stateful\_Valid Signal

```
bit<8> result;
if (p.hdr.invoke == 1) {
    myReg_reg_srw(0, REG_WRITE, result);
} else {
    result = 32;
}
```

#### Two cases:

- p.hdr.invoke == 1 → stateful\_valid signal will be set
- p.hdr.invoke != 1 → stateful\_valid signal will not be set
- valid signal will be asserted in both cases
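
On the HDL side, `stateful_valid`, `newVal`, and `opcode` arrive packed into the single `input_DATA` bus and are split apart by the concatenation assignment `{stateful_valid, newVal, opcode} = input_DATA`. In a Verilog concatenation the leftmost item occupies the most significant bits, so the layout can be sketched in Python as follows. `REG_WIDTH = 8` is an assumed example width, and `pack`/`unpack` are made-up helper names.

```python
REG_WIDTH, OP_WIDTH = 8, 1   # example widths (assumed)

def pack(stateful_valid, new_val, opcode):
    """Build input_DATA: stateful_valid in the MSB, opcode in the LSBs."""
    return (stateful_valid << (REG_WIDTH + OP_WIDTH)) | (new_val << OP_WIDTH) | opcode

def unpack(data):
    """Split input_DATA back into (stateful_valid, newVal, opcode)."""
    opcode = data & ((1 << OP_WIDTH) - 1)
    new_val = (data >> OP_WIDTH) & ((1 << REG_WIDTH) - 1)
    stateful_valid = (data >> (REG_WIDTH + OP_WIDTH)) & 1
    return stateful_valid, new_val, opcode

word = pack(stateful_valid=1, new_val=0xAB, opcode=1)
print(hex(word), unpack(word))
```

The extern should only touch its state when both `valid` and `stateful_valid` are set, since `valid` alone is asserted even when the P4 program skipped the call.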

## **State Machine**



#### SDNet Extern API

#### **HDL Interface:**

```
module @MODULE_NAME@ #(
  parameter REG_WIDTH = @REG_WIDTH@,
  parameter OP_WIDTH  = 1)
(
  input                          clk_lookup,
  input                          rst,
  input                          tuple_in_@EXTERN_NAME@_input_VALID,
  input  [REG_WIDTH+OP_WIDTH:0]  tuple_in_@EXTERN_NAME@_input_DATA,
  output                         tuple_out_@EXTERN_NAME@_output_VALID,
  output [REG_WIDTH-1:0]         tuple_out_@EXTERN_NAME@_output_DATA
);

wire valid, stateful_valid, opcode;
wire [REG_WIDTH-1:0] newVal;

assign valid = tuple_in_@EXTERN_NAME@_input_VALID;
assign {stateful_valid, newVal, opcode} = tuple_in_@EXTERN_NAME@_input_DATA;
```

# Adding extern support to P4→NetFPGA

- Update file: \$SUME\_SDNET/bin/extern\_data.py

#### Commands used:

- extern\_name: full name of extern function, determined by SDNet
- module\_name: name of the top level extern module, determined by SDNet
- input\_width(field): width in bits of an input field, determined by P4 programmer
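
The template files use `@NAME@` placeholders that are filled in with these values. How extern\_data.py performs the substitution is not shown here; the Python sketch below only illustrates the placeholder idea. The function `fill_template` and the example values are hypothetical.

```python
import re

def fill_template(template: str, values: dict) -> str:
    """Replace every @KEY@ placeholder with values[KEY]; unknown keys raise KeyError."""
    return re.sub(r"@([A-Z_]+)@", lambda m: str(values[m.group(1)]), template)

# Example values are made up; the real names are determined by SDNet.
template = "module @MODULE_NAME@ #(parameter REG_WIDTH = @REG_WIDTH@) ();"
filled = fill_template(template, {"MODULE_NAME": "my_reg_srw", "REG_WIDTH": 8})
print(filled)
```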

# Putting it all together: EXTERN\_reg\_srw\_template.v

```
module @MODULE_NAME@ #(
  parameter REG_WIDTH = @REG_WIDTH@,
  parameter OP_WIDTH  = 1)
(
  input                          clk_lookup,
  input                          rst,
  input                          tuple_in_@EXTERN_NAME@_input_VALID,
  input  [REG_WIDTH+OP_WIDTH:0]  tuple_in_@EXTERN_NAME@_input_DATA,
  output                         tuple_out_@EXTERN_NAME@_output_VALID,
  output [REG_WIDTH-1:0]         tuple_out_@EXTERN_NAME@_output_DATA
);

// wire and reg declarations
wire valid, stateful_valid, opcode;
wire [REG_WIDTH-1:0] newVal;
reg  [REG_WIDTH-1:0] state, state_next;
reg  valid_out;

// decoding the inputs
assign valid = tuple_in_@EXTERN_NAME@_input_VALID;
assign {stateful_valid, newVal, opcode}
                  = tuple_in_@EXTERN_NAME@_input_DATA;

// next state logic
always @(*)
    if (valid & stateful_valid & opcode)
        state_next = newVal;
    else
        state_next = state;

// state update / output logic
always @(posedge clk_lookup)
    if (rst) begin
        state     <= 0;
        valid_out <= 0;
    end
    else begin
        state     <= state_next;
        valid_out <= valid;
    end

// wire up the outputs
assign tuple_out_@EXTERN_NAME@_output_VALID = valid_out;
assign tuple_out_@EXTERN_NAME@_output_DATA  = state;

endmodule
```

# Using our new extern: srw\_test.p4

```
// extern declaration
#define REG_READ  0
#define REG_WRITE 1
@Xilinx_MaxLatency(1)
@Xilinx_ControlWidth(0)
// 16-bit data here so the widths match etherType
extern void myReg_reg_srw(in bit<16> newVal,
                          in bit opCode,
                          out bit<16> result);

// match-action pipeline
control TopPipe(inout Parsed_packet p,
                inout user_metadata_t user_metadata,
                inout digest_data_t digest_data,
                inout sume_metadata_t sume_metadata) {
    apply {
        bit<16> newVal;
        bit opcode;
        if (p.ethernet.etherType > 10) {
            newVal = p.ethernet.etherType;
            opcode = REG_WRITE;
        } else {
            newVal = 0; // unused
            opcode = REG_READ;
        }
        myReg_reg_srw(newVal, opcode, p.ethernet.etherType);
    }
}
```

#### What we didn't cover

- Externs with control-plane interface
- BRAM based stateful extern
- Extern C++ implementations

## **AXI4 Stream Interface**

Standardized interface for streaming packets between modules




| AXI4-Stream | Description                         |
|-------------|-------------------------------------|
| TDATA       | Data Stream                         |
| TKEEP       | Marks NULL bytes (i.e. byte enable) |
| TVALID      | Valid Indication                    |
| TREADY      | Flow control indication             |
| TLAST       | Indicates final word of packet      |
| TUSER       | Out of band metadata                |

#### **AXI4-Stream Handshake**



- TVALID & TREADY → data is being transferred
- TVALID & TREADY & TLAST → the final word of the pkt is being transferred
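
The two rules above are enough to count words and packets from a signal trace. The Python sketch below samples `(TVALID, TREADY, TLAST)` once per clock edge; the function name `count_transfers` and the example trace are made up for illustration.

```python
def count_transfers(cycles):
    """cycles: iterable of (tvalid, tready, tlast) sampled once per clock edge."""
    words = packets = 0
    for tvalid, tready, tlast in cycles:
        if tvalid and tready:          # handshake: one word moves this cycle
            words += 1
            if tlast:                  # and it is the final word of a packet
                packets += 1
    return words, packets

trace = [
    (1, 1, 0),   # word transferred
    (1, 0, 0),   # sink not ready: source must hold its data
    (1, 1, 0),   # word transferred
    (0, 1, 0),   # source idle
    (1, 1, 1),   # last word of the first packet
    (1, 1, 1),   # a single-word packet
]
words, packets = count_transfers(trace)
print(words, packets)
```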

# **TUSER Bus for SimpleSumeSwitch**

```
/* standard sume switch metadata */
struct sume_metadata_t {
   bit<16> dma_q_size;
   bit<16> nf3_q_size;
   bit<16> nf2_q_size;
   bit<16> nf1_q_size;
   bit<16> nf0_q_size;
   bit<8> send_dig_to_cpu; // send_digest_data_to_CPU
   bit<8> dst_port; // one-hot_encoded
   bit<8> src_port; // one-hot_encoded
   bit<16> pkt_len; // unsigned_int
}
```
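
To pull one field out of the TUSER bus you need its bit position. The Python sketch below derives positions from the struct, assuming the fields are packed in declaration order with the last-declared field (`pkt_len`) in the least significant bits. That packing order is an assumption here, as are the helper names `field_positions` and `extract`.

```python
SUME_METADATA_FIELDS = [  # (name, width in bits), in declaration order
    ("dma_q_size", 16), ("nf3_q_size", 16), ("nf2_q_size", 16),
    ("nf1_q_size", 16), ("nf0_q_size", 16), ("send_dig_to_cpu", 8),
    ("dst_port", 8), ("src_port", 8), ("pkt_len", 16),
]

def field_positions(fields):
    """Map name -> (lsb, width), assuming the last-declared field sits at bit 0."""
    positions, lsb = {}, 0
    for name, width in reversed(fields):
        positions[name] = (lsb, width)
        lsb += width
    return positions

def extract(tuser, name, positions):
    """Return the value of one field from a packed TUSER integer."""
    lsb, width = positions[name]
    return (tuser >> lsb) & ((1 << width) - 1)

positions = field_positions(SUME_METADATA_FIELDS)
tuser = (0x04 << 24) | (0x01 << 16) | 64     # dst_port=0x04, src_port=0x01, pkt_len=64
print(positions["src_port"], extract(tuser, "src_port", positions))
```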

#### **HDL Ethernet Parser**

```
always @(*) begin
    // default values
    src_mac_w  = 0;
    dst_mac_w  = 0;
    eth_done_w = 0;
    src_port_w = 0;
    state_next = state;

    case (state)
        /* read the input source header and get the first word */
        READ_MAC_ADDRESSES: begin
            if (valid) begin
                src_port_w = tuser[SRC_PORT_POS+7:SRC_PORT_POS];
                dst_mac_w  = tdata[47:0];
                src_mac_w  = tdata[95:48];
                eth_done_w = 1;
                state_next = WAIT_EOP;
            end
        end // case: READ_MAC_ADDRESSES

        WAIT_EOP: begin
            if (valid && tlast)
                state_next = READ_MAC_ADDRESSES;
        end
    endcase // case(state)
end // always @(*)
```

```
always @(posedge clk) begin
    if (reset) begin
        src_port <= {NUM_QUEUES{1'b0}};
        dst_mac  <= 48'b0;
        src_mac  <= 48'b0;
        eth_done <= 0;
        state    <= READ_MAC_ADDRESSES;
    end
    else begin
        src_port <= src_port_w;
        dst_mac  <= dst_mac_w;
        src_mac  <= src_mac_w;
        eth_done <= eth_done_w;
        state    <= state_next;
    end
end
```
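
A software model of this parser is useful for generating expected values. The Python sketch below follows the same two states over a list of AXI4-Stream words. `SRC_PORT_POS = 16` is an assumed value, and the function name `parse` and the example MAC addresses are made up for illustration.

```python
READ_MAC_ADDRESSES, WAIT_EOP = 0, 1
SRC_PORT_POS = 16                      # assumed position of src_port within TUSER
MAC_MASK = (1 << 48) - 1

def parse(words):
    """words: list of (tdata, tuser, valid, tlast), one per clock cycle.

    Returns one (dst_mac, src_mac, src_port) tuple per packet, i.e. one per eth_done pulse.
    """
    state = READ_MAC_ADDRESSES
    headers = []
    for tdata, tuser, valid, tlast in words:
        if state == READ_MAC_ADDRESSES:
            if valid:
                dst_mac = tdata & MAC_MASK                  # tdata[47:0]
                src_mac = (tdata >> 48) & MAC_MASK          # tdata[95:48]
                src_port = (tuser >> SRC_PORT_POS) & 0xFF   # 8-bit slice of tuser
                headers.append((dst_mac, src_mac, src_port))
                state = WAIT_EOP
        elif state == WAIT_EOP:
            if valid and tlast:
                state = READ_MAC_ADDRESSES
    return headers

DST1, SRC1 = 0x0811_1111_1108, 0x0822_2222_2208
DST2, SRC2 = 0x0833_3333_3308, 0x0844_4444_4408
words = [
    ((SRC1 << 48) | DST1, 0x01 << 16, 1, 0),   # first word of packet 1
    (0xDEAD, 0x01 << 16, 1, 1),                # last word of packet 1
    (0, 0, 0, 0),                              # idle cycle
    ((SRC2 << 48) | DST2, 0x04 << 16, 1, 0),   # first word of packet 2
    (0xBEEF, 0x04 << 16, 1, 1),                # last word of packet 2
]
headers = parse(words)
print([tuple(hex(v) for v in h) for h in headers])
```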

# **Parser Comparison**

# Verilog

```
always @(*) begin
    src_mac_w  = 0;
    dst_mac_w  = 0;
    eth_done_w = 0;
    src_port_w = 0;
    state_next = state;

    case (state)
        /* read the input source header and get the first word */
        READ_MAC_ADDRESSES: begin
            if (valid) begin
                src_port_w = tuser[SRC_PORT_POS+7:SRC_PORT_POS];
                dst_mac_w  = tdata[47:0];
                src_mac_w  = tdata[95:48];
                eth_done_w = 1;
                state_next = WAIT_EOP;
            end
        end // case: READ_MAC_ADDRESSES

        WAIT_EOP: begin
            if (valid && tlast)
                state_next = READ_MAC_ADDRESSES;
        end
    endcase // case(state)
end // always @(*)

always @(posedge clk) begin
    if (reset) begin
        src_port <= {NUM_QUEUES{1'b0}};
        dst_mac  <= 48'b0;
        src_mac  <= 48'b0;
        eth_done <= 0;
        state    <= READ_MAC_ADDRESSES;
    end
    else begin
        src_port <= src_port_w;
        dst_mac  <= dst_mac_w;
        src_mac  <= src_mac_w;
        eth_done <= eth_done_w;
        state    <= state_next;
    end // else: !if(reset)
end // always @(posedge clk)
```

# **P4**

# FIN

#### P4-NetFPGA Workflow

1. Write P4 program
   - All of your effort will go here
2. Write externs
3. Write python gen\_testdata.py script
4. Compile to Verilog / generate API & CLI tools
5. Run simulations
6. Build bitstream
7. Check implementation results
8. Test the hardware
