# Pattern matching kernel

This notebook shows you how to use the *pattern matching kernel*.
The design provides network connectivity using User Datagram Protocol (UDP) as the transport protocol.

Let's have a look at the *project* design.

The design has four kernels:
* CMAC: translates between the physical signals and the AXI4-Stream interface
* Network layer: bridges raw Ethernet packets and the application, using UDP as the transport layer
    * ARP provides translation between MAC and IP addresses
    * ICMP provides ping capabilities
    * The UDP module has a 16-entry table with socket information that needs to be filled in before running
* krnl_proj: free-running kernel that matches the input stream against the known patterns
* krnl_s2mm: reads data from the stream and copies it to memory


## FPGA Reset

In [1]:
# Run this cell to reset the FPGA if it is not taking programming or is otherwise misbehaving
!xbutil reset --device 0000:02:00.1 --force

Performing 'HOT Reset' on '0000:02:00.1'
Are you sure you wish to proceed? [Y/n]: Y (Force override)
Successfully reset Device[0000:02:00.1]


## Import packages and program FPGA
In this section we import `pynq` and the other Python packages used in the rest of this notebook. We also import the `vnx_utils.py` file, which contains helper functions to set up the vnx examples.

In [2]:
import pynq
import numpy as np
import vnx_utils
import socket
import time
import threading
import struct
import os
import re

If there is more than one Alveo card on the host, we also need to select the device to use. First, let's check how many devices are available.

In [3]:
for i in range(len(pynq.Device.devices)):
    print("{}) {}".format(i, pynq.Device.devices[i].name))

0) xilinx_u55c_gen3x16_xdma_base_3
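
With several cards installed, selecting the device by name can be less fragile than hard-coding an index. The sketch below is only an illustration: `pick_device_index` is a hypothetical helper (it is not part of `pynq`) that operates on a plain list of device names such as the one printed above.

```python
def pick_device_index(names, wanted):
    """Return the index of the single device whose name contains `wanted`.

    Hypothetical helper: `names` is a list of strings, for example the
    `.name` of each entry in `pynq.Device.devices`.
    """
    hits = [i for i, name in enumerate(names) if wanted in name]
    if len(hits) != 1:
        raise LookupError(
            f"expected exactly one device matching {wanted!r}, found {len(hits)}"
        )
    return hits[0]


# Device list copied from the output of the cell above
names = ["xilinx_u55c_gen3x16_xdma_base_3"]
idx = pick_device_index(names, "u55c")
print(idx)
```

The resulting index could then be used in place of the literal `0` when indexing `pynq.Device.devices`.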


In [4]:
currentDevice = pynq.Device.devices[0]
xclbin = "../project.intf0.xilinx_u55c_gen3x16_xdma_3_202210_1/vnx_project_if0.xclbin"
ol = pynq.Overlay(xclbin,device=currentDevice)
network_layer = ol.networklayer_0

## Network configuration

In [5]:

# Define IPs
alveo_ipaddr = '192.168.100.2'
sw_ip = '192.168.100.1'

print(f"Configuring Alveo IP: {alveo_ipaddr}")

# 1. Set Alveo IP Address
print(network_layer.set_ip_address(alveo_ipaddr, debug=True))

# 2. Configure Sockets
print("Configuring Sockets...")
network_layer.sockets[0] = (sw_ip, 50446, 60133, True)
network_layer.sockets[1] = (sw_ip, 38746, 62781, True)

# 3. Populate the socket table
network_layer.populate_socket_table(debug=True)

# 4. ARP Discovery (Broadcast to ensure the Host sees us)
network_layer.arp_discovery()

SW_PORT = ol.networklayer_0.sockets[1]['theirPort']
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) # UDP
sock.bind(('', SW_PORT))
alveo_port = ol.networklayer_0.sockets[1]['myPort']

# --- WARM-UP ---
print("Sending warm-up bursts to lock ARP...", end="")
sock.sendto(b'\x00' * 64, (alveo_ipaddr, alveo_port))
time.sleep(0.1)
print(" Done.")
time.sleep(1.0) # Give the network a second to settle
# -------------------------------

print("Network Configured.")

Configuring Alveo IP: 192.168.100.2
{'HWaddr': '00:0a:35:02:9d:02', 'inet addr': '192.168.100.2', 'gateway addr': '192.168.100.1', 'Mask': '255.255.255.0'}
Configuring Sockets...
Sending warm-up bursts to lock ARP... Done.
Network Configured.
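
To make the socket entries easier to reason about, here is a small host-side model of the table. It is a sketch under assumptions: the field names `their_ip`, `their_port`, `my_port` and `valid` mirror the order of the tuples assigned above, and `validate_table` is a hypothetical check that does not exist in `vnx_utils`. The 16-entry limit comes from the kernel description at the top of this notebook.

```python
from dataclasses import dataclass
import ipaddress

MAX_SOCKETS = 16  # table size given in the design overview


@dataclass(frozen=True)
class SocketEntry:
    # Field names are illustrative; they follow the tuple order used above
    their_ip: str
    their_port: int
    my_port: int
    valid: bool = True


def validate_table(entries):
    """Raise ValueError if the table is too large or an entry is malformed."""
    if len(entries) > MAX_SOCKETS:
        raise ValueError(f"at most {MAX_SOCKETS} sockets are supported")
    for entry in entries:
        ipaddress.ip_address(entry.their_ip)  # raises ValueError if malformed
        for port in (entry.their_port, entry.my_port):
            if not 0 < port < 65536:
                raise ValueError(f"port out of range: {port}")
    return True


table = [
    SocketEntry("192.168.100.1", 50446, 60133),
    SocketEntry("192.168.100.1", 38746, 62781),
]
print(validate_table(table))
```

In the cell above, the host binds to the `theirPort` of entry 1 and sends to its `myPort`, so those are the two values that must agree between the host socket and the table.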


## Pattern Parsing

In [6]:

def parse_patterns_header(file_path):
    patterns_db = {}
    if not os.path.exists(file_path):
        print("Error: patterns.h not found.")
        return {}

    with open(file_path, 'r') as f:
        content = f.read()

    re_data = re.search(r'const unsigned char PATTERN_DATA\[\d+\]\[\d+\]\s*=\s*\{(.*?)\};', content, re.DOTALL)
    re_len = re.search(r'const int PATTERN_LENGTHS\[\d+\]\[\d+\]\s*=\s*\{(.*?)\};', content, re.DOTALL)
    re_counts = re.search(r'const int NUM_PATTERNS_MATRIX\[\d+\]\s*=\s*\{([^}]+)\};', content)
    re_offsets = re.search(r'const int PATTERN_OFFSETS\[\d+\]\[\d+\]\s*=\s*\{(.*?)\};', content, re.DOTALL)

    if not (re_data and re_len and re_counts and re_offsets):
        return {}

    def parse_c_array(raw_str):
        rows = raw_str.split('},')
        matrix = []
        for row in rows:
            clean_row = row.replace('{', '').replace('}', '').strip()
            if clean_row:
                items = []
                for x in clean_row.split(','):
                    x = x.strip()
                    if not x: continue
                    try:
                        val = int(x, 16) if x.startswith('0x') else int(x)
                        items.append(val)
                    except ValueError: continue
                matrix.append(items)
        return matrix

    data_matrix = parse_c_array(re_data.group(1))
    len_matrix = parse_c_array(re_len.group(1))
    offset_matrix = parse_c_array(re_offsets.group(1))
    counts = [int(x.strip()) for x in re_counts.group(1).split(',')]

    global_id_offsets = [0] * len(counts)
    for i in range(1, len(counts)):
        global_id_offsets[i] = global_id_offsets[i-1] + counts[i-1]

    for n in range(min(len(counts), len(data_matrix))):
        for p in range(counts[n]):
            if n >= len(len_matrix) or p >= len(len_matrix[n]): continue
            p_len = len_matrix[n][p]
            p_start = offset_matrix[n][p]
            if p_len > 0 and (p_start + p_len) <= len(data_matrix[n]):
                pat_bytes = data_matrix[n][p_start : p_start + p_len]
                patterns_db[p + global_id_offsets[n]] = pat_bytes

    return patterns_db

# Load patterns
patterns_map = parse_patterns_header("../Project_kernels_HLS/src/patterns.h") 
print(f"Loaded {len(patterns_map)} patterns.")

Loaded 2662 patterns.
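
The layout that the parser expects can be illustrated on a tiny, made-up header. The sample below is an assumption inferred from the regular expressions and indexing in `parse_patterns_header`; it is not an excerpt from the real `patterns.h`. Each row of `PATTERN_DATA` is a flat byte array, `PATTERN_OFFSETS` and `PATTERN_LENGTHS` locate each pattern inside its row, and the global ID of a pattern is its index within the row plus the number of patterns in all previous rows.

```python
import re

# Made-up header with the same array names the parser searches for
SAMPLE = """
const unsigned char PATTERN_DATA[2][4] = {
    {0x41, 0x42, 0x43, 0x00},
    {0x10, 0x20, 0x00, 0x00}
};
const int PATTERN_LENGTHS[2][2] = { {2, 1}, {2, 0} };
const int PATTERN_OFFSETS[2][2] = { {0, 2}, {0, 0} };
const int NUM_PATTERNS_MATRIX[2] = {2, 1};
"""


def rows_of(name, text):
    """Parse a 2-D C initializer into a list of integer rows."""
    body = re.search(name + r"\[\d+\]\[\d+\]\s*=\s*\{(.*?)\};", text, re.DOTALL).group(1)
    return [
        [int(tok, 0) for tok in re.findall(r"0x[0-9a-fA-F]+|\d+", row)]
        for row in re.findall(r"\{([^{}]*)\}", body)
    ]


data = rows_of("PATTERN_DATA", SAMPLE)
lengths = rows_of("PATTERN_LENGTHS", SAMPLE)
offsets = rows_of("PATTERN_OFFSETS", SAMPLE)
counts_raw = re.search(r"NUM_PATTERNS_MATRIX\[\d+\]\s*=\s*\{([^}]+)\};", SAMPLE).group(1)
counts = [int(x) for x in counts_raw.split(",")]

db = {}
base = 0  # number of patterns in all previous rows
for n, count in enumerate(counts):
    for p in range(count):
        start, length = offsets[n][p], lengths[n][p]
        db[base + p] = data[n][start:start + length]
    base += count

print(db)  # {0: [65, 66], 1: [67], 2: [16, 32]}
```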


## Payload Generation

In [7]:
KERNEL_DWIDTH_BITS = 32  

# Derived Constants
BYTES_PER_BEAT = KERNEL_DWIDTH_BITS // 8

# Mapping bit width to numpy types for verification
DTYPE_MAP = {
    8:  np.uint8,
    16: np.uint16,
    32: np.uint32,
    64: np.uint64
}
KERNEL_DTYPE = DTYPE_MAP[KERNEL_DWIDTH_BITS]

print(f"Parametric Config: DWIDTH={KERNEL_DWIDTH_BITS} ({BYTES_PER_BEAT} Bytes/Beat)")
print(f"Verification Type: {KERNEL_DTYPE}")

def generate_host_payloads(patterns, num_packets=100, payload_size=1024):
    packets = []
    expected_map = {}
    keys = list(patterns.keys())
    
    # We track the global byte count because S2MM writes a continuous stream
    # derived from the incoming packets.
    global_byte_counter = 0
    
    for i in range(num_packets):
        data = bytearray(payload_size)
        
        if keys:
            pid = np.random.choice(keys)
            pat = patterns[pid]
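            # Embed the pattern only if it fits after the 64-byte offset
            if len(pat) < (payload_size - 64):
                offset = 64
                data[offset : offset+len(pat)] = pat

                # 1. Index of the pattern's last byte within this packet
                match_byte_local = offset + len(pat) - 1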
                
                # 2. Global Byte index in the entire stream
                match_byte_global = global_byte_counter + match_byte_local
                
                beat = match_byte_global // BYTES_PER_BEAT
                
                expected_map[beat] = pid
        
        packets.append(data)
        global_byte_counter += payload_size
        
    # Calculate total beats for buffer allocation
    # (Total Bytes / Bytes_Per_Beat)
    total_beats = global_byte_counter // BYTES_PER_BEAT
        
    return packets, expected_map, total_beats

# Generate
PAYLOAD_SIZE = 1024
NUM_PACKETS = 100
packet_list, expected_map, total_beats = generate_host_payloads(patterns_map, NUM_PACKETS, PAYLOAD_SIZE)

print(f"Generated {len(packet_list)} packets.")
print(f"Total Expected Beats: {total_beats}")

Parametric Config: DWIDTH=32 (4 Bytes/Beat)
Verification Type: <class 'numpy.uint32'>
Generated 100 packets.
Total Expected Beats: 25600
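
The expected beat for each pattern follows directly from the byte arithmetic in `generate_host_payloads`. The helper below, `expected_beat`, is a hypothetical restatement of that arithmetic for a single packet, assuming every earlier packet has the same payload size; the default values are the ones used in this notebook.

```python
def expected_beat(packet_index, pattern_len, payload_size=1024, offset=64, bytes_per_beat=4):
    """Beat that contains the last byte of a pattern placed at `offset`."""
    last_byte_local = offset + pattern_len - 1
    last_byte_global = packet_index * payload_size + last_byte_local
    return last_byte_global // bytes_per_beat


# A 10-byte pattern ends at byte 73 of its packet
print(expected_beat(0, 10))  # 73 // 4 = 18
print(expected_beat(3, 10))  # (3 * 1024 + 73) // 4 = 786

# Total beats for the whole run: 100 packets of 1024 bytes, 4 bytes per beat
print(100 * 1024 // 4)  # 25600
```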


## Loopback configuration

In [8]:
print_lock = threading.Lock()
done = threading.Event()

# Global variable to store what the thread captures
captured_results = None

def socket_receive_threaded(sock, size): 
    global captured_results
    
    BYTES_PER_PACKET = 1024 
    
    shape_global = (size,)
    shape_local = (BYTES_PER_PACKET,)
    
    recv_data_global = np.empty(shape_global, dtype = np.uint8)
    data_partial = np.empty(shape_local, dtype = np.uint8)
    
    num_it = (size // BYTES_PER_PACKET)
    
    sum_bytes = 0
    connection = 'None'
    
    # Set timeout so we don't hang forever if packets are lost
    sock.settimeout(5.0)
    
    try:
        for m in range(num_it):
            # recvfrom_into writes directly into data_partial
            res = sock.recvfrom_into(data_partial) 
            
            # Copy partial buffer to global buffer
            start_idx = m * BYTES_PER_PACKET
            end_idx = start_idx + BYTES_PER_PACKET
            
            # Safety check for last packet
            if end_idx > size: end_idx = size
                
            recv_data_global[start_idx : end_idx] = data_partial[:(end_idx-start_idx)]
            
            sum_bytes = sum_bytes + int(res[0])
            connection = res[1]
            
        msg = "SUCCESS" 
    except socket.timeout:
        msg = "TIMEOUT"
    except Exception as e:
        msg = f"ERROR: {e}"

    # Export data for the verification cell
    captured_results = recv_data_global

    with print_lock:
        print("\n[Thread] Reception finished. Status: {}. Total received {:,} bytes from {}".format(msg, sum_bytes, connection))
    
    done.set()

print("Receiver Thread Defined.")

Receiver Thread Defined.
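
The receiver copies each datagram from a temporary buffer into the global one. An alternative, sketched below, is to receive straight into a slice of the preallocated buffer through a `memoryview`, which avoids the intermediate copy. This is an illustration rather than a drop-in replacement: it uses a local `AF_UNIX` datagram socket pair (POSIX only) as a stand-in for the UDP link to the Alveo card, and the packet size and count are made-up small values.

```python
import socket

PACKET = 8  # bytes per datagram (illustrative, the notebook uses 1024)
COUNT = 4   # number of datagrams to receive

# Local datagram pair standing in for the host <-> FPGA UDP link
tx, rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
rx.settimeout(1.0)  # do not block forever if a datagram is missing

buf = bytearray(PACKET * COUNT)
view = memoryview(buf)

for i in range(COUNT):
    tx.send(bytes([i]) * PACKET)

received = 0
try:
    for i in range(COUNT):
        # Each datagram lands directly in its slot of the global buffer
        received += rx.recv_into(view[i * PACKET:(i + 1) * PACKET])
    status = "SUCCESS"
except socket.timeout:
    status = "TIMEOUT"
finally:
    tx.close()
    rx.close()

print(status, received)
```

As in the notebook's receiver, each datagram is placed by arrival order, so a lost packet shifts everything after it by one slot.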


## Execution

In [9]:

# 1. Setup Sockets
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(('', SW_PORT)) # Bind to Host Port
sock.settimeout(5.0)

# 2. Reset Threading
print_lock = threading.Lock()
done = threading.Event()
captured_results = None 
done.clear()

# 3. Start Receiver Thread
expected_rx_bytes = total_beats * (KERNEL_DWIDTH_BITS // 8)
t = threading.Thread(target=socket_receive_threaded, args=(sock, expected_rx_bytes))
t.start()


# 4. Start Main Transmission
print(f"Sending {len(packet_list)} packets to {alveo_ipaddr}:{alveo_port}...")
sent_bytes = 0

for pkt in packet_list:
    sock.sendto(pkt, (alveo_ipaddr, alveo_port))
    sent_bytes += len(pkt)
    time.sleep(0.0005) # Pace the sender so the receiver thread can keep up

print(f"Sent {sent_bytes} bytes. Waiting for return traffic...")

# 5. Wait for Completion
is_done = done.wait(timeout=10)

if is_done:
    print("Main: Thread signaled completion.")
else:
    print("Main: Timed out (Likely packet loss).")

# Cleanup
sock.close()

Sending 100 packets to 192.168.100.2:62781...

[Thread] Reception finished. Status: SUCCESS. Total received 102,400 bytes from ('192.168.100.2', 62781)
Sent 102400 bytes. Waiting for return traffic...
Main: Thread signaled completion.
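
The fixed sleep between packets puts a ceiling on the send rate. The rough calculation below uses a hypothetical helper, `pacing_bounds`, and counts payload bytes only: it ignores protocol headers and the time spent inside `sendto`, and `time.sleep` may sleep longer than requested, so the real rate will be lower.

```python
def pacing_bounds(num_packets, payload_size, gap_s):
    """Return (minimum run time in seconds, maximum payload rate in bit/s)."""
    min_duration_s = num_packets * gap_s
    max_rate_bps = payload_size * 8 / gap_s
    return min_duration_s, max_rate_bps


duration, rate = pacing_bounds(num_packets=100, payload_size=1024, gap_s=0.0005)
# Roughly 0.05 s for the run and about 16.4 Mbit/s of payload at most
print(f"{duration:.3f} s, {rate / 1e6:.1f} Mbit/s")
```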


## Verification

In [10]:

if captured_results is None or len(captured_results) == 0:
    print("ERROR: No data captured.")
else:
    # 1. View Data
    output_beats = captured_results.view(KERNEL_DTYPE)
    
    # 2. Find Stream Alignment (Global Shift)
    # We look for the first expected pattern ID within a window of
    # +/- 5 packets around its expected beat (256 beats per packet for DWIDTH=32)
    packet_size_beats = PAYLOAD_SIZE // BYTES_PER_BEAT # 256 for DWIDTH=32
    
    global_shift = 0
    detected = False
    
    print("Synchronizing stream...")
    
    # Sort map to find the first few expected beats
    sorted_map = sorted(expected_map.items())
    first_beat, first_id = sorted_map[0]
    
    # Search for the first pattern in a wide range
    search_window = packet_size_beats * 5 # Look within +/- 5 packets
    
    for offset in range(-search_window, search_window + 1):
        check_idx = first_beat + offset
        if 0 <= check_idx < len(output_beats):
            if output_beats[check_idx] == first_id:
                global_shift = offset
                detected = True
                break
    
    if detected:
        print(f"Stream Locked! Global Shift: {global_shift} beats.")
        if abs(global_shift) == packet_size_beats:
            print(f" -> DIAGNOSIS: Exactly 1 packet ({global_shift} beats) was lost/shifted.")
    else:
        print("WARNING: Could not synchronize stream. Verification will likely fail.")

    # 3. Verify with Shift Applied
    matches = 0
    misses = 0
    
    print(f"\nVerifying {len(expected_map)} Patterns with shift {global_shift}...")

    for beat, exp_id in sorted_map:
        
        # Apply the detected global shift to our expectation
        aligned_beat = beat + global_shift
        
        found = False
        # Small window for local jitter
        local_window = 10 
        
        for w in range(-local_window, local_window + 1):
            idx = aligned_beat + w
            if 0 <= idx < len(output_beats):
                if output_beats[idx] == exp_id:
                    found = True
                    break
                    
        if found:
            matches += 1
        else:
            misses += 1
            if misses <= 5:
                val = output_beats[aligned_beat] if 0 <= aligned_beat < len(output_beats) else -1
                print(f"Missed {exp_id} at Expected Beat {beat} (Aligned {aligned_beat}). Found: {val}")

    print(f"\nMatches: {matches}, Misses: {misses}")
    
    if matches > 0 and misses == 0:
        print("TEST PASSED!")
    elif matches > 0:
        print("TEST PASSED with Packet Loss (Partial Data).")
    else:
        print("TEST FAILED.")

Synchronizing stream...
Stream Locked! Global Shift: 0 beats.

Verifying 100 Patterns with shift 0...

Matches: 100, Misses: 0
TEST PASSED!
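
The two-stage check (one global shift, then a small per-pattern window) can be tried on a toy stream without any hardware. The sketch below restates the idea on plain Python lists; `find_global_shift` and `count_matches` are hypothetical names, and the stream contents are made up.

```python
def find_global_shift(output, first_beat, first_id, window):
    """Offset at which `first_id` is found near `first_beat`, or None."""
    for offset in range(-window, window + 1):
        idx = first_beat + offset
        if 0 <= idx < len(output) and output[idx] == first_id:
            return offset
    return None


def count_matches(output, expected, shift, jitter):
    """Count expected IDs found within +/- jitter beats of their shifted position."""
    matches = 0
    for beat, pid in sorted(expected.items()):
        lo = max(0, beat + shift - jitter)
        hi = max(lo, min(len(output), beat + shift + jitter + 1))
        if pid in output[lo:hi]:
            matches += 1
    return matches, len(expected) - matches


# Toy stream: two detections, both delayed by 8 beats
expected = {5: 7, 20: 9}
output = [0] * 40
output[13] = 7
output[28] = 9

shift = find_global_shift(output, first_beat=5, first_id=7, window=16)
print(shift)                                              # 8
print(count_matches(output, expected, shift, jitter=2))   # (2, 0)
```

Like the cell above, the search returns the first matching offset it meets while scanning from the most negative one, so a pattern ID that repeats inside the window could lock onto the wrong position.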


In [None]:

def save_debug_report(filename, expected_map, output_buffer):
    print(f"Generating debug report: {filename} ...")
    
    
    # View the buffer as KERNEL_DTYPE so that each element is exactly one beat.
    # Viewing it with the wrong width would merge or split beats and give wrong IDs.
    output_data = output_buffer.view(KERNEL_DTYPE)
    
    with open(filename, "w") as f:
        # --- SECTION 1: EXPECTED VS ACTUAL ---
        f.write("=======================================================\n")
        f.write("SECTION 1: VERIFICATION (Expected vs Actual)\n")
        f.write("=======================================================\n")
        f.write(f"{'BEAT':<12} | {'EXPECTED ID':<12} | {'ACTUAL ID':<12} | {'STATUS'}\n")
        f.write("-" * 60 + "\n")
        
        matches = 0
        misses = 0
        
        # Sort by beat to keep it chronological
        for beat, exp_id in sorted(expected_map.items()):
            # Safety check for bounds
            if beat < len(output_data):
                act_id = output_data[beat]
            else:
                act_id = -1 # Out of bounds
            
            # Check for exact match
            status = "MATCH" if act_id == exp_id else "MISS"
            
            # Check for near-miss (shifted by +/- 4 beats)
            if status == "MISS":
                for offset in range(-4, 5):
                    check_idx = beat + offset
                    if 0 <= check_idx < len(output_data):
                        if output_data[check_idx] == exp_id:
                            status = f"SHIFTED ({offset:+d})"
                            break
            
            if status == "MATCH": 
                matches += 1
            else: 
                misses += 1
                
            f.write(f"{beat:<12} | {exp_id:<12} | {act_id:<12} | {status}\n")
            
        f.write("-" * 60 + "\n")
        f.write(f"SUMMARY: Matches: {matches}, Misses: {misses}, Total: {len(expected_map)}\n\n\n")

        # --- SECTION 2: RAW HARDWARE DETECTIONS ---
        f.write("=======================================================\n")
        f.write("SECTION 2: ALL HARDWARE DETECTIONS (Non-zero outputs)\n")
        f.write("=======================================================\n")
        f.write(f"{'BEAT':<12} | {'DETECTED ID'}\n")
        f.write("-" * 30 + "\n")
        
        # Scan entire buffer for any non-zero value
        hw_detections = np.nonzero(output_data)[0]
        
        if len(hw_detections) == 0:
            f.write("No patterns detected (Output is all zeros).\n")
        else:
            for beat in hw_detections:
                val = output_data[beat]
                f.write(f"{beat:<12} | {val}\n")
                
    print(f"Report saved. Open '{filename}' to analyze.")

# Run the export, passing the beat-aligned view of the captured data
save_debug_report("debug_results.txt", expected_map, output_beats)

Generating debug report: debug_results.txt ...
Report saved. Open 'debug_results.txt' to analyze.
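
Why the view width matters can be seen by decoding the same bytes at different widths. The sketch below uses `struct` from the standard library instead of NumPy and assumes little-endian words, which is what a native NumPy view gives on a little-endian host; whether the kernel's output follows that byte order is an assumption here. The sample values are made up.

```python
import struct

# Four 32-bit beats as they would sit in the byte buffer: IDs 513 and 7, rest zero
raw = struct.pack("<4I", 0, 513, 0, 7)

as_u32 = struct.unpack("<4I", raw)  # correct width: one value per beat
as_u16 = struct.unpack("<8H", raw)  # too narrow: every beat splits in two
as_u64 = struct.unpack("<2Q", raw)  # too wide: two beats merge into one number

print(as_u32)  # (0, 513, 0, 7)
print(as_u16)  # (0, 0, 513, 0, 0, 0, 7, 0)
print(as_u64)  # (513 << 32, 7 << 32)
```

With the wrong width, the IDs either appear at the wrong beat index or turn into very large numbers, which is the symptom the report function guards against by viewing the data as `KERNEL_DTYPE`.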
