# DeepSoCFlow PYNQ Driver Test Notebook

This notebook demonstrates how to use the PYNQ-based driver for the DeepSoCFlow CGRA accelerator. The driver provides a Python interface that replicates the bare-metal C runtime behavior, making it easier to test and validate the accelerator.

## Imports and Setup

In [None]:
import pynq
import numpy as np
import json
import os

from pynq_driver import DeepSoCFlowPYNQ

# Define paths to required files
NOTEBOOK_DIR = os.getcwd()
BITSTREAM_PATH = os.path.join(NOTEBOOK_DIR, 'design_1.bit')
CONFIG_PATH = os.path.join(NOTEBOOK_DIR, 'config.json')
WBX_PATH = os.path.join(NOTEBOOK_DIR, 'wbx.bin')
Y_EXP_PATH = os.path.join(NOTEBOOK_DIR, 'y_exp.txt')

# Verify all required files exist
assert os.path.exists(BITSTREAM_PATH), f"Bitstream not found at {BITSTREAM_PATH}"
assert os.path.exists(CONFIG_PATH), f"Config file not found at {CONFIG_PATH}"
assert os.path.exists(WBX_PATH), f"WBX file not found at {WBX_PATH}"
assert os.path.exists(Y_EXP_PATH), f"Expected output file not found at {Y_EXP_PATH}"

print("Setup complete. All necessary files found.")
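The `assert` statements above stop at the first missing file, and Python skips `assert` entirely when run with `-O`. A minimal alternative that reports every missing file at once is sketched below. `find_missing_files` is a hypothetical helper, not part of `pynq_driver`.

```python
import os
from typing import Dict, List


def find_missing_files(paths: Dict[str, str]) -> List[str]:
    """Return one message for every labelled path that does not exist."""
    return [
        f"{label} not found at {path}"
        for label, path in paths.items()
        if not os.path.exists(path)
    ]


# Usage with the paths defined in the setup cell (not run here):
# missing = find_missing_files({
#     "Bitstream": BITSTREAM_PATH,
#     "Config file": CONFIG_PATH,
#     "WBX file": WBX_PATH,
#     "Expected output file": Y_EXP_PATH,
# })
# if missing:
#     raise FileNotFoundError("\n".join(missing))
```

Collecting all failures before raising saves a round trip when several files are missing from the board's working directory.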

## Load the FPGA Overlay

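Load the bitstream onto the FPGA fabric. This configures the programmable logic with the DeepSoCFlow accelerator. PYNQ normally also reads the hardware handoff file with the same base name (`design_1.hwh`) from the same directory to build the overlay's IP dictionary, so that file must be present as well.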

In [None]:
print("Loading overlay...")
overlay = pynq.Overlay(BITSTREAM_PATH)
# Uncomment to list the IP blocks in the overlay: print(list(overlay.ip_dict.keys()))
print("Overlay loaded successfully.")

## Initialize the Driver

Create the PYNQ driver instance. This will:
- Locate the accelerator IP in the overlay
- Load the hardware configuration from `config.json`
- Allocate contiguous memory buffers matching the C runtime structure
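The driver locates the accelerator by its instance name in the block design (`axi_cgra4ml_0` in the next cell). If the bitstream was built with a different instance name, one way to find it is to search the keys of `overlay.ip_dict`. The sketch below assumes only that `ip_dict` is a dictionary keyed by IP instance name; `find_ip_names` is a hypothetical helper, and `fake_ip_dict` is a stand-in so the example runs without a board.

```python
from typing import Any, Dict, List


def find_ip_names(ip_dict: Dict[str, Dict[str, Any]], fragment: str) -> List[str]:
    """Return the IP instance names that contain `fragment` (case-insensitive)."""
    fragment = fragment.lower()
    return sorted(name for name in ip_dict if fragment in name.lower())


# Stand-in for overlay.ip_dict; only the keys matter for this search.
fake_ip_dict = {
    "axi_cgra4ml_0": {},
    "processing_system7_0": {},
}
print(find_ip_names(fake_ip_dict, "CGRA4ML"))  # ['axi_cgra4ml_0']

# On the board (not run here):
# print(find_ip_names(overlay.ip_dict, "cgra4ml"))
```

If the search returns more than one match, pick the instance that corresponds to the accelerator's AXI-Lite control interface.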

In [None]:
print("--- Initializing Driver ---")

ACCELERATOR_IP_NAME = 'axi_cgra4ml_0'

driver = DeepSoCFlowPYNQ(overlay, config_path=CONFIG_PATH, accelerator_ip_name=ACCELERATOR_IP_NAME)

print("\n--- Driver initialization complete! ---")

## Setup the Model

Load model weights, biases, and input data from the `wbx.bin` file, then configure the accelerator by:
- Pre-loading all bundle (layer) parameters into BRAM
- Configuring hardware registers with memory addresses
- Setting up control signals for PS-PL synchronization
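Configuring registers with memory addresses usually comes down to writing each buffer's physical address into an AXI-Lite register. The sketch below assumes the accelerator exposes 64-bit address registers split into low and high 32-bit words; the register offset and address are hypothetical, and the real register map is defined by the hardware and `config.json`. `FakeMMIO` only mimics the `read(offset)`/`write(offset, value)` methods of `pynq.MMIO` so the example runs without hardware. On the board, the address would typically come from the `physical_address` attribute of a buffer allocated with `pynq.allocate`.

```python
from typing import Dict


class FakeMMIO:
    """Stand-in for pynq.MMIO that records 32-bit writes in a dict."""

    def __init__(self) -> None:
        self.regs: Dict[int, int] = {}

    def write(self, offset: int, value: int) -> None:
        self.regs[offset] = value & 0xFFFFFFFF

    def read(self, offset: int) -> int:
        return self.regs.get(offset, 0)


def write_addr64(mmio, lo_offset: int, phys_addr: int) -> None:
    """Write a 64-bit address as two 32-bit words: low word at lo_offset,
    high word at lo_offset + 4 (assumed layout, check the real register map)."""
    mmio.write(lo_offset, phys_addr & 0xFFFFFFFF)
    mmio.write(lo_offset + 4, (phys_addr >> 32) & 0xFFFFFFFF)


# Hypothetical register offset and buffer address, for illustration only.
WEIGHTS_ADDR_REG = 0x10
mmio = FakeMMIO()
write_addr64(mmio, WEIGHTS_ADDR_REG, 0x8_1234_5678)
print(hex(mmio.read(0x10)), hex(mmio.read(0x14)))  # 0x12345678 0x8
```

On a 32-bit Zynq-7000 system the high word would simply be zero.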

In [None]:
print("\n--- Setting up Model ---")
driver.model_setup(wbx_path=WBX_PATH)
print("\n--- Model setup complete! ---")

## Run Inference and Validate Results

Run inference on the accelerator and compare its output with the expected results in `y_exp.txt`.

The driver will:
- Start the hardware accelerator
- Process each bundle (layer) sequentially
- Handle double-buffered OCM for PS-PL synchronization
- Apply post-processing (bias, activation, pooling, softmax)
- Return the final network output
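The post-processing step can be pictured with plain NumPy. The sketch below covers only bias addition, ReLU, and a numerically stable softmax on made-up values; the driver's actual post-processing (fixed-point scaling, the exact activation, pooling) is not shown and may differ. It also shows why the comparison cell loads `y_exp.txt` as `float32` when the last bundle applies softmax and as `int32` otherwise: the integer pipeline only becomes floating point at the softmax.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    """Element-wise max(x, 0); keeps the input dtype."""
    return np.maximum(x, 0)


def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    """Stable softmax: subtracting the row max before exp avoids overflow
    without changing the result."""
    z = logits.astype(np.float32)
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


# Toy integer accumulator outputs and bias for 4 output channels.
acc = np.array([[10, -3, 7, 0]], dtype=np.int32)
bias = np.array([1, 2, -8, 0], dtype=np.int32)

y = relu(acc + bias)   # bias add + ReLU, still int32: [[11, 0, 0, 0]]
probs = softmax(y)     # float32 probabilities summing to 1 per row
print(y, probs.argmax(axis=-1))
```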

In [None]:
print("\n--- Running Model Inference ---")

output = driver.model_run()

print("\n--- Model run complete! ---")
print(f"Output shape: {output.shape}")
print(f"Output dtype: {output.dtype}")
print(f"First 10 output values:\n{output[:10]}")

# Load expected output for comparison
print("\n--- Comparing Output with Expected Results ---")
last_bundle = driver.bundles[-1]
output_dtype = np.float32 if last_bundle['is_softmax'] else np.int32
expected_output = np.loadtxt(Y_EXP_PATH, dtype=output_dtype)

print(f"Expected output shape: {expected_output.shape}")
print(f"Expected output dtype: {expected_output.dtype}")
print(f"First 10 expected values:\n{expected_output[:10]}")

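# Flatten both arrays so element-wise indexing works whatever the output's
# rank, and check that they hold the same number of elements.
out_flat = np.asarray(output).ravel()
exp_flat = np.asarray(expected_output).ravel()
assert out_flat.size == exp_flat.size, \
    f"Size mismatch: output has {out_flat.size} elements, expected {exp_flat.size}"

# Compare outputs with tolerance
ATOL, RTOL = 1e-2, 1e-2
mismatch = ~np.isclose(out_flat, exp_flat, atol=ATOL, rtol=RTOL)
is_close = not mismatch.any()
print(f"\nOutputs match expected (within tolerance): {is_close}")

if not is_close:
    print("\n--- Differences Found ---")
    diff_idx = np.flatnonzero(mismatch)
    print(f"Number of mismatches: {diff_idx.size}")
    print("First 20 differences:")
    for idx in diff_idx[:20]:
        # Convert to Python floats so int32 subtraction cannot overflow
        pynq_val = float(out_flat[idx])
        exp_val = float(exp_flat[idx])
        diff = abs(pynq_val - exp_val)
        print(f"  Index {idx}: PYNQ={pynq_val:.6f}, Expected={exp_val:.6f}, Diff={diff:.6f}")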

print("\n--- Validation Complete ---")
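For a classifier, the element-wise tolerance check can flag small differences in low-probability entries even when the predicted class is unchanged. A complementary check compares the argmax per sample. The sketch below assumes the last axis of the output indexes classes (a 1-D output is treated as a single sample); `top1_agreement` is a hypothetical helper and the arrays are made-up values.

```python
import numpy as np


def top1_agreement(output: np.ndarray, expected: np.ndarray) -> float:
    """Fraction of samples whose argmax over the last axis matches."""
    out = np.atleast_2d(np.asarray(output))
    exp = np.atleast_2d(np.asarray(expected))
    if out.shape != exp.shape:
        raise ValueError(f"shape mismatch: {out.shape} vs {exp.shape}")
    return float(np.mean(out.argmax(axis=-1) == exp.argmax(axis=-1)))


# Toy example: two samples, three classes.
pynq_out = np.array([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]], dtype=np.float32)
expected = np.array([[0.1, 0.6, 0.3], [0.2, 0.3, 0.5]], dtype=np.float32)
print(top1_agreement(pynq_out, expected))  # 0.5
```

In the notebook this would be called as `top1_agreement(output, expected_output)`, reshaped first if the outputs are flattened across samples.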

## Cleanup

Release allocated resources and free the overlay.

In [None]:
print("\n--- Cleaning Up Resources ---")

# Release driver resources
if 'driver' in globals() and driver is not None:
    del driver

# Free the overlay
if 'overlay' in globals() and overlay is not None:
    overlay.free()

print("Cleanup complete.")