# HLS Project Instructions  
*Made by Anish & Bowen*

## Step 1 – HLS Code Development

1. **Write C/C++ HLS code for:**
   - `conv2d` (see: `HLS/HLS_Convolution/Include`)
   - `max_pooling` (2x2) (see: `HLS/HLS_Poolings/Include`)
   - `min_pooling` (2x2) (see: `HLS/HLS_Poolings/Include`)
   - `avg_pooling` (2x2) (see: `HLS/HLS_Poolings/Include`)

2. **Test your code locally in Vitis HLS by running C simulation with a `tb.cpp` testbench.**

3. **Synthesize** the design in Vitis HLS.

4. **Export** the RTL output (Verilog/VHDL) as a packaged IP so it can be added to the Vivado block design in Step 2.
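
When writing `tb.cpp`, it can help to have a software reference to compare the HLS output against. The sketch below is one possible NumPy reference for the four functions listed in item 1. The names `pool2x2` and `conv2d_same` are hypothetical, and the arithmetic choices (zero padding, no kernel flip, clipping to 0..255, truncating integer average) are assumptions that should be adjusted to match whatever the HLS code actually does.

```python
import numpy as np


def pool2x2(img, mode):
    """Non-overlapping 2x2 pooling on a 2D uint8 image; mode is 'max', 'min' or 'avg'."""
    h, w = img.shape
    # Drop a trailing odd row/column, then group pixels into 2x2 blocks.
    blocks = img[: h - h % 2, : w - w % 2].reshape(h // 2, 2, w // 2, 2)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "min":
        return blocks.min(axis=(1, 3))
    if mode == "avg":
        # Assumption: integer average with truncation (sum // 4).
        return (blocks.astype(np.uint16).sum(axis=(1, 3)) // 4).astype(np.uint8)
    raise ValueError(f"unknown mode: {mode}")


def conv2d_same(img, kernel):
    """Zero-padded, same-size 2D correlation with an odd-sized integer kernel."""
    kernel = np.asarray(kernel, dtype=np.int32)
    kh, kw = kernel.shape
    h, w = img.shape
    padded = np.pad(img.astype(np.int32), ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((h, w), dtype=np.int32)
    # Accumulate one shifted copy of the image per kernel tap.
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy : dy + h, dx : dx + w]
    # Assumption: results are saturated to the uint8 range.
    return np.clip(out, 0, 255).astype(np.uint8)
```

The same functions can be reused later on the board to check the buffers read back from the IP, for example with `np.array_equal`.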

---

## Step 2 – Vivado Block Design

1. Open Vivado.
2. Create a new block design and add your HLS IP cores.
3. Connect the following components (use Vivado's connection automation where it is offered):
   - Zynq Processing System (PS)
   - AXI interfaces
   - DMA (Direct Memory Access)
4. Generate the bitstream.
5. Export the PYNQ overlay (`.xsa` file).

---

## Step 3 – Set Up PYNQ

1. Download the appropriate PYNQ image for your board from [PYNQ Boards](https://www.pynq.io/boards.html).
2. Flash the image onto an SD card and insert it into the PYNQ board.
3. Connect the PYNQ board to your computer via USB or Ethernet.
4. Use a terminal program (e.g., PuTTY) to access the PYNQ board's command line.
5. Follow the setup instructions here to ensure your board is connected and accessible: [Nengo PYNQ Setup Guide](https://www.nengo.ai/nengo-pynq/connect.html#via-a-computer).
6. On the PYNQ terminal, run `ip a` to find the board's IP address.
7. Enter the IP address in your web browser to access the Jupyter Notebook interface on the PYNQ board.

---

## Step 4 – Python on PYNQ

1. Write Python code for the PYNQ-Z2 board to:
   - Load the overlay
   - Perform DMA transfers
   - Start the HLS IP cores
   - Read back results
   - Process input and save the output
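
The "start the IP" and "read back" steps usually come down to writing buffer addresses into the IP's AXI-Lite registers, setting the start bit, and polling the done bit. The sketch below wraps that pattern in two hypothetical helpers, `split_address` and `run_ip`. It assumes the standard HLS control register layout (bit 0 = `ap_start`, bit 1 = `ap_done`) and that each pointer argument occupies two consecutive 32-bit registers, as in the examples further down. The `ip` argument only needs `read(offset)` and `write(offset, value)` methods, which PYNQ's IP driver objects provide.

```python
import time

AP_START = 0x01  # bit 0 of the control register
AP_DONE = 0x02   # bit 1 of the control register


def split_address(addr):
    """Split a physical address into (low, high) 32-bit words."""
    return addr & 0xFFFFFFFF, (addr >> 32) & 0xFFFFFFFF


def run_ip(ip, pointer_args, ctrl_offset=0x00, timeout_s=1.0):
    """Write pointer arguments, start the IP, and poll for completion.

    pointer_args maps the low-word register offset of each pointer
    argument to the physical address to store there.
    Returns True if ap_done was seen before the timeout, else False.
    """
    for offset, addr in pointer_args.items():
        low, high = split_address(addr)
        ip.write(offset, low)
        ip.write(offset + 4, high)
    ip.write(ctrl_offset, AP_START)

    # Poll against a wall-clock deadline so a stuck IP cannot hang the notebook.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if ip.read(ctrl_offset) & AP_DONE:
            return True
    return False
```

With the register offsets used in the convolution example, a call might look like `run_ip(conv, {0x10: in_buf.physical_address, 0x1C: out_buf.physical_address})`. The offsets themselves must match the register map Vitis HLS generated for your IP.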

---

## Example: Running Convolution from Python

```python
from pynq import Overlay, allocate
import numpy as np
from PIL import Image
import os

IMG_WIDTH = 128
IMG_HEIGHT = 128
INPUT_DIR = "images"
OUTPUT_DIR = "results"

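# Overlay() already programs the FPGA by default; the explicit
# download() only makes that step visible.
overlay = Overlay("Convolution.xsa")
overlay.download()
conv = overlay.convolution_0
os.makedirs(OUTPUT_DIR, exist_ok=True)

# Allocate the physically contiguous buffers once and reuse them for
# every image instead of allocating new ones on each iteration.
in_buf = allocate(shape=(IMG_HEIGHT, IMG_WIDTH), dtype=np.uint8)
out_buf = allocate(shape=(IMG_HEIGHT, IMG_WIDTH), dtype=np.uint8)
in_addr = int(in_buf.physical_address)
out_addr = int(out_buf.physical_address)

for i in range(1, 11):
    img_path = f"{INPUT_DIR}/input{i}.png"
    out_path = f"{OUTPUT_DIR}/output{i}.png"

    # Load as 8-bit grayscale at the size the IP expects.
    img = Image.open(img_path).convert("L").resize((IMG_WIDTH, IMG_HEIGHT))
    img_np = np.array(img, dtype=np.uint8)
    np.copyto(in_buf, img_np)
    in_buf.sync_to_device()  # make sure the IP sees the new pixels

    # Pointer arguments: low 32 bits, then high 32 bits.
    # The offsets must match the register map generated by Vitis HLS.
    conv.write(0x10, in_addr & 0xFFFFFFFF)
    conv.write(0x14, (in_addr >> 32) & 0xFFFFFFFF)
    conv.write(0x1C, out_addr & 0xFFFFFFFF)
    conv.write(0x20, (out_addr >> 32) & 0xFFFFFFFF)
    conv.write(0x00, 0x01)  # set ap_start

    # Busy-wait until ap_done (bit 1 of the control register) is set.
    while (conv.read(0x00) & 0x2) == 0:
        pass

    out_buf.sync_from_device()  # make sure the CPU sees the IP's output
    out_img = Image.fromarray(out_buf).convert("L")
    out_img.save(out_path)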
    print(f"Saved {out_path}")
```

## Example: Running Pooling from Python

```python
from pynq import Overlay, allocate
import numpy as np
from PIL import Image
import os

IMG_WIDTH = 64
IMG_HEIGHT = 64
POOL_WIDTH = 32
POOL_HEIGHT = 32
UPSCALE_FACTOR = 8
INPUT_DIR = "images"
OUTPUT_DIR = "results"

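overlay = Overlay("Pooling.xsa")
overlay.download()
# The attribute name must match the IP instance name in the block design.
poll = overlay.pollings_0
os.makedirs(OUTPUT_DIR, exist_ok=True)


def upscale(buf):
    """Enlarge every pixel to an UPSCALE_FACTOR x UPSCALE_FACTOR block for viewing."""
    return np.kron(buf, np.ones((UPSCALE_FACTOR, UPSCALE_FACTOR), dtype=np.uint8))


# Allocate the physically contiguous buffers once and reuse them for
# every image instead of allocating new ones on each iteration.
in_buf = allocate(shape=(IMG_HEIGHT, IMG_WIDTH), dtype=np.uint8)
max_buf = allocate(shape=(POOL_HEIGHT, POOL_WIDTH), dtype=np.uint8)
min_buf = allocate(shape=(POOL_HEIGHT, POOL_WIDTH), dtype=np.uint8)
avg_buf = allocate(shape=(POOL_HEIGHT, POOL_WIDTH), dtype=np.uint8)

for i in range(1, 11):
    img_path = f"{INPUT_DIR}/input{i}.png"
    out_max = f"{OUTPUT_DIR}/max_pool_{i}.png"
    out_min = f"{OUTPUT_DIR}/min_pool_{i}.png"
    out_avg = f"{OUTPUT_DIR}/avg_pool_{i}.png"

    # Load as 8-bit grayscale at the size the IP expects.
    img = Image.open(img_path).convert("L").resize((IMG_WIDTH, IMG_HEIGHT))
    img_np = np.array(img, dtype=np.uint8)
    np.copyto(in_buf, img_np)
    in_buf.sync_to_device()  # make sure the IP sees the new pixels

    # Pointer arguments: low 32 bits, then high 32 bits.
    # The offsets must match the register map generated by Vitis HLS.
    poll.write(0x10, in_buf.physical_address & 0xFFFFFFFF)
    poll.write(0x14, (in_buf.physical_address >> 32) & 0xFFFFFFFF)
    poll.write(0x1C, max_buf.physical_address & 0xFFFFFFFF)
    poll.write(0x20, (max_buf.physical_address >> 32) & 0xFFFFFFFF)
    poll.write(0x28, min_buf.physical_address & 0xFFFFFFFF)
    poll.write(0x2C, (min_buf.physical_address >> 32) & 0xFFFFFFFF)
    poll.write(0x34, avg_buf.physical_address & 0xFFFFFFFF)
    poll.write(0x38, (avg_buf.physical_address >> 32) & 0xFFFFFFFF)
    poll.write(0x00, 0x01)  # set ap_start

    # Poll ap_done (bit 1) with a bounded number of attempts.
    timeout = 1000000
    while (poll.read(0x00) & 0x2) == 0 and timeout > 0:
        timeout -= 1

    if timeout == 0:
        print("Timeout waiting for IP to finish.")
        continue

    # Make sure the CPU sees the IP's output before reading the buffers.
    for buf in (max_buf, min_buf, avg_buf):
        buf.sync_from_device()

    Image.fromarray(upscale(max_buf)).save(out_max)
    Image.fromarray(upscale(min_buf)).save(out_min)
    Image.fromarray(upscale(avg_buf)).save(out_avg)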
    print(f"Saved: {out_max}, {out_min}, {out_avg}")
```