# Convolution Co-processor for ZYNQ7000 processing system

# Joey De Smet Sam Decorte

Faculty of Engineering Technology, KU Leuven - Bruges Campus Spoorwegstraat 12, 8200 Bruges, Belgium {joey.desmet, sam.decorte}@student.kuleuven.be

#### Abstract

Keywords— Co-Processor, SIMD

### I. INTRODUCTION

### II. IMPLEMENTATION



Fig. 1. Overview interconnect architecture

### III. PERFORMANCE ANALYSIS

In this section, we evaluate the performance of the proposed convolution co-processor. Metrics include processing throughput, latency, resource utilization, and energy efficiency. Comparisons are made with a reference CPU-only implementation on the ZYNQ7000 processing system.

#### A. Experimental Setup

The experiments were performed on a Digilent ZedBoard development board with the following specifications:

- Processing System: Dual-core ARM Cortex-A9, 667 MHz
- FPGA: XC7Z020 (Artix-7), 53k LUTs, 106k FFs, 4.9 Mb BRAM
- Clock frequency of co-processor: 100 MHz
- Test images: resolution  $640 \times 480$ , 32-bit RGBA
- Convolution kernel:  $3 \times 3$
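
As a quick sanity check on the test image above, the pixel count and frame size follow directly from the listed resolution and pixel format (4 bytes per 32-bit RGBA pixel). The symbols $N_{\text{pixels}}$ and $S_{\text{frame}}$ are introduced here for illustration only:

$$N_{\text{pixels}} = 640 \times 480 = 307\,200, \qquad S_{\text{frame}} = N_{\text{pixels}} \times 4\ \text{B} = 1\,228\,800\ \text{B} \approx 1.17\ \text{MiB}$$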

#### B. Latency and Throughput

The latency  $T_{\rm latency}$  of the co-processor is measured as the time between issuing a convolution request and receiving the processed data:

$$T_{\text{latency}} = T_{\text{transfer}} + T_{\text{compute}} + T_{\text{response}}$$
 (1)

Throughput  $R_{\text{throughput}}$  is calculated as:

$$R_{\text{throughput}} = \frac{\text{Number of pixels processed}}{T_{\text{latency}}}$$
 (2)

**TABLE I.** Latency and throughput for processing new versus in-memory images

| In memory | Latency [ms] | Throughput [MPix/s] |
|-----------|--------------|---------------------|
| No        | _            | _                   |
| Yes       | _            | -                   |
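
Equations (1) and (2) can be evaluated with a pair of small helpers. The following is a minimal sketch in C, not the measurement code used for this paper; the function names `total_latency_s` and `throughput_mpix_per_s` and the choice of seconds as the time unit are assumptions made for illustration.

```c
#include <assert.h>

/* Eq. (1): total latency as the sum of its three phases, all in seconds.
 * Hypothetical helper, named for illustration only. */
double total_latency_s(double t_transfer_s, double t_compute_s,
                       double t_response_s)
{
    assert(t_transfer_s >= 0.0 && t_compute_s >= 0.0 && t_response_s >= 0.0);
    return t_transfer_s + t_compute_s + t_response_s;
}

/* Eq. (2): throughput in MPix/s for n_pixels processed in latency_s seconds.
 * Hypothetical helper, named for illustration only. */
double throughput_mpix_per_s(unsigned long n_pixels, double latency_s)
{
    assert(latency_s > 0.0);
    return ((double)n_pixels / latency_s) / 1.0e6;
}
```

For instance, one $640 \times 480$ frame (307,200 pixels) combined with a purely hypothetical latency of 10 ms would correspond to 30.72 MPix/s. This figure is not a measurement; measured values belong in Table I.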

#### C. Resource Utilization

The FPGA resource usage of the convolution co-processor is summarized in Table II:

TABLE II. FPGA Resource Utilization

| Resource   | Used | Available |
|------------|------|-----------|
| LUTs       | -    | _         |
| Flip-Flops | -    | -         |
| BRAM [Kb]  | -    | _         |
| DSP Slices | _    | _         |

#### D. Comparison with CPU Implementation

For reference, a CPU-only implementation, as a FreeRTOS task with highest priority, was run on the ARM Cortex-A9 core. Table III summarizes the speed-up achieved:

TABLE III. Speed-Up of FPGA Co-Processor vs CPU

| CPU Latency [ms] | FPGA Speed-Up |
|------------------|---------------|
| _                | _             |
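
To make the comparison concrete, a CPU-only $3 \times 3$ filter over 32-bit RGBA pixels could look like the sketch below. This is not the reference implementation used in this work: the function names, the flat 9-element integer kernel with a divisor, the per-channel clamping, and the choice to copy border pixels unchanged are all assumptions. The kernel is applied without flipping, which is equivalent to convolution for symmetric kernels. The `speedup` helper simply divides the CPU latency by the FPGA latency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate result to the 8-bit channel range. */
static uint8_t clamp_u8(int32_t v)
{
    if (v < 0) return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Apply a 3x3 integer kernel (row-major, 9 entries) to an RGBA image with
 * one uint32_t per pixel and 8 bits per channel. Each channel is filtered
 * independently and divided by `divisor`. Border pixels are copied
 * unchanged (an assumption made for this sketch). */
void conv3x3_rgba(const uint32_t *src, uint32_t *dst,
                  size_t width, size_t height,
                  const int32_t kernel[9], int32_t divisor)
{
    assert(divisor != 0);
    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            if (x == 0 || y == 0 || x == width - 1 || y == height - 1) {
                dst[y * width + x] = src[y * width + x];
                continue;
            }
            uint32_t out = 0;
            for (unsigned ch = 0; ch < 4; ++ch) {
                int32_t acc = 0;
                for (size_t ky = 0; ky < 3; ++ky) {
                    for (size_t kx = 0; kx < 3; ++kx) {
                        /* Neighbour at offset (kx - 1, ky - 1). */
                        uint32_t p = src[(y + ky - 1) * width + (x + kx - 1)];
                        int32_t c = (int32_t)((p >> (8u * ch)) & 0xFFu);
                        acc += kernel[ky * 3 + kx] * c;
                    }
                }
                out |= (uint32_t)clamp_u8(acc / divisor) << (8u * ch);
            }
            dst[y * width + x] = out;
        }
    }
}

/* Speed-up of the FPGA co-processor relative to the CPU implementation. */
double speedup(double cpu_latency_ms, double fpga_latency_ms)
{
    assert(fpga_latency_ms > 0.0);
    return cpu_latency_ms / fpga_latency_ms;
}
```

A box blur, for example, would use a kernel of nine ones with `divisor = 9`. Timing such a routine on the Cortex-A9 and dividing by the co-processor latency gives the speed-up column of Table III.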

#### E. Energy Efficiency

Energy consumption was measured for the convolution co-processor using onboard power monitoring or external measurement tools. The energy efficiency  $\eta$  is defined as the number of pixels processed per joule of energy consumed:

$$\eta = \frac{\text{Number of pixels processed}}{E_{\text{total}}} \quad [\text{MPix/J}]$$
(3)

where  $E_{\rm total}$  is the total energy consumed during the convolution operation.

**TABLE IV.** Energy consumption and efficiency of the CPU and FPGA implementations

| Platform | Energy [mJ] | Efficiency [MPix/J] |
|----------|-------------|---------------------|
| FPGA     | -           | _                   |
| CPU      | _           | _                   |
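
Equation (3) can be evaluated as in the minimal C sketch below. The helper names are hypothetical, and deriving $E_{\text{total}}$ as average power multiplied by duration is an assumption that only holds when the power draw is roughly constant over the run; it is not necessarily how the measurements for Table IV are taken.

```c
#include <assert.h>

/* Energy in joules, assuming a constant average power over the run
 * (assumption made for this sketch). */
double energy_j(double avg_power_w, double duration_s)
{
    assert(avg_power_w >= 0.0 && duration_s >= 0.0);
    return avg_power_w * duration_s;
}

/* Eq. (3): energy efficiency in MPix/J for n_pixels processed using
 * e_total_j joules. */
double efficiency_mpix_per_j(unsigned long n_pixels, double e_total_j)
{
    assert(e_total_j > 0.0);
    return ((double)n_pixels / e_total_j) / 1.0e6;
}
```

With purely hypothetical inputs of 2 W average power and a 10 ms run, one 307,200-pixel frame would cost 20 mJ and yield 15.36 MPix/J. These numbers only illustrate the units and are not results.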

#### IV. CONCLUSION

#### V. FUTURE WORK

- Splitting the data across the different buffers to allow for more parallelism is currently managed by the processor. A hardware implementation could allow data to be streamed in larger bursts, which would decrease the data transfer delay.
- Currently only  $3\times3$  kernels are supported; minor changes could extend this to an  $n\times n$  kernel.

#### ACKNOWLEDGMENT

The authors used generative AI tools to assist with language refinement, LaTeX table template generation and grammar correction during the preparation of this paper.

#### REFERENCES

- [1] Digilent, ZedBoard User's Guide, 2014. Available: https://files.digilent.com/resources/programmable-logic/zedboard/ZedBoard\_HW\_UG\_v2\_2.pdf
- [2] ARM, AXI specification, 2025. Available: https://developer.arm.com/documentation/ihi0022/latest/
- [3] AMD, Zynq 7000 SoC Technical Reference Manual, 2023. Available: https://docs.amd.com/r/en-US/ug585-zynq-7000-SoC-TRM/Register-ICCIDR-Details