Vector Add FPGA Example

A "Hello World" equivalent for Intel FPGA devices using Intel oneAPI (up to version 2025.0). This example demonstrates vector addition (C[i] = A[i] + B[i]) on FPGA hardware using the SYCL programming model.

Overview

This example performs element-wise addition of two vectors using Intel's oneAPI SYCL framework. The kernel reads from two input arrays and writes the sum to an output array. The implementation uses manual memory management with explicit memcpy operations for clarity and educational purposes.

Development Process

The FPGA development workflow consists of three stages with increasing compilation time:

1. Emulation (seconds), for functional validation
   - CPU emulation: make cpu
   - FPGA emulator: make fpga_emu
2. Report Generation (minutes)
   - make report - Generates hardware resource utilization and performance reports
   - Requires BOARD_NAME configuration
3. Hardware Compilation (hours)
   - make fpga - Full hardware synthesis and place-and-route
   - make recompile_fpga - Recompiles only the host code against the existing kernel binary
   - Requires BOARD_NAME configuration

Configuration

Board Setup

Before generating reports or executing on FPGA hardware, you must specify your target board in the Makefile:

BOARD_NAME := your_board_name

Common board names:

intel_a10gx_pac:pac_a10 - Intel Arria 10
intel_s10gx_pac:pac_s10 - Intel Stratix 10
ia840f:ofs_ia840f - Intel Agilex 7 (Bittware's IA840F)
/path/to/IOFS_BUILD_ROOT/oneapi-asp/<folder>:<variant> - Custom OFS build, given as the path to its oneapi-asp folder

Edit the Makefile and uncomment/modify the appropriate board configuration for your system.

Compilation Targets

Quick Development

# CPU execution (fastest for development)
make cpu
./vec_add.cpu

# FPGA emulator (seconds)
make fpga_emu
./vec_add.fpga_emu

Performance Analysis

# Generate hardware reports
make report

# Full hardware compilation
make fpga
./vec_add.fpga

Host-Only Recompilation

# Recompile only host code (seconds vs hours)
make recompile_fpga

Architecture

File Structure

main.cxx - Host code with memory allocation, data transfer, and timing
kernel.cxx - Device kernel implementing vector addition
kernel.hpp - Header with SYCL kernel interface
Makefile - Build system with multiple targets

Memory Management

This example uses manual memory allocation with explicit memcpy operations:

malloc_device<DATATYPE>() for device memory
queue.memcpy() for host-device transfers
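No implicitly migrated memory (such as USM shared allocations), for educational clarity. Note that malloc_device is itself part of the USM (Unified Shared Memory) API; here it is used only for explicit device allocations, so every transfer is visible in the host code.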

Kernel Implementation

The kernel uses a single_task with optional loop unrolling:

h.single_task([=]() [[intel::kernel_args_restrict]] {
    #pragma unroll UNROLL
    for (size_t i = 0; i < N; ++i) {
        const DATATYPE a = d_A[i];
        const DATATYPE b = d_B[i];
        d_res[i] = a + b;
    }
});

Performance Metrics

II (Initiation Interval): The number of clock cycles between the start of consecutive loop iterations. An II of 1 means the loop can start a new iteration every clock cycle (optimal). Higher II values typically point to loop-carried dependencies or contention for a shared resource such as memory.
Frequency: Design clock frequency read from the executable file (using aocl info) and used for II calculations. FPGA designs run at a fixed frequency (typically 200-500 MHz) determined during synthesis, unlike CPUs, which can adjust their clock speed dynamically.
Throughput: Measured memory transfer rate in GB/s, reported separately for the kernel (load/store) and for host-device transfers.
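The relationship between these metrics can be sketched in plain C++. This is an illustration under stated assumptions, not the code in main.cxx: the helper names (estimate_ii, kernel_bytes, throughput_gbs) are hypothetical, and the estimate "II = elapsed cycles / loop trips" assumes a single pipelined loop and ignores pipeline fill latency.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: estimated II = elapsed clock cycles / loop trips.
// Assumes one pipelined loop; pipeline fill latency is ignored.
double estimate_ii(double kernel_seconds, double freq_hz, std::size_t loop_trips) {
    assert(loop_trips > 0);
    const double cycles = kernel_seconds * freq_hz;
    return cycles / static_cast<double>(loop_trips);
}

// Hypothetical helper: bytes moved by the kernel for n elements,
// counting two loads (A[i], B[i]) and one store (C[i]) per element.
std::size_t kernel_bytes(std::size_t n, std::size_t element_size) {
    return 3 * n * element_size;
}

// Hypothetical helper: throughput in GB/s, with 1 GB taken as 1e9 bytes.
double throughput_gbs(std::size_t bytes, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(bytes) / seconds / 1e9;
}
```

With made-up sample numbers (1,000,000 double elements, a 250 MHz clock, and a 4 ms kernel time), estimate_ii gives about 1.0 and throughput_gbs(kernel_bytes(1000000, sizeof(double)), 0.004) gives about 6 GB/s.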

Optimization Features

Loop Unrolling: Controlled by the UNROLL macro at compile time; disabled by default
Restrict pointers: __restrict__ keyword prevents pointer aliasing
Kernel arguments restrict: [[intel::kernel_args_restrict]] attribute

Compilation Options

Data Type

# Use float instead of double
make USE_FLOAT=1 fpga
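One common way to wire such a switch into the sources is a preprocessor test that selects the DATATYPE alias. The sketch below assumes the Makefile turns USE_FLOAT=1 into a -DUSE_FLOAT compiler flag, which the text above does not confirm; footprint_bytes is a hypothetical helper added to show how the choice affects memory use.

```cpp
#include <cstddef>

// Assumption: "make USE_FLOAT=1 ..." results in -DUSE_FLOAT on the compile line.
#ifdef USE_FLOAT
using DATATYPE = float;
#else
using DATATYPE = double;
#endif

// Hypothetical helper: bytes needed for the three vectors (A, B and the
// result) of length n with the selected data type.
constexpr std::size_t footprint_bytes(std::size_t n) {
    return 3 * n * sizeof(DATATYPE);
}
```

On typical platforms float is half the size of double, so switching roughly halves both the device memory footprint and the bytes moved per element.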

Loop Unrolling

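# Enable unrolling by setting the unroll factor in kernel.hpp.
# With "#pragma unroll UNROLL", a factor of 1 leaves the loop rolled,
# so use a factor greater than 1 (4 is only an example).
#define UNROLL 4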

Key Features

Separated Host/Kernel Compilation: Kernel compiled to dynamic library, enabling fast host-only recompilation
Multiple Execution Targets: CPU, emulator, simulator, and hardware
Comprehensive Timing: Detailed breakdown of memory transfers and computation
Performance Analysis: Automatic II calculation and throughput estimation
Verification: Built-in result validation against CPU reference
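The verification step can be sketched as a comparison of the device result against a[i] + b[i] recomputed on the CPU. The helper name count_mismatches and the tolerance parameter are assumptions for illustration and may differ from the check in main.cxx.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts the elements of res that differ from the CPU
// reference a[i] + b[i] by more than tol. Assumes a, b and res have equal
// length. NaN results are counted as mismatches.
template <typename T>
std::size_t count_mismatches(const std::vector<T>& a, const std::vector<T>& b,
                             const std::vector<T>& res, T tol = T(0)) {
    std::size_t errors = 0;
    for (std::size_t i = 0; i < res.size(); ++i) {
        const T expected = a[i] + b[i];
        if (!(std::fabs(res[i] - expected) <= tol)) {
            ++errors;
        }
    }
    return errors;
}
```

A return value of 0 means the result matches the reference. A zero tolerance is a reasonable starting point for a single addition per element; a small positive tolerance can be passed if the device arithmetic is not bit-identical to the CPU.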

Troubleshooting

Common Issues

Board not found: Verify BOARD_NAME in Makefile
Memory insufficient: Reduce problem size or use smaller data type
Compilation errors: Check oneAPI installation and board support

Performance Tips

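Use loop unrolling to process several elements per clock cycle; this raises throughput at a given II, at the cost of more FPGA resources and memory bandwidth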
Consider data type impact (float vs double, or even more specific data types, using ac_int / ap_float)
Monitor memory bandwidth utilization

Dependencies

Intel oneAPI Base Toolkit (2024.0 or later, up to 2025.0)
Intel oneAPI HPC Toolkit
Intel FPGA Add-on for oneAPI
Supported Intel FPGA board

License

MIT License - see LICENSE file for details.

This example is provided as-is for educational purposes. The MIT License permits use, modification, and distribution of the code with few restrictions and comes without warranty.

fm16191/fpga-vector-add