Host Examples

OpenCL host code for optimized interfacing with Xilinx Devices

Examples Table

Example Description Key Concepts / Keywords
concurrent_kernel_execution_c/ This example demonstrates how to use multiple command queues and out-of-order command queues to execute multiple kernels simultaneously on an FPGA.
Key Concepts
- Concurrent execution
- Out of Order Command Queues
- Multiple Command Queues
Keywords
- CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
- clSetEventCallback()
copy_buffer_c/ This example demonstrates how to copy the contents of one buffer to another buffer.
Key Concepts
- Copy Buffer
Keywords
- cl::CommandQueue::enqueueCopyBuffer()
data_transfer_c/ This example illustrates several ways to use the OpenCL API to transfer data to and from the FPGA.
Key Concepts
- OpenCL API
- Data Transfer
- Write Buffers
- Read Buffers
- Map Buffers
- Async Memcpy
Keywords
- enqueueWriteBuffer()
- enqueueReadBuffer()
- enqueueMapBuffer()
- enqueueUnmapMemObject()
- enqueueMigrateMemObjects()
device_query_c/ This example prints the OpenCL properties of the platform and its devices. It also displays the limits and capabilities of the hardware.
Key Concepts
- OpenCL API
- Querying device properties
Keywords
- clGetPlatformIDs()
- clGetPlatformInfo()
- clGetDeviceIDs()
- clGetDeviceInfo()
device_query_cpp/ This example prints the OpenCL properties of the platform and its devices using the OpenCL C++ API. It also displays the limits and capabilities of the hardware.
Key Concepts
- OpenCL API
- Querying device properties
errors_c/ This example discusses the different causes of errors in OpenCL and how to handle them at runtime.
Key Concepts
- OpenCL API
- Error handling
Keywords
- CL_SUCCESS
- CL_DEVICE_NOT_FOUND
- CL_DEVICE_NOT_AVAILABLE
errors_cpp/ This example discusses the different causes of errors in the OpenCL C++ API and how to handle them at runtime.
Key Concepts
- OpenCL C++ API
- Error handling
Keywords
- CL_SUCCESS
- CL_DEVICE_NOT_FOUND
- CL_DEVICE_NOT_AVAILABLE
- CL_INVALID_VALUE
- CL_INVALID_KERNEL_NAME
- CL_INVALID_BUFFER_SIZE
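The status codes listed for errors_c/ and errors_cpp/ are plain integers, so a small lookup helper is often enough to make runtime failures readable. The following is a minimal sketch, not the code from those examples: `status_name` and `succeeded` are hypothetical helpers, and the constants are redefined locally (with the values used in the Khronos `CL/cl.h` header) only so the sketch builds without the OpenCL headers installed.

```cpp
#include <string>

// Local copies of the OpenCL status values named in the table.
// Real host code would include <CL/cl.h> and use the CL_* macros directly.
namespace status {
constexpr int kSuccess = 0;              // CL_SUCCESS
constexpr int kDeviceNotFound = -1;      // CL_DEVICE_NOT_FOUND
constexpr int kDeviceNotAvailable = -2;  // CL_DEVICE_NOT_AVAILABLE
constexpr int kInvalidValue = -30;       // CL_INVALID_VALUE
constexpr int kInvalidKernelName = -46;  // CL_INVALID_KERNEL_NAME
constexpr int kInvalidBufferSize = -61;  // CL_INVALID_BUFFER_SIZE
}  // namespace status

// Hypothetical helper: map a status code to the name of its macro.
std::string status_name(int code) {
    switch (code) {
        case status::kSuccess:            return "CL_SUCCESS";
        case status::kDeviceNotFound:     return "CL_DEVICE_NOT_FOUND";
        case status::kDeviceNotAvailable: return "CL_DEVICE_NOT_AVAILABLE";
        case status::kInvalidValue:       return "CL_INVALID_VALUE";
        case status::kInvalidKernelName:  return "CL_INVALID_KERNEL_NAME";
        case status::kInvalidBufferSize:  return "CL_INVALID_BUFFER_SIZE";
        default:
            return "unknown OpenCL status (" + std::to_string(code) + ")";
    }
}

// Hypothetical helper: true when an API call reported success.
bool succeeded(int code) { return code == status::kSuccess; }
```

In host code, the value returned by an OpenCL call (or written through its `errcode_ret` argument) would be passed to `succeeded`, and `status_name` would be used when logging the failure.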
hbm_bandwidth/ This is an HBM bandwidth check design. The design contains 8 compute units of a kernel, each with access to all HBM banks (0:31). The host application allocates buffers in all HBM banks, runs the 8 compute units concurrently, and measures the overall bandwidth between the kernel and HBM memory.
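The overall bandwidth reported by a test like hbm_bandwidth/ comes down to total bytes moved divided by elapsed wall-clock time. The sketch below shows only that arithmetic under stated assumptions: the function names are hypothetical, 1 GB is taken as 10^9 bytes, and every compute unit is assumed to move the same number of bytes. The example itself may compute its figure differently.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: total bytes moved when every compute unit
// transfers the same amount of data.
std::uint64_t total_bytes_moved(std::uint64_t bytes_per_cu, std::uint64_t num_cus) {
    return bytes_per_cu * num_cus;
}

// Hypothetical helper: bandwidth in GB/s, with 1 GB = 1e9 bytes.
double bandwidth_gb_per_s(std::uint64_t total_bytes,
                          std::chrono::duration<double> elapsed) {
    assert(elapsed.count() > 0.0);  // a zero interval would divide by zero
    return static_cast<double>(total_bytes) / elapsed.count() / 1e9;
}
```

A host program would typically take a `std::chrono::steady_clock` timestamp before enqueueing the compute units and another after the queue has finished, then pass the difference as `elapsed`.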
hbm_simple/ This is a simple vector addition example that shows how to use HLS kernels with HBM (High Bandwidth Memory) to achieve high throughput.
Key Concepts
- High Bandwidth Memory
- Multiple HBM Banks
Keywords
- HBM
- XCL_MEM_TOPOLOGY
- cl_mem_ext_ptr_t
host_global_bandwidth/ Host to global memory bandwidth test
host_global_bandwidth_5.0_shell/ Host to global memory bandwidth test for 5.0 shell
kernel_swap_c/ This example shows how the host can swap kernels and share the same buffer between two kernels that exist in separate binary containers. Dynamic platforms do not persist buffer data, so the host has to migrate the data from device to host memory before swapping in the next kernel. After the kernel swap, the host has to migrate the buffer back to the device.
Key Concepts
- Handling Buffer sharing across multiple binaries
- Multiple Kernel Binaries
Keywords
- clEnqueueMigrateMemObjects()
- CL_MIGRATE_MEM_OBJECT_HOST
mult_compute_units/ This is a simple example of multiple compute units that shows how a single kernel can be instantiated as multiple compute units. The host code shows how to use multiple compute units and run them concurrently.
Key Concepts
- Multiple Compute Units
Keywords
- -nk
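To keep several compute units busy at once, the host usually has to give each one its own slice of the input. The table does not say how mult_compute_units/ divides its data, so the following is only a sketch of one common approach: contiguous chunks of near-equal size, one per compute unit. `Chunk` and `split_for_compute_units` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// One contiguous slice of the input assigned to a single compute unit.
struct Chunk {
    std::size_t offset;  // index of the first element in the slice
    std::size_t size;    // number of elements in the slice
};

// Hypothetical helper: split `total` elements into `num_cus` contiguous
// chunks. The first (total % num_cus) chunks get one extra element so the
// sizes differ by at most one.
std::vector<Chunk> split_for_compute_units(std::size_t total, std::size_t num_cus) {
    std::vector<Chunk> chunks;
    if (num_cus == 0) {
        return chunks;  // nothing to schedule on
    }
    const std::size_t base = total / num_cus;
    const std::size_t extra = total % num_cus;
    std::size_t offset = 0;
    for (std::size_t cu = 0; cu < num_cus; ++cu) {
        const std::size_t size = base + (cu < extra ? 1 : 0);
        chunks.push_back({offset, size});
        offset += size;
    }
    return chunks;
}
```

The host would then set each compute unit's offset and size arguments from its `Chunk` and enqueue all kernel calls before waiting on any of them.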
multiple_cus_asymmetrical/ This is a simple vector addition example that demonstrates how to connect each compute unit to a different bank and how to use these compute units in host applications.
Key Concepts
- Multiple Compute Units
Keywords
- #pragma HLS PIPELINE
multiple_devices_c/ This example shows how to take advantage of multiple FPGAs in a system. It shows how to initialize an OpenCL context, allocate memory on the two devices, and execute a kernel on each FPGA.
Key Concepts
- OpenCL API
- Multi-FPGA Execution
- Event Handling
Keywords
- cl_device_id
- clGetDeviceIDs()
multiple_process_c/ This example demonstrates how to run multiple processes that use multiple kernels simultaneously on an FPGA device. Multiple processes can share access to the same device provided each process uses the same xclbin. Processes share access to all device resources, but no process can be given exclusive access to a resource.
Key Concepts
- Concurrent execution
- Multiple HLS kernels
- Multiple Process Support
Keywords
- PID
- fork
- XCL_MULTIPROCESS_MODE
- multiprocess
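The process side of multiple_process_c/ can be sketched with plain POSIX calls. This is an illustration under assumptions, not the example's code: `XCL_MULTIPROCESS_MODE` is treated here as an environment variable set to "1" (the table only lists the name), and `run_kernel_in_this_process` is a hypothetical placeholder for the per-process OpenCL work (open the device, load the shared xclbin, run one kernel).

```cpp
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical placeholder for the OpenCL work done by each process.
// Returns 0 on success so it can be used as the process exit code.
int run_kernel_in_this_process(int worker_index) {
    (void)worker_index;  // a real worker would pick its kernel from this
    return 0;
}

// Fork `num_workers` children and return how many exited with status 0.
int launch_workers(int num_workers) {
    // Assumption: the runtime reads this variable, and "1" enables sharing.
    // It is set before fork() so every child inherits it.
    setenv("XCL_MULTIPROCESS_MODE", "1", 1);

    std::vector<pid_t> children;
    for (int i = 0; i < num_workers; ++i) {
        const pid_t pid = fork();
        if (pid == 0) {
            // Child: do the work, then leave without running parent cleanup.
            _exit(run_kernel_in_this_process(i));
        }
        if (pid > 0) {
            children.push_back(pid);  // parent: remember the child to wait on
        }
        // pid < 0: fork failed, so this worker is simply not counted.
    }

    int succeeded = 0;
    for (const pid_t pid : children) {
        int wait_status = 0;
        if (waitpid(pid, &wait_status, 0) == pid &&
            WIFEXITED(wait_status) && WEXITSTATUS(wait_status) == 0) {
            ++succeeded;
        }
    }
    return succeeded;
}
```

Forking before any OpenCL call keeps each child's context and command queue private to that process, which fits the table's note that processes share the device but cannot reserve resources exclusively.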
overlap_c/ This example demonstrates techniques that allow the user to overlap host (CPU) and FPGA computation in an application. It covers asynchronous operations and event objects.
Key Concepts
- OpenCL API
- Synchronize Host and FPGA
- Asynchronous Processing
- Events
- Asynchronous memcpy
Keywords
- cl_event
- clCreateCommandQueue
- CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
- clEnqueueMigrateMemObjects
streaming_chain/ This kernel performs cascaded matrix multiplication using dataflow. ap_ctrl_chain is enabled for this kernel to show how multiple enqueued kernel calls can be overlapped to give higher performance. ap_ctrl_chain allows the kernel to start processing the next kernel operation before the current one completes.
Key Concepts
- ap_ctrl_chain
- PLRAM
streaming_host_bandwidth/ This is a simple vector increment C kernel design with 1 stream input and 1 stream output. It demonstrates how to process an input stream of data for computation in an application and serves as the host-to-device streaming bandwidth test.
Key Concepts
- Read/Write Stream
- Create/Release Stream
Keywords
- cl_stream
- CL_STREAM_EOT
- CL_STREAM_NONBLOCKING
streaming_k2k/ This is a simple kernel-to-kernel streaming design with vector add and vector multiply C kernels that demonstrates how to process a stream of data for computation between two kernels.
Key Concepts
- Read/Write Stream
- Create/Release Stream
Keywords
- cl_stream
- CL_STREAM_EOT
- CL_STREAM_NONBLOCKING
streaming_k2k_mm/ This is a simple kernel-to-kernel streaming design with vector add and vector multiply C kernels. It has 2 memory-mapped inputs to kernel 1, 1 stream output from kernel 1 to the input of kernel 2, 1 memory-mapped input to kernel 2, and 1 memory-mapped output. It demonstrates how to process a stream of data for computation between two kernels.
Key Concepts
- Read/Write Stream
- Create/Release Stream
Keywords
- cl_stream
- CL_STREAM_EOT
- CL_STREAM_BLOCKING
streaming_mm_mixed/ This is a simple streaming vector addition C kernel design with 1 stream input, 1 memory-mapped input to the kernel, and 1 stream output. It demonstrates how to process a stream of data for computation alongside OpenCL buffers.
Key Concepts
- Read/Write Stream
- Create/Release Stream
Keywords
- cl_stream
- CL_STREAM_EOT
- CL_STREAM_BLOCKING
streaming_multi_cus/ This is a simple vector add C kernel design with 2 stream inputs and 1 stream output. It demonstrates how to process an input stream of data for computation in an application using multiple compute units.
Key Concepts
- Read/Write Stream
- Create/Release Stream
Keywords
- cl_stream
- CL_STREAM_EOT
- Multiple Compute Units
streaming_simple/ This is a simple vector add C kernel design with 2 stream inputs and 1 stream output. It demonstrates how to process an input stream of data for computation in an application.
Key Concepts
- Read/Write Stream
- Create/Release Stream
Keywords
- cl_stream
- CL_STREAM_EOT
- CL_STREAM_NONBLOCKING
sub_devices_c/ This example demonstrates how to create OpenCL sub-devices. It uses a single kernel multiple times to show how to handle each instance independently, including independent buffers, command queues, and sequencing.
Key Concepts
- Sub Devices
Keywords
- cl_device_partition_property
- createSubDevices
- CL_DEVICE_PARTITION_EQUALLY