Run your first SDAccel program on AWS F1

Dutch Althoff edited this page Jan 31, 2018 · 13 revisions

This tutorial explains how to package an RTL design as an SDAccel kernel and then use this RTL kernel to accelerate a host application. The tutorial uses the vadd_kernel example from the SDAccel GitHub examples repository and covers the following:

  1. Writing an RTL design adhering to the SDAccel kernel interface requirements
  2. Packaging the RTL design as an SDAccel kernel (XO file)
  3. Compiling the host application and the FPGA binary containing the RTL kernel
  4. Creating the Amazon FPGA Image
  5. Executing the host application with the Amazon FPGA image

Note: This tutorial does not currently use the SDAccel RTL Kernel Wizard, a new feature that guides users through the process of packaging RTL designs as SDAccel kernels. The RTL Kernel Wizard generates the required XML file, an example project design, and a set of scripts to build that example design into an XO file. For more details on how to use the RTL Kernel Wizard, you can watch this online video.


Example Overview

This example is a simple vector-add design. The host application writes two vectors A and B of arbitrary length to the FPGA kernel which in turn sums the two vectors together to produce an output vector C. The host application then reads back the result.

Overview of the hardware kernel

The kernel has an AXI memory mapped master interface and an AXI lite slave interface:

  • The AXI master interface is used to read the values of A and B from global memory and write back the values of C
  • The AXI lite slave interface is used to pass parameters and control the kernel as follows:
    • Offset 0x00: Control and status register
    • Offset 0x10: Base address of vector A in global memory
    • Offset 0x1C: Base address of vector B in global memory
    • Offset 0x28: Base address of vector C in global memory
    • Offset 0x34: Length of the vectors
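
As a rough illustration of this register map, the sketch below builds the set of 32-bit words that would be written over the AXI lite interface before the kernel is started. It assumes that each 64-bit base address occupies two consecutive 32-bit registers (low word first) and that the length is a single 32-bit word; neither detail is stated in this tutorial. The build_register_image helper is hypothetical and is not part of SDAccel. The control register at offset 0x00 is left out because it is written separately to start the kernel.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Register offsets taken from the list above.
constexpr std::uint32_t kRegAddrA  = 0x10;  // base address of vector A
constexpr std::uint32_t kRegAddrB  = 0x1C;  // base address of vector B
constexpr std::uint32_t kRegAddrC  = 0x28;  // base address of vector C
constexpr std::uint32_t kRegLength = 0x34;  // length of the vectors

// Hypothetical helper: returns a map of register offset -> 32-bit word.
// ASSUMPTION: a 64-bit address is split into a low word at `offset` and a
// high word at `offset + 4`; the length register is 32 bits wide.
std::map<std::uint32_t, std::uint32_t> build_register_image(
    std::uint64_t addr_a, std::uint64_t addr_b, std::uint64_t addr_c,
    std::uint32_t length) {
    std::map<std::uint32_t, std::uint32_t> regs;
    auto put64 = [&regs](std::uint32_t offset, std::uint64_t value) {
        regs[offset]     = static_cast<std::uint32_t>(value & 0xFFFFFFFFull);
        regs[offset + 4] = static_cast<std::uint32_t>(value >> 32);
    };
    put64(kRegAddrA, addr_a);
    put64(kRegAddrB, addr_b);
    put64(kRegAddrC, addr_c);
    regs[kRegLength] = length;
    assert(regs.size() == 7);  // three two-word addresses plus one length word
    return regs;
}
```

Under these assumptions, the 12-byte spacing between the address registers (0x10, 0x1C, 0x28) leaves one unused word after each 64-bit address.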

The kernel starts executing when bit 0 of the control register is set to 1. The AXI master issues burst requests to read the values of A and B from global memory and streams them into two FIFOs, one for the values of A and one for the values of B. The adder module reads from both FIFOs, computes C[i] = A[i] + B[i], and writes the result into an output FIFO. The AXI master reads this FIFO to burst the results of the vector-add back into global memory. When both vectors have been fully processed, the kernel asserts bit 1 of the control register to indicate that it is done.
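
The dataflow described above can be approximated in software. The following is a minimal sketch, not the RTL itself: it assumes 32-bit elements, models global memory as a vector indexed by element, and uses an arbitrary burst length of 16. The name run_vadd_model is hypothetical. The stages run one after another here, whereas in hardware they operate concurrently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

constexpr std::uint32_t kCtrlDoneBit = 1u << 1;  // bit 1 of the control register

// Hypothetical software model of the kernel datapath.
// ASSUMPTIONS: 32-bit elements, memory addressed by element index,
// default burst length of 16 elements.
std::uint32_t run_vadd_model(std::vector<std::uint32_t>& mem,
                             std::size_t base_a, std::size_t base_b,
                             std::size_t base_c, std::size_t length,
                             std::size_t burst_len = 16) {
    assert(burst_len > 0);
    assert(base_a + length <= mem.size());
    assert(base_b + length <= mem.size());
    assert(base_c + length <= mem.size());

    std::queue<std::uint32_t> fifo_a, fifo_b, fifo_c;
    for (std::size_t pos = 0; pos < length; pos += burst_len) {
        const std::size_t n = std::min(burst_len, length - pos);
        // AXI master: burst-read A and B into their input FIFOs.
        for (std::size_t i = 0; i < n; ++i) {
            fifo_a.push(mem[base_a + pos + i]);
            fifo_b.push(mem[base_b + pos + i]);
        }
        // Adder: pop one value from each input FIFO and push the sum.
        while (!fifo_a.empty()) {
            fifo_c.push(fifo_a.front() + fifo_b.front());
            fifo_a.pop();
            fifo_b.pop();
        }
        // AXI master: burst-write the output FIFO back to memory.
        for (std::size_t i = 0; i < n; ++i) {
            mem[base_c + pos + i] = fifo_c.front();
            fifo_c.pop();
        }
    }
    return kCtrlDoneBit;  // completion is reported on bit 1
}
```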

Overview of the host application

The host.cpp file provides a very simple application to exercise the vector-add kernel. All FPGA-side operations are triggered using standard OpenCL API calls:

  • Buffers are created in the FPGA using cl::Buffer
  • Data is copied to and from the FPGA using <command_queue>.enqueueMigrateMemObjects
  • Kernel arguments (length of the vectors, base addresses of A, B, C) are passed using <kernel>.setArg
  • Kernel is executed using <command_queue>.enqueueTask

Of note, the FPGA device is initialized using the xcl::find_binary_file and xcl::import_binary_file utility functions. The xcl::find_binary_file function locates the desired FPGA binary file by searching four predefined directories for a file matching one of the following names:

  • <name>.<target>.<device>.(aws)xclbin
  • <name>.<target>.<device_versionless>.(aws)xclbin
  • binary_container_1.(aws)xclbin
  • <name>.(aws)xclbin
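
To make the naming patterns concrete, here is a small sketch that expands them into candidate file names. It is not the real xcl::find_binary_file implementation: candidate_binary_names is a hypothetical helper, the versionless device name is passed in rather than derived, and it assumes that "(aws)xclbin" means both the .awsxclbin and .xclbin extensions are accepted. The order in which the two extensions are tried is also an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, NOT the real xcl::find_binary_file.
// ASSUMPTION: each pattern is tried with ".awsxclbin" first, then ".xclbin".
std::vector<std::string> candidate_binary_names(
    const std::string& name, const std::string& target,
    const std::string& device, const std::string& device_versionless) {
    assert(!name.empty());
    // The four patterns, in the order listed in the tutorial.
    const std::vector<std::string> stems = {
        name + "." + target + "." + device,
        name + "." + target + "." + device_versionless,
        "binary_container_1",
        name,
    };
    std::vector<std::string> names;
    for (const std::string& stem : stems) {
        names.push_back(stem + ".awsxclbin");
        names.push_back(stem + ".xclbin");
    }
    return names;
}
```

A real lookup would additionally test each candidate name in each of the predefined directories and return the first file that exists.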

Preparing to run the tutorial

  • Execute the following commands to clone the GitHub repository and configure the SDAccel environment:
    $ git clone https://github.com/aws/aws-fpga.git
    $ cd aws-fpga
    $ source sdaccel_setup.sh
  • Go to the testcase directory:
    $ cd SDAccel/examples/xilinx/getting_started/rtl_kernel/rtl_vadd

The SDAccel GitHub examples use common header files, and these need to be copied into the local project source folder to make them easier to use.

  • Run the make local-files command to copy all necessary files into the local directory.
  $ make local-files

1. Writing an RTL design adhering to the SDAccel kernel interface requirements

To be used as an SDAccel kernel, an RTL design must comply with the following signals and interface requirements:

  • Clock.
  • Active Low reset.
  • 1 or more AXI4 memory mapped (MM) master interfaces for global memory. All AXI MM master interfaces must have 64-bit addresses.
    • You are responsible for partitioning global memory spaces. Each partition in the global memory becomes a kernel argument. The memory offset for each partition must be set by a control register programmable via the AXI4 MM Slave Lite interface.
  • One and only one AXI4 MM slave lite interface for control. The AXI Lite interface name must be S_AXI_CONTROL.
    • Offset 0 of the AXI4 MM slave lite interface must provide the following control bits:
      • Bit 0: start signal - The kernel starts processing data when this bit is set.
      • Bit 1: done signal - The kernel asserts this signal when the processing is done.
      • Bit 2: idle signal - The kernel asserts this signal when it is not processing any data.
  • Optionally, one or more AXI4-Stream interfaces for streaming data between kernels.
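
The control bits at offset 0 can be decoded as in the sketch below. The bit positions come from the list above; everything else is illustrative. In particular, decode_control and wait_for_done are hypothetical helpers, and read_ctrl stands in for whatever mechanism performs an AXI lite read of offset 0.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

constexpr std::uint32_t kCtrlStart = 1u << 0;  // bit 0: start
constexpr std::uint32_t kCtrlDone  = 1u << 1;  // bit 1: done
constexpr std::uint32_t kCtrlIdle  = 1u << 2;  // bit 2: idle

struct KernelStatus {
    bool start;
    bool done;
    bool idle;
};

// Splits a raw control register value into its three flags.
KernelStatus decode_control(std::uint32_t value) {
    return KernelStatus{(value & kCtrlStart) != 0,
                        (value & kCtrlDone) != 0,
                        (value & kCtrlIdle) != 0};
}

// Hypothetical polling loop: returns true if the done bit is observed
// within max_polls reads of the control register, false otherwise.
bool wait_for_done(const std::function<std::uint32_t()>& read_ctrl,
                   unsigned max_polls) {
    assert(static_cast<bool>(read_ctrl));
    for (unsigned i = 0; i < max_polls; ++i) {
        if (decode_control(read_ctrl()).done) {
            return true;
        }
    }
    return false;
}
```

Bounding the number of polls keeps a misbehaving kernel from hanging the caller.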

In this example, the RTL is already compliant and doesn't need to be modified.

The RTL code for this example is located in the ./src/hdl directory.

2. Packaging the RTL design as an SDAccel kernel (XO file)

A fully packaged RTL Kernel is delivered as an XO file which has a file extension of .xo. This file is a container encapsulating a Vivado IP object (including RTL source files) and a kernel description XML file. The XO file can be compiled into the platform and run in the SDAccel hardware or hardware emulation flows.

To package the kernel and create the XO file the following steps are required:

  • Writing a kernel description XML file
  • Packaging the RTL as a Vivado IP suitable for use in IP Integrator
  • Running the package_xo command to generate the XO file

Writing a kernel description XML file

A special XML file is needed to describe the interface properties of the RTL kernel. The format for the kernel XML file is described in the Create Kernel Description XML File section of the documentation.

This XML file can be created manually or with the RTL Kernel Wizard. In this example, the XML file is already provided (./src/kernel.xml).

  • Look at the content of the file to familiarize yourself with the information captured in the XML description.

Packaging the RTL as a Vivado IP suitable for use in IP Integrator

The example comes with the ./scripts/package_kernel.tcl script, which takes the existing RTL design and packages it as a Vivado IP. The script places the IP in a directory called ./packaged_kernel_${suffix}, where "suffix" is specified as a user argument.

Running the package_xo command to generate the XO file

  • In the SDAccel/examples/xilinx/getting_started/rtl_kernel/rtl_vadd directory, run the following commands to package the RTL and create the XO file:
  $ vivado -mode tcl

  # Set suffix for the directory for RTL-IP import
  Vivado% set suffix rtl_ip

  # Import the RTL to the "packaged_kernel_${suffix}" IP directory
  Vivado% source scripts/package_kernel.tcl

  # Create the XO file
  Vivado% package_xo -xo_path ./src/rtl_vadd.xo \
                     -kernel_name krnl_vadd_rtl \
                     -ip_directory ./packaged_kernel_rtl_ip \
                     -kernel_xml ./src/kernel.xml

  # Exit Vivado
  Vivado% exit

These commands generate the ./src/rtl_vadd.xo file, which contains all the information SDAccel needs to use the kernel.

3. Compiling the host application and the FPGA binary containing the RTL kernel

This section covers the following steps:

  • Creating and configuring a new project
    • Starting the SDAccel GUI
    • Creating a workspace
    • Setting the platform
    • Creating a new empty project
    • Importing the application host code and kernel XO file
    • Specifying the binary container for the kernel executable
  • Verifying the application using the hardware emulation flow
  • Compiling the host application and the FPGA binary for hardware execution

The host application code for this example is in the ./src/host.cpp file.

In the SDAccel flow, the host code uses OpenCL APIs to interact with the FPGA.

Creating and configuring a new project

Starting the SDAccel GUI

  • Open the SDx GUI by running the following command:
  $ sdx 

Creating a workspace

  • In the Workspace Launcher window, add a workspace named Test_dir inside the current directory. A new Test_dir directory is created and used to store all the log files from the runs.

Setting the platform

  • In the Welcome window, click Add custom platform to set the path to the AWS F1 platform.

  • Click the plus sign.

  • Browse to the /SDAccel/aws_platform/xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0/ directory and select the platform.

  • Click Apply and OK. This completes the platform setup process.

Creating a new empty project

  • In the Welcome window, click Create SDX Project and set the project name to TEST_RTL_KERNEL.
  • Move through the next three screens (keeping the default selections) by clicking Next -> Next -> Next.
  • Finally select an Empty Application in the Available Templates section, and then click Finish.

Importing the application host code and kernel XO file

On the left side of the SDAccel GUI you will see the Project Explorer pane.

  • Right-click project.sdx and then select Import.

  • Select General -> Filesystem and then click Next.
  • Browse to the source file directory of the current example, rtl_vadd/src.
  • Select the files host.cpp, xcl2.cpp, xcl2.h and rtl_vadd.xo.

Specifying the binary container for the kernel executable

Now that the files have been imported, we must instruct SDAccel to add a binary container, that is, the output file into which the FPGA design is compiled.

In the center of the SDAccel GUI you will see the SDx Project Settings.

  • Click the Add Binary Container icon.

The default name for the binary container is binary_container_1. Since the host application uses the xcl::find_binary_file utility function, it will automatically find the container by searching for a file with the default name.

The project creation and setup is now complete.

Verifying the application using the hardware emulation flow

SDAccel provides three different build configurations: Emulation-CPU, Emulation-HW and System.

In the Emulation-CPU mode, the host application executes with a C/C++ or OpenCL model of the kernel(s). The main goal of this mode is to ensure functional correctness of your application. Note: this mode is not presently supported for RTL kernels.

In the Emulation-HW mode, the host application executes with an RTL model of the kernel(s). This mode lets the programmer check the correctness of the logic generated for the custom compute units and provides performance estimates.

In the System mode, the host application executes with the actual FPGA.

  • To run hardware emulation, go to SDx Project Settings and make sure that Active build configuration is set to Emulation-HW.

  • Click the Build icon to start the emulation build process.

  • After the emulation build process completes, run Hardware Emulation by clicking the Run Icon.

After the hardware emulation run completes, you can find and inspect various reports in the Reports tab, such as the System Estimate, Profile Summary, and Application Timeline.

Compiling the host application and the FPGA binary for hardware execution

  • To run hardware execution, go to SDx Project Settings and set Active build configuration to System.
  • Click the Build icon to initiate the hardware build process.

The hardware build generally takes a few hours to complete.

At the end of this process, the host executable (TEST_RTL_KERNEL.exe) and FPGA binary (binary_container_1.xclbin) are generated in the Test_dir/TEST_RTL_KERNEL/System directory.

  • Exit the SDAccel GUI.

4. Creating the Amazon FPGA Image

To execute the application on F1, an Amazon FPGA Image (AFI) must first be created from the FPGA binary (*.xclbin). This step cannot currently be performed through the SDAccel GUI. The AFI is created using the AWS create_sdaccel_afi.sh command line script.

  • Using your S3 bucket, S3 dcp folder and S3 log folder information, execute the following command:
  $ cd ./Test_dir/TEST_RTL_KERNEL/System
  $ $SDACCEL_DIR/tools/create_sdaccel_afi.sh \
         -xclbin=binary_container_1.xclbin \
         -o=binary_container_1 \
         -s3_bucket=<bucket-name> \
         -s3_dcp_key=<dcp-folder-name> \
         -s3_logs_key=<logs-folder-name>

The above step generates an *.awsxclbin file and an *_afi_id.txt file containing the ID of your AFI. The AFI ID can be used to check the status of the AFI generation process.

  • Note your AFI ID
  $ cat <timestamp>_afi_id.txt 
  • Check the status of the AFI generation process
  $ aws ec2 describe-fpga-images --fpga-image-ids <AFI ID> 

The command reports the state available once the AFI is created, registered, and ready to be used. Until then, it reports pending.

    "State": {
        "Code": "available"
    }

5. Executing the host application with the Amazon FPGA image

Once the AFI is Available, you can execute the application on the F1 instance.

  $ sudo sh
  # source /opt/Xilinx/SDx/2017.1.rte/setup.sh   
  # ./TEST_RTL_KERNEL.exe 
  Device/Slot[0] (/dev/xdma0, 0:0:1d.0)
  xclProbe found 1 FPGA slots with XDMA driver running
  platform Name: Xilinx
  Vendor Name : Xilinx
  Found Platform
  XCLBIN File Name: vadd
  INFO: Importing ./binary_container_1.awsxclbin
  Loading: './binary_container_1.awsxclbin'
  TEST PASSED

Behind these deceptively simple log messages, a lot just happened. The application:

  • Detected the FPGA platform
  • Loaded the binary_container_1.awsxclbin container
  • Retrieved the AFI ID from the container and requested that the corresponding AFI be loaded onto the FPGA
  • Created buffers in the FPGA and transferred two vectors A and B
  • Triggered the FPGA kernel to sum the two vectors A and B
  • Read the results back and checked them for correctness
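
The final correctness check amounts to comparing the values read back from the FPGA against a reference computed on the host. The tutorial does not show how host.cpp performs this check, so the following is only a sketch under assumptions: count_mismatches is a hypothetical helper and the elements are assumed to be 32-bit unsigned integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side check: counts the positions where the kernel
// output c differs from the software reference a[i] + b[i].
// ASSUMPTION: 32-bit unsigned elements, so sums wrap modulo 2^32.
std::size_t count_mismatches(const std::vector<std::uint32_t>& a,
                             const std::vector<std::uint32_t>& b,
                             const std::vector<std::uint32_t>& c) {
    assert(a.size() == b.size() && b.size() == c.size());
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (c[i] != a[i] + b[i]) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

A host application built this way would report a pass only when the mismatch count is zero.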

This concludes this tutorial on how to run your first SDAccel program on F1 using RTL kernels.

Do not forget to stop or terminate your instance.
