
AWS F1 DMA Example


This example uses Amazon's FPGA Developer AMI. This is the fastest way to run both the PipelineC tool and the AWS F1 build process. However, the PipelineC tool can also be run locally and its outputs copied to the AWS instance.

Instructions for starting an F1 instance are provided by Amazon.

This FPGA design is based on AWS's own DMA example. Please refer to their documentation for non-PipelineC questions.

The files mentioned below are all in the example directory.

How does this example work?

The original AWS DMA example allows you to write and read the FPGA as if it were an address space in memory. This example works by using a narrow portion of that functionality:

  • Write a small input buffer 'message' of fixed size N bytes to address 0
    • This acts as the input to the FPGA hardware
  • Read those same N bytes 'message' from address 0
    • This is the output from the FPGA hardware
  • No other addresses or buffer sizes are supported.

The code describing the conversion between AWS DMA interfaces and this simple 'message' abstraction is in the files dma_msg.h, dma_msg_hw.c, and dma_msg_sw.c.
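As a rough illustration, the message type is just a fixed-size byte array. The sketch below is only a guess at the shape of the struct in dma_msg.h; the real size constant is defined there (4096 is a placeholder):

#include <stdint.h>

// Hypothetical sketch of the fixed-size DMA message type from dma_msg.h.
// DMA_MSG_SIZE is a placeholder; the real constant is defined in dma_msg.h.
#define DMA_MSG_SIZE 4096

typedef struct dma_msg_t
{
  uint8_t data[DMA_MSG_SIZE];
} dma_msg_t;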

Software

Amazon provides a simple read+write interface to the FPGA through user space file IO and a kernel driver. This example writes and reads 'message' byte arrays to/from that file - relatively simple code.
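For a sense of what that looks like, here is a hedged C sketch of the write+read round trip, assuming the XDMA kernel driver's usual host-to-card and card-to-host character devices (the device paths and helper names in dma_msg_sw.c may differ):

#include <fcntl.h>
#include <unistd.h>
#include <stdint.h>

// Write the fixed-size message to FPGA address 0 via the host-to-card device
int write_msg(int h2c_fd, const uint8_t* msg, size_t n)
{
  return pwrite(h2c_fd, msg, n, 0) == (ssize_t)n ? 0 : -1;
}

// Read the processed message back from address 0 via the card-to-host device
int read_msg(int c2h_fd, uint8_t* msg, size_t n)
{
  return pread(c2h_fd, msg, n, 0) == (ssize_t)n ? 0 : -1;
}

// Usage (device paths are assumptions, e.g. from the XDMA driver):
//   int h2c = open("/dev/xdma0_h2c_0", O_WRONLY);
//   int c2h = open("/dev/xdma0_c2h_0", O_RDONLY);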

Hardware

Amazon uses an AXI4 bus in their DMA example. This example hardware serializes and deserializes bursts of AXI4 data to form 'message' byte arrays that can be passed to and from other logic such as the POSIX Experiment.
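Conceptually, the deserializer accumulates AXI4 write beats until a full message has arrived (serialization is the reverse). The plain-C model below is not the actual PipelineC source; the beat width and message size are assumptions:

#include <stdint.h>
#include <string.h>

#define AXI_BEAT_BYTES 64   // assumed 512-bit AXI4 data width
#define DMA_MSG_SIZE 4096   // placeholder message size

typedef struct deserializer_t
{
  uint8_t msg[DMA_MSG_SIZE];
  uint32_t pos; // bytes accumulated so far
} deserializer_t;

// Accept one AXI4 write beat; returns 1 when a complete message is assembled
int axi_beat_in(deserializer_t* d, const uint8_t beat[AXI_BEAT_BYTES])
{
  memcpy(&d->msg[d->pos], beat, AXI_BEAT_BYTES);
  d->pos += AXI_BEAT_BYTES;
  if (d->pos == DMA_MSG_SIZE)
  {
    d->pos = 0;
    return 1; // message complete, ready for user logic
  }
  return 0;
}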

Input and output byte representation

DMA data is just bytes; interpreting those bytes further is specific to your application.

The files work_sw.c and work_hw.c describe the conversion of the DMA message struct to/from work() input/output types used in this example.
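The conversion is essentially a reinterpretation of the raw message bytes as typed structs. This sketch uses hypothetical names and sizes; the real definitions live in work.h and the work_sw.c/work_hw.c files:

#include <stdint.h>
#include <string.h>

#define N_FLOATS 16 // placeholder for the example's actual input count

typedef struct work_inputs_t { float values[N_FLOATS]; } work_inputs_t;
typedef struct work_outputs_t { float result; } work_outputs_t;

// Interpret the leading message bytes as the work() input struct
work_inputs_t bytes_to_inputs(const uint8_t* msg_data)
{
  work_inputs_t inputs;
  memcpy(&inputs, msg_data, sizeof(inputs));
  return inputs;
}

// Pack the work() output struct back into the leading message bytes
void outputs_to_bytes(const work_outputs_t* outputs, uint8_t* msg_data)
{
  memcpy(msg_data, outputs, sizeof(*outputs));
}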

The FPGA 'output = work(input)' function

This example does a matrix multiplication. The work.h file contains the definition of output = work(input): the function, its inputs (N floating point values), and outputs (a single floating point value).
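Illustratively, a function with N float inputs and a single float output can compute one element of a matrix product, i.e. a dot product. The sketch below is only a guess at the shape of work(); the real definition is in work.h:

#define N_FLOATS 16 // placeholder input count

typedef struct work_inputs_t { float values[N_FLOATS]; } work_inputs_t;
typedef struct work_outputs_t { float result; } work_outputs_t;

// Dot product of two half-length vectors packed into the input array
work_outputs_t work(work_inputs_t inputs)
{
  work_outputs_t outputs;
  outputs.result = 0.0f;
  for (int i = 0; i < N_FLOATS / 2; i += 1)
  {
    outputs.result += inputs.values[i] * inputs.values[(N_FLOATS / 2) + i];
  }
  return outputs;
}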

Software driver/tester

test.c implements the standard test: do the work on the CPU, do the work on the FPGA, and see if there was a speedup. It includes helper functions to easily swap out what the input values are and how the output values are compared. In this example the CPU and FPGA both use the same work() function source code, so this isn't the best possible CPU implementation to compare against.
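The overall flow of such a test looks roughly like the sketch below. fpga_work() is a hypothetical stand-in for the DMA write+read round trip; test.c's actual helper names may differ:

#include <stdio.h>
#include <time.h>

float work(float x) { return x * x; }      // stand-in for the shared work()
float fpga_work(float x) { return x * x; } // stand-in for the DMA round trip

static double seconds_since(struct timespec start)
{
  struct timespec now;
  clock_gettime(CLOCK_MONOTONIC, &now);
  return (now.tv_sec - start.tv_sec) + (now.tv_nsec - start.tv_nsec) * 1e-9;
}

int main(void)
{
  struct timespec t0;
  clock_gettime(CLOCK_MONOTONIC, &t0);
  float cpu_out = work(2.0f);              // do work on the CPU
  double cpu_time = seconds_since(t0);

  clock_gettime(CLOCK_MONOTONIC, &t0);
  float fpga_out = fpga_work(2.0f);        // do work on the FPGA
  double fpga_time = seconds_since(t0);

  printf("Outputs match: %d\n", cpu_out == fpga_out);
  printf("Speedup: %fx\n", cpu_time / fpga_time);
  return 0;
}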

Run the example

In your AWS FPGA Developer AMI instance (doesn't need to be an F1 FPGA instance yet), use these steps to run the example:

These steps require 16+ GB of RAM for your instance:

  1. Install or update the latest PipelineC repo
cd ~/src/project_data/
git clone https://github.com/JulianKemmerer/PipelineC.git # Fine to fail if exists
cd PipelineC
git pull # In case already exists
cd examples/aws-fpga-dma
chmod +x install.sh
./install.sh
  2. Run the AWS environment setup scripts
AWS_FPGA_REPO_DIR=/home/centos/src/project_data/aws-fpga
cd $AWS_FPGA_REPO_DIR
source hdk_setup.sh
source sdk_setup.sh
cd $HDK_DIR/cl/examples/cl_dram_dma
export CL_DIR=$(pwd)
  3. Run the PipelineC tool (~ minutes to several hours)
cd ~/src/project_data/PipelineC/;
rm -r /home/centos/pipelinec_syn_output; 
python -u ./src/pipelinec 2>&1 | tee out.log

These steps require 32+ GB of RAM for your instance:

  4. Build the Vivado checkpoint that will be turned into an Amazon FPGA Image (AFI) (~ several hours)
cd $CL_DIR/build/scripts
./aws_build_dcp_from_cl.sh
  5. Wait for Vivado to finish and put the checkpoint file in $CL_DIR/build/checkpoints/to_aws/
ls -lt $CL_DIR/build/checkpoints/to_aws/ | grep .tar | head -n 1
# Set these environment variables based on your output
export TARTIMESTAMP=20_03_20-103330
export TARFILENAME=$TARTIMESTAMP.Developer_CL.tar

These steps require very little RAM:

  6. Copy the checkpoint to Amazon S3 for Amazon to do their magic (requires AWS credentials to be set up)
# Set environment vars needed 
export REGION=us-east-1
export S3BUCKET=pipelinec
export S3DCPDIRNAME=dcps
export S3LOGSDIRNAME=logs
aws s3 mb s3://$S3BUCKET --region $REGION  # Create an S3 bucket (choose a unique bucket name)
aws s3 mb s3://$S3BUCKET/$S3DCPDIRNAME/   # Create folder for your tarball files
aws s3 cp $CL_DIR/build/checkpoints/to_aws/$TARFILENAME s3://$S3BUCKET/$S3DCPDIRNAME/    # Upload the file to S3
# Make room for Amazon's log file on S3
aws s3 mb s3://$S3BUCKET/$S3LOGSDIRNAME/  # Create a folder to keep your logs
touch LOGS_FILES_GO_HERE.txt                     # Create a temp file
aws s3 cp LOGS_FILES_GO_HERE.txt s3://$S3BUCKET/$S3LOGSDIRNAME/   # Which creates the folder on S3
  7. Tell Amazon to generate an AFI using those S3 files
export AFI_NAME=pipelinec
export AFI_DESC=aws_example
aws ec2 create-fpga-image --region $REGION --name $AFI_NAME --description $AFI_DESC --input-storage-location Bucket=$S3BUCKET,Key=$S3DCPDIRNAME/$TARFILENAME --logs-storage-location Bucket=$S3BUCKET,Key=$S3LOGSDIRNAME
# Set these environment variables based on your output
export AFIID=afi-0a418ee223c9a814c
export AGFIID=agfi-0148fb8a218d50b49
  8. Wait for Amazon to say your AFI is 'available' (~ a few hours)
aws ec2 describe-fpga-images --fpga-image-ids $AFIID | grep "Code" 

These steps require an F1 FPGA instance:

  9. Start working with real FPGA hardware (must be on an F1 instance now)
# Clear FPGA (30s)
sudo fpga-clear-local-image  -S 0
# Load FPGA (30s)
sudo fpga-load-local-image -S 0 -I $AGFIID
# Reset (pcie reset)
sudo fpga-describe-local-image -S 0 -R -H
  10. Run the test (rebuild, reset the FPGA again, run ./test)
cd /home/centos/src/project_data/PipelineC/examples/aws-fpga-dma
reset; make clean; make && sudo fpga-describe-local-image -S 0 -R -H && sudo ./test