Portable HSA Runtime

Phsa-runtime is a generic implementation of the HSA Runtime Specification 1.0. It is designed to be used with GCC's BRIG frontend for finalization support. It has been tested on Debian-based Linux systems, but should port easily to other operating systems.

Together with GCC's BRIG frontend, phsa-runtime can be used to implement a base profile HSA 1.0 software stack with HSAIL finalization support for any GCC-supported target.

Phsa-runtime was developed by Parmance for General Processor Technologies, which published it as an open source project in November 2016.

Status

Phsa-runtime fully passes the HSA runtime conformance test suite. A few HSA API methods are not implemented yet, and a small number of the implemented ones might be missing features.

Requirements

  • CMake 2.8+
  • A C++ compiler with C++11 support
  • Boost Thread library (libboost-thread-dev)
  • Boost Filesystem library (libboost-filesystem-dev)
  • Boost System library (libboost-system-dev)
  • libelf (libelf-dev)
  • GCC BRIG frontend with a backend for the desired device to be used as a kernel agent

Installation

The familiar CMake build process is used:

mkdir build
cd build
cmake ..
make
make install

Note: HSA client programs typically link to a library named 'libhsa-runtime64.so' while phsa-runtime builds a library called 'libphsa-runtime64.so'. Thus, it's a good idea to create a symlink 'libhsa-runtime64.so -> libphsa-runtime64.so' in the install location.

GCC BRIG frontend

When a GCC version with the BRIG frontend ('gccbrig' binary) is installed to PATH and libhsail-rt can be found by both the static and the dynamic linker, GCC-based finalization should just work.

To use the GCC BRIG frontend from a GCC build tree without installing it (useful for GCC developers), a few environment variables need to be set. An example of how to set up such an environment is shown below:

# This is the location of the root of your GCC build.
GCC_BUILDROOT=$HOME/src/gccbrig

export LIBRARY_PATH=$GCC_BUILDROOT/build-master/gcc

export PHSA_COMPILER_FLAGS=-L$LIBRARY_PATH

# The location of libhsail-rt.so for your target in the
# GCC's build tree.
export PHSA_RUNTIME_DIR="$GCC_BUILDROOT/x86_64-pc-linux-gnu/libhsail-rt/.libs"

# Ensure phsa-runtime's finalizer's linker finds libhsail-rt.
export LDFLAGS="-L$PHSA_RUNTIME_DIR"

# Ensure the runtime linker finds it as well.
export LD_LIBRARY_PATH=$PHSA_RUNTIME_DIR:$LD_LIBRARY_PATH

# Add 'gccbrig' to the PATH from the build tree.
export PATH=$GCC_BUILDROOT/gcc/:$PATH

License

The code base is licensed under the permissive MIT license, which allows phsa-runtime to be adopted for both closed and open source purposes.

Repository structure

The source tree is structured as follows:

.
├── include             Public headers
└── src
    ├── common          Common helpers
    ├── Devices         Agent implementations
    │   └── CPU         CPU agent that uses the host machine's cores for kernel execution
    ├── Finalizer       Finalizer implementations
    │   └── GCC         The default finalizer that uses the GCC BRIG frontend
    ├── hsa             HSA API method implementations
    └── Platform        Runtime implementations
        └── CPUOnly     Platform that only uses CPU agents for execution


Code Base Internals

The code base of phsa-runtime is a relatively straightforward class hierarchy. When porting phsa-runtime to a new platform, the relevant classes are Runtime, MemoryRegion, Agent, Queue and Signal, which are covered below.

class Runtime (Runtime.hh, Runtime.cc)

A good place to start when porting phsa-runtime to a new platform is the Runtime class. This class is the central point for holding the objects that are used to control an HSA-enabled platform. The recommended practice is to derive a new Runtime implementation for the platform at hand, and to create an instance of that class in Runtime::get().

In the Runtime class for the new target, the constructor should initialize all the objects used for controlling the resources of the platform. For example, MemoryRegion objects keep track of allocations from the different HSA memory regions, and Agent objects control the kernel agents associated with the platform. In addition, the extension list, which currently supports only the finalizer extension, must be filled. Typically GCCFinalizer is instantiated here as the finalizer implementation.

In addition to instantiating the platform-specific implementations of HSA objects, the Runtime interface has methods for instantiating certain HSA objects. The subclass should override createSoftQueue() and createSignal() to return objects of the derived types.
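
The subclassing pattern described above can be sketched roughly as follows. This is only an illustration: the Queue, Signal and Runtime declarations are minimal stand-ins, and their member signatures, as well as the names MyQueue, MySignal, MyPlatformRuntime and demoInitialSignalValue, are assumptions rather than the actual phsa-runtime interfaces.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

// Minimal stand-ins for the real interfaces (signatures are assumed).
struct Queue {
  virtual ~Queue() {}
  virtual std::string kind() const = 0;
};

struct Signal {
  virtual ~Signal() {}
  virtual std::int64_t value() const = 0;
};

class Runtime {
public:
  virtual ~Runtime() {}
  // Factory methods that the platform subclass overrides.
  virtual std::unique_ptr<Queue> createSoftQueue() = 0;
  virtual std::unique_ptr<Signal> createSignal(std::int64_t initial) = 0;
  // Returns the process-wide runtime instance.
  static Runtime &get();
};

// Hypothetical platform-specific types.
struct MyQueue : Queue {
  std::string kind() const override { return "my-platform soft queue"; }
};

class MySignal : public Signal {
public:
  explicit MySignal(std::int64_t v) : v_(v) {}
  std::int64_t value() const override { return v_; }

private:
  std::int64_t v_;
};

class MyPlatformRuntime : public Runtime {
public:
  MyPlatformRuntime() {
    // A real constructor would set up the memory regions, the agents
    // and the finalizer extension here.
  }
  std::unique_ptr<Queue> createSoftQueue() override {
    return std::unique_ptr<Queue>(new MyQueue());
  }
  std::unique_ptr<Signal> createSignal(std::int64_t initial) override {
    return std::unique_ptr<Signal>(new MySignal(initial));
  }
};

// Runtime::get() is the single place that selects the platform class.
Runtime &Runtime::get() {
  static MyPlatformRuntime instance;
  return instance;
}

// Client code only ever talks to the base interfaces.
inline std::int64_t demoInitialSignalValue() {
  std::unique_ptr<Signal> s = Runtime::get().createSignal(1);
  assert(s != nullptr);
  return s->value();
}
```

Keeping the choice of the concrete class inside Runtime::get() means the HSA API layer does not need to know which platform it is running on.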

An example implementation is the CPURuntime (CPURuntime.hh, CPURuntime.cc) class. The implementation includes a single "CPU agent" that utilizes GCCFinalizer for finalizing HSAIL programs for the host CPU. CPU agents are simply kernel agents that run on the same set of processor cores as the host program. Platforms with only CPU agents are assumed to provide fine-grained coherent virtual memory, thanks to the memory hierarchy shared between the cores.

class MemoryRegion (MemoryRegion.hh)

Keeps track of allocations from the different HSA memory regions in the platform.

FixedMemoryRegion (FixedMemoryRegion.hh, FixedMemoryRegion.cc) is a MemoryRegion implementation that returns chunks of memory from a fixed (virtual) address region with a straightforward (but fast) allocation algorithm. This is useful for platforms with heterogeneous devices with their own local physical memories mapped to certain physical address ranges that are in turn mapped to the process' virtual memory via mmap() or a similar mechanism. The standard malloc() cannot be used for allocation in that case as one must ensure chunks are returned from that address range only.
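
To illustrate what allocating from a fixed address range can look like, here is a minimal sketch of a bump allocator. The README does not say which algorithm FixedMemoryRegion actually uses, so the scheme, the class name BumpRegion and its methods are assumptions made for illustration. The sketch only does address arithmetic and never frees individual chunks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical illustration; not the actual FixedMemoryRegion code.
class BumpRegion {
public:
  BumpRegion(std::uintptr_t base, std::size_t size)
      : base_(base), end_(base + size), next_(base) {}

  // Returns the start address of a chunk of `bytes` bytes aligned to
  // `align` (a power of two), or 0 if the region cannot satisfy it.
  std::uintptr_t allocate(std::size_t bytes, std::size_t align = 64) {
    assert(align != 0 && (align & (align - 1)) == 0);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(align) - 1;
    const std::uintptr_t start = (next_ + mask) & ~mask;
    // Reject on wrap-around or when the chunk would leave the range.
    if (start < next_ || start > end_ || bytes > end_ - start)
      return 0;
    next_ = start + bytes;
    return start;
  }

  // True if `addr` lies inside the fixed address range.
  bool contains(std::uintptr_t addr) const {
    return addr >= base_ && addr < end_;
  }

private:
  std::uintptr_t base_; // First address of the range.
  std::uintptr_t end_;  // One past the last address of the range.
  std::uintptr_t next_; // Next unallocated address.
};
```

Every address returned by allocate() is guaranteed to lie within the configured range, which is the property that malloc() cannot provide in this setting.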

Instead of using a custom allocator, CPUMemoryRegion (CPUMemoryRegion.hh, CPUMemoryRegion.cc) uses the standard malloc() of the host system. It can be used for platforms with only CPU agents.

class Agent (Agent.hh)

A derived type of the KernelDispatchAgent (Agent.hh) interface should be implemented for all the different types of devices in the platform that one wants to run kernels on. A KernelDispatchAgent implementation controls an agent that processes packets from user mode queues (the Queue interface) and executes them.

An example implementation, CPUKernelAgent (CPUKernelAgent.hh, CPUKernelAgent.cc), implements a kernel agent that both processes queues and executes kernel dispatch packets on the host CPU. The kernel functions are assumed to be loaded into the same process, and are then simply called from a thread dedicated to processing queues.

On heterogeneous platforms, KernelDispatchAgent implementations orchestrate the execution with the agent device using target-specific communication and synchronization mechanisms. The division of responsibilities between processing the queues and executing kernels is target dependent. If the device supports processing user mode queues directly, the implementation can be as simple as initializing the device in the constructor and finalizing it in shutDown(). In other cases a host thread can process the user mode queues itself and delegate only single kernel packets to the agent.

class Queue (Queue.hh)

An interface for implementing user mode queues and soft queues. The default implementation is UserModeQueue (UserModeQueue.hh, UserModeQueue.cc), which should work for most purposes. It allocates the actual queue from a specified MemoryRegion and updates the different indices using GCC's __atomic_* builtins.
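
As a rough illustration of updating queue indices with GCC's __atomic_* builtins, consider the sketch below. The QueueIndices struct and the helper function names are hypothetical and do not reflect the layout or API of UserModeQueue. The sketch assumes the queue size is a power of two, so that an index can be mapped to a slot with a bit mask.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical index pair of a ring buffer; not phsa-runtime's layout.
struct QueueIndices {
  std::uint64_t write_index;
  std::uint64_t read_index;
};

// Producer side: atomically reserve the next packet slot and return
// the index that was reserved.
inline std::uint64_t reserveWriteIndex(QueueIndices &q) {
  return __atomic_fetch_add(&q.write_index, 1, __ATOMIC_ACQ_REL);
}

// Consumer side: publish that all packets below `next` are consumed.
inline void storeReadIndex(QueueIndices &q, std::uint64_t next) {
  __atomic_store_n(&q.read_index, next, __ATOMIC_RELEASE);
}

inline std::uint64_t loadReadIndex(QueueIndices &q) {
  return __atomic_load_n(&q.read_index, __ATOMIC_ACQUIRE);
}

// Map a monotonically increasing index to a ring buffer slot.
inline std::uint64_t slotFor(std::uint64_t index, std::uint64_t size) {
  assert(size != 0 && (size & (size - 1)) == 0);
  return index & (size - 1);
}

// True when the slot for write index `write` is not free yet.
inline bool isFull(QueueIndices &q, std::uint64_t write,
                   std::uint64_t size) {
  return write - loadReadIndex(q) >= size;
}
```

A producer would typically call reserveWriteIndex(), wait while isFull() holds for the returned index, and then fill the packet at slotFor(index, size).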

class Signal (Signal.hh)

The concept of signals is opaque in HSA. This class implements the different hsa_signal_* APIs for atomic variables that work globally in the heterogeneous platform at hand. The default implementation, GCCBuiltinSignal (GCCBuiltinSignal.hh, GCCBuiltinSignal.cc), uses GCC's atomic builtins. This matches GCC's BRIG frontend, whose runtime library also uses them for accessing signal values in HSAIL.
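
A signal backed by GCC's atomic builtins might look roughly like the following sketch. The class SketchSignal and its method names are hypothetical and are not taken from GCCBuiltinSignal. The bounded busy-wait merely stands in for a real wait implementation, which would also handle timeouts and wait hints.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical signal wrapper; names are not phsa-runtime's.
class SketchSignal {
public:
  explicit SketchSignal(std::int64_t initial) : value_(initial) {}

  std::int64_t loadAcquire() {
    return __atomic_load_n(&value_, __ATOMIC_ACQUIRE);
  }

  void storeRelease(std::int64_t v) {
    __atomic_store_n(&value_, v, __ATOMIC_RELEASE);
  }

  void subtractAcqRel(std::int64_t v) {
    __atomic_sub_fetch(&value_, v, __ATOMIC_ACQ_REL);
  }

  // Compare-and-swap; returns the value observed before the operation.
  std::int64_t casAcqRel(std::int64_t expected, std::int64_t desired) {
    // On failure the builtin writes the current value into `expected`.
    __atomic_compare_exchange_n(&value_, &expected, desired, false,
                                __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
    return expected;
  }

  // Polls until the value equals `expected`; gives up after `maxSpins`
  // polls and returns false.
  bool waitEq(std::int64_t expected, std::uint64_t maxSpins) {
    assert(maxSpins > 0);
    for (std::uint64_t i = 0; i < maxSpins; ++i)
      if (loadAcquire() == expected)
        return true;
    return false;
  }

private:
  std::int64_t value_;
};
```

A common use is a completion signal: the host initializes it to 1, the agent calls subtractAcqRel(1) when the kernel finishes, and the host waits for the value 0.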

class GCCFinalizer (GCCFinalizer.hh, GCCFinalizer.cc)

This class is an interface between phsa-runtime and GCC's BRIG frontend. It calls a BRIG-enabled GCC via the command line to finalize the incoming HSAIL programs encoded as BRIG binaries. The default implementation links the ELF produced by GCC into a dynamic library that can be loaded into the current process via dlopen() for execution.
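
The loading step can be sketched with the POSIX dlopen() and dlsym() calls as shown below. The KernelEntry type, the loadKernel() helper and its parameters are hypothetical; the actual entry point signature is defined by the GCC BRIG frontend and is not reproduced here. On older glibc versions the program may need to be linked with -ldl.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <string>

// Hypothetical kernel entry point type, for illustration only.
typedef void (*KernelEntry)(void *args);

// Loads `libraryPath` into the current process and looks up `symbol`.
// Returns a null pointer if either step fails; `error` then holds the
// reason reported by the dynamic linker.
inline KernelEntry loadKernel(const std::string &libraryPath,
                              const std::string &symbol,
                              std::string &error) {
  assert(!libraryPath.empty() && !symbol.empty());

  void *handle = dlopen(libraryPath.c_str(), RTLD_NOW | RTLD_LOCAL);
  if (handle == nullptr) {
    const char *msg = dlerror();
    error = msg ? msg : "dlopen failed";
    return nullptr;
  }

  dlerror(); // Clear any stale error state before calling dlsym().
  void *address = dlsym(handle, symbol.c_str());
  if (address == nullptr) {
    const char *msg = dlerror();
    error = msg ? msg : "symbol not found";
    dlclose(handle);
    return nullptr;
  }

  // The handle is intentionally kept open: the library must stay
  // loaded for as long as the returned function may be called.
  return reinterpret_cast<KernelEntry>(address);
}
```

For a non-CPU device, a derived finalizer would replace this step with whatever mechanism uploads the ELF binary to the device.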

When porting the finalizer to a new GCC-supported device, this class should be derived and adapted such that the returned ELF binary is loaded correctly for the device at hand. The default CPU agent implementation serves merely as an example in that case.

During porting or bug hunting, it can be useful to have GCC's intermediate files from the compilation process dumped for closer inspection. This behavior can be enabled by setting the environment variable PHSA_DEBUG_MODE to 1.