Skip to content
Permalink
Branch: master
Find file Copy path
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
201 lines (147 sloc) 8.44 KB
the "hip" directory contains ports of rodinia applications using the new HIP API.
These can be compiled to CUDA machines (using NVCC), or AMD GPUs (using hcc).
Porting flow:
#1: Selecting a benchmark:
* See issues below if you want to select benchmarks that avoid known issues.
* Some benchmarks have large inputs.
* For this example we selected "b+tree".
* Assume we start in the base "hip" directory from a "git clone" command.
* Also assume we have the original rodinia source code available in ../rodinia_3.0/
#2: Copy the rodinia benchmark source into HIP/examples.
# info: the HIP examples directory contains only a subset of rodinia, built
# up as we add new workloads. So, the first step is to copy into HIP git
# the original source code (in cuda, opencl, openmp) and data.
# here are the commands:
> cp -r ../rodinia_3.0/cuda/b+tree/ examples/rodinia_3.0/cuda/
> cp -r ../rodinia_3.0/opencl/b+tree/ examples/rodinia_3.0/opencl
> cp -r ../rodinia_3.0/openmp/b+tree/ examples/rodinia_3.0/openmp
> cp -r ../rodinia_3.0/data/b+tree/ examples/rodinia_3.0/data
# Note the "data" directory may not exist for some benchmarks (some generate their own data automatically).
# add new files to git - do this now, while directories are clean:
> git add examples/rodinia_3.0/*/b+tree
> git commit -m "Add new rodinia sample b+tree"
#3: Make sure targetpaths work correctly:
All targets (cuda, opecl, openmp, hip) have the same make and run scripts:
Warning : some of the run commands run different inputs.
Also different targets have inconsistent output formats (some print info messages, timing, etc).
This makes it difficult to "diff" say OpenMP and CUDA outputs.
Suggest running each by-hand, and comparing to ensure the results are "reasonable".
In some cases it may be beneficial to modify source to ensure correct outputs are generated.
Here are commands for openmp - likely the most reliable:
#OpenMP:
> cd examples/rodinia_3.0/openmp/b+tree
> make
> ./run > run.out
> cd ../../..
#OpenCL:
> cd examples/rodinia_3.0/opencl/b+tree
> make
> ./run > run.out
> cd ../../..
#CUDA - make sure to check this one as it will be the base for the hip work:
# Obviously need to be on a CUDA box for this:
# See note below on cudaMemcpyToSymbol - some rodiniai samples may need minor changes to run correctly out-of-the-box.
> cd examples/rodinia_3.0/cuda/b+tree
> make
> ./run > run.out
> cd ../../..
#4: Set up hip tree (starting from CUDA, potentially modified)
> cp -r examples/rodinia_3.0/cuda/b+tree/ examples/rodinia_3.0/hip/
> git add examples/rodinia_3.0/hip/b+tree
> git commit -m "Add hip version of rodinia b+tree"
# Now create a directory to contain the original cuda source.
# strategy is to "hipify" from cusrc/some_file.cu to ./some_file.cu - this
> cd examples/rodinia_3.0/hip/
> mkdir cusrc
# Move any files containing cuda code here (these will be "hipified")
# other files may also need to be moved, search source for calls to "cuda*"
# or files that contain cuda kernels.
> mv *.cu *.cuh cusrc
# create a "hipcmd.ch".
# No specific template for this, but use ../hip/kmeans/hipcmd.sh as an example.
# Responsibility are:
# - make hip, am, grid_launch object files.
# - hipify cusrc/* files into ./ directory (keep file name the same)
# - call make
# modify Makefile :
# No specific template for this, but use ../hip/kmeans/Makefile as an example.
# - use "$(HIPCC)" and $(HIPCC_FLAGS) compiler to compile *.cu files.
# - use "$(HIPLD) as the linker (replace GCC or CUDA). Note on CUDA platforms
# HIPLD includes linker flags such as -lcuda -lcudart so these can be removed from Makefile
# and just use $(HIPLD).
# - make default target and executable name match name of rodinia test, ie "b+tree".
# - note dbg is handled in "make.config", so maybe can remove from Makefile here.
#
# Note supporting commands and files are defined in ../../common/make.config - ensure the
# hip/Makefile includes this file (likely already does).
#5: Create test infrastructure.
The "test" directory contains run commands and "golden" outputs for each workload.
This is not part of standard rodinia but is a useful enhancement.
Some small amount of setup needs to be done to use the test infrastructure:
- The common/make.config file contains make steps for testing, including "make test". Ensure that
make.config is included in the hip/Makefile.
- most rodinia tests contain a "run" script that runs the test, and also some tests have data input
files in data/TESTNAME. Copy the "run" script to tests/TESTNAME/run*.cmd, and copy/modify it
as desired to create multiple run*.cmd files. The test harness looks for 'tests/TESTNAME/run*.cmd' files
and will run all that are found.
When the test passes, add it to the list of tests in hip/Makefile.
#6: Collecting stats
- Track how long it takes to port the application using HIP tools.
- cusrc can be used to determine how many LOC have to be manually changed from the original cuda source.
Contributing to rodinia:
Modifications and comments should be done with the goal to contribute the changes back to rodinia source
as a updated version. The improvements that the update will provide:
* Add HIP paths for benchmarks. Portable path to CUDA and AMD.
* Update cuda benchmarks to work with cuda7.
* Add run and functional test framework.
Known issues, workarounds, and tips:
###
#1 rodinia uses deprecated "string" form of cudaMemcpyToSymbol.
In modern versions of CUDA the first parameter to cudaMemcpyToSymbol is the symbol reference ; in older
versions the symbol was passed with surrounding "". Rodinia uses the now-deprecated syntax and
as a result the cudaMemcpyToSymbol returns an error and does not actually perform the copy. The
fix is to remove the surrounding quotes. In cases where this occurs, this should be modified in
both the cuda and
###
#2 cudaMemcpyToSymbol is not yet supported by HIP.
The workaround is to use cudaMemcpy to initialize the symbol, and to pass the symbol to
the PFE kernel as an additional parameter. The kernel code that references the constant
should reference the new parameter.
The workaround code should be bracketed with #ifdef USE_CONSTANT_BUFFER ; this allows
the workaround to be easily removed for testing purposes (ie to run cuda code).
###
#3 cuda Textures are not yet supported by HIP. The workaround is to pass symbols directly to kernel
#ifdef USE_TEXTURES to remove calls to cudaCreateChannelDesc, cudaBindTexture.
Replace this with calls to cudaMemcpy, pass parameters to the kernel.
Inside the kernel, replace calls tex1Dfetch with direct load accesses, referencing the new parameter.
Example from hip/kmeans/kmeans_cuda_kernel.cu:
#ifdef USE_TEXTURES
float diff = (tex1Dfetch(t_features,addr) -
#else
float diff = (features[addr] -
#endif
###
#5 Convert kernel header convention to pass hipLaunchParm as first parameter to the kernel.
See kmeans/kmeans_cuda_kernel.cu for an example.
###
#6 Some rodinia files do not compile with clang or generate format warnings for printf. For
these, fix the code in cusrc/*.
###
#7
Some rodinia code assumes multiple GPUs are present, and may call 'cudaSetDevice(1)" even if the system
has a single GPUs. hipSetDevice returns an error, but this may be ignored. Change the Rodinia code to
get the device count (cudaGetDeviceCount) and ensure a valid device id is passed to cudaSetDevice.
###
#8
What to do if the program compiles but doesn't run correctly?
Be conservative in synchronization:
- Set HIP_LAUNCH_BLOCKING=1
This will make HIP APIs 'host-synchronous', so they block until any kernel launches or data copy commands complete.
If this fixes the problem, it can indicate that an asynchronous command is referencing stack data,
or that the necessary synchronization is not being applied.
Trace the APIs and return codes to see if any HIP APIs return non-zero error codes.
- Set HIP_TRACE_API=1
For a list of other HIP environment variables, set HIP_PRINT_ENV=1.
###
#10 OpenMP support - recent versions of HCC include support for OpenMP 4.0 directives (which target the CPU.)
Older versions of HCC might generate errors when openmp files were included or openmp pragmas were used.
You can’t perform that action at this time.