Alex Brown edited this page Aug 15, 2022 · 61 revisions

Tensile is a tool for creating a benchmark-driven backend library for GEMMs, GEMM-like problems (such as batched GEMM), N-dimensional tensor contractions, and anything else that multiplies two multi-dimensional objects together on AMD GPUs.

Overview of creating a custom TensileLib backend library for your application:

  1. Install PyYAML, CMake, OpenMP, MessagePack, and the other mandatory dependencies, then git clone Tensile and cd into it.
  2. Create a benchmark config.yaml file in ./Tensile/Configs/.
  3. Run the benchmark. When the benchmark finishes, Tensile writes the numbered directories described below: 1 and 2 relate to benchmarking, while 3 and 4 are the summarized results from your library's (e.g. rocBLAS) point of view.

0_Build: contains the client executable, which runs problems from the library's point of view.

1_BenchmarkProblems: contains all the problem descriptions and executables generated during benchmarking; you can re-launch the script (run.sh) to reproduce the results.

2_BenchmarkData: contains the raw performance results for all kernels, in CSV and YAML formats.

3_LibraryLogic: contains the YAML files with the optimal kernel configurations. rocBLAS usually takes its logic files from this directory.

4_LibraryClient: contains the code objects, kernels, and library code. This is the output of running TensileCreateLibrary with the 3_LibraryLogic directory as input.

  4. Add the Tensile library to your application's CMake target. The Tensile library is then written, compiled, and linked into your application at application compile time.
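After a benchmark run, a quick way to see which of the numbered output directories were produced is a small shell check. This is a minimal sketch: the helper name check_tensile_outputs is hypothetical, the directory names are taken from the list above, and depending on your config not every directory may be generated.

```bash
#!/usr/bin/env bash
# Sketch: report which numbered Tensile output directories exist under the
# output path that was passed to bin/Tensile. Helper name is hypothetical.

TENSILE_OUTPUT_DIRS=(0_Build 1_BenchmarkProblems 2_BenchmarkData 3_LibraryLogic 4_LibraryClient)

# check_tensile_outputs OUTDIR
# Prints "found" or "missing" for each expected directory and
# returns 1 if any of them is absent.
check_tensile_outputs() {
  local outdir="$1" missing=0 d
  for d in "${TENSILE_OUTPUT_DIRS[@]}"; do
    if [ -d "$outdir/$d" ]; then
      echo "found   $d"
    else
      echo "missing $d"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, running `check_tensile_outputs .` from the build directory lists each directory and exits non-zero if one is missing.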

Quick Example (Ubuntu):

sudo apt-get install git cmake python3 python3-yaml libomp-dev libboost-program-options-dev libmsgpack-dev llvm-6.0-dev
mkdir Tensile
cd Tensile
git clone https://github.com/ROCmSoftwarePlatform/Tensile repo
cd repo
git checkout master
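Before running benchmarks, you may want to confirm that the tools installed above are actually on your PATH. A minimal sketch, assuming bash; missing_commands is a hypothetical helper, not part of Tensile.

```bash
#!/usr/bin/env bash
# Sketch: list which of the given commands cannot be found on PATH.
# The helper name is hypothetical and not part of Tensile.

# missing_commands CMD...
# Prints each missing command on its own line; returns 1 if any is missing.
missing_commands() {
  local cmd status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      status=1
    fi
  done
  return "$status"
}
```

For example, `missing_commands git cmake python3 && python3 -c 'import yaml'` prints nothing and succeeds when the command-line tools and PyYAML are available.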

You are now ready to run benchmarks using Tensile. A sample tuning file can be found in ./Tensile/Configs/rocblas_sgemm_example.yaml. This file generates a library for gfx1030. If you are running on a different architecture, you will first need to edit the line that says ArchitectureName: "gfx1030" at the bottom of the file. Then, you can run the benchmark as follows:

mkdir build
cd build
../Tensile/bin/Tensile ../Tensile/Configs/rocblas_sgemm_example.yaml ./
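The ArchitectureName edit mentioned above can also be scripted when retargeting the sample config to another GPU. This is a sketch assuming GNU sed (the Ubuntu default) and a config line of the form ArchitectureName: "gfxNNNN"; set_arch and detect_arch are hypothetical helpers, and detect_arch assumes rocminfo from ROCm is installed and that the first gfx name it prints is the GPU you want to target.

```bash
#!/usr/bin/env bash
# Sketch: retarget a Tensile tuning config to another GPU architecture.
# Assumes GNU sed and a line like  ArchitectureName: "gfx1030"  in the config.
# Both helper names are hypothetical.

# set_arch CONFIG ARCH
# Replaces the quoted value after ArchitectureName: in place,
# keeping any leading indentation on that line.
set_arch() {
  local config="$1" arch="$2"
  sed -i -E "s/(ArchitectureName:[[:space:]]*)\"[^\"]*\"/\1\"${arch}\"/" "$config"
}

# detect_arch
# Prints the first gfx target name reported by rocminfo (assumes ROCm is installed).
detect_arch() {
  rocminfo 2>/dev/null | grep -o -m1 'gfx[0-9a-f]*'
}
```

For example, from the build directory and before launching the benchmark: `arch="$(detect_arch)"; [ -n "$arch" ] && set_arch ../Tensile/Configs/rocblas_sgemm_example.yaml "$arch"`. The guard avoids writing an empty architecture name if rocminfo reports nothing.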

After about a minute of benchmarking, Tensile writes a YAML file describing the winning kernels to the 3_LibraryLogic directory. Spreadsheets and YAML files with the benchmark data for all kernels are in the 2_BenchmarkData directory. The client is built at the very beginning of the run and is cached for future runs, provided the output directory and the client build files are unchanged. To use the client, do the following:

./0_Build/client/tensile_client -h
./0_Build/client/tensile_client --problem-size=5760,5760,1,5760 --library-file=path/to/TensileLibrary.yaml --code-object=path/to/*hsaco --code-object=path/to/*TensileLibrary.co --problem-identifier=Cijk_Ailk_Bjlk
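When the library files live in one directory, the --library-file and --code-object arguments can be collected with find instead of typed by hand. A minimal sketch: client_args is a hypothetical helper, it assumes the directory holds exactly one TensileLibrary.yaml plus the .hsaco and .co code objects to load, and it uses only the client flags shown in the example above.

```bash
#!/usr/bin/env bash
# Sketch: assemble tensile_client arguments from a directory of library files.
# Assumes LIBDIR contains one TensileLibrary.yaml plus .hsaco/.co code objects.
# Only flags from the example above are emitted; the helper name is hypothetical.

# client_args LIBDIR PROBLEM_SIZE PROBLEM_ID
# Prints one client argument per line; returns 1 if no TensileLibrary.yaml is found.
client_args() {
  local libdir="$1" size="$2" ident="$3" lib f
  lib=$(find "$libdir" -name 'TensileLibrary.yaml' -print -quit)
  [ -n "$lib" ] || { echo "no TensileLibrary.yaml under $libdir" >&2; return 1; }
  echo "--problem-size=$size"
  echo "--library-file=$lib"
  # One --code-object flag per code object file found.
  while IFS= read -r f; do
    echo "--code-object=$f"
  done < <(find "$libdir" -type f \( -name '*.hsaco' -o -name '*.co' \) | sort)
  echo "--problem-identifier=$ident"
}
```

For example: `mapfile -t args < <(client_args path/to/library 5760,5760,1,5760 Cijk_Ailk_Bjlk)` followed by `./0_Build/client/tensile_client "${args[@]}"`. Printing one argument per line and reading them back with mapfile keeps paths containing spaces intact; paths containing newlines are not supported.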