Autotuning

This page may be outdated - please check our latest crowd-tuning report (sponsored by the Raspberry Pi foundation) and the compiler autotuning wiki page.

Introduction

Before using autotuning in CK, you should first read the Getting Started Guide, Portable-workflows, and the introduction to CK-powered compiler flag autotuning.

The main idea of CK is to decouple multi-objective autotuning, design space exploration, machine learning and run-time adaptation from the underlying platform (compilers, libraries, OS, hardware) and programs, and to treat them as physical objects with a unified API (our original background is in physics, quantum electronics and machine learning), as described in P1, P2 and P3.

The community can share workloads, exploration tools, machine learning models, data sets, compilers, hardware descriptions as reusable components with a unified JSON API and JSON meta information, and then assemble various research workflows such as autotuning and predictive modeling as LEGO(tm)!

We have implemented this approach and successfully used it in many projects shared as reproducible, reusable and extensible CK workflows via GitHub.

If you have any questions or suggestions, please feel free to get in touch with the community via this public mailing list or LinkedIn group.

Examples of customizable autotuning workflows

Obtaining the workflow and autotuning plugins

We share our code and data as reusable components with a JSON API and meta-description, packed into so-called CK repositories. CK repositories can be shared via GitHub or any other Git repository.

For example, we have converted most public programs used in our papers on program autotuning into the CK repository ctuning-programs available at https://github.com/ctuning/ctuning-programs. You can install this repository as follows:

 $ ck pull repo:ctuning-programs

Note that CK will also automatically and recursively install all the dependent CK repositories:

  • ctuning-datasets-min (minimal data sets required for the above programs)
  • ck-autotuning (CK modules to perform autotuning)
  • ck-env (CK modules to manage multiple versions of tools, libraries, etc.)
  • ck-analytics (CK modules to perform statistical analysis of experiments, visualization and machine learning)

You may also wish to install the ck-web repository for viewing entries, experiments and graphs in a user-friendly format via any web browser (useful for discussing results in workgroups, etc.):

 $ ck pull repo:ck-web

To view examples in the CK web front-end please visit:

To update all these repositories at once, use:

 $ ck pull all

or to update together with the CK kernel:

 $ ck pull all --kernel

Understanding the CK repository structure

Before running the shared autotuning experimental workflow, we briefly introduce the structure of a CK repository.

You can find the installed CK repositories in the following local directories:

  • $HOME/CK (on Linux)
  • %USERPROFILE%\CK (on Windows)

You should now see at least the following sub-directories there:

  • ck-analytics
  • ck-autotuning
  • ck-env
  • ctuning-datasets-min
  • ctuning-programs

Each directory has its own .ckr.json file with a unique identifier (UID) of the given repository, its alias, notes, URL (if shared) and dependencies on other repositories.

In CK, each repository is associated with an API module that is used to describe and access its artifacts. This module, to some extent, becomes a container for the artifacts. For example, all programs are associated with a module called program, all data sets - with a module called dataset, etc. Therefore, each repository has the structure repo_dir/module_dir/data_dir. This structure protects user experimental setups from changes in software and hardware: the module deals with such changes by setting the environment for multiple versions of a given tool or hardware.

For example, in ctuning-programs/program, you can find sub-directories (so-called CK entries) with shared benchmarks that we used in our past research, including cbench-automotive-susan; in ctuning-datasets-min/dataset, you can find sub-directories with shared data sets for the above programs, including image-pgm-0001/data.pgm which can be used as input for cbench-automotive-susan.

Finally, each CK entry contains arbitrary user files/sub-directories/archives in any format, together with a .cm sub-directory (Collective Meta) which describes the entry and connects it with other modules/entries via meta.json and info.json files.

Note that all modules and entries always have an assigned unique ID (UID) of 16 lowercase hexadecimal characters, and optionally an alias. For example, the CK data entry with UID ffbf51c23a91343c has the alias cbench-automotive-susan, while the CK data entry with UID b2130844c38e4a56 has the alias image-pgm-0001.

Whenever an alias is used for an entry name instead of a UID, a '.cm' sub-directory is created at the same level as the entries' directory, with two text files per entry:

  • .cm/alias-a-${alias}, containing UID.
  • .cm/alias-u-${UID}, containing alias.

For example, if you find the cbench-automotive-susan entry:

 $ ck find program:cbench-automotive-susan

and change to its parent directory containing all the program entries, then in the .cm sub-directory you will find two text files:

  • .cm/alias-a-cbench-automotive-susan containing ffbf51c23a91343c
  • .cm/alias-u-ffbf51c23a91343c containing cbench-automotive-susan

While this may seem somewhat redundant, it allows us to considerably speed up searching for entries by the so-called UOA (UID Or Alias) where both UID and alias can be used interchangeably to refer to the same entry.

Furthermore, this allows us to directly reference and quickly locate any CK entry via its Collective ID (CID), a colon-separated combination of its module UOA and its entry UOA: CID=<module_UOA>:<data_UOA>. In a sense, a CID is similar to a DOI (Digital Object Identifier), though decentralized.

For example, you can locate cbench-automotive-susan by any of the four possible combinations of its CID:

 $ ck find program:cbench-automotive-susan
 $ ck find b0ac08fe1d3c2615:ffbf51c23a91343c
 $ ck find b0ac08fe1d3c2615:cbench-automotive-susan
 $ ck find program:ffbf51c23a91343c

In this directory, you will see the program's source files, possibly build scripts, etc.

Similarly, you can locate dataset:image-pgm-0001:

 $ ck find dataset:image-pgm-0001

where you will find the data.pgm image that we often used as input to the cbench-automotive-susan program in our autotuning research.

Interestingly, CK modules are themselves stored inside repositories just like any other CK entry, i.e. via the module module (UID=032630d041b4fd8a). For example, the program module in the ck-autotuning repository can be found using:

 $ ck find module:program

This directory contains module.py with a unified API to access program entries and to implement some common functions (such as compile or run).
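
The same unified API can also be called from Python via the CK kernel. Below is a minimal hedged sketch (not an excerpt from the official documentation); it assumes that the CLI flag --speed maps to the dictionary key 'speed':'yes', which is how CK normally converts command-line flags.

 import ck.kernel as ck

 # Minimal sketch: invoke the "compile" action of the program module programmatically
 # instead of via "ck compile program:cbench-automotive-susan --speed".
 r = ck.access({'action':     'compile',
                'module_uoa': 'program',
                'data_uoa':   'cbench-automotive-susan',
                'speed':      'yes'})   # assumption: --speed becomes 'speed':'yes'
 if r['return'] > 0:
     ck.err(r)                          # print the error message and exit

 print('Compilation finished; output keys:', list(r.keys()))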

The aim is to eventually convert local project directories with ad-hoc files (which are usually scattered all over the place and easily get lost) into commonly structured, interlinked repositories, each with its own UID and meta-description.

To reiterate, CK is just a small (~200 KB of Python code) library with a unified API and some commonly used functions to help users manage their data and execute modules.

More importantly, we use human-readable and schema-free JSON stored natively in a file system to be able to operate directly on files and descriptions without being locked into third party and possibly proprietary tools.

Furthermore, we can easily share the whole repository including data and code using existing and popular sharing technology such as Git, and public/private servers such as GitHub or Bitbucket.

In contrast with monolithic Virtual Machine or Docker images, we provide lightweight, portable containers which can be reused and extended via a common API on Windows or Linux systems. In fact, we envision that CK can complement VM or Docker technology, particularly for sharing reusable components along with publications.

We have gradually converged on this repository structure after many years of R&D. Interestingly, this repository structure has not undergone any major changes in the last 5 years, even though over the same period we have developed from scratch two frameworks to make use of it (Collective Knowledge and its backward-compatible precursor Collective Mind).

Compiling and running shared program

We are now ready to demonstrate how to reuse shared modules and data to easily compile and run the susan benchmark on any user platform, and then autotune it for execution time and code size using GCC compiler flags.

Besides using CK module program as a container to share program source code, we also implemented a simple workflow (pipeline) to compile and run any program following instructions provided in JSON meta-description.

You can check susan's meta-description via

 $ ck load program:cbench-automotive-susan
or
 $ ck find program:cbench-automotive-susan
and then check [entry]/.cm/meta.json

You can now compile susan benchmark simply via

 $ ck compile program:cbench-automotive-susan --speed

By default, susan will be compiled using the available GCC compiler with the default optimization flag that should minimize the execution time (usually "-O3").

Note that cbench-automotive-susan uses the OpenME plugin framework (CID=package:lib-rtl-xopenme) to instrument the code, expose various run-time information (such as the image size), and collect kernel/function execution times (to be able to analyze and tune programs at a fine-grain level), which will eventually be recorded into the pipeline JSON output.

This library is included in the ck-autotuning repository and will be installed on first use in the $HOME/CK-TOOLS directory (the user will be asked to install it). More details about how different versions of tools and libraries are supported in CK are provided further below.

If compilation succeeded, you should see a string similar to the following at the end:

 Compilation time: 5.200 sec.; Object size: 70916; MD5: 9b58b9d55585ca5bd19a5e75f448bb14

It is also possible to add compilation flags via

 $ ck compile program:cbench-automotive-susan --flags="-O3 -fno-if-conversion"

Note that if you switch your current directory (in the command line interpreter) to that of the cbench-automotive-susan entry (found via ck find program:cbench-automotive-susan), you can omit the module and data aliases and compile the program simply as:

 $ ck compile --speed

Now, it is possible to run the compiled program via

 $ ck run program:cbench-automotive-susan
or simply as follows if your current directory is within the CK entry cbench-automotive-susan:
 $ ck run

Note that the program module will read the meta-description of cbench-automotive-susan and attempt to assemble the command line using the meta["run_cmds"] dictionary. If more than one command line is available (this benchmark has three algorithms and hence three command lines: detecting edges, detecting corners, and smoothing a given image), the user will be asked to select one.

Next, if a CK data set is required (such as an image), it will be automatically found via tags, which are also specified in meta["run_cmds"]. If more than one data set is found, the user will also be asked to select one.
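
For illustration only, a simplified and partly hypothetical fragment of such a meta-description might look as follows; the exact key names and command templates in the real cbench-automotive-susan entry may differ, so always check [entry]/.cm/meta.json:

 {
  "run_cmds": {
    "edges": {
      "dataset_tags": ["image", "pgm", "dataset"],
      "run_time": {
        "run_cmd_main": "$#BIN_FILE#$ $#dataset_path#$$#dataset_filename#$ tmp-output.tmp -e"
      }
    }
  }
 }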

It is possible to specify the command key and the data set on the command line (for automation) as follows:

 $ ck run program:cbench-automotive-susan --cmd_key=edges --dataset_uoa=image-pgm-0001

This approach allows users to add, change, customize, extend and share any other program, data set, compiler (GCC, LLVM, ICC), flags and parameters (OpenCL/CUDA/MPI parameters, CPU/GPU frequency, and so on) simply via the JSON meta-description, as shown in the next sections.

We also provided a way to easily share notes about any entry in CK using

 $ ck wiki [module UOA]:[data UOA]
For example, you can check notes about cbench-automotive-susan via
 $ ck wiki program:cbench-automotive-susan

Also note that some programs, such as cbench-automotive-susan, support execution time calibration, i.e. automatically increasing the number of repetitions of the most time-consuming kernel (exposed by the programmer) via the environment variable CT_REPEAT_MAIN until the execution time is around 5 seconds. We use this to reduce OS noise for small programs (however, care is needed since it may also alter memory hierarchy utilization across iterations).

Assembling experimental workflows from unified and shared components

Having such unified and shared components with code, data, meta information and a schema-free JSON API allows users to quickly prototype their ideas as LEGO(TM) rather than wasting time on (re)developing their own infrastructure.

Furthermore, it may help solve the old and well-known problem of the lack of representative benchmarks and data sets available to researchers - users now have access to a growing number of real applications, kernels and data sets from the community. Researchers can also use predictive analytics to find representative sets for their own scenarios (see P1, P2 and P3).

For our research on making performance/energy/size autotuning and modeling practical, we implemented a universal program pipeline (i.e. the function pipeline in the module program). We gradually extend it whenever new technology becomes available or whenever users report unexpected behaviour that has to be fixed (see P1, P2).

This pipeline can be executed via:

 $ ck pipeline program:cbench-automotive-susan --speed

It resolves all dependencies, asks which command line and data set to use, compiles the program with the best available flag using GCC (or another compiler registered with CK), and runs it.

For example, to systematize computer engineering, we use a top-down methodology originating from physics, where we gradually expose the following keys from the JSON information flow (a purely illustrative sketch follows this list):

  • choices - available design and optimization choices (compiler flags, OpenCL parameters, hardware configurations, run-time info, algorithm precision, ...)
  • characteristics - measurable characteristics (execution time, energy, code size, error rate, hardware cost, ...)
  • features - software, hardware and data set properties that usually do not change (image size, number of instructions in a loop, cache size, etc.)
  • state - run-time state of the system (state of the network, cold/hot cache, etc.)
  • dependencies - information about software dependencies (resolved via special semantic tags)
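
A purely illustrative sketch of such an information flow (all values and sub-keys below are made up for this example) might look like:

 {
  "choices":         { "compiler_flags": { "base_opt": "-O3" } },
  "characteristics": { "run": { "execution_time": 0.056 } },
  "features":        { "dataset": { "image_size": "..." } },
  "state":           { "cpu_frequency": "..." },
  "dependencies":    { "compiler": { "tags": "gcc" } }
 }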

Note that, in our concept, a schema is added to JSON only when a module becomes relatively stable or an idea is validated by the community. Nevertheless, we are able to keep backward compatibility for modules during agile development by adding new keys and only later dropping old ones (the community gradually unifies interfaces, somewhat similar to Wikipedia).

To provide a schema, we use our own flat key format that can reference any key in a complex JSON hierarchy as a single string. Such a flat key always starts with # and is followed by #key for each dictionary key or @position_in_a_list for a value in a list. Using these flat keys, users can gradually describe the API (see the example below).

We also convert characteristics, features and choices to vectors using this flat format so that predictive analytics can be applied immediately.
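
As a concrete illustration of this flat format, the following conversion function is a minimal sketch written for this guide (it is not the CK implementation):

 def flatten(obj, prefix='#'):
     # Convert a nested dict/list into CK-style flat keys:
     # '#' + '#key' for dictionary keys, '@position' for list positions.
     flat = {}
     if isinstance(obj, dict):
         for k, v in obj.items():
             flat.update(flatten(v, prefix + '#' + str(k)))
     elif isinstance(obj, list):
         for i, v in enumerate(obj):
             flat.update(flatten(v, prefix + '@' + str(i)))
     else:
         flat[prefix] = obj
     return flat

 print(flatten({'characteristics': {'run': {'execution_time': 0.056}}}))
 # -> {'##characteristics#run#execution_time': 0.056}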

Preparing program workflow (pipeline) for autotuning

The above organization of experiments into pipelines (research workflows) with exposed characteristics, choices, features and state helped us unify, simplify and accelerate our own research on machine-learning-based performance modeling, autotuning and run-time adaptation.

We can now develop higher-level modules to automatically explore the available choices, monitor characteristics, apply a Pareto frontier filter, and expose the information flow to existing statistical analysis, classification and predictive modeling tools including R and SciPy.

This allows us to (collaboratively) explore large design and optimization spaces, model behaviour of various components of computer systems, correlate multiple characteristics with choices and features, and predict better optimizations or hardware designs when enough knowledge is collected (as described in 1 and 2).

For this purpose, we developed a new CK module, pipeline, which has 2 main functions:

  • run
  • autotune

The first function serves as a high-level wrapper for any sub-module that also has a pipeline function with a related API. This wrapper unifies input and output using special keys such as "characteristics", "choices", "features", "state", etc. (see 1, 2 and 3 for more details). It also presets the system to a given state (if needed), runs a given pipeline multiple times, performs statistical analysis on the empirical characteristics, reports high discrepancies, reports system changes during execution (such as CPU or GPU frequency), and so on, thus serving as a unified experimental engine.

That is why we added such a pipeline function to the program module to unify benchmarking and autotuning experiments, i.e. you can compile and run the program cbench-automotive-susan multiple times using the following command:

 $ ck run pipeline:program program_uoa=cbench-automotive-susan --speed 

The autotune function takes various parameters as input describing:

  • which keys in the choices dictionary to explore
  • the exploration strategy (exhaustive, random, or machine-learning based - the latter requires an extra module which is under development)
  • which characteristics to tune
  • how many times to repeat the same experiment and which statistical analysis to apply (min, max, mean, expected values via histogram)
  • whether to apply a Pareto frontier filter and which characteristics to use (execution time, energy, code size, errors, costs)
  • where to record experimental results.

To demonstrate basic autotuning, we decided to implement an old but still unsolved compiler flag tuning scenario. For simplicity, we added all Linux scripts to the entry demo:autotuning-compiler-flags-susan-linux in the ck-autotuning repository (there are also demos for Windows and for exploring the behaviour of various kernels, including OpenCL-based ones, across multiple data sets; see the Paper).

You can find this entry simply as:

 $ ck find ck-autotuning:demo:autotuning-compiler-flags-susan-linux
or just
 $ ck find demo:autotuning-compiler-flags-susan-linux

Note that, in order to find any entry in CK, it is also possible to explicitly specify the repository alias as above. Furthermore, whenever new repositories are added to CK, they also get their own entry inside the default CK repository under the repo module. Hence, it is possible to see all repositories registered in CK via

 $ ck list repo

Don't forget that it is possible to start compiler flag autotuning of a given program simply as follows (as briefly described in Getting Started Guide, part II):

 $ ck autotune program:cbench-automotive-susan

However, here we want to provide lower-level details about JSON API.

In order to use autotuning in CK, one first needs to "set up" the program experimental pipeline as a JSON input, i.e. resolve all dependencies and prepare the default set of all parameters. The autotuning module will then simply change various parameters in this JSON input and rerun experiments.

It is possible to set up such a pipeline for cbench-automotive-susan and the edges algorithm via

 $ ck pipeline program:cbench-automotive-susan --cmd_key=edges --prepare --save_to_file=_setup_program_pipeline_tmp.json

You will be asked one question about which data set to use - any can be selected.

If the pipeline was prepared successfully (i.e. there were no error messages), a pipeline input file _setup_program_pipeline_tmp.json will be created.

It is now possible to execute this pipeline via

 $ ck run pipeline:program pipeline_from_file=_setup_program_pipeline_tmp.json --repetitions=4 --save_to_file=$PWD/_program_pipeline_output_tmp.json
where it is possible to optionally specify the number of repetitions of an experiment (if more than 1, statistical analysis will be performed via the math.variation module), and record the pipeline output with updated characteristics, features and state to _program_pipeline_output_tmp.json.

Note that --repetitions and --save_to_file are optional. Also note that --save_to_file should take the full path to the output file, otherwise the file will be created in the tmp directory of the compiled program. You may hence want to substitute $PWD (which returns the current directory on Linux) with your own full path. On Windows, use %CD% instead (or a full path).

The output JSON file has 3 main keys:

  • experiment_desc - to describe pipeline and default parameters (for reproducibility)
  • last_iteration_output - pipeline output from the last statistical repetition of an experiment
  • last_stat_analysis - statistical analysis applied to all characteristics

It is possible to check whether the pipeline failed via:

  • output['last_iteration_output']['fail'] - 'yes' if the pipeline failed
  • output['last_iteration_output']['fail_reason'] - a string describing the problem with the pipeline execution (if it failed).

If the pipeline did not fail, output['last_stat_analysis']['dict_flat'] contains all characteristics in flat format with appended #min, #max, #mean, #center, #halfrange, #exp, and others (see the JSON API via ck analyze math.variation --help).
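
For example, a small Python snippet (a sketch written for this guide, using only the keys described above) checking the recorded output file could look like this:

 import json

 # Load the pipeline output produced by "ck run pipeline:program ... --save_to_file=..."
 with open('_program_pipeline_output_tmp.json') as f:
     output = json.load(f)

 last = output.get('last_iteration_output', {})
 if last.get('fail') == 'yes':
     print('Pipeline failed:', last.get('fail_reason', 'unknown reason'))
 else:
     flat = output['last_stat_analysis']['dict_flat']
     print('Min execution time:', flat.get('##characteristics#run#execution_time#min'))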

Below is an example of the whole-program execution time after statistical analysis:

  "last_stat_analysis": {
    "dict_flat": {
      "##characteristics#run#execution_time#all": [
        0.05864406779661017, 
        0.0595593220338983, 
        0.05627118644067796, 
        0.053898305084745766
      ], 
      "##characteristics#run#execution_time#all_unique": [
        0.05864406779661017, 
        0.0595593220338983, 
        0.05627118644067796, 
        0.053898305084745766
      ], 
      "##characteristics#run#execution_time#center": 0.056728813559322036, 
      "##characteristics#run#execution_time#exp": 0.056414312617702446, 
      "##characteristics#run#execution_time#exp_allx": [
        0.056414312617702446
      ], 
      "##characteristics#run#execution_time#exp_ally": [
        12.954426003376366
      ], 
      "##characteristics#run#execution_time#exp_warning": "no", 
      "##characteristics#run#execution_time#halfrange": 0.002830508474576266, 
      "##characteristics#run#execution_time#max": 0.0595593220338983, 
      "##characteristics#run#execution_time#mean": 0.05709322033898305, 
      "##characteristics#run#execution_time#min": 0.053898305084745766, 
      "##characteristics#run#execution_time#range": 0.005661016949152532, 
      "##characteristics#run#execution_time#range_percent": 0.10503144654088031, 
      "##characteristics#run#execution_time#repeats": 4, 

In order to customize the program pipeline execution, it is possible to directly modify the JSON input file and change pipeline parameters via the choices keys. Below is an example of the choices of a pipeline prepared under Windows with MinGW:

  "choices": {
    "cmd_key": "edges", 
    "compile_type": "dynamic", 
    "data_uoa": "cbench-automotive-susan", 
    "dataset_uoa": "image-pgm-0001", 
    "device_id": "", 
    "host_os": "windows-64", 
    "module_uoa": "b0ac08fe1d3c2615", 
    "target_os": "mingw-64", 
    "target_os_bits": "64"
  } 

For example, users can select a different data set via dataset_uoa. Other choices will be explained further in this getting started guide.

Also note that not all choices are pre-defined here. Additional choices can be described in flat format under the choices_desc key.

For example, the program pipeline attempts to detect the compiler used and its version (for example, GCC), and then pre-loads the available optimization flags into choices_desc:

  "choices_desc": {
    "##compiler_flags#base_opt": {
      "choice": [
        "-O3", 
        "-Ofast", 
        "-O0", 
        "-O1", 
        "-O2", 
        "-Os"
      ], 
      "default": "", 
      "desc": "base compiler flag", 
      "sort": 10000, 
      "tags": [
        "base", 
        "basic", 
        "optimization"
      ], 
      "type": "text"
    }, 

    ...

The above example means that it is possible to set choices['compiler_flags']['base_opt'] to one of the values listed under "choice" (-O3, -Ofast, -O0, -O1, -O2, -Os).
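
For example, one could change the corresponding fragment of the prepared pipeline input file (_setup_program_pipeline_tmp.json) along the following lines (a hand-written sketch, so check the actual structure of your prepared file):

  "choices": {
    ...
    "compiler_flags": {
      "base_opt": "-O2"
    }
  }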

Adding and modifying description of compiler flags for autotuning

The program pipeline attempts to detect the compiler version and then find an associated CK entry under the compiler module with a full description of the compiler flags:

 $ ck list compiler

Note that we added automatic extraction of compiler flags from the GCC sources using the following command (please ask us for more details if needed, since it is not yet very user-friendly and requires the GCC sources to be installed somewhere):

 $ ck extract_opts compiler

You can see an example of automatically extracted optimization flags for GCC 5.2.0 in the CK live repo or on GitHub (desc.json). If our automatic flag extraction produces wrong choices, it is possible to manually patch those files and share the improvements back via GitHub (community-driven research similar to Wikipedia).

You can find full descriptions of automatically extracted compiler flags in the following CK entries:

 $ ck list compiler:gcc-*-auto

Rather than automatically extracting flags, you can manually add an entry for a new compiler version by copying the closest existing entry and updating its meta, for example creating the meta for the new GCC version 7.0.0 from 6.1.0:

 $ ck cp compiler:gcc-6.1.0-auto :gcc-7.0.0-user
 $ vim `ck find compiler:gcc-7.0.0-user`/.cm/desc.json

Note that we skipped the module name in the second parameter (:gcc-7.0.0-user). In this case, CK reuses the module from the first parameter, i.e. compiler.

You can simply add or remove various flags from the "choices" key in the meta.

In order to reuse the new description of compiler flags for your experiments, you need to set up the experimental pipeline from scratch via:

 $ rm _setup_program_pipeline_tmp.json
 $ ck pipeline program:cbench-automotive-susan --cmd_key=edges --compiler_description_uoa=gcc-7.0.0-user --prepare --save_to_file=_setup_program_pipeline_tmp.json

Note that the gcc-7.0.0-user entry was created in your local repository and not in ck-autotuning, to avoid polluting shared repositories. However, if you would like to share your new compiler description with the community (for example, for some specific ARM or Intel architecture), you can explicitly copy the entry to the ck-autotuning repository:

 $ ck copy compiler:gcc-7.0.0-user ck-autotuning::gcc-7.0.0-user

After that you can go to the ck-autotuning repository, commit the changes and push them to GitHub (if you have write access to this repository; otherwise create a pull request) via:

 $ ck where repo:ck-autotuning
 $ cd `ck where repo:ck-autotuning`
 $ git add .cm compiler
 $ git commit -m "Add new compiler description"
 $ git push

Note that at this stage other researchers can update their repositories via ck pull all and immediately take advantage of the new compiler description in their program pipelines!

Autotuning program and recording experiments via CK

The above organization of experimental workflows (pipelines) allows us to separate autotuning from the choices and characteristics, i.e. we can perform universal autotuning of any choices and monitor any characteristics that a pipeline (and hence its researchers) exposes. At the same time, users can gradually add various universal exploration strategies as extra CK modules (plugins).

It is possible to perform automatic tuning of global compiler flags (GCC by default) for cbench-automotive-susan by preparing the following autotuning scenario in the autotuning_configuration.json file:

 $ cat > autotuning_configuration.json

 {
  "choices_order": [
    ["##compiler_flags#*"]
  ],
  "choices_selection": [
     {"type":"random", "omit_probability":"0.90", "tags":"basic,optimization", "notags":""}
  ],
  "pipeline_update":{
     "repeat":200,
     "select_best_base_flag_for_first_iteration":"yes"
   },
  "seed":12345,
  "iterations":10,
  "repetitions":3,
  "record":"yes",
  "record_uoa":"demo-autotune-flags-susan-linux-i10",
  "tags":"my experiments,autotuning,compiler flags,gcc 5.1",
  "record_params": {
    "search_point_by_features":"yes"
  },
  "features_keys_to_process":["##choices#*"]
 }

 Ctrl^D

and then start autotuning via

 $ ck autotune pipeline:program pipeline_from_file=_setup_program_pipeline_tmp.json @autotuning_configuration.json

Here, the choices_order key specifies the list of pipeline dictionary keys over which the program will be tuned. Wildcards are supported, so ##compiler_flags#* means that the compiler flags will be tried in no particular order (otherwise, we would need to add each key as a separate sub-list).

  • choices_selection specifies the exploration strategy for each sub-choice in choices_order. In the above example, we choose a random search strategy with a 10% probability of selecting a given flag carrying the tags basic,optimization. If a selected flag is parametric, its parameter will also be randomly chosen from its specified range (see the description of the flags of available compilers: ck list compiler).
  • pipeline_update allows the user to override input pipeline parameters that were predefined during the preparation step. Here we fix the number of repetitions of the susan kernel to 200, and force the selection of the "best" base flag (for example, "-O3" for GCC) during the first iteration to be able to detect speed-ups versus that "best" optimization during autotuning.
  • seed specifies a random seed, making it possible to reproduce the exact autotuning experiments.
  • iterations sets the number of autotuning iterations (different combinations of compiler flags to try).
  • repetitions sets the number of statistical repetitions of a single autotuning experiment.
  • record - if set to 'yes', experiments will be recorded in a repository (local by default). Experiments are recorded via the experiment module provided in the ck-analytics repository.
  • record_uoa - if record=="yes", this key can explicitly specify an entry alias. If omitted, entries will be found via meta information, which is useful for crowd-tuning programs across numerous compilers and hardware platforms - this will be described later in the ''crowdtuning experiments'' section. For example, it is possible to add a sub-dictionary meta explicitly describing the host and target OS, architecture, program, compiler, etc.
  • tags - these comma-separated tags will be added to the entry so that experiments can later be found by them, e.g.
 $ ck find experiment --tags="compiler flags, gcc 5.1"
  • record_params - customizes the recording procedure. For example, multiple experiments can be recorded in a given experiment entry (as individual exploration points). To avoid duplicating points with the same choices (i.e. compiler flags in our case), it is possible to set "search_point_by_features":"yes" to find similar points by some features (see the next key), aggregate the results and perform statistical analysis.
  • features_keys_to_process - specifies a list of features (flat keys, including wildcards) that CK will use to find similar sub-experiments in the CK repository and update them, instead of adding duplicate exploration points. In our example, if the repository already contains an experiment with the same set of flags, CK will update that existing point and rerun the statistical analysis on it, rather than adding a new point.

Note that, by default, experiments will be recorded in the local CK repository. You can list the recorded experiment entries (with their full CID) via

 $ ck list experiment --print_full

However, users may want to create multiple repositories for different experiments just as they would create various directories with some ad-hoc files. It is possible to create a new repository simply as:

 $ ck add repo:my-repo --quiet
This repository will be created in $HOME/CK/my-repo.

It is also possible to create a CK repository in any active user directory via:

 $ ck add repo:my-repo --here --quiet

Now, it is possible to explicitly record experiments to such a repository via the key

  • "record_repo":"my-repo"

It is possible to list experiments only in this repository via:

 $ ck list my-repo:experiment:* 

You can find more details about managing repos here.

Full JSON API for autotuning is available via:

 $ ck autotune pipeline --help

Note that various search strategies in CK are implemented via the choice module:

 $ ck find module:choice

One of the main ideas behind CK is to let the community provide modules (plugins), tools, programs and data sets to extend autotuning scenarios and eventually cover the whole software and hardware stack.

For example, we are gradually converting our compiler autotuning plugins from the Collective Mind/MILEPOST projects to enable fine-grain autotuning at the function, kernel and loop level. We are also adding our adaptive, machine-learning-based exploration strategies for performance modeling, as well as techniques from others, with the help of the community.

Note that you can find various exploration examples in the following CK demo entry:

 $ ck find demo:plugin-based-autotuning-engine-demo

Understanding experimental format

Experiments are recorded inside a given experiment entry as separate points. Each point is automatically assigned its own UID and is described by a set of files:

  • ckp-UID.<4-digit sequential number of a statistical repetition of a given experiment>.json - contains the pipeline output of that repetition as JSON
  • ckp-UID.features.json - describes the features of a point which can be used to find the same experiment, aggregate experimental results and perform statistical analysis
  • ckp-UID.flat.json - output of the statistical analysis module in the flat vector format mentioned above; this file is used for all further processing and visualization of results

It is possible to list all available points in a given experiment entry via

 $ ck list_points experiment:<experiment UOA> 

For example, it is possible to list all points for the above compiler flag autotuning via

 $ ck list_points experiment:demo-autotune-flags-susan-linux-i10

We have found this organization of experiments via the native file system and schema-free, extensible JSON files very convenient for quickly prototyping ideas and sharing results either via CK or even via standard OS tools, instead of spending lots of time on MySQL (or other database) queries, specifications and table changes while still duplicating information in ad-hoc CSV, TXT and HTML files. When you have validated an idea, however, you can always document the API, provide a data specification (for example, along with a publication), or even connect MySQL (or any other database) as a fast back-end to CK.

Reproducing experiments

Note that the above format has enough information to replay a given point of a given experiment (i.e. it contains the JSON input that can simply be reused as the pipeline input).

We provide a special function to reproduce an experiment via:

 $ ck reproduce experiment:demo-autotune-flags-susan-linux-i10 --point=<point_UID> --subpoint=<4 digit number>

It is now possible either to simply check that the output is the same (replaying an experiment) or to validate varying characteristics (reproducing an experiment). In the latter case, all characteristics will be compared with the recorded ones and any difference of more than 10% (customizable via --threshold_to_compare) will be reported. This allows us to analyze unexpected behaviour and collaboratively improve experimental pipelines to improve reproducibility, e.g. by adding detection/setting of the CPU/GPU frequency, the state of the network or cache, pinning of threads, etc. to the program pipeline, as described conceptually in the following papers: 1, 2.

Furthermore, using CK components and workflows with an API even allows users to play with the parameters of shared artifacts, such as the number of threads of an algorithm to check its scalability, and thus go beyond merely replicating results towards really validating them and possibly sharing unexpected results with the authors.

You may find various additional examples of replaying experiments in the replay* scripts and JSON files in

 $ ck find demo:autotuning-compiler-flags-susan-mingw

Applying Pareto filter during multi-objective autotuning

We provide a basic Pareto frontier module math.frontier for multi-objective optimization. This filter can be invoked at the end of a pipeline to only retain points lying on a multi-dimensional frontier.

To apply the Pareto filter, customize the execution pipeline (the autotuning_configuration.json file above) via two keys:

  • frontier_keys - list of flat keys along which only the best points are kept during multi-objective autotuning
  • frontier_features_keys_to_ignore - list of keys to remove from features_keys_to_process in order to prepare a subset of points in a given entry for frontier detection (usually the optimization dimensions, such as compiler flags, are removed to gather all sub-points with different compiler flags, while other dimensions such as data set features are kept).

This approach simplifies and unifies multi-objective autotuning. Users can simply select the characteristics to tune depending on their requirements (execution time vs energy vs code size for mobile systems; algorithm precision vs execution time vs energy when the mobile battery is low; compilation time vs execution time for JIT; execution time and soft errors for HPC systems; etc.).

For example, the following keys can be used to systematize benchmarking of compilers and their optimization heuristics for mobile systems (execution time vs code size):

 {
  "frontier_keys":["##characteristics#run#execution_time#min",
                   "##characteristics#compile#binary_size#min"],

  "frontier_features_keys_to_ignore":["##choices#compiler_flags#*"]
 }

Depending on user requirements, the mean (mean) or expected (exp) values can be used instead of the minimum value (min). For example, the peak performance of a given program on a given platform can be estimated by using the minimum execution time, whereas typical behaviour is more readily indicated by the expected value (obtained after a sufficient number of statistical repetitions).

Also, more than one expected value may suggest that a program's behaviour has several "states" (similar to electrons in physics) which may in turn suggest that some features should be added to the pipeline to describe such states (such as, for example, the processor frequency).

Finally, note that the current implementation of the Pareto filter is not optimal, i.e. we keep all points on the frontier. This may result in too many points being saved, particularly when more than two characteristics are monitored. Ideally, we would like to keep only a minimal set of equidistant points on the frontier - a possible extension for a Google Summer of Code project.

Retrieving and visualizing experimental results

It is possible to retrieve data from experiment entries as a table (a Python list of lists) which can, in turn, be processed via various predictive analytics tools, converted to pandas objects and so on.

A table with results can be retrieved via:

 $ ck get experiment: @get_points.json --out=json_file --out_file=output.json

where get_points.json has the following format:

 {
  "data_uoa_list":["demo-autotune-flags-susan-linux-i10"],
  "flat_keys_list":[
    "##characteristics#compile#binary_size#min",
    "##characteristics#run#execution_time_kernel_0#exp"
  ]
 }

The key data_uoa_list specifies the list of experiment entries to process, while flat_keys_list specifies which flat keys (wildcards are allowed) provide the values added to each vector of the table (list).

The output JSON will contain the retrieved table in the form

[ [value1, value2], [value1, value2], ...]
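
For example, a small Python sketch converting such a table into a pandas DataFrame could look like this; the exact key under which the table is stored in output.json is an assumption here, so inspect the file first:

 import json
 import pandas as pd

 with open('output.json') as f:
     out = json.load(f)

 # Assumption: the retrieved table of value vectors is stored under a 'table' key;
 # some CK versions key tables by experiment index, hence the dict handling below.
 table = out.get('table', [])
 if isinstance(table, dict):
     table = next(iter(table.values()), [])

 # Column names follow the order of flat_keys_list in get_points.json.
 df = pd.DataFrame(table, columns=['binary_size_min', 'execution_time_kernel_0_exp'])
 print(df.sort_values('execution_time_kernel_0_exp').head())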

The full API with all parameters can be obtained via

 $ ck get experiment --help

Such tables and flat keys allow us to easily visualize any experimental dimension. We developed the graph module to plot graphs from such tables or record them to files. We also provide various demos of plotting 2D scatter graphs, 2D bars, histograms, 2D heat maps, 3D graphs and so on in demo:graph* entries. You can find them via

 $ ck list demo:graph*

Note that this CK functionality requires the matplotlib and numpy Python packages. However, if you use the Anaconda Scientific Python distribution, most of these packages will already be installed.

For example, it is possible to plot a 2D scatter graph of code size vs execution time (with variation) via matplotlib with a GUI as follows:

 $ ck plot graph: @plot_with_variation.json

where a self-explanatory sample of plot_with_variation.json is as follows:

 {
  "experiment_module_uoa":"experiment",
  "data_uoa_list":["demo-autotune-flags-susan-linux-i10"],
  "flat_keys_list":[
    "##characteristics#compile#binary_size#min",
    "##characteristics#run#execution_time_kernel_0#center",
    "##characteristics#run#execution_time_kernel_0#halfrange"
  ],
  "plot_type":"mpl_2d_scatter",
  "display_x_error_bar":"no",
  "display_y_error_bar":"yes",
  "title":"Powered by Collective Knowledge",
  "axis_x_desc":"Binary size",
  "axis_y_desc":"Execution time",
  "plot_grid":"yes",
  "mpl_image_size_x":"12",
  "mpl_image_size_y":"6",
  "mpl_image_dpi":"100",
  "point_style":{"1":{"elinewidth":"5", "color":"#dc3912"},
                 "0":{"color":"#3366cc"}}
 }

It is possible to save this graph (as well as the CK table in JSON and CSV formats) to a file by adding the following keys to the above JSON input file:

 {
  "out_to_file":"2d_points_time_vs_size.png",
  "save_table_to_json_file":"2d_points_time_vs_size_with_pareto.json",
  "save_table_to_csv_file":"2d_points_time_vs_size_with_pareto.csv"
 }

You can find more examples in

 $ ck find demo:autotuning-compiler-flags-susan-linux

It is also possible to visualize experiments as tables, sort fields, obtain the command line to replay points, etc. via the CK web service as follows:

 $ ck start web
 $ firefox http://localhost:3344/web?wcid=experiment:

The above web service can also be used to browse and search CK repositories:

 $ firefox http://localhost:3344

or to view individual entries (including meta information, description and all files), such as image-pgm-0001, which is used in various autotuning projects together with the susan image processing benchmark:

 $ firefox http://localhost/ck/repo/web.php?wcid=8a7141c59cd335f5:b2130844c38e4a56
 $ firefox http://localhost/ck/repo/web.php?wcid=dataset:image-pgm-0001

Note that if a module has an html_viewer action (function), it will be used to view the above entries, thus allowing customized views. For example, this is used to enable interactive graphs, reports and articles.

Aggregating results in a remote CK repository

The unified JSON API in CK makes it possible to create remote CK repositories to easily aggregate experimental results from multiple users (crowdsourcing experimentation).

CK by default includes the remote-ck repository, which redirects all requests to our pilot live repository at http://cknowledge.org/repo

For example, it is possible to list all modules available in this remote repository simply as

 $ ck list remote-ck:module:*

This repository contains most of our shared programs, data sets and experimental results, as well as interactive graphs and reports embedded into HTML as widgets.

Preliminary notes on how to create interactive graphs and reports via CK are available here.

Note that you can start your own CK web service simply via

 $ ck start web

You can then add a remote repository on your client machine as follows:

 $ ck add repo:my-remote-repo --remote --url=http://<host_name_of_your_server_above>:3344/ck? --quiet

Now you can record data to the remote repo as if it were a local CK repository:

 $ ck add my-remote-repo:test:xyz 

Customized autotuning: simplifying and automating all above steps

Based on user feedback, we also provided functionality to simplify customized autotuning. You can now describe your own autotuning scenario in a simple JSON file, perform autotuning, record results in the repository, plot and reproduce them with just a few simple CK commands.

For example, you can explore some combinations of compiler flags for a given program by creating the following JSON file my-autotuning.json:

{
  "experiment_1_pipeline_update": {
    "choices_order": [
      [
        "##compiler_flags#base_opt"
      ]
    ],
    "choices_selection": [
      {
        "notags": "",
        "choice": ["-Os","-O0","-O1","-O2","-O3"],
        "default": "-O3",
        "type": "loop"
      }
    ]
  },

  "repetitions": 1, 
  "seed": 12345, 
  "iterations": -1,
  "sleep":0
}

and then invoking the following CK command (on Linux, MacOS or Windows):

 $ ck autotune program:cbench-automotive-susan @my-autotuning.json --new --skip_collaborative --scenario=experiment.tune.compiler.flags --extra_tags=explore

Furthermore, you can use the CK customized autotuner on top of your own script (which may compile and run applications or do something else). You can find such an example in the following CK entry:

 $ ck find demo:customized-autotuning-via-external-cmd

You can find other demos of customized autotuning, such as OpenMP thread tuning or batch size tuning in DNN engines (Caffe, TensorFlow), in the following repositories:

 $ ck pull repo --url=https://github.com/dividiti/ck-caffe
 $ ck pull repo:ck-caffe2
 $ ck pull repo:ck-tensorflow

 $ ck list script:explore-batch-size-unified-and-customized --all

You can also check the shared and unified auto/crowd-tuning scenarios to tune OpenCL-based BLAS libraries (such as CLBlast), OpenBLAS threads, batch sizes, etc. by invoking

 $ ck autotune program
and selecting the appropriate autotuning scenario.

Adding new data sets and benchmarks for autotuning and crowd-benchmarking

Please check out these examples to add your own data sets, benchmarks, scripts, etc.

Detecting platform properties

We have also provided extensible modules to detect various properties of platforms (useful for machine-learning-based autotuning and workload characterization), including CPU, OS, GPU, GPGPU, NN, etc. You can find further details here!

Questions and comments

You are welcome to get in touch with the CK community if you have questions or comments!
