Unifying experiments



This document may be outdated - please check our open ACM tournaments on co-designing an efficient SW/HW stack for emerging workloads and the latest report on CK machine learning and autotuning workflows.


Introduction

CK modules with a JSON API, together with an integrated web service exposing the same JSON API, helped us simplify and unify our experiments combining machine-learning-based and physics-based approaches. Please check the Getting Started Guide and the section on universal autotuning.
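For example, the same JSON API can be called directly from Python through the CK kernel. Below is a minimal sketch (assuming CK is installed, and using only the generic list action to enumerate all registered CK modules):

 import ck.kernel as ck

 # Every CK call takes a JSON-compatible dictionary and returns one;
 # this is equivalent to "ck list module" on the command line.
 r = ck.access({'action': 'list', 'module_uoa': 'module'})
 if r['return'] > 0:
     print('CK error: ' + r['error'])
     exit(1)

 for entry in r['lst']:
     print(entry['data_uoa'])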

Basically we expose four main keys in the information flow within experimental pipelines:

  • design and optimization choices
  • multiple characteristics
  • hardware and software features
  • run-time state
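Conceptually, all four groups live side by side in one dictionary that flows through the experimental pipeline. The following Python sketch is purely illustrative (the key names and values below are simplified examples rather than the exact output of a real pipeline):

 # Illustrative information flow of a CK experimental pipeline
 pipeline_flow = {
     'choices':         {'compiler_flags': '-O3 -funroll-loops'},               # design and optimization choices
     'characteristics': {'execution_time': 1.42, 'binary_size': 52384},         # measured behavior
     'features':        {'cpu_name': 'ARM Cortex-A15', 'gcc_version': '7.3.0'}, # HW/SW features
     'state':           {'cpu_frequency_mhz': 2000},                            # run-time state
 }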

Such unification, in turn, allowed us to implement a universal and multi-objective autotuner that takes as input any other module with a pipeline function, starts exploring the exposed choices, applies a Pareto filter and complexity reduction (if needed), and records behavior in a local CK repository using the experiment module, as conceptually shown in the following figure:

We provide the experiment module in the ck-analytics repository, which helps you record experiments in a CK repository, perform statistical analysis, visualize results, save them to CSV, render them in HTML, etc. You can see the available commands via

 $ ck pull repo:ck-analytics
 $ ck help experiment
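Since the experiment module exposes the same JSON API, you can also query recorded experiments from Python. A minimal sketch, using only the generic search action (the tag value below is just an example):

 import ck.kernel as ck

 # Find experiment entries by tag (equivalent to "ck search experiment --tags=autotuning")
 r = ck.access({'action': 'search',
                'module_uoa': 'experiment',
                'tags': 'autotuning'})
 if r['return'] > 0:
     print('CK error: ' + r['error'])
     exit(1)

 for e in r['lst']:
     print(e['data_uoa'])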

Most of our higher-level experimental scenarios (including crowd-benchmarking, multi-objective autotuning and crowd-tuning) use this module.

Furthermore, it is possible to easily redirect the information flow to any server running the CK web service and thus crowdsource experiments. For example, we have a CK web service running at http://cknowledge.org/repo in the Microsoft Azure cloud - it continuously collects information about GCC/LLVM optimizations, SW/HW bug reports, etc. from machines participating in our experiment crowdsourcing campaigns (including Android-based mobile devices).

Here is a list of our public experimental scenarios powered by CK:

You can pull these repositories to your system and check the implementation of their modules to better understand the concepts of unified and collaborative experimentation in CK.

Next we provide some more details about crowdsourcing experiments.

Example of crowdsourcing program benchmarking and validating research ideas via CK

The universal CK artifact format and a very simple JSON-based API allow users to access all functionality as a simple web service. This, in turn, opens up exciting opportunities for open, collaborative and reproducible R&D, where researchers can share their experimental setups, have their techniques validated by the community, and receive unexpected behavior back into their open CK repositories. Such behavior can be analyzed to improve research techniques (for example, analyzing the scalability of parallel applications, building predictive models of SW/HW behavior, possibly using active learning, checking run-time adaptation scenarios, etc.).

Here we present two shared research findings from two of our papers which can be validated by the community (you may want to read these papers first for more details about our techniques):

Obtaining CK repository with shared experimental setup

You can obtain all artifacts and experimental setups in CK format as follows:

 $ ck pull repo:reproduce-ck-paper
 $ ck pull repo:reproduce-ck-paper-large-experiments

As usual, we store scripts to reproduce or validate a technique as entries of the script module. You can list them via

 $ ck list reproduce-ck-paper:script:reproduce-filter-*

Testing our remote demo repository

We added the remote-ck repository to the standard CK distribution; it transparently redirects all traffic to our public repository at http://cknowledge.org/repo.

You can test it by loading the shared test:unicode entry via

 $ ck load remote-ck:test:unicode --min

Normally, you should see a dictionary with some Unicode text. In that case, you can proceed with the next steps.
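The same check can be performed through the Python JSON API; a minimal sketch, assuming the remote-ck repository is registered as described above:

 import ck.kernel as ck

 # Load the shared test entry from the remote-ck repository
 # (the traffic is transparently redirected to cknowledge.org/repo).
 r = ck.access({'action': 'load',
                'repo_uoa': 'remote-ck',
                'module_uoa': 'test',
                'data_uoa': 'unicode'})
 if r['return'] > 0:
     print('CK error: ' + r['error'])
     exit(1)

 print(r['dict'])   # should print a dictionary with some Unicode text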

Research scenario: analyzing program execution time variation versus CPU frequency

Very often, authors in computer systems research report only a single performance (energy, cost, etc.) number while forgetting (sometimes on purpose) about the highly stochastic behavior of computer systems due to adaptive frequency scaling, contention, cold/hot caches, etc.

CK gradually enables reproducibility by allowing users to observe and validate reported numbers or to share unexpected behavior, which helps detect run-time "states" in the system and fix the common "program pipeline" (see our CPC'15 paper for more details).

In script:reproduce-filter-variation we demonstrate such an example. We set the CPU frequency to its maximum value (on Windows, we ask users to set it manually via the power mode), run the same benchmark under the same conditions 10 times, analyze the execution time variation via the shared math.variation module and return the expected values (possibly several, suggesting that there are some unexpected run-time states). Then, we run the same code 10 more times but with the lowest CPU frequency. Finally, we compare the ratio of execution times with the ratio of frequencies, which should be around 1. Whenever the results are unexpected, you will be asked to share them with us at cknowledge.org/repo.
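The core of this check can be illustrated with a toy Python sketch (all numbers below are hypothetical, and the median is used as a stand-in for the expected values computed by the math.variation module):

 import statistics

 # Hypothetical measurements: the ratio of execution times between the lowest
 # and the highest CPU frequency should roughly match the ratio of frequencies.
 times_max_freq = [0.101, 0.102, 0.100, 0.103, 0.101]   # seconds at max frequency
 times_min_freq = [0.252, 0.250, 0.255, 0.251, 0.253]   # seconds at min frequency
 freq_max, freq_min = 2000.0, 800.0                     # MHz

 time_ratio = statistics.median(times_min_freq) / statistics.median(times_max_freq)
 freq_ratio = freq_max / freq_min

 print('time ratio: %.2f, frequency ratio: %.2f' % (time_ratio, freq_ratio))
 if abs(time_ratio / freq_ratio - 1.0) > 0.1:           # 10% tolerance is arbitrary
     print('Unexpected behavior - consider sharing the result at cknowledge.org/repo')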

You can participate in crowdsourcing CK-based experimentation via the interactive script:

 $ ck find reproduce-ck-paper:script:reproduce-filter-variation
 $ cd <above path>
 $ python reproduce.py

You can see current live results here.

Note that you can reuse the above script as a template to create your own CK-based experiments that can be validated by the community. However, rather than measuring frequency ratios, you may analyze the scalability of your algorithms, contention or any other effect. You may just need to change remote-ck in the script's reproduce_analyze.json to your own remote repository, say my-remote-repo, and then interactively add it to CK via:

 $ ck add repo:my-remote-repo --remote
 $ ck start web

Also note that if you add such a remote repository entry to your own CK experimental repository which you plan to distribute to your colleagues, you may need to remind users to recache their local repositories so that your remote repository is properly registered in CK via

 $ ck recache repo

Research scenario: analyzing different compiler flags for the same programs vs different data sets (to enable run-time adaptation)

Another common pitfall we observed in many papers on autotuning is reporting results for just one data set. This is often due to the lack of multiple available data sets or of an easy way to add your own data sets to existing benchmarks.

CK solves the above problems by making it easy to add and share your own benchmarks and data sets in a unified format while taking advantage of the unified benchmark compilation and execution workflow (aka the CK experimental pipeline) - the program module in the ck-autotuning repository.
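This unified workflow can also be driven programmatically via the same JSON API; a sketch, assuming a shared benchmark such as cbench-automotive-susan from the ctuning-programs repository has been pulled (the benchmark name is just an example):

 import ck.kernel as ck

 # Compile and then run a shared benchmark through the unified "program" workflow
 # (equivalent to "ck compile program:cbench-automotive-susan" and "ck run program:...").
 for action in ('compile', 'run'):
     r = ck.access({'action': action,
                    'module_uoa': 'program',
                    'data_uoa': 'cbench-automotive-susan',
                    'out': 'con'})      # print the workflow output to the console
     if r['return'] > 0:
         print('CK error: ' + r['error'])
         exit(1)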

When using CK as an autotuning buildbot, our colleagues noticed that one of the very simple image B&W filter applications (similar to a neural network threshold filter) had some unexpected speedups on different shared JPEG images (see details here). Later, we also noticed that some images were taken by a surveillance camera during the day and some during the night, thus allowing us to derive a missing feature in the system - the time of day - and to enable run-time adaptation for this statically compiled code. See our past and related research for more details:

Hence, we implemented a small script that checks the GCC flags we found (if-conversion optimization) against the two types of data sets on your local machine, and hence with different compilers/hardware!
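The effect can be illustrated with a toy Python sketch (all numbers are hypothetical): if a flag combination helps on one class of images and hurts on the other, a run-time dispatcher can pick the better binary based on a derived feature such as the time of day:

 # Hypothetical execution times (in seconds) of two binaries of the same filter
 times = {
     #                base -O3       -O3 plus the found if-conversion-related flags
     'day images':   {'base': 0.84, 'tuned': 0.61},
     'night images': {'base': 0.79, 'tuned': 0.88},
 }

 for dataset, t in times.items():
     speedup = t['base'] / t['tuned']
     choice = 'tuned' if speedup > 1.0 else 'base'
     print('%-12s: speedup with tuned flags = %.2fx -> use the %s binary'
           % (dataset, speedup, choice))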

You can find and run this script interactively as follows:

 $ ck find reproduce-ck-paper:script:reproduce-filter-speedup
 $ cd <above path>
 $ python reproduce.py

If unexpected behavior is detected, the user is asked to share the results with us via cknowledge.org/repo. You can see the latest shared results from such crowdsourced experiments here.

Note that it is possible to compile and run this script for Android or MinGW (rather than the automatically detected host OS such as Linux, macOS or Windows) simply via

 $ python reproduce.py --target_os=android19-arm
or
 $ python reproduce.py --target_os=mingw-32

You can see the list of shared OS descriptions via:

 $ ck list os

Also note that since this script (which calculates speedups via expected values) became useful to other colleagues, we converted it into the standard CK module program.experiment.speedup with a JSON API and shared it within the ck-autotuning repository.
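You can inspect which actions (and hence which JSON API entry points) this module exposes by loading its meta description; a sketch, assuming the ck-autotuning repository is pulled and that, as is usual for CK modules, the actions are listed under the 'actions' key of the meta:

 import ck.kernel as ck

 # Load the meta description of the shared program.experiment.speedup module
 r = ck.access({'action': 'load',
                'module_uoa': 'module',
                'data_uoa': 'program.experiment.speedup'})
 if r['return'] > 0:
     print('CK error: ' + r['error'])
     exit(1)

 print(list(r['dict'].get('actions', {}).keys()))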

Finally, note that this script generates TXT and HTML reports in the same directory, which we reused in the CK-based interactive report for this article. This interactive publication is also shared in the reproduce-ck-paper repository to be used as a template for your own shared interactive articles, and can be found via

 $ ck find dissemination.publication:cd11e3a188574d80

Questions and comments

You are welcome to get in touch with the CK community if you have questions or comments!
