Unifying predictive analytics

Grigori Fursin edited this page Jul 17, 2018 · 14 revisions


Please do not forget to check the CK getting started guide and Portable CK workflows.

During our research on machine-learning-based compilation and optimization, we needed to extract numerous features and evaluate many different models implemented in different frameworks. This was a daunting task, particularly since frameworks and APIs changed all the time, were often incompatible with each other, used different formats, and were not automated.

One of the reasons to implement the CK framework was to unify and automate this process across different platforms and operating systems, including Linux, Windows and MacOS (and Android via CK web services).

We therefore implemented a universal machine learning CK module (model) with a JSON API and three functions:

  • build - builds (trains) a predictive model
  • validate - cross-validates (tests) a new model
  • use - uses the model to predict values or classes

It is available in the ck-analytics repository and attempts to unify various regression and classification algorithms under one JSON API which users can embed into their pipelines. You can install it as follows:

 $ ck pull repo:ck-analytics

The idea is to expose features, choices, characteristics and a state of all CK components in a unified way. It then becomes possible to model behavior within CK workflows (pipelines) as a function of features, choices and the state.
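To make this abstraction concrete, here is a toy sketch of the idea that measured characteristics (behavior) are a function of features, choices and the state. The function and key names below are illustrative only, not CK's actual API:

```python
# Toy sketch: characteristics = f(features, choices, state).
# All names here are illustrative, not part of CK's API.
def run_pipeline(features, choices, state):
    """Toy 'pipeline' whose measured characteristics depend on all three inputs."""
    # Pretend that -O3 halves execution time unless the CPU is throttled.
    t = features["program_size"] * (0.5 if choices["opt_flag"] == "O3" else 1.0)
    if state["cpu_throttled"]:
        t *= 2.0
    return {"execution_time": t}

print(run_pipeline({"program_size": 20}, {"opt_flag": "O3"},
                   {"cpu_throttled": False}))
# {'execution_time': 10.0}
```

Once a pipeline is expressed this way, modeling its behavior reduces to learning this function from recorded (features, choices, state, characteristics) tuples.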

As a practical use case, we applied this approach to compiler autotuning and crowd-tuning (see the interactive CK report sponsored by the Raspberry Pi Foundation), and to benchmarking, autotuning and co-design of Pareto-efficient software/hardware stacks for deep learning (see the report from the ACM ReQuEST tournaments).

Furthermore, such an approach helps us crowdsource learning and optimization across multiple devices provided by volunteers.

Next we describe several demos, including how to build a simple decision tree using CK, how to validate it, and how to use it to predict new values.


Besides CK, you may need to install the following Python packages: scikit-learn, matplotlib, numpy, pandas:

 $ pip install scikit-learn matplotlib numpy pandas

You also need to install the dot tool from Graphviz and add it to your PATH if you would like to visualize decision trees.

We plan to automate installation of these dependencies via CK packages soon.

If you plan to use TensorFlow, you can already install it via CK packages from the ck-tensorflow repository:

 $ ck pull repo:ck-tensorflow
 $ ck install package --tags=lib,tensorflow

Finally, if you plan to use models via R, you need to install R language.

CK demo entries

You can find available CK demo entries as follows:

 $ ck search ck-analytics:demo:ml-* | sort
  • ml-decision-tree - decision tree predicting 1 class (true/false)
  • ml-decision-tree-multi - decision tree predicting multiple classes
  • ml-dnn-classifier-multi - DNN classifier (currently via TensorFlow) predicting multiple classes

You can then find a path to a given demo entry as follows:

 $ ck find demo:{name from above list}

Each CK entry has several scripts to build, validate and reuse a decision tree from a simple JSON input file (as asked in ticket #31).

1-class decision tree example

Preparing input

Our input file to build a model is model-input.json:

  "ftable":          # Table with features
       [20, "O3"],   # Feature vector 1
       [20, "O2"],   # Feature vector 2
       [20, "Os"]    ...

   "fkeys":          # Some key for each above feature

   "features_flat_keys_desc": {  # User friendly description 
                                 # of each feature (via above key)
                                 # as well as possibility to add types, ranges, etc
      "program_size":{"name":"Program Size"},
      "opt_flag":{"name":"Optimization Flag"}

  "ctable": # Table of results (we will build model 
            # to correlate these results with above features) 

   "ckeys": # Key of the result

  "keep_temp_files":"yes", # tells CK to keep all intermediate files
                           # (useful for validation and debugging)

  "model_module_uoa":"model.sklearn", # Select high-level modeling engine
                                      # (currently model.sklearn or model.r) 
  "model_name":"dtc",                 # Select algorithm (DTC - decision trees)
  "model_file":"model-sklearn-dtc",   # Select filename to record model to
  "model_params":{"max_depth":3}      # Customize algorithm (for example, select depth of the tree)
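Such an input file can be generated programmatically. The sketch below reconstructs model-input.json with Python's standard json module; note that the values of "fkeys", "ctable" and "ckeys" are elided in the excerpt above, so the ones used here are assumptions for demonstration only:

```python
import json

# Illustrative reconstruction of model-input.json. The values of "ctable"
# and "ckeys" are NOT shown in the excerpt above - they are assumptions.
model_input = {
    "ftable": [[20, "O3"], [20, "O2"], [20, "Os"]],
    "fkeys": ["program_size", "opt_flag"],
    "features_flat_keys_desc": {
        "program_size": {"name": "Program Size"},
        "opt_flag": {"name": "Optimization Flag"},
    },
    "ctable": [[False], [True], [True]],  # assumed results to correlate
    "ckeys": ["improved"],                # assumed result key
    "keep_temp_files": "yes",
    "model_module_uoa": "model.sklearn",
    "model_name": "dtc",
    "model_file": "model-sklearn-dtc",
    "model_params": {"max_depth": 3},
}

with open("model-input.json", "w") as f:
    json.dump(model_input, f, indent=2)
```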

Building model

You can build a model using the prepared script model-sklearn-dtc-build.bat or directly using CK as follows:

 $ ck build model @model-input.json

You should normally see the following output files in your current directory:

  • model-sklearn-dtc.model.obj - the model (Python internal format)
  • model-sklearn-dtc.model.pdf - the decision tree in PDF (can be embedded into papers)
  • model-sklearn-dtc.model.png - the decision tree as a PNG image (can be embedded into web pages)
  • model-sklearn-dtc.model.dot - the decision tree in DOT (Graphviz) format
  • model-sklearn-dtc.model.decision_tree.json - the decision tree in the CK format, which can later be converted to C and integrated with adaptive programs and compilers
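For intuition, the build step roughly corresponds to the following scikit-learn sketch. This is an assumption about what model.sklearn's dtc does under the hood (the real module also records the model files and plots the tree), and the ctable labels here are illustrative:

```python
# Hedged sketch of what model.sklearn's "dtc" roughly does (an assumption;
# the real module also records model files, plots the tree, etc.).
from sklearn.tree import DecisionTreeClassifier

ftable = [[20, "O3"], [20, "O2"], [20, "Os"]]
ctable = [False, True, True]                 # assumed results for this sketch

# Categorical features must become floats before training.
flag_codes = {"O3": 0.0, "O2": 1.0, "Os": 2.0}
X = [[float(size), flag_codes[flag]] for size, flag in ftable]

clf = DecisionTreeClassifier(max_depth=3)    # "model_params" from the input file
clf.fit(X, ctable)

print(clf.predict([[10.0, 1.0]])[0])         # predict for a new feature vector
```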

If you have categorical feature dimensions, as in our example, CK will internally convert them to floats. However, you may want to do this conversion yourself to keep the encoding consistent between experiments.
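One way to keep such encodings consistent is to maintain a shared codebook of category-to-float codes and reuse it across runs. The helper below is hypothetical (CK's internal conversion may differ):

```python
# Hypothetical helper to keep categorical encodings consistent across
# experiments (CK's internal conversion may differ).
def encode(ftable, codebook=None):
    """Replace string features with stable float codes via a shared codebook."""
    codebook = {} if codebook is None else codebook
    encoded = []
    for vec in ftable:
        row = []
        for v in vec:
            if isinstance(v, str):
                v = codebook.setdefault(v, float(len(codebook)))
            row.append(float(v))
        encoded.append(row)
    return encoded, codebook

X, codes = encode([[20, "O3"], [20, "O2"], [20, "Os"]])
# X == [[20.0, 0.0], [20.0, 1.0], [20.0, 2.0]]
```

Passing the same codebook to later calls guarantees that "O3" always maps to the same float, no matter which experiment produced the data.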

Validating model

You can validate your new model using the prepared script model-sklearn-dtc-validate.bat or directly using CK as follows:

 $ ck validate model @model-input.json

You should now see the oracle and predicted values for each input feature vector, together with the overall prediction rate and RMSE:

1) [20, "O3"] => False False 
2) [20, "O2"] => True True 
3) [20, "Os"] => True True 

Model RMSE =      0.0
Prediction rate = 100.000%
Mispredictions =  0 out of 3
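The metrics above can be computed as follows; this is a sketch, and CK's exact formulas are an assumption here:

```python
import math

# Sketch of how the validation metrics above can be computed
# (CK's exact formulas are an assumption).
oracle    = [False, True, True]
predicted = [False, True, True]

errors = sum(o != p for o, p in zip(oracle, predicted))
rate = 100.0 * (len(oracle) - errors) / len(oracle)
rmse = math.sqrt(sum((float(o) - float(p)) ** 2
                     for o, p in zip(oracle, predicted)) / len(oracle))

print("Model RMSE =      %.1f" % rmse)
print("Prediction rate = %.3f%%" % rate)
print("Mispredictions =  %d out of %d" % (errors, len(oracle)))
```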

Using model

Now you can use the generated model to predict values from new (previously unseen) features. You can try it via the prepared script model-sklearn-dtc-use.bat or from the command line as follows:

 $ ck use model @model-input-use.json

where model-input-use.json contains the name of the created model file (without extension) and a new feature vector (note that you should now use floats instead of strings):

  "features": [10, 1], 


You should see output similar to the following:

  ft0 -> 10
  ft1 -> 1
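One motivation for the JSON tree format is that the recorded model can be evaluated without scikit-learn, e.g. after conversion to C for adaptive programs. The node schema below is hypothetical; inspect the generated model-sklearn-dtc.model.decision_tree.json file for CK's real format:

```python
# Evaluating a decision tree stored as nested JSON, without scikit-learn.
# NOTE: this node schema is hypothetical - inspect the generated
# *.decision_tree.json file for CK's real format.
def predict(node, features):
    while "value" not in node:                       # descend until a leaf
        if features[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["value"]

tree = {                                             # toy tree: "ft1 <= 0.5 ?"
    "feature": 1, "threshold": 0.5,
    "left":  {"value": False},
    "right": {"value": True},
}

print(predict(tree, [10, 1]))   # -> True
```

A direct C translation of this loop is what makes such models cheap enough to embed into adaptive programs and compilers.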

Note that you can record the model in a CK entry to be shared and reused, as demonstrated in the following scripts:

  • model-sklearn-dtc-build-and-record.bat - records the above model files into the CK model:demo-ml-decision-tree entry
  • model-sklearn-dtc-validate-from-recorded.bat - validates the model from the CK model:demo-ml-decision-tree entry
  • model-sklearn-dtc-use-from-recorded.bat - makes predictions using the model from the CK model:demo-ml-decision-tree entry

Real use cases of unified predictive analytics via CK

You can find examples of the above decision trees in our projects on autotuning and run-time adaptation of compilers and libraries depending on program, dataset and hardware features.

Improving models

Our long-term vision is to make experimentation and predictive analytics simpler for researchers and easily accessible via CK web services with a JSON API, thus accelerating the adoption of AI/ML, as described in our ACM ReQuEST-ASPLOS'18 report, compiler crowd-tuning article and DATE'16 CK intro.

Questions and comments

You are welcome to get in touch with the CK community if you have questions or comments!
