Getting started guide

Grigori Fursin edited this page Jun 25, 2018 · 24 revisions


Why Collective Knowledge (CK)?

Let's imagine that we are working on a research project where we need to load some existing text files (data sets), detect some patterns in them, record the results, and possibly share them with colleagues (internally in a workgroup or publicly) to receive feedback.

First, we would create a directory named after the project, my_cool_project. Then we would typically write a shell (or Python) script detect_patterns where we would most likely hard-wire paths to existing local data sets (often wrongly thinking that we can always "clean it up" later). This script would then iterate over the files, load them, search for the pattern and save the results:

(Example of a hypothetical ad-hoc shell script on Linux):
$ cat detect_patterns

#! /bin/bash

rm -f output.txt
for f in datasets/*.txt
do
  grep "some pattern" "$f" >> output.txt
done
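
The same ad-hoc logic written in Python has exactly the same problem of hard-wired paths (a hypothetical sketch; the datasets/ directory and the pattern are assumptions, just like in the shell script above):

```python
import glob
import re

def detect_patterns(dataset_dir="datasets", pattern="some pattern"):
    """Ad-hoc detector: scan every *.txt file under dataset_dir and
    return the names of the files containing the hard-wired pattern."""
    matches = []
    for path in sorted(glob.glob(dataset_dir + "/*.txt")):
        with open(path) as f:
            if re.search(pattern, f.read()):
                matches.append(path)
    return matches
```

Note how both the data location and the pattern are baked into the code - this is precisely what CK will help us untangle below.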

Later, we may also add extra functionality to customize this process via command-line options, record results in SQL or NoSQL databases, and prepare all the tables and schemas describing the data types, which may take weeks.

If we need to share such an experimental setup with colleagues, we then pack it or commit it to Git/SVN. Our colleagues now need to go through the same time-consuming steps: change paths to data sets, make sure that they have compatible software, possibly improve or change the scripts, data types and database tables, commit all changes back, ask all other colleagues to rebuild their databases, and so on.

In the end, we spend most of our "research" time implementing experimental setups and sharing code and data instead of focusing on innovation. As a result, each student, scientist or even company writes their own experimental framework, which often disappears at the end of the project simply because no one apart from a few colleagues knows how to use it.

Virtual Machines and Docker can automatically create snapshots of experimental setups and help share them across workgroups. Hence they are extremely useful to share already stable experimental workflows. However, they do not help develop reusable and customizable experimental setups that can take advantage of the latest user environment, code and data sets (it is not their purpose). Furthermore, they do not capture all hardware dependencies or run-time state of a system, and are not intended to help exchange experimental results.

The Collective Knowledge framework (or CK for short) is intended to close this gap. It helps researchers convert their ad-hoc scripts, tools, data sets and results into reusable and customizable components with a Python wrapper, unique IDs, a unified JSON API and a JSON meta-description. It then makes it easy to assemble customizable experimental workflows from such components (including your own or ones shared by the community), much like LEGO (TM); unify the command line and information flow between all components via a simple JSON API; detect or install all software dependencies (allowing multiple versions of tools and libraries to easily co-exist on a user platform); save results in a human-readable and easily extensible schema-free JSON format; and exchange whole experimental setups and results with all dependencies between workgroups using Git or JSON-based web services.

Next, we will demonstrate how to convert the above ad-hoc research scenario into customizable and reusable CK components and workflows.

Note that CK has been successfully tested and used by volunteers on various platforms including Ubuntu, OpenSUSE, CentOS, MacOS X, Windows and partially Android (via CK web services and OpenME). If you encounter any problems, please do not hesitate to get in touch with the CK community.

Quick CK installation

First, you should check whether CK is already installed on your machine by invoking the following command from your terminal window:

 $ ck version

If CK is not found, you can quickly install it as follows. First, ensure that you have Python 2.7+ or 3.2+ installed:

 $ python --version
    or
 $ python3 --version

You should also have the git command-line client installed (since we use it to share research objects). You can check whether it is installed via

 $ git --version

Now you are ready to install CK. You can install the stable version using pip (skip sudo on Windows, and use pip3 if you want to install CK for Python 3.x):

 $ sudo pip install ck

Alternatively, if you don't have root access, you can install the development version of CK from GitHub into your user space on Linux/MacOS:

 $ git clone https://github.com/ctuning/ck ck-master
 $ export PATH=$PWD/ck-master/bin:$PATH
 $ export PYTHONPATH=$PWD/ck-master:$PYTHONPATH

or on Windows

 $ git clone https://github.com/ctuning/ck ck-master
 $ set PATH={CURRENT PATH}/ck-master/bin;%PATH%
 $ set PYTHONPATH={CURRENT PATH}/ck-master;%PYTHONPATH%

Now you should be able to check that the CK command-line front end works via:

 $ ck version
or you can check that your CK version is up to date via:
 $ ck status

You can also list all internal CK actions via

 $ ck help

Finally, you can open this wiki simply via

 $ ck guide

Note that ck is just a batch file which calls the CK kernel ck-master/ck/kernel.py, implemented as a small (and monolithic, to be fast) Python module with a number of internal and productivity functions. All other CK functionality is implemented as plugins/wrappers (CK modules).

By default, CK attempts to find and use the latest installed Python version, i.e. v3 first and only then v2. You can force CK to use a specific Python version by setting the environment variable CK_PYTHON, e.g.:

 $ export CK_PYTHON=python3
      or
 $ export CK_PYTHON=python

Trying CK using Docker

You can also try CK without installing it using the following Docker image (if you need to install Docker on Linux or Windows, please check our guide here: http://github.com/ctuning/ck-docker):

 $ docker run -it ctuning/ck
However, remember that the main purpose of CK is to help you clean up and systematize your local files while rebuilding and sharing native experimental workflows and taking advantage of the latest software environment!

Configuring internal kernel parameters

You can see the internal kernel parameters via

 $ ck set kernel

You can change any kernel parameter; for example, to let CK install software packages into env entries (explained later), use

 $ ck set kernel var.install_to_env=yes

You can unset a variable via

 $ ck set kernel var.install_to_env=

Finally, you can set up the kernel in a user-friendly way from the shell as follows:

 $ ck setup kernel

Converting ad-hoc scripts and data sets into CK components with JSON API

Creating research project repository

Now we are ready to convert our ad-hoc scripts and datasets into CK format.

First, we create a so-called CK repository with an alias "my_cool_project" simply via:

 $ ck add repo:my_cool_project --quiet

This is just a new directory created in your user space (in the CK directory). You can find it via

 $ ck find repo:my_cool_project 

Your research project is now registered in CK and is searchable.

For example, you can always list all registered CK repositories (projects) via

 $ ck list repo

Note that after a clean installation, you always have two repositories:

  • default - internal CK repository located in ck-master/ck/repo;
  • local - temporary (scratch-pad) repository created in your user space during the first CK invocation.

Converting ad-hoc shell script to a unified CK module

Next, we would like to gradually replace the numerous ad-hoc shell scripts with hard-wired paths and command lines (for example, the hypothetical detect_patterns shell script mentioned in the intro) with reusable CK modules that have a unified command line and Python JSON API.

For this purpose, you can create a new CK module patterns with an action detect inside my_cool_project CK repository simply as follows:

 $ ck add my_cool_project:module:patterns --func=detect --quiet

Now you have a working dummy which you can invoke from the command line as follows:

 $ ck detect patterns

It simply prints the command-line options converted into a Python dictionary (JSON).

This dummy module is implemented in Python and you can find it via:

 $ ck find module:patterns

There you should find the Python source file module.py with the dummy function detect, where you can re-implement your script's functionality as we will demonstrate later.

Note that such functions should always return a dictionary with at least an integer return key, plus a text error key if return>0.

This allows us to build a hierarchy of modules on top of existing ones while easily propagating and handling errors rather than failing the whole workflow. It is particularly important during autonomous experiment crowdsourcing, where we would like to keep the workflow running when minor errors occur (the CPU/GPU frequency is not available, or one program out of many fails to compile during a given optimization). Furthermore, it allows us to create stable web services and properly finish HTML pages when errors occur. That's why we use Python exceptions only as a last resort, when there is a critical mistake that has to be fixed immediately!
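
This return-dictionary convention can be sketched in standalone Python (the load_entry and detect names here are hypothetical illustrations, not part of the CK API):

```python
def load_entry(name):
    # Each action returns a dict with an integer 'return' key;
    # 'error' carries a message when return > 0.
    if name == "":
        return {"return": 1, "error": "entry name is empty"}
    return {"return": 0, "entry": name}

def detect(i):
    # A higher-level action simply propagates errors from lower-level
    # ones instead of raising exceptions, so the whole workflow does
    # not crash because of one minor problem.
    r = load_entry(i.get("name", ""))
    if r["return"] > 0:
        return r
    return {"return": 0, "found_in": r["entry"]}
```

The caller decides at each level whether an error is fatal or can be skipped, which is exactly what makes long crowdsourced experiments robust.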

You can now notice the CK command line standard:

 $ ck [action] [module alias] options

Note that CK unifies command-line parameters and converts them into an input dictionary. For example, the following command line

 $ ck detect patterns param1=value1 -param2=value2 --param3=value3 --param4
is converted to function input dictionary i as follows:
 {
  'action':'detect',
  'module_uoa':'patterns',
  'param1':'value1',
  'param2':'value2',
  'param3':'value3',
  'param4':'yes'
 }
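
The conversion rules above can be illustrated with a small standalone parser (a sketch of the convention only, not CK's actual implementation):

```python
def convert_args(args):
    """Convert CK-style command-line arguments into an input dictionary:
    the first token is the action, the second the module alias;
    param=value, -param=value and --param=value all become key/value
    pairs, and a bare --param becomes 'yes'."""
    i = {"action": args[0], "module_uoa": args[1]}
    for a in args[2:]:
        a = a.lstrip("-")          # treat param, -param and --param alike
        if "=" in a:
            k, v = a.split("=", 1)
            i[k] = v
        else:
            i[a] = "yes"           # bare flags become 'yes'
    return i
```

Running it on the example above reproduces the dictionary shown.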

You can also add different JSON files to your input. For example, you can add the following two input files, input1.json and input2.json,

$ cat input1.json
{
  "input_params1":{"xyz":1}
}

$ cat input2.json
{
  "input_params2":[10,20,30]
}

to the input of your function as follows:

 $ ck detect patterns @input1.json @input2.json

This will be converted to function input dictionary i as follows:

 {
  'action':'detect',
  'module_uoa':'patterns',
  'input_params1':{'xyz':1},
  'input_params2':[10,20,30]
 }

A few more features of the command line:

 $ ck detect patterns --out=json @@param5 -- a b c 

CK will ask the user to enter a dictionary from the command line; it will be added to i under the param5 key. At the same time, everything after -- will be collected under the "unparsed" key. Finally, CK will print the output dictionary of the function (r) to the console:

 $ ck detect patterns --out=json @@param5 -- a b c 

{"var1":"value1"}

Command line:

{
  "action": "detect",
  "module_uoa": "patterns",
  "param5": {
    "var1": "value1"
  },
  "out": "json",
  "unparsed": [
    "a",
    "b",
    "c"
  ],
  "cids": [],
  "xcids": []
}
{
  "return": 0
}

The unparsed key is useful when CK modules are used as wrappers for existing tools such as GCC, LLVM, R or DNN frameworks while unifying their I/O, detecting their versions and ensuring compatibility and reproducibility of experimental workflows (as described later in advanced CK tutorials).

You can also specify repository alias and data alias in the CK command line at the same time as follows:

 $ ck detect my_cool_project:patterns:xyz

Such command line will be translated into the following input i:

 {
  'action':'detect',
  'module_uoa':'patterns',
  'repo_uoa':'my_cool_project',
  'data_uoa':'xyz',
  'cid': 'my_cool_project:patterns:xyz'
 }
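
This translation can be sketched as follows (an illustration; parse_cid is a hypothetical helper, not a CK function):

```python
def parse_cid(action, cid):
    """Translate a (repo:)module:entry CID into a CK-style input
    dictionary. With two components, only the module and entry are
    present; with three, the first one is the repository."""
    parts = cid.split(":")
    i = {"action": action, "cid": cid}
    if len(parts) == 2:
        i["module_uoa"], i["data_uoa"] = parts
    elif len(parts) == 3:
        i["repo_uoa"], i["module_uoa"], i["data_uoa"] = parts
    return i
```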

For convenience, we also added the possibility to list all available actions of any given module via:

 $ ck help <module_uoa>

For example, ck help patterns should list detect function.

You may also write notes about this module and its actions on the GitHub wiki via

 $ ck webhelp patterns

Converting data sets to CK format

We would like to find patterns in some hypothetical text files, usually scattered across multiple directories in our user space. The idea is to place such files inside a CK repository so that we can automatically find them by CK UID, alias or tags, rather than hard-wiring paths to them in scripts.

We can do this by creating a new abstraction in CK for a group of similar files via a new CK module. For example, let's add a new module text that will serve as a container for text files:

 $ ck add my_cool_project:module:text --quiet

In fact, all functionality in CK is implemented as such CK modules.

Note that any module in CK inherits some internal actions, standard for any repository and similar to natural language:

  • add - add new entry for a given module
  • list - list all entries for a given module
  • find - return path of an entry
  • rm - remove entry
  • ren - rename entry
  • load - load meta-description of an entry
  • update - update meta-description of an entry
  • cp - copy entry to a different repository
  • help - list module's actions (if description exists)

Now, we can add text entries inside our project repository as follows:

 $ ck add my_cool_project:text:my_dataset1

You can find path to this new data set entry via:

 $ ck find text:my_dataset1

We can also add entries with some comma-separated informal tags

 $ ck add my_cool_project:text:my_dataset2 --tags=project1,experiment1
 $ ck add my_cool_project:text:my_dataset3 --tags=project1,experiment2
 $ ck add my_cool_project:text:my_dataset4 --tags=project1,experiment3
 $ ck add my_cool_project:text:my_dataset5 --tags=project1,experiment4

You can now list all of them via:

 $ ck list text

You can also use patterns:

 $ ck list text:my*

You can search entries by tags:

 $ ck search text --tags=experiment1

You can delete a given entry:

 $ ck rm text:my_dataset3

You can delete entries by tags:

 $ ck rm text:* --tags=experiment3

You can rename a given entry:

 $ ck ren text:my_dataset2 :my_dataset2x
 $ ck list text

Note that we skipped the module in the second parameter; in this case CK reuses the module from the first parameter. The same holds for repositories, i.e. we can move my_dataset2x to another repository (for example, local) as follows:

 $ ck mv text:my_dataset2x local::
 $ ck list text --all

You can copy a given entry to another one:

 $ ck cp text:my_dataset5 :my_dataset5x
 $ ck list text

You can update the meta-description of a given entry (besides tags) from the command line as follows:

 $ ck update text:my_dataset3 @@dict

You will then need to enter JSON; for example, to describe that a given entry contains a given data set file:

 {
  "dataset_files": [
    "text1.txt"
  ]
 }
(Press Enter twice to finish.)

You can then check that meta-description was updated as follows:

 $ ck load text:my_dataset3 --min

Now, you can create a text file `text1.txt`, for example, with the following text

 This is my test
and move it into the CK my_dataset3 entry, i.e. into the directory of this entry on your local disk, which you can find via "ck find text:my_dataset3". We will need it later to implement a simple experimental scenario.

Referencing CK entries by a local identifier CID similar to DOI

You now probably noticed how we reference CK entries in the command line: by repository, module and entry aliases separated by a colon. Whenever only one alias is present, it is a module alias. Whenever two aliases are present, these are aliases of a module and an entry. Whenever three aliases are present, these are aliases of a repository, a module and an entry respectively.

Whenever aliases are omitted but a colon is present, CK attempts to guess the alias from related references in the command line (see the entry-renaming example above). Whenever the repository is not mentioned, CK searches for the entry alias across all user repositories. This allows us to effectively reuse a given module and abstract data in other repositories!

Note that each repository, module and entry alias has an associated internal Unique ID (UID) - a 16-digit hexadecimal number. You can find it via

 $ ck info text:my_dataset1
 $ ck info module:text
 $ ck info repo:my_cool_project

This allows us to find the same entry by Unique ID even if its user-friendly alias is renamed later. Hence any entry in CK can be referenced by both UID and alias - we call such a reference a UOA (Unique ID Or Alias). A combination (repository UOA:)module UOA:entry UOA is called a CID, or Collective ID, and is used similarly to a DOI. The main difference is that it is decentralized, i.e. it is generated locally and does not require any centralized web service from the start (eventually we still need some service to find entries which a user decides to share).
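
Generating and recognizing such 16-digit hexadecimal UIDs can be sketched as follows (a simplified illustration, not CK's exact implementation):

```python
import uuid

def gen_uid():
    # Take 16 hex digits from a random UUID - decentralized, so no
    # central registry is needed to mint a new identifier.
    return uuid.uuid4().hex[:16]

def is_uid(s):
    # A UID is exactly 16 lowercase hexadecimal characters;
    # anything else is treated as a user-friendly alias.
    return len(s) == 16 and all(c in "0123456789abcdef" for c in s)
```

This is why CK can accept either form everywhere: it first checks whether a reference looks like a UID and otherwise resolves it as an alias.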

Understanding CK repository structure

Let's now investigate what my_cool_project looks like (alternatively, you can browse any CK repository on the web, for example: universal autotuning or unified predictive analytics). If you are on Linux and use the command line, do not forget to use ls -a to list files starting with a dot.

You can find repository path via

 $ ck find repo:my_cool_project

Any CK repository has the following format:

  • .cmr.json - internal CK description (used to find a given repository or describe dependencies on other repositories)
  • various auxiliary files added by a user to describe this project, such as Readme(.md), COPYRIGHT.txt, LICENSE.txt, AUTHORS, CHANGES, CONTRIBUTIONS
  • directories named by a module alias or UIDs - they serve as containers for related entries. Note that if a given module has an alias, CK provides an explicit conversion from UID to alias and backward using two files inside .cm directory: alias-a-{alias name} which contains UID, and alias-u-{UID} which contains an alias. This allows CK to dramatically speed up search for a given entry both by alias and by UID.

For example, you can see text, module and .cm directories in the root path of my_cool_project repository.

  • Each above module directory has sub-directories named by an entry alias or UID. It may also contain .cm directory for fast alias to UID conversion.

For example, you can see my_dataset1, my_dataset2x, my_dataset3 and .cm directories in the text directory. You can also find patterns and text directories in the module directory.
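
The two-file alias scheme can be sketched with hypothetical helpers (the real logic lives in the CK kernel):

```python
import os

def register_alias(cm_dir, alias, uid):
    # Write the two cross-reference files CK uses for fast lookup:
    # alias-a-{alias} contains the UID, alias-u-{uid} contains the alias.
    with open(os.path.join(cm_dir, "alias-a-" + alias), "w") as f:
        f.write(uid)
    with open(os.path.join(cm_dir, "alias-u-" + uid), "w") as f:
        f.write(alias)

def alias_to_uid(cm_dir, alias):
    # Resolving an alias is a single file read - no scanning required.
    with open(os.path.join(cm_dir, "alias-a-" + alias)) as f:
        return f.read().strip()

def uid_to_alias(cm_dir, uid):
    with open(os.path.join(cm_dir, "alias-u-" + uid)) as f:
        return f.read().strip()
```

Because each direction is a direct file-name lookup, resolving either an alias or a UID costs one file access rather than a search over all entries.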

Finally, each entry in CK has a .cm directory with several files including:

  • meta.json with any JSON meta description of an entry required by a user for a research project
  • info.json with some internal info about this entry including date of creation, author, etc,
  • desc.json with a schema for a meta description (optional - useful only before sharing when research project becomes stable)
  • updates.json to keep the history of all updates of this entry (optional)

For example, the entry text:my_dataset3 has .cm/meta.json with the following JSON:

 {
  "dataset_files": [
    "text1.txt"
  ], 
  "tags": [
    "project1", 
    "experiment2"
  ]
 }
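
Since this layout is plain directories and JSON files, it can be traversed with standard tools. Here is a simplified sketch of how a tag search over such a tree could work (a mock of the idea behind ck search --tags, not the real implementation):

```python
import json
import os

def search_by_tag(repo_path, module, tag):
    """Scan {repo}/{module}/*/.cm/meta.json and return the entries
    whose 'tags' list contains the given tag."""
    found = []
    module_dir = os.path.join(repo_path, module)
    for entry in sorted(os.listdir(module_dir)):
        meta_file = os.path.join(module_dir, entry, ".cm", "meta.json")
        if os.path.isfile(meta_file):
            with open(meta_file) as f:
                meta = json.load(f)
            if tag in meta.get("tags", []):
                found.append(entry)
    return found
```

The transparency of the format is deliberate: even without CK installed, colleagues can inspect and process a shared repository with nothing but a file manager and a JSON parser.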

Implementing simple experimental workflow

Let's now re-implement our original ad-hoc detect_patterns script as a CK-based experimental workflow with a unified JSON API: obtain all data sets which have text files, find some pattern there, and return the Unique IDs and aliases of the entries where this pattern is found.

We need to find our module patterns

 $ ck find module:patterns

and substitute dummy function detect with the following code:

def detect(i):
    """
    Input:  {
            }

    Output: {
              return       - return code =  0, if successful
                                         >  0, if error
              (error)      - error text if return > 0
            }

    """

    # Call CK kernel to invoke a given module with a given action and an input
    r=ck.access({'action':'search',
                 'module_uoa':'text'})
    # If error, quit (CK will print an error)
    if r['return']>0: return r

    # Get a list of entries in a special format 
    # To get an API of a function, use 'ck search --help'
    # Alternatively, you can just print this variable
    # to quickly figure out the schema
    l=r['lst']

    # Print entry alias, UID and path
    for x in l:
        # Get entry's UID and UOA (if there is an alias then UOA=alias, 
        #                          if there is only UID then UOA=UID)
        duoa=x['data_uoa']
        duid=x['data_uid']
        
        # Get entry's repo UOA and UID
        ruoa=x['repo_uoa']
        ruid=x['repo_uid']

        # Get path to an entry
        path=x['path']

        print ('')
        print (path)
        print (duoa, duid)
        print (ruoa, ruid)

    # Return at least integer 'return' key
    return {'return':0}

Now you can run it simply via

 $ ck detect patterns

Normally, you should see information about all text entries.

Now we can make a slightly more sophisticated workflow where we specify a string on the command line and search for it in the text files of all text entries that have the tag project1. Substitute the above function with the following:

def detect(i):
    """
    Input:  {
              (string)     - string to find in text files
            }

    Output: {
              return       - return code =  0, if successful
                                         >  0, if error
              (error)      - error text if return > 0
              (entries)    - list of 'text' entries with found string
            }

    """

    import os

    # Get string to search from the command line, i.e. 
    # when invoking this module as 'ck detect patterns --string=test'
    s=i.get('string','')

    # Call CK kernel to invoke a given module with a given action and an input
    r=ck.access({'action':'search',
                 'module_uoa':'text',
                 'tags':'project1'})
    # If error, quit (CK will print an error)
    if r['return']>0: return r

    # Get a list of entries in a special format 
    # To get an API of a function, use 'ck search --help'
    # Alternatively, you can just print this variable
    # to quickly figure out the schema
    l=r['lst']

    e=[]

    # Print entry alias, UID and path
    for x in l:
        # Technically, we could just check if file 'text1.txt'
        # exists in the path, but here we would like to show
        # how to load a meta description of a given entry
        # and read only text files described via meta description

        duoa=x.get('data_uoa','')
        duid=x.get('data_uid','')
        muid=x.get('module_uid','')
        ruid=x.get('repo_uid','')

        r=ck.access({'action':'load',
                     'module_uoa':muid,
                     'data_uoa':duid,
                     'repo_uoa':ruid})
        if r['return']>0: return r

        path=r['path']
        meta=r['dict']

        # Check if there is a key with files
        df=meta.get('dataset_files',[])
        if len(df)>0:
           # Load file
           p=os.path.join(path, df[0])

           # use some internal CK productivity functions 
           r=ck.load_text_file({'text_file':p})
           if r['return']>0: return r
           ss=r['string']

           if s in ss:
              print('Data entry found '+duoa+' ('+duid+')')
              e.append(duid)

    return {'return':0, 'entries':e}

You can run it as follows:

 $ ck detect patterns --string="is my"

and you should get only one entry my_dataset3 (which we prepared before with the text1.txt file).

You can also add another action to this module via

 $ ck add_action module:patterns

Remember, that you can list all actions (functions) of this module via

 $ ck help patterns

Note, that you can now obtain API of any given function such as detect via

 $ ck detect patterns --help

Also note that it is very easy to record the above results inside your repository. Let's say you have a module results that you would like to use to record the output (or update it if the entry already exists). Then you just add the following code to record results to the entry my_result:

    r=ck.access({'action':'update',
                 'module_uoa':'results',
                 'repo_uoa':'my_cool_project',
                 'data_uoa':'my_result',
                 'dict':{'list_of_entries_with_pattern':e}})
    if r['return']>0: return r

Here, if you omit repo_uoa, the results will be recorded in the local repository. If you omit data_uoa, a UID will be generated instead. If you use add instead of update and the entry already exists, CK will quit with an error.

Reusing shared components

One of the main ideas behind CK is to eventually enable truly open research similar to open source development by collaboratively organizing and unifying any research code and data. Therefore, we already started sharing various repositories with common modules via GitHub and Bitbucket to let users reuse them, improve them and build upon them.

The non-profit cTuning foundation and dividiti already share a number of such public repositories and modules (with their actions) which you can browse and reuse.

For example, rather than inventing a module text to abstract data sets, we could reuse the existing module dataset from the public ck-autotuning repository. To use this module, you just need to ask CK to pull this repository via

 $ ck pull repo:ck-autotuning

You can check that this module is visible in CK via:

 $ ck find module:dataset
or
 $ ck list module:data*

You can now add a new dataset container for above examples with a text tag to the my_cool_project repository via:

 $ ck add my_cool_project:dataset:my_dataset1 --tags=project1,experiment1,text

Furthermore, you can now easily add data sets shared by others to your experimental workflows. For example, you can pull the ctuning-datasets-min repository with images, text files, audio files, encrypted files, etc., which we actively used in our past machine-learning-based autotuning research, via

 $ ck pull repo:ctuning-datasets-min

Note that the above command will also automatically pull the extra repositories it depends on, including ctuning-programs with various small benchmarks and kernels from our past research on crowdsourcing experiments and program optimization across voluntarily contributed computing devices such as mobile phones (CPC'2015 paper).

Also note that for your convenience we provide a way to update all Git repositories, including CK itself (if it was also installed from Git), via

 $ ck pull all --kernel

Exchanging artifacts and results between researchers

Open and customizable CK format together with human readable and easily editable file structure of a repository makes it straightforward to exchange various artifacts and experimental workflows between users, workgroups or the community.

You can easily share the whole repository with modules and data by packing it via

 $ ck zip repo:my_cool_project

and send created ckr-my_cool_project.zip to your colleagues. They can then add it to their system simply via

 $ ck add repo:my_cool_project --zip=ckr-my_cool_project.zip --quiet

Now your colleagues can immediately take advantage of all your modules and data for their experiments, replay and customize your workflows, and even crowdsource experiments! We hope this approach may solve various issues we encountered during Artifact Evaluation for PPoPP, CGO and other major computer systems' conferences.

For example, some researchers have already started sharing their artifacts and workflows along with their publications in the CK format.

Note that, as part of an ACM taskforce on reproducible research, we are currently discussing with the community how to unify the sharing, description and discovery of artifacts which passed evaluation. Therefore, we provide an extra option in CK to describe your repository before zipping it and sending it for Artifact Evaluation or preserving it in any Digital Library:

 $ ck describe repo:my_cool_project

You can also pack just a few entries from any repository using wildcards. The following example will create a ckr.zip with all modules from my_cool_project which can be useful if you want to share API but not entries themselves:

 $ ck zip repo --data=my_cool_project:module:*

Another example shows you how to archive all modules across all CK repositories via

 $ ck zip repo --data=modu*:*

These entries can be later easily extracted by your colleagues to any of their CK repositories via

 $ ck unzip repo:my_cool_project --zip=ckr.zip
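
Conceptually, such an archive is just the repository directory tree packed with relative paths. A minimal sketch of the round trip (the helper names are hypothetical, and the real ckr.zip layout may differ in details):

```python
import os
import zipfile

def pack_repo(repo_path, zip_path):
    # Archive every file under the repository directory, storing paths
    # relative to the repository root so it can be re-rooted anywhere.
    with zipfile.ZipFile(zip_path, "w") as z:
        for root, _, files in os.walk(repo_path):
            for name in files:
                full = os.path.join(root, name)
                z.write(full, os.path.relpath(full, repo_path))

def unpack_repo(zip_path, dest_path):
    # Extract the archive into the destination repository directory.
    with zipfile.ZipFile(zip_path) as z:
        z.extractall(dest_path)
```

Because the archived content is self-describing (meta.json, alias files, etc.), the receiving CK installation only needs to register the extracted directory as a repository.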

Naturally, you can use any public or private Git service to host CK repositories and collaboratively improve code and data. You just need to first create a dummy Git repository (for example, at GitHub or Bitbucket) and then add it to CK via:

 $ ck pull repo:my_new_shared_repo --url={full url to your repository}

Then, whenever you add, update or remove entries, just do not forget to add --share in CK command line to let CK commit changes. For example,

 $ ck add my_new_shared_repo:text:xyz --share

This is still an experimental mode and you may as well manually add, commit and push changes to the Git repository. We plan to improve this functionality in the future.

Note that from our long research experience, it used to be a nightmare to migrate all our own artifacts from old personal machines to new ones. CK makes it extremely simple: you just pack your CK directory (by default, $HOME/CK on Linux or %USERPROFILE%\CK on Windows) and move it to the new computer. CK should immediately be able to use these files (provided that paths did not change, i.e. you use the same username). You may even use standard cloud synchronization to move your data between different machines. Furthermore, you can easily use the same cloud services to run your CK-based experiments (for example, we use Microsoft Azure Cloud to power our program crowd-benchmarking and crowd-tuning scenario across numerous hardware).

Using CK as a web service

Unified JSON API also made it relatively simple to use CK as a web service, i.e. when you want to access CK repositories remotely or quickly prototype client/server architecture with basic web dashboards (see example at http://cknowledge.org/repo). This can be especially useful when crowdsourcing various experiments such as performance autotuning or collaborative bug detection.

All you need is to start CK web service on a given machine with a given hostname via:

 $ ck start web

Then you just need to add a new repository on your machine while configuring it to access above remote repository via:

 $ ck add repo:remote-experiments --remote --url=http://{hostname}:3344/ck? --quiet

Now you can perform any of the operations presented in this section using this repository (apart from moving files between servers, which requires some advanced functionality described in the full documentation) while CK automatically tunnels all requests to the remote machine. For example, you can add a text entry on such a remote machine via:

 $ ck add remote-experiments:text:my_remote_dataset

Note that you already have one remote repository in CK by default: remote-ck. It is connected to our public http://cKnowledge.org/repo repository to let you participate in our campaign to crowdsource software/hardware benchmarking and optimization (ck-crowdtuning repository).

By default, the CK web service returns JSON. However, we provide a convention to return HTML pages, which makes it possible to browse repositories, enable interactive tables, graphs and papers, and even create full-fledged websites.

For example, you can install support for CK-powered websites and start CK web-service on your local machine via

 $ ck pull repo:ck-web
 $ ck start web

You can then start browsing local CK repositories by opening http://localhost:3344 page in the browser of your choice.

You can find examples of CK-powered websites and interactive articles online.

Using CK from python scripts or Jupyter notebooks

It is also possible to access CK directly from Python or IPython/Jupyter notebooks. You just need to install CK as a standard module as follows:

 For Linux:

 $ cd $CK_ROOT
 $ sudo python setup.py install

 For Windows:

 $ cd %CK_ROOT%
 $ python setup.py install

 For Linux or Windows using pip (if available):

 $ (sudo) pip install ck

Then you need to import ck.kernel as ck and use the ck.access function as shown in the sections above. For example, you can list all available modules as follows:

import ck.kernel as ck

# List all entries of the 'module' module
r=ck.access({'action':'list','module_uoa':'module'})
if r['return']>0:
   ck.err(r)  # prints the error and exits

lst=r['lst']

for q in lst:
    ck.out('Data UOA: '+q['data_uoa'])

Using productivity functions from the CK kernel, unified across all Python versions

Developers have very little time to port their Python scripts from Python 2 to 3, or even to make them compatible with both versions. At the same time, they often duplicate common functionality such as loading, processing and saving files. We therefore added all such functions to the CK kernel, hiding the logic needed to support both Python 2 and 3 from users and unifying Unicode I/O. We also unified the API so that each productivity function takes a dictionary as input and returns a dictionary as output. For example, loading a JSON file from a CK module can be done as follows:

 r=ck.load_json_file({'json_file':'my_json_file.json'})
 if r['return']>0: return r
 d=r['dict']

This function can also be called from a standalone Python script as follows:

 import ck.kernel as ck

 r=ck.load_json_file({'json_file':'my_json_file.json'})
 if r['return']>0:
    ck.err(r)
 d=r['dict']
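For readers who do not have CK installed, the behaviour of this function can be approximated in plain Python as follows. This is an illustrative sketch of the dict-in/dict-out convention only, not CK's actual implementation:

```python
import json
import tempfile

def load_json_file(i):
    """Approximation of ck.load_json_file: takes {'json_file': path} and
       returns {'return': 0, 'dict': ...} on success, or a non-zero
       'return' code with an 'error' string on failure (CK convention)."""
    try:
        with open(i['json_file']) as f:
            d = json.load(f)
    except Exception as e:
        return {'return': 1, 'error': 'problem loading json file (%s)' % e}
    return {'return': 0, 'dict': d}

# Usage: create a small JSON file and load it back
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as f:
    f.write('{"pattern": "some pattern"}')
    path = f.name

r = load_json_file({'json_file': path})
if r['return'] > 0:
    raise RuntimeError(r['error'])
print(r['dict'])  # → {'pattern': 'some pattern'}
```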

It is possible to get the API of any such function via:

 ck get_api --func=load_json_file

Here is a list of some of the available productivity functions:

  • out - used instead of print, unifying Python 2 and 3 syntax and supporting Unicode strings
  • inp - used instead of input/raw_input
  • gen_uid - generates a CK UID
  • is_uid - checks whether a string is a CK UID
  • gen_tmp_file - generates temporary files
  • convert_json_str_to_dict
  • load_json_file
  • load_yaml_file
  • load_text_file
  • save_json_to_file
  • save_yaml_to_file
  • save_text_file
  • merge_dicts - intelligently merges dictionaries (per key, handling lists and sub-dictionaries)
  • compare_dicts - intelligently compares dictionaries
  • compare_flat_dicts - checks whether two flat dictionaries are equal
  • substitute_str_in_file - substitutes a string in a given file
  • dump_json - dumps JSON to a string
  • input_json - reads JSON from the console and converts it to a dict
  • copy_to_clipboard
  • copy_path_to_clipboard - copies the current path to the clipboard (if supported by the OS and Python)
  • convert_file_to_upload_string - useful for web services
  • convert_upload_string_to_file - useful for web services
  • list_all_files - lists all files in a given directory recursively, with some statistics and wild-card support
  • get_current_date_time
  • convert_iso_time
  • find_string_in_dict_or_list
  • unzip_file
  • flatten_dict - converts a dict to a flat vector (see https://hal.inria.fr/hal-01054763 and http://arxiv.org/abs/1506.06256 for a brief intro to our flat vectors, useful for CK-based statistical analysis and machine learning)
  • get_by_flat_key - gets a value in a dict using a flat key
  • set_by_flat_key - sets a value in a dict using a flat key
  • restore_flattened_dict - converts a flat vector back to a dictionary

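The idea behind the flattening helpers can be sketched in plain Python as follows. This is only an illustration of the concept: the real ck.flatten_dict takes a dictionary as input and output (like all CK functions), also handles lists, and its exact flat-key syntax may differ:

```python
def flatten_dict(d, prefix='#'):
    # Recursively flatten a nested dict into CK-style flat keys, e.g.
    # {'a': {'b': 1}} -> {'##a#b': 1}. Illustrative sketch only.
    flat = {}
    for k, v in d.items():
        key = prefix + '#' + str(k)
        if isinstance(v, dict):
            flat.update(flatten_dict(v, key))
        else:
            flat[key] = v
    return flat

d = {'characteristics': {'time': 1.2, 'size': 4096}, 'state': 'ok'}
print(flatten_dict(d))
# → {'##characteristics#time': 1.2, '##characteristics#size': 4096, '##state': 'ok'}
```

Such flat vectors make it easy to feed heterogeneous experiment meta data into statistical analysis and machine-learning tools that expect fixed feature vectors.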
Some of these functions can be invoked directly from the command line and hence reused in other scripts:

  • ck uid - generates a unique ID in the CK format
  • ck cid - detects whether the current directory belongs to a CK entry and prints the CID (repo UOA:module UOA:data UOA) of this entry
  • ck copy_path_to_clipboard - copies the current path to the clipboard (if supported by the OS and Python)

Summary of commands for locating entities

CK provides several commands for locating entities. (To run examples below, please install the [ck-autotuning](https://github.com/ctuning/ck-autotuning) repository first as follows: `ck pull repo:ck-autotuning`.)

  • find - returns the path to a given entity. For example, `ck find test:unicode` returns the path to an entity called `test:unicode`. Note that when applied to a repository, `ck find` returns the path to the _description_ of the repository, but not to the repository itself. Use `ck where` to obtain the path to the repository itself.
  • where - returns the path to a given repository. For example, `ck where repo:ck-autotuning` returns the path to the `ck-autotuning` repository.
  • list - returns the list of all entities with names matching a given pattern (possibly with wild-cards). For example, `ck list module` lists all available modules; `ck list dataset:image-*` lists all datasets whose names start with `image-`. This method is fast, as it only searches through names.
  • search - similar to list, but can prune the search results by 'tags' or specific JSON 'meta', and hence may be slower. For example, `ck search dataset --tags=jpeg` returns jpeg datasets only.
  • show - for environment entities, shows in a user-friendly way all registered environments matching a given pattern and tags. For example, `ck show env --tags=compiler` shows all registered compilers. In contrast, `ck list env` only shows UIDs of _all_ environment entities (compilers or not).

Dealing with evolving software and hardware

Continuously evolving software and hardware dramatically complicates the life of researchers (as we have experienced over the past 20 years), since they need to rebuild their ad-hoc experimental scripts whenever a software or hardware API changes. CK helps solve this problem, as described in the second section of the Getting Started Guide.

Conclusions

In this section, we presented all the major CK conventions, which should hopefully be enough to give you an idea of how CK works. The rest can be gradually learned by example during agile CK-based research and experimentation.

As you can see, CK allowed us to create a self-contained repository that is very similar to the original project, just with some extra structure: modules and entries with unique IDs, a unified JSON API and schema-free meta descriptions.

At the same time, we do not lock users into our or any other technology, since all the formats used are fully open and already have considerable community support.

Basically, CK is a low-level research SDK that lets you quickly wrap and glue together existing software, hardware and data while focusing on rapid prototyping and crowdsourcing of research ideas.

Though seemingly simple, this approach has proved powerful enough to help us and our colleagues convert many existing ad-hoc, messy and unportable projects into modular CK workflows, making all artifacts searchable, referable, movable, human-readable, reusable and customizable.

Only when research ideas are validated and wrong ones discarded can researchers gradually and collaboratively unify the information flow between all components and provide a schema.

Furthermore, we can continuously build upon others' components and implement very complex yet stable experimental workflows, such as crowdsourcing multi-objective software and hardware optimization across multiple users, unifying and applying statistical analysis and predictive analytics, and even enabling truly adaptive self-tuning computer systems. We just need to reserve several keys in the information flow, similar to physics, and then collaboratively model software and hardware behavior as a complex system (see the full list in the CK development section):

  • characteristics - vector of execution time, performance, energy, accuracy, resilience, size, contentions and costs;
  • choices - gradually exposed design and optimization choices (OpenCL, CUDA, compiler passes, polyhedral transformations, algorithmic parameters, CPU/GPU frequency, etc);
  • features - program, data set and system features (image size, Milepost static features, hardware counters, etc);
  • state - run-time state (cache state, network state, etc);
  • host_os - host OS (used when compiling a program);
  • target_os - target OS (used when running a program);
  • device_id - target device ID (if running a program on a remote device such as an Android mobile phone).
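For illustration, a hypothetical experiment meta description using these reserved keys might look as follows (only the top-level key names come from the list above; all sub-keys and values are made up):

```json
{
  "characteristics": { "execution_time_s": 12.4, "energy_j": 3.1 },
  "choices": { "compiler_flags": "-O3", "cpu_frequency_mhz": 2000 },
  "features": { "image_size": "1024x1024" },
  "state": { "cache_state": "warm" },
  "host_os": "linux-64",
  "target_os": "android21-arm64",
  "device_id": "emulator-5554"
}
```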

We demonstrate these and other practical academic and industrial research scenarios implemented in CK in the next sections of the CK documentation.

You may also check out (and reference, if useful) our related publications with the long-term vision, the Artifact Evaluation initiative for major ACM computer systems conferences, and the latest report on CK workflows for autotuning and machine learning.

Questions and comments

You are welcome to get in touch with the CK community if you have questions or comments!
