Grigori Fursin edited this page Aug 1, 2018 · 87 revisions

CK development considerations

We had the following considerations in mind when developing Collective Knowledge (CK) and CK-based workflows:

  • Why: we are passionate about collaborative, reproducible, systematic and reusable R&D. We have open-sourced Collective Knowledge to help the community collaboratively tackle grand challenges in computer engineering and other sciences.
  • Open-source/agile: our vision for Collective Knowledge is to follow the Linux, Wikipedia and GitHub way for collaborative R&D. To fully embrace the open-source movement, we use a permissive license, no strings attached!
  • Community-driven: by partnering with leading companies, universities and conferences, we are growing a passionate community of users and developers to create a public repository of customizable and reusable artifacts!
  • Simple: Collective Knowledge is a cross-platform and relatively simple knowledge management system that takes advantage of the best modern techniques, including JSON, Git, Elasticsearch, SciPy, web services and agile R&D methodology, to deal with ever-changing software and hardware stacks.
  • Portable and extensible: minimal software dependencies, a plugin-based architecture and a simple JSON API make CK easily extensible and customizable. CK components can be assembled together just like LEGO bricks, letting users focus on problem-solving. Full API validation can be added at any time after a research idea has been prototyped and fully validated.
  • Stable: stability of CK kernel as well as all components and workflows is ensured by always keeping backward compatibility!
  • Agile documentation (incremental improvements): we have spent considerable effort documenting Collective Knowledge via a community-maintained wiki covering everything from the basics to advanced use cases, including artifact sharing, universal autotuning and predictive analytics. CK users also write blog articles describing their own ways to understand, use and extend CK (see Michel Steuwer's blog).

Ensuring CK stability

Since CK is a decentralized platform allowing researchers to exchange and reuse any components, its stability is ensured by the following ideas and conventions:

Stabilizing CK kernel

The CK framework consists of a small CK kernel, which provides command-line access and a small number of productivity functions exposed to users and is portable across Python 2 and Python 3, plus a set of auxiliary modules that help users organize and share their local code and data as unified CK components (see their description under "default" repo in this list).

Note that we have made a considerable effort to make the CK kernel work well across different platforms, operating systems, environments and Python versions; therefore, we now mostly fix bugs in the CK kernel and avoid major changes!

If you still want to propose an update to the CK kernel, please keep in mind that many projects already depend on it and must not suddenly break because of your change! That is why new CK kernel versions must always remain backward compatible, similar to Java or Linux kernel development: old projects should be able to run with a new kernel!

If you still want to update the CK kernel, we suggest that you first read how Linux kernel developers describe their patches, just to understand how development of similar distributed systems works. Then open a ticket on the CK GitHub describing why your update is important for the CK community and how you ensure that all existing projects will continue working across different platforms. It is even better to discuss it beforehand on the CK mailing list or Slack channel.

Avoid changing the existing API: instead, either extend it with new dictionary keys or provide a new function name. Obviously, do not make new CK modules depend on your proposed functionality before it is accepted into the CK kernel (to avoid wasted code in case it is not accepted)! Finally, consider extending functionality via existing GitHub repositories such as ck-env rather than changing the CK kernel!
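The advice to extend an API with new dictionary keys rather than change it can be illustrated with a minimal sketch. It assumes the CK convention seen elsewhere on this page (one input dictionary in, one output dictionary with a 'return' code out); the function `compile_program` and the keys `target` and `extra_flags` are hypothetical and do not belong to any real CK module.

```python
# Hypothetical CK-style action, used only to illustrate the convention:
# one input dictionary in, one output dictionary out, where 'return' is 0
# on success and >0 on error (with a human-readable 'error' message).


def compile_program(i):
    """Build a compiler command line for the file given in i['target'].

    Version 1 used only 'target'. The optional key 'extra_flags' was added
    later; callers that do not pass it get exactly the old behaviour.
    """
    target = i.get('target')
    if not target:
        return {'return': 1, 'error': "'target' is not defined"}

    flags = ['-O2']
    # New optional key: defaults to an empty list, so old callers are unaffected.
    flags.extend(i.get('extra_flags', []))

    return {'return': 0, 'cmd': ' '.join(['gcc'] + flags + [target])}


# A caller written before the extension keeps working unchanged...
r_old = compile_program({'target': 'main.c'})
print(r_old['cmd'])  # gcc -O2 main.c

# ...while a new caller can opt into the extended behaviour.
r_new = compile_program({'target': 'main.c', 'extra_flags': ['-march=native']})
print(r_new['cmd'])  # gcc -O2 -march=native main.c
```

Renaming 'target' or changing the meaning of 'cmd' would have broken every existing caller; adding an optional key with a safe default does not.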

Stabilizing CK modules

Users rely on CK modules to replace their ad-hoc scripts, to provide an abstraction for tools and data, or to implement new workflows. Once a given module is shared and can be reused by anyone, any further changes to it must also be backward compatible.

Similar to the development of Linux or Java packages, CK allows users to extend the functionality of existing modules without breaking compatibility, simply by adding new keys to the JSON API or meta description, adding new function names, or, if needed, adding new modules. In such a case, all other dependent modules continue working, while new ones can rely on the new functionality.
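Extending a meta description by adding keys can be sketched as follows. The entry name, the keys and both consumer functions are hypothetical; the point is only that an update which adds a key leaves old readers unaffected, while new readers fall back gracefully on older entries.

```python
import json

# Hypothetical meta description of a shared entry, before and after an update.
# The update only *adds* a key ('supported_os'); no existing key is renamed
# or removed.
meta_v1 = json.loads('{"name": "hypothetical-program", "tags": ["program", "cv"]}')
meta_v2 = json.loads('{"name": "hypothetical-program", "tags": ["program", "cv"], '
                     '"supported_os": ["linux", "windows"]}')


def old_consumer(meta):
    """Written against v1: reads only the keys it knows about."""
    return '%s [%s]' % (meta['name'], ','.join(meta['tags']))


def new_consumer(meta, host_os):
    """Written against v2: uses the new key, but falls back gracefully
    when it meets an older entry that lacks it."""
    supported = meta.get('supported_os')
    if supported is None:
        return True  # older entry: no OS restriction recorded
    return host_os in supported


# The old consumer produces the same result for both versions of the meta.
print(old_consumer(meta_v1))
print(old_consumer(meta_v2))
print(new_consumer(meta_v1, 'android'))  # True
print(new_consumer(meta_v2, 'android'))  # False
```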

This is a typical development approach for decentralized systems: it keeps the whole system stable without requiring a full and complex test suite up front, and the community can gradually add such tests once some functionality becomes frequently used or stable!

Since CK modules are mainly needed to provide a stable API for different code and data, we also try to keep them as simple and portable as possible, with minimal dependencies on third-party (non-CK) modules. However, if third-party Python modules (scipy, numpy) or tools are necessary, there are two ways to handle them: either ask users to install these modules and tools during CK installation, or create packages that help automatically detect or install the necessary modules and tools (please see related discussion)!
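The "automatically detect" option might look roughly like the sketch below. It is not how CK packages actually perform detection; it simply uses the standard `importlib.util.find_spec` (Python 3 only) to check whether modules can be imported, and the action name `check_dependencies` is hypothetical. It follows the same 'return'/'error' dictionary convention so that a missing dependency is reported cleanly instead of surfacing as an `ImportError` deep inside a workflow.

```python
import importlib.util


def detect_python_modules(names):
    """Map each top-level module name to True/False depending on whether
    it can be found in the current Python environment (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def check_dependencies(i):
    """Hypothetical CK-style action: report missing third-party modules."""
    status = detect_python_modules(i.get('modules', []))
    missing = sorted(name for name, found in status.items() if not found)
    if missing:
        return {'return': 1,
                'error': 'missing Python modules: ' + ', '.join(missing)
                         + ' (install them, e.g. with pip, and try again)'}
    return {'return': 0, 'found': status}


r = check_dependencies({'modules': ['json', 'numpy', 'scipy']})
if r['return'] > 0:
    print('Error: ' + r['error'])
else:
    print('All dependencies found')
```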

Stabilizing CK workflows

We use the Travis build system for Linux/macOS/Android and AppVeyor for Windows/Android to test CK workflows. See the following examples:

Keeping everything explicit and straightforward

We want to keep our workflows simple, explicit and straightforward for non-professional programmers (such as scientists) to help them quickly prototype their ideas and navigate through a large code base.

That is why we introduced the concept of reusable CK functions, which are implemented as CK modules and actions accessible either from the command line:

 $ ck {some_action} {some_module} ...

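or from Python (for example, inside a function of another CK module):

 import ck.kernel as ck
 r=ck.access({'action':'some_action', 'module_uoa':'some_module'})
 if r['return']>0: return r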

It is then easy to find the related Python code: it is implemented as function 'some_action' either in the Python code of the CK entry 'module:some_module':

 $ cd `ck find module:some_module`
 $ cat

or in the CK kernel (default action):

This approach allows quick reuse of existing functions, simple and explicit chaining of multiple functions, updates to functions while keeping backward compatibility, and easy replacement of duplicate code from different modules with a common function.
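The lookup order described above (module action first, then a default kernel action) and the explicit chaining of functions can be modelled with a toy dispatcher. This is not the CK kernel's implementation: `toy_access`, the `program` module and its `compile`/`run` actions are all hypothetical, and only the 'return'/'error' convention is taken from this page.

```python
# Toy model of the "ck {action} {module}" convention; NOT the real CK kernel.


def kernel_info(i):
    """Stand-in for a default action provided by the kernel itself."""
    return {'return': 0, 'info': 'default action for ' + i['module_uoa']}


def program_compile(i):
    return {'return': 0, 'binary': i['source'].replace('.c', '.out')}


def program_run(i):
    if 'binary' not in i:
        return {'return': 1, 'error': "'binary' is not defined"}
    return {'return': 0, 'stdout': 'ran ' + i['binary']}


# Each hypothetical "module" exposes its actions as plain functions.
MODULES = {'program': {'compile': program_compile, 'run': program_run}}
KERNEL_DEFAULT_ACTIONS = {'info': kernel_info}


def toy_access(i):
    """Look up the action in the module first, then among kernel defaults."""
    action, module = i.get('action'), i.get('module_uoa')
    func = MODULES.get(module, {}).get(action) or KERNEL_DEFAULT_ACTIONS.get(action)
    if func is None:
        return {'return': 1, 'error': 'action "%s" not found in module "%s"' % (action, module)}
    return func(i)


def compile_and_run(source):
    """Explicit chaining: each step's output feeds the next one."""
    r = toy_access({'action': 'compile', 'module_uoa': 'program', 'source': source})
    if r['return'] > 0:
        return r  # stop the chain on the first error
    return toy_access({'action': 'run', 'module_uoa': 'program', 'binary': r['binary']})


print(compile_and_run('hello.c')['stdout'])  # ran hello.out
print(toy_access({'action': 'info', 'module_uoa': 'program'})['info'])
```

Because every step returns the same kind of dictionary, a failing step propagates its error unchanged, and a duplicated helper can later be moved into a common function without changing any caller.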

Long-term ideas


Note that the following documentation must be updated!


Questions and comments

You are welcome to get in touch with the CK community if you have questions or comments!
