Are you an individual researcher or organization performing many experiments on a regular basis? You may find the Collective Knowledge framework (CK) useful if you suffer from one or more of the following problems:
- instead of innovating, you spend weeks or months preparing ad-hoc experimental workflows, which you either throw away when your ideas are not validated or must maintain (adapting to ever-changing software, hardware, interfaces and data formats);
- you have trouble sharing whole experimental workflows and results with your colleagues since they use different operating systems, tools, libraries and hardware (and they do need to use their latest environment rather than possibly outdated Docker or VM images);
- you have trouble managing and reusing your own scripts, tools, data sets and reproducing your own results from past projects;
- you have trouble retrieving data from your own or someone else's "black-box" database (particularly if you do not know the schema);
- you spend lots of time updating your reports and papers whenever you obtain new results;
- you do not have enough realistic workloads, benchmarks and data sets for your research;
- you face an ever-increasing number of experimental choices to explore in complex design and optimization spaces;
- you accumulate vast amounts of raw experimental data but do not know what the data is telling you ("big data" problem);
- you want to extract knowledge from raw data in the form of models but never find time to master powerful predictive analytics techniques;
- your organization pays dearly for its computational needs (in particular, for hardware and energy used in data centers and supercomputers) while you suspect they could be met at a fraction of the cost (if, for example, your deep learning algorithms could run 10 times faster).
Hence, we designed Collective Knowledge (CK) as a small and highly customizable Python wrapper framework with a unified JSON API, command line, web services and meta-descriptions. This allows researchers to gradually wrap and glue together any existing software, hardware and data, share and reuse wrappers via Git, unify information flow between them, quickly prototype experimental workflows from shared artifacts, apply predictive analytics and enable interactive articles.
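To make the "unified JSON API" idea concrete, here is a minimal sketch of what a dictionary-in, dictionary-out wrapper layer might look like. It is not the CK implementation: the names `WRAPPERS`, `register`, `access`, `access_json` and the `compiler`/`describe` wrapper are hypothetical, and the convention of returning a `return` code with an optional `error` string is an assumption made for this illustration.

```python
import json

# Hypothetical registry mapping (module, action) to a Python function.
# None of these names are taken from CK itself.
WRAPPERS = {}


def register(module, action):
    """Decorator that registers a wrapper function under (module, action)."""
    def decorator(func):
        WRAPPERS[(module, action)] = func
        return func
    return decorator


@register("compiler", "describe")
def describe_compiler(request):
    # A real wrapper would probe the installed tool; this data is hard-coded.
    name = request.get("name", "gcc")
    return {"return": 0, "name": name, "flags": ["-O2", "-O3"]}


def access(request):
    """Dispatch a JSON-like dict to a wrapper and always return a dict."""
    key = (request.get("module"), request.get("action"))
    func = WRAPPERS.get(key)
    if func is None:
        return {"return": 1, "error": f"no wrapper for {key[0]}:{key[1]}"}
    return func(request)


def access_json(text):
    """Same entry point for callers that only exchange JSON text."""
    return json.dumps(access(json.loads(text)), sort_keys=True)
```

Because every wrapper takes and returns a plain dictionary, the same call can be driven from Python, from a command line that builds the dictionary from its arguments, or from a web service that receives it as JSON text.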
CK is an open-source (under a permissive license), lightweight (less than 1 MB) and very portable research SDK. It has minimal dependencies and simple interfaces with software written in C, C++, Fortran, Java, PHP and other languages. Please check out the CK documentation and Getting Started Guide for more details: http://github.com/ctuning/ck/wiki
Though seemingly simple, such an agile approach has already proved powerful enough to help scientists and research engineers:
- abstract and unify access to their software, hardware and data via CK modules (wrappers) with a simple JSON API while protecting users from continuous low-level changes and exposing only minimal information needed for research and experimentation (this, in turn, enables simple co-existence of multiple tools and libraries such as different versions of compilers including LLVM, GCC and ICC);
- provide a simple and user-friendly directory structure (CK repositories) to gradually convert all local artifacts (scripts, benchmarks, data sets, tools, results, predictive models, graphs, articles) into searchable, reusable and interconnected CK entries (components) with unique IDs and open JSON-based meta information while getting rid of all hardwired paths;
- quickly prototype research ideas from shared components like LEGO(TM) bricks, unify exchange of results in schema-free JSON format and focus on knowledge discovery (only when an idea is validated should you spend extra time on adding proper types, descriptions and tests, and not vice versa);
- easily share CK repositories with whole experimental setups and templates with the community via popular public services including GitHub and BitBucket while keeping track of all development history;
- speed up search across all your local artifacts by JSON meta information using popular ElasticSearch (optional);
- involve the community or workgroups to share realistic workloads, benchmarks, data sets, tools, predictive models and features in a unified and customizable format;
- reproduce empirical experimental results in a different environment and under different conditions, and apply statistical analysis (similar to physics) rather than just replicating them - useful to analyze and validate varying results (such as performance and energy);
- use built-in CK web server to view interactive graphs and articles while easily crowdsourcing experiments using spare computational resources (mobile devices, data centers, supercomputers) and reporting back unexpected behavior;
- obtain help from an interdisciplinary community to explain unexpected behavior when reproducing experiments, solve it by improving related CK modules and entries, and immediately push changes back to the community (similar to Wikipedia);
- simplify the use of statistical analysis and predictive analytics techniques for non-specialists via CK modules and help you process large amounts of experimental results (possibly on the fly via active learning), share and improve predictive models and features (knowledge), and effectively compact "big data".
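The repository idea from the list above (artifacts stored as directory entries with unique IDs and JSON meta information, found by search rather than by hardwired paths) can be sketched in a few lines. This is an illustration under stated assumptions, not CK's actual on-disk layout: the `repo/module/alias/meta.json` structure, the `tags` key and the functions `add_entry`, `search` and `demo` are all hypothetical.

```python
import json
import tempfile
import uuid
from pathlib import Path

META_FILE = "meta.json"  # hypothetical file name chosen for this sketch


def add_entry(repo, module, alias, meta):
    """Create repo/module/alias/ with a JSON meta file; return its unique ID."""
    entry_dir = Path(repo) / module / alias
    entry_dir.mkdir(parents=True, exist_ok=True)
    uid = uuid.uuid4().hex
    record = dict(meta, uid=uid)
    (entry_dir / META_FILE).write_text(json.dumps(record, indent=2))
    return uid


def search(repo, module=None, tags=()):
    """Return (entry_path, meta) pairs whose meta contains all given tags."""
    wanted = set(tags)
    pattern = f"{module or '*'}/*/{META_FILE}"
    hits = []
    for meta_path in sorted(Path(repo).glob(pattern)):
        meta = json.loads(meta_path.read_text())
        if wanted <= set(meta.get("tags", [])):
            hits.append((meta_path.parent, meta))
    return hits


def demo():
    """Build a throwaway repository and look up entries by tag."""
    with tempfile.TemporaryDirectory() as repo:
        add_entry(repo, "dataset", "image-small", {"tags": ["image", "jpeg"]})
        add_entry(repo, "dataset", "text-large", {"tags": ["text"]})
        add_entry(repo, "program", "susan", {"tags": ["image", "benchmark"]})
        return [path.name for path, _ in search(repo, tags=["image"])]
```

Calling `demo()` finds the two entries tagged `image` without the caller knowing where they live on disk, which is the property that lets scripts drop hardwired paths. An indexing service could later replace the directory scan in `search` without changing its callers.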
For example, check out public GCC/LLVM optimization results of various shared workloads across diverse hardware including mobile devices provided by volunteers:
For more details, please check our:
- publications with our long-term vision;
- live cKnowledge.org repository with results from crowdsourced experiments;
- ADAPT workshop reviewed by the community;
- artifact sharing and evaluation initiative for computer systems' conferences;
- example of the CK-based reproducible and interactive article;
- ARM's testimonials about CK (page 17).
You are welcome to get in touch with the CK community if you have questions or comments!