Repository structure

Grigori Fursin edited this page Jun 5, 2018 · 3 revisions

When quickly prototyping research ideas during agile development, we often have to change APIs and data formats until we find the optimal solution. Existing databases with pre-defined schemas, or schema-free databases with proprietary formats, make it very difficult and time-consuming to update our code and data.

That is why in CK we decided to use an open, scratch-pad-like repository in which all data and code are stored in directories and schema-free JSON files. All these files can be directly edited, copied, shared, moved or deleted, so researchers are not locked into proprietary formats or "black-box" databases.

A CK repository always has three main levels of directories, corresponding to the module UOA (UID or alias), the data UOA and the JSON meta, as shown schematically in the following figure:

The minimal format of a CK repository is the following:

Root directory:

  • .ckr.json - repository meta including dependencies on other repositories, URL if shared, UID, user-friendly description, etc.
  • LICENSE.txt - license of this repository.
  • AUTHORS - text file with authors of this repository.
  • COPYRIGHT.txt - copyright of the code/data in this repository.
  • CONTRIBUTIONS - community contributors to this repository.
  • CHANGES - list of changes to this repository.
  • Module UOA directories - module is used as an abstraction for its data, hence any CK data entry resides only in its host module UOA directory.
  • .cm - if the above directories are aliases rather than UIDs, this directory contains two disambiguator files for each of them: alias-a-<module_uoa>, which contains its UID, and alias-u-<UID>, which contains its alias. This allows CK to considerably speed up the search for entries by UOA (UID or alias).
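
The alias bookkeeping described above can be sketched in a few lines of Python. This is only an illustration of the file layout, not CK's own implementation: the function names are hypothetical, and the sketch assumes that each disambiguator file holds its value as plain text.

```python
from pathlib import Path
from typing import Optional


def register_alias(parent_dir: Path, alias: str, uid: str) -> None:
    """Write the two disambiguator files for one aliased directory.

    Assumption: each file stores its value as plain UTF-8 text.
    """
    cm_dir = parent_dir / ".cm"
    cm_dir.mkdir(parents=True, exist_ok=True)
    # alias-a-<alias> holds the UID; alias-u-<UID> holds the alias.
    (cm_dir / f"alias-a-{alias}").write_text(uid + "\n", encoding="utf-8")
    (cm_dir / f"alias-u-{uid}").write_text(alias + "\n", encoding="utf-8")


def uid_for_alias(parent_dir: Path, alias: str) -> Optional[str]:
    """Return the UID recorded for `alias`, or None if no file exists."""
    alias_file = parent_dir / ".cm" / f"alias-a-{alias}"
    if not alias_file.is_file():
        return None
    return alias_file.read_text(encoding="utf-8").strip()


def alias_for_uid(parent_dir: Path, uid: str) -> Optional[str]:
    """Return the alias recorded for `uid`, or None if no file exists."""
    uid_file = parent_dir / ".cm" / f"alias-u-{uid}"
    if not uid_file.is_file():
        return None
    return uid_file.read_text(encoding="utf-8").strip()
```

Because each lookup opens a single file whose name is derived from the key, neither direction requires scanning the sibling directories, which is consistent with the speed-up mentioned above.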

First level directory:

  • Data UOA directories - directories for data entries
  • .cm - if the above directories are aliases rather than UIDs, this directory contains two disambiguator files for each of them: alias-a-<data_uoa>, which contains its UID, and alias-u-<UID>, which contains its alias. This allows CK to considerably speed up the search for entries by UOA (UID or alias).

Second level directory:

  • Files and sub-directories - anything related to a given CK entry (program source code, data sets, zip files, etc.)
  • .cm/meta.json - schema-free description of this entry
  • .cm/desc.json - schema for the description (rarely used at the moment)
  • .cm/info.json - info about the author, copyright, license, creation date, etc
  • (.cm/updates.json) - info about subsequent updates of this entry (another author, date of change, notes, "likes", etc)
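
Since an entry is just a directory with a JSON file inside, the three levels can be reproduced with the Python standard library alone. The following is a minimal sketch under stated assumptions, not the CK API: the helper names are hypothetical, and it creates only .cm/meta.json, leaving out .ckr.json, .cm/info.json and the alias files.

```python
import json
from pathlib import Path


def entry_meta_path(repo_root, module_uoa: str, data_uoa: str) -> Path:
    """Path to an entry's meta: <repo>/<module UOA>/<data UOA>/.cm/meta.json."""
    return Path(repo_root) / module_uoa / data_uoa / ".cm" / "meta.json"


def create_entry(repo_root, module_uoa: str, data_uoa: str, meta: dict) -> Path:
    """Create the entry directory with its .cm/meta.json and return the entry path."""
    meta_path = entry_meta_path(repo_root, module_uoa, data_uoa)
    meta_path.parent.mkdir(parents=True, exist_ok=True)
    # Schema-free: whatever JSON-serializable dictionary is passed gets stored.
    meta_path.write_text(
        json.dumps(meta, indent=2, sort_keys=True) + "\n", encoding="utf-8"
    )
    return meta_path.parent.parent


def load_meta(repo_root, module_uoa: str, data_uoa: str) -> dict:
    """Read an entry's meta back as a dictionary."""
    with entry_meta_path(repo_root, module_uoa, data_uoa).open(encoding="utf-8") as f:
        return json.load(f)
```

For example, `create_entry("my-repo", "program", "hello-world", {"tags": ["demo"]})` would produce `my-repo/program/hello-world/.cm/meta.json`; the repository, module and entry names here are made up for illustration.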

That's all! Note that we tried to make this structure as similar as possible to how we generally organize our files and directories in our user space (i.e. without CK). However, each artifact now has its own API, JSON meta and UID. This allows users to easily find, reuse and cross-link all past objects! For example, Grigori Fursin converted all his past artifacts (experiments, papers, benchmarks, data sets, tools, notes, etc.) to this format, and even created an interactive CV powered by CK!
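
To illustrate why a fixed three-level layout makes entries easy to find, here is a rough sketch that lists every entry in a repository by walking two directory levels. It is not how CK itself searches: the function name is hypothetical, and the sketch assumes that any second-level directory containing .cm/meta.json is an entry.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_entries(repo_root) -> Iterator[Tuple[str, str]]:
    """Yield (module directory name, data directory name) for each entry found."""
    root = Path(repo_root)
    for module_dir in sorted(root.iterdir()):
        # Skip plain files (LICENSE.txt, ...) and dot-directories such as .cm.
        if not module_dir.is_dir() or module_dir.name.startswith("."):
            continue
        for data_dir in sorted(module_dir.iterdir()):
            if not data_dir.is_dir() or data_dir.name.startswith("."):
                continue
            # Assumption: an entry is recognized by its .cm/meta.json file.
            if (data_dir / ".cm" / "meta.json").is_file():
                yield module_dir.name, data_dir.name
```

The yielded names are directory names, so they may be either aliases or UIDs, depending on how each directory is named.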

Questions and comments

You are welcome to get in touch with the CK community if you have questions or comments!
