Home

Stephen Koo edited this page Nov 7, 2016 · 144 revisions

Why CodaLab Worksheets?

While there has been tremendous progress in machine learning, data science, natural language processing, computer vision, and many other data- and computation-intensive fields, the research process is far from optimal. Most of the time, the output of research is simply a PDF file (the published paper). Even when researchers release their data and code (which is a big step forward), it is often not obvious how to run the code to obtain the results reported in the paper. Simply put:

Today, researchers spend excruciating amounts of time reproducing published results.

The goal of CodaLab Worksheets is to fix this in order to both accelerate the rate of research and make it more sound.

How does CodaLab Worksheets work?

CodaLab keeps the full provenance of an experiment, from raw data to the final performance numbers that you put in your paper.

There are two important concepts in CodaLab: bundles and worksheets.

Bundles are immutable files/directories that represent the code, data, and results of an experimental pipeline. There are two ways to create bundles. First, users can upload bundles, which can be datasets in any format or programs in any programming language. Second, users can create run bundles by executing shell commands that depend on the contents of previous bundles. A run bundle is specified by a set of bundle dependencies and an arbitrary shell command. This shell command is executed in a Docker container, in a directory containing the dependencies. The contents of the run bundle are the files/directories that the shell command writes to the current directory. In the end, the dependency graph over bundles precisely captures the research process in an immutable way.
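To make the bundle model concrete, here is a minimal Python sketch of the ideas above. It is not CodaLab's implementation or API: the names `Bundle`, `upload`, `run`, and `provenance` are hypothetical, and the shell command runs in a plain subprocess inside a temporary directory rather than in a Docker container. It assumes a POSIX shell with `awk` available.

```python
import os
import shutil
import subprocess
import tempfile
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Bundle:
    """Hypothetical model of an immutable bundle (not CodaLab's real API)."""
    uuid: str
    path: str          # directory holding the bundle's contents
    command: str = ""  # empty for uploaded bundles
    dependencies: Tuple[Tuple[str, "Bundle"], ...] = ()  # (name in run dir, parent)


def upload(uuid: str, files: Dict[str, str]) -> Bundle:
    """Create an uploaded bundle from a mapping of file name -> contents."""
    path = tempfile.mkdtemp(prefix=uuid + "-")
    for name, text in files.items():
        with open(os.path.join(path, name), "w") as f:
            f.write(text)
    return Bundle(uuid, path)


def run(uuid: str, deps: Dict[str, Bundle], command: str) -> Bundle:
    """Run `command` in a fresh directory that contains copies of `deps`.

    Assumption: a plain subprocess stands in for CodaLab's Docker container.
    """
    workdir = tempfile.mkdtemp(prefix=uuid + "-")
    for name, dep in deps.items():
        shutil.copytree(dep.path, os.path.join(workdir, name))
    subprocess.run(command, shell=True, cwd=workdir, check=True)
    # The run bundle's contents are what the command wrote, so drop the inputs.
    for name in deps:
        shutil.rmtree(os.path.join(workdir, name))
    return Bundle(uuid, workdir, command, tuple(sorted(deps.items())))


def provenance(bundle: Bundle) -> List[str]:
    """List the steps that produced `bundle`, dependencies first."""
    steps: List[str] = []
    seen = set()

    def visit(b: Bundle) -> None:
        if b.uuid in seen:
            return
        seen.add(b.uuid)
        for _, parent in b.dependencies:
            visit(parent)
        steps.append(f"{b.uuid}: {b.command or '(uploaded)'}")

    visit(bundle)
    return steps


# Usage: one dataset bundle, one program bundle, and a run bundle using both.
data = upload("data", {"numbers.txt": "3\n4\n5\n"})
code = upload("code", {"sum.sh": "awk '{ s += $1 } END { print s }' \"$1\"\n"})
result = run("result", {"data": data, "code": code},
             "sh code/sum.sh data/numbers.txt > total.txt")

with open(os.path.join(result.path, "total.txt")) as f:
    print(f.read().strip())
for step in provenance(result):
    print(step)
```

Because each bundle is immutable and records its dependencies and command, walking the graph backwards from a result recovers the exact steps that produced it, which is the provenance property described above.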

Worksheets organize and present an experimental pipeline in a comprehensible way, and can be used as a lab notebook, a tutorial, or an executable paper. Worksheets contain references to bundles, and are written in a custom markdown language.
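One way to picture the relationship between worksheets and bundles is the small Python model below. It is a hypothetical sketch, not CodaLab's data model or its worksheet markdown syntax: the `Worksheet` and `BundleRef` classes, the placeholder UUIDs, and the `catalog` of bundle descriptions are all illustrative assumptions. A worksheet is modeled as an ordered list of items, each either a block of markdown text or a reference to a bundle by UUID.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class BundleRef:
    """Hypothetical pointer from a worksheet to a bundle, by UUID."""
    uuid: str


Item = Union[str, BundleRef]  # a plain str stands for a block of markdown text


@dataclass
class Worksheet:
    name: str
    items: List[Item]

    def referenced_bundles(self) -> List[str]:
        """UUIDs of the bundles this worksheet points to, in order."""
        return [item.uuid for item in self.items if isinstance(item, BundleRef)]

    def render(self, catalog: Dict[str, str]) -> str:
        """Render text as-is and replace each reference with a one-line summary."""
        lines = []
        for item in self.items:
            if isinstance(item, BundleRef):
                lines.append(f"[bundle {item.uuid}] {catalog.get(item.uuid, '(missing)')}")
            else:
                lines.append(item)
        return "\n".join(lines)


# Usage: two worksheets that point at the same (immutable) run bundle.
catalog = {"0x1a": "train.py", "0x2b": "run: python train.py"}
notebook = Worksheet("lab-notebook", ["# Baseline", BundleRef("0x1a"), BundleRef("0x2b")])
paper = Worksheet("paper", ["## Results", BundleRef("0x2b")])
print(paper.render(catalog))
```

Since worksheets hold references rather than copies, several worksheets (say, a lab notebook and an executable paper) can present the same bundle without duplicating or modifying it.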

As an example, the figure below shows the dependency graph over four bundles, along with two worksheets, which contain both text and pointers to the bundles:

CodaLab's philosophy is to give you full control of how you want to run your experiments and get out of your way. It just maintains the dependency structure of your experiments and takes care of the actual execution. A good analogy is Git, which maintains the revision history and gives you total freedom in terms of what to put in your repository.

How do I learn more?

  • Quickstart: learn how to create bundles and worksheets (start here).
  • CLI Basics: learn how to use CodaLab from the comfort of your own shell.
  • Workflow: learn how to use CodaLab in your daily research.
  • Executable Papers: learn how to put your research paper on CodaLab.
  • CLI Reference: learn how to be an expert CodaLab user.
  • Worksheet Markdown: learn how to display tables of results and images in your worksheet.
  • Execution: learn how bundles are executed in Docker.
  • Server Setup: if you want to run a CodaLab server for your own group.
  • Latest Features: what features have been added to CodaLab recently?
  • Worksheet Examples: from the official CodaLab server.
  • About: who's behind CodaLab?

Where do I report bugs?

CodaLab is under active development. If you find bugs or have feature requests, please file a GitHub issue: