# Reproducible Data Science with Renku

# Set-up

To work through these notebooks, make sure you have a working environment by following the set-up instructions in the [project README.md](../../README.md).

If you need help, use a red post-it, raise your hand, or ask on Gitter: https://gitter.im/SwissDataScienceCenter/renku

# Renku

Renku is software for doing data science that is directly and conceptually reproducible.

It has two parts:

* renku CLI (think `git`)
* Renkulab server (think `GitLab` or `GitHub`)

Renku is a tool for reproducible data science that we are developing at the [Swiss Data Science Center](http://datascience.ch/). It is quite new (only about 1.5 years old as of July 2019) and under active development, with many new features underway.

Renku is made up of two parts: the renku command-line interface, and the Renkulab server. The distinction is similar to git vs. GitLab. `git` is a set of command-line tools for using version control on a project. GitLab is a server application for managing multiple projects and giving others access to them.

Similarly, `renku` is a set of command-line tools for working reproducibly; Renkulab is a server for sharing and collaborating on projects, which includes a zero-install environment for running code, including, but not limited to, notebooks.

Just as with GitHub and git, projects can be started on the server (e.g., [renkulab.io](https://renkulab.io)), or locally, on your laptop or desktop computer. And it is easy to transition a project from one location to the other.

In this tutorial, we will start our project on our laptops and, in the end, move it to Renkulab, where we can share it and collaborate with others.

<img alt="renku knowledge graph" src="images/evap_adelaide-reduced.svg" width="600"/>

([Evaluation of the Vegetation Optimality Model along the North-Australian Tropical Transect using a fully Open Science approach by Nijzink, Schymanski, et al.](https://doi.org/10.5281/zenodo.3274346))

Here is an example of some work that was done using renku. By using renku, it is possible to work reproducibly, documenting the process as a side effect.

# Renku's building blocks

<table class="table table-condensed" style="font-size: 16px; margin: 10px;">
    <thead>
        <tr>
            <th>Tool</th>
            <th>Environment</th>
            <th>Code</th>
            <th>Data</th>
            <th>Workflow</th>
            <th>Provenance</th>
        </tr>
    </thead>
    <tbody>
        <tr style="font-size:24px;">
            <th><a href="https://renkulab.io">Renku</a></th>
            <td>Docker</td>
            <td>git</td>
            <td>git-lfs</td>
            <td>CWL</td>
            <td>PROV-O/RDF</td>
        </tr>
     </tbody>
</table>



Renku combines many tools that you may be familiar with and packages them in a unified way. Renku is a sort of "syntactic sugar" for the building blocks: users are free to peek under the covers and work directly with the underlying tools (git, for example) when that is convenient.

# Task

Working with data from [US Dept. of Transportation, Bureau of Transportation Statistics](https://www.transtats.bts.gov), we will answer the following question:

- How many flights were there to Austin, TX in Jan 2019?

The tools we will use for the task are a bit oversized for such a simple question. But it will give us an opportunity to look at reproducibility in an understandable and manageable context.
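
To make the question concrete: stripped of the reproducibility tooling, it comes down to a filter-and-count over flight records. The block below is only a minimal sketch of that idea, not the code used in the hands-on. It assumes a CSV with a `DEST` column holding the destination airport code (`AUS` for Austin); the column names, the helper `count_flights_to`, and the sample rows are all hypothetical and chosen for illustration.

```python
import csv
import io

# Hypothetical sample rows for illustration only; these are NOT real BTS data,
# and the column names (FL_DATE, ORIGIN, DEST) are an assumption.
SAMPLE_CSV = """FL_DATE,ORIGIN,DEST
2019-01-01,DFW,AUS
2019-01-01,AUS,DFW
2019-01-02,SFO,AUS
2019-01-03,JFK,BOS
"""


def count_flights_to(csv_file, dest_code, dest_column="DEST"):
    """Count the rows whose destination column equals dest_code."""
    reader = csv.DictReader(csv_file)
    return sum(1 for row in reader if row[dest_column] == dest_code)


# Count the sample flights arriving in Austin (airport code AUS).
n_flights = count_flights_to(io.StringIO(SAMPLE_CSV), "AUS")
print(n_flights)
```

With a real download, the `io.StringIO` object would be replaced by an open file handle; the counting logic stays the same.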

# Hands-on

There are four versions of the hands-on that work through the same task in different environments. The two **local** versions use renku installed on your computer; the two **hosted** versions use our public https://renkulab.io server as the execution environment. In each execution environment, the **plain** version implements the code in normal Python files, while the **notebook** version implements the code in Jupyter Notebooks. Pick one from the matrix below.


<table style="font-size: 14px; margin: 10px;">
    <thead>
        <tr>
            <th></th>
            <th>Plain</th>
            <th>Notebook</th> 
        </tr>
    </thead>
    <tbody>
        <tr> 
            <th>Local</th>
            <td><a href="./local_plain/index.ipynb">Local/Plain</a></td>
            <td><a href="./local_notebook/index.ipynb">Local/Notebook</a></td>
        </tr>
        <tr> 
            <th>Hosted</th>
            <td><a href="./hosted_plain/index.ipynb">Hosted/Plain</a></td>
            <td><a href="./hosted_notebook/index.ipynb">Hosted/Notebook</a></td>
        </tr>
     </tbody>
</table>