### Pre-Setup

Make sure your git environment is correctly configured with a username and email.

In [None]:
!git config --global user.name
!git config --global user.email

In [None]:
# If the above does not print anything, set a username and email:
#!git config --global --add user.name "John Doe"
#!git config --global --add user.email "john.doe@example.com"
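
The same check can be scripted from Python. The following is a minimal sketch, assuming `git` may or may not be on the `PATH`; the helper name `git_config_value` is hypothetical and is not part of git or Renku.

```python
import subprocess


def git_config_value(key):
    """Return the global git config value for `key`, or None if unset or git is unavailable."""
    try:
        result = subprocess.run(
            ["git", "config", "--global", "--get", key],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # git is not installed or not on the PATH
        return None
    value = result.stdout.strip()
    # git exits with a non-zero status when the key is not set
    return value if result.returncode == 0 and value else None


for key in ("user.name", "user.email"):
    value = git_config_value(key)
    print(f"{key}: {value if value is not None else 'NOT SET'}")
```

If either line prints `NOT SET`, uncomment and run the `git config` commands in the cell above.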

# Starting a Renku project

## Outline

1. **Create repository**
2. Declare environment
3. Import data

# Creating a repository

The first thing to do when starting a project is to initialize a repository. This is done with
```
renku init [repo name]
```

Let's call ours `renku-tutorial-flights`

In [None]:
!renku init renku-tutorial-flights

Let us take a look at what's inside the renku repository.

In [None]:
%ls -l renku-tutorial-flights

Renku creates a git repository and generates two files: 
- `Dockerfile` 
- `requirements.txt`

The initial `requirements.txt` file is empty and serves as a placeholder. In a minute, we will fill it out. The `Dockerfile` allows specifying external dependencies if you have any (in this project, we don't). It is used to construct an environment on Renkulab.

## Housekeeping

Like git, renku commands are normally executed within the repository. For the rest of the tutorial, we will work in the renku repository.

In [None]:
%cd renku-tutorial-flights
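
To confirm that the working directory really is inside a repository before running further commands, a small check like the one below can help. This is a sketch that assumes a repository is identified by a `.git` entry at its root; the helper name `find_repo_root` is hypothetical.

```python
from pathlib import Path


def find_repo_root(start="."):
    """Return the closest directory at or above `start` that contains `.git`, else None."""
    current = Path(start).resolve()
    for candidate in (current, *current.parents):
        if (candidate / ".git").exists():
            return candidate
    return None


root = find_repo_root()
print(f"Inside repository: {root}" if root else "Not inside a git repository")
```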

## Declaring the environment

1. Create repository &#10004;
2. **Declare environment**
3. Import data

To make the project reproducible, we need to declare the environment it runs in. We will be working with pandas, numpy, scipy, matplotlib, and seaborn, so let us create a requirements.txt file that makes this explicit. The requirements can of course be modified at any later time.

In [None]:
# Normally, you would write a requirements file, but we have one ready here
%cp ../templates/requirements.txt ./
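
For reference, writing such a file by hand could look like the sketch below. It lists the five packages named above without version pins, which is an assumption; the contents of the provided template may differ. The helper name `write_requirements` is hypothetical.

```python
import tempfile
from pathlib import Path

# Packages named in the text above; versions are left unpinned in this sketch.
PACKAGES = ["pandas", "numpy", "scipy", "matplotlib", "seaborn"]


def write_requirements(path, packages):
    """Write one package name per line to `path` and return the path."""
    path = Path(path)
    path.write_text("\n".join(packages) + "\n")
    return path


# Written to a temporary directory so the template copied above is not overwritten.
with tempfile.TemporaryDirectory() as tmp:
    req = write_requirements(Path(tmp) / "requirements.txt", PACKAGES)
    print(req.read_text())
```

Pinning versions (for example with `==`) makes the environment more reproducible, at the cost of having to update the pins deliberately.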

### Committing to git

As we work, we want to track the process with version control.

In [None]:
!git status

Let us tell git about the requirements.txt file.

In [None]:
!git add requirements.txt
!git commit -m"Declare the python environment for the project."

## Importing data

1. Create repository &#10004;
2. Declare environment &#10004;
3. **Import data**

### Create a dataset

Renku uses the concept of a dataset to group data together. We will create a dataset called `flights` for our flights data.

In [None]:
!renku dataset create flights

### Add data to the dataset

Next, we will get some data and add it to the flights dataset. Renku can import data from different kinds of sources, including data repositories like [Zenodo](https://zenodo.org), other renku or git projects, the file system, or a URL. Here we will import from a URL. The data is available at:

`https://renkulab.io/gitlab/cramakri/renku-tutorial-flights-data-raw/raw/master/data/v1/2019-01-flights.csv.zip`


In [None]:
!renku dataset add flights https://renkulab.io/gitlab/cramakri/renku-tutorial-flights-data-raw/raw/master/data/v1/2019-01-flights.csv.zip

This copies the data into the folder for the flights dataset. Let's see what is in there.

In [None]:
!renku dataset ls-files flights
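
To peek inside a zipped CSV without unpacking it to disk, the standard library is enough. The sketch below uses a hypothetical helper, `preview_zipped_csv`, and demonstrates it on a small synthetic archive; the column names in the demo are placeholders and not the schema of the flights data.

```python
import csv
import io
import zipfile


def preview_zipped_csv(source, n_rows=3):
    """Return the header and the first `n_rows` rows of the first CSV in a zip archive."""
    with zipfile.ZipFile(source) as archive:
        csv_names = [name for name in archive.namelist() if name.endswith(".csv")]
        if not csv_names:
            raise ValueError("no CSV file found in archive")
        with archive.open(csv_names[0]) as raw:
            reader = csv.reader(io.TextIOWrapper(raw, encoding="utf-8", newline=""))
            header = next(reader, [])
            rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


# Synthetic stand-in archive; the column names are placeholders, not the real schema.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example.csv", "col_a,col_b\n1,2\n3,4\n")
buffer.seek(0)

header, rows = preview_zipped_csv(buffer, n_rows=1)
print(header, rows)
```

To use it on the imported file, pass the path reported by `renku dataset ls-files flights` instead of the in-memory buffer.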

Renku datasets contain metadata, which is used to derive the provenance of results.

# Exercises

Solutions are provided in the commented `%load` statements.

Return to the project directory to do the exercises.

In [None]:
%cd ..

## Ex. 1.0

Create a renku repository called `renku-tutorial-diamonds`

In [None]:
# %load solutions/ex1-0.fragment

## Ex. 1.1

In the renku-tutorial-diamonds repo, create a dataset called `diamonds`

In [None]:
# %load solutions/ex1-1.fragment

## Ex. 1.2

Add the file from `https://github.com/seaborn/seaborn-data/raw/master/diamonds.csv` to the diamonds dataset.

In [None]:
# %load ../solutions/ex1-2.fragment

(You are now in the renku-tutorial-diamonds folder, so the solution to the last exercise is located in a folder one level up.)