[Kudos to **Aron Ahmadia** (US Army ERDC) and **David Ketcheson** (KAUST) from whom I copied shamelessly]

[Day 1 morning **_"the tools we use"_**]

# Course philosophy

What is data science?
* hopefully involves data (lots of it)
* and a scientific approach that is
  - reproducible
  - transparent
  - _open_
  - scalable

When working with lots of data you will 
* spend 90% of your time compiling and cleaning your data and figuring out which tools you should use to do so
* spend 9% of your time complaining about how long it takes you to manage your data
* spend 1% of your time doing the actual analysis

Therefore the course will
* devote lots of time to teaching you how to use _tools_ to deal with data
* involve lots of hands-on exercises (about 2/3 of the time!)
* follow a completely open and transparent approach

# Jupyter and Git

<a name="top"></a>Outline
---

* [Version control with Git](#git)
  * [Introduction to Git](#git%20intro)
  * [The local repository](#local%20repo)
  * [The remote repository](#remote%20repo)
  * [Committing files](#workflow)
  * [Branches](#branches)
* [Exercise 01: Git](#exercise01)
* [Jupyter notebooks](#jupyter)
  * [Introduction to Jupyter](#jupyter%20intro)
  * [Modal editor](#editor)
  * [Navigation & keyboard shortcuts](#navigation)
  * [Markdown](#markdown)
* [Exercise 02: Jupyter](#exercise02)

**Learning goals:** By the end of this lecture you will
* have access to a version-controlled, local copy of the course material,
* have your _own_ remote repository to store exercises or data,
* know how to write Jupyter notebooks and
* know how to write and execute code.

<a name='git intro'></a>Introduction to Git
---

### What is Git?
* Git is software for distributed version control of files.
* Git is the most widespread tool to work with code collaboratively.
* Git is open source and free.
* It is a command-line tool (although there are GUIs for it).  

Git works with three data structures: 
* the **working directory**, which holds the actual files you edit; 
* an **index** that caches information about the working directory and the next version to be committed; 
* and an immutable, append-only **object database**.  

**HEAD** is a reference to the most recent commit on your active branch, i.e. the latest snapshot stored in this object database.

![trees](http://rogerdudler.github.io/git-guide/img/trees.png)

The object database contains four types of objects:
* A blob (binary large object) is the content of a file. Blobs have no proper file name, time stamps, or other metadata - a blob's name internally is a hash of its content.
* A tree object is the equivalent of a directory. It contains a list of file names, each with a type and a reference to a blob or tree object that is that file or directory's contents. These objects are a snapshot of the source tree. 
* A commit object links tree objects together into a history. It contains the name of a tree object (of the top-level source directory), a time stamp, a log message, and the names of zero or more parent commit objects.
* A tag object marks a specific commit, for example a release of the repository (more on that maybe later).
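
To make the "a blob's name is a hash of its content" idea concrete, here is a minimal Python sketch of how Git derives that name. It assumes the default SHA-1 object format (newer repositories can be set up with SHA-256 instead); the function name `git_blob_id` is our own and not part of Git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 name Git gives a blob with this content (assumes SHA-1 repos)."""
    # Git hashes a short header ("blob <size in bytes>\0") followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The file name plays no role: the same content always gets the same name.
print(git_blob_id(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

You can compare the result with `git hash-object <filename>` for a file that contains exactly `hello world` followed by a newline.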

### What can you do with Git?
* local version control (rolling back to earlier versions, recovering deleted files etc.)
* remote backup of your work
* collaborative work online via GitHub and GitLab
* making your code accessible to others
* publishing of code  

Git is way too large and powerful to even begin to teach it comprehensively here. We will introduce you to a few basics and then add new functionality as we go along.

<a name='local repo'></a>The local repository (init, status, clone)
--- 

**workflow**  
* create a new directory for this course and switch to it
* to create a new and empty git repository, type  
``` 
git init 
```  
* check the status of the repository using  
``` 
git status 
```
* create a working copy of a local repository  
``` 
git clone /path/to/repository 
```   


[top](#top)

<a name='remote repo'></a>The remote repository (clone, pull, remote add)
--- 

**workflow**
* to clone a repository from a remote server, you need to specify the server address  
```
git clone <server>
``` 

* in the case of this course, we created a repository at the server
```
git@gitlab.gwdg.de:pycnic/datascience-course-ggnb
```
* if you haven't cloned an existing remote repository, you can connect your local repository to a remote server via
```
git remote add origin <server>
```
* if you do that, don't forget to get your local version up to date with the remote repository
```
git pull origin master
```

HINT: use ```git remote``` to show the name of your remote repository

[top](#top)

<a name='workflow'></a> Committing files (add, commit, push)
---

1. propose changes for a single file or all new files by adding them to the **Index**
```
git add <filename>  
git add *
```
2. commit _all_ staged changes to the **object database**; make sure to add a meaningful message to _every_ commit
```
git commit -m "Commit message"
```
3. changes are now in the **object database** of your local working copy and the **HEAD** points to your most recent commit. To send them to the remote repository
```
git push origin master
```

HINT: use ```git status``` to see which files are already staged in the **Index**.
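
The difference between the working directory, the **Index** and the **object database** is easy to mix up. The following is a toy Python model of the add/commit steps above. It is only a sketch to build intuition, not how Git is implemented, and the class `ToyRepo` and the file `notes.txt` are made up for illustration.

```python
class ToyRepo:
    """Toy model of Git's three areas (illustration only, NOT real Git)."""

    def __init__(self):
        self.working_dir = {}  # filename -> content: the files you edit
        self.index = {}        # filename -> content staged for the next commit
        self.commits = []      # append-only history of (message, snapshot)

    def add(self, filename):
        # like `git add <filename>`: copy the current content into the index
        self.index[filename] = self.working_dir[filename]

    def commit(self, message):
        # like `git commit -m`: snapshot the index, never the working directory
        self.commits.append((message, dict(self.index)))

    @property
    def head(self):
        # the most recent commit (None while the repository is still empty)
        return self.commits[-1] if self.commits else None


repo = ToyRepo()
repo.working_dir["notes.txt"] = "first draft"
repo.add("notes.txt")
repo.working_dir["notes.txt"] = "second draft"  # edited again, but not re-added
repo.commit("Add notes")
print(repo.head)  # ('Add notes', {'notes.txt': 'first draft'})
```

Note that the commit contains the content as it was when it was added, which is why you need to run `git add` again after further edits.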

[top](#top)

<a name='branches'></a>Branches (checkout, branch)
---

* Branches are used to work on parts of the project isolated from the rest of the project.
* This keeps _new_ things that are probably not working yet away from the working main part of the project, while still keeping them under version control.
* Once the part of the project is developed enough and working, its separate branch can be merged back into the main project branch (the ```master``` branch).
* The ```master``` branch is the "default" and only branch when you create a repository.  

![branching](http://rogerdudler.github.io/git-guide/img/branches.png)

1. create a new branch named feature_x and switch to it
```
git checkout -b feature_x
```
2. switch back to the master branch
```
git checkout master
```
3. delete the branch again
```
git branch -d feature_x
```
4. OR push the branch to the remote repository so it is available to others
```
git push origin feature_x
```
5. OR merge the branch to your active branch (e.g. master)
```
git merge feature_x
```

HINT: ```git status``` also shows you which branch is currently your _active_ branch
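
It can help to think of a branch as nothing more than a name that points at a commit. The toy Python sketch below mimics the commands above under that simplification; the commit ids `c1` to `c4` are invented, and the merge shown is the simple "fast-forward" case where `master` has no new commits of its own.

```python
# Each commit records its parent; a branch is just a name pointing at a commit.
commits = {"c1": None, "c2": "c1"}  # commit id -> parent id (made-up ids)
branches = {"master": "c2"}

# like `git checkout -b feature_x`: a new pointer at the current commit
branches["feature_x"] = branches["master"]

# two commits on feature_x move only that pointer, master stays where it is
for new_id in ("c3", "c4"):
    commits[new_id] = branches["feature_x"]
    branches["feature_x"] = new_id


def history(branch):
    """Walk from the tip of a branch back to the very first commit."""
    commit, log = branches[branch], []
    while commit is not None:
        log.append(commit)
        commit = commits[commit]
    return log


print(history("master"))     # ['c2', 'c1']
print(history("feature_x"))  # ['c4', 'c3', 'c2', 'c1']

# like `git merge feature_x` on master: master just moves forward to the tip
branches["master"] = branches["feature_x"]
# like `git branch -d feature_x`: only the pointer is removed, the commits stay
del branches["feature_x"]
print(history("master"))     # ['c4', 'c3', 'c2', 'c1']
```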

[top](#top)

<a name='exercise01'></a>Exercise 01: Git
===

**A word on exercises**  
Our work as data scientists and programmers involves a lot of searching for existing solutions online. Not knowing a command, keyboard shortcut or function by heart is nothing to be ashamed of - just look it up. Somebody has almost certainly had exactly the same question before and got an answer for it, and we all look things up all the time! Therefore, if you are stuck
1. google your problem (Stack Overflow is your friend here!); if you can't find an answer within 2-3 min, proceed to
2. ask your neighbor; if she is already annoyed with you, proceed to
3. ask the teachers

**No PC?**  
If you have trouble with the Python setup on your PC or you did not bring a PC, you can use JupyterHub, a remote server we set up for you that already has all the necessary libraries installed. To work on the JupyterHub:
* go to https://134.76.24.148
* log in using your GWDG user credentials (if you do NOT have GWDG user credentials, approach us!)
* create a new jupyter-notebook server
* you are now at your very own jupyter-notebook dashboard
* by clicking new -> python 3 (upper right corner) you can create a new jupyter-notebook
* by clicking new -> terminal you can open a terminal to run git in


1. **Your own repository**
  1. clone the remote repository we use for the course to create a _local_ repository
  2. create your own _remote_ repository at https://gitlab.gwdg.de/users/sign_in using your GWDG user credentials
  3. add the remote address of your own _remote_ repository to your _local_ repository  
  HINT: ```git remote add <name> <server>```   
2. **A branch a day...**
  1. create a new branch with the name of this exercise
  2. make sure the new branch is your active branch
3. **Basic workflow**
  1. create a new file in the new branch
  2. add the new file to the index
  3. commit the new file with a meaningful commit message
  4. merge the branch back to ```master```

[top](#top)

<a name='jupyter'></a>Jupyter
===

<a name='jupyter intro'></a>Jupyter introduction
---

##### What is Jupyter?

Jupyter is

* an editor where we can write and structure text to describe what we do
* an interpreter where we can write code, execute it and display graphics embedded in the document

We will use Jupyter for the rest of the course to
1. Teach you new concepts by presenting a jupyter-notebook file with code snippets in a short talk.
2. Give you exercises for every new concept to play around with (coding is learning by doing first and foremost!).
3. Give you access to the teaching-notebook so you can extend existing code snippets directly.

Different functionalities require different _cells_:  
* markdown cells for text
* code cells for code 

*Markdown* is a simple way to structure text (make it look nicer) using a couple of symbols

**this** is a _markdown_ cell

In [1]:

    # this is a code cell
    3 * 10

Out[1]:

    30

##### The notebook document itself

* Notebooks are stored as JSON text files and rendered as HTML in the browser.
* Notebook files have the extension `.ipynb` and are listed in the _dashboard_ (click on the 'jupyter' icon in the upper left corner to get to the dashboard).
* At the dashboard you can create new files and stop execution of running notebooks.
* Notebooks are shown and run in a browser like Firefox, Chromium or Edge.
* The code is executed by a kernel (in our case Python, but it can be other languages too).
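
Whatever the browser displays, the `.ipynb` file on disk is plain JSON text, which is also why Git can track it. The sketch below builds a minimal two-cell notebook structure by hand, following the version 4 notebook format as we understand it; the cell contents are only examples, and in practice Jupyter writes these files for you.

```python
import json

# Minimal notebook structure (notebook format version 4, illustrative content)
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["**this** is a _markdown_ cell"],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,  # the cell has not been run yet
            "outputs": [],
            "source": ["3 * 10"],
        },
    ],
}

# Serialise to the text that would be stored in the .ipynb file and read it back
text = json.dumps(notebook, indent=1)
loaded = json.loads(text)
print([cell["cell_type"] for cell in loaded["cells"]])  # ['markdown', 'code']
```

Opening one of the course notebooks in a plain text editor shows the same kind of structure.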

[top](#top)

<a name='editor'></a>Modal editor
---

Jupyter notebook has a modal user interface. This means that the keyboard does different things depending on which mode the Notebook is in. There are two modes: edit mode and command mode.

**Edit mode** is indicated by a green cell border and **Command mode** by a grey cell border.

When a cell is in edit mode, you can type into the cell, like a normal text editor.  

When you are in command mode, you are able to edit the notebook as a whole, but not type into individual cells. Most importantly, in command mode, the keyboard is mapped to a set of shortcuts that let you perform notebook and cell actions efficiently. For example, if you are in command mode, pressing `c` copies the current cell and pressing `v` pastes it - no modifier is needed.

<div class="alert alert-success" style="margin: 10px">
Enter edit mode by pressing `enter` or using the mouse to click *inside* a cell's editor area.
</div>

<div class="alert alert-success" style="margin: 10px">
Enter command mode by pressing `esc` or using the mouse to click *outside* a cell's editor area.
</div>

[top](#top)

<a name='navigation'></a>Navigation
---

**Mouse navigation:** All navigation and actions in the Notebook are available using the mouse through the menubar and toolbar, which are both above the main Notebook area.  

**Keyboard navigation:** In edit mode, most of the keyboard is dedicated to typing into the cell's editor. In command mode, the entire keyboard is available for shortcuts.

##### Keyboard shortcuts (edit mode):

In edit mode, keyboard shortcuts are similar to those of text editors like Word, gedit etc. Examples:
* `ctrl-c` to copy
* `ctrl-v` to paste
* `tab` for text-completion.


##### Keyboard shortcuts (command mode):

Most important:
* `enter` enters edit mode
* `esc` enters command mode
* `h` calls the help menu
---
Basic navigation:
* `up/down` select cell above/below
* `shift-enter` execute cell and select below
* `alt-enter` execute cell and insert new cell below
---
Cell types:
* `y` to code
* `m` to markdown
---
Cell editing:
* `d-d` delete cell
* `c` copy cell
* `x` cut cell
* `v` paste cell
* `shift-up/down` to select multiple cells

[top](#top)

<a name='markdown'></a>Markdown
---

##### Text formatting

You can make text _italic_ or **bold** or `monospace`

---
# You
## can
### make
#### headings

---
Courtesy of MathJax, you can beautifully render mathematical expressions, both inline: 
$e^{i\pi} + 1 = 0$, and displayed:

$$e^x=\sum_{i=0}^\infty \frac{1}{i!}x^i$$

##### Lists
Itemized list
* One
  - sublist
    - subsublist
* Two
  - sublist
* Three
  - sublist
    - subsublist
      - subsubsublist

---
Enumerated list
1. First
  1. sublist
  2. sublist
2. Second

##### Code
This is a code snippet (NOT EXECUTABLE!):    
    
```Python
def f(x):
    """a docstring"""
    return x**2
```
  

##### Tables
Time (s) | Audience Interest
---------|------------------
 0       | High
 1       | Medium
 3       | Food

##### Command line
Execute commands on the command line by prepending a '!' to the command.

In [1]:

    !ls

    01-jupyter-git-intro.ipynb  04-control-structures.ipynb
    02-data-types.ipynb         05-functions-and-libraries.ipynb
    03-data-containers.ipynb    airports.dat


[top](#top)

<a name='exercise02'></a>Exercise 02: Jupyter
===

0. **Git**
  1. Create a new branch for the current day and switch to it. A good idea is to create a new branch every day for the day-to-day work and merge it back to master at the end of the day if everything works. Alternatively you can create a separate branch for every exercise but that is probably a bit tedious.
1. **The Dashboard**
  1. Create a new notebook.
  2. Give the notebook a meaningful name.
  3. Add the notebook to your working branch's index.
  4. Give the notebook a structure for the exercises using new cells and headings.
2. **Keyboard shortcuts** 
  1. Learn 3-4 useful keyboard shortcuts and use them whenever possible.
  2. Make a table with your favourite keyboard shortcuts.
  3. Find three more useful shortcuts for jupyter notebook in the help menu/documentation.
3. **Markdown & Code**
  1. Practice text formatting,
  2. lists and
  3. LaTeX. 
  4. Make a code cell, and type in some basic calculations.
  5. Execute the code cell (```shift-enter```) to see the results of the calculations.
4. **(Optional) Command line**
  1. Experiment with command line input.
  2. Find out how to display an image in jupyter notebook using the command line.
5. **Git**
  1. Commit the notebook to the working branch (commit message!).
  2. (Optional) Make a change in the notebook (for example 'accidentally' deleting it) and undo the change using git.

[top](#top)