# Introduction to Programming
# Version control, Git and Github

_Hugo Lhuillier_ -- _Master in Economics, Sciences Po_

A motivating example:

1. Imagine you are working on some model. You then discover a new technique to solve it, or you change something deep in the model $\Rightarrow$ you have to rewrite the model entirely. Here:
    - either you are really bold and do not create another file for the new technique (and lose the old version)
    - or you create another file. If you were to do that every time, you would end up with an unmanageable pile of copies and eventually run out of disk space
1. A month later, Florian joins the project, and we are both working on it simultaneously. At some point, we are both working (remotely) on the same plot. If we are working on Dropbox, for instance, and are not careful, one version will overwrite the other: not necessarily the better one, just the last one to be saved.

# Version control: what & why?

* Version control: a very clever way to manage changes made to some documents
    - nothing that is committed to version control is ever lost
    - automatically notifies users when there’s a conflict between two different codes

* How does it work?
    - starts with a base version of the document, and saves only the changes made to that document
    - the key: changes are separated from the document itself. Allows you to
        - make independent sets of changes based on the same base document
        - merge two sets of changes onto the same base document
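
The idea of keeping changes separate from the document can be sketched with the standard `diff` and `patch` tools. This is only an illustration, not how Git is used day to day; the file names (`base.txt`, `new.txt`, `change.patch`, `rebuilt.txt`) are made up for the example.

```bash
work=$(mktemp -d)        # throwaway directory
cd "$work"

# Base version of the document.
printf 'Cold and dry\n' > base.txt

# A new version with one extra line.
cp base.txt new.txt
printf 'Two moons\n' >> new.txt

# Record only the change between the two versions.
diff -u base.txt new.txt > change.patch

# Rebuild the new version from the base plus the recorded change.
patch -o rebuilt.txt -i change.patch base.txt

cmp rebuilt.txt new.txt && echo "rebuilt.txt matches new.txt"
```

Because the change lives in its own file, several independent changes could be recorded against the same `base.txt`.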

<table><tr><td><img src="https://swcarpentry.github.io/git-novice/fig/versions.svg"></td><td><img src="https://swcarpentry.github.io/git-novice/fig/merge.svg"></td></tr></table>

# Version control: some vocab'

- Commit (v.): to record the current state of a set of files (the current changes) in a repository
- Repository (n.): a storage area where a version control system stores the full history of commits of a project and some metadata (e.g. who changed what, when...) 
- Conflict (n.): a change made by one user that is incompatible with changes made by other users
- Merge (v.): to reconcile two sets of changes to a repository

# Github and the open source culture

* **Proprietary software**: "software that legally remains the property of the organisation, group, or individual who created it. The organisation that owns the rights to the product usually does not release the source code, and may insist that only those who have purchased a special licence key can use it."
* **Free software**: "licensed at no cost, or for an optional fee. It is usually closed source."
* **Open source software**: "free and openly available software to everyone. People who create open source products publish the code and allow others to use and modify it."
* If interested, see [this documentary](https://www.youtube.com/watch?v=vjMZssWMweA)

# The open source culture in science

* What science used to be
    - data are stored on local machines
    - codes are never published 
    - the final paper is published in a journal, therefore owned by the latter
* What "open" science looks like
    - data is stored in an open access repository (e.g. [figshare](https://figshare.com/), [zenodo](https://zenodo.org/), [dryad](https://datadryad.org/))
    - the code is accessible on Github
    - the paper is initially published to [arXiv](https://arxiv.org/) 
* From an individual point of view, choosing the open science paradigm is optimal only if everybody else chooses it too $\Rightarrow$ **cooperation**

# Disclaimer

* The first part of the course is drawn from the lesson prepared by [Software Carpentry](https://swcarpentry.github.io/git-novice/)

# Creating a repository 

- Create a directory 
- Tell `Git` that this directory is a repository 
- **Ex**: 

```bash 
cd; cd ./Dropbox/Teaching/IntroProg/2017-2018/3-github
mkdir planets
cd planets
git init
```

In [4]:
; ls -a

.
..
.git


In [5]:
; git status 

On branch master

Initial commit

nothing to commit (create/copy files and use "git add" to track)
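
The same steps can be replayed anywhere. The sketch below uses a temporary directory (an assumption, in place of the Dropbox path above) so that it does not touch existing files.

```bash
work=$(mktemp -d)          # throwaway location instead of the Dropbox path
cd "$work"

mkdir planets
cd planets
git init -q                # turn the directory into a repository

ls -a                      # .git now appears next to . and ..
git status                 # reports that there is nothing to commit yet
```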


# Tracking changes 

* How to track changes made to a file?

```bash
nano mars.txt
```
> Cold and dry, but everything is my favorite color

In [6]:
; git status

On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	mars.txt

nothing added to commit but untracked files present (use "git add" to track)


The “Untracked files” message means that there is a file in the directory that Git is not keeping track of yet: Git has noticed `mars.txt`, but nothing about it has been recorded in a commit.

* To tell Git to record those changes in the repository, we need to
    1. add these changes 
    1. commit them

In [10]:
; git add mars.txt

In [11]:
; git commit -m "start notes on Mars as a base"

[master (root-commit) 282cfa9] start notes on Mars as a base
 1 file changed, 1 insertion(+)
 create mode 100644 mars.txt


With this command, Git stored a copy permanently inside the special .git directory. 282cfa9 is the short identifier of the commit.

Good commit messages start with a brief summary of changes made in the commit. If you want to go into more detail, add a blank line between the summary line and your additional notes.
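
One way to write a summary line plus additional notes without opening an editor is to pass `-m` twice: Git joins the values as separate paragraphs. A minimal sketch, run in a temporary repository; the identity and the text of the second `-m` are placeholders invented for the example.

```bash
work=$(mktemp -d)
cd "$work"
git init -q planets
cd planets
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "Cold and dry, but everything is my favorite color" > mars.txt
git add mars.txt

# The first -m is the summary line; the second becomes a separate
# paragraph, with a blank line in between.
git commit -q -m "start notes on Mars as a base" \
              -m "Record first impressions of the climate."

git log -1 --format=%s     # summary line only
git log -1 --format=%b     # additional notes only
```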

* To know what has been done recently, ask Git to show the project's history via `git log`

In [12]:
; git log

commit 282cfa9d1c0e8eefab68ea573f4e6798a4e98f73
Author: Hugo Lhuillier <hugo.lhu@gmail.com>
Date:   Mon Jan 15 20:11:51 2018 +0100

    start notes on Mars as a base


git log lists all commits made to a repository in reverse chronological order. The listing for each commit includes the commit’s full identifier (which starts with the same characters as the short identifier printed by the git commit command earlier), the commit’s author, when it was created, and the log message Git was given when the commit was created.

# Tracking changes

* Let's add to `mars.txt`
> The two moons may be a problem for Wolfman

In [13]:
; git status

On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   mars.txt

no changes added to commit (use "git add" and/or "git commit -a")


The last line is the key phrase: “no changes added to commit”. We have changed this file, but we have not yet told Git that we want to save those changes (`git add`), nor have we saved them (`git commit`).

In [14]:
; git diff

diff --git a/mars.txt b/mars.txt
index df0654a..315bf3a 100644
--- a/mars.txt
+++ b/mars.txt
@@ -1 +1,2 @@
 Cold and dry, but everything is my favorite color
+The two moons may be a problem for Wolfman


This is a series of commands for tools like editors and patch so that they can reconstruct one file given the other. If we break it down into pieces:

1. The first line tells us that Git is producing output similar to the Unix diff command comparing the old and new versions of the file.
1. The second line tells exactly which versions of the file Git is comparing; df0654a and 315bf3a are unique computer-generated labels for those versions.
1. The third and fourth lines once again show the name of the file being changed.
1. The remaining lines are the most interesting, they show us the actual differences and the lines on which they occur. In particular, the + marker in the first column shows where we added a line.

* Let's add and commit:
```bash
git add mars.txt
git commit -m "add concerns about effects of Mars' moons on Wolfram"
```
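
The edit, diff, add, commit cycle above can be replayed in a temporary repository to see what `git diff` reports at each stage. This is a sketch with a placeholder identity; `git diff --staged` is the variant that compares the staged changes with the last commit.

```bash
work=$(mktemp -d)
cd "$work"
git init -q planets
cd planets
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "Cold and dry, but everything is my favorite color" > mars.txt
git add mars.txt
git commit -q -m "start notes on Mars as a base"

echo "The two moons may be a problem for Wolfman" >> mars.txt

git diff               # shows the added line: changed but not staged
git add mars.txt
git diff               # prints nothing: no unstaged change is left
git diff --staged      # shows the added line: staged, ready to commit

git commit -q -m "add concerns about effects of Mars' moons on Wolfman"
git diff --staged      # prints nothing again
```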

# Tracking changes

* You can add several changes before committing. In particular, you are not forced to commit all the changes made to the repository
* Advantage: Git tracks our changes in stages rather than in one big batch, which makes it easier to recover a specific piece of code

* **Ex**: 
    1. add "But the Mummy will appreciate the lack of humidity" to `mars.txt`
    2. create a new file called `earth.txt`
    3. commit only the changes made to `mars.txt`

In [3]:
; cat mars.txt

Cold and dry, but everything is my favorite color
The two moons may be a problem for Wolfman
But the Mummy will appreciate the lack of humidity


In [5]:
; cat earth.txt

Is the Earth flat?


In [3]:
; git status

On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   mars.txt

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	earth.txt

no changes added to commit (use "git add" and/or "git commit -a")


In [4]:
; git add mars.txt

In [6]:
; git diff --staged

diff --git a/mars.txt b/mars.txt
index 315bf3a..b36abfd 100644
--- a/mars.txt
+++ b/mars.txt
@@ -1,2 +1,3 @@
 Cold and dry, but everything is my favorite color
 The two moons may be a problem for Wolfman
+But the Mummy will appreciate the lack of humidity


In [7]:
; git commit -m "discuss concerns about mars' climate for mummy"

[master 089ad21] discuss concerns about mars' climate for mummy
 1 file changed, 1 insertion(+)


In [10]:
; git status

On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	earth.txt

nothing added to commit but untracked files present (use "git add" to track)
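
The exercise can be condensed into one script. This is a sketch in a temporary repository with a placeholder identity; `git status --short` is a compact form of `git status`.

```bash
work=$(mktemp -d)
cd "$work"
git init -q planets
cd planets
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "Cold and dry, but everything is my favorite color" > mars.txt
git add mars.txt
git commit -q -m "start notes on Mars as a base"

# Two changes in the working directory...
echo "But the Mummy will appreciate the lack of humidity" >> mars.txt
echo "Is the Earth flat?" > earth.txt

# ...but only one of them is staged and committed.
git add mars.txt
git commit -q -m "discuss concerns about mars' climate for mummy"

git status --short     # earth.txt is still listed as untracked (??)
```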


# Tracking changes

<img src="https://swcarpentry.github.io/git-novice/fig/git-committing.svg">

# Tracking changes 

* If the output of `git log` is too long, `git` uses a program to split it into pages
    - move to the next page via the space bar 
    - to search for `x` in all pages, type `/x` and navigate through matches by pressing `n`
    - exit by pressing `q`
* You can limit the number of commits you want `git log` to output via `git log -N`, where `N` is the number of commits 
* You can also display the changes made by each commit (similar to `git show`) via `git log --patch <name-of-the-file>`

In [12]:
; git log -1

commit 089ad21f070b5420227111a0845cf91159f7887f
Author: Hugo Lhuillier <hugo.lhu@gmail.com>
Date:   Tue Jan 16 10:01:11 2018 +0100

    discuss concerns about mars' climate for mummy
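
A few ways of trimming the log, sketched on a temporary repository with three small commits. The commit messages and identity are invented for the example; `--oneline` is not used elsewhere in these slides but is a standard `git log` option.

```bash
work=$(mktemp -d)
cd "$work"
git init -q planets
cd planets
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Three commits, each adding one line to mars.txt.
for line in "Cold and dry" "Two moons" "Low humidity"; do
    echo "$line" >> mars.txt
    git add mars.txt
    git commit -q -m "add note: $line"
done

git log -2                      # the two most recent commits only
git log --oneline               # one line per commit: short identifier and summary
git log -1 --patch mars.txt     # latest commit touching mars.txt, with its diff
```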


# Tracking changes

* **Important**: `Git` tracks only files, not directories per se
* However, if you have a sub-directory and want to commit the changes made in all the files in this directory, you can run 
```bash 
git add <name-of-that-sub-directory>
```
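
A short sketch of what "files, not directories" means in practice. The directory `moons` and its files are hypothetical names chosen for the example.

```bash
work=$(mktemp -d)
cd "$work"
git init -q planets
cd planets
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

mkdir moons
git status --porcelain          # prints nothing: an empty directory is invisible to Git

touch moons/phobos.txt moons/deimos.txt
git add moons                   # stages every file inside the directory
git status --short              # both files are listed as added (A)
git commit -q -m "add notes on the moons"
```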

# Exploring history 

* Every commit has its own identifier
* `HEAD` refers to the most recent commit on the current branch; `HEAD~1` is the commit just before it

In [16]:
; git show HEAD mars.txt

commit 089ad21f070b5420227111a0845cf91159f7887f
Author: Hugo Lhuillier <hugo.lhu@gmail.com>
Date:   Tue Jan 16 10:01:11 2018 +0100

    discuss concerns about mars' climate for mummy

diff --git a/mars.txt b/mars.txt
index 315bf3a..b36abfd 100644
--- a/mars.txt
+++ b/mars.txt
@@ -1,2 +1,3 @@
 Cold and dry, but everything is my favorite color
 The two moons may be a problem for Wolfman
+But the Mummy will appreciate the lack of humidity


In [17]:
; git show HEAD~1 mars.txt

commit 8ed65cf6864dd9eba43f9944e9844246ad87816c
Author: Hugo Lhuillier <hugo.lhu@gmail.com>
Date:   Tue Jan 16 09:59:21 2018 +0100

    add concerns about effects of Mars' moons on Wolfram

diff --git a/mars.txt b/mars.txt
index df0654a..315bf3a 100644
--- a/mars.txt
+++ b/mars.txt
@@ -1 +1,2 @@
 Cold and dry, but everything is my favorite color
+The two moons may be a problem for Wolfman


# Exploring history 

* Instead of `HEAD`, we can also refer to commits via their actual identifiers
* Only the first few characters are necessary

In [19]:
; git log -2

commit 089ad21f070b5420227111a0845cf91159f7887f
Author: Hugo Lhuillier <hugo.lhu@gmail.com>
Date:   Tue Jan 16 10:01:11 2018 +0100

    discuss concerns about mars' climate for mummy

commit 8ed65cf6864dd9eba43f9944e9844246ad87816c
Author: Hugo Lhuillier <hugo.lhu@gmail.com>
Date:   Tue Jan 16 09:59:21 2018 +0100

    add concerns about effects of Mars' moons on Wolfram


In [24]:
; git show 8ed65c mars.txt

commit 8ed65cf6864dd9eba43f9944e9844246ad87816c
Author: Hugo Lhuillier <hugo.lhu@gmail.com>
Date:   Tue Jan 16 09:59:21 2018 +0100

    add concerns about effects of Mars' moons on Wolfram

diff --git a/mars.txt b/mars.txt
index df0654a..315bf3a 100644
--- a/mars.txt
+++ b/mars.txt
@@ -1 +1,2 @@
 Cold and dry, but everything is my favorite color
+The two moons may be a problem for Wolfman
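
The different ways of naming a commit can be compared with `git rev-parse`, which prints the full identifier a name resolves to. A sketch on a temporary repository; the `commit:path` form on the last line is an addition to what the slides cover and prints a file as it was at that commit.

```bash
work=$(mktemp -d)
cd "$work"
git init -q planets
cd planets
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Three commits, each adding one line to mars.txt.
for line in "Cold and dry" "Two moons" "Low humidity"; do
    echo "$line" >> mars.txt
    git add mars.txt
    git commit -q -m "add note: $line"
done

git rev-parse HEAD              # full identifier of the most recent commit
git rev-parse --short HEAD      # abbreviated identifier
git rev-parse HEAD~1            # the commit just before HEAD

short=$(git rev-parse --short HEAD~1)
git show "$short" mars.txt      # the short identifier works in place of HEAD~1
git show HEAD~2:mars.txt        # content of mars.txt as of two commits ago
```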


# Recovering files 

* Suppose that we overwrote `mars.txt` and would like to come back to the previous version

In [25]:
; cat mars.txt

We will need to manufacture our own oxygen


In [26]:
; git status

On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   mars.txt

no changes added to commit (use "git add" and/or "git commit -a")


In [27]:
; git checkout HEAD mars.txt

In [28]:
; cat mars.txt

Cold and dry, but everything is my favorite color
The two moons may be a problem for Wolfman
But the Mummy will appreciate the lack of humidity


git checkout checks out (i.e., restores) an old version of a file. In this case, we’re telling Git that we want to recover the version of the file recorded in HEAD, which is the last saved commit. If we want to go back even further, we can use a commit identifier instead.

# Recovering files

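* To revert all the files in the current directory, pass `.` instead of `mars.txt`. Omitting the path altogether does something different: with a commit identifier such as `HEAD~1`, Git switches to that commit and leaves you in a "detached HEAD" state (`git checkout master` brings you back)
* Here, we used 
```bash
git checkout HEAD mars.txt
```
because the (wrong) changes we made weren't committed 
* If they had been committed, we would have used 
```bash
git checkout HEAD~1 mars.txt
```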
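
Both cases can be tried safely in a temporary repository. This is a sketch with a placeholder identity and shortened file content.

```bash
work=$(mktemp -d)
cd "$work"
git init -q planets
cd planets
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

printf 'Cold and dry\nTwo moons\n' > mars.txt
git add mars.txt
git commit -q -m "start notes on Mars"

# Case 1: the mistake is not committed yet.
echo "We will need to manufacture our own oxygen" > mars.txt
git checkout HEAD mars.txt
cat mars.txt                    # the two original lines are back

# Case 2: the mistake has been committed.
echo "We will need to manufacture our own oxygen" > mars.txt
git add mars.txt
git commit -q -m "overwrite notes by mistake"
git checkout HEAD~1 mars.txt    # take the version from one commit earlier
cat mars.txt                    # the two original lines again
git status --short              # the restored version is staged (M), ready to commit
```

In the second case the history still contains the mistaken commit; the restored file has to be committed again to record the fix.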

# Recovering files 

* Especially when using version control, try to organize your work in a clever way
* In particular, try to keep files relatively small
    - It is easier to recover past work when dealing with several small files rather than one big file

# Ignoring things

* It is likely that there are files in your directory that you want Git to ignore 
* For that, create a file called `.gitignore` that lists all the files to be ignored

- **Ex:** 
    1. create blank files `a.dat` and `b.dat`
    1. create a new directory called results, and in it, blank files, `a.out` and `b.out`
    1. tell Git to ignore these four files

```bash
mkdir results
touch a.dat b.dat results/a.out results/b.out
nano .gitignore
```

In [63]:
; cat .gitignore

*.dat
results/


* Do not forget to commit this file!
```bash
git add .gitignore
git commit -m "add .gitignore"
```

# Ignoring things

* If you try to `add` files listed in `.gitignore`, `Git` will refuse
* To override `.gitignore`, use the flag `-f`. Ex:
```bash
git add -f a.dat
```
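
The whole `.gitignore` exercise, including the refusal and the `-f` override, can be sketched as follows in a temporary repository. `git check-ignore` is an extra command, not used elsewhere in these slides, that reports which rule causes a file to be ignored.

```bash
work=$(mktemp -d)
cd "$work"
git init -q planets
cd planets
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

mkdir results
touch a.dat b.dat results/a.out results/b.out
printf '*.dat\nresults/\n' > .gitignore

git status --short              # only .gitignore shows up as untracked
git check-ignore -v a.dat       # prints the .gitignore rule that matches a.dat

git add .gitignore
git commit -q -m "add .gitignore"

git add a.dat || echo "refused: a.dat is ignored"
git add -f a.dat                # -f overrides the ignore rule
git status --short              # a.dat is now staged (A)
```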

# Git & Github

* Instead of keeping the repository only on our laptop, we can also store it on `GitHub`
    1. create a new repository on GitHub
    1. connect this repo with the one on our local machine, i.e. make the GitHub repository a remote for the local repository
    ```bash 
    git remote add origin https://github.com/HugoLhuillier/planets.git
    ```
* `origin` is just a name we give to the remote repository
* remote (n.): a version control repository connected to another, in such a way that both can be kept in sync by exchanging commits

In [66]:
; git remote -v

origin	https://github.com/HugoLhuillier/planets.git (fetch)
origin	https://github.com/HugoLhuillier/planets.git (push)


In [67]:
; git log

commit ed3b2a379e6076251cb133dd016caab43073d407
Author: Hugo Lhuillier <hugo.lhu@gmail.com>
Date:   Tue Jan 16 11:27:42 2018 +0100

    add .gitignore

commit 089ad21f070b5420227111a0845cf91159f7887f
Author: Hugo Lhuillier <hugo.lhu@gmail.com>
Date:   Tue Jan 16 10:01:11 2018 +0100

    discuss concerns about mars' climate for mummy

commit 8ed65cf6864dd9eba43f9944e9844246ad87816c
Author: Hugo Lhuillier <hugo.lhu@gmail.com>
Date:   Tue Jan 16 09:59:21 2018 +0100

    add concerns about effects of Mars' moons on Wolfram

commit 86360f57243f9a961e6fa46592b47dd2c188f424
Author: Hugo Lhuillier <hugo.lhu@gmail.com>
Date:   Tue Jan 16 09:58:45 2018 +0100

    start notes on Mars as a base


In [68]:
; git push origin master

To https://github.com/HugoLhuillier/planets.git
 * [new branch]      master -> master


Here, therefore, we
1. `push` to the remote called `origin`
1. `push` the branch called `master`
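
To experiment without a GitHub account, a local "bare" repository can stand in for the remote. This substitution is an assumption of the sketch: on GitHub the remote would be an `https://` address instead of a path. The branch name is read from Git rather than hard-coded, since it may be `master` or `main` depending on the installation.

```bash
work=$(mktemp -d)
cd "$work"

git init -q --bare remote.git            # stand-in for the empty GitHub repository

git init -q planets
cd planets
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "Cold and dry, but everything is my favorite color" > mars.txt
git add mars.txt
git commit -q -m "start notes on Mars as a base"

branch=$(git symbolic-ref --short HEAD)  # "master" in the slides; may be "main" elsewhere

git remote add origin "$work/remote.git"
git remote -v                            # origin is listed for fetch and push
git push -q origin "$branch"

git ls-remote origin                     # lists the branch now stored in the remote
```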

<img src="https://swcarpentry.github.io/git-novice/fig/github-repo-after-first-push.svg">

# Git & Github

* If changes are made on the remote repository (e.g. by you directly on GitHub, or by one of your collaborators), you can pull them into your local repository via `git pull` 

In [70]:
; git pull origin master

Already up-to-date.


From https://github.com/HugoLhuillier/planets
 * branch            master     -> FETCH_HEAD


* Create a `README.md` file on Github

In [73]:
; git pull origin master

Updating ed3b2a3..4db0bc9
Fast-forward
 README.md | 2 ++
 1 file changed, 2 insertions(+)
 create mode 100644 README.md


From https://github.com/HugoLhuillier/planets
 * branch            master     -> FETCH_HEAD
   ed3b2a3..4db0bc9  master     -> origin/master


In [75]:
; ls -F

README.md
a.dat
b.dat
mars.txt
results/
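
The pull can also be rehearsed locally. In this sketch a bare repository plays GitHub and a second clone (`collaborator`, a made-up name) plays whoever adds `README.md` on the remote side.

```bash
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git            # stand-in for GitHub

# Our working copy.
git init -q planets
cd planets
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"
echo "Cold and dry" > mars.txt
git add mars.txt
git commit -q -m "start notes on Mars"
branch=$(git symbolic-ref --short HEAD)  # "master" in the slides
git remote add origin "$work/remote.git"
git push -q origin "$branch"

# Somebody else adds a README and pushes it to the remote.
cd "$work"
git clone -q -b "$branch" remote.git collaborator
cd collaborator
git config user.name "Collaborator"
git config user.email "collaborator@example.com"
printf '# planets\n' > README.md
git add README.md
git commit -q -m "add README.md"
git push -q origin "$branch"

# Back in our copy: the pull brings README.md in.
cd "$work/planets"
git pull -q origin "$branch"
ls                                       # README.md now sits next to mars.txt
```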


# Conflicts

* Particularly salient when working with other people 
* **Exercise:**
    1. Go to your GitHub repo, and modify the content of `README.md` to 
        > this is a test repo
    1. Without pulling these changes, modify the `README.md` on your local repository to 
        > a test repo
    1. Commit these changes (locally)
    1. Try to push these changes

In [4]:
; cat README.md

# planets
a test repo


In [5]:
; git add README.md

In [6]:
; git commit -m "update README.md"

[master af26722] update README.md
 1 file changed, 1 insertion(+), 1 deletion(-)


In [8]:
; git push origin master

To https://github.com/HugoLhuillier/planets.git
 ! [rejected]        master -> master (fetch first)
error: failed to push some refs to 'https://github.com/HugoLhuillier/planets.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.


# Conflicts

* So far, not a conflict per se, simply that we haven't updated our local repository

In [9]:
; git pull origin master

From https://github.com/HugoLhuillier/planets
 * branch            master     -> FETCH_HEAD
   4db0bc9..a03e822  master     -> origin/master


Auto-merging README.md
CONFLICT (content): Merge conflict in README.md
Automatic merge failed; fix conflicts and then commit the result.


* Now we have a conflict (the changes made in one copy of the repo overlap with those made in the other copy) 
<img src="https://swcarpentry.github.io/git-novice/fig/conflict.svg">
* We have to resolve it manually (with the help of Git)

In [10]:
; cat README.md

# planets
<<<<<<< HEAD
a test repo
=======
this is a test repo 
>>>>>>> a03e8222bef054d4a69d8dd171ece8d7a7bdca89


* Our change in the local repo is preceded by `<<<<<<< HEAD`
* `=======` separates the conflicting changes
* `>>>>>>>` marks the end of the content downloaded from GitHub (followed by the identifier of the corresponding commit)
* It is up to you to remove these markers and resolve the conflict

# Conflicts

* To resolve any conflict
    1. modify the file(s) affected
    1. commit the changes
    1. push the new commit (= the merge)

In [17]:
; cat README.md

# planets
this is a test repo


In [12]:
; git add README.md

In [14]:
; git status

On branch master
All conflicts fixed but you are still merging.
  (use "git commit" to conclude merge)

Changes to be committed:

	modified:   README.md



In [15]:
; git commit -m "merge README.md"

[master 3c40fe9] merge README.md


In [16]:
; git push origin master

To https://github.com/HugoLhuillier/planets.git
   a03e822..3c40fe9  master -> master
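
The full conflict scenario (rejected push aside) can be reproduced offline. In this sketch a bare repository plays GitHub and a second clone named `other` (a made-up name) makes the competing edit. `--no-rebase` is passed to `git pull` because recent Git versions may otherwise ask how to reconcile the two histories; that flag is an addition to the commands shown above.

```bash
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git            # stand-in for GitHub

# Our working copy, with a first version of the README.
git init -q planets
cd planets
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"
printf '# planets\n' > README.md
git add README.md
git commit -q -m "add README.md"
branch=$(git symbolic-ref --short HEAD)  # "master" in the slides
git remote add origin "$work/remote.git"
git push -q origin "$branch"

# The "GitHub side": a second clone edits the README and pushes first.
git clone -q -b "$branch" "$work/remote.git" "$work/other"
cd "$work/other"
git config user.name "Collaborator"
git config user.email "collaborator@example.com"
printf '# planets\nthis is a test repo\n' > README.md
git commit -q -am "describe the repo"
git push -q origin "$branch"

# Our side: a different edit at the same place, committed without pulling.
cd "$work/planets"
printf '# planets\na test repo\n' > README.md
git commit -q -am "update README.md"

git pull --no-rebase origin "$branch"    # stops and reports a merge conflict
cat README.md                            # shows the conflict markers

# Resolve: write the final content, then add, commit and push.
printf '# planets\nthis is a test repo\n' > README.md
git add README.md
git commit -q -m "merge README.md"
git push -q origin "$branch"
```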


# Conflicts

* Resolving conflicts is a pain _#UN_
* To minimize them
    * pull from upstream more frequently
    * try as much as possible to break large files into smaller ones
    * make smaller commits
* And also, when working in groups
    * clarify which collaborator is responsible for what part of the code
    * establish a project convention to avoid stylistic conflict (e.g. **tabs vs. spaces**)

# Branches

* Branches allow you to work on some part of the code without affecting the main line of development (the main branch)
* Branches are very handy, for instance, when you want to introduce new features but are not certain that they are going to work
* To use a branch
    1. Create a branch
    1. Work on it as much as you want (commits, push, pull etc. work similarly)
    1. When you are done, you can merge this branch to the main branch (= master) - with potential conflicts


<img src="ex-branch.png">

# Branches

* Some basic commands:
```bash 
git branch            # list all existing branches
git branch <name>     # create a branch named <name>
git branch -m <name>  # rename the branch we are on to <name>
git branch -d <name>  # delete the branch <name> IF merged
git branch -D <name>  # delete the branch <name> even if not merged
```
* `git checkout <name>` tells `Git` to switch to the branch `<name>`
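
These commands can be tried in sequence on a temporary repository. The branch names `draft` and `mars-notes` are hypothetical, and the starting branch name is read from Git since it may be `master` or `main`.

```bash
work=$(mktemp -d)
cd "$work"
git init -q planets
cd planets
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"
echo "Cold and dry" > mars.txt
git add mars.txt
git commit -q -m "start notes on Mars"
base=$(git symbolic-ref --short HEAD)    # "master" in the slides

git branch draft                 # create a branch named draft
git branch                       # lists draft and the current branch, marked with *

git checkout -q draft            # move to the new branch
git branch -m mars-notes         # rename the branch we are on
git checkout -q "$base"          # go back to the starting branch

git branch -d mars-notes         # allowed: mars-notes holds no unmerged commit
git branch                       # only the starting branch is left
```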

# Branches

* **Ex**: create a new branch to work on potential changes to `mars.txt`

In [26]:
; git branch

* master


In [27]:
; git branch mars-second-verse

In [28]:
; git branch

  mars-second-verse
* master


In [33]:
; git checkout mars-second-verse

M	mars.txt


Switched to branch 'mars-second-verse'


In [34]:
; cat mars.txt

Cold and dry, but everything is my favorite color
The two moons may be a problem for Wolfman
But the Mummy will appreciate the lack of humidity

I am bad at poetry
So bad I cannot write poetry
Oh - why I am so bad?


In [None]:
; git add mars.txt

In [None]:
; git commit -m "add second verse"

In [None]:
; git push origin mars-second-verse

In [31]:
; git checkout master

M	mars.txt


Switched to branch 'master'


In [35]:
; cat mars.txt

Cold and dry, but everything is my favorite color
The two moons may be a problem for Wolfman
But the Mummy will appreciate the lack of humidity


# Merging branches

* Two options
    1. _own work_: do a local merge via `git merge`
    ```bash
    git merge <branch>
    ```
    will add the changes made in `<branch>` to the branch we are currently on
    1. _collab_: do a pull request on `Github`


* With our previous example
```bash 
git checkout master 
git merge mars-second-verse
```
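
A local merge from start to finish, sketched on a temporary repository with shortened file content. Since the starting branch does not change while the verse is written, Git can simply move it forward (a "fast-forward") and no conflict can arise here.

```bash
work=$(mktemp -d)
cd "$work"
git init -q planets
cd planets
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"
printf 'Cold and dry\nTwo moons\nLow humidity\n' > mars.txt
git add mars.txt
git commit -q -m "notes on Mars"
base=$(git symbolic-ref --short HEAD)    # "master" in the slides

# Work on a separate branch.
git branch mars-second-verse
git checkout -q mars-second-verse
printf '\nI am bad at poetry\n' >> mars.txt
git commit -q -am "add second verse"

# Back on the starting branch, the verse is absent until we merge.
git checkout -q "$base"
grep -c poetry mars.txt           # prints 0
git merge -q mars-second-verse
grep -c poetry mars.txt           # prints 1
```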

# Fork

* Forking: copying a GitHub project and registering the copy under your GitHub username
* You can make changes without affecting the original project
* You can update your version with changes made to the original project (= **upstream**)
* You can propose changes to upstream (= **a pull request**)


# Forking a repository

* Go to the repository of your choice, and fork it
* Create a local clone of your work
* Set up an upstream repository

# Forking IntroProg2018

* Fork IntroProg2018, navigate to your forked repository on your profile, and copy the address under "Clone or download"
* In the directory of your choice, make a local copy of your repo (a clone). `git clone` will automatically set up your forked repository as the _origin_ remote 
```bash 
git clone https://github.com/YOUR-USERNAME/IntroProg2018
```
* Configure Git to pull changes from the upstream repository (same command as before, but call the new remote _upstream_ instead of _origin_)
```bash 
git remote add upstream https://github.com/HugoLhuillier/IntroProg2018.git
```
* Check it worked by running
```bash
git remote -v
```
* To sync your local repository with upstream, 
```bash 
git pull upstream master
```

* **Careful**: potential conflicts ahead... Alternatively, 
```bash
git fetch upstream 
git checkout master 
git merge upstream/master
```
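
The fork and upstream setup can be rehearsed entirely on one machine. Everything named here is a stand-in invented for the sketch: `upstream.git` plays the original project, `fork.git` plays your fork on GitHub, `seed` is the original author's working copy and `mycopy` is your local clone.

```bash
work=$(mktemp -d)
cd "$work"

# The original project, with one commit.
git init -q --bare upstream.git
git init -q seed
cd seed
git config user.name "Original Author"   # placeholder identity
git config user.email "author@example.com"
echo "lecture 1" > notes.txt
git add notes.txt
git commit -q -m "first lecture"
branch=$(git symbolic-ref --short HEAD)  # "master" in the slides
git push -q "$work/upstream.git" "$branch"

# "Forking": a copy of the project under your own name.
cd "$work"
git clone -q --bare upstream.git fork.git

# Clone the fork (origin is set automatically), then add upstream by hand.
git clone -q -b "$branch" fork.git mycopy
cd mycopy
git remote add upstream "$work/upstream.git"
git remote -v                            # lists both origin and upstream

# Meanwhile, the original project moves on.
cd "$work/seed"
echo "lecture 2" >> notes.txt
git commit -q -am "second lecture"
git push -q "$work/upstream.git" "$branch"

# Sync the local copy with upstream.
cd "$work/mycopy"
git fetch -q upstream
git checkout -q "$branch"
git merge -q "upstream/$branch"
cat notes.txt                            # both lectures are now present
```

After the merge, `git push origin` would bring the fork itself up to date.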

# Pull request

* Similar in spirit to pushing to your remote repository, except that you ask the owner of somebody else's repository to pull your changes
* Allows collaborators to discuss and review the changes, and to add follow-up commits before the changes are merged

* To do a pull request:
    * Push some changes to your remote personal repository
    * Go on its Github page, and click on pull request
    * Add comments etc. 

# Pull request & homework

* We will use pull requests for homework
* Idea:
    1. do the homework on your computer
    1. push it to your remote repository
    1. open a pull request to my repository so that I can see what you've done