# Git: Make version control great again!  

## Sources for this tutorial
This tutorial is adapted from [Version control for fun and profit](https://github.com/jakevdp/git-intro/blob/master/git-intro.ipynb) by Jake Vanderplas.

Some of the images used below are copied from [A successful Git branching model](http://nvie.com/posts/a-successful-git-branching-model/) by Vincent Driessen.

## Git vs GitHub
Before looking at how Git and GitHub work, we should first make a distinction between them:

**Git** - Version control software that tracks and manages changes in source code and other text-based files.

**GitHub** - A hosting service for Git repositories, designed to enable collaboration between developers.

For the first part of this tutorial we will focus on Git alone, before moving our repository to GitHub and exploring the advantages this gives us.

## Setting up Git
Before we start using Git we need to set it up. The only two settings that are required are your name and email address, which Git records as the author of every commit you make. If you intend to use GitHub later, use the same email address as your GitHub account.

In [1]:
%%bash
git config --global user.name "Szymon Prajs"
git config --global user.email "S.Prajs@soton.ac.uk"
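To check what Git has recorded, you can read a setting back with `git config --get`. The sketch below runs in a throwaway repository created with `mktemp -d` and deliberately leaves out `--global`, so the hypothetical name and address are stored only in that repository's `.git/config` and your real global settings are not touched.

```bash
# Create a throwaway repository so the global configuration is not modified
repo=$(mktemp -d)
cd "$repo"
git init -q

# Without --global, settings are written to this repository's .git/config only
git config user.name "Example User"
git config user.email "user@example.com"

# Read the values back; a repository setting takes precedence over a global one
git config --get user.name
git config --get user.email

# --show-origin reports which configuration file a value came from
git config --show-origin --get user.name
```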

Something that isn't necessary but will make your life **a lot** easier is changing your default text editor. Unless configured otherwise, Git falls back to `vi`, and unless you were born in the 70s you have probably either never heard of it or don't know how to close it without killing your terminal window. I use `Atom`, a fantastic text editor that is currently taking the world by storm.

It is developed by GitHub, and its main advantage is its built-in Git integration: it reads the project's `.git` folder, so it can, for example, highlight changed lines and show the status of each file with absolutely no extra setup.

This is again just a single command:

In [2]:
%%bash
git config --global core.editor "atom --wait"

The last, and entirely optional, setting makes Git colour its output, which is particularly useful for large commits and logs. Recent versions of Git already do this by default.

In [3]:
%%bash
git config --global color.ui "auto"

## Sample project

We can start by creating a simple file to represent our project. From now on I will make most of the code changes in `Atom` and use this notebook as a terminal for running Git.

In [4]:
%%bash
mkdir -p ~/Projects && cd ~/Projects
# start from an empty git-test directory (removes any earlier copy)
rm -rf git-test; mkdir git-test; cd git-test

echo "Hello World" > first_file.txt
ls -a

.
..
first_file.txt


## `git init`

Now that we have an exciting project, we can tell Git to start looking after it. It's as easy as:

In [5]:
%%bash
cd ~/Projects/git-test

git init

Initialized empty Git repository in /Users/szymon/Projects/git-test/.git/


Looking at the project directory, you can see that a new hidden folder, `.git`, has been added. It is **crucial** that you never delete this folder, or you will lose all your version control history. Deleting it doesn't delete your current files, but unless you have pushed the project to a remote such as GitHub, you lose all previous versions of your code and every branch other than the files you currently have checked out.

In [6]:
%%bash
cd ~/Projects/git-test

ls -a

.
..
.git
first_file.txt
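The `.git` folder is where Git keeps the entire history of the project. A quick look inside a freshly initialised repository, sketched here in a throwaway directory so the tutorial project is unaffected:

```bash
# Throwaway directory so ~/Projects/git-test is left alone
repo=$(mktemp -d)
cd "$repo"
git init -q

# HEAD names the current branch, objects/ stores file contents and commits,
# refs/ stores branch and tag pointers, config holds repository settings
ls .git
cat .git/HEAD

# Ask Git directly where the repository data lives
git rev-parse --git-dir
```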


We can now check what Git sees in our project folder:

In [7]:
%%bash
cd ~/Projects/git-test
 
git status

On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	first_file.txt

nothing added to commit but untracked files present (use "git add" to track)


## `git add`

Git does not track any files in your project folder automatically; you have to tell it which files to follow. When possible, it is a good idea to add files individually to avoid adding temporary files by accident.

In [8]:
%%bash
cd ~/Projects/git-test

git add first_file.txt

Alternatively, you can tell Git to stage every change in the project (new, modified and deleted files) using the `--all` flag:

In [9]:
%%bash
cd ~/Projects/git-test

git add --all

If we now check the `status` of our repository again, the file has moved from "Untracked files" to "Changes to be committed": it is staged and will be included in the next commit.

In [10]:
%%bash
cd ~/Projects/git-test

git status

On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   first_file.txt
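For a more compact view, `git status --short` prints one line per file with a two-character code: `??` for an untracked file, and `A` in the first column for a file staged as new. A minimal sketch in a throwaway repository:

```bash
# Throwaway repository with one file, mirroring the tutorial project
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "Hello World" > first_file.txt

# Before staging: the file is untracked
before=$(git status --short)
echo "$before"      # ?? first_file.txt

git add first_file.txt

# After staging: the file is listed as new in the staging area
after=$(git status --short)
echo "$after"       # A  first_file.txt
```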



## `git commit`

Once you have staged your changes, you can make a permanent record of them with the `commit` command. Each commit is identified by a unique hash (the `95587d1` in the output below is its abbreviated form) and carries a message written by you. Running `git commit` on its own opens your text editor so you can describe the changes; the `-m` flag is a very useful shortcut that supplies the message directly on the command line.

In [11]:
%%bash
cd ~/Projects/git-test

git commit -m "My first commit"

[master (root-commit) 95587d1] My first commit
 1 file changed, 1 insertion(+)
 create mode 100644 first_file.txt
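If a one-line message is not enough, `-m` can be given more than once: the first value becomes the subject line and each further value becomes a separate paragraph of the body. A sketch in a throwaway repository; the identity and the body text are hypothetical:

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"          # hypothetical, repository-local identity
git config user.email "user@example.com"

echo "Hello World" > first_file.txt
git add first_file.txt

# The first -m is the subject, the second becomes the body paragraph
git commit -q -m "My first commit" -m "Adds a file containing a greeting."

# %h = abbreviated hash, %s = subject, %b = body
git log -1 --format="%h %s"
git log -1 --format="%b"
```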


You can check your commit history using the `log` command.

In [12]:
%%bash
cd ~/Projects/git-test

git log

commit 95587d1a2870ea1e6deffba1ace7d90a521deb75
Author: Szymon Prajs <S.Prajs@soton.ac.uk>
Date:   Tue Nov 22 02:22:20 2016 +0000

    My first commit


## `git diff`

`diff` is a very useful command that shows the lines you have changed since the last commit (strictly, the changes that have not been staged yet). Again, if you're using `Atom`, your text editor shows this continuously by highlighting the edge of each changed line. Let's add a few more lines for fun and check what we've done.

In [14]:
%%bash
cd ~/Projects/git-test

git diff

diff --git a/first_file.txt b/first_file.txt
index 557db03..0df75d8 100644
--- a/first_file.txt
+++ b/first_file.txt
@@ -1 +1,3 @@
 Hello World
+
+STUPID line here
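Plain `git diff` compares your files with the staging area, so once the changes are added it shows nothing; `git diff --staged` (a synonym of `--cached`) shows what is about to be committed. In the throwaway-repository sketch below, `--quiet` makes `diff` report only through its exit status (0 means no differences); the identity is hypothetical:

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello World" > first_file.txt
git add first_file.txt
git commit -q -m "My first commit"

# Append a blank line and a new line: the change shows up in plain `git diff`
printf '\nSTUPID line here\n' >> first_file.txt
git diff

# After staging, plain `git diff` is empty and `--staged` shows the change
git add first_file.txt
git diff --quiet && echo "no unstaged changes"
git diff --staged --stat
```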


Now just add and commit the changes

In [15]:
%%bash
cd ~/Projects/git-test

git add --all
git commit -m "Added more meaningful lines"

[master 60346fe] Added more meaningful lines
 1 file changed, 2 insertions(+)


## `git mv` and `git rm`

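When moving or deleting files tracked by Git, you should use `git mv` and `git rm` instead of the shell's built-in `mv` and `rm` commands, because they stage the change for you. If you use plain `mv`, Git can normally work out that a file was renamed once both the new file and the removal of the old one are staged, but running `git add` on the new file alone stages only the new file and leaves the deletion of the old one unstaged.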

In [16]:
%%bash
cd ~/Projects/git-test

git mv first_file.txt README.md

We should now `add` and `commit` this change. 

In [18]:
%%bash
cd ~/Projects/git-test

git add --all
git commit -m "Create a README file"

On branch master
nothing to commit, working directory clean
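The difference is easy to see in `git status --short`. In the throwaway-repository sketch below, `git mv` and `git rm` immediately stage a rename (`R`) and a deletion (`D`), whereas a plain `mv` followed by `git add` of only the new name stages the new file (`A`) but leaves the deletion of the old one unstaged (` D`, with the letter in the second column). The extra file `notes.txt` and the identity are hypothetical.

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello World" > first_file.txt
echo "scratch" > notes.txt
git add first_file.txt notes.txt
git commit -q -m "Initial files"

# git mv / git rm change the files AND stage the change
git mv first_file.txt README.md
git rm -q notes.txt
with_git=$(git status --short)
echo "$with_git"
git commit -q -m "Create a README file"

# Plain mv + adding only the new name: the old path's deletion is not staged
mv README.md README.txt
git add README.txt
with_mv=$(git status --short)
echo "$with_mv"
```

Running `git add --all` (or `git rm README.md`) at this point would stage the deletion as well, and Git would then report the pair as a rename.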


## GitHub

While Git by itself is a great productivity tool in its own right, it only becomes truly great when combined with remote repositories. GitHub is the "official" and most popular choice, although there are other options, and it is also possible to set up a custom server; some universities and many large companies host their own in that way.

GitHub provides a backup of your code and its history, an easy way to publish it and track progress, and a great way of collaborating with others on your projects. In recent years it has become the number one choice for both scientists and open-source developers.

Once you have registered on GitHub and opened your profile, you will find an empty list of repositories in the top-right corner. Create a new repository, give it a name and a brief description, and initialise it as an empty project for now. GitHub will then show a very useful reminder of the first commands you need to run to sync your local and online repositories. Since we already have a project, we will use the following:

In [19]:
%%bash
cd ~/Projects/git-test

git remote add origin git@github.com:SzymonPrajs/git-test.git

I have an SSH key set up for my GitHub account, so I don't need to type my login credentials, but if this is the first time you are using GitHub you should use the HTTPS option for this example. I would still strongly suggest setting up SSH too, as it makes your life a lot easier in the long run!

When using HTTPS, the address of your repository changes slightly:
```bash
git remote add origin https://github.com/SzymonPrajs/git-test.git
```

This tells Git that there is a remote repository, which will be referred to as `origin`, at the given address. You can add several remotes to the same project; for example, you could add a remote pointing to someone else's copy of the code, allowing you to sync the project between you and your collaborators.
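You can list the remotes a repository knows about with `git remote -v`, and switch an existing remote between its SSH and HTTPS addresses with `git remote set-url` instead of removing and re-adding it. A sketch in a throwaway repository; no network access is needed, because these commands only edit `.git/config`:

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q

# Register the remote under the name "origin" using the SSH address
git remote add origin git@github.com:SzymonPrajs/git-test.git
git remote -v

# Point the same remote at the HTTPS address instead
git remote set-url origin https://github.com/SzymonPrajs/git-test.git
git remote get-url origin
```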

## `git push`

Finally we can now `push` our project to the remote.

In [20]:
%%bash
cd ~/Projects/git-test

git push origin master

To git@github.com:SzymonPrajs/git-test.git
 * [new branch]      master -> master
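You can practise pushing without a GitHub account by using a local *bare* repository (one with no working files) as the remote. In the sketch below the directory names and identity are hypothetical, `git branch -M master` renames the initial branch so the example matches the tutorial whatever your Git's default branch name is, and `-u` records `origin/master` as the upstream so that a later plain `git push` or `git pull` knows where to go:

```bash
base=$(mktemp -d)
git init -q --bare "$base/remote.git"      # stands in for the GitHub repository

git init -q "$base/project"
cd "$base/project"
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello World" > README.md
git add README.md
git commit -q -m "My first commit"
git branch -M master

# Register the bare repository as "origin" and push master to it
git remote add origin "$base/remote.git"
git push -q -u origin master

# The remote now holds the same master commit as the local repository
git --git-dir="$base/remote.git" log --oneline master
```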


## `git pull`

If you're syncing your code between multiple computers or working in a collaboration, you will also want to bring changes from GitHub down to your machine. For this you can use the `pull` command. Under the hood, `pull` runs two commands: first `fetch`, which downloads the latest commits from the remote, and then `merge`, which joins them into your current branch. If there are no conflicts, `pull` is all you need; later on we will discuss cases where using `pull` takes a bit more work.

Let's make some changes to the repository through the GitHub web page and pull them back to our computer:

In [21]:
%%bash
cd ~/Projects/git-test

git pull origin master

Updating 99462ac..25a2047
Fast-forward
 README.md | 2 ++
 1 file changed, 2 insertions(+)


From github.com:SzymonPrajs/git-test
 * branch            master     -> FETCH_HEAD
   99462ac..25a2047  master     -> origin/master
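The two steps that `pull` performs can also be run separately, which lets you inspect incoming commits before merging them. The sketch below uses a local bare repository in place of GitHub and two hypothetical working copies, `desktop` and `laptop`, standing in for two computers; the final `fetch` plus `merge` does roughly what `git pull origin master` does:

```bash
base=$(mktemp -d)
git init -q --bare "$base/remote.git"

# "desktop": create the project and push it
git init -q "$base/desktop"
cd "$base/desktop"
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello World" > README.md
git add README.md
git commit -q -m "Create a README file"
git branch -M master
git remote add origin "$base/remote.git"
git push -q origin master

# "laptop": clone the project (the clone names its remote "origin")
git clone -q -b master "$base/remote.git" "$base/laptop"

# A new commit is pushed from the desktop...
echo "Line added elsewhere" >> README.md
git commit -q -am "Update README.md"
git push -q origin master

# ...and the laptop catches up in two steps
cd "$base/laptop"
git fetch -q origin                          # download new commits into origin/master
git log --oneline master..origin/master      # inspect what is about to be merged
git merge -q origin/master                   # fast-forwards the local master
```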


## Branching in Git

So far we have covered the basics of Git; now we can move on to what truly makes Git worth your time: branching.

Regardless of how much coding you do day to day, you have almost certainly been in this situation: you are halfway through writing something that may not work straight away, and then an email arrives from your supervisor or collaborator asking for *that* exact plot, which you can't make right now because your code is in pieces and nothing compiles. The usual workaround is to keep backup copies of the code, so your folders end up looking something like this:

```bash
code
code_old
code_for_mark
code_backup
code_what_even_is_this
```

Branches solve this problem neatly. Let's first check what our default branch is:

In [22]:
%%bash
cd ~/Projects/git-test

git branch

* master


Let's create a new branch called `experiment`:

In [23]:
%%bash
cd ~/Projects/git-test

git branch experiment

Running `branch` again, we now see the new branch; the `*` marks the branch we are currently on:

In [24]:
%%bash
cd ~/Projects/git-test

git branch

  experiment
* master


We can switch to any branch using the `checkout` command

In [25]:
%%bash
cd ~/Projects/git-test

git checkout experiment
git branch

* experiment
  master


Switched to branch 'experiment'


Again, we can mess around with the project a bit, for example by editing a line of `README.md`. We can then commit the changes to the new branch without affecting our previous work.

In [26]:
%%bash
cd ~/Projects/git-test

git add README.md
git commit -m "Add some experimental changes"

[experiment 372e0e4] Add some experimental changes
 1 file changed, 1 insertion(+), 1 deletion(-)


Going back to the master branch, we find that our last commit had no effect on it:

In [27]:
%%bash
cd ~/Projects/git-test

git checkout master

Switched to branch 'master'
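The branch workflow above can be reproduced in a throwaway repository. The sketch also uses `git checkout -b`, which creates a branch and switches to it in one step, and `git log master..experiment`, which lists the commits that are on `experiment` but not on `master` (the same two-dot notation the tutorial uses with `diff`). The identity is hypothetical.

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello World" > README.md
git add README.md
git commit -q -m "Create a README file"
git branch -M master

# Create the branch and switch to it in one step
git checkout -q -b experiment

# Commit an experimental change on the new branch
echo "Hello UK" > README.md
git commit -q -am "Add some experimental changes"

# Back on master, the file is exactly as it was before
git checkout -q master
cat README.md                          # Hello World

# Commits reachable from experiment but not from master
git log --oneline master..experiment
```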


To check the differences between our branches, we can give both branch names to `diff`, separated by two dots:

In [28]:
%%bash
cd ~/Projects/git-test

git diff master..experiment

diff --git a/README.md b/README.md
index cbe13e0..adb369a 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-Hello World
+Hello UK
 
 STUPID line here
 


## `git merge`

Once your code is ready, you may want to merge it into the master branch. In most cases this is pretty straightforward: first switch to the branch you want the changes to end up in, in our case `master`, and then run:

In [29]:
%%bash
cd ~/Projects/git-test

git merge experiment

Updating 25a2047..372e0e4
Fast-forward
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
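A merge like this one, where `master` has no commits of its own since `experiment` branched off, is a *fast-forward*: Git simply moves the `master` pointer to the commit `experiment` points at, without creating a new commit. A sketch that checks this in a throwaway repository (hypothetical identity):

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello World" > README.md
git add README.md
git commit -q -m "Create a README file"
git branch -M master

git checkout -q -b experiment
echo "Hello UK" > README.md
git commit -q -am "Add some experimental changes"

# Merge from the branch that should receive the changes
git checkout -q master
git merge experiment                   # reports "Fast-forward"

# Both branch names now point at the same commit
git rev-parse master experiment
```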


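Because `master` had no new commits of its own, Git simply moved it forward to the tip of `experiment` (the "Fast-forward" in the output). Both of our branches still exist and now point to the same commit. We can now push the master branch to GitHub and make the latest version of the code live: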

In [30]:
%%bash
cd ~/Projects/git-test

git push origin master

To git@github.com:SzymonPrajs/git-test.git
   25a2047..372e0e4  master -> master


## Solving conflicts

Probably the most common issue people run into with Git is merging branches that have diverged from each other, and all the mayhem that follows. Let's try to make a mess of our code. First, change a line of `README.md` on the master branch and commit it:

In [31]:
%%bash
cd ~/Projects/git-test

git add README.md
git commit -m "Git is not going to like this"

[master b97b3e8] Git is not going to like this
 1 file changed, 1 insertion(+), 1 deletion(-)


Then switch to the experiment branch and change the same line there too:

In [32]:
%%bash
cd ~/Projects/git-test

git checkout experiment

Switched to branch 'experiment'


In [33]:
%%bash
cd ~/Projects/git-test

git add README.md
git commit -m "What will happen"

[experiment ae97ae3] What will happen
 1 file changed, 1 insertion(+), 1 deletion(-)


In [34]:
%%bash
cd ~/Projects/git-test

git diff master..experiment

diff --git a/README.md b/README.md
index 8166735..f14a2bc 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,5 @@
 Hello UK
 
-VERY STUPID line here
+CRAZY line here
 
 Line added from the web


Now we can try to merge the code:

In [35]:
%%bash
cd ~/Projects/git-test

git checkout master

git merge experiment

Auto-merging README.md
CONFLICT (content): Merge conflict in README.md
Automatic merge failed; fix conflicts and then commit the result.


Switched to branch 'master'


In [36]:
%%bash
cd ~/Projects/git-test

cat README.md

Hello UK

<<<<<<< HEAD
VERY STUPID line here
=======
CRAZY line here
>>>>>>> experiment

Line added from the web
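Git marks the conflicting region inside the file itself: the lines between `<<<<<<< HEAD` and `=======` are the version on the current branch (`master`), and the lines between `=======` and `>>>>>>> experiment` come from the branch being merged. To resolve the conflict, edit the file so it contains what you actually want, delete the marker lines, and then `add` and `commit` the file as in the next cell. The sketch below reproduces the whole conflict in a throwaway repository and resolves it non-interactively by writing the final file with `printf`; keeping both lines is just one possible resolution, and the identity is hypothetical.

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
printf 'Hello UK\n\nSTUPID line here\n' > README.md
git add README.md
git commit -q -m "Base version"
git branch -M master
git branch experiment

# Change the same line differently on each branch
printf 'Hello UK\n\nVERY STUPID line here\n' > README.md
git commit -q -am "Git is not going to like this"
git checkout -q experiment
printf 'Hello UK\n\nCRAZY line here\n' > README.md
git commit -q -am "What will happen"
git checkout -q master

# The merge stops with a conflict and exits with a non-zero status
if ! git merge experiment; then
    git status --short                 # UU marks a file with unresolved conflicts
    cat README.md                      # shows the conflict markers

    # Resolve: write the content we want, with the markers removed
    printf 'Hello UK\n\nVERY STUPID line here\nCRAZY line here\n' > README.md
    git add README.md
    git commit -q -m "Fixed conflicts"   # creates the merge commit
fi
```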


In [37]:
%%bash
cd ~/Projects/git-test

git add README.md
git commit -m "Fixed conflicts"

[master 9524f47] Fixed conflicts


## `git log`

Git logs are very useful and by default give us a lot of information.

In [38]:
%%bash
cd ~/Projects/git-test

git log

commit 9524f473e154fc5d94cff6e0aba7f42c9ac78ce7
Merge: b97b3e8 ae97ae3
Author: Szymon Prajs <S.Prajs@soton.ac.uk>
Date:   Tue Nov 22 02:33:49 2016 +0000

    Fixed conflicts

commit ae97ae34449d7d9607c40948ac3adecc770f3dd1
Author: Szymon Prajs <S.Prajs@soton.ac.uk>
Date:   Tue Nov 22 02:32:51 2016 +0000

    What will happen

commit b97b3e830614d26a719e9d1fdb33ba2fa1e12421
Author: Szymon Prajs <S.Prajs@soton.ac.uk>
Date:   Tue Nov 22 02:32:41 2016 +0000

    Git is not going to like this

commit 372e0e46ad680a0dd79fa33cad4e19efd5bd8268
Author: Szymon Prajs <S.Prajs@soton.ac.uk>
Date:   Tue Nov 22 02:30:44 2016 +0000

    Add some experimental changes

commit 25a2047e4f2564113fd98ef00df147985102a151
Author: Szymon Prajs <S.Prajs@soton.ac.uk>
Date:   Tue Nov 22 02:26:43 2016 +0000

    Update README.md

commit 99462aca69b8a9b69551f2c8bbfa0ded01dadee3
Author: Szymon Prajs <S.Prajs@soton.ac.uk>
Date:   Tue Nov 22 02:24:31 2016 +0000

    Create a README file

commit 60346fe88fe1481379a83c9

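This can be very useful, but it is pretty difficult to read and tells us nothing about the branching in our project. This is where the `--oneline` (one line per commit, with an abbreviated hash), `--graph` (draw the branch structure as ASCII art) and `--topo-order` (never show a parent before all of its children) flags come in very handy.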

In [39]:
%%bash
cd ~/Projects/git-test

git log --oneline --topo-order --graph

*   9524f47 Fixed conflicts
|\  
| * ae97ae3 What will happen
* | b97b3e8 Git is not going to like this
|/  
* 372e0e4 Add some experimental changes
* 25a2047 Update README.md
* 99462ac Create a README file
* 60346fe Added more meaningful lines
* 95587d1 My first commit


We can make an alias for this setup using the following:

In [40]:
%%bash
cd ~/Projects/git-test

git config --global alias.slog "log --oneline --topo-order --graph"

git slog

*   9524f47 Fixed conflicts
|\  
| * ae97ae3 What will happen
* | b97b3e8 Git is not going to like this
|/  
* 372e0e4 Add some experimental changes
* 25a2047 Update README.md
* 99462ac Create a README file
* 60346fe Added more meaningful lines
* 95587d1 My first commit
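`--global` stores the alias in your user-wide configuration. If you would rather try it in a single repository first, leave the flag out. The sketch below does that in a throwaway repository with two hypothetical commits, and also passes `--all`, which makes the log include every branch, not just the current one:

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello World" > README.md
git add README.md
git commit -q -m "My first commit"
git branch -M master

# Leave an unmerged commit on a side branch
git checkout -q -b experiment
echo "Hello UK" > README.md
git commit -q -am "Add some experimental changes"
git checkout -q master

# Repository-level alias, stored in .git/config instead of the global config file
git config alias.slog "log --oneline --topo-order --graph"

git slog          # only the history of master
git slog --all    # extra arguments are appended, so experiment is shown too
```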


## .gitignore

The last thing that I believe is essential for anyone using Git is `.gitignore`: a file listing patterns for files that Git should not track or suggest for `git add`. This lets you use `git add --all` without accidentally adding unwanted files such as data files and outputs.

In [41]:
%%bash
cd ~/Projects/git-test

touch test.dat test.out
ls

README.md
test.dat
test.out


In [42]:
%%bash
cd ~/Projects/git-test

git status

On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	test.dat
	test.out

nothing added to commit but untracked files present (use "git add" to track)
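The cell that wrote `.gitignore` is not shown here. A minimal version, assuming two patterns such as `*.dat` and `*.out` (consistent with the two insertions the commit reports), might look like the sketch below, run in a throwaway repository with a hypothetical identity. `git check-ignore -v` reports which file, line and pattern caused each path to be ignored:

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "Hello World" > README.md
git add README.md
git commit -q -m "Create a README file"
touch test.dat test.out

# Assumed patterns: one per line, '*' matches any file name
printf '*.dat\n*.out\n' > .gitignore

# Show which .gitignore line matches each file
git check-ignore -v test.dat test.out

git add .gitignore
git commit -q -m "Added .gitignore"
git status --short      # prints nothing: the data files are ignored
```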


In [44]:
%%bash
cd ~/Projects/git-test

git add .gitignore
git commit -m "Added .gitignore"

[master 7f8121c] Added .gitignore
 1 file changed, 2 insertions(+)
 create mode 100644 .gitignore


In [45]:
%%bash
cd ~/Projects/git-test

git status

On branch master
nothing to commit, working directory clean
