# Git, GitHub & Atom 

## Sources for this tutorial
This tutorial is adapted from [Version control for fun and profit](https://github.com/jakevdp/git-intro/blob/master/git-intro.ipynb) by Jake Vanderplas.

Some of the images used below are copied from [A successful Git branching model](http://nvie.com/posts/a-successful-git-branching-model/) by Vincent Driessen.

## Git vs GitHub
Before getting into how Git and GitHub work we should first make a distinction between them:

**Git** - Version control software that tracks and manages changes in source code and other text-based files.

**GitHub** - Hosting service for Git repositories, designed to enable collaboration between developers.

For the first part of this tutorial we will focus on Git only, before pushing our code to GitHub and exploring the advantages this gives us.

## Setting up Git
Before we start using Git we first need to set it up. The only two settings that are required are your name and email address. These are used to identify your commits locally but should match your GitHub credentials if you intend to use GitHub later.

In [2]:
%%bash
git config --global user.name "Szymon Prajs"
git config --global user.email "S.Prajs@soton.ac.uk"

A setting that is not strictly required but will make your life **a lot** easier is changing your default text editor. Unless told otherwise, Git typically falls back to `Vi`, so unless you were born in the 70s you have probably never heard of it and almost certainly don't know how to use it. A great and modern replacement is `Atom`, a text editor that works well with Git. It is developed by GitHub (Git itself is a separate project), meaning that it uses the `.git` folder as its project management files, allowing for code autocompletion and much more with absolutely no extra setup.

This is again just a single command

In [3]:
%%bash
git config --global core.editor "atom --wait"

The last, and entirely optional, setting is to enable Git to colour its output. This is particularly useful for large commits and logs.

In [4]:
%%bash
git config --global color.ui "auto"

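To check what Git has stored, the same `config` command can read values back. The sketch below is an illustration rather than part of the original notebook: it works in a throwaway repository created with `mktemp`, uses a made-up name and email, and leaves out `--global` so that your real settings stay untouched.

```bash
set -e
repo=$(mktemp -d)        # throwaway directory, not a real project
cd "$repo"
git init -q

# Without --global the values are stored for this repository only
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

# Read a single value back
git config --get user.name

# List everything stored at the repository level
git config --local --list
```

A repository-level value takes precedence over the global one, which is handy if one project needs a different email address.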
## Sample project

We can start by creating a simple file to represent our project. From now on I will make most of the code changes in `Atom` and use this notebook as a terminal when using Git, but I will mention what changes I am making in case you want to follow along in your own time.

In [5]:
%%bash
cd ~/Projects
mkdir git-test; cd git-test

echo "Hello World" > first_file.txt
ls -a

.
..
first_file.txt


## `git init`

Now that we have an exciting project we can tell Git to start looking after it. This is done with the `init` command

In [6]:
%%bash
cd ~/Projects/git-test

git init

Initialized empty Git repository in /Users/szymon/Projects/git-test/.git/


Looking at the project directory you can see that a `.git` folder has been added. It is **crucial** that you never delete this folder or you will lose all your version history. Deleting it doesn't delete any of your current code, but unless a copy exists elsewhere (for example on GitHub) you would lose all your previous code versions and every branch except the one currently checked out.

In [7]:
%%bash
cd ~/Projects/git-test

ls -a

.
..
.git
first_file.txt


We can now use the `status` command to check what Git sees in our project folder

In [8]:
%%bash
cd ~/Projects/git-test
 
git status

On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	first_file.txt

nothing added to commit but untracked files present (use "git add" to track)

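For a more compact view, `status` also accepts a `--short` flag. This is a small sketch in a temporary directory (the `mktemp` location is only for illustration), reusing the file name from the sample project:

```bash
set -e
repo=$(mktemp -d)        # throwaway directory for the demonstration
cd "$repo"
git init -q
echo "Hello World" > first_file.txt

# One line per file; "??" marks a file that Git is not tracking yet
git status --short
```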

## `git add`

Git does not track any files in your project folder automatically. You have to specify which files you want it to follow. When possible it is always a good idea to add the files individually to avoid pushing any temporary files to your repository by accident. 

In [9]:
%%bash
cd ~/Projects/git-test

git add first_file.txt

However, you can tell Git to follow all files in your directory using the `--all` flag. Later on we will discuss a way of telling Git which files to ignore, making `git add --all` much more usable.

In [10]:
%%bash
cd ~/Projects/git-test

git add --all

Now if we check the `status` of our repository again we notice that files are now added and tracked. 

In [11]:
%%bash
cd ~/Projects/git-test

git status

On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   first_file.txt

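The hint in the status output mentions `git rm --cached` as the way to unstage a file. A minimal sketch of that round trip, run in a temporary directory so that nothing in the real project is touched:

```bash
set -e
repo=$(mktemp -d)        # throwaway directory for the demonstration
cd "$repo"
git init -q
echo "Hello World" > first_file.txt

git add first_file.txt
git status --short       # "A" in the first column: staged as a new file

# Remove the file from the staging area only; the file on disk is kept
git rm --cached -q first_file.txt
git status --short       # back to "??": untracked again
```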


## `git commit`

Once you have added the code that you would like Git to preserve, you can take a snapshot of your project at this point using the `commit` command. Each commit is tagged with an internal key and a message written by the user. When you commit, Git opens your editor and asks you to describe the changes. A very useful shortcut is the `-m` flag, which makes the commit with a message in a single line and no extra prompts.

It is very important for your workflow and version records that you use meaningful commit messages, ones that will still tell you in a few months or years what you changed since the previous commit. At the same time, by convention the first line of a message is kept to roughly 50 characters (Git itself does not enforce a limit), so you have to be quite concise.

In [12]:
%%bash
cd ~/Projects/git-test

git commit -m "My first commit"

[master (root-commit) af11ce3] My first commit
 1 file changed, 1 insertion(+)
 create mode 100644 first_file.txt

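When a short summary is not enough, `-m` can be given more than once and each extra message becomes its own paragraph. The example below is a sketch with made-up message text and a placeholder identity, run in a temporary repository:

```bash
set -e
repo=$(mktemp -d)        # throwaway directory for the demonstration
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "Hello World" > first_file.txt
git add first_file.txt

# The first -m is the short summary, the second becomes the message body
git commit -q -m "Add greeting file" -m "The file holds the text used in later steps."

git log -1 --format=%s   # prints the summary line only
git log -1 --format=%b   # prints the body only
```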

You can check your `commit` history using the `log` command.

In [13]:
%%bash
cd ~/Projects/git-test

git log

commit af11ce3b3325c7aa90ae651ce206e16990098dda
Author: Szymon Prajs <S.Prajs@soton.ac.uk>
Date:   Tue Nov 22 15:35:47 2016 +0000

    My first commit


## `git diff`

`diff` is a very useful command showing you the lines where changes have been made since the last commit. Again if you're using `Atom` your text editor will always show you which lines have been edited by highlighting the edges of changed lines. Let's add a few more lines to our `first_file.txt` document and check what we've done. 

In [14]:
%%bash
cd ~/Projects/git-test

git diff

diff --git a/first_file.txt b/first_file.txt
index 557db03..a641b98 100644
--- a/first_file.txt
+++ b/first_file.txt
@@ -1 +1,2 @@
 Hello World
+Added a new line for fun

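One detail worth knowing is that plain `git diff` only shows changes that have not been staged yet. The following sketch, again in a temporary repository with a placeholder identity, shows where a change appears before and after `git add`:

```bash
set -e
repo=$(mktemp -d)        # throwaway directory for the demonstration
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "Hello World" > first_file.txt
git add first_file.txt
git commit -q -m "My first commit"

echo "Added a new line for fun" >> first_file.txt

git diff --stat            # the unstaged edit is listed here
git add first_file.txt
git diff --stat            # prints nothing: no unstaged changes remain
git diff --cached --stat   # --cached compares the staging area with the last commit
```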

Now just add and commit the changes

In [15]:
%%bash
cd ~/Projects/git-test

git add --all
git commit -m "Added more meaningful lines"

[master e3d89d7] Added more meaningful lines
 1 file changed, 1 insertion(+)


## `git mv` and `git rm`

When moving or removing files tracked by Git you should use `git mv` and `git rm` instead of the built-in bash commands. Git can normally figure out that a file has been renamed if you `git add` the renamed file, but a file removed with plain `rm` still has to be staged as a deletion, which is easy to forget and can be a real pain to fix later. Try to always use the Git commands.

In [16]:
%%bash
cd ~/Projects/git-test

git mv first_file.txt README.md

We can now `add` and `commit` this change. 

In [17]:
%%bash
cd ~/Projects/git-test

git add --all
git commit -m "Create a README file"

[master d449cb8] Create a README file
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename first_file.txt => README.md (100%)

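The section shows `git mv`; for completeness, here is a sketch of `git rm` in a temporary repository. The file name `notes.txt` and the identity are placeholders chosen for the example.

```bash
set -e
repo=$(mktemp -d)        # throwaway directory for the demonstration
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "scratch" > notes.txt                 # hypothetical file
git add notes.txt
git commit -q -m "Add notes"

# Deletes the file from disk and stages the deletion in one step
git rm -q notes.txt
git status --short       # "D" in the first column: deletion is staged
```

The deletion still needs a `git commit` to become part of the history.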

## GitHub

While Git by itself is a great productivity tool, it only becomes truly powerful when combined with remote repositories. GitHub is the most popular choice, although there are other options (GitLab, Bitbucket etc.) and it is also possible to set up your own server. Some universities and large companies use Git in that way.

GitHub provides you with a backup of your code and its history, an easy way to publish it and track progress, and a great way of collaborating with others on your projects. In recent years it has been the number one choice for both scientists and open-source developers.

Once you register on GitHub and go to your profile you will find an empty list of repositories in the top-right corner. You should create a new repository, give it a name and a brief description, and initialise it as an empty project for now. GitHub will then give you a very useful reminder of the most basic steps you need to perform to sync your local and online repositories. Since we already have an existing project we can tell Git that we want to add a remote location, which we will call `origin`, at the address below. Your remote does not have to be called `origin`, and you can even have several remotes set up for one project.

In [18]:
%%bash
cd ~/Projects/git-test

git remote add origin git@github.com:SzymonPrajs/git-test.git

I have an SSH key set up for my GitHub account, hence I don't need to use my login credentials here, but if this is the first time you are using GitHub you should use the HTTPS option for this example. However, I would strongly suggest that you set up an SSH key as soon as possible as it will make your life a lot easier in the long run!

When using HTTPS you need to change the address of your remote repository slightly to
```bash
git remote add origin https://github.com/SzymonPrajs/git-test.git
```
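If a remote has already been added with one address, it can be inspected and changed rather than added a second time. In this sketch `example-user` is a placeholder account name and the repository lives in a temporary directory; no network access is needed because nothing is pushed.

```bash
set -e
repo=$(mktemp -d)        # throwaway directory for the demonstration
cd "$repo"
git init -q

# example-user is a placeholder, not a real account used in this tutorial
git remote add origin git@github.com:example-user/git-test.git
git remote -v            # lists each remote with its fetch and push address

# Switch the existing remote from SSH to HTTPS
git remote set-url origin https://github.com/example-user/git-test.git
git remote -v
```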

## `git push`

Finally, we can `push` our project to the remote. We need to tell Git that we want to push the branch `master` to the remote repository called `origin`. We will discuss branches in more detail later.

In [20]:
%%bash
cd ~/Projects/git-test

git push origin master

To git@github.com:SzymonPrajs/git-test.git
 * [new branch]      master -> master

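Typing the remote and branch every time gets tedious, and the `-u` flag lets Git remember them. The sketch below is an illustration under stated assumptions: a local bare repository stands in for GitHub so that it runs offline, the identity is a placeholder, and the branch is explicitly named `master` because newer Git installations may use a different default.

```bash
set -e
work=$(mktemp -d)                          # throwaway directory
git init -q --bare "$work/remote.git"      # stands in for GitHub
git init -q "$work/project"
cd "$work/project"
git symbolic-ref HEAD refs/heads/master    # use the branch name from this tutorial
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "Hello World" > README.md
git add README.md
git commit -q -m "My first commit"

git remote add origin "$work/remote.git"

# -u records origin/master as the upstream of the local master branch
git push -q -u origin master

# Show which remote branch master is now tracking
git rev-parse --abbrev-ref "master@{upstream}"
```

Once the upstream is recorded, a plain `git push` or `git pull` on that branch knows where to go.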

## `git pull`

If you're syncing your code between multiple computers or working in a collaboration you will probably want to update your local version of the code from GitHub too. For this you can use the `pull` command. Technically, under the bonnet there are two commands that get executed: first `fetch`, which downloads the latest code, and then `merge`, which joins it into your project. If there are no conflicts in the project `pull` will work just fine and is all you need, but later on we will discuss cases where you might need to do a bit more work when using `pull`.

Let's make some changes to the repository using the online editing tools on GitHub and pull them back to our local repository

In [21]:
%%bash
cd ~/Projects/git-test

git pull origin master

Updating d449cb8..4f7fe4b
Fast-forward
 README.md | 5 +++++
 1 file changed, 5 insertions(+)


From github.com:SzymonPrajs/git-test
 * branch            master     -> FETCH_HEAD
   d449cb8..4f7fe4b  master     -> origin/master

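To make the two steps behind `pull` concrete, they can be run by hand. This is a rough sketch rather than a record of the session above: two local clones play the role of two computers, a bare repository stands in for GitHub, and the identity is a placeholder.

```bash
set -e
work=$(mktemp -d)                          # throwaway directory

# Repository "a" plays the role of the first computer
git init -q "$work/a"
cd "$work/a"
git symbolic-ref HEAD refs/heads/master    # use the branch name from this tutorial
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "Hello World" > README.md
git add README.md
git commit -q -m "My first commit"

# A bare copy stands in for GitHub; "b" is a second computer cloning it
git clone -q --bare "$work/a" "$work/remote.git"
git clone -q "$work/remote.git" "$work/b"

# Publish a new commit from "a"
git remote add origin "$work/remote.git"
echo "Added a new line for fun" >> README.md
git commit -q -am "Added more meaningful lines"
git push -q origin master

# In "b", do by hand roughly what `git pull origin master` does
cd "$work/b"
git fetch -q origin              # download the new commits
git merge -q origin/master       # join them into the local branch
git log --oneline
```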

## Branching in Git

So far we have covered the basics of Git; now we can move on to what truly makes Git worth your time: branching. Regardless of the amount of coding you do day-to-day, you have almost certainly come across a situation where you needed to write something that you were not sure would work straight away, knowing that at any point you might get an email from your supervisor/collaborator asking for just **that** exact plot, which you can't make right now because your code is in bits and nothing is compiling. The usual solution is to make backups of your code at every big milestone, so you end up in a situation where your folders look something like this:

```bash
code
code_old
code_for_mark
code_backup
code_what_even_is_this
```

Branches solve this problem very neatly. Let's first check what our default branch is

In [22]:
%%bash
cd ~/Projects/git-test

git branch

* master


By default Git creates a single `master` branch. Let's add a new branch and call it `experiment`

In [23]:
%%bash
cd ~/Projects/git-test

git branch experiment

Running `branch` again shows the newly added branch

In [24]:
%%bash
cd ~/Projects/git-test

git branch

  experiment
* master


To make it the current working branch we can use the `checkout` command

In [32]:
%%bash
cd ~/Projects/git-test

git checkout experiment
git branch

* experiment
  master


Switched to branch 'experiment'


Let's add a few more lines of text to the `README.md` file. We can now commit changes to the `experiment` branch without affecting our `master` branch.

In [33]:
%%bash
cd ~/Projects/git-test

git add README.md
git commit -m "Add some experimental changes"

[experiment e3f5d03] Add some experimental changes
 1 file changed, 2 insertions(+)


Let's now go back to the master branch. We can see that our changes in the `experiment` branch had no effect on the current branch.

In [34]:
%%bash
cd ~/Projects/git-test

git checkout master

Switched to branch 'master'


To check the differences between our branches we can run the `diff` command. The syntax for comparing branches is a bit different and uses two dots to separate the branch names.

In [35]:
%%bash
cd ~/Projects/git-test

git diff master..experiment

diff --git a/README.md b/README.md
index 5dc8599..556dd95 100644
--- a/README.md
+++ b/README.md
@@ -5,3 +5,5 @@ Added a new line for fun
 This is a STUPID line.
 
 This one is fine.
+
+Yet another line.

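The same two-dot notation also works with `log`, where it lists the commits that one branch has and the other does not. A minimal sketch in a temporary repository, with a placeholder identity and the branch names used in this tutorial:

```bash
set -e
repo=$(mktemp -d)                          # throwaway directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from this tutorial
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "Hello World" > README.md
git add README.md
git commit -q -m "My first commit"

git checkout -q -b experiment
echo "Yet another line." >> README.md
git commit -q -am "Add some experimental changes"
git checkout -q master

# Commits reachable from experiment but not from master
git log --oneline master..experiment

# The same comparison as a per-file summary instead of a full diff
git diff --stat master..experiment
```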

## `git merge`

Once your code is ready and stable you may want to bring the changes into the master branch. In most cases this is pretty straightforward. You must remember to switch to the branch you want to end up with before merging. In our case that is `master`. You then do the following

In [36]:
%%bash
cd ~/Projects/git-test

git merge experiment

Updating 4f7fe4b..e3f5d03
Fast-forward
 README.md | 2 ++
 1 file changed, 2 insertions(+)

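After a merge it can be useful to ask Git which branches are already contained in the current one. The sketch below rebuilds a small history in a temporary repository (placeholder identity, same branch names as the tutorial) and then queries it:

```bash
set -e
repo=$(mktemp -d)                          # throwaway directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from this tutorial
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "Hello World" > README.md
git add README.md
git commit -q -m "My first commit"

git checkout -q -b experiment
echo "Yet another line." >> README.md
git commit -q -am "Add some experimental changes"

git checkout -q master
git merge -q experiment      # fast-forward, as in the output shown earlier

git branch --merged          # branches whose commits are all in master
git branch --no-merged       # branches with work not yet merged (none here)
```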

Both of our branches still exist and are now synced with each other. We can push the master branch to GitHub and make the latest version of the code live

In [37]:
%%bash
cd ~/Projects/git-test

git push origin master

To git@github.com:SzymonPrajs/git-test.git
   4f7fe4b..e3f5d03  master -> master


## Solving conflicts

Probably the most common issue people encounter when using Git is conflicts when merging branches that are out of sync with each other. Let's try to make a bit of a mess in our code and modify the same line on both the `master` and `experiment` branches. First make a change in the master branch and commit it

In [38]:
%%bash
cd ~/Projects/git-test

git add README.md
git commit -m "Git is not going to like this"

[master fe5a9c0] Git is not going to like this
 1 file changed, 1 insertion(+), 1 deletion(-)


Then switch to the experiment branch and make some changes there too in the same line as before

In [39]:
%%bash
cd ~/Projects/git-test

git checkout experiment

Switched to branch 'experiment'


In [40]:
%%bash
cd ~/Projects/git-test

git add README.md
git commit -m "What will happen now"

[experiment 38e3977] What will happen now
 1 file changed, 1 insertion(+), 1 deletion(-)


In [41]:
%%bash
cd ~/Projects/git-test

git diff master..experiment

diff --git a/README.md b/README.md
index 86f5dcb..23ee69d 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@ Hello World
 
 Added a new line for fun
 
-This is a VERY STUPID line.
+This is a CRAZY line.
 
 This one is fine.
 


And now we can try to merge the code

In [42]:
%%bash
cd ~/Projects/git-test

git checkout master

git merge experiment

Auto-merging README.md
CONFLICT (content): Merge conflict in README.md
Automatic merge failed; fix conflicts and then commit the result.


Switched to branch 'master'


As expected this did not work so let's see what changes Git made in our `README.md` file

In [43]:
%%bash
cd ~/Projects/git-test

cat README.md

Hello World

Added a new line for fun

<<<<<<< HEAD
This is a VERY STUPID line.
=======
This is a CRAZY line.
>>>>>>> experiment

This one is fine.

Yet another line.


There are many great GUIs one can use to make fixing conflicts easier, but in this simple case all we need to do is delete the marker lines that Git has added and decide what we want to do with the code. In this case I will call it
```bash
This is a VERY STUPID, CRAZY line.
```

We can now add and commit the changes

In [44]:
%%bash
cd ~/Projects/git-test

git add README.md
git commit -m "Fixed conflicts"

[master 7e51052] Fixed conflicts

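Two commands help while a merge is stuck: one lists the files that still have conflicts, and one backs out of the merge entirely. The sketch below recreates a conflict in a temporary repository; the identity and the "Base version" commit message are placeholders invented for the example.

```bash
set -e
repo=$(mktemp -d)                          # throwaway directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from this tutorial
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "This is a STUPID line." > README.md
git add README.md
git commit -q -m "Base version"

git checkout -q -b experiment
echo "This is a CRAZY line." > README.md
git commit -q -am "What will happen now"

git checkout -q master
echo "This is a VERY STUPID line." > README.md
git commit -q -am "Git is not going to like this"

# The merge stops with a conflict, so a non-zero exit code is expected here
git merge experiment || true

# Files that still contain unresolved conflicts
conflicted=$(git diff --name-only --diff-filter=U)
echo "$conflicted"

# Give up on the merge and return to the state before it started
git merge --abort
git status --short           # prints nothing: the working directory is clean
```

Aborting is a safe way out if a merge turns out to be messier than expected; you can always try again later.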

## `git log`

We've already used `git log`, but I would like to show you a more useful way of displaying it when you are working with multiple branches. By default Git gives you something like this

In [45]:
%%bash
cd ~/Projects/git-test

git log

commit 7e51052a07e72abfda7778cce7f536670a52c37d
Merge: fe5a9c0 38e3977
Author: Szymon Prajs <S.Prajs@soton.ac.uk>
Date:   Tue Nov 22 16:52:28 2016 +0000

    Fixed conflicts

commit 38e3977c6a2a1fcfeb7911cac1efee043da46b1e
Author: Szymon Prajs <S.Prajs@soton.ac.uk>
Date:   Tue Nov 22 16:13:01 2016 +0000

    What will happen now

commit fe5a9c034cb8e614e00bc1b4ff5172dca63428c7
Author: Szymon Prajs <S.Prajs@soton.ac.uk>
Date:   Tue Nov 22 16:06:51 2016 +0000

    Git is not going to like this

commit e3f5d037d244f19a477c4d519daeb38e4aa9ccda
Author: Szymon Prajs <S.Prajs@soton.ac.uk>
Date:   Tue Nov 22 16:02:31 2016 +0000

    Add some experimental changes

commit 4f7fe4b4dbe9c85a575ba0548b9ad6a6cbd6932e
Author: Szymon Prajs <S.Prajs@soton.ac.uk>
Date:   Tue Nov 22 15:53:09 2016 +0000

    Update README.md

commit d449cb8f5ca7da84e4ebb5c8876497989cce8f94
Author: Szymon Prajs <S.Prajs@soton.ac.uk>
Date:   Tue Nov 22 15:39:22 2016 +0000

    Create a README file

commit e3d89d7c7873f635c83

This is pretty difficult to read once we have a lot of commits and does not tell us anything about the branching in our project. Adding the `--oneline --topo-order --graph` flags comes in very handy and makes our output much neater, with a graphical representation of our branches.

In [46]:
%%bash
cd ~/Projects/git-test

git log --oneline --topo-order --graph

*   7e51052 Fixed conflicts
|\  
| * 38e3977 What will happen now
* | fe5a9c0 Git is not going to like this
|/  
* e3f5d03 Add some experimental changes
* 4f7fe4b Update README.md
* d449cb8 Create a README file
* e3d89d7 Added more meaningful lines
* af11ce3 My first commit


We can make an alias for this inside Git. You only need to set this up once

In [47]:
%%bash
cd ~/Projects/git-test

git config --global alias.slog "log --oneline --topo-order --graph"

git slog

*   7e51052 Fixed conflicts
|\  
| * 38e3977 What will happen now
* | fe5a9c0 Git is not going to like this
|/  
* e3f5d03 Add some experimental changes
* 4f7fe4b Update README.md
* d449cb8 Create a README file
* e3d89d7 Added more meaningful lines
* af11ce3 My first commit

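By default `log` only follows the history of the branch you are on, so work sitting on an unmerged branch does not appear. Adding `--all` includes every branch. A small sketch in a temporary repository, with a placeholder identity:

```bash
set -e
repo=$(mktemp -d)                          # throwaway directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from this tutorial
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "Hello World" > README.md
git add README.md
git commit -q -m "My first commit"

git checkout -q -b experiment
echo "Yet another line." >> README.md
git commit -q -am "Add some experimental changes"
git checkout -q master

git log --oneline --graph          # only the commit reachable from master
git log --oneline --graph --all    # commits from every branch
```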

## .gitignore

At the beginning of this tutorial we talked about the problem of using `git add --all` in a project with a lot of temporary files or data. Git allows you to add a `.gitignore` file which tells it which files it should not be tracking or suggesting for `git add`, allowing you to use `git add --all` without accidentally adding unwanted files.

First we can add a few placeholder files

In [48]:
%%bash
cd ~/Projects/git-test

touch test.dat test.out
ls

README.md
test.dat
test.out


Git will suggest these to be added as new files for the next commit 

In [49]:
%%bash
cd ~/Projects/git-test

git status

On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	test.dat
	test.out

nothing added to commit but untracked files present (use "git add" to track)


We can create a `.gitignore` file and add two new lines to it, telling Git to ignore everything with the extensions `.dat` and `.out`

In [54]:
%%bash
cd ~/Projects/git-test

touch .gitignore
echo '*.dat' >> .gitignore
echo '*.out' >> .gitignore

Finally we need to add and commit the changes for `.gitignore`

In [56]:
%%bash
cd ~/Projects/git-test

git add .gitignore
git commit -m "Added .gitignore"

[master 2b909c6] Added .gitignore
 1 file changed, 2 insertions(+)


In [57]:
%%bash
cd ~/Projects/git-test

git status

On branch master
nothing to commit, working directory clean

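If you are ever unsure why a file is (or is not) being ignored, `git check-ignore` can report the rule responsible. The sketch below uses the same two patterns in a temporary repository; `notes.txt` is a hypothetical extra file added for contrast.

```bash
set -e
repo=$(mktemp -d)        # throwaway directory for the demonstration
cd "$repo"
git init -q
printf '*.dat\n*.out\n' > .gitignore
touch test.dat test.out notes.txt          # notes.txt is a hypothetical file

# -v prints the ignore file, line number and pattern that matched each path
git check-ignore -v test.dat test.out

# Only .gitignore and notes.txt are offered for tracking
git status --short
```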

## Switch code to an older state

There are many ways of undoing the changes you have made to your repositories. These vary depending on whether you have already committed the changes and whether you only want to run the code temporarily or work on an older version of the code. One could use the `reset`, `revert` or `cherry-pick` commands; however, perhaps the most useful method is to simply create a new branch which is a copy of the code at the point of a chosen commit.

As mentioned before, commits are labeled internally by Git using SHA hash keys. You can find their long forms by running the `log` command or short versions using `slog`. These are also displayed on GitHub next to each commit. You can use either the long or the short form of the key.

To create a new branch we can take a small shortcut and use the `git checkout -b [name]` command, which makes a new branch called `[name]` and immediately switches to it. Adding a commit ID after the new branch name tells Git to branch the code at the selected commit.

In [58]:
%%bash
cd ~/Projects/git-test

git checkout -b old_code e3f5d03

Switched to a new branch 'old_code'


If we look at the `log` now we will find the code in the state it was in several commits ago.

In [1]:
%%bash
cd ~/Projects/git-test

git slog

* 2b909c6 Added .gitignore
* 561b216 Added .gitignore
*   7e51052 Fixed conflicts
|\  
| * 38e3977 What will happen now
* | fe5a9c0 Git is not going to like this
|/  
* e3f5d03 Add some experimental changes
* 4f7fe4b Update README.md
* d449cb8 Create a README file
* e3d89d7 Added more meaningful lines
* af11ce3 My first commit

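If you only want to look at an old version of a file, `git show` can print it without switching branches at all. The sketch below shows both approaches in a temporary repository; the identity is a placeholder and `HEAD~1` (the commit before the latest one) is used in place of a commit ID.

```bash
set -e
repo=$(mktemp -d)        # throwaway directory for the demonstration
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "Hello World" > README.md
git add README.md
git commit -q -m "My first commit"
echo "Added a new line for fun" >> README.md
git commit -q -am "Added more meaningful lines"

# Print README.md as it was one commit ago, without changing any files
git show HEAD~1:README.md

# Create a branch at that older commit and switch to it
git checkout -q -b old_code HEAD~1
cat README.md            # the working copy now matches the old version
git log --oneline        # history on this branch stops at the chosen commit
```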