# Git: Version control

## Wikipedia

“Revision control, also known as version control, source control
or software configuration management (SCM), is the
**management of changes to documents, programs, and other
information stored as computer files.**”

**Reproducibility?**

* Tracking and recreating every step of your work
* In the software world: it's called *Version Control*!

What do (good) version control tools give you?

* Peace of mind (backups)
* Freedom (exploratory branching)
* Collaboration (synchronization)


## Git is an enabling technology: Use version control for everything

* Write documents (never get `paper_v5_john_jane_final_oct22_really_final.tex` by email again!)
* Write code
* Back up your computer configuration

## The plan for this tutorial

This tutorial is structured in the following way: we will begin with a brief overview of key concepts you need to understand in order for git to really make sense.  We will then dive into hands-on work: after a brief interlude into necessary configuration we will discuss 5 "stages of git" with scenarios of increasing sophistication and complexity, introducing the necessary commands for each stage:
            
1. Local, single-user, linear workflow
2. Single local user, branching
3. Using remotes as a single user
4. Remotes for collaborating in a small team
5. Full-contact gitlab/github: distributed collaboration with large teams
    
In reality, this tutorial only covers stages 1-4, since for #5 there are many software development-oriented tutorials and documents of very high quality online.  But most scientists start working alone with a few files or with a small team, so I feel it's important to first build the key concepts and practices based on problems scientists encounter in their everyday life, without the jargon of the software world.  Once you've become familiar with 1-4, the excellent tutorials that exist about collaborating on github on open-source projects should make sense.

## Very high level picture: an overview of key concepts

The **commit**: *a snapshot of work at a point in time*. Every ball in this diagram represents a commit: a snapshot of all the files in a code repository that we can return to later or compare against. We can also add labels/tags to these commits, for example when we want to develop new features.

![](_images/gitflow.png)

Credit: Gitflow Atlassian

A **hash**: a fingerprint of the content of each commit *and its parent*

In [1]:
import hashlib

# Our first commit
data1 = 'This is the start of my paper2.'
meta1 = 'date: 1/1/12'
hash1 = hashlib.md5((data1 + meta1).encode()).hexdigest()
print('Hash:', hash1)

Hash: 36ccf772af91632e1d223564298f6f4a


In [2]:
# Our second commit, linked to the first
data2 = 'Some more text in my paper...'
meta2 = 'date: 1/2/12'
# Note we add the parent hash here!
hash2 = hashlib.md5((data2 + meta2 + hash1).encode()).hexdigest()
print('Hash:', hash2)

Hash: 3e1b41451d964f91deb5e899097a1964


And this is pretty much the essence of Git!
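The cells above use MD5 and made-up metadata purely as an illustration. As a minimal sketch of what git does for a single file, assuming git's default SHA-1 object format: the file content (a *blob*) is hashed after a short header is prepended. The function name `git_blob_hash` is illustrative, not part of any git API.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object id git assigns to a file with this content."""
    # Git prepends a small header ("blob <size in bytes>" plus a NUL byte)
    # to the raw content before hashing.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_hash(b""))                # id of an empty file
print(git_blob_hash(b"hello world\n"))   # id of a one-line file
```

You can compare the result with `git hash-object <file>` on a file with the same content. Commit objects are hashed in a similar way, but their content lists a tree, the parent commit ids, the author and the message, which is how each commit's hash comes to depend on its parent.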

## First things first: git must be configured before first use

The minimal amount of configuration for git to work without pestering you is to tell it who you are:

### Exercise

Uncomment the following lines and replace `engineer` and the email address with your own name and email.

The preceding `!` tells the notebook to run the line in the system shell (for example `bash`) instead of in `python`.

In [3]:
#!git config --global user.name "engineer"
#!git config --global user.email "engineer@engineers.com"

Next, tell git how you will edit text files. It will often ask you to edit messages and other information, so it needs to know which editor you prefer:

In [4]:
# Put your preferred editor here. If this is not set, git will honor the $EDITOR environment variable
# On Windows: Notepad, Notepad++, Sublime or Atom all work
# On mac/linux: vim, Sublime or Atom are basic options
!git config --global core.editor nvim  # my lightweight neovim editor

# And while we're at it, we also turn on the use of color, which is very useful
!git config --global color.ui "auto"

## Stage 0: Configure GIT

Github offers instructions in its help pages on how to configure the credentials helper for [Mac OSX](https://help.github.com/articles/set-up-git#platform-mac) and [Windows](https://help.github.com/articles/set-up-git#platform-windows).

First we are going to create an SSH key that you can upload to Gitlab so that it can recognize you. The key has a private part, `id_rsa`, that you should keep secret, and a public part, `id_rsa.pub`, that you upload to Gitlab so it knows it's you.

Uncomment the following two cells, and make sure you run them only once, so that the key is generated a single time.

In [5]:
#! ssh-keygen -f ~/.ssh/id_rsa -t rsa -N ''

In [6]:
#! less ~/.ssh/id_rsa.pub

## Stage 1: Local, single-user, linear workflow

Simply type `git` to see a full list of all the 'core' commands.  We'll now go through most of these via small practical exercises:

In [7]:
!git

usage: git [--version] [--help] [-C <path>] [-c <name>=<value>]
           [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
           [-p | --paginate | -P | --no-pager] [--no-replace-objects] [--bare]
           [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
           <command> [<args>]

These are common Git commands used in various situations:

start a working area (see also: git help tutorial)
   clone      Clone a repository into a new directory
   init       Create an empty Git repository or reinitialize an existing one

work on the current change (see also: git help everyday)
   add        Add file contents to the index
   mv         Move or rename a file, a directory, or a symlink
   reset      Reset current HEAD to the specified state
   rm         Remove files from the working tree and from the index

examine the history and state (see also: git help revisions)
   bisect     Use binary search to find the commit that introduced a bug
   grep      

We are going to create a test repo to play with.

### `git init`: create an empty repository

First we create a folder called `playground_repo`:

In [8]:
! mkdir playground_repo

In [9]:
cd playground_repo

/Users/joaquin/files/code/training/git_notebooks/playground_repo


Let's look at what the new folder contains (git has not been involved yet):

In [10]:
ls -la

total 0
drwxr-xr-x   2 joaquin  staff   64 Mar 22 14:06 ./
drwxr-xr-x  11 joaquin  staff  352 Mar 22 14:06 ../


The folder is empty now.

Let's create a new repo in the folder.

In [11]:
! git init

Initialized empty Git repository in /Users/joaquin/files/code/training/git_notebooks/playground_repo/.git/


In [12]:
ls -la

total 0
drwxr-xr-x   3 joaquin  staff   96 Mar 22 14:06 ./
drwxr-xr-x  11 joaquin  staff  352 Mar 22 14:06 ../
drwxr-xr-x   7 joaquin  staff  224 Mar 22 14:06 .git/


Now you can see that there is a hidden folder `.git` (the leading dot marks it as hidden). This folder is where git stores the repository itself:

In [13]:
ls -l .git

total 16
-rw-r--r--  1 joaquin  staff   23 Mar 22 14:06 HEAD
-rw-r--r--  1 joaquin  staff  137 Mar 22 14:06 config
drwxr-xr-x  6 joaquin  staff  192 Mar 22 14:06 hooks/
drwxr-xr-x  4 joaquin  staff  128 Mar 22 14:06 objects/
drwxr-xr-x  4 joaquin  staff  128 Mar 22 14:06 refs/


Now let's edit our first file in the test directory with a text editor... I'm doing it programmatically here for automation purposes, but you'd normally be editing by hand.

In [14]:
!echo "My first bit of text in the repo" > README.md

`ls` lists the contents of the current working directory

In [15]:
ls

README.md


### `git add`: tell git about this new file

In [16]:
!git add README.md

We can now ask git about what happened with `status`:

In [17]:
!git status

On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   README.md



### `git commit`: permanently record our changes in git's database

For now, we are *always* going to call `git commit` either with the `-a` option *or* with specific filenames (`git commit file1 file2...`).  This delays the discussion of an aspect of git called the *index* (often referred to also as the 'staging area') that we will cover later.  Most everyday work in regular scientific practice doesn't require understanding the extra moving parts that the index involves, so on a first round we'll bypass it.  Later on we will discuss how to use it to achieve more fine-grained control of what and how git records our actions.

In [18]:
!git commit -a -m "First commit"

[master (root-commit) 6ce7b8f] First commit
 1 file changed, 1 insertion(+)
 create mode 100644 README.md


In the commit above, we used the `-m` flag to specify a message at the command line.  If we don't do that, git will open the editor we specified in our configuration above and require that we enter a message.  By default, git refuses to record changes that don't have a message to go along with them (though you can obviously 'cheat' by using a meaningless string: git only tries to facilitate best practices, it's not your nanny).

### `git log`: what has been committed so far

In [19]:
!git log

commit 6ce7b8f7e3ee20f1f34c4754bc22c5d01d037462 (HEAD -> master)
Author: joaquin <joaquin@mac>
Date:   Fri Mar 22 14:06:51 2019 +0100

    First commit


### `git diff`: what have I changed?

Let's do a little bit more work... Again, in practice you'll be editing the files by hand; here we do it via shell commands for the sake of automation (and therefore the reproducibility of this tutorial!)

In [20]:
!echo "And now we add a second line..." >> README.md

And now we can ask git what is different:

In [21]:
!git diff

diff --git a/README.md b/README.md
index db447fe..466ed60 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,2 @@
 My first bit of text in the repo
+And now we add a second line...


### The cycle of git virtue: work, commit, work, commit, ...

In [22]:
!git commit -a -m "added second line."

[master 8391fee] added second line.
 1 file changed, 1 insertion(+)


### `git log` revisited

First, let's see what the log shows us now:

In [23]:
!git log

commit 8391fee7881d088a9f8c1bf6c2bfcfe5d66d0701 (HEAD -> master)
Author: joaquin <joaquin@mac>
Date:   Fri Mar 22 14:06:52 2019 +0100

    added second line.

commit 6ce7b8f7e3ee20f1f34c4754bc22c5d01d037462
Author: joaquin <joaquin@mac>
Date:   Fri Mar 22 14:06:51 2019 +0100

    First commit


Sometimes it's handy to see a very summarized version of the log:

In [24]:
!git log --oneline --topo-order --graph

* 8391fee (HEAD -> master) added second line.
* 6ce7b8f First commit


Git supports *aliases:* new names given to command combinations. Let's make this handy shortlog an alias, so we only have to type `git slog` and see this compact log:

In [25]:
# We create our alias (this saves it in git's permanent configuration file):
!git config --global alias.slog "log --oneline --topo-order --graph"

# And now we can use it
!git slog

* 8391fee (HEAD -> master) added second line.
* 6ce7b8f First commit


### `git mv` and `rm`: moving and removing files

While `git add` is used to add files to the list git tracks, we must also tell it if we want their names to change or for it to stop tracking them.  In familiar Unix fashion, the `mv` and `rm` git commands do precisely this:

In [26]:
!git mv README.md README.markdown
!git status

On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	renamed:    README.md -> README.markdown



Note that these changes must be committed too, to become permanent!  In git's world, until something has been committed, it isn't permanently recorded anywhere.

In [27]:
!git commit -a -m "I like this new name better"
!echo "Let's look at the log again:"
!git slog

[master 34406ee] I like this new name better
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename README.md => README.markdown (100%)
Let's look at the log again:
* 34406ee (HEAD -> master) I like this new name better
* 8391fee added second line.
* 6ce7b8f First commit


And `git rm` works in a similar fashion removing the file from your repo.

### Exercise

Add a new file `README2.md`, commit it, make some changes to it, commit them again, and then remove it (and don't forget to commit this last step!).

## Local user, branching

What is a branch?  A branch is a label for a state of a Git repository. It makes it easy to develop features and to go back and forth between the original `master` version and the `feature branch` copy of the files inside the repo.

![](_images/branches.png)

Credit: Gitflow Atlassian

There can be multiple branches alive at any point in time; the working directory reflects the commit referenced by a special pointer called HEAD.  In this example there are two branches, *master* and *develop*:

Once new commits are made on a branch, HEAD and the branch label move with the new commits:

This allows the history of both branches to diverge:

But based on this graph structure, git can compute the necessary information to merge the divergent branches back and continue with a unified line of development:
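The pointer mechanics described above can be sketched with a toy model in plain Python. This is only an illustration under simplifying assumptions, not how git is implemented: the commit ids `c1`..`c4` and the helper `commit` are hypothetical, commits only record their parents, and branches are just names that point at a commit.

```python
# Toy model: commits know their parents, branches are names pointing at a
# commit, and HEAD names the branch that is currently checked out.
commits = {}        # commit id -> list of parent ids
branches = {}       # branch name -> commit id
head = "master"     # the checked-out branch


def commit(new_id):
    """Record a new commit on the current branch and advance its label."""
    parent = branches.get(head)
    commits[new_id] = [parent] if parent else []
    branches[head] = new_id


commit("c1")
commit("c2")
branches["develop"] = branches["master"]   # like `git branch develop`
head = "develop"                           # like `git checkout develop`
commit("c3")                               # only the develop label moves
head = "master"                            # like `git checkout master`
commit("c4")                               # the two histories have diverged

print(branches)
print(commits)
```

Both `c3` and `c4` record `c2` as their parent, which is the shared starting point a later merge relies on.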

Let's now illustrate all of this with a concrete example.

In [28]:
!git status
!ls

On branch master
nothing to commit, working tree clean
README.markdown


We are now going to try two different routes of development: on the `master` branch we will add one file and on the `emojis` branch, which we will create, we will add a different one.  We will then merge the emojis branch into `master`.

In [29]:
!git branch emojis
!git checkout emojis

Switched to branch 'emojis'


In [30]:
!echo "Some emojis :smile:, :horse:, :cat:" > emojis.md
!git add emojis.md
!git commit -a -m "Adding some emojis"
!git slog

[emojis fa34cbb] Adding some emojis
 1 file changed, 1 insertion(+)
 create mode 100644 emojis.md
* fa34cbb (HEAD -> emojis) Adding some emojis
* 34406ee (master) I like this new name better
* 8391fee added second line.
* 6ce7b8f First commit


In [31]:
!git checkout master
!git slog

Switched to branch 'master'
* 34406ee (HEAD -> master) I like this new name better
* 8391fee added second line.
* 6ce7b8f First commit


In [32]:
!ls

README.markdown


As you can see, there is no emojis file in master yet.

In [33]:
!echo "All the while, more work goes on in master..." >> README.markdown
!git commit -a -m "The mainline keeps moving"
!git slog

[master 01fa505] The mainline keeps moving
 1 file changed, 1 insertion(+)
* 01fa505 (HEAD -> master) The mainline keeps moving
* 34406ee I like this new name better
* 8391fee added second line.
* 6ce7b8f First commit


In [34]:
ls

README.markdown


In [35]:
!git merge emojis -m 'merge emojis'
!git slog

Merge made by the 'recursive' strategy.
 emojis.md | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 emojis.md
*   060a794 (HEAD -> master) merge emojis
|\  
| * fa34cbb (emojis) Adding some emojis
* | 01fa505 The mainline keeps moving
|/  
* 34406ee I like this new name better
* 8391fee added second line.
* 6ce7b8f First commit
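To merge, git first needs the point where the two histories split. As a rough sketch of that idea (real git handles more cases, such as several equally good candidates), the snippet below encodes the parent links read off the graph above and looks for the most recent common ancestor. The function names `ancestors` and `merge_base` are illustrative choices for this sketch.

```python
# Parent links read off the log graph above (abbreviated hashes).
parents = {
    "6ce7b8f": [],
    "8391fee": ["6ce7b8f"],
    "34406ee": ["8391fee"],
    "fa34cbb": ["34406ee"],             # tip of emojis
    "01fa505": ["34406ee"],             # tip of master before the merge
    "060a794": ["01fa505", "fa34cbb"],  # merge commit: two parents
}


def ancestors(commit_id):
    """Return the commit itself plus everything reachable through parents."""
    seen, stack = set(), [commit_id]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_base(a, b):
    """Return a common ancestor that no other common ancestor descends from."""
    common = ancestors(a) & ancestors(b)
    # Drop any common ancestor that is a proper ancestor of another one.
    best = [c for c in common
            if not any(c != d and c in ancestors(d) for d in common)]
    return best[0]


print(merge_base("01fa505", "fa34cbb"))
```

For this history the result is `34406ee`, the commit where `emojis` was created. You can check the same thing in the repository with `git merge-base master emojis`.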


## Using remotes as a single user

We are now going to introduce the concept of a *remote repository*: a pointer to another copy of the repository that lives in a different location.  This can be simply a different path on the filesystem or a server on the internet.

For this discussion, we'll be using remotes hosted on the [Gitlab.com](http://gitlab.psiquantum.com) service, but you can equally use other services like [BitBucket](http://bitbucket.org) or [Github](https://github.com/) as well as host your own.

Let's see if we have any remote repositories here:

In [36]:
!git remote -v

Since the above cell didn't produce any output after the `git remote -v` call, it means we have no remote repositories configured.  We will now proceed to do so.  Once logged into Gitlab, go to the new repository page and make a repository called `test`.  Do **not** check the box that says `Initialize this repository with a README`, since we already have an existing repository here.  That option is useful when you're starting first at Github and don't have a repo made already on a local computer.

We can now follow the instructions from the next page:

### An important aside: conflict management

While git is very good at merging, if two different branches modify the same file in the same location, it simply can't decide which change should prevail.  At that point, human intervention is necessary to make the decision.  Git will help you by marking the location in the file that has a problem, but it's up to you to resolve the conflict.  Let's see how that works by intentionally creating a conflict.

We start by creating a branch and making a change to our experiment file:

In [37]:
!git branch trouble
!git checkout trouble
!echo "This is going to be a problem..." >> README.markdown
!git commit -a -m "Adding a file for trouble"

Switched to branch 'trouble'
[trouble 5bca89d] Adding a file for trouble
 1 file changed, 1 insertion(+)


And now we go back to the master branch, where we change the *same* file:

In [38]:
!git checkout master
!echo "At the same time master keeps working on same line will cause a MERGE CONFLICT ..." >> README.markdown
!git commit -a -m "Keep working on the experiment"

Switched to branch 'master'
[master 50b8526] Keep working on the experiment
 1 file changed, 1 insertion(+)


So now let's see what happens if we try to merge the `trouble` branch into `master`:

In [39]:
!git checkout master

Already on 'master'


In [40]:
!git merge trouble

Auto-merging README.markdown
CONFLICT (content): Merge conflict in README.markdown
Automatic merge failed; fix conflicts and then commit the result.


Let's see what git has put into our file:
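As a rough sketch of what to expect: git wraps the two competing versions in conflict markers, with the current branch's version first. The small helper below reproduces that layout for the two lines added in the cells above. `mark_conflict` is a hypothetical name, not part of git, and the labels `HEAD` and `trouble` are assumed to be the ones git shows when merging `trouble` into the checked-out branch.

```python
def mark_conflict(ours, theirs, ours_label="HEAD", theirs_label="trouble"):
    """Lay out two competing versions the way git does in a conflicted file."""
    return "\n".join([
        "<<<<<<< " + ours_label,    # start of the current branch's version
        ours,
        "=======",                  # separator between the two versions
        theirs,
        ">>>>>>> " + theirs_label,  # end of the merged-in branch's version
    ])


print(mark_conflict(
    "At the same time master keeps working on same line will cause a MERGE CONFLICT ...",
    "This is going to be a problem...",
))
```

In the real file, the lines that both branches agree on appear unchanged above the marked region. Resolving the conflict means editing the file so that none of the three marker lines remain.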

At this point, we go into the file with a text editor, decide which changes to keep, and make a new commit that records our decision.  In this case I decided that both pieces of text were useful, and integrated them with some changes.

Let's then make our new commit:

In [None]:
#!git commit -a -m "Completed merge of trouble, fixing conflicts along the way"
#!git slog

*Note:* While it's a good idea to understand the basics of fixing merge conflicts by hand, in some cases you may find the use of an automated tool useful.  Git supports multiple [merge tools](https://www.kernel.org/pub/software/scm/git/docs/git-mergetool.html): a merge tool is a piece of software that conforms to a basic interface and knows how to merge two files into a new one.  Since these are typically graphical tools, there are several to choose from for the different operating systems, and as long as they obey a basic command structure, git can work with any of them.

## Collaborating on git with a small team

Single remote with shared access: we are going to set up a shared collaboration with one partner (the person sitting next to you).  This will show the basic workflow of collaborating on a project with a small team where everyone has write privileges to the same repository.  

Note for SVN users: this is similar to the classic SVN workflow, with the distinction that commit and push are separate steps.  SVN, having no local repository, commits directly to the shared central resource, so to a first approximation you can think of `svn commit` as being synonymous with `git commit; git push`.

We will have two people, let's call them Alice and Bob, sharing a repository.  Alice will be the owner of the repo and she will give Bob write privileges.  

We begin with a simple synchronization example, much like we just did above, but now between *two people* instead of one person.  Otherwise it's the same:

- Bob clones Alice's repository.
- Bob makes changes to a file and commits them locally.
- Bob pushes his changes to github.
- Alice pulls Bob's changes into her own repository.

Next, we will have both parties make non-conflicting changes each, and commit them locally.  Then both try to push their changes:

- Alice adds a new file, `alice.txt` to the repo and commits.
- Bob adds `bob.txt` and commits.
- Alice pushes to github.
- Bob tries to push to github.  What happens here?

The problem is that the remote now contains Alice's commit, which Bob's local history does not include, so git refuses the push even though the two changes touch different files.  It forces Bob to first do the merge on his machine, so that if there is a conflict in the merge, Bob deals with the conflict manually (git could try to do the merge on the server, but in that case if there's a conflict, the server repo would be left in a conflicted state without a human to fix things up).  The solution is for Bob to first pull the changes (pull in git is really fetch+merge), and then push again.
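The rule behind the rejected push can be sketched in a few lines of Python: a push is accepted only when the remote's current tip is already part of the history being pushed (a "fast-forward"). This is a simplified model under that assumption. The commit names and the helpers `is_ancestor` and `try_push` are hypothetical.

```python
parents = {
    "base": [],
    "alice": ["base"],               # Alice's commit adding alice.txt
    "bob": ["base"],                 # Bob's commit adding bob.txt
    "bob-merge": ["bob", "alice"],   # Bob's merge commit after pulling
}


def is_ancestor(older, newer):
    """True if `older` is reachable from `newer` by following parents."""
    stack, seen = [newer], set()
    while stack:
        current = stack.pop()
        if current == older:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return False


def try_push(remote_tip, local_tip):
    """Accept the push only when it would be a fast-forward for the remote."""
    if is_ancestor(remote_tip, local_tip):
        return local_tip, "accepted"
    return remote_tip, "rejected (pull first)"


remote = "base"
remote, result = try_push(remote, "alice")      # Alice pushes first
print(result)
remote, result = try_push(remote, "bob")        # Bob's history lacks Alice's commit
print(result)
remote, result = try_push(remote, "bob-merge")  # after pulling, Bob retries
print(result)
```

The second push is refused because `alice` is not an ancestor of `bob`. Once Bob's pull has produced a merge commit with both as parents, the same check passes.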

## Full-contact with gitlab/github: distributed collaboration with large teams

Multiple remotes and merging based on pull request workflow: this is beyond the scope of this brief tutorial, so we'll simply discuss how it works very briefly, illustrating it with the activity on the [IPython github repository](http://github.com/ipython/ipython).

## Git resources

### Introductory materials

There are lots of good tutorials and introductions for Git, which you
can easily find yourself; this is just a short list of things I've found
useful.  For a beginner, I would recommend the following 'core' reading list, and
below I mention a few extra resources:

1. The smallest, and in the style of this tutorial: [git - the simple guide](http://rogerdudler.github.com/git-guide)
contains 'just the basics'.  Very quick read.

1.  The concise [Git Reference](http://gitref.org): compact but with
    all the key ideas. If you only read one document, make it this one.

1. In my own experience, the most useful resource was [Understanding Git
Conceptually](http://www.sbf5.com/~cduan/technical/git).
Git has a reputation for being hard to use, but I have found that with a
clear view of what is actually a *very simple* internal design, its
behavior is remarkably consistent, simple and comprehensible.

1.  For more detail, see the start of the excellent [Pro
    Git](http://progit.org/book) online book, or similarly the early
    parts of the [Git community book](http://book.git-scm.com). Pro
    Git's chapters are very short and well illustrated; the community
    book tends to have more detail and has nice screencasts at the end
    of some sections.

If you are really impatient and just want a quick start, this [visual git tutorial](http://www.ralfebert.de/blog/tools/visual_git_tutorial_1)
may be sufficient. It is nicely illustrated with diagrams that show what happens on the filesystem.

For windows users, [an Illustrated Guide to Git on Windows](http://nathanj.github.com/gitguide/tour.html) is useful in that
it contains also some information about handling SSH (necessary to interface with git hosted on remote servers when collaborating) as well
as screenshots of the Windows interface.

Cheat sheets
:   Two different
    [cheat](http://zrusin.blogspot.com/2007/09/git-cheat-sheet.html)
    [sheets](http://jan-krueger.net/development/git-cheat-sheet-extended-edition)
    in PDF format that can be printed for frequent reference.

### Beyond the basics

At some point, it will pay off to understand how git itself is *built*.  These two documents, written in a similar spirit, 
are probably the most useful descriptions of the Git architecture short of diving into the actual implementation.  They walk you through
how you would go about building a version control system with a little story. By the end you realize that Git's model is almost
an inevitable outcome of the proposed constraints:

* The [Git parable](http://tom.preston-werner.com/2009/05/19/the-git-parable.html) by Tom Preston-Werner.
* [Git foundations](http://matthew-brett.github.com/pydagogue/foundation.html) by Matthew Brett.

[Git ready](http://www.gitready.com)
:   A great website of posts on specific git-related topics, organized
    by difficulty.

[Git Magic](http://www-cs-students.stanford.edu/~blynn/gitmagic/index.html)
:   Another book-size guide that has useful snippets.

The [learning center](http://learn.github.com) at Github
:   Guides on a number of topics, some specific to github hosting but
    much of it of general value.

A [port](http://cworth.org/hgbook-git/tour) of the Hg book's beginning
:   The [Mercurial book](http://hgbook.red-bean.com) has a reputation
    for clarity, so Carl Worth decided to
    [port](http://cworth.org/hgbook-git/tour) its introductory chapter
    to Git. It's a nicely written intro, which is possible in good
    measure because of how similar the underlying models of Hg and Git
    ultimately are.

[Intermediate tips](http://andyjeffries.co.uk/articles/25-tips-for-intermediate-git-users)
:   A set of tips that contains some very valuable nuggets, once you're
    past the basics.

Finally, if you prefer a video presentation, this 1-hour tutorial prepared by the GitHub educational team will walk you through the entire process:

### A few useful tips for common tasks

#### Better shell support

Adding git branch info to your bash prompt and tab completion for git commands and branches is extremely useful.  I suggest you at least copy:

- [git-completion.bash](https://github.com/git/git/blob/master/contrib/completion/git-completion.bash)
- [git-prompt.sh](https://github.com/git/git/blob/master/contrib/completion/git-prompt.sh)
 
You can then source both of these files in your `~/.bashrc` and then set your prompt (I'll assume you named them as the originals but starting with a `.` at the front of the name):

    source $HOME/.git-completion.bash
    source $HOME/.git-prompt.sh
    PS1='[\u@\h \W$(__git_ps1 " (%s)")]\$ '   # adjust this to your prompt liking

See the comments in both of those files for lots of extra functionality they offer.

# References

**Note:** this tutorial is based on Fernando Perez's Git notebook tutorial and has some ideas from the other links:

- [Fernando Perez's Git notebook](https://github.com/fperez/reprosw)
- [J.R. Johansson](https://github.com/jrjohansson)'s [tutorial on version control](http://nbviewer.ipython.org/urls/raw.github.com/jrjohansson/scientific-python-lectures/master/Lecture-7-Revision-Control-Software.ipynb) 
- ["Git for Scientists: A Tutorial"](https://github.com/johnmcdonnell/Git-Tutorial) by John McDonnell 
- Emanuele Olivetti's lecture notes and exercises from the G-Node summer school on [Advanced Scientific Programming in Python](https://python.g-node.org/wiki/schedule).
- [Pro Git book](http://git-scm.com/book) 
