# Git: Version control

---

“Revision control, also known as version control, source control
or software configuration management (SCM), is the
**management of changes to documents, programs, and other
information stored as computer files.**”

**Reproducibility?**

* Tracking and recreating every step of your work
* In the software world: it's called *Version Control*!

**What do (good) version control tools give you?**

* Peace of mind (backups)
* Freedom (exploratory branching)
* Collaboration (synchronization)

Useful links: 

http://rogerdudler.github.io/git-guide/

https://inst.eecs.berkeley.edu/~cs61b/sp20/docs/using-git.html

---


In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo('8LOGXA4qkCc', width=800, height=300)

In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo('hwP7WQkmECE', width=800, height=300)

## Git is an enabling technology: You can use version control for everything!

* Write code
* Write documents (never get `paper_v5_john_jane_final_oct22_really_final.tex` by email again!)
* Even backup your computer configuration!

## The plan for this tutorial

This tutorial is structured in the following way: we will begin with a brief overview of key concepts you need to understand in order for git to really make sense.  We will then dive into hands-on work: after a brief interlude into necessary configuration we will discuss 4 "stages of git" with scenarios of increasing sophistication and complexity, introducing the necessary commands for each stage:
            
1. Local, single-user workflow
2. Local user, branching
3. Using remotes as a single user
4. Collaborating on git with a small team, or in a research group

**Our goal is to reach step 4, together as a lab**

---

## Very high level picture: an overview of key concepts

The **commit**: *a snapshot of work at a point in time* 

Every ball in this diagram represents a commit: a snapshot of all the files in a code repository that we can later go back to or compare against. We can also attach labels (tags and branches) to these commits, for example when we want to develop new features.

![](_images/gitflow.png)

Credit: Gitflow Atlassian
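To make the idea of a commit as a snapshot more concrete, here is a toy Python model. It is only a sketch for intuition, not how git is implemented: the `Commit` class and the `make_commit` helper are hypothetical names, and the id is simply a SHA-1 hash over the snapshot, the message and the parent id.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Commit:
    """Toy commit: a full snapshot of the files plus a link to its parent."""
    files: Dict[str, str]          # filename -> contents at this point in time
    message: str
    parent: Optional["Commit"]     # None for the very first commit
    commit_id: str


def make_commit(files, message, parent=None):
    """Build a commit whose id depends on its contents and on its parent."""
    parent_id = parent.commit_id if parent else ""
    payload = repr(sorted(files.items())) + message + parent_id
    commit_id = hashlib.sha1(payload.encode("utf-8")).hexdigest()
    return Commit(dict(files), message, parent, commit_id)


first = make_commit({"README.md": "My first bit of text in the repo\n"}, "First commit")
second = make_commit(
    {"README.md": "My first bit of text in the repo\nAnd now we add a second line...\n"},
    "added second line.",
    parent=first,
)

# Walk the history from newest to oldest, like a tiny `git log`.
history = []
commit = second
while commit is not None:
    history.append(commit.message)
    print(commit.commit_id[:7], commit.message)
    commit = commit.parent
```

Because each id depends on the parent id, changing an old commit in this model would change the id of every commit that follows it.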

## Stage 0: Configure GIT

The minimal amount of configuration for git to work without pestering you is to tell it who you are. Replace `"username"` and `"email"` below with your own name and email address.

The leading `!` tells the notebook to run the line in the system shell (the command line, for example `bash`) instead of in `python`:

In [None]:
!git config --global user.name "username"
!git config --global user.email "email"

---

## 1. Local, single-user workflow

Simply type `git` to see a full list of all the 'core' commands. 

In [None]:
!git

**We are going to create a test repo to play with**

### `git init`: create an empty repository

First we create a folder called `playground_repo`:

In [None]:
!pwd

In [None]:
!mkdir playground_repo

In [None]:
cd playground_repo

Let's look at what is in the new folder:

In [None]:
ls -la

The folder is empty for now.

Let's create a new repo in the folder.

In [None]:
! git init

In [None]:
ls -la

Now you can see that there is a hidden folder `.git` (notice the leading dot, which marks it as hidden); this folder is the git repository itself.

In [None]:
ls -l .git
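The same inspection can be done from Python. The sketch below creates a throwaway repository in a temporary directory (so it does not touch `playground_repo`), lists what `git init` put inside `.git`, and reads the `HEAD` file. It assumes `git` is available on the PATH; the exact listing varies between git versions.

```python
import subprocess
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Create a throwaway repository so we do not touch playground_repo.
    subprocess.run(["git", "init"], cwd=tmp, check=True, capture_output=True)

    git_dir = Path(tmp) / ".git"
    entries = sorted(p.name for p in git_dir.iterdir())
    head_contents = (git_dir / "HEAD").read_text().strip()

print(entries)
print(head_contents)   # a line of the form "ref: refs/heads/<branch name>"
```

The `HEAD` file is just a small text file naming the current branch, which is worth keeping in mind when branches are introduced later.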

Now let's edit our first file in the test directory with a text editor... I'm doing it programmatically here for automation purposes, but you'd normally be editing by hand.

In [None]:
!echo "My first bit of text in the repo" > README.md

`ls` lists the contents of the current working directory

In [None]:
ls

### `git add`: tell git about this new file

In [None]:
!git add README.md

We can now ask git about what happened with `status`:

In [None]:
!git status

### `git commit`: permanently record our changes in git's database

For now, we call `git commit` with the `-a` and `-m` options, although `-m` alone is enough once the changes have been staged with `git add`.

With `-a`, this command commits a snapshot of all changes to tracked files in the working directory (those that have been added with `git add` at some point in their history). New, untracked files are not included.

In [None]:
!git commit -a -m "First commit"

In the commit above, we used the `-m` flag to specify a message at the command line.
* If we don't do that, git will open a text editor (the one set via `core.editor`, or your environment's default) and require that we enter a message.
* By default, git refuses to record changes that don't have a message to go along with them.
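To see that `git commit -a` only picks up files git already tracks, the following sketch runs the experiment in a temporary repository. It assumes `git` is on the PATH; the `git` helper function and the file names `tracked.txt` and `untracked.txt` are hypothetical, and a throwaway identity is passed with `-c` so the example does not depend on your global configuration.

```python
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run git in `cwd` with a throwaway identity and return its stdout."""
    cmd = ["git", "-c", "user.name=Example User",
           "-c", "user.email=user@example.com", *args]
    return subprocess.run(cmd, cwd=cwd, check=True,
                          capture_output=True, text=True).stdout


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    git("init", cwd=repo)

    # Track one file the usual way: add, then commit.
    (repo / "tracked.txt").write_text("version 1\n")
    git("add", "tracked.txt", cwd=repo)
    git("commit", "-m", "Track one file", cwd=repo)

    # Modify the tracked file and create a brand new file that is never added.
    (repo / "tracked.txt").write_text("version 2\n")
    (repo / "untracked.txt").write_text("never added\n")
    git("commit", "-a", "-m", "Commit with -a", cwd=repo)

    committed_files = git("ls-files", cwd=repo).split()
    status = git("status", "--porcelain", cwd=repo)

print(committed_files)   # files known to git after the second commit
print(status)            # lines starting with ?? are still untracked
```

The new file only becomes part of the history after an explicit `git add`.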

### `git log`: what has been committed so far

In [None]:
!git log

### `git diff`: what have I changed?

Let's do a little bit more work... Again, in practice you'll be editing the files by hand; here we do it via shell commands for the sake of automation (and therefore the reproducibility of this tutorial!).

In [None]:
!echo "And now we add a second line..." >> README.md

And now we can ask git what is different:

In [None]:
!git diff
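The output of `git diff` is in the unified diff format: lines starting with `+` were added and lines starting with `-` were removed. Python's standard `difflib` module produces a similar format, so the sketch below (which uses `difflib`, not git's own diff machinery) reproduces the core of what the cell above shows for `README.md`.

```python
import difflib

before = ["My first bit of text in the repo\n"]
after = before + ["And now we add a second line...\n"]

diff_lines = list(
    difflib.unified_diff(before, after, fromfile="a/README.md", tofile="b/README.md")
)
print("".join(diff_lines), end="")
```

The real `git diff` output contains a couple of extra header lines, but the hunk with the added line reads the same way.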

### The cycle of git virtue: work, commit, work, commit, ...

In [None]:
!git commit -a -m "added second line."

### `git log` revisited

First, let's see what the log shows us now:

In [None]:
!git log

## Exercise

Add a new file `README2.md`, commit it, make some changes to it, commit them again, and then remove it (and don't forget to commit this last step!).

In [None]:
!echo "new file" > README2.md

In [None]:
!git add README2.md

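In [None]:
!git commit -a -m "added new readme file"

In [None]:
!echo "some changes" >> README2.md
!git commit -a -m "changed new readme file"

In [None]:
!git rm README2.md
!git commit -m "removed new readme file"

In [None]:
!git log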

---

## 2. Local user, branching

What is a branch?  A branch is a label for a state of a git repository. It makes it easy to develop features and to go back and forth between the original `master` version and the `develop` copy of the files inside the repo.

![](_images/branches.png)

Credit: Gitflow Atlassian

There can be multiple branches alive at any point in time; the working directory reflects the commit that a special pointer called HEAD refers to.

In this example there are two branches, *master* and *develop*.

Once new commits are made on a branch, HEAD and the branch label move with the new commits.

**This allows the history of both branches to diverge.**

But based on this graph structure, git can compute the necessary information to merge the divergent branches back and continue with a unified line of development.
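The pointer picture can be sketched in a few lines of Python. This is a toy model for intuition only, not git's implementation: commit ids such as `c1` and `d1` are made up, branches are entries in a dictionary that point at a commit id, and `head` names the branch we are currently on.

```python
# Toy model: commits are ids with a parent link, branches are names that
# point at a commit id, and `head` names the branch we are currently on.
parents = {"c1": None, "c2": "c1"}             # commit id -> parent commit id
branches = {"master": "c2", "develop": "c2"}   # like `git branch develop`
head = "develop"                               # like `git checkout develop`


def commit(new_id):
    """Record a new commit on the current branch and move its label forward."""
    parents[new_id] = branches[head]
    branches[head] = new_id


def history(branch):
    """Return the commit ids reachable from a branch, newest first."""
    result, current = [], branches[branch]
    while current is not None:
        result.append(current)
        current = parents[current]
    return result


commit("d1")        # work on develop
head = "master"     # like `git checkout master`
commit("m1")        # meanwhile, master moves on too

print(history("develop"))   # ['d1', 'c2', 'c1']
print(history("master"))    # ['m1', 'c2', 'c1']
```

Both histories share `c2` and `c1` but have diverged at their tips, which is exactly the situation a merge has to reconcile.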

**Let's now illustrate all of this with a concrete example.**

In [None]:
!git status

We are now going to try two different routes of development: on the `master` branch we will keep editing `README.md`, and on the `develop` branch, which we will create, we will add a new file.  We will then merge the `develop` branch into `master`.

In [None]:
!git branch develop
!git checkout develop

In [None]:
!echo "Definitely needs improvement!" > code_to_develop.md
!git add code_to_develop.md
!git commit -a -m "Working on a fix"
!git log

In [None]:
!git checkout master

In [None]:
!ls

As you can see, there are no files from the `develop` branch in `master` yet!

In [None]:
!echo "All the while, more work goes on in master..." >> README.md
!git commit -a -m "The main branch master keeps moving"

In [None]:
!git merge develop -m "features developed, merge back"

### An important aside: conflict management

While git is very good at merging, if two different branches modify the same file in the same location, it simply can't decide which change should prevail.  At that point, human intervention is necessary to make the decision.  Git will help you by marking the location in the file that has a problem, but it's up to you to resolve the conflict.  
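The decision git faces can be sketched as a three-way comparison between the common ancestor (`base`) and the two branch versions. The `merge_line` function below is a hypothetical toy that handles a single piece of text; real git works on regions of lines within files and labels the markers with `HEAD` and the name of the merged branch rather than `ours` and `theirs`.

```python
def merge_line(base, ours, theirs):
    """Three-way merge of one piece of text; returns (result, conflicted)."""
    if ours == theirs:          # both sides agree (or neither changed)
        return ours, False
    if ours == base:            # only the other branch changed it
        return theirs, False
    if theirs == base:          # only our branch changed it
        return ours, False
    # Both sides changed the same spot differently: a human must decide.
    marked = f"<<<<<<< ours\n{ours}\n=======\n{theirs}\n>>>>>>> theirs"
    return marked, True


base = "My first bit of text in the repo"
clean, clean_conflict = merge_line(base, base, "An edit made only on one branch")
marked, has_conflict = merge_line(
    base,
    "At the same time master keeps working ...",
    "This is going to be a problem...",
)

print(clean_conflict, has_conflict)   # False True
print(marked)
```

Only the last case needs human intervention; every other combination can be resolved automatically.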

Let's see how that works by intentionally creating a conflict. We start by creating a branch and making a change to `README.md`:

In [None]:
!git branch trouble
!git checkout trouble
!echo "This is going to be a problem..." >> README.md
!git commit -a -m "Adding a file for trouble"

And now we go back to the `master` branch, where we change the *same* file:

In [None]:
!git checkout master
!echo "At the same time master keeps working on same line will cause a MERGE CONFLICT ..." >> README.md
!git commit -a -m "Keep working on the experiment"

In [None]:
!git merge trouble

At this point, we open `README.md` in a text editor, decide which changes to keep, and make a new commit that records our decision.  Git marks the conflicting region in the file with `<<<<<<<`, `=======` and `>>>>>>>` lines; these markers must be removed as part of the edit.  I've now made the edits; in this case I decided that both pieces of text were useful, but integrated them with some changes:

In [None]:
!git commit -a -m "Completed merge of trouble, fixed conflicts along the way"
!git log

---

## 3. Using remotes as a single user

We are now going to introduce the concept of a *remote repository*: a pointer to another copy of the repository that lives in a different location.

An example of this is [our repo](https://github.com/ucsb/tutorial_series) on our group's Github.

When working with a remote, we use `git pull` to download changes from it and `git push` to upload our commits to it.

Let's see if we have any remote repositories in this directory:

In [None]:
#!git remote remove origin

In [None]:
!git remote -v

Since the `git remote -v` call above produced no output, we have no remote repositories configured.  We will now add one.

Go to your Github profile, go to the repositories page and create a new repository called `test`. Do **not** check the box that says `Initialize this repository with a README`, since we already have an existing repository here. You can make it private if you want. In the next cell, replace `aisichenko` with your own Github username.

In [None]:
!git remote add origin https://github.com/aisichenko/test

Now, we `push` the code in `playground_repo` to the repository `test`

In [None]:
!git push --set-upstream origin master

Now check out your repository `test` on Github.com. You'll see everything there!

Let's practice with a branch:

In [None]:
!git branch new_branch
!git checkout new_branch

In [None]:
!echo "adding things under new branch" >> README.md
!git commit -a -m "adding things"

In [None]:
!git push origin new_branch

Now you'll see on Github.com that there is a new branch!

In [None]:
!git checkout master
!git merge new_branch

Since we already set the upstream for `master` with `--set-upstream`, you can now just write `git push`:

In [None]:
!git push

**That's it! You just created your own repository and experimented with branches!**

---

## 4. Collaborating on git with a small team, or in a research group

This will show the basic workflow of collaborating on a project with a small team where everyone has write privileges to the same repository.  

First, let's create a new repository called "test" on your personal Github. Then we clone it:

`git clone https://github.com/aisichenko/test.git`


In [None]:
!git clone https://github.com/aisichenko/test.git

In [None]:
cd test

In [None]:
!ls -la

We all clone this repository, wherever we like on our computers. I always suggest a dedicated folder such as `/Users/user/Documents/code`. For now, the clone lives inside `playground_repo`.

Let's say Andrei writes some new code.

In [None]:
!echo "#Andrei's code additions" > andrei_code.py

In [None]:
!git add andrei_code.py

In [None]:
!git commit -m "andrei added code"

In [None]:
!git push

Great. Now Bob wants to use it, so he first clones the repository and makes a change to the `andrei_code.py` file. He pushes his changes to Github. Then Andrei pulls Bob's changes to his local machine.

Next, we will have both parties make non-conflicting changes and commit them locally. Then they both try to push their changes:

* Andrei adds a new file, `andrei_other_code.py` and commits.
* Bob adds `Bob_code.py` then commits.
* Andrei does `git push` to Github
* Bob tries to push to Github. What happens here?

The problem is that Bob's local history does not yet contain Andrei's new commit, so the push would not be a simple fast-forward and git refuses it, even though the two changes do not conflict.  Git forces Bob to first do the merge on his machine, so that if there is a conflict in the merge, Bob deals with it manually (git could try to do the merge on the server, but then a conflict would leave the server repo in a conflicted state without a human to fix things up).  The solution is for Bob to first `git pull` the changes (pull in git is really fetch+merge), and then push again.
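The rule the server applies can be sketched as a reachability check: a push is accepted only if the commit the remote branch currently points at is an ancestor of the commit being pushed. The code below is a toy model with made-up commit ids (`shared`, `andrei_1`, `bob_1`, `bob_merge`) and hypothetical helper names, not git's actual implementation.

```python
def is_ancestor(parents, ancestor, descendant):
    """True if `ancestor` can be reached from `descendant` via parent links."""
    to_visit, seen = [descendant], set()
    while to_visit:
        current = to_visit.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            to_visit.extend(parents[current])
    return False


def try_push(parents, remote_tip, local_tip):
    """Accept the push only if it would fast-forward the remote branch."""
    return "accepted" if is_ancestor(parents, remote_tip, local_tip) else "rejected"


# Shared starting point, then Andrei and Bob each commit on top of it.
parents = {"shared": (), "andrei_1": ("shared",), "bob_1": ("shared",)}
remote_tip = "shared"

andrei_result = try_push(parents, remote_tip, "andrei_1")
remote_tip = "andrei_1"        # the server now points at Andrei's commit

bob_result = try_push(parents, remote_tip, "bob_1")   # Bob has not seen andrei_1

# After `git pull`, Bob has a merge commit with both commits as parents.
parents["bob_merge"] = ("bob_1", "andrei_1")
bob_retry = try_push(parents, remote_tip, "bob_merge")

print(andrei_result, bob_result, bob_retry)   # accepted rejected accepted
```

Bob's first push is refused purely because of the shape of the history; the contents of the files never enter into the decision.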

**That's it!**

---

#### The key take-away workflow:

About 90% of the time, your workflow will be this:

`git clone` to clone a repository from Github.com

`git add .` to stage all changes (new, modified and deleted files) in the current directory and below

`git commit -m "commit message here"` to commit what has been staged

`git pull` to bring in new changes from the remote repository. Do this before pushing.

`git push` to push your commits to the Github repository you cloned from. You don't need to specify anything else.
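The whole cycle can be rehearsed without a Github account by letting a local bare repository play the role of the remote. The sketch below is one way to do that under a few assumptions: `git` is on the PATH, everything happens in a temporary directory, and the `git` helper function, the folder names `andrei` and `bob`, and the throwaway identity are all made up for the example.

```python
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run git in `cwd` with a throwaway identity and return its stdout."""
    cmd = ["git", "-c", "user.name=Example User",
           "-c", "user.email=user@example.com", *args]
    return subprocess.run(cmd, cwd=cwd, check=True,
                          capture_output=True, text=True).stdout


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # A bare repository stands in for the repository hosted on Github.
    git("init", "--bare", "remote.git", cwd=root)

    git("clone", "remote.git", "andrei", cwd=root)         # git clone
    andrei = root / "andrei"
    (andrei / "andrei_code.py").write_text("# Andrei's code additions\n")
    git("add", ".", cwd=andrei)                             # git add .
    git("commit", "-m", "andrei added code", cwd=andrei)    # git commit -m
    git("push", "-u", "origin", "HEAD", cwd=andrei)         # first push sets upstream

    git("clone", "remote.git", "bob", cwd=root)             # Bob clones and contributes
    bob = root / "bob"
    (bob / "bob_code.py").write_text("# Bob's code\n")
    git("add", ".", cwd=bob)
    git("commit", "-m", "bob added code", cwd=bob)
    git("push", cwd=bob)                                    # git push

    git("pull", cwd=andrei)                                 # git pull brings in Bob's work
    andrei_files = sorted(p.name for p in andrei.iterdir() if p.name != ".git")
    log_subjects = git("log", "--format=%s", cwd=andrei).splitlines()

print(andrei_files)
print(log_subjects)
```

After the final pull, Andrei's working directory contains both files and his log shows both commits, newest first.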

---

## Git resources

### Introductory materials

There are lots of good tutorials and introductions for Git, which you
can easily find yourself; this is just a short list of things I've found
useful.  For a beginner, I would recommend the following 'core' reading list, and
below I mention a few extra resources:

1. The smallest, and in the style of this tutorial: [git - the simple guide](http://rogerdudler.github.com/git-guide)
contains 'just the basics'.  Very quick read.

1.  The concise [Git Reference](http://gitref.org): compact but with
    all the key ideas. If you only read one document, make it this one.

1. In my own experience, the most useful resource was [Understanding Git
Conceptually](http://www.sbf5.com/~cduan/technical/git).
Git has a reputation for being hard to use, but I have found that with a
clear view of what is actually a *very simple* internal design, its
behavior is remarkably consistent, simple and comprehensible.

1.  For more detail, see the start of the excellent [Pro
    Git](http://progit.org/book) online book, or similarly the early
    parts of the [Git community book](http://book.git-scm.com). Pro
    Git's chapters are very short and well illustrated; the community
    book tends to have more detail and has nice screencasts at the end
    of some sections.

If you are really impatient and just want a quick start, this [visual git tutorial](http://www.ralfebert.de/blog/tools/visual_git_tutorial_1)
may be sufficient. It is nicely illustrated with diagrams that show what happens on the filesystem.

For Windows users, [an Illustrated Guide to Git on Windows](http://nathanj.github.com/gitguide/tour.html) is useful in that
it also contains some information about handling SSH (necessary to interface with git hosted on remote servers when collaborating) as well
as screenshots of the Windows interface.

Cheat sheets
:   Two different
    [cheat](http://zrusin.blogspot.com/2007/09/git-cheat-sheet.html)
    [sheets](http://jan-krueger.net/development/git-cheat-sheet-extended-edition)
    in PDF format that can be printed for frequent reference.

### Beyond the basics

At some point, it will pay off to understand how git itself is *built*.  These two documents, written in a similar spirit, 
are probably the most useful descriptions of the Git architecture short of diving into the actual implementation.  They walk you through
how you would go about building a version control system with a little story. By the end you realize that Git's model is almost
an inevitable outcome of the proposed constraints:

* The [Git parable](http://tom.preston-werner.com/2009/05/19/the-git-parable.html) by Tom Preston-Werner.
* [Git foundations](http://matthew-brett.github.com/pydagogue/foundation.html) by Matthew Brett.

[Git ready](http://www.gitready.com)
:   A great website of posts on specific git-related topics, organized
    by difficulty.

[Git Magic](http://www-cs-students.stanford.edu/~blynn/gitmagic/index.html)
:   Another book-size guide that has useful snippets.

The [learning center](http://learn.github.com) at Github
:   Guides on a number of topics, some specific to github hosting but
    much of it of general value.

A [port](http://cworth.org/hgbook-git/tour) of the Hg book's beginning
:   The [Mercurial book](http://hgbook.red-bean.com) has a reputation
    for clarity, so Carl Worth decided to
    [port](http://cworth.org/hgbook-git/tour) its introductory chapter
    to Git. It's a nicely written intro, which is possible in good
    measure because of how similar the underlying models of Hg and Git
    ultimately are.

[Intermediate tips](http://andyjeffries.co.uk/articles/25-tips-for-intermediate-git-users)
:   A set of tips that contains some very valuable nuggets, once you're
    past the basics.

Finally, if you prefer a video presentation, this 1-hour tutorial prepared by the GitHub educational team will walk you through the entire process:

### A few useful tips for common tasks

#### Better shell support

Adding git branch info to your bash prompt and tab completion for git commands and branches is extremely useful.  I suggest you at least copy:

- [git-completion.bash](https://github.com/git/git/blob/master/contrib/completion/git-completion.bash)
- [git-prompt.sh](https://github.com/git/git/blob/master/contrib/completion/git-prompt.sh)
 
You can then source both of these files in your `~/.bashrc` and then set your prompt (I'll assume you named them as the originals but starting with a `.` at the front of the name):

    source $HOME/.git-completion.bash
    source $HOME/.git-prompt.sh
    PS1='[\u@\h \W$(__git_ps1 " (%s)")]\$ '   # adjust this to your prompt liking

See the comments in both of those files for lots of extra functionality they offer.

# References

**Note:** this tutorial is based on Fernando Perez's git notebook tutorial and has some ideas from the other links:

- [Fernando Perez's git notebook](https://github.com/fperez/reprosw)
- [J.R. Johansson](https://github.com/jrjohansson)'s [tutorial on version control](http://nbviewer.ipython.org/urls/raw.github.com/jrjohansson/scientific-python-lectures/master/Lecture-7-Revision-Control-Software.ipynb) 
- ["Git for Scientists: A Tutorial"](https://github.com/johnmcdonnell/Git-Tutorial) by John McDonnell 
- Emanuele Olivetti's lecture notes and exercises from the G-Node summer school on [Advanced Scientific Programming in Python](https://python.g-node.org/wiki/schedule).
- [Pro Git book](http://git-scm.com/book) 


---

Tutorial credit:
    
Joaquin Matres (Google) and many others