<img src="http://hilpisch.com/tpq_logo.png" alt="The Python Quants" width="35%" align="right" border="0"><br>

# Git for Version Control

### Keeping Track of Everything

Dr. Yves J. Hilpisch

The Python Quants GmbH

<a href='http://tpq.io'>http://tpq.io</a> | <a href='mailto:training@tpq.io'>training@tpq.io</a>

## Agenda

* Why use Git for version control?
* Working with an existing repository
* Setting up Git
* Setting up an empty repository
* Branching

## Why use Git for Version Control?

There are a number of reasons to work with `Git` for version control:

* managing different versions ("linear progress") for a software project
* managing alternative versions ("parallel progress") for a software project 
* working together with others on the same project
* keeping backups of different states of your project
* documenting and signing a history of a software project ("block chain")
* sharing an open source project with the community
* allowing for contributions to an open source project
* ...

There is an **excellent book** available online for free about `Git`:

https://git-scm.com/book/en/v2

Information about **how to install `Git`** is available, for example, at:

https://git-scm.com/book/en/v2/Getting-Started-Installing-Git

On Ubuntu, for example, you can usually install it from the shell via:

    (sudo) apt-get install git

## Working with an Existing Repository

Software projects managed with `Git` are called **repositories** ("repos"). They can be really small and used by a single author only, or they can be huge and used by a large number of programmers. The following is one of my repos on GitHub: http://github.com/yhilpisch/dx

<img src="http://hilpisch.com/images/github_dx.png">

It is easy to copy, i.e. **`clone`**, such a repo to your local machine or server.

In [None]:
path = '/Users/yves/Temp/'  # define target path
folder = path + 'dx-analytics'  # define target folder

In [None]:
!git clone http://github.com/yhilpisch/dx $folder

Let us have a look at the **contents** of the just created folder.

In [None]:
cd $folder

In [None]:
ls

It all looks like a regular folder. However, the **`Git` specific files** are stored in a hidden sub-folder.

In [None]:
ls .git
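
Inside `.git`, file contents are stored as objects that are named by a hash of their content. The following is a minimal sketch of how such an ID is derived for a file ("blob"), assuming the default SHA-1 object format; `git_blob_id` is a hypothetical helper name and not part of `Git`.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b""))  # ID of an empty file
print(git_blob_id(b"hello world\n"))
```

For a file in the working directory, the result should agree with what `git hash-object <file>` prints, which is one way to check the sketch against a real installation.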

Using `ls-files`, `Git` gives you an **overview of all files** in the repo's index. More on the index and its role later.

In [None]:
!git ls-files

Let us use `Git` to further **inspect the repo** and its (`commit`) history.

In [None]:
!git log -n 3  # show latest 3 commits
# note "different" authors (here: the same from different machines)

Show each commit on a single line.

In [None]:
!git log --pretty=oneline

You can easily access help texts.

In [None]:
!git help log | head -n 25

The `README.md` file generally contains **documentation, a guide, or a description** of the repo, which GitHub renders nicely on the repo's main page.

In [None]:
!head -n 30 README.md

Let us delete the cloned repository.

In [None]:
cd $path

In [None]:
pwd

In [None]:
!rm -rf $folder

## Setting up Git

You can configure `Git` in many ways. However, we'll stick to the basics here. The following two configurations represent the bare minimum if you want to commit.

In [None]:
!git config --global user.name "yves"

In [None]:
!git config --global user.email "yves@tpq.io"

Other settings might be useful for convenience and/or as a matter of taste.

In [None]:
!git config --global core.editor vim  # which editor to use

In [None]:
!git config --global color.ui auto  # colored output

The options are stored in the `~/.gitconfig` file.

In [None]:
!cat ~/.gitconfig

## Setting up an Empty Repository

We are set to initialize our first empty repository.

In [None]:
repo = path + 'gitrepo'

In [None]:
!rm -rf $repo

In [None]:
!git init $repo

Of course, the repo is empty apart from the `.git` folder.

In [None]:
cd $repo

In [None]:
ls -a

Let us add the first file. Consider the following simple Python function.

In [None]:
def f(x):
    ''' Function to compute the square of a number.
    
    Parameters
    ==========
    x: float
        input number
    
    Returns
    =======
    y: float
        (positive) output number
    '''
    y = x ** 2
    return y

In [None]:
# saving the last input cell's content
%save -f function.py _i

A quick check.

In [None]:
ls -a  # our new file

In [None]:
!cat function.py

Although the file physically exists in the folder, it is not yet part of the repository.

In [None]:
# no file in the index yet
!git ls-files

We need to `add` (stage) the new file, i.e. add it to the `Git index`, in order to see it in the repo's files.

In [None]:
!git add function.py

In [None]:
!git ls-files

We do not have a commit history yet.

In [None]:
!git log

Therefore, let us `commit` the new file, i.e. "freeze" a current status.

In [None]:
!git commit -am'Initial commit.'

In [None]:
!git log

We can now, for instance, make changes to the existing file.

In [None]:
def f(x):
    ''' Function to compute the square of a number.
    
    Parameters
    ==========
    x: float
        input number
    
    Returns
    =======
    y: float
        (positive) output number
        
    Raises
    ======
    ValueError if x is neither int or float
    '''
    if type(x) not in [int, float]:
        raise ValueError('Parameter must be integer or float.')
    y = x ** 2
    return y

In [None]:
# saving the last input cell's content
%save -f function.py _i

In [None]:
f('python')

Let us check the status of the repo.

In [None]:
!git status

Let us stage and commit the changes.

In [None]:
!git add .

In [None]:
!git commit -m'Added input checking.'

Another check.

In [None]:
!git status

In [None]:
!git log

What exactly is the difference between the two commits? Let us look at the `diff`. First, relative to the first commit (we added lines of code for the second commit). `HEAD~0` and `HEAD` represent the last commit, `HEAD~1` (or `HEAD~` for short) the second-to-last commit, and `HEAD~2` the one before that. You can also use hashes directly.

In [None]:
!git diff HEAD~1 HEAD~0

Second, relative to the last commit (we need to delete lines of code to get to the previous commit).

In [None]:
# specifying a file name
!git diff HEAD~0 HEAD~1 function.py
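
The effect of swapping the two arguments can be sketched in plain Python with the standard library's `difflib`. This is only an illustration of the idea, not how `Git` computes its diffs; the two versions of the function are shortened (docstrings omitted), and `changed_lines` is a hypothetical helper.

```python
import difflib

old = [
    "def f(x):",
    "    y = x ** 2",
    "    return y",
]
new = [
    "def f(x):",
    "    if type(x) not in [int, float]:",
    "        raise ValueError('Parameter must be integer or float.')",
    "    y = x ** 2",
    "    return y",
]


def changed_lines(a, b):
    """Return the added ('+') and removed ('-') lines of a unified diff from a to b."""
    diff = difflib.unified_diff(a, b, fromfile="a/function.py",
                                tofile="b/function.py", lineterm="")
    # Skip the '---' / '+++' file headers; keep only real content changes.
    return [line for line in diff
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]


forward = changed_lines(old, new)   # analogous to: git diff HEAD~1 HEAD~0
backward = changed_lines(new, old)  # analogous to: git diff HEAD~0 HEAD~1
print("\n".join(forward))
print("\n".join(backward))
```

The same two lines show up in both directions, once marked with `+` (added) and once with `-` (removed).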

Some more changes.

In [None]:
#
# Simple Function
# The Python Quants GmbH
#


def f(x):
    ''' Simple function to compute the square of a number.
    
    Parameters
    ==========
    x: float
        input number
    
    Returns
    =======
    y: float
        (positive) output number
        
    Raises
    ======
    TypeError if x is neither int nor float
    '''
    if type(x) not in [int, float]:
        raise TypeError('Parameter must be integer or float.')
    y = x ** 2
    return y

In [None]:
# saving the last input cell's content
%save -f function.py _i

The third commit.

In [None]:
!git commit -am'Added header to file, corrections.'

In [None]:
!git log --pretty=oneline

A look at the changes from the last commit relative to the previous one.

In [None]:
!git diff HEAD~1 HEAD~0 function.py

And a look at the changes from the last commit relative to the initial one.

In [None]:
!git diff HEAD~2 HEAD~0
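
The `HEAD~n` notation used in these comparisons can be modeled as "go back `n` commits from the last one". The following is a minimal sketch for a strictly linear history; the list `history` and the function `resolve` are hypothetical and only mimic what `Git` does internally.

```python
# Hypothetical linear history, oldest first (messages taken from the commits above).
history = [
    "Initial commit.",
    "Added input checking.",
    "Added header to file, corrections.",
]


def resolve(ref: str, commits: list) -> str:
    """Resolve 'HEAD', 'HEAD~' or 'HEAD~n' against a linear list of commits."""
    if ref == "HEAD":
        steps = 0
    elif ref.startswith("HEAD~"):
        suffix = ref[len("HEAD~"):]
        steps = int(suffix) if suffix else 1  # a bare '~' means one step back
    else:
        raise ValueError("unsupported reference: " + ref)
    if steps >= len(commits):
        raise ValueError(ref + " goes back past the initial commit")
    return commits[-1 - steps]


for ref in ["HEAD", "HEAD~0", "HEAD~", "HEAD~1", "HEAD~2"]:
    print(ref, "->", resolve(ref, history))
```

With three commits, `HEAD~2` is the initial commit, which is why the `diff` above covers everything that changed since then.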

Sometimes you notice that you have made a mistake after you have committed your changes. You can easily reset the repo to a previous commit.

In [None]:
# all "physical" files remain unchanged
# by a soft reset
!git reset --soft HEAD^

In [None]:
!git log --pretty=oneline

In [None]:
# correct your mistakes, make other changes ...
!git commit -am'Added a TPQ header; corrections.'

In [None]:
!git log --pretty=oneline

What happens if we import the `function.py` file into, e.g., our Jupyter Notebook runtime? Python creates another file containing the compiled byte code: a `.pyc` file in the `__pycache__` folder.

In [None]:
import function

In [None]:
function.f(10)

In [None]:
ls -an

In [None]:
ls -an __pycache__

Do we want to track/commit such files? Probably not. As a standard convention, you list the files and file types to be excluded from staging and committing in the `.gitignore` file. The following excludes all `.pyc` files and the `__pycache__` folder.

In [None]:
exclude = ['__pycache__', '*.pyc']

In [None]:
with open('.gitignore', 'w') as f:
    for e in exclude:
        f.write(e + '\n')

In [None]:
!cat .gitignore

Let us test whether it works.

In [None]:
# the .pyc file and __pycache__ do not appear
!git status
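
To get a feeling for how the two patterns work, here is a deliberately simplified sketch in Python. It assumes that a path is ignored as soon as one of its components matches one of the patterns. The real `.gitignore` rules are richer (negation, anchoring, `**`), and both `is_ignored` and the byte code file name are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

patterns = ["__pycache__", "*.pyc"]  # the two entries written to .gitignore


def is_ignored(path: str, patterns: list) -> bool:
    """Simplified check: ignored if any path component matches any pattern."""
    parts = PurePosixPath(path).parts
    return any(fnmatchcase(part, pattern)
               for part in parts for pattern in patterns)


for path in ["function.py", ".gitignore", "function.pyc",
             "__pycache__/function.cpython-311.pyc"]:  # hypothetical file name
    print(path, "->", "ignored" if is_ignored(path, patterns) else "tracked")
```

For a real repo, `git check-ignore -v <path>` reports which pattern, if any, excludes a given path.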

Now we can safely stage and commit. Note that the `.gitignore` file itself gets added and committed as well.

In [None]:
!git add --all .

In [None]:
!git commit -am'Added .gitignore.'

In [None]:
!git log -n 2

What has changed?

In [None]:
!git ls-files

In [None]:
!git diff HEAD~1 HEAD~0

A more comprehensive `.gitignore` for Python can be found under

https://raw.githubusercontent.com/github/gitignore/master/Python.gitignore

## Branching

Why branch at all? So far, we have seen how to work with a linear commit history (stacking one commit on top of the others).

**Branching** allows you to (temporarily) manage two or more histories in parallel.

In [None]:
!git checkout -b classapproach

In [None]:
!git status

The idea behind the `classapproach` branch is to test the implementation of a Python class instead of a simple Python function.

In [None]:
class Square(object):
    def __init__(self, x):
        self.x = x
    def calculate(self):
        return self.x ** 2

In [None]:
%save class.py _i

In [None]:
s = Square(4)

In [None]:
s.calculate()

In [None]:
!git status

Let us commit the new file.

In [None]:
!git add class.py

In [None]:
!git commit -am'Implemented Python class.'

In [None]:
!git log --pretty=oneline

You can also show commits per branch only.

In [None]:
!git log classapproach --not master
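
The selection `classapproach --not master` can be read as "all commits reachable from `classapproach` minus those reachable from `master`". The following toy model sketches that idea; the commit IDs `c1` to `c5` and both helper functions are hypothetical stand-ins for the real hashes and for Git's internals.

```python
# Hypothetical commit graph: each commit maps to its parent (None for the root).
parents = {
    "c1": None,  # Initial commit.
    "c2": "c1",
    "c3": "c2",
    "c4": "c3",  # Added .gitignore.
    "c5": "c4",  # Implemented Python class. (only on classapproach)
}
branches = {"master": "c4", "classapproach": "c5"}


def reachable(commit):
    """Collect a commit and all of its ancestors by following parent links."""
    seen = []
    while commit is not None:
        seen.append(commit)
        commit = parents[commit]
    return seen


def log_not(include, exclude):
    """Commits reachable from `include` but not from `exclude`, newest first."""
    excluded = set(reachable(branches[exclude]))
    return [c for c in reachable(branches[include]) if c not in excluded]


print(log_not("classapproach", "master"))
```

Only the commit made on the branch remains, since everything before it is shared with `master`.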

We can now change back to our master branch.

In [None]:
!git checkout master

In [None]:
!git log -n 2

If we are satisfied with the work we did in the `classapproach` branch, we can **`merge`** the two. In our case, this is straightforward since `master` has not changed in the meantime and we therefore do not expect any conflicts.

In [None]:
!git merge classapproach

The last commit from the `classapproach` branch is now on master as well.

In [None]:
!git log -n 3
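
Since `master` received no commits of its own after the branch was created, `Git` can perform such a merge as a "fast-forward": it simply moves the `master` pointer to the last commit of `classapproach`. A minimal sketch of that logic, with hypothetical commit IDs and helper names:

```python
# Hypothetical commit graph: c5 was committed on classapproach on top of c4.
parents = {"c4": None, "c5": "c4"}
branches = {"master": "c4", "classapproach": "c5"}


def is_ancestor(older, newer):
    """True if `older` is reachable from `newer` via parent links."""
    while newer is not None:
        if newer == older:
            return True
        newer = parents[newer]
    return False


def merge_fast_forward(target, source):
    """Move `target` to `source` if no real merge commit is needed."""
    if not is_ancestor(branches[target], branches[source]):
        raise RuntimeError("histories diverged; a merge commit would be needed")
    branches[target] = branches[source]


merge_fast_forward("master", "classapproach")
print(branches)
```

If both branches had received new commits, the check would fail and a real merge commit (possibly with conflicts to resolve) would be required instead.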

The other branch lives on.

In [None]:
!git checkout classapproach

In this branch, we can also change existing files from the other (`master`) branch.

In [None]:
#
# Simple Function
# The Python Quants GmbH
#


def f(x):
    ''' Simple function to compute the square of a number.
    
    Parameters
    ==========
    x: float
        input number
    
    Returns
    =======
    y: float
        (positive) output number
        
    Raises
    ======
    TypeError if x is neither int nor float
    '''
    if type(x) not in [int, float]:
        raise TypeError('Parameter must be integer or float.')
    y = x * x  # this line is changed
    return y

In [None]:
%save -f function.py _i

The current state of the branch and our commit.

In [None]:
!git status

In [None]:
!git add .

In [None]:
!git commit -am'Changed way of calculation.'

Our second merge.

In [None]:
!git checkout master

In [None]:
!git merge classapproach

In [None]:
!git diff HEAD~1 HEAD~0

Our commit history.

In [None]:
!git log --graph

## Cleaning Up

In [None]:
cd ..

In [None]:
!rm -rf gitrepo

## Conclusions

In conclusion, we can state the following:

* version control helps yourself and others
* it helps collaborating on smaller, mid-sized as well as huge projects
* Git has become a standard in software engineering (but not the only player in the field)


<a href="http://tpq.io" target="_blank">http://tpq.io</a> | <a href="http://twitter.com/dyjh" target="_blank">@dyjh</a> | <a href="mailto:team@tpq.io">team@tpq.io</a>