# Version Control and Git

In software development, version control systems (VCS) are among the most important tools.

They are used in virtually all software development, in all kinds of environments and by teams of every size.

Version control can be used on almost any digital content, so it is not restricted to software development: it is also very useful for manuscript files, figures, data and notebooks!



## The two main purposes of a VCS

1. Keep track of changes in the source code.
    * Allow reverting back to an older revision if something goes wrong.
    * Work on several "branches" of the software concurrently.
    * Tag revisions to keep track of which version of the software was used for what (for example, "release-1.0", "paper-A-final", ...)
2. Make it possible for several people to work collaboratively on the same code base simultaneously.
    * Allow many authors to make changes to the code.
    * Clearly communicate and visualize changes in the code base to everyone involved.

## Basic principles and terminology for a VCS

In a VCS, the source code or digital content is stored in a **repository**. 

* The repository does not only contain the latest version of all files, but the complete history of all changes to the files since they were added to the repository. 

* A user can **checkout** the repository, and obtain a local working copy of the files. All changes are made to the files in the local working directory, where files can be added, removed and updated. 

* When a task has been completed, the changes to the local files are **committed** (saved to the repository).

* If someone else has been making changes to the same files, a **conflict** can occur. In many cases conflicts can be **resolved** automatically by the system, but in some cases we might manually have to **merge** different changes together.

* It is often useful to create a new **branch** in a repository, or a **fork** or **clone** of an entire repository, when we are doing larger experimental development. The main branch in a repository is often called **master** or **trunk**. When work on a branch or fork is completed, it can be merged into the master branch/repository.

* With distributed VCSs such as **Git** or **Mercurial**, we can **pull** and **push** changesets between different repositories. For example, between a local copy of the repository and a central online repository (for example on a community repository hosting site like [github.com](https://github.com)).
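To make the terminology above concrete, here is a toy sketch of a repository as a growing list of snapshots. The class `ToyRepository` and its methods are hypothetical names invented for this illustration; real systems such as Git store history far more compactly and also track authors, branches and merges.

```python
class ToyRepository:
    """A toy, in-memory model of a repository: a list of snapshots, oldest first."""

    def __init__(self):
        self.history = []  # list of (message, snapshot) pairs

    def commit(self, working_copy, message):
        """Store a copy of the working files and return the new revision number."""
        self.history.append((message, dict(working_copy)))
        return len(self.history) - 1

    def checkout(self, revision=-1):
        """Return a fresh working copy of a revision (default: the latest one)."""
        _, snapshot = self.history[revision]
        return dict(snapshot)


repo = ToyRepository()

# The working copy is just a mapping from file name to file content.
work = {"README.md": "A demo repository.\n"}
first = repo.commit(work, "added README")

work["README.md"] += "A new line.\n"
second = repo.commit(work, "added one more line")

# Older revisions stay available after newer ones have been committed.
print(repo.checkout(first)["README.md"])
print(repo.checkout()["README.md"])
```

The point of the sketch is only that committing never overwrites earlier revisions, which is what makes reverting and comparing versions possible.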

## Git

* Created by Linus Torvalds, 2005 
 
  * https://github.com/torvalds

### Why git?

 * Popular (~50% of open source projects)
 * Truly distributed
 * Very fast
 * Everything is local
 * Free
 * Safe against corruption
 * **GitHub!**
 
### GitHub

GitHub is a code hosting platform for version control and collaboration. It lets you and others work together on projects from anywhere.

GitHub's Hello World guide teaches GitHub essentials like repositories, branches, commits, and pull requests. You'll create your own Hello World repository and learn GitHub's pull request workflow, a popular way to create and review code:

https://guides.github.com/activities/hello-world/

**In the rest of this lecture we will look at `git`.**

## Installing git

On Windows: download Git from https://github.com/git-for-windows/git/releases and run the downloaded installer.
        
On Linux (Debian or Ubuntu):
    
    $ sudo apt-get install git

On Mac (with macports):

    $ sudo port install git

## Configuring the author information

The first time you use git, you need to configure your author information:

In [None]:
!git config --global user.name  thermalogic
!git config --global user.email cmhnj@189.cn
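The `--global` flag stores the identity for all repositories of the current user. As a hedged sketch, the same settings can also be stored for a single repository by leaving the flag out. The helper `git()`, the temporary directory and the name and e-mail values below are placeholders chosen for this example.

```python
import subprocess
import tempfile


def git(*args, cwd):
    """Run a git command in `cwd` and return its standard output as text."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as repo:
    git("init", cwd=repo)
    # Without --global, the values are written to this repository's .git/config only.
    git("config", "user.name", "Demo Author", cwd=repo)
    git("config", "user.email", "demo@example.com", cwd=repo)
    # Inside the repository, the local values take precedence over the global ones.
    name = git("config", "--get", "user.name", cwd=repo)
    email = git("config", "--get", "user.email", cwd=repo)

print(name, email)
```

A per-repository identity is handy when one machine is used for projects that should carry different author information.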

### Here, we set up a new git repo, add a file, and commit it to the repo.

In [None]:
%pwd

In [None]:
%cd ..
%mkdir myfirstrepo
%cd myfirstrepo

## Creating and cloning a repository

To create a brand new empty repository, we can use the command
```bash
$ git init repository-name
```

In [None]:
# create a new git repository called gitdemo:
!git init gitdemo

If we want to fork or clone an existing repository, we can use the command 
```bash
$ git clone repository
```

In [None]:
!git clone https://github.com/PySEE/SEUIF97

`git clone` can take a URL to a public repository, like above, or a path to a local directory:

In [None]:
!git clone gitdemo gitdemo2
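The two commands can be tried without touching any real project. The sketch below, which assumes `git` is on the `PATH`, creates a repository in a temporary directory, clones it locally, and checks where the clone thinks it came from. The helper `git()` is defined only for this example.

```python
import os
import subprocess
import tempfile


def git(*args, cwd):
    """Run a git command in `cwd` and return its standard output as text."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as base:
    git("init", "gitdemo", cwd=base)               # new, empty repository
    git("clone", "gitdemo", "gitdemo2", cwd=base)  # local clone of it

    # The history lives in a hidden .git directory inside the working directory.
    has_git_dir = os.path.isdir(os.path.join(base, "gitdemo", ".git"))

    # The clone records the repository it was cloned from under the name "origin".
    origin_url = git("remote", "get-url", "origin", cwd=os.path.join(base, "gitdemo2"))

print(has_git_dir)
print(origin_url)
```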

## Adding files, status and committing

To add a new file to the repository, we first create the file and then use the `git add filename` command:

In [None]:
%%file README.md

A file with information about the gitdemo repository.

In [None]:
!git add README.md

## Status

Using the command 
```bash
$ git status
```
we get a summary of the current status of the **working directory**. It shows if we have modified, added or removed files.

In [None]:
!git status

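In this case, after having added the file `README.md` with `git add`, the command `git status` lists it as a new file under *Changes to be committed*: the file is staged, but it has not yet been **committed** to the **repository**.

It is therefore not yet part of the repository history. (Before `git add`, the same file would have been listed as *untracked*.)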
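The difference between a file that Git does not know about and a file that has been added but not committed is easy to see with the machine-readable form of `git status`. The following sketch assumes `git` is on the `PATH`; the file `notes.txt` and the helper `git()` are made up for the example.

```python
import os
import subprocess
import tempfile


def git(*args, cwd):
    """Run a git command in `cwd` and return its standard output as text."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as repo:
    git("init", cwd=repo)
    for name in ("README.md", "notes.txt"):
        with open(os.path.join(repo, name), "w") as f:
            f.write("demo\n")

    git("add", "README.md", cwd=repo)  # add only one of the two files

    # --porcelain prints one short, stable line per file.
    status = git("status", "--porcelain", cwd=repo)

print(status)
```

In this output, a leading `A` marks a file that has been added and is waiting to be committed, while `??` marks a file that Git is not tracking at all.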

### Commit

In [None]:
!git commit -m "added README.md file"  README.md

In [None]:
!git status

### Add a Python file

In [None]:
%%file Hello.py

print('Hello,World!')

In [None]:
!git add Hello.py

In [None]:
!git commit -m "added python file" Hello.py

In [None]:
!git status 

After *committing* the change to the **repository** from **the local working directory**, `git status` again reports that the working directory is clean.

## Committing changes

When files that are tracked by Git are changed, they are listed as *modified* by `git status`:

In [None]:
%%file README.md

A file with information about the gitdemo repository.

A new line.

In [None]:
!git status

Again, we can commit such changes to the repository using the `git commit -m "message"` command.

In [None]:
!git commit -m "added one more line in README" README.md

In [None]:
!git status

## Removing files

To remove a file that has been added to the repository, use `git rm filename`, which works similarly to `git add filename`:

In [None]:
%%file tmpfile

A short-lived file.

Add it:

In [None]:
!git add tmpfile

In [None]:
!git commit -m "adding file tmpfile" tmpfile 

Remove it again:

In [None]:
!git rm tmpfile

In [None]:
!git commit -m "remove file tmpfile" tmpfile 

## Commit logs

The messages that are added to the commit command are supposed to give a short (often one-line) description of the changes/additions/deletions in the commit. If `-m "message"` is omitted when invoking the `git commit` command, an editor is opened for you to type a commit message (useful, for example, when a longer commit message is required). 

We can look at the revision log by using the command `git log`:

In [None]:
!git log

In the commit log, each version is shown with a timestamp, a unique hash, author information and the commit message.
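The hashes in the log identify commits, and Git derives all of its identifiers from the content they name. As a small illustration for the simplest kind of object, the content of a single file (a "blob"), the sketch below reproduces the identifier that `git hash-object` reports. It assumes the default SHA-1 object format; the function name `git_blob_id` is made up for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 identifier Git assigns to a file with this content."""
    # Git hashes a short header ("blob <size>" plus a NUL byte) followed by the content.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


empty_id = git_blob_id(b"")
readme_id = git_blob_id(b"A file with information about the gitdemo repository.\n")

print(empty_id)
print(readme_id)
```

Because the identifier depends only on the content, identical files always get the same identifier, and any change to the content produces a different one.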

## Diffs

Every commit results in a changeset, which has a "diff" describing the changes to the files associated with it. We can use `git diff` to see what has changed in a file:

In [None]:
%%file README.md

A file with information about the gitdemo repository.

README files usually contain installation instructions, and information about how to get started using the software (for example).

In [None]:
!git diff README.md

That looks quite cryptic, but it is a standard format for describing changes in files. We can use other tools, such as graphical user interfaces or web-based systems, to get a more easily understandable diff.

On GitHub (a web-based Git repository hosting service), it can look like this:

![git-diff](./img/git-diff.jpg)
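The text format printed by `git diff` is known as a unified diff, and it becomes readable once its few markers are known. The sketch below uses Python's standard `difflib` module (not Git itself) to produce a diff in the same format for two versions of a small file; the file contents are taken from the README example, and the `a/` and `b/` prefixes mimic the labels Git uses.

```python
import difflib

old = ["A file with information about the gitdemo repository.\n"]
new = old + ["\n", "A new line.\n"]

diff = list(
    difflib.unified_diff(old, new, fromfile="a/README.md", tofile="b/README.md")
)
print("".join(diff))
```

In the printed result, the `---` and `+++` lines name the old and new file, the `@@` line gives the affected line ranges in both versions, lines starting with a space are unchanged context, and lines starting with `+` or `-` were added or removed.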

## Discard changes in the working directory

To discard a change (revert to the latest version in the repository) we can use the `checkout` command like this:

In [None]:
!git checkout -- README.md

In [None]:
!git status

## Checking out old versions

If we want to get the code for a specific version, we can use `git checkout` and give it the hash of the version we are interested in as an argument:

In [None]:
!git log

In [None]:
!git checkout 1f26ad648a791e266fbb951ef5c49b8d990e6461

Now the content of all the files is as it was in the version with the hash listed above (the first revision).

In [None]:
%pycat README.md

We can move back to "the latest" (master) with the command:

In [None]:
!git checkout master 

In [None]:
%pycat README.md

In [None]:
!git status
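The round trip to an old version and back can be reproduced in a throwaway repository. This is a sketch under a few assumptions: `git` is on the `PATH`, the author identity is a placeholder, and the default branch name is read from the repository rather than assumed to be `master`. The helper `git()` exists only in this example.

```python
import os
import subprocess
import tempfile


def git(*args, cwd):
    """Run a git command in `cwd` and return its standard output as text."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as repo:
    git("init", cwd=repo)
    git("config", "user.name", "Demo Author", cwd=repo)
    git("config", "user.email", "demo@example.com", cwd=repo)
    branch = git("symbolic-ref", "--short", "HEAD", cwd=repo)  # e.g. master
    readme = os.path.join(repo, "README.md")

    # Two commits, each with a different version of the file.
    for text in ("first version\n", "second version\n"):
        with open(readme, "w") as f:
            f.write(text)
        git("add", "README.md", cwd=repo)
        git("commit", "-m", "update README", cwd=repo)

    # Look up the hash of the first (parentless) commit and check it out.
    first_commit = git("rev-list", "--max-parents=0", "HEAD", cwd=repo)
    git("checkout", first_commit, cwd=repo)
    with open(readme) as f:
        old_text = f.read()

    # Return to the latest version on the branch.
    git("checkout", branch, cwd=repo)
    with open(readme) as f:
        latest_text = f.read()

print(old_text, latest_text)
```

While an old commit is checked out by hash, Git reports a "detached HEAD": the working directory shows that version, but no branch is selected until we check one out again.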

## Tagging and branching

### Tags

Tags are named versions. They are useful for marking a particular version for later reference. For example, we can tag our code with the tag "paper-1-final" when the simulations for "paper-1" are finished and the paper is submitted. Then we can always retrieve exactly the code used for that paper, even if we continue to work on and develop the code for future projects and papers.

In [None]:
!git log

In [None]:
!git tag -a demotag1 -m "Code used for this and that purpose" 

In [None]:
!git tag -l 

In [None]:
!git show demotag1

To retrieve the code in the state corresponding to a particular tag, we can use the `git checkout tagname` command:

    $ git checkout demotag1
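That a tag keeps pointing at the same version while development continues can be checked in a temporary repository. The sketch assumes `git` is on the `PATH`; the identity, file name and commit messages are placeholders, and `git()` is a helper defined for this example.

```python
import os
import subprocess
import tempfile


def git(*args, cwd):
    """Run a git command in `cwd` and return its standard output as text."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def commit_file(repo, text, message):
    """Write `text` to results.txt, commit it and return the new commit hash."""
    with open(os.path.join(repo, "results.txt"), "w") as f:
        f.write(text)
    git("add", "results.txt", cwd=repo)
    git("commit", "-m", message, cwd=repo)
    return git("rev-parse", "HEAD", cwd=repo)


with tempfile.TemporaryDirectory() as repo:
    git("init", cwd=repo)
    git("config", "user.name", "Demo Author", cwd=repo)
    git("config", "user.email", "demo@example.com", cwd=repo)

    paper_commit = commit_file(repo, "version used in the paper\n", "paper version")
    git("tag", "-a", "demotag1", "-m", "Code used for the paper", cwd=repo)

    # Development continues after the tag has been created.
    later_commit = commit_file(repo, "newer, longer version\n", "later work")

    tags = git("tag", "-l", cwd=repo).splitlines()
    # "^{commit}" asks for the commit that the annotated tag points to.
    tagged_commit = git("rev-parse", "demotag1^{commit}", cwd=repo)

print(tags, tagged_commit == paper_commit)
```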

### Branches

With branches we can create diverging code bases in the same repository. They are useful, for example, for experimental development that requires many code changes that could break the functionality of the master branch. Once the development on a branch has reached a stable state, it can be merged back into the master branch. The branch-develop-merge cycle is a good development strategy when several people work on the same code base. Even in single-author repositories it is often useful to keep the master branch in a working state, to branch before implementing a new feature, and to merge the branch back into master later.

In Git, we can create a new branch like this:

In [None]:
!git branch expr1 

We can list the existing branches like this:

In [None]:
!git branch

And we can switch between branches using `checkout`:

In [None]:
!git checkout expr1

Make a change in the new branch.

In [None]:
%%file README.md

A file with information about the gitdemo repository.

README files usually contain installation instructions, and information about how to get started using the software (for example).

Experimental addition.

In [None]:
!git commit -m "added a line in expr1 branch" README.md

In [None]:
!git branch

In [None]:
!git checkout master

In [None]:
!git branch

We can merge an existing branch and all its changesets into another branch (for example the master branch) like this:

First change to the target branch:

In [None]:
!git checkout master

In [None]:
!git merge expr1

In [None]:
!git branch 

We can delete the branch `expr1` now that it has been merged into the master:

In [None]:
!git branch -d expr1

In [None]:
!git branch

In [None]:
%pycat README.md
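The whole branch, change, merge and delete cycle can be sketched in one script that works on a temporary repository. It assumes `git` is on the `PATH` and uses a placeholder identity; the main branch name is read from the repository instead of being hard-coded, and `git()` is a helper defined for this example.

```python
import os
import subprocess
import tempfile


def git(*args, cwd):
    """Run a git command in `cwd` and return its standard output as text."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def read(path):
    with open(path) as f:
        return f.read()


with tempfile.TemporaryDirectory() as repo:
    git("init", cwd=repo)
    git("config", "user.name", "Demo Author", cwd=repo)
    git("config", "user.email", "demo@example.com", cwd=repo)
    main_branch = git("symbolic-ref", "--short", "HEAD", cwd=repo)
    readme = os.path.join(repo, "README.md")

    with open(readme, "w") as f:
        f.write("base\n")
    git("add", "README.md", cwd=repo)
    git("commit", "-m", "initial commit", cwd=repo)

    # Create the experimental branch, switch to it and commit a change there.
    git("branch", "expr1", cwd=repo)
    git("checkout", "expr1", cwd=repo)
    with open(readme, "a") as f:
        f.write("Experimental addition.\n")
    git("commit", "-m", "added a line in expr1 branch", "README.md", cwd=repo)

    # Back on the main branch the change is not visible until we merge.
    git("checkout", main_branch, cwd=repo)
    before_merge = read(readme)
    git("merge", "expr1", cwd=repo)
    after_merge = read(readme)

    git("branch", "-d", "expr1", cwd=repo)
    branches = git(
        "for-each-ref", "--format=%(refname:short)", "refs/heads", cwd=repo
    ).splitlines()

print(before_merge, after_merge, branches)
```

Since the main branch received no commits of its own in the meantime, Git can perform this merge as a simple fast-forward without creating a separate merge commit.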

## Pulling and pushing changesets between repositories

If the repository has been cloned from another repository, for example on [github.com](https://github.com), it automatically remembers the address of the parent repository (called origin):

In [None]:
!git remote

In [None]:
!git remote show origin

### pull

We can retrieve updates from the origin repository by "pulling" changesets from "origin" to our repository:

In [None]:
!git pull origin

We can register addresses to many different repositories, and pull in different changesets from different sources, but the default source is the origin from where the repository was first cloned (so the word `origin` could have been omitted from the command above).

### push

After making changes to our local repository, we can push changes to a remote repository using `git push`. Again, the default target repository is `origin`, so we can do:

In [None]:
!git status

In [None]:
!git add Hello.py

In [None]:
!git commit -m "added python file" Hello.py

In [None]:
!git push
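Pushing and pulling can be tried without a network connection by letting a local bare repository play the role of the hosted one. In the sketch below, `central.git`, `alice` and `bob` are made-up names for the shared repository and two working clones; it assumes `git` is on the `PATH`, and `git()` is a helper defined for this example.

```python
import os
import subprocess
import tempfile


def git(*args, cwd):
    """Run a git command in `cwd` and return its standard output as text."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as base:
    # A bare repository has no working directory; it stands in for a hosted one.
    git("init", "--bare", "central.git", cwd=base)

    git("clone", "central.git", "alice", cwd=base)
    alice = os.path.join(base, "alice")
    git("config", "user.name", "Alice", cwd=alice)
    git("config", "user.email", "alice@example.com", cwd=alice)
    branch = git("symbolic-ref", "--short", "HEAD", cwd=alice)

    def commit_and_push(name, text):
        """Create a file in Alice's clone, commit it and push it to origin."""
        with open(os.path.join(alice, name), "w") as f:
            f.write(text)
        git("add", name, cwd=alice)
        git("commit", "-m", "added " + name, cwd=alice)
        git("push", "origin", branch, cwd=alice)

    commit_and_push("README.md", "demo\n")

    # Bob clones the shared repository after the first push.
    git("clone", "central.git", "bob", cwd=base)
    bob = os.path.join(base, "bob")

    # Alice pushes more work; Bob only sees it after pulling.
    commit_and_push("Hello.py", "print('Hello, World!')\n")
    had_file_before_pull = os.path.exists(os.path.join(bob, "Hello.py"))
    git("pull", "origin", branch, cwd=bob)
    has_file_after_pull = os.path.exists(os.path.join(bob, "Hello.py"))

print(had_file_before_pull, has_file_after_pull)
```

The same commands work unchanged when `origin` is a URL on a hosting site instead of a local path.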

## Hosted repositories

[GitHub.com](https://github.com) is a Git repository hosting site that is very popular with both open source projects (for which it is free) and private repositories (for which a subscription might be needed).

With a hosted repository it is easy to collaborate with colleagues on the same code base, and you get a graphical user interface where you can browse the code, look at commit logs, track issues, etc. 


![SEUIF97](./img/github-repo.jpg)

## Graphical user interfaces

There are also a number of graphical user interfaces for Git. 

* **GitHub Desktop** 
  [Download here](https://desktop.github.com/)
  
* **EGit In Eclipse** 
  If you have installed Eclipse, EGit is ready for you
  
* **Git in Visual Studio Code** 
  Git support is built into Visual Studio Code once Git itself is installed. [Download here](https://code.visualstudio.com/)   

We strongly recommend that you use version control for your projects. 

![EGit](./img/git-egit.jpg)

![Code](./img/git-code.jpg)



## Further reading

* http://git-scm.com/book
* John McDonnell. Git for Scientists: A Tutorial http://nyuccl.org/pages/GitTutorial/
* Maximilian Koegel, Jonas Helming. EGit Tutorial http://eclipsesource.com/blogs/tutorials/egit-tutorial/    
* Scott Chacon, Ben Straub. Pro Git. https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
* Liao Xuefeng. Git Tutorial (in Chinese) http://www.liaoxuefeng.com/wiki/0013739516305929606dd18361248578c67b8067c8c017b000
* Zhihu: How to use GitHub (in Chinese). http://www.zhihu.com/question/20070065