# Version Control and Git

## 1.  Introduction

In any software development project, one of the most important tools is **version control software** (a version control system, or VCS).

Version control systems are used in virtually all software development, in all kinds of environments.

Version control can be used on **almost any digital content**, so it is not restricted to software development; it is also very useful for manuscript files, figures, data and notebooks!



### There are two main purposes of a VCS:

#### 1. Keep `track of changes` in the source code.
   
* Allow `reverting` back to an older revision if something goes wrong.

* Work on several `"branches"` of the software concurrently.

* `Tag revisions` to keep track of which version of the software was used for what (for example, "release-1.0", "paper-A-final", ...)

#### 2. Make it possible for several people to `collaboratively work` on the same code base simultaneously.

* Allow many authors to make changes to the code.

* Clearly communicate and visualize changes in the code base to everyone involved.

### Basic principles and terminology of a VCS

In a VCS, the source code or digital content is stored in a **repository**. 

* The repository contains not only the **latest version of all files**, but also **the complete history of all changes** to the files since they were added to the repository. 

* A user can **check out** the repository and obtain **a local working copy of the files**. All changes are made to the files in the local working directory, where files can be added, removed and updated. 

* When a task has been completed, the changes to the local files are **committed** (saved to the repository).

* If someone else has been making changes to the same files, a **conflict** can occur. In many cases conflicts can be **resolved** automatically by the system, but in some cases we might have to **merge** different changes together manually.

* It is often useful to create a new **branch** in a repository, or a **fork** or **clone** of an entire repository, when we are doing larger experimental development. The main branch in a repository is often called **master** or **trunk**. When work on a branch or fork is completed, it can be merged into the master branch/repository.

* With distributed VCSs such as **GIT** or **Mercurial**, we can **pull** and **push** changesets between different repositories, for example between a local copy of the repository and a central online repository on a community hosting site like [github.com](https://github.com).
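
The terms above can be made concrete with a toy model. The following is only a minimal sketch in Python: the class `ToyRepository` and its methods are hypothetical names invented for illustration, and this is not how Git stores data internally. It only shows the idea that a repository keeps every committed snapshot, and that a checkout hands back a working copy of one of them.

```python
# Toy in-memory model of "repository", "commit" and "checkout".
# Hypothetical illustration only; Git's real storage works differently.
class ToyRepository:
    """Keeps every committed snapshot, not just the latest one."""

    def __init__(self):
        self.history = []  # list of (message, snapshot) pairs, oldest first

    def commit(self, working_copy, message):
        """Save a copy of the working files as a new revision; return its number."""
        self.history.append((message, dict(working_copy)))
        return len(self.history) - 1

    def checkout(self, revision=-1):
        """Return a fresh working copy of a revision (default: the latest)."""
        _, snapshot = self.history[revision]
        return dict(snapshot)

    def log(self):
        """Return the commit messages, oldest first."""
        return [message for message, _ in self.history]


repo = ToyRepository()
work = {"hello.c": "version 1"}        # the local working copy
first = repo.commit(work, "added hello.c")

work["hello.c"] = "version 2"          # edit the working copy
repo.commit(work, "changed hello.c")

old = repo.checkout(first)             # go back to the older revision
print(old["hello.c"])
print(repo.log())
```

Because `commit` stores a copy of the working files, later edits to `work` do not alter revisions that are already in the history.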

## 2 GIT

* Created by `Linus Torvalds` in 2005, https://github.com/torvalds

### Why git?

 * Popular (~50% of open source projects)
 * Truly distributed
 * Very fast
 * Everything is local
 * Free
 * Safe against corruption
 * **GitHub!**
 
### GitHub

GitHub is a code hosting platform for version control and collaboration. It lets you and others work together on projects from anywhere.

The GitHub Hello World guide teaches you GitHub essentials like repositories, branches, commits, and Pull Requests. You’ll create your own Hello World repository and learn GitHub’s Pull Request workflow, a popular way to create and review code:

https://guides.github.com/activities/hello-world/

**In the rest of this lecture we will look at `git`.**

### 2.1 Setting Up Git
### 2.1.1 Installing git

#### On Windows
 
Download Git from https://github.com/git-for-windows/git/releases and run the downloaded installer.
        
#### On Debian/Ubuntu:
    
```bash
$ sudo apt install git
```

### 2.1.2  Configure the author information:

The first time you start to use git, you'll need to configure your author information:

In [2]:
!git config --global user.name  thermalogic
!git config --global user.email cmh@seu.edu.cn

**Note:** if you have a GitHub account, use the same `user.name` and `user.email` as for that account.

The settings are kept in **"<GIT_HOME>/etc/gitconfig"** (in the Git installation directory) and **"<USER_HOME>/.gitconfig"** (in the user's home directory).
You can issue
```bash
>git config --list
```
to list the settings: 

In [1]:
!git config --list

core.symlinks=false
core.autocrlf=true
core.fscache=true
color.diff=auto
color.status=auto
color.branch=auto
color.interactive=true
help.format=html
rebase.autosquash=true
http.sslbackend=schannel
diff.astextplain.textconv=astextplain
filter.lfs.clean=git-lfs clean -- %f
filter.lfs.smudge=git-lfs smudge -- %f
filter.lfs.process=git-lfs filter-process
filter.lfs.required=true
credential.helper=manager
user.name=thermalogic
user.email=cmh@seu.edu.cn
credential.helper=wincred
filter.lfs.smudge=git-lfs smudge -- %f
filter.lfs.process=git-lfs filter-process
filter.lfs.required=true
filter.lfs.clean=git-lfs clean -- %f
core.autocrlf=false
core.repositoryformatversion=0
core.filemode=false
core.bare=false
core.logallrefupdates=true
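
In the listing above, `core.autocrlf` appears twice, first as `true` and later as `false`. The listing combines several configuration files, and for a single-valued setting the entry listed last is the one that takes effect. The following minimal Python sketch illustrates that "last entry wins" rule on a few sample lines; the helper name `effective_settings` is hypothetical, and the sketch ignores settings that may legitimately hold several values.

```python
# Sample lines in the "key=value" form printed by `git config --list`.
listing = """\
core.autocrlf=true
user.name=thermalogic
core.autocrlf=false
"""


def effective_settings(text):
    """Map each key to the last value listed for it (hypothetical helper)."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:                      # skip lines that are not "key=value"
            settings[key] = value    # a later entry replaces an earlier one
    return settings


settings = effective_settings(listing)
print(settings["core.autocrlf"])
```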


## 3 Getting Started with Local Repo

There are 2 ways to start a Git-managed project:

* **Starting your own project**;

* Cloning an existing project from a GIT host.

Here, we shall begin with **Starting your own project**, then manage the project under Git, add files, and commit them to the repo.

### 3.1 Setup the Working Directory for a New Project

Let's start a programming project in a working directory called **gitdemo** under **D:/PySEE/**, with one source file **hello.c**.

The working directory: D:/PySEE/gitdemo


In [None]:
%cd D:/PySEE/
%mkdir gitdemo
%cd gitdemo
%pwd

The source file **hello.c** in **gitdemo**:

In [None]:
%%file ./hello.c

#include <stdio.h>  
 
int main() {                   
    printf("C Hello, world!\n");  
    return 0;                   
}                             

###  3.2 Manage the project under Git - Initialize a new local Git Repo 

To manage a project under Git, run "git init" at the project root directory 
```bash
>git init 
```

In [None]:
%cd D:/PySEE/gitdemo

In [None]:
# manage the project under Git
!git init 

A **hidden** sub-directory called **.git** will be created under your project root directory; it contains ALL Git related data.

![git-repo](./img/git-repo.jpg)

Take note that **EACH Git repo** is associated with **a project directory** (and its sub-directories). The Git repo is completely contained within the project directory. Hence, it is safe to copy, move or rename the project directory. 

### 3.3 Cloning a Project from a Remote Repo

You may use `git clone <remote-url>` to copy an existing remote project.

If we want to fork or clone an existing repository, we can use the command 
```bash
>git clone repository
```

In [None]:
!git clone https://github.com/PySEE/SEUIF97

Git clone can take a URL to a public repository, like above, or a path to a local directory:

In [None]:
!git clone gitdemo gitdemo2

## 4 Adding files, status and committing

### 4.1 Add a new file

To add a new file to the repository, we first create the file and then use the command:
```bash
>git add filename
```

In [None]:
!git add hello.c

### 4.2 Status

Using the command 
```bash
>git status
```
we get a summary of the current status of the **working directory**. It shows if we have modified, added or removed files.

In [None]:
!git status

In this case, after having added the file `hello.c`, the command `git status` lists it as a *new file* under "Changes to be committed". The file is now *staged*, but it has not yet been **committed**.

It is therefore not yet in the **repository**.

### 4.3 Commit

In [None]:
!git commit -m "added hello.c file"   hello.c

In [None]:
!git status

### 4.4 Add a Python file

In [None]:
%%file ./hello.py

print('Hello,World,Python!')

In [None]:
!git add hello.py

In [None]:
!git commit -m "added python file" hello.py

After *committing* the change to the **repository** from **the local working directory**, `git status` again reports that the working directory is clean.

In [None]:
!git status 

### 4.5 Committing changes

When files that are tracked by Git are changed, they are listed as *modified* by `git status`:

In [None]:
%%file ./hello.c

#include <stdio.h>  
 
int main() {                   
    printf("C Hello, world!\n"); 
    // new line
    printf("Test Commiting changes!\n"); 
    return 0;                   
}       

In [None]:
!git status

Again, we can commit such changes to the repository using the `git commit -m "message"` command.

In [None]:
!git commit -m "added one more line in hello.c"  hello.c

In [None]:
!git status

## 5 File Status

### 5.1 Git Storage Model

A file could be `untracked` or `tracked`.

As mentioned, Git tracks file changes at commits.

In Git, changes for a `tracked` file could be:

* 1 `unstaged` (in **Working Tree**) - called unstaged changes,

* 2 `staged` (in **Staging Area** or Index or **Cache**) - called staged changes, or

* 3 `committed` (in **local repo** object database).

**Git Storage Model**

![GitStorageModel](./img/Git_StorageModel.png)


The files in **"working tree"** or **"staging area"** could have status of `unmodified, added, modified, deleted, renamed, copied`, as reported by `"git status"`.

The `"git status"` output is divided into 3 sections:

* "Changes not staged for commit" for the unstaged changes in "working tree", 

* "Changes to be committed" for the staged changes in the "staging area", 

* and "Untracked files". 

In each section, it lists all the files that have been changed, i.e., files having a status other than unmodified.
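
The three sections can also be read from the compact output of `git status --porcelain`, where each line starts with two status characters: the first describes the staging area, the second the working tree, and `??` marks an untracked file. The following minimal Python sketch sorts such lines into the three sections. The sample lines and the function name `classify` are assumptions made for illustration, and renamed files (which use an `old -> new` path form) are not handled.

```python
# Hand-written sample lines in the two-character form of `git status --porcelain`.
sample = [
    "A  hello.py",   # new file, staged
    " M hello.c",    # modified in the working tree, not staged
    "?? tmpfile",    # untracked
]


def classify(lines):
    """Sort status lines into staged, unstaged and untracked paths."""
    sections = {"staged": [], "unstaged": [], "untracked": []}
    for line in lines:
        index_state, tree_state, path = line[0], line[1], line[3:]
        if line.startswith("!!"):        # ignored files are skipped
            continue
        if line.startswith("??"):        # untracked files
            sections["untracked"].append(path)
            continue
        if index_state != " ":           # change recorded in the staging area
            sections["staged"].append(path)
        if tree_state != " ":            # change only in the working tree
            sections["unstaged"].append(path)
    return sections


sections = classify(sample)
print(sections)
```

A file that was staged and then edited again would carry both characters (for example `MM`) and therefore appear in both the staged and the unstaged list, just as it appears in two sections of the long `git status` output.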

### 5.2 When a new file is created in the working tree, 

* it is marked as new in working tree and shown as an untracked file. 

* When the file change is staged, it is marked as new (added) in the staging area, and unmodified in working tree. 

* When the file change is committed, it is marked as unmodified in both the working tree and staging area.

![Git_FileNew](./img/Git_FileNew.png)

### 5.3 When a committed file is modified,

* it is marked as modified in the working tree and unmodified in the staging area.

* When the file change is staged, it is marked as modified in the staging area and unmodified in the working tree. 

* When the file change is committed, it is marked as unmodified in both the working tree and staging area.

![GitFileModified](./img/Git_FileModified.png)



## 6 Removing files

To remove a file that has been added to the repository, use 
```bash
>git rm filename
```
which works similarly to `git add filename`:

In [None]:
%%file ./tmpfile

A short-lived file.

Add it:

In [None]:
!git add tmpfile

In [None]:
!git commit -m "adding file tmpfile" tmpfile 

In [None]:
%ls

Remove it again:

In [None]:
!git rm tmpfile

In [None]:
!git commit -m "remove file tmpfile" tmpfile 

In [None]:
%ls

## 7 Commit logs

The messages that are added to the commit command are supposed to give a short (often one-line) description of the changes/additions/deletions in the commit. If the `-m "message"` option is omitted when invoking `git commit`, an editor will be opened for you to type a commit message (useful, for example, when a longer commit message is required). 

We can look at **the revision log** by using the command `git log`:

In [None]:
!git log

In the commit log, each revision is shown with a `timestamp`, a `unique` hash tag, `author` information and the commit message.
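
The hash tags in the log are computed from the content of the objects Git stores. The commit hash itself covers the whole commit object, which is more than fits in a short example, so the following minimal Python sketch shows the simpler case of a single file ("blob"). It assumes Git's default SHA-1 object format, in which the ID of a file is the SHA-1 of the header `blob <size>` followed by a zero byte and the file content. The function name `git_blob_id` is hypothetical.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the ID Git assigns to a file with this content (SHA-1 format assumed)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The same content always gives the same ID; any change gives a different one.
print(git_blob_id(b"test content\n"))
print(git_blob_id(b"test content!\n"))
```

If Git is installed, the result can be compared with the output of `git hash-object` for a file with the same content.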



## 8 Diffs

Every commit results in a changeset, which has a **diff** describing the changes to the files associated with it. We can use `git diff` to see what has changed in a file:

In [None]:
%%file ./hello.c

#include <stdio.h>  
 
int main() {                   
    printf("Hello, world!\n"); 
    // new line
    printf("Test Commiting change!\n");
    // new line
    printf("Test diff!\n");
    return 0;                   
}       


In [None]:
!git diff hello.c

That looks quite cryptic but is a standard form for describing changes in files. We can use other tools, like graphical user interfaces or web based systems to get a more easily understandable diff.

In **VS Code**, it can look like this:

![git-code-diff](./img/git-code-diff.jpg)
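
The text form printed by `git diff` is the "unified diff" format, and it becomes easier to read once its parts are known. As a minimal sketch, the same kind of output can be produced with Python's standard `difflib` module for two short, made-up versions of a file. The file names `a/hello.c` and `b/hello.c` are only labels chosen to resemble Git's output.

```python
import difflib

# Two versions of the same (shortened) file, one list entry per line.
old = [
    'printf("C Hello, world!\\n");',
    'return 0;',
]
new = [
    'printf("C Hello, world!\\n");',
    'printf("Test diff!\\n");',
    'return 0;',
]

diff = list(difflib.unified_diff(
    old, new, fromfile="a/hello.c", tofile="b/hello.c", lineterm=""))
print("\n".join(diff))
```

Lines starting with `---` and `+++` name the old and the new file, the `@@ ... @@` line gives the line ranges covered in each version, a leading `+` marks an added line, a leading `-` a removed line, and a leading space an unchanged context line.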


## 9 Discard changes in the working directory

To discard a change (revert to the latest version of the file in the repository) we can use the `checkout` command like this:

In [None]:
!git checkout -- hello.c

In [None]:
!git status

## 10 Branching

With branches we can create diverging code bases in the same repository. They are, for example, useful for experimental development that requires a lot of code changes that could break the functionality in the master branch. Once the development of a branch has reached a stable state, it can be merged back into the master branch.

Branching, developing and merging is a good development strategy when several people are working on the same code base. But even in single-author repositories it is often useful to always keep the master branch in a working state, to branch or fork before implementing a new feature, and to merge it back into the master branch later.

### 10.1 Create a new branch

In GIT, we can create a new branch:

In [None]:
!git branch test1

We can list the existing branches like this:

In [None]:
!git branch

And we can **switch between branches** using `checkout`:

In [None]:
!git checkout test1

Make a change in the new branch.

In [None]:
%%file ./hello.c

#include <stdio.h>  
 
int main() {                   
    printf("Hello, world!\n"); 
    // new line
    printf("Test !\n");
    // new line
    printf("Test Commiting changes!\n");
     // new line
    printf("Test test1 branch!\n");
    return 0;                   
}   


In [None]:
!git commit -m "added a line in test1 branch" hello.c

In [None]:
!git branch

In [None]:
!git checkout master

In [None]:
!git branch

The current working version of `hello.c` in the `master` branch:

In [None]:
# %load ./hello.c

#include <stdio.h>  
 
int main() {                   
    printf("C Hello, world!\n"); 
    // new line
    printf("Test Commiting changes!\n"); 
    return 0;                   
}       

Return to the `test1` branch:

In [None]:
!git checkout test1

In [None]:
!git branch

The current working version of `hello.c` in the `test1` branch:

In [None]:
# %load ./hello.c

#include <stdio.h>  
 
int main() {                   
    printf("Hello, world!\n"); 
    // new line
    printf("Test !\n");
    // new line
    printf("Test Commiting changes!\n");
     // new line
    printf("Test test1 branch!\n");
    return 0;                   
}   

### 10.2  Merge the existing branch 

We can merge an existing branch and all its changesets into another branch (for example the master branch) like this:

**1** First change to the target branch:

In [None]:
!git checkout master

**2** Then merge `test1` into the target branch:

In [None]:
!git merge test1

In [None]:
!git branch 

The current working version of `hello.c` in the `master` branch after merging the `test1` branch:

In [None]:
# %load ./hello.c

#include <stdio.h>  
 
int main() {                   
    printf("Hello, world!\n"); 
    // new line
    printf("Test !\n");
    // new line
    printf("Test Commiting changes!\n");
     // new line
    printf("Test test1 branch!\n");
    return 0;                   
}   

You can delete the branch `test1` now that it has been merged into `master`:

In [None]:
!git branch -d test1

In [None]:
!git branch

## 11 Pulling and pushing change sets between repositories

### 11.1 Hosted repositories

[GitHub.com](https://github.com) is a git repository hosting site that is very popular with both open source projects (for which it is free) and private repositories (for which a subscription might be needed).

With a hosted repository it is easy to collaborate with colleagues on the same code base, and you get a graphical user interface where you can browse the code and look at commit logs, track issues etc. 

Create a repository on GitHub with the same name as your local repo.

**Note:** the repository must be empty, without a README.md.

#### 1 Add the repo at GitHub as the remote origin of the local git repo

In [None]:
!git remote add origin https://github.com/your-username/gitdemo.git 

In [None]:
!git remote

#### 2 push

After making changes to our local repository, we can push the changes to a remote repository using `git push`. Here the target repository is the remote named `origin` that was added above, so we can do:

In [None]:
!git status

In [None]:
%%file ./hello.py

print('Hello,World,Python!')
print('Test Push!')

In [None]:
!git add hello.py

In [None]:
!git commit -m "added a line to hello.py" hello.py

In [None]:
!git push -u origin master

#### 3 Pull

We can retrieve updates from the origin repository by "pulling" changesets from "origin" to our repository:

In [None]:
!git pull origin

To save bandwidth, you may make a shallow clone of a single branch of the repository:

In [None]:
!git clone --depth 1 -b master https://github.com/your-username/gitdemo.git 

## 12 Graphical user interfaces

There are also a number of graphical user interfaces for Git. 

* **GitHub Desktop** 
  [Download here](https://desktop.github.com/)
 
  
* **Git in Visual Studio Code** 
  If you have installed Git, you can use the Git support built into Visual Studio Code. [Download here](https://code.visualstudio.com/)   

We strongly recommend that you use version control for your projects.

## Further reading

* http://git-scm.com/book

* How to get started with GIT and work with GIT Remote Repo  http://www3.ntu.edu.sg/home/ehchua/programming/howto/Git_HowTo.html
   
* Scott Chacon, Ben Straub. Pro Git. https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control

* Liao Xuefeng. Git Tutorial (in Chinese). http://www.liaoxuefeng.com/wiki/0013739516305929606dd18361248578c67b8067c8c017b000

* Zhihu: How to use GitHub (in Chinese). http://www.zhihu.com/question/20070065