# What is version control? Why do we need it?
- Keeps track of all changes in an organized way
- A must for collaboration

# What is `git`? 

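# How does it work?
- Saves changes as a chain of snapshots (commits)
- Changes the files in your working folder when you switch between saved states
- Existing commits are not edited in place; new commits are added on top of them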

# `git` workflow diagram

<img src="img/git-workflow.png">

Open a terminal and run

    git

to see if it is already installed. If it is not, go to https://git-scm.com/downloads and download a release for your system.

In [1]:
!git

usage: git [--version] [--help] [-C <path>] [-c name=value]
           [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
           [-p | --paginate | --no-pager] [--no-replace-objects] [--bare]
           [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
           <command> [<args>]

These are common Git commands used in various situations:

start a working area (see also: git help tutorial)
   clone      Clone a repository into a new directory
   init       Create an empty Git repository or reinitialise an existing one

work on the current change (see also: git help everyday)
   add        Add file contents to the index
   mv         Move or rename a file, a directory, or a symlink
   reset      Reset current HEAD to the specified state
   rm         Remove files from the working tree and from the index

examine the history and state (see also: git help revisions)
   bisect     Use binary search to find the commit that introduced a bug
  

# Set up your credentials
    git config --global user.email "email"
    git config --global user.name "username"
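
To check what `git` will record as the author of your commits, you can read the settings back. The sketch below is only an illustration: it works in a throwaway folder and stores the settings at repo level (no `--global`), so it does not touch your real configuration. The name and email are placeholders.

```shell
set -e
repo=$(mktemp -d)               # throwaway folder, an assumption of this sketch
git init -q "$repo"
cd "$repo"

# Without --global the setting is stored for this repo only (in .git/config)
git config user.name "Jane Doe"
git config user.email "jane@example.com"

# Read the values back; git uses them as the author of new commits
name=$(git config --get user.name)
email=$(git config --get user.email)
echo "$name <$email>"
```

Repo-level settings take precedence over the global ones, which is handy if you need a different email for one particular project.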

# Create a repo

`git init` creates a repo in the current folder; `git init <folder name>` creates it in the given folder instead, as in the cell below. A repo is essentially a folder with a hidden `.git` subfolder, in which `git` keeps the history and its own settings.

In [27]:
!git init test_repo

Initialised empty Git repository in /media/sergey/DATA/Projects/advanced-python-training/test_repo/.git/


In [28]:
%cd test_repo

/media/sergey/DATA/Projects/advanced-python-training/test_repo
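
If you are curious where `git` keeps its information, you can look inside the new repo. This is a minimal sketch in a temporary folder; the folder name `demo_repo` is made up for the example.

```shell
set -e
repo=$(mktemp -d)/demo_repo     # hypothetical location for the demo repo
git init -q "$repo"

ls -A "$repo"                   # the only entry is the hidden .git folder
ls "$repo/.git"                 # HEAD, config, objects, refs, among others

# Inside a repo, git can report where the repo data lives
cd "$repo"
gitdir=$(git rev-parse --git-dir)
echo "$gitdir"
```

Deleting the `.git` folder turns the repo back into an ordinary folder and removes the whole history, so leave it alone in real projects.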


`git status` tells you which files have changed since the last commit (the last saved state of the repo).

In [29]:
!git status

On branch master

Initial commit

nothing to commit (create/copy files and use "git add" to track)


Create a file `some_text.txt` inside the repo folder:

In [30]:
%%file some_text.txt
This is a file containing some text.

Writing some_text.txt


Now `git status` reports the new file as untracked, i.e. `git` sees it but does not yet keep its history:

In [31]:
!git status

On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	some_text.txt

nothing added to commit but untracked files present (use "git add" to track)


To commit this change to the project, you first need to "stage" the change using `git add`. You can add separate files (or folders) or all changes by using a dot: `git add .`

In [32]:
!git add some_text.txt

In [35]:
!git status

On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   some_text.txt



Then you can commit the change. Always add a commit message to the change, describing, as best you can, the changes you made in this commit.

In [36]:
!git commit -m"Added a txt file for testing"

[master (root-commit) 552f5c8] Added a txt file for testing
 1 file changed, 1 insertion(+)
 create mode 100644 some_text.txt


In [37]:
!git status

On branch master
nothing to commit, working directory clean
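
Staging lets you choose what goes into a commit. The following sketch, run in a throwaway repo with made-up file names and a placeholder identity, stages only one of two new files and uses the compact `git status --short` view, where `A` marks a staged new file and `??` an untracked one.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Jane Doe"            # placeholder identity for the demo
git config user.email "jane@example.com"

echo "keep me" > notes.txt
echo "scratch" > draft.txt

git add notes.txt                # stage only one of the two files
staged=$(git status --short)     # A = staged new file, ?? = untracked
echo "$staged"

git commit -q -m "Add notes"
tracked=$(git ls-files)          # files that git now tracks
echo "$tracked"
```

After the commit, `draft.txt` is still untracked and was not saved in the history.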


`git log` prints the history of commits in this project. Use the `-n <number>` flag to limit the number of commits displayed.

In [38]:
!git log

commit 552f5c80e1d8d1528b6993ccdd259bf8839e29c6
Author: santopolskiy <s.antopolskiy@camlintechnologies.com>
Date:   Fri Jun 29 10:34:07 2018 +0200

    Added a txt file for testing


Let's modify the file we had:

In [43]:
%%file some_text.txt
This is a file containing some text. Now I added some more text.

Overwriting some_text.txt


Checking the status we can see that git detected the change:

In [45]:
!git status

On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   some_text.txt

no changes added to commit (use "git add" and/or "git commit -a")


`git diff` can be used to examine the exact lines changed in each file.

In [46]:
!git diff

diff --git a/some_text.txt b/some_text.txt
index d10d321..c9c9961 100644
--- a/some_text.txt
+++ b/some_text.txt
@@ -1 +1 @@
-This is a file containing some text.
\ No newline at end of file
+This is a file containing some text. Now I added some more text.
\ No newline at end of file


You can commit this change by staging it with `git add` and then running `git commit`. This adds another step to the chain of commits. However, if you don't like the change and want to return to the previous step, you can use `git reset --hard`. A hard reset discards all uncommitted changes to tracked files, staged or not, so be careful. Plain `git reset` (without `--hard`) only unstages the changes you staged before and leaves the files themselves untouched.

In [47]:
!git reset --hard

HEAD is now at 552f5c8 Added a txt file for testing


Look at the file we just changed and see that the change was reverted (`cat` is a shell command that prints the contents of a file):

In [54]:
!cat some_text.txt

This is a file containing some text.
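
The difference between `git diff`, `git reset` and `git reset --hard` is easiest to see side by side. This is a small sketch in a throwaway repo, with placeholder file contents and identity; it is not part of the course repo.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Jane Doe"            # placeholder identity for the demo
git config user.email "jane@example.com"
echo "first version" > some_text.txt
git add some_text.txt
git commit -q -m "Add a txt file"

echo "second version" > some_text.txt
git add some_text.txt
# Once a change is staged, plain `git diff` no longer lists it
unstaged=$(git diff --name-only)
staged=$(git diff --staged --name-only)

git reset -q                     # unstage; the edit stays in the file
after_reset=$(cat some_text.txt)

git reset -q --hard              # discard the edit itself
after_hard=$(cat some_text.txt)
echo "$after_reset -> $after_hard"
```

In other words, `git diff` compares your files with the staging area, while `git diff --staged` compares the staging area with the last commit.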

# Pushing to the remote repo
- Create an account on github.com
- Go to Your Profile -> Repositories, click on New
- Create a repo with name `test_repo`
- Copy its address (Clone or download green button -> then copy the address)

Then run:

    git remote add origin <copied address>
    git push -u origin master

Enter your GitHub credentials. If everything works, refresh the repo page and check that the files have appeared there. Also take a look at the "commits" page, where you should see your commit message.

The `-u` flag records `origin/master` as the upstream of your local `master` branch, so from now on you only need to type `git push` to push this branch.

# Cloning a repo

Go to: https://github.com/antopolskiy/advanced-python-training and copy the repo address (under green button "Clone or download").

In the console, navigate to the folder where you want to have a directory with course materials. Then run:

    git clone <repo address>
    
This will download all of the repo into a dedicated folder.
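
If you want to practice pushing and cloning without a GitHub account, a local "bare" repo can stand in for the remote. The sketch below is written under that assumption: `remote.git` and `cloned_repo` are made-up names, and the branch name is read from `git` because newer installations may call the first branch `main` instead of `master`.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git          # stand-in for the repo on GitHub

git init -q test_repo
cd test_repo
git config user.name "Jane Doe"        # placeholder identity for the demo
git config user.email "jane@example.com"
echo "some text" > some_text.txt
git add .
git commit -q -m "Add a txt file"

branch=$(git rev-parse --abbrev-ref HEAD)   # master, or main on newer installs
git remote add origin "$work/remote.git"
git push -q -u origin "$branch"        # -u remembers origin/<branch> as upstream
git push -q                            # so a plain `git push` works afterwards

cd "$work"
git clone -q remote.git cloned_repo    # a clone gets `origin` set up automatically
clone_log=$(git -C cloned_repo log --oneline)
echo "$clone_log"
```

The clone contains the same commit as the original repo, which is all that pushing and cloning really do: they copy commits between repos.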

*(Now you can run *`jupyter notebook`* in the same console and open this notebook for yourself)*

You now know how to:

- create a new repo
- commit files to the repo, adding useful commit messages about the changes
- connect that repo to the remote repo on your github account and push updates there
- look at the changes you made since the last commit
- reset unwanted (e.g. temporary) changes before they are committed
- clone another remote repository

>*Note that instead of creating a repo locally with* `git init` *and then connecting it to an existing repo on GitHub, you can create a repo on GitHub and then clone it. The remote address will then be set up automatically.*

# `git` collaboration workflow

### Branches

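Why do we need branches? A branch lets you develop a feature or try out an idea without touching the code on `master` until you are ready to merge.

Right now you are on the `master` branch. `git branch <branch name>` creates a new branch from the current one: the new branch starts at the same commit, so initially it is an exact copy of the current branch.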

In [57]:
!git branch test_branch

To switch to the new branch, use `git checkout <branch name>`

In [59]:
!git checkout test_branch

Switched to branch 'test_branch'


Let's create a new file on that branch

In [60]:
%%writefile helloworld.py
print('Hello World!')

Writing helloworld.py


In [61]:
!git status

On branch test_branch
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	helloworld.py

nothing added to commit but untracked files present (use "git add" to track)


In [62]:
!git add .

In [63]:
!git commit -m"added helloworld.py"

[test_branch c863e0d] added helloworld.py
 1 file changed, 1 insertion(+)
 create mode 100644 helloworld.py


When you push a new branch to the remote repo for the first time, you need to name the remote and the branch explicitly, because `git` does not yet know which remote branch the new local branch is associated with. In our case there is just one remote, called `origin`, and the remote branch gets the same name as the local branch (this is usually the case).

In [81]:
!git push origin test_branch

Everything up-to-date


On GitHub, check that the new branch is available.

After you have finished developing a feature on that branch, you can merge it into `master` *(big projects usually have an additional `devel` branch, into which you merge first, while `master` is reserved for stable versions)*. You can merge locally or on GitHub; let's do it locally.

`git merge` merges the branch you specify into the **current** branch. To merge `test_branch` into `master`, we therefore need to switch to `master` first:

In [83]:
!git checkout master

Switched to branch 'master'


In [84]:
!git merge test_branch

Updating 552f5c8..c863e0d
Fast-forward
 helloworld.py | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 helloworld.py


In [86]:
!git log

commit c863e0d2ff2390ae65e25884860da3d3ea0f0313
Author: santopolskiy <s.antopolskiy@camlintechnologies.com>
Date:   Fri Jun 29 13:09:48 2018 +0200

    added helloworld.py

commit 552f5c80e1d8d1528b6993ccdd259bf8839e29c6
Author: santopolskiy <s.antopolskiy@camlintechnologies.com>
Date:   Fri Jun 29 10:34:07 2018 +0200

    Added a txt file for testing


This is a super-easy merge, because the changes on the two branches do not conflict: they touch completely different files. But what happens if the changes are to existing files, or conflict with changes on other branches?

In the simple cases, `git` will try to merge the changes together. For example, let's make a change on our test branch to the original `.txt` file:

In [94]:
!git checkout test_branch

Switched to branch 'test_branch'


In [95]:
%%file some_text.txt
I changed the content of the file!

Overwriting some_text.txt


In [96]:
!git add .

In [97]:
!git commit -m"changes to the txt file"

[test_branch a34defa] changes to the txt file
 1 file changed, 1 insertion(+), 1 deletion(-)


In [98]:
!git checkout master

Switched to branch 'master'


In [99]:
!git merge test_branch

Updating 8a8f5df..a34defa
Fast-forward
 some_text.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


We barely noticed this merge. `git` does not compare dates here: every commit records its parent, so `git` can see that `master` has no commits that are missing from `test_branch`. The history of `test_branch` simply continues from where `master` stopped, so `git` moves `master` forward to the newest commit (a "fast-forward") and treats the replaced file content as an intentional change.
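
Whether a merge can be a fast-forward can be checked before merging. The sketch below assumes a throwaway repo with placeholder contents; it uses `git merge-base --is-ancestor A B`, which exits with status 0 when commit `A` is an ancestor of `B`, and `git merge --ff-only`, which refuses to merge unless a fast-forward is possible.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Jane Doe"            # placeholder identity for the demo
git config user.email "jane@example.com"
echo "one" > some_text.txt
git add .
git commit -q -m "first"
base=$(git rev-parse --abbrev-ref HEAD)    # master, or main on newer installs

git checkout -q -b test_branch
echo "two, with more text" > some_text.txt
git commit -q -am "second"

# Status 0 means: every commit on $base is already part of test_branch
if git merge-base --is-ancestor "$base" test_branch; then
    ff="fast-forward possible"
else
    ff="real merge needed"
fi
echo "$ff"

git checkout -q "$base"
git merge -q --ff-only test_branch         # only succeeds as a fast-forward
```

After the merge both branch names point at the same commit; no extra merge commit was created.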

Another scenario: there were some changes on the `master` which conflict with the `test_branch`. This often happens if you work with other people, who will merge their changes into `master`.

In [105]:
%%file some_text.txt
This is the master version of the .txt file

Overwriting some_text.txt


In [107]:
!git add .
!git commit -m"changes to the txt file"

[master d184f45] changes to the txt file
 1 file changed, 1 insertion(+), 1 deletion(-)


In [108]:
!git checkout test_branch

Switched to branch 'test_branch'


In [109]:
%%file some_text.txt
This is the test_branch version of the .txt file

Overwriting some_text.txt


In [110]:
!git add .
!git commit -m"changes to the txt file"

[test_branch c7b1a8d] changes to the txt file
 1 file changed, 1 insertion(+), 1 deletion(-)


In [111]:
!git checkout master

Switched to branch 'master'


In [112]:
!git merge test_branch

Auto-merging some_text.txt
CONFLICT (content): Merge conflict in some_text.txt
Automatic merge failed; fix conflicts and then commit the result.


In [113]:
!git status

On branch master
You have unmerged paths.
  (fix conflicts and run "git commit")

Unmerged paths:
  (use "git add <file>..." to mark resolution)

	both modified:   some_text.txt

no changes added to commit (use "git add" and/or "git commit -a")


Open the `some_text.txt` file; you should see this:

In [114]:
cat some_text.txt

<<<<<<< HEAD
This is the master version of the .txt file
=======
This is the test_branch version of the .txt file
>>>>>>> test_branch


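This shows you both versions and asks you to resolve the conflict. The lines under `<<<<<<< HEAD` come from the current branch (`master`), and the lines above `>>>>>>> test_branch` come from the branch being merged. To resolve the conflict, edit the file so that it looks the way you want it to look after the merge, and remove the marker lines. In my case I don't want to discard any of the content and want both strings to remain, so I change the file like this: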

In [115]:
%%file some_text.txt
This is the master version of the .txt file
This is the test_branch version of the .txt file

Overwriting some_text.txt


Now we need to commit these changes to finish the merge:

In [118]:
!git add .
!git commit -m"merged test_branch into master, conflict resolved"

[master a0d1839] merged test_branch into master, conflict resolved


This is how you resolve a conflict in the merge from the command line, which is the most low-level way. 
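
The whole conflict cycle can be reproduced in a few lines if you want to practice it. This is a sketch under stated assumptions: a throwaway repo, placeholder contents and identity, and a first branch whose name is read from `git` rather than hard-coded.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Jane Doe"            # placeholder identity for the demo
git config user.email "jane@example.com"
echo "original" > some_text.txt
git add .
git commit -q -m "first"
base=$(git rev-parse --abbrev-ref HEAD)    # master, or main on newer installs

git checkout -q -b test_branch
echo "test_branch version" > some_text.txt
git commit -q -am "branch edit"
git checkout -q "$base"
echo "base version" > some_text.txt
git commit -q -am "base edit"

# A conflicting merge exits with a non-zero status, so guard it under `set -e`
git merge test_branch >/dev/null 2>&1 || echo "conflict"
markers=$(grep -c '^=======$' some_text.txt)   # separator between the two versions

# Resolve: write the final content, stage it, commit
printf 'base version\ntest_branch version\n' > some_text.txt
git add some_text.txt
git commit -q -m "Merge test_branch, keep both lines"
parents=$(git log -1 --format=%p | wc -w)      # a merge commit has two parents
```

The commit that finishes the merge has two parents, one from each branch, which is what makes the fork and join visible in `git log --graph`.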

There are `git` GUIs which let you do this visually, without editing files manually: they show you the changes and the conflicts and let you choose one of the versions, keep both, and so on. One such tool is Atom, which we will look at later. There are others, from very basic to highly sophisticated, made to work with huge projects and many branches. A list of GUIs can be found on the official `git` website: https://git-scm.com/downloads/guis

# Using the power of version control

In this part:

- navigating the commit tree with `git log`
- looking at the log diagram
- checking out commits and creating branches from commits

Look at the log in the console, or go to GitHub and look at the log there:

In [126]:
!git log -n 3

commit a0d1839f7063476bf888cee34ea1c370db8f380e
Merge: d184f45 c7b1a8d
Author: santopolskiy <s.antopolskiy@camlintechnologies.com>
Date:   Fri Jun 29 13:49:42 2018 +0200

    merged test_branch into master, conflict resolved

commit c7b1a8d81275a3fa82fc64d550cd7ebf255a14bf
Author: santopolskiy <s.antopolskiy@camlintechnologies.com>
Date:   Fri Jun 29 13:30:29 2018 +0200

    changes to the txt file

commit d184f45285b04c619d4ac1beb4b2e748f5bdc55d
Author: santopolskiy <s.antopolskiy@camlintechnologies.com>
Date:   Fri Jun 29 13:30:14 2018 +0200

    changes to the txt file


`git log` has some useful options, like `--oneline` and `--graph`, which can help you depending on what you want to do:

In [132]:
!git log -n 5 --oneline

a0d1839 merged test_branch into master, conflict resolved
c7b1a8d changes to the txt file
d184f45 changes to the txt file
a34defa changes to the txt file
8a8f5df changes to the txt file


Look at how branching looks on the log graph:

In [133]:
!git log -n 5 --graph --oneline

*   a0d1839 merged test_branch into master, conflict resolved
|\  
| * c7b1a8d changes to the txt file
* | d184f45 changes to the txt file
|/  
* a34defa changes to the txt file
* 8a8f5df changes to the txt file


# Navigating the commit tree
You can check out any commit the same way you check out a branch, using the hash of the commit (an unambiguous prefix of the hash, such as `a34d`, is enough):

In [134]:
!git checkout a34d

Note: checking out 'a34d'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at a34defa... changes to the txt file


`git` gives you a very helpful message when you do that: the state of the file system is now not attached to any branch, it is "detached" from everything ("detached HEAD"). If you want to keep making changes from this point, create a new branch from this commit by running:

    git branch <new-branch-name>
    git checkout <new-branch-name>
    
Or a shortcut ("a combo") for these 2 commands:

    git checkout -b <new-branch-name>
    
Checking out commits can be extremely useful when you want to compare the current version of your code with an old version.

To go back to the tip of one of your normal branches, use `git checkout <branch name>`, just as you normally would to switch branches.
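
The round trip (visit an old commit, optionally start a branch there, return to the tip) can be sketched as follows. Everything here is illustrative: the repo is a throwaway one, and `fix_from_old` is a made-up branch name.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Jane Doe"              # placeholder identity for the demo
git config user.email "jane@example.com"
echo "old" > some_text.txt
git add .
git commit -q -m "old version"
old=$(git rev-parse --short HEAD)            # hash of the first commit
base=$(git rev-parse --abbrev-ref HEAD)      # master, or main on newer installs
echo "newer" > some_text.txt
git commit -q -am "new version"

git checkout -q "$old"                       # detached HEAD at the old commit
detached=$(git rev-parse --abbrev-ref HEAD)  # reports HEAD when detached
old_content=$(cat some_text.txt)

git checkout -q -b fix_from_old              # keep working from here on a branch
git checkout -q "$base"                      # back to the tip of the first branch
echo "$detached $old_content $(cat some_text.txt)"
```

The files in the folder change with every checkout, while the commits themselves stay untouched.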

# Contributing to an open-source project

## Forking

Working with Python means working with open-source projects. At one point or another you will want to change something in one of these projects to fit your needs, or even contribute to it. GitHub makes this very easy to do in a structured way. We will try it now.

Go to the course repo (https://github.com/antopolskiy/advanced-python-training) and click the "Fork" button at the top right. This will create a copy of the repo on your GitHub account (by the way, you can also use a fork to keep your own copy of a repo if you are afraid it will change in the future). Go back to your repos and copy the address of the fork. Then clone it using `git clone <repo address>`.

Create a file `fork_<your name>.txt`. Stage the file and commit the change. Use `git push` to push the file to your remote fork.

## Creating a pull request

Now you can navigate to your fork on GitHub and click the "New pull request" button. There you can select the branch of the original repo into which you would like to merge your changes. If everything is fine, write a message to the maintainers of the original repo (usually explaining what the change does and why you want it) and submit the pull request. This creates a page in the "Pull requests" section of the original repo, where other people can see the changes you made and discuss them. If everything goes well, your change will be merged into the project and you will officially become a contributor!

# Other details

## `.gitignore`

`.gitignore` is a special file in the repo directory which tells `git` which files to ignore completely when tracking. This is useful for:
- Big files in the repo (e.g. data files) which you don't want to commit
- Test files which you don't need to commit
- Files produced by scripts (e.g. figures for the analysis)
- Other kinds of temporary files which it doesn't make sense to commit

How to do it:
Simply create a file called `.gitignore` (yes, the name starts with a dot and there is nothing before it). Edit the file with a text editor. Each line is one ignore statement, which can be:
- a file name
- a folder name (add `/` at the end of the name)
- a name pattern, for example using the "wildcard character" `*`; in this case `*test*` means that any file or folder which has `test` in its name will be ignored

For more advanced patterns, see documentation for `gitignore`: https://git-scm.com/docs/gitignore
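
To see which files a set of patterns actually catches, you can ask `git` directly with `git check-ignore`. The sketch below uses a throwaway repo, and all file and folder names in it are made up for the example.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# One pattern per line: a file name, a folder, and a wildcard pattern
printf '%s\n' 'secrets.txt' 'data/' '*test*' > .gitignore

mkdir data
touch secrets.txt data/big.csv my_test_file.py analysis.py

# `git check-ignore` prints those of the given paths that match a pattern
ignored=$(git check-ignore secrets.txt data/big.csv my_test_file.py analysis.py)
echo "$ignored"

visible=$(git status --short)    # ignored files are not listed as untracked
echo "$visible"
```

Note that `.gitignore` itself is a normal file: it shows up in `git status` and is usually committed, so that everyone working on the project ignores the same files.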