![IE](../img/ie.png)

# Session 1: Introduction to git

### Juan Luis Cano Rodríguez <jcano@faculty.ie.edu> - Master in Business Analytics and Big Data

### Overview

git is a **version control system** that tracks changes in our code (and, in fact, in any text file), allowing us to go back to any previous state and to compare any two states.

### References

* Pro Git https://git-scm.com/book/en/v2/
* Changing history, or How to Git pretty http://justinhileman.info/article/changing-history/

### Glossary

* **Repository**: Directory tracked by git. It contains a `.git` folder and is created by `$ git init`
* **Commit**: State or snapshot of the repository. Commits are created by `$ git commit`
* **Branch**: A parallel or separate line of development. The default one is `master` (some setups, such as new GitHub repositories, call it `main` instead), and new branches are created by `$ git branch` or `$ git checkout -b`

![Branches](https://git-scm.com/book/en/v2/images/advance-master.png)

### Usage on Windows

The Git installer for Windows comes with [MSYS2](http://www.msys2.org/), "a software distro and building platform for Windows" that, among other things, provides a Linux-like command line interface. The advantage is that Linux is the operating system used in 96 % of the world's servers, 100 % of TOP500 supercomputers, and 90 % of all Cloud infrastructure[[1](https://www.zdnet.com/article/can-the-internet-exist-without-linux/)][[2](https://itsfoss.com/linux-runs-top-supercomputers/)][[3](https://www.cbtnuggets.com/blog/certifications/open-source/why-linux-runs-90-percent-of-the-public-cloud-workload)], so learning it is a very good use of one's time.

<div class="alert alert-info">However, using an existing Anaconda installation from Git Bash requires a few extra steps.</div>

1. Find out where conda is installed, by opening the Anaconda prompt and typing `> where conda`.
2. Look for the `Scripts\conda.exe` path, and translate it to Linux-like syntax by replacing `C:\` with `/c/` and all backslashes `\` with forward slashes `/`.
3. On Git Bash, run `$ /c/TranslatedPath/Anaconda3/Scripts/conda.exe init bash`, then close and reopen the terminal, as requested in the output of the command.
4. Check that conda works now on Git Bash by running `$ conda --version`.
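Step 2 can also be done from Git Bash itself. This is a minimal sketch that assumes conda lives on the `C:` drive; the Windows path below is hypothetical, so paste the one that `where conda` printed for you:

```shell
# Hypothetical path, as printed by `where conda` in the Anaconda prompt
win_path='C:\Users\me\Anaconda3\Scripts\conda.exe'

# Replace the leading "C:" with "/c", then turn every backslash into a forward slash
unix_path=$(printf '%s\n' "$win_path" | sed -e 's|^C:|/c|' -e 's|\\|/|g')
echo "$unix_path"   # /c/Users/me/Anaconda3/Scripts/conda.exe
```

Git Bash also ships the `cygpath` utility, and `cygpath -u "$win_path"` should give the same result.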

### Linux command line 101

* `whoami` (who am I)
* `pwd` (print working directory)
* `ls` (list): display contents of current directory
  - `ls --color`: show color
  - `ls -a`: show all files, also hidden ones (those starting with `.`)
  - Two special directories: `.` (current) and `..` (parent)
* `cd`: change directory
* `touch`: create empty file
* `nano`: edit a file from the command line
  - Advanced alternative: `vim`
* `cat` (concatenate): print file contents
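A short session that tries these commands out. It works in a scratch directory created with `mktemp -d` (so nothing in your home directory is touched) and uses `echo` instead of `nano` so that it can be pasted as is:

```shell
cd "$(mktemp -d)"          # move into a fresh, empty scratch directory
whoami                     # print the current user name
pwd                        # print the working directory (the scratch directory)
touch notes.txt .hidden    # create two empty files, one of them hidden
ls                         # shows only notes.txt
ls -a                      # also shows ., .. and .hidden
echo "hello" > notes.txt   # write a line into the file without opening an editor
cat notes.txt              # prints: hello
```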

### Workflow

Before your first commit, tell git who you are (this only needs to be done once per computer): https://help.github.com/en/articles/setting-your-username-in-git#setting-your-git-username-for-every-repository-on-your-computer
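In practice this boils down to two `git config --global` calls, one for the name and one for the email. The values below are placeholders, replace them with your own:

```shell
# Placeholder identity (hypothetical values): use your real name and email
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# Check what git will record as the author of your commits
git config --global user.name
git config --global user.email
```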

1. Create a directory `$ mkdir test_project` and navigate there `$ cd test_project`
2. Init a git repository `$ git init`
3. Check status `$ git status` ("on branch master, no commits yet, nothing to commit")
4. Create some files `$ nano README.txt`
5. Stage the files `$ git add README.txt`
6. Commit the changes `$ git commit -m "First commit"`
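The same steps as a script. This sketch runs in a throwaway directory, writes the file with `echo` instead of `nano`, and sets a hypothetical identity for this repository only (skip those two lines if you already configured git globally):

```shell
cd "$(mktemp -d)"                       # throwaway location for this demo
mkdir test_project && cd test_project   # 1. create the directory and enter it
git init                                # 2. creates the hidden .git folder
git config user.name "Example User"     # hypothetical identity, this repo only
git config user.email "user@example.com"
git status                              # 3. no commits yet, nothing to commit
echo "My first project" > README.txt    # 4. create a file non-interactively
git status                              # README.txt now appears as untracked
git add README.txt                      # 5. stage the file
git commit -m "First commit"            # 6. record the snapshot
git log --oneline                       # a history with a single commit
```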

Summary:

![Workflow](https://git-scm.com/book/en/v2/images/lifecycle.png)

<div class="alert alert-warning">Do not run <code>git init</code> in your home directory, as it can lead to confusion and potential data loss. If <code>git status</code> lists many untracked files unrelated to your project, you might want to run <code>rm -rf .git</code> and start over in another directory. Note that this command removes all git history.</div>
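Before deleting anything, it helps to check which directory git considers the root of the current repository: `git rev-parse --show-toplevel` prints it (and fails outside any repository). If it prints your home directory, that is where the stray `.git` folder lives. The directory names in this sketch are hypothetical:

```shell
cd "$(mktemp -d)"
mkdir -p demo/src
cd demo && git init -q && cd src
# Prints the absolute path of the repository root, which ends in /demo
git rev-parse --show-toplevel
```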

### Branching

1. Create **and** switch to a new branch `$ git switch -c new-branch`
2. Commit there (see above)
3. Go back to main branch `$ git switch master`
4. Merge changes `$ git merge new-branch`
5. Delete branch `$ git branch -d new-branch` (don't forget this step!)
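The whole branching cycle in one go. This is a sketch in a scratch repository with a hypothetical identity; `git init -b master` fixes the name of the main branch (it needs Git 2.28 or newer), and `git commit -am` stages and commits the already tracked file in one step:

```shell
cd "$(mktemp -d)"
git init -q -b master branching-demo && cd branching-demo
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo "first line" > README.txt
git add README.txt && git commit -q -m "First commit"

git switch -c new-branch                   # 1. create and switch to the new branch
echo "second line" >> README.txt           # 2. change something and commit it there
git commit -q -am "Add second line"
git switch master                          # 3. go back to the main branch
git merge new-branch                       # 4. merge the changes (a fast-forward here)
git branch -d new-branch                   # 5. delete the merged branch
git log --oneline
```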

Normally, the `git merge` step happens online using [pull requests](https://help.github.com/en/articles/about-pull-requests) or [merge requests](https://docs.gitlab.com/ee/user/project/merge_requests/index.html), which are **not** git concepts, but GitHub/GitLab concepts.

<div class="alert alert-info">If <code>git switch</code> does not work for you, you might have an older version of Git. Consider upgrading, or alternatively replace all <code>git switch -c</code> with <code>git checkout -b</code>.</div>

### Merging

Two types of git merging:

* **Fast-forward merge**: There is no diverging history, and git just "advances the pointer" of the current branch
  - `$ git merge new-branch --ff-only` will fail if a fast-forward merge is not possible
* **Non fast-forward merge**: The history diverged, and git will create a merge commit (hence ask for a commit message) with two parents that combines the two branches
  - `$ git merge new-branch --no-ff` always creates a merge commit even if a fast-forward merge is possible
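One way to see the difference is to count the parents of the resulting commit with `git rev-list --parents -n 1 HEAD`, which prints a commit hash followed by the hashes of its parents. A sketch using empty commits and hypothetical branch names:

```shell
cd "$(mktemp -d)"
git init -q -b master merge-demo && cd merge-demo
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial commit"

# Fast-forward: master has not moved since feature-a branched off
git switch -q -c feature-a
git commit -q --allow-empty -m "Work on feature A"
git switch -q master
git merge --ff-only feature-a              # just moves master forward
git rev-list --parents -n 1 HEAD           # commit hash followed by ONE parent

# Same situation, but --no-ff forces a merge commit
git switch -q -c feature-b
git commit -q --allow-empty -m "Work on feature B"
git switch -q master
git merge --no-ff -m "Merge feature-b" feature-b
git rev-list --parents -n 1 HEAD           # commit hash followed by TWO parents
```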

Non fast-forward merges can end up in conflicts. In that case, git halts the merge operation and leaves conflict markers in the affected files, like this:

```
$ cat README.txt
If you have questions, please
<<<<<<< HEAD
open an issue
=======
ask your question in IRC.
>>>>>>> branch-a
```

* To abort a merge `$ git merge --abort` (useful if we are scared and don't know what to do)
* To resolve every conflicting hunk in favor of the incoming branch `$ git merge new-branch --strategy-option theirs` (non-conflicting changes from both branches are still merged)
* To resolve every conflicting hunk in favor of the current branch `$ git merge new-branch --strategy-option ours`
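The conflict shown above can be reproduced in a scratch repository. In this sketch (hypothetical identity and file contents), two branches rewrite the same line, the first merge attempt stops and is aborted, and the second one resolves the conflict in favor of `branch-a` with `-X theirs`, the short form of `--strategy-option theirs`:

```shell
cd "$(mktemp -d)"
git init -q -b master conflict-demo && cd conflict-demo
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
printf 'If you have questions, please\nask for help.\n' > README.txt
git add README.txt && git commit -q -m "Add README"

# Two branches edit the same line in different ways
git switch -q -c branch-a
printf 'If you have questions, please\nask your question in IRC.\n' > README.txt
git commit -q -am "Point to IRC"
git switch -q master
printf 'If you have questions, please\nopen an issue\n' > README.txt
git commit -q -am "Point to issues"

git merge branch-a || true     # stops with a conflict; `|| true` lets the script go on
cat README.txt                 # should show the same markers as in the example above
git merge --abort              # back to the state before the merge

# Retry, resolving every conflicting hunk in favor of branch-a
git merge -X theirs -m "Merge branch-a" branch-a
cat README.txt
```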

**Be careful** while editing files that are in conflict. [Use your editor](https://www.jetbrains.com/help/pycharm/resolving-conflicts.html).

### Other

* Ignoring files: `$ nano .gitignore` (this file has to be committed to the repository as well), or better, generate its contents with https://www.gitignore.io/
* Amend the last commit: `$ git commit --amend` (for more information, check out the flow chart below)
* Show pretty history: `$ git log --graph --oneline --decorate --all`
* Configuring git aliases: `$ git config --global alias.lg "log --graph --oneline --decorate"` (and now you have `$ git lg`!)
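A sketch that exercises these commands together. The ignore rules are hypothetical examples for a Python/Jupyter project, `git check-ignore -v` reports which rule ignores a given path, and the alias is stored in this repository only (add `--global` to make it available everywhere, as in the list above):

```shell
cd "$(mktemp -d)"
git init -q other-demo && cd other-demo
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

# Hypothetical ignore rules
printf '%s\n' '*.pyc' '.ipynb_checkpoints/' > .gitignore
touch cache.pyc
git status --short               # lists .gitignore but not cache.pyc
git check-ignore -v cache.pyc    # shows which rule ignores the file

git add .gitignore
git commit -q -m "Add gitignre"              # typo on purpose...
git commit -q --amend -m "Add .gitignore"    # ...fixed by rewriting the last commit

git config alias.lg "log --graph --oneline --decorate"
git lg
```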

![git flow chart](http://justinhileman.info/article/git-pretty/git-pretty.png)

## Triangular workflows in git

When collaborating on a project hosted online on GitHub or GitLab, the most common setup consists of a central repository, one remote fork per user, and local clones/checkouts:

![Triangular workflow](https://github.blog/wp-content/uploads/2015/07/5dcdcae4-354a-11e5-9f82-915914fad4f7.png?resize=2000%2C951)

(Source: https://github.blog/2015-07-29-git-2-5-including-multiple-worktrees-and-triangular-workflows/)

Following this workflow requires the discipline to stick to a subset of actions and git commands, to avoid common mistakes. This website contains everything you need to know to set up your triangular workflow, so we don't need to reproduce it here:

https://www.asmeurer.com/git-workflow/

*Notice* that this website and the image above use different naming conventions:

1. **Convention 1**: upstream/origin/local
2. **Convention 2**: origin/&#x3C;username&#x3E;/local

We will follow Aaron Meurer's guide and therefore always use Convention 2.
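With Convention 2, `origin` points to the central repository and a remote named after your username points to your fork. To try this without a GitHub account, this sketch uses local bare repositories as hypothetical stand-ins for the hosted ones (in practice the URLs would be the ones GitHub or GitLab shows for each repository), and `alice` is a hypothetical username:

```shell
work=$(mktemp -d) && cd "$work"
# A tiny project to publish (hypothetical identity)
git init -q -b master seed && cd seed
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial commit"
cd ..

# Bare repositories standing in for the hosted ones
git clone -q --bare seed central.git        # the central repository
git clone -q --bare central.git fork.git    # your fork (what the "Fork" button does)

# Convention 2: origin = central repository, alice = your fork
git clone -q "$work/central.git" local && cd local
git remote add alice "$work/fork.git"
git remote -v
```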

### ⚠ After creating a pull request ⚠

After your pull request has been merged into `master`, your local `master` and `<username>/master` will be outdated with respect to `origin/master`. Moreover, **you should not keep working on the branch you used for the pull request**: remember that branches should be ephemeral and short-lived.

To put yourself in a clean state again, you have to:

1. Click "remove branch" in the pull request (don't click "remove fork"!)
2. `git checkout master` (go back to `master`)
3. `git fetch origin` (**never, ever use `git pull` unless you know exactly what you're doing**)
4. `git merge --ff-only origin/master` (update your local `master` with `origin/master`, and fail if you accidentally made any commit in `master`)
5. `git fetch -p <username>` (✨ acknowledge the removal of the remote branch ✨)
6. `git branch -d old-branch` (remove the old branch)
7. `git push <username> master` (update your fork with respect to `origin`)
8. `git checkout -b new-branch` (start working in the new feature!)

This process **has to be repeated after every pull request**.
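The cleanup can be rehearsed end to end with local bare repositories. In this sketch everything is hypothetical: `central.git` plays `origin`, `fork.git` plays your fork under the username `alice`, and the two `git -C` lines imitate what the website does when the pull request is merged (here as a fast-forward) and the branch is removed. The numbered comments match the steps above:

```shell
work=$(mktemp -d) && cd "$work"
# --- Setup: central repository, fork and local clone ---
git init -q -b master seed
git -C seed -c user.name=Seed -c user.email=seed@example.com commit -q --allow-empty -m "Initial commit"
git clone -q --bare seed central.git
git clone -q --bare central.git fork.git
git clone -q "$work/central.git" local && cd local
git config user.name "Example User"
git config user.email "user@example.com"
git remote add alice "$work/fork.git"

# --- Work on a branch and push it to the fork (the pull request) ---
git checkout -q -b old-branch
git commit -q --allow-empty -m "Add feature"
git push -q alice old-branch

# --- 1. Simulated website actions: merge into origin, remove the branch ---
git -C "$work/central.git" fetch -q "$work/fork.git" old-branch:master
git -C "$work/fork.git" branch -D old-branch

git checkout -q master                  # 2. back to master
git fetch -q origin                     # 3. download the merged work
git merge -q --ff-only origin/master    # 4. fast-forward the local master
git fetch -q -p alice                   # 5. forget alice/old-branch
git branch -d old-branch                # 6. delete the local branch
git push -q alice master                # 7. update the fork
git checkout -q -b new-branch           # 8. start the next feature
```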

🌈 

<div class="alert alert-info">Some organizations where all the members are trusted do not use forks, and everybody pushes their branches to the same repository instead. While this simplifies some parts of the workflow, it also requires proper checks in place to prevent bad code from being merged, for example by <a href="https://help.github.com/en/github/administering-a-repository/enabling-required-reviews-for-pull-requests">requiring a minimum number of reviews</a> or <a href="https://help.github.com/en/github/administering-a-repository/about-required-status-checks">some automated status checks</a>.</div>