# Git and GitFlow

David Orme

# Workshop format

* Tricky to do worked examples
* Slides as a presentation but do ask questions!

# Other sources of help 

* The [Multilingual Quantitative Biologist](https://mhasoba.github.io/TheMulQuaBio/notebooks/03-Git.html) introduction to Git. Heavily pirated here!
* Reference book: <http://git-scm.com/book>
* Tutorials: <https://try.github.io>


# Tools

* `git` is the **software**
* GitHub is a hosting service for **`git` repositories**
* `git` is a **command line application**
* Some programs provide a **graphical user interface**
    * [Visual Studio Code](https://code.visualstudio.com/)
    * [GitHub Desktop](https://desktop.github.com/)
    * [PyCharm](https://www.jetbrains.com/pycharm/)

# Version Control

* Version control tracks changes to **text files** automatically.
* Often computer code but can be data files etc.
* A **repository** is a directory containing files under version control.
* All changes to files are archived and can be recalled.
* Provides **repository branches** for code development.


# Why Version Control?

* Roll back changes to code and other data
* A remote backup of a code project
* Organise code cleanly 
* Collaborate with others on developing new code
* Distribute new code to collaborators

<!--
![cvs.png](attachment:83bd501d-b9ba-41da-92a0-c45a9768238d.png)
http://maktoons.blogspot.com/2009/06/if-dont-use-version-control-system.htm
-->

# Local vs remote

* A repository can exist as a single **local** directory
* Code synchronised with one (or more!) **remote** repositories

![Centralised_Git.png](attachment:abe696ff-2563-4987-9042-ae42df41c397.png)

# Decentralised

* But a local repository **can** have other **remote** repositories.

![Decentralised_Git.png](attachment:dfc6a586-c721-4b73-a08b-4da8e6e19805.png)
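
A minimal sketch of one local repository with two remotes. The bare repositories `server_a.git` and `server_b.git` stand in for real servers, and the remote name `backup` is an arbitrary choice; none of these names come from a real project.

```bash
set -euo pipefail
work="$(mktemp -d)"            # scratch directory, safe to delete afterwards
cd "$work"

# Two bare repositories stand in for remote servers (hypothetical names).
git init -q --bare server_a.git
git init -q --bare server_b.git

# Clone the first "server": the clone records it as the remote called "origin".
git clone -q server_a.git local_copy
cd local_copy

# Add the second "server" as an extra remote (the name "backup" is arbitrary).
git remote add backup "$work/server_b.git"

# List the remotes known to this local repository.
git remote -v
```

With a real project the paths would be URLs, for example on GitHub, but the commands are the same.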

# Branching

* Repositories can contain **branches**
* A branch tracks an alternative, parallel set of changes
* Can contain changes to code
* Can contain completely different content such as `gh-pages`

![GitFlow.png](./images/git/GitFlow.png)


# Main git commands

|     Command    |   What it does     |
| :------------- |:-------------| 
|`git init`|           Initialize a new repository|
|`git clone`|          Download a repository from a remote|
|`git status`|         Show the current status of a branch|
|`git diff`|           Show differences between commits|
|`git blame`|          See who changed a file|
|`git log`|            Show commit history|
|`git commit`|         Commit changes to current branch|
|`git branch`|         Show branches|
|`git branch name`|    Create new branch|
|`git checkout name`|  Switch to a different code state (e.g. branch)|
|`git fetch`|          Get remote changes without merging|
|`git merge`|          Merge versions of files|
|`git pull`|           Fetch remote changes and merge them into the current branch|
|`git push`|           Send committed changes to remote |
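
As a rough illustration, several of the commands in the table can be strung together in a scratch repository. The repository name `demo_repo`, the file `notes.txt` and the user identity are placeholders chosen for this sketch.

```bash
set -euo pipefail
cd "$(mktemp -d)"                          # scratch directory

git init -q demo_repo                      # create a new, empty repository
cd demo_repo
git config user.name "Example User"        # placeholder identity for this repo only
git config user.email "user@example.com"

echo "First line" > notes.txt
git status --short                         # notes.txt is listed as untracked (??)
git add notes.txt                          # stage the new file
git commit -q -m "Add notes"               # record the staged snapshot

echo "Second line" >> notes.txt
git diff                                   # show the unstaged change
git commit -q -am "Extend notes"           # -a stages modified tracked files

git log --oneline                          # two commits, newest first
git blame notes.txt                        # which commit last touched each line
```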

# Making changes to a branch

![git.png](./images/git/git.png)

<!-- https://blog.osteele.com/2008/05/my-git-workflow/ -->


# Git `status`

```bash
On branch feature/isotopic_discrimination
Your branch is ahead of 'origin/feature/isotopic_discrimination' by 3 commits.
  (use "git push" to publish your local commits)

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
    modified:   pyrealm/pmodel.py

Untracked files:
  (use "git add <file>..." to include in what will be committed)
    .pylintrc
    old_untidyfile.xlsx
```

# Ignoring Files

* Not all files in a repository should be tracked
    * Created files - logs, compiled code 
    * Files that do not `git` well
* Use the `.gitignore` files to tell git to ignore patterns.
    * `./.gitignore`
    * `./specific/subdirectory/.gitignore`
    * `~/.gitignore_global`
* The `.gitignore` files are **part of the repository**

# `.gitignore` patterns

|     Pattern    |   gitignore result     |
| :------------- |:-------------| 
| `#comment` | This is a comment |
|`target`|     Ignore files/directories called `target` everywhere|
|`target/`|   Ignore directories (trailing `/`) named `target` everywhere |
|`/target`|   Ignore files/directories called `target` in the project root (leading `/`)|
|`/target/`|  Ignore directories called `target` in the project root|
| `*.ext` | Ignore anything with the extension `.ext`|
| `*.py[co]`|  Ignore anything ending in `.pyc` or `.pyo`|
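
One way to check how patterns behave is `git check-ignore`, which reports whether a path is ignored and, with `-v`, which pattern matched. The sketch below uses made-up file names (`src/model.py`, `target/output.log`) in a scratch repository.

```bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q ignore_demo
cd ignore_demo

# A small .gitignore using two of the pattern styles from the table.
cat > .gitignore <<'EOF'
# compiled Python files
*.py[co]
# build directories anywhere in the project
target/
EOF

# Hypothetical project files to test against.
mkdir -p src target
touch src/model.py src/model.pyc target/output.log

# check-ignore reports which paths are ignored, and (-v) by which pattern.
git check-ignore -v src/model.pyc target/output.log

# Only the files that are not ignored appear as untracked.
git status --short --untracked-files=all
```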


### `.gitignore`-ing after `commit`-ing

If a file or directory matching a pattern that you included in your `.gitignore` is not ignored (it is still under version control), it most likely means that you added the pattern AFTER committing and pushing the file. In this scenario, you need to use 

```bash
git rm --cached <file>
```
for a file, and 

```bash 
git rm -r --cached <folder>
```
for a directory. 

These commands do not delete the file from your local working directory, but once the change is committed and pushed, the file will be removed from other people's local copies on their next `git pull`.
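
The situation can be reproduced in a scratch repository. This is only a sketch: `results.log` and the user identity are invented for the example.

```bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q untrack_demo
cd untrack_demo
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# A log file is committed by mistake (results.log is a hypothetical name).
echo "run 1" > results.log
git add results.log
git commit -q -m "Add log file by mistake"

# Adding the pattern afterwards does not stop git tracking the file...
echo "*.log" > .gitignore
echo "run 2" >> results.log
git status --short                         # results.log still shows as modified

# ...so remove it from the index while keeping the copy on disk.
git rm -q --cached results.log
git add .gitignore
git commit -q -m "Stop tracking log files"

git ls-files                               # only .gitignore is tracked now
ls                                         # results.log is still on disk
```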

# What does not `git` well?

* The `git` index records `diffs`: incremental changes in text files.
* The entire history of changes is stored in the `.git` directory.
* Recording only `diffs` keeps `.git` small.
* `git count-objects -vH` gives repo size.

```{admonition} Binary files
:class: danger

You **cannot** store diffs for changes in binary files - so `git` stores a **complete new copy** when a binary file is changed.

```
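
A rough way to see the effect is to commit two versions of a file of random bytes and look at the repository size. Random data is an extreme stand-in for a binary file (it cannot be compressed at all), and the file name `data.bin` is made up for this sketch.

```bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q size_demo
cd size_demo
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Commit two versions of a 1 MiB file of random bytes.
head -c 1048576 /dev/urandom > data.bin
git add data.bin
git commit -q -m "Add binary data"

head -c 1048576 /dev/urandom > data.bin
git commit -q -am "Replace binary data"

# Both versions are kept in full: expect roughly 2 MiB of objects.
git count-objects -vH
```

Every further change to `data.bin` would add about another 1 MiB, whereas a one-line change to a text file adds very little.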

## The README file

A README (like the `README.md` that you created in your git repo above) is a text file that introduces and explains a project. It contains information that is required to understand what the project is about and how to use or run it. 

While READMEs can be written in any text file format, Markdown (saved as an `.md` file) is most commonly used, as it allows you to add simple text formatting easily. Two other formats that you might most often see are plain text and [reStructuredText](https://en.wikipedia.org/wiki/ReStructuredText) (saved as an `.rst` file, common in Python projects).

You can find many README file suggestions (and templates) online. Ideally, a README should have the following content:  

* **Project name / title**
* **Brief description**: what your project does and/or is for. Provide context and add links to any references to help new visitors.
* **Languages**: List language(s) and their versions used in the project
* **Dependencies**: What special packages (which are not part of standard libraries of the language(s) used) might be needed for a new user to run your project
* **Installation**: Guidelines for installing the project (if applicable), including dependencies.
* **Project structure and Usage**: How the project is structured and how to run/use it. Explain, if relevant, what specific files do. No need to list every file, such as data or experimental ones (like the ones in `sandbox`). 
* **Author name and contact**

In addition, you may want to include (though this is not necessary for your current coursework) a [License](https://en.wikipedia.org/wiki/Software_license), Acknowledgments, and instructions for Contributing.

The 100M means 100 MB; you can reset it to whatever you want.

*Then what about code that needs large files?* For this, the best approach is to write code that scales up with data size. If it works on a 1 MB file, it should also work on a 1000 MB file! If you have written such code, then you can include a smaller file as a MWE (minimal working example).   

*And how do you back up your large data files?* Remember, version control software like git is not meant for backing up data. The solution is to back up separately, either to an external hard drive or a cloud service. `rsync` is a great Linux utility for making such backups. Google it! 

You may also explore alternatives such as `git-annex` (e.g., [see this](https://git-annex.branchable.com)), and `git-lfs` (e.g., [see this](https://www.atlassian.com/git/tutorials/git-lfs)).

```{tip}
**Checking size of your git repo**: You have two options in Linux/UNIX to check the size of your git repo. You can use (`cd` to your repo first) `du -sh .git`, or for more detailed information about what's using the space, use `git count-objects -vH` (this will work across platforms as this is a git command).

```






## Removing files

To remove a file (i.e. stop version controlling it), use `git rm`. Note that this also deletes the file from your working directory.

### Un-tracking files

`.gitignore` will prevent untracked files from being added to the set of files tracked by git. However, git will continue to track any files that are already being tracked. To stop tracking a file you need to remove it from the index. This can be achieved with this command.

```bash
git rm --cached <file>
```
The removal of the file from the head revision will happen on the next commit.


### Reverting to a previous version

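If things go horribly wrong with new changes, you can discard all uncommitted changes and return to the state of the last commit:

```bash 
git reset --hard   # discard uncommitted changes to tracked files
# No commit is needed afterwards: the working directory now matches the last commit.
```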
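
A small sketch of `git reset --hard` in a scratch repository, with an invented file `analysis.txt`. After the reset the working directory matches the last commit, so there is nothing left to commit.

```bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q reset_demo
cd reset_demo
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "working analysis" > analysis.txt
git add analysis.txt
git commit -q -m "Add working analysis"

# Make a change that turns out to be a mistake.
echo "broken edit" >> analysis.txt
git status --short                         # analysis.txt shows as modified

# Throw away all uncommitted changes to tracked files.
git reset -q --hard

cat analysis.txt                           # back to the committed content
git status --short                         # nothing to report
```

Be careful: the discarded edits cannot be recovered through git, because they were never committed.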

If instead you want to move back in time (temporarily), first find the
“hash” for the commit you want to revert to, and then check-out:

```bash
git status
```
And then, 

```bash 
git log
```

Then, you can 

```bash
git checkout <commit-hash>
```

e.g., `git checkout 95f7d0`

Now you can play around. However, if you do want to commit changes, you create a "branch" (see below). To go back to the future, type 

```bash 
git checkout master
```
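
The round trip can be tried out safely in a scratch repository. In this sketch the file `draft.txt` is invented, and the branch name is read from git rather than assumed, since a new repository may start on `master` or `main` depending on configuration.

```bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q history_demo
cd history_demo
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "version 1" > draft.txt
git add draft.txt
git commit -q -m "First version"
echo "version 2" > draft.txt
git commit -q -am "Second version"

branch="$(git rev-parse --abbrev-ref HEAD)"    # remember the current branch name
old="$(git rev-parse --short HEAD~1)"          # short hash of the previous commit

git log --oneline                              # hashes appear in the first column
git checkout -q "$old"                         # step back in time (detached HEAD)
cat draft.txt                                  # prints: version 1

git checkout -q "$branch"                      # return to the latest commit
cat draft.txt                                  # prints: version 2
```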

## Branching

Imagine you want to try something out, but you are not sure it will work well. For example, say you want to rewrite the Introduction of your paper, using a different angle, or you want to see whether switching to a library for a piece of code improves speed. What you then need is branching, which creates a project copy in which you can experiment:

`git branch anexperiment`

`git branch`

`git checkout anexperiment`

`git branch`

`echo "Do I like this better?" >> README.txt`

`git commit -am "Testing experimental branch"`

If you decide to merge the new branch after modifying it:

`git checkout master`

`git merge anexperiment`

`cat README.txt`

Unless there are conflicts, i.e. files that you changed on your branch have also been changed on `master` in the meantime (for example, by changes pushed by a collaborator), you are done, and you can delete the branch:

`git branch -d anexperiment`

If instead you are not satisfied with the result, and you want to
abandon the branch:

`git branch -D anexperiment`

When you want to test something out, always branch! Reverting changes, especially in code, is typically painful. Merging can be tricky, especially if multiple people have simultaneously worked on a particular document. In the worst-case scenario, you may want to delete the local copy and re-clone the remote repository.
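
To make the idea of a conflict concrete, the sketch below deliberately changes the same line of `README.txt` on two branches and then resolves the conflict by hand. The titles are invented, and the starting branch name is read from git because it may be `master` or `main`.

```bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q conflict_demo
cd conflict_demo
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "Original title" > README.txt
git add README.txt
git commit -q -m "Add README"
main_branch="$(git rev-parse --abbrev-ref HEAD)"   # master or main, depending on setup

# Change the same line on two branches.
git checkout -q -b anexperiment
echo "Experimental title" > README.txt
git commit -q -am "Change title on experiment branch"

git checkout -q "$main_branch"
echo "Alternative title" > README.txt
git commit -q -am "Change title on main branch"

# The merge stops and reports a conflict (non-zero exit status, hence "|| true").
git merge anexperiment || true
cat README.txt                             # shows <<<<<<< / ======= / >>>>>>> markers

# Resolve by editing the file to the wanted content, then stage and commit.
echo "Experimental title" > README.txt
git add README.txt
git commit -q -m "Merge anexperiment, keeping its title"
git branch -d anexperiment                 # safe delete: the branch is now merged
```

In practice you would open the file in an editor and choose between the two marked versions rather than overwrite it with `echo`.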

---
:::{figure-md} git-XKCD-2
<img src="./images/git/git_xkcd_1.png" alt="git desperation" width="250px">

**Try not to do this.** But most of us mortals will have, at some point! <br> (Source: [XKCD](https://xkcd.com/1597/))
:::

---

# Git Flow

* A branching strategy for development using Git ([original description](https://nvie.com/posts/a-successful-git-branching-model/))
* Not the only strategy!
* Two core branches: develop and production (often `main` or `master`)

## Implementation

* Git Flow describes a strategy - there is more than one implementation!
* I have been using [`gitflow-avh`](https://github.com/petervanderdoes/gitflow-avh)
* Adds yet another set of commands, but these simply wrap [`git` commands](https://gist.github.com/JamesMGreene/cdd0ac49f90c987e45ac)


# Git Flow branches

* Circles represent commits to the repository

![GitFlow.png](./images/git/GitFlow.png)

# Example commands

```bash
git flow release start 0.1      # move from develop to release/0.1
git flow release publish 0.1    # push release/0.1 to remote origin 
git commit -m "Release fix" mycode.py
git flow release finish 0.1     # code merged to master and develop
git flow hotfix start 0.1_bug   # master branched to hotfix/0.1_bug
git commit -m "Hotfix" mycode.py
git flow hotfix finish 0.1_bug  # hotfix/0.1_bug merged to master v0.1.1 and develop
```
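
Because the `git flow` commands wrap ordinary `git` commands, a release can also be sketched with plain `git`. The block below is an approximation of what `release start` and `release finish` do, not the exact sequence run by any particular implementation; the merge messages are invented, and the `master` and `develop` branches are created by hand in a scratch repository.

```bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q flow_demo
cd flow_demo
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Set up the two core branches from one initial commit.
echo "print('v0')" > mycode.py
git add mycode.py
git commit -q -m "Initial commit"
git branch -M master                       # make sure production is called master
git checkout -q -b develop

# Roughly "git flow release start 0.1": branch off develop.
git checkout -q -b release/0.1 develop
echo "print('v0.1')" > mycode.py
git commit -q -m "Release fix" mycode.py

# Roughly "git flow release finish 0.1": merge to master, tag, merge to develop.
git checkout -q master
git merge -q --no-ff -m "Merge release 0.1" release/0.1
git tag -a 0.1 -m "Release 0.1"
git checkout -q develop
git merge -q --no-ff -m "Merge release 0.1 into develop" release/0.1
git branch -d release/0.1                  # the release branch is no longer needed

git log --oneline --graph --all            # shows the branch-and-merge structure
```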





# Readings & Resources

### Branching 
* Guidelines for branching: <https://gist.github.com/digitaljhelms/4287848>



---

**Footnotes**

<a name="git:word">1</a>: There you will find the following phrase: "...one of the most annoying problems known to humanity: version-controlling Microsoft Word documents.". LOL!

<a name="git:largefiles">2</a>: None of the computing weeks assessments will require you to use such large files anyway