# Introduction to Git
<br><br>

**ONS / NISR** <br>
2021

<h3>Git is an <b style="color:#2259A9">open-source</b> <b style="color:#6DA34D">distributed</b> <b style="color:#F67E7D"> version control system</b>.</h3>

We can break this down into the following parts.

* <b style="color:#2259A9">Open-source</b>: The code that runs git is freely available to anyone who wants to use it.

* <b style="color:#6DA34D">Distributed</b>: Git operates across multiple different computers.

* <b style="color:#F67E7D">Version control system</b>: Git tracks the changes made to files.

### Git vs GitHub

**Git** is the program that lets us version control our projects.

**GitHub** is an online platform that stores our projects and lets us manage them.

You don't have to use **GitHub** to use **Git**, but you do need to use **Git** in order to use **GitHub**.

<center><a href="https://github.com/NISR-analysis/"><img src="./imgs/github_logo.png"></a></center>

## Why use Git?

<img src="./imgs/git_versionfinal.png" width=800>

### Git is a time-machine 

Git keeps track of all the changes you've made to your project, and when you made them. 

Git lets you move backwards and forwards through your project's history without losing any of the changes you've made. 

<center><img src="./imgs/timemachine.jpeg"></center>

### Git is a collaboration tool

Git makes combining work from multiple people a breeze. It even lets multiple people edit the same file, and in most cases Git combines all the changes automatically.

<center><img src="./imgs/collaboration.png"></center>

### Git is an experimentation tool

Git lets you manage multiple versions of your project all in the same place. This means you can try something completely new without losing your old work, or having to make a new copy of your project. 

<center><img src="./imgs/experiment.jpeg" width=300></center>

## Exercise 1: Opening and exploring a Git "repository"

A <b>repository</b> is the name for the folder that contains your project, and all the previous versions of your project. Let's look at the `git_sandbox` repository.

[https://github.com/NISR-analysis/ds-gitsandbox](https://github.com/NISR-analysis/ds-gitsandbox)

<center><img src="./imgs/git_sandbox.png" width=600></center>

<br><br>We can explore the git repository in our web browser on GitHub, but if we want to start editing things we need to copy the **repository** to our local machine. 

To do this we need to **clone** our repository. **Cloning** is where we take a copy of the entire repository. 

In order to **clone** our repository we first need to get a link to the repository. We can get this from GitHub by clicking the **Code** dropdown button. 

<center><img src="./imgs/git_clonelink.png" width=300></center>

### Cloning 

There are two ways to clone a repository: using the command line or using GitHub Desktop. 

To clone a repository using the command line we can use
`git clone <url_to_repository>` 

To clone a repository using GitHub Desktop we just paste that same URL into the **clone repository** dialog box.

<table>
    <tr><td><img src="./imgs/git_clonecmd.png" width=400></td>
    <td><img src="./imgs/git_clonedesktop.png" width=350></td></tr>
</table>
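
As a rough sketch of what cloning does, the commands below clone a small throwaway repository that stands in for the GitHub one, so they run without a network connection. The folder names `original` and `my_copy` are made up for illustration. For the sandbox you would pass the URL copied from the **Code** button instead of a local path.

```shell
# Build a tiny stand-in repository (hypothetical names, not the real sandbox).
work="$(mktemp -d)"
git init -q "$work/original"
cd "$work/original"
git config user.name "Example User"
git config user.email "user@example.com"
echo "hello" > readme.txt
git add readme.txt
git commit -q -m "Add readme"

# Clone it. The general form is: git clone <url_to_repository> [folder]
cd "$work"
git clone -q "$work/original" my_copy

# The clone contains the files and the full history of the original.
cd my_copy
clone_log="$(git log --oneline)"
echo "$clone_log"
```

The clone is a complete, independent copy, so you can edit it freely without touching the original.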

## Branches

In Git we do all our development in **branches**. A new repository starts with a single branch, usually called **main**, but you can create any branch you want. 

A branch is essentially a **copy** of the code that you're working on, that is stored within the repository. You can have as many **branches (copies)** of your code in your repository as you want. 

Branches let you (and others) work on the same code without changing or breaking code that someone else is working on, or code that has been deployed into production.


### Exercise 2 - Switching Branches

When you first clone a repository you will usually find yourself on the **main** branch (sometimes **master** in older repositories).

We can list all the branches within our repository using `git branch -a` on the command line or by clicking **Current Branch** in GitHub Desktop. 

*Note: If you use the command line you'll notice some branches with the path `remotes/origin/`. Git uses these to reference the versions of those branches stored on GitHub itself.*

<table>
    <tr><td><img src="./imgs/git_branchlistcmd.png" width=300></td>
        <td><img src="./imgs/git_branchlistdesktop.png" width=300></td>
    </tr>
</table>
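
A minimal sketch of listing branches, assuming a throwaway local repository plays the role of GitHub (the folder names `remote_repo` and `local_copy` are made up). After cloning, only the default branch exists locally, while the other branches appear under `remotes/origin/`.

```shell
# Stand-in "remote" repository with two branches (hypothetical names).
work="$(mktemp -d)"
git init -q "$work/remote_repo"
cd "$work/remote_repo"
git config user.name "Example User"
git config user.email "user@example.com"
echo "hello" > readme.txt
git add readme.txt
git commit -q -m "Add readme"
git branch -M main          # make sure the first branch is called main
git branch translate        # a second branch that only exists on the "remote"

# Clone it, then list local and remote-tracking branches.
git clone -q "$work/remote_repo" "$work/local_copy"
cd "$work/local_copy"
branches="$(git branch -a)"
echo "$branches"
```

The branch marked with `*` is the one currently checked out.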

### Checking-out a branch

In order to access the code and files stored in a branch we need to **checkout** that branch. This will change the files currently stored in our directory to match the files stored in the branch we're checking out. 

Using the command line we can do `git checkout <branchname>`. Using GitHub Desktop we can click on the name of the branch we want to checkout. 

**Can you find the treasure?**

<img src="./imgs/git_checkoutcmd.png">
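
To see how checking out a branch swaps the files in your folder, here is a small sketch using a throwaway repository. The branch name `island` and the file names are invented for this example and are not part of the sandbox.

```shell
# Throwaway repository so the example runs anywhere (all names are made up).
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "nothing here" > map.txt
git add map.txt
git commit -q -m "Add map"
git branch -M main

# Set up a second branch that contains one extra file.
git branch island
git checkout -q island
echo "X marks the spot" > chest.txt
git add chest.txt
git commit -q -m "Add chest"

# Switching branches changes which files are in the folder.
git checkout -q main
files_on_main="$(ls)"
git checkout -q island
files_on_island="$(ls)"
echo "main:   $files_on_main"
echo "island: $files_on_island"
```

The extra file is not deleted when you go back to `main`. It is stored safely in the `island` branch and reappears when you check that branch out again.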

## History 

We mentioned that git is a **time-machine**. On any branch we can look at the history of changes that have been made to that branch. 

We can use `git log` to see the history of changes on the branch we currently have checked out, or use the **History** tab in GitHub Desktop.

<table>
    <tr>
        <td><img src="./imgs/git_logcmd.png"></td>
        <td><img src="./imgs/git_logdesktop.png"></td>
    </tr>
</table>
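
A short sketch of reading history, assuming a throwaway repository with two made-up commits. `git log` prints the full entry for each commit, and the `--oneline` option gives a compact summary with the newest commit first.

```shell
# Throwaway repository with two commits (file and messages are made up).
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "line one" > notes.txt
git add notes.txt
git commit -q -m "Add first line"
echo "line two" >> notes.txt
git add notes.txt
git commit -q -m "Add second line"

# Compact history of the current branch: one line per commit, newest first.
history="$(git log --oneline)"
echo "$history"
```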

## Making changes

When we have a branch checked-out we can edit the files just as we would if we weren't using git. However, if we want git to remember our changes we need to both **save** the file and **commit** the changes. 

Git tracks changes to files via these **commits**. Git calculates the changes you've made between the most recent commit and your current version of the file. **Committing** tells git to save these changes, and add them to the history of the branch. This means we can go back to this point in time regardless of any changes we make later. 

### Exercise 3 - Translating 

Checkout the **translate** branch. It should contain a file called `to_translate.txt` with **10** questions we want to ask. Unfortunately they've been written in French!

<img src="./imgs/git_french.png">

#### 1. Create a new branch 

If we all start editing the same branch, we're going to have issues. Git won't be able to tell which edits are the correct ones. To avoid this we employ something called a **branching strategy**. There are many different types of branching strategies but we'll use a simple one here. 

Everyone creates their own branch from the **translate** branch. 

<center><img src="./imgs/git_branchingstrat.png" width=400></center>

This can be done easily using a command we've already encountered: `git checkout`.

To create a new branch (based on the currently checked-out branch) and switch to it use `git checkout -b <branch_name>`. For this exercise set the branch_name to **translate_your_name**.

Or using GitHub Desktop, click **New Branch** and make sure to base the branch on **translate**.

<table><tr><td><img src="./imgs/git_createbranchdesktop.png" width=400></td><td><img src="./imgs/git_createbranchdesktop2.png" width=200></td></tr></table>

Great, now we all have a unique branch containing the file that needs to be translated. Now we can work in parallel to solve this problem.
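
As a sketch, creating your own branch from **translate** might look like this. It assumes a throwaway repository whose only branch has been renamed to `translate`, and the file contents are invented for the example.

```shell
# Throwaway repository standing in for the sandbox (contents are made up).
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "Bonjour" > to_translate.txt
git add to_translate.txt
git commit -q -m "Add questions to translate"
git branch -M translate     # pretend this is the shared translate branch

# Create a personal branch from the current branch and switch to it.
git checkout -q -b translate_your_name

# Confirm which branch we are on now.
current="$(git rev-parse --abbrev-ref HEAD)"
echo "$current"
```

The new branch starts with exactly the same files and history as **translate**, so `to_translate.txt` is ready to edit.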

#### 2. Staging your changes

Once you've made and saved your changes, we need to tell git to remember them. We can do this using the `commit` command. A git commit is how we tell git to take a snapshot of the current branch and record any changes that have occurred.

To create a commit we first need to tell git which files we want to include. This is known as **staging** the changes. On the command line this can be done with `git add <filename>`. If you're using GitHub Desktop, staging is controlled by ticking or un-ticking the file in the changes tab.

<table>
    <tr>
        <td><img src="./imgs/git_stagecmd.png"></td>
        <td><img src="./imgs/git_stagedesktop.png"></td>
    </tr>
</table>
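
A small sketch of staging, assuming a throwaway repository with an invented one-line `to_translate.txt`. `git status --short` prints a two-column code before each file name: the first column describes what is staged and the second what is not yet staged.

```shell
# Throwaway repository so the example runs anywhere (contents are made up).
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "Bonjour" > to_translate.txt
git add to_translate.txt
git commit -q -m "Add questions to translate"

# Edit the file. Git notices the change but has not staged it yet.
echo "Hello" > to_translate.txt
before="$(git status --short)"

# Stage the change so it will be included in the next commit.
git add to_translate.txt
after="$(git status --short)"

echo "before staging: [$before]"
echo "after staging:  [$after]"
```

The `M` (modified) moves from the second column to the first once the file is staged.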

#### 3. Committing your changes

Once we've **staged** our changes, we need to **commit** them. This will tell git to create a checkpoint for all the staged files. 

When creating a commit, we write a **commit message**. This is a short message describing what changes have occurred in this commit. Writing clear and concise commit messages is good practice and will make both your life and the life of anyone reviewing your code much easier. It is a great way to record why changes have been made. 

Committing your changes on the command line is as easy as `git commit`. Unless you have configured a different editor, this will usually open **vim**, a classic text editor which is powerful but also difficult to learn. 

- To begin typing your message press **i** to enter insert mode.
- Type in your commit message.
- When you're done press `Esc` to exit insert mode. 
- Type `:wq` and press `Enter` to write your message and quit.

It is often much easier to use the `-m` flag in `git commit` to include the message immediately: `git commit -m "translated the first line"`

<table>
    <tr>
        <td><img src="./imgs/git_commitcmd.png" width=400></td>
        <td><img src="./imgs/git_commitdesktop.png" width=300></td>
    </tr>
</table>

You can check your changes have been saved by looking at the `git log` or the **History** tab in GitHub Desktop. 

<center><img src="./imgs/git_commitcheck.png" width=600></center>
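
Putting staging and committing together, a minimal sketch (in a throwaway repository with made-up file contents) looks like this:

```shell
# Throwaway repository so the example runs anywhere (contents are made up).
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "Bonjour" > to_translate.txt
git add to_translate.txt
git commit -q -m "Add questions to translate"

# Edit, stage, then commit with the message given on the command line.
echo "Hello" > to_translate.txt
git add to_translate.txt
git commit -q -m "translated the first line"

# The new commit is now the most recent entry in the history...
last_message="$(git log -1 --pretty=%s)"
echo "$last_message"

# ...and nothing is left waiting to be committed.
pending="$(git status --short)"
```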

<br>

#### 4. Pushing your changes

We've successfully saved and committed our changes into our copy of the translate branch. This is all stored on our computer. But what if we want other people to be able to access our changes?

To do that we need to **push** our branch to the GitHub repository so that anyone with access to it can see the changes we've made. 

We can do this on the command line using `git push --set-upstream origin <branch_name>`. This tells Git to create a branch with that name in the GitHub repository, link it to our local branch, and copy our commits to it.

<table>
    <tr>
        <td><img src="./imgs/git_pushcmd.png"></td>
        <td><img src="./imgs/git_pushdesktop.png"></td>
    </tr>
</table>

*Note: you only need to use `--set-upstream` the first time you push a branch. After that you can just use `git push`.*

If we look at the GitHub repository now, we'll see that our new branch is available to view, including the changes that we've just made.

<center><img src="./imgs/git_branchpushed.png"></center>
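
The sketch below walks through a first push and a follow-up push. It assumes a local bare repository (`origin.git`, a made-up name) stands in for GitHub so the commands run offline. The `git remote add` line is only needed here because this repository was not created by cloning. A cloned repository already has `origin` set up.

```shell
# A bare repository standing in for GitHub (hypothetical, local only).
work="$(mktemp -d)"
git init -q --bare "$work/origin.git"

# Our local repository with a personal branch and one translated line.
git init -q "$work/local"
cd "$work/local"
git config user.name "Example User"
git config user.email "user@example.com"
echo "Bonjour" > to_translate.txt
git add to_translate.txt
git commit -q -m "Add questions to translate"
git checkout -q -b translate_your_name
echo "Hello" > to_translate.txt
git add to_translate.txt
git commit -q -m "translated the first line"

# First push: create the branch on the remote and link it to ours.
git remote add origin "$work/origin.git"
git push -q --set-upstream origin translate_your_name

# Later pushes on the same branch need no extra arguments.
echo "Goodbye" >> to_translate.txt
git add to_translate.txt
git commit -q -m "translated the second line"
git push -q

# The remote now has our branch, pointing at our latest commit.
remote_branches="$(git ls-remote --heads origin)"
echo "$remote_branches"
```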

<br><br>
#### 5. Merging your changes

We've now got 9 branches each with a different line translated. Ideally we'd just have the translated file. This is where git becomes extremely useful. We can **merge** all of those branches back into our original **translate** branch, carrying all the changes that we've made.

<center><img src="./imgs/git_branchstratfull.png"></center>

In order to do this we need to create what is known as a **pull request**. This is a request to **pull** the changes from one branch into another branch. This is usually done on GitHub so that everyone has the chance to review any changes that are going into our branch. 

To create a pull request we open the branch containing our changes on GitHub and click **Compare and Pull Request**.

<img src="./imgs/git_createpullrequest.png">

Opening the pull request leads us to the pull request page. There are three main sections: the header describing which branches are going to be merged, the body containing templated text to describe the pull request, and the panel on the right that allows us to add extra information and requirements.

<center><img src="./imgs/git_pullrequestpage.png" width=600></center>

The first section is the header, this allows us to control which branches we want to pull into and which branches we want to pull from. 

By default GitHub will assume you want to pull your branch into the **main/master** branch. In this case we want to pull it into the **translate** branch. To do this we just need to change the **base** to **translate**.

<img src="./imgs/git_setbasebranch.png">

Next we want to write a description of our pull request. This is just like writing a commit message, except it usually has more detail, as pull requests often have multiple commits and make significant changes to another branch.

By default all NISR repositories have a template for pull requests to help you format your request and make sure that all the information that needs to be recorded is being captured.

GitHub supports [Markdown](https://www.markdowntutorial.com/) formatting of text, which you can preview using the **Preview** tab.

<br><br><center><img src="./imgs/git_pullrequestmessage.png" width=600></center>

The final thing to do is add a reviewer to the pull request. Code reviewers check the code does what it says it does and also act as gatekeepers to make sure that branches aren't being broken by code changes that don't work. 

To add a reviewer click the cog next to **Reviewers** and pick someone to review your pull request. This will usually be your team leader, or someone who has a good knowledge of the code you're writing. 

<center><img src="./imgs/git_reviewers.png" width=200></center>


Clicking **Create Pull Request** at the bottom will **Open** this pull request. This will automatically take us to the pull request page. Here people can comment on the changes made in this pull request.

Once your reviewer has approved your pull request, you can click **Merge pull request**. This will copy your changes into the **translate** branch. 

<center><img src="./imgs/git_mergepullrequest.png"></center>

<br><br>Once this is done you should be able to delete your branch, as all your changes have now been moved onto the **translate** branch.

<center><img src="./imgs/git_merged.png"></center>

Looking at the history of the **translate** branch we can see that the changes have now been applied.

<center><img src="./imgs/git_mergedbranch.png"></center>
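
For intuition only: GitHub performs the merge for us when we click the button, but the effect is similar to running `git merge` locally. The sketch below assumes a throwaway repository, two made-up personal branches (`translate_alice` and `translate_bob`), and a seven-line file in which each person translates a different line.

```shell
# Throwaway repository with a shared translate branch (all names made up).
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
printf 'un\ndeux\ntrois\nquatre\ncinq\nsix\nsept\n' > to_translate.txt
git add to_translate.txt
git commit -q -m "Add questions to translate"
git branch -M translate

# Alice translates the first line on her own branch.
git checkout -q -b translate_alice
printf 'one\ndeux\ntrois\nquatre\ncinq\nsix\nsept\n' > to_translate.txt
git add to_translate.txt
git commit -q -m "translated the first line"

# Bob starts from translate too, and translates the last line.
git checkout -q translate
git checkout -q -b translate_bob
printf 'un\ndeux\ntrois\nquatre\ncinq\nsix\nseven\n' > to_translate.txt
git add to_translate.txt
git commit -q -m "translated the last line"

# Merge both personal branches back into translate.
git checkout -q translate
git merge -q translate_alice
git merge -q --no-edit translate_bob

# The file now carries both translations.
cat to_translate.txt
```

Because the two edits touch lines that are far apart, Git can combine them without help. Edits to the same or neighbouring lines can produce a conflict that has to be resolved by hand.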

The final thing we need to do is to **pull** those changes down from the GitHub repository into our local copy. This is very easy using either GitHub Desktop or the command line. 

```
git checkout translate
git pull
```

<center><img src="./imgs/git_pullchange.png"></center>
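
Here is a sketch of what `git pull` does, assuming a throwaway repository (`github_standin`, a made-up name) plays the role of GitHub. A new commit is added there to imitate a merged pull request, and the local clone then pulls it down.

```shell
# Stand-in for the GitHub repository (hypothetical, local only).
work="$(mktemp -d)"
git init -q "$work/github_standin"
cd "$work/github_standin"
git config user.name "Example User"
git config user.email "user@example.com"
echo "Bonjour" > to_translate.txt
git add to_translate.txt
git commit -q -m "Add questions to translate"
git branch -M translate

# Our local copy, cloned before the change was merged.
git clone -q "$work/github_standin" "$work/local"

# Imitate a merged pull request: a new commit appears on the remote branch.
echo "Hello" > to_translate.txt
git add to_translate.txt
git commit -q -m "translated the first line"

# Bring the local copy up to date.
cd "$work/local"
git checkout -q translate
git pull -q
cat to_translate.txt
```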

## Where Git doesn't work 

Git works best for **text** files: anything from source code to CSVs. Anything that can be opened in a text editor is considered a **text** file. 

Git doesn't work well for **binary** files. 

| Binary File Type | Common Extensions |
| --- | --- |
| Images | jpg, png, gif, bmp... |
| Video | mp4, mkv, avi, mov... |
| Audio | mp3, aac, wav, flac... | 
| Documents | pdf, xlsx, ppt, docx... |
| Archives | zip, rar, 7z, tar... |
| Databases | mdb, accde, frm, sqlite... |

This is because git calculates differences by looking at the **lines** in the data. Changes to **binary** files don't map to neat line changes. Instead the entire file changes. Git can record these changes, but it ends up storing a new copy of the whole file every time it changes, which can make your repository huge and slow. 

Try to avoid committing big or binary file formats into your repository. 
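
One way to see the difference is `git diff --numstat`, which reports added and deleted line counts per file. In this sketch (a throwaway repository with two made-up files) the text file gets line counts, while the binary file gets `-` in both columns because Git cannot describe its change in lines.

```shell
# Throwaway repository with one text file and one binary file (made up).
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
printf 'line one\nline two\n' > notes.txt
printf '\000\001\002\003' > picture.bin      # NUL bytes mark it as binary
git add notes.txt picture.bin
git commit -q -m "Add a text file and a binary file"

# Change both files.
printf 'line one\nline 2\n' > notes.txt
printf '\000\001\002\003\004' > picture.bin

# Columns: lines added, lines deleted, file name.
stats="$(git diff --numstat)"
echo "$stats"
```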

### Explore more branching

To explore branching some more, and to practise using the git command line, try playing Learn Git Branching:
[https://learngitbranching.js.org/](https://learngitbranching.js.org/)