# Getting Started With Git

![xkcd](https://imgs.xkcd.com/comics/git.png)

## What is git?

As XKCD explains, it is software to track collaborative work projects. It is used by almost every open-source project to enable multiple distributed developers to write software together, but most companies also maintain internal (private) git repositories to track their products. Even if you are not working collaboratively, there are a lot of advantages to using git!

There is a lot of interesting technical detail behind git, but here we will focus on its practical use for developing Python projects. So, let's get started.

## Running commands

To complete this notebook, you need to open a terminal (if on a Windows machine, be sure to use the `Anaconda Powershell prompt`). As a reminder, you may wish to review the [guide to the command line](https://github.com/GregoryAshton/PH3010_advanced_python/blob/main/guides/using_the_command_line.md).

<div class="alert alert-block alert-danger">
<b>Challenge 3.1:</b> Open a terminal
</div>


<div class="alert alert-block alert-info">
<b>Note:</b> In this notebook, I use jupyter commands so that I can package the instructions as a notebook. But, you need to enter them into the terminal. Wherever there is a "!", this indicates that the rest of the line should be run in the terminal.  Wherever there is a `writefile` you should do this by using a text editor.
    
**All commands listed below should be run in this terminal: DO NOT SIMPLY RUN THE COMMANDS INSIDE THE NOTEBOOK**

</div>


## Checking that you have git installed

You can check that you have git installed and which version by running the following 

In [None]:
! git --version

## A git-tracked directory

Once you have `git` installed, you are ready to track files in a directory. Let's imagine we are starting from scratch on a new project. We first create a directory (folder) on our computer. (Note: I use `~/my_example_project` for demonstration purposes; you will of course replace this with the directory you actually want to track!)

In [None]:
! rm -rf ~/my_example_project  # First remove the directory to start clean
! mkdir ~/my_example_project

It may be useful to open the directory in a file explorer. On Windows you can do this by running
```
$ explorer.exe .
```
or navigating to the directory from the file explorer.

Okay, now we add a file. It is usually a good idea to have a `README.md` file to tell people what the directory contains. Let's add one

In [None]:
%%writefile ~/my_example_project/README.md
# My example project

Here is my project description

<div class="alert alert-block alert-info">
<b>Note:</b> Here I am using the jupyter cell magic `writefile`. To replicate this step, you will need to edit the file directly. On Windows, open Notepad or Notepad++, write the contents of the file, and then click "save as", navigating to the directory `my_example_project`.
</div>


Okay, now let's go to the directory and check our `README.md` is there.

In [None]:
%cd ~/my_example_project/
! ls

At this point, we will start using `git`. Our first step is to initialise the directory.

In [None]:
! git init
! git checkout -b main

Okay, then next we **add** the `README.md` file 

In [None]:
! git add README.md

We can check that it is there

In [None]:
! git status

The output here says that there is one new file `README.md` "to be committed". Let's commit the file and add a message.

In [None]:
! git commit -m "Adds initial version of the README"
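
On a fresh installation, the commit above may stop with a message asking who you are, because git records an author name and email with every commit. Here is a minimal sketch of the one-off setup, assuming you want the same identity for every repository on your machine. The name and email are placeholders to replace with your own.

```shell
# One-off setup: tell git who is making the commits.
# "Your Name" and "you@example.com" are placeholders.
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# Check what git has stored
git config --global --get user.name
git config --global --get user.email
```

Dropping `--global` while inside a repository sets the identity for that repository only.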

There are two stages to creating a `git` commit. First we **add** files, then we **commit** them. You may ask, "what is the difference?". The key is that our example above was rather simple. In practice, you may want to add multiple files at once, or even add changes to only part of a file. Everything done during the `add` command can easily be undone (look at the output of `git status` to see how). But, once you have added all the files that you like (this is sometimes referred to as *staging*), you **commit**. The commit is where the magic of `git` happens: in short the changes to the files are recorded along with the message explaining the changes.
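
To make the add/commit distinction concrete, here is a small sketch you can paste into a terminal. It works in a throwaway directory, and the file names and identity are invented for illustration. It assumes Git 2.23 or newer for `git restore --staged`; older versions suggest `git reset HEAD <file>` in the `git status` output instead.

```shell
# Work in a throwaway repository so nothing real is touched
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # placeholder identity, this repo only
git config user.email "user@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Adds notes file"

# Change one file, create another, and stage both
echo "second draft" > notes.txt
echo "scratch" > todo.txt
git add notes.txt todo.txt

# Change our mind: take todo.txt out of the staging area (the file itself is untouched)
git restore --staged todo.txt

git status --short                         # notes.txt is staged, todo.txt is untracked
git commit -q -m "Updates the notes"
git status --short                         # only the untracked todo.txt remains
```

Only what was staged at the moment of `git commit` ends up in the commit; `todo.txt` stays on disk but is not recorded.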

Okay, to fill out our example, let's change the `README.md` file to add a better description of the project (here we overwrite the file, but you could of course just edit it directly):

In [None]:
%%writefile ~/my_example_project/README.md
# My example project

This is an example project for people learning about git

(Reminder: you will need to edit the file using a text editor). Now, we can check the `git status` again

In [None]:
! git status

Now we see that the file has been modified. We can check how it has been modified by using `git diff`

In [None]:
! git diff README.md

And then we can add the file and commit the changes

In [None]:
! git add README.md
! git commit -m "Updating the description in the README"

Why is this so useful? 

* First, I can easily track when something changes, why it changed, and who changed it. This is enabled by the `log`. 
* Second, it enables collaborative work. You and I can both commit changes to the same piece of code simultaneously and then we can **merge** those changes together (if there are conflicts, a decision will be needed to handle them, but let's worry about that later). 
* Finally, it breaks our work into well-defined chunks. You should think of each commit as adding, removing, or changing **one** logical piece of the data. The commit message should explain this. Avoid large commits which change multiple files at once: if you can't explain the commit in a single short sentence, then you are doing too much.

<div class="alert alert-block alert-danger">
<b>Challenge 3.2:</b> On your computer, create a directory, initialise git, add some files, and commit those files. Check that this works as you expected by looking at git status at each step.
</div>

Okay, let's now look at each of these ideas individually.

### Tracking changes in a git repository

If we want to see a history of the changes, we can look at the `log`

In [None]:
! git log

We can also look at the changes in each commit by pasting part of the commit hash (this is the long string of letters and numbers printed in the log), for example:

In [None]:
! git show b794bb09de8b5fffa6b599938a2c051c66226ac5

<div class="alert alert-block alert-info">
<b>Note:</b> The hash in your directory will be different! You need to look at the contents of `git log` and copy and paste the hash. 
</div>
    
You don't have to use the full hash here. So long as it is unique, just taking the first few letters also works.

In [None]:
! git show b794bb09
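
If copying hashes by hand feels error-prone, git can print them for you. The sketch below runs in a throwaway repository with a placeholder identity and uses `HEAD`, which is git's name for the latest commit on the current branch.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"      # placeholder identity, this repo only
git config user.email "user@example.com"

echo "# My example project" > README.md
git add README.md
git commit -q -m "Adds initial version of the README"

git log --oneline                        # one line per commit: short hash, then message

full=$(git rev-parse HEAD)               # full hash of the latest commit
short=$(git rev-parse --short HEAD)      # abbreviated form of the same hash
echo "$full"
echo "$short"

git show --stat "$short"                 # the short hash names the same commit
```

`git log --oneline` is often the quickest way to find the short hash of the commit you want to inspect.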

### Git branching

git supports multiple versions of the same software through branches. Usually we have one main branch, often called either `master` or `main`. The latter is preferred nowadays for obvious reasons, but many `git` installations still default to `master`, which is why we immediately created `main` in our example above. When we want to make changes, we create a branch, make the changes there, then `merge` them back into the main branch. Here is a visualisation of that process from a nice [tutorial](https://www.atlassian.com/git/tutorials/using-branches#:~:text=The%20git%20branch%20command%20lets,checkout%20and%20git%20merge%20commands.)

<img src="https://wac-cdn.atlassian.com/dam/jcr:a905ddfd-973a-452a-a4ae-f1dd65430027/01%20Git%20branch.svg?cdnVersion=365" alt="branches" width="600"/>

Let's see an example of how to create a branch, make a change, and then merge it back in using our project example above

In [None]:
%cd ~/my_example_project/
! git checkout -b adding-a-licence-to-my-project

In [None]:
%%writefile LICENSE.md

MIT License

Copyright (c) [year] [fullname]

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

In [None]:
! git add LICENSE.md
! git commit -m "Adding the MIT license to the project"

In [None]:
! git status

At this point, we are on the `adding-a-licence-to-my-project` branch and have two files in the directory:

In [None]:
! ls

We can now switch back to the `main` branch where `LICENSE.md` does not exist

In [None]:
! git checkout main
! ls

To **merge** the changes back in, we run

In [None]:
! git merge adding-a-licence-to-my-project

In [None]:
! ls

In [None]:
! git log
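
The whole branch-and-merge cycle can also be replayed in one go. This is a condensed sketch in a throwaway repository, with a placeholder identity and a one-line stand-in for the licence text. It adds two optional steps that are not part of the walkthrough above: drawing the history as a graph and deleting the branch once it has been merged.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"      # placeholder identity, this repo only
git config user.email "user@example.com"

echo "# My example project" > README.md
git add README.md
git commit -q -m "Adds initial version of the README"
git branch -M main                       # make sure the branch is called main

# Do the work on a separate branch
git checkout -q -b adding-a-licence-to-my-project
echo "MIT License" > LICENSE.md
git add LICENSE.md
git commit -q -m "Adding the MIT license to the project"

# Back on main, bring the branch in
git checkout -q main
git merge -q adding-a-licence-to-my-project

git log --oneline --graph                       # history drawn as a graph
git branch --merged main                        # branches fully contained in main
git branch -d adding-a-licence-to-my-project    # -d refuses to delete unmerged work
```

Because `main` had no new commits of its own, git simply moves `main` forward to the branch tip (a "fast-forward") and no separate merge commit appears in the log.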

<div class="alert alert-block alert-danger">
<b>Challenge 3.3:</b> Follow the steps above to check you understand how to create a branch, add files, and then merge them back into main/master.
</div>

### Good git commits

`git` can be used in many ways. But, there are some principles which you should try to abide by as they will help you, especially when you work with others. Perhaps the most important is to **ensure that each commit performs one logical change**. Avoid making lots and lots of changes, then committing them all together. This might seem natural. For example, you may previously have dumped all your code into Dropbox at the end of each day. But, it renders the `git log` useless and makes merging your code really hard! 

In short, we can appeal to [XKCD](https://xkcd.com/1296/) again for what **not** to do:

![xkcd](https://imgs.xkcd.com/comics/git_commit.png)

Instead you should

1. Sit down and decide what singular logical change you plan to make to the software
2. Create a branch to track the changes
3. Edit the software to make the change
4. Add all files required (`git add`)
5. Review the changes (using `git status`)
6. Commit the changes with a *good* git commit message
7. Repeat steps 3-6 until you have all the commits you need
8. Either merge the changes into `main` or create a Pull Request (see below)

Note that step 7 recognises that one logical change sometimes needs several small commits.
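
As a rough illustration, the steps above might look like this in a terminal. The branch name, file contents, and commit messages are all made up for the example, and `git diff --staged` is added as one way to do the review in step 5, since it shows exactly which lines are about to be committed.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"      # placeholder identity, this repo only
git config user.email "user@example.com"
echo "# My example project" > README.md
git add README.md
git commit -q -m "Adds initial version of the README"
git branch -M main

# Step 2: a branch for one logical change (hypothetical name)
git checkout -q -b add-usage-section

# Steps 3-6: edit, add, review, commit
printf '\n## Usage\n' >> README.md
git add README.md
git status --short                       # which files are staged?
git diff --staged                        # which lines will be committed?
git commit -q -m "Adds a usage heading to the README"

# Step 7: repeat for the next small piece of the same change
echo "Run the example script to get started." >> README.md
git add README.md
git diff --staged
git commit -q -m "Describes how to run the example"

# Step 8: merge back into main
git checkout -q main
git merge -q add-usage-section
git log --oneline
```

Each commit message here fits in one short sentence, which is a reasonable test that the commit does one thing.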

## Git repositories

The git directory we created above has got us started, but now we want to share our project with the world! How can we go about distributing our software? This can be done using a remote `repository` (or `repo` for short). This is an online copy of the project where we can **push** our changes (i.e. all our git commits) and **pull** changes from other people. There are several websites which provide access to repositories for free. Here we will use GitHub.

### Working with existing repositories

If you want to work with an existing repository, you can download the source code directly. For example, `numpy` is openly developed on GitHub [here](https://github.com/numpy/numpy). We can `clone` the repository by clicking the big green **Code** button and copying the `HTTPS` URL: https://github.com/numpy/numpy.git. With that in hand, we can clone as follows:

In [None]:
%cd ~
! rm -rf numpy  # Remove the directory to start clean
! git clone https://github.com/numpy/numpy

We can now take a look at the directory

In [None]:
! ls numpy

If you ever want to know how `numpy` works, you can take a look in this directory and find the function that you need. For example, if you wanted to know how the `numpy` function `savetxt` was implemented, you could use `grep` (a UNIX command to find things):

In [None]:
! grep -r 'def savetxt' ~/numpy/numpy/

We get two matches; the actual definition is in the first file. We can view it [here](https://github.com/numpy/numpy/blob/main/numpy/lib/npyio.py#L1321) (you could open the file locally, here I have linked to the online version).

<div class="alert alert-block alert-info">
<b>Note:</b> This command will not work on Windows, but you can still use the file explorer to navigate into the `numpy` directory and find the files!
</div>
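
One alternative that should work wherever git itself is installed is `git grep`, which searches the files tracked in a repository. The sketch below uses a tiny stand-in repository (the `pkg/io.py` file and its contents are invented for illustration) rather than the real `numpy` clone, so that it runs without a network connection.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"      # placeholder identity, this repo only
git config user.email "user@example.com"

# A stand-in for a cloned project: one Python file containing the function we want
mkdir -p pkg
printf 'def loadtxt(fname):\n    pass\n\ndef savetxt(fname, X):\n    pass\n' > pkg/io.py
git add pkg/io.py
git commit -q -m "Adds a toy io module"

# Search all tracked files; -n prints the line number of each match
git grep -n 'def savetxt'
```

In the real clone, the equivalent would be to change into the `numpy` directory and run the same `git grep` command there.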

<div class="alert alert-block alert-danger">
<b>Challenge 3.4:</b> Follow the steps above to check you can clone a copy of an existing repository locally and open the file numpy/lib/npyio.py to see how `savetxt` is defined.
</div>

### Creating your own repository
You can sign up for a github student developer pack [here](https://education.github.com/pack) using your university email address. This will give you free access to lots of premium features. Once you have an account, you can play around creating a new repo here: https://github.com/new.
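
The clone, commit, push cycle with a remote looks roughly like the sketch below. To keep it runnable offline, a local *bare* repository (history only, no working files) stands in for the repository that GitHub would host. With a real GitHub repository you would clone the HTTPS URL shown on its page instead and skip the `git init --bare` line. All paths and the identity are placeholders.

```shell
work=$(mktemp -d)

# Stand-in for the repository hosted on GitHub
git init -q --bare "$work/remote.git"

# Clone it, as you would with the HTTPS URL copied from GitHub
git clone -q "$work/remote.git" "$work/local"
cd "$work/local"
git config user.name "Example User"      # placeholder identity, this repo only
git config user.email "user@example.com"

echo "# My example project" > README.md
git add README.md
git commit -q -m "Adds initial version of the README"
git branch -M main

# Push: upload the commits. -u links main to origin/main so a plain `git push` works later
git push -q -u origin main

# Pull: download and merge whatever others have pushed (nothing new in this sketch)
git pull -q
```

`origin` is the name `git clone` gives to the place you cloned from; `git remote -v` lists it.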

<div class="alert alert-block alert-danger">
<b>Challenge 3.5:</b> Create a new repository on GitHub under your namespace. Clone a local copy of the repository to your local computer. Make some changes to the repository, then push the changes back up to GitHub and verify that they exist there.
</div>

## GitHub Flow

`git` itself is software to track changes to a set of files (i.e. our software). However, as we have seen above, lots of packages are developed openly on platforms like GitHub. At its core, GitHub provides a place to share your git repository with others (and find and clone their repositories). But beyond this, GitHub and similar platforms have developed a way of working with git repositories which is often referred to as **GitHub Flow**. In short, this describes the *flow* of software development from an idea to part of the published package. You can read about [github flow here](https://docs.github.com/en/get-started/quickstart/github-flow). (Note that other platforms use names like GitLab Flow and some of the concepts are renamed, e.g. Pull Request == Merge Request. Here we will use the GitHub terminology.)

<div class="alert alert-block alert-danger">
<b>Challenge 3.6:</b> Skim over the github flow documentation to get an understanding of the idea. We will use this in our group project work so it is important you have this reference on hand.
</div>

## Glossary
People are often confused by various definitions at this point, so here is a glossary to help:

* **git**: a free and open source distributed version control system (i.e. software) designed to handle everything from small to very large projects with speed and efficiency.
* **directory**: (also known as a folder) a structure in the file system to contain multiple files
* **git repository**: (often shortened to repo) a directory which is tracked by `git`
* **GitHub**: a website for sharing git repositories (see also GitLab, Bitbucket, etc.)
* **GitHub Flow**: a method of working with git repositories on GitHub to enable multi-user development (not to be confused with Git Flow, which is a different branching model)