# Version Control and CI Tutorial

## Outline
1. A brief overview of testing with pytest
1. **Version Control with Git and GitLab**
    1. Git basics and best practices
    1. Advanced Git usage
1. Continuous Integration using GitLab CI/CD

---
# A brief overview of testing with pytest

Additional material: 
- [Pytest documentation](https://docs.pytest.org/)
- [Python Testing with pytest (book)](https://pythontest.com/pytest-book/)

Introducing bugs is an inherent part of developing code.
However, it is important to be able to detect those bugs and fix them quickly.
A great way to do this is to write tests for the code.
<br/>With tests covering a good portion of a code base, it becomes easier to make changes with confidence that they will not break the project.
This will greatly speed up development in the long run.

Although testing is not the main focus of this tutorial, this section briefly introduces how to write a test for a Python function.

## Writing a simple function and test.

Let's write a simple function to greet users.
For now, the function will do nothing.

We want the function to greet the given user, or say "Hello World!" if no user is given.

```python
# tutorial.greeting
from typing import Optional


def greet(username: Optional[str] = None) -> str:
    """Greet a given user.

    When no user is given, this function will return "Hello World!".

    parameters
    ----------
    username: Optional[str]
        Name of the user.

    returns
    -------
    str
        Personalized greeting.
    """
    pass

```

Then, let's write a test that will make sure our function works as expected.
```python
# tests.test_greeting
import pytest

from tutorial.greeting import greet


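def test_greet() -> None:
    # A named user gets a personalized greeting.
    assert greet("Alice") == "Hello Alice!"
    assert greet("Bob") == "Hello Bob!"
    # Without an argument, the greeting falls back to "Hello World!".
    # pytest.fail() marks the test as failed explicitly, as an alternative to assert.
    if greet() != "Hello World!":
        pytest.fail()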

```
Let's execute the test we wrote. In a terminal, run the command
```bash
pytest ./tests
```
The test should fail as expected, since the function does nothing for now.
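
To see why the stub fails, it can help to look at what it returns. The minimal sketch below repeats the empty `greet` and checks the first assertion by hand; the `result` variable is only introduced for illustration.

```python
from typing import Optional


def greet(username: Optional[str] = None) -> str:
    """Stub version: the body is empty, so every call returns None."""
    pass


result = greet("Alice")
print(result)  # the stub returns None, not a greeting

try:
    assert result == "Hello Alice!"
except AssertionError:
    # pytest reports an AssertionError raised inside a test as a failure.
    print("AssertionError raised: the test fails")
```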

Let's write the core of our function and run the test suite again.
```python
# tutorial.greeting
from typing import Optional


def greet(username: Optional[str] = None) -> str:
    """Greet a given user.

    When no user is given, this function will return "Hello World!".

    parameters
    ----------
    username: Optional[str]
        Name of the user.

    returns
    -------
    str
        Personalized greeting.
    """
    username = "World" if username is None else username
    return f"Hello {username}!"

```
```bash
pytest ./tests
```
This time, the test succeeded. Our implementation is successful 🎉
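
As the number of cases grows, one option is to list inputs and expected outputs in a table and loop over them. The sketch below does this in plain Python; the names `CASES` and `test_greet_table` are illustrative and not part of the tutorial package. pytest also provides `pytest.mark.parametrize` for the same purpose.

```python
from typing import List, Optional, Tuple


def greet(username: Optional[str] = None) -> str:
    """Same implementation as above, repeated to keep the sketch self-contained."""
    username = "World" if username is None else username
    return f"Hello {username}!"


# (input, expected output) pairs; None stands for "no user given".
CASES: List[Tuple[Optional[str], str]] = [
    ("Alice", "Hello Alice!"),
    ("Bob", "Hello Bob!"),
    (None, "Hello World!"),
]


def test_greet_table() -> None:
    for username, expected in CASES:
        # The message after the comma is shown when the assertion fails.
        assert greet(username) == expected, f"unexpected greeting for {username!r}"


test_greet_table()
```

Adding a new case then only requires a new line in `CASES`.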

## 💻 **Practice Time** 💻

Write a test case for the following function.
<br/>You can use the template provided in `tutorial/arrays.py` and `tests/test_arrays.py`.

```python
# tutorial.arrays
from typing import Iterable, Tuple


def min_max(arr: Iterable[int]) -> Tuple[int, int]:
    """Get the min and max of an iterable.
    
    parameters
    ----------
    arr: Iterable[int]
        List with integer values.
    
    returns
    -------
    tuple[int, int]
        minimum and maximum value of arr.
    """
    pass

```

### Solution

```python
# tests.test_arrays
from tutorial.arrays import min_max


def test_min_max() -> None:
    assert min_max(range(10)) == (0, 9)
    assert min_max([7, 4, 2, 8, 1, 3]) == (1, 8)
    assert min_max([1]) == (1, 1)
    
```

```python
# tutorial.arrays
from typing import Iterable, Tuple


def min_max(arr: Iterable[int]) -> Tuple[int, int]:
    """Get the min and max of an iterable.
    
    parameters
    ----------
    arr: Iterable[int]
        Iterable of integer values.
    
    returns
    -------
    tuple[int, int]
        minimum and maximum value of arr.
    """
    return min(arr), max(arr)

```
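
One caveat, offered as a side note rather than as part of the exercise: the `Iterable[int]` annotation also admits one-shot iterators such as generators. With those, `min(arr)` consumes the values and `max(arr)` then raises a `ValueError` because nothing is left. The sketch below is a hypothetical single-pass variant (the name `min_max_single_pass` is not part of the tutorial package) that reads the input only once.

```python
from typing import Iterable, Tuple


def min_max_single_pass(arr: Iterable[int]) -> Tuple[int, int]:
    """Hypothetical variant of min_max that reads the iterable only once."""
    iterator = iter(arr)
    try:
        first = next(iterator)
    except StopIteration:
        # Mirror the behavior of min() and max() on empty input.
        raise ValueError("min_max_single_pass() arg is an empty iterable") from None
    lowest = highest = first
    for value in iterator:
        if value < lowest:
            lowest = value
        elif value > highest:
            highest = value
    return lowest, highest


# Works with lists, ranges, and generators alike.
print(min_max_single_pass(x for x in [7, 4, 2, 8, 1, 3]))
```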

---
# Version Control with Git and GitLab

Additional material:
- [Git cheatsheet](https://education.github.com/git-cheat-sheet-education.pdf)
- [Git reference](https://git-scm.com/docs)
- [Visual breakdown of Git](https://marklodato.github.io/visual-git-guide/index-en.html)
- [In-depth internals of Git](https://github.com/pluralsight/git-internals-pdf)

Keeping track of different versions of a project can be a tedious task. Git aims to solve this issue by offering a suite of commands to create and manage different versions of a repository in a convenient fashion.
Git lets you track, view, delete, and edit different versions of a project, and plenty more.
<br/>Moreover, Git brings an easy way to work on different versions of the code concurrently and merge them back together. This is great when working in a team.

## Git workflow and common practices

### Initializing a Git repository

The first step is to tell Git that we want to keep track of files in this repository.
To initialize an existing directory as a Git repository, use the following command
```bash
git init
```
**Note:** it is also possible to initialize a Git repository at a given location using
```bash
git init <directory_path>
```

<span style="color:firebrick">**Best practice:**</span> after creating a new project, it is recommended to create an empty commit. The rationale will be explained later in this tutorial.
This can be done using
```bash
git commit --allow-empty -m "[EMPTY] Initial commit."
```

### Minimal configuration of Git

Before we start using Git, we need to do some minimal configuration.
Use `git config` to set up a username and an email as follows
```bash
git config --global user.name "[firstname lastname]"
git config --global user.email "[valid-email]"
```
This is important when collaborating with other users, as it helps identify who contributed which part of the project.
<br/>**Note:** with `--global`, the settings apply to every Git repository of your user on this computer. Therefore, this only needs to be done once.

### An overview of the different Git states

There are two main states in Git: **tracked** and **untracked**.

Tracked files are the files that Git knows about: files that were in the previous snapshot, as well as newly staged files. They can be unmodified, modified, or staged.

Untracked files are all the other files: files in your working directory that Git is not tracking.

<img src="./figures/lifecycle.png">

source: [https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository](https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository)
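
The lifecycle in the figure can be read as a small state machine. The sketch below is a toy model of it, not how Git is implemented; `FileState`, `TRANSITIONS` and `next_state` are hypothetical names chosen for illustration.

```python
from enum import Enum


class FileState(Enum):
    UNTRACKED = "untracked"
    UNMODIFIED = "unmodified"
    MODIFIED = "modified"
    STAGED = "staged"


# (current state, action) -> next state, following the arrows of the figure.
TRANSITIONS = {
    (FileState.UNTRACKED, "add"): FileState.STAGED,
    (FileState.UNMODIFIED, "edit"): FileState.MODIFIED,
    (FileState.MODIFIED, "add"): FileState.STAGED,
    (FileState.STAGED, "commit"): FileState.UNMODIFIED,
    (FileState.UNMODIFIED, "remove"): FileState.UNTRACKED,
}


def next_state(state: FileState, action: str) -> FileState:
    """Return the state a file reaches after the given action."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(
            f"'{action}' does not apply to a file that is {state.value}"
        ) from None


# Follow a new file through two rounds of changes.
state = FileState.UNTRACKED
for action in ("add", "commit", "edit", "add", "commit"):
    state = next_state(state, action)
    print(f"{action:>6} -> {state.value}")
```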

### Keeping track of files

After initializing a repository, we want to start tracking files in Git.

To view the status of a repository including which files are untracked or modified, use the following command
<br/>**Note:** Optionally, a path can be specified to only get the status of a specific file or sub-directory.
```bash
git status [path]
```

To move an untracked or modified file to the staging area, use the following command 
```bash
git add [file]
```

To snapshot the currently staged files in a Git repository, use this command
```bash
git commit
```
You will be prompted to enter a commit message.
<br/><span style="color:firebrick">**Best practice:**</span> write a concise and descriptive message for your commits. Others, and your future self, will thank you.

**Note:** It is also possible to write the commit message as part of the command, using the `-m` flag followed by the message.
```bash
git commit -m "[descriptive message]"
```

Looking at the status of the repository again, we see that the files we committed are no longer untracked.
```bash
git status
```

As a project progresses, you might want to look back at its history.
This will be useful to see when changes were made and who made them.
To list the log of a Git repository, use the following command
```bash
git log
```

### Working on different features concurrently

Branches are a critical component of Git. They allow the creation of a version of the repository that diverges from the main project.
In other words, branches in Git let you work on different features of a project concurrently. Moreover, they keep the code of in-development features independent from the main code base as well as from other in-development code.

<img src="./figures/branches.svg" width=650>

source: [https://www.atlassian.com/git/tutorials/using-branches](https://www.atlassian.com/git/tutorials/using-branches)

To list all the branches of the local repository, use the following command
```bash
git branch
```

**Note:** If your default branch is named `master`, rename it to `main` using this command
```bash
git branch -m master main
```

To create a new branch from the current commit, use the command
```bash
git branch [branch-name]
```

To check out (switch to) another branch, use this command
```bash
git checkout [branch-name]
```

To delete a branch, use this command
```bash
git branch -d [branch-name]
```

**Note:** when creating a new branch and directly switching to it, this shortcut is convenient
```bash
git checkout -b [branch-name]
```
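
One way to picture branches is as named pointers to commits, where each commit remembers its parent. The sketch below is a toy model under that assumption; `ToyRepo` and `Commit` are hypothetical classes for illustration and leave out nearly everything Git actually stores.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Commit:
    message: str
    parent: Optional["Commit"] = None


class ToyRepo:
    """Toy repository: a branch is just a name pointing at a commit."""

    def __init__(self) -> None:
        root = Commit("[EMPTY] Initial commit.")
        self.branches: Dict[str, Commit] = {"main": root}
        self.current = "main"

    def branch(self, name: str) -> None:
        # A new branch starts at the commit of the current branch.
        self.branches[name] = self.branches[self.current]

    def checkout(self, name: str) -> None:
        if name not in self.branches:
            raise KeyError(f"unknown branch: {name}")
        self.current = name

    def commit(self, message: str) -> None:
        # Only the current branch pointer moves forward.
        self.branches[self.current] = Commit(message, self.branches[self.current])

    def log(self, name: str) -> List[str]:
        messages = []
        commit: Optional[Commit] = self.branches[name]
        while commit is not None:
            messages.append(commit.message)
            commit = commit.parent
        return messages


repo = ToyRepo()
repo.branch("feature")
repo.checkout("feature")
repo.commit("Add greeting function")
repo.checkout("main")
repo.commit("Update README")

print(repo.log("feature"))
print(repo.log("main"))
```

Both logs share the initial commit, but each branch only sees its own later commit, which is what keeps in-development work isolated.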

### Sharing your code on GitLab

To share code with others, a central location is needed to store the changes brought to the files in the repository. In Git, this is called a **remote**.
<br/>**Note:** For the purpose of this tutorial, GitLab is used as the remote.

#### Creating a repository on GitLab

In the middle top bar of GitLab you will find a `+` icon.
<br/><img src="./figures/topbar.png" width=1000>

Click on it then select **"New project/repository"**.
<br/><img src="./figures/new.png" width="250">

When creating the project, make sure to untoggle the **"Initialize repository with a README"** option. Since we already created a local project, this would result in a conflict with the repository history.
<br/><img src="./figures/no_readme.png" width=650>

Lastly, to give Git permission to interact with GitLab, an SSH key should be added to your account. Go to your **profile (top right)**, then into **Preferences**, and select **SSH Keys** in the menu. You can paste your SSH public key there.

In case you do not have an SSH key, you can generate one on your project VM using
<br/>**Note:** The `<comment>` is optional, but can be helpful in identifying where the SSH key was generated.
```bash
ssh-keygen -t ed25519 -C "<comment>"
```
After running this command, you will find your SSH public key at the location `~/.ssh/id_ed25519.pub`.

Once the project is created and SSH is configured, locate the `clone` button on the page. Then copy the URL under **Clone with SSH**.
<br/><img src="./figures/ssh_url.png" width="350">

#### Interaction between Git and GitLab

To upload code to a remote repository, Git needs to be aware of it. Register the remote as follows
```bash
git remote add [remote-name] [url]
```

To push (upload) the version of the Git repository to a remote location, use the command
```bash
git push
```

**Note:** the first time a branch is pushed, it is required to specify the remote and the name of the branch for the remote.
```bash
git push -u [remote-name] [current-branch]
```

When changes are made on a remote repository, it is possible to pull (download) them locally using
```bash
git pull
```

### Managing code on GitLab

Let's jump onto GitLab to learn how to:
- Create a merge request
- Write an issue
- Fork a repository

### 💻 **Practice Time** 💻

During this exercise, you will:
1. Create a new branch derived from `main`.
1. Commit the `requirements.txt` file available in the tutorial package.
1. Push the changes to GitLab.
1. Make a merge request.
1. Pull the remote changes from the `main` branch to the local repository.

### Other useful commands

To download a fresh project (not available locally), use this command
```bash
git clone [url]
```

To remove files from the staging area while keeping the changes in the working directory, use the following command
<br/><span style="color:firebrick">**Warning:**</span> Misuse of the `git reset` command could result in loss of work. Moreover, **AVOID** using `git reset` on public history (shared with others).
```bash
git reset [files]
```

It is possible to merge the history of another branch into the current one, using this command
```bash
git merge [branch]
```

<br/><img src="./figures/merge.svg" width="650">


source: [https://www.atlassian.com/git/tutorials/merging-vs-rebasing](https://www.atlassian.com/git/tutorials/merging-vs-rebasing)

## Advanced Git usage (Optional)

### Showing differences between versions of the repository

To show the uncommitted changes in the repository (those not yet staged), use
```bash
git diff
```

It is also possible to see the uncommitted changes of a set of files
```bash
git diff [files]
```

Or between two commits
```bash
git diff [commit-1] [commit-2]
```

Or between two branches
```bash
git diff [branch-1] [branch-2]
```

To learn more about the numerous options of `git diff`, you can have a look at these resources:
- [Git diff documentation](https://git-scm.com/docs/git-diff)
- [Atlassian | Git diff tutorial](https://www.atlassian.com/git/tutorials/saving-changes/git-diff)
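
To get a feel for how such output reads, the sketch below builds a unified diff with Python's standard `difflib` module. It only imitates the general shape of a diff and is not what Git runs internally; the file names `a/greeting.py` and `b/greeting.py` are made up for the example.

```python
import difflib

# Two versions of the same file, as lists of lines.
old = [
    "def greet(username=None):",
    "    pass",
]
new = [
    "def greet(username=None):",
    '    username = "World" if username is None else username',
    '    return f"Hello {username}!"',
]

diff = list(
    difflib.unified_diff(
        old,
        new,
        fromfile="a/greeting.py",
        tofile="b/greeting.py",
        lineterm="",
    )
)
print("\n".join(diff))
```

Lines starting with `-` exist only in the old version, lines starting with `+` only in the new one, and unmarked lines are unchanged context.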

### Selecting a subset of commits from another branch

To apply selected commits from another branch onto the current one, use the following command
```bash
git cherry-pick [commits]
```

### Committing temporarily

Git allows you to save changes temporarily in a stash.
<br/>To save the current changes from the working directory, use the following command
```bash
git stash push -m [message]
```

To list the entries currently in the stash, use this command
```bash
git stash list
```

To show the difference between the stash entry and the commit back when the entry was created, use the command
```bash
git stash show stash@{[stash-id]}
# Example git stash show stash@{0}
```

To apply the change saved in a stash entry to the current working tree, use this command
```bash
git stash apply stash@{[stash-id]}
# Example git stash apply stash@{1}
```

To remove an entry from the stash, use the command
```bash
git stash drop stash@{[stash-id]}
# Example git stash drop stash@{2}
```

To both apply to the working tree then remove the entry from the stash, use this shortcut
```bash
git stash pop stash@{[stash-id]}
# Example git stash pop stash@{0}
```

To remove all entries from the stash, use this command
```bash
git stash clear
```
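
The stash commands above behave much like operations on a stack in which the most recent entry is `stash@{0}`. The sketch below is a toy model of that numbering only; `ToyStash` is a hypothetical class that stores messages instead of actual changes.

```python
from typing import List


class ToyStash:
    """Toy model: entries are indexed like stash@{0}, stash@{1}, ... newest first."""

    def __init__(self) -> None:
        self.entries: List[str] = []

    def push(self, message: str) -> None:
        # The newest entry always becomes index 0.
        self.entries.insert(0, message)

    def apply(self, index: int = 0) -> str:
        # Applying leaves the entry in the stash.
        return self.entries[index]

    def drop(self, index: int = 0) -> None:
        del self.entries[index]

    def pop(self, index: int = 0) -> str:
        # pop is apply followed by drop.
        entry = self.apply(index)
        self.drop(index)
        return entry

    def clear(self) -> None:
        self.entries.clear()


stash = ToyStash()
stash.push("WIP: docs")
stash.push("WIP: tests")
print(stash.entries)  # newest entry first
popped = stash.pop()
print(popped, stash.entries)
```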

### Editing history

As an alternative to `git merge`, it is possible to combine the commits from two branches using this command
```bash
git rebase [base-branch]
```

It is also possible to edit, squash, drop, etc. the commit history interactively, using this command
```bash
git rebase -i
```
<br/>**Note:** To rebase interactively, there need to be previous commits in the history. This is the rationale for creating an empty commit at the start of the history.
<br/><span style="color:firebrick">**Warning:**</span> **AVOID** using `git rebase` on public history as it will rewrite the commit history.

<br/><img src="./figures/rebase.svg" width="650">

source: [https://www.atlassian.com/git/tutorials/merging-vs-rebasing](https://www.atlassian.com/git/tutorials/merging-vs-rebasing)
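
The warning about public history can be illustrated with a toy model in which a commit identifier depends on its parent. This is a deliberate simplification (Git hashes more than the parent and the message), and `commit_id` and `build` are hypothetical helpers. Replaying the same changes on a new base yields different identifiers, so anyone who already has the old commits ends up with a diverging history.

```python
import hashlib
from typing import List, Tuple


def commit_id(parent_id: str, message: str) -> str:
    """Toy identifier: hash of the parent identifier and the message."""
    return hashlib.sha1(f"{parent_id}\n{message}".encode()).hexdigest()[:7]


def build(base_id: str, messages: List[str]) -> List[Tuple[str, str]]:
    """Stack commits on top of base_id and return (id, message) pairs."""
    history = []
    parent = base_id
    for message in messages:
        parent = commit_id(parent, message)
        history.append((parent, message))
    return history


root = commit_id("", "[EMPTY] Initial commit.")
main_tip = commit_id(root, "Update README")

feature_messages = ["Add greet()", "Test greet()"]
before = build(root, feature_messages)     # feature branched from the root commit
after = build(main_tip, feature_messages)  # same commits replayed on top of main

for (old_id, message), (new_id, _) in zip(before, after):
    print(f"{message}: {old_id} -> {new_id}")
```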

---
# Continuous Integration using GitLab CI/CD

Additional material:
- [GitLab CI/CD Doc](https://docs.gitlab.com/ee/ci/)

<span style="color:firebrick">**Warning:**</span> The resources used to run GitLab CI jobs might be overloaded. Therefore, following along might be difficult.

Continuous integration (CI) is a great tool to detect errors earlier in the software development process.
The main idea of CI is to automatically test every change that is to be merged into the production codebase.
When new code is pushed onto a central repository, a job will be executed to run a suite of quality assurance tests to make sure the code is ready for production.
Those tests can include: linting, unit tests, functional tests, and others.
The objective of CI is to detect and fix errors early on, and therefore, reduce the lead time to create new features.

## A first CI job

As a first step, we will use the `python:3.8` Docker image and install the required dependencies before executing other scripts.
```yaml
image: python:3.8

before_script:
  - python3 -V
  - pip install -r requirements.txt

```

We will add this first job to execute the test suite from the repository.
```yaml
unit-test:
  script:
    - pytest -v -rfEs ./tests

```

## Creating a pipeline with multiple stages

When doing continuous integration, it is useful to separate the steps of a pipeline into multiple stages.
<br/>For example
<br/><img src="./figures/ci-stages.png" width="350">

We will replace the previous job with this pipeline composed of two stages and three jobs.
<br/>First, the `Static Analysis` jobs will run, then the `Test` job can run. Moreover, for demonstration purposes, we allow the `mypy` job to fail without failing the whole pipeline.
```yaml
stages:
  - Static Analysis
  - Test

black:
  stage: Static Analysis
  script:
    # --check fails the job if files would be reformatted, without modifying them.
    - black --check ./tutorial ./tests

mypy:
  stage: Static Analysis
  allow_failure: true
  script:
    - mypy ./tutorial ./tests

unit-test:
  stage: Test
  script:
    - pytest -v -rfEs ./tests

```
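
The ordering rules of this pipeline can be sketched as a small simulation. This is a toy model of the default behavior, not GitLab's scheduler: jobs of a stage all run, a failing job with `allow_failure` only produces a warning, and any other failure causes the later stages to be skipped. `Job` and `run_pipeline` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    name: str
    stage: str
    succeeds: bool
    allow_failure: bool = False


def run_pipeline(stages: List[str], jobs: List[Job]) -> Dict[str, str]:
    """Return a status per job: 'passed', 'failed', 'warning' or 'skipped'."""
    statuses: Dict[str, str] = {}
    blocked = False
    for stage in stages:
        stage_jobs = [job for job in jobs if job.stage == stage]
        for job in stage_jobs:
            if blocked:
                statuses[job.name] = "skipped"
            elif job.succeeds:
                statuses[job.name] = "passed"
            elif job.allow_failure:
                statuses[job.name] = "warning"
            else:
                statuses[job.name] = "failed"
        # A blocking failure stops later stages, not the other jobs of this stage.
        if any(statuses[job.name] == "failed" for job in stage_jobs):
            blocked = True
    return statuses


stages = ["Static Analysis", "Test"]
mypy_fails = run_pipeline(stages, [
    Job("black", "Static Analysis", succeeds=True),
    Job("mypy", "Static Analysis", succeeds=False, allow_failure=True),
    Job("unit-test", "Test", succeeds=True),
])
black_fails = run_pipeline(stages, [
    Job("black", "Static Analysis", succeeds=False),
    Job("mypy", "Static Analysis", succeeds=True, allow_failure=True),
    Job("unit-test", "Test", succeeds=True),
])
print(mypy_fails)
print(black_fails)
```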

## Adding artifacts

To have a nice report of our tests, we add an artifact generated by pytest.
```yaml
unit-test:
  stage: Test
  script:
    - pytest --junitxml=report.xml -v -rfEs ./tests
  artifacts:
    when: always
    reports:
      junit: report.xml

```
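
A JUnit report is an XML file, so it can also be inspected with a few lines of Python. The sketch below parses a hand-written, simplified sample in the JUnit style; it is not actual pytest output, and the `summarize` helper is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Simplified sample written by hand for illustration.
SAMPLE_REPORT = """
<testsuites>
  <testsuite name="pytest" errors="0" failures="1" skipped="0" tests="2">
    <testcase classname="tests.test_greeting" name="test_greet" />
    <testcase classname="tests.test_arrays" name="test_min_max">
      <failure message="assert None == (0, 9)">traceback text</failure>
    </testcase>
  </testsuite>
</testsuites>
"""


def summarize(report_xml: str) -> Dict[str, object]:
    """Count the test cases and list the names of those that failed or errored."""
    root = ET.fromstring(report_xml.strip())
    cases = list(root.iter("testcase"))
    failed = [
        case.get("name")
        for case in cases
        if case.find("failure") is not None or case.find("error") is not None
    ]
    return {"total": len(cases), "failed": failed}


summary = summarize(SAMPLE_REPORT)
print(summary)
```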

## Using caching to accelerate our pipeline

We can use the caching feature offered by GitLab CI to avoid downloading the dependencies of the repository for every job.

```yaml
variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"

cache:
  paths:
    - .cache/pip
    - venv/

before_script:
  - python3 -V
  - python3 -m venv ./venv
  - source venv/bin/activate
  - pip install -r requirements.txt

```

## An overview of the full pipeline

```yaml
image: python:3.8

variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"

cache:
  paths:
    - .cache/pip
    - venv/

before_script:
  - python3 -V
  - python3 -m venv ./venv
  - source venv/bin/activate
  - pip install -r requirements.txt
  
stages:
  - Static Analysis
  - Test

black:
  stage: Static Analysis
  script:
    # --check fails the job if files would be reformatted, without modifying them.
    - black --check ./tutorial ./tests

mypy:
  stage: Static Analysis
  allow_failure: true
  script:
    - mypy ./tutorial ./tests

unit-test:
  stage: Test
  script:
    - pytest --junitxml=report.xml -v -rfEs ./tests
  artifacts:
    when: always
    reports:
      junit: report.xml

```

---
# Key points from this tutorial
- Creating tests with pytest
- Git
    * Basic Git workflow using: status, add, commit, log, push/pull.
    * Working on different versions of the same repository using branches.
    * Comparing and combining different versions of a repository.
- GitLab
    * Basic usage of GitLab to manage a repository.
    * Creating a continuous integration pipeline to lint and test code.
