# Implementing Transformer Models
## Practical I
Carel van Niekerk

---

In this practical we will set up a Python project for implementing a deep learning model. We will focus on creating a modular codebase that is easy to debug, adapt and reuse. We will also set up a Python environment for the project, which makes it easy to manage the project's dependencies.

### 1. Setting up Environment
We will use Python 3.8 or above for this course. If you have not installed Python yet, you can download it from [here](https://www.python.org/downloads/). We recommend using the latest version of Python. If you are using Windows, make sure to select the option to add Python to your PATH variable during the installation process. This will allow you to run Python from the command line.

#### 1.1 Installing an IDE
We recommend using an IDE (Integrated Development Environment) for this course. An IDE is a software application that provides comprehensive facilities to computer programmers for software development. We recommend using [PyCharm](https://www.jetbrains.com/pycharm/). PyCharm is a cross-platform IDE that provides smart code completion, code inspections, on-the-fly error highlighting and quick-fixes, along with automated code refactorings and rich navigation capabilities. PyCharm also provides support for version control systems, Python web frameworks, databases, and scientific tools. You can register for the professional version of PyCharm for free using your student email address.

#### 1.2 Installing Git
We will use Git for version control. Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency. You can download Git from [here](https://git-scm.com/downloads). If you are using Windows, make sure to select the option to add Git to your PATH variable during the installation process. This will allow you to run Git from the command line.

#### 1.4 Creating a virtual environment
Before installing the project dependencies, we will create a virtual environment for the project. A virtual environment keeps the dependencies of different projects separate by giving each project its own isolated Python environment. This isolation is why virtual environments are a standard part of most Python developers' workflows. We will use the `virtualenv` package to create a virtual environment for our project. You can install the `virtualenv` package using the following command:

```bash
pip install virtualenv
```

To create a virtual environment, you must specify a path. For example, to create one called `transformer_project` in the current directory, type the following:

```bash
virtualenv transformer_project
```

To begin using the virtual environment, it needs to be activated:

```bash
source transformer_project/bin/activate
```
This is done automatically if you are using PyCharm, once you have set the virtual environment as the Python interpreter for the project. On Windows, the activation script is located at `transformer_project\Scripts\activate` instead. You can confirm that you are in the virtual environment by checking the location of your Python interpreter: it should point into the `transformer_project` directory.
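The interpreter check can also be done from Python itself. The following is a minimal sketch, not part of the provided course code: the function name `in_virtual_env` is hypothetical, and the check relies on `sys.prefix` differing from `sys.base_prefix` inside an environment.

```python
import sys


def in_virtual_env() -> bool:
    """Return True if the running interpreter belongs to a virtual environment."""
    # Inside an environment, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    # Very old virtualenv releases set sys.real_prefix instead.
    return sys.prefix != base_prefix or hasattr(sys, "real_prefix")


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Environment root:", sys.prefix)
    print("Inside a virtual environment:", in_virtual_env())
```

If the environment is active, the interpreter path printed should lie inside the `transformer_project` directory.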

#### 1.5 Installing project dependencies

For this project we will use the following packages:
- [PyTorch](https://pytorch.org/)
- [pytest](https://docs.pytest.org/en/stable/) (some standard tests will be provided to test important modules)
- Hugging Face `datasets` (optionally, you can download the dataset manually)
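With the virtual environment activated, these packages can typically be installed with `pip install torch pytest datasets`; the PyTorch website lists platform-specific install commands if you need a particular build. To check that the installation worked, a small script along the following lines may help. It is only a sketch: the helper name `missing_packages` is hypothetical, and it checks whether each package can be found for import without actually importing it.

```python
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the names from import_names that cannot be found for import."""
    return [name for name in import_names if find_spec(name) is None]


if __name__ == "__main__":
    # Import names of the three project dependencies.
    required = ["torch", "pytest", "datasets"]
    missing = missing_packages(required)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All project dependencies are installed.")
```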

### 2. Setting up the project

#### 2.1 Creating a project directory

We will create a directory for the project, which will contain all of the project's code. We will call this directory `transformer_project`. You can create it in a location of your choice, for example in the folder where you keep your other projects.

#### 2.2 How to structure the project

There are many ways to structure a deep learning project. We will use a modular approach, where we create a separate Python module for each component of the project. This will make it easy to debug, adapt and reuse the code. We will create the following modules (directories/scripts) for our project:

- `modelling`: This module will contain the code for the model architecture including the learning rate schedulers, loss functions and training code.
- `dataset.py`: This script will contain the code for loading, cleaning and preparing the data for model training.
- `test`: This directory will contain the code for testing the modules.
- `run/main.py`: This script will contain the code for running the model training and evaluation.
- `utils.py` (optional): This script will contain utility functions that will be used by other modules.
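The layout listed above can be created by hand, or with a short script such as the sketch below. The function name `create_skeleton` is hypothetical, and the `__init__.py` file inside `modelling` is an assumption made here so that the directory can be imported as a package; it is not prescribed by the list above.

```python
from pathlib import Path

# Directories and scripts from the list above.
DIRECTORIES = ["modelling", "test", "run"]
# modelling/__init__.py is an assumption; the other files come from the list.
FILES = ["modelling/__init__.py", "dataset.py", "run/main.py", "utils.py"]


def create_skeleton(root):
    """Create the empty project layout under root and return the root path."""
    root = Path(root)
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for file in FILES:
        # touch() creates an empty file; the contents of an existing file are kept.
        (root / file).touch(exist_ok=True)
    return root
```

Calling `create_skeleton("transformer_project")` from the parent directory would create the empty layout, and running it a second time does not overwrite any code you have already written.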

# Exercises

1. Create a virtual environment for the project.
2. Install all required packages for the project.
3. Set up a project directory.
4. Push your clean project to GitLab.