## 6.1 - Testing Python code with pytest

This section covers **unit testing**, that is, testing individual pieces of the code in isolation.

We will take the deployment example from Week 4 (streaming). In this example we used Lambda and Kinesis (AWS). It had this simple architecture:

```   
                      (4) Model
                     (S3 Bucket)
                         |
    Stream of         Service         Predictions 
(1) events    ->  (2) (Lambda) -> (3) Stream
    (Kinesis)                         (Kinesis)
```

1. We have the stream of events
2. The service, with the model loaded from an S3 bucket (4), reads and reacts to these events
3. The service applies the model to the input stream and writes the predictions to the output stream

The development will be done in the new folder ```notes-code```, where we have copied the content from ```04-deployment/streaming```:

```
Dockerfile          Pipfile       model.py
lambda_function.py  Pipfile.lock  test_docker.py
```

where we have removed the original ```README.md``` and ```test.py``` files. We then create a virtual environment (we could instead have used the conda environment ```mlops-zoomcamp``` from previous weeks):

```
$ pipenv install boto3 mlflow scikit-learn
```

and create a test folder ```notes-code/tests``` with an ```__init__.py``` file so that Python knows it can be imported as a package.

Finally, we install pytest as a dev dependency (we do not need it in production):

```
$ pipenv install --dev pytest
```

To enter the virtualenv:

```
$ pipenv shell
```

We start from the script ```notes-code/lambda_function.py```. We modify (refactor) the original script to make it easier to test:

```python
import os
import model

# Get environment variables
PREDICTIONS_STREAM_NAME = os.getenv('PREDICTIONS_STREAM_NAME',
                                    'ride_predictions')
RUN_ID = os.getenv('RUN_ID')
TEST_RUN = os.getenv('TEST_RUN', 'False') == 'True'

# Initialize the model service
model_service = model.init(
    prediction_stream_name=PREDICTIONS_STREAM_NAME,
    run_id=RUN_ID,
    test_run=TEST_RUN,
)

# Lambda handler function
def lambda_handler(event, context):
    # pylint: disable=unused-argument
    return model_service.lambda_handler(event)
```

Explanation:

- This script serves as the entry point for the AWS Lambda function. It delegates everything to the model service.
- It imports the ```model``` module and initializes the model service using the ```model.init()``` function.
- The environment variables ```PREDICTIONS_STREAM_NAME```, ```RUN_ID```, and ```TEST_RUN``` are used to configure the model service.
- The ```lambda_handler``` function is the main function that AWS Lambda invokes when the function is triggered. It calls the ```lambda_handler``` method of the model service (yes, it's confusing) and passes it the event; the context is not used.
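
Environment variables are always strings, so the ```TEST_RUN``` flag is only switched on when the variable is exactly ```'True'```. A minimal sketch of that parsing follows; ```parse_test_run``` is a hypothetical helper (it is not part of the original script) that takes the environment as a dictionary so that it can be tried without touching ```os.environ```:

```python
import os


def parse_test_run(env=None):
    """Mirror `os.getenv('TEST_RUN', 'False') == 'True'` for a given mapping."""
    # Hypothetical helper: fall back to the real environment if none is given.
    env = os.environ if env is None else env
    return env.get('TEST_RUN', 'False') == 'True'


# Only the exact string 'True' enables the test mode.
print(parse_test_run({'TEST_RUN': 'True'}))  # True
print(parse_test_run({'TEST_RUN': 'true'}))  # False (comparison is case-sensitive)
print(parse_test_run({}))                    # False (default value)
```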

The script ```notes-code/model.py``` is structured as follows:

```python

import os
import json
import base64
import boto3
import mlflow

# Function to get the model location (from S3 bucket)
def get_model_location(run_id):
    # ...

# Function to load the MLflow model
def load_model(run_id):
    # ...

# Function to decode base64 data
def base64_decode(encoded_data):
    # ...

# Class representing the model service
class ModelService:
    # ...

# Class representing the Kinesis callback
class KinesisCallback:
    # ...

# Function to create a Kinesis client
def create_kinesis_client():
    # ...

# Function to initialize the model service
def init(prediction_stream_name: str, run_id: str, test_run: bool):
    # ...

```

Explanation:

- This script contains the main logic for the model service.
- It defines functions to get the model from S3 bucket, load the MLflow model, decode base64 data, and create a Kinesis client.
- The ```ModelService``` class represents the model service and contains methods for preparing features, making predictions, and handling Lambda events.
- The ```KinesisCallback``` class represents a callback for sending prediction events to a Kinesis stream.
- The ```init``` function initializes the model service by loading the MLflow model, creating a Kinesis callback (if not in test run), and returning the initialized model service.
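
The callback idea can be sketched in isolation. The snippet below is only an illustration under assumptions: ```init_callbacks``` and ```notify``` are hypothetical names, and a plain list stands in for the Kinesis stream. It shows why a test run never publishes anything: the publishing callback is simply not registered.

```python
def init_callbacks(test_run, put_record):
    """Register the publishing callback only outside of test runs."""
    callbacks = []
    if not test_run:
        callbacks.append(put_record)
    return callbacks


def notify(callbacks, prediction_event):
    """Hand a prediction event to every registered callback."""
    for callback in callbacks:
        callback(prediction_event)


# A plain list stands in for the Kinesis stream (assumption for this sketch).
fake_stream = []
event = {'prediction': {'ride_duration': 21.3, 'ride_id': 256}}

notify(init_callbacks(test_run=False, put_record=fake_stream.append), event)
notify(init_callbacks(test_run=True, put_record=fake_stream.append), event)

print(len(fake_stream))  # 1: only the non-test run published the event
```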

The tests are located in ```notes-code/tests/model_test.py```, which reads as follows:

```python
from pathlib import Path
import model

# Function to read text from a file
def read_text(file):
    # ...

# Test for base64_decode function
def test_base64_decode():
    # ...

# Test for prepare_features method
def test_prepare_features():
    # ...

# Mock model class for testing predict method
class ModelMock:
    # ...

# Test for predict method
def test_predict():
    # ...

# Test for lambda_handler method
def test_lambda_handler():
    # ...
```

Explanation:

- This script contains pytest test cases for the model and its functionalities.
- There are 4 test cases: one for the ```base64_decode``` function and one each for the ```prepare_features```, ```predict``` and ```lambda_handler``` methods of the ```ModelService``` class.
- A ```ModelMock``` class is used to mock the ML model for testing purposes.
- Each test case asserts the expected result against the actual result obtained from the tested function or method.
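
To make the mocking idea concrete, here is a self-contained sketch. It does not reproduce the real ```model.py```: the ```ModelService``` below is a simplified stand-in, and the ```PU_DO``` feature layout is an assumption based on the Week 4 model.

```python
class ModelMock:
    """Fake model that returns the same value for every input row."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return [self.value] * len(X)


class ModelService:
    """Simplified stand-in for the real service (assumption for this sketch)."""

    def __init__(self, model):
        self.model = model

    def prepare_features(self, ride):
        return {
            'PU_DO': f"{ride['PULocationID']}_{ride['DOLocationID']}",
            'trip_distance': ride['trip_distance'],
        }

    def predict(self, features):
        return float(self.model.predict([features])[0])


def test_prepare_features():
    service = ModelService(None)  # no model needed for feature preparation
    ride = {'PULocationID': 130, 'DOLocationID': 205, 'trip_distance': 3.66}
    expected = {'PU_DO': '130_205', 'trip_distance': 3.66}
    assert service.prepare_features(ride) == expected


def test_predict():
    service = ModelService(ModelMock(10.0))
    features = {'PU_DO': '130_205', 'trip_distance': 3.66}
    assert service.predict(features) == 10.0
```

Because the mock has the same ```predict``` interface as the real model, the service code under test does not need to know that no MLflow model was loaded.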

### 6.1.1 Unit Testing (locally)

We can run the tests locally by executing:

```
$ pipenv run pytest tests/
```

which will work (possibly with some warnings), because it does not load a model from an S3 bucket; instead it creates a *mock* version of the model for testing purposes only.

### 6.1.2 Prelude to Integration Tests (Docker container & connection to AWS)

We can then check that the scripts ```lambda_function.py``` and ```model.py``` are working by using the following ```notes-code/Dockerfile```:

```dockerfile
FROM public.ecr.aws/lambda/python:3.10

RUN pip install -U pip
RUN pip install pipenv

COPY [ "Pipfile", "Pipfile.lock", "./" ]

RUN pipenv install --system --deploy

COPY [ "lambda_function.py", "model.py", "./" ]

CMD [ "lambda_function.lambda_handler" ]
```

Building the docker image:

```
$ docker build -t stream-model-duration:v2 .
```

and running the container:

```
$ docker run -it --rm \
    -p 8080:8080 \
    -e PREDICTIONS_STREAM_NAME="ride_predictions" \
    -e RUN_ID="e1efc53e9bd149078b0c12aeaa6365df" \
    -e TEST_RUN="True" \
    -e AWS_DEFAULT_REGION="eu-west-1" \
    stream-model-duration:v2

```

Finally, we can test that the container is working by running:

```
$ python test_docker.py
```

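Note that this call needs access to AWS, because the container downloads the model from the S3 bucket. I haven't configured the AWS account, so it does not work in my setup.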
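
For orientation, this is roughly what such a script sends to the container. The sketch below uses only the standard library; the payload layout (```ride``` and ```ride_id``` inside a base64-encoded Kinesis record) is an assumption based on the Week 4 example, and ```make_event``` and ```invoke``` are hypothetical names. The URL is the invocation endpoint of the Lambda runtime interface emulator included in the AWS base image.

```python
import base64
import json
import urllib.request

# Invocation endpoint of the Lambda runtime interface emulator
URL = 'http://localhost:8080/2015-03-31/functions/function/invocations'


def make_event(ride, ride_id):
    """Wrap a ride in a Kinesis-style event with a base64-encoded payload."""
    payload = json.dumps({'ride': ride, 'ride_id': ride_id}).encode('utf-8')
    data = base64.b64encode(payload).decode('utf-8')
    return {'Records': [{'kinesis': {'data': data}}]}


def invoke(event, url=URL):
    """POST the event to the running container and return the parsed reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


event = make_event(
    {'PULocationID': 130, 'DOLocationID': 205, 'trip_distance': 3.66},
    ride_id=256,
)

# With the container running, the call would be:
# print(invoke(event))
```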

## 6.2 - Integration tests with docker-compose

**Integration tests** are tests which cover the entire pipeline to assess how well the parts fit together:

- It can handle our request
- It can decode it
- It can download the model from S3 bucket
- It can apply the model 

In section 6.1.1 we have tested the different components separately, but we haven't checked if the entire thing works together. We did a first draft in section 6.1.2. We will continue from there.

### 6.2.1 Running (manual) integration test with ```deepdiff```

Thus we start from the script ```test_docker.py``` and turn it into a proper test. The idea is to interface with the Docker container and return a dictionary. We use the ```deepdiff``` library to see the difference between the expected dictionary and the returned dictionary. Therefore, we install the library first:

```
$ pipenv install --dev deepdiff 
```

Now we rebuild the docker image with: 

```
$ docker build -t stream-model-duration:v2 .
```

We create a folder called ```notes-code/integration-test``` and copy ```test_docker.py``` there. We also download the model from S3 into the folder ```integration-test/model``` (to have it locally during testing). Once downloaded, the ```model``` folder looks as follows:

```
conda.yaml  MLmodel  model.pkl  python_env.yaml  requirements.txt
```

We can now run the container specifying this new model location (run from the ```integration-test``` folder):

```
$ docker run -it --rm \
    -p 8080:8080 \
    -e PREDICTIONS_STREAM_NAME="ride_predictions" \
    -e RUN_ID="Test123" \
    -e MODEL_LOCATION="/app/model" \
    -e TEST_RUN="True" \
    -e AWS_DEFAULT_REGION="eu-west-1" \
    -v $(pwd)/model:/app/model \
    stream-model-duration:v2
```
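
For ```MODEL_LOCATION``` to have any effect, the function that resolves the model path has to read it. A possible sketch is shown below. It is an assumption about how ```get_model_location``` could be written, not the original code: the bucket name and the ```MODEL_BUCKET``` and ```MLFLOW_EXPERIMENT_ID``` variables are hypothetical, and the environment is passed in as a dictionary to keep the example testable.

```python
def get_model_location(run_id, env):
    """Prefer a local MODEL_LOCATION; otherwise build an S3 path."""
    model_location = env.get('MODEL_LOCATION')
    if model_location is not None:
        return model_location

    # Hypothetical defaults, only used for illustration.
    bucket = env.get('MODEL_BUCKET', 'my-mlflow-bucket')
    experiment_id = env.get('MLFLOW_EXPERIMENT_ID', '1')
    return f's3://{bucket}/{experiment_id}/{run_id}/artifacts/model'


# Local model mounted into the container: no S3 access needed.
print(get_model_location('Test123', {'MODEL_LOCATION': '/app/model'}))
# -> /app/model

# No override: fall back to the S3 location of the run.
print(get_model_location('Test123', {}))
# -> s3://my-mlflow-bucket/1/Test123/artifacts/model
```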

### 6.2.2 Automate test with docker-compose

So far we have been running the containers manually. We can automate the process by using docker-compose. We can define the bash script ```run.sh```:

```bash
#!/usr/bin/env bash

# This command changes the current directory to the directory
# where the script resides
cd "$(dirname "$0")"

# timestamp format is year-month-day-hour-minute
LOCAL_TAG=`date +"%Y-%m-%d-%H-%M"`
# "stream-model-duration" concatenated with the 
# LOCAL_TAG timestamp. 
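export LOCAL_IMAGE_NAME="stream-model-duration:${LOCAL_TAG}"

# Build the image with this tag from the parent folder (notes-code/),
# so that docker compose can find it.
docker build -t ${LOCAL_IMAGE_NAME} ..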

# Detached mode so next command can be executed
docker compose up -d

# This line pauses the script execution for 1 second,
# allowing Docker Compose services to start.
sleep 1

pipenv run python test_docker.py

# The exit status of the previous command (test_docker.py) 
# is stored in the ERROR_CODE variable.
ERROR_CODE=$?

# If the exit status of the previous command is not equal to 0 
# (indicating an error occurred), the "docker compose logs" 
# command is executed, which shows the logs of the Docker 
# Compose services.
if [ ${ERROR_CODE} != 0 ]; then
    docker compose logs
fi

docker compose down

# The script exits with the same exit code as the previous 
# command, allowing the caller to know if an error occurred.
exit ${ERROR_CODE}
```

and the ```compose.yaml```:

```yaml
services:
  backend:
    image: ${LOCAL_IMAGE_NAME}
    ports:
      - "8080:8080"
    environment:
      - PREDICTIONS_STREAM_NAME=ride_predictions
      - TEST_RUN=True
      - RUN_ID=Test123
      - AWS_DEFAULT_REGION=eu-west-1
      - MODEL_LOCATION=/app/model
    volumes:
      - "./model:/app/model"
```

Now we can execute the bash script from ```notes-code/``` by running:

```
$ ./integration-test/run.sh
```
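
The fixed ```sleep 1``` in ```run.sh``` assumes that the service is up within one second. As a possible alternative (not part of the original setup), the test script could poll the port before sending requests. In the sketch below, ```wait_for_port``` is a hypothetical helper; note that an open port only shows that something is listening, not that the application is fully ready.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Close the probe connection immediately after it succeeds.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Example usage before running the test requests:
# if not wait_for_port('localhost', 8080):
#     raise SystemExit('service did not start in time')
```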

## 6.3 - Testing cloud services with LocalStack

In this section we will test AWS services locally using LocalStack, without requiring an AWS account. It's very useful for testing Kinesis, S3, etc.

### 6.3.1 Add LocalStack to docker-compose

In the previous sections we haven't tested Kinesis. Here we'll use LocalStack to test it. To do that we add the Kinesis Service to the previous ```compose.yaml``` as follows:

```yaml
  # ...
  kinesis:
    image: localstack/localstack
    ports:
      - "4566:4566"
    environment:
      - SERVICES=kinesis
```

and now we can launch the container with only the Kinesis service:

```
$ docker compose up kinesis
```
where we have provided an ```integration-test/.env``` file to define environment variables for the Docker Compose setup. It contains a dummy value for ```LOCAL_IMAGE_NAME```; without it, docker compose will not run.

### 6.3.2 ```awscli``` & LocalStack  

Test it with 
```
$ aws --endpoint-url=http://localhost:4566 kinesis list-streams 
```
the result should contain an empty list of stream names (```"StreamNames": []```) because we haven't created any stream yet. The command above uses the AWS Command Line Interface (CLI) to interact with a local instance of LocalStack running as a Docker container (specified with ```--endpoint-url=http://localhost:4566```). Make sure that you have ```awscli``` installed and that an access key, a secret key and a region are configured (for LocalStack, dummy values like these are enough):

 - AWS Access Key ID: ```abc```
 - AWS Secret Access Key: ```xyz```
 - Default region name: ```eu-west-1```

Let us create a stream. All of this happens locally and nothing is created in the AWS account.

```
$ aws --endpoint-url=http://localhost:4566 \
      kinesis create-stream                \
      --stream-name ride_predictions       \
      --shard-count 1
```

By executing this command, you are creating a Kinesis stream named *ride_predictions* on your local instance of LocalStack. A stream is a sequence of data records, and shards are the partitions within a stream that allow for parallel processing of the data.



### 6.3.3 Testing Kinesis client

We can set the endpoint URL for the service in ```compose.yaml``` by adding the line ```KINESIS_ENDPOINT_URL=http://kinesis:4566/``` to its environment.

The next step is to add a function to ```model.py``` that creates the Kinesis client:

```python
def create_kinesis_client():
    endpoint_url = os.getenv('KINESIS_ENDPOINT_URL')

    if endpoint_url is None:
        return boto3.client('kinesis')

    return boto3.client('kinesis', endpoint_url=endpoint_url)
```

which will connect to LocalStack or AWS, depending on the value of ```KINESIS_ENDPOINT_URL```.

We then modify ```run.sh``` to create a Kinesis stream every time the integration test is run:

```bash
#!/usr/bin/env bash

cd "$(dirname "$0")"


LOCAL_TAG=`date +"%Y-%m-%d-%H-%M"`
export LOCAL_IMAGE_NAME="stream-model-duration:${LOCAL_TAG}"
export PREDICTIONS_STREAM_NAME='ride_predictions'

docker build -t ${LOCAL_IMAGE_NAME} ..

docker compose up -d

sleep 1

aws --endpoint-url=http://localhost:4566       \
      kinesis create-stream                    \
      --stream-name ${PREDICTIONS_STREAM_NAME} \
      --shard-count 1

pipenv run python test_docker.py

ERROR_CODE=$?

if [ ${ERROR_CODE} != 0 ]; then
    docker compose logs
fi

#docker compose down

#exit ${ERROR_CODE}
```

where ```compose.yaml``` is now:

```yaml
services:
  backend:
    image: ${LOCAL_IMAGE_NAME}
    ports:
      - "8080:8080"
    environment:
      - PREDICTIONS_STREAM_NAME=${PREDICTIONS_STREAM_NAME}
      # - TEST_RUN=True
      - RUN_ID=Test123
      - AWS_DEFAULT_REGION=eu-west-1
      - MODEL_LOCATION=/app/model
      - KINESIS_ENDPOINT_URL=http://kinesis:4566/
    volumes:
      - "./model:/app/model"
  kinesis:
    image: localstack/localstack
    ports:
      - "4566:4566"
    environment:
      - SERVICES=kinesis   
```

And now we can run 

```
$ ./run.sh
```
and we observe both containers running:
- localstack/localstack
- stream-model-duration:2023-07-13-13-06

We could also list the streams and see the new stream created.

We can obtain the shard-iterator with:

```
$ aws --endpoint-url=http://localhost:4566 \
    kinesis get-shard-iterator \
    --shard-id shardId-000000000000 \
    --shard-iterator-type TRIM_HORIZON \
    --stream-name ride_predictions \
    --query 'ShardIterator'
```

and now we can use this shard iterator to see what is inside the Kinesis stream:

```
$ aws --endpoint-url=http://localhost:4566 kinesis get-records --shard-iterator AAAAAAAAAAEKI4NpH/+OBt9nuDiWveLMU3AC04xCuNo+FAd4A8AG0xie44BvI515xlgURUqDa4yQNbbebn/Mh43NjDCW6tJ8aD87X9PTooaZWjpWklDFXATaLHKT3f+lZSyrsNC8dkb7sS/uLQHyb5OrMKM8YS7kj+LqrX93tZ3hRRaiTavCLF2HYvDA5opnP8sM3/y/dciH2NWrE4PrT4YHoJXSoknd 
```

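The ```Data``` field of each returned record is base64-encoded. After storing its value in the shell variable ```DATA```, you can decode it with: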

```
$ echo $DATA | base64 -d
```
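
The same decoding can be done in Python. The sketch below assumes that the JSON output of ```get-records``` has been captured as a string; ```decode_records``` is a hypothetical helper, and the sample response is built by hand so that the snippet runs without LocalStack.

```python
import base64
import json


def decode_records(cli_output):
    """Decode the base64 `Data` field of every record in `get-records` output."""
    response = json.loads(cli_output)
    return [
        json.loads(base64.b64decode(record['Data']))
        for record in response['Records']
    ]


# Hand-made sample response (only the fields used above are included).
prediction = {
    'model': 'ride_duration_prediction_model',
    'version': 'Test123',
    'prediction': {'ride_duration': 21.3, 'ride_id': 256},
}
encoded = base64.b64encode(json.dumps(prediction).encode('utf-8')).decode('utf-8')
sample_output = json.dumps({'Records': [{'PartitionKey': '256', 'Data': encoded}]})

print(decode_records(sample_output))
```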

### 6.3.4 Testing Kinesis client automatically

We will use the script ```test_kinesis.py``` to perform all the previous steps automatically. Overall, this script retrieves a single record from a Kinesis stream, compares it to an expected record, and asserts that the two match. It uses ```boto3``` to interact with Kinesis.

```python
import os
import json
from pprint import pprint

import boto3
from deepdiff import DeepDiff

# Endpoint of the Kinesis service. If the environment variable
# is not set, it defaults to LocalStack.
kinesis_endpoint = os.getenv('KINESIS_ENDPOINT_URL',
                             "http://localhost:4566")
# Kinesis client using the boto3.client method
kinesis_client = boto3.client('kinesis', 
                              endpoint_url=kinesis_endpoint)


stream_name = os.getenv('PREDICTIONS_STREAM_NAME', 
                        'ride_predictions')
# represents a specific shard in the Kinesis stream
shard_id = 'shardId-000000000000'

# retrieve a shard iterator from Kinesis client
# 'TRIM_HORIZON' -> it starts reading from the oldest 
#                   available records in the shard.
shard_iterator_response = kinesis_client.get_shard_iterator(
    StreamName=stream_name,
    ShardId=shard_id,
    ShardIteratorType='TRIM_HORIZON',
)

# retrieves the actual shard iterator ID
shard_iterator_id = shard_iterator_response['ShardIterator']

# retrieve records from the stream. It provides the ShardIterator 
# obtained earlier and sets the Limit parameter to 1, indicating 
# that only one record should be retrieved.
records_response = kinesis_client.get_records(
    ShardIterator=shard_iterator_id,
    Limit=1,
)

# retrieves the actual records from the response
records = records_response['Records']
pprint(records)

assert len(records) == 1

# retrieves the data field from the first record in the records 
# list, which contains the actual record as a JSON string.
actual_record = json.loads(records[0]['Data'])
pprint(actual_record)

expected_record = {
    'model': 'ride_duration_prediction_model',
    'version': 'Test123',
    'prediction': {
        'ride_duration': 21.3,
        'ride_id': 256,
    },
}

diff = DeepDiff(actual_record, 
                expected_record,
                significant_digits=1)
print(f'diff={diff}')

# These lines assert that there are no differences related to 
# changed values or type changes between the actual_record and 
# expected_record. If any such differences exist, an exception
# will be raised.
assert 'values_changed' not in diff
assert 'type_changes' not in diff

# If the script reaches this line, it means that no assertion 
# errors occurred, and the comparison between the actual and 
# expected records passed successfully.
print('all good')
```

and ```run.sh``` is modified to:

```bash
#!/usr/bin/env bash

cd "$(dirname "$0")"


LOCAL_TAG=`date +"%Y-%m-%d-%H-%M"`
export LOCAL_IMAGE_NAME="stream-model-duration:${LOCAL_TAG}"
export PREDICTIONS_STREAM_NAME='ride_predictions'

docker build -t ${LOCAL_IMAGE_NAME} ..

docker compose up -d

sleep 1

aws --endpoint-url=http://localhost:4566       \
      kinesis create-stream                    \
      --stream-name ${PREDICTIONS_STREAM_NAME} \
      --shard-count 1

pipenv run python test_docker.py

ERROR_CODE=$?

if [ ${ERROR_CODE} != 0 ]; then
    docker compose logs
    docker compose down
    # stop here and report the failure to the caller
    exit ${ERROR_CODE}
fi

pipenv run python test_kinesis.py

ERROR_CODE=$?

if [ ${ERROR_CODE} != 0 ]; then
    docker compose logs
    docker compose down
    exit ${ERROR_CODE}
fi

docker compose down

```

and then we can test everything by running

```
$ ./run.sh
```

## 6.4 - Code quality: linting and formatting

Code quality refers to the overall quality, readability and maintainability of your code. **Linting** and **formatting** are two essential practices in Python that contribute to improving code quality.

- **Linting** is the process of statically analyzing your code against a set of predefined rules or coding standards to find stylistic inconsistencies and suspicious coding patterns. The most popular linter is ```pylint```, but there are alternatives like ```flake8``` (faster) or ```ruff``` (written in Rust, much faster).

- **Formatting** ensures that, when different developers may have different preferences for indentation, line length, spacing, ... etc, the code is organized and presented in a uniform manner, making it easier to read and understand. The most widely used Python code formatter is ```black```.

### 6.4.1 Linting

In Python, it is recommended to follow the [PEP8 guidelines](https://pep8.org/), which give the code a clean, standard format and improve readability. Conforming to PEP8 manually can be cumbersome, so we use ```pylint``` to check the code instead.

To install ```pylint``` within the virtual environment in ```notes-code/```:

```
$ pipenv install --dev pylint
```
We can now test it on the ```model.py``` script:

```
$ pipenv shell
$ pylint model.py
```
which will output:
```
************* Module model
model.py:1:0: C0114: Missing module docstring (missing-module-docstring)
...
model.py:106:0: C0116: Missing function or method docstring (missing-function-docstring)

-----------------------------------
Your code has been rated at 8.12/10

```
You can test all the files in the folder with 

```
$ pylint --recursive=y .
```
There is also a plugin for ```pylint``` in PyCharm, so that you can analyse the files directly in the IDE (make sure first to install ```pylint``` in the environment that you are currently using):

![title](images/pylint.png)

You can configure the linter with a ```.pylintrc``` file (here ```notes-code/.pylintrc```). Pylint looks for it in, among other places:

 - ```~/.pylintrc``` for default user configuration
 - ```<your project>/.pylintrc``` for default project configuration (used when you'll run ```pylint <your project>```)

For example:

```
[MESSAGES CONTROL]

disable=missing-function-docstring,
        missing-final-newline,
        missing-class-docstring
```
where the ```.pylintrc``` above allows you to disable specific checks:
- when a function or method does not have a docstring
- when there is no final newline at the end of a file
- classes that lack a docstring

There is another alternative: to use a ```pyproject.toml``` file. The ```pyproject.toml``` file can contain configurations for various tools used in the project. For example, you can specify code linters, formatters, and other development tools along with their respective configurations. This avoids using a configuration file for each of them (e.g. ```.pylintrc``` for ```pylint```, etc). So far we only have ```pylint```, so it would look like this:

```toml
[tool.pylint.messages_control]

disable = [
    "missing-function-docstring",
    "missing-final-newline",
    "missing-class-docstring",
]
```
Don't forget to remove the previous ```.pylintrc``` file if you are using ```pyproject.toml```.

To ignore certain messages in certain areas of the code, we use ```# pylint: disable=[MESSAGE NAME]``` comments. E.g., to silence a message for a whole class:

```python
# ...

class KinesisCallback:
    # pylint: disable=too-few-public-methods
    
    # ...
```

### 6.4.2 Formatting

For formatting Python code, the most common tool is ```black```. From the docs:

*Black is the uncompromising Python code formatter. By using it, you agree to cede control over minutiae of hand-formatting. In return, Black gives you speed, determinism, and freedom from pycodestyle nagging about formatting. You will save time and mental energy for more important matters.*

*Blackened code looks the same regardless of the project you're reading. Formatting becomes transparent after a while and you can focus on the content instead.*

To install it:

```
$ pipenv install --dev black isort
```

Additionally, the ```isort``` library allows you to automate the process of organizing import statements and reduce the time spent manually sorting imports. It is often used in conjunction with ```black```.

Now you can type (within virtual env):

```
$ black --diff model.py
```
This option will show you the changes that black would make to the code. Similarly with ```isort```:
```
$ isort --diff model.py
```
Then you can 
1. run ```$ isort .``` to sort the imports
2. run ```$ black .``` to format the code
3. run ```pylint --recursive=y .``` to check the code
4. run ```pytest tests/``` to (unit) test the code.

One trick to prevent Black from collapsing several lines into one is to add a (redundant) trailing comma after the last entry. For example:

```python
# ...

ride = {
    "PULocationID": 130,
    "DOLocationID": 205,
    "trip_distance": 3.66,
}

# ...

```

We can also add the configuration for ```black``` and ```isort``` to ```pyproject.toml```:

```toml
[tool.pylint.messages_control]
disable = [
    "missing-function-docstring",   
    "missing-final-newline",          
    "missing-class-docstring",
]

[tool.black]
line-length = 88                  # maximum line length to 88
target-version = ['py310']        # Python version as Python 3.10
skip-string-normalization = true  # Skip string change (''->"")

[tool.isort]
# profile=black  # Use the "black" profile for import sorting
multi_line_output = 3   # Vertical hanging indent for wrapped imports
length_sort = true  # Sort imports by name length 
                    # (overrides "black" profile)

```


Finally, to enable ```black``` in PyCharm:

1. ```(mlops-zoomcamp) $ pip install 'black[d]'```
2. Install BlackConnect plugin
3. Follow [instructions 3-5](https://black.readthedocs.io/en/stable/integrations/editors.html)
4. Now you can format the currently opened file by selecting ```Code -> Reformat Code```

## 6.5 - Git pre-commit hooks

Running tests, formatting, and linting should happen whenever the code changes, but it is easy to forget. Therefore, running them automatically before we commit to Git can be beneficial. To do that, we can use **pre-commit hooks**.

We can install ```pre-commit``` within our virtual environment:

```
$ pipenv install --dev pre-commit
```

Pre-commit hooks are specific to each Git repository and are stored in the repository's ```.git/hooks``` directory. However, in our case we have a very big repo (```mlops-zoomcamp```) and we are only interested in setting up the pre-commit hooks for our current directory ```notes-code/```. Therefore, we can treat the current directory as a standalone repo by using:

```
$ git init
```

which will create a ```.git/``` directory in ```notes-code/``` (it is only needed for explanatory purposes and will be removed at the end of this section).

### 6.5.1 Initializing pre-commit-hooks config file

Pre-commit (the program we have just installed) uses ```.pre-commit-config.yaml``` to specify which programs (the hooks) will run at every commit. We can initialize this file with a sample config file:

```
$ pipenv shell
$ pre-commit sample-config > .pre-commit-config.yaml
```
This file is created in ```notes-code/``` and looks like this:

```yaml
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.2.0
    hooks:
    -   id: trailing-whitespace # check for trailing whitespace
    -   id: end-of-file-fixer # ensure files end with a single newline
    -   id: check-yaml # validate YAML files
    -   id: check-added-large-files # prevent large files from 
                                    # being added to the repo
```

To install the hooks defined above:
``` 
$ pre-commit install
```
Now you can see that the hook is installed at ```.git/hooks/pre-commit```. It's important to note that pre-commit hooks are only executed on the local repository where they are set up. So when getting a new copy of the repo (clone), you need to run ```$ pre-commit install``` again.

### 6.5.2 Launch pre-commit

To apply the hooks defined in ```.pre-commit-config.yaml``` to all untracked files:

```
$ git add .
$ git commit -m "initial commit"
```

which will run the hooks before committing. *Important:* if a hook modifies any files, the commit is aborted, and the modified files need to be staged and committed again. Thus:

```
$ git add .
$ git commit -m "fixes from pre-commit default"
```

### 6.5.3 Adding more hooks to config file

Now we want to add ```black```, ```isort```, ```pylint``` and ```pytest``` so that they are applied before each commit. The ```.pre-commit-config.yaml``` then becomes:

```yaml
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
  rev: v3.2.0
  hooks:
    - id: trailing-whitespace
    - id: end-of-file-fixer
    - id: check-yaml
    - id: check-added-large-files
- repo: https://github.com/pycqa/isort
  rev: 5.10.1
  hooks:
    - id: isort
      name: isort (python)
- repo: https://github.com/psf/black
  rev: 22.6.0
  hooks:
    - id: black
      language_version: python3.10
- repo: local
  hooks:
    - id: pylint
      name: pylint
      entry: pylint
      language: system
      types: [python]
      args: [
        "-rn", # Only display messages
        "-sn", # Don't display the score
        "--recursive=y"
      ]
- repo: local
  hooks:
    - id: pytest-check
      name: pytest-check
      entry: pytest
      language: system
      pass_filenames: false
      always_run: true # always run regardless of file changes 
      args: [
        "tests/"
      ]

```

Explanation:

- The ```repos``` section defines a list of repositories from which the hooks will be downloaded.
- Remote repos:
  - ```https://github.com/pre-commit/pre-commit-hooks``` It contains a collection of useful pre-commit hooks for various purposes. 
  - ```https://github.com/pycqa/isort``` It contains ```isort```
  - ```https://github.com/psf/black``` It contains ```black```
- Local repos (hooks are located locally on your system)
  - ```pylint``` for running the Pylint code analysis tool
  - ```pytest-check``` for running the Pytest framework
  
Do not forget to remove the standalone repo ```notes-code/.git```:

```
$ rm -rf .git
```

## 6.6 - Makefiles and make

### 6.6.1 Brief introduction

By utilizing ```make``` and ```Makefile``` in your Python project, you can automate repetitive tasks.

- ```Makefile``` contains a set of rules, called *targets* (e.g. compiling the code, running tests, generating documentation), along with their dependencies and associated commands. 
- By running the ```make``` command, you can automatically build or perform specific tasks in your project based on these rules

Normally in Linux systems ```make``` comes pre-installed. For example, we can create ```notes-code/Makefile```:

```makefile
run:           # target
	echo 123   # recipe lines must start with a tab
```
where ```run``` is a target (it works like an alias for the commands below it). Now when ```$ make run``` is executed, ```echo 123``` is executed as a result. Targets can also depend on other targets:

```makefile
test-1:                # target
	echo '1st test'
test-2:                # target
	echo '2nd test'
run: test-1 test-2     # target (dependency on test-1, test-2)
	echo 'running tests'
```
where ```run``` depends on ```test-1``` and ```test-2```: when ```$ make run``` is executed, ```echo '1st test'```, ```echo '2nd test'``` and ```echo 'running tests'``` are executed in that order.
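
The ordering can be pictured as a depth-first walk over the dependencies. The following toy model is only an illustration of that idea, not how ```make``` is implemented: it ignores files and timestamps, and ```build_order``` is a hypothetical function.

```python
def build_order(target, rules, done=None):
    """Return the targets in the order their commands would run for `target`."""
    done = [] if done is None else done
    if target in done:
        return done  # already handled, do not run it twice
    for dependency in rules[target]:
        build_order(dependency, rules, done)  # prerequisites come first
    done.append(target)
    return done


# Same structure as the Makefile above: target -> list of prerequisites
rules = {
    'test-1': [],
    'test-2': [],
    'run': ['test-1', 'test-2'],
}

print(build_order('run', rules))  # ['test-1', 'test-2', 'run']
```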

### 6.6.2 ```Makefile``` in a Python project

In our case, we want to run *tests* (unit tests and integration tests) and *quality checks* (pylint, black, isort) before running the program, committing, or deploying to AWS. To do so we can make use of the following ```Makefile```:

```makefile
# current date and time (used to tag the Docker image)
LOCAL_TAG:=$(shell date +"%Y-%m-%d-%H-%M")
# name of the Docker image
LOCAL_IMAGE_NAME:=stream-model-duration:${LOCAL_TAG}

# This target is responsible for running the (unit) tests 
test:
	pytest tests/

# This target performs several code quality checks
quality_checks:
	isort .
	black .
	pylint --recursive=y .

# This target first ensures that the quality checks and tests
# have passed, and then it builds the Docker image.
build: quality_checks test
	docker build -t ${LOCAL_IMAGE_NAME} .

# This target is responsible for running integration tests 
# using the built Docker image
integration_test: build
	LOCAL_IMAGE_NAME=${LOCAL_IMAGE_NAME} bash integration-test/run.sh

# This target is responsible for publishing the Docker image, 
# typically to a container registry.
publish: build integration_test
	LOCAL_IMAGE_NAME=${LOCAL_IMAGE_NAME} bash scripts/publish.sh

# This target is typically used to set up the project 
# environment initially.
setup:
	pipenv install --dev
	pre-commit install
```

By defining these targets and their associated commands in the ```Makefile```, you can easily run specific tasks by executing the ```make``` command followed by the target name. For example:
- ```$ make test``` will execute the tests
- ```$ make publish``` will run the quality checks and unit tests, build the image, run the integration tests, and publish the Docker image.

Note: ```notes-code/scripts/publish.sh``` is defined as:

```bash
#!/usr/bin/env bash

echo "publishing image ${LOCAL_IMAGE_NAME} to Amazon ECR repository..."
```

It's actually not publishing to ECR but just printing a message.