# Notes:

## Module 6.1 : Testing Python code with pytest

In this section, we will work on the [streaming code in the 04-deployment section](/mlops-zoomcamp/04-deployment/streaming/lambda_function.py). We will improve the code from an engineering point of view by adding tests.


### Creating and activating the virtual environment:

1. Go to the correct directory by: `cd mlops-zoomcamp/06-best-practices/code`
2. Activate any existing conda virtual environment: `. /opt/homebrew/anaconda3/bin/activate && conda activate /opt/homebrew/anaconda3/envs/mlops-zoomcamp-venv;`
3. Create a new virtual environment: `pipenv install`
4. Activate it by `pipenv shell`
5. Install pytest with `pipenv install --dev pytest`. We use the `--dev` flag because we want pytest only in the development environment, not in production.
6. Find the location of your virtual environment by typing `pipenv --venv`. You'll get the path `/Users/aasth/.local/share/virtualenvs/code-JCzC6QQn` to the venv. Copy the path.
7. Set up the Python environment in VSCode: hit `Cmd+Shift+P` -> `Select Python Interpreter` and paste the venv path that you copied in step 6.
8. Configure the Python tests: click on the `Testing` tab, which is located on the left panel of VSCode, then click the `Configure Python Tests` button. Select `pytest` and the `tests` directory.

### Testing if Docker works

1. Open your Docker app and in the terminal with the `code` environment activated, run `docker build -t stream-model-duration:v2 .`.

*Note: If a previous Docker container is still running, it might be bound to the same port that we will use now. It is good practice to run `docker ps`, which lists all active Docker containers along with their port mappings. You can stop previous containers that you no longer need.*

2. Now run in the same terminal:

``` 
docker run -it --rm \
    -p 8080:8080 \
    -e PREDICTIONS_STREAM_NAME="ride_predictions" \
    -e RUN_ID="e1efc53e9bd149078b0c12aeaa6365df" \
    -e TEST_RUN="True" \
    -e AWS_DEFAULT_REGION="eu-west-1" \
    stream-model-duration:v2
```

3. Then open a new terminal in the `code` directory (`cd mlops-zoomcamp/06-best-practices/code`) and activate the `code` environment. Run `python ./integraton-test/test_docker.py`. You should see this:

<img src="notes-images/test_docker output.png" width="700"/>


### Running the unit tests

In the code folder with the `code` environment activated, run `pytest tests/` and you should get:

``` 
================================================== test session starts ==================================================
platform darwin -- Python 3.9.6, pytest-7.1.2, pluggy-1.0.0
rootdir: /Users/aasth/Desktop/Data analytics/MLOps/datatalks-zoomcamp/mlops-zoomcamp/06-best-practices/code
collected 4 items                                                                                                       

tests/model_test.py ....                                                                                          [100%]

=================================================== 4 passed in 1.06s ===================================================
```
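For orientation, here is a minimal sketch of the kind of test that `pytest tests/` collects. Both the `prepare_features` function and the ride fields are hypothetical stand-ins, not the actual contents of `model.py` or `tests/model_test.py`. The point is only that pytest picks up functions named `test_*` and reports each passing one as a dot.

```python
# Hypothetical function under test (in the project it would live in model.py).
def prepare_features(ride):
    """Turn a raw ride record into the feature dict a model might expect."""
    return {
        "PU_DO": f"{ride['PULocationID']}_{ride['DOLocationID']}",
        "trip_distance": ride["trip_distance"],
    }


# Hypothetical test (in the project it would live in tests/model_test.py).
# pytest collects any function whose name starts with "test_".
def test_prepare_features():
    ride = {"PULocationID": 130, "DOLocationID": 205, "trip_distance": 3.66}

    actual_features = prepare_features(ride)

    expected_features = {"PU_DO": "130_205", "trip_distance": 3.66}
    assert actual_features == expected_features
```

A plain `assert` is enough: when it fails, pytest prints both sides of the comparison.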


## Module 6.2 : Integration tests with docker-compose

### Integration tests:

Unit tests only test parts of the code in isolation. We still need to test the service as a whole, and we will do that with an integration test. We will convert our `test_docker.py` file into an integration test.

We can do that by adding assertions and the `DeepDiff` library. The `test_docker.py` file returns a dictionary and we use deepdiff to see the difference between the expected dictionary and the returned dictionary.

### Load the model first from local env and remove the dependency in S3

We were loading our model from S3 in the [model.py](./code/model.py) file. We can remove that dependency by adding a `get_model_location()` function to the model file. It checks whether the user has specified a local path where the model has been downloaded and, if so, loads the model from that path. If no path is specified, it loads the model from S3.
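A minimal sketch of that logic, assuming the local path arrives through the `MODEL_LOCATION` environment variable. The `MODEL_BUCKET` variable, its default value, and the S3 path layout are hypothetical and will differ from the real `model.py`.

```python
import os


def get_model_location(run_id):
    """Return a local model path if one is configured, else an S3 URI."""
    # A local path takes priority, so tests never need to reach S3.
    model_location = os.getenv("MODEL_LOCATION")
    if model_location is not None:
        return model_location

    # Hypothetical bucket name and path layout, for illustration only.
    model_bucket = os.getenv("MODEL_BUCKET", "my-model-bucket")
    return f"s3://{model_bucket}/{run_id}/artifacts/model"
```

With `MODEL_LOCATION` set, the function returns the local path unchanged. Without it, the function falls back to an S3 URI built from the run ID.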

Then in the `code` directory with the `code` environment activated run:

``` 
docker run -it --rm \
    -p 8080:8080 \
    -e PREDICTIONS_STREAM_NAME="ride_predictions" \
    -e RUN_ID="Test123" \
    -e MODEL_LOCATION="/app/model" \
    -e TEST_RUN="True" \
    -e AWS_DEFAULT_REGION="eu-west-1" \
    -v "$(pwd)/model:/app/model" \
    stream-model-duration:v2
```


Then in another terminal (in the `code` directory with the `code` environment activated), run `python ./integraton-test/test_docker.py` to test the service with the model loaded from the local directory.

### Automating tests

Right now, to run the latest version of the tests, we need to build the Docker image, start the container with `docker run`, and then run the test in another terminal. We can automate this:

1. Create a new file named `run.sh` under `integraton-test` and make it executable by running `chmod +x ./integraton-test/run.sh`. `chmod +x` only sets the executable permission on the file, so the script can be run directly as `./run.sh`.

    #### [Run.sh file](./code/integraton-test/run.sh):

    - The first line is `#!/usr/bin/env bash`, a shebang that tells the system to run the script with bash.

    - `cd "$(dirname "$0")"` changes to the directory that contains the script (`integraton-test`), so the script works no matter where it is called from.

    - The lines
    ```
    LOCAL_TAG=$(date +"%Y-%m-%d-%H-%M")
    export LOCAL_IMAGE_NAME="stream-model-duration:${LOCAL_TAG}"
    ```
    tag the image with the build time, which keeps track of the build version

    - `docker build -t ${LOCAL_IMAGE_NAME} ..`: builds the image, using the parent directory as the build context

    - `docker compose up -d`: starts Docker Compose in detached mode

    - `sleep 1`: pauses for one second to give the container some time to start

    - `ERROR_CODE=$?`: stores the exit status of the last command executed. The value is 0 if that command succeeded.

    - `if [ ${ERROR_CODE} != 0 ]; then docker compose logs; fi`: if the error code is non-zero, prints the Docker Compose logs

    - `docker compose down`: stops and removes the containers


2. Create `docker-compose.yaml` following the [Compose file reference](https://docs.docker.com/compose/compose-file/06-networks/).

3. In a terminal outside the virtual environment, run `./run.sh`.

## Module 6.3 : Testing cloud services with LocalStack

We wrote unit tests to test our function and an integration test to test our service, but we didn't test Kinesis. We still need to test the Kinesis connection, that is, the function that puts the responses on the Kinesis stream, and we will do that with LocalStack. LocalStack lets us develop and test AWS applications locally.

1. Add the kinesis service to the [docker-compose.yaml](./code/integraton-test/docker-compose.yaml) file.

2. In the `code` environment, run `cd integraton-test`, then `export LOCAL_IMAGE_NAME=123`, then `export PREDICTIONS_STREAM_NAME=ride_predictions`, and finally `docker-compose up kinesis`.

3. We will create a stream locally using LocalStack. In another terminal, in the `code` directory and in the `code` environment, run:
``` 
aws --endpoint-url=http://localhost:4566 \
    kinesis create-stream \
    --stream-name ride_predictions \
    --shard-count 1
```

4. We add the `create_kinesis_client()` function to our [model.py file](./code/model.py), using logic similar to what we used to remove the S3 dependency. If a local Kinesis endpoint is set, the client uses it; otherwise it connects to the real AWS Kinesis service.

5. Stop `docker-compose up kinesis` (by pressing `Ctrl+C`), then in a new terminal in a **non-virtual environment**, run `cd mlops-zoomcamp/06-best-practices/code/integraton-test` and then `./run.sh`:

<img src="notes-images/run bash output.png" width="700"/>

6. We will now check the content of the stream. Get the ID of the shard in the stream with `aws --endpoint-url=http://localhost:4566 kinesis list-shards --stream-name ride_predictions`. It returns the shard ID `shardId-000000000000`. Use this to set the `SHARD` variable, then request a shard iterator:

```
export SHARD="shardId-000000000000"
export PREDICTIONS_STREAM_NAME=ride_predictions 
aws  --endpoint-url=http://localhost:4566 \
    kinesis     get-shard-iterator \
    --shard-id ${SHARD} \
    --shard-iterator-type TRIM_HORIZON \
    --stream-name ${PREDICTIONS_STREAM_NAME} \
    --query 'ShardIterator'
```

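_Note: Step 6 didn't work for me. It says `An error occurred (ResourceNotFoundException) when calling the GetShardIterator operation: Stream arn arn:aws:kinesis:us-east-1:000000000000:stream/ride_predictions not found`. This error means that no stream with that exact name exists in the running LocalStack container, for example because the stream was created under a different name or the container was restarted after the stream was created._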

7. Create the [test_kinesis.py file](./code/integraton-test/test_kinesis.py)

8. Edit the logic in `run.sh` to run `test_kinesis.py` as well (using the same logic that `run.sh` already applies to `test_docker.py`).

9. Stop any Docker containers that are running. In a new terminal in a **non-virtual environment**, run `cd mlops-zoomcamp/06-best-practices/code/integraton-test` and then run `./run.sh`. You can see that the kinesis service has finished running:

<img src="notes-images/run bash kinesis output.png" width="700"/>
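Steps 4 and 6 above can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual `model.py` or `test_kinesis.py`. The `KINESIS_ENDPOINT_URL` variable name is an assumption, the helper names `kinesis_client_kwargs` and `read_stream_records` are hypothetical, and the records are assumed to carry JSON payloads. The boto3 calls `get_shard_iterator` and `get_records` are the same operations as the AWS CLI commands used in step 6.

```python
import json
import os


def kinesis_client_kwargs():
    """Build the keyword arguments for boto3.client("kinesis", ...)."""
    # Assumed variable name; point it at LocalStack, e.g. http://localhost:4566.
    endpoint_url = os.getenv("KINESIS_ENDPOINT_URL")
    if endpoint_url is None:
        return {}  # no override: boto3 connects to the real AWS Kinesis
    return {"endpoint_url": endpoint_url}


def create_kinesis_client():
    """Create a Kinesis client that targets LocalStack when configured."""
    import boto3  # imported lazily so the helpers above work without boto3

    return boto3.client("kinesis", **kinesis_client_kwargs())


def read_stream_records(client, stream_name, shard_id="shardId-000000000000"):
    """Read records from the start of a shard and decode them as JSON."""
    iterator_response = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start from the oldest record
    )
    records_response = client.get_records(
        ShardIterator=iterator_response["ShardIterator"],
        Limit=10,
    )
    # boto3 returns each record's Data as bytes; json.loads accepts bytes.
    return [json.loads(record["Data"]) for record in records_response["Records"]]
```

A Kinesis test could then call `read_stream_records(create_kinesis_client(), "ride_predictions")` after the service has handled an event, and compare the decoded records with the expected predictions.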
