# Notes:

## Module 4.1 Model Deployment

### Recap:

We looked at the "Design" phase, where we gather the requirements and decide whether machine learning is the right solution for our problem. Then we looked at the "Training" phase, where we talked about experiment tracking and about productionizing our Jupyter notebook by turning it into a machine learning pipeline.

Now that the model is registered in the MLflow Model Registry and is production-ready, we need to deploy it so that we can get predictions for new data and realize its value.

### Model Deployment:

There are primarily two kinds of deployments:
- Batch (offline) - runs regularly
- Online - Up & running all the time with two sub-options:
    - Web service
    - Streaming

<img src="notes-images/deployment-types.png" width="700"/>

### Batch Deployment:

The model doesn't run all the time, but we regularly apply it to new data. The interval could be hourly, daily, weekly, *etc*.
A typical batch deployment pipeline is sketched below. Let's say we have a database with all the data.
The scoring job fetches some data from the database and applies the model to that data. The predictions are then written to another
database. Another program can read the predictions and react to them, for example, by preparing a report or raising an alarm.

<img src="notes-images/batch-deployment-pipeline.png" width="700"/>
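The batch flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the course's scoring job: the in-memory SQLite tables `rides` and `predictions`, the `predict_duration` stand-in model, and its formula are all hypothetical, and for brevity the predictions go to a second table rather than a separate database.

```python
import sqlite3


def predict_duration(trip_distance):
    """Stand-in for a real model (hypothetical formula, for illustration only)."""
    return 5.0 + 2.5 * trip_distance


def run_scoring_job(conn):
    """Fetch rides that have no prediction yet, apply the model, store the results."""
    rows = conn.execute(
        "SELECT ride_id, trip_distance FROM rides "
        "WHERE ride_id NOT IN (SELECT ride_id FROM predictions)"
    ).fetchall()
    predictions = [(ride_id, predict_duration(distance)) for ride_id, distance in rows]
    conn.executemany(
        "INSERT INTO predictions (ride_id, duration) VALUES (?, ?)", predictions
    )
    conn.commit()
    return len(predictions)


def make_demo_db():
    """Create a throwaway in-memory database with two made-up rides."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rides (ride_id INTEGER PRIMARY KEY, trip_distance REAL)")
    conn.execute("CREATE TABLE predictions (ride_id INTEGER PRIMARY KEY, duration REAL)")
    conn.executemany("INSERT INTO rides VALUES (?, ?)", [(1, 2.0), (2, 10.0)])
    conn.commit()
    return conn
```

A scheduler would call `run_scoring_job` at the chosen interval; a second run right after the first scores nothing, because every ride already has a prediction.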

### Web service deployment:

Users use an app that communicates with the backend, which in turn communicates with a service that runs the model.
The model service must be running at all times. When the client sends a request,
it opens a connection to the backend service, and that connection stays open until the response is returned.

<img src="notes-images/web-deployment-pipeline.png" width="700"/>

For example, a client wants to know the duration of an upcoming taxi ride. The mobile app sends a request to the backend, which performs calculations using
the model in the service and returns the result back to the app.


### Streaming deployment:

In the streaming setting, we have producers (they produce events) and consumers (they consume events). Producers push events to the event stream.
Consumers read the events and react to them. Here we have a one-to-many (single producer) or many-to-many (multiple producers) relationship rather than a one-to-one client-server relationship.

A producer sends an event but does not really care what happens to it, since there is no explicit connection between producers and consumers.

<img src="notes-images/streaming-deployment-pipeline.png" width="700"/>

Let's go back to the web service example. Now the backend becomes a producer. It generates the event "Ride started" containing all information about
the ride. Then multiple services consume this stream and run something for it:  

- consumer 1: tip prediction
- consumer 2: duration prediction
- consumer 3: ...
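To make the decoupling concrete, here is a toy sketch of the producer/consumer pattern. The `EventStream` class, the consumer functions, and the event fields are hypothetical names made up for this example. A real deployment would use a message broker, and consumers would read from it asynchronously instead of being called in a loop.

```python
class EventStream:
    """A toy in-memory event stream (hypothetical; stands in for a real message broker)."""

    def __init__(self):
        self._consumers = []

    def subscribe(self, consumer):
        """Register a callable that will receive every published event."""
        self._consumers.append(consumer)

    def publish(self, event):
        """Hand the event to every consumer; the producer never sees their results."""
        for consumer in self._consumers:
            consumer(event)


results = []


def tip_consumer(event):
    # Placeholder for a tip prediction model.
    results.append(("tip", event["ride_id"]))


def duration_consumer(event):
    # Placeholder for a duration prediction model.
    results.append(("duration", event["ride_id"]))


stream = EventStream()
stream.subscribe(tip_consumer)
stream.subscribe(duration_consumer)

# The backend acts as the producer and emits a "Ride started" event.
stream.publish({"type": "ride_started", "ride_id": 123})
```

Adding a third consumer only requires another `subscribe` call; the producer code does not change.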

## Module 4.2  Web-services: Deploying models with Flask and Docker

To deploy the model as a web service, we go through three steps:
1. Recreate the Python environment used to train/test the model using pipenv
2. Rewrite the prediction script and wrap it in a backend (Flask is used here)
3. Create a Docker container that holds the prediction backend along with the Python environment

### Python Environment:

We want to use the model developed in week 1 of the course. For consistency, we need the same Python environment that was used to train and test it. To list the packages and versions installed in the current Python environment (even a conda one), we use `pip freeze`, which outputs each installed package with its version. In our case, we mostly care about the scikit-learn version, so we grep for scikit-learn to show only the matching lines:

```
# go to the web-service folder
cd "mlops-zoomcamp/04-deployment/web-service"
```

```
pip freeze | grep scikit-learn
# or
pip list | grep scikit-learn
```
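The same check can also be done from Python. The sketch below uses the standard library's `importlib.metadata`; the helper name `installed_version` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the version string, or None when scikit-learn is missing from this environment.
print("scikit-learn:", installed_version("scikit-learn"))
```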

If Pipfile or Pipfile.lock files already exist, make sure to delete them before creating a new virtual environment. Running the command below creates a new Pipfile, along with a Pipfile.lock that stores the exact versions of the packages we want.

With pipenv, create a new virtual environment:

```
pipenv install scikit-learn==1.2.2 flask
```

Activate the environment using:
```
pipenv shell
```



### Writing the prediction script:

Quick recap: Our week 1 model writes 2 pickle files. One is the Linear Regressor, the other is the DictVectorizer object. The prediction moves through 3 steps:

1. Feature Engineering
2. DictVectorizer
3. Regressor

As a web service, our predictor will take a dictionary of a single "row" rather than a Pandas DataFrame as input.
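The three steps can be sketched for a single ride dictionary as follows. This is an illustration under assumptions, not the week 1 code: the field names (`PULocationID`, `DOLocationID`, `trip_distance`), the combined `PU_DO` feature, and the toy training data are made up so that the block runs on its own. The real service would load the already-fitted objects from the pickle files instead of fitting them inline.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LinearRegression


def prepare_features(ride):
    """Step 1: feature engineering on a single ride dictionary."""
    return {
        "PU_DO": f"{ride['PULocationID']}_{ride['DOLocationID']}",
        "trip_distance": ride["trip_distance"],
    }


# Toy data (made up) so the sketch is self-contained.
train_rides = [
    {"PULocationID": 10, "DOLocationID": 50, "trip_distance": 2.0},
    {"PULocationID": 10, "DOLocationID": 50, "trip_distance": 8.0},
    {"PULocationID": 20, "DOLocationID": 30, "trip_distance": 4.0},
]
train_durations = [10.0, 30.0, 18.0]

dv = DictVectorizer()
X_train = dv.fit_transform([prepare_features(r) for r in train_rides])
model = LinearRegression().fit(X_train, train_durations)


def predict(ride):
    """Run one ride dictionary through all three steps."""
    features = prepare_features(ride)   # step 1: feature engineering
    X = dv.transform([features])        # step 2: DictVectorizer
    return float(model.predict(X)[0])   # step 3: regressor
```

Note that `dv.transform` receives a list with one dictionary, since the vectorizer expects a collection of rows even when there is only one.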

Two Flask functions handle the JSON:
- `jsonify(D)` turns a dictionary D into a JSON response
- `request.get_json()` parses the JSON body of the incoming request

Check out the predict.py file to see how we added Flask to the prediction script. Make sure you are in the right virtual environment and run `python predict.py` to run the prediction script.
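As a rough idea of what wrapping a predictor in Flask looks like, here is a minimal sketch. It is not the repository's predict.py: the `/predict` route, the app name, the `run_dev_server` helper, and the stand-in `predict` formula are assumptions made for this example.

```python
from flask import Flask, jsonify, request


def predict(ride):
    """Stand-in for the real model call (hypothetical formula)."""
    return 5.0 + 2.5 * ride["trip_distance"]


app = Flask("duration-prediction")


@app.route("/predict", methods=["POST"])
def predict_endpoint():
    ride = request.get_json()              # parse the JSON body into a dict
    result = {"duration": predict(ride)}
    return jsonify(result)                 # serialize the dict as a JSON response


def run_dev_server():
    """Start Flask's built-in development server on port 9696 (blocks until stopped)."""
    app.run(debug=True, host="0.0.0.0", port=9696)
```

In a script meant to be started with `python`, `run_dev_server()` would typically be called under an `if __name__ == "__main__":` guard. The endpoint can also be exercised without starting a server through `app.test_client().post("/predict", json={"trip_distance": 4.0})`.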

To request a prediction from the server, we create another file, test.py. This file posts its ride information to the server and prints out the response (i.e., the predicted duration). While the prediction script is running in one terminal, open another terminal, make sure you are in the right virtual environment, and run `python test.py` to get a prediction. You will get this output: `{'duration': 26.43883355119793}`
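A client along these lines can be sketched with the standard library alone. The repository's test.py may well use a different HTTP library; the `/predict` path, the helper names, and the ride values below are assumptions made for this example.

```python
import json
import urllib.request

URL = "http://localhost:9696/predict"  # assumed endpoint path

# Made-up ride; the field names are assumptions about what the model expects.
ride = {"PULocationID": 10, "DOLocationID": 50, "trip_distance": 40}


def build_request(url, payload):
    """Build a POST request that carries the payload as a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_prediction(url, payload):
    """Send the ride to the server and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(url, payload)) as response:
        return json.loads(response.read().decode("utf-8"))


# With the server running, this would print the prediction:
# print(request_prediction(URL, ride))
```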

### Deploying as WSGI:

When you run `python predict.py` you'll notice this warning message:
```
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
```

Our current Flask setup uses the built-in development server. To address the warning above, install gunicorn, which is one of the available production WSGI servers.

To deploy the model into production, we serve the web service with gunicorn:

`pipenv install gunicorn` (you'll notice that the Pipfile and Pipfile.lock files also got updated with gunicorn)

`gunicorn --bind=0.0.0.0:9696 predict:app`

where `predict` refers to predict.py in the current directory, and `app` is the Flask app defined in that file (see above).
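The `predict:app` argument names a module and a WSGI application object inside it. A Flask app implements this interface for us, but it can help to see how small the interface is. The sketch below is a bare WSGI callable plus a helper that invokes it once, roughly the way a server would. The helper `call_wsgi` and the response body are made up for this example.

```python
import json
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A bare WSGI application: a callable taking the request environ and a callback."""
    body = json.dumps({"status": "ok"}).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_wsgi(application):
    """Invoke a WSGI app once with a dummy request and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fill in a minimal, valid request environ
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(application(environ, start_response))
    return captured["status"], body
```

A production server such as gunicorn does the same thing for every incoming HTTP request, with worker processes and real sockets around it.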

### Docker Container:

Now we want to deploy our predictor in a Docker container for reproducibility, scalability, and security.

Check the Python version by typing `python -V` in the command line:
```
(web-service) aasth@Aasthas-Air web-service % python -V
Python 3.9.6
```

In our case, the Python version is 3.9.6.

In our Dockerfile (refer to the Dockerfile):

- We write `FROM python:3.9-slim`. The `-slim` suffix selects a smaller variant of the image.

- If we write just `RUN pipenv install`, a virtual environment is created inside the container, which is not needed because the container is already isolated. Instead, we can install the packages directly on the system with `RUN pipenv install --system`. The `--deploy` flag checks that Pipfile.lock is up to date and fails the build if it isn't. So the final command is `RUN pipenv install --system --deploy`

- The lines `COPY [ "predict.py", "lin_reg.bin", "./" ]` and `COPY [ "Pipfile", "Pipfile.lock", "./" ]` copy predict.py, lin_reg.bin, Pipfile and Pipfile.lock into the current directory.

- We specify the current directory with `WORKDIR /app`. This command creates the "/app" directory and cd's (enters) into it.


Open the Docker app and, while it is running in the background, build the Docker image with:

`docker build -t ride-duration-prediction-service:v1 .`

In the above, `ride-duration-prediction-service` is the image name and `v1` is the tag.

And run the container that was built with:

`docker run -it --rm -p 9696:9696 ride-duration-prediction-service:v1`

Now when we request predictions as before, we are calling the WSGI server inside the Docker container. While the container is running in one terminal, open another terminal, make sure you are in the right virtual environment, and run `python test.py` to get a prediction. You will get this output: `{'duration': 26.43883355119793}`.

So far we have packaged the model in a Docker image that can serve predictions on any Docker-compatible compute. However, the model was loaded directly from the local path where it was stored, whereas in previous sessions we learned that candidate models are stored in the model registry, which is where we are supposed to fetch them from. Hence, in the next section we will learn how to fetch the model from the model registry for serving.
