# Notes:

## Module 4.1 Model Deployment

### Recap:

We looked at the "Design" phase, where we gather the requirements and decide whether machine learning is the right solution for our problem. Then we looked at the "Training" phase, where we talked about experiment tracking and about productionizing our Jupyter notebook by turning it into a machine learning pipeline.

Now that the model is registered in the MLflow Model Registry and is production ready, we need to deploy it so that we can get predictions for new data and realize the model's value.

### Model Deployment:

There are primarily two kinds of deployments:
- Batch (offline) - runs regularly
- Online - Up & running all the time with two sub-options:
    - Web service
    - Streaming

<img src="notes-images/deployment-types.png" width="700"/>

### Batch Deployment:

The model doesn't run all the time, but we regularly apply it to new data. The interval could be hours, days, weeks, *etc*.
A typical batch deployment pipeline is sketched below. Let's say we have a database with all the data.
The scoring job fetches some data from the database and applies the model to that data. The predictions are then written to another
database. Another service can then read the predictions and react to them, for example, by preparing a report or raising an alarm.

<img src="notes-images/batch-deployment-pipeline.png" width="700"/>
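The pipeline in the diagram can be sketched in a few lines of Python. This is only an illustration: the two lists stand in for the source and prediction databases, and `toy_model` is a hypothetical placeholder, not the course's model.

```python
from typing import Callable

# Hypothetical stand-ins for the two databases in the diagram.
source_db = [
    {"ride_id": 1, "trip_distance": 2.0},
    {"ride_id": 2, "trip_distance": 5.5},
]
predictions_db = []


def toy_model(row: dict) -> float:
    """Placeholder model (assumption): 4 minutes per unit of distance."""
    return 4.0 * row["trip_distance"]


def scoring_job(rows: list, model: Callable[[dict], float], sink: list) -> int:
    """Apply the model to every fetched row and write the predictions to the sink."""
    for row in rows:
        sink.append({"ride_id": row["ride_id"], "predicted_duration": model(row)})
    return len(rows)


# A scheduler would trigger this call every hour, day, week, etc.
n_scored = scoring_job(source_db, toy_model, predictions_db)
print(n_scored, predictions_db)
```

A downstream service would then only read `predictions_db`; it never talks to the model directly.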

### Web service deployment:

Users use an app that communicates with the backend, which in turn communicates with a service that runs the model.
The model service needs to be running all the time. When the client sends a request,
it opens a connection to the backend service, and that connection stays open while the request is being processed.

<img src="notes-images/web-deployment-pipeline.png" width="700"/>

For example, a client wants to know the duration of an upcoming taxi ride. The mobile app sends a request to the backend, which performs calculations using
the model in the service and returns the result back to the app.


### Streaming deployment:

In the streaming setting, we have producers (they produce events) and consumers (they consume events). Producers push events to the event stream.
Consumers read the events and respond to them. Here we have a one-to-many (single producer) or many-to-many (multiple producers) client-server relationship.

A producer sends an event but does not really care what happens to it, since there is no explicit connection between producers and consumers.

<img src="notes-images/streaming-deployment-pipeline.png" width="700"/>

Let's revisit the web service example. Now the backend becomes a producer. It generates the event "Ride started" containing all information about
the ride. Then multiple services consume this stream and each runs something for it:

- consumer 1: tip prediction
- consumer 2: duration prediction
- consumer 3: ...
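The decoupling between producer and consumers can be sketched with a toy in-memory stream. Everything here is an assumption made for illustration: `EventStream` is a hypothetical stand-in for a real streaming system, and the two consumer rules are placeholders rather than real models.

```python
from collections import defaultdict


class EventStream:
    """Toy in-memory event stream (hypothetical stand-in for a real streaming system)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # The producer does not know who consumes the event or what they do with it.
        for handler in self._subscribers[event_type]:
            handler(payload)


results = {}


def tip_consumer(ride):
    results["tip"] = round(0.15 * ride["fare"], 2)  # placeholder rule, not a model


def duration_consumer(ride):
    results["duration"] = 4.0 * ride["trip_distance"]  # placeholder rule, not a model


stream = EventStream()
stream.subscribe("ride_started", tip_consumer)
stream.subscribe("ride_started", duration_consumer)

# The backend (producer) emits a single event; both consumers react independently.
stream.publish("ride_started", {"fare": 20.0, "trip_distance": 3.0})
print(results)
```

Adding a third consumer only requires another `subscribe` call; the producer code does not change.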

## Module 4.2  Web-services: Deploying models with Flask and Docker

`Reference: https://github.com/ayoub-berdeddouch/mlops-journey/blob/main/deployment-04.md`

To deploy the model as a web service, we go through these steps:
1. Recreating the Python environment used to train/test the model, using pipenv
2. Rewriting the prediction script and wrapping it with a backend (Flask is used here)
3. Creating a Docker container that holds our prediction backend along with the Python environment

#### Python Environment:

We want to use the model developed in week 1 of the course. For consistency, we need the same Python environment that we used to train and test the model. To list the packages and package versions of the current Python environment (even if it is a conda environment), we use `pip freeze`, which outputs the installed packages and their versions. In our case, we're mostly interested in the scikit-learn version, so we grep for `scikit-learn` to keep only the matching lines:

```
cd "mlops-zoomcamp/04-deployment/web-service"  # go to the web-service folder
```

```
pip freeze | grep scikit-learn
# or
pip list | grep scikit-learn
```
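The same check can be done from inside Python, which is handy in a notebook. A minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the scikit-learn version of the active environment, or None if it is missing.
print(installed_version("scikit-learn"))
```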

If a Pipfile or Pipfile.lock already exists, make sure to delete it before creating a new virtual environment. Running the commands below creates a new Pipfile, along with a Pipfile.lock that stores the exact versions of the packages we want.

Before running the command below, make sure you are in an active virtual environment (for example, activate `mlops-zoomcamp-venv` by typing `conda activate /opt/homebrew/anaconda3/envs/mlops-zoomcamp-venv`). Once you are in an active environment, use pipenv to create a new virtual environment:

```
pipenv install scikit-learn==1.2.2 flask
```

Activate the environment using:
```
pipenv shell
```



### Writing the prediction script:

Quick recap: our week 1 model writes 2 pickle files. One is the linear regressor, the other is the DictVectorizer object. A prediction moves through 3 steps:

1. Feature engineering
2. DictVectorizer
3. Regressor

As a web service, our predictor will take a dictionary of a single "row" rather than a Pandas DataFrame as input.

Two Flask functions for dealing with JSON:
- `jsonify(D)` transforms a dictionary D into a JSON response
- `request.get_json()` reads the JSON passed to the app

Check out the predict.py file to see how we added Flask to the prediction script. Make sure you are in the right virtual environment and run `python predict.py` to start the prediction service.
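To make the flow concrete, here is a rough sketch of the request handling using only the standard library's `json` module in place of Flask's `request.get_json()` and `jsonify`. It is not the contents of predict.py: the field names (`PULocationID`, `DOLocationID`, `trip_distance`) and the `toy_predict` formula are assumptions, and the real script would apply the pickled DictVectorizer and regressor instead.

```python
import json


def prepare_features(ride: dict) -> dict:
    """Step 1: feature engineering on a single ride dictionary (field names are assumed)."""
    return {
        "PU_DO": f"{ride['PULocationID']}_{ride['DOLocationID']}",
        "trip_distance": ride["trip_distance"],
    }


def toy_predict(features: dict) -> float:
    """Stand-in for steps 2 and 3 (DictVectorizer, then regressor); the formula is made up."""
    return 10.0 + 2.0 * features["trip_distance"]


def handle_request(body: str) -> str:
    ride = json.loads(body)  # plays the role of request.get_json()
    features = prepare_features(ride)
    prediction = toy_predict(features)
    return json.dumps({"duration": prediction})  # plays the role of jsonify(...)


request_body = json.dumps({"PULocationID": 10, "DOLocationID": 50, "trip_distance": 40})
print(handle_request(request_body))
```

The point of the sketch is the shape of the data: one dictionary in, one dictionary out, with JSON at both ends.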

To request a prediction from the server, we create another file, test.py. This file posts its ride information to the server and prints the response (i.e., the predicted duration). While the prediction script is running in one terminal, open another terminal, make sure you are in the right virtual environment, and run `python test.py` to get a prediction. You will get this output: `{'duration': 26.43883355119793}`
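A client along these lines could be written with the standard library alone. This is a sketch, not the course's test.py: the URL (including the `/predict` path and port 9696) and the ride fields are assumptions, and the actual network call is left commented out so the snippet runs without a server.

```python
import json
import urllib.request


def build_prediction_request(ride: dict, url: str = "http://localhost:9696/predict"):
    """Build a JSON POST request for the prediction service (URL is an assumption)."""
    data = json.dumps(ride).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


ride = {"PULocationID": 10, "DOLocationID": 50, "trip_distance": 40}
req = build_prediction_request(ride)
print(req.get_method(), req.full_url)

# To actually send it while the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```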

### Deploying as WSGI:

When you run `python predict.py` you'll notice this warning message:
```
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
```

Our current Flask setup uses the development server. To get rid of this warning, install gunicorn, which is one of the production WSGI servers.

To deploy the model into production, we use gunicorn to serve the web service:

`pipenv install gunicorn` (you'll notice that the Pipfile and Pipfile.lock files also got updated with gunicorn)

`gunicorn --bind=0.0.0.0:9696 predict:app`

where `predict` is the predict.py module located in the current directory, and `app` is the Flask app defined in that file.
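The `module:object` notation works because a WSGI server only needs a callable with a specific signature, and a Flask app object is such a callable. The sketch below shows a bare WSGI callable and simulates a single request with `wsgiref` helpers from the standard library, without opening a socket. The response body is made up for illustration.

```python
import json
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A bare WSGI callable: the interface a WSGI server invokes for each request."""
    body = json.dumps({"status": "ok"}).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
    )
    return [body]


# Simulate what a WSGI server does for one request.
environ = {}
setup_testing_defaults(environ)  # fills in a minimal, valid WSGI environ
captured = {}


def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers


response_body = b"".join(app(environ, start_response))
print(captured["status"], response_body)
```

If this file were saved as a module, a WSGI server could be pointed at it with the same `module:app` pattern used above.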

### Docker Container:

Now we want to deploy our predictor in a Docker container for reproducibility, scalability, and security.

Check the Python version by typing `python -V` in the command line:
```
(web-service) aasth@Aasthas-Air web-service % python -V
Python 3.9.6
```

In our case, the python version is 3.9.6.

In our Dockerfile:

- We write `FROM python:3.9-slim`. The `-slim` suffix denotes a reduced-size variant of the image.

- If we ran just `RUN pipenv install`, a virtual environment would be created inside the container, which is unnecessary because a Docker container is already isolated. So we install the packages directly into the system Python with `RUN pipenv install --system`. The `--deploy` flag makes the install fail if Pipfile.lock is out of date. So the final command is `RUN pipenv install --system --deploy`.

- The lines `COPY [ "predict.py", "lin_reg.bin", "./" ]` and `COPY [ "Pipfile", "Pipfile.lock", "./" ]` copy predict.py, lin_reg.bin, Pipfile and Pipfile.lock into the current directory.

- We set the current directory with `WORKDIR /app`. This command creates the "/app" directory and cd's (enters) into it.


Open the Docker app and, while it is running in the background, build the Docker image with:

`docker build -t ride-duration-prediction-service:v1 .`

In the above, `ride-duration-prediction-service` is the image name and `v1` is the tag.

And run the container that was built with:

`docker run -it --rm -p 9696:9696 ride-duration-prediction-service:v1`

Now when we request predictions like earlier, we're instead calling the WSGI server within the Docker container. While the container is running in one terminal, open another terminal, make sure you are in the right virtual environment, and run `python test.py` to get a prediction. You will get this output: `{'duration': 26.43883355119793}`.

So far we have packaged the model in a Docker image that can run on any Docker-compatible compute to serve predictions. However, the model was loaded directly from the local path where it was stored, whereas in previous sessions we learned that candidate models are stored in the model registry, which is what we are supposed to use. Hence, in the next section we will learn how to fetch the model from the model registry for serving.


## Module 4.3  Web-services: Getting the models from the model registry (MLflow)

`Reference: https://github.com/BPrasad123/MLOps_Zoomcamp/tree/main/Week4`

This time we are going to run a fresh experiment to train a new model (Random Forest) on the same dataset and register the model in MLflow Model Registry.

We can do this by providing a RUN ID manually (like we are doing in this case) or by using an S3 link as the tracking URI (which is not shown in this segment, but you can refer to Alexey's video for it).

#### Python environment:

Go to the correct directory:

```
cd "mlops-zoomcamp/04-deployment/web-service-mlflow"  # go to the web-service-mlflow folder
```

and activate any virtual environment (let's say mlops-zoomcamp-venv) by running `source /opt/homebrew/anaconda3/bin/activate mlops-zoomcamp-venv`. Once you are in an active environment (in our case `mlops-zoomcamp-venv`), use pipenv to create a new virtual environment:

```
pipenv install mlflow
```

The above command will first create a new virtual environment and then install the mlflow package and all the packages in the Pipfile.

Activate the environment using:
```
pipenv shell
```

#### Running the jupyter notebook

Once you are in the `web-service-mlflow` folder, run `mlflow ui`:

```
(web-service-mlflow) aasth@Aasthas-MacBook-Air web-service-mlflow % mlflow ui                                        
[2024-01-05 21:04:17 -0800] [62488] [INFO] Starting gunicorn 20.1.0
[2024-01-05 21:04:17 -0800] [62488] [INFO] Listening at: http://127.0.0.1:5000 (62488)
```

Click on the `http://127.0.0.1:5000` link to open the MLflow tracking page, and then run the random-forest.ipynb Jupyter notebook.

<img src="notes-images/webservice-mlflow-rf-jupyter.png" width="700"/>

Remember to update the RUN_ID in the jupyter notebook based on the run_id of the mlflow experiment that you want to finalize. For example, in the below screenshot you can see the run id is 229c53c2f7b349eca20d7c482efb4bc8 so that's what we entered in the jupyter notebook:

<img src="notes-images/webservice-mlflow-rf-ui.png" width="700"/>

There are multiple ways to use the logged model. If we load it via `runs:/RUN_ID/model` (the method we used in the screenshot), we run an availability risk because the tracking server can go down. However, if we fetch the artifact directly from S3, we do not depend on the tracking server. Please check the predict.py script to see the commented-out line showing how to use S3.
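The two options differ only in the model URI string. The sketch below builds both forms; the helper names are hypothetical, the bucket name and experiment id are placeholders, and the S3 layout shown is an assumption that depends on how the artifact root was configured.

```python
def runs_uri(run_id: str, artifact_path: str = "model") -> str:
    """URI that is resolved through the MLflow tracking server."""
    return f"runs:/{run_id}/{artifact_path}"


def s3_uri(bucket: str, experiment_id: str, run_id: str, artifact_path: str = "model") -> str:
    """Direct artifact location (assumed layout; check your own artifact root)."""
    return f"s3://{bucket}/{experiment_id}/{run_id}/artifacts/{artifact_path}"


RUN_ID = "229c53c2f7b349eca20d7c482efb4bc8"  # run id from the screenshot above

print(runs_uri(RUN_ID))
print(s3_uri("my-hypothetical-bucket", "1", RUN_ID))

# Either string could then be passed to mlflow.pyfunc.load_model(...)
# in an environment where mlflow is installed.
```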

In another terminal run test.py to see if we are getting the predicted result.

*Note: Module 4.4 was optional so I skipped it for now.*

## Module 4.5   Batch: Preparing a scoring script

Typical approach for deploying a model in batch mode:

- Create a notebook/training script to train a model and save it
- Create a notebook to load the trained model and make predictions on the new data
- Convert the notebook to an inference script
- Clean and parameterize the script
- Schedule the inference script if required

#### Python environment:

Go to the correct directory:

```
cd "mlops-zoomcamp/04-deployment/batch"  # go to the batch folder
```

and activate any virtual environment (let's say mlops-zoomcamp-venv) by running `source /opt/homebrew/anaconda3/bin/activate mlops-zoomcamp-venv`. Once you are in an active environment (in our case `mlops-zoomcamp-venv`), use pipenv to create a new virtual environment:

```
pipenv install mlflow
```

The above command will first create a new virtual environment and then install the mlflow package and all the packages in the Pipfile.

Activate the environment using:
```
pipenv shell
```

#### Running the jupyter notebook

Once you are in the `batch` folder, run `mlflow ui`:

```
(batch) aasth@Aasthas-MacBook-Air batch % mlflow ui
[2024-01-07 17:06:57 -0800] [67271] [INFO] Starting gunicorn 20.1.0
[2024-01-07 17:06:57 -0800] [67271] [INFO] Listening at: http://127.0.0.1:5000 (67271)
```

Click on the `http://127.0.0.1:5000` link to open the MLflow tracking page, and then run the random-forest.ipynb Jupyter notebook.

<img src="notes-images/webservice-mlflow-rf-jupyter.png" width="700"/>

You'll notice that an mlruns folder is created when you run the Jupyter notebook. It contains a default folder called 0 and a new folder that holds the information about your current experiment run.

### Score notebook and script

Now, open the score.ipynb. In this notebook, we will load our model from mlflow (which is why it is important to run the random-forest.ipynb first) and then make a prediction on some data.

The score.py file is just a Python script version of the score.ipynb notebook. Open a new terminal (while `mlflow ui` keeps running in another terminal) and check that score.py works:

```
(batch) aasth@Aasthas-MacBook-Air batch % python score.py green 2022 3
```

You should see a `2022-02.parquet` file in the output/green folder.
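One way to read this result: the script is called with month 3 but writes a file named for February, which fits a script that scores the month preceding the run date. The sketch below works under that assumption. It is not the actual score.py, and the function names are hypothetical.

```python
from datetime import datetime


def previous_month(run_date: datetime) -> datetime:
    """First day of the month before `run_date`."""
    if run_date.month == 1:
        return run_date.replace(year=run_date.year - 1, month=12, day=1)
    return run_date.replace(month=run_date.month - 1, day=1)


def output_path(taxi_type: str, run_date: datetime) -> str:
    """Assumed naming scheme: output/<taxi_type>/<YYYY>-<MM>.parquet for the scored month."""
    prev = previous_month(run_date)
    return f"output/{taxi_type}/{prev.year:04d}-{prev.month:02d}.parquet"


def parse_args(argv: list):
    """Turn ['green', '2022', '3'] into a taxi type and a run date."""
    taxi_type = argv[0]
    year, month = int(argv[1]), int(argv[2])
    return taxi_type, datetime(year=year, month=month, day=1)


taxi_type, run_date = parse_args(["green", "2022", "3"])
print(output_path(taxi_type, run_date))
```

Passing the taxi type, year and month as arguments is what makes the script easy to schedule and to re-run for past months.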

In the course, the model is loaded via an S3 link, but I am loading the pickled version of the model (as I haven't used S3) so that it works when I deploy it using Prefect.

### Prefect deployment for score.py 

*Note: If you face issues with the package dependencies, try removing the virtual environment completely using `pipenv --rm` and then (while you are in another virtual env like `mlops-zoomcamp-venv`) run `pipenv install`.*

1. Start the Prefect server with `prefect server start` to spin up a local instance along with its UI. When you start the server, you'll see a message saying:
```
Configure Prefect to communicate with the server with:

    prefect config set PREFECT_API_URL=http://127.0.0.1:4200/api
```

Open up another terminal with the `batch` environment activated and run the command `prefect config set PREFECT_API_URL=http://127.0.0.1:4200/api`

2. Run `prefect project init` to initialize the Prefect project in the batch directory and create the necessary Prefect files:
  - .prefectignore
  - deployment.yaml: useful for templating if you're making multiple deployments from the same project
  - prefect.yaml
  - .prefect/: this is a hidden folder


3. Run `prefect deploy "/Users/aasth/Desktop/Data analytics/MLOps/datatalks-zoomcamp/mlops-zoomcamp/04-deployment/batch"/score.py:ride_duration_prediction -n my-first-deployment -p local-workpool`. You might need to create the work pool called local-workpool first. In that case, run command (4), terminate it, and then run command (3).

4. Then you can run `prefect worker start -p local-workpool -t process` and it will create the work pool called local-workpool if it didn't already exist.

5. Then run the `my-first-deployment` deployment and it should be successful.

### Prefect deployment for score_backfill.py:

The steps are the same up to step (2) of "Prefect deployment for score.py"; after that, the steps are:

3. Run `prefect deploy "/Users/aasth/Desktop/Data analytics/MLOps/datatalks-zoomcamp/mlops-zoomcamp/04-deployment/batch"/score_backfill.py:ride_duration_prediction_backfill -n backfill-deployment -p local-workpool`

4. Then you can run `prefect worker start -p local-workpool -t process` and it will create the work pool called local-workpool if it didn't already exist.

5. Then run the `backfill-deployment` deployment and it should be successful.
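A backfill is essentially the scoring job repeated for a range of past months. The sketch below shows only that looping idea with the standard library. It is not the contents of score_backfill.py: the date range is made up and `score_one_month` is a hypothetical stand-in for the scoring flow.

```python
from datetime import datetime


def monthly_run_dates(start: datetime, end: datetime) -> list:
    """First-of-month dates from `start` to `end`, inclusive."""
    dates = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        dates.append(datetime(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return dates


def score_one_month(run_date: datetime) -> str:
    """Hypothetical stand-in for calling the scoring flow with a given run date."""
    return f"scored {run_date:%Y-%m}"


backfill_dates = monthly_run_dates(datetime(2021, 11, 1), datetime(2022, 2, 1))
log = [score_one_month(d) for d in backfill_dates]
print(log)
```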
