# 5. Deploying Machine Learning Models

We'll use the same model we trained and evaluated previously, the churn prediction model, and deploy it as a web service.

## 5.1 Introduction

- What we will cover this week


<img src="../images/overview_week5.png" alt="overview" style="width:500px;height:auto;">

- saving and loading models
- serving the model with Flask

<img src="../images/docker_on_AWS_EB.png" alt="docker on AWS EB" style="width:500px;height:auto;">

- environment and dependency management with pipenv
- environment management with Docker


## 5.2 Saving and loading the model

- Saving the model to pickle
- Loading the model from pickle
- Turning our notebook into a Python script

In [1]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

In [None]:
import sklearn

sklearn.__version__

'1.5.2'

In [3]:
df = pd.read_csv("../data/WA_Fn-UseC_-Telco-Customer-Churn.csv")

df.columns = df.columns.str.lower().str.replace(" ", "_")

categorical_columns = list(df.dtypes[df.dtypes == "object"].index)

for c in categorical_columns:
    df[c] = df[c].str.lower().str.replace(" ", "_")

df.totalcharges = pd.to_numeric(df.totalcharges, errors="coerce")
df.totalcharges = df.totalcharges.fillna(0)

df.churn = (df.churn == "yes").astype(int)

In [4]:
df_full_train, df_test = train_test_split(df, test_size=0.2, random_state=1)

In [5]:
numerical = ["tenure", "monthlycharges", "totalcharges"]

categorical = [
    "gender",
    "seniorcitizen",
    "partner",
    "dependents",
    "phoneservice",
    "multiplelines",
    "internetservice",
    "onlinesecurity",
    "onlinebackup",
    "deviceprotection",
    "techsupport",
    "streamingtv",
    "streamingmovies",
    "contract",
    "paperlessbilling",
    "paymentmethod",
]

In [6]:
def train(df_train, y_train, C=1.0):
    dicts = df_train[categorical + numerical].to_dict(orient="records")

    dv = DictVectorizer(sparse=False)
    X_train = dv.fit_transform(dicts)

    model = LogisticRegression(max_iter=10_000, C=C)
    model.fit(X_train, y_train)

    return dv, model

In [7]:
def predict(df, dv, model):
    dicts = df[categorical + numerical].to_dict(orient="records")

    X = dv.transform(dicts)
    y_pred = model.predict_proba(X)[:, 1]

    return y_pred

In [8]:
C = 1.0
n_splits = 5

In [9]:
kfold = KFold(n_splits=n_splits, shuffle=True, random_state=1)

scores = []

for train_idx, val_idx in kfold.split(df_full_train):
    df_train = df_full_train.iloc[train_idx]
    df_val = df_full_train.iloc[val_idx]

    y_train = df_train.churn.values
    y_val = df_val.churn.values

    dv, model = train(df_train, y_train, C=C)
    y_pred = predict(df_val, dv, model)

    auc = roc_auc_score(y_val, y_pred)
    scores.append(auc)

print("C=%s %.3f +- %.3f" % (C, np.mean(scores), np.std(scores)))

C=1.0 0.842 +- 0.007


In [10]:
scores

[0.8444081607020903,
 0.8449522414249768,
 0.8335741521171984,
 0.8347609036260152,
 0.851769633836403]

In [11]:
dv, model = train(df_full_train, df_full_train.churn.values, C=1.0)
y_pred = predict(df_test, dv, model)

y_test = df_test.churn.values
auc = roc_auc_score(y_test, y_pred)
auc

0.8584492508693815

### Save the model

Import the pickle library and save the model to a file.

In [12]:
import pickle

Set the path to the file where the model will be saved.

In [13]:
output_file = f"../models/model_C={C}.bin"
output_file

'../models/model_C=1.0.bin'

Method 1: open the file, save, and close it manually

In [14]:
f_out = open(output_file, "wb")
pickle.dump((dv, model), f_out)
f_out.close()

Method 2: save inside a `with` block, which closes the file automatically

In [15]:
# with open(output_file, "wb") as f_out:
#     pickle.dump((dv, model), f_out)

### Load the model

Import the pickle library and load the model from the file.

In [16]:
import pickle

In [17]:
model_file = "../models/model_C=1.0.bin"

In [18]:
with open(model_file, "rb") as f_in:
    (dv, model) = pickle.load(f_in)

Check if the model is loaded correctly.

In [19]:
dv, model

(DictVectorizer(sparse=False), LogisticRegression(max_iter=10000))
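
The save and load steps can also be checked end to end without touching the real model file. The sketch below is only an illustration: `dv_stub` and `model_stub` are hypothetical stand-ins for the `(dv, model)` pair, and the file lives in a temporary directory rather than in `../models/`.

```python
import os
import pickle
import tempfile

# Hypothetical stand-ins for the real (dv, model) pair; any picklable
# objects go through pickle.dump / pickle.load the same way.
dv_stub = {"feature_names": ["contract=month-to-month", "tenure"]}
model_stub = {"coef": [0.6, -0.05], "intercept": -1.2}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_stub.bin")

    # The with block closes the file even if pickle.dump raises.
    with open(path, "wb") as f_out:
        pickle.dump((dv_stub, model_stub), f_out)
    file_closed_after_save = f_out.closed

    # Load the pair back, unpacking the tuple in the order it was saved.
    with open(path, "rb") as f_in:
        dv_loaded, model_loaded = pickle.load(f_in)

round_trip_ok = (dv_loaded, model_loaded) == (dv_stub, model_stub)
print(file_closed_after_save, round_trip_ok)
```

The tuple has to be unpacked in the same order it was saved, which is why both the save and the load cells use `(dv, model)`.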

Define a customer and predict the churn.

In [20]:
customer = {
    'gender': 'female',
    'seniorcitizen': 0,
    'partner': 'yes',
    'dependents': 'no',
    'phoneservice': 'no',
    'multiplelines': 'no_phone_service',
    'internetservice': 'dsl',
    'onlinesecurity': 'no',
    'onlinebackup': 'yes',
    'deviceprotection': 'no',
    'techsupport': 'no',
    'streamingtv': 'no',
    'streamingmovies': 'no',
    'contract': 'month-to-month',
    'paperlessbilling': 'yes',
    'paymentmethod': 'electronic_check',
    'tenure': 12,
    'monthlycharges': 2.85,
    'totalcharges': 29.85
}

In [21]:
X = dv.transform([customer])

In [22]:
model.predict_proba(X)[0, 1]

0.5394495644561959

### Making requests (create a separate notebook for that)

- [04_predict_test.ipynb](04_predict_test.ipynb)
- Export the notebook to a [Python script](../predict.py)

## 5.3 Web services: introduction to Flask

- Writing a simple ping/pong app -> `ping.py`
- Starting it with `python ping.py`
- Querying it with `curl` and browser

```bash
curl http://localhost:9696/ping
```

or

```bash
curl http://0.0.0.0:9696/ping
```

or open in browser [http://localhost:9696/ping](http://localhost:9696/ping)
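
A ping/pong app of this kind might look like the sketch below. The contents of `ping.py` are not shown in these notes, so the app name, the route body, and the `"PONG"` reply are assumptions; only the `/ping` path and port 9696 come from the commands above.

```python
from flask import Flask

app = Flask("ping")


@app.route("/ping", methods=["GET"])
def ping():
    # Assumed reply text; any short string would do for a health check.
    return "PONG"


# In ping.py the server would be started from a main guard, for example:
#     if __name__ == "__main__":
#         app.run(debug=True, host="0.0.0.0", port=9696)

# Exercise the route in-process with Flask's test client (no server needed).
response = app.test_client().get("/ping")
print(response.status_code, response.get_data(as_text=True))
```

The test client is handy for checking a route before starting the server and querying it with `curl`.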


## 5.4 Serving the churn model with Flask

- Wrapping the `predict.py` script into a Flask app
- Querying it with `requests`
- Preparing for production: gunicorn
- Running it on Windows with waitress


Start the server with:
```bash
gunicorn --bind 0.0.0.0:9696 predict:app
```

Test the server with:
```bash
python predict_test.py
```
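
The shape of such a Flask wrapper can be sketched as follows. This is not the actual `predict.py`: `predict_proba_stub` is a hypothetical stand-in for `dv.transform` plus `model.predict_proba`, and the `/predict` route, the response keys, and the 0.5 decision threshold are assumptions made for the example.

```python
from flask import Flask, jsonify, request

app = Flask("churn")


def predict_proba_stub(customer):
    """Hypothetical stand-in for dv.transform + model.predict_proba."""
    return 0.7 if customer.get("contract") == "month-to-month" else 0.2


@app.route("/predict", methods=["POST"])
def predict():
    customer = request.get_json()  # the customer dict sent by the client
    y_pred = predict_proba_stub(customer)
    churn = y_pred >= 0.5  # assumed decision threshold

    # Cast to built-in types: NumPy floats and bools are not JSON serializable.
    result = {"churn_probability": float(y_pred), "churn": bool(churn)}
    return jsonify(result)


# Send one customer through the route with Flask's test client.
response = app.test_client().post(
    "/predict", json={"contract": "month-to-month", "tenure": 12}
)
result = response.get_json()
print(result)
```

With the real model, the stub would be replaced by the loaded `(dv, model)` pair; the casts to `float` and `bool` matter there because `predict_proba` returns NumPy values.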

## 5.5 Dependency and environment management: Pipenv

- Why we need virtual environments

<img src="../images/python_envs.png" alt="Python envs" style="width:500px;height:auto;">
<img src="../images/python_envs2.png" alt="Python envs 2" style="width:500px;height:auto;">
<img src="../images/python_env_manager.png" alt="Python envs 2" style="width:500px;height:auto;">

- Installing Pipenv
- Installing libraries with Pipenv
- Running things with Pipenv

BUG: I could not create the virtual environment with Python 3.8, so I created an [environment](../environment.yml) with Python 3.11.3 and scikit-learn 1.5.2 instead. As a result, `Pipfile.lock` could not be created.
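
One reason version pinning matters here: a pickled scikit-learn model should be loaded with the same scikit-learn version that saved it. The sketch below checks installed versions against a set of pins using only the standard library; `find_mismatches` is a hypothetical helper written for this example, not part of Pipenv.

```python
from importlib.metadata import PackageNotFoundError, version


def find_mismatches(pins):
    """Return {package: (installed, expected)} for every pin that is not satisfied."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != expected:
            mismatches[package] = (installed, expected)
    return mismatches


# The pin mirrors the scikit-learn version reported earlier in these notes.
pins = {"scikit-learn": "1.5.2"}
print(find_mismatches(pins))
```

An empty dict means the active environment matches the pins; anything else lists the installed version (or `None`) next to the expected one.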

### Installing Pipenv

```bash
pip install pipenv
```

### Activating the virtual environment

```bash
pipenv shell
```

### Deactivating the virtual environment

```bash
exit
```

### Removing the virtual environment

```bash
pipenv --rm
```


### Installing environment from Pipfile

```bash
pipenv install
```

### Running commands with Pipenv

```bash
pipenv run gunicorn --bind 0.0.0.0:9696 predict:app
```

## 5.6 Environment management: Docker

- Why we need Docker

<img src="../images/docker.png" alt="Docker" style="width:500px;height:auto;">
<img src="../images/docker_to_cloud.png" alt="Docker to cloud" style="width:500px;height:auto;">

- Running a Python image with docker
- Dockerfile
- Building a docker image
- Running a docker image

### Running a Python image with docker

```bash
docker run -it --rm --entrypoint=bash python:3.8.12-slim
```

### [Dockerfile](../Dockerfile)

```Dockerfile
# Base image
FROM python:3.8.12-slim

# Install pipenv so the dependencies can be installed from the Pipfile
RUN pip install pipenv

# Working directory inside the image
WORKDIR /app
# Copy the Python environment files to the working directory
COPY ["Pipfile", "Pipfile.lock", "./"]

# Install the dependencies from Pipfile.lock into the system Python
RUN pipenv install --system --deploy

# Copy the Flask app and the model file to the working directory
COPY ["predict.py", "./models/model_C=1.0.bin", "./"]

# Port the service listens on
EXPOSE 9696

# Entrypoint: start gunicorn and run the app
ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:9696", "predict:app"]
```

<img src="../images/docker_expose_port.png" alt="docker expose port" style="width:500px;height:auto;">

### Building a docker image

```bash
docker build -t churn-predictor .
```

### Running a docker image

```bash
docker run -it --rm -p 9696:9696 churn-predictor
```

## 5.7 Deployment to the cloud: AWS Elastic Beanstalk (optional)

- Installing the eb cli
- Running eb locally
- Deploying the model

<img src="../images/AWS_beanstalk.png" alt="AWS beanstalk" style="width:500px;height:auto;">

### Installing the eb cli

On your AWS EC2 instance, inside your conda environment, run:

```bash
pipenv install awsebcli --dev
```

### Activate the virtual environment

```bash
pipenv shell
```

### Initialize the Elastic Beanstalk application

```bash
eb init -p docker -r eu-central-1 churn-serving
```

### Running eb locally

```bash
eb local run --port 9696
```

### Deploying the model

```bash
eb create churn-serving-env
```

Change `predict_test.py` to use the new URL reported by Elastic Beanstalk (the host below is a placeholder):

```python
host = 'churn-serving-env.eba-xxxxxx.eu-west-1.elasticbeanstalk.com'
url = f'http://{host}/predict'
```

### Cleaning up

```bash
eb terminate churn-serving-env
```


> [!WARNING]
>
> Be careful: the deployed service is publicly accessible, so anyone with the URL can send requests to it.

## 5.8 Summary

- Save model with pickle
- Use Flask to turn the model into a web service
- Use a dependency & env manager
- Package it with Docker
- Deploy to the cloud

## 5.9 Explore more

- Flask is not the only framework for creating web services. Try others, e.g. FastAPI.
- Experiment with other ways of managing environments, e.g. virtual env, conda, poetry.
- Explore other ways of deploying web services, e.g. GCP, Azure, Heroku, Python Anywhere, etc.