# Deployment

We take the model from Session_4:

In [18]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

In [19]:
df = pd.read_csv('churn_data.csv')

df.columns = df.columns.str.lower().str.replace(' ', '_')

categorical_columns = list(df.dtypes[df.dtypes == 'object'].index)

for c in categorical_columns:
    df[c] = df[c].str.lower().str.replace(' ', '_')

df.totalcharges = pd.to_numeric(df.totalcharges, errors='coerce')
df.totalcharges = df.totalcharges.fillna(0)

df.churn = (df.churn == 'yes').astype(int)


df_full_train, df_test = train_test_split(df, test_size=0.2, random_state=1)
y_test = df_test['churn'].values

numerical = ['tenure', 'monthlycharges', 'totalcharges']

categorical = [
    'gender',
    'seniorcitizen',
    'partner',
    'dependents',
    'phoneservice',
    'multiplelines',
    'internetservice',
    'onlinesecurity',
    'onlinebackup',
    'deviceprotection',
    'techsupport',
    'streamingtv',
    'streamingmovies',
    'contract',
    'paperlessbilling',
    'paymentmethod',
]


def train(df_train, y_train, C=1.0):
    dicts = df_train[categorical + numerical].to_dict(orient='records')

    dv = DictVectorizer(sparse=False)
    X_train = dv.fit_transform(dicts)

    model = LogisticRegression(C=C, max_iter=1000)
    model.fit(X_train, y_train)
    
    return dv, model

def predict(df, dv, model):
    dicts = df[categorical + numerical].to_dict(orient='records')

    X = dv.transform(dicts)
    y_pred = model.predict_proba(X)[:, 1]

    return y_pred

C = 1.0
n_splits = 5

kfold = KFold(n_splits=n_splits, shuffle=True, random_state=1)

scores = []

for train_idx, val_idx in kfold.split(df_full_train):
    df_train = df_full_train.iloc[train_idx]
    df_val = df_full_train.iloc[val_idx]

    y_train = df_train.churn.values
    y_val = df_val.churn.values

    dv, model = train(df_train, y_train, C=C)
    y_pred = predict(df_val, dv, model)

    auc = roc_auc_score(y_val, y_pred)
    scores.append(auc)

print('C=%s %.3f +- %.3f' % (C, np.mean(scores), np.std(scores)))

dv, model = train(df_full_train, df_full_train.churn.values, C=1.0)
y_pred = predict(df_test, dv, model)

auc = roc_auc_score(y_test, y_pred)
print(f'auc_score = {auc}')

C=1.0 0.840 +- 0.008
auc_score = 0.8572386167896259


### Save model

![](./pic/1.png)

- we have a model in the notebook
- we want to save it to a file
- then we want to load it into a web service
- then we want to use the model in that web service, which will interact with other services

![](./pic/2.png)

- we have a model
- we put it into a web service (for example with **Flask**, a framework for creating web services in Python)
- we isolate the Python dependencies of that service so that they do not interfere with other services on our machine (**Pipenv**)
- we isolate the system dependencies (**Docker**)
- we deploy it to the cloud (for example AWS Elastic Beanstalk)

# Save Model

In [20]:
# we will use pickle to save model as python object:
import pickle
output_file = f'model_C={C}.bin'
output_file

'model_C=1.0.bin'

In [21]:
# we want to write (w) into a binary (b) file;
# note: pass the variable output_file, not the string 'output_file'
f_out = open(output_file, 'wb')
pickle.dump(model, f_out)
f_out.close()

So far we have saved only the model. However, the predict function also needs the DictVectorizer; without it we would not be able to transform the data coming from users in the same way as during training.

In [22]:
# that is why we will write Tuple (dv, model):
f_out = open(output_file, 'wb')
pickle.dump((dv, model), f_out) # dv added
f_out.close()

In [23]:
# it is easy to forget to close the file, so it is better to use with:
with open(output_file, 'wb') as f_out:
    pickle.dump((dv, model), f_out)

# Load model

In [24]:
import pickle

In [25]:
input_file = 'model_C=1.0.bin'

In [26]:
with open(input_file, 'rb') as f_in:
    dv, model = pickle.load(f_in)

In [27]:
model

In [28]:
#new customer:
customer = {
    'gender': 'female',
    'seniorcitizen': 0,
    'partner': 'yes',
    'dependents': 'no',
    'phoneservice': 'no',
    'multiplelines': 'no_phone_service',
    'internetservice': 'dsl',
    'onlinesecurity': 'no',
    'onlinebackup': 'yes',
    'deviceprotection': 'no',
    'techsupport': 'no',
    'streamingtv': 'no',
    'streamingmovies': 'no',
    'contract': 'month-to-month',
    'paperlessbilling': 'yes',
    'paymentmethod': 'electronic_check',
    'tenure': 1,
    'monthlycharges': 29.85,
    'totalcharges': 29.85
}

In [29]:
X = dv.transform([customer])

model.predict_proba(X)[0, 1]

0.6363584152721566

So we can put this pipeline into .py scripts (train.py and predict.py)  
and use them later
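As a rough illustration, the prediction side of such a script could be organised as below. The function names `load_artifacts` and `predict_single` are made up for this sketch and do not come from the notebook; the only assumption about the file is that it holds the `(dv, model)` tuple saved above.

```python
import pickle


def load_artifacts(path):
    """Load the (dv, model) tuple that was pickled at training time."""
    with open(path, 'rb') as f_in:
        dv, model = pickle.load(f_in)
    return dv, model


def predict_single(customer, dv, model):
    """Return the churn probability for one customer given as a dict."""
    X = dv.transform([customer])
    # column 1 holds the probability of the positive class (churn)
    return float(model.predict_proba(X)[0][1])
```

With the file from the previous cells this would be used as `dv, model = load_artifacts('model_C=1.0.bin')` followed by `predict_single(customer, dv, model)`.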

# Web services

We want to put our model into some kind of Churn Service (a web service).

- Web service - a way for 2 devices to communicate over a network

We want to use our predict script as a web service, i.e. create an API for our model.

![](./pic/3.png)
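The notebook does not show the service code itself, so the following is only a minimal sketch of what a Flask-based predict.py could look like. The stub classes stand in for the pickled `(dv, model)` pair so that the example runs without the model file, and the 0.5 decision threshold is an assumption.

```python
from flask import Flask, jsonify, request


class StubVectorizer:
    """Stand-in for the DictVectorizer loaded from the pickle file."""

    def transform(self, records):
        return [[r.get('tenure', 0)] for r in records]


class StubModel:
    """Stand-in for the trained LogisticRegression."""

    def predict_proba(self, X):
        return [[0.4, 0.6] for _ in X]


# in a real script these would come from pickle.load(...)
dv, model = StubVectorizer(), StubModel()

app = Flask('churn')


@app.route('/predict', methods=['POST'])  # only POST is allowed; GET returns 405
def predict():
    customer = request.get_json()

    X = dv.transform([customer])
    y_pred = model.predict_proba(X)[0][1]
    churn = y_pred >= 0.5  # assumed decision threshold

    # plain Python types so that the result can be turned into JSON
    result = {
        'churn_probability': float(y_pred),
        'churn': bool(churn),
    }
    return jsonify(result)
```

For local development such a script is typically started by calling `app.run(debug=True, host='0.0.0.0', port=9696)` under an `if __name__ == '__main__':` guard.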

When our service is running, we can pretend to be the Marketing Service sending a request to our Churn Service.

When we try to access our predict service from a browser, we get the error "Method Not Allowed": the browser sends a GET request, but the only method we implemented is POST.

Therefore we need to send a POST request using the Python library "requests":

In [30]:
# requests
import requests
# url is the one that we used creating Predict Web-service
url = 'http://localhost:9696/predict'
# customer in json format:
customer = {
    "gender": "female",
    "seniorcitizen": 0,
    "partner": "yes",
    "dependents": "no",
    "phoneservice": "no",
    "multiplelines": "no_phone_service",
    "internetservice": "dsl",
    "onlinesecurity": "no",
    "onlinebackup": "yes",
    "deviceprotection": "no",
    "techsupport": "no",
    "streamingtv": "no",
    "streamingmovies": "no",
    "contract": "month-to-month",
    "paperlessbilling": "yes",
    "paymentmethod": "electronic_check",
    "tenure": 1,
    "monthlycharges": 29.85,
    "totalcharges": 29.85
}

# making POST request to the Predict Web Service:
requests.post(url, json=customer)

<Response [200]>

On the first attempt we got error 500; in the terminal we can see:  
"TypeError: Object of type bool_ is not JSON serializable"

"bool_" is a NumPy type: it comes from comparing the model's NumPy output with the threshold. To fix this, we explicitly convert the churn decision to a Python bool with bool() before returning it.

In [31]:
requests.post(url, json=customer)

<Response [200]>

now the response is 200, which means OK
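The serialization error and its fix can be reproduced outside the service. This is a small sketch: the probability is the value from the example above, and the 0.5 threshold is an assumption.

```python
import json

import numpy as np

y_pred = np.float64(0.6363584152721566)  # probability as returned by the model
churn = y_pred >= 0.5  # a NumPy boolean, not a Python bool (0.5 is assumed)

error = None
try:
    json.dumps({'churn': churn})
except TypeError as exc:
    error = exc  # json does not know how to serialize the NumPy boolean

# converting to plain Python types makes the result serializable
result = {'churn': bool(churn), 'churn_probability': float(y_pred)}
encoded = json.dumps(result)
print(encoded)
```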

In [32]:
# get response content:
response = requests.post(url, json=customer).json()
response

{'churn': True, 'churn_probability': 0.6363584152721566}

In [33]:
# Marketing service app:
if response['churn']:
    print('sending promo email to customer: 999')

sending promo email to customer: 999


WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.

The reason for this warning is that we are using Flask's built-in development server.  
To get rid of it, we need to run the app with a WSGI server instead.
- WSGI - **Web Server Gateway Interface**. It is a specification that describes how a web server communicates with web applications, and how web applications can be chained together to process one request.
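To make the specification more concrete, here is a minimal WSGI application written with the standard library only. It is an illustration of the interface, not the churn service: the application is just a callable that receives the request environment and a `start_response` callback. A Flask `app` object exposes the same callable interface, which is why a WSGI server can be pointed at it.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A WSGI application: takes the request environment and a
    start_response callback, returns an iterable of bytes."""
    body = b'pong'
    headers = [
        ('Content-Type', 'text/plain'),
        ('Content-Length', str(len(body))),
    ]
    start_response('200 OK', headers)
    return [body]


# Call the application the way a WSGI server would, without opening a socket.
environ = {}
setup_testing_defaults(environ)
captured = {}


def start_response(status, headers):
    captured['status'] = status
    captured['headers'] = headers


response_body = b''.join(app(environ, start_response))
print(captured['status'], response_body)
```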



We will use **gunicorn** (*pip install gunicorn*).

Then we need to tell gunicorn where our Flask app is:  
- Linux: *gunicorn --bind 0.0.0.0:9696 predict:app*

- Windows:
    - gunicorn does not run on Windows (the module fcntl is not available there), so we can either install an Ubuntu terminal or use "waitress":  
    - *pip install waitress*
    - *waitress-serve --listen=0.0.0.0:9696 predict:app*

In [35]:
# get response content:
response = requests.post(url, json=customer).json()
response

{'churn': True, 'churn_probability': 0.6363584152721566}

# Python virtual environment: Pipenv

If one project uses one version of sklearn and another project uses a different version, they need to be able to live on the same computer.

## Intro

When we want to install a library (for example scikit-learn), we type: *pip install scikit-learn*

At that point the shell looks through the directories listed in $PATH to find which **pip** we mean. Once it finds the pip executable, it runs it, and pip asks pypi.org for the latest (or a specific) version of the library.

Now let us say that we need scikit-learn 0.24.2 for our churn prediction project, while some other project (for example a Lead Scoring Service) requires a different version of scikit-learn.
- Lead Scoring - estimating how likely a potential client is to become our client in the future


Therefore, if we had these 2 projects on the same computer, a change in a later version of the library could break the 1st project,  
so we need to find a way to keep them separated.

the solution is to use different virtual environments

![](./pic/4.png)

So these projects use different Pythons, each with its own packages.
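One way to see which Python a project is actually using is to ask the interpreter itself. The helper name `in_virtualenv` is made up for this sketch, and the check applies to venv-style environments (conda environments are organised differently).

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv-style virtual environment."""
    # In a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix points at the Python it was created from.
    return sys.prefix != sys.base_prefix


print('interpreter:', sys.executable)
print('environment:', sys.prefix)
print('inside a virtual environment:', in_virtualenv())
```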

### Virtual environments

There are multiple ways of managing virtual environments:
- venv - the classical module built into Python
- conda - what Anaconda uses
- pipenv - the one we will use; there is no big difference between these tools, and we use it because it is the package manager officially recommended by the Python community
- poetry

We will use the *pipenv* package manager, however they are all pretty similar
- *pip install pipenv*
- then, if we want to install some package, we need to use pipenv:  
    - *pipenv install pandas*
- we can specify exact versions of the required packages:
    - *pipenv install numpy scikit-learn==0.24.2 flask*
- after that a **Pipfile** is created with the packages specified inside
- so when our colleagues want to work on this project, they can simply run *pipenv install* and all required packages will be installed
- also Pipfile.lock is created, which "locks" the exact versions of the packages and of all their dependencies, to make sure they will be the same later as well
- *pipenv install* will install exactly those versions


#### Run our service:
- get into this environment (with the dependencies of this specific project)
    - *pipenv shell* - launches a subshell in the virtual environment
    - it shows which folder is used for storing this virtual environment
    - with *which gunicorn* we can see that the path to gunicorn is different and much longer, since the virtual environment puts its own directory in front of the original $PATH to isolate our project
    - therefore we can use gunicorn from this virtual environment as usual: *gunicorn --bind 0.0.....*
    - and then we can communicate with our web service as usual
    - to exit the virtual environment: *CTRL+D*; after that *which gunicorn* shows the usual path again

- to enter again: *pipenv shell*
    - gunicorn --bind ...
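The effect of putting the environment's directory in front of $PATH can be simulated with two temporary directories. This is only an illustration for Linux or macOS: the directories and the empty "gunicorn" files are fakes created by the sketch, and `shutil.which` plays the role of the shell's lookup.

```python
import os
import shutil
import stat
import tempfile


def make_fake_tool(directory, name='gunicorn'):
    """Create a tiny executable file so that shutil.which can find it."""
    path = os.path.join(directory, name)
    with open(path, 'w') as f:
        f.write('#!/bin/sh\n')
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


with tempfile.TemporaryDirectory() as system_bin, \
        tempfile.TemporaryDirectory() as venv_bin:
    system_tool = make_fake_tool(system_bin)
    venv_tool = make_fake_tool(venv_bin)

    # Outside the environment only the "system" directory is searched.
    found_outside = shutil.which('gunicorn', path=system_bin)

    # Entering the environment puts its directory in front of the old PATH,
    # so the first match now comes from the environment.
    activated_path = venv_bin + os.pathsep + system_bin
    found_inside = shutil.which('gunicorn', path=activated_path)

    print(found_outside == system_tool, found_inside == venv_tool)
```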

! However, if the project needs a specific version of a system library (not a Python dependency), a virtual environment cannot solve this issue  
- for example something installed with apt-get install "..."
- for that we need one more level of isolation (Docker)

# Environment Management: Docker  
- why we need Docker
- running a Python image with docker
- dockerfile
- building a docker image
- running a docker image

Docker goes one step further than virtual environments and lets us isolate the entire application from the rest of the system.

Considering 2 different services, we can put each of them into an isolated container (a kind of special box on the computer in which the service behaves as if it were the only thing running).

![](./pic/5.png)

### Docker image with python only

- go to Docker Hub and find the python docker image
- we can choose from different versions: let's take *3.10.7-slim*
- *docker run -it --rm python:3.10.7-slim* (-it - open an interactive terminal, --rm - remove the container after exit, since we do not want to keep it on the machine)
- since we used -it, we get into a Python terminal, which runs in a separate container and which we can interact with

We can override the entry point (the default command that is executed by docker run) with *--entrypoint*:
- since we used the python image, by default we get into the Python terminal, but now:  
    - *docker run -it --rm --entrypoint=bash python:3.10.7-slim* and we are in a Linux bash terminal and can use apt-get install wget, for example
    - what we do here does not affect the host system outside
        - for example, we can create a folder with mkdir and it will not affect the outside system
        - we can use pip to install libraries etc

**Everything we do inside Docker we can state in a Dockerfile**

- FROM python:3.8.12-slim (the base docker image)

- RUN pip install pipenv (since all our dependencies are managed with pipenv)
- WORKDIR /app (creates a new folder and goes there)
- COPY ["Pipfile", "Pipfile.lock", "./"] (copies the Pipfile and Pipfile.lock to the working directory)
- RUN pipenv install --system --deploy (by default *pipenv install* creates a virtual environment; we do not need one inside Docker, since the container is already isolated, so we use *--system --deploy* to install the locked packages into the system Python)

- *docker build -t zoomcamp-test .*
- then docker runs all the commands,
- when it has finished building, we can run this image:  
    - *docker run -it --rm --entrypoint=bash zoomcamp-test*
- we also need to copy the prediction script and the model file in the Dockerfile (COPY ["predict.py", "model_C=1.0.bin", "./"])

After building the image and entering it, we can run gunicorn --bind ...   
However, we cannot reach that web service yet, since it is inside the container  
- we need to expose the port of the Docker container to allow Test.py to access it (PORT MAPPING)  
![](./pic/6.png)

- we can do EXPOSE in the Dockerfile: *EXPOSE 9696*
- then we want to run our model web service: *ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:9696", "predict:app"]* (gunicorn expects module:app, not the file name predict.py)
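Put together, the instructions listed above would give a Dockerfile along these lines. This is a sketch assembled from those steps; it assumes that gunicorn is listed in the Pipfile, since otherwise the entry point would not be available inside the image.

```dockerfile
FROM python:3.8.12-slim

RUN pip install pipenv

WORKDIR /app

COPY ["Pipfile", "Pipfile.lock", "./"]

# install the locked dependencies into the system Python of the image
RUN pipenv install --system --deploy

COPY ["predict.py", "model_C=1.0.bin", "./"]

EXPOSE 9696

ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:9696", "predict:app"]
```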

- After EXPOSE we also need to do the port MAPPING:
    - *docker run -it --rm -p 9696:9696 zoomcamp-test* (-p means port: we say that port 9696 of the host is mapped to port 9696 of the container)

After that we can successfully interact with the web service that runs inside the docker container.

! Now that our web service is inside a container, it no longer has to be hosted on our machine: it can live in the **Cloud**

# Deployment to the cloud: AWS Elastic Beanstalk

1. Install the eb cli
2. Running eb locally
3. Deploy the model

- So there is a cloud, which is running our docker container
- We have the marketing service, which communicates with the churn service in the cloud
- There could be many different services requesting data from the churn service; the cloud can automatically detect the increased load and scale the service up (run more copies of it)
- If traffic is low, it can scale down automatically  
![](./pic/7.png)

- We need to install the AWS Elastic Beanstalk CLI only for the current project (churn prediction service)
    - therefore, we can use pipenv
    - this is a dev dependency (something that we need only for deployment), so it is not something we want to have inside the container: we install it with *--dev*  

*pipenv install awsebcli --dev*

Then we go into the virtual environment (*pipenv shell*)  
- and there we can use *eb*

- eb init -p docker -r eu-west-1 churn-serving (-p - platform, -r - region, churn-serving - project name)
- then the project is created: an /elastic... folder with config.yaml containing the project information
- eb local run --port 9696 - before we deploy, we can test it locally
- eb create churn-serving-env

After that we need to change the url from localhost to the host specified by AWS.  
It is also important to mention that this Elastic Beanstalk service is open to the world, so we should be careful with it
- we probably want to make sure that only computers from a specific network are able to access our service
- for a pet project it is OK, but we should not forget to shut it down
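To avoid editing the client each time the host changes, the url can be built from a setting. This is a small sketch: the function name `predict_url` and the environment variable `CHURN_SERVICE_HOST` are hypothetical names chosen for illustration, and the real host is whatever AWS reports after *eb create*.

```python
import os


def predict_url(host=None):
    """Build the /predict url, defaulting to the local development service."""
    # CHURN_SERVICE_HOST is a hypothetical variable name used for illustration
    host = host or os.environ.get('CHURN_SERVICE_HOST', 'localhost:9696')
    return f'http://{host}/predict'


print(predict_url('localhost:9696'))
```

The result can then be passed to *requests.post(url, json=customer)* exactly as before.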
