# Machine Learning with Python - Containerizing a Model
Moving to production

### Docker, Flask, and scikit-learn
Together, these three tools cover most of what we need to move a model to production.

![](app/docker-flask.jpg)

* Docker packages everything up as a **microservice**
* Flask is a simple Python **web server**, so we can incorporate our Python objects easily
* scikit-learn models can be saved, or **persisted**, to disk

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

In [2]:
# read in our Titanic data
df_og = pd.read_csv('data/train.csv')

# separate the features from the target and drop the non-numeric columns for this example;
# the train/test split happens below
X, y = df_og.drop(columns=['PassengerId','Name','Ticket','Cabin','Embarked','Survived']), df_og['Survived']
# encode Sex as 0/1 and fill missing values with 0
X = X.replace({'male': 0, 'female': 1}).fillna(0)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

y_train = y_train.astype(int)
y_test = y_test.astype(int)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

print('size of X_train') 
print(X_train.shape)
print('size of X_test')
print(X_test.shape)
print('size of y_train') 
print(y_train.shape)
print('size of y_test')
print(y_test.shape)

size of X_train
(596, 6)
size of X_test
(295, 6)
size of y_train
(596,)
size of y_test
(295,)


In [3]:
rf = RandomForestClassifier(n_estimators=500)

rf.fit(X_train, y_train)

y_pred = rf.predict(X_train)

print('train acc:', accuracy_score(y_train, y_pred))

y_pred = rf.predict(X_test)

print('test acc:', accuracy_score(y_test, y_pred))

train acc: 0.9798657718120806
test acc: 0.8101694915254237


## Persist the Model

In [4]:
from joblib import dump, load
dump(rf, 'app/model.pkl')  # pickle the fitted model to disk

['app/model.pkl']
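
Before shipping the file, it can be worth checking that the persisted model loads back and predicts the same way. The sketch below is an illustration only: it uses synthetic data from `make_classification` as a stand-in for the Titanic features and writes to a temporary directory instead of `app/model.pkl`, both of which are assumptions made so the example runs on its own.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the six numeric Titanic features (assumption).
X, y = make_classification(n_samples=200, n_features=6, random_state=42)
rf = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

# Persist to a temporary file, then load it back as the Flask app would.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    dump(rf, path)
    restored = load(path)

# The restored model should give the same predictions as the original.
same = np.array_equal(rf.predict(X), restored.predict(X))
print("predictions match after reload:", same)
```

Pickle files are tied to the library versions that wrote them, so the container should generally install the same scikit-learn version that was used for training.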

In [5]:
X_train[0]

array([-1.62580285, -0.72677722,  1.73407952, -0.46983664, -0.46399264,
        0.38784185])
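
Note that this row is already standardized, and only `rf` was persisted, not the fitted `StandardScaler`. Any client of the model would therefore have to send scaled values. One possible way around this, sketched here under assumptions and not what this notebook does, is to wrap the scaler and the classifier in a scikit-learn pipeline so that both are fitted and persisted as one object. The synthetic data and the name `model` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, deliberately unscaled stand-in for the raw features (assumption).
X_raw, y = make_classification(n_samples=300, n_features=6, random_state=42)
X_raw = X_raw * 10 + 5

X_train, X_test, y_train, y_test = train_test_split(
    X_raw, y, test_size=0.33, random_state=42
)

# The pipeline learns the scaling on the training data and reapplies it
# automatically at prediction time.
model = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=100, random_state=42),
)
model.fit(X_train, y_train)

# Raw, unscaled rows go straight in; no separate sc.transform call is needed.
preds = model.predict(X_test)
print(preds[:5])
```

A pipeline fitted this way can be passed to `dump` in place of `rf`, and the API could then accept unscaled feature values.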

A few steps remain. They follow guidance from this older [blog post](https://towardsdatascience.com/a-flask-api-for-serving-scikit-learn-models-c8bcdaa41daa), with some help from ChatGPT.

* create a Flask app
* import this model
* create an endpoint, or **route**, as an API (application programming interface) to pass the data
* package everything up into a Docker container using **`docker build`**, then test it with **`docker run`**
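
The first three steps might look roughly like the following minimal Flask app. This is a sketch, not the app used here: the `create_app` factory, the `FEATURES` tuple, the `main` helper, the `model.pkl` path, and the `{"prediction": [...]}` response shape are all assumptions. The `/predict` route, port 8080, and the `f1` to `f6` field names are taken from the curl test at the end of this section.

```python
import joblib
import numpy as np
from flask import Flask, jsonify, request

# Field names assumed from the curl example; order must match the training columns.
FEATURES = ("f1", "f2", "f3", "f4", "f5", "f6")


def create_app(model):
    """Build a Flask app that serves predictions from a fitted model."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        records = request.get_json(silent=True)
        if not isinstance(records, list) or not records:
            return jsonify(error="expected a non-empty JSON list of records"), 400
        try:
            # Inputs are assumed to be standardized already, like X_train.
            rows = [[float(r[name]) for name in FEATURES] for r in records]
        except (KeyError, TypeError, ValueError):
            return jsonify(error="each record needs numeric f1..f6 fields"), 400
        preds = model.predict(np.array(rows, dtype=float))
        return jsonify(prediction=[int(p) for p in preds])

    return app


def main():
    # Hypothetical entry point: load the persisted model and listen on all
    # interfaces so the port can be published from inside a container.
    app = create_app(joblib.load("model.pkl"))
    app.run(host="0.0.0.0", port=8080)
```

Passing the model into `create_app` keeps the route easy to test with Flask's built-in test client and a stub model; calling `main()` starts the development server.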

A few considerations matter for the stability of this program, even though we haven't discussed them in detail:

* inputs to the API need to be clean and correctly formatted
* the process of preparing data is called a **data pipeline**
* the API should have a **contract** with the rest of the microservices
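
As a rough illustration of what such a contract could mean in practice, the sketch below checks an incoming payload before it reaches the model. The function name `validate_records` and the specific rules (exactly the fields `f1` to `f6`, all finite numbers) are assumptions chosen to match the curl example, not a definition of this API.

```python
import math

# Expected field names, assumed from the curl example in this section.
FEATURES = ("f1", "f2", "f3", "f4", "f5", "f6")


def validate_records(payload):
    """Return rows of floats in FEATURES order, or raise ValueError."""
    if not isinstance(payload, list) or not payload:
        raise ValueError("payload must be a non-empty list of records")

    rows = []
    for i, record in enumerate(payload):
        if not isinstance(record, dict):
            raise ValueError(f"record {i} must be a JSON object")
        missing = [k for k in FEATURES if k not in record]
        extra = [k for k in record if k not in FEATURES]
        if missing or extra:
            raise ValueError(f"record {i}: missing={missing}, unexpected={extra}")

        row = []
        for name in FEATURES:
            value = record[name]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"record {i}: {name} must be a number")
            try:
                number = float(value)
            except OverflowError:
                raise ValueError(f"record {i}: {name} is out of range") from None
            if not math.isfinite(number):
                raise ValueError(f"record {i}: {name} must be finite")
            row.append(number)
        rows.append(row)
    return rows
```

Rejecting bad input with a clear message at the edge of the service tends to be easier to debug than letting a malformed row fail somewhere inside the model.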

### Testing our API
We can use **curl** to test the API locally.

`curl -X POST 127.0.0.1:8080/predict -H 'Content-Type: application/json' -d '[{"f1":0.80576177,"f2":1.37593746,"f3":-0.09609774,"f4":-0.46983664,"f5":-0.46399264,"f6":-0.41596074}]'`
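
The same request can also be sent from Python, which is handy inside a notebook or a test script. This is a minimal sketch using only the standard library; the helper names `build_predict_request` and `predict` are hypothetical, and the response is assumed to be JSON.

```python
import json
from urllib import request

DEFAULT_URL = "http://127.0.0.1:8080/predict"  # same address as the curl call


def build_predict_request(records, url=DEFAULT_URL):
    """Build a POST request carrying the records as a JSON body."""
    body = json.dumps(records).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(records, url=DEFAULT_URL, timeout=5.0):
    """Send the records to the running API and return the decoded JSON reply."""
    req = build_predict_request(records, url)
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container running, `predict([{"f1": 0.80576177, "f2": 1.37593746, "f3": -0.09609774, "f4": -0.46983664, "f5": -0.46399264, "f6": -0.41596074}])` sends the same payload as the curl command.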