# Model Deployment

- Choosing a Model
    - Explore multiple models and compare performance metrics.
    - Consider tradeoffs between model interpretability and performance.
        - For example, are coefficients for the features available?
    - See scikit-learn's "Choosing the right estimator" guide for a map of which algorithms to consider.
    
- Purpose of Deployment
    - Deployment considerations vary widely depending on the scale and usage of the model:
        - Small portfolio project?
            - Could write a blog post instead of a full deployment.
            - Could set up a simple Flask-based API or website on a free-tier service like Heroku.
        - Industry-level deployment?
            - Considerations need to be made across multiple stakeholders (not the data scientist's job to decide).
            
- Performance expectations
    - Set clear expectations on model performance based on the final hold-out set, i.e. data the model never saw during training, cross-validation or tuning.
    - Do NOT set expectations based on the fully trained model: its score on data it was trained on will not be representative of the true performance on unseen data.
    
- Retraining intervals:
    - Depends on the situation, e.g. an influx of new data that warrants retraining.
    - Is performance still good?
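A minimal sketch of the "explore multiple models and compare metrics" step, assuming synthetic stand-in data from `make_regression` rather than the advertising data, and two arbitrarily chosen candidates. The linear model exposes `coef_` (more interpretable), while the random forest does not.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption): 200 rows, 3 features
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=101)

# Candidate models to compare (an arbitrary choice for illustration)
candidates = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=30, random_state=101),
}

cv_mae = {}
for name, estimator in candidates.items():
    # scikit-learn returns negated errors so that "higher is better"
    scores = cross_val_score(estimator, X, y, cv=5,
                             scoring="neg_mean_absolute_error")
    cv_mae[name] = -scores.mean()
    print(f"{name}: mean CV MAE = {cv_mae[name]:.2f}")
```

The same loop could be reused later as a rough retraining check: rerun it when new data arrives and see whether the error has drifted.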

## Model Persistence

Quick review of the lifecycle of creating, training, saving and loading an ML model with sklearn.

In [1]:
import pandas as pd

In [6]:
df = pd.read_csv('DATA/Advertising.csv')
# Expenditure on each media channel and the resulting sales.

In [7]:
df.head()

|   | TV | radio | newspaper | sales |
|---|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 17.2 | 45.9 | 69.3 | 9.3 |
| 3 | 151.5 | 41.3 | 58.5 | 18.5 |
| 4 | 180.8 | 10.8 | 58.4 | 12.9 |


In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   TV         200 non-null    float64
 1   radio      200 non-null    float64
 2   newspaper  200 non-null    float64
 3   sales      200 non-null    float64
dtypes: float64(4)
memory usage: 6.4 KB


In [10]:
df.describe()

|   | TV | radio | newspaper | sales |
|---|---|---|---|---|
| count | 200.0 | 200.0 | 200.0 | 200.0 |
| mean | 147.0425 | 23.264 | 30.554 | 14.0225 |
| std | 85.854236 | 14.846809 | 21.778621 | 5.217457 |
| min | 0.7 | 0.0 | 0.3 | 1.6 |
| 25% | 74.375 | 9.975 | 12.75 | 10.375 |
| 50% | 149.75 | 22.9 | 25.75 | 12.9 |
| 75% | 218.825 | 36.525 | 45.1 | 17.4 |
| max | 296.4 | 49.6 | 114.0 | 27.0 |


In [12]:
X = df.drop('sales',axis=1)

In [13]:
y = df['sales']

In [16]:
# TRAIN / VALIDATION / HOLD-OUT sets
# 70%   / 15%        / 15%

# The first split is 70/30.
# The second split is 50/50 of the remaining 30%.

In [17]:
from sklearn.model_selection import train_test_split

In [19]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

In [20]:
X_validation, X_holdout_test, y_validation, y_holdout_test = train_test_split(X_test, y_test, test_size=0.5, random_state=101)

In [21]:
len(X)

200

In [22]:
len(X_train)

140

In [23]:
len(X_validation)

30

In [24]:
len(X_holdout_test)

30
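The two-step split above can be wrapped in a small helper. This is only a sketch: `three_way_split` is a hypothetical name, and the dummy arrays stand in for `X` and `y`.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def three_way_split(X, y, rest_size=0.3, random_state=101):
    """Split into train / validation / hold-out (70/15/15 by default)."""
    # First split: training set vs. everything else
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=rest_size, random_state=random_state)
    # Second split: halve the remainder into validation and hold-out
    X_val, X_hold, y_val, y_hold = train_test_split(
        X_rest, y_rest, test_size=0.5, random_state=random_state)
    return X_train, X_val, X_hold, y_train, y_val, y_hold


# Dummy data with the same shape as the advertising features (200 x 3)
X_demo = np.arange(600).reshape(200, 3)
y_demo = np.arange(200)

splits = three_way_split(X_demo, y_demo)
sizes = [len(part) for part in splits[:3]]
print(sizes)  # [140, 30, 30]
```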

### Model Training 

In [27]:
from sklearn.ensemble import RandomForestRegressor

In [40]:
model = RandomForestRegressor(n_estimators=30,random_state=101)

In [41]:
model.fit(X_train,y_train)

RandomForestRegressor(n_estimators=30, random_state=101)

In [42]:
from sklearn.metrics import mean_absolute_error, mean_squared_error

In [43]:
validation_predictions = model.predict(X_validation)

In [44]:
validation_predictions

array([14.40333333,  5.47333333,  4.14      , 15.72666667, 11.66666667,
        9.93      , 10.83333333, 11.48      , 18.02      ,  7.60333333,
       10.9       , 21.44333333, 14.08333333,  7.53333333, 11.81333333,
        6.83      , 13.51      , 13.62      , 11.01333333,  7.99666667,
       12.53333333, 21.63      , 19.49      , 15.73      , 16.05666667,
       24.21666667, 20.17666667,  9.50666667, 14.50333333, 19.36333333])

In [45]:
mean_absolute_error(y_validation,validation_predictions)

0.6575555555555552

In [46]:
import numpy as np
np.sqrt(mean_squared_error(y_validation,validation_predictions)) #RMSE

0.8542009478215644

In [38]:
df.describe()['sales']

count    200.000000
mean      14.022500
std        5.217457
min        1.600000
25%       10.375000
50%       12.900000
75%       17.400000
max       27.000000
Name: sales, dtype: float64

Hyperparameter tuning happens here to achieve better validation metrics: going from n_estimators=3 (MAE 0.85, RMSE 1.10) to n_estimators=30 (MAE 0.65, RMSE 0.85).
Better results with more estimators.

Tune all parameters for the best model.

For the purpose of model deployment, let's say n_estimators=30 is our chosen hyperparameter.
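A rough sketch of that tuning loop, assuming synthetic stand-in data and an arbitrary grid of `n_estimators` values. Each candidate is scored on the validation set only, so the hold-out set stays untouched.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption), split 70/15/15 as above
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=101)
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.3, random_state=101)
X_validation, X_holdout, y_validation, y_holdout = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=101)

results = {}
for n in (3, 10, 30, 100):  # illustrative grid, not a recommendation
    model = RandomForestRegressor(n_estimators=n, random_state=101)
    model.fit(X_train, y_train)
    preds = model.predict(X_validation)
    mae = mean_absolute_error(y_validation, preds)
    rmse = np.sqrt(mean_squared_error(y_validation, preds))
    results[n] = (mae, rmse)
    print(f"n_estimators={n}: MAE={mae:.2f}, RMSE={rmse:.2f}")

# Pick the setting with the lowest validation RMSE
best_n = min(results, key=lambda n: results[n][1])
print("best n_estimators:", best_n)
```

For more than one hyperparameter, scikit-learn's `GridSearchCV` does the same job with cross-validation instead of a single validation set.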


## Final Performance Metrics (Hold-out Set)

- How does the model perform on a dataset it has never seen? This tells us how it will perform on unseen data before deployment.

In [47]:
holdout_predictions = model.predict(X_holdout_test)

In [48]:
mean_absolute_error(y_holdout_test,holdout_predictions)

0.5937777777777775

In [49]:
np.sqrt(mean_squared_error(y_holdout_test,holdout_predictions))

0.745323693040418

In [51]:
final_model = RandomForestRegressor(n_estimators=30,random_state=101)

In [52]:
final_model.fit(X, y)  # Fit the model on the entire dataset. Scale the data first if required.

RandomForestRegressor(n_estimators=30, random_state=101)
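If scaling is required, one option is to bundle the scaler and the model in a single scikit-learn pipeline, so both are fitted (and later saved) together. A minimal sketch, assuming synthetic data and an `SVR` model chosen only because it is scale-sensitive (a random forest does not need scaling):

```python
from sklearn.datasets import make_regression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption)
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=101)

# The scaler is fitted on the same data as the model, inside one object
final_pipeline = make_pipeline(StandardScaler(), SVR())
final_pipeline.fit(X, y)

# Raw (unscaled) inputs go in; the pipeline applies the scaling itself
prediction = final_pipeline.predict(X[:1])
print(prediction)
```

Saving `final_pipeline` with `joblib.dump` then persists the fitted scaler along with the model, so there is no separate scaler file to keep in sync.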

In [50]:
import joblib

In [53]:
joblib.dump(final_model, 'final_model.pkl')

['final_model.pkl']

In [54]:
# Save the feature column names as a list
list(X.columns)

['TV', 'radio', 'newspaper']

In [55]:
joblib.dump(list(X.columns),'col_names.pkl')

['col_names.pkl']

## Loading the Model

In [56]:
new_columns = joblib.load('col_names.pkl')

In [57]:
new_columns

['TV', 'radio', 'newspaper']

In [58]:
loaded_model = joblib.load('final_model.pkl')

In [59]:
loaded_model.predict([[230.1,37.8,69.2]])



array([21.99])
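The saved column names can be used to rebuild a DataFrame for new inputs, which keeps the feature order explicit and matches how the model was trained. A self-contained sketch of the save/load round trip, assuming a tiny model trained only on the five rows shown by `df.head()` and a temporary directory instead of the working directory:

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# First five rows of the advertising data, as shown by df.head()
train = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2, 151.5, 180.8],
    "radio": [37.8, 39.3, 45.9, 41.3, 10.8],
    "newspaper": [69.2, 45.1, 69.3, 58.5, 58.4],
    "sales": [22.1, 10.4, 9.3, 18.5, 12.9],
})
X = train.drop("sales", axis=1)
y = train["sales"]

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "final_model.pkl")
    cols_path = os.path.join(tmp, "col_names.pkl")

    # Save the fitted model and its feature names
    fitted = RandomForestRegressor(n_estimators=30, random_state=101).fit(X, y)
    joblib.dump(fitted, model_path)
    joblib.dump(list(X.columns), cols_path)

    # Load both back, as a separate process would
    loaded_model = joblib.load(model_path)
    col_names = joblib.load(cols_path)

# Build the query with named columns instead of a bare nested list
new_campaign = pd.DataFrame([[230.1, 37.8, 69.2]], columns=col_names)
prediction = loaded_model.predict(new_campaign)
print(prediction)
```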

# Model Deployment (API)

Deployment of the model as an API (using Flask).

API (application programming interface): serves as an interface for GET and POST requests.

The goal here is to let the sklearn model be 'served' as an API that can receive and return information.

We wrap our machine learning model in an API route created with Flask.

- Wrap a prediction function in a route.

- For the sake of the course, we host it locally.

JSON data file ----> POST request ----> Flask API, which predicts using our model ----> GET request to obtain JSON with predicted values (using Postman to do this locally)


#### Model API Deployment Steps:
- Install Flask
- Create a simple Flask app for the API
- Connect the ML model to the Flask API
- Install Postman
- Test the API through Postman
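A minimal sketch of what such a Flask app could look like. The `/predict` route name, the JSON layout (a list of records keyed by column name) and the tiny stand-in model are all assumptions for illustration. In practice the model and column names would come from `joblib.load('final_model.pkl')` and `joblib.load('col_names.pkl')`.

```python
import pandas as pd
from flask import Flask, jsonify, request
from sklearn.ensemble import RandomForestRegressor

# Stand-in for the saved artifacts (assumption): a small model trained
# on the five rows shown by df.head()
col_names = ["TV", "radio", "newspaper"]
train = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2, 151.5, 180.8],
    "radio": [37.8, 39.3, 45.9, 41.3, 10.8],
    "newspaper": [69.2, 45.1, 69.3, 58.5, 58.4],
    "sales": [22.1, 10.4, 9.3, 18.5, 12.9],
})
model = RandomForestRegressor(n_estimators=30, random_state=101)
model.fit(train[col_names], train["sales"])

app = Flask(__name__)


@app.route("/predict", methods=["POST"])  # hypothetical route name
def predict():
    # Expect a JSON list of records, e.g. [{"TV": ..., "radio": ..., ...}]
    records = request.get_json()
    # reindex enforces the column order the model was trained with
    query = pd.DataFrame(records).reindex(columns=col_names)
    predictions = model.predict(query)
    return jsonify({"prediction": predictions.tolist()})


# Exercise the route without starting a server; calling app.run() from a
# .py file would serve it locally for Postman instead
with app.test_client() as client:
    response = client.post(
        "/predict",
        json=[{"TV": 230.1, "radio": 37.8, "newspaper": 69.2}],
    )
    payload = response.get_json()
    print(response.status_code, payload)
```

In this sketch the predictions come back in the response to the POST request itself, which is a simplification of the flow described above.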

In [64]:
# pip install Flask (Flask fights with Jupyter: run the API code from a .py file in your chosen editor)