# Colab Hosting Example v2
Sample Jupyter notebook demonstrating how to host an ML model for demos. Tested on Colab, but reusable anywhere Jupyter notebooks can be run.

Updated January 2022:
- uses the [California housing](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html) dataset instead of the Boston housing dataset due to Boston housing's [deprecation](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html)

## Install dependencies

In [None]:
%pip install numpy scikit-learn flask flask-cors flask-ngrok psutil

## Load and split a dataset

In [None]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(*fetch_california_housing(return_X_y=True), test_size=0.33, random_state=8)
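
As a rough check on the split sizes: the California housing dataset has 20,640 samples, and, assuming scikit-learn rounds a fractional `test_size` up and gives the remainder to the training set, the arithmetic can be reproduced without loading the data. `split_sizes` is a hypothetical helper, not part of scikit-learn.

```python
import math

N_SAMPLES = 20640  # rows in the California housing dataset


def split_sizes(n_samples, test_size):
    """Hypothetical helper: (n_train, n_test) for a fractional test_size."""
    # Assumption: the test share is rounded up, the rest goes to training
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


n_train, n_test = split_sizes(N_SAMPLES, 0.33)
print(n_train, n_test)
```

If the assumption holds, `X_train.shape[0]` and `X_test.shape[0]` should match these two numbers.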

## Train some ML models

In [None]:
# Simple linear regression model
from sklearn.linear_model import LinearRegression

simple_est = LinearRegression()
simple_est.fit(X_train, y_train)
simple_est.score(X_test, y_test)

In [None]:
# Boosting ensemble model
from sklearn.ensemble import GradientBoostingRegressor

boosting_est = GradientBoostingRegressor()
boosting_est.fit(X_train, y_train)
boosting_est.score(X_test, y_test)

In [None]:
# Bagging ensemble model
from sklearn.ensemble import ExtraTreesRegressor

bagging_est = ExtraTreesRegressor()
bagging_est.fit(X_train, y_train)
bagging_est.score(X_test, y_test)
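
For scikit-learn regressors, `score` returns the coefficient of determination R² on the given data: 1.0 is a perfect fit, and 0.0 is what a model that always predicts the mean of `y` would get. A minimal pure-Python sketch of that formula, using made-up numbers rather than the housing data (`r2_score_manual` is a hypothetical name):

```python
def r2_score_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]                    # made-up targets
perfect = r2_score_manual(y_true, y_true)        # predictions equal the targets
baseline = r2_score_manual(y_true, [2.5] * 4)    # always predict the mean
print(perfect, baseline)
```

Comparing the three scores above is therefore a comparison of how much test-set variance each model explains.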

## (Optional) Serialize and save ML models to disk

In [None]:
import pickle

for model in (simple_est, boosting_est, bagging_est):
    filename = type(model).__name__ + '.pickle'
    with open(filename, 'wb') as f:
        pickle.dump(model, f)

## (Optional) Load ML models from disk

In [None]:
import pickle

models = []

# Assuming same naming convention used in the previous code block
for filename in ('LinearRegression.pickle', 'GradientBoostingRegressor.pickle', 'ExtraTreesRegressor.pickle'):
    with open(filename, 'rb') as f:
        models.append(pickle.load(f))

simple_est, boosting_est, bagging_est = models
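
Unpickling can execute code chosen by whoever created the file, so only load pickle files you produced yourself or otherwise trust. Pickled scikit-learn models should also be loaded with the same scikit-learn version that saved them. The sketch below shows the same save/load round trip on a plain dictionary standing in for a fitted model, written to a temporary directory so it does not touch the notebook's files; the dictionary contents are made up.

```python
import os
import pickle
import tempfile

# Made-up stand-in for a fitted model
stand_in = {'name': 'LinearRegression', 'coef': [0.4, -0.1]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, stand_in['name'] + '.pickle')
    with open(path, 'wb') as f:   # binary mode is required for pickle
        pickle.dump(stand_in, f)
    with open(path, 'rb') as f:
        restored = pickle.load(f)

# The restored object is an equal copy, not the same object
print(restored == stand_in, restored is stand_in)
```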

## Delete old `ngrok` processes from previous runs

In [None]:
from psutil import process_iter

for proc in process_iter():
    # The binary is named 'ngrok' on Linux/macOS (e.g. Colab) and 'ngrok.exe' on Windows
    if proc.name() in ('ngrok', 'ngrok.exe'):
        proc.kill()

## Start a Flask server and tunnel to ngrok to expose externally

In [None]:
import numpy as np
from flask import Flask, request
from flask_cors import CORS
from flask_ngrok import run_with_ngrok

app = Flask(__name__)
CORS(app)
run_with_ngrok(app)

# Default to mean value if no param is provided by user
DEFAULT_PARAMS = {
    'MedInc': 3.870671,
    'HouseAge': 28.639486,
    'AveRooms': 5.429000,
    'AveBedrms': 1.096675,
    'Population': 1425.476744,
    'AveOccup': 3.070655,
    'Latitude': 35.631861,
    'Longitude': -119.569704
}

# Response JSON if input parameters are invalid
INVALID_PARAM_RESP = {
    'status': 'fail',
    'data': {
        'parameters': 'One or more of the input parameters was invalid. Make sure all parameters are numbers.'
    }
}

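def fill_params(initial, fill_with):
    # Returns a dict with exactly the keys of fill_with, in the same order
    # (the order the models expect); unknown keys in initial are dropped.
    # Missing or empty values fall back to the default in fill_with.
    out = {}
    for k in fill_with:
        out[k] = initial.get(k) or fill_with[k]
    return out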

### Test fill_params
# test_input = {'AveBedrms': 2, 'Population': 2000}
# test_val = fill_params(test_input, DEFAULT_PARAMS)
# print(test_val)

def predict_price(model, X):
    # Models require data in shape [[x1, x2, x3, ...], [x1, x2, x3, ...], ...]
    # np.fromiter raises ValueError if a value cannot be converted to a float;
    # the endpoints catch it and return INVALID_PARAM_RESP
    X = [np.fromiter(X.values(), dtype=np.float64)]
    if model == 'simple':
        Y = simple_est.predict(X)
    elif model == 'boosting':
        Y = boosting_est.predict(X)
    else:
        Y = bagging_est.predict(X)
    # Models return data in shape [Y1, Y2, Y3, ...]
    return Y[0]

### Test predict_price
# test_val = {'MedInc': 3.870671, 'HouseAge': 28.639486, 'AveRooms': 5.429000, 'AveBedrms': 1.096675, 'Population': 1425.476744, 'AveOccup': 3.070655, 'Latitude': 35.631861, 'Longitude': -119.569704}
# test_price = predict_price('boosting', test_val)
# print(test_price)

@app.route('/api/v1/simple-est')
def simple_est_endpoint():
    resp = None
    params = fill_params(request.args.to_dict(), DEFAULT_PARAMS)
    try:
        resp = {
            'status': 'success',
            'data': {
                'model': 'LinearRegression',
                'parameters': params,
                'price': predict_price('simple', params)
            }
        }
    except ValueError:
        return INVALID_PARAM_RESP, 400
    return resp

@app.route('/api/v1/boosting-est')
def boosting_est_endpoint():
    resp = None
    params = fill_params(request.args.to_dict(), DEFAULT_PARAMS)
    try:
        resp = {
            'status': 'success',
            'data': {
                'model': 'GradientBoostingRegressor',
                'parameters': params,
                'price': predict_price('boosting', params)
            }
        }
    except ValueError:
        return INVALID_PARAM_RESP, 400
    return resp

@app.route('/api/v1/bagging-est')
def bagging_est_endpoint():
    resp = None
    params = fill_params(request.args.to_dict(), DEFAULT_PARAMS)
    try:
        resp = {
            'status': 'success',
            'data': {
                'model': 'ExtraTreesRegressor',
                'parameters': params,
                'price': predict_price('bagging', params)
            }
        }
    except ValueError:
        return INVALID_PARAM_RESP, 400
    return resp

app.run()

## Using the endpoints
If the code block above ran successfully, you should see output that looks like this:
```
* Serving Flask app "__main__" (lazy loading)
* Environment: production
 WARNING: This is a development server. Do not use it in a production deployment.
 Use a production WSGI server instead.
* Debug mode: off
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
* Running on http://xxxxxxxxxxxx.ngrok.io
* Traffic stats available on http://127.0.0.1:4040
```

You can now query the `ngrok` URL like a normal REST API to use the ML models. Examples:

### Use the `bagging` model to predict the median house value of a California block group that averages 8 rooms per household:
```
http://xxxxxxxxxxxx.ngrok.io/api/v1/bagging-est?AveRooms=8
```

### Use the `simple` model to predict the median house value of a California block group with a `MedInc` of 5 and a median house age of 20 years:
```
http://xxxxxxxxxxxx.ngrok.io/api/v1/simple-est?MedInc=5&HouseAge=20
```
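
Query URLs can also be built from Python with the standard library, which takes care of escaping. A minimal sketch: `BASE_URL` is the placeholder from the output above and must be replaced with your own tunnel URL, and `build_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = 'http://xxxxxxxxxxxx.ngrok.io'  # placeholder: use the ngrok URL printed by app.run()


def build_url(endpoint, **params):
    """Hypothetical helper: endpoint path plus URL-encoded query parameters."""
    query = urlencode(params)
    return f'{BASE_URL}/api/v1/{endpoint}' + (f'?{query}' if query else '')


url = build_url('bagging-est', AveRooms=8, HouseAge=20)
print(url)

# To actually call the server (needs a live tunnel):
# import json, urllib.request
# with urllib.request.urlopen(url) as r:
#     print(json.load(r))
```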

## API Reference
See the [scikit-learn documentation](https://scikit-learn.org/stable/datasets/real_world.html#california-housing-dataset) for more information about this dataset.

### Parameters (same for all 3 endpoints)
- `MedInc`: median income in block group
- `HouseAge`: median house age in block group
- `AveRooms`: average number of rooms per household
- `AveBedrms`: average number of bedrooms per household
- `Population`: block group population
- `AveOccup`: average number of household members
- `Latitude`: block group latitude
- `Longitude`: block group longitude

If a parameter is omitted, that feature's mean over the full dataset is used.
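
The fallback can be pictured as a dictionary merge over the eight feature names. The sketch below is a standalone approximation of the notebook's `fill_params`, not a copy: `DEFAULTS` repeats the means from `DEFAULT_PARAMS`, and `fill_and_cast` is a hypothetical variant that also converts values to `float`, whereas the notebook passes query-string values through as strings and converts them at prediction time.

```python
DEFAULTS = {
    'MedInc': 3.870671, 'HouseAge': 28.639486, 'AveRooms': 5.429000,
    'AveBedrms': 1.096675, 'Population': 1425.476744, 'AveOccup': 3.070655,
    'Latitude': 35.631861, 'Longitude': -119.569704,
}


def fill_and_cast(query, defaults=DEFAULTS):
    """Hypothetical variant: keep known keys, fall back to defaults, cast to float."""
    # float() raises ValueError for non-numeric text such as 'NOTANUMBER'
    return {k: float(query.get(k) or defaults[k]) for k in defaults}


# Query-string values arrive as strings; unknown keys are ignored
filled = fill_and_cast({'MedInc': '5', 'HouseAge': '20', 'Unknown': '1'})
print(filled['MedInc'], filled['HouseAge'], filled['AveRooms'])
```

Iterating over the defaults rather than the query keeps the features in the order the models were trained on.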

### Endpoints
- GET /api/v1/simple-est
- GET /api/v1/boosting-est
- GET /api/v1/bagging-est

All endpoints take the same parameters and return responses in the same shape.

### Example Responses

#### GET /api/v1/simple-est?MedInc=5&HouseAge=20
HTTP 200
```json
{
  "data": {
    "model": "LinearRegression",
    "parameters": {
      "AveBedrms": 1.096675,
      "AveOccup": 3.070655,
      "AveRooms": 5.429,
      "HouseAge": "20",
      "Latitude": 35.631861,
      "Longitude": -119.569704,
      "MedInc": "5",
      "Population": 1425.476744
    },
    "price": 2.4686195241736613
  },
  "status": "success"
}
```
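
The dataset's target is the median house value of a block group in units of $100,000, so `price` is in those units too. A short sketch that parses a trimmed copy of the response above and converts the prediction to dollars (`price_in_dollars` is a hypothetical helper):

```python
import json

# Trimmed copy of the example response above
body = '{"data": {"model": "LinearRegression", "price": 2.4686195241736613}, "status": "success"}'


def price_in_dollars(payload):
    """Hypothetical helper: convert the prediction from $100,000 units to dollars."""
    return payload['data']['price'] * 100_000


payload = json.loads(body)
print(round(price_in_dollars(payload), 2))
```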

#### GET /api/v1/simple-est?AveBedrms=NOTANUMBER
HTTP 400
```json
{
  "data": {
    "parameters": "One or more of the input parameters was invalid. Make sure all parameters are numbers."
  },
  "status": "fail"
}
```
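
Both response shapes carry a `status` field, so a client can branch on it before reading `data`. A minimal sketch under that assumption; `unwrap` is a hypothetical helper, and the two payloads are abbreviated copies of the examples above.

```python
import json


def unwrap(body):
    """Hypothetical helper: return the 'data' object, or raise if status is not 'success'."""
    payload = json.loads(body)
    if payload.get('status') != 'success':
        # On failure, 'data' maps the offending field to a human-readable message
        raise ValueError(payload.get('data', {}).get('parameters', 'request failed'))
    return payload['data']


ok_body = '{"data": {"model": "LinearRegression", "price": 2.4686195241736613}, "status": "success"}'
fail_body = ('{"data": {"parameters": "One or more of the input parameters was invalid. '
             'Make sure all parameters are numbers."}, "status": "fail"}')

print(unwrap(ok_body)['price'])
try:
    unwrap(fail_body)
except ValueError as err:
    print('rejected:', err)
```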