# CPSC 330 Lecture 24

Outline:

- 👋
- **Turn on recording**
- Announcements + survey (5 min)
- Model deployment (30 min)
- Instructor/TA evaluations + Break (15 min)
- Review / conclusion (30 min)

## Learning objectives

- TODO

```python
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split, cross_validate
```

## Announcements + survey (5 min)

- Last lecture today!
- Learning objectives should now be posted for all lectures.
- Extremely short survey on CPSC 330 vs. 340: https://ubc.ca1.qualtrics.com/jfe/form/SV_2ayfs2EcNUJdYKV
- We will take time later for the formal course evaluations.

## Model deployment (30 min)

#### Attribution

This material on model deployment was created by [Tomas Beuzen](https://www.tomasbeuzen.com/). The original version is [here](https://github.com/TomasBeuzen/machine-learning-tutorials/blob/master/ml-deploy-model/deploy-with-flask.ipynb).

#### What is deployment?

- After we train a model, we want to use it!
- The user likely does not want to install your Python stack or train your model.
- You don't necessarily want to share your dataset.
- So we need to do two things:

1. Save/store your model for later use.
2. Make the saved model conveniently accessible.

We will use `joblib` for (1) and Flask for (2).

#### Requirements (I already did these)

- Heroku account. Register [here](https://www.heroku.com/).
- Heroku CLI. Download [here](https://devcenter.heroku.com/categories/command-line).

Additional Python packages (not in the course environment):

```
pip install Flask
pip install Flask-WTF
pip install joblib
```

#### Preparing the model we wish to deploy

We'll train a regression model on the classic abalone dataset hosted [here](https://archive.ics.uci.edu/ml/datasets/abalone). The goal is to predict an abalone's age, measured by its number of shell rings (`rings`), from four physical measurements. We renamed `abalone.data` to `abalone.csv` after downloading.

```python
abalone_df = pd.read_csv('data/abalone.csv',
                         names=['sex', 'length', 'diameter', 'height',
                                'whole_weight', 'shucked_weight', 'viscera_weight',
                                'shell_weight', 'rings'])
```

For simplicity, we only use 4 features:

```python
features = ['length', 'diameter', 'height', 'whole_weight']

X = abalone_df[features]
y = abalone_df['rings']
```

```python
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=123)
```

Build and score the model:

```python
model = RandomForestRegressor(n_estimators=10, random_state=123)
model.fit(X_train, y_train);
```

```python
model.score(X_train, y_train)
```

```
0.8716166306386149
```

```python
model.score(X_test, y_test)
```

```
0.27082133598283975
```
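The gap between the training R² (0.87) and the test R² (0.27) suggests the forest is overfitting. The notebook imports `cross_validate` but never uses it; below is a minimal sketch of how one might compare the forest against a `DummyRegressor` baseline with cross-validation on the training split only. The helper name `compare_to_baseline` and the use of mean absolute error as the metric are assumptions for illustration.

```python
import pandas as pd
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_validate


def compare_to_baseline(X, y, cv=5, random_state=123):
    """Hypothetical helper: cross-validated MAE for a baseline and a random forest."""
    models = {
        # Always predicts the mean of the training targets
        "dummy": DummyRegressor(),
        # Same configuration as the model in this notebook
        "random_forest": RandomForestRegressor(n_estimators=10, random_state=random_state),
    }
    results = {}
    for name, estimator in models.items():
        scores = cross_validate(
            estimator, X, y, cv=cv,
            scoring="neg_mean_absolute_error",  # sklearn maximizes scores, so MAE is negated
            return_train_score=True,
        )
        results[name] = {
            "train_MAE": -scores["train_score"].mean(),
            "valid_MAE": -scores["test_score"].mean(),
        }
    # One row per model, one column per metric
    return pd.DataFrame(results).T
```

Calling `compare_to_baseline(X_train, y_train)` keeps the test set locked away, and a large gap between `train_MAE` and `valid_MAE` for the forest would point to the same overfitting seen above.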

Re-fit the model on the full dataset to get it ready for deployment. 

```python
model = RandomForestRegressor(n_estimators=10, random_state=123).fit(X, y)
```

Save the trained model using `joblib`. Each app loads this file when it starts:

```python
with open('web_api/abalone_predictor.joblib', 'wb') as f:
    joblib.dump(model, f)
with open('web_application/abalone_predictor.joblib', 'wb') as f:
    joblib.dump(model, f)
```
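The `app.py` files are not shown in this notebook; presumably each one loads the saved file once at start-up with `joblib.load`. Under that assumption, here is a minimal sketch of the save/load round trip that also checks the reloaded model makes identical predictions (`reload_and_check` is a hypothetical helper name):

```python
import joblib
import numpy as np


def reload_and_check(model, X_check, path):
    """Hypothetical helper: save `model`, load it back, and compare predictions."""
    # joblib.dump also accepts a file path directly instead of an open file object
    joblib.dump(model, path)
    # Roughly what a Flask app would do once when it starts
    reloaded = joblib.load(path)
    same_predictions = np.allclose(model.predict(X_check), reloaded.predict(X_check))
    return reloaded, same_predictions
```

For example, `loaded_model, ok = reload_and_check(model, X.head(), 'web_api/abalone_predictor.joblib')`. Loading a model saved with a different scikit-learn version is not guaranteed to work, which is one reason to pin package versions in `requirements.txt`.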

Next, we define a function that accepts input data (a dictionary, as parsed from a JSON request) and returns a prediction. We will reuse this function when building both the web API and the web application with Flask.

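```python
def return_prediction(model, input_json):
    """Predict the number of rings for a single abalone described by a dict of features."""
    # Build a one-row DataFrame so the column names and order match the training data
    input_data = pd.DataFrame([input_json])[features]
    prediction = model.predict(input_data)[0]
    return prediction
```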

```python
example_input_json = {
    'length': 0.41,
    'diameter': 0.33,
    'height': 0.10,
    'whole_weight': 0.36
}
```

```python
return_prediction(model, example_input_json)
```

```
7.6
```

#### Setting up a directory structure and environment

- We need a specific directory structure to make deploying our machine learning model easy.
- This is already set up in this repo.

```
flask
├── build_model.ipynb  # this notebook contains the model building code
├── web_api
│   ├── abalone_predictor.joblib  # the machine learning model we built locally
│   ├── app.py  # the file that defines our Flask API
│   ├── Procfile  # required by Heroku to start the Flask app
│   └── requirements.txt  # file listing the required packages
│
└── web_application
    ├── abalone_predictor.joblib  # the machine learning model we built locally
    ├── app.py  # the file that defines our Flask web application
    ├── Procfile  # required by Heroku to start the Flask app
    ├── requirements.txt  # file listing the required packages
    ├── templates  # this subdirectory contains HTML templates used to build the web application
    │   ├── home.html  # HTML template used in the web application
    │   └── prediction.html  # HTML template used in the web application
    └── static  # this subdirectory contains CSS style sheets
        └── style.css  # CSS style sheet used in the web application
```

#### Model deployment

We have two options for deploying our abalone prediction model. We can:

1. Develop a RESTful web API that accepts HTTP requests in the form of input data and returns a prediction.
2. Build a web application with an HTML user interface that interacts directly with our API.

We'll explore both options below.

#### Building and deploying a web API

- I have a separate Python file called `app.py` that handles this part.
- We can open it up here in Jupyter Lab and take a look.
- We won't go into details here. If you want to learn more about Flask, see:
  - [Flask tutorial video series by Corey Schafer](https://www.youtube.com/playlist?list=PL-osiE80TeTs4UjLw5MM6OjgkjFeUxCYH)
  - [Flask docs](https://flask.palletsprojects.com/en/1.1.x/)
  - [Flask tutorial by Miguel Grinberg](https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world)
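We won't walk through `app.py`, but for orientation, here is a minimal sketch of what a JSON prediction endpoint could look like. It is not the course's actual `app.py`: the factory function `create_app`, the message on `/`, and returning the prediction with `jsonify` are assumptions, chosen to mirror the `/predict` route and the plain-number response in the `curl` example that follows.

```python
import pandas as pd
from flask import Flask, jsonify, request


def create_app(model, features):
    """Hypothetical app factory wrapping a fitted model in a JSON API."""
    app = Flask(__name__)

    @app.route("/")
    def index():
        # Simple "is it alive?" check for http://127.0.0.1:5000/
        return "Abalone predictor API is running."

    @app.route("/predict", methods=["POST"])
    def predict():
        # Parse the JSON body, e.g. {"length": 0.41, "diameter": 0.33, ...}
        input_json = request.get_json()
        # One-row DataFrame with columns in the training order
        input_data = pd.DataFrame([input_json])[features]
        prediction = model.predict(input_data)[0]
        # Convert the NumPy scalar to a Python float so it is JSON serializable
        return jsonify(float(prediction))

    return app
```

To serve it locally, one could call `create_app(joblib.load('abalone_predictor.joblib'), features).run()`, which listens on `http://127.0.0.1:5000` by default. The factory keeps the model as an argument, which makes the app easy to exercise with Flask's test client.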

#### Testing that the Flask API is alive

- Let's test this out in a terminal here in Jupyter Lab. 
- Ok, now let's go to the URL `http://127.0.0.1:5000/`. 

#### Sending a request to the API

```
!curl -d '{"length":0.41,"diameter":0.33,"height":0.10,"whole_weight":0.36}' -H "Content-Type: application/json" -X POST http://127.0.0.1:5000/predict
```

```
7.6
```


(Or, we can open up another Terminal tab and do it there)
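The same request can also be sent from Python instead of `curl`. Here is a minimal sketch using only the standard library (`predict_remote` is a hypothetical helper name); it assumes, as the `curl` output above suggests, that the endpoint responds with a JSON-encoded number.

```python
import json
import urllib.request


def predict_remote(url, measurements, timeout=10):
    """Hypothetical helper: POST measurements as JSON and decode the JSON response."""
    body = json.dumps(measurements).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # same header as the curl call
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the local server running, `predict_remote("http://127.0.0.1:5000/predict", example_input_json)` would send the same payload as the `curl` command.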

#### Deploying the API on a server

- Okay, so we have a working API running on localhost, but we don't want to host this service on my laptop!
- We now want to deploy it on a "real" server so others can send it requests. 
- We will use Heroku to deploy our app but you could also use other services such as AWS.


Here are some steps I already did:

1. Head over to [Heroku](https://dashboard.heroku.com/), log-in, and click "Create new app".
2. Choose a unique name for your app.
3. Create app.

<img src="img/flask_images/fl_6.png" width="600">


- Let's do the rest.
- We will be using the Heroku CLI to deploy our model. 
- We'll open up another terminal.

```
heroku login
cd my-project/
git init
heroku git:remote -a my-abalone-predictor
git add .
git commit -am "Initial commit"
git push heroku master
```

(Note that for more complex applications, you may choose to containerize everything in a Docker container to deploy to Heroku).

```
!curl -d '{"length":0.41,"diameter":0.35,"height":0.10,"whole_weight":0.36}' -H "Content-Type: application/json" -X POST https://my-abalone-predictor.herokuapp.com/predict
```

```
8.6
```


- This means that anyone with the URL can send requests to our model.
- In fact, you all have your laptops - give it a try!
- You can also do the `curl` from a terminal:

```
curl -d '{"length":0.41,"diameter":0.35,"height":0.10,"whole_weight":0.36}' -H "Content-Type: application/json" -X POST https://my-abalone-predictor.herokuapp.com/predict 
```

![](img/mike_highfive.png)

#### Building and deploying a web application

- Flask can create entire web applications.
- We only need to refactor our code a little bit and link it up with some HTML and CSS to create our web application.
- We will use Flask to create an HTML form, accept data submitted to the form, and return a prediction using the submitted data.
- Again, I won't go into too much detail here, but we can open up `web_application/` and take a quick look.
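For orientation, here is a minimal sketch of the form-handling idea. It is not the course's `web_application/app.py`: that app keeps its HTML in `templates/` and may well use Flask-WTF (installed above), whereas this sketch assumes a single route, plain `request.form`, and an inline template via `render_template_string` so that it fits in one file. The names `create_web_app` and `FORM_TEMPLATE` are hypothetical.

```python
import pandas as pd
from flask import Flask, render_template_string, request

# Hypothetical inline template; a real app would keep this in templates/home.html
FORM_TEMPLATE = """
<form method="post">
  {% for f in features %}
    <label>{{ f }} <input name="{{ f }}" type="number" step="any" required></label><br>
  {% endfor %}
  <button type="submit">Predict</button>
</form>
{% if prediction is not none %}<p>Predicted rings: {{ "%.1f"|format(prediction) }}</p>{% endif %}
"""


def create_web_app(model, features):
    """Hypothetical factory: one page that shows a form and, after submission, a prediction."""
    app = Flask(__name__)

    @app.route("/", methods=["GET", "POST"])
    def home():
        prediction = None
        if request.method == "POST":
            # Form fields arrive as strings, so convert each one to float
            row = {f: float(request.form[f]) for f in features}
            input_data = pd.DataFrame([row])[features]
            prediction = float(model.predict(input_data)[0])
        return render_template_string(FORM_TEMPLATE, features=features, prediction=prediction)

    return app
```

Compared with the JSON API, the only real differences are where the inputs come from (`request.form` instead of a JSON body) and that the response is an HTML page rather than a bare number.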

#### Testing the web application

- Let's terminate our API Flask app, navigate to `../web_application`, and run again.
- Now let's go back to `http://127.0.0.1:5000/`.

We can try it again on localhost.

#### Deploying the web application

- I already logged in to Heroku and created the app.
- Now the same commands:

```
heroku login
cd my-project/
git init
heroku git:remote -a my-abalone-web-app
git add .
git commit -am "Initial commit"
git push heroku master
```

- Let's try it out: https://my-abalone-web-app.herokuapp.com/
- You can try it too!

#### Discussion

- There are many ways to deploy a model; a RESTful API is very common and convenient. 
- As you can see, a simple deployment is fairly straightforward.
- However, there may be other considerations such as:
  - Privacy/security
  - Scaling
  - Error handling
  - etc.
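As one small example of error handling, a deployed API should reject malformed requests before they reach the model. Here is a minimal sketch of an input validator (the name `validate_input` and the rule that measurements must be finite and non-negative are assumptions, not part of the course app):

```python
import math


def validate_input(input_json, features):
    """Hypothetical validator: return (row, errors) for one prediction request."""
    if not isinstance(input_json, dict):
        return None, ["Request body must be a JSON object."]
    errors = []
    row = {}
    for name in features:
        if name not in input_json:
            errors.append(f"Missing feature: {name}")
            continue
        value = input_json[name]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"Feature {name} must be a number, got {value!r}")
        # Assumption: physical measurements should be finite and non-negative
        elif not math.isfinite(value) or value < 0:
            errors.append(f"Feature {name} must be finite and non-negative, got {value!r}")
        else:
            row[name] = float(value)
    # Flag unknown keys, which often indicate a typo on the client side
    for name in sorted(set(input_json) - set(features)):
        errors.append(f"Unexpected feature: {name}")
    return (None if errors else row), errors
```

Inside a Flask route, one could return `jsonify({"errors": errors}), 400` whenever `errors` is non-empty, so clients get a clear message instead of a server error.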

## Break (15 min)

- We'll take a longer break today.
- Consider taking this time to fill out the instructor/TA evaluations if you haven't already.
- You may have seen [my post about these evaluations](https://www.reddit.com/r/UBC/comments/k18qj7/teaching_evaluations_the_good_the_bad_and_the_ugly/) on r/ubc

https://canvas.ubc.ca/courses/53561/external_tools/4732

## Course review (30 min)


#### Learning objectives

Here are the course learning outcomes I came up with when proposing this new course:

1. Identify problems that may be addressed with machine learning.
2. Select the appropriate machine learning tool for a problem.
3. Transform data of various types into usable features.
4. Apply standard tools implementing supervised and unsupervised learning techniques.
5. Describe core differences between training, validation, and testing regimes.
6. Effectively communicate the results of a machine learning pipeline.
7. Be realistic about the limitations of individual approaches and machine learning as a whole. 
8. Create reproducible workflows and pipelines.

- How did we do? 
- Hopefully OK, except we skipped the last point (that will likely be its own new course).
- I would also add:

9. Identify and avoid scenarios in which training and testing data are accidentally mixed (the "Golden Rule").
10. Employ good habits for applying ML, such as starting an analysis with a baseline estimator.

because I think they are important enough to make it to the course-level list.

#### What did we cover?

I see the course roughly like this (not in order):

Part 1: Supervised learning on tabular data

- Overfitting, train/validation/test/deployment, cross-validation
- Feature preprocessing, pipelines, imputation, one-hot encoding (OHE), etc.
- The Golden Rule, various ways to accidentally violate it
- Classification metrics: confusion matrix, precision/recall, ROC, AUC
- Regression metrics: MSE, MAPE
- Regression: transforming the targets
- Feature importances, feature selection
- Hyperparameter optimization

Part 2: Other data types (non-tabular)

- Computer vision with deep learning
- Language data, text preprocessing
- Ratings data
- Time series
- Right-censored data / survival analysis

Also: Other stuff

- Ensembles
- Outlier detection
- Clustering
- A bunch of models: 
  - Dummy*
  - linear models (ridge, lasso, huber, logistic regression, SGD*)
  - tree-based models (random forest, gradient boosted trees)
  - KNN classifier/regressor
  - pre-trained deep learning models
- Communicating your results (including visualizations)
- ML skepticism
- Ethics for ML

#### Some key takeaways

Some useful guidelines:

- Do train-test split right away
- Don't look at the test set until the end
- Don't call `fit` on test/validation data
- Use pipelines
- Use baselines
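A minimal sketch of how the split, pipeline, and "no `fit` on test/validation data" guidelines fit together, using a scaler and `Ridge` as stand-in steps (the helper name `evaluate_pipeline` and the choice of steps are illustrative assumptions):

```python
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_pipeline(X, y, random_state=123):
    """Hypothetical helper: split, cross-validate a pipeline, then score the test set once."""
    # Split right away and lock the test set away
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=random_state)
    # Preprocessing lives inside the pipeline, so in every CV fold the scaler
    # is fit on that fold's training portion only (no leakage from validation data)
    pipe = make_pipeline(StandardScaler(), Ridge())
    cv_scores = cross_validate(pipe, X_train, y_train, cv=5, return_train_score=True)
    # Fit on the full training set; the test set is only ever passed to score()
    pipe.fit(X_train, y_train)
    return {
        "cv_train_R2": cv_scores["train_score"].mean(),
        "cv_valid_R2": cv_scores["test_score"].mean(),
        "test_R2": pipe.score(X_test, y_test),
    }
```

In practice one would also compare against a `DummyRegressor` and tune `Ridge`'s `alpha` with cross-validation before looking at the test score.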

Recipe to approach a supervised learning problem with tabular data:

1. Have a long conversation with the person(s) who collected the data
2. Have a long conversation with the stakeholders who will be using your pipeline
3. Random train-test split with fixed random seed; lock away the test set
4. Exploratory data analysis, outlier detection
5. Choose a scoring metric -> higher values should make you happier
6. Fit a baseline model, `DummyRegressor` or `DummyClassifier`
7. Create a preprocessing pipeline (may involve feature engineering)
8. Fit a linear model, `LogisticRegression` or `Ridge`, tune hyperparameters with CV
9. Fit other model(s), e.g. LightGBM, tune hyperparameters with CV
10. Pick a model that you like
11. Look at sub-scores from the folds of cross-validation
12. Perform some more diagnostics like confusion matrix for classification, or "pred vs. true" scatterplots for regression
13. (optional) Try to calibrate the uncertainty/confidence outputted by your model 
14. Test set evaluation
15. Discuss your results with various stakeholders
16. Question everything again
17. Retrain on all your data?
18. Deployment
19. Profit

Note: 

- Step 7 (building the preprocessing pipeline, including any feature engineering) is probably the most time-consuming.
- You may need to go back and repeat various steps depending on what happens.

#### What would I do differently?

- Find a dataset with multi-class classification for an early part of the course.
- Reordering the material a bit:
  - Move "feature importances for computer vision" into computer vision lecture (not ethics).
  - Introduce random forests and feature importances a bit earlier
  - Move outlier lecture much earlier
- Allocate 2 lectures to time series data 

I'm sure you have other suggestions - feel free to drop me an email, submit my contact form anonymously at mikegelbart.com, or drop them in the course evaluations.

#### 330 vs. 340

- I am hoping lots of people will take both courses.
- There is some overlap but not a crazy amount (I hope).
- If you want to learn how these methods work under the hood, CPSC 340 will give you a lot of that, such as:
  - Implementing `Ridge.fit()` from scratch
  - Mathematically speaking, what is `C` in `LogisticRegression`?
  - How fast do these algorithms run in terms of the number of rows and columns of your dataset? 
  - Etc.
- There are also a bunch of other methods covered. 

#### Unsolicited advice

- I sometimes end my courses with "unsolicited life advice".
- I won't repeat myself here because some of you took CPSC 340 with me. But if you're interested [it's on YouTube](https://www.youtube.com/watch?v=_7zYxpzrKmQ&list=PLWmXHcz_53Q02ZLeAxigki1JZFfCO6M-b&index=34&t=0s).
- This time I will limit myself to 5 min and one piece of advice.

<br><br>

Mike's advice: if you need to disappoint someone, do it right away.

## Conclusion

That's all, folks! You made it. 

<table style="float:left"><tr>
<td><img src="img/mike_hanginthere.png"/></td>
<td><img src="img/mike_believeinyou.png"/></td>
</tr></table>
