# Deploy your ML model

This notebook focuses on the last step of the data science workflow: deploying your trained ML model. The situation is as follows: you've followed all the steps of the data science workflow, you've evaluated your model, and you are happy with it. Now it is time to deploy your model so that users can use it to make predictions.

Deploying your ML model is not about data science, but about data engineering. So no difficult math :).

<img src="deployMlModel.png" alt="drawing" width="600"/>

We're going to deploy an ML model in three ways:
* a python-based web application where the *backend* performs the prediction
* a node.js web application where the *backend* performs the prediction
* a node.js web application where the *front-end* (the browser) performs the prediction

Why three ways? To give you some choice. Below, we'll go over advantages/disadvantages of the approaches.

# Overview of popular, python-based options

In the Tensorflow.js hands-on, we go from the Python world to the JavaScript world by converting the trained model into tensorflow.js format. However, many data scientists only know Python and have no knowledge of other programming languages. The purpose of this section is to give an overview of currently popular Python-based solutions.

1. ***Streamlit***. Streamlit is a <u>server-side web framework</u>, modern (2019) & easy to learn. If you need a very fast and easy way to deploy your ML solution as a web application, choose Streamlit. It has no support yet for RESTful APIs, although a feature request to add this functionality is outstanding. Just to give some understanding of the architecture, Streamlit is built on top of Tornado, a Python web framework and asynchronous networking library.
2. ***Django*** and ***Flask***. Django and Flask are general-purpose, Python-based web frameworks (Django is full-featured, Flask is a lighter micro-framework). They do support RESTful APIs. Because they are general-purpose, they come with a steeper learning curve than Streamlit. There are many other Python-based web frameworks, but Django and Flask are the most well-known.
3. ***Plain python using http and json libraries***. Using standard-library modules such as ```http.server```, ```json``` and ```urllib.parse```, it is easy to set up an HTTP server. [This](https://towardsdatascience.com/restful-apis-in-python-121d3763a0e4) is an example. This means we have to copy the learned theta values (the result of the training) from the "linear regression in practice" hands-on to the application that will be deployed, and use the theta values in the hypothesis to do the prediction. Although not difficult, it requires manual coding, which is error prone. Every time you improve the model, the newly trained theta values must again be copied manually to the application that will be deployed, a step that is easily forgotten.
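
To make option 3 concrete, here is a minimal sketch that uses only the Python standard library. It rests on a few assumptions: the names `predict_profit`, `PredictHandler` and `make_server`, the port 8000 and the JSON response layout are all hypothetical, and the theta values are the ones printed by the scikit-learn model in this notebook, copied by hand (exactly the manual step described above).

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

# Theta values copied by hand from the training output (the error-prone manual step).
THETA_0 = -3.895780878311852
THETA_1 = 1.19303364


def predict_profit(city_size):
    """Hypothesis of univariate linear regression: theta_0 + theta_1 * x."""
    return THETA_0 + THETA_1 * city_size


class PredictHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Parse e.g. /?citysize=20 into {'citysize': ['20']}
        query = parse_qs(urlparse(self.path).query)
        try:
            city_size = float(query["citysize"][0])
        except (KeyError, ValueError):
            self.send_error(400, "expected a numeric 'citysize' query parameter")
            return
        result = {"citysize": city_size, "profit": predict_profit(city_size)}
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet in this sketch


def make_server(host="localhost", port=8000):
    """Create (but do not start) the HTTP server."""
    return HTTPServer((host, port), PredictHandler)
```

To try it, call `make_server().serve_forever()` and open `http://localhost:8000/?citysize=20`. The server is created in a separate function so that the prediction logic can be imported and checked without starting a server.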

We're going to do a small demo using Streamlit, just to show how easily it works.

# Hands-on 5: deploy a pre-trained model using Streamlit

In hands-on 4, Linear Regression in Practice, we've performed a linear regression using scikit-learn. This is an easy and common method to perform linear regression. In this hands-on, we're going to deploy this model as a web app, using Streamlit. Streamlit is a server-side (== the ML model runs in the backend), Python-based web framework that provides a very easy way to deploy your Python-based ML solution.

<img src="deployMlModelPythonBackend.png" alt="drawing" width="600"/>

For every HTTP user request the complete Streamlit python script is run from top to bottom. You want to **train** the model only **once**, as this is time consuming, but **use** the trained model to perform **many** predictions. How to accomplish this? Streamlit has the very useful ```@st.cache_resource``` decorator, which you can put above a function that needs to be executed only once. The first time the function is invoked, it is executed and the function result is cached. The next time the function is invoked, Streamlit uses the cached result, rather than executing the function again.
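
The run-once idea can be illustrated without Streamlit. The sketch below uses Python's standard `functools.lru_cache` as a stand-in for `@st.cache_resource`. This is only an analogy: Streamlit's own cache works across script reruns and is implemented differently. The function `train_model_stub` and the dictionary it returns are hypothetical.

```python
from functools import lru_cache

call_count = 0  # counts how often the expensive function body really runs


@lru_cache(maxsize=None)  # stand-in for @st.cache_resource (analogy only)
def train_model_stub():
    """Pretend to train a model; in the real app this is the slow step."""
    global call_count
    call_count += 1
    return {"theta_0": -3.9, "theta_1": 1.19}  # hypothetical "trained model"


first = train_model_stub()   # body executes, result is cached
second = train_model_stub()  # cached result is returned, body is skipped

print(call_count)       # 1
print(first is second)  # True
```

Both calls return the very same object, which is also why a cached resource should be treated as shared and not be modified per request.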

1. Install streamlit using ```conda install streamlit -c conda-forge```.
2. When activating an anaconda environment, the folder where the anaconda environment is located is automatically prepended to the Windows search path (Powershell: type `$Env:Path` within an anaconda environment and also outside an anaconda environment to see the difference). This means that `streamlit.exe` can be called from any folder. 
3. Save the code below in a file called ```streamlit_demo.py``` (in the same folder as where this Jupyter notebook is located)
4. anaconda prompt> ```streamlit run streamlit_demo.py```
5. open a web browser on http://localhost:8501

```python
# This code cannot be run inside a Jupyter notebook.
# Save it to a file called "streamlit_demo.py" and run it using "anaconda prompt> streamlit run streamlit_demo.py"
import numpy as np
import pandas as pd
from sklearn import linear_model
import streamlit as st


# Please note that the code in train_model() mirrors the code of hands-on 4.
@st.cache_resource  # executed for the 1st user request; for subsequent user requests, the cached function result is used
def train_model():
    print("function train_model() is called (to verify the caching behavior of Streamlit)")

    # read the training set
    data = pd.read_csv("ex1data1.txt", header=None)  # read from dataset into Pandas DataFrame variable
    X = data.iloc[:, 0]  # read first column; upper case for matrix
    y = data.iloc[:, 1]  # read second column; lower case for vector
    X = np.array(X).reshape(-1, 1)  # transform to the 2D format that sklearn expects

    # perform the regression
    regr = linear_model.LinearRegression()
    regr.fit(X, y)

    # the thetas
    print('theta_0', regr.intercept_, 'theta_1', regr.coef_)

    return regr


st.title('Predicting profit as function of city size')

# train the model. This is done only once due to the Streamlit caching feature
regr = train_model()

# perform prediction
city_size = float(st.text_input("Enter the city size: ", '0'))  # 2nd argument is the value of city_size on 1st rendering of the web app
pred = regr.predict([[city_size]])  # array with one prediction
print(pred)
# pred[0] extracts the single predicted value, so that no array brackets end up in the text
st.text('The predicted profit for city size ' + str(city_size * 10000) + ' is ' + str(pred[0] * 10000) + ' dollars.')
```



function train_model() is called (to verify the caching behavior of Streamlit)
theta_0 -3.895780878311852 theta_1 [1.19303364]
[-3.89578088]



# Hands-on 6: deploy a pre-trained Keras model using Tensorflow.js

In Hands-on 3, "Univariate linear regression using a Keras/Tensorflow neural network", we've trained a regression model using Keras. We were happy with the model and saved the model, to allow using it elsewhere to make predictions. 

In this hands-on, we're going to deploy the pre-trained Keras model in two ways:
* a web application where the *backend* performs the prediction. 
* a web application where the *front-end* (the browser) performs the prediction. Note that *training* a machine learning model is CPU-intensive, but *running* is not. So running the trained model in the browser on a lightweight device like a smartphone is not a problem.

Whether to put the prediction intelligence in the backend or the front-end is, of course, an architectural design choice. Each option has its own pros and cons, which are discussed below.

Both ways of deployment use Tensorflow.js. Tensorflow.js is a Javascript based open-source library with which you define, train, and run machine learning models. As we've defined and trained the model using the normal, python-based Tensorflow, we'll be using Tensorflow.js only to *run* the model.

After this hands-on, you know how to deploy a Keras model. This method can be used for any Keras model, so also for convolutional neural networks, classification models, and so on.

**Let's start**

In Hands-on 3, "Univariate linear regression using a Keras/Tensorflow neural network", we've saved the model in tensorflow.js format, using ```import tensorflowjs as tfjs; tfjs.converters.save_keras_model(my_regression_model, 'univariateKerasRegression_tfjs_model')```. A model in tensorflow.js format can be used directly in Tensorflow.js. Another frequently used format is HDF5 (Hierarchical Data Format version 5), an open-source file format that supports large, complex, heterogeneous data. Saving the model in HDF5 format can be done with ```my_regression_model.save('univariateKerasRegression_hf5_model.h5')```; Keras selects the HDF5 format based on the ```.h5``` extension. A model saved in HDF5 format can be converted to the tensorflow.js format using the ```tensorflowjs_converter``` command-line tool. However, we don't need to do this, as we've saved the model directly in tensorflow.js format.

## Pre-trained Keras model in a Node.js backend

The backend will perform the prediction.

<img src="deployMlModelJavascriptBackend.png" alt="drawing" width="600"/>

The first step is to install [node.js](https://nodejs.org/en/download/). Check the checkbox "Automatically install the necessary tools".

<img src="nodejs_windows_build_tools.png" alt="drawing" width="350"/>

**September 2024:** when running ```npm install```, if you get an error message containing ```Error: Command failed: node scripts/deps-stage.js symlink ./lib/napi-v9```, a solution is to install node.js version v18.16.1 instead of version v20.x.x. This version of node.js can be found [here](https://nodejs.org/download/release/v18.16.1/). For Windows, choose the file ```node-v18.16.1-x64.msi```. This solution comes from [this bug report](https://github.com/tensorflow/tfjs/issues/7793).

To run the application:
1. open a command prompt in the folder "predict_backend"
2. run ```npm install```.
3. run ```node predict_app.js``` to start the node.js http server
4. open a browser with URL: ```http://localhost:8081/?citysize=20```. '20' means a city size of 200000. The browser should show an expected profit of 192580.92880249023 dollars.

Let's have a look at the code in predict_app.js. It's very easy to understand:
* open an HTTP server
* whenever a URL is opened with a query parameter ```citysize```, the ```predictProfit``` function is invoked.
* the function ```predictProfit``` loads the pre-trained Keras/Tensorflow regression model from disk (only once to save resources).
* the ```citysize``` is feature-normalized, a processing step that is easily forgotten.
* the ```predict``` method is invoked on the loaded regression model and the feature-normalized citysize is passed.
* the predicted value is returned from the ```predict``` method and returned to the browser.

Note that the values of the mean and standard deviation, needed to perform feature normalization, are based on the training set of hands-on 3 'univariate linear regression using a keras/tensorflow NN'. Whenever you change the training set, e.g. make it bigger or smaller, the values of the mean and standard deviation change, and the changed values <b><u>need to be copied to this node.js application</u></b>, otherwise the predictions will be incorrect! The same holds for every data transformation that was performed during training (e.g. imputing, one-hot encoding, ...).
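
One way to reduce the risk of stale normalization values is to write them to a file at training time and let the deployed application read that file. The sketch below shows the Python side under stated assumptions: the file name `normalization.json`, the helper names and the sample city sizes are hypothetical and not taken from hands-on 3. It also assumes the population standard deviation; use whichever definition the training code uses.

```python
import json
import statistics
import tempfile
from pathlib import Path


def export_normalization(values, path):
    """Store mean and standard deviation of a training feature as JSON."""
    params = {"mean": statistics.fmean(values), "std": statistics.pstdev(values)}
    Path(path).write_text(json.dumps(params))
    return params


def normalize(x, params):
    """Apply the same feature normalization that was used during training."""
    return (x - params["mean"]) / params["std"]


# Hypothetical training feature (city sizes in units of 10,000 inhabitants).
city_sizes = [5.0, 10.0, 15.0, 20.0]

path = Path(tempfile.gettempdir()) / "normalization.json"  # hypothetical file name
export_normalization(city_sizes, path)

# The deployed app reads the file instead of hard-coding the two numbers.
params = json.loads(path.read_text())
print(normalize(20.0, params))
```

Because JSON is language-neutral, a Node.js application could read the same file, so retraining only requires regenerating the file rather than editing code.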

This example has only one feature (univariate linear regression). If your model has two features (multivariate linear regression), the call to ```predict()``` looks as follows:
```
let result = my_regression_model.predict(tf.tensor2d([parseFloat(citysize), parseFloat(citysurface)], [1, 2])).arraySync(); 
```
or alternatively:
```
let result = my_regression_model.predict(tf.tensor2d([[parseFloat(citysize), parseFloat(citysurface)]])).arraySync();
```
You can also pass multiple samples to ```predict()```:
```
let result = my_regression_model.predict(tf.tensor2d([[parseFloat(city1size), parseFloat(city1surface)], [parseFloat(city2size), parseFloat(city2surface)]])).arraySync();
```
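
The two single-sample forms above describe the same tensor: a flat list of values plus an explicit shape `[1, 2]`, or a nested list from which the shape is inferred. As a language-neutral illustration using plain Python lists (not Tensorflow.js), the hypothetical helper `reshape_2d` below turns the flat form into the nested form. The feature values are made up.

```python
def reshape_2d(flat, rows, cols):
    """Arrange a flat list of values into `rows` lists of `cols` values each."""
    if len(flat) != rows * cols:
        raise ValueError("number of values does not match the requested shape")
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


# One sample with two features (hypothetical values: city size and city surface).
print(reshape_2d([20.0, 3.5], 1, 2))             # [[20.0, 3.5]]
# Two samples with two features each.
print(reshape_2d([20.0, 3.5, 12.0, 1.8], 2, 2))  # [[20.0, 3.5], [12.0, 1.8]]
```

Each inner list is one sample (row) and each position within it is one feature (column), which is the layout that ```predict()``` expects.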

## Pre-trained Keras model in a front-end

The front-end will perform the prediction.

<img src="deployMlModelJavascriptFrontend.png" alt="drawing" width="600"/>

This web app is located in the folder "predict_frontend" and consists of three parts:
1. ```fileserve.js```, an express.js backend that serves files. Creating your own Express.js file server is very easy and how to do it can be found [here](https://expressjs.com/en/starter/installing.html). 
2. ```main.js```, the application that does the prediction. Note that the code is almost identical to the code of the Node.js backend.
3. ```index.html```, the front-end. It starts with loading ```tensorflow.js``` and ```main.js```. Then it allows the user to enter the city size, invokes the function that does the prediction, and returns the result.
  
To run the application:
1. open a command prompt in the folder "predict_frontend"
2. run ```npm install``` in the folder of the app to install express.js (if you start from scratch without a ```package.json```, run ```npm init``` to create a ```package.json``` and then ```npm install express``` to install express.js).
3. run ```node fileserve.js``` to start the file server.
4. double-click ```index.html``` to open it in the browser or directly open ```http://localhost:8081/index.html```
5. enter a city size and press ```predict``` to predict the expected profit.
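
For readers who prefer to stay in Python, the role of ```fileserve.js``` (serving static files such as ```index.html``` and ```main.js```) can also be played by the Python standard library. This is a stand-in sketch, not the code of this hands-on; the function name `make_file_server` is hypothetical, and the default port 8081 is chosen only to match the URL used above.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_file_server(directory, host="localhost", port=8081):
    """Create (but do not start) a server that serves the files in `directory`."""
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer((host, port), handler)
```

Calling `make_file_server("predict_frontend").serve_forever()` from the parent folder would then serve the front-end files; the prediction itself still runs in the browser.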


# Discussion

## Backend or front-end deployment?

Should you deploy your model in the backend or the front-end? As mentioned earlier, *training* an ML model is CPU/GPU-heavy, but *performing predictions* with a trained model usually isn't. So doing predictions on a lightweight front-end, like a cheap smartphone or a tablet, should not cause performance issues, although doing many predictions at once could theoretically become a problem. There's a trade-off here. Doing predictions on the front-end provides natural scalability: the amount of available CPU power is proportional to the number of users. Doing predictions on the backend provides sufficient speed for users on lightweight devices (so no annoyed users), but if the number of users increases, the computing power on the backend needs to increase as well.

A reason to deploy your ML model on the backend is security. You may have invested a lot of money to obtain a well-trained ML model, and you want to protect that investment. Protecting the model is not impossible when it runs on the front-end, but it is arguably less safe, because the model files are downloaded to the user's browser.

## Python-based or Javascript-based deployment?

A very strong argument in favor of Streamlit is the short modelling-deployment cycle. If you set up your code base correctly, your Streamlit application automatically uses the code with which you develop your model. Every change you make during the development of the model is automatically part of your Streamlit application. This saves a lot of time, but maybe even more important, it removes a source of errors, as no manual involvement is needed. In the Javascript version, we've seen that we need to manually translate quite some Python code to Javascript (e.g. all the data transformation steps). This takes time and is error prone, because you might easily forget to translate a last-minute change of your Python code to the Javascript code base. Of course there's a solution here: also perform the *development* of your ML model in Javascript, which for the moment means that you're stuck with tensorflow.js, which is not bad, but of course limiting.

A second reason to use Streamlit over Javascript is that we can use *any* Python-based ML model in our Streamlit application without difficulty. For the Javascript application, we have two solutions: 1. only use tensorflow for your model, as the Javascript equivalent tensorflow.js is available; 2. manually translate your prediction model to Javascript.

A reason to use Javascript is that it is very popular. Your ML model might be a small part of a big application. At the moment many applications are Javascript-based.

Another reason to use Javascript is that it allows you to run the ML model in the front-end.


## Diehard Java or C# developers

**YOU**: We've invested an enormous effort in performing our linear regression, difficult maths formulas, complex algorithms, hefty libraries like Tensorflow. But, what is the actual outcome of all these machine learning efforts? So what is a very easy way to deploy our trained model using Java or C#?

Answer: (don't look at it yet if you didn't put some thought into the questions above) the outcome of all our learning efforts is in fact the values of $\Theta_0$ and $\Theta_1$. That's all! So a very easy way to deploy our trained model using Java or C# is to write `predictedProfit = theta_0 + theta_1 * citySize`. Something similar can be done when using a neural network.

Of course the development cycle becomes cumbersome if you hard-code the values of $\Theta_0$ and $\Theta_1$ in your Java or C# program. An easy workaround is storing the values of $\Theta_0$ and $\Theta_1$ in a file that is read by the Java or C# program.
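
A minimal sketch of that workaround, seen from the Python side: after training, write the two theta values to a small JSON file that the Java or C# program can parse. The file name `theta.json` and the function names are hypothetical; the theta values are the ones printed by the scikit-learn model of hands-on 5. The reading side is also shown in Python here, only to illustrate the round trip.

```python
import json
import tempfile
from pathlib import Path


def save_thetas(theta_0, theta_1, path):
    """Training side: store the learned parameters in a JSON file."""
    Path(path).write_text(json.dumps({"theta_0": theta_0, "theta_1": theta_1}))


def predict_profit(city_size, path):
    """Deployment side: read the parameters and evaluate the hypothesis."""
    thetas = json.loads(Path(path).read_text())
    return thetas["theta_0"] + thetas["theta_1"] * city_size


path = Path(tempfile.gettempdir()) / "theta.json"  # hypothetical file name
save_thetas(-3.895780878311852, 1.19303364, path)

# City size 20 means 200000 inhabitants; the profit is expressed in units of 10000 dollars.
print(predict_profit(20, path) * 10000)
```

With this set-up, retraining the model only requires regenerating the file; the deployed program itself stays unchanged.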


# Deploy your app on a public production server

Developing your app is best done on your own laptop. Once you've finished developing, you want to deploy your app in a production environment. This allows users to use your app or teachers to grade your app. There are many (free) options to publicly deploy your app. Here, we only discuss two options, namely deploying on a shared Linux server from Digital Ocean and deploying on Heroku. Unfortunately, Heroku is no longer free.

For grading, you usually also hand in the source code, documentation, and the URL of the deployed app.

## Deploy your Streamlit app on a Digital Ocean droplet provided by Avans

This is a server running Linux. One way to access it is `PuTTY`. Server details:
* ip address: \<ask the teacher to create a server\>
* user: root
* pass: hcaid1AI

Software that you need for deployment has already been installed on the server (git, miniconda, python, numpy, matplotlib, pandas, streamlit). Feel free to install more software. If you're going to change the current installation, or if you're going to install 'strange' software, it can be wise to create a separate anaconda environment so that you don't bother other students.

This server is used by multiple students, so I would like you to stick to the following arrangements:
* Create a folder with your own name where you store the source code of your app.
* The easiest way to get the source code on the server is `git clone <URL to your repo>`. Another option is to use `psftp.exe` or another ftp client.
* To avoid port conflicts, I propose that you open the Excel sheet on Brightspace with the students and groups, and add your row number to 8080. Hanno deploys his app on port 8083, Sander on port 8084, and so on.
* Deploy your app (replace the port number with your port number): 

`streamlit run streamlit_demo.py --server.port=8150 &` (the ampersand is needed to run the process in the background)

`disown -h %1` (needed to continue running Streamlit when you close the command terminal)

If for some reason the server is restarted (has not happened in at least a year), you need to repeat these commands.

* http://206.189.8.200:8150/ (this URL points to the Streamlit demo of ML1 Workshop 5)

As you can understand, this is not a very safe server, so do not put anything on this server for which you don't have a version elsewhere.


## Deploy your Node.js app on Heroku

Heroku is a container-based cloud Platform as a Service (PaaS). Developers use Heroku to deploy, manage, and scale modern apps. As noted above, Heroku no longer offers a free plan.

<img src="deployMlModelHeroku.png" alt="drawing" width="600"/>

[This link](https://devcenter.heroku.com/articles/deploying-nodejs) describes how to deploy apps on Heroku. In this hands-on exercise, we'll only show how to deploy 'Pre-trained Keras model in a Node.js backend'. Once you understand how it works, you can deploy 'Pre-trained Keras model in a front-end' yourself. We'll deploy using the Heroku CLI. Another option, which we won't use, is to deploy by pushing to Github, after which Heroku automatically detects the changes on Github and redeploys. These are the minimal steps to deploy the app on Heroku:

* Create a Heroku account.
* Install the [Heroku CLI](https://devcenter.heroku.com/articles/heroku-cli).
* Your app must be in a git repo in order to use Heroku. Ensure that the app is in the root folder of the repo. For this exercise, it means that `predict_backend` should be the root folder of the git repo.
* Specify the version of node.js in `package.json`, using `engines`. Heroku uses this to deploy your app using the correct version of Node.js.
* Add a `Procfile` to the app. Heroku uses this to determine the command that starts your app.
* Change the `hostname` to `0.0.0.0`, meaning 'all IP addresses on the local machine'.
* Read the `port` from the Heroku environment variable, if present: `const port = process.env.PORT || 8081;`
* Test the app locally on your laptop by typing `heroku local web` in a command prompt. The default port for local deployment is `5000`, so http://localhost:5000/?citysize=20 should do the job. Continue to the next step only when it works properly.
* Commit the changes to the local repo. 
* In the command prompt, type:
  * `heroku login`
  * `heroku create ercoapp`. This creates the git repo `https://git.heroku.com/ercoapp.git` on the Heroku server to which you will push your app.
  * `git push https://git.heroku.com/ercoapp.git master`. This pushes your local repo to the remote repo on the Heroku server, and then triggers building and deploying the app on the remote Heroku server.
* Your app should be publicly accessible now. No port number is needed in the public URL, so https://ercoapp.herokuapp.com/?citysize=20 should do the job. The first attempt might take 20-30 seconds as Heroku needs to spin up the server. After this, it will be much faster. When your app isn't used for a while, Heroku will take it off the server and store it, meaning that the first attempt will again take some time.
* Every time you make a change to the app, commit to the local repo and push to the Heroku remote repo to redeploy.
* Also have a look at the [Heroku dashboard](https://dashboard.heroku.com/apps). Here you can for example delete your app from Heroku.
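
The `hostname` and `port` changes in the list above are not specific to Node.js. As an illustration in Python (an assumption for readers who deploy a Python server instead; `resolve_bind_address` is a hypothetical helper), the same pattern listens on all interfaces and reads `PORT` from the environment, falling back to 8081 for local runs.

```python
import os


def resolve_bind_address(environ=os.environ, default_port=8081):
    """Return (host, port): listen on all interfaces, port taken from PORT if set."""
    # `or` mirrors the JavaScript `process.env.PORT || 8081`: a missing or empty
    # PORT falls back to the default.
    port = int(environ.get("PORT") or default_port)
    return "0.0.0.0", port


print(resolve_bind_address({}))                # ('0.0.0.0', 8081)
print(resolve_bind_address({"PORT": "5000"}))  # ('0.0.0.0', 5000)
```

Passing the environment as an argument keeps the function easy to check without changing real environment variables.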