# ML Models Deployment and Monitoring

In [None]:
#Load the environment for the notebook from Project.toml and Manifest.toml

In [None]:
]instantiate

## Model Deployment

After training and evaluation, the model has to be deployed to serve predictions. It is usually embedded in a larger application or exposed through a web service. Both options need additional logic to prepare the input data and to return the prediction to the user in an appropriate form. Let's consider two examples:
* **JSON-based web service** - a JSON payload with the input observation is sent to the web service and a JSON document with the prediction is returned
* **interactive web application with GUI** - the model is embedded into an application which gathers input from a set of text fields, sliders and other interactive elements, while the prediction is presented on the screen as part of the user interface

As part of the notebook we'll build a simple web service working with JSON data.

In [None]:
using Random
using Statistics
using CSV
using DataFrames
using Flux
using BSON: @save, @load
using JSON
using ProgressMeter

We'll build a regression model to predict the median house value in the Boston suburbs. The dataset comes from the [UCI repository](https://archive.ics.uci.edu/ml/machine-learning-databases/housing/).

Attribute Information:

1. CRIM - per capita crime rate by town
2. ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
3. INDUS - proportion of non-retail business acres per town
4. CHAS - Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
5. NOX - nitric oxides concentration (parts per 10 million)
6. RM - average number of rooms per dwelling
7. AGE - proportion of owner-occupied units built prior to 1940
8. DIS - weighted distances to five Boston employment centres
9. RAD - index of accessibility to radial highways
10. TAX - full-value property-tax rate per \$10,000
11. PTRATIO - pupil-teacher ratio by town
12. B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
13. LSTAT - \% lower status of the population
14. **MEDV - Median value of owner-occupied homes in \$1000's**

In [None]:
#Loading Boston housing data
houses = CSV.read("Boston.csv", DataFrame)
houses = houses[:, Not(:Column1)]
X = transpose(Matrix(houses[!,Not(:medv)]))
y = transpose(houses.medv);

In [None]:
#Neural network model with one dense hidden layer and ReLU activation function
model = Chain(Dense(13 => 42, relu), Dense(42 => 1))
loss(x, y) = Flux.Losses.mse(model(x), y)
parameters = Flux.params(model)
data = [(X, y)]
opt = Flux.Adam(0.002)
@showprogress for epoch in 1:50_000
    Flux.train!(loss, parameters, data, opt)
end

In [None]:
#Auxiliary function for Root Mean Squared Error calculation
RMSE(y, ŷ) = sqrt(mean((y-ŷ).^2));

In [None]:
#RMSE of the neural network on the training data
RMSE(y, model(X))

In [None]:
#Save the model to `boston_nn.bson` file
@save "boston_nn.bson" model

In [None]:
#Let's 'reset' the model variable with nothing and load Flux model from the file
model = nothing
@load "boston_nn.bson" model

In [None]:
#The neural network can generate the predictions after being loaded from the file
model(X[:,1])[1]

We'll use the saved model in a web service built with [Genie.jl](https://github.com/GenieFramework/Genie.jl). `Genie` is part of the broader `GenieFramework` environment, which provides tools for web development in Julia.

Our small app will accept a JSON payload with the values of the independent variables (crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, black, lstat) and use it to produce a median house value prediction. The output will be sent back in JSON form as well.
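
JSON object keys carry no guaranteed order, while the network expects the 13 features in the order used during training. The sketch below illustrates that mapping step in isolation, using only base Julia. The helper name `to_features` and its return shape are assumptions made for this illustration; they are not part of Genie or of the service script.

```julia
# Column order the network was trained on (taken from this notebook).
columns = ["crim","zn","indus","chas","nox","rm","age","dis","rad","tax","ptratio","black","lstat"]

# Hypothetical helper: returns the ordered feature vector, or `nothing`
# together with the list of fields missing from the payload.
function to_features(payload::AbstractDict, columns)
    missing_fields = [c for c in columns if !haskey(payload, c)]
    isempty(missing_fields) || return (features = nothing, missing_fields = missing_fields)
    features = Float64[payload[c] for c in columns]
    return (features = features, missing_fields = missing_fields)
end

# Keys arrive in arbitrary order, as in `house.json`.
payload = Dict("tax" => 296.0, "crim" => 0.00632, "chas" => 0.0, "black" => 396.9,
               "lstat" => 4.98, "age" => 65.2, "indus" => 2.31, "rm" => 6.575,
               "dis" => 4.09, "zn" => 18.0, "nox" => 0.538, "ptratio" => 15.3, "rad" => 1.0)

result = to_features(payload, columns)                    # complete payload
incomplete = to_features(Dict("crim" => 0.1), columns)    # 12 fields missing
```

Indexing the payload by `columns` rather than iterating over its keys is what keeps the feature order stable regardless of how the client serialized the JSON.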

In [None]:
#Saving first observation from the training dataset into `house.json` file
open("house.json","w") do f
    JSON.print(f, Dict(names(houses)[begin:end-1] .=> X[:,1]), 4)
end

In [None]:
#The script below sets up a basic web server accepting GET and POST requests at http://localhost:8000/
using Flux
using BSON: @load
using Genie, Genie.Requests, Genie.Renderer.Json

columns = ["crim","zn","indus","chas","nox","rm","age","dis","rad","tax","ptratio","black","lstat"]
@load "boston_nn.bson" model

route("/") do
"""<div style="white-space:pre">To receive a prediction send POST request with JSON payload.

Example:
>> curl -X POST -d @house.json -H "Content-Type: application/json" http://localhost:8000/
>> cat house.json
{
    "crim": 0.00632,
    "tax": 296.0,
    "chas": 0.0,
    "black": 396.9,
    "lstat": 4.98,
    "age": 65.2,
    "indus": 2.31,
    "rm": 6.575,
    "dis": 4.09,
    "zn": 18.0,
    "nox": 0.538,
    "ptratio": 15.3,
    "rad": 1.0
}</div>"""
end

route("/", method = POST) do
    input_data = jsonpayload()
    keys_json = keys(input_data)
    missing_fields = [k for k in columns if k ∉ keys_json]
    
    if !isempty(missing_fields)
        missing_str = join(missing_fields, ", ")
        Json.json(:error => "The fields: $missing_str are missing from the JSON payload. " *
            "The prediction cannot be returned.")
    else
        try
            Json.json(Dict("input" => input_data,
                        "prediction" => model([input_data[f] for f in columns])[1])
                     )
        catch e
            Json.json(:error => "Ooops! There was a problem while generating a prediction.")
        end
    end
end
#start the server - it will not block the Jupyter due to async=true
up(port=8000, async=true)

After starting the server, you can use `curl` or another tool capable of sending and receiving HTTP requests to interact with the neural network model.

In [None]:
;curl -X POST -d @house.json -H "Content-Type: application/json" http://localhost:8000/

Change the contents of the `house.json` file and rerun the call to the web service. The prediction should change accordingly.

In [None]:
;curl -X POST -d @house.json -H "Content-Type: application/json" http://localhost:8000/

The server is running asynchronously in Jupyter. When you are finished, run the `down()` command to turn it off.

Note that there is a `boston_web_service.jl` script in the directory of this notebook. It makes sense to run the web app outside Jupyter and use the notebook only to interact with the service. You can use the
```shell
julia boston_web_service.jl
```
command to launch the app in the terminal synchronously. It will block your terminal; you can then shut the server down with Ctrl+C.

In [None]:
down()

The app is ready to be published. Right now the ML service is only reachable locally, which limits its usefulness. The `boston_web_service.jl` script can be deployed on a remote machine with a public IP address, optionally with a DNS domain bound to that address, so that the service is available under a friendlier address such as http://boston-predict.com/.

Such a server requires installing all dependencies and configuring them correctly, so operationalizing the app takes additional effort. With this approach, scaling the service and rolling out changes (the next step might be adding a graphical interface) also becomes tedious. Some of these problems can be alleviated by packaging the app into a container, for example with [Docker](https://www.docker.com/). Containerization is a modern technique in application development: the application source code, configuration and all required dependencies are packed into an image that can be easily shared and run on multiple machines.

`Genie` supports Docker-based workflows with dedicated functions for building and running images. More details are available in the [Genie tutorial](https://genieframework.github.io/Genie.jl/dev/tutorials/16--Using_Genie_With_Docker.html). **Note that you'll need [Docker Desktop](https://www.docker.com/products/docker-desktop/) installed to follow the tutorial.**

## Model Monitoring

After the model is deployed, the maintenance and monitoring phase starts. From the technical perspective, the application needs to handle all incoming requests within a reasonable time, provide appropriate error handling, stay stable under normal usage, etc.

Additionally, the model needs to be monitored with regard to predictive performance. Drift in the incoming data (changes in the distribution of the underlying features compared to the training dataset) may degrade the model's quality. The business needs may change over time as well, which in some cases requires retraining the model or redefining the task.
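
As a rough illustration of a drift check, the sketch below measures how far the mean of each feature in a batch of incoming data has moved from the training mean, expressed in training standard deviations. The helper `mean_shift`, the synthetic matrices and the alert threshold of 2.0 are all assumptions made for this example. A production setup would more likely rely on a proper statistical test or a dedicated monitoring tool.

```julia
using Statistics

# Hypothetical helper: per-feature shift of the batch mean, measured in
# training standard deviations. Features are rows and observations are
# columns, matching the layout of `X` in this notebook.
# Note: a constant training feature (zero standard deviation) would yield Inf or NaN.
function mean_shift(X_train::AbstractMatrix, X_new::AbstractMatrix)
    μ = mean(X_train, dims = 2)
    σ = std(X_train, dims = 2)
    return vec(abs.(mean(X_new, dims = 2) .- μ) ./ σ)
end

# Synthetic illustration data (not the Boston dataset).
X_train = [1.0  2.0  3.0  4.0  5.0;
           10.0 20.0 30.0 40.0 50.0]
X_new   = [2.9  3.0  3.1;
           70.0 80.0 90.0]

shift = mean_shift(X_train, X_new)
threshold = 2.0                          # assumed alert level
drifted = findall(>(threshold), shift)   # indices of features flagged as drifted
```

Here only the second feature is flagged, because its incoming values lie far above the range seen during training, while the first feature stays centered on its training mean.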

In more complex deployments, multiple models are involved in the monitoring and maintenance process. Usually the setup includes a 'leading' model and 'auxiliary' models. Commonly used techniques include:
* **champion-challenger approach** - the 'champion' model serves the predictions as the best performing model and its quality metrics are gathered over time; periodically the 'challengers' are evaluated against the new data points; if a challenger scores better than the champion, it may replace it as the new champion and the process continues
* **multi-armed bandits** - the deployed solution contains multiple models capable of serving the prediction; the leading model in terms of predictive quality handles more requests than the remaining models; often each model is assigned a probability of serving the prediction, with the leading model having the highest probability
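
The champion-challenger comparison can be sketched as follows. This is a minimal illustration that assumes labelled new data is available and that RMSE is the metric of interest. The helper `pick_champion` and the stand-in "models" (plain functions rather than trained networks) are hypothetical.

```julia
using Statistics

rmse(y, ŷ) = sqrt(mean((y .- ŷ) .^ 2))

# Hypothetical helper: returns the name and RMSE of the model that scores
# best on fresh data; the current champion keeps its place on ties.
function pick_champion(champion::Pair, challengers, X_new, y_new)
    best_name, champion_model = champion
    best_score = rmse(y_new, champion_model(X_new))
    for (name, m) in challengers
        score = rmse(y_new, m(X_new))
        if score < best_score
            best_name, best_score = name, score
        end
    end
    return best_name, best_score
end

# Stand-in "models": plain functions instead of trained networks.
X_new = [1.0, 2.0, 3.0, 4.0]
y_new = [2.1, 3.9, 6.2, 7.8]
champion    = "constant" => (x -> fill(5.0, length(x)))
challengers = ["double" => (x -> 2 .* x), "identity" => (x -> x)]

new_champion, score = pick_champion(champion, challengers, X_new, y_new)
```

In this toy setup the `"double"` challenger fits the new data best and would replace the champion. With the notebook's networks, the same comparison would be run on observations collected after deployment rather than on the training set.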

In [None]:
#Linear neural network
model_lin = Chain(Dense(13 => 30), Dense(30 => 1))
loss(x, y) = Flux.Losses.mse(model_lin(x), y)
parameters = Flux.params(model_lin)
data = [(X, y)]
opt = Flux.Adam(0.002)
@showprogress for epoch in 1:50_000
    Flux.train!(loss, parameters, data, opt)
end
@save "boston_nn_lin.bson" model_lin

In [None]:
RMSE(y, model_lin(X))

In [None]:
#Neural network with 2 hidden layers
model_2hl = Chain(Dense(13 => 30, relu), Dense(30 => 10, relu), Dense(10 => 1))
loss(x, y) = Flux.Losses.mse(model_2hl(x), y)
parameters = Flux.params(model_2hl)
data = [(X, y)]
opt = Flux.Adam(0.002)
@showprogress for epoch in 1:60_000
    Flux.train!(loss, parameters, data, opt)
end
@save "boston_nn_2hl.bson" model_2hl

In [None]:
RMSE(y, model_2hl(X))

In [None]:
#Epsilon greedy 3-armed bandit
using Flux
using BSON: @load
using Genie, Genie.Requests, Genie.Renderer.Json

columns = ["crim","zn","indus","chas","nox","rm","age","dis","rad","tax","ptratio","black","lstat"]
@load "boston_nn.bson" model
@load "boston_nn_lin.bson" model_lin
@load "boston_nn_2hl.bson" model_2hl

ϵ = 0.5
bandits = [("ReLU Neural Network", model), 
            ("Linear Neural Network", model_lin), 
            ("Neural Network with Two Hidden Layers", model_2hl)]
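#Cumulative thresholds: the first (leading) model is picked with probability 1-ϵ, the others share ϵ equally
pick_probs = (1-ϵ):ϵ/(length(bandits)-1):1.0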

route("/", method = POST) do
    input_data = jsonpayload()
    keys_json = keys(input_data)
    missing_fields = [k for k in columns if k ∉ keys_json]
    
    if !isempty(missing_fields)
        missing_str = join(missing_fields, ", ")
        Json.json(:error => "The fields: $missing_str are missing from the JSON payload. " *
            "The prediction cannot be returned.")
    else     
        try
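            #argmin on a Bool vector returns the first `false`, i.e. the first threshold above the random draw
            (bandit_name, bandit) = bandits[argmin(pick_probs .<= rand())]
            println(bandit_name)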
            Json.json(Dict("input" => input_data,
                        "prediction" => bandit([input_data[f] for f in columns])[1],
                        "model" => bandit_name)
                     )
        catch e
            Json.json(:error => "Ooops! There was a problem while generating a prediction.")
        end
    end
end
#start the server - it will not block the Jupyter due to async=true
up(port=8000, async=true)

Run a few calls to the web server and see how the model serving the predictions changes.

In [None]:
;curl -X POST -d @house.json -H "Content-Type: application/json" http://localhost:8000/
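
The selection rule can also be checked offline, without the web server. The sketch below replays the threshold logic of the service for the cumulative thresholds 0.5, 0.75 and 1.0 (the values that result from ϵ = 0.5 and three models) and counts how often each model index is picked. The seed and the number of draws are arbitrary choices made for this illustration.

```julia
using Random

n_models = 3
# Cumulative thresholds as used by the service for ϵ = 0.5 and three models.
pick_probs = [0.5, 0.75, 1.0]

# Index of the first threshold strictly above the random draw `r`.
pick(r) = argmin(pick_probs .<= r)

rng = MersenneTwister(42)     # fixed seed, chosen arbitrarily
n_draws = 10_000
counts = zeros(Int, n_models)
for _ in 1:n_draws
    counts[pick(rand(rng))] += 1
end
frequencies = counts ./ n_draws
```

The observed frequencies should land close to 0.5, 0.25 and 0.25, so the leading model serves roughly half of the requests and the other two split the rest.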

You can also run the server outside Jupyter using
```shell
julia boston_multi_armed.jl
```

In [None]:
#Tear the server down when finished
down()