# Boxkite demo

This demo shows end-to-end model training in Kubeflow notebooks, model management in MLflow, and monitoring of model input and output distributions (data and concept drift) in Prometheus and Grafana.

## Train a diabetes model

First we import our dependencies and turn on MLflow autologging, so that the scikit-learn model we train automatically has its parameters, metrics and the model itself logged to MLflow.

In [None]:
import pickle
import os

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from boxkite.monitoring.service import ModelMonitoringService

import mlflow
mlflow.sklearn.autolog()

Then we train a linear regression model on scikit-learn's diabetes dataset. The model is logged to MLflow, and at the same time we use Boxkite to record histograms (statistical distributions) of the features (input data) from the training split and of the inferences (predictions) on the test split.

In [None]:
with mlflow.start_run() as run:

    bunch = load_diabetes()
    X_train, X_test, Y_train, Y_test = train_test_split(
        bunch.data, bunch.target
    )
    model = LinearRegression()
    model.fit(X_train, Y_train)

    print("Score: %.2f" % model.score(X_test, Y_test))
    with open("./model.pkl", "wb") as f:
        pickle.dump(model, f)

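    # features = [("age", array([...])), ("sex", array([...])), ...]
    # One (name, column) pair per feature, taken from the training split.
    # The diabetes features are already mean-centred and scaled, so the
    # columns hold small floats rather than raw ages.
    features = zip(bunch.feature_names, X_train.T)

    # Predictions on the held-out test split form the inference histogram.
    Y_pred = model.predict(X_test)
    inference = list(Y_pred)

    ModelMonitoringService.export_text(
        features=features, inference=inference, path="./histogram.txt",
    )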
    mlflow.log_artifact("./histogram.txt")
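To make "histogram" concrete: a histogram splits a value range into bins and counts how many values fall into each. The sketch below illustrates the idea with equal-width bins using only the standard library. It is not Boxkite's implementation or file format; the helper name `fixed_width_histogram` and the default of 10 equal-width bins are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def fixed_width_histogram(
    values: Sequence[float], n_bins: int = 10
) -> Tuple[List[float], List[int]]:
    """Hypothetical helper: split [min, max] into n_bins equal-width bins and count values per bin."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when every value is identical.
    width = (hi - lo) / n_bins or 1.0
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for v in values:
        # Clamp so the maximum value lands in the last bin instead of overflowing.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return edges, counts
```

In the notebook you could call it on `X_train[:, 0]` to look at the training distribution of the first feature (`age`).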

The model has been automatically logged to MLflow under `run.info.run_id`. This unique MLflow identifier pins down the exact model version, together with the histogram we saved from the training features and the model's predictions on the test split.

So let's check that we can load the model back from MLflow and run some inferences with it:

In [None]:
import pandas as pd

# "runs:/<run_id>/model" resolves to the model that autologging stored under
# this run, without hard-coding the artifact bucket or experiment ID.
logged_model = f"runs:/{run.info.run_id}/model"

# Load the model as a PyFuncModel.
loaded_model = mlflow.pyfunc.load_model(logged_model)

# Predict on a pandas DataFrame.
loaded_model.predict(pd.DataFrame(X_test))

Looks good! Now let's deploy the model to the same Kubernetes cluster we're running on.

## Deploy the model to production in HA mode

Now we're going to deploy the code in https://github.com/boxkite-ml/boxkite/blob/master/examples/kubeflow-mlflow/app/serve_completed.py to the Kubernetes cluster. It has already been packaged as a Docker image (`quay.io/boxkite/boxkite-app:e7a70df`), so you don't need to build one.

That code fetches the model and its histogram from MLflow and serves the model alongside a `/metrics` endpoint. This lets you compare, live and across multiple model servers, the training-time feature and inference distributions against the feature and inference distributions seen at serving time.
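The demo leaves the comparison itself to Grafana dashboards. If you want a single number for "how far has this distribution moved", one common choice is the population stability index (PSI), computed over two histograms that share the same bins. This is a swapped-in technique for illustration, not something Boxkite is stated to compute; the function name and the small `eps` used to keep empty bins finite are assumptions.

```python
import math
from typing import Sequence


def population_stability_index(
    expected_counts: Sequence[int], actual_counts: Sequence[int], eps: float = 1e-6
) -> float:
    """PSI between a baseline (training) histogram and a live histogram with the same bins."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("histograms must have the same number of bins")
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    if e_total == 0 or a_total == 0:
        raise ValueError("histograms must not be empty")
    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Convert counts to proportions; eps keeps log() finite for empty bins.
        e_frac = max(e / e_total, eps)
        a_frac = max(a / a_total, eps)
        psi += (a_frac - e_frac) * math.log(a_frac / e_frac)
    return psi
```

Identical distributions give 0, and every bin contributes a non-negative term, so larger values mean more drift. Thresholds around 0.1 and 0.25 are often quoted as rules of thumb, but treat them as a starting point rather than a standard.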

The key parts of the code (see above link for the full listing) are:

The prediction endpoint. Note how each request is logged with `monitor.log_prediction`:

```python
@app.route("/", methods=["POST"])
def predict():
    features = request.json
    score = model.predict([features])[0]
    pid = monitor.log_prediction(
        request_body=request.data,
        features=features,
        output=score,
    )
    return {"result": score, "prediction_id": pid}
```

And the endpoint that exposes the Prometheus-format metrics:

```python
@app.route("/metrics", methods=["GET"])
def metrics():
    return monitor.export_http()[0]
```
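Prometheus' text exposition format represents a histogram as cumulative `_bucket` series, each labelled with an upper bound `le`. Assuming Boxkite exports feature distributions as standard Prometheus histograms, the sketch below turns one such series back into per-bin counts, which can help when eyeballing a `/metrics` response. The metric name in `SAMPLE` is made up, and the parser assumes a single series per metric and no trailing timestamps.

```python
import re
from typing import List, Tuple

# Match `le` only at the start of the label set or after a comma,
# so a label such as `handle="..."` is not mistaken for it.
_LE_RE = re.compile(r'[{,]le="([^"]+)"')

# Hypothetical exposition text for a metric called `demo_feature`.
SAMPLE = """# TYPE demo_feature histogram
demo_feature_bucket{le="0.0"} 2.0
demo_feature_bucket{le="1.0"} 5.0
demo_feature_bucket{le="+Inf"} 6.0
demo_feature_count 6.0
demo_feature_sum 3.5
"""


def bucket_counts(exposition: str, metric: str) -> List[Tuple[float, float]]:
    """Return (upper_bound, count_in_bin) pairs for a single histogram series."""
    cumulative: List[Tuple[float, float]] = []
    prefix = metric + "_bucket{"
    for line in exposition.splitlines():
        if not line.startswith(prefix):
            continue
        # Assumes "<name>{<labels>} <value>" with no timestamp.
        series, _, value = line.rpartition(" ")
        match = _LE_RE.search(series)
        if match:
            # float() parses the "+Inf" upper bound as infinity.
            cumulative.append((float(match.group(1)), float(value)))
    cumulative.sort()
    # Bucket counts are cumulative, so each bin is the difference from the previous one.
    per_bin: List[Tuple[float, float]] = []
    previous = 0.0
    for upper_bound, total in cumulative:
        per_bin.append((upper_bound, total - previous))
        previous = total
    return per_bin
```

For `SAMPLE`, the three bins hold 2, 3 and 1 observations respectively.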

With these small changes to your model server, you can now track data drift and concept drift live!

Now we'll deploy it to Kubernetes with some boilerplate Deployment and Service configuration. The Deployment runs three replicas of the model server for high availability, and its `prometheus.io/scrape` annotation marks the pods for Prometheus to scrape. The Service exposes those pods inside the cluster behind a single load-balanced name, `ml-server`.

In [None]:
deployment = f"""
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-deployment
  labels:
    app: ml-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ml-server
  template:
    metadata:
      labels:
        app: ml-server
      annotations:
        prometheus.io/scrape: "true"
    spec:
      containers:
      - name: ml-server
        image: quay.io/boxkite/boxkite-app:e7a70df
        ports:
        - containerPort: 5000
        env:
        # Values are quoted so YAML always parses them as strings.
        - name: MLFLOW_RUN_ID
          value: "{run.info.run_id}"
        - name: MLFLOW_TRACKING_URI
          value: "{os.environ['MLFLOW_TRACKING_URI']}"
        - name: MLFLOW_S3_ENDPOINT_URL
          value: "{os.environ['MLFLOW_S3_ENDPOINT_URL']}"
        - name: AWS_ACCESS_KEY_ID
          value: "{os.environ['AWS_ACCESS_KEY_ID']}"
        - name: AWS_SECRET_ACCESS_KEY
          value: "{os.environ['AWS_SECRET_ACCESS_KEY']}"
"""

service = """
apiVersion: v1
kind: Service
metadata:
  name: ml-server
spec:
  selector:
    app: ml-server
  ports:
    - protocol: TCP
      port: 80
      targetPort: 5000
"""
# Write the manifests to disk, then apply them to the cluster.
with open("deployment.yaml", "w") as f:
    f.write(deployment)
with open("service.yaml", "w") as f:
    f.write(service)
!kubectl apply -f deployment.yaml
!kubectl apply -f service.yaml
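Because the Deployment manifest is assembled by string formatting, a value containing YAML-special characters could change how it is parsed. One alternative is to build the manifest as a Python dict and serialise it with `json.dumps`, which quotes every string for you; `kubectl apply -f` accepts JSON manifests as well as YAML. The helper name `build_deployment` is hypothetical, and the sketch only mirrors the fields used above.

```python
import json
from typing import Dict


def build_deployment(run_id: str, env: Dict[str, str], replicas: int = 3) -> str:
    """Hypothetical helper: render the ml-deployment manifest as JSON."""
    labels = {"app": "ml-server"}
    container = {
        "name": "ml-server",
        "image": "quay.io/boxkite/boxkite-app:e7a70df",
        "ports": [{"containerPort": 5000}],
        # Kubernetes requires env values to be strings, so coerce explicitly.
        "env": [{"name": "MLFLOW_RUN_ID", "value": str(run_id)}]
        + [{"name": k, "value": str(v)} for k, v in env.items()],
    }
    manifest = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "ml-deployment", "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {
                    "labels": labels,
                    "annotations": {"prometheus.io/scrape": "true"},
                },
                "spec": {"containers": [container]},
            },
        },
    }
    return json.dumps(manifest, indent=2)
```

In the notebook you might pass `run.info.run_id` and a dict built from `os.environ` for the four MLflow and AWS variables, write the result to `deployment.json`, and run `kubectl apply -f deployment.json`.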

Now we wait for the deployment to come up:

In [None]:
!kubectl rollout status deployment/ml-deployment

Now let's send a sample request to the model server:

In [None]:
!curl ml-server -H "Content-Type: application/json" \
-d "[0.03, 0.05, -0.002, -0.01, 0.04, 0.01, 0.08, -0.04, 0.005, -0.1]"
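If you'd rather call the model server from Python than from `curl`, here is a minimal sketch using only the standard library. The helper names are hypothetical; the response keys `result` and `prediction_id` come from the server's `predict()` endpoint. As with `curl`, the hostname `ml-server` only resolves from inside the cluster.

```python
import json
import urllib.request
from typing import List


def build_prediction_request(url: str, features: List[float]) -> urllib.request.Request:
    """Build a JSON POST request for the model server's "/" endpoint."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(url: str, features: List[float], timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON reply ({"result": ..., "prediction_id": ...})."""
    request = build_prediction_request(url, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `predict("http://ml-server/", [0.03, 0.05, -0.002, -0.01, 0.04, 0.01, 0.08, -0.04, 0.005, -0.1])` sends the same payload as the `curl` command.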

Hooray, we got a result! Now let's run a more substantial load test against the model to get some interesting data into our dashboard:

In [None]:
!python boxkite/examples/kubeflow-mlflow/load.py
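We don't reproduce `load.py` here. If you want to drive the dashboards yourself, the sketch below generates synthetic request payloads with a controllable `shift`, so that a shifted batch should stand out against the training histograms. Everything here is a hypothetical illustration: the diabetes features are scaled so that each column's sum of squares is 1, which gives a standard deviation of roughly 0.05, and the generator simply assumes independent Gaussian features with that spread.

```python
import random
from typing import List


def synthetic_payloads(
    n: int, n_features: int = 10, shift: float = 0.0, seed: int = 0
) -> List[List[float]]:
    """Hypothetical load generator: Gaussian feature vectors, optionally shifted to simulate drift."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    # A standard deviation of 0.05 roughly matches the scaled diabetes features.
    return [[rng.gauss(shift, 0.05) for _ in range(n_features)] for _ in range(n)]
```

Each payload can be POSTed to the `ml-server` service as a JSON list, exactly like the `curl` example; a non-zero `shift` moves every feature away from the training distribution.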

Great! Now continue with the tutorial in the Boxkite docs to see how to log into Grafana and observe the results of the load test.