# ✍️ Exercise: Intro to MLFlow - Part III

Now that we have logged models into MLFlow, it's time to learn how to register them and deploy them to a production environment.

In this exercise, we will:

- Load a regression dataset
- Train a model
- Log the model into MLFlow
- Register the model
- Stage the model into production/development
- Deploy the model using MLFlow

In [7]:
from sklearn import datasets


# Download dataset and convert to pandas dataframe
diabetes_dataset = datasets.load_diabetes(as_frame=True)
X = diabetes_dataset.data
y = diabetes_dataset.target

## Exercise I: Split the Data into Train and Test Sets

💡 Remember that we need to split our data into train and test sets. We can use the [`train_test_split` function](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) from `sklearn.model_selection` to do this. We should store the split into `X_train`, `y_train`, `X_test`, `y_test`.

In [8]:
from sklearn.model_selection import train_test_split


RANDOM_STATE = 42
TEST_SIZE = 0.2

# 👇 Add the relevant code below to split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=TEST_SIZE, random_state=RANDOM_STATE)
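As a quick sanity check on the split, the sizes of the two sets can be worked out by hand. The sketch below assumes that scikit-learn rounds the test set size up when `test_size` is a fraction, and that the diabetes dataset has 442 rows; the variable names are only illustrative.

```python
import math

N_SAMPLES = 442   # rows in the diabetes dataset (assumed here, not read from the data)
TEST_SIZE = 0.2   # same fraction as in the cell above

# A fractional test size is rounded up to a whole number of rows
n_test = math.ceil(TEST_SIZE * N_SAMPLES)
n_train = N_SAMPLES - n_test

print(n_train, n_test)
```

Under these assumptions the split gives 353 training rows and 89 test rows, which you can compare against `X_train.shape` and `X_test.shape`.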

## Exercise II: Train a Linear Regression Model

Then, train a [**linear regression model** using the scikit-learn library](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html).

1. 👉 Initialize the model calling the `LinearRegression` class.
2. 👉 Train the model using the `fit` method.

In [9]:
from sklearn.linear_model import LinearRegression


# Add code to train the model 👇
model = LinearRegression()
model.fit(X_train, y_train)

## Exercise III: Compute the Error of the Model

Finally, evaluate the model using the [`root_mean_squared_error` function](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.root_mean_squared_error.html) from the `sklearn.metrics` module. Since this is a regression problem, we measure the prediction error (RMSE) rather than accuracy.

1. 👉 Compute the predictions by passing `X_test` to the `predict` method of the model.
2. 👉 Compute the RMSE by passing `y_test` and the predictions to the `root_mean_squared_error` function.
3. 👉 Print the RMSE.

In [10]:
from sklearn.metrics import root_mean_squared_error


# Add code to compute the predictions and the root mean squared error 👇
y_pred = model.predict(X_test)
y_pred.shape, y_test.shape


((89,), (89,))

In [10]:
rmse = root_mean_squared_error(y_test, y_pred)
rmse

53.85344583676593
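The metric above is the root mean squared error (RMSE): the square root of the average squared difference between true and predicted values. A minimal sketch in plain Python, using a hypothetical helper `rmse_by_hand` and made-up toy numbers rather than the diabetes data, shows the arithmetic:

```python
import math


def rmse_by_hand(y_true, y_pred):
    """Return the root mean squared error of two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values for illustration only (the errors are 0, 2 and -2)
toy_true = [100.0, 150.0, 200.0]
toy_pred = [100.0, 148.0, 202.0]

print(rmse_by_hand(toy_true, toy_pred))
```

Because RMSE is expressed in the same units as the target, a value near 54 means that predictions are typically off by roughly 54 units of the disease progression score.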

## Exercise IV: Create a Run and Log the Model and Metrics

1. 👉 Connect to MLFlow
2. 👉 Set the experiment "Diabetes Linear Regression"

In [11]:
import mlflow


EXPERIMENT_NAME = "Diabetes Linear Regression"
MLFLOW_TRACKING_URI = "http://localhost:5000"

# Connect to MLFlow 👇
mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
mlflow.set_experiment(EXPERIMENT_NAME)

<Experiment: artifact_location='mlflow-artifacts:/228036213947662562', creation_time=1730320093975, experiment_id='228036213947662562', last_update_time=1730320093975, lifecycle_stage='active', name='Diabetes Linear Regression', tags={}>


1. 👉 Log the root mean squared error metric using the `mlflow.log_metric` function.
2. 👉 Log the model using the `mlflow.sklearn.log_model` function.

In [12]:
# Launch a run to log the model
with mlflow.start_run() as run:

    # Add code to log the model and the root mean squared error 👇
    mlflow.log_metric("rmse", rmse)  # log the value computed in Exercise III
    mlflow.sklearn.log_model(model, "model", input_example=X_test[:1])

2024/12/22 19:07:17 INFO mlflow.tracking._tracking_service.client: 🏃 View run magnificent-eel-617 at: http://localhost:5000/#/experiments/228036213947662562/runs/588664b3ad054699bfc512039aebf2cb.
2024/12/22 19:07:17 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://localhost:5000/#/experiments/228036213947662562.


## Exercise V: Register the Model

Registering a model in MLFlow is a way to keep track of the different versions of the same model. Each registered model has numbered versions that track changes to the model over time.

1. 👉 Get the **run ID** of the model you want to register using `run.info.run_id`.
2. 👉 Register the model using the `mlflow.register_model` function.

In [None]:
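# Register the model for this run
MODEL_NAME = "diabetes_prediction"  # change this to your model name

# Get the ID of the run that logged the model
run_id = run.info.run_id

# Compute model path: models stored in a run follow this convention
model_path = f"runs:/{run_id}/model"

# Register the model: this creates a new version under MODEL_NAME
mlflow.register_model(model_path, MODEL_NAME)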
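MLFlow addresses models through URIs. The `runs:/` form used above points at an artifact logged by a run, while the `models:/` form points at a version of a registered model. The helpers below are hypothetical, written only to make the two string formats explicit; the run ID is the one from the log output in Exercise IV, and version `1` is an assumption.

```python
def run_model_uri(run_id, artifact_path="model"):
    """URI of a model logged under `artifact_path` inside a given run."""
    return f"runs:/{run_id}/{artifact_path}"


def registered_model_uri(name, version):
    """URI of one version of a registered model."""
    return f"models:/{name}/{version}"


# Run ID copied from the Exercise IV log output; version 1 is an assumption
print(run_model_uri("588664b3ad054699bfc512039aebf2cb"))
print(registered_model_uri("diabetes_prediction", 1))
```

A `runs:/` URI can be passed to `mlflow.register_model` as the model to register, and a `models:/` URI is the form that `mlflow models serve` accepts in `--model-uri`.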

## Exercise VI: Deploy a model

Deploying a model is a complex task that involves many steps. MLFlow simplifies this process by providing a set of tools to deploy models to different platforms. In this exercise, we will deploy a model to a local server. 

First, you need to connect the terminal to the MLFlow Server by setting the `MLFLOW_TRACKING_URI` environment variable. 

```bash
export MLFLOW_TRACKING_URI=http://localhost:5000
```

Then, you can deploy the model using the `mlflow models serve` command **in your terminal**:

```bash
mlflow models serve --model-uri models:/<model_name>/<model_version> --port 5001
```

Here, `<model_name>` is the name of the registered model and `<model_version>` is the version you want to deploy; you can find both in the MLFlow UI. The `--port` argument sets the port the model server listens on. Choose a port other than `5000`, which the MLFlow server is already using.
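If you prefer to assemble the command from Python, for example to print it with the model name already filled in, a small sketch could look like this. The function name `build_serve_command` is hypothetical, and version `1` is an assumption; check the MLFlow UI for the real version.

```python
import shlex

TRACKING_PORT = 5000  # port used by the MLFlow tracking server in this exercise


def build_serve_command(model_name, model_version, port=5001):
    """Return the `mlflow models serve` command as a shell-ready string."""
    if port == TRACKING_PORT:
        raise ValueError("Choose a port other than the tracking server's port")
    args = [
        "mlflow", "models", "serve",
        "--model-uri", f"models:/{model_name}/{model_version}",
        "--port", str(port),
    ]
    return shlex.join(args)


print(build_serve_command("diabetes_prediction", 1))
```

The guard on the port encodes the advice above: the model server and the tracking server cannot listen on the same port.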

## BONUS: Make a request to the model

Finally, send a request to the deployed model using the `requests` library:

In [15]:
import requests
import json
import numpy as np

# Define the URL and payload (JSON data)
url = 'http://localhost:5001/invocations'
headers = {'Content-Type': 'application/json'}

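# Create a random input batch of shape (2, 8) as a nested list
# NOTE: the model expects 10 features per row, so this shape is rejected (see the output below)
input_vector = np.random.rand(2, 8).tolist()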

# Create a dictionary with the 'inputs' key and the input_vector
payload = {'inputs': input_vector}

# Convert the payload to JSON format
json_payload = json.dumps(payload)

# Make a POST request
response = requests.post(url, headers=headers, data=json_payload)

# Check the response
if response.status_code == 200:
    print("Request successful. Response:")
    print(response.text)
else:
    print(f"Request failed with status code {response.status_code}")
    print(response.text)

Request failed with status code 400
{"error_code": "INVALID_PARAMETER_VALUE", "message": "Failed to predict data '[[0.21101822 0.99919697 0.08413026 0.18916995 0.731338   0.63528183\n  0.64591351 0.62140125]\n [0.22270805 0.92695482 0.45520974 0.87563821 0.40012676 0.57366854\n  0.66006406 0.40727353]]'. \nError: Failed to enforce schema of data '[[0.21101822 0.99919697 0.08413026 0.18916995 0.731338   0.63528183\n  0.64591351 0.62140125]\n [0.22270805 0.92695482 0.45520974 0.87563821 0.40012676 0.57366854\n  0.66006406 0.40727353]]' with schema '[Tensor('float64', (-1, 10))]'. Error: Shape of input (2, 8) does not match expected shape (-1, 10)."}
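The `400` response above comes from a shape mismatch: the request sends 8 values per row, while the schema in the error message expects 10 (`(-1, 10)`), one per feature of the diabetes dataset. A sketch of a request with the expected shape follows, written with the standard library only so that it runs without extra packages. `build_payload` and `post_json` are hypothetical helpers, and the random values are placeholders, not realistic patient data.

```python
import json
import random
import urllib.request

N_FEATURES = 10  # taken from the schema in the error message: (-1, 10)
URL = "http://localhost:5001/invocations"


def build_payload(n_rows, n_features=N_FEATURES):
    """Build an 'inputs' payload with n_rows rows of n_features random floats."""
    rows = [[random.random() for _ in range(n_features)] for _ in range(n_rows)]
    return {"inputs": rows}


def post_json(url, payload):
    """POST the payload as JSON and return the response body as text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


payload = build_payload(n_rows=2)
print(len(payload["inputs"]), len(payload["inputs"][0]))

# To query the served model (requires the server from Exercise VI to be running):
# print(post_json(URL, payload))
```

The same change applies to the `requests` version: replacing `np.random.rand(2, 8)` with `np.random.rand(2, 10)` should give the server an input that matches its schema.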
