# Track an Experiment with MLflow

In this exercise, you will train a machine learning model and apply MLOps principles to manage and monitor your machine learning pipeline.

Since you have already trained initial models (Logistic Regression and Random Forest) in a previous exercise, you will track these experiments using MLflow to monitor model performance over multiple runs.

You will need:
- `MLflow` installed (`pip install mlflow`).

### Objective

By the end of this exercise, you will be able to set up MLflow for experiment tracking, log model metrics, compare multiple runs, and deploy a model using MLflow's capabilities.


### Step 1: Set Up MLflow and Required Libraries

**Task:** Import the necessary libraries for MLflow and scikit-learn.

- Import libraries such as `mlflow`, `mlflow.sklearn`, `pandas`, and any others you need.


In [1]:
import pandas as pd
import mlflow
import mlflow.sklearn
from sklearn.preprocessing import LabelEncoder

### Step 2: Create and Track an Experiment with MLflow

**Task:** Define a new experiment in MLflow.

- What command should you use to set up a new experiment in MLflow?
- Set up an experiment with the name "Wearable_Device_Stress_Classifier"


In [106]:
mlflow.set_experiment("Wearable_Device_Stress_Classifier_experiment")

Traceback (most recent call last):
  File "/Users/nedday/Documents/All Projects /Processus data/tp1/lib/python3.9/site-packages/mlflow/store/tracking/file_store.py", line 327, in search_experiments
    exp = self._get_experiment(exp_id, view_type)
  File "/Users/nedday/Documents/All Projects /Processus data/tp1/lib/python3.9/site-packages/mlflow/store/tracking/file_store.py", line 421, in _get_experiment
    meta = FileStore._read_yaml(experiment_dir, FileStore.META_DATA_FILE_NAME)
  File "/Users/nedday/Documents/All Projects /Processus data/tp1/lib/python3.9/site-packages/mlflow/store/tracking/file_store.py", line 1367, in _read_yaml
    return _read_helper(root, file_name, attempts_remaining=retries)
  File "/Users/nedday/Documents/All Projects /Processus data/tp1/lib/python3.9/site-packages/mlflow/store/tracking/file_store.py", line 1360, in _read_helper
    result = read_yaml(root, file_name)
  File "/Users/nedday/Documents/All Projects /Processus data/tp1/lib/python3.9/site-packag

<Experiment: artifact_location='file:///Users/nedday/Documents/All%20Projects%20/Processus%20data/mlruns/377287037009484942', creation_time=1731073639311, experiment_id='377287037009484942', last_update_time=1731073639311, lifecycle_stage='active', name='Wearable_Device_Stress_Classifier_experiment', tags={}>

**Task:** Track training metrics and log model parameters.

- How would you start an MLflow run to log your experiment details?
- Train a Logistic Regression model and track its metrics (e.g., accuracy, precision, recall, F1 score) using MLflow.
- Log the model using `mlflow.sklearn.log_model()`.

**Hint:** You should use `with mlflow.start_run():` to start an MLflow run.


In [107]:
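from sklearn.model_selection import train_test_split

# Load the wearable-device dataset
data = pd.read_csv("data/device.csv")

# Encode the categorical columns as integers, with one encoder per column
# so that each mapping can still be inverted later
stress_encoder = LabelEncoder()
activity_encoder = LabelEncoder()
data['Stressed State'] = stress_encoder.fit_transform(data['Stressed State'])
data['Activity Type'] = activity_encoder.fit_transform(data['Activity Type'])

# Separate the features from the target
X = data.drop('Stressed State', axis=1)
y = data['Stressed State']

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)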


In [109]:
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Set up the MLflow experiment
mlflow.set_experiment("Wearable_Device_Stress_Classifier_experiment")

# Hyperparameters, defined once so the logged values match the trained model
params = {'C': 0.01, 'solver': 'lbfgs', 'max_iter': 1000}

# Start the MLflow run
with mlflow.start_run():
    # Initialize and train the model
    model = LogisticRegression(**params)
    model.fit(X_train, y_train)

    # Make predictions on the test set
    y_pred = model.predict(X_test)

    # Log model parameters
    mlflow.log_param("model_type", "LogisticRegression")
    mlflow.log_params(params)
    mlflow.log_param("random_state", model.random_state)

    # Calculate metrics
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred, average="weighted")
    recall = recall_score(y_test, y_pred, average="weighted")
    f1 = f1_score(y_test, y_pred, average="weighted")

    # Log metrics
    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_metric("precision", precision)
    mlflow.log_metric("recall", recall)
    mlflow.log_metric("f1_score", f1)

    # Log the trained model
    mlflow.sklearn.log_model(model, "logistic_regression_model")

    print("Model and metrics logged to MLflow.")

Model and metrics logged to MLflow.


### Step 3: Compare Multiple Runs

- **Task:** Modify the model parameters (e.g., change the solver type or the regularization strength) and re-run the experiment so that each configuration is logged as a new run.
- **Open the MLflow UI:** Start the MLflow UI by running:
  ```bash
  mlflow ui
  ```
- Navigate to `http://127.0.0.1:5000` to compare different experiment runs visually. Observe how changing parameters affects the metrics.


In [41]:
!mlflow ui

[2024-11-08 12:20:38 +0100] [35921] [INFO] Starting gunicorn 23.0.0
[2024-11-08 12:20:38 +0100] [35921] [INFO] Listening at: http://127.0.0.1:5000 (35921)
[2024-11-08 12:20:38 +0100] [35921] [INFO] Using worker: sync
[2024-11-08 12:20:38 +0100] [35922] [INFO] Booting worker with pid: 35922
[2024-11-08 12:20:38 +0100] [35923] [INFO] Booting worker with pid: 35923
[2024-11-08 12:20:38 +0100] [35924] [INFO] Booting worker with pid: 35924
[2024-11-08 12:20:38 +0100] [35925] [INFO] Booting worker with pid: 35925
^C
[2024-11-08 12:21:42 +0100] [35921] [INFO] Handling signal: int
[2024-11-08 12:21:42 +0100] [35923] [INFO] Worker exiting (pid: 35923)
[2024-11-08 12:21:42 +0100] [35922] [INFO] Worker exiting (pid: 35922)
[2024-11-08 12:21:42 +0100] [35925] [INFO] Worker exiting (pid: 35925)
[2024-11-08 12:21:42 +0100] [35924] [INFO] Worker exiting (pid: 35924)


### Step 4: Log additional artifacts

**Task:** Log a confusion matrix as an artifact in MLflow.

- How would you log a confusion matrix plot to MLflow?
- Plot a confusion matrix for your predictions and log it as an artifact.

**Hint:** Use `seaborn` for plotting and `mlflow.log_artifact()` for logging the plot.


In [110]:
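from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

# Compute the confusion matrix from the test-set predictions of Step 2
cm = confusion_matrix(y_test, y_pred)

# Plot it as an annotated heatmap (the tick labels assume a binary target)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Predicted Negative', 'Predicted Positive'],
            yticklabels=['True Negative', 'True Positive'])
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('True')

# Save the plot as a file to log it
plot_filename = "confusion_matrix.png"
plt.savefig(plot_filename)
plt.close()

# Reopen the most recent run so the plot is stored next to its metrics;
# a bare log_artifact() call here would implicitly start a new run instead
with mlflow.start_run(run_id=mlflow.last_active_run().info.run_id):
    mlflow.log_artifact(plot_filename)
print("Confusion matrix plot has been logged as an artifact.")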

Confusion matrix plot has been logged as an artifact.


### Step 5: Model packaging with MLflow

**Task:** Register the trained model in the MLflow Model Registry.

- How can you register the best model for versioning and potential deployment?
- Use the `mlflow.register_model()` function to add your model to the registry.



In [111]:
import mlflow
import mlflow.sklearn

# Initialize and train the model
model = LogisticRegression(C=0.01, solver='lbfgs', max_iter=1000)
model.fit(X_train, y_train)

# Log the model to the active run (MLflow starts one implicitly if needed)
mlflow.sklearn.log_model(model, "logistic_regression_model")

# URI of the model logged in the run
model_uri = f"runs:/{mlflow.active_run().info.run_id}/logistic_regression_model"

# Register the model in the MLflow Model Registry
mlflow.register_model(model_uri, "Wearable_Device_Stress_Classifier_Model")

# Close the run so that it does not stay active for later cells
mlflow.end_run()

print("Model registered in the MLflow registry.")


Model registered in the MLflow registry.


Registered model 'Wearable_Device_Stress_Classifier_Model' already exists. Creating a new version of this model...
Created version '2' of model 'Wearable_Device_Stress_Classifier_Model'.


### Step 6: Deploy the model as a REST API

**Task:** Deploy the registered model as a REST API using MLflow.

- How would you serve the model locally?
- Use `mlflow models serve` to start a local REST API.

```bash
mlflow models serve -m "models:/Wearable_Device_Stress_Classifier_Model/1" -p 1234
```

**Task:** Make a prediction request to your model's REST API.

- Use Python's `requests` library to send a JSON request to the REST API and get predictions.


In [112]:

import json
import requests

# Use the first test row as an example
sample = X_test.iloc[:1]
input_data = {
    "dataframe_split": {
        "columns": list(sample.columns),
        "data": sample.values.tolist()
    }
}

# REST API URL
url = "http://127.0.0.1:1234/invocations"

# Send the POST request
headers = {"Content-Type": "application/json"}
response = requests.post(url, headers=headers, data=json.dumps(input_data))

# Display the prediction result
if response.status_code == 200:
    prediction = response.json()
    print("Prediction:", prediction)
else:
    print("Error:", response.status_code, response.text)


Prediction: {'predictions': [1]}


### Step 7: Manage model versions

**Task:** Explore the MLflow Model Registry.

- How can you manage different versions of your model in MLflow?
- Experiment with updating the registered model after retraining with different hyperparameters.


In [113]:
from mlflow.tracking import MlflowClient

# Create an MLflow client
client = MlflowClient()

# List the registered models and their latest versions
registered_models = client.search_registered_models()
for registered_model in registered_models:
    print(f"Model name: {registered_model.name}")
    for version in registered_model.latest_versions:
        print(f"  Version: {version.version} | Stage: {version.current_stage} | Status: {version.status}")


Model name: Wearable_Device_Stress_Classifier_Model
  Version: 2 | Stage: None | Status: READY


In [114]:
# List every version of one registered model
model_name = "Wearable_Device_Stress_Classifier_Model"
model_versions = client.search_model_versions(f"name='{model_name}'")

for version in model_versions:
    print(f"Version: {version.version} | Stage: {version.current_stage} | Status: {version.status}")


Version: 2 | Stage: None | Status: READY
Version: 1 | Stage: None | Status: READY


### Step 8: Automated workflow

**Task:** Use MLflow Projects to automate the workflow.

- Create an `MLproject` file to define the script and dependencies in a reproducible manner.
- Run the project locally to validate your setup.

```bash
mlflow run .
```
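
For reference, an `MLproject` file is a small YAML document that names the project and its entry points. The sketch below writes one from Python. The project name, the `train.py` script, and the `C` and `solver` parameters are assumptions made for illustration, since the exercise does not prescribe them. No environment file is declared, so this version is meant to be run against the current environment with `mlflow run . --env-manager=local`.

```python
from pathlib import Path

# Hypothetical project definition: the name, script and parameters are illustrative
MLPROJECT_TEMPLATE = """\
name: wearable_stress_classifier

entry_points:
  main:
    parameters:
      C: {type: float, default: 0.01}
      solver: {type: string, default: lbfgs}
    command: "python train.py --C {C} --solver {solver}"
"""


def write_mlproject(project_dir="."):
    """Write a minimal MLproject file into project_dir and return its path."""
    path = Path(project_dir) / "MLproject"
    path.write_text(MLPROJECT_TEMPLATE, encoding="utf-8")
    return path
```

Here `train.py` stands for a script that parses `--C` and `--solver` and contains the training and logging code from Step 2. Parameters can be overridden at launch time, for example `mlflow run . -P C=0.1`.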


### Step 9: Visualize metrics and monitor models

**Task:** Visualize metrics using the MLflow UI.

- How can you use the MLflow UI to compare runs and visualize metrics like `accuracy`, `precision`, `recall`, and `f1_score`?
- Set up custom monitoring dashboards using Grafana if needed.


### Summary and exploration

- **Key Questions:**
  1. How did changing model parameters impact performance metrics like accuracy, precision, and recall?
  2. What are the benefits of tracking multiple runs using MLflow?
  3. How can artifact logging be useful for diagnosing model behavior?
  4. What are the challenges in deploying machine learning models, and how does MLflow assist?
  5. How can visualizing metrics help you understand your model's performance over time?
  6. Why is monitoring model drift important, and how can MLflow help?
