# **Using the MLflow Library**

<img src='https://www.databricks.com/sites/default/files/mlflow.png'>

In this notebook, we use the MLflow library to track machine learning experiments.

We will:
- create an MLflow project
- create an experiment
- create runs
- track metrics, parameters, and artifacts
- view the results in the MLflow UI
- register a model, then load it in another notebook

**1. Importing the dataset**

In [None]:
!pip install mlflow boto3

Collecting boto3
  Downloading boto3-1.28.61-py3-none-any.whl (135 kB)
Collecting botocore<1.32.0,>=1.31.61 (from boto3)
  Downloading botocore-1.31.61-py3-none-any.whl (11.2 MB)
Collecting jmespath<2.0.0,>=0.7.1 (from boto3)
  Downloading jmespath-1.0.1-py3-none-any.whl (20 kB)
Collecting s3transfer<0.8.0,>=0.7.0 (from boto3)
  Downloading s3transfer-0.7.0-py3-none-any.whl (79 kB)
Collecting urllib3<1.27,>=1.25.4 (from botocore<1.32.0,>=1.31.61->boto3)
  Downloading urllib3-1.26.17-py2.py3-none-any.whl (143 kB)

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split

# Import dataset
df = pd.read_csv("https://julie-2-next-resources.s3.eu-west-3.amazonaws.com/full-stack-full-time/linear-regression-ft/californian-housing-market-ft/california_housing_market.csv")

# X, y split
X = df.drop("MedHouseVal", axis=1)
y = df.MedHouseVal

# Train / test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
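
`train_test_split` shuffles the rows randomly, so without a fixed seed each execution produces a different split and the scores of different MLflow runs are not directly comparable. A small sketch on toy arrays (not the housing data), assuming an arbitrarily chosen seed of 42:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for the housing features and target.
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)

# Passing the same random_state twice yields the same partition.
split_a = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)
split_b = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)

same = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(f"Identical splits: {same}")
```

Fixing the seed is a design choice: it makes runs reproducible, at the cost of always evaluating on the same held-out rows.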

**2. Configuring credentials**

In [None]:
import os
os.environ

environ{'SHELL': '/bin/bash',
        'NV_LIBCUBLAS_VERSION': '11.11.3.6-1',
        'NVIDIA_VISIBLE_DEVICES': 'all',
        'COLAB_JUPYTER_TRANSPORT': 'ipc',
        'NV_NVML_DEV_VERSION': '11.8.86-1',
        'NV_CUDNN_PACKAGE_NAME': 'libcudnn8',
        'CGROUP_MEMORY_EVENTS': '/sys/fs/cgroup/memory.events /var/colab/cgroup/jupyter-children/memory.events',
        'NV_LIBNCCL_DEV_PACKAGE': 'libnccl-dev=2.15.5-1+cuda11.8',
        'NV_LIBNCCL_DEV_PACKAGE_VERSION': '2.15.5-1',
        'VM_GCE_METADATA_HOST': '169.254.169.253',
        'HOSTNAME': '3211a000e975',
        'LANGUAGE': 'en_US',
        'TBE_RUNTIME_ADDR': '172.28.0.1:8011',
        'GCE_METADATA_TIMEOUT': '3',
        'NVIDIA_REQUIRE_CUDA': 'cuda>=11.8 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,d

In [None]:
# AWS access credentials (placeholders: never commit real secrets to a notebook)
os.environ['AWS_ACCESS_KEY_ID'] = "<YOUR_AWS_ACCESS_KEY_ID>"
os.environ['AWS_SECRET_ACCESS_KEY'] = "<YOUR_AWS_SECRET_ACCESS_KEY>"
os.environ['ARTIFACT_STORE_URI'] = "s3://isen-mlflow/models/"
os.environ['BACKEND_STORE_URI'] = "postgresql://<user>:<password>@<host>:5432/<database>"
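
Secrets written directly into a notebook are easy to leak when the file is shared. As a minimal sketch, the values could instead be read from variables that are already set outside the notebook. `require_env` is a hypothetical helper, not part of MLflow or boto3:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value


# Demo value so the sketch runs on its own; in practice the variable would be
# set outside the notebook (shell profile, secrets manager, ...).
os.environ.setdefault("ARTIFACT_STORE_URI", "s3://isen-mlflow/models/")

artifact_store_uri = require_env("ARTIFACT_STORE_URI")
print(artifact_store_uri)
```

Failing early with an explicit message is usually easier to debug than an authentication error raised later by the storage client.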

**3. Configuring an MLflow project**

In [None]:
import mlflow
from mlflow.models import infer_signature


# Connect to the MLflow tracking server
mlflow.set_tracking_uri("https://isen-mlflow-fae8e0578f2f.herokuapp.com/")

# Enable scikit-learn autologging
mlflow.sklearn.autolog()

# Set the active experiment (created if it does not exist)
#mlflow.set_experiment("ISEN Models")

# Look up an existing experiment (returns None if the name is not found)
experiment = mlflow.get_experiment_by_name(
    "ISEN - GrOupe N"  # Name of your group's experiment
)


* 'schema_extra' has been renamed to 'json_schema_extra'


**4. Logging metrics, parameters, and artifacts**

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline


# Infer the signature: schema of the input columns and of the target
signature = infer_signature(X_train, y_train)

with mlflow.start_run(experiment_id=experiment.experiment_id, run_name='First training'):
    # Training pipeline
    model = Pipeline(steps=[
        ("standard_scaler", StandardScaler()),
        ("Regressor", RandomForestRegressor())
    ])

    # Train the model
    model.fit(X_train, y_train)

    # Log metrics
    mlflow.log_metric("train_score", model.score(X_train, y_train))

    mlflow.sklearn.log_model(model,                     # Model to save
                            "model_housing",            # Artifact path within the run
                            signature=signature,        # Schema of the input columns
                            input_example=X_train.head(1),  # Example input
                            registered_model_name="housing_model"   # Name in the model registry
                            )


# Print scores
print(f"Train score: {model.score(X_train, y_train)}")
print(f"Test score: {model.score(X_test, y_test)}")

Registered model 'housing_model' already exists. Creating a new version of this model...
2023/10/06 08:05:56 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: housing_model, version 2
Created version '2' of model 'housing_model'.


Train score: 0.9734837894747121
Test score: 0.8047189870085903
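
For a scikit-learn regressor, `score` returns the coefficient of determination R², so the two values above are R² on the training set and on the test set. A minimal pure-Python sketch of that formula, using made-up numbers rather than the housing data (`r2_score_manual` is an illustrative name):

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (not taken from the dataset).
y_true_demo = [3.0, -0.5, 2.0, 7.0]
y_pred_demo = [2.5, 0.0, 2.0, 8.0]
r2_demo = r2_score_manual(y_true_demo, y_pred_demo)
print(round(r2_demo, 4))
```

The gap between the training score (about 0.97) and the test score (about 0.80) indicates that the model fits the training data more closely than unseen data.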


In [None]:
# Start a run (run_name = "First Run 0"); it stays active until mlflow.end_run() is called
mlflow.start_run(run_name="First Run 0")

# Save the model
mlflow.sklearn.log_model(model,
                         "model",
                         registered_model_name="model_classification")

Successfully registered model 'model_classification'.
2023/10/06 08:07:31 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: model_classification, version 1
Created version '1' of model 'model_classification'.


<mlflow.models.model.ModelInfo at 0x29f243bd0>

In [None]:
from joblib import dump

# Persist our model
print("Saving model...")
dump(model, "./house_prices_model.joblib")
print(f"Model has been saved here: {os.getcwd()}")

Saving model...
Model has been saved here: /Users/kevinduranty/Desktop/Partie 4 - MLFLOW


In [None]:
# Log metrics
mlflow.log_metrics({"accuracy": 0.9, "loss": 0.2})

In [None]:
# Log parameters
mlflow.log_params({"epochs": 10, "batch_size": 32})

In [None]:
# Log artifacts
mlflow.log_artifact("./house_prices_model.joblib")

In [None]:
# End the run
mlflow.end_run()

**6. Example code for loading a registered model**

In [None]:
import os

import mlflow
import pandas as pd

# AWS credentials needed to download the artifacts (placeholders)
os.environ['AWS_ACCESS_KEY_ID'] = "<YOUR_AWS_ACCESS_KEY_ID>"
os.environ['AWS_SECRET_ACCESS_KEY'] = "<YOUR_AWS_SECRET_ACCESS_KEY>"


mlflow.set_tracking_uri("https://isen-mlflow-fae8e0578f2f.herokuapp.com/")


# URI of a model logged in a run: runs:/<run_id>/<artifact_path>
logged_model = 'runs:/4fb852481a6840b58c1910f503bc5d89/model'

# Load model as a PyFuncModel.
loaded_model = mlflow.pyfunc.load_model(logged_model)

# Predict on a Pandas DataFrame.
#loaded_model.predict(pd.DataFrame(data))

AttributeError: ignored
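
MLflow identifies a model by URI. A model logged in a run uses the form `runs:/<run_id>/<artifact_path>`, while a model in the registry can be addressed as `models:/<name>/<version>`. The two helpers below are hypothetical conveniences that only build the strings. The registered name and version are the ones reported earlier in this notebook's output:

```python
def run_model_uri(run_id: str, artifact_path: str) -> str:
    """URI of a model logged as an artifact of a given run."""
    return f"runs:/{run_id}/{artifact_path}"


def registered_model_uri(name: str, version: int) -> str:
    """URI of a specific version of a model in the model registry."""
    return f"models:/{name}/{version}"


print(run_model_uri("4fb852481a6840b58c1910f503bc5d89", "model"))
print(registered_model_uri("housing_model", 2))
```

Either string can be passed to `mlflow.pyfunc.load_model`. The registry form does not require knowing the run ID.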

In [None]:
loaded_model.metadata.signature

inputs: 
  ['MedInc': double, 'HouseAge': double, 'AveRooms': double, 'AveBedrms': double, 'Population': double, 'AveOccup': double, 'Latitude': double, 'Longitude': double]
outputs: 
  [Tensor('float64', (-1,))]
params: 
  None
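
The signature lists the eight `double` columns the model expects. As a sketch, an input for `predict` can be built as a one-row DataFrame with exactly those column names. The numeric values below are made up for illustration and the prediction call is left commented out because it needs the loaded model:

```python
import pandas as pd

# Column names copied from the signature; the values are illustrative only.
sample = pd.DataFrame([{
    "MedInc": 8.3,
    "HouseAge": 41.0,
    "AveRooms": 6.9,
    "AveBedrms": 1.0,
    "Population": 322.0,
    "AveOccup": 2.5,
    "Latitude": 37.88,
    "Longitude": -122.23,
}]).astype("float64")  # the signature declares every input as double

print(sample.dtypes)
# predictions = loaded_model.predict(sample)  # requires `loaded_model` from the cell above
```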