# How to use the MLflow server locally

<img src="https://uploads-ssl.webflow.com/6108e07db6795265f203a636/61f90cbb8c06383f8944720e_ML%20Flow.png" height="360px">


In this notebook, we use the MLflow library to track machine learning experiments.

We will:
- Create an MLflow project
- Create an experiment
- Create runs
- Track metrics, parameters, and artifacts
- View the results in the MLflow UI
- Save a model, then load it in another notebook

## Start the container

Run this command from the directory where you want to store the MLflow data and artifacts to start the Docker container:

`docker run -d --name mlflow-local-server -v $(pwd)/mlflow-data:/mlflow-data -p 5001:5001 -e PORT=5001 davidscanu/mlflow-server:v1.0`

### Parameters

- `-p 5001:5001`: port mapping, in the form `-p host-port:container-port`. Change the first number to expose the server on a different host port.
- `-e PORT=5001`: port used by the MLflow server inside the container. It must match the container port given to `-p`.
- `--name mlflow-local-server`: name of the container; change it as needed.
- `-v $(pwd)/mlflow-data:/mlflow-data`: MLflow log data is stored in a `mlflow-data` directory. This flag mounts your local `mlflow-data` directory onto the `/mlflow-data` directory inside the container, in the form `local-directory:container-directory`.
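
The flags above can also be assembled programmatically, which makes the link between the port mapping and the tracking URL explicit. This is a minimal sketch, not part of the original notebook: `build_docker_command` and its parameter names are hypothetical, and only the flags and the image name are taken from the command above.

```python
import shlex
from pathlib import Path


def build_docker_command(host_port=5001, container_port=5001,
                         name="mlflow-local-server", data_dir="mlflow-data",
                         image="davidscanu/mlflow-server:v1.0"):
    """Return the `docker run` argument list and the matching tracking URI.

    Hypothetical helper: it only mirrors the flags documented above.
    """
    local_dir = Path.cwd() / data_dir  # same role as $(pwd)/mlflow-data
    args = [
        "docker", "run", "-d",
        "--name", name,
        "-v", f"{local_dir}:/mlflow-data",
        "-p", f"{host_port}:{container_port}",
        "-e", f"PORT={container_port}",
        image,
    ]
    # The client talks to the host port, not the container port.
    tracking_uri = f"http://localhost:{host_port}"
    return args, tracking_uri


args, tracking_uri = build_docker_command(host_port=5002)
print(shlex.join(args))
print(tracking_uri)
```

Passing `args` to `subprocess.run` would start the container. If the host port is changed this way, the tracking URL used later in the notebook has to use the same host port.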

## Importing the dataset

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split

# Import dataset
df = pd.read_csv("https://julie-2-next-resources.s3.eu-west-3.amazonaws.com/full-stack-full-time/linear-regression-ft/californian-housing-market-ft/california_housing_market.csv")

# X, y split
X = df.drop("MedHouseVal", axis=1)
y = df.MedHouseVal

# Train / test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)

## Configuring an MLflow project

- `local_tracking_uri`: URL of the local MLflow server.
- `experiment_name`: name of the experiment in which we save our models.
- `run_name`: name given to the run created below.

In [39]:
local_tracking_uri = "http://localhost:5001"
experiment_name = "california_housing_market_tutorial"
run_name='Run_11'


Using `mlflow.autolog()`
- https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.autolog

In [40]:
import mlflow
from mlflow.models import infer_signature

# Connect to the MLflow server
mlflow.set_tracking_uri(local_tracking_uri)

# Enable autologging
# mlflow.sklearn.autolog()
mlflow.autolog()

# Select the experiment (created if it does not exist)
mlflow.set_experiment(experiment_name)

# Retrieve the experiment object
experiment = mlflow.get_experiment_by_name(experiment_name)

## Logging metrics, parameters, and artifacts

List of the items we want to save:
- Model
- Input example
- Signature
- Parameters
- Metrics



In [41]:
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline

# Infer the signature: column names and types of the inputs and outputs
signature = infer_signature(X_train, y_train)

with mlflow.start_run(experiment_id=experiment.experiment_id, run_name=run_name):
    # Training pipeline
    model = Pipeline(steps=[
        ("standard_scaler", StandardScaler()),
        ("Regressor", RandomForestRegressor())
    ])

    # Train the model (autolog records parameters, metrics, and the model)
    model.fit(X_train, y_train)

    # # Log a single metric
    # mlflow.log_metric("train_score", model.score(X_train, y_train))

    # # Log several metrics
    # mlflow.log_metrics({"accuracy": 0.9, "loss": 0.2})

    # # Log parameters
    # mlflow.log_params({"epochs": 10, "batch_size": 32})

    # Log the model
    # mlflow.sklearn.log_model(model,
    #                         "model_housing"
    #                         # signature = signature,
    #                         # input_example = X_train[:1]
    #                         )

# Print scores
print(f"Train score: {model.score(X_train, y_train)}")
print(f"Test score: {model.score(X_test, y_test)}")





Train score: 0.973141565133954
Test score: 0.808543903010663


You can access the most recent autologged run with the `mlflow.last_active_run()` function.

In [19]:
autolog_run = mlflow.last_active_run()
autolog_run

<Run: data=<RunData: metrics={'training_mean_absolute_error': 0.12344426937984533,
 'training_mean_squared_error': 0.036479819630337876,
 'training_r2_score': 0.9726552132861959,
 'training_root_mean_squared_error': 0.1909969100020675,
 'training_score': 0.9726552132861959}, params={'Regressor': 'RandomForestRegressor()',
 'Regressor__bootstrap': 'True',
 'Regressor__ccp_alpha': '0.0',
 'Regressor__criterion': 'squared_error',
 'Regressor__max_depth': 'None',
 'Regressor__max_features': '1.0',
 'Regressor__max_leaf_nodes': 'None',
 'Regressor__max_samples': 'None',
 'Regressor__min_impurity_decrease': '0.0',
 'Regressor__min_samples_leaf': '1',
 'Regressor__min_samples_split': '2',
 'Regressor__min_weight_fraction_leaf': '0.0',
 'Regressor__n_estimators': '100',
 'Regressor__n_jobs': 'None',
 'Regressor__oob_score': 'False',
 'Regressor__random_state': 'None',
 'Regressor__verbose': '0',
 'Regressor__warm_start': 'False',
 'memory': 'None',
 'standard_scaler': 'StandardScaler()',
 'sta
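
The run output above can be read more easily once two conventions are known: the logged RMSE is the square root of the logged MSE, and pipeline parameters are named `step__parameter`. The following is a minimal sketch that works on plain dictionaries holding values copied from the output above, not on a live `Run` object. The variable names are illustrative only.

```python
import math

# Values copied from the run output above (plain dicts, not a live Run object).
metrics = {
    "training_mean_squared_error": 0.036479819630337876,
    "training_root_mean_squared_error": 0.1909969100020675,
    "training_r2_score": 0.9726552132861959,
    "training_score": 0.9726552132861959,
}
params = {
    "Regressor__n_estimators": "100",
    "Regressor__max_depth": "None",
    "Regressor__criterion": "squared_error",
}

# RMSE is the square root of MSE.
rmse_from_mse = math.sqrt(metrics["training_mean_squared_error"])
rmse_matches = math.isclose(
    rmse_from_mse, metrics["training_root_mean_squared_error"], rel_tol=1e-6
)

# For a regressor, `score` is R^2, so both entries hold the same value.
score_is_r2 = metrics["training_score"] == metrics["training_r2_score"]

# Group "step__parameter" keys by pipeline step.
params_by_step = {}
for key, value in params.items():
    step, _, name = key.partition("__")
    params_by_step.setdefault(step, {})[name] = value

print(rmse_matches, score_is_r2)
print(params_by_step)
```

With a live run, the same dictionaries are available as `autolog_run.data.metrics` and `autolog_run.data.params`. Note that parameter values are stored as strings.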

In [None]:
# Log metrics
mlflow.log_metrics({"accuracy": 0.9, "loss": 0.2})

In [None]:
# Log parameters
mlflow.log_params({"epochs": 10, "batch_size": 32})

In [None]:
# Log an artifact (a local file)
mlflow.log_artifact("./house_prices_model.joblib")

In [None]:
# End the active run
mlflow.end_run()

## Loading a saved model

In [12]:
!python3 --version

Python 3.10.6


In [13]:
mlflow.__version__

'2.7.1'

In [14]:
import sklearn

sklearn.__version__

'1.3.1'

In [None]:
import mlflow
import pandas as pd

mlflow.set_tracking_uri(local_tracking_uri)

logged_model = 'runs:/5118d0ca77564769b73844999e6757b3/model'

# Load model as a PyFuncModel.
loaded_model = mlflow.pyfunc.load_model(logged_model)

# Predict on a Pandas DataFrame.
#loaded_model.predict(pd.DataFrame(data))

  from .autonotebook import tqdm as notebook_tqdm
Downloading artifacts: 100%|██████████| 5/5 [00:41<00:00,  8.23s/it]
 - numpy (current: 1.24.2, required: numpy==1.26.0)
 - packaging (current: 23.1, required: packaging==23.2)
 - scipy (current: 1.10.1, required: scipy==1.11.3)
To fix the mismatches, call `mlflow.pyfunc.get_model_dependencies(model_uri)` to fetch the model's environment and install dependencies using the resulting environment file.
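
The mismatch lines in the warning above have a regular shape, so they can be turned into pinned requirements. This is a minimal sketch assuming the warning text has been captured as strings; `parse_mismatches` is a hypothetical helper, and the three lines are copied from the output above.

```python
import re

# Warning lines copied from the output above.
warning_lines = [
    " - numpy (current: 1.24.2, required: numpy==1.26.0)",
    " - packaging (current: 23.1, required: packaging==23.2)",
    " - scipy (current: 1.10.1, required: scipy==1.11.3)",
]

# Matches "- <name> (current: <version>, required: <specifier>)".
PATTERN = re.compile(r"-\s*(\S+)\s*\(current:\s*([^,]+),\s*required:\s*([^)]+)\)")


def parse_mismatches(lines):
    """Return {package: (current_version, required_specifier)}."""
    result = {}
    for line in lines:
        match = PATTERN.search(line)
        if match:
            name, current, required = match.groups()
            result[name] = (current.strip(), required.strip())
    return result


mismatches = parse_mismatches(warning_lines)
requirements = [required for _, required in mismatches.values()]
print("\n".join(requirements))
```

The printed lines could be saved to a requirements file and installed with `pip`. The warning itself recommends `mlflow.pyfunc.get_model_dependencies(model_uri)`, which fetches the model's own environment file and is the more complete option.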


In [None]:
loaded_model.metadata.signature

inputs: 
  ['MedInc': double, 'HouseAge': double, 'AveRooms': double, 'AveBedrms': double, 'Population': double, 'AveOccup': double, 'Latitude': double, 'Longitude': double]
outputs: 
  [Tensor('float64', (-1,))]
params: 
  None
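
The signature above lists the column names the model expects. Before calling `loaded_model.predict`, an input row can be compared against those names. This is a minimal sketch: `check_record` is a hypothetical helper, the column list is copied from the signature output, and the sample values are made up for illustration. It only compares names and does not reproduce MLflow's own schema enforcement.

```python
# Column names copied from the signature printed above.
SIGNATURE_COLUMNS = [
    "MedInc", "HouseAge", "AveRooms", "AveBedrms",
    "Population", "AveOccup", "Latitude", "Longitude",
]


def check_record(record, columns=SIGNATURE_COLUMNS):
    """Return (missing, unexpected) column-name lists for one input row."""
    missing = [c for c in columns if c not in record]
    unexpected = [c for c in record if c not in columns]
    return missing, unexpected


# Made-up values, for illustration only.
row = {
    "MedInc": 3.5, "HouseAge": 20.0, "AveRooms": 5.0, "AveBedrms": 1.0,
    "Population": 1000.0, "AveOccup": 3.0, "Latitude": 34.0, "Longitude": -118.0,
}

missing, unexpected = check_record(row)
print(missing, unexpected)
```

When both lists are empty, the row has exactly the columns named in the signature and can be wrapped in a one-row DataFrame for prediction.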

In [None]:
dir(loaded_model.metadata)[-30:]

['__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_metadata',
 '_saved_input_example_info',
 '_signature',
 'add_flavor',
 'artifact_path',
 'flavors',
 'from_dict',
 'get_input_schema',
 'get_model_info',
 'get_output_schema',
 'get_params_schema',
 'load',
 'load_input_example',
 'log',
 'metadata',
 'mlflow_version',
 'model_uuid',
 'run_id',
 'save',
 'saved_input_example_info',
 'signature',
 'to_dict',
 'to_json',
 'to_yaml',
 'utc_time_created']

In [None]:
loaded_model.metadata.to_dict()

{'run_id': '4fb852481a6840b58c1910f503bc5d89',
 'artifact_path': 'model',
 'utc_time_created': '2023-10-05 19:29:50.533305',
 'flavors': {'python_function': {'env': {'conda': 'conda.yaml',
    'virtualenv': 'python_env.yaml'},
   'loader_module': 'mlflow.sklearn',
   'model_path': 'model.pkl',
   'predict_fn': 'predict',
   'python_version': '3.11.3'},
  'sklearn': {'code': None,
   'pickled_model': 'model.pkl',
   'serialization_format': 'cloudpickle',
   'sklearn_version': '1.3.1'}},
 'model_uuid': 'a0a124a12c1e46cd852ad15384439573',
 'mlflow_version': '2.7.1',
 'signature': {'inputs': '[{"type": "double", "name": "MedInc"}, {"type": "double", "name": "HouseAge"}, {"type": "double", "name": "AveRooms"}, {"type": "double", "name": "AveBedrms"}, {"type": "double", "name": "Population"}, {"type": "double", "name": "AveOccup"}, {"type": "double", "name": "Latitude"}, {"type": "double", "name": "Longitude"}]',
  'outputs': '[{"type": "tensor", "tensor-spec": {"dtype": "float64", "shape": 