# Example

In [1]:
import mlflow

In [2]:
#!pip freeze > /mnt/docker/requirements.txt

## Using the package

In [3]:
# This cell loads the packaged version of the project and reloads it before its functions are called
%load_ext autoreload
%autoreload 2

# Configuring the MLflow experiment

In [4]:
mlflow.tracking.get_tracking_uri()

'/mnt/experiments'

# Default tools

- [pandas_profiling](https://github.com/pandas-profiling/pandas-profiling): exploratory data analysis (EDA)
- [mljar](https://github.com/mljar/mljar-supervised): AutoML
- [interpretML](https://github.com/interpretml/interpret): natively interpretable models
- [explainerdashboard](https://github.com/oegedijk/explainerdashboard): model interpretation

# Usage examples

In [5]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error



In [6]:
# Load the data (note: load_boston was removed in scikit-learn 1.2, so this cell needs an older version)
housing = load_boston()
X_train, X_test, y_train, y_test = train_test_split(
    pd.DataFrame(housing.data, columns=housing.feature_names),
    housing.target,
    test_size=0.25,
    random_state=123,
)


In [7]:
print(list(X_train))

['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']


## pandas_profiling example

In [8]:
from pandas_profiling import ProfileReport

In [9]:
pr = ProfileReport(X_train)

In [10]:
pr.to_file(output_file='pandas_profiling.html')

The code above created an HTML page, `pandas_profiling.html`; simply open it in a browser.

## MLJAR example

In [15]:
from supervised.automl import AutoML # mljar-supervised

# train models with AutoML
automl = AutoML(mode="Explain", results_path='/mnt/auto_ml')
automl.fit(X_train, y_train)

# compute the MSE on test data
predictions = automl.predict(X_test)
print("Test MSE:", mean_squared_error(y_test, predictions))

AutoML directory: /mnt/auto_ml
The task is regression with evaluation metric rmse
AutoML will use algorithms: ['Baseline', 'Linear', 'Decision Tree', 'Random Forest', 'Xgboost', 'Neural Network']
AutoML will ensemble availabe models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'ensemble']
* Step simple_algorithms will try to check up to 3 models
1_Baseline rmse 8.974686 trained in 0.22 seconds
2_DecisionTree rmse 6.086591 trained in 2.99 seconds
3_Linear rmse 5.046435 trained in 2.14 seconds
* Step default_algorithms will try to check up to 3 models
4_Default_Xgboost rmse 4.23677 trained in 4.05 seconds
5_Default_NeuralNetwork rmse 3.521359 trained in 0.4 seconds
6_Default_RandomForest rmse 4.603043 trained in 1.73 seconds
* Step ensemble will try to check up to 1 model
Ensemble rmse 3.504847 trained in 0.17 seconds
AutoML fit time: 16.53 seconds
AutoML best model: Ensemble
Test MSE: 13.988403191076053
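The AutoML log reports RMSE, whereas the cell prints the test MSE, so the two figures are not directly comparable. A minimal sketch of the conversion, with both values copied from the output above (the variable names are illustrative, and the assumption that the logged score comes from AutoML's own validation data has not been checked here):

```python
import math

# Values copied from the notebook output above.
test_mse = 13.988403191076053  # "Test MSE" printed by the cell
ensemble_rmse = 3.504847       # "Ensemble rmse" from the AutoML log

# RMSE is the square root of MSE, which puts both scores on the same scale.
test_rmse = math.sqrt(test_mse)

print(f"Test RMSE: {test_rmse:.4f}")
print(f"Ensemble RMSE reported during training: {ensemble_rmse:.4f}")
```

The test RMSE comes out at roughly 3.74, slightly above the 3.50 reported during training.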


## explainerdashboard example

In [16]:
import explainerdashboard
from explainerdashboard import RegressionExplainer

In [19]:
from sklearn.ensemble import RandomForestRegressor
from explainerdashboard import RegressionExplainer, ExplainerDashboard
from explainerdashboard.datasets import titanic_survive, titanic_names, titanic_fare

feature_descriptions = {
    "Sex": "Gender of passenger",
    "Gender": "Gender of passenger",
    "Deck": "The deck the passenger had their cabin on",
    "PassengerClass": "The class of the ticket: 1st, 2nd or 3rd class",
    "Fare": "The amount of money people paid", 
    "Embarked": "the port where the passenger boarded the Titanic. Either Southampton, Cherbourg or Queenstown",
    "Age": "Age of the passenger",
    "No_of_siblings_plus_spouses_on_board": "The sum of the number of siblings plus the number of spouses on board",
    "No_of_parents_plus_children_on_board" : "The sum of the number of parents plus the number of children on board",
}

X_train, y_train, X_test, y_test = titanic_fare()
model = RandomForestRegressor().fit(X_train, y_train)

explainer = RegressionExplainer(model, X_test, y_test, 
                                cats=['Deck', 'Embarked', 'Sex'],
                                descriptions=feature_descriptions, 
                                units = "$", # defaults to ""
                                )

db = ExplainerDashboard(explainer,
                        title="Regression example", # defaults to "Model Explainer"
                        shap_interaction=False, # you can switch off tabs with bools
                        )

Note: shap=='guess' so guessing for RandomForestRegressor shap='tree'...
Generating self.shap_explainer = shap.TreeExplainer(model)
Changing class type to RandomForestRegressionExplainer...
Building ExplainerDashboard..
Detected notebook environment, consider setting mode='external', mode='inline' or mode='jupyterlab' to keep the notebook interactive while the dashboard is running...
Generating layout...
Calculating shap values...
Calculating predictions...
Calculating residuals...
Calculating absolute residuals...
Calculating dependencies...
Calculating importances...
Calculating ShadowDecTree for each individual decision tree...
Reminder: you can store the explainer (including calculated dependencies) with explainer.dump('explainer.joblib') and reload with e.g. ClassifierExplainer.from_file('explainer.joblib')
Registering callbacks...


In [None]:
db.run(8050)

Starting ExplainerDashboard on http://localhost:8050
Dash is running on http://127.0.0.1:8050/



2021-06-06 07:22:40,084 explainerdashboard.dashboards INFO Dash is running on http://127.0.0.1:8050/



 * Serving Flask app "explainerdashboard.dashboards" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off


2021-06-06 07:22:40,088 werkzeug INFO  * Running on http://127.0.0.1:8050/ (Press CTRL+C to quit)


In [22]:
db.terminate(8050)

Trying to shut down dashboard on port 8050...
Something seems to have failed: HTTPConnectionPool(host='localhost', port=8050): Max retries exceeded with url: /_shutdown_070574c4-998d-48df-b450-aa0160fde279 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8815cf5ac0>: Failed to establish a new connection: [Errno 111] Connection refused'))


The dashboard example runs an app on (internal) port 8050, which Docker maps to a host port on the fly when the container is started with the `make-dev-start` command.
**WARNING**: you need to look up the host port mapped to internal port 8050 with the `docker ps` command (for example 49155), then open `localhost:49155` in a browser, replacing the value after the `:` with the port shown in the terminal.
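The port lookup described above can also be scripted. The sketch below assumes the PORTS column of `docker ps` has the usual `0.0.0.0:49155->8050/tcp` form; `find_host_port`, `dashboard_url`, and the `container` argument are hypothetical names introduced for illustration and are not part of the project.

```python
import re
import subprocess


def find_host_port(ports_field: str, internal_port: int = 8050):
    """Return the host port mapped to `internal_port`, or None if there is none.

    `ports_field` is the PORTS column printed by `docker ps`,
    e.g. "0.0.0.0:49155->8050/tcp".
    """
    match = re.search(rf":(\d+)->{internal_port}/tcp", ports_field)
    return int(match.group(1)) if match else None


def dashboard_url(container: str, internal_port: int = 8050):
    """Query Docker for the port mappings and build the URL to open.

    `container` is a hypothetical container name; read the real one from `docker ps`.
    """
    ports = subprocess.run(
        ["docker", "ps", "--filter", f"name={container}", "--format", "{{.Ports}}"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    host_port = find_host_port(ports, internal_port)
    return f"http://localhost:{host_port}" if host_port else None


# Parsing a sample PORTS value (no Docker daemon needed for this part).
print(find_host_port("0.0.0.0:49155->8050/tcp, :::49155->8050/tcp"))
```

Only `dashboard_url` needs a running Docker daemon and has to be called from the host machine, not from inside the container.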