# 1. Getting Started
## a) Connecting to Weights and Biases

In [1]:
# 1. Log in to your W&B account
import wandb

wandb.login()

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
[34m[1mwandb[0m: Currently logged in as: [33malban[0m. Use [1m`wandb login --relogin`[0m to force relogin


True
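The warning above refers to the `WANDB_NOTEBOOK_NAME` environment variable. The sketch below sets it before calling `wandb.login()` / `wandb.init()`. It also sets `WANDB_MODE`, which W&B reads to run `online`, `offline` (runs stored locally for a later `wandb sync`) or `disabled`. The helper name `configure_wandb_env` and the notebook filename are hypothetical.

```python
import os

def configure_wandb_env(notebook_name: str, mode: str = "online") -> None:
    """Hypothetical helper: set W&B environment variables before wandb.login()/wandb.init()."""
    allowed = {"online", "offline", "disabled"}
    if mode not in allowed:
        raise ValueError(f"mode must be one of {sorted(allowed)}, got {mode!r}")
    # Tells W&B which notebook file this is, so the code can be saved with the run
    os.environ["WANDB_NOTEBOOK_NAME"] = notebook_name
    # "offline" keeps runs in the local ./wandb folder; "disabled" turns W&B calls into no-ops
    os.environ["WANDB_MODE"] = mode

# Hypothetical filename: replace with the actual name of this notebook
configure_wandb_env("getting_started.ipynb")
```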

## b) First W&B run

In [2]:
# 2. Start a W&B Run
run = wandb.init(
    project="classification-car-accidents-2",
    name='My first run',
    tags=["baseline", "random-forest"],
)

In [4]:
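# 3. Capture a dictionary of hyperparameters
params = {"n_estimators": 100, "criterion": 'gini', "max_depth": 10}

# Store them in the run's config; rebinding wandb.config to a plain dict would not record them in the run
run.config.update(params)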

In [5]:
# 4. Train the model
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Load the preprocessed train/test splits
X_train = pd.read_csv('../data/preprocessed/X_train.csv')
X_test = pd.read_csv('../data/preprocessed/X_test.csv')
y_train = pd.read_csv('../data/preprocessed/y_train.csv')
y_test = pd.read_csv('../data/preprocessed/y_test.csv')
# Flatten the one-column label DataFrames into the 1-D arrays scikit-learn expects
y_train = np.ravel(y_train)
y_test = np.ravel(y_test)

# Build the model from the params dictionary defined above
rf_classifier = RandomForestClassifier(**params)

rf_classifier.fit(X_train, y_train)
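The cell above depends on preprocessed CSV files that are not included here. The sketch below runs the same steps in isolation, with synthetic binary data from `make_classification` standing in for the car-accident dataset (an assumption). It also shows why `np.ravel` is applied to the labels. A one-column CSV read with `pd.read_csv` gives a DataFrame of shape `(n, 1)`, and scikit-learn emits a `DataConversionWarning` when `y` has that column-vector shape instead of `(n,)`.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for the preprocessed CSVs (assumption)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Mimic pd.read_csv on a one-column label file: a DataFrame of shape (n, 1)
y_train_df = pd.DataFrame({"label": y_train})
print(y_train_df.shape)  # (375, 1)

# np.ravel flattens it to the 1-D shape (n,) that scikit-learn expects for y
y_train_flat = np.ravel(y_train_df)
print(y_train_flat.shape)  # (375,)

# Same hyperparameters as the first run; random_state is added only to make the sketch deterministic
params = {"n_estimators": 100, "criterion": "gini", "max_depth": 10}
rf_classifier = RandomForestClassifier(**params, random_state=0)
rf_classifier.fit(X_train, y_train_flat)
```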

In [6]:
# 5. Capture a dictionary of metrics
train_accuracy = rf_classifier.score(X_train, y_train)
test_accuracy = rf_classifier.score(X_test, y_test)
wandb.log({"train_accuracy": train_accuracy, "test_accuracy": test_accuracy})
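One way to keep the metric computation testable without a live W&B run is to pass the logging function in as a parameter. The sketch below does this, with synthetic data standing in for the real splits. The helper name `log_accuracies` is hypothetical. In the notebook it would be called with `log=wandb.log` (or `log=run.log`), and here a plain list stands in for it.

```python
from typing import Callable, Dict

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def log_accuracies(model, X_train, y_train, X_test, y_test,
                   log: Callable[[Dict[str, float]], None]) -> Dict[str, float]:
    """Hypothetical helper: compute train/test accuracy and send both in a single log call."""
    metrics = {
        "train_accuracy": model.score(X_train, y_train),
        "test_accuracy": model.score(X_test, y_test),
    }
    log(metrics)  # e.g. wandb.log in the notebook
    return metrics

# Synthetic data (assumption) so the sketch runs on its own
X, y = make_classification(n_samples=400, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
model = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=1).fit(X_train, y_train)

logged = []  # stands in for wandb.log outside a live run
metrics = log_accuracies(model, X_train, y_train, X_test, y_test, log=logged.append)
```

Logging both accuracies in one dictionary keeps them on the same W&B step, as in the original `wandb.log` call.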

Finally, the model's artifacts can also be stored in W&B: call the `log_artifact` method and pass it the path to the saved model file.
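Since the artifact is just the joblib file, it can be worth checking that the file really restores the same model before logging it. The sketch below trains on synthetic data and writes to a temporary directory instead of `../models/`; both are assumptions. It also creates the parent folder first, because `joblib.dump` does not create missing directories and would raise `FileNotFoundError`.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data and a small model (assumption) standing in for rf_classifier
X, y = make_classification(n_samples=300, random_state=2)
rf_classifier = RandomForestClassifier(n_estimators=20, random_state=2).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "models" / "trained_model.joblib"
    # joblib.dump does not create missing folders, so create them first
    model_path.parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(rf_classifier, model_path)

    # Reload the file that would be passed to wandb.log_artifact and compare predictions
    restored = joblib.load(model_path)
    same_predictions = np.array_equal(restored.predict(X), rf_classifier.predict(X))
    file_size = model_path.stat().st_size
```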

In [7]:
# 6. Track model artifact
import joblib

# Save the trained model to a file
model_filename = '../models/trained_model.joblib'
joblib.dump(rf_classifier, model_filename)

# Log the file to the current run as an artifact
wandb.log_artifact(model_filename)

<Artifact run-31y36b1g-trained_model.joblib>

# 2. Parameter Tracking

To keep a record of how the model was trained, its parameters must be tracked. To do this, we build a dictionary whose keys are the parameter names and whose values are the corresponding parameter values.

This dictionary can then be passed directly to the model.

Note that these parameters matter both for reproducing the run and for hyperparameter optimization. It is therefore recommended to include every parameter we might want to tune later.
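Putting the seed in the same dictionary is one way to make a run reproducible: `random_state` is then logged with the other parameters and reused by the model. The sketch below uses synthetic data, and `random_state=42` is an illustrative value that does not come from the original runs. It shows that two models built from the same dictionary give identical results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (assumption)
X, y = make_classification(n_samples=300, random_state=0)

# One dict drives both the W&B config and the model; the seed makes fitting deterministic
params = {"n_estimators": 50, "criterion": "gini", "max_depth": 5, "random_state": 42}

model_a = RandomForestClassifier(**params).fit(X, y)
model_b = RandomForestClassifier(**params).fit(X, y)

# Same params and same data give identical class probabilities
identical = bool((model_a.predict_proba(X) == model_b.predict_proba(X)).all())

# In the notebook, the same dict would be recorded with run.config.update(params)
```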

In [8]:
# 1. Start a W&B Run
run = wandb.init(
    project="classification-car-accidents",
    name='My second run',
    tags=["baseline", "random-forest"],
)

# 2. Capture a dictionary of hyperparameters
params = {"n_estimators": 50, "criterion": 'gini', "max_depth": 5}

# Store them in the run's config so they are saved with the run
run.config.update(params)

rf_classifier = RandomForestClassifier(**params)

# Train the model
rf_classifier.fit(X_train, y_train)

train_accuracy = rf_classifier.score(X_train, y_train)
test_accuracy = rf_classifier.score(X_test, y_test)
wandb.log({"train_accuracy": train_accuracy, "test_accuracy": test_accuracy})

Run history:
test_accuracy   ▁
train_accuracy  ▁

Run summary:
test_accuracy   0.77136
train_accuracy  0.81966
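The runs above compare hand-picked parameter sets. Because the parameters already live in a dictionary, the same keys can describe a search space. The sketch below is a W&B sweep configuration over those hyperparameters, with illustrative candidate values (an assumption). A plain-Python expansion lists the combinations a `grid` sweep would try. The helper name `grid_combinations` is hypothetical.

```python
import itertools

# Sweep configuration in the dict format accepted by wandb.sweep (candidate values are illustrative)
sweep_config = {
    "method": "grid",
    "metric": {"name": "test_accuracy", "goal": "maximize"},
    "parameters": {
        "n_estimators": {"values": [50, 100]},
        "criterion": {"values": ["gini"]},
        "max_depth": {"values": [5, 10]},
    },
}

def grid_combinations(config):
    """Hypothetical helper: expand the 'values' lists into one params dict per combination."""
    names = list(config["parameters"])
    value_lists = [config["parameters"][name]["values"] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]

combos = grid_combinations(sweep_config)
for params in combos:
    print(params)
```

With W&B, `wandb.sweep(sweep_config, project=...)` registers the search and returns a sweep ID. `wandb.agent(sweep_id, function=train)` then calls a training function that reads its parameters from the run's config. `train` here is a hypothetical function wrapping the training cell above. The metric name matches the `test_accuracy` key already logged by `wandb.log`.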


In [33]:
# Visualize all classifier plots; labels are the class names, in predict_proba column order
wandb.sklearn.plot_classifier(rf_classifier,
                              X_train, X_test,
                              y_train, y_test,
                              rf_classifier.predict(X_test), rf_classifier.predict_proba(X_test),
                              list(rf_classifier.classes_),
                              is_binary=True,
                              model_name='RandomForest')

wandb.finish()

[34m[1mwandb[0m: 
[34m[1mwandb[0m: Plotting RandomForest.
[34m[1mwandb[0m: Logged feature importances.
[34m[1mwandb[0m: Logged confusion matrix.
[34m[1mwandb[0m: Logged summary metrics.
[34m[1mwandb[0m: Logged class proportions.
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[34m[1mwandb[0m: Logged calibration curve.
[34m[1mwandb[0m: Logged roc curve.
[34m[1mwandb[0m: Logged precision-recall curve.


Run history:
test_accuracy   ▁
train_accuracy  ▁

Run summary:
test_accuracy   0.74711
train_accuracy  0.75193
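`wandb.sklearn.plot_classifier` takes a `labels` argument that names the target classes. In scikit-learn, the columns of `predict_proba` follow the order of the fitted model's `classes_` attribute, so `classes_` is a natural source for those names. The sketch below uses synthetic binary data (an assumption) to check that correspondence.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary data (assumption)
X, y = make_classification(n_samples=200, random_state=3)
model = RandomForestClassifier(n_estimators=20, random_state=3).fit(X, y)

y_probas = model.predict_proba(X)
labels = list(model.classes_)  # one entry per predict_proba column

# Column i of y_probas is the probability of class labels[i],
# so taking the most probable column recovers model.predict
predicted_from_probas = np.asarray(labels)[y_probas.argmax(axis=1)]
consistent = np.array_equal(predicted_from_probas, model.predict(X))
```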


Correspondence between MLflow and W&B concepts:

```
MLflow run        = W&B Run
MLflow experiment = W&B Project
mlruns folder     = wandb folder
```
