## Introduction
Before running the code below, each of the following notebooks must have been run on its own:
- 01_importation_fusion
- 02_weather
- 03_featuring

You can then:
- Use the "exploration" or "dataviz" notebooks in the NOTEBOOKS folder to explore the data or to display the charts used in the report.
- Use the 04_regression, 05_classification, or 06_deep_learning notebooks to train the models, with charts and tables that make the reasoning easier to follow.

You can also:
- Run the models from the main (this notebook), choosing which models to run.
- Change the preprocessing by editing the functions defined in the "preprocessing" notebook.
- Change the model parameters by editing the functions defined in the "regression", "classification", or "deep_learning" notebooks.
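
The selection step described above could be organised as a small registry that maps a name to a training function. The sketch below is only an illustration: `TRAINERS`, `run_selected`, and the stub trainers are hypothetical stand-ins, not the project's actual code. In this notebook the real functions (for example `regression_lineaire` or `ridge_model`) would take their place.

```python
from typing import Callable, Dict, List


# Hypothetical stand-ins for the project's training functions.
def train_linear() -> str:
    return "linear regression trained"


def train_ridge() -> str:
    return "ridge trained"


def train_lasso() -> str:
    return "lasso trained"


# Hypothetical registry: model name -> function that trains it.
TRAINERS: Dict[str, Callable[[], str]] = {
    "linear": train_linear,
    "ridge": train_ridge,
    "lasso": train_lasso,
}


def run_selected(names: List[str]) -> Dict[str, str]:
    """Run only the requested trainers, in the order given."""
    unknown = [name for name in names if name not in TRAINERS]
    if unknown:
        raise KeyError(f"Unknown model name(s): {unknown}")
    return {name: TRAINERS[name]() for name in names}


# Train two of the three registered models.
results = run_selected(["ridge", "lasso"])
```

Failing early on an unknown name avoids starting a long training run with a typo in the selection list.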

## Importing libraries

In [10]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Scaling and evaluation
from sklearn.preprocessing import MinMaxScaler, StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split, cross_val_score, KFold, GridSearchCV

# Dimensionality reduction
from sklearn.feature_selection import VarianceThreshold, SelectFromModel
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# Undersampling
from imblearn.under_sampling import RandomUnderSampler

# Model evaluation
from sklearn.metrics import mean_absolute_error, root_mean_squared_error, r2_score  # regression models
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix  # classification models

# Miscellaneous
from utils import dataframe_info, racine_projet, save_model, load_model
from preprocessing import prepross_reg, prepross_class
from regression import regression_lineaire, ridge_model, lasso_model, elasticnet_model, xgb_model, xgb_gridsearch
from classification import knn_class, decision_tree_class, random_forest_class, xgb_class, random_forest_gridsearch, xgb_class_gridsearch
from deep_learning import deep_learning_dense, deep_learning_improved
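
`save_model` and `load_model` come from the project's `utils` module, whose implementation is not shown here. As a rough idea of what such helpers might look like, the sketch below uses `pickle` from the standard library. The `_sketch` function names, the `.pkl` extension, and the target directory are all assumptions. Keras models in particular are usually saved with their own `model.save` method instead.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any


def save_model_sketch(model: Any, name: str, directory: Path) -> Path:
    """Serialize `model` to <directory>/<name>.pkl (hypothetical layout)."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{name}.pkl"
    with path.open("wb") as fh:
        pickle.dump(model, fh)
    return path


def load_model_sketch(name: str, directory: Path) -> Any:
    """Load the object previously stored under `name`."""
    with (directory / f"{name}.pkl").open("rb") as fh:
        return pickle.load(fh)


# Round trip, with a plain dict standing in for a fitted model.
with tempfile.TemporaryDirectory() as tmp:
    models_dir = Path(tmp) / "models"
    saved_path = save_model_sketch({"coef": [0.5, -1.2]}, "lr_model", models_dir)
    restored = load_model_sketch("lr_model", models_dir)
```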


## Regression

In [2]:
X_train, y_train, X_test, y_test = prepross_reg()

In [3]:
#dataframe_info(pd.DataFrame(X_train))

In [4]:
# Train the linear regression model
lr, r2_lr, rmse_lr, mae_lr = regression_lineaire(X_train, y_train, X_test, y_test)
save_model(lr, 'lr_model')

r^2: 0.16339186525241034
Root Mean Squared Error (RMSE): 140.1886675972663
Mean Absolute Error (MAE): 101.43856528846408
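
For reference, the three metrics printed above can be computed by hand. The sketch below uses a handful of made-up values (not the project's data) and plain Python. scikit-learn's `r2_score`, `root_mean_squared_error`, and `mean_absolute_error` follow the same definitions.

```python
import math

# Toy values for illustration only; they are not taken from the dataset.
y_true = [100.0, 150.0, 200.0, 250.0]
y_pred = [110.0, 140.0, 190.0, 270.0]

n = len(y_true)
errors = [t - p for t, p in zip(y_true, y_pred)]

# Mean absolute error: average size of the errors.
mae = sum(abs(e) for e in errors) / n

# Root mean squared error: square root of the average squared error.
rmse = math.sqrt(sum(e * e for e in errors) / n)

# r^2: 1 minus the ratio of residual variance to total variance.
mean_true = sum(y_true) / n
ss_res = sum(e * e for e in errors)
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1.0 - ss_res / ss_tot
```

An r² of about 0.16, as reported for the linear model, means the model explains roughly 16% of the variance of the target on the test set.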


In [5]:
# Train the Ridge model
ridge, r2_ridge, rmse_ridge, mae_ridge = ridge_model(X_train, y_train, X_test, y_test)
save_model(ridge, 'ridge_model')

Ridge r^2: 0.16339185038426218
Ridge Root Mean Squared Error (RMSE): 140.18866884297847
Ridge Mean Absolute Error (MAE): 101.43856343659506


In [6]:
# Train the Lasso model
lasso, r2_lasso, rmse_lasso, mae_lasso = lasso_model(X_train, y_train, X_test, y_test)
save_model(lasso, 'lasso_model')

Lasso r^2: 0.15595092553241696
Lasso Root Mean Squared Error (RMSE): 140.8107188194601
Lasso Mean Absolute Error (MAE): 101.76065437022264


In [7]:
# Train the ElasticNet model
elastic_net, r2_en, rmse_en, mae_en = elasticnet_model(X_train, y_train, X_test, y_test)
save_model(elastic_net, 'elasticnet_model')

r^2: 0.11936812613980696
Root Mean Squared Error (RMSE): 143.82986297706728
Mean Absolute Error (MAE): 104.20704609731379


In [12]:
# Train the XGB Regressor model
xgb, r2_xgb, rmse_xgb, mae_xgb = xgb_model(X_train, y_train, X_test, y_test)
save_model(xgb, 'xgb_regressor_model')

r^2: 0.329063355922699
Root Mean Squared Error (RMSE): 125.54303161587222
Mean Absolute Error (MAE): 86.95122787852921


In [14]:
# Train the XGB Regressor with a grid search
best_xgb, best_params, r2_bestxgb, rmse_bestxgb, mae_bestxgb = xgb_gridsearch(X_train, y_train, X_test, y_test)
save_model(best_xgb, 'gridsearch_xgb_regressor_model')

Fitting 5 folds for each of 64 candidates, totalling 320 fits
Best parameters found: {'colsample_bytree': 1.0, 'gamma': 0, 'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 100, 'subsample': 0.8}
Best score: -17231.57588368211
r^2: 0.2656537890434265
Root Mean Squared Error (RMSE): 131.34159848536203
Mean Absolute Error (MAE): 92.89037579193193
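
The log line "64 candidates, totalling 320 fits" follows from the size of the parameter grid multiplied by the number of folds. The grid below is hypothetical (the real one lives in `xgb_gridsearch` and is not shown here). It simply uses two values for each of the six parameters reported above, which gives 2^6 = 64 combinations. The last lines assume the search was scored with negative mean squared error, which the output suggests but does not state.

```python
import math
from itertools import product

# Hypothetical grid: two values per parameter, chosen for illustration.
param_grid = {
    "colsample_bytree": [0.8, 1.0],
    "gamma": [0, 0.1],
    "learning_rate": [0.01, 0.1],
    "max_depth": [3, 5],
    "n_estimators": [50, 100],
    "subsample": [0.8, 1.0],
}
n_folds = 5

# Every combination of one value per parameter is one candidate.
candidates = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]
n_candidates = len(candidates)

# Each candidate is fitted once per cross-validation fold.
n_fits = n_candidates * n_folds

# Assumption: "Best score" is a negative MSE, so its negated square root
# is the cross-validated RMSE.
best_score = -17231.57588368211
cv_rmse = math.sqrt(-best_score)
```

Under that assumption the cross-validated RMSE is about 131, close to the test RMSE of 131.34 reported above.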


## Classification

In [15]:
X_train, y_train, X_test, y_test = prepross_class()

In [16]:
#dataframe_info(pd.DataFrame(X_train))

In [17]:
# Train the KNN model
knn, accuracy_knn, cl_rep_knn, cm_knn = knn_class(X_train, y_train, X_test, y_test)
save_model(knn, 'knn_model')

Accuracy: 0.33918080404837914
              precision    recall  f1-score   support

           0       0.35      0.51      0.42    119701
           1       0.28      0.30      0.29    117113
           2       0.30      0.24      0.27    118583
           3       0.48      0.30      0.37    117286

    accuracy                           0.34    472683
   macro avg       0.35      0.34      0.33    472683
weighted avg       0.35      0.34      0.33    472683


Confusion Matrix:
[[61066 33884 16607  8144]
 [48631 35648 21439 11395]
 [37905 33090 28617 18971]
 [26421 26134 29737 34994]]
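
The figures in the classification report can be recovered from the confusion matrix, where rows are true classes and columns are predicted classes (scikit-learn's convention). The sketch below recomputes accuracy, recall, and precision from the KNN matrix above in plain Python.

```python
# KNN confusion matrix copied from the output above
# (rows: true class, columns: predicted class).
cm = [
    [61066, 33884, 16607, 8144],
    [48631, 35648, 21439, 11395],
    [37905, 33090, 28617, 18971],
    [26421, 26134, 29737, 34994],
]

n_classes = len(cm)
total = sum(sum(row) for row in cm)

# Accuracy: correctly classified samples (the diagonal) over all samples.
correct = sum(cm[i][i] for i in range(n_classes))
accuracy = correct / total

# Recall for class i: diagonal entry over the row sum (all true i).
recall = [cm[i][i] / sum(cm[i]) for i in range(n_classes)]

# Precision for class i: diagonal entry over the column sum (all predicted i).
precision = [
    cm[i][i] / sum(cm[r][i] for r in range(n_classes)) for i in range(n_classes)
]
```

The row sums also reproduce the `support` column of the report, which is a quick way to confirm the orientation of the matrix.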


In [18]:
# Train the DecisionTree model
dt, accuracy_dt, cl_rep_dt, cm_dt = decision_tree_class(X_train, y_train, X_test, y_test)
save_model(dt, 'decisiontree_model')

Accuracy: 0.39044348961143094
              precision    recall  f1-score   support

           0       0.49      0.47      0.48    119701
           1       0.32      0.33      0.33    117113
           2       0.32      0.33      0.33    118583
           3       0.43      0.43      0.43    117286

    accuracy                           0.39    472683
   macro avg       0.39      0.39      0.39    472683
weighted avg       0.39      0.39      0.39    472683


Confusion Matrix:
[[56635 30984 18258 13824]
 [29414 38617 29408 19674]
 [17588 29946 38716 32333]
 [13075 20434 33189 50588]]


In [19]:
# Train the Random Forest model
rf, accuracy_rf, cl_rep_rf, cm_rf = random_forest_class(X_train, y_train, X_test, y_test)
save_model(rf, 'randomforest_model')

Accuracy: 0.41273326944273436
              precision    recall  f1-score   support

           0       0.43      0.60      0.50    119701
           1       0.33      0.28      0.30    117113
           2       0.34      0.25      0.29    118583
           3       0.52      0.52      0.52    117286

    accuracy                           0.41    472683
   macro avg       0.40      0.41      0.40    472683
weighted avg       0.40      0.41      0.40    472683


Confusion Matrix:
[[71236 27293 12799  8373]
 [46051 33111 22267 15684]
 [30382 25320 30164 32717]
 [18896 15476 22333 60581]]


In [20]:
# Train the XGBoost model
# A distinct variable name keeps the imported xgb_class function from being overwritten
xgb_clf, accuracy_xgb_class, cl_rep_xgb_class, cm_xgb_class = xgb_class(X_train, y_train, X_test, y_test)
save_model(xgb_clf, 'xgb_classifier_model')

Accuracy: 0.41194204149504
              precision    recall  f1-score   support

           0       0.40      0.72      0.52    119701
           1       0.33      0.17      0.22    117113
           2       0.35      0.22      0.27    118583
           3       0.51      0.54      0.52    117286

    accuracy                           0.41    472683
   macro avg       0.40      0.41      0.38    472683
weighted avg       0.40      0.41      0.38    472683


Confusion Matrix:
[[86212 14063 10233  9193]
 [60778 19405 20341 16589]
 [41170 16315 26094 35004]
 [26171  9802 18306 63007]]


In [21]:
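# Train the RandomForest model with a grid search
# The *_rfgs names keep the baseline Random Forest results (accuracy_rf, cl_rep_rf, cm_rf) available
rfgs, best_params_rf, accuracy_rfgs, cl_rep_rfgs, cm_rfgs = random_forest_gridsearch(X_train, y_train, X_test, y_test)
save_model(rfgs, 'gridsearch_randomforest_model')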

Fitting 5 folds for each of 72 candidates, totalling 360 fits
Best parameters found:  {'bootstrap': False, 'max_depth': 30, 'min_samples_split': 10, 'n_estimators': 200}
Accuracy: 0.4245551458376967
              precision    recall  f1-score   support

           0       0.44      0.64      0.52    119701
           1       0.34      0.26      0.30    117113
           2       0.35      0.25      0.30    118583
           3       0.52      0.54      0.53    117286

    accuracy                           0.42    472683
   macro avg       0.41      0.42      0.41    472683
weighted avg       0.41      0.42      0.41    472683


Confusion Matrix:
[[76333 23386 11481  8501]
 [48276 30838 22002 15997]
 [31127 23249 30084 34123]
 [18942 13665 21254 63425]]


In [22]:
# Train the XGBoost model with a grid search
xgb_class_gs, best_params_xgb_class_gs, accuracy_xgb_class_gs, cl_rep_xgb_class_gs, cm_xgb_class_gs = xgb_class_gridsearch(X_train, y_train, X_test, y_test)
save_model(xgb_class_gs, 'gridsearch_xgb_classifier_model')

Fitting 5 folds for each of 432 candidates, totalling 2160 fits
Meilleurs paramètres trouvés :  {'colsample_bytree': 1.0, 'gamma': 0.1, 'learning_rate': 0.2, 'max_depth': 6, 'n_estimators': 200, 'subsample': 0.8}
Accuracy: 0.4841341871825304
              precision    recall  f1-score   support

           0       0.52      0.74      0.61    119701
           1       0.40      0.33      0.36    117113
           2       0.41      0.33      0.37    118583
           3       0.56      0.54      0.55    117286

    accuracy                           0.48    472683
   macro avg       0.47      0.48      0.47    472683
weighted avg       0.47      0.48      0.47    472683


Confusion Matrix:
[[88001 18700  7387  5613]
 [42303 38387 23934 12489]
 [21989 26677 39484 30433]
 [15403 13110 25803 62970]]
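
To compare the classifiers at a glance, the accuracies reported above can be gathered and ranked. The values are copied from the outputs in this section, rounded to four decimals. The 0.25 baseline is the accuracy of random guessing over four roughly balanced classes, which the support column suggests.

```python
# Test accuracies copied from the outputs above (rounded to 4 decimals).
accuracies = {
    "KNN": 0.3392,
    "Decision tree": 0.3904,
    "Random forest": 0.4127,
    "XGBoost": 0.4119,
    "Random forest (grid search)": 0.4246,
    "XGBoost (grid search)": 0.4841,
}

# Four roughly balanced classes: random guessing is right about 1 time in 4.
chance_level = 1 / 4

# Sort from best to worst accuracy.
ranking = sorted(accuracies.items(), key=lambda item: item[1], reverse=True)

for name, acc in ranking:
    print(f"{name:<30} {acc:.4f}  (+{acc - chance_level:.4f} over chance)")

best_name, best_acc = ranking[0]
```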


## Deep Learning

In [23]:
X_train, y_train, X_test, y_test = prepross_class()

In [24]:
# Train a fully connected deep learning model
dense1, dense1_history, dense1_loss, dense1_accuracy, dense1_cnf_matrix = deep_learning_dense(X_train, y_train, X_test, y_test)
save_model(dense1, 'dense_fullyconnected_model')

Epoch 1/100
6250/6250 ━━━━━━━━━━━━━━━━━━━━ 14s 2ms/step - accuracy: 0.3693 - loss: 1.2891 - val_accuracy: 0.3970 - val_loss: 1.2479
Epoch 2/100
6250/6250 ━━━━━━━━━━━━━━━━━━━━ 12s 2ms/step - accuracy: 0.3996 - loss: 1.2445 - val_accuracy: 0.4040 - val_loss: 1.2356
Epoch 3/100
6250/6250 ━━━━━━━━━━━━━━━━━━━━ 13s 2ms/step - accuracy: 0.4039 - loss: 1.2370 - val_accuracy: 0.4058 - val_loss: 1.2337
Epoch 4/100
6250/6250 ━━━━━━━━━━━━━━━━━━━━ 13s 2ms/step - accuracy: 0.4040 - loss: 1.2314 - val_accuracy: 0.4063 - val_loss: 1.2303
Epoch 5/100
6250/6250 ━━━━━━━━━━━━━━━━━━━━ 13s 2ms/step - accuracy: 0.4066 - loss: 1.2280 - val_accuracy: 0.4050 - val_loss: 1.2322
Epoch 6/100
6250/6250 ━━━━━━━━━━━━━━━━━━━━ 12s 2ms/step - accuracy: 0.4065 - loss: 1.2269 - val_accuracy: 0.4080 - val_loss: 1.2287
Epoc
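
As a sanity check on the loss values in the log, assume the network is trained with categorical cross-entropy over the four classes (the loss function is not shown in this notebook). A model that assigns probability 1/4 to every class then has a loss of ln 4 ≈ 1.386. The sketch below computes that baseline and compares it with the validation loss of epoch 6.

```python
import math

n_classes = 4

# Cross-entropy of a uniform prediction: -ln(1/4) = ln(4).
uniform_loss = -math.log(1 / n_classes)

# Validation loss at epoch 6, copied from the training log above.
val_loss_epoch_6 = 1.2287

# How far below the uniform-guess baseline the model has moved (in nats).
improvement = uniform_loss - val_loss_epoch_6
```

The gap of roughly 0.16 nats shows that the network is learning something, but after six epochs it is only modestly better than uniform guessing.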

In [None]:
# Train an improved dense model with dropout and batch normalization
dense2, dense2_history, dense2_loss, dense2_accuracy, dense2_cnf_matrix = deep_learning_improved(X_train, y_train, X_test, y_test)
save_model(dense2, 'dense_improved_model')