# ‚ú®üåç **Classification des ordures** üåç‚ú®

## 🧑‍🤝‍🧑 Group Members

1. 👤 FATOU MBOUP
2. 👤 FATOU KINÉ NDIAYE
3. 👤 BINTOU TENNING NGOM

## 📄✨ Project Description

This project aims to classify garbage into categories such as plastic, paper, metal, trash, glass, and cardboard. We will try to develop an accurate and efficient classification model.

## 📂🔗 **Mounting Google Drive**

In [1]:
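# Mount Google Drive only when running in Colab
try:
    from google.colab import drive
    drive.mount('/content/drive')
except ModuleNotFoundError:
    print("google.colab is not available; skipping the Google Drive mount.")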

In [None]:
cd /content/drive/MyDrive/Cours/Informatique/Dic3/Cours/MLOPS/Projet_MLOps/MLOPS_Project

In [None]:
pwd

## 📦✨ Installing Libraries for Colab

In [None]:
# For Colab
!pip install pyngrok pendulum ydata_profiling mlflow loguru


## 📥🔧 Importing Modules

In [None]:
# Magics
%reload_ext autoreload
%autoreload 2

# Python library imports
import cv2
import matplotlib.pyplot as plt
import mlflow
import numpy as np
import os
import pandas as pd
import pendulum
import random
import shutil
import sys
import tensorflow as tf
from collections import Counter
from datetime import datetime
from loguru import logger
from pathlib import Path
from pyngrok import ngrok
from sklearn.metrics import accuracy_score, auc, log_loss, precision_score, recall_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer, label_binarize, LabelEncoder
from sklearn.utils import shuffle
from tensorflow.keras.applications import MobileNetV2, VGG16, DenseNet121
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, BatchNormalization, Dropout
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.metrics import Accuracy, AUC, Precision, Recall, SparseCategoricalAccuracy
from tensorflow.keras.models import Model, save_model, Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models
from tqdm import tqdm

# MLflow imports
import mlflow.keras
import mlflow.pytorch
from mlflow import log_artifacts, log_metric, log_param
from mlflow.models import infer_signature
from mlflow.tracking import MlflowClient

# Custom module imports
from settings.config import *
from src.make_dataset import *
from src.make_model_mobilenet import *
from src.make_model_vgg16 import *
from src.make_model_densenet import *

# PyTorch imports
import torch
import torch.nn as nn
import torch.optim as optim


## 📊📖 Loading and Preprocessing the Data

### 🔗✨ Settings ✨🔗

In [None]:
# Set logging format
log_fmt = "<green>{time:YYYY-MM-DD HH:mm:ss.SSS!UTC}</green> | <level>{level: <8}</level> | <cyan>{name}</cyan>:<cyan>{function}</cyan>:<cyan>{line}</cyan> - {message}"
logger.configure(handlers=[{"sink": sys.stderr, "format": log_fmt}])

# current date
CURRENT_DATE = pendulum.now(tz="UTC")

# directories
PROJECT_DIR = Path.cwd().parent
REPORTS_DIR = Path(PROJECT_DIR, "reports")

logger.info(f"\nProject directory: {PROJECT_DIR} \nReports dir: {REPORTS_DIR}")

### 🔋✨ Load Data ✨🔋

In [None]:
base_dir = 'Garbage_classification'

In [None]:
imgs_data, labels_data, class_names = load_data(base_dir)

In [None]:
class_names
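
`load_data` comes from `src.make_dataset` and is not shown in this notebook. As a rough sketch of the folder-per-class layout it presumably relies on, the hypothetical helper below (`list_images_by_class`, which is not part of the project) only collects file paths and labels. It assumes one subfolder per class under `base_dir` and does not decode any image.

```python
from pathlib import Path

# Hypothetical helper, not the project's load_data: it assumes the dataset
# is laid out as <base_dir>/<class_name>/<image file>.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_images_by_class(base_dir):
    """Return (paths, labels, class_names) for a folder-per-class dataset."""
    base = Path(base_dir)
    # Sort so that the class order is deterministic across runs.
    class_names = sorted(p.name for p in base.iterdir() if p.is_dir())
    paths, labels = [], []
    for name in class_names:
        for img_path in sorted((base / name).iterdir()):
            # Keep image files only; the comparison ignores extension case.
            if img_path.suffix.lower() in IMAGE_EXTENSIONS:
                paths.append(img_path)
                labels.append(name)
    return paths, labels, class_names
```

Calling `list_images_by_class(base_dir)` would give one label per image path, which is enough to check class balance before loading any pixels.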

## 📸✨ Visualization & Preprocessing ✨📸

### 📖✨ Visualization ✨📖

In [None]:
visualize_class_distribution(labels_data, class_names)

In [None]:
plot_images_from_subfolders(base_dir)

In [None]:
img_dimensions(base_dir)

### 📊✨ Normalization & Encoding ✨📊

In [None]:
processed_dir = "/content/drive/MyDrive/Cours/Informatique/Dic3/Cours/MLOPS/Projet_MLOps/processed_images"
data, labels = process_dataset(base_dir, processed_dir)

In [None]:
img_dimensions(processed_dir)

In [None]:
plot_images_from_subfolders(processed_dir)

In [None]:
# Convert the data list to a NumPy array of dtype float32 and
# divide by 255.0 so that pixel values fall in the range [0, 1].
data = np.array(data, dtype="float32") / 255.0
# Convert the labels list to a NumPy array for efficient downstream processing.
labels = np.array(labels)
# scikit-learn's LabelBinarizer encodes the text labels numerically
# (one-hot, with one column per class when there are more than two classes).
mlb = LabelBinarizer()
labels = mlb.fit_transform(labels)
# Print the first encoded label as a sanity check.
print(labels[0])
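
To make the two steps above concrete, here is a minimal NumPy-only sketch on toy data. It is an illustration, not the project's code: the three tiny "images" and the names `raw_images`, `raw_labels`, and `class_order` are made up. For more than two classes, the one-hot columns follow the sorted class names, which is the same order `LabelBinarizer` stores in `classes_`.

```python
import numpy as np

# Toy stand-ins for the real dataset: three 2x2 RGB "images" and their labels.
raw_images = [
    np.full((2, 2, 3), 255, dtype=np.uint8),
    np.zeros((2, 2, 3), dtype=np.uint8),
    np.full((2, 2, 3), 128, dtype=np.uint8),
]
raw_labels = ["plastic", "glass", "paper"]

# Scale 8-bit pixel values into [0, 1] as float32.
images = np.array(raw_images, dtype="float32") / 255.0

# One-hot encode by hand: one column per class, in sorted class-name order.
class_order = sorted(set(raw_labels))
one_hot = np.array(
    [[int(label == name) for name in class_order] for label in raw_labels]
)

# Recover the class name of each row with argmax.
decoded = [class_order[i] for i in one_hot.argmax(axis=1)]
print(images.dtype, images.min(), images.max())
print(one_hot.tolist(), decoded)
```

The `argmax` round trip is the same trick used later to turn one-hot test labels back into class indices before computing metrics.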

### 📈✨ Apply Augmentation to the Dataset ✨📈

In [None]:
final_imgs_data, final_labels_data = increase_dataset(data, labels)

In [None]:
print("Size before augmentation:", data.shape[0])
print("Size after augmentation:", final_imgs_data.shape[0])
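
`increase_dataset` is defined in `src.make_dataset` and its transformations are not shown here. The sketch below is a hypothetical stand-in (`augment_with_flips`) that only appends a horizontally flipped copy of each image. It assumes images are stored as an `(N, H, W, C)` array with one-hot labels, and it merely illustrates why the sample count grows after augmentation.

```python
import numpy as np


def augment_with_flips(images, labels):
    """Hypothetical stand-in for increase_dataset: append a horizontally
    flipped copy of every image, keeping each copy's label."""
    flipped = images[:, :, ::-1, :]  # reverse the width axis of (N, H, W, C)
    return (
        np.concatenate([images, flipped], axis=0),
        np.concatenate([labels, labels], axis=0),
    )


# Toy batch: 4 random 8x8 RGB images and one-hot labels over 6 classes.
rng = np.random.default_rng(0)
toy_images = rng.random((4, 8, 8, 3), dtype=np.float32)
toy_labels = np.eye(6, dtype=int)[[0, 1, 2, 3]]

aug_images, aug_labels = augment_with_flips(toy_images, toy_labels)
print("Size before augmentation:", toy_images.shape[0])
print("Size after augmentation:", aug_images.shape[0])
```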

### ✂️✨ Split Data ✨✂️

In [None]:
x_train, x_val, x_test, y_train, y_val, y_test = split_data(final_imgs_data, final_labels_data)

In [None]:
# Class indices, one per column of the one-hot label matrix
classes = np.arange(labels.shape[1])

# Check the class distribution in each split
train_class_distribution = check_class_distribution(y_train, classes)
val_class_distribution = check_class_distribution(y_val, classes)
test_class_distribution = check_class_distribution(y_test, classes)

print("Class distribution in the training set:", train_class_distribution)
print("Class distribution in the validation set:", val_class_distribution)
print("Class distribution in the test set:", test_class_distribution)
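
`split_data` and `check_class_distribution` are imported from the project modules and are not shown here. The sketch below uses two hypothetical helpers, `stratified_split` and `class_counts`, to show one way a split can keep class proportions and how the per-class counts can be checked. The 70/15/15 ratio and the seed are assumptions, not the project's settings.

```python
import numpy as np


def class_counts(y_one_hot, classes):
    """Count samples per class index in a one-hot label matrix."""
    indices = np.argmax(y_one_hot, axis=1)
    return {int(c): int(np.sum(indices == c)) for c in classes}


def stratified_split(y_one_hot, val_frac=0.15, test_frac=0.15, seed=42):
    """Return train/val/test index arrays that preserve class proportions."""
    rng = np.random.default_rng(seed)
    indices = np.argmax(y_one_hot, axis=1)
    train, val, test = [], [], []
    for c in np.unique(indices):
        # Shuffle the sample positions of class c, then slice them.
        members = rng.permutation(np.flatnonzero(indices == c))
        n_val = int(round(len(members) * val_frac))
        n_test = int(round(len(members) * test_frac))
        val.extend(members[:n_val])
        test.extend(members[n_val:n_val + n_test])
        train.extend(members[n_val + n_test:])
    return np.array(train), np.array(val), np.array(test)


# Toy labels: 3 classes with 20 samples each.
toy_y = np.eye(3, dtype=int)[np.repeat([0, 1, 2], 20)]
train_idx, val_idx, test_idx = stratified_split(toy_y)
print("train:", class_counts(toy_y[train_idx], range(3)))
print("val:  ", class_counts(toy_y[val_idx], range(3)))
print("test: ", class_counts(toy_y[test_idx], range(3)))
```

If the three printed dictionaries show similar proportions across classes, the split is balanced in the same sense the check above is looking for.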

## 🔋✨ Loading and Configuring the Models ✨🔋

### 📈✨ Defining eval_metrics ✨📈

In [None]:
def eval_metrics(y_true, y_pred, y_pred_proba):
    """
    Compute metrics for a multiclass classification task.

    param y_true: The true labels, as class indices
    param y_pred: The labels predicted by the model, as class indices
    param y_pred_proba: The class probabilities predicted by the model
    return: A dictionary containing the computed metrics
    """
    # Accept lists as well as arrays for the probabilities
    y_pred_proba = np.asarray(y_pred_proba)
    n_classes = y_pred_proba.shape[1]

    # Accuracy
    accuracy = accuracy_score(y_true, y_pred)

    # Precision and recall, averaged over classes and weighted by class support
    precision = precision_score(y_true, y_pred, average='weighted')
    recall = recall_score(y_true, y_pred, average='weighted')

    # Log loss; passing the labels explicitly keeps this valid
    # even if a class is absent from y_true
    logloss = log_loss(y_true, y_pred_proba, labels=list(range(n_classes)))

    # ROC AUC (one-vs-rest), one value per class
    y_true_bin = label_binarize(y_true, classes=range(n_classes))
    roc_auc = {}
    for i in range(n_classes):
        fpr, tpr, _ = roc_curve(y_true_bin[:, i], y_pred_proba[:, i])
        roc_auc[i] = auc(fpr, tpr)

    # Mean of the per-class ROC AUC values
    mean_roc_auc = np.mean(list(roc_auc.values()))

    return {
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'log_loss': logloss,
        'roc_auc': roc_auc,
        'mean_roc_auc': mean_roc_auc
    }
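
The `average='weighted'` option used above can be opaque. As a hedged illustration, the NumPy-only sketch below recomputes support-weighted precision and recall by hand on toy labels. The helper name `weighted_precision_recall` is made up for this example. Classes that are never predicted contribute a precision of 0 here. One property worth noticing is that support-weighted recall always equals plain accuracy.

```python
import numpy as np


def weighted_precision_recall(y_true, y_pred, n_classes):
    """Support-weighted precision and recall from integer class labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    precision_sum, recall_sum = 0.0, 0.0
    for c in range(n_classes):
        support = np.sum(y_true == c)       # true samples of class c
        predicted = np.sum(y_pred == c)     # samples predicted as class c
        true_pos = np.sum((y_true == c) & (y_pred == c))
        precision_c = true_pos / predicted if predicted else 0.0
        recall_c = true_pos / support if support else 0.0
        # Weight each per-class score by the number of true samples.
        precision_sum += support * precision_c
        recall_sum += support * recall_c
    n = len(y_true)
    return precision_sum / n, recall_sum / n


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
precision, recall = weighted_precision_recall(y_true, y_pred, n_classes=3)
accuracy = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
print(precision, recall, accuracy)
```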


### ⚙️✨ Defining the Model Parameters ✨⚙️

#### 📖✨ Define models and parameters to benchmark ✨📖

In [None]:
# Define models and parameters to benchmark
ESTIMATOR_PARAMS = {
    VGG16.__name__: {"estimator": VGG16, "params": VGG16_CONFIG},
    MobileNetV2.__name__: {"estimator": MobileNetV2, "params": MOBILENETV2_CONFIG},
    DenseNet121.__name__: {"estimator": DenseNet121, "params": DENSENET_CONFIG},
}

ESTIMATOR_PARAMS

In [None]:
for model_name, model_configs in ESTIMATOR_PARAMS.items():
    estimator = model_configs["estimator"]
    params = model_configs["params"]

    print(f"Model Name: {model_name}")
    print(f"Estimator: {estimator}")
    print(f"Params: {params}")
    print("=" * 40)
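
The `*_CONFIG` dictionaries come from `settings/config.py`, which is not shown. The sketch below is a hypothetical example of what such a dictionary might contain. Only the key names are taken from how this notebook reads the MobileNetV2 configuration (`input_shape`, `num_classes`, `learning_rate`, `epochs`, `patience`). Every value is a placeholder, and `missing_keys` is an illustrative helper rather than project code.

```python
# Hypothetical configuration, NOT the project's settings/config.py.
# Key names mirror what the notebook reads; all values are placeholders.
EXAMPLE_MOBILENETV2_CONFIG = {
    "input_shape": (224, 224, 3),  # assumed image size
    "num_classes": 6,              # assumed: the six categories listed above
    "learning_rate": 1e-4,         # placeholder
    "epochs": 10,                  # placeholder
    "patience": 3,                 # placeholder, presumably for early stopping
}

REQUIRED_KEYS = {"input_shape", "num_classes", "learning_rate", "epochs", "patience"}


def missing_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that a config dict lacks, sorted by name."""
    return sorted(required - set(config))


print(missing_keys(EXAMPLE_MOBILENETV2_CONFIG))
print(missing_keys({"epochs": 5}))
```

A check like this, run before training starts, would surface a missing key as a readable list instead of a `KeyError` in the middle of an MLflow run.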

### 🔆✨ Config for MLflow ✨🔆

In [None]:
# Use a local SQLite database file as the MLflow tracking store
local_registry = "sqlite:///mlruns.db"
print(f"Running local model registry={local_registry}")
mlflow.set_tracking_uri(local_registry)

#### Create an experiment

In [None]:
# Create an experiment if not exists
exp_name = "garbage_classification"
experiment = mlflow.get_experiment_by_name(exp_name)
if not experiment:
    experiment_id = mlflow.create_experiment(exp_name)
else:
    experiment_id = experiment.experiment_id

logger.info(f"Experiment id: {experiment_id}")


In [None]:
# Build, train, evaluate, and log each model
def run_experiment(experiment_id):
    global x_train, x_val, x_test, y_train, y_val, y_test

    for model_name, model_configs in ESTIMATOR_PARAMS.items():
        logger.info(f"{model_name} \n{model_configs}")

        #estimator = model_configs["estimator"]
        params = model_configs["params"]
        print("params['epochs'] : ",params['epochs'])
        with mlflow.start_run(run_name=f"garbage_classification_{datetime.now().strftime('%Y%m%d_%H%M%S')}",
                             experiment_id=experiment_id,
                              tags={"version": "v1", "priority": "P1"},
                              description="garbage classification modeling",) as run:
            # Log the model name and its parameters
            mlflow.log_param("model", model_name)
            #mlflow.log_param("input_shape", params["input_shape"])
            epoch_number=params["epochs"]
            mlflow.log_params(params)

            if model_name == "DenseNet121":
                densenet_model_path = "registered_models"
                # Build the DenseNet model; this call also re-splits data and labels,
                # which replaces the global train/validation/test sets
                model, x_train, x_val, x_test, y_train, y_val, y_test = build_densenet_model(DENSENET_CONFIG, data, labels)
                # Train the model and retrieve the history and the training parameters
                history, training_params = train_densenet_model(model, DENSENET_CONFIG, x_train, y_train, x_val, y_val, model_path=densenet_model_path)

                # Predict on the test set
                y_pred_proba = model.predict(x_test)  # predicted probabilities
                y_pred = np.argmax(y_pred_proba, axis=1)  # predicted classes

                # Convert one-hot labels to class indices; keep 1-D labels unchanged
                y_true = np.argmax(y_test, axis=1) if y_test.ndim == 2 else y_test

                # Compute the evaluation metrics
                metrics = eval_metrics(y_true, y_pred, y_pred_proba)

                # Log the model with its signature and register it
                signature = infer_signature(x_test, y_pred_proba)
                mlflow.keras.log_model(model, artifact_path="model", signature=signature, registered_model_name=model_name)

            # MobileNetV2
            elif model_name == "MobileNetV2":

                patience = params["patience"]

                # Build the model
                model, params = create_mobilenetv2_model(params["input_shape"], params["num_classes"], params["learning_rate"])

                # Train the model
                history = train_mobilenetv2_model(model, x_train, y_train, x_val, y_val, epoch_number, patience)

                # Test the model
                results, y_pred_proba = test_mobilenetv2_model(model, x_test, y_test, class_names)

                # Extract the predicted and true class indices
                y_pred = [class_names.index(pred) for pred, _ in results]
                y_true = [class_names.index(real) for _, real in results]

                # Compute the evaluation metrics
                metrics = eval_metrics(y_true, y_pred, y_pred_proba)

                # Log the model with its signature and register it
                signature = infer_signature(x_test, y_pred_proba)
                mlflow.keras.log_model(model, artifact_path="model", signature=signature, registered_model_name=model_name)

            elif model_name == "VGG16":
                # VGG16 (PyTorch)
                model = create_vgg16_model(VGG16_CONFIG)

                train_loader, val_loader, test_loader = create_data_loaders_for_vgg16(x_train, x_val, x_test, y_train, y_val, y_test, VGG16_CONFIG)
                trained_vgg16 = train_vgg16_model(model, train_loader, val_loader, VGG16_CONFIG)

                # Evaluate on the test set
                y_pred_proba, metrics = test_model_vgg16(trained_vgg16, test_loader)
                # Log the trained PyTorch model with its signature and register it
                signature = infer_signature(x_test, y_pred_proba)
                mlflow.pytorch.log_model(trained_vgg16, artifact_path="model", signature=signature, registered_model_name=model_name)

            # Log the test metrics
            mlflow.log_metric("test_accuracy", metrics['accuracy'])
            mlflow.log_metric("test_precision", metrics['precision'])
            mlflow.log_metric("test_recall", metrics['recall'])
            mlflow.log_metric("test_log_loss", metrics['log_loss'])
            mlflow.log_metric("test_mean_roc_auc", metrics['mean_roc_auc'])

            # Log the ROC AUC of each class
            for class_index, class_roc_auc in metrics['roc_auc'].items():
                mlflow.log_metric(f"test_roc_auc_class_{class_index}", class_roc_auc)

            # Write a small text file, log it as an artifact, then clean up
            os.makedirs("outputs", exist_ok=True)
            with open("outputs/test.txt", "w") as f:
                f.write("Looks like I logged to the local store!")
            log_artifacts("outputs")
            shutil.rmtree("outputs")

            print(f"Model run logged to MLflow with run_id: {run.info.run_id}")


In [None]:
# Run the experiment
run_experiment(experiment_id)
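
The metric logging inside `run_experiment` issues one `mlflow.log_metric` call per value. As an optional sketch, the hypothetical helper below (`flatten_metrics`, not part of the project) flattens the dictionary returned by `eval_metrics` into the same `test_*` names, so that the result could be passed to `mlflow.log_metrics` in a single call. The numbers in `example` are invented toy values, not results.

```python
def flatten_metrics(metrics, prefix="test_"):
    """Turn the eval_metrics result into a flat {name: float} dict.

    The nested per-class 'roc_auc' dict becomes one entry per class.
    """
    flat = {}
    for name, value in metrics.items():
        if isinstance(value, dict):
            for class_index, class_value in value.items():
                flat[f"{prefix}{name}_class_{class_index}"] = float(class_value)
        else:
            flat[f"{prefix}{name}"] = float(value)
    return flat


# Toy values in the same shape that eval_metrics returns (not real results).
example = {
    "accuracy": 0.9,
    "precision": 0.88,
    "recall": 0.9,
    "log_loss": 0.35,
    "roc_auc": {0: 0.97, 1: 0.95},
    "mean_roc_auc": 0.96,
}
flat = flatten_metrics(example)
print(sorted(flat))
# Inside an active run, the whole dict could then be logged at once:
# mlflow.log_metrics(flat)
```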

### 🧑🏽‍💻✨ Access to the MLflow UI ✨🧑🏽‍💻

In [None]:
# Run the tracking UI in the background
get_ipython().system_raw("mlflow ui --backend-store-uri sqlite:///mlruns.db --port 5000 &")

In [None]:

# Terminate open tunnels if they exist
ngrok.kill()

# Set the authtoken (optional); read it from the environment instead of hard-coding it
NGROK_AUTH_TOKEN = os.environ.get("NGROK_AUTH_TOKEN")
if NGROK_AUTH_TOKEN:
    ngrok.set_auth_token(NGROK_AUTH_TOKEN)

# Open an HTTPS tunnel on port 5000 for http://localhost:5000
public_url = ngrok.connect(5000, bind_tls=True)
print("MLflow Tracking UI:", public_url)
