### ML model: predicting the risk of failure within the next 7 days
Goal: demonstrate the project's predictive value and give the workshop a concrete, data-driven impact.

Import libraries

In [10]:
import pandas as pd
import numpy as np
from datetime import timedelta
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score
import joblib   

In [11]:
# Load the dataset
df = pd.read_csv("maintenance_events.csv", parse_dates=["failure_date"])
machines = df["machine"].unique()

ml_data = []

for machine in machines:
    data = df[df["machine"] == machine].sort_values("failure_date")
    for i, row in data.iterrows():
        # 7-day window before the failure (the current event itself is excluded)
        start_window = row["failure_date"] - timedelta(days=7)
        window = data[(data["failure_date"] >= start_window) & (data["failure_date"] < row["failure_date"])]
        
        avg_downtime = window["downtime_h"].mean() if not window.empty else 0
        avg_cost = window["cost_eur"].mean() if not window.empty else 0
        avg_scrap = window["scrap_units"].mean() if not window.empty else 0
        days_since_last = (row["failure_date"] - window["failure_date"].max()).days if not window.empty else 7
        
        # Positive example (label 1): a failure occurred
        ml_data.append({
            "machine": machine,
            "avg_downtime_7d": avg_downtime,
            "avg_cost_7d": avg_cost,
            "avg_scrap_7d": avg_scrap,
            "days_since_last": days_since_last,
            "label_next_7d": 1
        })
        
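        # Synthetic negative example (label 0) added to balance the classes:
        # the positive row's features scaled by 0.5, with 2 extra days since the last failure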
        ml_data.append({
            "machine": machine,
            "avg_downtime_7d": avg_downtime * 0.5,
            "avg_cost_7d": avg_cost * 0.5,
            "avg_scrap_7d": avg_scrap * 0.5,
            "days_since_last": days_since_last + 2,
            "label_next_7d": 0
        })

ml_df = pd.DataFrame(ml_data)
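
The windowed features built in the loop above (mean of the events in the 7 days before each failure, current event excluded) can also be sketched with pandas time-based rolling windows. This is a minimal sketch, not the notebook's method: `rolling_features` is a hypothetical helper, the toy rows are made up for illustration, and it assumes each machine has distinct `failure_date` values. It only reproduces the positive-row features, not the synthetic label-0 rows.

```python
import pandas as pd

def rolling_features(events: pd.DataFrame, window_days: int = 7) -> pd.DataFrame:
    """Per-machine means over the `window_days` days before each event (hypothetical helper)."""
    cols = ["downtime_h", "cost_eur", "scrap_units"]
    parts = []
    for machine, grp in events.sort_values("failure_date").groupby("machine"):
        g = grp.set_index("failure_date")
        # closed="left" gives the window [t - 7 days, t): the current event is excluded
        means = g[cols].rolling(f"{window_days}D", closed="left").mean().fillna(0)
        means.columns = ["avg_downtime_7d", "avg_cost_7d", "avg_scrap_7d"]
        # Days since the previous event, capped at the window length
        # (same default as the loop when the window is empty)
        gap = g.index.to_series().diff().dt.days
        means["days_since_last"] = gap.clip(upper=window_days).fillna(window_days).astype(int)
        means.insert(0, "machine", machine)
        parts.append(means.reset_index())
    return pd.concat(parts, ignore_index=True)

# Illustrative data only (not taken from maintenance_events.csv)
toy = pd.DataFrame({
    "machine": ["A", "A", "A"],
    "failure_date": pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-20"]),
    "downtime_h": [2.0, 4.0, 6.0],
    "cost_eur": [100.0, 200.0, 300.0],
    "scrap_units": [1.0, 2.0, 3.0],
})
features = rolling_features(toy)
print(features)
```

The second toy event sees the first one in its window (2 days earlier), while the first and third events have an empty window and fall back to 0 for the means and 7 for `days_since_last`.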


Features / target

In [12]:
X = ml_df[["avg_downtime_7d", "avg_cost_7d", "avg_scrap_7d", "days_since_last"]]
y = ml_df["label_next_7d"]

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)



Simple RandomForest model

In [13]:
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print("✅ Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))


✅ Accuracy: 0.896551724137931
              precision    recall  f1-score   support

           0       1.00      0.80      0.89        15
           1       0.82      1.00      0.90        14

    accuracy                           0.90        29
   macro avg       0.91      0.90      0.90        29
weighted avg       0.91      0.90      0.90        29
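
The figures in the report above can be checked by hand. With a support of 15 and a recall of 0.80, class 0 has 12 correct predictions and 3 misclassified as 1; with a support of 14 and a recall of 1.00, class 1 has all 14 correct. The sketch below rebuilds the metrics from that implied confusion matrix (the matrix is derived from the report, it is not printed by the notebook).

```python
# Confusion matrix implied by the report: rows = true class, columns = predicted class
cm = [
    [12, 3],   # true 0: 12 predicted as 0, 3 predicted as 1
    [0, 14],   # true 1: 0 predicted as 0, 14 predicted as 1
]

total = sum(sum(row) for row in cm)
accuracy = (cm[0][0] + cm[1][1]) / total

# Precision: correct predictions of class k / all predictions of class k (column sum)
precision = [cm[k][k] / (cm[0][k] + cm[1][k]) for k in (0, 1)]
# Recall: correct predictions of class k / all true members of class k (row sum)
recall = [cm[k][k] / sum(cm[k]) for k in (0, 1)]
# F1: harmonic mean of precision and recall
f1 = [2 * p * r / (p + r) for p, r in zip(precision, recall)]

print("accuracy :", round(accuracy, 4))
print("precision:", [round(p, 2) for p in precision])
print("recall   :", [round(r, 2) for r in recall])
print("f1       :", [round(f, 2) for f in f1])
```

The 3 false alarms on class 0 explain why precision for class 1 drops to 14/17 while its recall stays at 1.00.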



Save the model for Streamlit (optional)

In [14]:
import joblib
joblib.dump(clf, "rf_model_maintenance.pkl")
print("✅ Model saved: rf_model_maintenance.pkl")

✅ Model saved: rf_model_maintenance.pkl
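
On the consuming side, the saved file can be read back with `joblib.load` and used to score one row of features. The sketch below is only an illustration under assumptions: `failure_risk` is a hypothetical helper, and a tiny toy model with made-up values stands in for the real `rf_model_maintenance.pkl` so that the example runs on its own.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["avg_downtime_7d", "avg_cost_7d", "avg_scrap_7d", "days_since_last"]

def failure_risk(model, features: dict) -> float:
    """Probability of label 1 (failure within 7 days) for one feature row (hypothetical helper)."""
    row = pd.DataFrame([features], columns=FEATURES)
    positive = list(model.classes_).index(1)   # column of predict_proba for class 1
    return float(model.predict_proba(row)[0, positive])

# Toy stand-in for the trained model; values are illustrative only
toy_X = pd.DataFrame(
    [[4.0, 800.0, 10.0, 2], [2.0, 400.0, 5.0, 4],
     [5.0, 900.0, 12.0, 1], [2.5, 450.0, 6.0, 3]],
    columns=FEATURES,
)
toy_y = [1, 0, 1, 0]
toy_model = RandomForestClassifier(n_estimators=10, random_state=42).fit(toy_X, toy_y)

# Round trip through a pickle file, as the Streamlit app would do with the real file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rf_model_maintenance.pkl")
    joblib.dump(toy_model, path)
    loaded = joblib.load(path)

sample = {"avg_downtime_7d": 4.5, "avg_cost_7d": 850.0,
          "avg_scrap_7d": 11.0, "days_since_last": 1}
risk = failure_risk(loaded, sample)
print(f"Estimated failure risk: {risk:.2f}")
```

Passing the features as a named DataFrame row keeps the column order identical to the one used for training.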


In this section, we:

- Turned the failure events into time-based features.

- Trained a Random Forest to predict a failure within the next 7 days.

- Evaluated the model quickly with accuracy and classification_report.

- Saved the model (rf_model_maintenance.pkl) for later use in Streamlit.