# Predictive Maintenance – Deutsche Bahn
## Extended Version with Logistic Regression & Random Forest
This notebook covers:
- Loading the dataset
- Exploratory data analysis (EDA)
- Preprocessing
- Logistic regression
- Random forest
- Model comparison (accuracy, precision, recall, F1, ROC-AUC)
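
The comparison metrics listed above can be computed by hand from their definitions. The following is a minimal sketch on invented toy labels (1 = failure within 30 days). It shows why accuracy alone is misleading when failures are rare: a model that never predicts a failure reaches the same accuracy as one that catches half of them. ROC-AUC is computed here as the probability that a randomly chosen failure gets a higher score than a randomly chosen non-failure, with ties counting as one half. The helper names `binary_metrics` and `roc_auc_pairwise` are hypothetical and are not part of the notebook.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted / present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc_pairwise(y_true, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Invented toy labels: 8 healthy components, 2 failures.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))      # catches one of two failures
print(binary_metrics(y_true, [0] * 10))    # never predicts a failure
print(roc_auc_pairwise([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Both models in the toy example score 0.8 accuracy, but only the first has non-zero recall. For the positive class, these definitions match what `classification_report` and `roc_auc_score` report for a binary target.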


## 1. Load the Dataset

In [None]:

import pandas as pd

df = pd.read_csv("predictive_maintenance_db.csv")
df.head()
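
If `predictive_maintenance_db.csv` is not at hand, a synthetic stand-in makes it possible to run the rest of the notebook end to end. This is a sketch under loud assumptions: only `component_type`, `weekday` and `failure_within_30d` are taken from the notebook; the category values and the numeric columns `operating_hours`, `vibration_rms` and `temperature_c` are hypothetical, and the generated data says nothing about real Deutsche Bahn components.

```python
import numpy as np
import pandas as pd


def make_synthetic_maintenance_data(n_rows=1000, failure_rate=0.1, seed=42):
    """Hypothetical stand-in for predictive_maintenance_db.csv (schema is assumed)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        # Column names used in the notebook; the values are invented.
        "component_type": rng.choice(["brake", "door", "pantograph", "hvac"], size=n_rows),
        "weekday": rng.choice(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], size=n_rows),
        # Hypothetical numeric sensor features.
        "operating_hours": rng.uniform(0, 20_000, size=n_rows),
        "vibration_rms": rng.normal(1.0, 0.3, size=n_rows),
        "temperature_c": rng.normal(40.0, 8.0, size=n_rows),
    })
    # Failure probability grows with the vibration rank; average rate is about failure_rate.
    risk = df["vibration_rms"].rank(pct=True).to_numpy()
    df["failure_within_30d"] = (rng.random(n_rows) < 2 * failure_rate * risk).astype(int)
    return df


df_demo = make_synthetic_maintenance_data()
print(df_demo.head())
```

Replacing the `pd.read_csv(...)` call with `df = make_synthetic_maintenance_data()` should let the later cells run unchanged, because every non-categorical column is numeric.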


## 2. Basic Analysis

In [None]:
df.info()

In [None]:
df.describe()
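
`info()` and `describe()` do not show how imbalanced the target is, yet that imbalance is the reason the later cells use `stratify=y` and `class_weight="balanced"`. A minimal sketch of a class-balance check follows; the helper name `class_balance` is hypothetical, and the toy series stands in for `df["failure_within_30d"]`.

```python
import pandas as pd


def class_balance(target: pd.Series) -> pd.DataFrame:
    """Absolute and relative frequency of each target class."""
    counts = target.value_counts().sort_index()
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})


# Toy target: 9 non-failures, 1 failure.
print(class_balance(pd.Series([0] * 9 + [1])))
```

In the notebook this would be `class_balance(df["failure_within_30d"])`; `df.isna().sum()` is a useful companion check for missing values before preprocessing.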

## 3. Preprocessing

In [None]:

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Target: did the component fail within 30 days?
X = df.drop("failure_within_30d", axis=1)
y = df["failure_within_30d"]

categorical = ["component_type", "weekday"]
# All remaining columns are treated as numeric features.
numeric = [col for col in X.columns if col not in categorical]

preprocess = ColumnTransformer(
    transformers=[
        # Categories unseen during training are encoded as all zeros instead of raising an error.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
        # Scaling helps logistic regression converge; tree-based models are insensitive to it.
        ("num", StandardScaler(), numeric)
    ]
)

# Stratified 80/20 split keeps the failure rate the same in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)


## 4. Logistic Regression

In [None]:

# class_weight="balanced" upweights the minority class during training.
log_reg = Pipeline(steps=[
    ("preprocess", preprocess),
    ("lr", LogisticRegression(max_iter=1000, class_weight="balanced"))
])

log_reg.fit(X_train, y_train)
y_pred_lr = log_reg.predict(X_test)
# Probability of the positive class (failure), required for ROC-AUC.
y_proba_lr = log_reg.predict_proba(X_test)[:, 1]

print("### Logistic Regression ###")
print(classification_report(y_test, y_pred_lr))
print("ROC-AUC:", roc_auc_score(y_test, y_proba_lr))
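
Both models use `class_weight="balanced"`. scikit-learn documents the resulting weight of each class as `n_samples / (n_classes * count_of_that_class)`, so with rare failures every failure sample counts much more in the loss. The sketch below reproduces that formula in plain Python on an invented 90/10 split; the helper name `balanced_class_weights` is hypothetical, and `sklearn.utils.class_weight.compute_class_weight("balanced", classes=..., y=...)` can be used to cross-check it.

```python
from collections import Counter


def balanced_class_weights(y):
    """Weights per class following the 'balanced' heuristic."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {cls: n_samples / (n_classes * k) for cls, k in sorted(counts.items())}


# Invented labels: 90 non-failures, 10 failures.
weights = balanced_class_weights([0] * 90 + [1] * 10)
print(weights)
```

Here a single failure weighs nine times as much as a non-failure (5.0 vs. about 0.56), which typically raises recall on failures at the cost of precision.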


## 5. Random Forest Model

In [None]:

rf = Pipeline(steps=[
    ("preprocess", preprocess),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=42, class_weight='balanced'))
])

rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)
y_proba_rf = rf.predict_proba(X_test)[:, 1]

print("### Random Forest ###")
print(classification_report(y_test, y_pred_rf))
print("ROC-AUC:", roc_auc_score(y_test, y_proba_rf))


## 6. Model Comparison

In [None]:

print("ROC-AUC comparison:")
print("Logistic Regression:", roc_auc_score(y_test, y_proba_lr))
print("Random Forest:", roc_auc_score(y_test, y_proba_rf))
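
The cell above compares only ROC-AUC, while the introduction lists five metrics. A minimal sketch of a side-by-side table for all five follows; the helper `compare_models` and the toy predictions for `model_a` and `model_b` are hypothetical. `zero_division=0` avoids warnings when a model never predicts a failure.

```python
import pandas as pd
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)


def compare_models(y_true, results):
    """results maps model name -> (hard predictions, positive-class probabilities)."""
    rows = {}
    for name, (y_pred, y_proba) in results.items():
        rows[name] = {
            "Accuracy": accuracy_score(y_true, y_pred),
            "Precision": precision_score(y_true, y_pred, zero_division=0),
            "Recall": recall_score(y_true, y_pred, zero_division=0),
            "F1": f1_score(y_true, y_pred, zero_division=0),
            "ROC-AUC": roc_auc_score(y_true, y_proba),
        }
    # One row per model, one column per metric.
    return pd.DataFrame(rows).T


# Invented toy predictions for two models.
y_true = [0, 0, 1, 1]
table = compare_models(y_true, {
    "model_a": ([0, 1, 0, 1], [0.1, 0.4, 0.35, 0.8]),
    "model_b": ([0, 0, 1, 1], [0.2, 0.1, 0.7, 0.9]),
})
print(table)
```

In the notebook the call would be `compare_models(y_test, {"Logistic Regression": (y_pred_lr, y_proba_lr), "Random Forest": (y_pred_rf, y_proba_rf)})`.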
