# Model Evaluation

This notebook evaluates the models for voltage sag detection and fault detection using standard performance metrics.

Steps:
1. Evaluate a Random Forest model for voltage sag detection.
2. Evaluate Isolation Forest and Autoencoders for fault detection.
3. Compare accuracy, precision, recall, and F1 score.

Let's move forward with **Notebook 5: Model Evaluation**, where we will evaluate the models for **voltage sag detection** and **fault detection** using performance metrics such as **accuracy**, **precision**, **recall**, and **F1 score**.

Here’s how we will structure **Notebook 5**:

---

### **Notebook 5: Model Evaluation**

In this notebook, we will evaluate a supervised **Random Forest** classifier for **voltage sag detection** and two unsupervised anomaly detectors, **Isolation Forest** and an **Autoencoder**, for **fault detection**. Each model is scored with **accuracy**, **precision**, **recall**, and **F1 score**.
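
As a quick reference for the four metrics, here is a minimal worked example on eight hypothetical labels (1 = voltage sag, 0 = normal). It computes each metric by hand from confusion-matrix counts and checks the result against scikit-learn; the arrays are made up purely for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical labels for illustration only (1 = voltage sag, 0 = normal)
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 1, 0, 0, 0, 0])

# Confusion-matrix counts
tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # sags correctly flagged
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false alarms
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # missed sags
tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # normal rows correctly passed

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)                          # share of alarms that are real sags
recall = tp / (tp + fn)                             # share of real sags that were caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

# The hand-computed values agree with scikit-learn's implementations
assert np.isclose(accuracy, accuracy_score(y_true, y_pred))
assert np.isclose(precision, precision_score(y_true, y_pred))
assert np.isclose(recall, recall_score(y_true, y_pred))
assert np.isclose(f1, f1_score(y_true, y_pred))

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because voltage sags are usually a minority class, accuracy alone can look high even when most sags are missed; that is why precision, recall, and F1 are reported alongside it.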

### **Steps**:

1. **Evaluate Random Forest Model for Voltage Sag Detection**:

   * Train a **Random Forest classifier** on the dataset to detect **voltage sag**.
   * Evaluate the model using performance metrics.

2. **Evaluate Isolation Forest and Autoencoders for Fault Detection**:

   * Train and evaluate **Isolation Forest** and **Autoencoders** for detecting faults in the power grid.

3. **Compare Models**:

   * Compare the **Random Forest**, **Isolation Forest**, and **Autoencoders** using performance metrics: **accuracy**, **precision**, **recall**, and **F1 score**.
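
Isolation Forest is unsupervised, and scikit-learn's `IsolationForest.predict` returns `1` for inliers and `-1` for outliers, so its output has to be remapped to the 1 = fault / 0 = normal convention before the metrics can be computed. A minimal sketch on synthetic data; the three columns only loosely stand in for `Voltage`, `Global_active_power`, and `Global_intensity`, and all values are randomly generated rather than taken from the dataset.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic stand-in data: 950 "normal" rows plus 50 shifted rows playing the role of faults
normal = rng.normal(loc=[240.0, 1.0, 4.0], scale=[2.0, 0.3, 1.0], size=(950, 3))
faults = rng.normal(loc=[225.0, 4.0, 18.0], scale=[2.0, 0.3, 1.0], size=(50, 3))
X_demo = np.vstack([normal, faults])
y_demo = np.r_[np.zeros(950, dtype=int), np.ones(50, dtype=int)]

# contamination=0.05 tells the model to flag roughly 5% of the training rows
iforest = IsolationForest(n_estimators=200, contamination=0.05, random_state=42)
raw = iforest.fit(X_demo).predict(X_demo)  # values in {1, -1}

# Remap: -1 (anomaly) -> 1 (fault), 1 (normal) -> 0
y_hat = np.where(raw == -1, 1, 0)

print("flagged fraction:", y_hat.mean())
print("recall on injected faults:", y_hat[y_demo == 1].mean())
```

Note that `contamination` fixes the share of flagged rows in advance, so it acts as an assumption about the fault rate rather than something the model learns.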

---

### **Markdown Cell for Notebook 5**:

```markdown
# Model Evaluation

This notebook evaluates the classification models used for **voltage sag detection** and **fault detection**. We will use various performance metrics to evaluate the models and compare their effectiveness in detecting grid issues.

### Steps:
1. **Evaluate Random Forest model** for **voltage sag detection**.
2. **Evaluate Isolation Forest and Autoencoders** for **fault detection**.
3. **Compare accuracy**, **precision**, **recall**, and **F1 score** for each model.

The objective is to understand how well the models perform and identify which model is most effective for detecting voltage sags and faults in the grid.
```

---

### **Code Cell for Model Evaluation**:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, IsolationForest
from sklearn.preprocessing import MinMaxScaler
from keras.models import Model
from keras.layers import Input, Dense

# Load the preprocessed dataset (parse the Timestamp index so it can be used on the plot axis)
df = pd.read_csv('preprocessed_power_consumption.csv', index_col='Timestamp', parse_dates=True)

# Select relevant features for classification
features = ['Voltage', 'Global_active_power', 'Global_intensity']
X = df[features]
# Label voltage sag: low_voltage = 1, all other states = 0
# Note: if Voltage_State was derived by thresholding 'Voltage', keeping 'Voltage' as a feature
# lets the classifier recover that threshold almost exactly, so its scores will look near-perfect.
y = (df['Voltage_State'] == 'low_voltage').astype(int)

# Hold out the last 20% of the time series for evaluation; shuffle=False keeps chronological
# order, so every model is scored on rows it never saw during training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)


def report(name, y_true, y_pred):
    """Print accuracy, precision, recall, F1, and the full classification report."""
    print(f"\n{name} Model Evaluation:")
    print(f"Accuracy: {accuracy_score(y_true, y_pred):.4f}")
    # zero_division=0 avoids warnings if a model predicts no positives
    print(f"Precision: {precision_score(y_true, y_pred, zero_division=0):.4f}")
    print(f"Recall: {recall_score(y_true, y_pred, zero_division=0):.4f}")
    print(f"F1 Score: {f1_score(y_true, y_pred, zero_division=0):.4f}")
    print(f"\nClassification Report for {name}:\n",
          classification_report(y_true, y_pred, zero_division=0))


# **1. Random Forest Model for Voltage Sag Detection**
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)        # Train on the training split only
y_pred_rf = rf_model.predict(X_test)  # Predict on held-out data

# **2. Evaluate Random Forest for Voltage Sag Detection**
report("Random Forest", y_test, y_pred_rf)

# **3. Isolation Forest for Fault Detection**
# Unsupervised: fitted without labels; contamination=0.05 assumes ~5% of rows are anomalous
model_iforest = IsolationForest(n_estimators=200, contamination=0.05, random_state=42)
model_iforest.fit(X_train)
y_pred_iforest = model_iforest.predict(X_test)         # 1 = normal, -1 = anomaly
y_pred_iforest = np.where(y_pred_iforest == -1, 1, 0)  # Remap to 1 = anomaly, 0 = normal

# **4. Autoencoder for Fault Detection**
# Scale to [0, 1] with statistics from the training split only (matches the sigmoid output)
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
n_features = X_train_scaled.shape[1]

# Define the Autoencoder with a bottleneck narrower than the input; a single 128-unit layer
# for 3 inputs can learn the identity, which makes reconstruction error uninformative
input_layer = Input(shape=(n_features,))
encoded = Dense(8, activation='relu')(input_layer)
bottleneck = Dense(2, activation='relu')(encoded)                # Compressed representation
decoded = Dense(8, activation='relu')(bottleneck)
output_layer = Dense(n_features, activation='sigmoid')(decoded)  # Reconstruction

autoencoder = Model(input_layer, output_layer)
autoencoder.compile(optimizer='adam', loss='mean_squared_error')

# Train to reconstruct the training inputs; validation_split holds out 10% to monitor overfitting
autoencoder.fit(X_train_scaled, X_train_scaled, epochs=50, batch_size=256,
                shuffle=True, validation_split=0.1)

# **5. Anomaly detection with Autoencoders**
# Mean absolute reconstruction error per row
train_error = np.mean(np.abs(autoencoder.predict(X_train_scaled) - X_train_scaled), axis=1)
test_error = np.mean(np.abs(autoencoder.predict(X_test_scaled) - X_test_scaled), axis=1)

# Threshold at the 95th percentile of the training error (flags roughly the top 5%)
threshold = np.percentile(train_error, 95)
y_pred_autoencoder = (test_error > threshold).astype(int)  # 1 = anomaly, 0 = normal

# **6. Evaluate Fault Detection Models**
# This notebook has no separate fault labels, so the voltage-sag labels serve as a proxy ground truth
report("Isolation Forest", y_test, y_pred_iforest)
report("Autoencoder", y_test, y_pred_autoencoder)

# **Visualize Comparison (optional)**
# Line plots over time instead of one bar per row, which is very slow on large datasets
predictions = {
    "Random Forest": y_pred_rf,
    "Isolation Forest": y_pred_iforest,
    "Autoencoder": y_pred_autoencoder,
}
fig, axes = plt.subplots(3, 1, figsize=(12, 8), sharex=True)
for ax, (name, y_pred) in zip(axes, predictions.items()):
    ax.plot(X_test.index, y_pred, linewidth=0.5)
    ax.set_title(f"{name} Predictions")
    ax.set_ylabel("0: Normal, 1: Sag/Anomaly")
axes[-1].set_xlabel("Timestamp")

plt.tight_layout()
plt.show()
```
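
To compare the three models side by side instead of reading separate printouts, the metrics can be collected into a single pandas DataFrame. A minimal sketch: the helper name `metrics_table` is hypothetical, and it assumes each model's predictions are 0/1 arrays aligned with the same ground-truth labels, as `y_pred_rf`, `y_pred_iforest`, and `y_pred_autoencoder` are in the code cell above. The toy predictions below are made up.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score


def metrics_table(y_true, predictions):
    """Build a models-by-metrics DataFrame from {model name: 0/1 prediction array}."""
    rows = {}
    for name, y_pred in predictions.items():
        rows[name] = {
            "accuracy": accuracy_score(y_true, y_pred),
            # zero_division=0 avoids warnings when a model flags no positives at all
            "precision": precision_score(y_true, y_pred, zero_division=0),
            "recall": recall_score(y_true, y_pred, zero_division=0),
            "f1": f1_score(y_true, y_pred, zero_division=0),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Toy example with made-up labels and predictions
y_true = np.array([1, 0, 0, 1, 0, 1, 0, 0])
table = metrics_table(y_true, {
    "Random Forest": np.array([1, 0, 0, 1, 0, 1, 0, 0]),
    "Isolation Forest": np.array([1, 1, 0, 0, 0, 1, 0, 0]),
    "Autoencoder": np.array([0, 0, 0, 0, 0, 0, 0, 0]),
})
print(table.round(3))
```

In the notebook it would be called with the ground-truth labels and the three `y_pred_*` arrays; sorting the result by the `f1` column gives a quick ranking. The "Autoencoder" row in the toy example shows why accuracy alone misleads: predicting no positives still scores 0.625 accuracy but 0 recall.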

---

### **Explanation of the Notebook**:

1. **Random Forest for Voltage Sag Detection**:

   * We train a **Random Forest** classifier to detect **voltage sag** (rows whose `Voltage_State` is `low_voltage`) and evaluate it using **accuracy**, **precision**, **recall**, and **F1 score**.

2. **Isolation Forest for Fault Detection**:

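   * **Isolation Forest** is used for **fault detection**, identifying anomalies in the grid data. Because it is unsupervised, its `-1` (anomaly) / `1` (normal) outputs are mapped to 1 / 0 and then scored against the voltage-sag labels, which stand in for fault labels since the notebook has none.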

3. **Autoencoder for Fault Detection**:

   * **Autoencoders** are used to detect **faults** based on **reconstruction error**. The model is trained to reconstruct the input data; rows it reconstructs poorly (error above the 95th-percentile threshold) are flagged as anomalies.

4. **Evaluation**:

   * The performance of each model (Random Forest, Isolation Forest, Autoencoder) is evaluated using **accuracy**, **precision**, **recall**, and **F1 score**.

5. **Visualization**:

   * The predictions from **Random Forest**, **Isolation Forest**, and **Autoencoders** are visualized for comparison.
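
The reconstruction-error rule behind the Autoencoder detector comes down to a few lines of NumPy. A minimal sketch with hypothetical scaled inputs and reconstructions, so it runs without Keras: compute the per-row mean absolute error, then flag rows above the 95th percentile. That percentile mirrors the notebook's choice and, like `contamination=0.05` for Isolation Forest, assumes roughly 5% of rows are anomalous.

```python
import numpy as np

# Hypothetical scaled inputs (rows = samples, columns = features in [0, 1])
X_scaled = np.array([
    [0.50, 0.20, 0.30],
    [0.52, 0.21, 0.29],
    [0.49, 0.19, 0.31],
    [0.10, 0.90, 0.95],  # unusual row that the model reconstructs poorly
])
# Hypothetical autoencoder output: close to the input except for the last row
reconstructed = np.array([
    [0.51, 0.20, 0.30],
    [0.51, 0.21, 0.30],
    [0.50, 0.20, 0.30],
    [0.45, 0.30, 0.35],
])

# Mean absolute reconstruction error per row (same formula as the notebook)
reconstruction_error = np.mean(np.abs(reconstructed - X_scaled), axis=1)

# Flag rows whose error exceeds the 95th percentile of all errors
threshold = np.percentile(reconstruction_error, 95)
y_pred = (reconstruction_error > threshold).astype(int)  # 1 = anomaly, 0 = normal

print(reconstruction_error.round(4), threshold.round(4), y_pred)
```

A percentile threshold always flags a fixed share of rows whether or not any faults are present; fixing the threshold from the error on data known to be normal is one alternative.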

---

### **Next Steps**:

1. **Run the notebook** to evaluate the performance of the **Random Forest**, **Isolation Forest**, and **Autoencoders** models.
2. **Analyze the evaluation results** and see which model performs best for **voltage sag detection** and **fault detection**.
3. **Compare the models** based on the **performance metrics** and choose the best one for deployment.

Let me know if you'd like help with any part of this notebook, or if you'd like to continue running it for model evaluation!
