# Model Evaluation

    ## Model Evaluation
    This notebook evaluates the classification models for voltage sag detection and fault detection using various performance metrics.
    
    Steps:
    1. Evaluate Random Forest model for voltage sag detection.
    2. Evaluate Isolation Forest and Autoencoders for fault detection.
    3. Compare accuracy, precision, recall, and F1 score.
    

In [13]:
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import IsolationForest
from keras.models import Model
from keras.layers import Input, Dense
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
from tensorflow.keras.optimizers import Adam

In [4]:
df.columns

Index(['Global_active_power', 'Global_reactive_power', 'Voltage',
       'Global_intensity', 'Sub_metering_1', 'Sub_metering_2',
       'Sub_metering_3'],
      dtype='object')

In [5]:
# Function to categorize voltage into discrete levels
def categorize_voltage(voltage):
    if voltage < 225:
        return 'low_voltage'  # Category for low voltage
    elif voltage <= 240:
        return 'normal_voltage'  # Category for normal voltage
    else:
        return 'high_voltage'  # Category for high voltage

In [6]:
# Load the preprocessed dataset
df = pd.read_csv('preprocessed_power_consumption.csv', index_col='Timestamp')

# Select relevant features for classification
features = ['Voltage', 'Global_active_power', 'Global_intensity']
X = df[features]



# Apply categorization to the 'Voltage' column and create a new column 'Voltage_State'
df['Voltage_State'] = df['Voltage'].apply(categorize_voltage)

y = df['Voltage_State'].apply(lambda x: 1 if x == 'low_voltage' else 0)  # Label voltage sag (low_voltage = 1, others = 0)


In [None]:
# **1. Random Forest Model for Voltage Sag Detection**
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X, y)  # Train Random Forest model
y_pred_rf = rf_model.predict(X)  # Predict using the trained model


In [None]:
# **2. Evaluate Random Forest for Voltage Sag Detection**
print("\nRandom Forest Model Evaluation:")
print(f"Accuracy: {accuracy_score(y, y_pred_rf):.4f}")
print(f"Precision: {precision_score(y, y_pred_rf):.4f}")
print(f"Recall: {recall_score(y, y_pred_rf):.4f}")
print(f"F1 Score: {f1_score(y, y_pred_rf):.4f}")
print("\nClassification Report for Random Forest:\n", classification_report(y, y_pred_rf))

In [None]:
# **3. Isolation Forest for Fault Detection**
# Initialize Isolation Forest model
model_iforest = IsolationForest(n_estimators=200, contamination=0.05, random_state=42)
model_iforest.fit(X)  # Train Isolation Forest model
y_pred_iforest = model_iforest.predict(X)  # Predict anomalies
y_pred_iforest = np.where(y_pred_iforest == 1, 0, 1)  # Convert 1 (normal) to 0, and -1 (anomaly) to 1


In [7]:
# **4. Autoencoder for Fault Detection**
# Normalize the data before passing to Autoencoder
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

'''# Define Autoencoder model structure
input_layer = Input(shape=(X_scaled.shape[1],))
encoded = Dense(128, activation='relu')(input_layer)  # Encoder layer
decoded = Dense(X_scaled.shape[1], activation='sigmoid')(encoded)  # Decoder layer (reconstruction)

autoencoder = Model(input_layer, decoded)
autoencoder.compile(optimizer='adam', loss='mean_squared_error')

# Train Autoencoder model
autoencoder.fit(X_scaled, X_scaled, epochs=50, batch_size=256, shuffle=True, validation_data=(X_scaled, X_scaled))'''


"# Define Autoencoder model structure\ninput_layer = Input(shape=(X_scaled.shape[1],))\nencoded = Dense(128, activation='relu')(input_layer)  # Encoder layer\ndecoded = Dense(X_scaled.shape[1], activation='sigmoid')(encoded)  # Decoder layer (reconstruction)\n\nautoencoder = Model(input_layer, decoded)\nautoencoder.compile(optimizer='adam', loss='mean_squared_error')\n\n# Train Autoencoder model\nautoencoder.fit(X_scaled, X_scaled, epochs=50, batch_size=256, shuffle=True, validation_data=(X_scaled, X_scaled))"

In [8]:
CODE_DIM = 2
INPUT_SHAPE = X_scaled.shape[1]

input_layer = Input(shape=(INPUT_SHAPE,))
x = Dense(64, activation='relu')(input_layer)
x = Dense(16, activation='relu')(x)
code = Dense(CODE_DIM, activation='relu')(x)
x = Dense(16, activation='relu')(code)
x = Dense(64, activation='relu')(x)
output_layer = Dense(INPUT_SHAPE, activation='relu')(x)

autoencoder = Model(input_layer, output_layer, name='anomaly')

2025-05-09 19:22:02.955786: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)


In [11]:
model_name = "anomaly.weights.h5"
checkpoint = ModelCheckpoint(model_name,
                            monitor="val_loss",
                            mode="min",
                            save_best_only = True,
                            save_weights_only=True,
                            verbose=1)
earlystopping = EarlyStopping(monitor='val_loss',
                              min_delta = 0, 
                              patience = 5, 
                              verbose = 1,
                              restore_best_weights=True)

callbacks = [checkpoint, earlystopping]

In [14]:
autoencoder.compile(loss='mae',
                    optimizer=Adam())

In [15]:
history = autoencoder.fit(X_scaled, X_scaled,
                          epochs=25, batch_size=64,
                          validation_data=(X_scaled, X_scaled),
                          callbacks=callbacks, shuffle=True)

Epoch 1/25
[1m32418/32426[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 7ms/step - loss: 0.0632
Epoch 1: val_loss improved from inf to 0.00134, saving model to anomaly.weights.h5
[1m32426/32426[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m382s[0m 12ms/step - loss: 0.0632 - val_loss: 0.0013
Epoch 2/25
[1m32426/32426[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 195ms/step - loss: 0.0017
Epoch 2: val_loss did not improve from 0.00134
[1m32426/32426[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6419s[0m 198ms/step - loss: 0.0017 - val_loss: 0.0014
Epoch 3/25
[1m32421/32426[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 6ms/step - loss: 0.0015
Epoch 3: val_loss did not improve from 0.00134
[1m32426/32426[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m336s[0m 10ms/step - loss: 0.0015 - val_loss: 0.0016
Epoch 4/25
[1m32425/32426[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 7ms/step - loss: 0.0014
Epoch 4: val_loss improved from 0.00134 to 0.00115,

In [None]:
# **5. Anomaly detection with Autoencoders**
reconstructed = autoencoder.predict(X_scaled)  # Reconstructed data from Autoencoder
reconstruction_error = np.mean(np.abs(reconstructed - X_scaled), axis=1)  # Reconstruction error

# Set threshold for anomaly detection (top 5% reconstruction errors)
threshold = np.percentile(reconstruction_error, 95)
y_pred_autoencoder = (reconstruction_error > threshold).astype(int)  # 1 for anomaly, 0 for normal


In [None]:
# **6. Evaluate Fault Detection Models**
# Evaluate Isolation Forest
print("\nIsolation Forest Model Evaluation:")
print(f"Accuracy: {accuracy_score(y, y_pred_iforest):.4f}")
print(f"Precision: {precision_score(y, y_pred_iforest):.4f}")
print(f"Recall: {recall_score(y, y_pred_iforest):.4f}")
print(f"F1 Score: {f1_score(y, y_pred_iforest):.4f}")
print("\nClassification Report for Isolation Forest:\n", classification_report(y, y_pred_iforest))


In [None]:
# Evaluate Autoencoder
print("\nAutoencoder Model Evaluation:")
print(f"Accuracy: {accuracy_score(y, y_pred_autoencoder):.4f}")
print(f"Precision: {precision_score(y, y_pred_autoencoder):.4f}")
print(f"Recall: {recall_score(y, y_pred_autoencoder):.4f}")
print(f"F1 Score: {f1_score(y, y_pred_autoencoder):.4f}")
print("\nClassification Report for Autoencoder:\n", classification_report(y, y_pred_autoencoder))


In [None]:
# **Visualize Comparison (optional)**
plt.figure(figsize=(12, 6))

# Plot Random Forest
plt.subplot(1, 3, 1)
plt.bar(np.arange(len(y)), y_pred_rf, label='Random Forest Prediction', alpha=0.7)
plt.title("Random Forest Predictions")
plt.xlabel("Data Points")
plt.ylabel("Prediction (0: Normal, 1: Anomaly)")

# Plot Isolation Forest
plt.subplot(1, 3, 2)
plt.bar(np.arange(len(y)), y_pred_iforest, label='Isolation Forest Prediction', alpha=0.7)
plt.title("Isolation Forest Predictions")
plt.xlabel("Data Points")
plt.ylabel("Prediction (0: Normal, 1: Anomaly)")

# Plot Autoencoder
plt.subplot(1, 3, 3)
plt.bar(np.arange(len(y)), y_pred_autoencoder, label='Autoencoder Prediction', alpha=0.7)
plt.title("Autoencoder Predictions")
plt.xlabel("Data Points")
plt.ylabel("Prediction (0: Normal, 1: Anomaly)")

plt.tight_layout()
plt.show()

In [None]:
Let's move forward with **Notebook 5: Model Evaluation**, where we will evaluate various classification models for **voltage sag detection** and **fault detection** using performance metrics such as **accuracy**, **precision**, **recall**, and **F1 score**.

Here’s how we will structure **Notebook 5**:

---

### **Notebook 5: Model Evaluation**

In this notebook, we will evaluate the performance of classification models, including **Random Forest** for **voltage sag detection**, and **Isolation Forest** and **Autoencoders** for **fault detection**. We will use performance metrics such as **accuracy**, **precision**, **recall**, and **F1 score** to assess each model.

### **Steps**:

1. **Evaluate Random Forest Model for Voltage Sag Detection**:

   * Train a **Random Forest classifier** on the dataset to detect **voltage sag**.
   * Evaluate the model using performance metrics.

2. **Evaluate Isolation Forest and Autoencoders for Fault Detection**:

   * Train and evaluate **Isolation Forest** and **Autoencoders** for detecting faults in the power grid.

3. **Compare Models**:

   * Compare the **Random Forest**, **Isolation Forest**, and **Autoencoders** using performance metrics: **accuracy**, **precision**, **recall**, and **F1 score**.

---

### **Markdown Cell for Notebook 5**:

```markdown
# Model Evaluation

This notebook evaluates the classification models used for **voltage sag detection** and **fault detection**. We will use various performance metrics to evaluate the models and compare their effectiveness in detecting grid issues.

### Steps:
1. **Evaluate Random Forest model** for **voltage sag detection**.
2. **Evaluate Isolation Forest and Autoencoders** for **fault detection**.
3. **Compare accuracy**, **precision**, **recall**, and **F1 score** for each model.

The objective is to understand how well the models perform and identify which model is most effective for detecting voltage sags and faults in the grid.
```

---

### **Code Cell for Model Evaluation**:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import IsolationForest
from keras.models import Model
from keras.layers import Input, Dense
from sklearn.preprocessing import MinMaxScaler

# Load the preprocessed dataset
df = pd.read_csv('preprocessed_power_consumption.csv', index_col='Timestamp')

# Select relevant features for classification
features = ['Voltage', 'Global_active_power', 'Global_intensity']
X = df[features]
y = df['Voltage_State'].apply(lambda x: 1 if x == 'low_voltage' else 0)  # Label voltage sag (low_voltage = 1, others = 0)

# **1. Random Forest Model for Voltage Sag Detection**
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X, y)  # Train Random Forest model
y_pred_rf = rf_model.predict(X)  # Predict using the trained model

# **2. Evaluate Random Forest for Voltage Sag Detection**
print("\nRandom Forest Model Evaluation:")
print(f"Accuracy: {accuracy_score(y, y_pred_rf):.4f}")
print(f"Precision: {precision_score(y, y_pred_rf):.4f}")
print(f"Recall: {recall_score(y, y_pred_rf):.4f}")
print(f"F1 Score: {f1_score(y, y_pred_rf):.4f}")
print("\nClassification Report for Random Forest:\n", classification_report(y, y_pred_rf))

# **3. Isolation Forest for Fault Detection**
# Initialize Isolation Forest model
model_iforest = IsolationForest(n_estimators=200, contamination=0.05, random_state=42)
model_iforest.fit(X)  # Train Isolation Forest model
y_pred_iforest = model_iforest.predict(X)  # Predict anomalies
y_pred_iforest = np.where(y_pred_iforest == 1, 0, 1)  # Convert 1 (normal) to 0, and -1 (anomaly) to 1

# **4. Autoencoder for Fault Detection**
# Normalize the data before passing to Autoencoder
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Define Autoencoder model structure
input_layer = Input(shape=(X_scaled.shape[1],))
encoded = Dense(128, activation='relu')(input_layer)  # Encoder layer
decoded = Dense(X_scaled.shape[1], activation='sigmoid')(encoded)  # Decoder layer (reconstruction)

autoencoder = Model(input_layer, decoded)
autoencoder.compile(optimizer='adam', loss='mean_squared_error')

# Train Autoencoder model
autoencoder.fit(X_scaled, X_scaled, epochs=50, batch_size=256, shuffle=True, validation_data=(X_scaled, X_scaled))

# **5. Anomaly detection with Autoencoders**
reconstructed = autoencoder.predict(X_scaled)  # Reconstructed data from Autoencoder
reconstruction_error = np.mean(np.abs(reconstructed - X_scaled), axis=1)  # Reconstruction error

# Set threshold for anomaly detection (top 5% reconstruction errors)
threshold = np.percentile(reconstruction_error, 95)
y_pred_autoencoder = (reconstruction_error > threshold).astype(int)  # 1 for anomaly, 0 for normal

# **6. Evaluate Fault Detection Models**
# Evaluate Isolation Forest
print("\nIsolation Forest Model Evaluation:")
print(f"Accuracy: {accuracy_score(y, y_pred_iforest):.4f}")
print(f"Precision: {precision_score(y, y_pred_iforest):.4f}")
print(f"Recall: {recall_score(y, y_pred_iforest):.4f}")
print(f"F1 Score: {f1_score(y, y_pred_iforest):.4f}")
print("\nClassification Report for Isolation Forest:\n", classification_report(y, y_pred_iforest))

# Evaluate Autoencoder
print("\nAutoencoder Model Evaluation:")
print(f"Accuracy: {accuracy_score(y, y_pred_autoencoder):.4f}")
print(f"Precision: {precision_score(y, y_pred_autoencoder):.4f}")
print(f"Recall: {recall_score(y, y_pred_autoencoder):.4f}")
print(f"F1 Score: {f1_score(y, y_pred_autoencoder):.4f}")
print("\nClassification Report for Autoencoder:\n", classification_report(y, y_pred_autoencoder))

# **Visualize Comparison (optional)**
plt.figure(figsize=(12, 6))

# Plot Random Forest
plt.subplot(1, 3, 1)
plt.bar(np.arange(len(y)), y_pred_rf, label='Random Forest Prediction', alpha=0.7)
plt.title("Random Forest Predictions")
plt.xlabel("Data Points")
plt.ylabel("Prediction (0: Normal, 1: Anomaly)")

# Plot Isolation Forest
plt.subplot(1, 3, 2)
plt.bar(np.arange(len(y)), y_pred_iforest, label='Isolation Forest Prediction', alpha=0.7)
plt.title("Isolation Forest Predictions")
plt.xlabel("Data Points")
plt.ylabel("Prediction (0: Normal, 1: Anomaly)")

# Plot Autoencoder
plt.subplot(1, 3, 3)
plt.bar(np.arange(len(y)), y_pred_autoencoder, label='Autoencoder Prediction', alpha=0.7)
plt.title("Autoencoder Predictions")
plt.xlabel("Data Points")
plt.ylabel("Prediction (0: Normal, 1: Anomaly)")

plt.tight_layout()
plt.show()
```

---

### **Explanation of the Notebook**:

1. **Random Forest for Voltage Sag Detection**:

   * We train a **Random Forest** classifier to detect **voltage sag** (e.g., `low_voltage` condition) and evaluate it using performance metrics like **accuracy**, **precision**, **recall**, and **F1 score**.

2. **Isolation Forest for Fault Detection**:

   * **Isolation Forest** is used for **fault detection**, identifying anomalies in the grid data. We evaluate the model using **classification metrics**.

3. **Autoencoder for Fault Detection**:

   * **Autoencoders** are used to detect **faults** based on **reconstruction error**. The model is trained to reconstruct the input data, and large reconstruction errors indicate anomalies.

4. **Evaluation**:

   * The performance of each model (Random Forest, Isolation Forest, Autoencoder) is evaluated using **accuracy**, **precision**, **recall**, and **F1 score**.

5. **Visualization**:

   * The predictions from **Random Forest**, **Isolation Forest**, and **Autoencoders** are visualized for comparison.

---

### **Next Steps**:

1. **Run the notebook** to evaluate the performance of the **Random Forest**, **Isolation Forest**, and **Autoencoders** models.
2. **Analyze the evaluation results** and see which model performs best for **voltage sag detection** and **fault detection**.
3. **Compare the models** based on the **performance metrics** and choose the best one for deployment.

Let me know if you'd like help with any part of this notebook, or if you'd like to continue running it for model evaluation!
