# Fault Detection

In this notebook, we implement anomaly detection using Isolation Forest and Autoencoders.

Steps:
1. Implement Isolation Forest for detecting faults.
2. Train Autoencoders for anomaly detection.
3. Visualize anomalies in voltage and power consumption data.

Yes, let's move on to the **fourth notebook**: **Fault Detection**.

In this notebook, we'll focus on **anomaly detection** using two techniques:

1. **Isolation Forest** (for outlier detection).
2. **Autoencoders** (for anomaly detection based on reconstruction error).

We will:

1. **Detect faults** in the dataset (e.g., voltage spikes, sudden power surges).
2. **Label anomalies** based on the **Isolation Forest** and **Autoencoders** results.
3. **Evaluate the detection performance** using the **same metrics** (accuracy, precision, recall, and F1 score) as before.

---

### **Fourth Notebook: `04_fault_detection.ipynb`**

#### **Markdown Cell**:

```markdown
# Fault Detection

In this notebook, we implement anomaly detection using **Isolation Forest** and **Autoencoders**.

Steps:
1. Use **Isolation Forest** to detect faults in power distribution.
2. Train **Autoencoders** for anomaly detection.
3. Label anomalies (potential faults) based on the reconstruction error.
4. Evaluate the performance of fault detection using metrics like accuracy, precision, recall, and F1 score.
```

#### **Code Cell**:

```python
# Import necessary libraries for anomaly detection
from sklearn.ensemble import IsolationForest
from keras.models import Model
from keras.layers import Input, Dense
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the preprocessed dataset; parse the Timestamp index as datetimes so time-axis plots are ordered correctly
df = pd.read_csv('preprocessed_power_consumption.csv', index_col='Timestamp', parse_dates=True)

# Select relevant features for anomaly detection
features = ['Voltage', 'Global_active_power', 'Global_intensity']
X = df[features]

# **1. Isolation Forest for Fault Detection**
# Initialize Isolation Forest model
model_iforest = IsolationForest(n_estimators=200, contamination=0.05, max_samples=0.9, random_state=42)
model_iforest.fit(X)  # Train the model on the selected features

# Predict anomalies using Isolation Forest
df['anomaly_iforest'] = model_iforest.predict(X)  # -1 for anomaly, 1 for normal
df['anomaly_iforest'] = df['anomaly_iforest'].map({1: 'Normal', -1: 'Anomaly'})  # Map the predictions

# **2. Autoencoders for Anomaly Detection**
# Normalize the data before passing to Autoencoder
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Define Autoencoder model structure
input_layer = Input(shape=(X_scaled.shape[1],))
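# Note: with only 3 input features, a 128-unit layer is wider than the input and does not force compression;
# a layer narrower than the input would act as a true bottleneck.
encoded = Dense(128, activation='relu')(input_layer)  # Encoder layer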
decoded = Dense(X_scaled.shape[1], activation='sigmoid')(encoded)  # Decoder layer (reconstruction)

autoencoder = Model(input_layer, decoded)
autoencoder.compile(optimizer='adam', loss='mean_squared_error')

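# Train Autoencoder model
# Hold out the last 10% of rows for validation; validating on the training data itself
# would only repeat the training loss.
autoencoder.fit(X_scaled, X_scaled, epochs=50, batch_size=256, shuffle=True, validation_split=0.1)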

# **Anomaly detection with Autoencoders**
reconstructed = autoencoder.predict(X_scaled)  # Reconstructed data from Autoencoder
reconstruction_error = np.mean(np.abs(reconstructed - X_scaled), axis=1)  # Reconstruction error

# Set threshold for anomaly detection (top 5% reconstruction errors)
threshold = np.percentile(reconstruction_error, 95)
df['anomaly_autoencoder'] = (reconstruction_error > threshold).astype(int)  # 1 for anomaly, 0 for normal

# **3. Evaluate the Performance**
# Assuming we have labels or a threshold for detecting faults, we can evaluate performance.
# For demonstration, assume anomalies detected by both methods are potential faults.

# Displaying first few rows with anomalies labeled
print("\nFirst few rows with anomaly labels:")
print(df[['Voltage', 'Global_active_power', 'anomaly_iforest', 'anomaly_autoencoder']].head(10))

# **4. Evaluate Model Performance using Precision, Recall, F1 Score**
# No ground-truth fault labels are available, so the Isolation Forest output is used as the reference.
# The scores therefore measure agreement between the two detectors, not true fault-detection performance.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report

y_true = (df['anomaly_iforest'] == 'Anomaly').astype(int)  # Reference labels (Isolation Forest output, not verified faults)
y_pred_iforest = (df['anomaly_iforest'] == 'Anomaly').astype(int)  # Same as the reference, so its scores are 1.0 by construction
y_pred_autoencoder = df['anomaly_autoencoder']  # Predicted anomalies by Autoencoder

# Evaluation for Isolation Forest (sanity check only)
print("\nIsolation Forest Model Evaluation:")
print(f"Accuracy: {accuracy_score(y_true, y_pred_iforest):.4f}")
print(f"Precision: {precision_score(y_true, y_pred_iforest):.4f}")
print(f"Recall: {recall_score(y_true, y_pred_iforest):.4f}")
print(f"F1 Score: {f1_score(y_true, y_pred_iforest):.4f}")
print("\nClassification Report for Isolation Forest:\n", classification_report(y_true, y_pred_iforest))

# Evaluation for Autoencoder (agreement with the Isolation Forest reference)
print("\nAutoencoder Model Evaluation:")
print(f"Accuracy: {accuracy_score(y_true, y_pred_autoencoder):.4f}")
print(f"Precision: {precision_score(y_true, y_pred_autoencoder):.4f}")
print(f"Recall: {recall_score(y_true, y_pred_autoencoder):.4f}")
print(f"F1 Score: {f1_score(y_true, y_pred_autoencoder):.4f}")
print("\nClassification Report for Autoencoder:\n", classification_report(y_true, y_pred_autoencoder))

# **Visualize Anomalies**
# Plot normal and anomalous points separately so each class gets its own legend entry
fig, axes = plt.subplots(2, 1, figsize=(12, 6))
detectors = [
    ('Isolation Forest', df['anomaly_iforest'] == 'Anomaly'),
    ('Autoencoder', df['anomaly_autoencoder'] == 1),
]
for ax, (name, is_anomaly) in zip(axes, detectors):
    normal_rows = df[~is_anomaly]
    anomaly_rows = df[is_anomaly]
    ax.scatter(normal_rows.index, normal_rows['Voltage'], c='blue', s=5, label='Normal')
    ax.scatter(anomaly_rows.index, anomaly_rows['Voltage'], c='red', s=5, label='Anomaly')
    ax.set_title(f'Anomalies in Voltage ({name})')
    ax.set_xlabel('Time')
    ax.set_ylabel('Voltage (V)')
    ax.legend()

plt.tight_layout()
plt.show()
```
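
The percentile rule used for the autoencoder threshold can be tried on its own, without training a network. The sketch below is only an illustration: the `errors` array is synthetic (drawn from an exponential distribution as a stand-in for per-row reconstruction errors), not output from the model above. It shows that a 95th-percentile cutoff flags about 5% of rows by design, whether or not the data contains real faults.

```python
import numpy as np

# Synthetic stand-in for per-row reconstruction errors (assumption, not real model output)
rng = np.random.default_rng(0)
errors = rng.exponential(scale=0.02, size=1000)

# Flag every row whose error exceeds the 95th percentile
threshold = np.percentile(errors, 95)
flags = (errors > threshold).astype(int)  # 1 for anomaly, 0 for normal

print(f"threshold={threshold:.4f}, flagged={flags.sum()} of {flags.size}")
```

Because the cutoff is relative, the number of flagged rows depends on the dataset size rather than on how many faults occurred. A fixed threshold chosen from known-normal data would be one alternative if such data is available.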

---

### **Explanation of the Script**:

1. **Isolation Forest**:

   * We use **Isolation Forest** to detect anomalies based on three **features**: **Voltage**, **Global\_active\_power**, and **Global\_intensity**. The model is fit with `contamination=0.05`, so roughly 5% of the rows are flagged.
   * Each row is labeled either **'Anomaly'** or **'Normal'**.

2. **Autoencoders**:

   * We use **Autoencoders** to detect anomalies by training the model on the scaled dataset and then comparing the **reconstruction error** to identify anomalies.
   * The **top 5%** of reconstruction errors are classified as anomalies.

3. **Model Evaluation**:

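   * We compute **accuracy**, **precision**, **recall**, and **F1 score** for both the **Isolation Forest** and **Autoencoder** models.
   * Because no ground-truth fault labels are available, the **Isolation Forest** labels serve as the reference. The Isolation Forest scores are therefore perfect by construction, and the Autoencoder scores measure how closely it agrees with the Isolation Forest, not how well it finds real **faults**.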

4. **Visualization**:

   * The anomalies detected by **Isolation Forest** and **Autoencoder** are visualized on the **Voltage** time series. Red points represent anomalies, while blue represents normal data.
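
To make the Isolation Forest step more concrete, here is a minimal sketch on synthetic data. The arrays and the name `X_demo` are made up for illustration and are not the power consumption dataset. It shows how `contamination` controls the fraction of rows flagged, and that `decision_function` returns a continuous score whose sign matches the `predict` labels.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic data: 950 "normal" points plus 50 shifted points standing in for faults
normal_points = rng.normal(loc=0.0, scale=1.0, size=(950, 3))
shifted_points = rng.normal(loc=6.0, scale=1.0, size=(50, 3))
X_demo = np.vstack([normal_points, shifted_points])

forest = IsolationForest(n_estimators=100, contamination=0.05, random_state=42)
forest.fit(X_demo)

labels = forest.predict(X_demo)            # -1 for anomaly, 1 for normal
scores = forest.decision_function(X_demo)  # negative scores correspond to anomalies

flagged_fraction = np.mean(labels == -1)
print(f"Flagged fraction: {flagged_fraction:.3f}")
```

Keeping the continuous scores makes it possible to rank rows by how anomalous they look or to try a different cutoff later without refitting the model.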

---

### **Next Steps**:

1. **Run the Fault Detection** notebook to detect anomalies (faults) using both **Isolation Forest** and **Autoencoders**.
2. **Evaluate** the performance using metrics like **accuracy**, **precision**, **recall**, and **F1 score**.
3. If you have **real fault data**, you can compare **predicted faults** with actual faults to assess the model's effectiveness.
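
If real fault labels do become available, the comparison in step 3 could look roughly like the sketch below. The two arrays are hypothetical toy values, not results from this dataset. In practice `y_true_demo` would come from recorded fault events and `y_pred_demo` from one of the detectors.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical ground truth (1 = real fault) and detector output (1 = flagged)
y_true_demo = np.array([0, 0, 1, 1, 0, 1, 0, 0])
y_pred_demo = np.array([0, 1, 1, 0, 0, 1, 0, 0])

results = {
    "accuracy": accuracy_score(y_true_demo, y_pred_demo),
    "precision": precision_score(y_true_demo, y_pred_demo),
    "recall": recall_score(y_true_demo, y_pred_demo),
    "f1": f1_score(y_true_demo, y_pred_demo),
}
for name, value in results.items():
    print(f"{name}: {value:.4f}")
```

Since faults are usually rare, precision and recall tend to be more informative than accuracy here: a detector that never flags anything can still reach high accuracy.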

Let me know if you'd like to proceed with **running this notebook**, or if you'd like to dive into any other steps!
