# **Anomaly Detection Modeling** #

In this step, we apply multiple unsupervised machine learning models to identify abnormal energy consumption patterns without requiring labeled data. Isolation Forest, Local Outlier Factor, and Robust Covariance models are used and combined through an ensemble approach to improve detection reliability. This step forms the core intelligence of the system by automatically flagging unusual energy behavior.

### **Import Libraries** ###

In [1]:
import pandas as pd
import numpy as np

from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.covariance import EllipticEnvelope

import joblib


### **Load Feature-Engineered Data** ###

In [2]:
df = pd.read_csv("../results/feature_engineered_data.csv")
df['timestamp'] = pd.to_datetime(df['timestamp'])

df.head()


Unnamed: 0,timestamp,electricity,hour,day_of_week,month,is_weekend,electricity_rolling_mean,electricity_rolling_std,electricity_deviation,electricity_lag1,electricity_lag24
0,2016-01-01 09:00:00,0.0,9,4,1,0,0.0,0.0,0.0,0.0,0.0
1,2016-01-01 09:00:00,0.0,9,4,1,0,0.0,0.0,0.0,0.0,0.0
2,2016-01-01 09:00:00,0.0,9,4,1,0,0.0,0.0,0.0,0.0,0.0
3,2016-01-01 09:00:00,0.0,9,4,1,0,0.0,0.0,0.0,0.0,0.0
4,2016-01-01 09:00:00,0.0,9,4,1,0,0.0,0.0,0.0,0.0,0.0


### **Select Features for Modeling** ###

In [3]:
feature_cols = [
    'electricity',
    'electricity_rolling_mean',
    'electricity_rolling_std',
    'electricity_deviation',
    'electricity_lag1',
    'electricity_lag24',
    'hour',
    'day_of_week',
    'is_weekend'
]

X = df[feature_cols]


### **Model 1: Isolation Forest** ###

In [4]:
iso_forest = IsolationForest(
    n_estimators=200,
    contamination=0.05,
    random_state=42
)

iso_pred = iso_forest.fit_predict(X)

df['iso_anomaly'] = (iso_pred == -1).astype(int)


### **Model 2: Local Outlier Factor (LOF)** ###

In [5]:
lof = LocalOutlierFactor(
    n_neighbors=20,
    contamination=0.05
)

lof_pred = lof.fit_predict(X)

df['lof_anomaly'] = (lof_pred == -1).astype(int)




### **Model 3: Robust Covariance** ###

In [6]:
robust_cov = EllipticEnvelope(
    contamination=0.05,
    random_state=42
)

rc_pred = robust_cov.fit_predict(X)

df['rc_anomaly'] = (rc_pred == -1).astype(int)




### **Ensemble Voting** ###

In [7]:
df['anomaly_votes'] = (
    df['iso_anomaly'] +
    df['lof_anomaly'] +
    df['rc_anomaly']
)

df['is_anomaly'] = (df['anomaly_votes'] >= 2).astype(int)


### **Quick Sanity Check** ###

In [8]:
df['is_anomaly'].value_counts(normalize=True) * 100


is_anomaly
0    97.766004
1     2.233996
Name: proportion, dtype: float64

### **Save Trained Models** ###

In [9]:
joblib.dump(iso_forest, "../models/isolation_forest.pkl")
joblib.dump(lof, "../models/lof_model.pkl")
joblib.dump(robust_cov, "../models/robust_covariance.pkl")


['../models/robust_covariance.pkl']

### **Save Anomaly-Labeled Data** ###

In [10]:
df.to_csv("../results/anomaly_labeled_data.csv", index=False)


## **Observations** ##

---> Multiple unsupervised anomaly detection models were trained on engineered time-series features.

---> Isolation Forest captured global anomalies, while Local Outlier Factor detected local density deviations.

---> Robust Covariance identified statistically distant observations.

---> An ensemble voting strategy was applied to improve robustness and reduce false positives.

---> Approximately a small fraction of timestamps were flagged as anomalies, aligning with expected real-world behavior.

---> The trained models were saved for reuse in deployment and real-time inference.

## **Key Findings** ##

---> Unsupervised models successfully identified rare and abnormal energy consumption patterns from normal operational behavior.

---> Each model captured different types of anomalies, improving overall coverage.

---> The ensemble voting approach reduced false positives by flagging anomalies only when multiple models agreed.

---> Detected anomalies aligned with sudden spikes, drops, and unusual deviations in consumption.

---> The anomaly detection system produced realistic and actionable results suitable for real-world deployment.