# Energy Anomaly & Power Theft Detection System  
### A Data Science Research Framework for Grid Irregularity Detection

---
## Project Overview

Electricity utilities in **Kenya** face significant operational and financial challenges arising from irregular energy consumption, meter tampering, and unauthorized electricity usage. In emerging markets, including Kenya, energy theft contributes to substantial revenue losses and increased strain on distribution infrastructure. Despite the availability of high-frequency smart meter data, many utilities lack structured analytical frameworks capable of distinguishing legitimate variability from suspicious consumption behavior.

This project develops an end-to-end data science framework for detecting abnormal electricity usage patterns and potential power theft using time-series consumption data, environmental context, and scheduled outage information.

To simulate a realistic operational environment, the project constructs a multi-household electricity dataset derived from high-resolution power measurements. Each simulated household exhibits natural consumption variability, while selected households are injected with synthetic theft-like behavior patterns such as sudden sustained drops and abnormal load shape distortions. This enables controlled evaluation of anomaly detection performance in the absence of real-world labeled theft data.
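
The theft-injection step described above can be sketched as follows. This is a minimal illustration, not the project's actual simulation code. It assumes a daily frame with hypothetical columns `meter_id`, `date`, `mean_power_kw` and `std_power_kw`. It models "theft" as a multiplicative drop in mean load plus a reduction in intraday volatility, starting from a chosen date. The `drop_factor` and `flatten_factor` values are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def inject_theft(daily: pd.DataFrame, meter_id: str, start: str,
                 drop_factor: float = 0.4, flatten_factor: float = 0.5) -> pd.DataFrame:
    """Return a copy of `daily` with a theft-like pattern injected into one meter.

    Hypothetical columns: meter_id, date, mean_power_kw, std_power_kw.
    """
    out = daily.copy()
    if "is_theft" not in out.columns:
        out["is_theft"] = False  # ground-truth label, kept for later evaluation
    mask = (out["meter_id"] == meter_id) & (out["date"] >= pd.Timestamp(start))
    out.loc[mask, "mean_power_kw"] *= drop_factor     # sudden, sustained drop in load
    out.loc[mask, "std_power_kw"] *= flatten_factor   # flatter, distorted load shape
    out.loc[mask, "is_theft"] = True
    return out


# Toy data: three simulated meters with natural day-to-day variability.
rng = np.random.default_rng(42)
dates = pd.date_range("2007-01-01", periods=60, freq="D")
daily = pd.DataFrame({
    "meter_id": np.repeat(["M001", "M002", "M003"], len(dates)),
    "date": np.tile(dates, 3),
    "mean_power_kw": rng.normal(1.1, 0.1, 3 * len(dates)),
    "std_power_kw": rng.normal(0.9, 0.05, 3 * len(dates)),
})
tampered = inject_theft(daily, "M002", start="2007-02-01")
```

Keeping an explicit `is_theft` label is what makes controlled evaluation possible, because detector output can later be compared against the households that are known to have been altered.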

The system integrates three major data components:

1. **Electricity Consumption Data (Core Signal)**  
   Minute-level power and voltage measurements aggregated into structured daily behavioral features.

2. **Weather Data (Contextual Adjustment Layer)**  
   Temperature, precipitation, and wind speed variables used to contextualize legitimate demand variability and reduce false anomaly signals.

3. **Scheduled Outage Information (Operational Filter)**  
   Official maintenance interruptions structured into daily indicators to prevent misclassification of planned supply interruptions as suspicious activity.

The project moves from raw time-series data to a structured anomaly detection pipeline that:

- Engineers behavioral and volatility-based consumption features  
- Adjusts for environmental and operational context  
- Applies unsupervised anomaly detection techniques  
- Assigns quantitative risk scores to each simulated household  
- Generates structured fraud investigation case exports  

The final output is not a binary classification but a ranked list of high-risk energy usage cases, supported by explainable indicators and contextual metadata. This research framework shows how utilities can use consumption analytics to move from reactive fraud investigation to proactive anomaly intelligence.

Ultimately, this work presents a scalable blueprint for data-driven energy irregularity detection, combining time-series modeling, contextual feature engineering, and anomaly scoring to support operational grid resilience.

## Business Problem

Electricity utilities operate in environments where revenue protection and grid stability are critical to long-term sustainability. One of the major operational challenges faced by utilities, particularly in emerging economies, is the presence of non-technical losses arising from electricity theft, meter tampering, illegal connections, and irregular consumption behavior.

Energy theft reduces utility revenues, increases operational costs, and creates uneven load stress on the distribution network. In addition to financial loss, unauthorized consumption can compromise grid safety and reliability by introducing unpredictable demand patterns and infrastructure strain. Traditional fraud detection approaches rely heavily on manual inspections, customer reporting, or rule-based heuristics, which are reactive, resource-intensive, and often inefficient.

With the widespread adoption of smart meters, utilities now collect high-frequency electricity consumption data. However, the availability of large volumes of time-series data does not automatically translate into actionable fraud detection. Many utilities lack structured analytical systems capable of distinguishing legitimate consumption variability (e.g., seasonal changes, weather-driven demand shifts, or scheduled outages) from suspicious or abnormal patterns indicative of theft.

The core business problem addressed in this project is:

> How can utilities leverage electricity consumption data, environmental context, and operational schedules to proactively detect abnormal usage patterns and prioritize high-risk cases for investigation?

Specifically, the challenge involves:

- Identifying irregular consumption behavior without relying on fully labeled theft data.
- Reducing false positives caused by legitimate demand fluctuations or scheduled service interruptions.
- Transforming raw anomaly scores into structured, explainable investigation cases.
- Creating a scalable, data-driven framework that supports operational fraud detection teams.

This project addresses the gap by developing an integrated anomaly detection system that combines consumption behavior analysis, contextual environmental features, and operational outage filters to produce ranked theft-risk cases suitable for further investigation.

## Project Objectives

The primary objective of this project is to design and evaluate a data-driven anomaly detection framework capable of identifying abnormal electricity consumption patterns and potential power theft behavior using time-series smart meter data.

To achieve this, the project pursues the following specific objectives:

### 1 Data Preparation & Simulation
- Construct a realistic multi-household electricity consumption dataset derived from high-frequency smart meter readings.
- Introduce controlled variability across simulated households to replicate real-world consumption diversity.
- Inject synthetic theft-like behavioral patterns into selected households to enable controlled anomaly evaluation.

### 2 Feature Engineering
- Aggregate minute-level consumption data into structured daily behavioral features.
- Engineer statistical indicators such as mean load, volatility, load factor, and consumption ratios.
- Incorporate contextual environmental features (temperature, rainfall, wind speed) to adjust for legitimate demand variability.
- Integrate scheduled outage information to prevent misclassification of planned interruptions.
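
A minimal sketch of the daily aggregation and the statistical indicators listed above follows. It assumes minute-level readings in a pandas DataFrame with hypothetical columns `meter_id`, `timestamp`, `power_kw` and `voltage`. Load factor is taken here as mean daily power divided by peak daily power. The consumption ratio compares each day with the meter's trailing 7-day mean. Both definitions and the window length are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def daily_features(minute: pd.DataFrame) -> pd.DataFrame:
    """Aggregate minute-level readings into daily behavioural features per meter.

    Hypothetical input columns: meter_id, timestamp, power_kw, voltage.
    """
    df = minute.assign(date=minute["timestamp"].dt.floor("D"))
    daily = (df.groupby(["meter_id", "date"])
               .agg(mean_power_kw=("power_kw", "mean"),
                    std_power_kw=("power_kw", "std"),     # intraday volatility
                    max_power_kw=("power_kw", "max"),
                    std_voltage=("voltage", "std"))
               .reset_index())
    # Load factor: average demand relative to peak demand (1.0 = perfectly flat day).
    daily["load_factor"] = daily["mean_power_kw"] / daily["max_power_kw"]
    # Consumption ratio: today's mean load vs. the meter's trailing 7-day mean,
    # excluding today; values well below 1 can indicate a sudden drop.
    baseline = (daily.groupby("meter_id")["mean_power_kw"]
                     .transform(lambda s: s.shift(1).rolling(7, min_periods=3).mean()))
    daily["consumption_ratio"] = daily["mean_power_kw"] / baseline
    return daily


# Toy usage: two meters, ten days of synthetic minute-level data.
rng = np.random.default_rng(0)
ts = pd.date_range("2007-01-01", periods=10 * 1440, freq="min")
minute = pd.concat([
    pd.DataFrame({"meter_id": m, "timestamp": ts,
                  "power_kw": rng.gamma(2.0, 0.5, len(ts)),
                  "voltage": rng.normal(240, 2, len(ts))})
    for m in ["M001", "M002"]
], ignore_index=True)
feats = daily_features(minute)
```

The first three days of each meter have no consumption ratio, since the trailing baseline requires at least three prior days in this sketch.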

### 3 Anomaly Detection Modeling
- Develop and implement unsupervised anomaly detection techniques (e.g., Isolation Forest, statistical thresholding).
- Generate quantitative anomaly scores for each household.
- Establish risk thresholds to categorize households into low, medium, and high-risk levels.
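
The Isolation Forest step and the low/medium/high tiering could look roughly like this, using scikit-learn's `IsolationForest`. The sketch assumes one row per meter-day with hypothetical feature columns. Two choices here are illustrative rather than the project's settings: aggregating daily scores into a household score by their mean, and using 50th/90th percentile tier cut-offs.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical daily feature columns; substitute the project's engineered features.
FEATURES = ["mean_power_kw", "std_power_kw", "load_factor"]


def score_households(daily: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Score meter-days with Isolation Forest, then roll up to household risk tiers."""
    X = daily[FEATURES].to_numpy()
    model = IsolationForest(n_estimators=200, random_state=seed).fit(X)
    # score_samples is higher for normal points; negate so higher = more anomalous.
    scored = daily.assign(anomaly_score=-model.score_samples(X))
    per_meter = (scored.groupby("meter_id")["anomaly_score"].mean()
                       .rename("risk_score").reset_index())
    # Illustrative tiers by percentile rank: <=50% low, <=90% medium, top ~10% high.
    per_meter["risk_level"] = pd.cut(per_meter["risk_score"].rank(pct=True),
                                     bins=[0, 0.5, 0.9, 1.0],
                                     labels=["low", "medium", "high"])
    return per_meter.sort_values("risk_score", ascending=False).reset_index(drop=True)


# Toy usage: 29 normal meters and one meter with a low, flattened load.
rng = np.random.default_rng(1)
rows = []
for i in range(30):
    theft = i == 0
    rows.append(pd.DataFrame({
        "meter_id": "M_THEFT" if theft else f"M{i:03d}",
        "mean_power_kw": rng.normal(0.3 if theft else 1.0, 0.05, 30),
        "std_power_kw": rng.normal(0.2 if theft else 0.8, 0.05, 30),
        "load_factor": rng.normal(0.6 if theft else 0.3, 0.02, 30),
    }))
ranked = score_households(pd.concat(rows, ignore_index=True))
```

`score_samples` does not depend on the `contamination` parameter, so the tiers here come from percentile ranks rather than from the model's built-in binary `predict` threshold.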

### 4 Evaluation & Validation
- Assess the model’s ability to distinguish normal consumption variability from injected theft patterns.
- Analyze false positives arising from environmental or operational factors.
- Evaluate robustness of anomaly detection across households with diverse load behaviors.
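
Because theft patterns are injected synthetically, the injected meters can serve as ground truth. One simple way to assess a ranking under that assumption is precision and recall among the top k households, which mirrors a fixed inspection budget. The function name and the toy meter IDs below are hypothetical.

```python
def precision_recall_at_k(ranked_ids, injected_ids, k):
    """Precision and recall of the top-k ranked meters against injected theft meters."""
    top_k = list(ranked_ids)[:k]
    hits = sum(1 for meter in top_k if meter in injected_ids)
    precision = hits / k
    recall = hits / len(injected_ids) if injected_ids else 0.0
    return precision, recall


ranked = ["M007", "M002", "M011", "M004", "M009"]  # highest risk first (toy values)
injected = {"M002", "M004", "M015"}                # meters that received synthetic theft
for k in (3, 5):
    p, r = precision_recall_at_k(ranked, injected, k)
    print(f"k={k}: precision={p:.2f}, recall={r:.2f}")
```

For these toy values the loop prints `k=3: precision=0.33, recall=0.33` and `k=5: precision=0.40, recall=0.67`. Widening k trades precision for recall, which is the same trade-off an investigation team faces when choosing how many cases to visit.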

### 5 Structured Case Generation
- Develop a structured case-export mechanism that ranks high-risk households based on anomaly severity.
- Produce explainable risk indicators to support investigation and auditing.
- Demonstrate how anomaly scores can translate into actionable fraud detection insights.
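
A possible shape for the case export is sketched below. It assumes a meter-day feature table and a per-meter score table with hypothetical columns `meter_id`, `risk_score` and `risk_level`. As a simple explainability proxy, each case reports the feature whose mean z-score (relative to all meter-days) is furthest from zero. This indicator and the `CASE-0001` ID format are assumptions, not the project's defined outputs.

```python
import pandas as pd


def build_cases(daily: pd.DataFrame, scores: pd.DataFrame,
                features: list[str], top_n: int = 10) -> pd.DataFrame:
    """Rank meters by risk and attach a simple explanatory indicator per case."""
    # z-score every meter-day against the whole population, feature by feature.
    z = (daily[features] - daily[features].mean()) / daily[features].std(ddof=0)
    meter_z = z.groupby(daily["meter_id"]).mean()
    indicators = pd.DataFrame({
        "top_indicator": meter_z.abs().idxmax(axis=1),        # most unusual feature
        "indicator_abs_z": meter_z.abs().max(axis=1).round(2),
    })
    cases = (scores.join(indicators, on="meter_id")
                   .sort_values("risk_score", ascending=False)
                   .head(top_n)
                   .reset_index(drop=True))
    cases.insert(0, "case_id", [f"CASE-{i + 1:04d}" for i in range(len(cases))])
    return cases


# Toy usage: meter B has a much lower load than its peers.
daily = pd.DataFrame({
    "meter_id": ["A", "A", "B", "B", "C", "C"],
    "mean_power_kw": [1.0, 1.1, 0.2, 0.25, 1.05, 0.95],
    "load_factor": [0.30, 0.32, 0.31, 0.29, 0.30, 0.31],
})
scores = pd.DataFrame({"meter_id": ["A", "B", "C"],
                       "risk_score": [0.40, 0.71, 0.42],
                       "risk_level": ["low", "high", "medium"]})
cases = build_cases(daily, scores, ["mean_power_kw", "load_factor"], top_n=2)
```

The resulting frame can be written out for investigators with `cases.to_csv(path, index=False)`.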

---

Through these objectives, the project aims to demonstrate how time-series analytics and anomaly detection techniques can be applied to electricity consumption data to support proactive energy irregularity detection in a scalable and research-oriented framework.

## Role of Each Dataset in the Energy Anomaly Detection Framework

This project integrates multiple datasets to construct a context-aware energy anomaly detection system. Each dataset plays a distinct and complementary role in the analytical pipeline.

---

### 1 Multi-Household Daily Power Dataset  
**(power_multi_household_daily.csv)**  

This dataset represents the core behavioral signal of the system. It contains aggregated daily electricity consumption and voltage characteristics for each simulated household (meter).

Key contributions of this dataset:

- Captures daily energy usage patterns per household.
- Provides volatility indicators (standard deviation of power and voltage).
- Enables detection of abnormal consumption behavior (e.g., sudden sustained drops, flattened load profiles).
- Forms the primary feature space used for anomaly detection modeling.

This dataset is the foundation of the anomaly detection framework, as it contains the time-series behavioral dynamics from which irregular patterns can be identified.

---

### 2 Weather Dataset  
**(nairobi_weather_2007_2008.csv)**  

This dataset provides environmental context variables including:

- Maximum and minimum temperature  
- Precipitation (rainfall)  
- Wind speed  

Weather conditions directly influence electricity demand patterns and grid stability. For example:

- Lower temperatures may reduce cooling-related consumption, while cold spells may raise heating-related demand.
- Heavy rainfall and wind may affect supply stability.
- Seasonal variations may alter usage intensity.

Incorporating weather variables helps:

- Reduce false positives in anomaly detection.
- Distinguish legitimate demand shifts from suspicious irregular behavior.
- Improve model robustness by adjusting for environmental variability.

Thus, the weather dataset acts as a contextual adjustment layer.
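
One common way to implement such a contextual adjustment layer is to regress a meter's daily consumption on the weather variables and analyse the residual. This is shown as a sketch rather than the project's method. Demand that weather explains is removed, and large residuals remain as anomaly candidates. The names in `WEATHER_COLS` are hypothetical, and ordinary least squares is an assumed model choice.

```python
import numpy as np
import pandas as pd

WEATHER_COLS = ["tmax_c", "tmin_c", "precip_mm", "wind_speed"]  # hypothetical names


def weather_adjust(meter_days: pd.DataFrame, target: str = "mean_power_kw") -> pd.DataFrame:
    """Fit target ~ weather by OLS for ONE meter and return weather-adjusted residuals."""
    X = np.column_stack([np.ones(len(meter_days)),                 # intercept
                         meter_days[WEATHER_COLS].to_numpy(dtype=float)])
    y = meter_days[target].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    expected = X @ coef                  # demand explained by weather alone
    resid = y - expected                 # what weather does not explain
    return meter_days.assign(
        expected_kw=expected,
        weather_residual=resid,
        weather_residual_z=(resid - resid.mean()) / resid.std(),
    )


# Toy usage: weather-driven demand plus one unexplained drop on day 60.
rng = np.random.default_rng(7)
n = 100
weather = pd.DataFrame({
    "tmax_c": rng.normal(25, 3, n),
    "tmin_c": rng.normal(13, 2, n),
    "precip_mm": rng.exponential(2.0, n),
    "wind_speed": rng.normal(12, 3, n),
})
load = 0.5 + 0.03 * weather["tmax_c"] - 0.02 * weather["precip_mm"] + rng.normal(0, 0.05, n)
load.iloc[60] -= 1.0
adjusted = weather_adjust(weather.assign(mean_power_kw=load))
```

Feeding `weather_residual_z` (rather than raw consumption) into the detector is one way to keep weather-driven swings from being scored as suspicious.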

---

### 3 Scheduled Outage Dataset  
**(kplc_daily_schedule.csv)**  

This dataset contains structured records of planned power interruptions, including:

- Whether an outage was scheduled on a given date.
- Number of planned interruptions.
- Total scheduled outage duration.

This dataset serves as an operational filter within the anomaly detection pipeline. It prevents the system from misclassifying planned service interruptions as suspicious consumption behavior.

For example:
- A household exhibiting near-zero usage during a scheduled outage is not flagged as anomalous.
- Consumption drops aligned with official maintenance events are treated as legitimate.

This dataset adds operational realism and reduces the risk of misreading planned supply interruptions as suspicious behavior.
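
The outage rule described above can be expressed as a simple mask. This sketch assumes the detector's output and the daily outage indicator have already been joined per meter-day. Column names are hypothetical, and the sketch assumes the schedule applies to every simulated household, with no mapping of outage areas to meters.

```python
import pandas as pd


def suppress_planned_outages(frame: pd.DataFrame) -> pd.DataFrame:
    """Keep anomaly flags only on days without a scheduled outage.

    Hypothetical columns: is_anomaly (bool, from the detector) and
    outage_scheduled (0/1, from the daily outage schedule; missing = no outage).
    """
    planned = frame["outage_scheduled"].fillna(0).astype(bool)
    return frame.assign(
        explained_by_outage=frame["is_anomaly"] & planned,  # legitimate drop
        flag_for_review=frame["is_anomaly"] & ~planned,     # still suspicious
    )


# Toy usage: one meter over four days, with a planned outage on days 2 and 3.
frame = pd.DataFrame({
    "meter_id": ["M001"] * 4,
    "date": pd.date_range("2007-03-01", periods=4, freq="D"),
    "is_anomaly": [True, True, False, True],
    "outage_scheduled": [0, 1, 1, None],
})
result = suppress_planned_outages(frame)
```

Blanket suppression is deliberately conservative: a genuine tamper that coincides with a planned outage is hidden only for that day, so a sustained drop would still surface on later days without outages.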

---

## Integrated System Perspective

Together, these datasets enable a layered detection architecture:

- **Power Data → Behavioral Signal**
- **Weather Data → Environmental Context**
- **Outage Data → Operational Adjustment**

The integration of these layers allows the anomaly detection model to identify genuine irregular consumption behavior while minimizing false alerts caused by external or scheduled factors.

This multi-source integration strengthens the validity, robustness, and real-world applicability of the proposed energy anomaly detection system.
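
At the data level, the three layers can be combined with date-keyed left joins so that every meter-day carries its weather and outage context. This sketch assumes each file has a `date` column and uses hypothetical column names. The `validate="many_to_one"` argument makes pandas raise an error if the weather or outage table contains duplicate dates.

```python
import pandas as pd


def build_model_frame(power: pd.DataFrame, weather: pd.DataFrame,
                      outages: pd.DataFrame) -> pd.DataFrame:
    """Join meter-day power features with daily weather and outage indicators on `date`."""
    frame = (power
             .merge(weather, on="date", how="left", validate="many_to_one")
             .merge(outages, on="date", how="left", validate="many_to_one"))
    outage_cols = [c for c in outages.columns if c != "date"]
    # Dates absent from the schedule are treated as "no planned outage".
    frame[outage_cols] = frame[outage_cols].fillna(0)
    return frame


# With the real files this might be, assuming a "date" column in each:
#   power = pd.read_csv("power_multi_household_daily.csv", parse_dates=["date"])
# Toy usage instead:
days = pd.date_range("2007-03-01", periods=3, freq="D")
power = pd.DataFrame({"meter_id": ["M001"] * 3 + ["M002"] * 3,
                      "date": list(days) * 2,
                      "mean_power_kw": [1.0, 1.2, 0.9, 0.8, 0.85, 0.1]})
weather = pd.DataFrame({"date": days, "tmax_c": [24.0, 26.5, 23.1],
                        "precip_mm": [0.0, 4.2, 0.0]})
outages = pd.DataFrame({"date": [days[2]], "outage_scheduled": [1], "outage_hours": [6.0]})
frame = build_model_frame(power, weather, outages)
```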