# PowerSight — Energy Consumption Analysis & Forecasting Platform

## Project Overview

This book documents the end-to-end development of **PowerSight**, an energy 
consumption analysis and forecasting platform built on real household power 
consumption data.

The project covers:
- Data ingestion, cleaning and validation
- Exploratory Data Analysis (EDA) with interactive visualizations
- Statistical hypothesis testing
- Anomaly classification (Normal / Spike / Fault / Outage)
- Energy consumption forecasting
- A Streamlit monitoring dashboard
- Full MLOps pipeline with MLflow, DVC, Docker and GitHub Actions

## Dataset

The dataset used is the **UCI Household Electric Power Consumption** dataset: 
about 2.08 million readings (2,075,259 rows) sampled once per minute over almost 
four years (December 2006 to November 2010) from a single household in Sceaux, France.

**Source:** [UCI Machine Learning Repository](https://archive.ics.uci.edu/dataset/235/individual+household+electric+power+consumption)

## Authors
- Carlos Reyes

## Data Acquisition

The dataset is downloaded programmatically from the UCI Machine Learning Repository.
Raw data is stored in `data/raw/` and never modified.

In [None]:
# Standard library
import os
import zipfile
from pathlib import Path

# Data
import pandas as pd
import numpy as np

# Utilities
import requests
import yaml
from loguru import logger

In [6]:
with open("../configs/config.yaml", "r") as f:
    config = yaml.safe_load(f)

# Define paths
raw_path = Path("../") / config["paths"]["raw_data"]
raw_dir = raw_path.parent

# Create directory if it doesn't exist
raw_dir.mkdir(parents=True, exist_ok=True)

logger.info(f"Raw data directory: {raw_dir}")
logger.info(f"Target file: {raw_path}")

2026-02-23 23:38:42.283 | INFO     | __main__:<module>:11 - Raw data directory: ../data/raw
2026-02-23 23:38:42.283 | INFO     | __main__:<module>:12 - Target file: ../data/raw/household_power_consumption.txt
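
The cell above relies on `configs/config.yaml` exposing a `paths.raw_data` entry. The config file itself is not shown in this chapter, so the following is only a sketch of the minimal structure that would satisfy that lookup. The value is inferred from the logged target path, the helper `resolve_raw_path` is a hypothetical name, and any other keys in the real file are unknown.

```python
from pathlib import Path


def resolve_raw_path(config: dict, project_root: Path = Path("../")) -> Path:
    """Build the raw-data path from a parsed config, failing clearly if the key is absent."""
    try:
        relative = config["paths"]["raw_data"]
    except KeyError as exc:
        raise KeyError("config is missing the 'paths.raw_data' entry") from exc
    return project_root / relative


# Hypothetical parsed config: what yaml.safe_load would return for
#   paths:
#     raw_data: data/raw/household_power_consumption.txt
example_config = {"paths": {"raw_data": "data/raw/household_power_consumption.txt"}}

raw_path = resolve_raw_path(example_config)
print(raw_path.as_posix())         # ../data/raw/household_power_consumption.txt
print(raw_path.parent.as_posix())  # ../data/raw
```

Wrapping the lookup in a small function gives a readable error when the key is missing, instead of a bare `KeyError: 'paths'` from deep inside a cell.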


In [8]:
# Dataset URL
URL = "https://archive.ics.uci.edu/static/public/235/individual+household+electric+power+consumption.zip"

zip_path = raw_dir / "household_power_consumption.zip"

# Download only if file doesn't already exist
if not raw_path.exists():
    logger.info("Downloading dataset...")
    response = requests.get(URL, stream=True, timeout=60)
    response.raise_for_status()  # stop on HTTP errors instead of saving an error page
    with open(zip_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    
    logger.info("Download complete. Extracting...")
    
    with zipfile.ZipFile(zip_path, "r") as z:
        z.extractall(raw_dir)
    
    zip_path.unlink()  # delete the zip file after extraction
    logger.info(f"Dataset ready at {raw_path}")

else:
    logger.info("Dataset already exists, skipping download.")

2026-02-23 23:40:13.231 | INFO     | __main__:<module>:8 - Downloading dataset...
2026-02-23 23:40:15.657 | INFO     | __main__:<module>:15 - Download complete. Extracting...
2026-02-23 23:40:15.837 | INFO     | __main__:<module>:21 - Dataset ready at ../data/raw/household_power_consumption.txt
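
Because the raw file is meant to stay untouched after download, one option is to record a checksum once and compare it on later runs. The sketch below is not part of the original pipeline; `sha256_of_file` is a hypothetical helper, and the demo hashes a throwaway file rather than the dataset. If DVC already tracks `data/raw/`, its content hashes may make this redundant.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 8192) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read until f.read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway file; for the dataset, pass raw_path instead.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo.txt"
    demo_file.write_bytes(b"abc")
    demo_digest = sha256_of_file(demo_file)

print(demo_digest == hashlib.sha256(b"abc").hexdigest())  # True
```

Reading in chunks keeps memory use flat, which matters for a file of roughly 127 MB.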


In [9]:
# File size in MB, read directly from the Path object
size = raw_path.stat().st_size / (1024 * 1024)
logger.info(f"File size: {size:.1f} MB")
logger.info(f"File path: {raw_path}")

2026-02-23 23:40:50.127 | INFO     | __main__:<module>:3 - File size: 126.8 MB
2026-02-23 23:40:50.127 | INFO     | __main__:<module>:4 - File path: ../data/raw/household_power_consumption.txt
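
Before loading the full file, it can help to peek at its layout. The UCI file is semicolon-delimited with a header row, and missing measurements are marked with `?`. The sketch below uses only the standard library; `peek_rows` is a hypothetical helper, and the sample text is an illustrative stand-in in the same layout, not output from the downloaded file.

```python
import csv
import io
from itertools import islice


def peek_rows(lines, n=3, delimiter=";"):
    """Return the header and up to n data rows from an iterable of text lines."""
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    rows = list(islice(reader, n))
    return header, rows


# Illustrative sample in the layout of the UCI file (not read from the download).
SAMPLE = io.StringIO(
    "Date;Time;Global_active_power;Global_reactive_power;Voltage;"
    "Global_intensity;Sub_metering_1;Sub_metering_2;Sub_metering_3\n"
    "16/12/2006;17:24:00;4.216;0.418;234.840;18.400;0.000;1.000;17.000\n"
    "16/12/2006;17:25:00;?;?;?;?;?;?;?\n"
)

header, rows = peek_rows(SAMPLE)
print(len(header))  # 9
print(rows[1][2])   # ?
```

With the real file, the same function could be called as `with open(raw_path, newline="") as f: header, rows = peek_rows(f)`, which reads only the first few lines rather than all 126.8 MB.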
