# Household Power Consumption Analysis

Dataset: [Individual Household Electric Power Consumption](https://archive.ics.uci.edu/dataset/235/individual+household+electric+power+consumption)

Columns: Date, Time, Global_active_power, Global_reactive_power, Voltage, Global_intensity, Sub_metering_1, Sub_metering_2, Sub_metering_3

This notebook covers:
- EDA: Time-series trends, missing data, patterns
- Supervised Learning: Time-series forecasting
- Unsupervised Learning: Anomaly detection and clustering
- Rule-Based AI: Usage categorization

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest
from statsmodels.tsa.arima.model import ARIMA
import warnings
warnings.filterwarnings('ignore')

# Set style
plt.style.use('seaborn-v0_8')
sns.set_palette('husl')

## 1. Data Loading and Preprocessing

In [1]:
# Load the dataset
df = pd.read_csv('household_power_consumption.txt', sep=';', low_memory=False)

# Display basic info
print("Dataset shape:", df.shape)
print("\nColumns:", df.columns.tolist())
print("\nData types:")
print(df.dtypes)
print("\nFirst 5 rows:")
df.head()

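The next cell converts the measurement columns with `pd.to_numeric(..., errors='coerce')` because the raw file marks missing readings with `?`. An alternative is to handle this at load time: pandas `read_csv` accepts `na_values`, which maps `?` to NaN during parsing, so the measurement columns come back as floats directly. The sketch below assumes the same `;`-separated layout. The two-row `raw` sample is made up for illustration; with the real file you would pass the path as in the cell above.

```python
import io

import pandas as pd

# Hypothetical two-row sample in the raw file's layout: ';'-separated,
# with '?' marking missing readings.
raw = io.StringIO(
    "Date;Time;Global_active_power;Global_reactive_power;Voltage;"
    "Global_intensity;Sub_metering_1;Sub_metering_2;Sub_metering_3\n"
    "16/12/2006;17:24:00;4.216;0.418;234.840;18.400;0.000;1.000;17.000\n"
    "21/12/2006;11:23:00;?;?;?;?;?;?;?\n"
)

# na_values='?' turns every '?' into NaN while parsing, so the measurement
# columns are float64 instead of object and need no later conversion.
sample = pd.read_csv(raw, sep=';', na_values='?')

print(sample.dtypes)
print(sample.isna().sum())
```

With this option the `to_numeric` loop becomes a no-op safety check, and `isna()` reports the missing readings immediately after loading.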

In [None]:
# Convert date and time to datetime
df['DateTime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'], format='%d/%m/%Y %H:%M:%S')

# Set as index
df.set_index('DateTime', inplace=True)

# Convert numeric columns (they load as object dtype because missing readings are encoded as '?')
numeric_cols = ['Global_active_power', 'Global_reactive_power', 'Voltage', 
                'Global_intensity', 'Sub_metering_1', 'Sub_metering_2', 'Sub_metering_3']

for col in numeric_cols:
    df[col] = pd.to_numeric(df[col], errors='coerce')

# Check for missing values
print("Missing values per column:")
print(df.isnull().sum())

# Drop rows with missing readings for simplicity (interpolating them is an alternative)
df.dropna(inplace=True)

print("\nAfter cleaning:")
print("Dataset shape:", df.shape)
print("Date range:", df.index.min(), "to", df.index.max())
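Dropping rows is simple, but it leaves holes in the time index. The UCI description gives a one-minute sampling rate, and forecasting models such as ARIMA generally assume a regular time step. The sketch below uses a small hypothetical series in place of `df`. It shows how to detect the gaps that `dropna` leaves and how `interpolate(method='time')` fills NaNs instead, keeping the one-minute grid intact.

```python
import numpy as np
import pandas as pd

# Hypothetical minute-level readings with two consecutive missing values,
# mimicking the NaNs produced by pd.to_numeric(..., errors='coerce').
idx = pd.date_range('2006-12-16 17:24', periods=6, freq='min')
power = pd.Series([4.0, 5.0, np.nan, np.nan, 8.0, 9.0],
                  index=idx, name='Global_active_power')

# Option 1: drop the rows (as the cell above does). The index then skips
# minutes; any step longer than one minute marks a gap.
dropped = power.dropna()
steps = dropped.index.to_series().diff().dropna()
gaps = steps[steps > pd.Timedelta(minutes=1)]
print(f"{len(gaps)} gap(s) after dropna; largest step: {steps.max()}")

# Option 2: impute by linear interpolation weighted by the timestamps,
# which keeps every minute in the index.
imputed = power.interpolate(method='time')
print(imputed)
```

Linear interpolation across a long outage would invent hours of readings. Passing `limit=` to `interpolate` caps how many consecutive NaNs are filled, so long outages can still be dropped or handled separately.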