##                                              EdgeFlex AI: Smart Household Energy Optimization

### Problem Statement: 

Modern households face a significant challenge in managing their energy consumption efficiently. High dependency on the grid leads to increased electricity costs and a larger carbon footprint, especially during peak demand hours. While the adoption of renewable energy sources like rooftop solar offers a path to sustainability, a critical mismatch often exists between the time of peak energy generation (middle of the day) and peak energy consumption (morning and evening). This inefficiency means valuable, clean energy is often sold back to the grid for a low price, only for the household to buy expensive grid power later.

### Importing Required Libraries

In [106]:
# Core libraries for data handling and numerical operations
import pandas as pd
import numpy as np

# Libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Overview

### Dataset Details - 1
* **Dataset Name**: **REFIT Electrical Load Measurements (Cleaned)**
* **Source**: [University of Strathclyde](https://pureportal.strath.ac.uk/files/62090184/CLEAN_REFIT_081116.7z)
* **File Format**: .csv

#### Dataset Description
The REFIT dataset contains high-frequency electrical consumption data from 20 UK households, collected over two years. For our project, we will focus on a single household. The data provides power readings (in Watts) for the entire household (aggregate) and for several individual appliances, recorded at an 8-second interval. This dataset is ideal for building high-resolution load forecasting models and analyzing appliance-level energy behavior.

#### Feature Description
* **Unix**: The UTC Unix timestamp for the measurement.
* **Aggregate**: Total power consumption of the household in Watts.
* **Appliance1...Appliance9**: Power consumption for an individual monitored appliance in Watts.
* **Issues**: A binary flag (0 or 1) indicating potential data quality issues.

### Dataset Details - 2
* **Dataset Name**: **Solar Power Generation Data**
* **Source**: [Kaggle](https://www.kaggle.com/datasets/pythonafroz/solar-powe-generation-data)
* **File Format**: .csv

#### Dataset Description
The Solar Power Generation dataset provides hourly records from a solar power plant, combining energy production metrics with corresponding weather data. It captures the key environmental factors that influence photovoltaic (PV) system output, such as solar radiation and temperature. This dataset is perfectly suited for training a model to forecast solar energy generation based on weather conditions.

#### Feature Description
* **Date-Hour(NMT)**: Timestamp of the measurement (hourly).
* **SystemProduction**: The total AC power generated by the PV system in kW (our target variable for solar forecasting).
* **Radiation**: The intensity of solar radiation.
* **AirTemperature**: The ambient air temperature in degrees Celsius.
* **RelativeAirHumidity**: The relative humidity of the air.
* **WindSpeed**: The speed of the wind.
* **Sunshine**: The duration of sunshine.
* **AirPressure**: The atmospheric pressure.

#### Loading the Dataset

In [107]:
solar_data_path = r"C:\Users\Meges\Downloads\EdgeFlexAI\Solor_power_plant\Solar Power Plant Data.csv"
energy_data_path = r"C:\Users\Meges\Downloads\EdgeFlexAI\Household_Electric_Power_Consumption\household_power_consumption.txt"

#### Explore and Understand the Data

Dataset 1 - **Household Energy Data**

In [108]:
# First 5 rows to see the data structure
print("Energy Data Head:")
df_energy.head()

Energy Data Head:


Unnamed: 0_level_0,Aggregate,Fridge,Freezer_1,Freezer_2,Washer_Dryer,Washing_Machine,Dishwasher,Computer_Site,Television_Site,Electric_Heater,Issues
timestamp,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2013-10-09 13:06:17,523,74,0,69,0,0,0,0,0,1,0
2013-10-09 13:06:31,526,75,0,69,0,0,0,0,0,1,0
2013-10-09 13:06:46,540,74,0,68,0,0,0,0,0,1,0
2013-10-09 13:07:01,532,74,0,68,0,0,0,0,0,1,0
2013-10-09 13:07:15,540,74,0,69,0,0,0,0,0,1,0


In [109]:
# Dimensions of the energy dataset
print(f"The energy dataset has {df_energy.shape[0]} rows and {df_energy.shape[1]} columns.")

The energy dataset has 6960008 rows and 11 columns.


In [110]:
# Descriptive statistics for the energy data
print("Descriptive Statistics for Energy Data:")
df_energy.describe()

Descriptive Statistics for Energy Data:


Unnamed: 0,Aggregate,Fridge,Freezer_1,Freezer_2,Washer_Dryer,Washing_Machine,Dishwasher,Computer_Site,Television_Site,Electric_Heater,Issues
count,6960008.0,6960008.0,6960008.0,6960008.0,6960008.0,6960008.0,6960008.0,6960008.0,6960008.0,6960008.0,6960008.0
mean,481.1385,17.53831,16.55047,29.00873,1.844441,11.0286,11.16492,2.473279,5.80341,69.47503,0.008359617
std,812.8927,43.09098,28.83743,38.00527,56.11159,143.916,156.7903,11.92701,13.1705,255.7986,0.09104798
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,185.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
50%,242.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
75%,427.0,0.0,45.0,70.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
max,29159.0,3584.0,3452.0,3657.0,3584.0,3072.0,2525.0,2094.0,3584.0,2119.0,1.0


In [111]:
# Count missing values in each column of the energy data
print("Missing Values in Energy Data:")
df_energy.isnull().sum()

Missing Values in Energy Data:


Aggregate          0
Fridge             0
Freezer_1          0
Freezer_2          0
Washer_Dryer       0
Washing_Machine    0
Dishwasher         0
Computer_Site      0
Television_Site    0
Electric_Heater    0
Issues             0
dtype: int64

Dataset 2 - **Solar Power Data**

In [112]:
# Display the first 5 rows to see the data structure
print("Solar Data Head:")
df_solar.head()

Solar Data Head:


Unnamed: 0,timestamp,wind_speed,sunshine_hours,air_pressure,radiation,air_temperature,relative_humidity,solar_generation_kw
0,01.01.2017-00:00,0.6,0,1003.8,-7.4,0.1,97,0.0
1,01.01.2017-01:00,1.7,0,1003.5,-7.4,-0.2,98,0.0
2,01.01.2017-02:00,0.6,0,1003.4,-6.7,-1.2,99,0.0
3,01.01.2017-03:00,2.4,0,1003.3,-7.2,-1.3,99,0.0
4,01.01.2017-04:00,4.0,0,1003.1,-6.3,3.6,67,0.0


In [113]:
# Get the dimensions of the solar dataset
print(f"The solar dataset has {df_solar.shape[0]} rows and {df_solar.shape[1]} columns.")

The solar dataset has 8760 rows and 8 columns.


In [114]:
# Get a summary of the solar DataFrame (Data Types and Non-Nulls)
print("Solar Data Info:")
df_solar.info()

Solar Data Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8760 entries, 0 to 8759
Data columns (total 8 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   timestamp            8760 non-null   object 
 1   wind_speed           8760 non-null   float64
 2   sunshine_hours       8760 non-null   int64  
 3   air_pressure         8760 non-null   float64
 4   radiation            8760 non-null   float64
 5   air_temperature      8760 non-null   float64
 6   relative_humidity    8760 non-null   int64  
 7   solar_generation_kw  8760 non-null   float64
dtypes: float64(5), int64(2), object(1)
memory usage: 547.6+ KB


In [115]:
# Generate descriptive statistics for the solar data
print("Descriptive Statistics for Solar Data:")
df_solar.describe()

Descriptive Statistics for Solar Data:


Unnamed: 0,wind_speed,sunshine_hours,air_pressure,radiation,air_temperature,relative_humidity,solar_generation_kw
count,8760.0,8760.0,8760.0,8760.0,8760.0,8760.0,8760.0
mean,2.639823,11.180479,1010.361781,97.538493,6.978893,76.719406,684.746071
std,1.628754,21.171295,12.793971,182.336029,7.604266,19.278996,1487.454665
min,0.0,0.0,965.9,-9.3,-12.4,13.0,0.0
25%,1.4,0.0,1002.8,-6.2,0.5,64.0,0.0
50%,2.3,0.0,1011.0,-1.4,6.4,82.0,0.0
75%,3.6,7.0,1018.2,115.6,13.4,93.0,464.24995
max,10.9,60.0,1047.3,899.7,27.1,100.0,7701.0


In [116]:
# Count missing values in each column of the solar data
print("Missing Values in Solar Data:")
df_solar.isnull().sum()

Missing Values in Solar Data:


timestamp              0
wind_speed             0
sunshine_hours         0
air_pressure           0
radiation              0
air_temperature        0
relative_humidity      0
solar_generation_kw    0
dtype: int64

### Data Preprocessing and Feature Engineering

**Process the Household Energy Data**

In [118]:
energy_data_path = 'household_energy_clean/CLEAN_House1.csv'
df_energy = pd.read_csv(energy_data_path)

appliance_names = {
    'Appliance1': 'Fridge', 'Appliance2': 'Freezer_1', 'Appliance3': 'Freezer_2',
    'Appliance4': 'Washer_Dryer', 'Appliance5': 'Washing_Machine', 'Appliance6': 'Dishwasher',
    'Appliance7': 'Computer_Site', 'Appliance8': 'Television_Site', 'Appliance9': 'Electric_Heater'
}
df_energy.rename(columns=appliance_names, inplace=True)

df_energy['timestamp'] = pd.to_datetime(df_energy['Unix'], unit='s')
df_energy.set_index('timestamp', inplace=True)
df_energy.drop(['Time', 'Unix'], axis=1, inplace=True)

print(df_energy.head())


                     Aggregate  Fridge  Freezer_1  Freezer_2  Washer_Dryer  \
timestamp                                                                    
2013-10-09 13:06:17        523      74          0         69             0   
2013-10-09 13:06:31        526      75          0         69             0   
2013-10-09 13:06:46        540      74          0         68             0   
2013-10-09 13:07:01        532      74          0         68             0   
2013-10-09 13:07:15        540      74          0         69             0   

                     Washing_Machine  Dishwasher  Computer_Site  \
timestamp                                                         
2013-10-09 13:06:17                0           0              0   
2013-10-09 13:06:31                0           0              0   
2013-10-09 13:06:46                0           0              0   
2013-10-09 13:07:01                0           0              0   
2013-10-09 13:07:15                0           0   

**Process the Solar and Weather Data**

In [None]:
# Process df_solar: rename columns for consistency and convert string timestamp to DatetimeIndex.
solar_column_names = {
    'Date-Hour(NMT)': 'timestamp', 
    'SystemProduction': 'solar_generation_kw',
    'WindSpeed': 'wind_speed', 
    'Sunshine': 'sunshine_hours',
    'AirPressure': 'air_pressure', 
    'Radiation': 'radiation',
    'AirTemperature': 'air_temperature', 
    'RelativeAirHumidity': 'relative_humidity'
}
df_solar.rename(columns=solar_column_names, inplace=True)

# CORRECTED LINE: The format now matches the data's structure (e.g., "01.01.2017-00:00")
df_solar['timestamp'] = pd.to_datetime(df_solar['timestamp'], format='%d.%m.%Y-%H:%M')
df_solar.set_index('timestamp', inplace=True)

print("✅ Solar data cleaned and indexed by timestamp.")
print(df_solar.info())

KeyError: 'timestamp'