# Exploratory Data Analysis

[Introduction]

## Imports

The following code imports the library used in this notebook:


In [11]:
import polars as pl

## Dataset

The dataset used in this work contains the energy consumption, measured in kilowatt-hours (kWh), of 499 anonymized customers located in Spain. It covers the full year 2019 at hourly resolution. Weather data are provided at the same hourly resolution, specifically the outside temperature in each customer's region. Each customer is also assigned to one of 68 predefined customer profiles, such as private households, shops, or bakeries, which allows segmentation and analysis by category.
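As a quick sanity check on the stated coverage, the number of hourly timestamps expected for 2019 can be worked out with the standard library. This is a minimal sketch that assumes one observation per hour with no gaps and no duplicated hours (for example around daylight-saving changes); the real files may differ.

```python
from datetime import datetime, timedelta

# Assumption: one observation per hour from 2019-01-01 00:00 up to and
# including 2019-12-31 23:00, with no gaps or duplicated hours.
start = datetime(2019, 1, 1)
end = datetime(2020, 1, 1)  # exclusive upper bound

expected_rows = int((end - start) / timedelta(hours=1))
expected_values = expected_rows * 499  # 499 customers, per the description

print(expected_rows)    # 8760 (2019 is not a leap year: 365 * 24)
print(expected_values)  # 4371240
```

Comparing `expected_rows` with the height of the loaded consumption and weather tables is a cheap way to spot missing or repeated hours.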

The dataset is publicly available at [https://fordatis.fraunhofer.de/handle/fordatis/215](https://fordatis.fraunhofer.de/handle/fordatis/215).

The data are provided in three separate files:
- `consumption.xlsx`, which contains the energy consumption data for the 499 customers. The file is in wide format: a `date` column with hourly timestamps, followed by one column per customer, each named after the customer identifier.
- `weather.xlsx`, which contains the hourly temperature for the region of each customer. It has the same wide layout: a `date` column with hourly timestamps and one column per customer identifier.
- `profiles.xlsx`, which maps each customer to one of the 68 predefined customer profiles. It has one column with the customer identifiers (this column has no header and is loaded as `__UNNAMED__0`) and an `activity` column with the associated activity, given in Spanish, which defines the customer profile (e.g., private households, shops, bakeries).

The first few rows of each file are displayed below to provide an overview of their structure:


In [19]:
consumption_df = pl.read_excel('../data/raw/consumption.xlsx')

print("First 5 rows of consumption dataset:")
consumption_df.head(5)

First 5 rows of consumption dataset:


date,5d6fcd1cf44b0324bc6b7254,5d6fcd1cf44b0324bc6b7257,5d6fcd1cf44b0324bc6b725a,5d6fcd1cf44b0324bc6b725d,5d6fcd1df44b0324bc6b7260,5d6fcd1df44b0324bc6b726b,5d6fcd1df44b0324bc6b7271,5d6fcd1df44b0324bc6b7274,5d6fcd1ef44b0324bc6b727a,5d6fcd1ef44b0324bc6b727c,5d6fcd1ef44b0324bc6b7289,5d6fcd1ef44b0324bc6b728b,5d6fcd1ff44b0324bc6b728d,5d6fcd1ff44b0324bc6b7293,5d6fcd1ff44b0324bc6b7296,5d6fcd1ff44b0324bc6b7299,5d6fcd1ff44b0324bc6b729b,5d6fcd1ff44b0324bc6b729e,5d6fcd20f44b0324bc6b72a4,5d6fcd20f44b0324bc6b72aa,5d6fcd20f44b0324bc6b72b0,5d6fcd20f44b0324bc6b72b6,5d6fcd21f44b0324bc6b72bc,5d6fcd21f44b0324bc6b72bf,5d6fcdddf44b0324bc6b815b,5d6fcdddf44b0324bc6b815c,5d6fcd21f44b0324bc6b72c5,5d6fcd22f44b0324bc6b72dd,5d6fcd23f44b0324bc6b72e7,5d6fcd23f44b0324bc6b72ed,5d6fcd23f44b0324bc6b72f3,5d6fcd23f44b0324bc6b72f6,5d6fcd23f44b0324bc6b72fa,5d6fcd24f44b0324bc6b72fd,5d6fcd24f44b0324bc6b7302,5d6fcd25f44b0324bc6b7313,…,5d6fcd79f44b0324bc6b79a8,5d6fcd79f44b0324bc6b79ae,5d6fcd79f44b0324bc6b79b1,5d6fcd7bf44b0324bc6b79c2,5d6fcd7bf44b0324bc6b79c1,5d6fcd7bf44b0324bc6b79c0,5d6fcd7af44b0324bc6b79bf,5d6fcd79f44b0324bc6b79b6,5d6fcd7af44b0324bc6b79be,5d6fcd7af44b0324bc6b79bd,5d6fcd7af44b0324bc6b79bc,5d6fcd7af44b0324bc6b79bb,5d6fcd7af44b0324bc6b79ba,5d6fcd7af44b0324bc6b79b9,5d6fcd79f44b0324bc6b79b5,5d6fcd7af44b0324bc6b79b8,5d6fcd79f44b0324bc6b79b7,5d6fcd7bf44b0324bc6b79c5,5d6fcd7bf44b0324bc6b79c7,5d6fcd7bf44b0324bc6b79c9,5d6fcd7bf44b0324bc6b79cc,5d6fcd7bf44b0324bc6b79cf,5d6fcd7bf44b0324bc6b79d1,5d6fcd7cf44b0324bc6b79d4,5d6fcd7cf44b0324bc6b79d7,5d6fcd7cf44b0324bc6b79dc,5d6fcd7cf44b0324bc6b79df,5d6fcd7cf44b0324bc6b79e4,5d6fcd7df44b0324bc6b79ed,5d6fcd7df44b0324bc6b79ef,5d6fcd7df44b0324bc6b79f1,5d6fcd7df44b0324bc6b79f4,5d6fcd7df44b0324bc6b79f7,5d6fcd7df44b0324bc6b79fd,5d6fcd7ef44b0324bc6b7a00,5d6fcd7ef44b0324bc6b7a02,5d6fcd7ef44b0324bc6b7a03
datetime[ms],f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,i64,…,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
2019-01-01 00:00:00,0.039,0.384,0.986,0.706,0.304,0.073,3.262,3.0,0.645,1.279,0.0,0.349,0.517,0.465,0.633,0.967,0.044,1.806,0.0,0.147,6.327,0.379,0.033,0.14,0.325,0.044,0.145,0.024,0.495,0.279,4.479,0.082,0.038,0.106,3.0,3,…,0.035,2.511,1.223,0.181,0.231,0.318,0.112,0.195,0.403,0.144,0.145,0.196,0.284,0.21,0.512,0.248,0.165,4.413,0.308,1.397,3.293,0.013,0.002,0.301,2.585,0.039,1.67,0.431,0.1,0.01,0.258,0.009,0.488,0.81,0.309,0.366,0.29
2019-01-01 01:00:00,0.269,0.051,0.846,2.21,0.673,0.047,3.252,4.0,0.746,0.851,0.0,0.347,0.589,0.171,0.635,0.952,0.11,1.706,0.0,0.145,4.237,0.409,0.036,0.172,0.322,0.044,0.142,0.614,0.492,0.385,4.262,0.095,0.156,0.106,2.0,3,…,0.036,2.299,1.293,0.182,0.207,0.286,0.101,0.216,0.435,0.23,0.208,0.177,0.29,0.186,0.832,0.236,0.167,5.587,0.183,1.075,1.365,0.033,0.003,0.287,2.506,0.425,1.609,0.447,0.025,0.009,0.232,0.009,0.447,0.753,0.381,0.308,0.369
2019-01-01 02:00:00,0.331,0.049,0.97,1.797,0.31,0.058,3.043,4.0,0.678,0.789,0.0,0.347,0.588,0.178,0.628,0.886,0.044,1.518,0.0,0.146,3.772,0.409,0.038,0.182,0.572,0.044,0.148,0.04,0.734,0.404,3.678,0.144,0.038,0.104,3.0,3,…,0.045,1.343,1.291,0.181,0.231,0.548,0.103,0.244,0.415,0.164,0.255,0.212,0.351,0.224,1.473,0.383,0.245,4.448,0.295,1.1,0.611,0.011,0.003,0.294,2.497,0.134,1.583,0.439,0.025,0.009,0.26,0.009,0.508,0.092,0.193,0.137,0.292
2019-01-01 03:00:00,0.093,0.049,0.803,1.01,0.545,0.059,2.96,4.0,0.644,0.637,0.0,0.349,0.432,0.477,0.627,1.013,0.058,1.757,0.0,0.148,3.553,0.411,0.04,0.242,0.322,0.045,0.212,0.055,0.434,0.227,3.844,0.082,0.174,0.685,2.0,2,…,0.026,2.703,1.284,0.18,0.224,0.369,0.103,0.217,0.409,0.163,0.171,0.173,0.29,0.216,0.532,0.213,0.165,1.564,0.296,1.909,0.171,0.033,0.003,0.339,2.696,0.153,1.492,0.913,0.023,0.009,0.252,0.009,0.511,0.032,0.124,0.201,0.358
2019-01-01 04:00:00,0.116,0.05,1.303,0.66,0.41,0.072,3.054,3.0,0.13,0.262,0.0,0.339,0.214,0.144,0.634,0.915,0.085,1.798,0.0,0.148,3.494,0.4,0.039,0.145,0.395,0.044,0.416,0.024,0.434,0.228,3.814,0.09,0.039,0.108,3.0,3,…,0.038,0.786,1.529,0.181,0.231,0.285,0.102,0.186,0.513,0.152,0.162,0.196,0.334,0.21,0.513,0.221,0.165,0.647,0.305,0.782,0.499,0.033,0.003,0.288,2.551,0.086,2.199,1.256,0.024,0.009,0.278,0.009,0.51,0.081,0.081,0.223,0.38


In [18]:
weather_df = pl.read_excel('../data/raw/weather.xlsx')

print("First 5 rows of weather dataset:")
weather_df.head(5)

First 5 rows of weather dataset:


date,5d6fcd1cf44b0324bc6b7254,5d6fcd1cf44b0324bc6b7257,5d6fcd1cf44b0324bc6b725a,5d6fcd1cf44b0324bc6b725d,5d6fcd1df44b0324bc6b7260,5d6fcd1df44b0324bc6b726b,5d6fcd1df44b0324bc6b7271,5d6fcd1df44b0324bc6b7274,5d6fcd1ef44b0324bc6b727a,5d6fcd1ef44b0324bc6b727c,5d6fcd1ef44b0324bc6b7289,5d6fcd1ef44b0324bc6b728b,5d6fcd1ff44b0324bc6b728d,5d6fcd1ff44b0324bc6b7293,5d6fcd1ff44b0324bc6b7296,5d6fcd1ff44b0324bc6b7299,5d6fcd1ff44b0324bc6b729b,5d6fcd1ff44b0324bc6b729e,5d6fcd20f44b0324bc6b72a4,5d6fcd20f44b0324bc6b72aa,5d6fcd20f44b0324bc6b72b0,5d6fcd20f44b0324bc6b72b6,5d6fcd21f44b0324bc6b72bc,5d6fcd21f44b0324bc6b72bf,5d6fcdddf44b0324bc6b815b,5d6fcdddf44b0324bc6b815c,5d6fcd21f44b0324bc6b72c5,5d6fcd22f44b0324bc6b72dd,5d6fcd23f44b0324bc6b72e7,5d6fcd23f44b0324bc6b72ed,5d6fcd23f44b0324bc6b72f3,5d6fcd23f44b0324bc6b72f6,5d6fcd23f44b0324bc6b72fa,5d6fcd24f44b0324bc6b72fd,5d6fcd24f44b0324bc6b7302,5d6fcd25f44b0324bc6b7313,…,5d6fcd79f44b0324bc6b79a8,5d6fcd79f44b0324bc6b79ae,5d6fcd79f44b0324bc6b79b1,5d6fcd7bf44b0324bc6b79c2,5d6fcd7bf44b0324bc6b79c1,5d6fcd7bf44b0324bc6b79c0,5d6fcd7af44b0324bc6b79bf,5d6fcd79f44b0324bc6b79b6,5d6fcd7af44b0324bc6b79be,5d6fcd7af44b0324bc6b79bd,5d6fcd7af44b0324bc6b79bc,5d6fcd7af44b0324bc6b79bb,5d6fcd7af44b0324bc6b79ba,5d6fcd7af44b0324bc6b79b9,5d6fcd79f44b0324bc6b79b5,5d6fcd7af44b0324bc6b79b8,5d6fcd79f44b0324bc6b79b7,5d6fcd7bf44b0324bc6b79c5,5d6fcd7bf44b0324bc6b79c7,5d6fcd7bf44b0324bc6b79c9,5d6fcd7bf44b0324bc6b79cc,5d6fcd7bf44b0324bc6b79cf,5d6fcd7bf44b0324bc6b79d1,5d6fcd7cf44b0324bc6b79d4,5d6fcd7cf44b0324bc6b79d7,5d6fcd7cf44b0324bc6b79dc,5d6fcd7cf44b0324bc6b79df,5d6fcd7cf44b0324bc6b79e4,5d6fcd7df44b0324bc6b79ed,5d6fcd7df44b0324bc6b79ef,5d6fcd7df44b0324bc6b79f1,5d6fcd7df44b0324bc6b79f4,5d6fcd7df44b0324bc6b79f7,5d6fcd7df44b0324bc6b79fd,5d6fcd7ef44b0324bc6b7a00,5d6fcd7ef44b0324bc6b7a02,5d6fcd7ef44b0324bc6b7a03
datetime[ms],f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,…,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
2019-01-01 00:00:00,10.31,10.35,1.65,1.7,11.53,10.76,10.57,0.43,1.66,10.49,8.98,10.33,1.71,8.5,10.75,1.69,1.81,8.55,10.72,-0.94,10.64,2.35,4.4,4.87,4.87,4.87,10.77,10.77,5.03,2.32,1.76,1.63,1.77,1.7,7.51,-0.42,…,11.18,0.14,9.18,9.48,9.48,9.48,9.48,9.48,9.48,9.48,9.48,9.48,9.48,9.48,9.48,9.48,9.48,11.83,12.1,1.61,0.08,10.71,10.6,1.68,10.71,10.33,4.53,10.4,9.53,9.94,3.19,3.14,3.2,2.63,1.51,1.51,1.51
2019-01-01 01:00:00,10.34,10.38,1.33,1.37,11.78,10.79,10.61,0.17,1.33,10.73,8.99,10.37,1.39,8.51,10.78,1.37,1.49,9.0,10.76,-1.2,10.68,1.92,3.96,4.64,4.64,4.64,10.8,10.8,4.77,1.73,1.44,1.31,1.44,1.37,7.77,-0.63,…,11.41,-0.16,9.18,9.48,9.48,9.48,9.48,9.48,9.48,9.48,9.48,9.48,9.48,9.48,9.48,9.48,9.48,11.95,12.24,1.3,-0.19,10.74,10.63,1.25,10.74,10.58,4.27,10.65,10.21,10.63,3.7,3.65,3.71,2.08,1.21,1.21,1.21
2019-01-01 02:00:00,10.36,10.4,1.01,1.05,12.03,10.82,10.64,-0.09,1.01,10.98,8.99,10.4,1.07,8.52,10.81,1.04,1.16,9.46,10.79,-1.46,10.71,1.5,3.52,4.4,4.4,4.4,10.83,10.83,4.51,1.14,1.11,0.99,1.12,1.05,8.03,-0.84,…,11.64,-0.45,9.17,9.47,9.47,9.47,9.47,9.47,9.47,9.47,9.47,9.47,9.47,9.47,9.47,9.47,9.47,12.07,12.37,0.99,-0.47,10.77,10.66,0.81,10.77,10.83,4.01,10.9,10.88,11.33,4.22,4.17,4.23,1.53,0.91,0.91,0.91
2019-01-01 03:00:00,10.39,10.43,0.69,0.73,12.28,10.84,10.68,-0.34,0.69,11.23,9.0,10.44,0.74,8.53,10.83,0.72,0.83,9.91,10.83,-1.71,10.75,1.08,3.09,4.17,4.17,4.17,10.85,10.85,4.25,0.55,0.79,0.67,0.79,0.73,8.3,-1.05,…,11.86,-0.74,9.16,9.46,9.46,9.46,9.46,9.46,9.46,9.46,9.46,9.46,9.46,9.46,9.46,9.46,9.46,12.19,12.5,0.68,-0.75,10.79,10.68,0.38,10.79,11.07,3.75,11.14,11.56,12.02,4.74,4.69,4.75,0.99,0.6,0.6,0.6
2019-01-01 04:00:00,10.27,10.31,0.47,0.51,12.13,10.73,10.57,-0.62,0.48,11.06,8.84,10.33,0.53,8.37,10.72,0.51,0.62,9.75,10.72,-1.88,10.64,0.91,3.14,4.24,4.24,4.24,10.74,10.74,4.29,-0.11,0.57,0.46,0.58,0.51,7.97,-1.39,…,11.69,-0.94,9.04,9.34,9.34,9.34,9.34,9.34,9.34,9.34,9.34,9.34,9.34,9.34,9.34,9.34,9.34,12.08,12.39,0.46,-0.94,10.68,10.57,0.12,10.68,10.9,3.47,10.97,11.33,11.79,4.47,4.42,4.48,0.76,0.38,0.38,0.38


In [20]:
profiles_df = pl.read_excel('../data/raw/profiles.xlsx')

print("First 5 rows of profiles dataset:")
profiles_df.head(5)

First 5 rows of profiles dataset:


__UNNAMED__0,activity
str,str
"5d6fcd1cf44b0324bc6b7254","Actividades de los hogares com…"
"5d6fcd1cf44b0324bc6b7257","Actividades de los hogares com…"
"5d6fcd1cf44b0324bc6b725a","Comercio al por menor de fruta…"
"5d6fcd1cf44b0324bc6b725d","Actividades de los hogares com…"
"5d6fcd1df44b0324bc6b7260","Actividades de los hogares com…"
