# Exploratory Data Analysis

### Imports

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(1320210409)
randomstate = np.random.RandomState(1320210409)

# The data

## Features

All features are hourly, country-wide averages.
- **Time** _[YYYY-MM-DD HH:MM:SS]_
- **el_load:** electricity load _[MW]_
- **prec:** rainfall amount _[mm]_
- **temp:** temperature _[°C]_
- **rhum:** relative humidity _[%]_
- **grad:** global radiation _[J/cm²]_
- **pres:** momentary sea level air pressure _[hPa]_
- **wind:** average wind speed _[m/s]_
- **Vel_tviz:** Velence water temperature in Agárd _[°C]_
- **Bal_tviz:** Balaton water temperature in Siófok _[°C]_
- **holiday:** 1 or 0 depending on if it's a holiday
- **weekend:** 1 or 0 depending on if it's a weekend
- **covid:** 1 or 0 depending on covid restrictions in Hungary (estimate)

### The goal

I want to predict Hungary's electricity load for the **next couple of hours** using this dataset, or its differently aggregated counterpart (by country, region, county or station).
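One way to frame "the next couple of hours" as a supervised problem is to shift the load column backwards so that each row carries its own future values. The sketch below is only an illustration: the helper name `add_horizon_targets`, the horizons `(1, 2, 3)` and the `el_load_t+h` column names are assumptions, not part of the dataset, and it presumes a gap-free hourly index.

```python
import pandas as pd

def add_horizon_targets(frame: pd.DataFrame, target: str = 'el_load',
                        horizons: tuple = (1, 2, 3)) -> pd.DataFrame:
    """Return a copy of `frame` with one future-value column per horizon (in rows)."""
    out = frame.copy()
    target_cols = []
    for h in horizons:
        col = f'{target}_t+{h}'
        # shift(-h) moves the value from h rows later onto the current row;
        # on a gap-free hourly index that is the load h hours ahead
        out[col] = out[target].shift(-h)
        target_cols.append(col)
    # The last max(horizons) rows have no known future value, so drop them
    return out.dropna(subset=target_cols)
```

Once `df` is loaded, `add_horizon_targets(df)` would return the same rows minus the last three, each with `el_load_t+1` to `el_load_t+3` as candidate targets.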

In [None]:
df = pd.read_csv(
    'data/final_dataframe.csv',
    parse_dates=['Time'],
    index_col='Time',
    sep=';'
)

df.info()

df

There are no null entries; I dealt with those in the _data_organization_ notebook.
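`df.info()` already shows the non-null counts, but an explicit check fails loudly if the CSV ever changes. A minimal sketch, where `assert_no_nulls` is a hypothetical helper name and not something defined in the _data_organization_ notebook:

```python
import pandas as pd

def assert_no_nulls(frame: pd.DataFrame) -> pd.Series:
    """Return per-column null counts and raise if any value is missing."""
    null_counts = frame.isna().sum()
    missing = null_counts[null_counts > 0]
    if not missing.empty:
        raise ValueError(f"Columns with missing values: {missing.to_dict()}")
    return null_counts
```

Calling `assert_no_nulls(df)` right after loading should return a Series of zeros for this dataset.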

In [None]:
df['hour'] = df.index.hour
df['weekday'] = df.index.weekday
df['dayofmonth'] = df.index.day
df['dayofyear'] = df.index.dayofyear
df['month'] = df.index.month
df['year'] = df.index.year

df
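With the calendar columns in place, the precomputed `weekend` flag can be cross-checked against the index. This is a hedged sketch: `weekend_flag_mismatches` is a hypothetical helper, and it assumes "weekend" means Saturday or Sunday. Any rows it returns are not necessarily errors, since the flag may have been built with a different rule.

```python
import pandas as pd

def weekend_flag_mismatches(frame: pd.DataFrame, flag: str = 'weekend') -> pd.DataFrame:
    """Return rows where the 0/1 flag disagrees with a Saturday/Sunday calendar rule."""
    # Assumption: weekend = Saturday (5) or Sunday (6) according to the DatetimeIndex
    calendar_weekend = (frame.index.weekday >= 5).astype(int)
    return frame[frame[flag].to_numpy() != calendar_weekend]
```

An empty result from `weekend_flag_mismatches(df)` would mean the flag and the `weekday` column carry the same information.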

In [None]:
group_by = ['hour', 'weekday', 'dayofmonth', 'dayofyear', 'month', 'year']

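def plot_feature(df: pd.DataFrame, groups: list, feature: str, desc: str, color: str):
    """Plot the mean of `feature` per grouping column in `groups` on a 2-row grid."""
    n_cols = (len(groups) + 1) // 2  # round up so an odd number of groups still fits
    fig, axes = plt.subplots(2, n_cols, figsize=(20, 7), squeeze=False)
    fig.suptitle(f"Feature: {feature} ({desc})")
    # Pair each axis with one grouping column; a spare axis (odd count) stays empty
    for ax, group in zip(axes.flatten(), groups):
        grouped = df.groupby(group)[feature].mean()
        ax.set_title(f"Grouped by {group}", fontsize=10)
        # dayofyear has up to 366 points, so markers would only add clutter
        marker = 'o' if group != 'dayofyear' else None
        ax.plot(grouped, color=color, marker=marker)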

# Exploring the el_load feature

In [None]:
plot_feature(df, group_by, 'el_load', 'Electricity load', 'black')

#### el_load
- the hourly average rises during the day and peaks at 18-19
- lower during the weekend
- we don't learn much from the day of the month at this point
- over the year, load is higher in winter, probably because there is less sunlight
- we can see the effects of covid between 2020 and 2022
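The observations above are read off the plots; a few group means can back them up numerically. This is a minimal sketch, assuming a `DatetimeIndex` and the `el_load` column. The helper name `summarize_load`, and the choice of December to February as winter and June to August as summer, are assumptions.

```python
import pandas as pd

def summarize_load(frame: pd.DataFrame, target: str = 'el_load') -> dict:
    """Collect a few group means that mirror the el_load observations."""
    idx = frame.index
    hourly_mean = frame.groupby(idx.hour)[target].mean()
    is_weekend = idx.weekday >= 5            # assumption: Saturday and Sunday
    is_winter = idx.month.isin([12, 1, 2])   # assumption: December to February
    is_summer = idx.month.isin([6, 7, 8])    # assumption: June to August
    return {
        'peak_hour': int(hourly_mean.idxmax()),  # hour with the highest mean load
        'weekday_mean': frame.loc[~is_weekend, target].mean(),
        'weekend_mean': frame.loc[is_weekend, target].mean(),
        'winter_mean': frame.loc[is_winter, target].mean(),
        'summer_mean': frame.loc[is_summer, target].mean(),
    }
```

`summarize_load(df)` returns a plain dictionary, so the peak hour and the weekday/weekend and winter/summer gaps can be compared directly with what the plots suggest.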

#### Precipitation

In [None]:
plot_feature(df, group_by, 'prec', 'Precipitation', 'blue')

#### Temperature

In [None]:
plot_feature(df, group_by, 'temp', 'Temperature', 'red')

#### Relative humidity

In [None]:
plot_feature(df, group_by, 'rhum', 'Relative humidity', 'green')

#### Global radiation

In [None]:
plot_feature(df, group_by, 'grad', 'Global radiation', 'orange')

#### Momentary sea level air pressure

In [None]:
plot_feature(df, group_by, 'pres', 'Momentary sea level air pressure', 'purple')

#### Average wind speed

In [None]:
plot_feature(df, group_by, 'wind', 'Average wind speed', 'brown')

#### Velence water temperature in Agárd

In [None]:
plot_feature(df, group_by, 'Vel_tviz', 'Velence water temperature in Agárd', 'cyan')

#### Balaton water temperature in Siófok

In [None]:
plot_feature(df, group_by, 'Bal_tviz', 'Balaton water temperature in Siófok', 'lightblue')