In [3]:
import pandas as pd

# Load data

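The data is loaded into two separate DataFrames: one for the electricity data and one for the weather data. The weather data comes as one CSV file per year (2015 to 2024), so the yearly files are read individually and then concatenated.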

In [4]:
df_el = pd.read_csv("./data/ewz_stromabgabe_netzebenen_stadt_zuerich.csv")

In [None]:
# Read one weather file per year (2015 through 2024) and stack them into a single DataFrame
frames = []
for i in range(15, 25):
    frames.append(pd.read_csv(f"./data/ugz_ogd_meteo_h1_20{i}.csv"))
df_wthr = pd.concat(frames)

# Format data

The raw data is not yet in a usable format for this project, so it has to be reshaped before any analysis can be done.

## Formatting Electricity Data

The following steps have to be performed to put the electricity data into the desired format:

- Convert the "Timestamp" column to datetime values and store them in a new "Date" column
- Set the newly created "Date" column as the index -> time series data
- Drop the now unused "Timestamp" column
- Sum the quarter-hour data points into hourly values, since the weather data also only has hourly resolution

In [5]:
df_el["Date"] = pd.to_datetime(df_el["Timestamp"], utc=True)
df_el.set_index(["Date"], inplace=True)
df_el.drop(["Timestamp"], inplace=True, axis=1)
df_el = df_el.resample("h").sum()
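
The effect of the hourly resampling can be checked on a small example. The following is a minimal sketch on made-up data: the name `toy_el` and all values are hypothetical, only the column name `Value_NE5` and the `resample("h").sum()` call are taken from the notebook.

```python
import pandas as pd

# Hypothetical quarter-hour readings covering two full hours (values are made up)
idx = pd.date_range("2015-01-01 00:00", periods=8, freq="15min", tz="UTC")
toy_el = pd.DataFrame(
    {"Value_NE5": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0]},
    index=idx,
)

# Each hourly bin is labelled with its start time and sums the four readings inside it
toy_hourly = toy_el.resample("h").sum()
print(toy_hourly)
```

Summing (rather than averaging) fits here under the assumption that the values are amounts of energy per interval, so the hourly total is the sum of its four quarter-hour parts.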

## Formatting Weather Data

The following steps have to be performed to put the weather data into the desired format:

- Convert the "Datum" column to datetime values and store them in a new "Date" column
- Set the newly created "Date" column as the index -> time series data
- Only keep entries for the measurement station "Zch_Stampfenbachstrasse" -> station with the most data points
- Extend the values in the "Parameter" column with the unit from the "Einheit" column
- Drop the unused columns "Datum", "Intervall", "Standort" and "Einheit"
- Pivot the table so that every parameter becomes a column with its respective "Wert" as values. This step also reduces the DataFrame from eight entries per hour (one per parameter) to a single entry per hour


In [8]:
df_wthr["Date"] = pd.to_datetime(df_wthr["Datum"], utc=True)
df_wthr.set_index(["Date"], inplace=True)
# .copy() makes the filtered frame independent, so the column assignment below does not raise a SettingWithCopyWarning
df_wthr = df_wthr[df_wthr["Standort"] == "Zch_Stampfenbachstrasse"].copy()
df_wthr["Parameter"] = df_wthr["Parameter"] + " [" + df_wthr["Einheit"] + "]"
df_wthr.drop(["Datum", "Intervall", "Standort", "Einheit"], inplace=True, axis=1)
df_wthr = df_wthr.pivot(columns="Parameter", values="Wert")
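
The reshaping from long to wide format can be illustrated on a small example. This is a minimal sketch on made-up data: the names `toy_long` and `toy_wide` and all values are hypothetical, while the column names mirror the ones used in the notebook.

```python
import pandas as pd

# Hypothetical long-format records: one row per timestamp and parameter (values are made up)
toy_long = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2015-01-01 00:00", "2015-01-01 00:00", "2015-01-01 01:00", "2015-01-01 01:00"],
            utc=True,
        ),
        "Parameter": ["T", "p", "T", "p"],
        "Einheit": ["°C", "hPa", "°C", "hPa"],
        "Wert": [-2.5, 982.6, -2.4, 983.0],
    }
).set_index("Date")

# Append the unit to the parameter name so it survives as part of the column label
toy_long["Parameter"] = toy_long["Parameter"] + " [" + toy_long["Einheit"] + "]"

# Without an explicit index argument, pivot keeps the existing "Date" index as rows
toy_wide = toy_long.pivot(columns="Parameter", values="Wert")
print(toy_wide)
```

Four long-format rows become two rows with one column per parameter. Note that `pivot` raises a `ValueError` if the same timestamp and parameter combination occurs more than once, which is one reason to filter down to a single station first.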

## Combine both tables

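The two formatted DataFrames are joined on their shared hourly "Date" index into a single DataFrame containing all the required information. An inner join keeps only the timestamps that are present in both tables.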

In [10]:
df = df_wthr.join(df_el, how="inner")
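
How the inner join treats timestamps that appear in only one table can be seen on a small example. This is a minimal sketch on made-up data: `toy_wthr`, `toy_el` and all values are hypothetical, only the column names and the `join(..., how="inner")` call follow the notebook.

```python
import pandas as pd

# Two hypothetical hourly series that only partly overlap (values are made up)
wthr_idx = pd.date_range("2015-01-01 00:00", periods=3, freq="h", tz="UTC")  # 00:00 to 02:00
el_idx = pd.date_range("2015-01-01 01:00", periods=3, freq="h", tz="UTC")  # 01:00 to 03:00

toy_wthr = pd.DataFrame({"T [°C]": [-2.5, -2.4, -2.6]}, index=wthr_idx)
toy_el = pd.DataFrame({"Value_NE5": [100.0, 110.0, 120.0]}, index=el_idx)

# The inner join keeps only the timestamps present in both indexes (01:00 and 02:00)
toy_joined = toy_wthr.join(toy_el, how="inner")
print(toy_joined)
```

Rows without a counterpart in the other table are dropped silently, so comparing `len(df)` with the lengths of the two inputs is a quick way to see how many hours were lost in the join.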

In [11]:
df.head(5)

| Date | Hr [%Hr] | RainDur [min] | StrGlo [W/m2] | T [°C] | WD [°] | WVs [m/s] | WVv [m/s] | p [hPa] | Value_NE5 | Value_NE7 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2014-12-31 23:00:00+00:00 | 89.25 | 0.0 | 0.02 | -2.09 | 20.41 | 1.4 | 1.4 | 982.8 | 65674.7507 | 135628.059644 |
| 2015-01-01 00:00:00+00:00 | 90.47 | 0.0 | 0.01 | -2.48 | 353.85 | 0.61 | 0.6 | 982.64 | 88747.5885 | 172742.750946 |
| 2015-01-01 01:00:00+00:00 | 89.45 | 0.0 | 0.02 | -2.46 | 21.48 | 1.31 | 1.31 | 983.0 | 86864.5321 | 173541.200194 |
| 2015-01-01 02:00:00+00:00 | 89.2 | 0.0 | 0.02 | -2.63 | 12.22 | 1.7 | 1.66 | 982.93 | 84158.7339 | 162802.86324 |
| 2015-01-01 03:00:00+00:00 | 89.56 | 0.0 | 0.02 | -2.77 | 8.3 | 1.23 | 1.21 | 983.03 | 81133.3041 | 154123.51378 |
