# **Predicting Electricity Spot Prices Based on Weather Patterns in Nordic Countries**

In this project, we will combine historical weather data and electricity spot price data for the years 2015-2019 in Finland, Norway, and Sweden. Our goal is to predict electricity spot prices using weather features such as temperature, precipitation, and wind speed.

In [None]:
# Setup and libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error
from statsmodels.tsa.holtwinters import ExponentialSmoothing
import plotly.express as px

print("Libraries imported")

Libraries imported


In [None]:
#Loading Data
weather_df = pd.read_csv('/kaggle/input/finland-norway-and-sweden-weather-data-20152019/nordics_weather.csv')
electricity_df = pd.read_csv('/kaggle/input/electricity-spot-price/Elspotprices.csv', delimiter=';')
print("Datasets Loaded")

In [None]:
# Basic exploration
print(weather_df.head())
print(electricity_df.head())

print(weather_df.dtypes)
print(electricity_df.dtypes)

print(weather_df.isnull().sum())
print(electricity_df.isnull().sum())

Based on the initial exploration of both datasets, neither has missing values, but both need some cleaning and parsing before they can be used.

Weather Data:
There are no missing values and the data appears to be clean.

The 'date' column is currently of type object, so it needs to be converted to 'datetime' and set as the index. This will make time series analysis easier and allow alignment with the Electricity Spot Price dataset.
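A small sketch of what this step changes, using a made-up frame rather than the real CSV (the 'tavg' column name and its values are assumptions for illustration): after 'set_index', 'date' lives in the index instead of the columns, so later date filtering has to go through the index. Partial-string slicing with '.loc' on a sorted DatetimeIndex is one convenient way to do that.

```python
import pandas as pd

# Toy daily weather frame with invented values; 'tavg' is a hypothetical column name
weather = pd.DataFrame({
    'date': ['2016-12-31', '2017-01-01', '2019-12-31', '2020-01-01'],
    'tavg': [-3.0, -5.5, 1.2, 0.4],
})

# Parse the strings, move 'date' into the index, and sort it
weather['date'] = pd.to_datetime(weather['date'])
weather = weather.set_index('date').sort_index()

print('date' in weather.columns)  # False: 'date' is now the index, not a column

# Partial-string slicing on a DatetimeIndex keeps whole years, inclusive of both ends
subset = weather.loc['2017':'2019']
print(subset.index.min().date(), subset.index.max().date())  # 2017-01-01 2019-12-31
```

Sorting first matters because partial-string slicing is only reliable on a monotonic index. If the real file has several rows per date (for example one per country, which is an assumption here), the slice still returns all of them.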

Electricity Spot Price Data:
There are no missing values, but the dataset requires some cleaning.

The file's formatting means the data has to be loaded with the delimiter ';'.

The 'SpotPriceDKK' and 'SpotPriceEUR' columns use commas rather than dots as the decimal separator, which prevents a direct conversion to numeric values. The commas will be replaced with dots and the columns converted to 'float'.
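To illustrate the decimal-comma problem, here is a sketch on two made-up rows laid out like 'Elspotprices.csv' (the prices are invented). 'read_csv' can handle the comma itself through its 'decimal' parameter; the alternative is to read the column as text and swap ',' for '.'. Simply deleting the comma, by contrast, silently inflates the numbers.

```python
import io
import pandas as pd

# Two invented rows: ';' separates fields, ',' marks decimals
raw = (
    "HourUTC;HourDK;SpotPriceDKK;SpotPriceEUR\n"
    "2017-01-01T00:00:00;2017-01-01T01:00:00;153,25;20,61\n"
    "2017-01-01T01:00:00;2017-01-01T02:00:00;140,50;18,90\n"
)

# Option 1: let read_csv parse the decimal comma directly
df = pd.read_csv(io.StringIO(raw), sep=';', decimal=',')

# Option 2: load everything as text, then replace ',' with '.' before casting
df_text = pd.read_csv(io.StringIO(raw), sep=';', dtype=str)
eur = df_text['SpotPriceEUR'].str.replace(',', '.', regex=False).astype(float)
print(eur.tolist())  # [20.61, 18.9]

# Deleting the comma instead turns '20,61' into 2061.0
wrong = df_text['SpotPriceEUR'].str.replace(',', '', regex=False).astype(float)
print(wrong.tolist())  # [2061.0, 1890.0]
```

Option 1 avoids the string handling entirely, but it only works if the columns contain nothing except numbers in that format; otherwise they stay as text and need Option 2.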

Both 'HourUTC' and 'HourDK' columns are strings and need to be converted to 'datetime' for time-based analysis, just like the weather 'date' values. I will use 'HourUTC' to align the two datasets on a common time key for merging.
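Because the prices are hourly and the weather is daily, the choice of join key matters. A sketch with three made-up hourly prices and one daily weather row (the 'tavg' column name is assumed): joining the raw hourly timestamps to midnight dates keeps only the 00:00 hour, while truncating each timestamp to its calendar day with '.dt.normalize()' keeps all of them.

```python
import pandas as pd

# Invented hourly prices for one day
prices = pd.DataFrame({
    'HourUTC': pd.to_datetime(['2017-01-01 00:00', '2017-01-01 01:00', '2017-01-01 02:00']),
    'SpotPriceEUR': [20.6, 18.9, 17.5],
})

# One invented daily weather row, indexed by date like the cleaned weather_df
weather = pd.DataFrame(
    {'tavg': [-5.5]},  # hypothetical weather column
    index=pd.DatetimeIndex(['2017-01-01'], name='date'),
)

# Raw hourly timestamps only equal the daily date at midnight
naive = pd.merge(prices, weather, left_on='HourUTC', right_index=True, how='inner')
print(len(naive))  # 1

# Truncate each hour to its calendar day so every hour gets that day's weather
prices['date'] = prices['HourUTC'].dt.normalize()
daily_key = pd.merge(prices, weather.reset_index(), on='date', how='inner')
print(len(daily_key))  # 3
```

Averaging the prices per day (for example with 'resample('D').mean()' on an 'HourUTC' index) is another way to reconcile the two resolutions if the model is meant to work on daily data. Whether 'HourUTC' or the local 'HourDK' better matches the weather's calendar days is worth checking against the weather source.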




In [None]:
#Clean Weather Dataset

# Convert 'date' to datetime and set it as the index
weather_df['date'] = pd.to_datetime(weather_df['date'])
weather_df.set_index('date', inplace=True)

print(weather_df.head())
print(weather_df.dtypes)

In [None]:
#Clean Electricity Spot Price Dataset

# Convert the spot prices to numeric. The file uses ',' as the decimal separator,
# so replace it with '.' before casting (deleting it would turn e.g. '20,61' into 2061.0).
electricity_df['SpotPriceDKK'] = electricity_df['SpotPriceDKK'].str.replace(',', '.', regex=False).astype(float)
electricity_df['SpotPriceEUR'] = electricity_df['SpotPriceEUR'].str.replace(',', '.', regex=False).astype(float)

# Convert the 'HourUTC' and 'HourDK' columns to datetime format
electricity_df['HourUTC'] = pd.to_datetime(electricity_df['HourUTC'])
electricity_df['HourDK'] = pd.to_datetime(electricity_df['HourDK'])

print(electricity_df.head())
print(electricity_df.dtypes)

There is an overlapping time frame from 2017 to 2019, which is where I will merge the two datasets for further analysis.
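The overlap can also be computed rather than read off by eye, and the upper bound deserves care: a string such as '2019-12-31' compares as midnight, so '<=' drops the rest of that day. A sketch with synthetic date ranges standing in for the two datasets (the mid-2020 end of the price range is invented for illustration):

```python
import pandas as pd

# Synthetic stand-ins: daily weather dates and hourly price timestamps
weather_dates = pd.Series(pd.date_range('2015-01-01', '2019-12-31', freq='D'))
price_hours = pd.Series(pd.date_range('2017-01-01', '2020-06-30 23:00', freq=pd.Timedelta(hours=1)))

# Overlap window = latest start to earliest end
start = max(weather_dates.min(), price_hours.min())
end = min(weather_dates.max(), price_hours.max())
print(start.date(), end.date())  # 2017-01-01 2019-12-31

# Inclusive bound on the date string stops at 2019-12-31 00:00
inclusive = price_hours[(price_hours >= start) & (price_hours <= '2019-12-31')]

# Half-open bound on the following day keeps all 24 hours of the last day
half_open = price_hours[(price_hours >= start) & (price_hours < end + pd.Timedelta(days=1))]

print(inclusive.iloc[-1], half_open.iloc[-1])
```

The half-open form ('>= start' and '< day after end') is a common convention for time filters because it works the same whether the data is daily or hourly.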

In [None]:
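# Filter the electricity and weather datasets to the overlapping period (2017-2019).
# The upper bound is exclusive ('< 2020-01-01') so every hour of 2019-12-31 is kept;
# '<= 2019-12-31' would stop at midnight on that day.
electricity_df_filtered = electricity_df[
    (electricity_df['HourUTC'] >= '2017-01-01') & (electricity_df['HourUTC'] < '2020-01-01')
].copy()

# 'date' is the index of weather_df after the cleaning step, so filter on the index
weather_df_filtered = weather_df[
    (weather_df.index >= '2017-01-01') & (weather_df.index < '2020-01-01')
]

# Prices are hourly while the weather is daily: add a calendar-day key so every hour is
# matched with that day's weather (merging on the raw 'HourUTC' would only match 00:00)
electricity_df_filtered['date'] = electricity_df_filtered['HourUTC'].dt.normalize()
merged_df = pd.merge(electricity_df_filtered, weather_df_filtered.reset_index(), on='date', how='inner')

# Display the first few rows of the merged dataframe
print(merged_df.head())
print(merged_df.dtypes)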


In [None]:
print(weather_df.columns)
