# **Predicting Electricity Spot Prices Based on Weather Patterns in Nordic Countries**

In this project, we will combine historical weather data and electricity spot price data for the years 2015-2019 in Finland, Norway, and Sweden. Our goal is to predict the electricity spot prices by using weather features like temperature, precipitation, and wind speed.

In [None]:
#Setup and Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error
from statsmodels.tsa.holtwinters import ExponentialSmoothing
import plotly.express as px

print("Libraries imported")

In [None]:
#Loading Data
weather_df = pd.read_csv('/kaggle/input/finland-norway-and-sweden-weather-data-20152019/nordics_weather.csv')
electricity_df = pd.read_csv('/kaggle/input/electricity-spot-price/Elspotprices.csv')

print("Datasets Loaded")

In [None]:
# Basic exploration
print(weather_df.head())
print(electricity_df.head())

print(weather_df.info())
print(electricity_df.info())

print(weather_df.isnull().sum())
print(electricity_df.isnull().sum())

Based on the initial exploration of both datasets, we can see that there are no missing values, but the data structure needs to be cleaned and parsed correctly.

Weather Data:
There are no missing values and the data appears to be clean.

The 'date' column is currently of type object and needs to be converted to 'datetime' and set as the index. This will allow for easier time series analysis and alignment with the Electricity Spot Price dataset.

Electricity Spot Price Data:
There are no missing values, but the dataset requires some cleaning.

The data is currently stored as a single column of semicolon-separated values. These need to be split and organized into separate columns.

The 'SpotPriceDKK' and 'SpotPriceEUR' columns use commas rather than dots as the decimal separator, which will cause issues when converting them to numerical values. These will be cleaned and converted to 'float'.
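Both formatting issues described here (the single concatenated column and the decimal commas) could also be handled at read time rather than after loading. The following is a minimal sketch, assuming the file is semicolon-delimited with comma decimals. The sample rows and the name `sample_df` are invented for illustration and are not taken from the actual dataset.

```python
import io
import pandas as pd

# Hypothetical sample that mimics the layout described above (values are made up)
raw = io.StringIO(
    "HourUTC;HourDK;PriceArea;SpotPriceDKK;SpotPriceEUR\n"
    "2019-01-01 00:00;2019-01-01 01:00;DK1;212,45;28,45\n"
    "2019-01-01 01:00;2019-01-01 02:00;DK1;198,10;26,53\n"
)

# sep=';' splits the fields into columns; decimal=',' parses "212,45" as 212.45
sample_df = pd.read_csv(raw, sep=';', decimal=',')

print(sample_df.dtypes)
print(sample_df.head())
```

With this approach the price columns arrive as floats directly, so no string replacement is needed afterwards. The date columns would still need converting with `pd.to_datetime`.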

Both 'HourUTC' and 'HourDK' columns are strings, which need to be converted to 'datetime' for time-based analysis, just like our weather 'date' values. I will set 'HourUTC' as the index, as this will allow me to merge the datasets on a common index.
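To illustrate what merging on a common datetime index could look like, here is a small sketch. It assumes the prices are hourly and the weather data is daily, so the prices are averaged per day before joining. The frames `prices` and `weather`, the column name `temperature`, and all values are hypothetical.

```python
import pandas as pd

# Hypothetical hourly prices (values invented for illustration)
prices = pd.DataFrame({
    'HourUTC': ['2019-01-01 00:00', '2019-01-01 12:00',
                '2019-01-02 00:00', '2019-01-02 12:00'],
    'SpotPriceEUR': [20.0, 30.0, 40.0, 50.0],
})
prices['HourUTC'] = pd.to_datetime(prices['HourUTC'])
prices = prices.set_index('HourUTC')

# Hypothetical daily weather observations
weather = pd.DataFrame({
    'date': ['2019-01-01', '2019-01-02'],
    'temperature': [-2.5, -4.0],
})
weather['date'] = pd.to_datetime(weather['date'])
weather = weather.set_index('date')

# Aggregate hourly prices to daily means so both frames share a daily index
daily_prices = prices.resample('D').mean()

# Join on the datetime index, keeping only dates present in both frames
merged = weather.join(daily_prices, how='inner')
print(merged)
```

Averaging is only one possible aggregation. A daily maximum or median may suit the prediction task better, depending on which price behaviour is of interest.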




In [None]:
#Clean Weather Dataset

# Convert 'date' to 'datetime' and set 'date' as index
weather_df['date'] = pd.to_datetime(weather_df['date'])
weather_df.set_index('date', inplace=True)

print(weather_df.head())

In [None]:
#Clean Electricity Spot Price Dataset


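# Split the single column into multiple columns based on the semicolon delimiter
# (select the column by position; the label 0 only exists if the file was read without a header)
electricity_df = electricity_df.iloc[:, 0].str.split(';', expand=True)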

# Assign appropriate column names
electricity_df.columns = ['HourUTC', 'HourDK', 'PriceArea', 'SpotPriceDKK', 'SpotPriceEUR']

# Convert types
electricity_df['HourUTC'] = pd.to_datetime(electricity_df['HourUTC'], errors='coerce')
electricity_df['HourDK'] = pd.to_datetime(electricity_df['HourDK'], errors='coerce')
# Replace decimal commas with dots first; otherwise to_numeric coerces every price to NaN
for col in ['SpotPriceDKK', 'SpotPriceEUR']:
    electricity_df[col] = pd.to_numeric(
        electricity_df[col].str.replace(',', '.', regex=False), errors='coerce'
    )

# Drop rows with missing values
electricity_df.dropna(inplace=True)

# Check the result
print(electricity_df.head())
