# **Predicting Electricity Spot Prices Based on Weather Patterns in Nordic Countries**

In this project, I will combine weather, electricity spot price, and energy production and consumption data for Norway for the period 2017-2019.

https://www.statnett.no/en/
https://www.ncdc.noaa.gov/cdo-web/
https://www.energidataservice.dk/tso-electricity/Elspotprices

In [None]:
# Set up and libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error
from statsmodels.tsa.holtwinters import ExponentialSmoothing
import plotly.express as px

print("Libraries imported")

In [None]:
#Loading Data
weather_df = pd.read_csv('/kaggle/input/finland-norway-and-sweden-weather-data-20152019/nordics_weather.csv')
electricity_df = pd.read_csv('/kaggle/input/electricity-spot-price/Elspotprices.csv', delimiter=';')

# List of production and consumption CSV files.
production_consumption_files = ['/kaggle/input/production-and-consumption2017-2019/ProductionConsumption-2017.csv', 
                                '/kaggle/input/production-and-consumption2017-2019/ProductionConsumption-2018.csv', 
                                '/kaggle/input/production-and-consumption2017-2019/ProductionConsumption-2019.csv']

# Read the production and consumption CSV files into dataframes and concatenate them.
dfs = [pd.read_csv(file, delimiter=';') for file in production_consumption_files]
production_consumption_df = pd.concat(dfs, ignore_index=True)

print("Datasets Loaded")

In [None]:
# Basic exploration
print(weather_df.head())
print(weather_df.dtypes)
print(weather_df.isnull().sum())

print(electricity_df.head())
print(electricity_df.dtypes)
print(electricity_df.isnull().sum())

print(production_consumption_df.head())
print(production_consumption_df.dtypes)
print(production_consumption_df.isnull().sum())

Based on the initial exploration of the datasets, there are no missing values, but the data needs to be cleaned and parsed correctly, and the time zones need to be aligned.

Weather Data:
There are no missing values and the data appears to be clean.

The 'date' column is currently of type 'object'. It needs to be converted to 'datetime' and set as the index, which will allow for easier time series analysis and alignment with the other datasets.

Electricity Spot Price Data:
There are no missing values, but the dataset requires some cleaning.

The file is semicolon-separated, so the data must be loaded with the delimiter ';'.

I will remove the 'SpotPriceDKK', 'HourDK' and 'PriceArea' columns because they are redundant.

The 'SpotPriceEUR' column uses commas instead of dots as the decimal separator, which causes issues when converting the values to numbers. These values need to be converted to 'float'.

The 'HourUTC' column contains strings, which need to be converted to 'datetime' for time-based analysis, just like in the other datasets. I will set 'HourUTC' as the index.
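The decimal-comma issue described above can be handled either at load time or afterwards. The following is a minimal sketch on made-up rows, not the real file: the sample values and the 'NO2' area code are assumptions for illustration, while `sep`, `decimal` and `parse_dates` are standard `pd.read_csv` parameters.

```python
import io
import pandas as pd

# Made-up rows mimicking the layout described above; these are not real prices.
sample_csv = io.StringIO(
    "HourUTC;HourDK;PriceArea;SpotPriceDKK;SpotPriceEUR\n"
    "2019-01-01 00:00;2019-01-01 01:00;NO2;211,52;28,32\n"
    "2019-01-01 01:00;2019-01-01 02:00;NO2;-83,10;-11,15\n"
)

# Option 1: let the parser handle the delimiter and the decimal comma.
sample_df = pd.read_csv(sample_csv, sep=';', decimal=',', parse_dates=['HourUTC'])

# Option 2: repair a column that was already loaded as text.
raw_prices = pd.Series(['28,32', '-11,15'])
fixed_prices = raw_prices.str.replace(',', '.', regex=False).astype(float)

print(sample_df.dtypes)
print(fixed_prices.tolist())
```

Checking `dtypes` after loading is a quick way to confirm that the price column is numeric before any statistics are computed on it.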

Production and Consumption Data:
The files are semicolon-separated, so the data must be loaded with the delimiter ';'.
I also need to specify the correct date and time format before converting the column to datetime and setting it as the index.
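As a rough illustration of parsing with an explicit format and aligning time zones, the sketch below uses made-up rows. The day-first format string and the 'Europe/Oslo' source time zone are assumptions; the real layout and time zone of the production and consumption files should be checked before reusing this.

```python
import pandas as pd

# Hypothetical timestamps in a day-first layout, assumed to be Norwegian local time.
sample = pd.DataFrame({
    'Time': ['15.01.2019 12:00', '15.07.2019 12:00'],
    'Production': [14000, 9000],
    'Consumption': [16000, 8500],
})

# An explicit format avoids day/month ambiguity during parsing.
sample['Time'] = pd.to_datetime(sample['Time'], format='%d.%m.%Y %H:%M')

# Localize to the assumed source time zone, convert to UTC, then drop the
# tz info so the index compares cleanly with tz-naive UTC timestamps.
sample['Time'] = (
    sample['Time']
    .dt.tz_localize('Europe/Oslo')
    .dt.tz_convert('UTC')
    .dt.tz_localize(None)
)
sample = sample.set_index('Time')
print(sample)
```

Converting everything to UTC first means that the hourly rows line up with the 'HourUTC' index of the spot price data.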


In [None]:
#Clean Weather Dataset

# Convert 'date' to 'datetime' and set 'date' as index
weather_df['date'] = pd.to_datetime(weather_df['date'])
weather_df.set_index('date', inplace=True)

print(weather_df.head())
print(weather_df.dtypes)

In [None]:
#Clean Electricity Spot Price Dataset

# Convert the 'HourUTC' column to datetime
electricity_df['HourUTC'] = pd.to_datetime(electricity_df['HourUTC'])

# Set 'HourUTC' as index for the electricity data
electricity_df.set_index('HourUTC', inplace=True)

# Drop the redundant columns 'SpotPriceDKK', 'HourDK' and 'PriceArea'
electricity_df = electricity_df.drop(columns=['SpotPriceDKK', 'HourDK', 'PriceArea'])


print(electricity_df.head())
print(electricity_df.dtypes)

In [None]:
# Clean Production and Consumption Dataset

# Convert 'Time' to datetime without timezone information
production_consumption_df['Time'] = pd.to_datetime(production_consumption_df['Time'])

# Set 'Time' as index
production_consumption_df.set_index('Time', inplace=True)

# Display the first few rows and the data types
print(production_consumption_df.head())
print(production_consumption_df.dtypes)


The datasets overlap from 2017 to 2019, so I will merge them over this period for further analysis and filter the weather dataset for Norway specifically.
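One caveat for the merge: if the weather data is daily while the spot prices are hourly (an assumption about these files), an exact timestamp match keeps only the rows where the two timestamps coincide. The sketch below uses synthetic data to compare an exact match with a match on the calendar date; 'tavg' is a hypothetical weather column name.

```python
import pandas as pd

# Synthetic hourly prices (made-up values) indexed by UTC hour.
hours = pd.date_range('2019-01-01', periods=48, freq='h')
prices = pd.DataFrame({'SpotPriceEUR': range(48)}, index=hours)
prices.index.name = 'HourUTC'

# Synthetic daily weather; 'tavg' is a hypothetical column name.
days = pd.date_range('2019-01-01', periods=2, freq='D')
weather = pd.DataFrame({'tavg': [-3.5, -1.0]}, index=days)
weather.index.name = 'date'

# An exact timestamp match keeps only the midnight rows.
exact = prices.join(weather, how='inner')

# Matching on the calendar date repeats each day's weather for all 24 hours.
hourly = prices.reset_index()
hourly['date'] = hourly['HourUTC'].dt.normalize()
by_date = hourly.merge(weather, left_on='date', right_index=True, how='inner')

print(len(exact), len(by_date))
```

Comparing the row counts before and after a merge is a cheap check that the join kept the expected number of hours.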

In [None]:
# Filter the datasets to the overlapping period (2017-2019).
# The upper bound is exclusive so that every hour of 2019-12-31 is kept.
weather_df_filtered = weather_df[
    (weather_df.index >= '2017-01-01') & (weather_df.index < '2020-01-01') & (weather_df['country'] == 'Norway')
]
electricity_df_filtered = electricity_df[
    (electricity_df.index >= '2017-01-01') & (electricity_df.index < '2020-01-01')
]
production_consumption_df_filtered = production_consumption_df[
    (production_consumption_df.index >= '2017-01-01') & (production_consumption_df.index < '2020-01-01')
]

# Merge electricity and weather data
merged_df_1 = pd.merge(
    electricity_df_filtered, 
    weather_df_filtered, 
    left_on='HourUTC', 
    right_index=True, 
    how='inner'
)

# Merge the result with production_consumption_df
merged_df = pd.merge(
    merged_df_1, 
    production_consumption_df_filtered, 
    left_on='HourUTC', 
    right_index=True, 
    how='inner'
)

# Remove duplicates (if any)
merged_df = merged_df.drop_duplicates()

# Reset the index for easier manipulation
merged_df.reset_index(inplace=True)

# Display the first few rows of the merged dataframe
print(merged_df.head())
print(merged_df.dtypes)

# Outliers

Because spot prices, and to some extent production and consumption, can be volatile, it is beneficial to look for outliers that could skew the data in further analysis.


In [None]:
# Boxplot of spot price to check for outliers
plt.figure(figsize=(10, 6))
sns.boxplot(x=merged_df['SpotPriceEUR'])
plt.title("Boxplot of Electricity Spot Prices")
plt.show()

# Use the IQR rule to verify the results from the initial boxplot
Q1 = merged_df['SpotPriceEUR'].quantile(0.25)
Q3 = merged_df['SpotPriceEUR'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
outliers = merged_df[(merged_df['SpotPriceEUR'] < lower_bound) | (merged_df['SpotPriceEUR'] > upper_bound)]
print(outliers)

Based on the initial results from the boxplot and the IQR check, there are some outliers / anomalies that need more investigation. First, I want to check the negative spot prices against the production and consumption values to see whether they realistically reflect overproduction / underconsumption or whether they are indeed outliers / anomalies.

In [None]:
# Filter for negative spot prices
negative_prices = merged_df[merged_df['SpotPriceEUR'] < 0]

# Plot Spot Price vs Production
plt.figure(figsize=(10, 6))
plt.scatter(negative_prices['Production'], negative_prices['SpotPriceEUR'], color='blue', alpha=0.6, label="Negative Spot Prices")
plt.xlabel('Production (MW)', fontsize=12)
plt.ylabel('Spot Price (EUR)', fontsize=12)
plt.title('Negative Spot Prices vs Production', fontsize=14)
plt.axhline(0, color='red', linestyle='--', label='Spot Price = 0')
plt.legend()
plt.grid(True, linestyle='--', alpha=0.5)
plt.show()

# Plot Spot Price vs Consumption
plt.figure(figsize=(10, 6))
plt.scatter(negative_prices['Consumption'], negative_prices['SpotPriceEUR'], color='green', alpha=0.6, label="Negative Spot Prices")
plt.xlabel('Consumption (MW)', fontsize=12)
plt.ylabel('Spot Price (EUR)', fontsize=12)
plt.title('Negative Spot Prices vs Consumption', fontsize=14)
plt.axhline(0, color='red', linestyle='--', label='Spot Price = 0')
plt.legend()
plt.grid(True, linestyle='--', alpha=0.5)
plt.show()


Spot Price vs. Production:
The negative spot prices appear to occur at higher production levels. This could indicate an oversupply of electricity, leading to a reduction in prices (potentially even negative values), as producers might pay to offload excess electricity.

Spot Price vs. Consumption:
However, negative spot prices also seem to coincide with higher consumption levels. It is unusual for both high production and high consumption to be associated with negative spot prices, because negative electricity prices typically occur when there is an oversupply relative to demand.

Since production and consumption are both high while the spot price is still negative, this could indicate a recording error or an anomaly. To get more information, I will cross-check the values for both production and consumption where these negative prices are recorded.
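Since oversupply relative to demand is what matters, one way to frame the cross-check is to compare the gap between production and consumption in negative-price hours against all other hours. This is a minimal sketch on made-up rows; the column names follow the merged dataframe, and the 'Surplus' and 'NegativePrice' columns are introduced here for illustration only.

```python
import pandas as pd

# Made-up hourly rows; column names follow the merged dataframe above.
sample = pd.DataFrame({
    'SpotPriceEUR': [35.0, -2.0, 41.5, -0.5, 28.0],
    'Production':   [12000, 15500, 11000, 15000, 12500],
    'Consumption':  [13000, 14000, 12500, 13800, 12400],
})

# Positive surplus means more was produced than consumed in that hour.
sample['Surplus'] = sample['Production'] - sample['Consumption']
sample['NegativePrice'] = sample['SpotPriceEUR'] < 0

# Compare typical levels for negative-price hours against all other hours.
summary = sample.groupby('NegativePrice')[['Production', 'Consumption', 'Surplus']].median()
print(summary)
```

If negative prices reflect oversupply, the median surplus in the negative-price group would be expected to sit above that of the other hours.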

In [None]:
# Filtering rows where the spot price is negative
negative_spot_prices = merged_df[merged_df['SpotPriceEUR'] < 0]

# Checking the corresponding production and consumption values
negative_spot_prices[['HourUTC', 'SpotPriceEUR', 'Production', 'Consumption']]


It is evident that there are multiple instances of negative spot prices recorded alongside varying levels of electricity production and consumption.

Looking at the relationship between these negative spot prices and production/consumption:
For the negative spot price values, the corresponding production and consumption vary widely. Some rows show high levels of production (e.g., 14140 MW production with a spot price of -11,150,000 EUR), while others show moderate consumption.

This suggests that the negative spot prices in this dataset may not be caused directly by the production or consumption values. However, the extreme magnitudes (such as values of -46 million EUR) raise questions about potential data entry or parsing errors. Because the 'SpotPriceEUR' column uses decimal commas, one possibility worth checking is that some values were misread during loading.

Potential next steps:

- **Data validation:** Investigate whether these extreme negative values have been correctly recorded. This might involve verifying them against external sources or performing checks within the dataset (e.g., comparing values against expected ranges or correcting errors based on domain knowledge).
- **Trend analysis:** Explore whether the negative spot prices follow a pattern across time or seasons. Do they occur during specific times (e.g., holidays or periods of high demand)?
- **Impact of extreme values:** Analyze how these extreme spot price values influence the models and the overall analysis. If errors are confirmed, I will either remove or adjust the erroneous data points.
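The first two steps could be sketched roughly as follows, using made-up rows. The plausibility band (`PRICE_MIN`, `PRICE_MAX`) is a hypothetical assumption for illustration and not an official market limit; appropriate bounds would need to come from domain knowledge or the data provider.

```python
import pandas as pd

# Made-up rows; the second price imitates an implausibly large negative value.
sample = pd.DataFrame({
    'HourUTC': pd.to_datetime([
        '2018-05-01 03:00', '2018-05-01 04:00',
        '2018-12-25 03:00', '2019-06-10 13:00',
    ]),
    'SpotPriceEUR': [-1.2, -11150000.0, 45.0, -0.4],
})

# Hypothetical plausibility band in EUR/MWh; the limits are an assumption
# for illustration, not an official market bound.
PRICE_MIN, PRICE_MAX = -500.0, 3000.0
sample['Suspect'] = ~sample['SpotPriceEUR'].between(PRICE_MIN, PRICE_MAX)

# Count plausible negative prices by calendar month and by hour of day.
negative = sample[(sample['SpotPriceEUR'] < 0) & ~sample['Suspect']]
by_month = negative.groupby(negative['HourUTC'].dt.month).size()
by_hour = negative.groupby(negative['HourUTC'].dt.hour).size()

print(sample[sample['Suspect']])
print(by_month)
print(by_hour)
```

Separating suspect values before counting keeps likely recording errors from distorting any seasonal or hourly pattern.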
