In [1]:
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error
import plotly.graph_objects as go

from utils import utils
from algorithms import predict_model
import algorithms.flexa_battery_optimise as algorithm

# Before you continue
I built two random forest models to predict the price; the details are in the code below. Both are far from perfect. They could of course be fine-tuned, but for lack of time I have not done that here. Extra data such as load (power consumption), power generation, or holidays could also make the models more powerful, but I think what is here is enough for the purpose of a code challenge :)

An interesting finding: although model 2 has a better (lower) Mean Squared Error than model 1, using model 1 actually gives better revenue, as model 1 catches the trend of the price better.

### Read data

In [2]:
data_60 = utils.transform_df(pd.read_csv('data/Day-ahead_Prices_60min.csv')).drop(columns=['BZN|DE-LU'])

### Build the first model
A random forest model is trained on the first five months of data and tested on the last month. The only inputs to the model are the weekday and the hour.
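
To make the setup concrete, here is a minimal sketch of how the weekday and hour inputs and a time-based split could be built. It is not the actual `predict_model.split_train_and_test`: the helper names `add_calendar_features` and `chronological_split`, the 1 June cutoff, and the synthetic prices are all assumptions for illustration.

```python
import numpy as np
import pandas as pd


def add_calendar_features(df):
    """Add the two model inputs: weekday (0 = Monday) and hour of day."""
    out = df.copy()
    out["weekday"] = out["start_timestamp"].dt.weekday
    out["hour"] = out["start_timestamp"].dt.hour
    return out


def chronological_split(df, cutoff):
    """Rows before `cutoff` form the training set, the rest the test set (no shuffling)."""
    train = df[df["start_timestamp"] < cutoff]
    test = df[df["start_timestamp"] >= cutoff]
    return train, test


# Synthetic hourly prices for January to June 2022 (placeholder values, not the real data).
n_hours = 181 * 24
timestamps = pd.Timestamp("2022-01-01") + pd.to_timedelta(np.arange(n_hours), unit="h")
rng = np.random.default_rng(0)
prices = 150 + 40 * np.sin(2 * np.pi * (np.arange(n_hours) % 24) / 24) + rng.normal(0, 10, n_hours)

df = add_calendar_features(pd.DataFrame({"start_timestamp": timestamps, "spot_price": prices}))
train, test = chronological_split(df, pd.Timestamp("2022-06-01"))  # assumed cutoff

X_train, y_train = train[["weekday", "hour"]], train["spot_price"]
X_test, y_test = test[["weekday", "hour"]], test["spot_price"]
print(len(train), len(test))
```

Splitting by time rather than at random keeps every test row strictly later than every training row, which matches how the forecast would be used in practice.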

In [3]:
train_data, test_data, X_train, X_test, y_train, y_test = predict_model.split_train_and_test(data_60)
model_1 = predict_model.random_forest(X_train, y_train)
# Make predictions on the test set
y_pred = model_1.predict(X_test)
test_data_1 = test_data.assign(y_pred=y_pred)
# Evaluate the model
mse_1 = mean_squared_error(y_test, y_pred)
print('Mean Squared Error of model 1:', mse_1)

Mean Squared Error of model 1: 5942.474712048875


Plot the prediction against the actual spot price.

In [4]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=test_data_1['start_timestamp'], y=test_data_1['spot_price'], mode='lines', name='spot_price'))
fig.add_trace(go.Scatter(x=test_data_1['start_timestamp'], y=test_data_1['y_pred'], mode='lines', name='prediction'))
fig.update_layout(title='Spot Price vs. Predicted Price --- Model 1 without previous day price', xaxis_title='Start Timestamp', yaxis_title='Price')
fig.show()


### Build the second model
The second model is also a random forest, but it takes the price at the same hour on the previous day as an additional input.
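
A minimal sketch of how such a previous-day input could be constructed, assuming hourly rows with the `start_timestamp` and `spot_price` columns used above. The helper `add_previous_day_price` and the column name `price_yesterday` are hypothetical; what `data_of_yesterday=True` does internally may differ.

```python
import numpy as np
import pandas as pd


def add_previous_day_price(df):
    """Attach the spot price observed exactly 24 hours earlier as a new column."""
    lagged = df[["start_timestamp", "spot_price"]].copy()
    # Move yesterday's rows forward by one day so they line up with today's timestamps.
    lagged["start_timestamp"] = lagged["start_timestamp"] + pd.Timedelta(days=1)
    lagged = lagged.rename(columns={"spot_price": "price_yesterday"})
    # A left merge keeps every row; rows with no predecessor get NaN.
    return df.merge(lagged, on="start_timestamp", how="left")


# Three days of synthetic hourly prices 0, 1, 2, ... (placeholder values).
timestamps = pd.Timestamp("2022-06-01") + pd.to_timedelta(np.arange(72), unit="h")
prices = pd.DataFrame({"start_timestamp": timestamps, "spot_price": np.arange(72, dtype=float)})

with_lag = add_previous_day_price(prices)
print(with_lag.iloc[[0, 24, 71]])
```

Merging on the shifted timestamp, rather than shifting by 24 rows, keeps the alignment correct even if some hours are missing from the data.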

In [5]:
train_data, test_data, X_train, X_test, y_train, y_test = predict_model.split_train_and_test(data_60, data_of_yesterday=True)
model_2 = predict_model.random_forest(X_train, y_train)
# Make predictions on the test set
y_pred = model_2.predict(X_test)
test_data_2 = test_data.assign(y_pred=y_pred)
# Evaluate the model
mse_2 = mean_squared_error(y_test, y_pred)
print('Mean Squared Error of model 2:', mse_2)

Mean Squared Error of model 2: 3347.611894672622


In [6]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=test_data_2['start_timestamp'], y=test_data_2['spot_price'], mode='lines', name='spot_price'))
fig.add_trace(go.Scatter(x=test_data_2['start_timestamp'], y=test_data_2['y_pred'], mode='lines', name='prediction'))
fig.update_layout(title='Spot Price vs. Predicted Price --- Model 2 with previous day price', xaxis_title='Start Timestamp', yaxis_title='Price')
fig.show()

### Optimization
Use the prediction model to predict the next 24 hours of spot prices for every day in June, then optimize the operation of the battery on those predictions.

First, the result based on the predictions of model 1:

In [7]:
df_1 = utils.split_60_mins_data_into_15_mins(test_data_1)
result_1 = algorithm.battery_optimisation(df_1.start_timestamp, df_1.y_pred, include_revenue=False, solver='glpk').rename(columns={'spot_price': 'predicted_spot_price', 'datetime': 'start_timestamp'})
result_1 = result_1.merge(df_1[['start_timestamp', 'spot_price']], on='start_timestamp', how='left')

In [8]:
result_1[0:50]

Unnamed: 0,start_timestamp,predicted_spot_price,power,market_dispatch,opening_capacity,spot_price
0,2022-05-30 23:00:00,182.699978,0.0,0.0,0.0,227.5
1,2022-05-30 23:15:00,182.699978,0.0,0.0,0.0,227.5
2,2022-05-30 23:30:00,182.699978,0.0,0.0,0.0,227.5
3,2022-05-30 23:45:00,182.699978,0.0,0.0,0.0,227.5
4,2022-05-31 00:00:00,176.647798,0.0,0.0,0.0,200.91
5,2022-05-31 00:15:00,176.647798,0.0,0.0,0.0,200.91
6,2022-05-31 00:30:00,176.647798,0.0,0.0,0.0,200.91
7,2022-05-31 00:45:00,176.647798,0.0,0.0,0.0,200.91
8,2022-05-31 01:00:00,179.335972,0.0,0.0,0.0,183.0
9,2022-05-31 01:15:00,179.335972,0.0,0.0,0.0,183.0
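
As the table shows, each hourly price is repeated at 15-minute steps. A minimal sketch of how such a split might be done follows; `split_hourly_into_quarters` is a hypothetical stand-in, and the real `utils.split_60_mins_data_into_15_mins` may be implemented differently. The two prices are taken from the first rows of the table.

```python
import pandas as pd


def split_hourly_into_quarters(df):
    """Repeat every hourly row four times, offset by 0, 15, 30 and 45 minutes."""
    parts = []
    for minutes in (0, 15, 30, 45):
        part = df.copy()
        part["start_timestamp"] = part["start_timestamp"] + pd.Timedelta(minutes=minutes)
        parts.append(part)
    # Put the copies back into time order.
    return pd.concat(parts).sort_values("start_timestamp").reset_index(drop=True)


hourly = pd.DataFrame({
    "start_timestamp": pd.to_datetime(["2022-05-30 23:00", "2022-05-31 00:00"]),
    "spot_price": [227.5, 200.91],
})
quarter_hourly = split_hourly_into_quarters(hourly)
print(quarter_hourly)
```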


With the result of the first optimization in hand, let's use the real spot price to calculate the revenue actually earned in this case.
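
Written out, the settlement computed in the next cell is as follows. The symbols are notation introduced here: $d_t$ is the `market_dispatch` in interval $t$, $p_t$ is the real `spot_price`, and $\mathrm{MLF}$ is the factor of the same name in the code.

$$
R = \sum_t r_t, \qquad
r_t =
\begin{cases}
d_t \, p_t \,/\, \mathrm{MLF} & \text{if } d_t < 0 \\
d_t \, p_t \cdot \mathrm{MLF} & \text{if } d_t \ge 0
\end{cases}
$$

With $\mathrm{MLF} = 1$, as set below, both branches reduce to $r_t = d_t \, p_t$.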

In [9]:
MLF = 1
result_1['revenue'] = np.where(result_1.market_dispatch < 0, 
                              result_1.market_dispatch * result_1.spot_price / MLF,
                              result_1.market_dispatch * result_1.spot_price * MLF)
revenue_sum = result_1['revenue'].sum()
print(f"The revenue is {round(revenue_sum, 2)} Euro.")
print("As a comparison, the revenue in question 1 (with perfect prediction) is 4365.44 Euro")

The revenue is 3784.16 Euro.
As a comparison, the revenue in question 1 (with perfect prediction) is 4365.44 Euro


Now let's look at the optimization based on model 2.

In [10]:
df_2 = utils.split_60_mins_data_into_15_mins(test_data_2)
result_2 = algorithm.battery_optimisation(df_2.start_timestamp, df_2.y_pred, include_revenue=False, solver='glpk').rename(columns={'spot_price': 'predicted_spot_price', 'datetime': 'start_timestamp'})
result_2 = result_2.merge(df_2[['start_timestamp', 'spot_price']], on='start_timestamp', how='left')

In [17]:
MLF = 1
result_2['revenue'] = np.where(result_2.market_dispatch < 0, 
                              result_2.market_dispatch * result_2.spot_price / MLF,
                              result_2.market_dispatch * result_2.spot_price * MLF)
revenue_sum = result_2['revenue'].sum()
print(f"The revenue is {round(revenue_sum, 2)} Euro.")
print("As a comparison, the revenue in question 1 (with perfect prediction) is 4365.44 Euro")
print("And as another comparison, a bad model (model 1, bad in the sense of MSE) can actually give you better revenue.")
print("I think the reason is that model 1 catches the trend of the price better. It tells you better when the price is likely to go high or low.")

The revenue is 840.57 Euro.
As a comparison, the revenue in question 1 (with perfect prediction) is 4365.44 Euro
And as another comparison, a bad model (model 1, bad in the sense of MSE) can actually give you better revenue.
I think the reason is that model 1 catches the trend of the price better. It tells you better when the price is likely to go high or low.
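
A toy example of why this can happen, under simplifying assumptions that are mine and not part of the optimisation used above: the battery does a single 1 MWh buy-then-sell cycle, chosen by brute force on the forecast and settled at the actual prices. All prices and the helper names `plan_single_cycle`, `realised_revenue`, and `mse` are made up for illustration.

```python
import numpy as np


def plan_single_cycle(forecast):
    """Return the (buy, sell) indices, buy before sell, with the largest forecast spread."""
    best = (0, 0)
    for buy in range(len(forecast)):
        for sell in range(buy + 1, len(forecast)):
            if forecast[sell] - forecast[buy] > forecast[best[1]] - forecast[best[0]]:
                best = (buy, sell)
    return best


def realised_revenue(forecast, actual):
    """Plan on the forecast, then settle 1 MWh at the actual prices."""
    buy, sell = plan_single_cycle(forecast)
    return float(actual[sell] - actual[buy])


def mse(forecast, actual):
    """Mean squared error between forecast and actual prices."""
    return float(np.mean((np.asarray(forecast) - np.asarray(actual)) ** 2))


actual = np.array([50.0, 40.0, 60.0, 100.0, 120.0, 90.0])
# Right shape, but every value is 40 too low: large MSE, correct ordering of hours.
shape_forecast = actual - 40.0
# Values close to the actual prices, but the cheapest and priciest hours are misplaced.
close_forecast = np.array([45.0, 48.0, 60.0, 108.0, 105.0, 100.0])

for name, forecast in [("shape", shape_forecast), ("close", close_forecast)]:
    print(name, round(mse(forecast, actual), 2), realised_revenue(forecast, actual))
# shape 1600.0 80.0
# close 79.67 50.0
```

In this toy case the offset forecast has a much larger MSE but still picks the true cheapest and most expensive hours, so it earns the full spread, while the forecast with the smaller MSE trades at the wrong hours.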
