# Model development

This notebook is intended for the development of price forecasting models.

# 1. A Simple Baseline Model

In our exploratory data analysis (see `exploratory_analysis.ipynb`), we inspected a number of variables and their relationship with electricity prices using actual historical data. Most of the variables we looked at appeared to have a meaningful connection to electricity prices, so it would be reasonable to use them in a price forecasting model. Some are redundant, however: we don't need to include both wind speed and wind generation, for example. We also saw in the autocorrelation plots that electricity prices at time $t$ were related to prices at earlier times.

We also need to consider that the data available at forecast time differ from the historical data used in the exploratory analysis: actual values at time $t$ are not yet known, so the model has to rely on forecasts and on earlier actual values. Forecasts of electricity demand and of wind and solar generation can be collected from the National Energy System Operator (NESO) website. Prices of futures contracts on coal and gas are available from commodities markets, but this type of data is often behind paywalls.

With the above considerations, we can propose an electricity pricing model of the form

$$
P^{\mathrm{el}}_{t} = \sum^{\ell}_{k=1} \alpha_k P^{\mathrm{el}}_{t - k} + g(t, \mathbf{X}^f_{t}, \mathbf{Y}_{t-1}; \boldsymbol{\theta}) + \epsilon_t,
$$

where
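- $P^{\mathrm{el}}_t$: price of electricity at time $t$
- $\alpha_k$: autoregressive coefficient on the price at lag $k$, for $k = 1, \dots, \ell$
- $\ell$: number of price lags included
- $g$: some function parameterised by $\boldsymbol{\theta}$
- $\mathbf{X}^f_t$: a vector of forecasted exogenous variables for time $t$
- $\mathbf{Y}_{t-1}$: a vector of actual exogenous variables observed at time $t-1$
- $\epsilon_t$: noise term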
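
The time indices in the model above determine how the regressors are aligned: lagged prices and actual exogenous variables are shifted back in time, while forecasted variables are used at the target time itself. The following is a minimal sketch of that alignment using `pandas`. The function name `build_arx_features` and the column names (`price`, `demand_forecast`, `gas_price`) are hypothetical placeholders, and the data are random numbers for illustration only.

```python
import numpy as np
import pandas as pd


def build_arx_features(df, n_lags, forecast_cols, actual_cols, price_col="price"):
    """Assemble regressors for each time step t (hypothetical helper).

    Column names are placeholders, not the names used in this project.
    """
    features = pd.DataFrame(index=df.index)
    # Autoregressive terms P_{t-k} for k = 1, ..., n_lags
    for k in range(1, n_lags + 1):
        features[f"{price_col}_lag{k}"] = df[price_col].shift(k)
    # Forecasted exogenous variables X^f_t are aligned with the target time t
    for col in forecast_cols:
        features[col] = df[col]
    # Actual exogenous variables Y_{t-1} are only known up to the previous step
    for col in actual_cols:
        features[f"{col}_lag1"] = df[col].shift(1)
    target = df[price_col]
    # Drop the initial rows where lagged values are undefined
    valid = features.notna().all(axis=1)
    return features[valid], target[valid]


# Tiny synthetic example (random values, for illustration only)
rng = np.random.default_rng(0)
n = 10
toy = pd.DataFrame({
    "price": rng.normal(50, 5, n),
    "demand_forecast": rng.normal(30, 2, n),
    "gas_price": rng.normal(80, 3, n),
})
X, y = build_arx_features(
    toy, n_lags=2, forecast_cols=["demand_forecast"], actual_cols=["gas_price"]
)
print(X.columns.tolist())
print(X.shape, y.shape)
```

The first `n_lags` rows are dropped because their lagged prices are undefined, so the feature matrix has `n - n_lags` rows.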

Currently available forecasted exogenous variables are:
- $D^f_t$: forecasted electricity demand 
- $W^f_t$: forecasted wind generation 
- $S^f_t$: forecasted solar generation

The actual exogenous variables we will use are:
- Natural gas prices
- The electricity generation mix excluding renewables (wind and solar)

The non-renewable generation mix term is included because generation from sources such as nuclear and biomass varies slowly, so its most recent actual value remains informative about conditions at time $t$.

The proposed model is an autoregressive model with exogenous variables (ARX). For simplicity, we will use a function $g$ that depends linearly on the exogenous variables.
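
With a linear $g$, the model is linear in all of its parameters, so one simple way to estimate it is ordinary least squares on a design matrix of lagged prices and exogenous regressors. The sketch below illustrates this on synthetic data. It is not necessarily the fitting procedure used later in this notebook, and it makes several assumptions: $g$ is taken to be an intercept plus a weighted sum of two generic regressors with no explicit dependence on $t$, and all coefficient values and variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
n_steps, n_lags = 2000, 2

# Made-up "true" parameters used to simulate data (assumptions for this sketch)
alpha_true = np.array([0.5, 0.2])   # autoregressive coefficients alpha_1, alpha_2
beta_true = np.array([1.5, -0.8])   # linear weights inside g
intercept_true = 3.0                # constant term inside g

# Two generic regressors standing in for the stacked (X^f_t, Y_{t-1})
exog = rng.normal(size=(n_steps, 2))
noise = rng.normal(scale=0.1, size=n_steps)

# Simulate prices from the ARX recursion
price = np.zeros(n_steps)
for t in range(n_lags, n_steps):
    recent = price[t - n_lags:t][::-1]  # [P_{t-1}, P_{t-2}]
    price[t] = intercept_true + alpha_true @ recent + beta_true @ exog[t] + noise[t]

# Design matrix: [1, P_{t-1}, ..., P_{t-n_lags}, exogenous regressors]
rows = np.arange(n_lags, n_steps)
lagged = np.column_stack([price[rows - k] for k in range(1, n_lags + 1)])
design = np.column_stack([np.ones(rows.size), lagged, exog[rows]])

# Ordinary least squares estimate of all parameters at once
coef, *_ = np.linalg.lstsq(design, price[rows], rcond=None)
intercept_hat = coef[0]
alpha_hat = coef[1:1 + n_lags]
beta_hat = coef[1 + n_lags:]

print("intercept:", intercept_hat)
print("alpha:", alpha_hat)
print("beta:", beta_hat)
```

On data simulated this way the estimates should land close to the made-up true values. With real price data, the lag count $\ell$ and the choice of regressors would need to be selected and validated separately.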