# QBUS3850 Lab 1 Tasks

This tutorial will cover reading in data using pandas, understanding how dates and times work in Python and implementing an expanding window. The dataset that we will use is the electricity dataset from lectures (this can be found on Canvas). We will evaluate forecasts from simple exponential smoothing, Holt's method and the Holt Winters' additive method.

## Data and data types

1. Read the data from the file into a variable called `df`. 
2. Check the type of each variable by running `df.dtypes`. Is the variable `SETTLEMENTDATE` a datetime or an object? 
3. If it is an object, convert it to a datetime.

The function `read_csv()` can be used to read in the data. Make sure that the file `electricity.csv` is in your working directory; otherwise, provide the full path to the file.

In [None]:
import pandas as pd
# TODO: read the dataset into a dataframe called df
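
A minimal sketch of the read step, assuming a CSV that contains the `SETTLEMENTDATE` and `REGION` columns mentioned in this lab. The third column name (`TOTALDEMAND`), the region code and all values are made up for illustration, and an in-memory string stands in for the real file so that the example runs on its own.

```python
import io
import pandas as pd

# Hypothetical stand-in for electricity.csv: the TOTALDEMAND column,
# the region code and all values are illustrative assumptions.
csv_text = """REGION,SETTLEMENTDATE,TOTALDEMAND
NSW1,2021/04/01 00:30:00,7000.5
NSW1,2021/04/01 01:00:00,6800.2
NSW1,2021/04/01 01:30:00,6650.9
"""

# With the real file this would be pd.read_csv('electricity.csv')
df_demo = pd.read_csv(io.StringIO(csv_text))

# Text columns are read as strings, not datetimes, unless told otherwise
print(df_demo.dtypes)
```

The dataframe is called `df_demo` here so that it does not overwrite the `df` used in the rest of the lab.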

By default, `read_csv()` has read in `SETTLEMENTDATE` as the same type as `REGION`, i.e. as a string and not a datetime. Sometimes we can add the argument `parse_dates=True`, but that will not work in this case. Instead, to coerce `SETTLEMENTDATE` to a datetime, run the following:

In [None]:
df['SETTLEMENTDATE'] = pd.to_datetime(df['SETTLEMENTDATE'])
df.dtypes

Note that the data type of `SETTLEMENTDATE` is now a datetime.
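
Once the column is a datetime, it supports comparisons and arithmetic with `pd.Timestamp` and `pd.Timedelta`. The sketch below uses a synthetic half-hourly series (the year 2021, the column `y` and its values are assumptions) to show one way a training window and the following 12 half-hour periods could be selected.

```python
import numpy as np
import pandas as pd

# Synthetic half-hourly timestamps; the year and the values are assumptions
times = pd.date_range('2021-04-01 00:30', '2021-04-30 00:00', freq='30min')
demo = pd.DataFrame({'SETTLEMENTDATE': times,
                     'y': np.arange(len(times), dtype=float)})

train_end = pd.Timestamp('2021-04-25 00:00')
step = pd.Timedelta(minutes=30)
hmax = 12

# Boolean masks built from datetime comparisons
train = demo[demo['SETTLEMENTDATE'] <= train_end]
test = demo[(demo['SETTLEMENTDATE'] > train_end) &
            (demo['SETTLEMENTDATE'] <= train_end + hmax * step)]

print(len(train), len(test))
```

Adding `step` to `train_end` moves the boundary forward by one half hour, which is the basic operation behind an expanding window.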

Let's define a helper function that trains and evaluates a given model.

In [None]:
def evaluate_model(model, hmax, test_ses):
    """
    A helper function that fits a given model, forecasts hmax steps
    ahead, and returns the squared errors against a given test series.
    """
    # Fit the model
    fit_ses = model.fit()
    # Make forecasts for horizons 1, ..., hmax
    fc_ses = fit_ses.forecast(hmax)
    # Compute the squared error at each horizon
    return np.square(fc_ses - test_ses)

### Forecast exercise

1. Using data from April 1, 00:30 to April 25, 00:00 as training data, generate forecasts for the next 6 hours (12 half-hour periods) using:
  - Simple Exponential Smoothing
  - Holt's Method
  - Holt Winters' additive method
2. Compute the squared error (i.e. $(y_{t+h}-\hat{y}_{t+h})^2$) for each method at each horizon


In [None]:
#Switch off warnings
import warnings
warnings.simplefilter('ignore')

#datetime needed to manipulate dates
import datetime 
#numpy needed to work with vectors
import numpy as np
#statsmodels needed for models
from statsmodels.tsa.holtwinters import SimpleExpSmoothing, Holt, ExponentialSmoothing

# TODO: implement the forecasts for each of the three methods

Some things to notice:

- Holt Winters has the lowest squared error at one step ahead.
- Holt has the lowest squared error 12 steps ahead.
- Simple exponential smoothing has the worst squared errors at all horizons.
- None of this is conclusive, since we are only looking at forecasts made from a single point in time.

3. Repeat the same exercise but for an expanding window that expands one half hour at a time. Do this over 192 windows (i.e. four days). This will take a minute or two to run.
4. Compute the Root Mean Square Error (RMSE), given by $RMSE=\sqrt{\frac{1}{192}\sum (y_{t+h}-\hat{y}_{t+h})^2}$, over all windows for each forecasting horizon.
5. Which method is the best at a one-step ahead horizon?
6. Which method is the best at a twelve-step ahead horizon?
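
The mechanics of an expanding window can be sketched independently of the forecasting method. The example below uses synthetic data and a naive "repeat the last value" forecast as a stand-in for the three smoothing methods; the data, the initial training length `n_train0` and the variable names are assumptions, not the lab solution.

```python
import numpy as np

# Synthetic data; a naive forecast stands in for the smoothing methods
# so that only the window mechanics are shown
rng = np.random.default_rng(1)
y = 100 + np.cumsum(rng.normal(0, 1, 500))

n_train0 = 200   # length of the first training window (assumed)
n_wind = 192     # number of windows
hmax = 12        # longest forecast horizon

sse = np.zeros(hmax)  # running sum of squared errors per horizon
for w in range(n_wind):
    end = n_train0 + w                 # training window grows by one period
    train = y[:end]                    # expanding: always starts at index 0
    test = y[end:end + hmax]           # the next hmax observations
    fc = np.repeat(train[-1], hmax)    # naive forecast for h = 1, ..., hmax
    sse += np.square(fc - test)

# Divide by the number of windows, then take the square root
rmse = np.sqrt(sse / n_wind)
print(rmse)
```

Accumulating the squared errors in a vector of length `hmax` gives one RMSE per horizon once the loop finishes.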


In [None]:
#Switch off warnings
import warnings
warnings.simplefilter('ignore')

# TODO: define the expanding window (datetime_list, n_wind, hmax)
# and initialise rmse_ses, rmse_holt and rmse_hw

#Loop

for i in datetime_list:
    "TODO: implement the evaluation of each method over the expanding window"
    


rmse_ses = np.sqrt(rmse_ses/n_wind)
rmse_holt = np.sqrt(rmse_holt/n_wind)
rmse_hw = np.sqrt(rmse_hw/n_wind)

#Initialise Results data frame
res = pd.DataFrame({
    'h': range(1, hmax+1),
    'SES': rmse_ses,
    'Holt': rmse_holt,
    'HW': rmse_hw,
})

print(res)

- The best method one-step ahead is the Holt Winters' additive method.
- The best method twelve-steps ahead is Simple Exponential Smoothing.
- Holt performs reasonably well at short horizons but very poorly at medium to long horizons.
- Another thing to note is that we are not using all of the data. If we used all available data, there would be some windows towards the end for which longer-horizon forecasts cannot be evaluated. This is not a major problem, but note that the denominator in the RMSE would then differ across horizons (unlike here, where we can divide by 192 at all horizons).
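
The point about differing denominators can be illustrated with a small sketch. It uses synthetic data and a naive stand-in forecast (both assumptions) and stores the squared errors in a windows-by-horizons array, leaving `NaN` wherever no actual value is available.

```python
import numpy as np

rng = np.random.default_rng(2)
y = 100 + np.cumsum(rng.normal(0, 1, 60))
n_train0, hmax = 40, 12

# One row per window, one column per horizon; NaN where no actual exists
n_wind = len(y) - n_train0                 # use every possible forecast origin
sq_err = np.full((n_wind, hmax), np.nan)
for w in range(n_wind):
    end = n_train0 + w
    test = y[end:end + hmax]               # shorter than hmax near the end
    fc = np.repeat(y[end - 1], len(test))  # naive stand-in forecast
    sq_err[w, :len(test)] = np.square(fc - test)

# The number of evaluated forecasts now depends on the horizon
counts = np.sum(~np.isnan(sq_err), axis=0)
rmse = np.sqrt(np.nansum(sq_err, axis=0) / counts)
print(counts)
```

Here the one-step-ahead RMSE averages over more windows than the twelve-step-ahead RMSE, which is exactly the situation described above.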

7. What is a major shortcoming of this evaluation? Hint: What happens on April 25th in Australia?

April 25th is Anzac Day, a major public holiday in Australia. In 2021 it fell on a Sunday, meaning the holiday was moved to the 26th. Therefore the evaluation period includes a public holiday, and these days are typically idiosyncratic.
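
The calendar claim is easy to check with pandas. The year 2021 is taken from the text above; the dictionary name `day_names` is only for illustration.

```python
import pandas as pd

# Day of the week for Anzac Day and the following day in 2021
day_names = {day: pd.Timestamp(day).day_name()
             for day in ['2021-04-25', '2021-04-26']}
print(day_names)
```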

## Additional Exercises (For those who finish quickly or as subsequent homework)

Modify the code above:

  1. To use a rolling rather than expanding window.
  2. To roll the window forward by 4 hours rather than half an hour.
  3. To use the mean absolute error $MAE=\frac{1}{192}\sum|y_{t+h}-\hat{y}_{t+h}|$ as an evaluation criterion.
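
As a starting point for these modifications, the sketch below shows how the slicing changes for a rolling window that moves forward 4 hours (8 half-hour periods) at a time, with MAE in place of RMSE. It uses synthetic data and a naive stand-in forecast; the window width of 200 periods and all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
y = 100 + np.cumsum(rng.normal(0, 1, 2000))

width = 200   # fixed length of the rolling training window (assumed)
step = 8      # 8 half-hour periods = 4 hours
n_wind = 192  # number of windows
hmax = 12     # longest forecast horizon

sae = np.zeros(hmax)  # running sum of absolute errors per horizon
for w in range(n_wind):
    end = width + w * step
    train = y[end - width:end]       # rolling: the start moves with the end
    test = y[end:end + hmax]
    fc = np.repeat(train[-1], hmax)  # naive stand-in forecast
    sae += np.abs(fc - test)

mae = sae / n_wind
print(mae)
```

Compared with an expanding window, the only change to the training slice is that its start index advances together with its end index, so every window has the same length.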