## Evaluation and Calibration of the Empirical Models in TELL

This notebook is meant to be a landing page for exploring the empirical models that underpin `tell`. It will set up the meteorological variables we use and analyze different aspects of the model development and evaluation.

In [None]:
# Start by importing the TELL package and the supporting libraries used in this notebook:
import os 
import tell

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from pandas import DataFrame
from glob import glob
from datetime import datetime


In [None]:
# Identify the parent of the current working directory, the "tell_data" directory where the data is stored, and the image output directory:
current_dir = os.path.dirname(os.getcwd())
tell_data_dir = os.path.join(current_dir, r'tell_data')
tell_image_dir = os.path.join(tell_data_dir, r'outputs', r'visualizations')


### Variable Correlations

The first thing we'll look at is the relationship between the predictive variables in `tell` and the historical electricity demand for different Balancing Authorities (BAs). The function below plots the correlation between the different variables in the historical observed data that underpins `tell`. In general, demand is fairly well correlated with most of the meteorological variables, with the strongest dependency being on 2-m air temperature (T2). The strength of the correlations varies significantly from BA to BA. Demand in some BAs is closely tied to variations in weather (e.g., NEVP, ERCO, FPL). In more moderate climates (e.g., SCL, BPAT) the relationships are weaker. This pattern is useful for interpreting the performance of the empirical models in `tell`.
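To make the idea of a variable-to-demand correlation concrete, here is a minimal sketch that computes Pearson correlations with `pandas` on synthetic data. It does not use the `tell` plotting function or any real `tell` input files. The column names (`Demand`, `T2`, `WSPD`), the linear temperature dependence, and all numeric values are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic hourly data for one hypothetical BA (illustration only, not TELL data):
rng = np.random.default_rng(seed=0)
n_hours = 24 * 365

# Assumed 2-m air temperature (K) with a seasonal cycle plus noise:
t2 = 290 + 10 * np.sin(np.linspace(0, 2 * np.pi, n_hours)) + rng.normal(0, 2, n_hours)

# Assumed wind speed (m/s), generated independently of demand:
wspd = rng.gamma(shape=2.0, scale=2.0, size=n_hours)

# Assumed demand (MW) that rises linearly with temperature, plus noise:
demand = 1000 + 25 * (t2 - 290) + rng.normal(0, 50, n_hours)

sample_df = pd.DataFrame({'Demand': demand, 'T2': t2, 'WSPD': wspd})

# Pearson correlation of each predictor with demand:
corr_with_demand = sample_df.corr(method='pearson')['Demand'].drop('Demand')
print(corr_with_demand.round(2))
```

In this toy setup the temperature column correlates strongly with demand and the independent wind column does not, which mirrors the contrast between weather-sensitive and weather-insensitive BAs described above. Real demand often responds to temperature nonlinearly, so a linear correlation is only a first look.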

In [None]:
# Plot the correlation between predictive variables in TELL and the historical demand in a given BA. Note that if you set the 'ba_to_plot' variable to "All" the function 
# will compute and plot the average correlation across all BAs:
#tell.plot_ba_variable_correlations(ba_to_plot = 'NEVP', 
                                   #data_input_dir = tell_data_dir,
                                   #image_output_dir = tell_image_dir,
                                   #image_resolution = 150,
                                   #save_images = False)


In [None]:
# Train the empirical models using the standard settings:
prediction_df, validation_df = tell.train_batch(target_region_list = tell.get_balancing_authority_to_model_dict().keys(),
                                                data_dir = os.path.join(tell_data_dir, r'outputs', r'compiled_historical_data'),
                                                n_jobs = -1)

# Alternatively, uncomment the lines below to run the MLP training step for a single BA (i.e., "region"):
#prediction_df, validation_df = tell.train(region = 'PJM',
                                          #data_dir = os.path.join(tell_data_dir, r'outputs', r'compiled_historical_data'))

# View the validation dataframe that contains error statistics for the trained models:
validation_df


In [None]:
# Plot the statistical performance (e.g., RMS_ABS, RMS_NORM, MAPE, or R2) of the predictive models across all the BAs in TELL:
tell.plot_mlp_summary_statistics(validation_df, 
                                 image_output_dir = tell_image_dir,
                                 image_resolution = 150,
                                 save_images = True)
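For readers who want to see what statistics like these measure, the sketch below computes them with `numpy` from a small made-up set of observed and predicted values. The formulas are common textbook definitions. Whether `tell` uses exactly these (for example, normalizing the RMS error by the mean observed demand, or reporting MAPE as a fraction rather than a percentage) is an assumption, and `summarize_errors` is a hypothetical helper, not part of the `tell` API.

```python
import numpy as np

def summarize_errors(observed, predicted):
    """Return common error statistics for paired observed/predicted values.

    Hypothetical helper for illustration; definitions are assumptions and
    may differ from those used inside TELL.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual = predicted - observed

    # Root-mean-squared error in the units of the data:
    rms_abs = np.sqrt(np.mean(residual ** 2))
    # RMS error normalized by the mean observed value (assumed normalization):
    rms_norm = rms_abs / np.mean(observed)
    # Mean absolute percentage error, expressed as a fraction:
    mape = np.mean(np.abs(residual / observed))
    # Coefficient of determination:
    r2 = 1 - np.sum(residual ** 2) / np.sum((observed - observed.mean()) ** 2)

    return {'RMS_ABS': rms_abs, 'RMS_NORM': rms_norm, 'MAPE': mape, 'R2': r2}

# Made-up demand values (MW) for illustration:
example_stats = summarize_errors(observed=[100, 200, 300, 400],
                                 predicted=[110, 190, 310, 390])
print(example_stats)
```

Normalized statistics such as RMS_NORM and MAPE are the more useful ones when comparing BAs of very different sizes, since an absolute error of a few hundred MW means different things for a large and a small BA.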
