# Train and Run the Underlying MLP Models in TELL Using Evolving Time Windows

This notebook is the core tool for training and running the evolving sets of multilayer perceptron (MLP) models used in this experiment. It assumes that the base Total ELectricity Loads (TELL) model has already been installed and the datasets updated to include the most recent data.

In [7]:
# Start by importing the TELL package and information about your operating system:
import os 
import tell
import yaml

import pandas as pd


## Set the Directory Structure


In [5]:
# Set the input and output directories used throughout this notebook:
cleaned_ba_data_output_dir = '/Users/burl878/Documents/Code/code_repos/burleyson-etal_2025_ldrd/data/cleaned_historical_data/'
ba_to_process_input_dir = '/Users/burl878/Documents/Code/code_repos/burleyson-etal_2025_ldrd/data/'
weather_data_input_dir = '/Users/burl878/Documents/Code/code_repos/tell/tell/tell_data/sample_forcing_data/historical_weather'
model_output_directory = '/Users/burl878/Documents/Code/code_repos/burleyson-etal_2025_ldrd/data/trained_mlp_models/'
prediction_output_directory = '/Users/burl878/Documents/Code/code_repos/burleyson-etal_2025_ldrd/data/mlp_projections/'
composite_output_directory = '/Users/burl878/Documents/Code/code_repos/burleyson-etal_2025_ldrd/data/composite_projections/'


## Set the List of Balancing Authorities to Analyze

The balancing authorities (BAs) used in this analysis are listed in a master file, `balancing_authority_modeled.yml`, stored in the `/data` directory.

In [8]:
# Read the yml file into a dictionary:
with open((ba_to_process_input_dir + 'balancing_authority_modeled.yml'), 'r') as yml:
    ba_list = yaml.safe_load(yml)

# Extract the BA codes (the dictionary keys) into a list:
bas = list(ba_list.keys())

# Return the list of BAs to process/plot:
bas


['AZPS', 'BPAT', 'CISO', 'ERCO', 'FPL', 'ISNE', 'PJM', 'SWPP']

## MLP Model Training

The MLP models underpinning TELL use temporal variations in weather to project hourly demand. More information about this approach is available in the MLP section of the `tell` [User Guide](https://immm-sfa.github.io/tell/user_guide.html). The default settings for the MLP training steps are stored in the `mlp_settings.yml` file in the data folder of the `tell` repository. By default, the MLP models are trained on data from 2016-2018 and evaluated using data from 2019. The training and evaluation windows can be changed by passing the `start_time`, `end_time`, and `split_datetime` parameters to the `tell.train` function. This workflow keeps TELL's default MLP hyperparameters (e.g., hidden layer sizes and maximum number of iterations), although they could also be overridden through keyword arguments to `tell.train`.
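
As a rough illustration of how the three window parameters relate, the sketch below partitions an hourly index into a training period and an evaluation period. It assumes that hours up to and including `split_datetime` are used for training and that later hours are used for evaluation. Whether `tell.train` treats the boundary hour exactly this way is an assumption here, not something taken from the TELL source.

```python
import pandas as pd

# Window parameters in the same string format passed to tell.train in this notebook:
start_time = '2016-01-01 00:00:00'
end_time = '2018-12-31 23:00:00'
split_datetime = '2017-12-31 23:00:00'

# Build an hourly index spanning the full window:
hours = pd.date_range(start=start_time, end=end_time, freq='h')

# Assumed convention: train on hours <= split, evaluate on hours > split.
split = pd.Timestamp(split_datetime)
train_hours = hours[hours <= split]
eval_hours = hours[hours > split]

print('Training:  ', len(train_hours), 'hours from', train_hours[0], 'to', train_hours[-1])
print('Evaluation:', len(eval_hours), 'hours from', eval_hours[0], 'to', eval_hours[-1])
```

With these values the window covers two training years followed by one evaluation year.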

In [9]:
# For more information about the training of predictive models you can call the help function:
help(tell.train)


Help on function train in module tell.mlp_train:

train(region: str, data_dir: str, **kwargs)
    Generate predictions for MLP model for a target region from an input CSV file.
    
    :param region:                      Indicating region / balancing authority we want to train and test on.
                                        Must match with string in CSV files.
    :type region:                       str
    
    :param data_dir:                    Full path to the directory that houses the input CSV files.
    :type data_dir:                     str
    
    :param mlp_hidden_layer_sizes:      The ith element represents the number of neurons in the ith hidden layer.
    :type mlp_hidden_layer_sizes:       Optional[int]
    
    :param mlp_max_iter:                Maximum number of iterations. The solver iterates until convergence
                                        (determined by ‘tol’) or this number of iterations. For stochastic solvers
                                     

In [13]:
# Run the MLP training step for a single BA to get a feel for the functionality:
prediction_df, validation_df = tell.train(region = 'CISO',
                                          data_dir = cleaned_ba_data_output_dir,
                                          start_time = '2018-01-01 00:00:00',
                                          end_time = '2020-12-31 23:00:00',
                                          split_datetime = '2019-12-31 23:00:00',
                                          save_model = True,
                                          model_output_directory = model_output_directory + 'MLP3/')

# View the head of the prediction dataframe that contains the time-series of projected load in the evaluation year:
display(prediction_df.head(10))

# View the validation dataframe that contains error statistics for the trained model:
validation_df


Unnamed: 0,datetime,predictions,ground_truth,region
0,2020-01-01 00:00:00,19558.85504,22497.0,CISO
1,2020-01-01 01:00:00,24234.389987,24297.0,CISO
2,2020-01-01 02:00:00,24867.025706,26918.0,CISO
3,2020-01-01 03:00:00,25423.816113,26849.0,CISO
4,2020-01-01 04:00:00,25082.532154,25968.0,CISO
5,2020-01-01 05:00:00,24106.800709,25044.0,CISO
6,2020-01-01 06:00:00,23192.030982,24185.0,CISO
7,2020-01-01 07:00:00,22227.265649,23305.0,CISO
8,2020-01-01 08:00:00,21233.518247,22467.0,CISO
9,2020-01-01 09:00:00,20275.718493,21810.0,CISO


Unnamed: 0,BA,RMS_ABS,RMS_NORM,MAPE,R2
0,CISO,1990.201611,0.080216,0.058195,0.859668
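
The columns of the validation dataframe are summary error statistics. The sketch below shows one plausible way to compute statistics of this kind from the `predictions` and `ground_truth` columns, using only the ten hours displayed above. The formulas are assumptions (in particular, `RMS_NORM` is taken to be the RMS error divided by the mean observed load, and `MAPE` is expressed as a fraction) and may differ from what TELL computes internally. Because only ten hours are used, the values will not match the full-year statistics.

```python
import numpy as np

# First ten hours of CISO predictions and observations copied from the table above:
predictions = np.array([19558.85504, 24234.389987, 24867.025706, 25423.816113, 25082.532154,
                        24106.800709, 23192.030982, 22227.265649, 21233.518247, 20275.718493])
ground_truth = np.array([22497.0, 24297.0, 26918.0, 26849.0, 25968.0,
                         25044.0, 24185.0, 23305.0, 22467.0, 21810.0])

# Hourly error (projected minus observed load):
errors = predictions - ground_truth

# Assumed definitions of the four statistics:
rms_abs = np.sqrt(np.mean(errors ** 2))        # root-mean-squared error
rms_norm = rms_abs / np.mean(ground_truth)     # RMS error normalized by mean observed load
mape = np.mean(np.abs(errors / ground_truth))  # mean absolute percentage error (fraction)
r2 = 1.0 - np.sum(errors ** 2) / np.sum((ground_truth - np.mean(ground_truth)) ** 2)

print('RMS_ABS =', round(rms_abs, 2))
print('RMS_NORM =', round(rms_norm, 4))
print('MAPE =', round(mape, 4))
print('R2 =', round(r2, 4))
```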


In [15]:
# Run the training iteratively across all training windows and BAs:
for model in ['MLP1', 'MLP2', 'MLP3', 'MLP4', 'MLP5', 'MLP6']:

    # Set the training and evaluation periods for the specific model training window:
    if model == 'MLP1':
        start_time_model = '2016-01-01 00:00:00'
        end_time_model = '2018-12-31 23:00:00'
        split_datetime_model = '2017-12-31 23:00:00'
    elif model == 'MLP2':
        start_time_model = '2017-01-01 00:00:00'
        end_time_model = '2019-12-31 23:00:00'
        split_datetime_model = '2018-12-31 23:00:00'
    elif model == 'MLP3':
        start_time_model = '2018-01-01 00:00:00'
        end_time_model = '2020-12-31 23:00:00'
        split_datetime_model = '2019-12-31 23:00:00'
    elif model == 'MLP4':
        start_time_model = '2019-01-01 00:00:00'
        end_time_model = '2021-12-31 23:00:00'
        split_datetime_model = '2020-12-31 23:00:00'
    elif model == 'MLP5':
        start_time_model = '2020-01-01 00:00:00'
        end_time_model = '2022-12-31 23:00:00'
        split_datetime_model = '2021-12-31 23:00:00'
    elif model == 'MLP6':
        start_time_model = '2021-01-01 00:00:00'
        end_time_model = '2023-12-31 23:00:00'
        split_datetime_model = '2022-12-31 23:00:00'

    # Create the model output directory name:
    output_directory = (model_output_directory + model + '/')
  
    # Create the output directory if it does not already exist:
    os.makedirs(output_directory, exist_ok=True)

    # Loop over the eight BAs used in this LDRD analysis:
    for ba in bas:
        
        # Run the MLP training for that BA and model training period:
        prediction_df, validation_df = tell.train(region = ba,
                                                  data_dir = cleaned_ba_data_output_dir,
                                                  start_time = start_time_model,
                                                  end_time = end_time_model,
                                                  split_datetime = split_datetime_model,
                                                  save_model = True,
                                                  model_output_directory = output_directory)

        # Print the model and BA combination to monitor the progress:
        print('Model = ', model, ', BA = ', ba)


Model =  MLP1 , BA =  AZPS
Model =  MLP1 , BA =  BPAT
Model =  MLP1 , BA =  CISO
Model =  MLP1 , BA =  ERCO
Model =  MLP1 , BA =  FPL
Model =  MLP1 , BA =  ISNE
Model =  MLP1 , BA =  PJM
Model =  MLP1 , BA =  SWPP
Model =  MLP2 , BA =  AZPS
Model =  MLP2 , BA =  BPAT
Model =  MLP2 , BA =  CISO
Model =  MLP2 , BA =  ERCO
Model =  MLP2 , BA =  FPL
Model =  MLP2 , BA =  ISNE
Model =  MLP2 , BA =  PJM
Model =  MLP2 , BA =  SWPP
Model =  MLP3 , BA =  AZPS
Model =  MLP3 , BA =  BPAT
Model =  MLP3 , BA =  CISO
Model =  MLP3 , BA =  ERCO
Model =  MLP3 , BA =  FPL
Model =  MLP3 , BA =  ISNE
Model =  MLP3 , BA =  PJM
Model =  MLP3 , BA =  SWPP
Model =  MLP4 , BA =  AZPS
Model =  MLP4 , BA =  BPAT
Model =  MLP4 , BA =  CISO
Model =  MLP4 , BA =  ERCO
Model =  MLP4 , BA =  FPL
Model =  MLP4 , BA =  ISNE
Model =  MLP4 , BA =  PJM
Model =  MLP4 , BA =  SWPP
Model =  MLP5 , BA =  AZPS
Model =  MLP5 , BA =  BPAT
Model =  MLP5 , BA =  CISO
Model =  MLP5 , BA =  ERCO
Model =  MLP5 , BA =  FPL
Model =  M
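
The six training windows above follow a regular pattern: each model starts one year later than the previous one, spans three calendar years, and splits after the second year. As a minimal sketch, the same windows could be generated programmatically instead of being written out by hand. The helper name `build_training_windows` and its arguments are hypothetical and are not part of TELL.

```python
# Hypothetical helper (not part of TELL): build the rolling three-year windows.
def build_training_windows(first_start_year=2016, n_models=6):
    windows = {}
    for i in range(n_models):
        start_year = first_start_year + i
        windows[f'MLP{i + 1}'] = {
            'start_time': f'{start_year}-01-01 00:00:00',          # first hour of the window
            'end_time': f'{start_year + 2}-12-31 23:00:00',        # last hour of the third year
            'split_datetime': f'{start_year + 1}-12-31 23:00:00',  # last hour of the second year
        }
    return windows


windows = build_training_windows()

# Print each model name alongside its window:
for model, window in windows.items():
    print(model, window)
```

Because the dictionary keys match the keyword names used in the training loop, each window could in principle be unpacked into the call, e.g. `tell.train(region = ba, data_dir = cleaned_ba_data_output_dir, **windows[model], ...)`.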

## Project Forward Using the Trained MLP Models

Use the TELL forward projection function, `tell.predict`, to run the trained MLP models forward in time and project "future" loads.

In [16]:
# For more information about how to use the models to project loads forward in time you can call the help function:
help(tell.predict)


Help on function predict in module tell.mlp_predict:

predict(region: str, year: int, data_dir: str, datetime_field_name: str = 'Time_UTC', save_prediction: bool = False, prediction_output_directory: Optional[str] = None, **kwargs)
    Generate predictions for MLP model for a target region from an input CSV file.
    
    :param region:                      Indicating region / balancing authority we want to train and test on.
                                        Must match with string in CSV files.
    :type region:                       str
    
    :param year:                        Target year to use in YYYY format.
    :type year:                         int
    
    :param data_dir:                    Full path to the directory that houses the input CSV files.
    :type data_dir:                     str
    
    :param save_prediction:             Choice to write predictions to a .csv file
    :type save_prediction:              bool
    
    :param prediction_output_directory

In [17]:
# Run the MLP forward projection step for a single BA to get a feel for the functionality:
pdf = tell.predict(region = 'CISO',
                   year = 2023,
                   data_dir = weather_data_input_dir,
                   datetime_field_name = 'Time_UTC',
                   save_prediction = True,
                   model_output_directory = model_output_directory + 'MLP1',
                   prediction_output_directory = prediction_output_directory + 'MLP1')

pdf


Unnamed: 0,Time_UTC,Load,BA
0,2023-01-01 00:00:00,25411.36,CISO
1,2023-01-01 01:00:00,26044.72,CISO
2,2023-01-01 02:00:00,26146.16,CISO
3,2023-01-01 03:00:00,26215.70,CISO
4,2023-01-01 04:00:00,26246.76,CISO
...,...,...,...
8755,2023-12-31 19:00:00,22907.30,CISO
8756,2023-12-31 20:00:00,23024.45,CISO
8757,2023-12-31 21:00:00,23075.88,CISO
8758,2023-12-31 22:00:00,23113.96,CISO


In [18]:
# Run the forward projection step iteratively across all training windows and BAs:
for model in ['MLP1', 'MLP2', 'MLP3', 'MLP4', 'MLP5', 'MLP6']:

    # Set the first forward year for each model:
    if model == 'MLP1':
        first_forward_year = 2018
    elif model == 'MLP2':
        first_forward_year = 2019
    elif model == 'MLP3':
        first_forward_year = 2020
    elif model == 'MLP4':
        first_forward_year = 2021
    elif model == 'MLP5':
        first_forward_year = 2022
    elif model == 'MLP6':
        first_forward_year = 2023

    # Create the prediction output directory name:
    output_directory = (prediction_output_directory + model + '/')
  
    # Create the output directory if it does not already exist:
    os.makedirs(output_directory, exist_ok=True)

    # Loop over the eight BAs used in this LDRD analysis:
    for ba in bas:

        # Loop over the years from the first forward year for that model through 2023:
        for year in range(first_forward_year,2024,1):
        
            # Run the MLP forward projection for that BA, model, and year:
            pdf = tell.predict(region = ba,
                               year = year,
                               data_dir = weather_data_input_dir,
                               datetime_field_name = 'Time_UTC',
                               save_prediction = True,
                               model_output_directory = (model_output_directory + model + '/'),
                               prediction_output_directory = (prediction_output_directory + model + '/'))

            # Print the model, BA, and year combination to monitor the progress:
            print('Model = ', model, ', BA = ', ba, ', Year = ', str(year))


Model =  MLP1 , BA =  AZPS , Year =  2018
Model =  MLP1 , BA =  AZPS , Year =  2019
Model =  MLP1 , BA =  AZPS , Year =  2020
Model =  MLP1 , BA =  AZPS , Year =  2021
Model =  MLP1 , BA =  AZPS , Year =  2022
Model =  MLP1 , BA =  AZPS , Year =  2023
Model =  MLP1 , BA =  BPAT , Year =  2018
Model =  MLP1 , BA =  BPAT , Year =  2019
Model =  MLP1 , BA =  BPAT , Year =  2020
Model =  MLP1 , BA =  BPAT , Year =  2021
Model =  MLP1 , BA =  BPAT , Year =  2022
Model =  MLP1 , BA =  BPAT , Year =  2023
Model =  MLP1 , BA =  CISO , Year =  2018
Model =  MLP1 , BA =  CISO , Year =  2019
Model =  MLP1 , BA =  CISO , Year =  2020
Model =  MLP1 , BA =  CISO , Year =  2021
Model =  MLP1 , BA =  CISO , Year =  2022
Model =  MLP1 , BA =  CISO , Year =  2023
Model =  MLP1 , BA =  ERCO , Year =  2018
Model =  MLP1 , BA =  ERCO , Year =  2019
Model =  MLP1 , BA =  ERCO , Year =  2020
Model =  MLP1 , BA =  ERCO , Year =  2021
Model =  MLP1 , BA =  ERCO , Year =  2022
Model =  MLP1 , BA =  ERCO , Year 
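
Because each model is run from its first forward year through 2023, older models contribute more projection years than newer ones. The sketch below simply enumerates the model, BA, and year combinations visited by the loop above to show how many runs that implies and how many models cover each year. It uses only the values already set in this notebook and does not call TELL.

```python
# First forward year for each model, as set in the loop above:
first_forward_years = {'MLP1': 2018, 'MLP2': 2019, 'MLP3': 2020,
                       'MLP4': 2021, 'MLP5': 2022, 'MLP6': 2023}
bas = ['AZPS', 'BPAT', 'CISO', 'ERCO', 'FPL', 'ISNE', 'PJM', 'SWPP']
last_year = 2023

# Enumerate every (model, BA, year) combination in the same order as the nested loops:
runs = [(model, ba, year)
        for model, first_year in first_forward_years.items()
        for ba in bas
        for year in range(first_year, last_year + 1)]

# Count how many models provide a projection for each year (using a single BA):
models_per_year = {}
for model, ba, year in runs:
    if ba == bas[0]:
        models_per_year[year] = models_per_year.get(year, 0) + 1

print('Total runs:', len(runs))
print('Models available per year:', models_per_year)
```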

## Compile the Historical and Forward Projection Datasets into a Single .csv File for Each BA


In [19]:
# Loop over the eight BAs used in this LDRD analysis:
for ba in bas:

    # Load in the compiled historical data from the TELL repository:
    base_df = pd.read_csv((cleaned_ba_data_output_dir + ba + '_cleaned_historical_data.csv'), index_col=None, header=0)

    # Convert the time columns into one datetime variable:
    base_df['Time_UTC'] = pd.to_datetime(base_df[['Year', 'Month', 'Day', 'Hour']])
    
    # Convert the temperature from Kelvin to Fahrenheit:
    base_df['T2'] = ((1.8 * (base_df['T2'] - 273.15)) + 32).round(2)

    # Round the population to the nearest whole integer and demand to the nearest tenth:
    base_df['Cleaned_Demand_MWh'] = base_df['Cleaned_Demand_MWh'].round(1)
    base_df['Total_Population'] = base_df['Total_Population'].round(0).astype(int)
    
    # Rename the demand and population columns to something simpler:
    base_df.rename(columns={'Cleaned_Demand_MWh': 'Demand_MWh', 'Total_Population': 'Population'}, inplace=True)

    # Add in the BA code for better tracking:
    base_df['BA'] = ba
    
    # Only keep the columns that are needed:
    base_df = base_df[['BA', 'Time_UTC', 'T2', 'Q2', 'SWDOWN', 'GLW', 'WSPD', 'Population', 'Demand_MWh']].copy()

    # Loop over the different models and merge in their demand projections iteratively:
    for model in ['MLP1', 'MLP2', 'MLP3', 'MLP4', 'MLP5', 'MLP6']:

        # Set the first forward year for each model:
        if model == 'MLP1':
            first_forward_year = 2018
        elif model == 'MLP2':
            first_forward_year = 2019
        elif model == 'MLP3':
            first_forward_year = 2020
        elif model == 'MLP4':
            first_forward_year = 2021
        elif model == 'MLP5':
            first_forward_year = 2022
        elif model == 'MLP6':
            first_forward_year = 2023

        # Loop over the years from the first forward year for that model through 2023:
        for year in range(first_forward_year,2024,1):

            # Load in the MLP projection for that model and year combination:
            proj_df = pd.read_csv((prediction_output_directory + model + '/' + str(year) + '/' + ba  + '_' + str(year) + '_mlp_output.csv'), index_col=None, header=0)

            # Only keep the columns that are needed:
            proj_df = proj_df[['Time_UTC', 'Load']].copy()

            # Round the demand to the nearest tenth:
            proj_df['Load'] = proj_df['Load'].round(1)
    
            # Rename the load variable to reflect the model used to generate the projection:
            proj_df.rename(columns={'Load': (model + '_MWh')}, inplace=True)

            # Convert the time to a datetime variable:
            proj_df['Time_UTC'] = pd.to_datetime(proj_df['Time_UTC'])

            # Aggregate the output into a new dataframe:
            if year == first_forward_year:
                aggregate_proj_df = proj_df
            else:
                aggregate_proj_df = pd.concat([aggregate_proj_df, proj_df])
            
        # Merge the base_df and aggregate_proj_df dataframes using common UTC times:
        base_df = base_df.merge(aggregate_proj_df, on=['Time_UTC'], how='left')

    # Copy the data into an output dataframe:
    output_df = base_df.copy()

    # Replace NaN values with -999:
    output_df = output_df.fillna(-999)

    # Set the output file name:
    csv_output_filename = (composite_output_directory + ba + '_Composite_Data.csv')

    # Write out the dataframe to a .csv file:
    output_df.to_csv(csv_output_filename, sep=',', index=False)

    # Clean up the variables and move to the next BA in the loop:
    del base_df, model, first_forward_year, year, proj_df, aggregate_proj_df, output_df, csv_output_filename
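
The left merge keeps every hour of the historical record, so hours that fall before a model's first forward year have no projection and are written out as -999. The toy example below, built from made-up numbers rather than TELL output, sketches that behavior and shows why the -999 sentinel should be masked before computing statistics on a composite file.

```python
import pandas as pd

# Toy historical record: six hours spanning the end of 2022 and the start of 2023:
base_df = pd.DataFrame({'Time_UTC': pd.date_range('2022-12-31 21:00:00', periods=6, freq='h'),
                        'Demand_MWh': [100.0, 101.0, 102.0, 103.0, 104.0, 105.0]})

# Toy projection from a model whose first forward year is 2023:
proj_df = pd.DataFrame({'Time_UTC': pd.date_range('2023-01-01 00:00:00', periods=3, freq='h'),
                        'MLP6_MWh': [110.0, 111.0, 112.0]})

# A left merge keeps all historical hours; hours without a projection become NaN:
merged = base_df.merge(proj_df, on=['Time_UTC'], how='left')

# Replace NaN values with -999, as done for the composite files:
output_df = merged.fillna(-999)

# Mask the sentinel again before computing statistics:
restored = output_df['MLP6_MWh'].mask(output_df['MLP6_MWh'] == -999)

print(output_df)
print('Mean including the sentinel:', output_df['MLP6_MWh'].mean())
print('Mean with the sentinel masked:', restored.mean())
```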
