### STAT 4185 Final Project
### Jacob Gratrix

## An Analysis of the US Power Grid and its System Optimality

### Introduction

The United States power grid, though commercially balkanized among regional markets and operating firms, is the largest and most connected piece of infrastructure ever constructed by mankind. The relationship between consumer satisfaction with grid operations and the actual upkeep of the grid is no doubt complex. Here we aim only to ascertain the rough degree to which this infrastructure accomplishes its explicit goal: to deliver electricity from where it is supplied to where it is demanded.



## Data Collection

We first import the relevant libraries for performing our analysis.

In [1]:
import sys

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

We also import several libraries for data collection. These include the built-in `datetime` module and the `gridstatus` package, which collates data from regional power operators.

In [29]:
import gridstatus as gs
import datetime as dt

import urllib
from bs4 import BeautifulSoup as bs
import re

Next we define a list of system operators, `interconnections`, so that multiple regions can be analysed at once.
Additionally, we define two functions, `get_fuel_mixes` and `get_grid_loads`, each of which returns a dictionary of Pandas DataFrames keyed by month name.

We use data from 2022, restricted to the odd-numbered months, and include only the first day of each month in order to keep the data volume manageable.
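As a quick check of the sampling scheme described above, the following sketch lists the dates that will be requested. It uses the standard-library `datetime` parser as a stand-in for the `pd.Timestamp` parsing used in this notebook, and the helper name `sample_dates` is hypothetical.

```python
import datetime as dt

months = ['January', 'March', 'May', 'July', 'September', 'November']

def sample_dates(months, year=2022):
    """Return the first day of each named month as a datetime.date.

    Hypothetical helper: it parses the same 'Month 1, year' strings
    that the notebook passes to pd.Timestamp.
    """
    return [
        dt.datetime.strptime(f'{mo} 1, {year}', '%B %d, %Y').date()
        for mo in months
    ]

dates = sample_dates(months)
for d in dates:
    print(d.isoformat())
```

Note that `%B` matches English month names only under an English (or default C) locale.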

Note that `interconnections` does not include the PJM, MISO, SPP, or ERCOT regional system operators. This is because they either have no published fuel mix data for the requested dates or do not support the fuel mix retrieval method in the `gridstatus` module.
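One way such an exclusion list might be determined programmatically is to attempt a fuel mix request for each operator and keep only those that succeed. The sketch below assumes that an unsupported operator raises an exception when `get_fuel_mix` is called. The exact exception `gridstatus` raises is not confirmed here, so the two classes are hypothetical stand-ins rather than real `gridstatus` objects.

```python
class FuelMixOperator:
    """Hypothetical stand-in for an operator that publishes fuel mix data."""
    def get_fuel_mix(self, date):
        return {'Solar': 1.0, 'Wind': 2.0}

class NoFuelMixOperator:
    """Hypothetical stand-in for an operator without fuel mix support."""
    def get_fuel_mix(self, date):
        # Assumption: an unsupported request raises some exception.
        raise NotImplementedError('fuel mix not available')

def supports_fuel_mix(iso, date):
    """Return True if iso.get_fuel_mix(date) returns without raising."""
    try:
        iso.get_fuel_mix(date)
    except Exception:
        return False
    return True

candidates = [FuelMixOperator(), NoFuelMixOperator()]
usable = [iso for iso in candidates if supports_fuel_mix(iso, 'January 1, 2022')]
print([type(iso).__name__ for iso in usable])
```

Catching every exception keeps the sketch short, but it also treats a transient network failure as a lack of support, so a stricter version would catch only the specific errors observed in practice.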

In [9]:
isone = gs.ISONE() # New England ISO
caiso = gs.CAISO() # California ISO
nyiso = gs.NYISO() # New York ISO
pjm = gs.PJM() # PJM Interconnection RTO
miso = gs.MISO() # Midcontinent ISO
spp = gs.SPP() # Southwest Power Pool RTO
ercot = gs.Ercot() # Electric Reliability Council, Texas

interconnections = [isone, caiso, nyiso] # Excluded: PJM, MISO, SPP, Ercot

months = ['January', 'March', 'May', 'July', 'September', 'November']

def get_fuel_mixes(iso):
    fuel_mixes = {}
    for mo in months:
        # We'll only use one day each month to keep the amount of data manageable
        start = pd.Timestamp(f'{mo} 1, 2022')
        fuel_mix = iso.get_fuel_mix(start, verbose=False)
        fuel_mixes[mo] = fuel_mix
    return fuel_mixes

def get_grid_loads(iso):
    grid_loads = {}
    for mo in months:
        # We'll only use one day each month to keep the amount of data manageable
        start = pd.Timestamp(f'{mo} 1, 2022')
        grid_load = iso.get_load(start)
        grid_loads[mo] = grid_load
    return grid_loads

Using these functions, we obtain fuel mix and grid load data for the first day of each odd-numbered month of 2022.
Please note that this cell takes upwards of 90 seconds to run. As an alternative, the cell two below initializes the same dataframes from the `.csv` files accompanying this notebook.

In [10]:
%%capture --no-display

mix_dfs = {}
load_dfs = {}

for iso in interconnections:
    mix_dfs[iso] = get_fuel_mixes(iso)
    load_dfs[iso] = get_grid_loads(iso)


In [30]:
for iso in interconnections:
    iso_str = repr(iso)[12:16] # Short operator label, matching the loading cell below
    for mo in months:
        mix_output = pd.DataFrame(mix_dfs[iso][mo])
        mix_output.to_csv(path_or_buf=f'./data/mix-{iso_str}-{mo}.csv', index=False)
        load_output = pd.DataFrame(load_dfs[iso][mo])
        load_output.to_csv(path_or_buf=f'./data/load-{iso_str}-{mo}.csv', index=False)


If the `./data` folder is accessible in your working directory, the following cell may be used instead to generate the relevant Pandas dataframes.

In [None]:
mix_dfs = {}
load_dfs = {}

for iso in interconnections:
    iso_str = repr(iso)[12:16]
    mix_dfs[iso] = {}
    load_dfs[iso] = {}
    for mo in months:
        mix_dfs[iso][mo] = pd.read_csv(f'./data/mix-{iso_str}-{mo}.csv')
        load_dfs[iso][mo] = pd.read_csv(f'./data/load-{iso_str}-{mo}.csv')
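The choice between re-fetching the data and loading it from `./data` could be made automatically by checking that every expected file is present. The following is a minimal sketch under the file naming used above (`mix-<label>-<month>.csv` and `load-<label>-<month>.csv`). The helper names and the example label are hypothetical.

```python
from pathlib import Path

def expected_files(data_dir, labels, months):
    """List every CSV path the loading cell would try to read."""
    return [
        Path(data_dir) / f'{kind}-{label}-{mo}.csv'
        for label in labels
        for mo in months
        for kind in ('mix', 'load')
    ]

def cache_is_complete(data_dir, labels, months):
    """Return True only if all expected CSV files exist."""
    return all(p.is_file() for p in expected_files(data_dir, labels, months))

# Hypothetical label; in the notebook the labels come from repr(iso)[12:16].
example_labels = ['xxxx']
example_months = ['January', 'March']
print(cache_is_complete('./data', example_labels, example_months))
```

If `cache_is_complete` returns `False`, the slower fetching cell would need to be run first.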


## Data Cleaning and Munging

## A Few Visualizations

## Constructing the Models

## Results and Conclusions