# Produce Eastern Hydro Profile Using Multiple Data Sources

The following data sources are used to generate eastern_hydro_v3.csv:
* EIA monthly net generation for conventional hydro plants from Form 923
* Hourly total hydro generation profiles of four Independent System Operators (ISOs) in 2016: ISONE, NYISO, PJM and SWPP
* Hourly net demand profiles of the base grid

Features of this methodology:
* Pumped storage hydro (HPS) and conventional hydro (HYC) are handled separately.
* Historical hourly hydro profiles of ISONE, NYISO, PJM and SWPP are used directly.
* For the remaining areas, which are not covered by the four ISOs, the hydro profile of each state is generated by scaling that state's net demand, taken from the results of our Eastern 2016 base case scenario, to the monthly total net generation reported in EIA 923. This is similar to the approach used for western hydro profile v2.

*Note: running this demo requires the **timezonefinder** package, which is used to find the local time zone of a geographic coordinate given as a (lat, lon) pair.*

In [1]:
import json
import pytz
import pandas as pd
from tqdm import tqdm
from collections import defaultdict
from timezonefinder import TimezoneFinder
from powersimdata.input.grid import Grid
from powersimdata.network.usa_tamu.constants.zones import (interconnect2loadzone, 
                                                           loadzone2state, 
                                                           state2abv)

from prereise.gather.helpers import (trim_eia_form_923,
                                     get_monthly_net_generation)
from prereise.gather.hydrodata.eia.helpers import scale_profile
from prereise.gather.demanddata.eia.map_ba import map_buses_to_county
from prereise.gather.hydrodata.eia.decompose_profile import get_profile_by_plant, get_normalized_profile
from prereise.gather.hydrodata.eia.net_demand import get_net_demand_profile

In [2]:
# Note: the current version of the grid cannot reproduce the eastern_hydro_v3 profile exactly.
# The purpose of this notebook is to illustrate the methodology used to generate the profile.
eastern = Grid(['Eastern'])

Reading bus.csv
Reading plant.csv
Reading gencost.csv
Reading branch.csv
Reading dcline.csv
Reading sub.csv
Reading bus2sub.csv
Reading zone.csv


## 1. Generate eastern pumped storage hydro profiles

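Generate an hourly profile for each HPS plant based on the deterministic model described in 'hps_plants_eastern.xlsx'. On local weekdays, each plant generates at full capacity from 12:00 to 17:00 and at half capacity at 11:00 and 18:00; it is idle at all other times.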


In [3]:
# This step takes 30 sec to finish
eastern_hps = pd.read_excel(io='./hps_plants_eastern.xlsx',sheet_name = 'all_plantIDs',header = 0)
time_index = pd.date_range(start='2016-01-01 00:00:00', end='2016-12-31 23:00:00', freq='H')
eastern_hydro_v3_hps = pd.DataFrame(index = time_index, columns = sorted(eastern_hps['PlantIDs']))

utc = pytz.utc
tf = TimezoneFinder()

for plantid in tqdm(eastern_hydro_v3_hps.columns):
    lat = eastern.plant.loc[plantid,'lat']
    lon = eastern.plant.loc[plantid,'lon']
    capacity = eastern.plant.loc[plantid,'Pmax']
    tz_target = pytz.timezone(tf.certain_timezone_at(lat=lat, lng=lon))
    for time_ind in time_index:
        time_utc = utc.localize(time_ind)
        time_local = time_utc.astimezone(tz_target)
        # weekday, 0:Monday, 1:Tuesday, 2:Wednesday, 3:Thursday, 4:Friday
        if time_local.weekday() <= 4:
            if time_local.hour in {11,18}:
                eastern_hydro_v3_hps.loc[time_ind,plantid] = capacity*0.5
            if 11 < time_local.hour < 18:
                eastern_hydro_v3_hps.loc[time_ind,plantid] = capacity
eastern_hydro_v3_hps.fillna(0,inplace = True)

100%|██████████| 96/96 [00:36<00:00,  2.67it/s]
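The loop above converts every hour to local time one timestamp at a time. A minimal vectorized sketch of the same weekday shape is shown below. It assumes the plant's IANA time zone name (e.g. "America/New_York") is already known, for example from `tf.certain_timezone_at` as in the loop; `hps_weekday_profile` is a hypothetical helper, not part of prereise.

```python
import numpy as np
import pandas as pd


def hps_weekday_profile(time_index_utc, tz_name, capacity):
    """Hypothetical helper: deterministic HPS shape for one plant.

    On local weekdays (Monday-Friday) the plant runs at full capacity for
    local hours 12-17 and at half capacity for local hours 11 and 18;
    it is idle at all other times.
    """
    # Treat the naive index as UTC, then convert to the plant's local time zone
    local = time_index_utc.tz_localize("UTC").tz_convert(tz_name)
    is_weekday = np.asarray(local.weekday) <= 4
    hour = np.asarray(local.hour)

    factor = np.zeros(len(local))
    factor[is_weekday & ((hour == 11) | (hour == 18))] = 0.5
    factor[is_weekday & (hour > 11) & (hour < 18)] = 1.0
    # Keep the original naive UTC index so the result aligns with time_index
    return pd.Series(factor * capacity, index=time_index_utc)


# One week of naive UTC timestamps starting Monday 2016-01-04 00:00
week = pd.date_range("2016-01-04", periods=24 * 7, freq="60min")
profile = hps_weekday_profile(week, "America/New_York", capacity=100.0)
```

Each column of `eastern_hydro_v3_hps` could then be filled with one call per plant instead of the inner loop over `time_index`.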


Based on the current approach, the total annual HPS generation is 30,301.6 GWh, about 50% higher than the 19,884 GWh reported in EIA 923. Hence, we scale the HPS profile down by 35%, i.e., multiply it by 0.65.

In [4]:
eastern_hydro_v3_hps = eastern_hydro_v3_hps.apply(lambda x: x*0.65)

In [5]:
# Total annual HPS generation after scaling (MWh)
eastern_hydro_v3_hps.sum().sum()

20397404.72770007
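The scaling arithmetic can be checked with the figures quoted above. The summed profile is in MWh (Pmax is in MW and each row covers one hour), so the total printed by `In [5]` corresponds to roughly 20,397 GWh. The short calculation below only reuses numbers already shown in this notebook.

```python
# Figures quoted in the text (GWh) and the total printed by In [5] (MWh)
modeled_hps_gwh = 30301.6
eia_923_hps_gwh = 19884.0
scaled_total_mwh = 20397404.72770007

overshoot = modeled_hps_gwh / eia_923_hps_gwh  # about 1.52, i.e. roughly 50% too high
exact_scale = eia_923_hps_gwh / modeled_hps_gwh  # about 0.656; In [4] uses 0.65
scaled_total_gwh = scaled_total_mwh / 1000.0  # MWh -> GWh
deviation = scaled_total_gwh / eia_923_hps_gwh - 1.0  # relative gap vs. EIA 923

print(f"overshoot={overshoot:.3f}, exact scale={exact_scale:.3f}, deviation={deviation:+.1%}")
```

Rounding the exact factor down to 0.65 leaves the scaled total a few percent above the EIA 923 figure.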

## 2. Generate eastern conventional hydro profiles
### a) Map each conventional hydro generator to a balancing authority (BA) via counties
This follows the same procedure used in eastern demand v5 to map buses to BAs via counties.

In [6]:
eastern_hyc_id_list = set(eastern.plant[eastern.plant['type'] == 'hydro'].index) - set(eastern_hps['PlantIDs'])
eastern_hyc = eastern.plant.loc[sorted(eastern_hyc_id_list)][['Pmax','lat','lon','zone_name']].copy()
eastern_hyc, eastern_hyc_no_county_match = map_buses_to_county(eastern_hyc)

100%|██████████| 2210/2210 [19:11<00:00,  1.92it/s]


In [7]:
eastern_hyc_no_county_match

[]

In [8]:
with open('../../../../data/ba_to_county.txt') as f:
    data = json.load(f)
ba_county_list = {}
for val in data['groups'].values():
    ba_county_list[val['label']] = set(val['paths'])

# County keys in ba_to_county.txt use underscores and drop some punctuation;
# a few county names also differ and are mapped explicitly.
county_name_fixes = {
    'LaSalle__IL': 'La_Salle__IL',
    'Lac Qui Parle__MN': 'Lac_qui_Parle__MN',
    'Baltimore__MD': 'Baltimore_County__MD',
    'District of Columbia__DC': 'Washington__DC',
    'St. Louis City__MO': 'St_Louis_Co__MO',
}

def normalize_county(name):
    if name in county_name_fixes:
        return county_name_fixes[name]
    return name.replace(' ', '_').replace('.', '').replace('-', '').replace('\'', '_')

eastern_hyc['BA'] = None
for index, row in eastern_hyc.iterrows():
    # Plants without a matched county cannot be assigned to a BA here
    if not isinstance(row['County'], str):
        continue
    county = normalize_county(row['County'])
    for BA, clist in ba_county_list.items():
        if county in clist:
            eastern_hyc.loc[index, 'BA'] = BA
            break

# County names in the Virginia Mountains load zone do not match; assign these plants to PJM
no_ba = eastern_hyc['BA'].isna()
eastern_hyc.loc[no_ba & (eastern_hyc['zone_name'] == 'Virginia Mountains'), 'BA'] = 'PJM'

# Assign the remaining unmatched plants to SWPP
eastern_hyc.loc[eastern_hyc['BA'].isna(), 'BA'] = 'SWPP'

In [9]:
eastern_hyc_no_BA_match = list(eastern_hyc[eastern_hyc['BA'].isna()].index)
eastern_hyc_no_BA_match

[]
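The nested loop in `In [8]` scans every BA for every plant. An equivalent lookup can be sketched by inverting `ba_county_list` into a county-to-BA dictionary once. This assumes each county key normally appears under a single BA label; if a county were listed under several BAs, the sketch keeps the first one, mirroring the `break` in the loop. `invert_ba_county_list` is a hypothetical helper and the toy data is made up for illustration.

```python
def invert_ba_county_list(ba_county_list):
    """Map each county key to the first BA (in iteration order) that lists it."""
    county_to_ba = {}
    for ba, counties in ba_county_list.items():
        for county in counties:
            county_to_ba.setdefault(county, ba)
    return county_to_ba


# Made-up example in the same shape as ba_county_list: BA label -> set of county keys
toy_ba_county_list = {
    "PJM": {"Washington__DC", "Baltimore_County__MD", "La_Salle__IL"},
    "MISO": {"Lac_qui_Parle__MN"},
}
county_to_ba = invert_ba_county_list(toy_ba_county_list)
```

With this dictionary, `county_to_ba.get(normalize_county(name))` returns the BA or `None` for an unmatched county, so each plant needs a single lookup.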

In [10]:
eastern_hyc.BA.unique()

array(['ISONE', 'NYISO', 'PJM', 'Carolina', 'TVA_LGEE', 'SOCO', 'AEC',
       'MISO', 'SWPP'], dtype=object)

In [11]:
eastern_hyc.to_csv('eastern_hyc_to_BA.csv')

### b) Decompose the 2016 total hydro profiles of ISONE, NYISO, PJM and SWPP into plant-level profiles in the corresponding region
Load the 2016 total hydro profiles of ISONE, NYISO, PJM and SWPP.


In [12]:
isone_hydro = pd.read_csv('../../../data/neiso_hydro_2016.csv', index_col = 0)
nyiso_hydro = pd.read_csv('../../../data/nyiso_hydro_2016.csv', index_col = 0)
pjm_hydro = pd.read_csv('../../../data/pjm_hydro_2016.csv', index_col = 0)
swpp_hydro = pd.read_csv('../../../data/spp_hydro_2016.csv', index_col = 0)

In [13]:
hydro_v3_isone = get_profile_by_plant(eastern_hyc[eastern_hyc['BA'] == 'ISONE'], isone_hydro['hydro'])

In [14]:
hydro_v3_nyiso = get_profile_by_plant(eastern_hyc[eastern_hyc['BA'] == 'NYISO'], nyiso_hydro['GenMWh'])

In [15]:
hydro_v3_pjm = get_profile_by_plant(eastern_hyc[eastern_hyc['BA'] == 'PJM'], pjm_hydro['hydro'])

In [16]:
hydro_v3_swpp = get_profile_by_plant(eastern_hyc[eastern_hyc['BA'] == 'SWPP'], swpp_hydro['hydro'])

In [17]:
hydro_v3_isone.index = eastern_hydro_v3_hps.index
hydro_v3_nyiso.index = eastern_hydro_v3_hps.index
hydro_v3_pjm.index = eastern_hydro_v3_hps.index
hydro_v3_swpp.index = eastern_hydro_v3_hps.index
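Later in this notebook, regional totals are decomposed into plant-level profiles proportional to plant capacities. A minimal sketch of such a capacity-proportional split follows. `split_by_capacity` is a hypothetical stand-in, not prereise's `get_profile_by_plant`, whose exact behavior may differ. The sketch returns all-zero profiles when the group's total capacity is zero, as for the Montana Eastern generators, whose Pmax is 0.0.

```python
import pandas as pd


def split_by_capacity(plant_df, total_profile):
    """Hypothetical helper: split an hourly total profile across plants by Pmax share."""
    total_capacity = plant_df["Pmax"].sum()
    if total_capacity > 0:
        share = plant_df["Pmax"] / total_capacity
    else:
        # No capacity to weight by: give every plant an all-zero profile
        share = pd.Series(0.0, index=plant_df.index)
    # One column per plant ID; with positive capacity each row sums to the total
    return pd.DataFrame(
        {plant_id: total_profile * frac for plant_id, frac in share.items()},
        index=total_profile.index,
    )


plants = pd.DataFrame({"Pmax": [10.0, 30.0]}, index=pd.Index([101, 102], name="plant_id"))
total = pd.Series([4.0, 8.0], index=pd.date_range("2016-01-01", periods=2, freq="60min"))
by_plant = split_by_capacity(plants, total)
```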

### c) Generate HYC profiles for hydro plants in the rest of the area
For hydro plants outside the four ISOs, we use the same methodology as in western hydro profile v2: scale the hourly net demand profile of each state based on the monthly total net generation of conventional hydro reported in EIA 923 for that state, then decompose the result into plant-level profiles based on the corresponding plant capacities.

In [18]:
eastern_loadzone_to_state_abbrev = {}
for lz in interconnect2loadzone['Eastern']:
    eastern_loadzone_to_state_abbrev[lz] = state2abv[loadzone2state[lz]]

In [19]:
state_ba_fraction = defaultdict(lambda: defaultdict(float))
ba_name = {'ISONE','NYISO','SWPP','PJM'}
for index,row in eastern_hyc.iterrows():
    if row['BA'] in ba_name:
        state_ba_fraction[eastern_loadzone_to_state_abbrev[row['zone_name']]][row['BA']] += row['Pmax']
    state_ba_fraction[eastern_loadzone_to_state_abbrev[row['zone_name']]]['total'] += row['Pmax']

In [20]:
state_ba_fraction

defaultdict(<function __main__.<lambda>()>,
            {'ME': defaultdict(float,
                         {'ISONE': 714.7999999999985,
                          'total': 714.7999999999985}),
             'NH': defaultdict(float,
                         {'ISONE': 424.8070000000001,
                          'total': 424.8070000000001}),
             'VT': defaultdict(float,
                         {'ISONE': 327.4110000000001,
                          'total': 327.4110000000001}),
             'MA': defaultdict(float,
                         {'ISONE': 268.8959999999998,
                          'total': 268.8959999999998}),
             'RI': defaultdict(float, {'ISONE': 2.8, 'total': 2.8}),
             'CT': defaultdict(float,
                         {'ISONE': 111.3, 'total': 118.5, 'NYISO': 7.2}),
             'NY': defaultdict(float,
                         {'NYISO': 4674.361000000001,
                          'total': 4674.361000000001}),
             'NJ': defaultdict(floa

* From state_ba_fraction, no state overlaps with both PJM and SWPP.
* No state is split between ISONE or NYISO and an area outside the four ISOs; every state that overlaps with ISONE or NYISO is fully covered by the ISOs.
* Therefore, to generate the hydro profiles of the remaining states, we only need to consider states that do not overlap with any of the four ISOs, or that partially overlap with either PJM or SWPP.
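Given these observations, the branches in `In [21]` amount to one rule: scale each state's EIA 923 monthly HYC net generation by the share of its HYC capacity that lies outside the four ISOs, and skip states with no such capacity. A sketch of that rule follows; `non_iso_monthly` is a hypothetical name, and apart from the CT capacities copied from the output above, the example numbers are made up.

```python
ISO_NAMES = ("ISONE", "NYISO", "PJM", "SWPP")


def non_iso_monthly(monthly_generation, ba_capacity, tol=1e-9):
    """Hypothetical helper: monthly HYC net generation attributable to the part
    of a state outside the four ISOs, or None if nothing remains."""
    iso_capacity = sum(ba_capacity.get(name, 0.0) for name in ISO_NAMES)
    if iso_capacity == 0:
        # State not covered by any ISO: keep the full EIA 923 monthly values
        return list(monthly_generation)
    frac = 1.0 - iso_capacity / ba_capacity["total"]
    if frac <= tol:
        # Fully covered by the ISOs: handled by the ISO profiles in step b)
        return None
    return [value * frac for value in monthly_generation]


partial_pjm = non_iso_monthly([400.0] * 12, {"PJM": 300.0, "total": 400.0})
connecticut = non_iso_monthly([5.0] * 12, {"ISONE": 111.3, "NYISO": 7.2, "total": 118.5})
```

The tolerance guards against floating-point sums that leave a tiny positive remainder for states fully covered by ISONE and NYISO.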

In [21]:
eastern_monthly_hyc_rest = {}
eia_923_filename = 'EIA923_Schedules_2_3_4_5_M_12_2016_Final_Revision.xlsx'
eia_923_form = trim_eia_form_923(eia_923_filename)
for state, ba in tqdm(state_ba_fraction.items()):
    if len(ba) == 1:
        eastern_monthly_hyc_rest[state] = get_monthly_net_generation(state, eia_923_form, 'hydro', hps=False)
    elif 'PJM' in ba:
        total_state_profile = get_monthly_net_generation(state, eia_923_form, 'hydro', hps=False)
        frac = 1-(ba['PJM']/ba['total'])
        if frac > 0:
            eastern_monthly_hyc_rest[state] = [val*frac for val in total_state_profile]
    elif 'SWPP' in ba:       
        total_state_profile = get_monthly_net_generation(state, eia_923_form, 'hydro', hps=False)
        frac = 1-(ba['SWPP']/ba['total'])
        if frac > 0:
            eastern_monthly_hyc_rest[state] = [val*frac for val in total_state_profile]

100%|██████████| 36/36 [00:00<00:00, 177.36it/s]


For Montana, the Eastern grid contains only five HYC generators (one plant), for which we found the corresponding real plant in EIA 923. According to EIA 923, all hydro plants in MT are connected to WECC, so we zero out the Eastern hydro generation in MT here.

In [22]:
eastern_monthly_hyc_rest['MT'] = [0]*12

In [23]:
eastern_hyc[eastern_hyc['zone_name'] == 'Montana Eastern']

| plant_id | Pmax | lat | lon | zone_name | County | BA |
|---|---|---|---|---|---|---|
| 10378 | 0.0 | 48.0166 | -106.419 | Montana Eastern | McCone__MT | MISO |
| 10379 | 0.0 | 48.0166 | -106.419 | Montana Eastern | McCone__MT | MISO |
| 10380 | 0.0 | 48.0166 | -106.419 | Montana Eastern | McCone__MT | MISO |
| 10381 | 0.0 | 48.0166 | -106.419 | Montana Eastern | McCone__MT | MISO |
| 10382 | 0.0 | 48.0166 | -106.419 | Montana Eastern | McCone__MT | MISO |


For East Texas, the Eastern grid contains only five HYC generators (two plants) located in Texas, for which we found the corresponding real plants in EIA 923; their monthly net generation is used directly.

In [24]:
eastern_monthly_hyc_rest['TX'] = [8430, 8091, 9172, 16705, 23493, 26282, 5259, 7536, 7229, 2376, 5503, 2665]

In [25]:
eastern_hyc[eastern_hyc['zone_name'] == 'East Texas']

| plant_id | Pmax | lat | lon | zone_name | County | BA |
|---|---|---|---|---|---|---|
| 9204 | 0.286 | 31.0424 | -94.0865 | East Texas | Jasper__TX | MISO |
| 9205 | 2.29 | 31.0424 | -94.0865 | East Texas | Jasper__TX | MISO |
| 9206 | 2.29 | 31.0424 | -94.0865 | East Texas | Jasper__TX | MISO |
| 9207 | 14.883 | 31.0424 | -94.0865 | East Texas | Jasper__TX | MISO |
| 9208 | 14.883 | 31.0424 | -94.0865 | East Texas | Jasper__TX | MISO |
| 9209 | 23.184 | 31.1976 | -93.5861 | East Texas | Sabine__LA | MISO |
| 9210 | 23.184 | 31.1976 | -93.5861 | East Texas | Sabine__LA | MISO |


### d) Use the net demand of each state to define the hourly shape of HYC in the remaining states not covered by the four ISOs

In [26]:
eastern_net_demand_state_rest = {}
for state in eastern_monthly_hyc_rest:
    eastern_net_demand_state_rest[state] = get_net_demand_profile(state, interconnect="Eastern")

--> Loading wind
--> Loading solar
--> Loading demand

Scale the hourly net demand profile of each state to its monthly HYC net generation to obtain the state's hourly total HYC profile.

In [27]:
eastern_hyc_hourly_total_state_rest = {}
for state in eastern_net_demand_state_rest:
    net_demand = pd.Series(eastern_net_demand_state_rest[state], index=eastern_hydro_v3_hps.index)
    eastern_hyc_hourly_total_state_rest[state] = scale_profile(net_demand, eastern_monthly_hyc_rest[state])
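The scaling step keeps the hourly shape of net demand while matching each month's total to the EIA 923 value. A minimal sketch of that idea follows; `scale_to_monthly` is hypothetical and not prereise's `scale_profile`, whose handling of details such as negative or zero net demand is not shown here. The sketch assumes the hourly series covers all 12 months and that every monthly sum is nonzero.

```python
import numpy as np
import pandas as pd


def scale_to_monthly(hourly, monthly_totals):
    """Rescale an hourly series so its sum in month m equals monthly_totals[m - 1]."""
    month = np.asarray(hourly.index.month, dtype=np.int64)
    month_sums = hourly.groupby(month).sum()  # indexed by month number 1..12
    targets = pd.Series(monthly_totals, index=range(1, 13), dtype=float)
    factors = targets / month_sums
    # Broadcast each month's factor to all hours of that month
    return hourly * factors.reindex(month).to_numpy()


hours = pd.date_range("2016-01-01", "2016-12-31 23:00", freq="60min")
shape = pd.Series(np.linspace(1.0, 2.0, len(hours)), index=hours)  # stand-in net demand
monthly_mwh = [1000.0 * m for m in range(1, 13)]  # made-up monthly targets
scaled = scale_to_monthly(shape, monthly_mwh)
```

Because one factor is applied per month, any hour with negative net demand would also produce negative hydro output; this sketch does not guard against that.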

Decompose each state's hourly total HYC profile into plant-level profiles proportional to plant capacities.
* Two HYC generators in Louisiana (9209 and 9210) are placed in the East Texas load zone. We move them back to LA when generating plant-level profiles.

In [28]:
hydro_v3_rest_state = {}
iso_names = {'ISONE', 'NYISO', 'SWPP', 'PJM'}
plant_state = eastern_hyc['zone_name'].map(eastern_loadzone_to_state_abbrev)
for state in eastern_hyc_hourly_total_state_rest:
    in_state = (plant_state == state) & (~eastern_hyc['BA'].isin(iso_names))
    plantlist = list(eastern_hyc[in_state].index)
    # Plants 9209 and 9210 are in Louisiana but sit in the East Texas load zone
    if state == 'TX':
        plantlist.remove(9209)
        plantlist.remove(9210)
    if state == 'LA':
        plantlist.append(9209)
        plantlist.append(9210)
    plant_df = eastern_hyc.loc[plantlist].copy()
    hydro_v3_rest_state[state] = get_profile_by_plant(plant_df, eastern_hyc_hourly_total_state_rest[state])

In [29]:
# Generate final eastern hydro v3
eastern_hydro_v3 = pd.concat(list(hydro_v3_rest_state.values()),axis = 1)
eastern_hydro_v3.index = eastern_hydro_v3_hps.index
eastern_hydro_v3 = pd.concat([eastern_hydro_v3, eastern_hydro_v3_hps],axis = 1)
eastern_hydro_v3 = pd.concat([eastern_hydro_v3, hydro_v3_isone, hydro_v3_nyiso, hydro_v3_pjm, hydro_v3_swpp],axis = 1)
eastern_hydro_v3 = eastern_hydro_v3[sorted(eastern_hydro_v3.columns)]
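Before normalizing, it can be worth checking that the concatenated profile has exactly one column per Eastern hydro plant, with no plant dropped or included twice (for example by the TX/LA reassignment). A sketch of such a check follows; `check_plant_columns` is a hypothetical helper and the toy frame is made up.

```python
import pandas as pd


def check_plant_columns(profile, expected_plant_ids):
    """Return (duplicated, missing, unexpected) plant IDs for a plant-level profile."""
    columns = pd.Index(profile.columns)
    duplicated = sorted(set(columns[columns.duplicated()]))
    expected = set(expected_plant_ids)
    missing = sorted(expected - set(columns))
    unexpected = sorted(set(columns) - expected)
    return duplicated, missing, unexpected


toy = pd.DataFrame([[1.0, 2.0, 3.0]], columns=[9209, 9210, 9209])
dup, missing, extra = check_plant_columns(toy, [9209, 9210, 9211])
```

In the notebook, the expected IDs would be `eastern.plant[eastern.plant.type == "hydro"].index`, and all three returned lists should be empty.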

In [30]:
eastern_hydro_v3_normalize = get_normalized_profile(eastern.plant[eastern.plant.type == "hydro"], eastern_hydro_v3)
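Judging by its name, `get_normalized_profile` converts each plant's MW profile into a capacity-normalized one. A sketch assuming normalization means dividing by each plant's Pmax follows; `normalize_by_pmax` is hypothetical, and the real function's handling of edge cases is not shown in this notebook. Plants with zero Pmax, such as the Montana Eastern generators, would otherwise produce NaN or infinite values, so the sketch reports them as zero.

```python
import pandas as pd


def normalize_by_pmax(plant_df, profile):
    """Hypothetical helper: divide each plant column by that plant's Pmax."""
    pmax = plant_df["Pmax"].reindex(profile.columns)
    if pmax.isna().any():
        raise KeyError("profile has plant IDs not found in plant_df")
    # Align Pmax with the columns; zero Pmax yields NaN or inf here
    normalized = profile.div(pmax, axis="columns")
    # Zero-capacity plants: report zero output instead of NaN or inf
    normalized.loc[:, (pmax <= 0).to_numpy()] = 0.0
    return normalized


plants = pd.DataFrame({"Pmax": [50.0, 0.0]}, index=[1, 2])
mw = pd.DataFrame({1: [25.0, 50.0], 2: [0.0, 0.0]})
norm = normalize_by_pmax(plants, mw)
```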

In [31]:
eastern_hydro_v3_normalize.to_csv('eastern_hydro_v3_normalize.csv')