## Creating timeseries

This notebook walks through taking the NREL time series data provided to us by Xinshuo and mapping it onto the Texas7k dataset provided by Texas A&M University. Here, we focus on **wind** and **load**. Load needs little work on this end: the time series is already provided at the zonal level, and the allocation from the zonal level to the bus level happens in our modified version of `rts_gmlc.py`.
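For orientation, one simple way to allocate zonal load to buses is in proportion to each bus's static load within its zone. The sketch below only illustrates that idea; it is not necessarily what our modified `rts_gmlc.py` does, and the column names (`Bus ID`, `Zone`, `MW Load`) and the `allocate_to_buses` helper are hypothetical.

```python
import pandas as pd

# Hypothetical bus table: each bus sits in one weather zone and has a static load (MW).
buses = pd.DataFrame({
    "Bus ID": [1, 2, 3, 4],
    "Zone": ["Coast", "Coast", "East", "East"],
    "MW Load": [30.0, 70.0, 10.0, 40.0],
})

# Hypothetical zonal load time series with two periods.
zonal = pd.DataFrame({"Coast": [1000.0, 1200.0], "East": [500.0, 600.0]})

def allocate_to_buses(zonal, buses):
    """Split each zone's load across its buses in proportion to static bus load."""
    # Each bus's share of its zone's total static load
    share = buses["MW Load"] / buses.groupby("Zone")["MW Load"].transform("sum")
    out = {}
    for bus_id, zone, s in zip(buses["Bus ID"], buses["Zone"], share):
        out[bus_id] = zonal[zone] * s
    return pd.DataFrame(out)

bus_load = allocate_to_buses(zonal, buses)
print(bus_load)
```

Because the shares within each zone sum to one, the bus-level columns add back up to the zonal totals in every period.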

```python
import pandas as pd
import numpy as np
import datetime

# Import the files we will need (excluding the time series files given to us by Xinshuo)
bus = pd.read_csv("./finished/bus.csv")
branch = pd.read_csv("./finished/branch.csv")
gen = pd.read_csv("./finished/gen.csv")
wind_mappings = pd.read_csv("./NREL Stuff/Texas7k_NREL_wind_map.csv")
```

```python
# Start with the wind forecasts for now; this can be turned into a function that applies to all asset types later
wind_forecast_df = pd.read_csv("./NREL Stuff/wind_day_ahead_forecast_site_2018.csv")
wind_forecasting_horizon = 24  # hours per daily forecast block
# Pull a few days around the day of interest to populate this test run
days_before = 0
days_after = 1
day_of_interest = 191  # July 10 (1-based day of year)
start_row = np.maximum(0, day_of_interest - days_before - 1) * wind_forecasting_horizon
end_row = np.minimum(365, day_of_interest + days_after) * wind_forecasting_horizon
wind_forecast_subset = wind_forecast_df.iloc[start_row:end_row, :]
# Get the year, month, day, and hour in order to mimic the RTS-GMLC formatting
temp = wind_forecast_subset.loc[:, 'Forecast_time']
# Shift back 6 hours (UTC-6) so the dates are consistent with local days
dates = pd.to_datetime(temp).dt.tz_localize("UTC") - datetime.timedelta(hours=6)

year = dates.dt.year
month = dates.dt.month
day = dates.dt.day
hours = np.tile(np.arange(1, wind_forecasting_horizon + 1), days_after - days_before + 1)
wnd_output_df_DA = pd.DataFrame({'Year': year, 'Month': month, 'Day': day, 'Period': hours})
# This has the first 4 columns set up; what remains is to populate it with the time series
# that correspond to the correct assets
wnd_output_df_DA
```

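|  | Year | Month | Day | Period |
|---|---|---|---|---|
| 4560 | 2018 | 7 | 10 | 1 |
| 4561 | 2018 | 7 | 10 | 2 |
| 4562 | 2018 | 7 | 10 | 3 |
| 4563 | 2018 | 7 | 10 | 4 |
| 4564 | 2018 | 7 | 10 | 5 |
| 4565 | 2018 | 7 | 10 | 6 |
| 4566 | 2018 | 7 | 10 | 7 |
| 4567 | 2018 | 7 | 10 | 8 |
| 4568 | 2018 | 7 | 10 | 9 |
| 4569 | 2018 | 7 | 10 | 10 |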
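Every cell in this notebook slices a year-long file by row position, so the day-of-year arithmetic is worth isolating. A minimal sketch with hypothetical helper names `day_of_year` and `row_window`, reproducing the start and stop rows used for the wind forecasts here and for the 48-hour load forecasts and shifted load actuals later on:

```python
import datetime

def day_of_year(year, month, day):
    """1-based day of year, e.g. July 10, 2018 -> 191."""
    return datetime.date(year, month, day).timetuple().tm_yday

def row_window(day_of_interest, days_before, days_after, rows_per_day, shift=0, days_in_year=365):
    """Start/stop positions for .iloc covering the requested days.

    Mirrors the notebook's slicing: the block for day d starts at
    (d - 1) * rows_per_day, optionally offset by `shift` rows.
    """
    start = shift + max(0, day_of_interest - days_before - 1) * rows_per_day
    stop = shift + min(days_in_year, day_of_interest + days_after) * rows_per_day
    return start, stop

doy = day_of_year(2018, 7, 10)
print(row_window(doy, 0, 1, 24))           # wind forecasts: (4560, 4608)
print(row_window(doy, 0, 0, 48))           # load forecasts: (9120, 9168)
print(row_window(doy, 0, 1, 24, shift=6))  # load actuals: (4566, 4614)
```

The start rows match the first index shown in each output table (4560, 9120, and 4566).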


```python
# Generate the forecasts for the appropriate wind assets
wnd_nm = 'WND (Wind)'
# Get the wind assets from gen
wind_gens = gen[gen['Fuel'] == wnd_nm]
wind_mappings.head()
```

|  | Texas7k BusNum | Texas7k GenID | Texas7k SubNum | Texas7k Max MW | Texas7k Min MW | EIA-860 Plant Code | EIA-860 Plant Name | EIA-860 Operating Year | EIA-860 Nameplate Capacity (MW) | NREL Wind Site | Mapping Status | Distribution Factor | NREL Capacity Proportion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 190193 | 1 | 3131 | 253.0 | 34.1 | 60902 | Dermott Wind | 2017 | 253.0 | Amazon Wind Farm Texas | 1 | 1.0 | 330.0 |
| 1 | 120493 | 1 | 1261 | 99.8 | 20.75 | 58000 | Anacacho Wind Farm LLC | 2012 | 99.8 | Anacacho Wind Farm | 1 | 1.0 | 129.0 |
| 2 | 160281 | 1 | 2424 | 188.0 | 46.66 | 57927 | Baffin Wind | 2014 | 188.0 | Baffin | 1 | 1.0 | 264.0 |
| 3 | 150496 | 1 | 2197 | 120.0 | 45.51 | 57156 | Barton Chapel Wind Farm | 2009 | 120.0 | Barton Chapel Wind Farm | 1 | 1.0 | 157.0 |
| 4 | 220216 | 1 | 3727 | 196.7 | 65.47 | 59972 | Bearkat | 2018 | 196.7 | Bearkat I | 1 | 1.0 | 257.0 |
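The scaling applied in the next cell is `Distribution Factor / NREL Capacity Proportion * Texas7k Max MW`. Assuming `NREL Capacity Proportion` holds the NREL site's capacity in MW, dividing by it turns the NREL forecast into a per-unit profile, which is then scaled to the Texas7k generator's maximum and its share of the site. Worked through for the first row of the table above:

```python
# Values from row 0 of wind_mappings (Dermott Wind -> Amazon Wind Farm Texas)
dist_factor = 1.0       # 'Distribution Factor'
nrel_capacity = 330.0   # 'NREL Capacity Proportion'
texas7k_max = 253.0     # 'Texas7k Max MW'

forecast_multiplier = dist_factor / nrel_capacity * texas7k_max

# An NREL forecast of 165 MW (half the NREL capacity) maps to half the Texas7k maximum
scaled = 165.0 * forecast_multiplier
print(round(forecast_multiplier, 4), round(scaled, 1))  # 0.7667 126.5
```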


```python
# Alias the output dataframe. This is a reference, not a copy, so the generator
# columns added below also appear in wnd_output_df_DA.
temp_output_df_DA = wnd_output_df_DA

# Count how many times each plant code has been used. Several Texas7k generators can
# share a plant code (and therefore map from the same NREL wind farm), so the count
# tells us which matching row of wind_mappings supplies the distribution factor,
# NREL capacity, and Texas7k max capacity used for scaling.
gen_codes = np.unique(wind_gens['EIA-860 Plant Code'])
plant_codes_num_used = dict.fromkeys(gen_codes, 0)

# Iterate across the rows of wind_gens
for _, gen_row in wind_gens.iterrows():
    gen_uid = gen_row['GEN UID']
    gen_code = gen_row['EIA-860 Plant Code']
    # All rows of wind_mappings with this plant code; there can be more than one
    matches = wind_mappings[wind_mappings['EIA-860 Plant Code'] == gen_code]
    if matches.empty:
        raise KeyError(f"No NREL wind site mapped to plant code {gen_code} ({gen_uid})")
    # If the code is non-unique (more than one Texas7k generator maps to the same NREL
    # site), pick the match by how many times the code has been used already (0-indexed);
    # otherwise take the single match. Selecting one row keeps every value below a scalar.
    k = plant_codes_num_used[gen_code] if len(matches) > 1 else 0
    mapping = matches.iloc[k]
    nrel_name = mapping['NREL Wind Site']
    texas7kmax = mapping['Texas7k Max MW']
    nrel_capacity = mapping['NREL Capacity Proportion']
    dist_factor = mapping['Distribution Factor']
    # Multiply the NREL forecast by this to scale it to the Texas7k generator
    forecast_multiplier = float(dist_factor / nrel_capacity * texas7kmax)
    # Assign to the output dataframe
    temp_output_df_DA[gen_uid] = wind_forecast_subset[nrel_name] * forecast_multiplier
    plant_codes_num_used[gen_code] += 1
```
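The `plant_codes_num_used` counter pairs the k-th Texas7k generator with a given plant code to the k-th matching row of `wind_mappings`. The same pairing can be expressed without an explicit counter using `groupby(...).cumcount()` and a merge. A sketch on toy data; the generator IDs, site names, and numbers are made up:

```python
import pandas as pd

# Toy stand-ins for wind_gens and wind_mappings; g1 and g2 share plant code 100.
wind_gens = pd.DataFrame({
    "GEN UID": ["g1", "g2", "g3"],
    "EIA-860 Plant Code": [100, 100, 200],
})
wind_mappings = pd.DataFrame({
    "EIA-860 Plant Code": [100, 100, 200],
    "NREL Wind Site": ["Site A", "Site A", "Site B"],
    "Texas7k Max MW": [50.0, 80.0, 120.0],
    "NREL Capacity Proportion": [100.0, 100.0, 150.0],
    "Distribution Factor": [0.4, 0.6, 1.0],
})

# Number each occurrence of a plant code (0, 1, ...) in row order in both tables,
# which is what the counter in the loop tracks.
gens = wind_gens.assign(occurrence=wind_gens.groupby("EIA-860 Plant Code").cumcount())
maps = wind_mappings.assign(occurrence=wind_mappings.groupby("EIA-860 Plant Code").cumcount())

# A left merge keeps the generators in their original order
paired = gens.merge(maps, on=["EIA-860 Plant Code", "occurrence"], how="left")
paired["forecast_multiplier"] = (
    paired["Distribution Factor"] / paired["NREL Capacity Proportion"] * paired["Texas7k Max MW"]
)
print(paired[["GEN UID", "NREL Wind Site", "forecast_multiplier"]])
```

Unlike the loop, a generator whose plant code has fewer mapping rows than generators gets `NaN` here instead of reusing the single row, which makes such mismatches easy to spot.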


Now we will handle the load forecasts. This is slightly trickier: NREL provides a 48-hour load forecast each day, as opposed to 24 hours for wind. Each day is therefore forecast twice, and we have to make sure we always pull from the correct forecast time.
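When consecutive 48-hour forecasts overlap, each target hour has two candidate values and we need a rule for which one to keep. One option is to select by lead time. A sketch on toy data; the `Issue_time` column and the `pick_forecasts` helper are hypothetical, and whether the first or the second 24 hours of each block is the right choice for day-ahead use depends on when NREL issues the forecasts:

```python
import pandas as pd

# Toy 48-hour forecasts issued at midnight on two consecutive days.
# 'Issue_time' is a hypothetical column name.
rows = []
for issue in pd.to_datetime(["2018-07-09", "2018-07-10"]):
    for lead in range(48):
        rows.append({
            "Issue_time": issue,
            "Forecast_time": issue + pd.Timedelta(hours=lead),
            "Coast": float(lead),  # placeholder value: the lead time itself
        })
forecasts = pd.DataFrame(rows)
# Lead time in whole hours between issue time and target time
forecasts["lead_h"] = (forecasts["Forecast_time"] - forecasts["Issue_time"]) // pd.Timedelta(hours=1)

def pick_forecasts(forecasts, days_ahead):
    """Keep rows whose lead time falls in day `days_ahead` of the horizon
    (0 -> hours 0-23, 1 -> hours 24-47), so each target hour appears once."""
    return forecasts[forecasts["lead_h"] // 24 == days_ahead].reset_index(drop=True)

day_ahead = pick_forecasts(forecasts, 1)
# July 10 now comes only from the forecast issued on July 9
jul10 = day_ahead[day_ahead["Forecast_time"].dt.strftime("%Y-%m-%d") == "2018-07-10"]
print(len(day_ahead), jul10["Issue_time"].nunique(), jul10["Coast"].min())  # 48 1 24.0
```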

```python
load_forecast_df = pd.read_csv("./NREL Stuff/load_day_ahead_forecast_zone_2018.csv")
load_forecasting_horizon = 48  # hours per daily forecast block
hours_in_day = 24
days_after_load = 0  # a single 48-hour block already covers two days
start_row = np.maximum(0, day_of_interest - days_before - 1) * load_forecasting_horizon
end_row = np.minimum(365, day_of_interest + days_after_load) * load_forecasting_horizon
load_forecast_subset = load_forecast_df.iloc[start_row:end_row, :]
temp = load_forecast_subset.loc[:, 'Forecast_time']
# Shift back 6 hours (UTC-6) for date consistency, as for wind
dates = pd.to_datetime(temp).dt.tz_localize("UTC") - datetime.timedelta(hours=6)

year = dates.dt.year
month = dates.dt.month
day = dates.dt.day
# Periods 1-24 repeat once for each day covered by the 48-hour block
hours = np.tile(np.arange(1, hours_in_day + 1),
                (days_after_load - days_before + 1) * (load_forecasting_horizon // hours_in_day))
load_output_df_DA = pd.DataFrame({'Year': year, 'Month': month, 'Day': day, 'Period': hours})
# This has the first 4 columns set up; what remains is to populate it with the zonal load
# series (every column after the first two, which are not zones)
zones = load_forecast_subset.columns[2:]
load_output_df_DA[zones] = load_forecast_subset.loc[:, zones]
load_output_df_DA
```

|  | Year | Month | Day | Period | Coast | East | Far_West | North | North_Central | South_Central | Southern | West |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9120 | 2018 | 7 | 10 | 1 | 10807.822266 | 1383.214722 | 2664.858643 | 904.42926 | 12928.860352 | 5857.957031 | 3002.57666 | 1207.646118 |
| 9121 | 2018 | 7 | 10 | 2 | 10172.105469 | 1344.203613 | 2643.786377 | 876.927063 | 12715.223633 | 5793.959473 | 2979.704834 | 1207.219727 |
| 9122 | 2018 | 7 | 10 | 3 | 10572.186523 | 1358.227051 | 2635.102051 | 869.612366 | 12978.654297 | 5812.027832 | 2874.006104 | 1199.040649 |
| 9123 | 2018 | 7 | 10 | 4 | 10316.332031 | 1385.50293 | 2644.237061 | 878.220581 | 13113.492188 | 5702.532715 | 2955.104736 | 1204.137817 |
| 9124 | 2018 | 7 | 10 | 5 | 10982.30957 | 1475.664062 | 2676.032959 | 941.527161 | 13973.3125 | 6416.700684 | 3016.244141 | 1221.724854 |
| 9125 | 2018 | 7 | 10 | 6 | 11064.65918 | 1574.629028 | 2715.373779 | 963.000671 | 14286.724609 | 6472.459473 | 3156.372559 | 1275.190063 |
| 9126 | 2018 | 7 | 10 | 7 | 11990.109375 | 1655.611328 | 2756.442383 | 1058.470703 | 15152.405273 | 6882.882812 | 3329.994629 | 1327.799683 |
| 9127 | 2018 | 7 | 10 | 8 | 13135.856445 | 1792.708862 | 2802.72998 | 1150.734375 | 15726.408203 | 7266.425293 | 3541.91333 | 1370.216553 |
| 9128 | 2018 | 7 | 10 | 9 | 14080.686523 | 1918.43396 | 2868.10376 | 1167.906372 | 16852.880859 | 7742.043945 | 3967.829346 | 1409.932617 |
| 9129 | 2018 | 7 | 10 | 10 | 14679.250977 | 1981.028198 | 2853.259521 | 1254.248291 | 17572.84375 | 8193.204102 | 4202.567383 | 1471.746826 |


Now we will handle the load actuals, which populate the real-time (RT) load table.

```python
load_actuals_df = pd.read_csv("./NREL Stuff/load_actual_1h_2018.csv")
load_actuals_horizon = 24  # hourly actuals: 24 rows per day
days_after_load_actuals = 1
# Offset of 6 rows: this file runs from 6 pm Dec 31, 2017 to 6 pm Dec 31, 2018 (local time),
# so the first 6 rows (6 pm to midnight) belong to 2017
shift = 6
start_row = shift + np.maximum(0, day_of_interest - days_before - 1) * load_actuals_horizon
end_row = shift + np.minimum(365, day_of_interest + days_after_load_actuals) * load_actuals_horizon
load_actuals_subset = load_actuals_df.iloc[start_row:end_row, :]
temp = load_actuals_subset.loc[:, 'Time']
# Shift back 6 hours (UTC-6), as above
dates = pd.to_datetime(temp).dt.tz_localize("UTC") - datetime.timedelta(hours=6)
year = dates.dt.year
month = dates.dt.month
day = dates.dt.day
hours = np.tile(np.arange(1, hours_in_day + 1), days_after_load_actuals - days_before + 1)
load_output_df_RT = pd.DataFrame({'Year': year, 'Month': month, 'Day': day, 'Period': hours})
# This has the first 4 columns set up; what remains is to populate it with the zonal load
# series (every column after the timestamp column)
zones = load_actuals_subset.columns[1:]
load_output_df_RT[zones] = load_actuals_subset.loc[:, zones]
print(load_output_df_RT)
```

|  | Year | Month | Day | Period | Coast | East | Far_West |
|---|---|---|---|---|---|---|---|
| 4566 | 2018 | 7 | 10 | 1 | 10462.982178 | 1347.273488 | 2631.696757 |
| 4567 | 2018 | 7 | 10 | 2 | 10337.150553 | 1330.130310 | 2633.568746 |
| 4568 | 2018 | 7 | 10 | 3 | 10490.323242 | 1334.708130 | 2628.633952 |
| 4569 | 2018 | 7 | 10 | 4 | 10921.807861 | 1423.182434 | 2651.891907 |
| 4570 | 2018 | 7 | 10 | 5 | 11220.314779 | 1463.707967 | 2688.754069 |
| 4571 | 2018 | 7 | 10 | 6 | 11895.974609 | 1563.610870 | 2700.949483 |
| 4572 | 2018 | 7 | 10 | 7 | 12949.113770 | 1723.209218 | 2728.428507 |
| 4573 | 2018 | 7 | 10 | 8 | 14063.087321 | 1849.736176 | 2784.630656 |
| 4574 | 2018 | 7 | 10 | 9 | 15165.322591 | 1998.541453 | 2838.836283 |
| 4575 | 2018 | 7 | 10 | 10 | 16117.284993 | 2125.158040 | 2930.400594 |
| 4576 | 2018 | 7 | 10 | 11 | 17011.923503 | 2230.984945 | 3024.369690 |
| 4577 | 2018 | 7 | 10 | 12 | 17580.604980 | 2298.551310 | 3082.517293 |
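The 6-row shift and the fixed 6-hour offset both encode the same time-zone assumption. A sketch, assuming the `Time` column is in UTC and starts at 2018-01-01 00:00 UTC (6 pm on Dec 31, 2017 in Central Standard Time), that selects July 10-11 by day of year instead of by row offset, and compares the fixed UTC-6 offset with true local time in `America/Chicago`, which is UTC-5 during daylight saving time. The data is a toy stand-in for the NREL file:

```python
import numpy as np
import pandas as pd

# Toy hourly actuals; 'Time' is assumed to be UTC, starting 2018-01-01 00:00 UTC
start = pd.Timestamp("2018-01-01 00:00", tz="UTC")
actuals = pd.DataFrame({
    "Time": start + pd.to_timedelta(np.arange(24 * 365), unit="h"),
    "Coast": np.arange(24 * 365, dtype=float),
})

# Fixed 6-hour shift, as in the notebook (Central Standard Time all year)
cst = actuals["Time"] - pd.Timedelta(hours=6)
# DST-aware local clock time: CST (UTC-6) in winter, CDT (UTC-5) in summer
local = actuals["Time"].dt.tz_convert("America/Chicago")

# Select July 10-11 (days 191-192) by date rather than by row offset
day_of_interest, days_after = 191, 1
mask = (cst.dt.year == 2018) & cst.dt.dayofyear.between(day_of_interest, day_of_interest + days_after)
window = actuals[mask]

first = window.index[0]
print(first, len(window))                  # 4566 48
print(cst[first].hour, local[first].hour)  # 0 1
```

The date-based selection lands on row 4566, the same row the `shift = 6` arithmetic produces, but in summer the local clock hour differs from the fixed-offset hour by one. Which convention the RTS-GMLC periods should follow is the time-zone question still listed as open.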

What remains:
* Wind actuals (should repurpose the wind code above)
* Solar forecasts and actuals (should repurpose the wind code above)
* Verify that these work properly: the output has the correct format, reconciles with the source data, and lands in the correct rows
* Verify our understanding of the time zones
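The wind cell's opening comment suggests turning it into a function that applies to all asset types, which would also cover the wind actuals and solar items above. A sketch of such a function, plus a basic check toward the verification item; the names `build_rts_frame` and `check_rts_frame` are hypothetical, and the toy data stands in for the NREL files:

```python
import numpy as np
import pandas as pd

def build_rts_frame(df, time_col, value_cols, day_of_interest, days_before, days_after,
                    rows_per_day=24, shift=0, utc_offset_hours=6):
    """Slice the rows for the requested days and lay them out as Year/Month/Day/Period
    plus one column per asset or zone, following the cells above."""
    start = shift + max(0, day_of_interest - days_before - 1) * rows_per_day
    stop = shift + min(365, day_of_interest + days_after) * rows_per_day
    subset = df.iloc[start:stop]
    if len(subset) % 24:
        raise ValueError(f"expected whole days of hourly rows, got {len(subset)} rows")
    # Fixed offset, as in the cells above (no daylight saving time)
    dates = pd.to_datetime(subset[time_col]) - pd.Timedelta(hours=utc_offset_hours)
    out = pd.DataFrame({
        "Year": dates.dt.year,
        "Month": dates.dt.month,
        "Day": dates.dt.day,
        "Period": np.tile(np.arange(1, 25), len(subset) // 24),
    })
    for col in value_cols:
        out[col] = subset[col]  # aligned on the original row index
    return out

def check_rts_frame(out):
    """True if every calendar day in the frame has Periods 1-24 exactly once."""
    return all(
        sorted(group["Period"]) == list(range(1, 25))
        for _, group in out.groupby(["Year", "Month", "Day"])
    )

# Toy hourly file: timestamps start at 06:00 so that, after the 6-hour shift, row 0 is midnight Jan 1
start_time = pd.Timestamp("2018-01-01 06:00")
toy = pd.DataFrame({
    "Forecast_time": start_time + pd.to_timedelta(np.arange(24 * 365), unit="h"),
    "Site A": np.linspace(0.0, 1.0, 24 * 365),
})
out = build_rts_frame(toy, "Forecast_time", ["Site A"], day_of_interest=191, days_before=0, days_after=1)
print(out.head())
print(check_rts_frame(out))
```

For the 48-hour load forecasts the same call would use `rows_per_day=48` with `days_after=0`, and for the load actuals `shift=6`; `check_rts_frame` fails if an offset pushes periods across a day boundary.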