## Creating timeseries

This notebook walks through mapping the NREL time series data provided to us by Xinshuo onto the Texas7k dataset given to us by Texas A&M University. Here, we will focus on **wind** and **load**; solar is handled the same way as wind in the Solar section at the end. Load needs little work on this end, since its time series is already provided to us, and the allocation from the zonal level to the bus level will occur in our modified version of rts_gmlc.py.

In [2]:
import pandas as pd
import numpy as np
import datetime

# import the files we will need (excluding the timeseries file given to us by Xinshuo)
bus = pd.read_csv("./TX_Data/SourceData/bus.csv")
branch = pd.read_csv("./TX_Data/SourceData/branch.csv")
gen = pd.read_csv("./TX_Data/SourceData/gen.csv")
wind_mappings = pd.read_csv("./NREL Stuff/Texas7k_NREL_wind_map.csv")

In [41]:
# start with wind forecast for now.... can turn this into a function that applies to all the asset types later
wind_forecast_df = pd.read_csv("./NREL Stuff/wind_day_ahead_forecast_site_2018.csv")
wind_actual_df = pd.read_csv("./NREL Stuff/wind_actual_1h_site_2018.csv")

# change this to just save out the whole year
wind_forecasting_horizon = 24
# we will get some days around the day that we are interested in to populate for this test run
days_before = 1
days_after = 1
day_of_interest = 191 # july 10
#wind_forecast_subset = wind_forecast_df.iloc[np.maximum(0,(day_of_interest-days_before - 1))*wind_forecasting_horizon:np.minimum(365,(day_of_interest+days_after))*wind_forecasting_horizon,:]
#wind_actual_subset = wind_actual_df.iloc[np.maximum(0,(day_of_interest-days_before-1))*wind_forecasting_horizon:np.minimum(365,(day_of_interest+days_after))*wind_forecasting_horizon,:]
# getting the year, month, day, and hour in order to mimic the RTS formatting
#temp = wind_forecast_subset.loc[:,'Forecast_time']
# dates = pd.to_datetime(temp).dt.tz_localize("UTC") - datetime.timedelta(hours=6) #pull 6 hours for date consistency reasons
# wind_forecast_subset.loc[:,'Forecast_time'] = pd.to_datetime(temp.loc[:,:]).

wind_forecast_subset = wind_forecast_df
# has a few extra hours without Forecasts
wind_actual_subset = wind_actual_df.iloc[6:-18]
wind_actual_subset = wind_actual_subset.reset_index()
# adjusted for full year -> cut out the hours that lacked forecasts on either end because I don't think it matters -ER
dates = pd.Series(pd.date_range(start="2018-01-01 06:00:00", end="2018-12-31 05:00:00", freq='H'))
#year = dates.dt.year
#month = dates.dt.month
#day = dates.dt.day
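# hour-of-day label for each timestamp (np.tile(..., 365) would give 8760 entries starting at hour 0,
# but dates spans 364 days and starts at 06:00)
hours = dates.dt.hour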
# this has the first 4 columns set up - what remains is to populate with the appropriate time series which correspond
# to the correct assets

      index                 Time  Aguayo Wind   Ajax Wind  \
0         6  2018-01-01 06:00:00    61.429666  137.497501   
1         7  2018-01-01 07:00:00    51.727666  125.475001   
2         8  2018-01-01 08:00:00    46.811333   98.070000   
3         9  2018-01-01 09:00:00    54.863667  111.772500   
4        10  2018-01-01 10:00:00    49.277666  205.852500   
...     ...                  ...          ...         ...   
8730   8736  2018-12-31 00:00:00     0.000000  249.794998   
8731   8737  2018-12-31 01:00:00     0.000000  280.822499   
8732   8738  2018-12-31 02:00:00     0.000000  262.185001   
8733   8739  2018-12-31 03:00:00     0.000000  251.527499   
8734   8740  2018-12-31 04:00:00     2.646000  250.320000   

      Amazon Wind Farm Texas  Anacacho Wind Farm  \
0                  85.366416              8.2500   
1                  59.939917              9.5535   
2                  71.830917              8.0850   
3                  67.993751              5.2965   
4      

In [36]:
# need to generate the forecasts for the appropriate wind assets 
wnd_nm = 'WND (Wind)'
#get the wind_assets of gen
wind_gens = gen[gen['Fuel'] == wnd_nm]
wind_mappings.head()

Unnamed: 0.1,Unnamed: 0,Texas7k BusNum,Texas7k GenID,Texas7k SubNum,Texas7k Max MW,Texas7k Min MW,EIA-860 Plant Code,EIA-860 Plant Name,EIA-860 Operating Year,EIA-860 Nameplate Capacity (MW),NREL Wind Site,Mapping Status,Distribution Factor,NREL Capacity Proportion,GEN UID
0,0,190193,1,3131,253.0,34.1,60902,Dermott Wind,2017,253.0,Amazon Wind Farm Texas,1,1.0,330.0,60902_OnshoreWindTurbine_1
1,1,120493,1,1261,99.8,20.75,58000,Anacacho Wind Farm LLC,2012,99.8,Anacacho Wind Farm,1,1.0,129.0,58000_OnshoreWindTurbine_1
2,2,160281,1,2424,188.0,46.66,57927,Baffin Wind,2014,188.0,Baffin,1,1.0,264.0,57927_OnshoreWindTurbine_1
3,3,150496,1,2197,120.0,45.51,57156,Barton Chapel Wind Farm,2009,120.0,Barton Chapel Wind Farm,1,1.0,157.0,57156_OnshoreWindTurbine_1
4,4,220216,1,3727,196.7,65.47,59972,Bearkat,2018,196.7,Bearkat I,1,1.0,257.0,59972_OnshoreWindTurbine_1


In [42]:
# creates a temporary dataframe for the output
temp_output_df_DA = pd.DataFrame({"Year": dates.dt.year, "Month": dates.dt.month, "Day": dates.dt.day, "Period": dates.dt.hour})
temp_output_df_AC = pd.DataFrame({"Year": dates.dt.year, "Month": dates.dt.month, "Day": dates.dt.day, "Period": dates.dt.hour})
# creates a dictionary counting the number of times each plant code has been used so far
# this is necessary because we need to make sure we're pulling the correct distribution factor, nrel capacity, and
# texas7k max capacity when scaling in situations where multiple Texas7k generators have the same plant code and
# therefore map from the same NREL wind farm
gen_codes = np.unique(wind_gens['EIA-860 Plant Code'])
plant_codes_num_used = {code: 0 for code in gen_codes}

# will essentially iterate across the rows of wind_gens
for i in np.arange(wind_gens.shape[0]):
    # finds the gen uid for the associated row, as well as the plant code
    gen_uid = wind_gens.iloc[i]['GEN UID']
    gen_code = wind_gens.iloc[i]['EIA-860 Plant Code']
    # finds the NREL site name(s) in wind_mappings whose plant code matches. this can return more than one row
    nrel_name = wind_mappings[wind_mappings['EIA-860 Plant Code'] == gen_code]['NREL Wind Site']
    # if the name is non-unique (i.e. more than 1 7k generator maps to the same NREL generator), choose the row
    # based on the number of times the plant code has been used already (recall python is 0 indexed);
    # otherwise take the single match
    k = plant_codes_num_used[gen_code] if nrel_name.size > 1 else 0
    # row label of the chosen match in wind_mappings, used to pull the distribution factor and capacities
    mapping_idx = nrel_name.index[k]
    nrel_name = nrel_name.iloc[k]
    # based on the mapping above, pull the 7k max, NREL capacity, and distribution factor as scalars
    texas7kmax = wind_mappings.loc[mapping_idx, 'Texas7k Max MW']
    nrel_capacity = wind_mappings.loc[mapping_idx, 'NREL Capacity Proportion']
    dist_factor = wind_mappings.loc[mapping_idx, 'Distribution Factor']
    # will multiply the forecast by the below to scale it for texas 7k
    forecast_multiplier = float(dist_factor / nrel_capacity * texas7kmax)
    # assign to the output dataframes (the same multiplier scales both the forecasts and the actuals)
    temp_output_df_DA[gen_uid] = wind_forecast_subset[nrel_name] * forecast_multiplier
    temp_output_df_AC[gen_uid] = wind_actual_subset[nrel_name] * forecast_multiplier
    plant_codes_num_used[gen_code] += 1
temp_output_df_DA.to_csv("./TX_Data/timeseries_data_files/WIND/DAY_AHEAD_wind.csv", index=False)
temp_output_df_AC.to_csv("./TX_Data/timeseries_data_files/WIND/REAL_TIME_wind.csv", index=False)
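
The scaling applied inside the loop above can be checked by hand on a toy case. This is a minimal sketch, not the notebook's actual data: the site name and every number below are made up for illustration, and reading `Distribution Factor` as the share of the NREL site assigned to one Texas7k generator is an assumption.

```python
import pandas as pd

# Hypothetical NREL site output in MW; the name and values are illustrative only.
nrel_site_output = pd.Series([100.0, 150.0, 0.0], name="Toy Wind Site")

# Hypothetical values standing in for one row of the mappings file.
dist_factor = 0.5      # 'Distribution Factor' (assumed: share of the site given to this generator)
nrel_capacity = 200.0  # 'NREL Capacity Proportion'
texas7k_max = 80.0     # 'Texas7k Max MW'

# Same arithmetic as forecast_multiplier in the loop above.
forecast_multiplier = dist_factor / nrel_capacity * texas7k_max
scaled = nrel_site_output * forecast_multiplier

print(forecast_multiplier)
print(scaled.tolist())
```

With these toy numbers the multiplier comes out to roughly 0.2, so an NREL value of 100 MW becomes about 20 MW for the Texas7k generator.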

Now we will handle the load forecasts. This is slightly trickier, as NREL provides 48 hours of forecasts for loads, as opposed to 24. This means that each day is forecast twice, and we have to be careful to always pull from the correct forecast. The cell below drops the first forecast for each duplicated `Forecast_time` and keeps the last one.
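
To see what the de-duplication does, here is a minimal sketch on a toy frame. The column names `Issue_time`, `Forecast_time`, and `Coast` follow the load file, but the values are placeholders, and the sketch assumes rows are ordered by issue time so that the last occurrence of a `Forecast_time` is the most recently issued forecast.

```python
import pandas as pd

# Toy forecasts: two issue times with overlapping horizons (placeholder labels and values).
toy = pd.DataFrame({
    "Issue_time":    ["d0", "d0", "d0", "d0", "d1", "d1", "d1", "d1"],
    "Forecast_time": ["h0", "h1", "h2", "h3", "h2", "h3", "h4", "h5"],
    "Coast":         [1.0, 2.0, 3.0, 4.0, 30.0, 40.0, 50.0, 60.0],
})

# Mark every occurrence of a Forecast_time except the last one as a duplicate.
dup = toy.duplicated(subset="Forecast_time", keep="last")

# Keep only the last forecast for each Forecast_time.
latest = toy[~dup.values].reset_index(drop=True)

print(latest)
```

Hours `h2` and `h3` appear under both issue times; only the `d1` rows survive for them, which is the behavior the comment "delete the first forecast for load" describes.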

In [71]:
load_forecast_df = pd.read_csv("./NREL Stuff/load_day_ahead_forecast_zone_2018.csv")
load_forecasting_horizon = 24
hours_in_day = 24
days_after_load = 1
# delete the first forecast for load
dup = load_forecast_df.duplicated(subset = "Forecast_time", keep="last")
load_forecast_df = load_forecast_df[~dup.values]
# load_forecast_subset = load_forecast_df.iloc[np.maximum(0,(day_of_interest-days_before - 1))*load_forecasting_horizon:np.minimum(365,(day_of_interest+days_after_load))*load_forecasting_horizon,:]
# temp = load_forecast_subset.loc[:,'Forecast_time']
# dates = pd.to_datetime(temp).dt.tz_localize("UTC")- datetime.timedelta(hours=6) #pull 6 hours for date consistency reasons
# wind_forecast_subset.loc[:,'Forecast_time'] = pd.to_datetime(temp.loc[:,:]).

load_forecast_subset = load_forecast_df.drop("Issue_time", axis=1)
load_forecast_subset = load_forecast_subset.reset_index()
dates = pd.Series(pd.date_range(start="2018-01-01 06:00:00", end="2018-12-31 05:00:00", freq='H'))

year = dates.dt.year
month = dates.dt.month
day = dates.dt.day
hours = dates.dt.hour
load_output_df_DA = pd.DataFrame({'Year': year, 'Month':month, 'Day':day, 'Period':hours})
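# this has the first 4 columns set up - what remains is to populate with the appropriate time series which correspond
# to the correct zones
# note: despite the name, this slice also carries the Forecast_time column into the output (see the printed head below)
zones = load_forecast_subset.columns[2:]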
load_output_df_DA[zones] = load_forecast_subset.loc[:, zones]
print(load_output_df_DA.head(50))
load_output_df_DA.to_csv("./TX_Data/timeseries_data_files/LOAD/DAY_AHEAD_regional_Load.csv", index=False)

    Year  Month  Day  Period        Forecast_time         Coast         East  \
0   2018      1    1       6  2018-01-01 06:00:00  10917.600586  2056.409668   
1   2018      1    1       7  2018-01-01 07:00:00  11253.848633  2100.552734   
2   2018      1    1       8  2018-01-01 08:00:00  11185.164062  2098.724854   
3   2018      1    1       9  2018-01-01 09:00:00  11436.691406  2147.535156   
4   2018      1    1      10  2018-01-01 10:00:00  12384.709961  2272.806641   
5   2018      1    1      11  2018-01-01 11:00:00  12793.460938  2318.131348   
6   2018      1    1      12  2018-01-01 12:00:00  12855.050781  2335.874756   
7   2018      1    1      13  2018-01-01 13:00:00  12763.123047  2332.282227   
8   2018      1    1      14  2018-01-01 14:00:00  13190.888672  2304.929443   
9   2018      1    1      15  2018-01-01 15:00:00  13198.246094  2273.939453   
10  2018      1    1      16  2018-01-01 16:00:00  13044.884766  2218.216797   
11  2018      1    1      17  2018-01-01

Now we will do load actuals.

In [77]:
load_actuals_df = pd.read_csv("./NREL Stuff/load_actual_1h_2018.csv")
# load_actuals_horizon = 24
# days_after_load_actuals = 0
# shift = 6 # shift because of a slight inconsistency; these files go from 6 pm Dec 31 2017 to 6 pm Dec 31 2018 (in local time)
# load_actuals_subset = load_actuals_df.iloc[shift + np.maximum(0,(day_of_interest-days_before - 1))*load_actuals_horizon: shift + np.minimum(365,(day_of_interest+days_after_load_actuals))*load_actuals_horizon,:]
# temp = load_actuals_subset.loc[:,'Time']
# dates = pd.to_datetime(temp).dt.tz_localize("UTC") - datetime.timedelta(hours=6)

load_actuals_subset = load_actuals_df.iloc[6:-18].reset_index()
year = dates.dt.year
month = dates.dt.month
day = dates.dt.day
hours = dates.dt.hour
load_output_df_RT = pd.DataFrame({'Year': year, 'Month':month, 'Day':day, 'Period':hours})
# this has the first 4 columns set up - what remains is to populate with the appropriate time series which correspond
# to the correct assets
zones = load_actuals_subset.columns[1:]
load_output_df_RT[zones] = load_actuals_subset.loc[:, zones]
load_output_df_RT.to_csv("./TX_Data/timeseries_data_files/LOAD/REAL_TIME_regional_Load.csv", index=False)
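
Because the trimmed actuals are paired with `dates` purely by row position, it is worth checking that the `iloc[6:-18]` trim really lines up with the date range. A minimal sketch of such a check on a toy three-day frame follows; the frame, its `Coast` values, and the variable names are hypothetical, and only the trim and the 06:00 to 05:00 window mirror the cell above.

```python
import pandas as pd

one_hour = pd.Timedelta(hours=1)

# Hypothetical hourly actuals covering three full days (72 rows).
raw_times = pd.date_range("2018-01-01 00:00:00", "2018-01-03 23:00:00", freq=one_hour)
toy_actuals = pd.DataFrame({"Time": raw_times, "Coast": list(range(len(raw_times)))})

# Same trim as above: drop the first 6 and the last 18 hours.
trimmed = toy_actuals.iloc[6:-18].reset_index(drop=True)

# The window the trimmed rows are expected to cover.
expected_times = pd.Series(
    pd.date_range("2018-01-01 06:00:00", "2018-01-03 05:00:00", freq=one_hour)
)

same_length = len(trimmed) == len(expected_times)
same_stamps = same_length and bool((trimmed["Time"] == expected_times).all())
print(len(trimmed), len(expected_times), same_stamps)
```

Running the same two comparisons against the real `load_actuals_subset` and `dates` would flag any off-by-one in the trim before the CSV is written.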

### Solar

In [48]:
# import the solar mappings file
sol_mappings = pd.read_csv("./NREL Stuff/solar_meta.csv")

# start with solar forecast for now.... can turn this into a function that applies to all the asset types later
sol_forecast_df = pd.read_csv("./NREL Stuff/solar_day_ahead_forecast_site_2018.csv")
sol_actual_df = pd.read_csv("./NREL Stuff/solar_actual_1h_site_2018.csv")
sol_forecasting_horizon = 24

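# the day-of-interest settings below are ignored for the purposes of the full-year run
# (they were used to pull some days around the day of interest for a test run)
days_before = 1
days_after = 1
day_of_interest = 190 # note: the wind cell uses 191 for july 10; counting from 1, day 190 is july 9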
#sol_forecast_subset = sol_forecast_df.iloc[np.maximum(0,(day_of_interest-days_before - 1))*sol_forecasting_horizon:np.minimum(365,(day_of_interest+days_after))*sol_forecasting_horizon,:]
#sol_actual_subset = sol_actual_df.iloc[np.maximum(0,(day_of_interest-days_before-1))*sol_forecasting_horizon:np.minimum(365,(day_of_interest+days_after))*sol_forecasting_horizon,:]

sol_forecast_subset = sol_forecast_df

sol_actual_subset = sol_actual_df.iloc[30:-18]
sol_actual_subset = sol_actual_subset.reset_index()


# getting the year, month, day, and hour in order to mimic the RTS formatting
#temp = sol_forecast_subset.loc[:,'Forecast_time']
#dates = pd.to_datetime(temp).dt.tz_localize("UTC") - datetime.timedelta(hours=6) #pull 6 hours for date consistency reasons

dates = pd.Series(pd.date_range(start="2018-01-02 06:00:00", end="2018-12-31 05:00:00", freq='H'))

year = dates.dt.year
month = dates.dt.month
day = dates.dt.day
hours = dates.dt.hour
sol_output_df_DA = pd.DataFrame({'Year': year, 'Month':month, 'Day':day, 'Period':hours})
# this has the first 4 columns set up - what remains is to populate with the appropriate time series which correspond
# to the correct assets
#sol_output_df_DA

In [50]:
# need to generate the forecasts for the appropriate solar assets 
sol_nm = 'SUN (Solar)'
# get the solar assets of gen
sol_gens = gen[gen['Fuel'] == sol_nm]
# creates a temporary dataframe for the output
temp_output_df_DA = pd.DataFrame({'Year': year, 'Month':month, 'Day':day, 'Period':hours})
temp_output_df_AC = pd.DataFrame({'Year': year, 'Month':month, 'Day':day, 'Period':hours})

# creates a dictionary counting the number of times each plant code has been used so far
# this is necessary because we need to make sure we're pulling the correct distribution factor, nrel capacity, and
# texas7k max capacity when scaling in situations where multiple Texas7k generators have the same plant code and
# therefore map from the same NREL solar farm
gen_codes = np.unique(sol_gens['EIA-860 Plant Code'])
plant_codes_num_used = {code: 0 for code in gen_codes}

# will essentially iterate across the rows of sol_gens
for i in np.arange(sol_gens.shape[0]):
    # finds the gen uid for the associated row, as well as the plant code
    gen_uid = sol_gens.iloc[i]['GEN UID']
    gen_code = sol_gens.iloc[i]['EIA-860 Plant Code']
    # finds the NREL site id(s) in sol_mappings whose plant ID matches. this can return more than one row
    nrel_name = sol_mappings[sol_mappings['Plant ID'] == gen_code]['site_ids']
    # assets that have no NREL mapping get all-zero output
    if nrel_name.empty:
        temp_output_df_DA[gen_uid] = np.zeros(len(sol_forecast_subset))
        temp_output_df_AC[gen_uid] = np.zeros(len(sol_actual_subset))
        continue
    # if the name is non-unique (i.e. more than 1 7k generator maps to the same NREL generator), choose the row
    # based on the number of times the plant code has been used already (recall python is 0 indexed);
    # otherwise take the single match
    k = plant_codes_num_used[gen_code] if nrel_name.size > 1 else 0
    # row label of the chosen match in sol_mappings, used to pull the distribution factor
    mapping_idx = nrel_name.index[k]
    nrel_name = nrel_name.iloc[k]
    # use the PMax MW value from texas 7k because the max MW isn't present in the mappings file
    texas7kmax = sol_gens.iloc[i]['PMax MW']
    # set nrel_capacity to be equal to 7k max as a reasonable estimate (so the capacity ratio is 1)
    nrel_capacity = sol_gens.iloc[i]['PMax MW']
    # all of these were just set to 1 as a reasonable estimate
    dist_factor = sol_mappings.loc[mapping_idx, 'Distribution Factor']
    # will multiply the forecast by the below to scale it for texas 7k
    forecast_multiplier = float(dist_factor / nrel_capacity * texas7kmax)
    # assign to the output dataframes (the same multiplier scales both the forecasts and the actuals)
    temp_output_df_DA[gen_uid] = sol_forecast_subset[nrel_name] * forecast_multiplier
    temp_output_df_AC[gen_uid] = sol_actual_subset[nrel_name] * forecast_multiplier
    plant_codes_num_used[gen_code] += 1
temp_output_df_DA.to_csv("./TX_Data/timeseries_data_files/PV/DAY_AHEAD_pv.csv", index=False)
temp_output_df_AC.to_csv("./TX_Data/timeseries_data_files/PV/REAL_TIME_pv.csv", index=False)

What remains:
* Verify that these are working properly (correct format of output, make sure output reconciles with what it should be, and corresponds to the correct rows)
* Verify understanding of time zones - ideally, we should not need any conversion, but in RTS as currently set up, it seems somewhat necessary. We can potentially change this ourselves down the line
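
On the time zone point, one thing to check is whether a fixed six-hour shift (as in the commented-out code in the cells above) matches local time all year. The sketch below compares that shift with a daylight-saving-aware conversion. It assumes the NREL timestamps are in UTC, as the commented-out `tz_localize("UTC")` calls suggest, and the choice of `America/Chicago` as the local zone is our assumption, not something the data files state.

```python
import pandas as pd

# Two sample timestamps, one in winter and one in summer, treated as UTC (assumption).
utc_times = pd.Series(
    pd.to_datetime(["2018-01-01 06:00:00", "2018-07-10 06:00:00"])
).dt.tz_localize("UTC")

# Fixed six-hour shift, as in the commented-out code.
fixed_shift = utc_times - pd.Timedelta(hours=6)

# Daylight-saving-aware conversion to US Central time (assumed local zone).
central = utc_times.dt.tz_convert("America/Chicago")

print(fixed_shift.dt.hour.tolist())
print(central.dt.hour.tolist())
```

The two agree in January but differ by one hour in July, so the fixed shift corresponds to standard time year-round. Whether that is what RTS expects is exactly the open question in the list above.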
