Jenna Ruzekowicz (jenna.ruzekowicz@nrel.gov) and Caleb Phillips (caleb.phillips@nrel.gov)

The purpose of this notebook is to read in two sets of data over the same time period for comparison:
1) WTK-LED data
2) Either: 
    a) power output and/or wind speed data from turbine(s) 
    b) wind speed measurements from met tower(s)

Data sets are matched on location and time stamp.

A combined and labeled dataframe will be exported to a csv file in the "01Data" folder. The naming convention for the csv file will be as follows: source_lat_lon_startdate_enddate.csv
where source is either "bergey", "oneenergy"...
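
A minimal sketch of how the naming convention above could be applied at export time. The helper name `build_export_name`, the `YYYYMMDD` date format, and the six-decimal coordinates are assumptions for illustration; the notebook does not define them.

```python
from datetime import date

def build_export_name(source, lat, lon, start, end, decimals=6):
    """Build 'source_lat_lon_startdate_enddate.csv' (hypothetical helper)."""
    # Assumption: dates are written as YYYYMMDD. Any object with strftime
    # (date, datetime, pandas Timestamp) works here.
    start_str = start.strftime("%Y%m%d")
    end_str = end.strftime("%Y%m%d")
    # Assumption: coordinates are written with a fixed number of decimals.
    return f"{source}_{lat:.{decimals}f}_{lon:.{decimals}f}_{start_str}_{end_str}.csv"

example_name = build_export_name(
    "oneenergy", 40.591555, -83.182092, date(2018, 1, 1), date(2018, 2, 28)
)
print(example_name)
```

The resulting name could then be joined with the "01Data" folder and passed to `DataFrame.to_csv`.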

Notes: 
- Potential issue: the file name can become a really long string
- Simpler naming, maybe numbering? 
- Number the cases and keep a file as a reference sheet
- Metadata: a global-view dataframe
- Lat, lon, heights, stage of analysis
- What has been processed and what has not? 

Notes: 
Might need to install Rex if it isn't installed already:
conda install nrel-rex --channel=nrel

More about rex: https://github.com/NREL/rex
2018 5-min monthly h5 (the file you referenced on the 21st):
/campaign/tap/CONUS/wtk/5min/2018/{month}/conus_2018-{month}.h5
 
2018 5-min yearly h5 slices:
/shared-projects/wtk-led/CONUS/wtk/2018/yearly_h5/conus_2018_{height}m.h5
 
2019 60-min yearly h5:
/campaign/tap/CONUS/wtk/60min/2019/conus_2019.h5
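
A small sketch of how the `{month}` and `{height}` placeholders in the paths above might be filled in. The zero-padded two-digit month matches how the file list is built later in this notebook; the constant and function names are hypothetical, and whether a yearly slice exists for a given height (80 m is used here) is an assumption.

```python
# Path templates copied from the notes above.
MONTHLY_5MIN = "/campaign/tap/CONUS/wtk/5min/2018/{month}/conus_2018-{month}.h5"
YEARLY_5MIN = "/shared-projects/wtk-led/CONUS/wtk/2018/yearly_h5/conus_2018_{height}m.h5"

def monthly_files(first_month, last_month):
    """Return the monthly 5-min file paths for an inclusive month range."""
    return [
        MONTHLY_5MIN.format(month=f"{m:02d}")  # zero-pad: 1 -> "01"
        for m in range(first_month, last_month + 1)
    ]

jan_feb = monthly_files(1, 2)
# Assumption: an 80 m yearly slice is available.
yearly_80m = YEARLY_5MIN.format(height=80)
print(jan_feb)
print(yearly_80m)
```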

In [1]:
import numpy as np
import pandas as pd
import geopandas as gpd
from rex.resource_extraction import MultiYearWindX
from dw_tap.data_fetching import get_data_wtk_led_on_eagle

Step 1) Read in either power output/wind speed data for wind turbine(s) or wind speed data from met tower(s)

In [14]:
#Reading in data from W1 turbine at Marion OH location (oneenergy turbine), 2018
#Month-end dates used in the monthly file names
month_ends = ['20180131', '20180228', '20180331']
#Remaining 2018 months, not read in yet:
#'20180430', '20180531', '20180630', '20180731', '20180831', '20180930',
#'20181031', '20181130', '20181231'
monthly_frames = [
    pd.read_excel("../../data/marionw1/turbine.oneenergy.00.%s.000000.marion.w1.xlsx" % month_end,
                  header=0, usecols="B, C, M")
    for month_end in month_ends
]
#Each monthly frame keeps its own 0-based index after concatenation
power_output_df = pd.concat(monthly_frames)
power_output_df.rename(columns={'Time':'datetime', 'WindTurbineEnergyYield_kWh_':'measured_production', 'AvgWindSpeed_m_s_':'measured_ws'}, inplace=True)
#Below line is required time conversion, only run once
power_output_df['datetime'] = power_output_df['datetime'].dt.tz_localize('UTC')
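
The "only run once" caveat exists because `tz_localize` raises an error on data that is already tz-aware. One possible way to make the cell safe to rerun is sketched below; the helper name `ensure_utc` is hypothetical, and it keeps the notebook's assumption that the naive turbine timestamps are already in UTC.

```python
import pandas as pd

def ensure_utc(series):
    """Return a UTC-aware datetime series, whether or not it was localized before."""
    if series.dt.tz is None:
        # Naive timestamps: attach UTC without shifting the clock values.
        return series.dt.tz_localize("UTC")
    # Already tz-aware: convert to UTC (leaves UTC data unchanged).
    return series.dt.tz_convert("UTC")

# Made-up timestamps for illustration only.
demo = pd.DataFrame({"datetime": pd.to_datetime(["2018-01-01 05:00", "2018-01-01 05:10"])})
demo["datetime"] = ensure_utc(demo["datetime"])
demo["datetime"] = ensure_utc(demo["datetime"])  # second call is harmless
print(demo["datetime"].dtype)
```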

In [15]:
print(power_output_df)

                      datetime  measured_ws  measured_production
0    2018-01-01 05:00:00+00:00         5.92                   52
1    2018-01-01 05:10:00+00:00         6.01                   58
2    2018-01-01 05:20:00+00:00         5.96                   56
3    2018-01-01 05:30:00+00:00         6.01                   58
4    2018-01-01 05:40:00+00:00         5.82                   52
...                        ...          ...                  ...
4455 2018-04-01 04:10:00+00:00         9.47                  221
4456 2018-04-01 04:20:00+00:00         8.98                  209
4457 2018-04-01 04:30:00+00:00         8.75                  200
4458 2018-04-01 04:40:00+00:00         8.69                  180
4459 2018-04-01 04:50:00+00:00         8.50                  178

[12647 rows x 3 columns]


In [8]:
power_output_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 12647 entries, 0 to 4459
Data columns (total 3 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   datetime             12647 non-null  datetime64[ns]
 1   measured_ws          12647 non-null  float64       
 2   measured_production  12647 non-null  int64         
dtypes: datetime64[ns](1), float64(1), int64(1)
memory usage: 395.2 KB


Step 2) Read in WTK-LED data for 2018 at the same location

In [16]:
z_turbine = 80
lat, lon = 40.591555, -83.182092
#using 3 instead of 13 here (January and February only) until I have access to all data
files = ['/campaign/tap/CONUS/wtk/5min/2018/%s/conus_2018-%s.h5' % (str(i).zfill(2), str(i).zfill(2)) for i in range(1,3)]

atmospheric_df = pd.DataFrame()
for file in files:

    myr = MultiYearWindX(file, hsds=False)

    d = get_data_wtk_led_on_eagle(myr, 
                                  lat, lon, z_turbine, "IDW", 
                                  power_estimate=False,
                                  start_time_idx=None, 
                                  end_time_idx=None,
                                  time_stride=None)
    atmospheric_df = pd.concat([atmospheric_df, d])
print(atmospheric_df)

                      datetime        ws          wd
0    2018-01-01 00:00:00+00:00  2.001774  122.668775
1    2018-01-01 00:05:00+00:00  2.062723  127.272024
2    2018-01-01 00:10:00+00:00  2.184248  130.633885
3    2018-01-01 00:15:00+00:00  2.288770  132.435883
4    2018-01-01 00:20:00+00:00  2.342875  132.914820
...                        ...       ...         ...
8059 2018-02-28 23:35:00+00:00  7.192879  255.850598
8060 2018-02-28 23:40:00+00:00  7.556836  254.577095
8061 2018-02-28 23:45:00+00:00  7.594353  254.661698
8062 2018-02-28 23:50:00+00:00  7.593395  255.673829
8063 2018-02-28 23:55:00+00:00  7.443811  257.438752

[16992 rows x 3 columns]


In [9]:
atmospheric_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 16992 entries, 0 to 8063
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype              
---  ------    --------------  -----              
 0   datetime  16992 non-null  datetime64[ns, UTC]
 1   ws        16992 non-null  float64            
 2   wd        16992 non-null  float64            
dtypes: datetime64[ns, UTC](1), float64(2)
memory usage: 531.0 KB


Step 3) Merge the two dataframes

In [17]:
analysis_df = atmospheric_df.merge(power_output_df[['datetime', 'measured_production', 'measured_ws']], on='datetime', how='left')

In [19]:
#The pandas method is dropna (not drop_na); it removes rows with no matching turbine measurement
analysis_df = analysis_df.dropna()

In [21]:
analysis_df[~analysis_df['measured_ws'].isna()]

                       datetime        ws          wd  measured_production  measured_ws
60    2018-01-01 05:00:00+00:00  6.555662  197.591424                 52.0         5.92
62    2018-01-01 05:10:00+00:00  6.121590  198.373360                 58.0         6.01
64    2018-01-01 05:20:00+00:00  5.774725  199.061868                 56.0         5.96
66    2018-01-01 05:30:00+00:00  5.540386  201.817458                 58.0         6.01
68    2018-01-01 05:40:00+00:00  5.292899  208.860196                 52.0         5.82
...                         ...       ...         ...                  ...          ...
16982 2018-02-28 23:10:00+00:00  5.352271  264.625933                 55.0         5.91
16984 2018-02-28 23:20:00+00:00  5.937176  260.259899                 80.0         6.73
16986 2018-02-28 23:30:00+00:00  6.383798  258.244048                 85.0         6.61
16988 2018-02-28 23:40:00+00:00  7.556836  254.577095                 88.0         6.82
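
The row labels in the output above step by two because the WTK-LED data is 5-minute while the turbine data is 10-minute, so the left merge only finds a measurement for every other model timestamp. A toy sketch of that behavior, using made-up values and `merge`'s `indicator` option to flag which rows matched:

```python
import pandas as pd

# Made-up 5-minute "model" and 10-minute "measured" series for illustration.
model = pd.DataFrame({
    "datetime": pd.date_range("2018-01-01 05:00", periods=4, freq="5min", tz="UTC"),
    "ws": [6.5, 6.3, 6.1, 5.9],
})
measured = pd.DataFrame({
    "datetime": pd.date_range("2018-01-01 05:00", periods=2, freq="10min", tz="UTC"),
    "measured_ws": [5.92, 6.01],
})

# Left merge keeps every model row; _merge records whether a measurement matched.
merged = model.merge(measured, on="datetime", how="left", indicator=True)

# Keep only the timestamps present in both data sets.
matched = merged[merged["_merge"] == "both"].drop(columns="_merge")
print(matched)
```

An inner merge (`how="inner"`) would return the matched rows directly; the left merge is kept here because that is what the notebook uses.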


Notes: 
- Lots of different inputs, so we need a naming convention and file checking
- The check should be a function in the package
- Validation-case check (function): gets the path of the directory and checks for all variables? Does the case have atmospheric (atmo.) data and obstacles? 
- Run the checker within the for loop, before we read the data
- What do we do with cases that have no obstacles? 
- Each directory (one per analysis case) should provide lat, lon, and height, with some type of checking
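
A rough sketch of what such a validation-case checker might look like. Everything about the layout is an assumption: the file names `metadata.json` and `obstacles.geojson` and the function name `check_case` are hypothetical, and only the required keys (lat, lon, height) come from the notes. The function reports problems instead of raising, which leaves the "no obstacles" question open for the caller to decide.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("lat", "lon", "height")  # from the notes above
METADATA_FILE = "metadata.json"           # hypothetical file name
OBSTACLES_FILE = "obstacles.geojson"      # hypothetical file name

def check_case(case_dir):
    """Return a list of problems found in one analysis-case directory."""
    case_dir = Path(case_dir)
    problems = []
    meta_path = case_dir / METADATA_FILE
    if not meta_path.is_file():
        problems.append(f"missing {METADATA_FILE}")
    else:
        meta = json.loads(meta_path.read_text())
        problems += [f"missing key: {key}" for key in REQUIRED_KEYS if key not in meta]
    if not (case_dir / OBSTACLES_FILE).is_file():
        problems.append(f"missing {OBSTACLES_FILE}")
    return problems

# Demo on a temporary directory that lacks a height and an obstacles file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / METADATA_FILE).write_text(json.dumps({"lat": 40.591555, "lon": -83.182092}))
    demo_problems = check_case(tmp)
print(demo_problems)
```

Inside the for loop over cases, a call like this could run before any data is read, skipping or flagging cases whose problem list is not empty.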

Tangent: 
Notebooks should always save intermediate results! 
Have a naming convention for them
Organize the intermediate steps