### This notebook shows how to fetch timeseries and map data at runtime, either for a specific location or for all locations.


In [None]:
from windwatts_data import WTKLedClient1224

In [2]:
# Initialize a WTKLedClient1224 object.
# Since the config file is in the same "notebooks" directory as the ".ipynb" file, the config path below works.
wtk = WTKLedClient1224(config_path='./1224_config.json')

Fetching results...: 3polls [00:03,  1.25s/polls]


#### Robust function for fetching both timeseries and map data

**fetch_filtered_data_1224()** is a more flexible function that fetches both timeseries (location-specific) and map (non-location-specific) data, and it exposes more parameters for retrieving data.
Please refer to the documentation for the full list of the function's features.
**fetch_timeseries_1224()**, **fetch_windspeed_map_1224()** and **fetch_winddirection_map_1224()** are the safe functions: they intentionally restrict the available parameters to safeguard users from running costly queries. Prefer them whenever they can produce the same result.

**fetch_filtered_data_1224()** gives the user full freedom to perform complex queries.

In [3]:
# fetching desired columns for specified location and year.
df = wtk.fetch_filtered_data_1224(columns=['windspeed_100m','winddirection_100m','mohr'],lat=39.90270,long=-82.98916,years=[2001],n_nearest=2)

Fetching results...: 1polls [00:00,  1.61polls/s]


In [4]:
df

Unnamed: 0,windspeed_100m,winddirection_100m,mohr,index
0,8.57,235.60,101,1b2e7e
1,9.64,248.90,102,1b2e7e
2,9.35,244.47,103,1b2e7e
3,9.51,245.31,104,1b2e7e
4,8.91,241.61,105,1b2e7e
...,...,...,...,...
571,5.38,233.07,1220,1b2e7f
572,5.86,230.06,1221,1b2e7f
573,6.05,230.55,1222,1b2e7f
574,6.65,231.77,1223,1b2e7f



This function works similarly to fetch_windspeed_timeseries_1224(), but it offers more control: you can select more columns, filter by time, and request any number of nearest locations rather than the 1-16 allowed by the safe function. n_nearest is optional when latitude and longitude are given. When latitude and longitude are omitted, n_nearest is not needed because all locations are considered.

#### Fetching timeseries

In [5]:
# fetching desired columns for specified location, year, month and hour.
df = wtk.fetch_filtered_data_1224(columns=['windspeed_100m','winddirection_100m','mohr'],lat=39.90270,long=-82.98916,years=[2001],months=[2],hours=[12])

Fetching results...: 1polls [00:00,  1.60polls/s]


In [6]:
df

Unnamed: 0,windspeed_100m,winddirection_100m,mohr,index
0,7.63,279.12,212,1b2e7e


In [7]:
# user can also specify multiple years, months and hours.
df = wtk.fetch_filtered_data_1224(columns=['windspeed_100m','winddirection_100m','mohr'],lat=39.90270,long=-82.98916,years=[2001],months=[2,3],hours=[12,1],n_nearest=2)

Fetching results...: 1polls [00:00,  1.60polls/s]


In [8]:
df

Unnamed: 0,windspeed_100m,winddirection_100m,mohr,index
0,9.67,229.34,201,1b2e7e
1,7.63,279.12,212,1b2e7e
2,8.15,317.1,301,1b2e7e
3,7.59,332.1,312,1b2e7e
4,9.68,229.98,201,1b2e7f
5,7.54,280.77,212,1b2e7f
6,8.28,316.05,301,1b2e7f
7,7.62,331.03,312,1b2e7f
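
The `mohr` values in the output above appear to pack the month and the hour into a single integer (month × 100 + hour), which matches how a later cell splits the column with `// 100` and `% 100`. A minimal sketch of that decoding in plain Python follows; `decode_mohr` and `encode_mohr` are hypothetical helper names and are not part of windwatts_data.

```python
def decode_mohr(mohr: int) -> tuple[int, int]:
    """Split a mohr value into (month, hour), assuming mohr = month * 100 + hour."""
    month, hour = divmod(mohr, 100)
    if not 1 <= month <= 12:
        raise ValueError(f"unexpected month {month} in mohr value {mohr}")
    return month, hour


def encode_mohr(month: int, hour: int) -> int:
    """Inverse of decode_mohr under the same assumption."""
    return month * 100 + hour


# Values taken from the output above (months=[2,3], hours=[12,1]).
for value in (201, 212, 301, 312):
    print(value, decode_mohr(value))
```

Under this assumption, 212 decodes to month 2, hour 12, and 301 decodes to month 3, hour 1, which agrees with the `months` and `hours` passed in the query.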


In [9]:
# Instead of column names, users can specify one or more heights to fetch all columns available at those heights.
df = wtk.fetch_filtered_data_1224(heights=[100],lat=39.90270,long=-82.98916,years=[2001,2002],months=[2],hours=[12])

Fetching results...: 1polls [00:00,  1.59polls/s]


In [10]:
df

Unnamed: 0,pressure_100m,temperature_100m,winddirection_100m,windspeed_100m,mohr,year,index
0,98570.0,-0.18,279.12,7.63,212,2001,1b2e7e
1,98220.0,-0.66,281.91,7.79,212,2002,1b2e7e


In [11]:
# If a requested height doesn't exist in the data, the columns at the adjacent heights are returned. There are no 20m columns in the data.
df = wtk.fetch_filtered_data_1224(heights=[20],lat=39.90270,long=-82.98916,years=[2001,2002],months=[2],hours=[12])

Fetching results...: 1polls [00:00,  1.60polls/s]


In [12]:
df

Unnamed: 0,temperature_30m,winddirection_10m,winddirection_30m,windspeed_10m,windspeed_30m,mohr,year,index
0,-0.64,257.96,266.75,2.59,4.51,212,2002,1b2e7e
1,-0.26,243.36,262.87,2.29,4.12,212,2001,1b2e7e
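
The fallback to adjacent heights seen above (a request for 20m returning 10m and 30m columns) can be pictured as a neighbor lookup in a sorted list of heights. The sketch below only illustrates that idea and is not the library's implementation; `adjacent_heights` is a hypothetical helper, and the list of heights is an assumption based solely on the columns visible in this notebook.

```python
from bisect import bisect_left


def adjacent_heights(requested, available):
    """Return [requested] if it exists, else the nearest lower and higher heights."""
    heights = sorted(available)
    i = bisect_left(heights, requested)
    if i < len(heights) and heights[i] == requested:
        return [requested]
    neighbours = []
    if i > 0:
        neighbours.append(heights[i - 1])  # closest height below the request
    if i < len(heights):
        neighbours.append(heights[i])      # closest height above the request
    return neighbours


# Assumed for illustration: only 10m, 30m and 100m columns appear in this notebook.
ASSUMED_HEIGHTS = [10, 30, 100]
print(adjacent_heights(20, ASSUMED_HEIGHTS))   # [10, 30]
print(adjacent_heights(100, ASSUMED_HEIGHTS))  # [100]
```

The set of heights can differ per variable: in the output above, temperature is returned only at 30m, while wind speed and direction are returned at both 10m and 30m.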


Note: You can specify either column names or heights, not both.
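
One way to catch this mistake before a query is sent is a small pre-flight check. The sketch below is a hypothetical wrapper, not part of windwatts_data; how the library itself reacts when both arguments (or neither) are passed is not shown in this notebook, so the helper simply rejects the "both" case and passes everything else through.

```python
def check_selection(columns=None, heights=None):
    """Hypothetical pre-flight check: allow columns or heights, never both.

    Returns a dict of keyword arguments holding whichever selector was given.
    """
    if columns and heights:
        raise ValueError("Pass either columns or heights, not both.")
    if columns:
        return {"columns": list(columns)}
    if heights:
        return {"heights": list(heights)}
    return {}  # neither given; the library's behaviour here is not assumed


print(check_selection(heights=[100]))            # {'heights': [100]}
print(check_selection(columns=["windspeed_100m"]))  # {'columns': ['windspeed_100m']}
```

The returned dict could then be unpacked into the call, for example `wtk.fetch_filtered_data_1224(**check_selection(heights=[100]), lat=39.90270, long=-82.98916, years=[2001])`.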

#### Fetching Map Data

In [13]:
# fetching windspeed map data at 100m for a specific time.
df = wtk.fetch_filtered_data_1224(columns=['windspeed_100m'],years=[2001],months=[2],hours=[12])

Fetching results...: 1polls [00:00,  1.59polls/s]


Query result is stored at: s3://wtk-test-athena-results/e69cefa2-f9f4-421c-aac3-720cc0f3ef61.csv.


In [14]:
df

Unnamed: 0,windspeed_100m,index
0,12.88,0002e7
1,14.35,00034a
2,12.91,00008c
3,12.48,00026d
4,7.76,0003c0
...,...,...
2599772,9.90,278d34
2599773,9.62,278e42
2599774,9.76,278f16
2599775,9.77,278f27


In [15]:
# fetching windspeed map data at 100m for multiple time periods.
df = wtk.fetch_filtered_data_1224(columns=['windspeed_100m','winddirection_100m','mohr','year'],years=[2001,2002],months=[2,4],hours=[12,15])

Fetching results...: 1polls [00:00,  1.61polls/s]


Query result is stored at: s3://wtk-test-athena-results/8949f85a-674a-429d-b84a-1d99ac3d47ef.csv.


Note: Depending on the result file size and network speed, the time needed to download the results to the local machine at runtime can vary. The result file for the above query is around 800 MB.
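
For a rough feel of the download time, the file size can be divided by the available bandwidth. The sketch below is a back-of-the-envelope estimate only: `download_seconds` is a hypothetical helper, the 100 Mbit/s bandwidth is an assumed example value, and real transfers also include protocol overhead and parsing time.

```python
def download_seconds(size_mb: float, bandwidth_mbit_per_s: float) -> float:
    """Estimate transfer time: megabytes -> megabits, divided by link speed."""
    return size_mb * 8 / bandwidth_mbit_per_s


# 800 MB is the result size quoted above; 100 Mbit/s is an assumed link speed.
print(download_seconds(800, 100))  # 64.0 seconds
```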

In [16]:
df

Unnamed: 0,windspeed_100m,winddirection_100m,mohr,year,index
0,10.83,24.32,212,2002,00018b
1,10.74,20.28,215,2002,00018b
2,9.74,67.91,412,2002,00018b
3,9.69,294.94,415,2002,00018b
4,11.47,255.09,212,2002,000049
...,...,...,...,...,...
20798211,10.18,87.54,415,2001,2758ba
20798212,9.68,68.46,212,2001,2757f2
20798213,9.81,69.32,215,2001,2757f2
20798214,7.67,88.65,412,2001,2757f2


In [17]:
df['month'],df['hour'] = df['mohr']//100, df['mohr']%100
print(df['year'].value_counts())
print(df['month'].value_counts())
print(df['hour'].value_counts())

year
2002    10399108
2001    10399108
Name: count, dtype: int64
month
2    10399108
4    10399108
Name: count, dtype: int64
hour
12    10399108
15    10399108
Name: count, dtype: int64
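
The counts above can be used as a quick sanity check on a map query. If every location contributes one row per (year, month, hour) combination, the total row count should divide evenly by the number of combinations. The sketch below uses only the numbers printed above; `rows_per_combination` is a hypothetical helper name.

```python
def rows_per_combination(total_rows, years, months, hours):
    """Divide total rows by the number of (year, month, hour) combinations."""
    combos = len(years) * len(months) * len(hours)
    per_combo, remainder = divmod(total_rows, combos)
    if remainder:
        raise ValueError("rows are not evenly spread over the time combinations")
    return per_combo


# Total rows = sum of the two yearly counts printed above.
total_rows = 10399108 + 10399108
print(rows_per_combination(total_rows, [2001, 2002], [2, 4], [12, 15]))  # 2599777
```

A remainder of zero, as here, suggests that no location is missing a time slice in the result.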


Note: As shown above, this function is more flexible in fetching data than the safe functions. Keep the query cost in mind: the above query scanned around **23 GB** of data and took **9 minutes** to retrieve the results. Athena charges $5 per TB scanned.
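
To put that price in perspective, the scan cost of a query can be estimated from the amount of data scanned. The sketch below is a rough estimate under stated assumptions: `athena_scan_cost_usd` is a hypothetical helper, 1 TB is taken as 1000 GB, and any minimum charge or rounding applied by the billing system is ignored.

```python
def athena_scan_cost_usd(gb_scanned: float, usd_per_tb: float = 5.0,
                         gb_per_tb: float = 1000.0) -> float:
    """Estimate scan cost as (GB scanned / GB per TB) * price per TB."""
    return gb_scanned / gb_per_tb * usd_per_tb


# 23 GB scanned at $5 per TB, using the figures quoted in the note above.
cost = athena_scan_cost_usd(23)
print(f"${cost:.3f}")  # $0.115
```

A single query of this size costs on the order of ten cents, but the cost scales linearly with the data scanned, so repeated or broader queries add up quickly.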