# Fetching NSRDB Data

The fetch_nsrdb_data function accesses the [National Solar Radiation Database (NSRDB)](https://nsrdb.nrel.gov/), hosted by NREL on Amazon Web Services (AWS), through the h5py module. To access large datasets, an API key can be requested from NREL. Instructions for setting up the API key are in the [NREL hsds-examples notebook](https://github.com/NREL/hsds-examples/blob/master/notebooks/03_NSRDB_introduction.ipynb).

Data can be downloaded for any latitude-longitude pair (globally) or for a state-county pair within the US (the state is required because county names repeat across states). While HSDS allows you to slice datasets, the function can also compute means over ranges. The data is arranged in a dataframe for multiscale analysis, with the temporal indices as tuples, and can be saved as .csv, .txt, .json, or .pkl.

In [1]:
import pandas
from energia.utils.nsrdb import fetch_nsrdb_data

## Using coordinates

Coordinates can be used to download the required data, as shown below. An attrs list selects which datasets to download. The available names are: air_temperature, clearsky_dhi, clearsky_dni, clearsky_ghi, cloud_type, coordinates, dew_point, dhi, dni, fill_flag, ghi, meta, relative_humidity, solar_zenith_angle, surface_albedo, surface_pressure, time_index, total_precipitable_water, wind_direction, wind_speed.

In [2]:
coordinates, weather_data = fetch_nsrdb_data(
    attrs=['ghi', 'wind_speed'],
    year=2020,
    resolution='hourly',
    lat_lon=(29.56999969482422, -95.05999755859375),
)

In [3]:
weather_data

Unnamed: 0,ghi,wind_speed
2020-01-01 00:00:00+00:00,0.0,0.55
2020-01-01 01:00:00+00:00,0.0,0.25
2020-01-01 02:00:00+00:00,0.0,0.20
2020-01-01 03:00:00+00:00,0.0,0.40
2020-01-01 04:00:00+00:00,0.0,0.50
...,...,...
2020-12-31 19:00:00+00:00,74.5,7.35
2020-12-31 20:00:00+00:00,62.0,7.35
2020-12-31 21:00:00+00:00,45.0,7.55
2020-12-31 22:00:00+00:00,95.5,7.05
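
The returned dataframe can be written to disk with the standard pandas writers. The sketch below uses a small synthetic frame in place of weather_data (the values and file names are made up for illustration) and writes it as .csv, .json, and .pkl into a temporary directory.

```python
import tempfile
from pathlib import Path

import pandas

# Synthetic stand-in for the weather_data frame returned by fetch_nsrdb_data
index = pandas.date_range('2020-01-01', periods=4, freq='60min', tz='UTC')
weather_data = pandas.DataFrame(
    {'ghi': [0.0, 0.0, 12.5, 80.0], 'wind_speed': [0.55, 0.25, 0.20, 0.40]},
    index=index,
)

# Write the same frame in three formats (file names are illustrative)
out_dir = Path(tempfile.mkdtemp())
weather_data.to_csv(out_dir / 'weather.csv')
weather_data.to_json(out_dir / 'weather.json', orient='split', date_format='iso')
weather_data.to_pickle(out_dir / 'weather.pkl')

# Read the CSV back, restoring the timestamp index from the first column
restored = pandas.read_csv(out_dir / 'weather.csv', index_col=0, parse_dates=True)
```

Pickle preserves dtypes and the timezone-aware index exactly, while CSV and JSON are easier to share with other tools.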


In this example, we download hourly weather data for every county in Texas using the fetch_nsrdb_data function. The centroid of each county can be downloaded from the following link: https://data.texas.gov/dataset/Texas-Counties-Centroid-Map/ups3-9e8m/data_preview.

In [None]:
county_df = pandas.read_csv('Texas_Counties_Centroid_Map.csv')
county_list = county_df['CNTY_NM']

In [None]:
for county in county_list:
    # Look up the centroid row for this county once
    row = county_df[county_df['CNTY_NM'] == county].iloc[0]
    # fetch_nsrdb_data returns (coordinates, weather_data); save the latter
    _, weather_data = fetch_nsrdb_data(
        attrs=['ghi', 'wind_speed'],
        year=2020,
        resolution='hourly',
        lat_lon=(row['X (Lat)'], row['Y (Long)']),
    )
    weather_data.to_csv(f'{county}.csv')

## Using Attributes

fetch_nsrdb_data can also select and fetch data for the collection point that matches a given specification, e.g. wind data for the collection point at the highest elevation in a county. The available specifications are 'max-population', 'max-elevation', 'max-landcover', 'min-population', 'min-elevation', and 'min-landcover'. The state and county must be specified. Here we download data for the year 2019 for Harris County, Texas, at the collection point with the minimum elevation.

In [None]:
coordinates, weather_data = fetch_nsrdb_data(
    attrs=[
        'dni',
        'dhi',
        'wind_speed',
        'ghi',
        'air_temperature',
        'dew_point',
        'relative_humidity',
        'surface_pressure',
    ],
    year=2019,
    state='Texas',
    county='Harris',
    resolution='hourly',
    get='min-elevation',
)

In [None]:
weather_data

In [None]:
coordinates
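
To make the 'min-'/'max-' specifications concrete, here is a minimal sketch of how such a selection could work on a table of collection-point metadata. It is not the implementation used by fetch_nsrdb_data: the select_point helper, the metadata values, and the column names are all hypothetical and exist only for illustration.

```python
import pandas

# Hypothetical metadata for a few collection points; the column names are
# assumptions for this sketch, not a confirmed NSRDB schema.
meta = pandas.DataFrame(
    {
        'state': ['Texas', 'Texas', 'Texas', 'Texas'],
        'county': ['Harris', 'Harris', 'Harris', 'Travis'],
        'latitude': [29.57, 29.81, 30.05, 30.29],
        'longitude': [-95.06, -95.30, -95.54, -97.74],
        'elevation': [3, 15, 42, 160],
        'population': [1200, 54000, 8000, 30000],
    }
)


def select_point(meta, state, county, get):
    """Return (latitude, longitude) of the point matching a 'min-x'/'max-x' spec."""
    mode, column = get.split('-', 1)
    if mode not in ('min', 'max'):
        raise ValueError(f"unknown specification: {get!r}")
    # Restrict the search to collection points in the requested county
    subset = meta[(meta['state'] == state) & (meta['county'] == county)]
    if subset.empty:
        raise ValueError(f'no collection points for {county}, {state}')
    # Pick the row label holding the extreme value of the chosen column
    label = subset[column].idxmin() if mode == 'min' else subset[column].idxmax()
    row = subset.loc[label]
    return (float(row['latitude']), float(row['longitude']))


lowest = select_point(meta, 'Texas', 'Harris', 'min-elevation')
busiest = select_point(meta, 'Texas', 'Harris', 'max-population')
```

Filtering by both state and county before taking the extreme is what keeps counties with the same name in different states from being mixed together.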

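Data for several years can be concatenated to cover longer periods. The example below fetches 2016 through 2020 for the same collection point and then removes the leap days.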

In [None]:
weather_houston = pandas.concat(
    [
        fetch_nsrdb_data(
            attrs=[
                'dni',
                'dhi',
                'wind_speed',
                'ghi',
                'air_temperature',
                'dew_point',
                'relative_humidity',
                'surface_pressure',
            ],
            year=2016 + i,
            state='Texas',
            county='Harris',
            resolution='hourly',
            get='min-elevation',
        )[1]
        for i in range(5)
    ]
)
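# Parse the index as UTC timestamps, then format it as strings
weather_houston.index = pandas.to_datetime(weather_houston.index, utc=True)
weather_houston.index = weather_houston.index.strftime('%m/%d/%Y, %r')
# Drop leap days (February 29) so every year has the same number of rows
weather_houston = weather_houston[~weather_houston.index.str.contains('02/29')]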
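
The cell above leaves the index as formatted strings. If a datetime index is needed downstream, leap days can instead be dropped with a boolean mask on the month and day. The sketch below demonstrates this on a small synthetic frame (the data is made up, and this is an alternative to the approach above rather than part of the library).

```python
import pandas

# Three days of hourly data spanning the 2020 leap day
index = pandas.date_range('2020-02-28', '2020-03-01 23:00', freq='60min', tz='UTC')
frame = pandas.DataFrame({'ghi': range(len(index))}, index=index)

# True for every timestamp that falls on February 29
is_leap_day = (frame.index.month == 2) & (frame.index.day == 29)

# Keep everything else; the index remains a timezone-aware DatetimeIndex
no_leap = frame[~is_leap_day]
```

Because the index keeps its datetime type, operations such as resampling or selecting by date range remain available afterwards.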

## Resolutions

The base resolution is 'half-hourly'. The 'hourly' and 'daily' resolutions average the half-hourly data over their respective time periods.
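
As an illustration of what this averaging means, the sketch below resamples a synthetic half-hourly series to hourly and daily means with pandas. It is only a sketch under the assumption that the averaging is a plain mean over each period; the way fetch_nsrdb_data implements it internally may differ.

```python
import pandas

# Two days of synthetic half-hourly values (0, 1, ..., 47, repeated each day)
index = pandas.date_range('2020-06-01', periods=96, freq='30min', tz='UTC')
half_hourly = pandas.DataFrame(
    {'ghi': [float(i % 48) for i in range(96)]}, index=index
)

# Each hourly value is the mean of the two half-hourly readings in that hour
hourly = half_hourly.resample('60min').mean()

# Each daily value is the mean of all 48 half-hourly readings in that day
daily = half_hourly.resample('D').mean()
```

Here the first hourly value is the mean of 0 and 1, and each daily value is the mean of 0 through 47.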