# Ways of accessing data in the National Solar Radiation Database (NSRDB) using Python

### This notebook documents five ways to access NSRDB data:
- Three cloud service providers:
    - Azure Blob Storage
    - AWS S3 Buckets
    - Google Cloud Storage
- NREL developer API
- [AWS HDF Group's Highly Scalable Data Service (HSDS)](https://github.com/NREL/hsds-examples/blob/master/notebooks/03_NSRDB_introduction.ipynb)

### Things to know:
- The original NSRDB files are stored in `.h5` format, one of the Hierarchical Data Formats (HDF) used to store large amounts of data. To read them, open the file with `xarray` using the `h5netcdf` backend engine.

- The data is available from three major cloud service providers. To access it, create the provider's `FileSystem` object and open the file with a valid URI.

- The following packages need to be installed in your Python environment:
    - `adlfs`, `xarray`, `planetary_computer`, `s3fs`, `gcsfs`, `pandas`

In [None]:
# Executing this cell will install the required packages for the notebook
%pip install adlfs
%pip install "xarray[complete]"
%pip install planetary_computer
%pip install s3fs
%pip install gcsfs
%pip install pandas
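
Each provider's `FileSystem` class pairs with a URI scheme. The sketch below uses only the standard library to split a cloud URI into its scheme, bucket (or container), and object key, and to name the matching filesystem class. The `FILESYSTEMS` table and the `describe_uri` helper are hypothetical names introduced for illustration, and the `abfs://` form for Azure is an assumption, since the Azure cell in this notebook passes a plain `container/path` string instead.

```python
from urllib.parse import urlsplit

# URI scheme -> (package, filesystem class) for the three providers in this notebook.
# The "abfs" entry is an assumption; the Azure cell uses a path without a scheme.
FILESYSTEMS = {
    "s3": ("s3fs", "S3FileSystem"),
    "gs": ("gcsfs", "GCSFileSystem"),
    "abfs": ("adlfs", "AzureBlobFileSystem"),
}


def describe_uri(uri):
    """Split a cloud URI into filesystem class, bucket/container, and object key."""
    parts = urlsplit(uri)
    if parts.scheme not in FILESYSTEMS:
        raise ValueError(f"Unsupported scheme: {parts.scheme!r}")
    package, cls = FILESYSTEMS[parts.scheme]
    return {
        "filesystem": f"{package}.{cls}",
        "bucket": parts.netloc,
        "key": parts.path.lstrip("/"),
    }


print(describe_uri("s3://nrel-pds-nsrdb/philippines/philippines_2017.h5"))
```

A lookup like this can help catch a mistyped scheme before any network request is made.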

---
### Azure Blob Storage

- Use `planetary_computer` to obtain a SAS token for access
- Use `AzureBlobFileSystem` (from `adlfs`) to access files in Azure Blob Storage

In [None]:
import xarray as xr
import planetary_computer
from adlfs import AzureBlobFileSystem

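# File parameters
year = 2020
storage_account_name = 'nrel'
container_name = 'nrel-nsrdb'

# Request a SAS token for the container, then build the filesystem with it
fs = AzureBlobFileSystem(
    account_name=storage_account_name,
    credential=planetary_computer.sas.get_token(storage_account_name, container_name).token
)

# phony_dims="sort" assigns names to HDF5 dimensions that have no dimension scales
file = fs.open(f"{container_name}/v3/nsrdb_{year}.h5")
AZ_ds = xr.open_dataset(file, backend_kwargs={"phony_dims": "sort"}, engine="h5netcdf")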
AZ_ds
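
The blob path in the cell above follows a `container/version/nsrdb_<year>.h5` pattern, so paths for several years can be generated with a small helper. This is a minimal sketch: `nsrdb_azure_path` is a hypothetical name, and the years listed are only illustrative and have not been checked against what is actually stored in the container.

```python
def nsrdb_azure_path(year, container="nrel-nsrdb", version="v3"):
    """Return the blob path for one year, following the pattern used above."""
    year = int(year)  # accept '2020' as well as 2020
    return f"{container}/{version}/nsrdb_{year}.h5"


# Illustrative years only; availability in the container is an assumption.
paths = [nsrdb_azure_path(y) for y in (2018, 2019, 2020)]
for path in paths:
    print(path)
```

Each resulting path could then be passed to `fs.open(...)` in the same way as the single-year example.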

### AWS S3 Bucket

Use `s3fs` to access AWS S3 buckets from Python.


In [None]:
import s3fs

NSRDB_S3_URI = "s3://nrel-pds-nsrdb/philippines/philippines_2017.h5"
# anon=True reads the bucket without AWS credentials
fs = s3fs.S3FileSystem(anon=True)
AWS_ds = xr.open_dataset(fs.open(NSRDB_S3_URI), backend_kwargs={"phony_dims": "sort"}, engine='h5netcdf')
AWS_ds

### Google Cloud Storage

- [Install the gcloud CLI](https://cloud.google.com/sdk/docs/install)
- Run `!gcloud auth application-default login` to authenticate
- Use `gcsfs` to access Google Cloud Storage from Python

In [None]:
import gcsfs

NSRDB_GCS_URI = "gs://nsrdb-netcdf/philippines/philippines_2017.h5"
fs = gcsfs.GCSFileSystem(anon=True)

GCS_ds = xr.open_dataset(fs.open(NSRDB_GCS_URI), backend_kwargs={"phony_dims": "sort"}, engine='h5netcdf')
GCS_ds

### NREL developer API

- Get an NSRDB API key: https://developer.nrel.gov/signup/
- The data is downloaded in `.csv` format
- Use `pandas` to read it

In [None]:
import pandas as pd

# Declare the query parameters below as strings. Spaces must be replaced with '+', e.g., change 'John Smith' to 'John+Smith'.
# Define the latitude and longitude of the location
lat, lon = 39.2606, -80.1139
# You must request an NSRDB API key from the link above
api_key = 'YOUR_API_KEY'  # placeholder: replace with your own key
# Set the attributes to extract (e.g., dhi, ghi, etc.), separated by commas.
attributes = 'ghi,dhi,dni,wind_speed,air_temperature,solar_zenith_angle'
# Choose year of data
year = '2019'
# Set leap year to true or false. True will return leap day data if present, false will not.
leap_year = 'false'
# Set time interval in minutes, e.g., '30' is half-hour intervals. Valid intervals are 30 & 60.
interval = '30'
# Specify Coordinated Universal Time (UTC), 'true' will use UTC, 'false' will use the local time zone of the data.
# NOTE: In order to use the NSRDB data in SAM, you must specify UTC as 'false'. SAM requires the data to be in the
# local time zone.
utc = 'false'
# Your full name, use '+' instead of spaces.
your_name = 'Justin+Lin'
# Your reason for using the NSRDB.
reason_for_use = 'beta+testing'
# Your affiliation
your_affiliation = 'HTF'
# Your email address
your_email = 'slin@wvhtf.org'
# Please join our mailing list so we can keep you up-to-date on new developments.
mailing_list = 'false'

# Declare url string
url = (
    'https://developer.nrel.gov/api/nsrdb/v2/solar/psm3-download.csv'
    '?wkt=POINT({lon}%20{lat})&names={year}&leap_day={leap}&interval={interval}'
    '&utc={utc}&full_name={name}&email={email}&affiliation={affiliation}'
    '&mailing_list={mailing_list}&reason={reason}&api_key={api}&attributes={attr}'
).format(
    year=year, lat=lat, lon=lon, leap=leap_year, interval=interval, utc=utc,
    name=your_name, email=your_email, mailing_list=mailing_list,
    affiliation=your_affiliation, reason=reason_for_use, api=api_key, attr=attributes
)
# Return all but first 2 lines of csv to get data:
df = pd.read_csv(url, skiprows=2)

df.head()
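
The cell above escapes spaces by hand. As an alternative, the query string can be assembled with the standard library's `urllib.parse.urlencode`, which percent-encodes every value. The sketch below reuses the endpoint and parameter names from the URL above. `build_psm3_url` is a hypothetical helper, the name, email, and affiliation values are placeholders, and it assumes the API accepts percent-encoded values (for example `%28` for `(`) the same way it accepts the hand-written form.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://developer.nrel.gov/api/nsrdb/v2/solar/psm3-download.csv"


def build_psm3_url(api_key, lat, lon, year, attributes, full_name, email,
                   affiliation, reason, interval=30, leap_day=False,
                   utc=False, mailing_list=False):
    """Assemble the download URL; urlencode handles all escaping."""
    params = {
        "wkt": f"POINT({lon} {lat})",           # longitude first, then latitude
        "names": year,
        "leap_day": str(leap_day).lower(),      # the API expects 'true' / 'false'
        "interval": interval,
        "utc": str(utc).lower(),
        "full_name": full_name,
        "email": email,
        "affiliation": affiliation,
        "mailing_list": str(mailing_list).lower(),
        "reason": reason,
        "api_key": api_key,
        "attributes": ",".join(attributes),
    }
    # quote_via=quote encodes spaces as %20, matching the wkt value in the cell above.
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"


url = build_psm3_url(
    api_key="YOUR_API_KEY",                     # placeholder
    lat=39.2606, lon=-80.1139, year=2019,
    attributes=["ghi", "dhi", "dni"],
    full_name="Jane Doe",                       # placeholder values
    email="jane.doe@example.com",
    affiliation="Example Org",
    reason="beta testing",
)
print(url)
```

The resulting `url` can be passed to `pd.read_csv(url, skiprows=2)` exactly as before. Building the query from a dictionary keeps each parameter on its own line and removes the need to replace spaces manually.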