# Ways of accessing data in the National Solar Radiation Database (NSRDB) using Python

### This notebook covers five ways to access NSRDB data:
- Three cloud service providers:
    - Azure Blob Storage
    - AWS S3 Buckets
    - Google Cloud Storage
- NREL developer API
- [AWS HDF Group's Highly Scalable Data Service (HSDS)](https://github.com/NREL/hsds-examples/blob/master/notebooks/03_NSRDB_introduction.ipynb)

This notebook gives example code for each of the three cloud service providers (CSPs) and for the NREL API. HSDS access is covered in the notebook linked above.

Each CSP demonstration requests the same H5 data file, covering the Philippines, so that the three providers can be compared consistently.

### Things to know:
- The original NSRDB files are stored in `.h5` format (HDF5), one of the Hierarchical Data Formats (HDF) used to store large amounts of data. To read them from Python, `xarray` with the `h5netcdf` backend engine is a good choice.

- The data is available from three major cloud service providers. To access it, we use each provider's `FileSystem` class with a valid URI to open the data files.

- The following packages must be installed in your Python environment for the examples to run:
    - `adlfs`
    - `xarray`
    - `planetary_computer`
    - `s3fs`
    - `gcsfs`
    - `pandas`

In [None]:
# Executing this cell will install the required packages for the notebook
# NOTE: this only needs to be run once per machine / account
%pip install adlfs
%pip install "xarray[complete]"
%pip install planetary_computer
%pip install s3fs
%pip install gcsfs
%pip install pandas
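
After installation, a quick check can confirm that every package is importable before running the remaining cells. The following is a minimal sketch using only the standard library. The helper name `find_missing` is made up for this example, and `h5netcdf` is added to the list on the assumption that it must be importable because the examples pass `engine="h5netcdf"` to `xarray`.

```python
from importlib.util import find_spec

# Import names of the packages listed above. "h5netcdf" is an addition made
# for this sketch, since the examples rely on it as the xarray backend engine.
REQUIRED_PACKAGES = [
    "adlfs", "xarray", "planetary_computer", "s3fs", "gcsfs", "pandas", "h5netcdf",
]


def find_missing(packages):
    """Return the names in `packages` that cannot be found in this environment."""
    return [name for name in packages if find_spec(name) is None]


missing = find_missing(REQUIRED_PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

If anything is reported as missing, rerun the install cell and restart the kernel.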

---
## Azure Blob Storage

- Use `planetary_computer` to obtain a SAS token for access
- Use `AzureBlobFileSystem` to access files in Azure Blob Storage

In [None]:
import xarray as xr
import planetary_computer
from adlfs import AzureBlobFileSystem

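# Storage account and container that host the NSRDB data
storage_account_name = 'nrel'
container_name = 'nrel-nsrdb'

# Request a SAS token for the container through planetary_computer
fs = AzureBlobFileSystem(
    account_name=storage_account_name,
    credential=planetary_computer.sas.get_token(storage_account_name, container_name).token
)

# Open the remote file and read it with xarray through the h5netcdf engine
file = fs.open(f"{container_name}/philippines/philippines_2017.h5")
AZ_ds = xr.open_dataset(file, backend_kwargs={"phony_dims": "sort"}, engine="h5netcdf")
AZ_ds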

---

## AWS S3 Bucket

Use `s3fs` to allow Python to access AWS S3 buckets.


In [None]:
import s3fs

NSRDB_S3_URI = "s3://nrel-pds-nsrdb/philippines/philippines_2017.h5"
# anon=True requests anonymous access, so no AWS credentials are needed
fs = s3fs.S3FileSystem(anon=True)
AWS_ds = xr.open_dataset(fs.open(NSRDB_S3_URI), backend_kwargs={"phony_dims": "sort"}, engine='h5netcdf')
AWS_ds

---

## Google Cloud Storage

- [Install the gcloud CLI](https://cloud.google.com/sdk/docs/install)
    - Use the link above and follow the directions for installing Google Cloud CLI.
    - Use the default values for options, if at all possible.
- Sign up for a free trial of Google Cloud. This requires a credit card, but the card is not charged unless the trial has ended and you have agreed to be billed.
- In Google Cloud, create a new project. This can be named anything you like within the naming conventions of Google Cloud. This project will be referenced later in this notebook.

*NOTE: At this point, for Jupyter to recognize the installation of the Google Cloud package, it may be necessary to restart your Jupyter environment.*

#### Initialize your Google Cloud access
- From a terminal (for example, a PowerShell window), execute the following: `gcloud init`
    - Select your Google account and the project you created earlier

#### Authorize your cloud account to access the NSRDB resources
When executing the following cell, you will be prompted for your Google Cloud login in your default browser.

In [None]:
!gcloud auth application-default login

#### Use `gcsfs` to allow Python to access Google Cloud Storage

In [None]:
import gcsfs

NSRDB_GCS_URI = "gs://nsrdb-netcdf/philippines/philippines_2017.h5"
fs = gcsfs.GCSFileSystem(anon=True)

tempFile = fs.open(NSRDB_GCS_URI)

GCS_ds = xr.open_dataset(tempFile, backend_kwargs={"phony_dims": "sort"}, engine='h5netcdf')
GCS_ds

## Conclusions and observations
The results above show trade-offs among the three cloud service providers. AWS needs the least configuration and no token, but reading the data took slightly longer. Azure and Google Cloud require more configuration but accessed the data faster. End users should weigh setup effort against access speed when choosing a provider.
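
To put numbers on the speed comparison, each open call can be wrapped in a small timing helper. This is a minimal sketch, not the method used to obtain the observations above. The helper name `timed` is made up for this example, and the workload shown is a stand-in so that the cell runs without network access.

```python
import time


def timed(label, func, *args, **kwargs):
    """Call func(*args, **kwargs), print the elapsed wall-clock time,
    and return (result, seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f} s")
    return result, elapsed


# Stand-in workload; replace it with one of the open calls from this notebook.
_, seconds = timed("demo", sum, range(1_000_000))
```

In this notebook the helper could wrap a call such as `lambda: xr.open_dataset(fs.open(NSRDB_S3_URI), backend_kwargs={"phony_dims": "sort"}, engine="h5netcdf")`. Because `xarray` loads data lazily, this measures opening the file and reading its metadata rather than a full read, and results will vary with network conditions.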

---

## NREL developer API
Use the National Renewable Energy Laboratory (NREL) developer API to access the data from Python.

- Get NSRDB API Key: https://developer.nrel.gov/signup/
    - You will need to copy your API key and personal information into the following cell

In [None]:
# Replace each placeholder below with your own value, keeping the quotes.
# You must request an NSRDB API key from the link above
api_key = '<your api key>'
# Your full name; use '+' instead of spaces
your_name = '<your name>'
# Your reason for using the NSRDB (e.g., 'research', 'commercial', 'education', 'non+profit', 'beta+testing')
reason_for_use = '<your reason>'
# Your affiliation
your_affiliation = '<your affiliation>'
# Your email address
your_email = '<your email>'
# Join the NREL email list?
mailing_list = 'false'
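
To avoid saving an API key in a shared notebook, the values can be read from environment variables instead. The following is a minimal sketch. The helper `read_setting` and the variable names `NREL_API_KEY` and `NREL_EMAIL` are hypothetical choices for this example, not names defined by NREL.

```python
import os


def read_setting(name, default=None):
    """Return the environment variable `name`, or `default` if it is unset.
    Raise an error when neither is available."""
    value = os.environ.get(name, default)
    if value is None:
        raise RuntimeError(f"Set the {name} environment variable before running this cell.")
    return value


# NREL_API_KEY and NREL_EMAIL are hypothetical names chosen for this sketch.
# The placeholder defaults only keep the example runnable; remove them to
# make a missing variable raise an error instead.
api_key = read_setting("NREL_API_KEY", default="<your api key>")
your_email = read_setting("NREL_EMAIL", default="<your email>")
```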

### Download the data

- The data is downloaded in `.csv` format
- Use `pandas` to read it
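
The download cell in this section assembles the request URL by hand, which is why spaces in names must be written as `+`. As an alternative sketch, the standard library can encode the query parameters so that plain strings can be used. The function name `build_nsrdb_url` and the sample values are made up for this example. It assumes that the service decodes `%20` in the same way as `+`, which is standard for query strings but not confirmed here for every field.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the download cell in this notebook.
BASE_URL = "https://developer.nrel.gov/api/nsrdb/v2/solar/psm3-download.csv"


def build_nsrdb_url(lat, lon, year, api_key, full_name, email, affiliation, reason,
                    attributes, interval="30", utc="false", leap_day="false",
                    mailing_list="false"):
    """Assemble the download URL with percent-encoded query parameters."""
    params = {
        "wkt": f"POINT({lon} {lat})",  # longitude first, as in the original URL
        "names": year,
        "leap_day": leap_day,
        "interval": interval,
        "utc": utc,
        "full_name": full_name,
        "email": email,
        "affiliation": affiliation,
        "mailing_list": mailing_list,
        "reason": reason,
        "api_key": api_key,
        "attributes": attributes,
    }
    # quote() encodes spaces as %20; "(", ")" and "," are left as-is so the
    # result resembles the hand-built URL.
    return f"{BASE_URL}?{urlencode(params, quote_via=quote, safe='(),')}"


# All personal values below are placeholders.
example_url = build_nsrdb_url(
    lat=39.2606, lon=-80.1139, year="2019",
    api_key="<your api key>",
    full_name="Jane Doe", email="jane.doe@example.com",
    affiliation="Example University", reason="research",
    attributes="ghi,dhi,dni",
)
print(example_url)
```

The resulting string can be passed to `pd.read_csv` in the same way as the hand-built `url`.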

In [None]:
import pandas as pd

# Declare all variables as strings. Spaces must be replaced with '+', i.e., change 'John Smith' to 'John+Smith'.
# Define the lat, long of the location
lat, lon = 39.2606, -80.1139

# Specify Coordinated Universal Time (UTC), 'true' will use UTC, 'false' will use the local time zone of the data.
# NOTE: In order to use the NSRDB data in SAM, you must specify UTC as 'false'. SAM requires the data to be in the
# local time zone.
utc = 'false'

# Set the attributes to extract (e.g., dhi, ghi, etc.), separated by commas.
attributes = 'ghi,dhi,dni,wind_speed,air_temperature,solar_zenith_angle'
# Choose year of data
year = '2019'
# Set leap year to true or false. True will return leap day data if present, false will not.
leap_year = 'false'
# Set time interval in minutes, i.e., '30' is half hour intervals. Valid intervals are 30 & 60.
interval = '30'


# Declare url string
url = 'https://developer.nrel.gov/api/nsrdb/v2/solar/psm3-download.csv?' \
    f'wkt=POINT({lon}%20{lat})&names={year}&leap_day={leap_year}&interval={interval}&utc={utc}&' \
    f'full_name={your_name}&email={your_email}&affiliation={your_affiliation}&mailing_list={mailing_list}&' \
    f'reason={reason_for_use}&api_key={api_key}&attributes={attributes}'
# Skip the first 2 lines of the CSV (file metadata) so that the data columns are read
df = pd.read_csv(url, skiprows=2)

df.head()
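
The two skipped lines are not wasted: they describe the site. The sketch below shows how they could be separated from the time series using only the standard library. It assumes that line 1 holds metadata field names and line 2 the matching values. The sample text and the helper name `split_nsrdb_csv` are made up for illustration and are not real NSRDB output.

```python
import csv
import io

# Made-up sample in the assumed layout: one row of metadata names, one row of
# metadata values, then the column header and the time-series rows.
SAMPLE_CSV = """Source,Location ID,Latitude,Longitude,Time Zone
NSRDB,000000,39.25,-80.1,-5
Year,Month,Day,Hour,Minute,GHI
2019,1,1,0,0,0
2019,1,1,0,30,0
"""


def split_nsrdb_csv(text):
    """Return (metadata dict, list of data-row dicts) from NSRDB-style CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    metadata = dict(zip(rows[0], rows[1]))  # pair each field name with its value
    header, body = rows[2], rows[3:]
    records = [dict(zip(header, row)) for row in body]
    return metadata, records


metadata, records = split_nsrdb_csv(SAMPLE_CSV)
print(metadata["Time Zone"], len(records))
```

With `pandas`, the same metadata row can be read by calling `pd.read_csv(url, nrows=1)`, at the cost of a second request to the API.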