# NSRDB to EPW Pipeline

## Introduction

This notebook downloads climate data from NSRDB and automatically converts it into EPW files, for any year in which a dataset is available and for almost any location in the Americas.

### Conceptual steps

+ Gather your query location as a [WKT geometry](https://libgeos.org/specifications/wkt/) string in the WGS84 CRS. It can be a Point, Polygon, MultiPolygon, etc.; a minimum working example is a Point.
+ Determine the dataset you would like to query, the appropriate temporal resolution, and the years you need.
+ Obtain an API key. You can [sign up for your API key](https://developer.nrel.gov/signup/).
+ Translate the geometry into NSRDB point_ids (automated by the pipeline, no need to worry)
+ Get the weather data about the associated point_ids and parse them into DataFrames and write them as CSVs (automated by the pipeline, no need to worry)
+ Translate these DataFrames into EPW files and write them (automated by the pipeline, no need to worry)

### Credits

The dataset and an interactive web portal are available via the [NSRDB Data Viewer](https://nsrdb.nrel.gov/data-viewer). This pipeline builds on the sample query code provided there.

Thanks to [Patrick's script](https://github.com/building-energy/epw/blob/master/epw/epw.py) we have a ready-made workflow for EPW file generation.

## Steps

### 1. Prepare your WKT geometry

Prepare your WKT geometry representing the area of investigation as a string. Further guidance is available [here](https://libgeos.org/specifications/wkt/).

A minimum working example is a Point: `POINT(-76.48408307172359 42.45094507085529)` is the location of Cornell AAP. Note that WKT lists the longitude first, then the latitude.

### 2. Determine the right temporal resolution and coverage

By referring to the table below, determine the right temporal resolution and coverage.

Datasets and their coverage:

|Geographies|Name|Temporal Resolution (min)|Geographical Resolution|Years (Inclusive)|
|------|------|------|------|------|
|USA Continental and Mexico|`nsrdb-GOES-conus`|5, 30, 60|2 km|2021-2023|
|USA and Americas|`nsrdb-GOES-full-disc`|10, 30, 60|2 km|2018-2023|
|USA and Americas|`nsrdb-GOES-aggregated`|30, 60|4 km|1998-2023|
|USA and Americas|`nsrdb-GOES-tmy`|60| |2022-2023|
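To check a planned request against this table before sending anything, the coverage can be written down as a small lookup. The sketch below simply transcribes the table; `COVERAGE` and `check_request` are hypothetical names, and the keys are the short dataset names from the table, which may differ from the full names the API expects.

```python
# Coverage transcribed from the table above (illustrative only).
COVERAGE = {
    "nsrdb-GOES-conus": {"intervals": {"5", "30", "60"}, "years": range(2021, 2024)},
    "nsrdb-GOES-full-disc": {"intervals": {"10", "30", "60"}, "years": range(2018, 2024)},
    "nsrdb-GOES-aggregated": {"intervals": {"30", "60"}, "years": range(1998, 2024)},
    "nsrdb-GOES-tmy": {"intervals": {"60"}, "years": range(2022, 2024)},
}


def check_request(dataset, interval, years):
    """Return a list of problems; an empty list means the request fits the table."""
    entry = COVERAGE.get(dataset)
    if entry is None:
        return [f"unknown dataset: {dataset}"]
    problems = []
    if interval not in entry["intervals"]:
        problems.append(f"interval {interval} min is not offered by {dataset}")
    for year in years:
        if int(year) not in entry["years"]:
            problems.append(f"year {year} is not covered by {dataset}")
    return problems


print(check_request("nsrdb-GOES-full-disc", "60", ["2023"]))
print(check_request("nsrdb-GOES-conus", "10", ["2020"]))
```

Returning a list of problems rather than raising on the first one makes it possible to report every mismatch at once.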

### 3. Obtain an API key

We recommend that you [sign up for your own API key](https://developer.nrel.gov/signup/) before working with the script. For lab purposes you can use the key provided, but it is the author's personal key, so keep the request volume in mind if you batch-download data for larger regions.

### 4. Run the script with the inputs

Provide the inputs for the script to run.

## API KEY

You can [sign up for your API key](https://developer.nrel.gov/signup/) to use with this script.

In [4]:
API_KEY = 'VR0y2pOyC6BMFt1I6gkFMipFc1o4ixgWUbnEhkPH' # use your own key if possible
#with open('archive/key.txt', mode='r', encoding='utf-8') as f:
#    API_KEY = f.read().strip()
assert API_KEY != ''
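As an alternative to pasting the key into the notebook or reading it from a file, it can be read from an environment variable. This is a sketch under assumptions: the variable name `NSRDB_API_KEY` and the helper `load_api_key` are made up for illustration and are not something the pipeline defines.

```python
import os


def load_api_key(env=os.environ, var_name="NSRDB_API_KEY"):
    """Read the API key from a mapping of environment variables.

    `var_name` is a hypothetical variable name; set it in your shell
    before starting Jupyter, or pass a different name.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"set the {var_name} environment variable to your API key")
    return key


# Usage (uncomment once the variable is set):
# API_KEY = load_api_key()
```

This keeps the key out of the saved notebook, which matters if the notebook is shared or committed to a repository.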

## Inputs

### Non-essential metadata - fill in accurate information for clarity

In [6]:
# Non-critical metadata
LOCATION = 'Ithaca' # just naming
STATE = 'STATE' # just naming
COUNTRY = 'United States' # just naming
EMAIL = "cl2749@cornell.edu" # your email, does not really matter if you are downloading csv directly

### Critical inputs - these will determine the result

+ `WKT` is the WKT (Well-Known Text) representation of the location for which you are downloading EPW files. Technically all WKT geometries are accepted, including points, polygons, and multipolygons. **But if you are not familiar with this concept, simply input the longitude-latitude point of the city/town you are working on**. For example, `POINT(-76.48408307172359 42.45094507085529)` (longitude first, separated by a space, **no comma!**) is the location of Cornell AAP.
+ `DATASET` is the full name of the dataset you are downloading from. Choose one from the dictionary `dataset_names` shown below.
+ `INTERVAL` is the temporal resolution. Pay attention to what data is available by referring to the table above. Note that this field is always a **string**!
+ `YEARS` is the list of years to download data for. Pay attention to what data is available by referring to the table above. Note that this field is always a list of **string**s.
+ `RESULTS_DIR` is the folder in which to save the downloaded files. Include the trailing slash `/`. For example, `my_location/` is good.
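The type requirements in the list above (string interval, list of string years, trailing slash) are easy to miss, so a small normalizing step can help. The following is a minimal sketch; `normalize_inputs` is a hypothetical helper and not part of `nsrdb2epw`.

```python
def normalize_inputs(interval, years, results_dir):
    """Coerce the critical inputs into the shapes described above.

    Hypothetical helper: returns (interval, years, results_dir) with the
    interval as a string, the years as a list of strings, and the
    directory ending in a slash.
    """
    interval = str(interval)
    # Accept a single year as well as a list of years
    if isinstance(years, (str, int)):
        years = [years]
    years = [str(year) for year in years]
    if not results_dir.endswith("/"):
        results_dir += "/"
    return interval, years, results_dir


INTERVAL, YEARS, RESULTS_DIR = normalize_inputs(60, [2022, 2023], "results")
print(INTERVAL, YEARS, RESULTS_DIR)
```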

In [8]:
from nsrdb2epw import get_dataset_names

dataset_names = get_dataset_names()
dataset_names

{'CONUS': 'nsrdb-GOES-conus-v4-0-0',
 'full-disc': 'nsrdb-GOES-full-disc-v4-0-0',
 'TMY': 'nsrdb-GOES-tmy-v4-0-0',
 'aggregated': 'nsrdb-GOES-aggregated-v4-0-0'}

In [12]:
# WKT representation of the location (longitude first, then latitude).
# Minimum working example: 'POINT(-76.48408307172359 42.45094507085529)' is the location of Cornell AAP
WKT = 'POLYGON ((-76.550854 42.349219, -76.247357 42.349219, -76.247357 42.570085, -76.550854 42.570085, -76.550854 42.349219))'

DATASET = dataset_names['full-disc'] # see dataset_names and table above. example: dataset_names['full-disc']
INTERVAL = '60' # temporal resolution, example: '60'
YEARS = ['2023'] # example: ['2020', '2021', '2022', '2023']
RESULTS_DIR = 'results/' # example: 'results/'
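The `WKT` value in the cell above is a rectangular polygon: five longitude-latitude pairs, with the first repeated at the end to close the ring. A rectangle like this can be generated from its bounds, as in the sketch below; `bbox_wkt` is a hypothetical helper, not something the pipeline provides.

```python
def bbox_wkt(min_lon, min_lat, max_lon, max_lat):
    """Build a rectangular WKT Polygon from its bounds (hypothetical helper).

    Corners are listed in the same order as the example polygon above,
    and the first corner is repeated last to close the ring.
    """
    corners = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),
    ]
    ring = ", ".join(f"{lon} {lat}" for lon, lat in corners)
    return f"POLYGON (({ring}))"


print(bbox_wkt(-76.5, 42.25, -76.25, 42.5))
```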

## Run (new workflow)

In [None]:
from nsrdb2epw import nsrdb2epw
nsrdb2epw(
    WKT,
    DATASET,
    INTERVAL,
    YEARS,
    API_KEY,
    RESULTS_DIR=RESULTS_DIR,
    LOCATION=LOCATION,
    STATE=STATE,
    COUNTRY=COUNTRY,
    EMAIL=EMAIL
)

Processing name: 2023
Making request for point group 1 of 176...
Response data (you should replace this print statement with your processing): (8762, 46)
Processed
Making request for point group 2 of 176...
Response data (you should replace this print statement with your processing): (8762, 46)
Processed
Making request for point group 3 of 176...
Response data (you should replace this print statement with your processing): (8762, 46)
Processed
Making request for point group 4 of 176...
Response data (you should replace this print statement with your processing): (8762, 46)
Processed
Making request for point group 5 of 176...
Response data (you should replace this print statement with your processing): (8762, 46)
Processed
Making request for point group 6 of 176...
Response data (you should replace this print statement with your processing): (8762, 46)
Processed
Making request for point group 7 of 176...
Response data (you should replace this print statement with your processing): (8762