# Solar energy related datasets 

### 1. **National Solar Radiation Database (`NSRDB`)**: a high temporal and spatial resolution dataset consisting of the three most widely used measurements of solar radiation—global horizontal, direct normal, and diffuse horizontal irradiance—as well as other meteorological data. [More information about NSRDB data](https://nsrdb.nrel.gov/about/what-is-the-nsrdb)
- Physical Solar Model (PSM) version 3:
    - Region: USA (Continental) & Mexico
    - Time Covered: 2018 - 2022
    - Temporal Resolution: 5 mins
    - Spatial Resolution: 2 X 2 km
    - Data Access: NREL endpoint API, NREL HSDS Server, Azure, AWS, GCS
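
The three irradiance components named above are linked by the standard closure relation GHI = DHI + DNI * cos(zenith). The following is a minimal sketch of that relation, not part of the NSRDB tooling; the function name `ghi_from_components` and the sample values are made up for illustration.

```python
import math

def ghi_from_components(dni: float, dhi: float, zenith_deg: float) -> float:
    """Global horizontal irradiance from DNI, DHI and the solar zenith angle."""
    # Only the part of the direct beam that strikes the horizontal plane counts;
    # when the sun is below the horizon (zenith >= 90 degrees) the direct term is zero.
    cos_zenith = max(math.cos(math.radians(zenith_deg)), 0.0)
    return dhi + dni * cos_zenith

# Illustrative values only (W/m2), not taken from the NSRDB
example_ghi = ghi_from_components(dni=800.0, dhi=100.0, zenith_deg=60.0)
print(round(example_ghi, 1))
```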

### 2. **The Super-Resolution for Renewable Energy Resource Data with Climate Change Impacts (`Sup3rCC`)**: a collection of 4 km hourly wind, solar, temperature, humidity, and pressure fields for the contiguous United States under various climate change scenarios. It uses a generative machine learning approach called Sup3r (Super-Resolution for Renewable Energy Resource Data) to downscale Global Climate Model (GCM) data. [More information about Sup3rCC](https://www.nrel.gov/analysis/sup3rcc.html)
- Region: Contiguous United States
- Time Covered: 2015 - 2059
- Temporal Resolution: 60 mins
- Spatial Resolution: 4 km
- Data Access: AWS, NREL HSDS Server

### 3. **`Solcast`**: a private company dedicated to generating live and forecast solar datasets globally at high temporal and spatial resolution. It provides an API toolkit that gives access to the data at no cost after creating an account, with no credit card required. [More information about Solcast](https://solcast.com/)
- Historical data:
    - Region: Global (note: far ocean and polar regions are coarser resolution)
    - Time Covered: January 2007 to 7 days ago
    - Temporal Resolution: 5, 10, 15, 30 & 60 minutes (period-mean values)
    - Spatial Resolution: 90 meters 
    - Data Access: Solcast Web Toolkit and Solcast API
        - Solcast Python SDK: https://solcast.github.io/solcast-api-python-sdk/ 
- Live and Forecast data:
    - Region: Global (note: far ocean and polar regions are coarser resolution)
    - Time Covered: -7 days to +14 days
    - Temporal Resolution: 5, 10, 15, 20, 30 & 60 minutes (period-mean values)
    - Spatial Resolution: 90 meters 
    - Data Access: Solcast Web Toolkit and Solcast API
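
"Period-mean values" means that each row is the average over one interval rather than an instantaneous reading. The sketch below shows how 5-minute period means aggregate to 30-minute period means in pandas. It assumes each timestamp marks the end of its period (as the `period_end` column in the historical file used later suggests), and the numbers are synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic 5-minute series for one hour; each timestamp marks the END of its period
period_end = pd.date_range("2024-06-01 12:05", periods=12, freq="5min")
ghi_5min = pd.Series(np.arange(600.0, 720.0, 10.0), index=period_end, name="ghi")

# closed="right" / label="right" keeps the period-end convention:
# the 12:30 row averages the six periods ending 12:05 through 12:30
ghi_30min = ghi_5min.resample("30min", closed="right", label="right").mean()
print(ghi_30min)
```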

---

## This notebook demonstrates ways of accessing different solar-related datasets, using the `First Energy Fort Martin Solar Site` as an example:
- Latitude: 39.75° N
- Longitude: 79.95° W (-79.95 in decimal degrees)

---

The following examples require these libraries/packages to be installed:

In [None]:
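!pip install pandas
!pip install numpy
!pip install s3path
!pip install s3fs
!pip install alive-progress
# PySAM is published on PyPI as `nrel-pysam` (`pysam` is an unrelated package)
!pip install nrel-pysam
!pip install xarray
!pip install h5netcdf
!pip install --user solcast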

---

### NSRDB Demonstration
- Year 2022
- Fort Martin Site
- 5 mins temporal resolution
- **API Key needed**: https://developer.nrel.gov/signup/

In [None]:
import pandas as pd
from pandas import DataFrame
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
import numpy as np
import PySAM.PySSC as pssc
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
from typing import List
from alive_progress import alive_bar

# url parameters
# the lat and long for the Fort Martin solar site
lat = 39.75
lon = -79.95

interval = 5
leap_year = 'false'
utc = 'false'
mailing_list = 'false'
api_key = "Your_API_Key"
your_name = "Justin+Lin"
reason_for_use = "Research"
your_affiliation = "HTF"
your_email = "slin@wvhtf.org"
dataset = "psm3-5min-download"
attributes = ""

def Get_URLs_From_NSRDB(start_year:int, end_year = None) -> List[str]:
    """Build one NSRDB download URL per year from start_year to end_year (inclusive)."""
    # with no end_year, only start_year is requested
    if end_year is None:
        end_year = start_year

    UrlList = []
    for year in range(start_year, end_year+1):
        url = f"https://developer.nrel.gov/api/nsrdb/v2/solar/{dataset}.csv?"\
                f"wkt=POINT({lon}%20{lat})&names={year}&leap_day={leap_year}&interval={interval}"\
                f"&utc={utc}&full_name={your_name}&email={your_email}&affiliation={your_affiliation}"\
                f"&mailing_list={mailing_list}&reason={reason_for_use}&api_key={api_key}&attributes={attributes}"
        UrlList.append(url)

    return UrlList

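data_urls = Get_URLs_From_NSRDB(2022)
# the first two rows of the NSRDB CSV hold site metadata, so the column names are on the third row
FM_NSRDB_2022 = pd.read_csv(data_urls[0], skiprows=2, low_memory=False)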

# print the data for the user to see
FM_NSRDB_2022
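
The query string above is assembled by hand, so values such as the name have to be escaped manually (`Justin+Lin`). As a hedged alternative, the sketch below lets `urllib.parse.urlencode` do the escaping. `build_nsrdb_url` is a hypothetical helper, the parameter names are copied from the cell above, and no request is sent.

```python
from urllib.parse import quote, urlencode

NSRDB_BASE = "https://developer.nrel.gov/api/nsrdb/v2/solar"

def build_nsrdb_url(dataset: str, lat: float, lon: float, year: int, **params) -> str:
    """Hypothetical helper: build an NSRDB download URL with escaped query values."""
    query = {
        "wkt": f"POINT({lon} {lat})",  # WKT order is longitude, then latitude
        "names": year,
        **params,
    }
    # quote_via=quote encodes spaces as %20, matching the hand-written URL above
    return f"{NSRDB_BASE}/{dataset}.csv?{urlencode(query, quote_via=quote)}"

example_url = build_nsrdb_url(
    "psm3-5min-download", 39.75, -79.95, 2022,
    interval=5, utc="false", full_name="Your Name", api_key="Your_API_Key",
)
print(example_url)
```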

---

### Sup3rCC Demonstration
- Years 2023 to 2024
- Monongalia County, WV (where Fort Martin solar site is located)

In [None]:
# import libraries
from s3path import S3Path
import xarray as xr
import s3fs

# s3fs is the package used to access AWS S3 buckets
# the data is public, no need for credentials
fs = s3fs.S3FileSystem(anon=True)
appended_data = []
# this example covers the years 2023 to 2024
start_year = 2023
end_year = 2024
# specify climate data types and attributes 
climate_type = ['solar', 'wind', 'trh', 'pressure']
attributes = [['ghi', 'dni', 'dhi'], 'windspeed_10m', ['temperature_2m', 'relativehumidity_2m'], 'pressure_0m']
climate_attr_dict = dict(zip(climate_type, attributes))
# total number of loop iterations (one per climate type per year) for the progress bar
attributes_num = len(attributes)
year_num = len(range(start_year, end_year + 1))
loop_total_num = year_num * attributes_num

# specify the components of the S3 URI
# cloud_type = "s3://"
bucket = "nrel-pds-sup3rcc"
folder = "conus_mriesm20_ssp585_r1i1p1f1"
version = "v0.1.0"
file_base = "sup3rcc_conus_mriesm20_ssp585_r1i1p1f1"
file_extension = "h5"

URI_base = S3Path(f'/{bucket}/{folder}/{version}/{file_base}')

with alive_bar(loop_total_num, force_tty=True, title='Running', length=20, bar = 'smooth') as bar:

    for climate in climate_type:
        
        for year in range(start_year, end_year + 1):
            URI = f"{URI_base.as_uri()}_{climate}_{year}.{file_extension}"

            # use `xarray` with engine `h5netcdf` to access data
            ds = xr.open_dataset(fs.open(URI), backend_kwargs={"phony_dims": "sort"}, engine='h5netcdf') # type: ignore

            time_index = pd.to_datetime(ds['time_index'][...].astype(str)) # type: ignore
            meta = pd.DataFrame(ds.meta.data)
            FM_site_index = meta[(meta.county == b'Monongalia') & (meta.elevation == 318)].index[0]
            attrs = climate_attr_dict[climate]

            # subset the data at FM site
            if climate in ['solar', 'trh']:
                for att in attrs:
                    subset = ds[att][:, FM_site_index].load()
                    data = pd.DataFrame({f"{att}" : subset}, index = time_index)
                    appended_data.append(data)
            else:
                subset = ds[attrs][:, FM_site_index].load()
                data = pd.DataFrame({f"{attrs}" : subset}, index = time_index)
                appended_data.append(data)
            
            # update the progress bar
            bar()

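# concatenate all the data; each piece holds a single attribute, so rows that share
# a timestamp are merged to give one row per time step with one column per attribute
FM_Sup3rCC_2023_2024 = pd.concat(appended_data).groupby(level=0).first()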

# rename the columns to include units
FM_Sup3rCC_2023_2024.rename({'ghi':"ghi (W/m2)",
                   'dni':'dni (W/m2)', 
                   'dhi':'dhi (W/m2)', 
                   'windspeed_10m': "Windspeed (m/s)", 
                   'temperature_2m': "Temperature (C)", 
                   'relativehumidity_2m': "Relative Humidity (%)",
                   'pressure_0m': "Pressure (hPa)"}, axis=1, inplace=True)

# adjust the scale of measurements
FM_Sup3rCC_2023_2024["Temperature (C)"] = FM_Sup3rCC_2023_2024["Temperature (C)"] / 10000
FM_Sup3rCC_2023_2024["Windspeed (m/s)"] = FM_Sup3rCC_2023_2024["Windspeed (m/s)"] / 10000
FM_Sup3rCC_2023_2024["Relative Humidity (%)"] = FM_Sup3rCC_2023_2024["Relative Humidity (%)"] / 10000

FM_Sup3rCC_2023_2024
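
The cell above selects the grid cell by county name and elevation. A possible alternative is to pick the grid point nearest to the site coordinates. The sketch below assumes the `meta` table exposes `latitude` and `longitude` columns (an assumption, not verified against the Sup3rCC files) and uses a made-up table in its place; `nearest_site_index` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def nearest_site_index(meta: pd.DataFrame, lat: float, lon: float) -> int:
    """Hypothetical helper: positional index of the grid point closest to (lat, lon)."""
    lat1, lon1 = np.radians(lat), np.radians(lon)
    lat2 = np.radians(meta["latitude"].to_numpy())
    lon2 = np.radians(meta["longitude"].to_numpy())
    # Haversine formula; the Earth radius is omitted because only the ordering matters
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    angular_distance = 2 * np.arcsin(np.sqrt(a))
    return int(np.argmin(angular_distance))

# Made-up grid points standing in for the real `meta` table
toy_meta = pd.DataFrame({
    "latitude": [39.60, 39.74, 39.90],
    "longitude": [-80.10, -79.96, -79.80],
})
nearest = nearest_site_index(toy_meta, 39.75, -79.95)
print(nearest)
```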

---

### Solcast Demonstration
- Fort Martin site: -7 days to present
- 30 mins temporal resolution
- **API Key needed**: https://toolkit.solcast.com.au/register

In [None]:
API_key = "Your_API_Key"

# The URL for the weather site in Solcast
URL = f"https://api.solcast.com.au/weather_sites/388d-25ae-c428-bcf3/estimated_actuals?format=csv&api_key={API_key}"

FM_SOLCAST_LIVE = pd.read_csv(URL)

FM_SOLCAST_LIVE

In [None]:
# location of the Fort Martin solar site
FM_lat = 39.75
FM_lon = -79.95
hours = 168 # 7 days
period = 'PT5M'
output_parameters = 'air_temp,dni,ghi'
azimuth = 0
tilt = 23
array_type = 'fixed'
file_format = 'csv' # named to avoid shadowing the built-in `format`

url = "https://api.solcast.com.au/data/live/radiation_and_weather?"\
      f"latitude={FM_lat}&longitude={FM_lon}&hours={hours}&period={period}&"\
      f"output_parameters={output_parameters}&azimuth={azimuth}&tilt={tilt}&"\
      f"array_type={array_type}&format={file_format}&api_key={API_key}"

df = pd.read_csv(url)
df

- Solcast Python SDK
- use unmetered location 'Sydney Opera House' for testing

In [None]:
from solcast import live
from solcast.unmetered_locations import UNMETERED_LOCATIONS

# use an unmetered location for testing to avoid consuming API requests
sydney = UNMETERED_LOCATIONS['Sydney Opera House']
res = live.radiation_and_weather(
    api_key = API_key,
    latitude = sydney['latitude'], 
    longitude = sydney['longitude'],
    hours = 168,
    period = 'PT10M',
    output_parameters = 'dni,ghi'
)

test_df = res.to_pandas()

test_df

- Fort Martin site: present to +7 days
- 30 mins temporal resolution
- **API Key needed**: https://toolkit.solcast.com.au/register

In [None]:
URL = f"https://api.solcast.com.au/weather_sites/388d-25ae-c428-bcf3/forecasts?format=csv&api_key={API_key}"

FM_SOLCAST_FORECAST = pd.read_csv(URL)

FM_SOLCAST_FORECAST

- FM site: 20200101 to 20240805
- 60 mins temporal resolution

In [None]:
# This demonstration reads a local CSV file as the data source
file_name = './Solcast_FortMartin_20200101_20240805.csv'
FM_SOLCAST_HISTORIC = pd.read_csv(file_name)
FM_SOLCAST_HISTORIC = FM_SOLCAST_HISTORIC[ ['period_end'] + [ col for col in FM_SOLCAST_HISTORIC.columns if col != 'period_end' ] ]
FM_SOLCAST_HISTORIC.drop('period', axis=1, inplace=True)
FM_SOLCAST_HISTORIC.rename({'period_end':'time'}, axis = 1, inplace=True)
FM_SOLCAST_HISTORIC['time'] = pd.to_datetime(FM_SOLCAST_HISTORIC.time).dt.strftime('%Y-%m-%d %H:%M:%S')
FM_SOLCAST_HISTORIC['time'] = pd.to_datetime(FM_SOLCAST_HISTORIC.time)

FM_SOLCAST_HISTORIC
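
Once `time` is a datetime column, the hourly table can be aggregated. The sketch below computes daily GHI irradiation from hourly period-mean values. It assumes a `ghi` column in W/m2 and period-end timestamps (the column name is an assumption, since the file's columns are not shown here), and it uses a synthetic frame in place of `FM_SOLCAST_HISTORIC`.

```python
import pandas as pd

# Synthetic stand-in for the historical table: two days of hourly period-end timestamps
time = pd.date_range("2024-08-01 01:00", periods=48, freq="60min")
period_start_index = time - pd.Timedelta(hours=1)

# Made-up profile: 500 W/m2 for the eight hours starting 09:00 through 16:00, else zero
ghi_values = [500.0 if 9 <= t.hour <= 16 else 0.0 for t in period_start_index]
hourly = pd.DataFrame({"time": time, "ghi": ghi_values})

# Assign each row to the day in which its period starts, so that the hour
# ending at midnight is counted with the day it belongs to
period_start = hourly["time"] - pd.Timedelta(hours=1)
day = period_start.dt.floor("D")

# An hourly mean in W/m2 over one hour equals Wh/m2; divide by 1000 for kWh/m2
daily_kwh = hourly.groupby(day)["ghi"].sum() / 1000.0
print(daily_kwh)
```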

- Solcast Historic Data Parameters Documentation

In [None]:
file_name = './Solcast_Historic_Parameters_Documentation.csv'
SOLCAST_PARAMETERS_DOC = pd.read_csv(file_name)
SOLCAST_PARAMETERS_DOC