# Solar energy-related datasets

### 1. **National Solar Radiation Database (`NSRDB`)**: a high temporal and spatial resolution dataset consisting of the three most widely used measurements of solar radiation (global horizontal, direct normal, and diffuse horizontal irradiance) as well as other meteorological data. [More information about NSRDB](https://nsrdb.nrel.gov/about/what-is-the-nsrdb)
- Physical Solar Model (PSM) version 3:
    - Region: USA (Continental) & Mexico
    - Time Covered: 2018 - 2022
    - Temporal Resolution: 5 mins
    - Spatial Resolution: 2 X 2 km
    - Data Access: NREL endpoint API, NREL HSDS Server, Azure, AWS, GCS
- **System Advisor Model (`SAM`)**: a performance and financial model that estimates the cost of energy for grid-connected power projects from installation costs, operating costs, and system design, to support decision making in the renewable energy industry. [More information about SAM](https://sam.nrel.gov)
### 2. **The Super-Resolution for Renewable Energy Resource Data with Climate Change Impacts (`Sup3rCC`)**: a collection of 4 km hourly wind, solar, temperature, humidity, and pressure fields for the contiguous United States under various climate change scenarios. It uses a generative machine learning approach called Sup3r (Super-Resolution for Renewable Energy Resource Data) to downscale Global Climate Model (GCM) data. [More information about Sup3rCC](https://www.nrel.gov/analysis/sup3rcc.html)
- Region: contiguous United States
- Time Covered: 2015 - 2059
- Temporal Resolution: 60 mins
- Spatial Resolution: 4 km
- Data Access: AWS, NREL HSDS Server
### 3. **Solcast** is a company dedicated to generating live and forecast solar datasets globally at high temporal and spatial resolution. It provides an API toolkit for accessing the data at no cost; creating an account does not require a credit card. [More information about Solcast](https://solcast.com/)
- Historical data:
    - Region: Global (note: far ocean and polar regions are coarser resolution)
    - Time Covered: January 2007 to 7 days ago
    - Temporal Resolution: 5, 10, 15, 30 & 60 minutes (period-mean values)
    - Spatial Resolution: 90 meters 
    - Data Access: Solcast Web Toolkit and Solcast API
- Live and Forecast data:
    - Region: Global (note: far ocean and polar regions are coarser resolution)
    - Time Covered: -7 days to +14 days
    - Temporal Resolution: 5, 10, 15, 20, 30 & 60 minutes (period-mean values)
    - Spatial Resolution: 90 meters 
    - Data Access: Solcast Web Toolkit and Solcast API
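
The temporal resolutions listed above translate directly into how many records each download should contain, which is a cheap sanity check after loading. The sketch below is only illustrative arithmetic: `expected_rows` is a hypothetical helper, and the 7-day, 30-minute case is an assumption based on the Solcast outputs shown later in this notebook.

```python
# Expected number of fixed-interval records for the access patterns in this notebook.
MINUTES_PER_DAY = 24 * 60

def expected_rows(days: int, interval_minutes: int) -> int:
    """Number of records needed to cover `days` whole days at a fixed interval."""
    return days * MINUTES_PER_DAY // interval_minutes

nsrdb_rows = expected_rows(365, 5)     # one non-leap year of 5-minute NSRDB data
sup3rcc_rows = expected_rows(365, 60)  # one non-leap year of hourly Sup3rCC data
solcast_rows = expected_rows(7, 30)    # 7 days of 30-minute Solcast data (assumed period)

print(nsrdb_rows, sup3rcc_rows, solcast_rows)
```

Comparing these counts with `DataFrame.shape[0]` after each download is a quick way to spot truncated or duplicated data.
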
## This notebook demonstrates ways of accessing the solar datasets above, using the `Fort Martin Solar Site` as an example:
- Latitude: 39.75
- Longitude: -79.95
---

### NSRDB

In [16]:
import pandas as pd
pd.set_option('display.max_columns', None)
import numpy as np
import PySAM.PySSC as pssc
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
from typing import List
from alive_progress import alive_bar

# url parameters
lat = 39.75
lon = -79.95
interval = 5
leap_year = 'false'
utc = 'false'
mailing_list = 'false'
api_key = "2aD0f1cpYYogKvIhgzCCEsuBHnVvfGhcaItjJnAU"
your_name = "Justin+Lin"
reason_for_use = "Research"
your_affiliation = "HTF"
your_email = "slin@wvhtf.org"
dataset = "psm3-5min-download"
attributes = ""

def Get_URLs_From_NSRDB(start_year:int, end_year = None) -> List[str]:
    # Build one download URL per year; when end_year is omitted, only start_year is requested
    if end_year is None:
        end_year = start_year

    UrlList = []
    for year in range(start_year, end_year+1):
        url = f"https://developer.nrel.gov/api/nsrdb/v2/solar/{dataset}.csv?"\
                f"wkt=POINT({lon}%20{lat})&names={year}&leap_day={leap_year}&interval={interval}"\
                f"&utc={utc}&full_name={your_name}&email={your_email}&affiliation={your_affiliation}"\
                f"&mailing_list={mailing_list}&reason={reason_for_use}&api_key={api_key}&attributes={attributes}"
        UrlList.append(url)

    return UrlList

data_urls = Get_URLs_From_NSRDB(2022)
# the first two rows of the CSV hold site metadata, so skip them to reach the data header
FM_NSRDB_2022 = pd.read_csv(data_urls[0], skiprows=2)
FM_NSRDB_2022
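
The URL in the cell above is assembled by hand, which requires manual escaping (`%20`, `+`) and keeps the API key in the notebook. A minimal sketch of an alternative follows, using the standard library's `urlencode` and reading the key from an environment variable. The function `build_nsrdb_url`, the environment variable name `NREL_API_KEY`, and the placeholder email are all hypothetical; only the endpoint and the parameter names are taken from the cell above.

```python
import os
from urllib.parse import quote, urlencode

NSRDB_ENDPOINT = "https://developer.nrel.gov/api/nsrdb/v2/solar/psm3-5min-download.csv"

def build_nsrdb_url(year: int, lat: float, lon: float, api_key: str, email: str,
                    interval: int = 5, **extra: str) -> str:
    """Return a download URL for one year of NSRDB data at a single point."""
    params = {
        "wkt": f"POINT({lon} {lat})",  # WKT order is longitude, then latitude
        "names": year,
        "leap_day": "false",
        "interval": interval,
        "utc": "false",
        "email": email,
        "api_key": api_key,
    }
    # Optional fields such as full_name, affiliation, reason, or attributes
    params.update(extra)
    # quote_via=quote encodes spaces as %20, matching the hand-built URL above
    return f"{NSRDB_ENDPOINT}?{urlencode(params, quote_via=quote)}"

# NREL_API_KEY is a hypothetical environment variable name; the fallback is a placeholder
api_key = os.environ.get("NREL_API_KEY", "YOUR_API_KEY")
url = build_nsrdb_url(2022, lat=39.75, lon=-79.95, api_key=api_key,
                      email="user@example.com", reason="Research")
print(url)
```

Passing the resulting string to `pd.read_csv(url, skiprows=2)` should behave like the hand-built URL, though this has not been checked against the live service here.
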

### PySAM

In [None]:
# FM site parameters
# Set system capacity (18.89 MW site). Note: PVWatts documents `system_capacity` in kW,
# so this value may need to be 18890; verify the unit before relying on the output
system_capacity = 18.89
# Set DC/AC ratio (or power ratio)
dc_ac_ratio = 1.06
# Set tilt of system in degrees
tilt = 23
# Set azimuth angle in degrees, measured from north (0 degrees)
azimuth = 0
# Set the inverter efficiency, in percent
inv_eff = 96
# Set the system losses, in percent
losses = 13.1
# Specify fixed tilt system (0=Fixed, 1=Fixed Roof, 2=1 Axis Tracker, 3=Backtracked, 4=2 Axis Tracker)
array_type = 0
# Set ground coverage ratio
gcr = 0.55
# Set constant loss adjustment
adjust_constant = 0

info = pd.read_csv(Get_URLs_From_NSRDB(2022)[0], nrows=1)
# get timezone and elevation for the simulation inputs as scalars (read_csv returns one-row Series)
timezone, elevation = info['Local Time Zone'].iloc[0], info['Elevation'].iloc[0]

def Solar_Power_Simulation(NSRDB_urls:List[str]) -> pd.DataFrame:

    appended_data = []

    with alive_bar(len(NSRDB_urls), force_tty=True, title='Simulating', length=20, bar = 'smooth') as bar:

        for url in NSRDB_urls:

            df = pd.read_csv(url, skiprows=2)

            # SAM Model for solar simulation
            ssc = pssc.PySSC()

            # Resource inputs for SAM model:
            # Must be byte strings
            wfd = ssc.data_create()
            ssc.data_set_number(wfd, b'lat', lat)
            ssc.data_set_number(wfd, b'lon', lon)
            ssc.data_set_number(wfd, b'tz', timezone)
            ssc.data_set_number(wfd, b'elev', elevation)
            ssc.data_set_array(wfd, b'year', df['Year'])
            ssc.data_set_array(wfd, b'month', df['Month'])
            ssc.data_set_array(wfd, b'day', df['Day'])
            ssc.data_set_array(wfd, b'hour', df['Hour'])
            ssc.data_set_array(wfd, b'minute', df['Minute'])
            ssc.data_set_array(wfd, b'dn', df['DNI'])
            ssc.data_set_array(wfd, b'df', df['DHI'])
            ssc.data_set_array(wfd, b'wspd', df['Wind Speed'])
            ssc.data_set_array(wfd, b'tdry', df['Temperature'])

            # Create SAM compliant object  
            dat = ssc.data_create()
            ssc.data_set_table(dat, b'solar_resource_data', wfd)
            ssc.data_free(wfd)

            ssc.data_set_number(dat, b'system_capacity', system_capacity)
            ssc.data_set_number(dat, b'dc_ac_ratio', dc_ac_ratio)
            ssc.data_set_number(dat, b'tilt', tilt)
            ssc.data_set_number(dat, b'azimuth', azimuth)
            ssc.data_set_number(dat, b'inv_eff', inv_eff)
            ssc.data_set_number(dat, b'losses', losses)
            ssc.data_set_number(dat, b'array_type', array_type)
            ssc.data_set_number(dat, b'gcr', gcr)
            ssc.data_set_number(dat, b'adjust:constant', adjust_constant)

            # execute and put generation results back into dataframe
            mod = ssc.module_create(b'pvwattsv5')
            ssc.module_exec(mod, dat)
            df['generation'] = np.array(ssc.data_get_array(dat, b'gen'))

            # free the memory
            ssc.data_free(dat)
            ssc.module_free(mod)

            appended_data.append(df)

            bar()

    final_data = pd.concat(appended_data)

    print(f'\033[1mThis dataset has {final_data.shape[0]} rows and {final_data.shape[1]} columns\033[0m')
    return final_data

data_urls = Get_URLs_From_NSRDB(2022)
FM_SAM_2022 = Solar_Power_Simulation(data_urls)
FM_SAM_2022
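
The simulation adds one `generation` value per 5-minute record. To compare against metered production it is often more convenient to work with energy per day. The sketch below shows one way to do that on a toy frame with the same column names as the NSRDB download. It assumes that `generation` is the average power in kW over each interval (an assumption about the PVWatts `gen` output that should be verified); `daily_energy_kwh` is a hypothetical helper.

```python
import pandas as pd

def daily_energy_kwh(df: pd.DataFrame, interval_minutes: int = 5) -> pd.Series:
    """Convert interval-average power (kW) into energy per calendar day (kWh)."""
    # pd.to_datetime can assemble timestamps from year/month/day/hour/minute columns
    parts = df[["Year", "Month", "Day", "Hour", "Minute"]].rename(columns=str.lower)
    timestamps = pd.DatetimeIndex(pd.to_datetime(parts))
    # energy = power * duration, with the duration expressed in hours
    energy = pd.Series(df["generation"].to_numpy() * interval_minutes / 60.0,
                       index=timestamps)
    return energy.resample("D").sum()

# Toy data: three 5-minute records on each of two days
toy = pd.DataFrame({
    "Year": [2022] * 6,
    "Month": [1] * 6,
    "Day": [1, 1, 1, 2, 2, 2],
    "Hour": [12] * 6,
    "Minute": [0, 5, 10, 0, 5, 10],
    "generation": [12.0, 12.0, 12.0, 6.0, 6.0, 0.0],
})
daily = daily_energy_kwh(toy)
print(daily)
```

With the real output, `daily_energy_kwh(FM_SAM_2022)` would give one value per day of the simulated year.
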

### Sup3rCC
- Years: 2023 to 2024
- Monongalia County, WV (where the FM solar site is located)

In [None]:
# import libraries
import xarray as xr
import s3fs
import os

# s3fs is the package used to access AWS S3 buckets
# the data is public, no need for credentials
fs = s3fs.S3FileSystem(anon=True)
appended_data = []
# the data covers year from 2023 to 2024
start_year = 2023
end_year = 2024
# specify climate data types and attributes 
climate_type = ['solar', 'wind', 'trh', 'pressure']
attributes = [['ghi', 'dni', 'dhi'], 'windspeed_10m', ['temperature_2m', 'relativehumidity_2m'], 'pressure_0m']
climate_attr_dict = dict(zip(climate_type, attributes))
# calculate the total number in the loop
attributes_num = len(attributes)
year_num = len(range(start_year, end_year + 1))
loop_total_num = year_num * attributes_num
# specify the components of the S3 URI
cloud_type = "s3://"
bucket = "nrel-pds-sup3rcc/"
folder = "conus_mriesm20_ssp585_r1i1p1f1/"
version = "v0.1.0/"
file_base = "sup3rcc_conus_mriesm20_ssp585_r1i1p1f1"
file_extension = "h5"
URI_base = os.path.join(cloud_type, bucket, folder, version, file_base)

with alive_bar(loop_total_num, force_tty=True, title='Running', length=20, bar = 'smooth') as bar:

    for climate in climate_type:
        
        for year in range(start_year, end_year + 1):
            URI = f"{URI_base}_{climate}_{year}.{file_extension}"
            # use `xarray` with engine `h5netcdf` to access data
            ds = xr.open_dataset(fs.open(URI), backend_kwargs={"phony_dims": "sort"}, engine='h5netcdf')

            time_index = pd.to_datetime(ds['time_index'][...].astype(str))
            meta = pd.DataFrame(ds.meta.data)
            FM_site_index = meta[(meta.county == b'Monongalia') & (meta.elevation == 318)].index[0]
            attrs = climate_attr_dict[climate]

            # subset the data with specified attribute, all time index, and WV Monongalia County index
            if climate in ['solar', 'trh']:
                for att in attrs:
                    subset = ds[att][:, FM_site_index].load()
                    data = pd.DataFrame({f"{att}" : subset}, index = time_index)
                    appended_data.append(data)
            else:
                subset = ds[attrs][:, FM_site_index].load()
                data = pd.DataFrame({f"{attrs}" : subset}, index = time_index)
                appended_data.append(data)
            bar()

# concatenate all the per-attribute frames (stacked row-wise, so each timestamp repeats once per attribute)
FM_Sup3rCC_2023_2024 = pd.concat(appended_data)

FM_Sup3rCC_2023_2024.rename({'ghi':"ghi (W/m2)",
                   'dni':'dni (W/m2)', 
                   'dhi':'dhi (W/m2)', 
                   'windspeed_10m': "Windspeed (m/s)", 
                   'temperature_2m': "Temperature (C)", 
                   'relativehumidity_2m': "Relative Humidity (%)",
                   'pressure_0m': "Pressure (hPa)"}, axis=1, inplace=True)

FM_Sup3rCC_2023_2024["Temperature (C)"] = FM_Sup3rCC_2023_2024["Temperature (C)"] / 10000
FM_Sup3rCC_2023_2024["Windspeed (m/s)"] = FM_Sup3rCC_2023_2024["Windspeed (m/s)"] / 10000
FM_Sup3rCC_2023_2024["Relative Humidity (%)"] = FM_Sup3rCC_2023_2024["Relative Humidity (%)"] / 10000

FM_Sup3rCC_2023_2024
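
Because each attribute is appended as its own single-column frame, a row-wise `pd.concat` leaves every timestamp repeated once per attribute, with `NaN` in the columns that frame did not carry. One possible way to collapse this into a single row per timestamp is sketched below on toy data; the column names and values are illustrative only, and this is not necessarily how the original author intended to post-process the result.

```python
import pandas as pd

# Toy stand-ins for the per-attribute frames collected in `appended_data`
index = pd.date_range("2023-01-01", periods=3, freq=pd.Timedelta(hours=1))
frames = [
    pd.DataFrame({"ghi": [0.0, 10.0, 20.0]}, index=index),
    pd.DataFrame({"dni": [0.0, 5.0, 15.0]}, index=index),
]

# Row-wise concat: 6 rows, NaN wherever a frame lacks the column
stacked = pd.concat(frames)

# Group by timestamp and keep the first non-null value in each column
merged = stacked.groupby(level=0).first()

print(stacked.shape, merged.shape)
print(merged)
```

Applied to the real data, the same `groupby(level=0).first()` call on `FM_Sup3rCC_2023_2024` should yield one row per hourly timestamp with all seven columns populated.
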

### Solcast
- FM site: -7 days to present

In [26]:
API_key = "Your_API_Key"

URL = f"https://api.solcast.com.au/weather_sites/388d-25ae-c428-bcf3/estimated_actuals?format=csv&api_key={API_key}"

FM_SOLCAST_LIVE = pd.read_csv(URL)

FM_SOLCAST_LIVE

|  | ghi | ebh | dni | dhi | cloud_opacity | period_end | period |
|---|---|---|---|---|---|---|---|
| 0 | 756 | 593 | 748 | 164 | 5 | 2024-08-12T15:30:00Z | PT30M |
| 1 | 723 | 616 | 839 | 107 | 0 | 2024-08-12T15:00:00Z | PT30M |
| 2 | 649 | 549 | 823 | 99 | 0 | 2024-08-12T14:30:00Z | PT30M |
| 3 | 563 | 468 | 790 | 95 | 0 | 2024-08-12T14:00:00Z | PT30M |
| 4 | 470 | 381 | 749 | 88 | 0 | 2024-08-12T13:30:00Z | PT30M |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 332 | 867 | 621 | 674 | 246 | 2 | 2024-08-05T17:30:00Z | PT30M |
| 333 | 851 | 602 | 662 | 249 | 2 | 2024-08-05T17:00:00Z | PT30M |
| 334 | 847 | 636 | 717 | 211 | 0 | 2024-08-05T16:30:00Z | PT30M |
| 335 | 808 | 605 | 710 | 203 | 0 | 2024-08-05T16:00:00Z | PT30M |
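
The estimated actuals arrive newest first, with `period_end` as a UTC string. For plotting or joining with the other datasets it usually helps to parse the timestamps, sort chronologically, and convert to local time. The sketch below does this on three rows copied from the output above; the `America/New_York` time zone is an assumption for the Fort Martin site.

```python
import io
import pandas as pd

# Three rows copied from the estimated-actuals output above
sample_csv = """ghi,ebh,dni,dhi,cloud_opacity,period_end,period
756,593,748,164,5,2024-08-12T15:30:00Z,PT30M
723,616,839,107,0,2024-08-12T15:00:00Z,PT30M
649,549,823,99,0,2024-08-12T14:30:00Z,PT30M
"""

live = pd.read_csv(io.StringIO(sample_csv))
# Parse the ISO 8601 strings as timezone-aware UTC timestamps
live["period_end"] = pd.to_datetime(live["period_end"], utc=True)
# Index by time and put the oldest record first
live = live.set_index("period_end").sort_index()
# Convert the index to local time (assumed Eastern time for the site)
live_local = live.tz_convert("America/New_York")

print(live_local[["ghi", "dni", "dhi"]])
```

The same three steps can be applied to `FM_SOLCAST_LIVE` directly.
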


- FM site: present to +7 days

In [28]:
URL = f"https://api.solcast.com.au/weather_sites/388d-25ae-c428-bcf3/forecasts?format=csv&api_key={API_key}"

FM_SOLCAST_FORECAST = pd.read_csv(URL)

FM_SOLCAST_FORECAST

|  | ghi | ghi90 | ghi10 | ebh | dni | dni90 | dni10 | dhi | air_temp | zenith | azimuth | cloud_opacity | period_end | period |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 756 | 773 | 715 | 593 | 748 | 803 | 617 | 164 | 22 | 38 | -122 | 5 | 2024-08-12T15:30:00Z | PT30M |
| 1 | 822 | 847 | 761 | 675 | 805 | 883 | 622 | 147 | 23 | 33 | -132 | 3 | 2024-08-12T16:00:00Z | PT30M |
| 2 | 852 | 887 | 775 | 686 | 785 | 890 | 577 | 166 | 24 | 29 | -143 | 4 | 2024-08-12T16:30:00Z | PT30M |
| 3 | 866 | 913 | 741 | 681 | 760 | 890 | 455 | 185 | 24 | 26 | -158 | 5 | 2024-08-12T17:00:00Z | PT30M |
| 4 | 865 | 924 | 701 | 662 | 730 | 888 | 363 | 202 | 25 | 25 | -174 | 6 | 2024-08-12T17:30:00Z | PT30M |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 332 | 354 | 445 | 77 | 181 | 368 | 746 | 0 | 172 | 20 | 60 | -98 | 21 | 2024-08-19T13:30:00Z | PT30M |
| 333 | 423 | 538 | 101 | 223 | 387 | 794 | 0 | 200 | 21 | 55 | -103 | 21 | 2024-08-19T14:00:00Z | PT30M |
| 334 | 482 | 623 | 137 | 253 | 388 | 829 | 0 | 230 | 22 | 49 | -110 | 23 | 2024-08-19T14:30:00Z | PT30M |
| 335 | 535 | 700 | 168 | 281 | 390 | 857 | 0 | 255 | 23 | 44 | -116 | 24 | 2024-08-19T15:00:00Z | PT30M |
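
The forecast output includes `ghi10` and `ghi90` alongside `ghi`. Reading these as lower and upper percentile estimates (an interpretation that should be confirmed against the Solcast documentation), the gap between them gives a rough measure of forecast uncertainty. The sketch below computes that spread for three rows copied from the output above; the column names `ghi_spread` and `ghi_spread_pct` are hypothetical.

```python
import io
import pandas as pd

# Three rows (selected columns) copied from the forecast output above
sample_csv = """ghi,ghi90,ghi10,period_end
756,773,715,2024-08-12T15:30:00Z
822,847,761,2024-08-12T16:00:00Z
354,445,77,2024-08-19T13:30:00Z
"""

forecast = pd.read_csv(io.StringIO(sample_csv), parse_dates=["period_end"])
# Width of the band between the upper and lower estimates, in the same units as ghi
forecast["ghi_spread"] = forecast["ghi90"] - forecast["ghi10"]
# Spread relative to the central estimate, in percent
forecast["ghi_spread_pct"] = 100 * forecast["ghi_spread"] / forecast["ghi"]

print(forecast[["period_end", "ghi", "ghi_spread", "ghi_spread_pct"]])
```

In this sample the band is much wider for the record seven days out than for the first forecast period, which is the pattern one would expect as the horizon grows.
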
