# Solar energy related datasets 

### 1. **National Solar Radiation Database (`NSRDB`)**: a high temporal and spatial resolution dataset consisting of the three most widely used measurements of solar radiation—global horizontal, direct normal, and diffuse horizontal irradiance—as well as other meteorological data. [More information about NSRDB data](https://nsrdb.nrel.gov/about/what-is-the-nsrdb)
- Physical Solar Model (PSM) version 3:
    - Region: USA (Continental) & Mexico
    - Time Covered: 2018 - 2022
    - Temporal Resolution: 5 mins
    - Spatial Resolution: 2 km × 2 km
    - Data Access: NREL endpoint API, NREL HSDS Server, Azure, AWS, GCS
- **System Advisor Model (`SAM`)**: a performance and financial model that estimates the cost of energy for grid-connected power projects from installation costs, operating costs, and system design, to support decision making in the renewable energy industry. [More information about SAM](https://sam.nrel.gov)
### 2. **The Super-Resolution for Renewable Energy Resource Data with Climate Change Impacts (`Sup3rCC`)**: a collection of 4 km hourly wind, solar, temperature, humidity, and pressure fields for the contiguous United States under various climate change scenarios. It uses a generative machine learning approach called Sup3r (Super-Resolution for Renewable Energy Resource Data) to downscale Global Climate Model (GCM) data. [More information about Sup3rCC](https://www.nrel.gov/analysis/sup3rcc.html)
- Region: Contiguous United States
- Time Covered: 2015 - 2059
- Temporal Resolution: 60 mins
- Spatial Resolution: 4 km
- Data Access: AWS, NREL HSDS Server
### 3. **`Solcast`** is a private company dedicated to generating live and forecast solar datasets globally at high temporal and spatial resolution. It provides an API toolkit that can be used at no cost after creating an account (no credit card required). [More information about Solcast](https://solcast.com/)
- Historical data:
    - Region: Global (note: far ocean and polar regions have coarser resolution)
    - Time Covered: January 2007 to 7 days ago
    - Temporal Resolution: 5, 10, 15, 30 & 60 minutes (period-mean values)
    - Spatial Resolution: 90 meters 
    - Data Access: Solcast Web Toolkit and Solcast API
- Live and Forecast data:
    - Region: Global (note: far ocean and polar regions have coarser resolution)
    - Time Covered: -7 days to +14 days
    - Temporal Resolution: 5, 10, 15, 20, 30 & 60 minutes (period-mean values)
    - Spatial Resolution: 90 meters 
    - Data Access: Solcast Web Toolkit and Solcast API
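Solcast values are period means labelled by the end of each averaging period: the API outputs later in this notebook carry a `period_end` timestamp and an ISO 8601 `period` such as `PT5M`. The sketch below, with hypothetical helper names and a parser that only handles the simple `PT<h>H<m>M` forms, recovers the start of each period and sorts the rows oldest first.

```python
import re
from datetime import timedelta

import pandas as pd


def iso_period_to_timedelta(period: str) -> timedelta:
    """Parse simple ISO 8601 durations such as 'PT5M', 'PT30M' or 'PT1H'."""
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?", period)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported period: {period!r}")
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes)


def add_period_start(df: pd.DataFrame) -> pd.DataFrame:
    """Add 'period_start' (= period_end - period) and sort rows oldest first."""
    out = df.copy()
    out["period_end"] = pd.to_datetime(out["period_end"], utc=True)
    durations = pd.to_timedelta(out["period"].map(iso_period_to_timedelta))
    out["period_start"] = out["period_end"] - durations
    return out.sort_values("period_end").reset_index(drop=True)


sample = pd.DataFrame({
    "ghi": [466, 437],
    "period_end": ["2024-08-16T16:00:00Z", "2024-08-16T15:55:00Z"],
    "period": ["PT5M", "PT5M"],
})
print(add_period_start(sample))
```

Labelling by period start (or midpoint) makes it easier to align Solcast means with instantaneous or start-labelled data from other sources.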

---

## This notebook demonstrates ways of accessing different solar-related datasets, using the `First Energy Fort Martin Solar Site` as an example:
- Latitude: 39.75° N
- Longitude: 79.95° W (-79.95)
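The NSRDB request in this notebook passes the site as a WKT point, `POINT(lon lat)`, so longitude comes first and western longitudes are negative. A tiny helper (hypothetical name) makes that ordering explicit:

```python
def to_wkt_point(lat: float, lon: float) -> str:
    """Return a WKT POINT string in (longitude latitude) order."""
    # WKT lists x (longitude) before y (latitude); west longitudes are negative
    return f"POINT({lon} {lat})"


print(to_wkt_point(39.75, -79.95))  # POINT(-79.95 39.75)
```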

---

The following examples require these libraries/packages to be installed:

In [None]:
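!pip install pandas
!pip install numpy
!pip install s3path
!pip install alive-progress
# PySAM is published on PyPI as nrel-pysam (the `pysam` package is an unrelated library)
!pip install nrel-pysam
!pip install xarray
# needed by the Sup3rCC demonstration (S3 access and the h5netcdf engine)
!pip install s3fs
!pip install h5netcdf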
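Because some PyPI names differ from the names used in `import` statements (`nrel-pysam` provides `PySAM`, `alive-progress` provides `alive_progress`), a quick check can confirm everything the cells import is available. This is a sketch; the mapping below reflects the imports used in this notebook, and `missing_packages` is a hypothetical helper.

```python
import importlib.util

# pip package name -> module name used in `import`
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "s3path": "s3path",
    "alive-progress": "alive_progress",
    "nrel-pysam": "PySAM",
    "xarray": "xarray",
    "s3fs": "s3fs",
    "h5netcdf": "h5netcdf",
}


def missing_packages(required: dict[str, str]) -> list[str]:
    """Return the pip names whose top-level module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


print(missing_packages(REQUIRED))  # an empty list means everything is installed
```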

---

### NSRDB Demonstration
- Year 2022
- Fort Martin Site
- 5 mins temporal resolution
- **API Key needed**: https://developer.nrel.gov/signup/
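The NSRDB download is a single HTTP GET whose query string carries the site, year, and requester details. The sketch below assembles the same parameters the next cell uses with `urllib.parse.urlencode`, which percent-escapes characters such as `@`, spaces, and parentheses instead of relying on hand-written `%20` and `+`. The function name `nsrdb_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

NSRDB_BASE = "https://developer.nrel.gov/api/nsrdb/v2/solar/psm3-5min-download.csv"


def nsrdb_url(lat: float, lon: float, year: int, api_key: str, email: str,
              full_name: str, affiliation: str = "", reason: str = "Research",
              interval: int = 5) -> str:
    """Build a single-year NSRDB PSM3 5-minute download URL (hypothetical helper)."""
    params = {
        "wkt": f"POINT({lon} {lat})",  # WKT order is longitude, then latitude
        "names": year,
        "leap_day": "false",
        "interval": interval,
        "utc": "false",
        "full_name": full_name,
        "email": email,
        "affiliation": affiliation,
        "mailing_list": "false",
        "reason": reason,
        "api_key": api_key,
        "attributes": "",
    }
    # quote_via=quote encodes spaces as %20 rather than '+'
    return f"{NSRDB_BASE}?{urlencode(params, quote_via=quote)}"


print(nsrdb_url(39.75, -79.95, 2022, "Your_API_Key", "you@example.com", "Your Name"))
```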

In [2]:
import pandas as pd
from pandas import DataFrame
pd.set_option('display.max_columns', None)
import numpy as np
import PySAM.PySSC as pssc
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
from typing import List
from alive_progress import alive_bar

# url parameters
# the lat and long for the Fort Martin solar site
lat = 39.75
lon = -79.95

interval = 5
leap_year = 'false'
utc = 'false'
mailing_list = 'false'
api_key = "Your_API_Key"
your_name = "Justin+Lin"
reason_for_use = "Research"
your_affiliation = "HTF"
your_email = "slin@wvhtf.org"
dataset = "psm3-5min-download"
attributes = ""

def Get_URLs_From_NSRDB(start_year:int, end_year = None) -> List[str]:
    UrlList = []
    
    if end_year is not None:
        for year in range(start_year, end_year+1):
            url = f"https://developer.nrel.gov/api/nsrdb/v2/solar/{dataset}.csv?"\
                    f"wkt=POINT({lon}%20{lat})&names={year}&leap_day={leap_year}&interval={interval}"\
                    f"&utc={utc}&full_name={your_name}&email={your_email}&affiliation={your_affiliation}"\
                    f"&mailing_list={mailing_list}&reason={reason_for_use}&api_key={api_key}&attributes={attributes}"
            UrlList.append(url)
    else: 
        url = f"https://developer.nrel.gov/api/nsrdb/v2/solar/{dataset}.csv?"\
                f"wkt=POINT({lon}%20{lat})&names={start_year}&leap_day={leap_year}&interval={interval}"\
                f"&utc={utc}&full_name={your_name}&email={your_email}&affiliation={your_affiliation}"\
                f"&mailing_list={mailing_list}&reason={reason_for_use}&api_key={api_key}&attributes={attributes}"
        UrlList.append(url)
    
    return UrlList

data_urls = Get_URLs_From_NSRDB(2022)
FM_NSRDB_2022 = pd.read_csv(data_urls[0], low_memory=False)

# print the data for the user to see
FM_NSRDB_2022

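An NSRDB CSV starts with two metadata rows (field names, then values) before the time-series header, which is why the PySAM cell reads the file once with `nrows=1` for the metadata and once with `skiprows=2` for the data. The sketch below shows that layout on a small, made-up CSV (all values are invented for illustration) and pulls out the time zone and elevation as plain numbers; `read_nsrdb_csv` is a hypothetical helper.

```python
import io

import pandas as pd

# Made-up, truncated NSRDB-style CSV: line 1 = metadata names, line 2 = metadata
# values, line 3 = time-series header, then the 5-minute records
raw = """Source,Location ID,City,State,Country,Latitude,Longitude,Time Zone,Elevation,Local Time Zone
NSRDB,0,-,-,-,39.75,-79.95,-5,300,-5
Year,Month,Day,Hour,Minute,DNI,DHI,Wind Speed,Temperature
2022,1,1,0,0,0,0,2.1,-3.0
2022,1,1,0,5,0,0,2.0,-3.1
"""


def read_nsrdb_csv(buffer):
    """Return (local time zone, elevation, time-series DataFrame)."""
    meta = pd.read_csv(buffer, nrows=1)      # metadata header + values row
    buffer.seek(0)
    data = pd.read_csv(buffer, skiprows=2)   # line 3 becomes the header
    tz = float(meta["Local Time Zone"].iloc[0])
    elev = float(meta["Elevation"].iloc[0])
    return tz, elev, data


tz, elev, data = read_nsrdb_csv(io.StringIO(raw))
print(tz, elev, data.shape)  # -5.0 300.0 (2, 9)
```

Taking `.iloc[0]` and converting to `float` matters when these values feed an API that expects scalars rather than one-element Series.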

---

### PySAM Demonstration

In [None]:
# characteristics of the Fort Martin Solar Energy site
# Set system capacity: PVWatts expects the DC nameplate size in kW (18.89 MW)
system_capacity = 18.89 * 1000
# Set DC/AC ratio (or power ratio)
dc_ac_ratio = 1.06
# Set tilt of system in degrees
tilt = 23
# Set azimuth angle in degrees clockwise from north (0 = north-facing, 180 = south-facing)
azimuth = 180
# Set the inverter efficiency, in percent
inv_eff = 96
# Set the system losses, in percent
losses = 13.1
# Specify array type (0=Fixed, 1=Fixed Roof, 2=1 Axis Tracker, 3=Backtracked, 4=2 Axis Tracker)
array_type = 0
# Set ground coverage ratio
gcr = 0.55
# Set constant loss adjustment
adjust_constant = 0

# read only the metadata row to get timezone and elevation as plain numbers
info = pd.read_csv(Get_URLs_From_NSRDB(2022)[0], nrows=1)
timezone = float(info['Local Time Zone'].iloc[0])
elevation = float(info['Elevation'].iloc[0])

# SAM solar resource field -> NSRDB column (SAM keys must be byte strings)
RESOURCE_FIELDS = {b'year': 'Year', b'month': 'Month', b'day': 'Day', b'hour': 'Hour',
                   b'minute': 'Minute', b'dn': 'DNI', b'df': 'DHI',
                   b'wspd': 'Wind Speed', b'tdry': 'Temperature'}

# PVWatts system inputs
SYSTEM_INPUTS = {b'system_capacity': system_capacity, b'dc_ac_ratio': dc_ac_ratio,
                 b'tilt': tilt, b'azimuth': azimuth, b'inv_eff': inv_eff,
                 b'losses': losses, b'array_type': array_type, b'gcr': gcr,
                 b'adjust:constant': adjust_constant}

def Solar_Power_Simulation(NSRDB_urls: List[str]) -> DataFrame:
    """Run PVWatts v5 on each NSRDB file and return the data with a 'generation' column (kW)."""
    appended_data = []
    ssc = pssc.PySSC()

    with alive_bar(len(NSRDB_urls), force_tty=True, title='Simulating', length=20, bar='smooth') as bar:
        for url in NSRDB_urls:
            df = pd.read_csv(url, skiprows=2)

            # resource inputs for the SAM model
            wfd = ssc.data_create()
            ssc.data_set_number(wfd, b'lat', lat)
            ssc.data_set_number(wfd, b'lon', lon)
            ssc.data_set_number(wfd, b'tz', timezone)
            ssc.data_set_number(wfd, b'elev', elevation)
            for key, column in RESOURCE_FIELDS.items():
                ssc.data_set_array(wfd, key, df[column].tolist())

            # create the SAM-compliant data object and attach the resource table
            dat = ssc.data_create()
            ssc.data_set_table(dat, b'solar_resource_data', wfd)
            ssc.data_free(wfd)
            for key, value in SYSTEM_INPUTS.items():
                ssc.data_set_number(dat, key, value)

            # execute and put generation results back into the dataframe
            mod = ssc.module_create(b'pvwattsv5')
            if not ssc.module_exec(mod, dat):
                raise RuntimeError(f"pvwattsv5 failed: {ssc.module_log(mod, 0)}")
            df['generation'] = np.array(ssc.data_get_array(dat, b'gen'))

            # free the memory
            ssc.data_free(dat)
            ssc.module_free(mod)

            appended_data.append(df)
            # update the progress bar
            bar()

    final_data = pd.concat(appended_data, ignore_index=True)
    print(f'\033[1mThis dataset has {final_data.shape[0]} rows and {final_data.shape[1]} columns\033[0m')
    return final_data

data_urls = Get_URLs_From_NSRDB(2022)
FM_SAM_2022 = Solar_Power_Simulation(data_urls)

FM_SAM_2022
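PVWatts reports `gen` as AC power in kW for each time step, so with 5-minute NSRDB data each value covers 5/60 of an hour. The sketch below (hypothetical helper names, assuming a regular time step) turns such a power series into energy and a capacity factor relative to the DC nameplate size passed as `system_capacity` in kW.

```python
import pandas as pd


def energy_kwh(power_kw: pd.Series, step_minutes: float) -> float:
    """Integrate a constant-step power series (kW) into energy (kWh)."""
    return float(power_kw.sum() * step_minutes / 60.0)


def capacity_factor(power_kw: pd.Series, step_minutes: float, capacity_kw: float) -> float:
    """Energy produced divided by nameplate capacity times the hours covered."""
    hours = len(power_kw) * step_minutes / 60.0
    return energy_kwh(power_kw, step_minutes) / (capacity_kw * hours)


# toy example: one hour of 5-minute values at a constant 9,445 kW
gen = pd.Series([9445.0] * 12)
print(energy_kwh(gen, 5))                # 9445.0
print(capacity_factor(gen, 5, 18890.0))  # 0.5
```

Applied to the simulation above, this would be, for example, `energy_kwh(FM_SAM_2022['generation'], interval)`.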

---

### Sup3rCC Demonstration
- Years 2023 to 2024
- Monongalia County, WV (where Fort Martin solar site is located)
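The next cell finds the Fort Martin grid cell by matching `county == b'Monongalia'` and an exact `elevation == 318`, which depends on one specific elevation value. A more general alternative is a nearest-neighbour search on coordinates. The sketch below assumes, without confirmation from this notebook, that the Sup3rCC `meta` table has `latitude` and `longitude` columns (check `meta.columns`); `nearest_site_index` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def nearest_site_index(meta: pd.DataFrame, lat: float, lon: float):
    """Return the index label of the meta row nearest to (lat, lon)."""
    lat1, lon1 = np.radians(lat), np.radians(lon)
    lat2 = np.radians(meta["latitude"].to_numpy(dtype=float))
    lon2 = np.radians(meta["longitude"].to_numpy(dtype=float))
    # haversine term: great-circle distance grows monotonically with it,
    # so its argmin is the nearest point and the Earth's radius is not needed
    hav = (np.sin((lat2 - lat1) / 2) ** 2
           + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return meta.index[int(np.argmin(hav))]


# toy meta table with made-up grid points
meta = pd.DataFrame({"latitude": [39.70, 39.76, 40.00],
                     "longitude": [-79.90, -79.96, -80.20]})
print(nearest_site_index(meta, 39.75, -79.95))  # 1
```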

In [None]:
# import libraries
from collections import defaultdict

from s3path import S3Path
import xarray as xr
import s3fs

# s3fs is the package used to access AWS S3 buckets
# the data is public, no need for credentials
fs = s3fs.S3FileSystem(anon=True)
# the data covers years 2023 to 2024
start_year = 2023
end_year = 2024
# specify climate data types and the attributes stored in each file
climate_type = ['solar', 'wind', 'trh', 'pressure']
attributes = [['ghi', 'dni', 'dhi'], ['windspeed_10m'], ['temperature_2m', 'relativehumidity_2m'], ['pressure_0m']]
climate_attr_dict = dict(zip(climate_type, attributes))
# total number of files to read (one per climate type per year)
years = range(start_year, end_year + 1)
loop_total_num = len(climate_type) * len(years)

# specify the components of the S3 URI
bucket = "nrel-pds-sup3rcc"
folder = "conus_mriesm20_ssp585_r1i1p1f1"
version = "v0.1.0"
file_base = "sup3rcc_conus_mriesm20_ssp585_r1i1p1f1"
file_extension = "h5"

URI_base = S3Path(f'/{bucket}/{folder}/{version}/{file_base}')

# one list of yearly series per attribute
series_by_attr = defaultdict(list)

with alive_bar(loop_total_num, force_tty=True, title='Running', length=20, bar='smooth') as bar:
    for climate in climate_type:
        for year in years:
            URI = f"{URI_base.as_uri()}_{climate}_{year}.{file_extension}"

            # use `xarray` with engine `h5netcdf` to access data
            ds = xr.open_dataset(fs.open(URI), backend_kwargs={"phony_dims": "sort"}, engine='h5netcdf') # type: ignore

            time_index = pd.to_datetime(ds['time_index'].values.astype(str))
            meta = pd.DataFrame(ds['meta'].values)
            FM_site_index = meta[(meta.county == b'Monongalia') & (meta.elevation == 318)].index[0]

            # subset each attribute at the FM site
            for att in climate_attr_dict[climate]:
                values = ds[att][:, FM_site_index].values
                series_by_attr[att].append(pd.Series(values, index=time_index, name=att))

            # update the progress bar
            bar()

# stack the years of each attribute, then align the attributes side by side on time
FM_Sup3rCC_2023_2024 = pd.DataFrame({att: pd.concat(parts) for att, parts in series_by_attr.items()})

# rename the attributes with measurements
FM_Sup3rCC_2023_2024.rename({'ghi':"ghi (W/m2)",
                   'dni':'dni (W/m2)', 
                   'dhi':'dhi (W/m2)', 
                   'windspeed_10m': "Windspeed (m/s)", 
                   'temperature_2m': "Temperature (C)", 
                   'relativehumidity_2m': "Relative Humidity (%)",
                   'pressure_0m': "Pressure (hPa)"}, axis=1, inplace=True)

# adjust the scale of measurements
FM_Sup3rCC_2023_2024["Temperature (C)"] = FM_Sup3rCC_2023_2024["Temperature (C)"] / 10000
FM_Sup3rCC_2023_2024["Windspeed (m/s)"] = FM_Sup3rCC_2023_2024["Windspeed (m/s)"] / 10000
FM_Sup3rCC_2023_2024["Relative Humidity (%)"] = FM_Sup3rCC_2023_2024["Relative Humidity (%)"] / 10000

FM_Sup3rCC_2023_2024
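The notebook divides temperature, wind speed, and relative humidity by 10000 without explaining why. One explanation worth checking, offered here as an assumption rather than a confirmed fact: if the h5 files store integers with a `scale_factor` attribute meant as a divisor (value = raw / scale_factor), while xarray's CF decoding multiplies by `scale_factor`, then a factor of 100 would leave decoded values 100² = 10000 times too large. Opening a file with `xr.open_dataset(..., mask_and_scale=False)` and inspecting `ds[var].attrs` would settle it. The hypothetical helpers below show both conversions on made-up numbers.

```python
import numpy as np


def unscale_raw(raw, attrs: dict) -> np.ndarray:
    """Physical values from stored integers, ASSUMING value = raw / scale_factor."""
    scale = float(attrs.get("scale_factor", 1.0))
    return np.asarray(raw, dtype=float) / scale


def undo_cf_scaling(decoded, attrs: dict) -> np.ndarray:
    """Undo a CF-style decode (raw * scale_factor), then apply the divisor convention."""
    scale = float(attrs.get("scale_factor", 1.0))
    return np.asarray(decoded, dtype=float) / scale ** 2


raw = np.array([2150, -312], dtype=np.int16)  # made-up stored values
attrs = {"scale_factor": 100.0}                # hypothetical attribute value
print(unscale_raw(raw, attrs))
print(undo_cf_scaling(raw * 100.0, attrs))     # same values as the line above
```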

---

### Solcast Demonstration
- Fort Martin site: -7 days to present
- 30 mins temporal resolution
- **API Key needed**: https://toolkit.solcast.com.au/register

In [None]:
API_key = "Your_API_Key"

URL = f"https://api.solcast.com.au/weather_sites/388d-25ae-c428-bcf3/estimated_actuals?format=csv&api_key={API_key}"

FM_SOLCAST_LIVE = pd.read_csv(URL)

FM_SOLCAST_LIVE

In [3]:
# location of the Fort Martin solar site
FM_lat = 39.75
FM_lon = -79.95
hours = 168  # 7 days
period = 'PT5M'
output_parameters = 'air_temp,dni,ghi'
azimuth = 0
tilt = 23
array_type = 'fixed'
output_format = 'csv'
# replace with your own Solcast API key
api_key = 'Your_API_Key'

url = "https://api.solcast.com.au/data/live/radiation_and_weather?"\
      f"latitude={FM_lat}&longitude={FM_lon}&hours={hours}&period={period}&"\
      f"output_parameters={output_parameters}&azimuth={azimuth}&tilt={tilt}&"\
      f"array_type={array_type}&format={output_format}&api_key={api_key}"

df = pd.read_csv(url)
df

| | air_temp | dni | ghi | period_end | period |
|---|---|---|---|---|---|
| 0 | 25 | 0 | 466 | 2024-08-16T16:00:00Z | PT5M |
| 1 | 25 | 0 | 437 | 2024-08-16T15:55:00Z | PT5M |
| 2 | 25 | 0 | 424 | 2024-08-16T15:50:00Z | PT5M |
| 3 | 25 | 0 | 350 | 2024-08-16T15:45:00Z | PT5M |
| 4 | 25 | 0 | 315 | 2024-08-16T15:40:00Z | PT5M |
| ... | ... | ... | ... | ... | ... |
| 2012 | 24 | 0 | 352 | 2024-08-09T16:20:00Z | PT5M |
| 2013 | 24 | 0 | 207 | 2024-08-09T16:15:00Z | PT5M |
| 2014 | 23 | 0 | 107 | 2024-08-09T16:10:00Z | PT5M |
| 2015 | 23 | 0 | 146 | 2024-08-09T16:05:00Z | PT5M |


- Fort Martin site: present to +7 days
- 30 mins temporal resolution
- **API Key needed**: https://toolkit.solcast.com.au/register

In [None]:
URL = f"https://api.solcast.com.au/weather_sites/388d-25ae-c428-bcf3/forecasts?format=csv&api_key={API_key}"

FM_SOLCAST_FORECAST = pd.read_csv(URL)

FM_SOLCAST_FORECAST

- Fort Martin site: 2020-01-01 to 2024-08-05
- 60 mins temporal resolution

In [4]:
# This demonstration reads a local CSV file as the data source
file_name = 'Solcast_FortMartin_20200101_20240805.csv'
FM_SOLCAST_HISTORIC = pd.read_csv(file_name)
# drop the 'period' column and rename 'period_end' to 'time'
FM_SOLCAST_HISTORIC = FM_SOLCAST_HISTORIC.drop(columns='period').rename(columns={'period_end': 'time'})
# move 'time' to the first column
FM_SOLCAST_HISTORIC.insert(0, 'time', FM_SOLCAST_HISTORIC.pop('time'))
# parse the timestamps and drop the timezone offset, keeping the wall-clock time
FM_SOLCAST_HISTORIC['time'] = pd.to_datetime(FM_SOLCAST_HISTORIC['time']).dt.tz_localize(None)

FM_SOLCAST_HISTORIC

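| | time | air_temp | albedo | azimuth | clearsky_dhi | clearsky_dni | clearsky_ghi | clearsky_gti | cloud_opacity | dewpoint_temp | dhi | dni | ghi | gti | precipitable_water | precipitation_rate | relative_humidity | surface_pressure | snow_depth | snow_water_equivalent | snow_soiling_rooftop | snow_soiling_ground | wind_direction_100m | wind_direction_10m | wind_speed_100m | wind_speed_10m | zenith | cape | snowfall_rate | wind_gust |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-01 01:00:00 | 1 | 0.15 | -45 | 0 | 0 | 0 | 0 | 23.8 | -2.6 | 0 | 0 | 0 | 0 | 7.1 | 0.1 | 77.4 | 968.1 | 0.1 | 0.0 | 0 | 0 | 257 | 255 | 8.6 | 5.1 | 158 | 44 | 0.0 | 12.0 |
| 1 | 2020-01-01 02:00:00 | 1 | 0.15 | -67 | 0 | 0 | 0 | 0 | 23.8 | -3.0 | 0 | 0 | 0 | 0 | 6.5 | 0.0 | 76.6 | 968.5 | 0.4 | 0.0 | 4 | 3 | 257 | 255 | 8.4 | 5.0 | 148 | 38 | 0.0 | 11.4 |
| 2 | 2020-01-01 03:00:00 | 1 | 0.15 | -81 | 0 | 0 | 0 | 0 | 23.8 | -3.5 | 0 | 0 | 0 | 0 | 5.9 | 0.0 | 74.6 | 968.8 | 0.6 | 0.1 | 8 | 6 | 257 | 256 | 8.2 | 4.9 | 137 | 34 | 0.0 | 11.6 |
| 3 | 2020-01-01 04:00:00 | 1 | 0.15 | -92 | 0 | 0 | 0 | 0 | 23.8 | -4.5 | 0 | 0 | 0 | 0 | 5.4 | 0.0 | 69.0 | 968.7 | 0.6 | 0.1 | 6 | 6 | 261 | 260 | 6.9 | 4.1 | 126 | 25 | 0.0 | 11.7 |
| 4 | 2020-01-01 05:00:00 | 1 | 0.15 | -101 | 0 | 0 | 0 | 0 | 24.9 | -5.3 | 0 | 0 | 0 | 0 | 5.1 | 0.0 | 64.4 | 968.9 | 0.6 | 0.1 | 6 | 4 | 262 | 262 | 5.8 | 3.4 | 114 | 16 | 0.0 | 11.7 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 40291 | 2024-08-05 20:00:00 | 24 | 0.15 | 57 | 0 | 0 | 0 | 0 | 5.0 | 17.3 | 0 | 0 | 0 | 0 | 38.4 | 0.0 | 67.8 | 979.3 | 0.0 | 0.0 | 0 | 0 | 205 | 183 | 4.6 | 2.0 | 101 | 36 | 0.0 | 1.9 |
| 40292 | 2024-08-05 21:00:00 | 23 | 0.15 | 45 | 0 | 0 | 0 | 0 | 7.1 | 17.0 | 0 | 0 | 0 | 0 | 40.5 | 0.0 | 69.8 | 979.3 | 0.0 | 0.0 | 0 | 0 | 202 | 183 | 4.8 | 2.0 | 110 | 4 | 0.0 | 2.0 |
| 40293 | 2024-08-05 22:00:00 | 22 | 0.15 | 31 | 0 | 0 | 0 | 0 | 7.6 | 16.8 | 0 | 0 | 0 | 0 | 42.0 | 0.0 | 72.6 | 979.1 | 0.0 | 0.0 | 0 | 0 | 208 | 194 | 4.7 | 1.7 | 117 | 0 | 0.0 | 2.0 |
| 40294 | 2024-08-05 23:00:00 | 21 | 0.15 | 16 | 0 | 0 | 0 | 0 | 5.6 | 16.6 | 0 | 0 | 0 | 0 | 42.2 | 0.0 | 75.0 | 979.0 | 0.0 | 0.0 | 0 | 0 | 222 | 206 | 4.6 | 1.7 | 122 | 0 | 0.0 | 2.1 |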
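A common next step with the hourly historic data is daily insolation: an hourly mean of X W/m² delivers X Wh/m² over that hour, so each day's sum divided by 1000 gives kWh/m². The sketch below assumes a regular 60-minute series whose `time` column holds period-end timestamps (as produced by the cell above); `daily_insolation_kwh_m2` is a hypothetical helper.

```python
import pandas as pd


def daily_insolation_kwh_m2(df: pd.DataFrame, column: str = "ghi",
                            label_by_start: bool = True) -> pd.Series:
    """Sum hourly period-mean irradiance (W/m2) into daily insolation (kWh/m2)."""
    series = df.set_index("time")[column]
    if label_by_start:
        # relabel each value by the start of its hour so the hour ending at
        # midnight is counted in the day it actually belongs to
        series.index = series.index - pd.Timedelta(hours=1)
    return series.resample("D").sum() / 1000.0


# toy day: 24 hourly period-end stamps with a constant 500 W/m2
times = pd.date_range("2020-01-01 01:00", periods=24, freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({"time": times, "ghi": [500.0] * 24})
print(daily_insolation_kwh_m2(toy))  # a single day with 12.0 kWh/m2
```

With the notebook's data this would be `daily_insolation_kwh_m2(FM_SOLCAST_HISTORIC)`.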
