## Reading in data via the API
This is a better way to access the data, though it retrieves one point (location) at a time.
The API call works, and this serves as a proof of concept before iterating over all data points.



In [2]:
import sys, os
import pandas as pd
import numpy as np

from IPython.display import display

In [12]:
# Declare all variables as strings. Spaces must be replaced with '+', i.e., change 'John Smith' to 'John+Smith'.
# Define the lat, long of the location and the year
lat, lon, year = 33.2164, -118.2437, 2010  # Longitude is negative in the western hemisphere, e.g. Los Angeles, USA: 33.2164, -118.2437
# You must request an NSRDB api key from the link above
api_key = 'KGQPxhHwezmhBj96AirFO6eKuzfP5DvQI7gFDWOk'
# Set the attributes to extract (e.g., dhi, ghi, etc.), separated by commas.
attributes = 'ghi,air_temperature'
# Choose year of data
year = '2010'
# Set leap year to true or false. True will return leap day data if present, false will not.
leap_year = 'false'
# Set time interval in minutes, i.e., '30' is half hour intervals. Valid intervals are 30 & 60.
interval = '30'
# Specify Coordinated Universal Time (UTC), 'true' will use UTC, 'false' will use the local time zone of the data.
# NOTE: In order to use the NSRDB data in SAM, you must specify UTC as 'false'. SAM requires the data to be in the
# local time zone.
utc = 'false'
# Your full name, use '+' instead of spaces.
your_name = 'andrew+xavier'
# Your reason for using the NSRDB.
reason_for_use = 'beta+testing'
# Your affiliation
your_affiliation = 'columbia+university'
# Your email address
your_email = 'ahx2001@columbia.edu'
# Please join our mailing list so we can keep you up-to-date on new developments.
mailing_list = 'true'

# Declare url string
url = 'https://developer.nrel.gov/api/nsrdb/v2/solar/psm3-download.csv?wkt=POINT({lon}%20{lat})&names={year}&leap_day={leap}&interval={interval}&utc={utc}&full_name={name}&email={email}&affiliation={affiliation}&mailing_list={mailing_list}&reason={reason}&api_key={api}&attributes={attr}'.format(year=year, lat=lat, lon=lon, leap=leap_year, interval=interval, utc=utc, name=your_name, email=your_email, mailing_list=mailing_list, affiliation=your_affiliation, reason=reason_for_use, api=api_key, attr=attributes)
# Return just the first 2 lines to get metadata:
info = pd.read_csv(url, nrows=1)
# See metadata for specified properties, e.g., timezone and elevation
timezone, elevation = info['Local Time Zone'], info['Elevation']
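
The long `.format(...)` string above could also be assembled from a dictionary of parameters. The following is only a sketch: `build_nsrdb_url` is a hypothetical helper, the name, email, and key values are placeholders, and it assumes the endpoint accepts standard percent-encoded query strings (spaces become `+`, so plain names with spaces can be passed).

```python
from urllib.parse import urlencode, parse_qs, urlsplit

BASE_URL = 'https://developer.nrel.gov/api/nsrdb/v2/solar/psm3-download.csv'

def build_nsrdb_url(lat, lon, year, api_key, attributes='ghi,air_temperature',
                    interval=30, leap_day=False, utc=False,
                    full_name='Jane Doe', email='jane.doe@example.com',
                    affiliation='example university', reason='beta testing',
                    mailing_list=False):
    """Hypothetical helper: assemble the PSM3 download URL from plain values."""
    params = {
        'wkt': 'POINT({} {})'.format(lon, lat),   # WKT order is longitude, then latitude
        'names': year,
        'leap_day': str(leap_day).lower(),        # the API expects 'true' / 'false'
        'interval': interval,
        'utc': str(utc).lower(),
        'full_name': full_name,
        'email': email,
        'affiliation': affiliation,
        'mailing_list': str(mailing_list).lower(),
        'reason': reason,
        'api_key': api_key,
        'attributes': attributes,
    }
    # urlencode percent-encodes every value, so spaces and commas are handled for us
    return BASE_URL + '?' + urlencode(params)

# Placeholder key; no request is sent here, the URL is only built and parsed back
example_url = build_nsrdb_url(33.2164, -118.2437, 2010, 'YOUR_API_KEY')
query = parse_qs(urlsplit(example_url).query)
```

The resulting string can be passed to `pd.read_csv` in the same way as `url` above.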


In [13]:
# Return all but the first 2 lines of the csv to get the data (reuses the `url` built in the previous cell):
df = pd.read_csv(url, skiprows=2)

# Set the time index in the pandas dataframe:
# 525600 is the number of minutes in a non-leap year; integer division keeps `periods` an int
df = df.set_index(pd.date_range('1/1/{yr}'.format(yr=year), freq=interval+'Min', periods=525600 // int(interval)))

# take a look
print('shape:', df.shape)
df.head()

shape: (17520, 7)


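| | Year | Month | Day | Hour | Minute | GHI | Temperature |
|---|---|---|---|---|---|---|---|
| 2010-01-01 00:00:00 | 2010 | 1 | 1 | 0 | 0 | 0 | 15 |
| 2010-01-01 00:30:00 | 2010 | 1 | 1 | 0 | 30 | 0 | 15 |
| 2010-01-01 01:00:00 | 2010 | 1 | 1 | 1 | 0 | 0 | 15 |
| 2010-01-01 01:30:00 | 2010 | 1 | 1 | 1 | 30 | 0 | 15 |
| 2010-01-01 02:00:00 | 2010 | 1 | 1 | 2 | 0 | 0 | 15 |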


## Cleaning the data for modeling
The data is too granular to use as is. I therefore generate a new column that estimates power generation from temperature and GHI. I then sum that column over each day and average the daily totals by month, giving the average total "power" per day for each month. We cannot simply average temperature and GHI separately over each day because, as seen in the EDA section, GHI and temperature do not follow the same patterns.
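
As a rough sketch of the aggregation described above (sum per day, then average the daily totals by month), assuming a hypothetical `power` column on a half-hourly time index. The values here are synthetic stand-ins, not NSRDB data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the NSRDB frame: January and February 2010, half-hourly
index = pd.date_range('2010-01-01', '2010-02-28 23:30', freq='30min')

# Hypothetical 'power' column: 1.0 per step in January, 2.0 per step in February
demo = pd.DataFrame({'power': np.where(index.month == 1, 1.0, 2.0)}, index=index)

# Step 1: total "power" per calendar day
daily_total = demo['power'].resample('D').sum()

# Step 2: average of those daily totals within each month
monthly_mean = daily_total.groupby(daily_total.index.month).mean()
```

With 48 half-hour steps per day, the January average daily total in this toy data is 48 and the February one is 96.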

### Calculations for Energy:
#### The following is based on the equations for PV output given by HOMER:
https://www.homerenergy.com/products/pro/docs/3.11/how_homer_calculates_the_pv_array_power_output.html

Power is given by:
$$P = Y_{pv}f_{pv}\left(\frac{G_T}{G_{T,STC}}\right)[1+\alpha*(T_c - T_{c,STC})]$$

$Y_{pv}$ = the rated capacity of the PV array, meaning its power output under standard test conditions [kW]

$f_{pv}$ = the PV derating factor [%]

$G_T$ = the solar radiation incident on the PV array in the current time step [kW/m²]

$G_{T,STC}$ = the incident radiation at standard test conditions [1 kW/m²]

$\alpha$ = the temperature coefficient of power [%/°C]

$T_c$ = the PV cell temperature in the current time step [°C]

$T_{c,STC}$ = the PV cell temperature under standard test conditions [25 °C]

#### Simplify

Since we only care about variation in GHI and temperature, we treat $Y_{pv}$, $f_{pv}$, and $G_{T,STC}$ as constants that only scale our results. Because we only care about relative differences in the final output, we can set them to 1 for simplicity. Our equation then simplifies to:

$G_T[1+\alpha*(T_c - T_{c,STC})]$

Note that by doing this, the result is now a scalar multiple of the true power output rather than the power output itself.

We know $T_{c,STC}$ is 25 degrees Celsius, so our equation becomes:

$$G_T[1+\alpha*(T_c - 25^{\circ}\mathrm{C})]$$

Source for $T_{c,STC}$: https://wiki.openmod-initiative.org/wiki/Standard_test_conditions#:~:text=STC%20is%20an%20industry%2Dwide,5)%20spectrum.

$G_T$ is calculated from the radiation incident on the panel, which varies with the angle of the sunlight relative to the panel. However, the panel will either sit at a constant angle or be adjusted in the same way through each day of every year, and the sun follows the same path across the sky each year, so we take $G_T$ to be proportional to GHI. Our relative power $P_r$ is therefore calculated by


$$P_r = GHI*[1+\alpha*(T_c - 25^{\circ}\mathrm{C})]$$

Source for $G_T$ from GHI and angles: https://www.homerenergy.com/products/pro/docs/3.11/how_homer_calculates_the_radiation_incident_on_the_pv_array.html

$\alpha$ is given by the manufacturer of the solar panel. For this example, we use the value given for polycrystalline silicon panels: -0.48 %/°C (that is, -0.0048 per °C when used in the equation above).

Source for $\alpha$ value: https://www.homerenergy.com/products/pro/docs/3.11/pv_temperature_coefficient_of_power.html
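
A minimal sketch of the $P_r$ formula in code, assuming the cell temperature is already available as an input. The function and constant names are hypothetical, and $\alpha$ is converted from %/°C to a fraction per °C before use.

```python
import numpy as np

ALPHA_PCT_PER_C = -0.48  # polycrystalline silicon value quoted above, in %/°C
T_STC_C = 25.0           # cell temperature at standard test conditions, in °C

def relative_power(ghi, cell_temp_c, alpha_pct_per_c=ALPHA_PCT_PER_C):
    """Relative power P_r = GHI * [1 + alpha * (T_c - 25)], a scalar multiple of true power."""
    alpha = alpha_pct_per_c / 100.0  # convert percent per °C to fraction per °C
    ghi = np.asarray(ghi, dtype=float)
    cell_temp_c = np.asarray(cell_temp_c, dtype=float)
    return ghi * (1.0 + alpha * (cell_temp_c - T_STC_C))

# At 25 °C there is no temperature penalty; at 35 °C the output drops by 4.8 %
p_at_stc = float(relative_power(1000, 25))
p_at_35c = float(relative_power(1000, 35))
```

Because the inputs are converted to arrays, the same function can be applied to whole `GHI` and cell-temperature columns at once.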

The final piece we need is $T_c$. This is the difficult part and is described in the next section.

### Getting the temperature of the cell
From the last section, the final piece needed in our equation for measuring the importance of GHI and ambient temperature is the temperature of the cell, which is partly a function of the ambient temperature.
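
One common approximation for cell temperature, which is not necessarily the method used in this notebook, is the NOCT model: $T_c = T_a + \frac{NOCT - 20}{800} G$, with $G$ in W/m². The sketch below assumes this model, uses GHI as a stand-in for the irradiance on the panel, and assumes a nominal operating cell temperature of 45 °C; the function name and that default are hypothetical choices for illustration.

```python
def cell_temperature_noct(ambient_temp_c, ghi_w_m2, noct_c=45.0):
    """Estimate cell temperature (°C) with the NOCT approximation.

    NOCT is defined at 800 W/m² irradiance and 20 °C ambient temperature,
    so the temperature rise above ambient scales linearly with irradiance.
    The default noct_c=45.0 is an assumed value, not taken from the notebook.
    """
    return ambient_temp_c + (noct_c - 20.0) / 800.0 * ghi_w_m2

# At night (GHI = 0) the cell sits at ambient temperature
night_temp = cell_temperature_noct(15.0, 0.0)
# At the NOCT reference conditions the estimate equals NOCT itself
reference_temp = cell_temperature_noct(20.0, 800.0)
```

The formula uses only arithmetic, so it also works element-wise on pandas columns such as `df['Temperature']` and `df['GHI']`.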




## Modeling
I use a simple linear regression model to create my predictions. 