# HI-SEAS Solar Insolation Model

Author: Abhipray Sahoo
Date: 04/30/2017

### Goal: 
Estimate the solar radiation incident on the ground at HI-SEAS in order to predict the power generated by solar panels. Given meteorological conditions, the model should give the best available estimate of the solar radiation.

### Dataset:
A - NASA's HI-SEAS meteorological data and solar irradiance.

B - Additional climate data from Dark Sky API for the same dates as A
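The combined pickle loaded in the next section already holds both sources, with the Dark Sky columns carrying an `_fc` suffix. One possible way to build such a table is an as-of join on Unix time. The sketch below assumes the NASA frame has a `unix_secs` column and the Dark Sky frame a `time` column, both in Unix seconds; the helper `combine_nasa_forecast` is hypothetical, not the author's code.

```python
import pandas as pd


def combine_nasa_forecast(nasa: pd.DataFrame, forecast: pd.DataFrame) -> pd.DataFrame:
    """Attach the most recent forecast row to each NASA measurement.

    Hypothetical helper: assumes `nasa` has an integer `unix_secs` column and
    `forecast` has an integer `time` column (Unix seconds).
    """
    # Tag every forecast column so it cannot collide with a NASA column name
    fc = forecast.add_suffix('_fc')
    # merge_asof requires both frames to be sorted on their join keys
    nasa_sorted = nasa.sort_values('unix_secs')
    fc_sorted = fc.sort_values('time_fc')
    # For each measurement, take the latest forecast point at or before that second
    return pd.merge_asof(nasa_sorted, fc_sorted,
                         left_on='unix_secs', right_on='time_fc',
                         direction='backward')
```

With `direction='backward'`, each measurement gets the most recent forecast data point at or before its timestamp; passing `tolerance=` to `pd.merge_asof` would instead leave rows without a recent enough forecast as NaN, which a later `dropna()` removes.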

Uncomment the commented-out cells to re-activate them.


## Load the combined NASA and Dark Sky dataset

In [6]:
%matplotlib inline

import os
import pandas as pd
import numpy as np
import seaborn as sns

DATA_PATH = '../hi-seas-data'

X_aug = pd.read_pickle('nasa_forecast_combined.pkl')

In [7]:
# Drop rows with any missing (NaN) values
X_aug = X_aug.dropna()
X_aug.describe()

| Column | count | unique | top | freq |
|---|---|---|---|---|
| id | 32007 | 32007 | 33132 | 1 |
| unix_secs | 32007 | 32007 | 1478754303 | 1 |
| date | 32007 | 120 | 2016-12-25 | 288 |
| time | 32007 | 8181 | 16:20:18 | 24 |
| irradiance | 32007 | 14111 | 1.22 | 2199 |
| speed | 32007 | 37 | 5.62 | 4567 |
| humidity | 32007 | 94 | 1.01 | 1959 |
| temperature | 32007 | 38 | 45.0 | 2801 |
| direction | 32007 | 17668 | 0.11 | 93 |
| pressure | 32007 | 37 | 749.3146 | 4567 |
| humidity_fc | 32007 | 68 | 0.82 | 1174 |
| windSpeed_fc | 32007 | 848 | 2.6 | 161 |
| windBearing_fc | 32007 | 356 | 153 | 333 |
| precipIntensity_fc | 32007 | 173 | 0.0 | 22866 |
| precipProbability_fc | 32007 | 69 | 0.0 | 22866 |
| pressure_fc | 32007 | 779 | 1015.5 | 178 |
| visibility_fc | 32007 | 200 | 10.0 | 22626 |
| cloudCover_fc | 32007 | 67 | 0.31 | 6648 |
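`describe()` reports `unique`, `top` and `freq` rather than mean and standard deviation, which indicates the columns are stored as non-numeric (object) values; this is why the training code casts to float. A minimal sketch of converting selected columns up front with `pd.to_numeric` (the helper `to_numeric_columns` and the sample values are hypothetical):

```python
import pandas as pd


def to_numeric_columns(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Return a copy of df with the given columns parsed as numbers.

    Values that cannot be parsed become NaN (errors='coerce'), so a later
    dropna() removes them instead of astype(float) raising an error.
    """
    out = df.copy()
    for col in columns:
        out[col] = pd.to_numeric(out[col], errors='coerce')
    return out


# Made-up string-valued rows, mimicking object-dtype columns
raw = pd.DataFrame({'irradiance': ['1.22', '518.69', 'n/a'],
                    'date': ['2016-12-25', '2016-12-25', '2016-12-26']})
clean = to_numeric_columns(raw, ['irradiance'])
print(clean.dtypes)
```

After conversion, `describe()` on the numeric columns reports count, mean, std and quantiles instead of `unique`/`top`/`freq`.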


## Linear regression using only the correlated features

From the correlational analysis, only temperature, wind direction and cloud cover show some linear correlation with irradiance. The feature set used here also includes the forecast precipitation intensity (`precipIntensity_fc`).
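A sketch of such a correlational check on synthetic data standing in for the real columns (the values and the linear relationship are made up for illustration). `DataFrame.corrwith` computes the Pearson correlation of each feature with irradiance:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
# Synthetic stand-ins: temperature drives irradiance, humidity is pure noise
df = pd.DataFrame({'temperature': rng.uniform(34, 71, n),
                   'humidity': rng.uniform(0, 100, n)})
df['irradiance'] = 20 * df['temperature'] + rng.normal(0, 50, n)

# Pearson correlation of every feature column with the target column
corr = df.drop(columns='irradiance').corrwith(df['irradiance'])
print(corr)
```

Pearson correlation measures only linear association, so a feature such as cloud cover can still matter through a nonlinear effect even when its coefficient is small.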

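1. Scale each feature to the range [0, 1] (min-max scaling).
2. Split the data into training (75%) and test (25%) sets.
3. Fit PCA on the training data to rotate the features onto uncorrelated principal components. All components are kept, so PCA does not increase variance or reduce dimensionality here; it is an invertible linear transform and leaves the regression's predictions unchanged.
4. Train a linear regression model on the principal components.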
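These steps can be expressed as a single scikit-learn pipeline. The sketch below uses synthetic data with made-up coefficients and checks that keeping every PCA component gives the same predictions as fitting the regression on the scaled features directly.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data: 4 features with a made-up linear relationship plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([3.0, -1.0, 0.5, 2.0]) + rng.normal(scale=0.1, size=200)

# PCA() with default n_components keeps all 4 components
with_pca = make_pipeline(MinMaxScaler(), PCA(), LinearRegression()).fit(X, y)
without_pca = make_pipeline(MinMaxScaler(), LinearRegression()).fit(X, y)

# A full-rank rotation does not change what ordinary least squares can fit
same = np.allclose(with_pca.predict(X), without_pca.predict(X))
print(same)
```

To get an actual dimensionality reduction, `PCA(n_components=k)` would need `k` below the number of features, and then the predictions can differ.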

In [42]:
# Observation: cloud cover does not seem to correlate strongly with irradiance. This may be
# because Hawaii is cloudy for large parts of the year, and the effect depends a lot on cloud
# type: some clouds reflect a lot of light, and panel performance can actually increase.
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in 0.20
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA

tmp = X_aug.dropna()
print(tmp.shape)
# describe() shows the columns are not numeric yet, so cast to float (np.float was removed in NumPy 1.24)
y = np.array(tmp['irradiance']).astype(float)
X = np.array(tmp[['temperature', 'direction', 'cloudCover_fc', 'precipIntensity_fc']]).astype(float)

x_scaler = MinMaxScaler()
# y_scaler = MinMaxScaler()

X_scaled = x_scaler.fit_transform(X)
# y_scaled = y_scaler.fit_transform(y)

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.25)

# Perform PCA on training data
pca = PCA()
X_pca = pca.fit_transform(X_train)

linreg = LinearRegression()
linreg.fit(X_pca, y_train)

(32007, 18)


LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
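In the cell above, `x_scaler` is fit on all rows before the train/test split, so the test rows' minima and maxima feed into the preprocessing. A sketch of the alternative order on synthetic data (the feature and target values are made up): split the raw features first, then fit the scaler on the training rows only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(1000, 4))   # synthetic raw features
y = rng.uniform(0, 1000, size=1000)       # synthetic irradiance

# Split the raw features first...
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# ...then learn each feature's min and max from the training rows only
x_scaler = MinMaxScaler().fit(X_train)
X_train_scaled = x_scaler.transform(X_train)
X_test_scaled = x_scaler.transform(X_test)  # test values may fall slightly outside [0, 1]
```

Wrapping the scaler in the same `Pipeline` as the model gives this ordering automatically, since the pipeline only ever calls `fit` on the data passed to it.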

In [43]:
# Test and Evaluate
X_test_pca = pca.transform(X_test)
r2 = linreg.score(X_test_pca, y_test)
y_pred = linreg.predict(X_test_pca)
mse = mean_squared_error(y_test, y_pred)  # argument order is (y_true, y_pred)

print('Mean Squared Error: {}\nR2 coefficient: {}'.format(mse, r2))

Mean Squared Error: 47803.197936858305
R2 coefficient: 0.5325002168533792
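The two printed numbers can be put into irradiance units, assumed here to be W/m². The root-mean-square error is the square root of the MSE, and since R2 = 1 - MSE / Var(y_test), the test-set variance (the MSE of always predicting the test-set mean) follows from the same two values:

```python
import math

mse = 47803.197936858305   # Mean Squared Error printed above
r2 = 0.5325002168533792    # R2 coefficient printed above

rmse = math.sqrt(mse)                    # typical prediction error, in irradiance units
baseline_mse = mse / (1 - r2)            # rearranged from R2 = 1 - MSE / Var(y_test)
baseline_rmse = math.sqrt(baseline_mse)  # error of always predicting the test-set mean

print(f"RMSE: {rmse:.1f}, mean-only baseline RMSE: {baseline_rmse:.1f}")
```

This prints an RMSE of about 218.6 against about 319.8 for the mean-only baseline, so the linear model removes roughly a third of the baseline's typical error.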


In [52]:
## Save the model for inference
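import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
from sklearn.pipeline import Pipeline

# Note: x_scaler is not part of this pipeline, so callers must min-max scale inputs first
pipe = Pipeline([('pca', pca), ('linear regression', linreg)])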
joblib.dump(pipe, 'linear_reg.pkl', compress=1)

['linear_reg.pkl']
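The saved pipeline contains only the PCA and regression steps, so anything that loads `linear_reg.pkl` has to reproduce the min-max scaling itself. A sketch on synthetic data of bundling the scaler into the pipeline so raw feature values can be passed directly; the file name `linear_reg_with_scaler.pkl`, the feature ranges and the target are made up.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic raw features in the order temperature, direction, cloudCover_fc, precipIntensity_fc
rng = np.random.default_rng(1)
X_raw = rng.uniform([34, 0, 0, 0], [71, 360, 1, 0.1], size=(100, 4))
y = 10 * X_raw[:, 0] - 300 * X_raw[:, 2]  # made-up linear target

# Scaling is now the first step, so the pipeline accepts unscaled inputs
pipe_raw = Pipeline([
    ('scaler', MinMaxScaler()),
    ('pca', PCA()),
    ('linear regression', LinearRegression()),
]).fit(X_raw, y)

# Round-trip through joblib, as the notebook does with linear_reg.pkl
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'linear_reg_with_scaler.pkl')
    joblib.dump(pipe_raw, path, compress=1)
    loaded = joblib.load(path)

print(loaded.predict(X_raw[:3]))
```

With the scaler inside the pickle, an inference function can take raw temperature, direction, cloud cover and precipitation values without access to the notebook's `x_scaler`.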

In [49]:
# Predictions on randomly chosen test samples (X_test is already min-max scaled)
n_example = 15
for i in range(n_example):
    idx = np.random.randint(X_test.shape[0])
    ex_pred = pipe.predict(X_test[idx][None, :])
    expected = y_test[:, None][idx]
    print("[{}] [{}] Predicted: {} Expected: {}".format(i, idx, ex_pred, expected))

[0] [3100] Predicted: [ 223.19948706] Expected: [ 1.25]
[1] [1657] Predicted: [ 181.66884277] Expected: [ 7.51]
[2] [7292] Predicted: [-62.90998584] Expected: [ 1.18]
[3] [3045] Predicted: [ 69.14352426] Expected: [ 1.21]
[4] [1820] Predicted: [ 281.06801749] Expected: [ 28.09]
[5] [6667] Predicted: [ 583.37835899] Expected: [ 518.69]
[6] [2809] Predicted: [-201.51811033] Expected: [ 1.25]
[7] [5834] Predicted: [ 773.00362441] Expected: [ 936.9]
[8] [6391] Predicted: [-97.12594529] Expected: [ 1.22]
[9] [2995] Predicted: [-53.50810466] Expected: [ 117.96]
[10] [6992] Predicted: [ 98.3956896] Expected: [ 725.98]
[11] [1753] Predicted: [ 21.738586] Expected: [ 1.23]
[12] [5949] Predicted: [ 588.83024439] Expected: [ 154.37]
[13] [4470] Predicted: [ 450.10153327] Expected: [ 196.09]
[14] [5381] Predicted: [ 659.79872669] Expected: [ 833.32]
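Four of the sampled predictions are negative, which is physically impossible for irradiance; `get_solar_radiation_linreg` replaces such values with 1.1. A quick check on the 15 printed pairs of how that floor changes the squared error (15 samples are far too few to estimate the effect on the whole test set):

```python
import numpy as np

# The 15 (predicted, expected) pairs printed above
predicted = np.array([223.19948706, 181.66884277, -62.90998584, 69.14352426,
                      281.06801749, 583.37835899, -201.51811033, 773.00362441,
                      -97.12594529, -53.50810466, 98.3956896, 21.738586,
                      588.83024439, 450.10153327, 659.79872669])
expected = np.array([1.25, 7.51, 1.18, 1.21, 28.09, 518.69, 1.25, 936.9,
                     1.22, 117.96, 725.98, 1.23, 154.37, 196.09, 833.32])

# Irradiance cannot be negative: floor predictions at 1.1, the value the inference function uses
clipped = np.clip(predicted, 1.1, None)

mse_raw = np.mean((predicted - expected) ** 2)
mse_clipped = np.mean((clipped - expected) ** 2)
print(f"negative predictions: {(predicted < 0).sum()}, "
      f"MSE raw: {mse_raw:.0f}, MSE clipped: {mse_clipped:.0f}")
```

Because every expected value here is at least 1.1, raising a negative prediction to the floor can only move it closer to the truth.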


In [55]:
model = joblib.load('linear_reg.pkl')

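def get_solar_radiation_linreg(temperature, direction, cloud_cover, precip_intensity):
    # Inputs must already be min-max scaled with x_scaler: the saved pipeline holds only PCA + regression
    # Build a single-row 2-D feature matrix, as scikit-learn's predict() expects
    feat = np.array([[temperature, direction, cloud_cover, precip_intensity]])
    y_pred = model.predict(feat)[0]
    # Irradiance cannot be negative; clamp to a small positive floor instead
    if y_pred < 0:
        y_pred = 1.1
    return y_pred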

# Example call
get_solar_radiation_linreg(*X_test[0])    



218.66214604120407