# HI-SEAS Solar Insolation: Nonlinear Support Vector Regression Model

Author: Abhipray Sahoo
Date: 04/30/2017

### Goal: 
Estimate the solar radiation incident on the ground at HI-SEAS in order to predict the power generated by solar panels. The model should give the best estimate of solar radiation for given meteorological conditions.

### Dataset:
A - NASA's HI-SEAS meteorological data and solar irradiance.

B - Additional climate data from the Dark Sky API for the same dates as A.

Uncomment cells to re-activate them.


## Load NASA dataset

In [4]:
%matplotlib inline

import os
import pandas as pd
import numpy as np
import seaborn as sns

DATA_PATH = '../hi-seas-data'

X_aug = pd.read_pickle('nasa_forecast_combined.pkl')

In [5]:
# Drop rows containing NaN values
X_aug = X_aug.dropna()
X_aug.describe()

| | id | unix_secs | date | time | irradiance | speed | humidity | temperature | direction | pressure | humidity_fc | windSpeed_fc | windBearing_fc | precipIntensity_fc | precipProbability_fc | pressure_fc | visibility_fc | cloudCover_fc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 32007 | 32007 | 32007 | 32007 | 32007.0 | 32007.0 | 32007.0 | 32007.0 | 32007.0 | 32007.0 | 32007.0 | 32007.0 | 32007 | 32007.0 | 32007.0 | 32007.0 | 32007.0 | 32007.0 |
| unique | 32007 | 32007 | 120 | 8181 | 14111.0 | 37.0 | 94.0 | 38.0 | 17668.0 | 37.0 | 68.0 | 848.0 | 356 | 173.0 | 69.0 | 779.0 | 200.0 | 67.0 |
| top | 33132 | 1478754303 | 2016-12-11 | 16:20:18 | 1.22 | 5.62 | 1.01 | 45.0 | 0.11 | 749.3146 | 0.82 | 2.6 | 153 | 0.0 | 0.0 | 1015.5 | 10.0 | 0.31 |
| freq | 1 | 1 | 288 | 24 | 2199.0 | 4567.0 | 1959.0 | 2801.0 | 93.0 | 4567.0 | 1174.0 | 161.0 | 333 | 22866.0 | 22866.0 | 178.0 | 22626.0 | 6648.0 |


## Learn Non Linear Support Vector Regression

The correlation analysis showed that only temperature, wind direction, and cloud cover have any appreciable linear correlation with irradiance, so these three features are used as model inputs.
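
A correlation check of this kind could look roughly like the sketch below. It is not the original analysis: the data frame `demo` is synthetic and hypothetical, and only the column names `temperature`, `humidity`, and `irradiance` are borrowed from the dataset. It ranks features by the magnitude of their Pearson correlation with the target.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (hypothetical, NOT the HI-SEAS measurements):
# irradiance is built to depend on temperature and not on humidity.
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "temperature": rng.normal(50, 5, n),
    "humidity": rng.uniform(10, 100, n),
})
demo["irradiance"] = 20.0 * (demo["temperature"] - 50) + rng.normal(0, 30, n)

# Pearson correlation of every feature with the target
corr = demo.corr()["irradiance"].drop("irradiance")

# Rank features by absolute correlation, strongest first
ranked = corr.reindex(corr.abs().sort_values(ascending=False).index)
print(ranked)
```

Applied to the real data frame, the same two lines after the synthetic setup would run on `X_aug` once its columns are numeric.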

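The model is built in three steps:

1. Scale each feature to the [0, 1] range (min-max scaling).
2. Apply PCA, fitted on the training data, to rotate the features onto their directions of maximum variance.
3. Train a nonlinear SVR model (polynomial kernel, degree 6).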
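
The three steps can also be chained in a single scikit-learn `Pipeline`. The sketch below is a variation and not the cell that follows: it runs on synthetic data (`X_demo` and `y_demo` are hypothetical), and it fits the scaler on the training rows only, whereas the notebook scales the full dataset before splitting.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

# Synthetic stand-in for the three selected features and the target
rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(200, 3))
y_demo = 100.0 * X_demo[:, 0] + rng.normal(scale=5.0, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=9)

# Scaling -> PCA -> polynomial-kernel SVR; every step is fitted on X_tr only
model = make_pipeline(MinMaxScaler(), PCA(), SVR(kernel="poly", degree=6))
model.fit(X_tr, y_tr)

# R2 on the held-out rows; the pipeline applies the fitted transforms itself
r2_demo = model.score(X_te, y_te)
print("R2 on synthetic hold-out:", r2_demo)
```

Keeping the transforms inside the pipeline means the test rows never influence the scaling range or the PCA axes.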

In [26]:
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA

# Guard against any remaining NaNs, then pull out the target and the three selected features
tmp = X_aug.dropna()
print(tmp.shape)
y = np.array(tmp['irradiance'])
X = np.array(tmp[['temperature', 'direction', 'cloudCover_fc']]).astype(float)

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.25, random_state=9)

# Perform PCA on training data
pca = PCA()
X_pca = pca.fit_transform(X_train)

svr = SVR(kernel='poly', degree=6)
svr.fit(X_pca, y_train)

(32007, 18)


SVR(C=1.0, cache_size=200, coef0=0.0, degree=6, epsilon=0.1, gamma='auto',
  kernel='poly', max_iter=-1, shrinking=True, tol=0.001, verbose=False)

In [27]:
X_test_pca = pca.transform(X_test)
r2 = svr.score(X_test_pca, y_test)
y_pred = svr.predict(X_test_pca)
mse = mean_squared_error(y_test, y_pred)

print('Mean Squared Error: {}\nR2 coefficient: {}'.format(mse, r2))

Mean Squared Error: 145492.98546414357
R2 coefficient: -0.4249211643846862


In [34]:
# Five random example predictions
for i in range(5):
    idx = np.random.randint(X_test_pca.shape[0])
    ex_pred = svr.predict(X_test_pca[idx][None, :])[0]
    expected = y_test[idx]
    print("[{}] [{}] Predicted: {} Expected: {}".format(i, idx, ex_pred, expected))

[0] [7866] Predicted: 2.5801687986805852 Expected: 854.21
[1] [5127] Predicted: 2.5799998945207596 Expected: 3.26
[2] [4771] Predicted: 2.57999992394791 Expected: 1.23
[3] [211] Predicted: 2.580104353119218 Expected: 355.71
[4] [3257] Predicted: 2.5812482162468813 Expected: 8.09
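
The negative R2 reported above means the fitted SVR does worse on the test set than a constant predictor that always outputs the mean of the test targets. That is consistent with the five sample predictions, which all sit near 2.58 regardless of the expected value. A minimal sketch of such a baseline comparison follows, using scikit-learn's `DummyRegressor` on synthetic, skewed targets and not on the HI-SEAS data; all `*_demo` names are hypothetical.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic, right-skewed stand-in for the irradiance targets
rng = np.random.default_rng(0)
y_train_demo = rng.exponential(scale=200.0, size=1000)
y_test_demo = rng.exponential(scale=200.0, size=300)
X_train_demo = np.zeros((1000, 1))  # DummyRegressor ignores the features
X_test_demo = np.zeros((300, 1))

# Baseline: always predict the mean of the training targets
baseline = DummyRegressor(strategy="mean").fit(X_train_demo, y_train_demo)
y_base = baseline.predict(X_test_demo)
baseline_mse = mean_squared_error(y_test_demo, y_base)
baseline_r2 = r2_score(y_test_demo, y_base)

# A constant far below the mean, mimicking the near-constant SVR output
y_low = np.full_like(y_test_demo, 2.58)
low_r2 = r2_score(y_test_demo, y_low)

print("mean baseline  MSE: {:.1f}  R2: {:.3f}".format(baseline_mse, baseline_r2))
print("low constant   R2: {:.3f}".format(low_r2))
```

Running the same comparison with the notebook's `y_train` and `y_test` would give a reference MSE against which the SVR's score can be judged.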
