# HI-SEAS Solar Insolation RNN Model

Author: Abhipray Sahoo
Date: 04/30/2017

### Goal: 
Estimate the solar radiation incident on the ground at HI-SEAS in order to predict the power generated by solar panels. The model should give the best estimate of the solar radiation given the meteorological conditions.

### Dataset:
A - NASA's HI-SEAS meteorological data and solar irradiance.

B - Additional climate data from the Dark Sky API for the same dates as A.

Uncomment cells to re-activate


## Load NASA dataset

In [3]:
%matplotlib inline

import os
import pandas as pd
import numpy as np
import seaborn as sns

DATA_PATH = '../hi-seas-data'

X_aug = pd.read_pickle('nasa_forecast_combined.pkl')

In [4]:
# Drop any nans 
X_aug = X_aug.dropna()
X_aug.describe()

| | id | unix_secs | date | time | irradiance | speed | humidity | temperature | direction | pressure | humidity_fc | windSpeed_fc | windBearing_fc | precipIntensity_fc | precipProbability_fc | pressure_fc | visibility_fc | cloudCover_fc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 32007 | 32007 | 32007 | 32007 | 32007.0 | 32007.0 | 32007.0 | 32007.0 | 32007.0 | 32007.0 | 32007.0 | 32007.0 | 32007 | 32007.0 | 32007.0 | 32007.0 | 32007.0 | 32007.0 |
| unique | 32007 | 32007 | 120 | 8181 | 14111.0 | 37.0 | 94.0 | 38.0 | 17668.0 | 37.0 | 68.0 | 848.0 | 356 | 173.0 | 69.0 | 779.0 | 200.0 | 67.0 |
| top | 33132 | 1478754303 | 2016-11-07 | 16:20:18 | 1.22 | 5.62 | 1.01 | 45.0 | 0.11 | 749.3146 | 0.82 | 2.6 | 153 | 0.0 | 0.0 | 1015.5 | 10.0 | 0.31 |
| freq | 1 | 1 | 288 | 24 | 2199.0 | 4567.0 | 1959.0 | 2801.0 | 93.0 | 4567.0 | 1174.0 | 161.0 | 333 | 22866.0 | 22866.0 | 178.0 | 22626.0 | 6648.0 |


## Train small Fully Connected Network

1. Scale each feature to [0, 1] using its minimum and maximum (min-max scaling)
2. Train a fully connected network
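
To make the first step concrete, here is a minimal sketch of per-feature min-max scaling written directly in NumPy. It is only meant to illustrate the arithmetic that `MinMaxScaler` performs with its default range; the helper name `min_max_scale` and the demo values are hypothetical and not part of the notebook.

```python
import numpy as np

def min_max_scale(X):
    """Scale each column of X to [0, 1] using that column's min and max."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Avoid division by zero for constant columns (they map to 0).
    col_range[col_range == 0] = 1.0
    return (X - col_min) / col_range

# Hypothetical demo data: two features on very different scales.
X_demo = np.array([[10.0, 1000.0],
                   [20.0, 1010.0],
                   [30.0, 1020.0]])
X_demo_scaled = min_max_scale(X_demo)
print(X_demo_scaled)
```

After scaling, both columns span the same range, so no feature dominates the network's inputs purely because of its units.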

In [28]:
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA

tmp = X_aug.dropna()
print(tmp.shape)
y = np.array(tmp['irradiance'])
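# Drop the target and the non-numeric / identifier columns; np.float was removed from NumPy, so use float
X = np.array(tmp.drop(['irradiance', 'date', 'unix_secs', 'id', 'time'], axis=1)).astype(float)
# X = np.array(tmp[['temperature', 'direction', 'cloudCover_fc']]).astype(float)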

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.25, random_state=9)

(32007, 18)


In [29]:
from keras.models import Sequential
from keras.layers import Dense, Input, Dropout
from keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=20, verbose=0, mode='auto')

model = Sequential()
model.add(Dense(1024, activation='relu', input_shape=(X.shape[1], )))
# input_shape is only needed on the first layer
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1, activation='relu'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mse'])
# Note: fitting on X_scaled uses every row, including those in X_test,
# so the model.evaluate score on X_test is not a true held-out error.
model.fit(X_scaled, y, validation_split=0.1, shuffle=True, batch_size=32, epochs=1000,
          callbacks=[early_stopping])

Train on 28806 samples, validate on 3201 samples
Epoch 1/1000
...
Epoch 25/1000


<keras.callbacks.History at 0x127c1d5c0>
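
Training stopped after 25 of the 1000 requested epochs, which is what the `EarlyStopping` callback with `patience=20` is for. The following is a simplified sketch of patience-based stopping in plain Python, not Keras's actual implementation; the function name `early_stopping_epoch` and the loss values are hypothetical, and the exact stopping epoch can differ by one between Keras versions.

```python
def early_stopping_epoch(val_losses, patience=20, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses: improve for 5 epochs, then plateau.
losses = [10.0, 9.0, 8.0, 7.0, 6.0] + [6.5] * 30
stop = early_stopping_epoch(losses, patience=20)
print(stop)
```

Under this simplified rule, a run that ends at epoch 25 with `patience=20` would be consistent with the validation loss having last improved around epoch 5.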

In [30]:
model.evaluate(X_test, y_test)



[29994.166870391779, 29994.166870391779]
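
Both numbers are the same because the loss and the single metric are both mean squared error. MSE is in squared units of `irradiance`, so a quick way to read it is to take the square root. The snippet below is a small sketch that only reuses the value printed above; the variable names are illustrative.

```python
import math

# Value copied from the model.evaluate output above (mean squared error).
mse = 29994.166870391779

# The root mean squared error is in the same units as the irradiance target.
rmse = math.sqrt(mse)
print("RMSE: {:.2f}".format(rmse))
```

An RMSE of roughly 173 is large relative to the targets shown in the example predictions further down, which range from about 1 to over 800.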

In [31]:
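# Caution: X_train[0] pairs with y_train[0] after the shuffled split, while y[0] is
# the first row of the unsplit data, so these two values may not be the same sample.
print(model.predict(X_train[0][None, :]))
print(y[0])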

[[ 0.]]
1.27


In [32]:
# Fifteen random example predictions
n_examples = 15
for i in range(n_examples):
    idx = np.random.randint(X_test.shape[0])
    ex_pred = model.predict(X_test[idx][None, :])[0]
    expected = y_test[idx]
    print("[{}] [{}] Predicted: {} Expected: {}".format(i, idx, ex_pred, expected))

[0] [7900] Predicted: [ 622.67474365] Expected: 862.66
[1] [3128] Predicted: [ 0.] Expected: 1.24
[2] [1385] Predicted: [ 366.76721191] Expected: 98.76
[3] [6393] Predicted: [ 689.703125] Expected: 792.96
[4] [4265] Predicted: [ 141.3946991] Expected: 312.47
[5] [3877] Predicted: [ 17.78536415] Expected: 1.21
[6] [2566] Predicted: [ 182.32449341] Expected: 170.04
[7] [584] Predicted: [ 671.10797119] Expected: 588.05
[8] [7546] Predicted: [ 0.] Expected: 1.23
[9] [5879] Predicted: [ 660.17358398] Expected: 201.05
[10] [3819] Predicted: [ 513.12811279] Expected: 365.58
[11] [6030] Predicted: [ 0.] Expected: 1.24
[12] [1138] Predicted: [ 841.75317383] Expected: 556.59
[13] [884] Predicted: [ 0.] Expected: 1.24
[14] [4451] Predicted: [ 135.06750488] Expected: 1.99


## Model the Time Series using RNN

In [None]:
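# Convert an array of values into a dataset matrix:
# each row of dataX holds look_back consecutive values of column 0,
# and dataY holds the value that immediately follows that window.
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    # stop at len(dataset) - look_back so the last full window is included
    for i in range(len(dataset) - look_back):
        a = dataset[i:(i + look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return np.array(dataX), np.array(dataY)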
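
As a usage sketch, the windowing helper can be tried on a toy series to see the shapes it produces. The helper is restated here so the block runs on its own; this copy iterates up to `len(dataset) - look_back` so that the final window is used. The toy series, the `look_back=3` choice, and the final reshape to `(samples, time steps, features)` for a recurrent layer are assumptions for illustration, not steps taken in the notebook.

```python
import numpy as np

def create_dataset(dataset, look_back=1):
    """Build (window, next value) pairs from column 0 of a 2-D array."""
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back):
        dataX.append(dataset[i:i + look_back, 0])
        dataY.append(dataset[i + look_back, 0])
    return np.array(dataX), np.array(dataY)

# Toy series 0, 1, ..., 5 as a single-column array.
series = np.arange(6, dtype=float).reshape(-1, 1)
windows, targets = create_dataset(series, look_back=3)
print(windows)   # each row is 3 consecutive values
print(targets)   # the value that follows each window

# Recurrent layers typically expect (samples, time steps, features).
windows_rnn = windows.reshape(windows.shape[0], windows.shape[1], 1)
print(windows_rnn.shape)
```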