## Imports

In [2]:
import pandas as pd
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

## Prep the Data

First, we read in the datasets.

In [3]:
wind_df = pd.read_csv("../data/wind.csv")
solar_df = pd.read_csv("../data/solar.csv")

print("Previews of the datasets:")
print(wind_df.head(10))
print("_________________________________")
print(solar_df.head(10))

Previews of the datasets:
   id        lat        long  wind_speed farm_type  capacity  capacity_factor  \
0   0  23.510410 -117.147260        6.07  offshore        16            0.169   
1   1  24.007446  -93.946777        7.43  offshore        16            0.302   
2   2  25.069138  -97.482483        8.19  offshore        16            0.375   
3   3  25.069443  -97.463135        8.19  offshore        16            0.375   
4   4  25.069763  -97.443756        8.19  offshore        16            0.376   
5   5  25.070091  -97.424377        8.19  offshore        16            0.375   
6   6  25.070404  -97.404999        8.19  offshore        16            0.375   
7   7  25.086678  -97.482849        8.18  offshore        16            0.375   
8   8  25.087006  -97.463470        8.19  offshore        16            0.376   
9   9  25.087318  -97.444092        8.19  offshore        16            0.376   

   power_generation  estimated_cost  
0          23687.04        20800000  
1     

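Next, we shuffle the rows of each dataset. The wind preview above is ordered by latitude, so splitting the unshuffled rows by position would train and test the model on different regions.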

In [4]:
wind_df = wind_df.sample(frac=1)
solar_df = solar_df.sample(frac=1)

print("Previews of the shuffled datasets:")
print(wind_df.head(10))
print("_________________________________")
print(solar_df.head(10))

Previews of the shuffled datasets:
          id        lat        long  wind_speed farm_type  capacity  \
16674  16674  34.956932  -97.285461        7.63   onshore        10   
31096  31096  36.798687 -103.728027        8.75   onshore        16   
86303  86303  42.880878 -107.222290        8.94   onshore        16   
97140  97140  44.293499  -96.403290        8.33   onshore        16   
18893  18893  34.831398 -105.516205        8.24   onshore        16   
9247    9247  30.578217  -80.625122        7.21  offshore        16   
55981  55981  40.764744  -89.805542        6.77   onshore        16   
73821  73821  41.626110  -83.305084        7.25   onshore        10   
98080  98080  42.294132  -74.849792        7.20   onshore        12   
6021    6021  29.833126  -90.057617        5.99   onshore        16   

       capacity_factor  power_generation  estimated_cost  
16674            0.439          38456.40        13000000  
31096            0.390          54662.40        20800000  
86303 
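The shuffle above draws a new permutation on every run, so the rows that end up in each split change between runs. A minimal sketch of a repeatable shuffle follows, assuming a toy frame with made-up values in place of `wind_df`; the seed value `0` is an arbitrary choice, not something the notebook uses.

```python
import pandas as pd

# Toy frame standing in for wind_df; the values are made up for illustration.
toy_df = pd.DataFrame({"id": range(5), "wind_speed": [6.07, 7.43, 8.19, 8.18, 8.33]})

# random_state pins the permutation; reset_index(drop=True) discards the old row labels.
shuffled_a = toy_df.sample(frac=1, random_state=0).reset_index(drop=True)
shuffled_b = toy_df.sample(frac=1, random_state=0).reset_index(drop=True)

# The same seed yields the same row order both times.
print(shuffled_a.equals(shuffled_b))
```

Resetting the index is optional here, since the later split slices by position rather than by label.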

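Looking at each dataset, we can identify which variables we want to use for our models. For the wind data, the boolean masks below keep `lat`, `long`, `wind_speed`, `capacity`, and `capacity_factor` as inputs and `power_generation` and `estimated_cost` as targets; `id` and `farm_type` are dropped. The solar data is selected with the same column positions.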

In [5]:
# Wind data
wind_X = wind_df.loc[:, [False, True, True, True, False, True, True, False, False]]
wind_y = wind_df.loc[:, [False, False, False, False, False, False, False, True, True]]

# Solar data
solar_X = solar_df.loc[:, [False, True, True, True, False, True, True, False, False]]
solar_y = solar_df.loc[:, [False, False, False, False, False, False, False, True, True]]
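Positional boolean masks break silently if the column order changes. As a sketch of an alternative, the same wind selection can be written with column names. The one-row frame below reuses the first row of the wind preview, and `feature_cols` and `target_cols` are illustrative names, not part of the notebook.

```python
import pandas as pd

# One-row toy frame with the wind column layout shown in the preview.
toy_wind = pd.DataFrame({
    "id": [0], "lat": [23.510410], "long": [-117.147260], "wind_speed": [6.07],
    "farm_type": ["offshore"], "capacity": [16], "capacity_factor": [0.169],
    "power_generation": [23687.04], "estimated_cost": [20800000],
})

# Illustrative names for the input and target columns.
feature_cols = ["lat", "long", "wind_speed", "capacity", "capacity_factor"]
target_cols = ["power_generation", "estimated_cost"]

X_by_name = toy_wind[feature_cols]
y_by_name = toy_wind[target_cols]

# The name-based selection matches the positional mask used for the wind data.
X_by_mask = toy_wind.loc[:, [False, True, True, True, False, True, True, False, False]]
print(X_by_name.equals(X_by_mask))
```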

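Now we split each dataset into training and testing sets by row position, reserving about 80% for training and 20% for testing. Because the rows were shuffled above, the leading rows form a random sample of each dataset.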

In [6]:
# Wind data
wind_X_train = wind_X[:100000]
wind_X_test = wind_X[100000:]
wind_y_train = wind_y[:100000]
wind_y_test = wind_y[100000:]

# Solar data
solar_X_train = solar_X[:9500]
solar_X_test = solar_X[9500:]
solar_y_train = solar_y[:9500]
solar_y_test = solar_y[9500:]
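Hard-coded row counts only give an 80/20 split if the dataset sizes stay the same. A possible alternative is scikit-learn's `train_test_split`, which shuffles and splits by fraction in one call. The sketch below runs on small synthetic arrays, not the wind or solar data, and the seed is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: 10 samples, 5 inputs, 2 targets.
X = np.arange(50).reshape(10, 5)
y = np.arange(20).reshape(10, 2)

# test_size=0.2 reserves 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)
```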

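Some models, neural networks included, train better when every input is on a similar scale. `StandardScaler` standardizes each feature to zero mean and unit variance; it does not bound values to a fixed range such as [-1, 1]. We fit the scaler on the training split only and apply the same transformation to the test split, so no test statistics leak into training.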

In [7]:
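# One scaler per dataset, so each keeps its own fitted mean and variance
wind_scaler = StandardScaler()
solar_scaler = StandardScaler()

# Wind data: fit on the training split only, then transform both splits
wind_scaler.fit(wind_X_train)
wind_X_train = wind_scaler.transform(wind_X_train)
wind_X_test = wind_scaler.transform(wind_X_test)

# Solar data: same procedure with its own scaler
solar_scaler.fit(solar_X_train)
solar_X_train = solar_scaler.transform(solar_X_train)
solar_X_test = solar_scaler.transform(solar_X_test)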
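To make the effect of standardization concrete, the sketch below fits a `StandardScaler` on a small made-up array and checks the result. The numbers are illustrative only; one deliberately large value shows that standardized features can fall outside [-1, 1].

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training inputs with two features; 30.0 is a deliberate outlier.
X_train = np.array([[6.0, 10.0], [7.0, 12.0], [8.0, 16.0], [30.0, 16.0]])

scaler = StandardScaler().fit(X_train)
X_scaled = scaler.transform(X_train)

print(X_scaled.mean(axis=0))  # per-feature mean, approximately 0
print(X_scaled.std(axis=0))   # per-feature standard deviation, approximately 1
print(X_scaled.max())         # the outlier lands above 1
```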

## Training the Models

Now that the data is pre-processed, the models can be trained. Each network uses one hidden layer with three neurons. A common starting point is a single hidden layer whose size is the mean of the number of input and output neurons; with five inputs and two outputs that mean is 3.5, which we round down to 3.

In [None]:
# Wind network
wind_reg = MLPRegressor(solver='lbfgs', hidden_layer_sizes=(3,), random_state=0, max_iter=10000000)
wind_reg.fit(wind_X_train,wind_y_train)

# Solar network
solar_reg = MLPRegressor(solver='lbfgs', hidden_layer_sizes=(3,), random_state=0, max_iter=10000000)
solar_reg.fit(solar_X_train,solar_y_train)
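The sizing rule of thumb can be sketched in a few lines. The example below computes the hidden-layer size from the input and output counts and fits an `MLPRegressor` on synthetic data, so it runs without the wind or solar files. The data, the seed, and the smaller `max_iter` are assumptions made to keep the sketch quick.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

n_inputs, n_outputs = 5, 2
# Mean of inputs and outputs is 3.5; integer division rounds it down to 3.
n_hidden = (n_inputs + n_outputs) // 2

# Synthetic data: two targets that are simple linear functions of the inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, n_inputs))
y = np.column_stack([X[:, 0] + X[:, 1], X[:, 2] - X[:, 3]])

reg = MLPRegressor(solver="lbfgs", hidden_layer_sizes=(n_hidden,),
                   random_state=0, max_iter=1000)
reg.fit(X, y)

# Weight matrices: inputs -> hidden layer, hidden layer -> outputs.
print(reg.coefs_[0].shape, reg.coefs_[1].shape)
```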

## Testing the Models

With trained models, we can now test them and make predictions.

In [9]:
# Wind
wind_test = wind_reg.predict(wind_X_test)
print("Predicted outputs for wind data:")
print(wind_test)
print()

# Solar
solar_test = solar_reg.predict(solar_X_test)
print("Predicted outputs for solar data:")
print(solar_test)

Predicted outputs for wind data:
[[   48471.98659213 20800000.08280838]
 [   27821.83910046 12999999.49475553]
 [   64951.41492174 20799999.9461454 ]
 ...
 [   46267.09931196 20800000.09463431]
 [   37284.20825834 15599999.6789072 ]
 [   60126.02035649 20799999.97129241]]

Predicted outputs for solar data:
[[2.31356074e+01 2.93817331e+04]
 [1.08407627e+04 1.32998520e+07]
 [3.17267784e+01 3.98875099e+04]
 ...
 [4.33695709e+06 5.32000006e+09]
 [1.08418042e+04 1.33000457e+07]
 [3.20134992e+01 3.99386307e+04]]
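Printed predictions alone do not show how close the models are to the held-out targets. One way to quantify that is with per-output error metrics from `sklearn.metrics`. The sketch below uses made-up arrays in place of `wind_y_test` and `wind_test`, so the numbers say nothing about the actual models.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Made-up targets and predictions with two output columns each.
y_true = np.array([[100.0, 1000.0], [200.0, 2000.0], [300.0, 3000.0]])
y_pred = np.array([[110.0, 1000.0], [190.0, 2000.0], [300.0, 3000.0]])

# multioutput="raw_values" reports one score per output column
# instead of averaging across columns with very different scales.
mae = mean_absolute_error(y_true, y_pred, multioutput="raw_values")
r2 = r2_score(y_true, y_pred, multioutput="raw_values")

print("MAE per output:", mae)
print("R^2 per output:", r2)
```

Reporting each output separately matters here because the two targets differ by several orders of magnitude, so an averaged error would be dominated by the larger one.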
