# Predicting energy production of windmills using weather forecasts

This project proposes a method of modelling weather forecast data in order to predict the amount of energy produced by windmills. We use energy generation data from windmills located in Orkney, Scotland, as well as weather forecast data from the region, which gives wind speed and direction at each timestep.

The windmills in Orkney, of which there are over 500, account for the majority of the energy production in the region. Orkney is a cluster of islands off the northern coast of Scotland and is subject to a lot of heavy wind, making it an ideal place for windmills. In periods of very strong wind and low local demand for energy, the network of windmills in Orkney produces an excess of energy, which can be sold off to energy companies.

Selling off spare energy is an active decision which requires knowing when excess energy will be generated and when to stop, so as not to be subjected to fees and tariffs. This makes the ability to correctly predict when energy generation will be high an important financial tool.

We therefore propose to use several supervised regression methods to predict future energy generation: support vector regression, linear regression and a multi-layer perceptron.

In [2]:
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit
from sklearn import svm
from sklearn.preprocessing import Normalizer
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


from windTransformer import WindVectorTransformer
from windData import WindDataCollector


import numpy as np
import math
import pandas as pd

from influxdb import InfluxDBClient # install via "pip install influxdb"

import datetime



In [15]:
def eval_metrics(actual, pred):
	rmse = np.sqrt(mean_squared_error(actual, pred))
	mae = mean_absolute_error(actual, pred)
	r2 = r2_score(actual, pred)
	return rmse, mae, r2


In [9]:
dataCollector = WindDataCollector()

# Build the query timestamp once so both queries cover the same 90-day window
now = datetime.datetime.utcnow().strftime("'%Y-%m-%dT%H:%M:%SZ'")
gen_df = dataCollector.getGenerationData(now=now, delta="90")
wind_df = dataCollector.getWindData(now=now, delta="90")

gen_df_alligned = pd.merge_asof(wind_df,gen_df,left_index=True, right_index=True)[["Total"]]
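
The alignment step relies on `pd.merge_asof`, which by default matches each row of the left frame with the most recent row of the right frame at or before it. The toy example below illustrates that behaviour. The timestamps, values and the `Speed` column name are made up for illustration and are not taken from the Orkney data.

```python
import pandas as pd

# Made-up forecast rows (left frame) and generation readings (right frame).
wind = pd.DataFrame(
    {"Speed": [5.0, 9.0]},
    index=pd.to_datetime(["2021-01-01 03:00", "2021-01-01 06:00"]),
)
gen = pd.DataFrame(
    {"Total": [10.0, 12.0, 20.0, 21.0]},
    index=pd.to_datetime(
        ["2021-01-01 02:00", "2021-01-01 02:50",
         "2021-01-01 05:10", "2021-01-01 07:00"]
    ),
)

# For each forecast row, take the latest generation reading at or before it.
aligned = pd.merge_asof(wind, gen, left_index=True, right_index=True)
print(aligned)
```

The result keeps one row per forecast timestamp. If a forecast row has no earlier generation reading, `merge_asof` leaves `Total` as NaN for that row, so it may be worth checking the aligned target for missing values before fitting.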

In [10]:
train_length = int(len(gen_df_alligned)*0.9)

train_X = wind_df.iloc[:train_length]
test_X = wind_df.iloc[train_length:]

train_y = gen_df_alligned.iloc[:train_length]
test_y = gen_df_alligned.iloc[train_length:]


## Support Vector Machines CAAP

https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html

In [11]:
pipeline = Pipeline(steps=[
	("WindVector_transform",WindVectorTransformer()),
	("svm_model", svm.SVR())
])
parameters = {
    'svm_model__kernel':["rbf"],
    'svm_model__C':[1.0],
    'svm_model__gamma':[0.01, 0.1, 0.2, 0.5]
}
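
`WindVectorTransformer` is imported from a local module that is not shown in this notebook, so its exact behaviour is not known here. One common way to prepare wind speed and direction for regression is to convert them into Cartesian components, so that directions such as 359° and 1° end up close together. The sketch below is an assumption about what such a transformer might look like, not the project's implementation. The class name `SketchWindVectorTransformer` and the column names `Speed` and `Direction` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class SketchWindVectorTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: (speed, direction in degrees) -> (x, y) components."""

    def __init__(self, speed_col="Speed", direction_col="Direction"):
        self.speed_col = speed_col
        self.direction_col = direction_col

    def fit(self, X, y=None):
        # Nothing to learn; the transform is a fixed trigonometric mapping.
        return self

    def transform(self, X):
        speed = X[self.speed_col].to_numpy(dtype=float)
        radians = np.deg2rad(X[self.direction_col].to_numpy(dtype=float))
        # Stack the two components as the columns of the feature matrix.
        return np.column_stack([speed * np.cos(radians), speed * np.sin(radians)])


demo = pd.DataFrame({"Speed": [10.0, 10.0], "Direction": [0.0, 90.0]})
components = SketchWindVectorTransformer().fit_transform(demo)
print(components)
```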

In [12]:
tscv = TimeSeriesSplit(n_splits=5)
pipeline = GridSearchCV(pipeline, param_grid=parameters, n_jobs=15, cv= tscv)
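
`TimeSeriesSplit` is used instead of shuffled k-fold so that every validation fold lies strictly after the data it was trained on. The small sketch below, on 12 dummy samples rather than the wind data, shows how the folds grow.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X_demo = np.arange(12).reshape(-1, 1)  # 12 time-ordered dummy samples
demo_cv = TimeSeriesSplit(n_splits=5)

# Collect the index lists of each fold so they can be inspected.
folds = [(train.tolist(), test.tolist()) for train, test in demo_cv.split(X_demo)]
for train, test in folds:
    print("train: 0..{}  test: {}".format(train[-1], test))
```

Each training set is a prefix of the series and each test set is the block that follows it, so no future observation leaks into training.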

In [13]:
pipeline.fit(train_X, np.ravel(train_y))

bestParams = pipeline.best_params_

predicted_qualities = pipeline.predict(test_X)

In [16]:
(rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

print("SVR model (gamma={}, kernel={}, C={})".format(bestParams["svm_model__gamma"], bestParams["svm_model__kernel"], bestParams["svm_model__C"]))
print("  RMSE: %s" % rmse)
print("  MAE: %s" % mae)
print("  R2: %s" % r2)


SVR model (gamma=0.01, kernel=rbf, C=1.0)
  RMSE: 4.429316821775306
  MAE: 3.431138703780933
  R2: 0.7945538736046037


## Linear regression

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html

In [17]:
pipeline = Pipeline(steps=[
	("WindVector_transform",WindVectorTransformer()),
	("linear_model", LinearRegression())
])
parameters = {
    'linear_model__fit_intercept':[True, False],
    'linear_model__normalize':[True, False],
}

In [18]:
tscv = TimeSeriesSplit(n_splits=5)
pipeline = GridSearchCV(pipeline, param_grid=parameters, n_jobs=15, cv= tscv)

pipeline.fit(train_X, np.ravel(train_y))

bestParams = pipeline.best_params_

predicted_qualities = pipeline.predict(test_X)

(rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

print(bestParams)
print("  RMSE: %s" % rmse)
print("  MAE: %s" % mae)
print("  R2: %s" % r2)

{'linear_model__fit_intercept': True, 'linear_model__normalize': True}
  RMSE: 9.998957599932082
  MAE: 8.933681670708502
  R2: -0.046969232705093455
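
An R2 slightly below zero means that, on the test window, the linear model did marginally worse than always predicting the mean of `test_y`. The toy numbers below (not from the Orkney data) illustrate how `r2_score` behaves for a mean-only baseline and for a model that is worse than that baseline.

```python
import numpy as np
from sklearn.metrics import r2_score

actual = np.array([10.0, 20.0, 30.0, 40.0])

# Baseline that always predicts the mean of the observed values.
mean_baseline = np.full_like(actual, actual.mean())
# A deliberately poor prediction that reverses the trend.
poor_model = np.array([40.0, 30.0, 20.0, 10.0])

r2_baseline = r2_score(actual, mean_baseline)
r2_poor = r2_score(actual, poor_model)
print("baseline R2:", r2_baseline)
print("poor model R2:", r2_poor)
```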


## MLP

https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html

In [19]:
pipeline = Pipeline(steps=[
	("WindVector_transform",WindVectorTransformer()),
	("MLP_model", MLPRegressor())
])
parameters = {
    'MLP_model__activation':['identity', 'logistic', 'tanh', 'relu'],
    'MLP_model__solver':['lbfgs', 'sgd', 'adam'],
    'MLP_model__learning_rate': ["constant"],
    'MLP_model__learning_rate_init':[0.001, 0.01, 0.1, 0.2, 0.5],
    'MLP_model__max_iter':[10, 50, 100, 200, 500]
}

In [20]:
tscv = TimeSeriesSplit(n_splits=5)
pipeline = GridSearchCV(pipeline, param_grid=parameters, n_jobs=15, cv= tscv)

pipeline.fit(train_X, np.ravel(train_y))

bestParams = pipeline.best_params_

predicted_qualities = pipeline.predict(test_X)

(rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

print(bestParams)
print("  RMSE: %s" % rmse)
print("  MAE: %s" % mae)
print("  R2: %s" % r2)

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
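
The error message alone does not say where the non-finite values come from. Possible sources include NaN entries in the features or in the aligned target, or a network that diverged during training (the grid includes large `learning_rate_init` values) and then produced non-finite predictions. These are hypotheses, not a diagnosis. A first step, sketched below with a hypothetical helper `count_non_finite`, is to rule out the input data.

```python
import numpy as np
import pandas as pd


def count_non_finite(frame):
    """Return the number of NaN or infinite entries per numeric column."""
    numeric = frame.select_dtypes(include=[np.number])
    values = numeric.to_numpy(dtype=float)
    return pd.Series((~np.isfinite(values)).sum(axis=0), index=numeric.columns)


# Made-up frame with one NaN and one infinite value in "Speed".
demo_frame = pd.DataFrame(
    {"Speed": [1.0, np.nan, np.inf], "Total": [1.0, 2.0, 3.0]}
)
counts = count_non_finite(demo_frame)
print(counts)
```

Applied to `train_X`, `test_X`, `train_y` and `test_y`, all-zero counts would suggest that the problem lies in the fitted models rather than in the data.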