# Predicting energy production of windmills using weather forecasts

This project proposes a method for modelling weather forecast data in order to predict the amount of energy produced by windmills. We use energy generation data from windmills in Orkney, Scotland, together with weather forecast data for the region that gives the wind speed and direction at each timestep.

Windmills account for the majority of energy production in Orkney, which has over 500 of them. As a cluster of islands off the northern coast of Scotland, Orkney is exposed to frequent strong winds, which makes it an ideal site for windmills. In periods of very strong wind and low local demand, the Orkney windmills produce excess energy, which can be sold to energy companies.

Selling surplus energy is an active decision: it requires knowing when excess energy will be generated and when to stop selling, so as not to incur fees and tariffs. The ability to predict accurately when energy generation will be high is therefore an important financial tool.

We therefore propose to use a range of supervised regression methods to predict future energy generation.

In [4]:
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit
from sklearn import svm
from sklearn.preprocessing import Normalizer
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

from sklearn.compose import ColumnTransformer
from windTransformer import WindVectorTransformer
from windData import WindDataCollector
from windTransformer import WindDegreeTransformer


import numpy as np
import math
import pandas as pd

from influxdb import InfluxDBClient # install via "pip install influxdb"

import datetime

In [5]:
def eval_metrics(actual, pred):
    """Return (RMSE, MAE, R2) for predictions `pred` against targets `actual`."""
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2
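
As a quick sanity check of the three metrics, the sketch below repeats the helper so that it runs on its own and applies it to a few made-up values (not the Orkney data). RMSE and MAE are in the unit of the target, while R2 is unitless.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def eval_metrics(actual, pred):
    """Return (RMSE, MAE, R2) for predictions `pred` against targets `actual`."""
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2

# Made-up targets and predictions, for illustration only.
actual = np.array([2.0, 4.0, 6.0])
pred = np.array([3.0, 4.0, 5.0])

# Errors are -1, 0 and +1, so MSE = 2/3, MAE = 2/3 and R2 = 1 - 2/8 = 0.75.
rmse, mae, r2 = eval_metrics(actual, pred)
print(rmse, mae, r2)
```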


In [6]:
start_time = datetime.datetime(2021, 1, 1, 0, 0, 0).strftime("'%Y-%m-%dT%H:%M:%SZ'")

dataCollector = WindDataCollector()

gen_df = dataCollector.getGenerationData(now = start_time, delta="90")
wind_df = dataCollector.getWindData(now = start_time, delta="90")

gen_df_alligned = pd.merge_asof(wind_df,gen_df,left_index=True, right_index=True)
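
By default, `pd.merge_asof` performs a backward as-of join: each row of the left frame is paired with the most recent right-hand row whose key is at or before it. The sketch below shows this on made-up timestamps and values; the sampling intervals are assumptions chosen for illustration, not those of the Orkney data.

```python
import pandas as pd

# Hypothetical toy data: two forecast rows and three generation readings.
wind = pd.DataFrame(
    {"Speed": [5.0, 9.0]},
    index=pd.to_datetime(["2021-01-01 03:00", "2021-01-01 06:00"]),
)
gen = pd.DataFrame(
    {"Total": [10.0, 12.0, 20.0]},
    index=pd.to_datetime(
        ["2021-01-01 02:00", "2021-01-01 02:50", "2021-01-01 05:30"]
    ),
)

# Default direction="backward": use the latest reading at or before each forecast time.
# Both indexes must be sorted in ascending order.
aligned = pd.merge_asof(wind, gen, left_index=True, right_index=True)
print(aligned)
```

Every row of the left frame is kept, so the aligned frame has the same length as the forecast frame.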

In [12]:
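# Hold out the last 10% of the time-ordered samples for testing.
train_length = int(len(gen_df_alligned)*0.9)

train_X = wind_df.iloc[:train_length][[
    "Direction",
    "Speed",
]]
test_X = wind_df.iloc[train_length:][[
    "Direction",
    "Speed",
]]

# NOTE: Normalizer expects numeric input, so fitting it on the raw "Direction"
# strings (e.g. 'SSW') raises the ValueError below. Its transform() also returns
# a NumPy array, which loses the column names that a ColumnTransformer needs
# when columns are selected by name.
norm = Normalizer().fit(train_X)
train_X = norm.transform(train_X)
test_X = norm.transform(test_X)

train_y = gen_df_alligned.iloc[:train_length]["Total"]
test_y = gen_df_alligned.iloc[train_length:]["Total"]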


ValueError: could not convert string to float: 'SSW'
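
The error arises because `Normalizer` accepts only numeric input, while `Direction` holds compass labels such as `'SSW'`. One way around it is to map the labels to degrees before any numeric scaling. The `compass_to_degrees` helper below is hypothetical: it is not the project's `WindDegreeTransformer`, whose implementation is not shown here, and it assumes the labels follow the standard 16-point compass.

```python
import pandas as pd

# 16-point compass, clockwise from north, 22.5 degrees apart.
COMPASS_POINTS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
                  "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]
DEGREES = {name: i * 22.5 for i, name in enumerate(COMPASS_POINTS)}

def compass_to_degrees(directions: pd.Series) -> pd.Series:
    """Map compass labels to degrees; unknown labels become NaN."""
    return directions.map(DEGREES)

# Made-up sample rows, for illustration only.
sample = pd.DataFrame({"Direction": ["N", "SSW", "W"], "Speed": [4.0, 11.5, 7.2]})
sample["Direction"] = compass_to_degrees(sample["Direction"])
print(sample)
```

Because 0° and 360° describe the same direction, a sine/cosine encoding of the angle is a common follow-up step.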

## MLP

The first model is a multi-layer perceptron, using scikit-learn's `MLPRegressor`: https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html

In [None]:
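pipeline = Pipeline(steps=[
    # ColumnTransformer drops unlisted columns by default (remainder="drop"),
    # so "Speed" does not reach the model while the line below stays commented out.
    ("column_transformers", ColumnTransformer([
        ("direction_degree_transformer", WindDegreeTransformer(), ["Direction"]),
        # ("polynomial_features", PolynomialFeatures(), ["Speed"]),
    ])),
    ("MLP_model", MLPRegressor()),
])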
parameters = {
    'MLP_model__hidden_layer_sizes':[(32,16)],
    'MLP_model__activation':['relu'],
    'MLP_model__solver':['sgd'],
    'MLP_model__learning_rate': ["constant"],
    'MLP_model__learning_rate_init':[0.01],
    'MLP_model__batch_size':[32],
    'MLP_model__max_iter':[80],
}
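
`ColumnTransformer` only outputs the columns named in its transformer list unless `remainder` is set. The sketch below illustrates the effect on made-up numeric data; it uses `"passthrough"` and `StandardScaler` as stand-ins, not the project's `WindDegreeTransformer`, and the choice of scaler for `Speed` is an assumption rather than the notebook's method.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Made-up features, with direction already expressed in degrees.
X = pd.DataFrame({"Direction": [0.0, 90.0, 180.0, 270.0],
                  "Speed": [3.0, 6.0, 9.0, 12.0]})

# Only "Direction" is listed, so "Speed" is dropped (remainder="drop" is the default).
dropped = ColumnTransformer([("direction", "passthrough", ["Direction"])])
dropped_shape = dropped.fit_transform(X).shape
print(dropped_shape)

# Listing "Speed" explicitly keeps it, here standardized to zero mean and unit variance.
kept = ColumnTransformer([
    ("direction", "passthrough", ["Direction"]),
    ("speed", StandardScaler(), ["Speed"]),
])
kept_shape = kept.fit_transform(X).shape
print(kept_shape)
```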

In [None]:
# TimeSeriesSplit keeps every validation fold later in time than its training fold.
tscv = TimeSeriesSplit(n_splits=5)
pipeline = GridSearchCV(pipeline, param_grid=parameters, n_jobs=15, cv=tscv)

pipeline.fit(train_X, np.ravel(train_y))

bestParams = pipeline.best_params_

# Predicted energy generation for the held-out test period
predicted_generation = pipeline.predict(test_X)

(rmse, mae, r2) = eval_metrics(test_y, predicted_generation)

print(bestParams)
print("  RMSE: %s" % rmse)
print("  MAE: %s" % mae)
print("  R2: %s" % r2)

ValueError: 
All the 5 fits failed.
It is very likely that your model is misconfigured.
You can try to debug the error by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
5 fits failed with the following error:
Traceback (most recent call last):
  File "/Users/Thomas/opt/anaconda3/envs/wild/lib/python3.9/site-packages/sklearn/utils/__init__.py", line 392, in _get_column_indices
    all_columns = X.columns
AttributeError: 'numpy.ndarray' object has no attribute 'columns'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/Thomas/opt/anaconda3/envs/wild/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 686, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/Users/Thomas/opt/anaconda3/envs/wild/lib/python3.9/site-packages/sklearn/pipeline.py", line 378, in fit
    Xt = self._fit(X, y, **fit_params_steps)
  File "/Users/Thomas/opt/anaconda3/envs/wild/lib/python3.9/site-packages/sklearn/pipeline.py", line 336, in _fit
    X, fitted_transformer = fit_transform_one_cached(
  File "/Users/Thomas/opt/anaconda3/envs/wild/lib/python3.9/site-packages/joblib/memory.py", line 349, in __call__
    return self.func(*args, **kwargs)
  File "/Users/Thomas/opt/anaconda3/envs/wild/lib/python3.9/site-packages/sklearn/pipeline.py", line 870, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/Users/Thomas/opt/anaconda3/envs/wild/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 687, in fit_transform
    self._validate_column_callables(X)
  File "/Users/Thomas/opt/anaconda3/envs/wild/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 374, in _validate_column_callables
    transformer_to_input_indices[name] = _get_column_indices(X, columns)
  File "/Users/Thomas/opt/anaconda3/envs/wild/lib/python3.9/site-packages/sklearn/utils/__init__.py", line 394, in _get_column_indices
    raise ValueError(
ValueError: Specifying the columns using strings is only supported for pandas DataFrames
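
The traceback comes down to one cause: the features reached the pipeline as a NumPy array, and a `ColumnTransformer` can select columns by name only from a pandas DataFrame. The sketch below reproduces this on made-up data, with `StandardScaler` as a stand-in for any transformer fitted outside the pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Made-up features, for illustration only.
X = pd.DataFrame({"Direction": [0.0, 90.0, 180.0, 270.0],
                  "Speed": [3.0, 6.0, 9.0, 12.0]})
by_name = ColumnTransformer([("speed", StandardScaler(), ["Speed"])])

# A DataFrame works: the "Speed" column can be found by name.
print(by_name.fit_transform(X).shape)

# A transformer applied beforehand returns a plain NumPy array by default...
as_array = StandardScaler().fit_transform(X)
print(type(as_array).__name__)

# ...and name-based column selection then raises a ValueError.
raised = False
try:
    by_name.fit_transform(as_array)
except ValueError as err:
    raised = True
    print("ValueError:", err)
```

Keeping `train_X` and `test_X` as DataFrames and doing all scaling inside the pipeline avoids the problem.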


In [58]:
print(pipeline.score(test_X, test_y))

-0.017389418604520035
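
For a regressor, `score` reports the coefficient of determination R2 when no other scoring is configured. A value slightly below zero therefore means the model does marginally worse than always predicting the mean of `test_y`. The sketch below, on made-up numbers, shows where the zero point lies.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up targets, for illustration only.
y_true = np.array([10.0, 20.0, 30.0, 40.0])

# Predicting the mean of the targets everywhere gives R2 = 0 by definition.
mean_baseline = np.full_like(y_true, y_true.mean())
r2_mean = r2_score(y_true, mean_baseline)

# A constant prediction that is further off than the mean gives R2 < 0.
off_baseline = np.full_like(y_true, 30.0)
r2_off = r2_score(y_true, off_baseline)

print(r2_mean, r2_off)
```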
