To accomplish this task, we will use scikit-learn, a widely used machine learning library for Python. Because we are predicting a continuous quantity, we need its regression methods, which include linear models (ordinary least squares, Ridge, Lasso, Elastic Net), support vector machines, decision trees, and random forests.

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.preprocessing import PolynomialFeatures

Our first step will be to load the data from the npy files.

In [2]:
x = np.load("inputs.npy")
y = np.load("labels.npy").ravel()

Next, we will split the data into training and testing sets. The training set will be used to train our chosen model, and the test set will be used to evaluate the final performance of our model.

In [3]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
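To illustrate what the split produces, here is a minimal sketch on synthetic stand-in data. The arrays x_demo and y_demo are assumptions made for the example, not the contents of inputs.npy and labels.npy.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 100 samples with 3 features each.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(100, 3))
y_demo = rng.normal(size=100)

# Same arguments as the cell above: 20% held out, fixed seed.
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=0
)

# Repeating the call with the same random_state gives the same split.
x_tr2, x_te2, _, _ = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=0
)

print(x_tr.shape, x_te.shape)         # (80, 3) (20, 3)
print(np.array_equal(x_te, x_te2))    # True
```

Fixing random_state makes the comparison between models repeatable, since every model sees exactly the same training and test rows.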

Add polynomial features

In [4]:
poly_features = PolynomialFeatures(degree=2)
x_poly_train = poly_features.fit_transform(x_train)
x_poly_test = poly_features.transform(x_test)
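As a small illustration of what the degree-2 expansion does, the sketch below applies it to a single made-up sample with two features (the values 2 and 3 are chosen only for the example).

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One hypothetical sample with two features, a = 2 and b = 3.
sample = np.array([[2.0, 3.0]])

poly = PolynomialFeatures(degree=2)
expanded = poly.fit_transform(sample)

# Columns are ordered as: 1, a, b, a^2, a*b, b^2
print(expanded)  # [[1. 2. 3. 4. 6. 9.]]
```

Two input features become six columns, including a constant column of ones. Note also that the transformer is fitted on the training set and only applied to the test set, so the test data never influences the preprocessing.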

Next, we will choose candidate models for our task. We compare several regressors, mostly with their default hyperparameters; only the regularization strength alpha of the Ridge and Elastic Net models is set by hand, to 0.01. Each model is fitted on the training set, and the test set is used to assess its generalization ability.

In [5]:
lin_reg = LinearRegression()
lin_reg_poly = LinearRegression()
forest_reg = RandomForestRegressor(random_state=0)
tree_reg = DecisionTreeRegressor(random_state=0)
svm_reg = SVR()
ridge_reg = Ridge(alpha=0.01)
elastic_reg = ElasticNet(alpha=0.01)
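Hyperparameters such as the regularization strength could be tuned with a grid search rather than set by hand. The following is a minimal sketch, not the procedure used in this notebook: the data is synthetic, and the grid values for alpha and l1_ratio are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption): a linear signal plus a little noise.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(200, 5))
true_coef = np.array([1.5, -2.0, 0.0, 0.5, 0.0])
y_demo = x_demo @ true_coef + rng.normal(scale=0.1, size=200)

# Hypothetical grid: regularization strength and L1/L2 mixing ratio.
param_grid = {
    "alpha": [0.001, 0.01, 0.1, 1.0],
    "l1_ratio": [0.2, 0.5, 0.8],
}

# 5-fold cross-validation on the training data only, scored by R2.
search = GridSearchCV(
    ElasticNet(max_iter=10_000), param_grid, cv=5, scoring="r2"
)
search.fit(x_demo, y_demo)

print("Best parameters:", search.best_params_)
print("Best cross-validated R2:", search.best_score_)
```

Because the search scores each candidate by cross-validation within the training data, the test set stays untouched until the final evaluation.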

Fit the models

In [6]:
lin_reg.fit(x_train, y_train)
forest_reg.fit(x_train, y_train)
tree_reg.fit(x_train, y_train)
svm_reg.fit(x_train, y_train)
lin_reg_poly.fit(x_poly_train, y_train)
ridge_reg.fit(x_train, y_train)
elastic_reg.fit(x_train, y_train)

Make predictions on the test set

In [7]:
y_pred_lin_reg = lin_reg.predict(x_test)
y_pred_forest_reg = forest_reg.predict(x_test)
y_pred_tree_reg = tree_reg.predict(x_test)
y_pred_svm_reg = svm_reg.predict(x_test)
y_pred_lin_poly_reg = lin_reg_poly.predict(x_poly_test)
y_pred_ridge_reg = ridge_reg.predict(x_test)
y_pred_elastic_reg = elastic_reg.predict(x_test)

Evaluate the models using R2 score

In [8]:
r2_lin_reg = r2_score(y_test, y_pred_lin_reg)
r2_forest_reg = r2_score(y_test, y_pred_forest_reg)
r2_tree_reg = r2_score(y_test, y_pred_tree_reg)
r2_svm_reg = r2_score(y_test, y_pred_svm_reg)
r2_lin_poly_reg = r2_score(y_test, y_pred_lin_poly_reg)
r2_ridge_reg = r2_score(y_test, y_pred_ridge_reg)
r2_elastic_reg = r2_score(y_test, y_pred_elastic_reg)
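For reference, the R2 score is 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values. The sketch below computes it by hand on a few made-up numbers (not taken from our data) and compares the result with r2_score.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up values, chosen only to keep the arithmetic easy to follow.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 8.0])

# Residual sum of squares: 0.25 + 0.25 + 0 + 1 = 1.5
ss_res = np.sum((y_true - y_hat) ** 2)

# Total sum of squares around the mean (6): 9 + 1 + 1 + 9 = 20
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

# R2 = 1 - 1.5 / 20, i.e. 0.925 up to floating-point rounding
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_hat))
```

A score of 1 means perfect predictions, 0 means the model does no better than always predicting the mean, and negative values are possible for models that do worse than that.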

Print the R2 scores

In [9]:
print("R2 score for Linear Regression:", r2_lin_reg)
print("R2 score for Random Forest Regressor:", r2_forest_reg)
print("R2 score for Decision Tree Regressor:", r2_tree_reg)
print("R2 score for Support Vector Machine Regressor:", r2_svm_reg)
print("R2 score for Linear Regression with Polynomial Features:", r2_lin_poly_reg)
print("R2 score for Ridge Regression:", r2_ridge_reg)
print("R2 score for Elastic Net Regression:", r2_elastic_reg)

R2 score for Linear Regression: 0.8077385461717139
R2 score for Random Forest Regressor: 0.5359454177405851
R2 score for Decision Tree Regressor: 0.2252831575440949
R2 score for Support Vector Machine Regressor: 0.44705646229285234
R2 score for Linear Regression with Polynomial Features: 0.8239102808377585
R2 score for Ridge Regression: 0.8087179499700691
R2 score for Elastic Net Regression: 0.912611435559477


Finally, we evaluated each model's R2 score on the held-out test set. We were aiming for an R2 score of at least 0.85. Only Elastic Net regression reached this target, with a score of about 0.91, indicating that this model is suitable for our task.
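A cross-validated R2 score can be obtained along the following lines. This is a minimal sketch on synthetic stand-in data, so the numbers it prints say nothing about the wind farm dataset; the choice of 5 shuffled folds is an assumption.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (assumption): a linear signal plus a little noise.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(200, 5))
true_coef = np.array([1.5, -2.0, 0.0, 0.5, 0.0])
y_demo = x_demo @ true_coef + rng.normal(scale=0.1, size=200)

# 5 shuffled folds: each sample is used for validation exactly once.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(
    ElasticNet(alpha=0.01), x_demo, y_demo, cv=cv, scoring="r2"
)

print("R2 per fold:", scores)
print("Mean R2:", scores.mean(), "Std:", scores.std())
```

The mean over folds depends less on one particular split than a single test-set score does, and the spread across folds gives a rough idea of how stable the estimate is.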


On this basis, the Elastic Net model is our choice for predicting the amount of electricity produced by the wind farm.