# PART 4: Prediction of the amount of electricity produced

We would like to predict the amount of electricity produced by a wind farm as a
function of the information gathered by a number of physical sensors (e.g. wind
speed, temperature, ...).

In [14]:
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.metrics import r2_score

Loading data and standardization

In [15]:
X_train = np.load('X_train.npy')
X_test = np.load('X_test.npy')
y_train = np.load('y_train.npy')
y_test = np.load('y_test.npy')

std = StandardScaler()
X_train_std = std.fit_transform(X_train)
X_test_std = std.transform(X_test)
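The scaler is fitted on the training set only and then reused on the test set, so the test data never influences the scaling statistics. The sketch below checks this on hypothetical toy data (`X_train_demo`, `X_test_demo` are made-up stand-ins, not the wind farm sensors): after `fit_transform`, each training column has mean 0 and unit population standard deviation, and the test set is shifted and scaled with the *training* mean and standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for the sensor features (not the real data).
rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=10.0, scale=3.0, size=(100, 4))
X_test_demo = rng.normal(loc=10.0, scale=3.0, size=(30, 4))

scaler = StandardScaler()
# fit_transform learns the per-column mean and standard deviation from the training set...
X_train_demo_std = scaler.fit_transform(X_train_demo)
# ...and transform reuses those training statistics on the test set.
X_test_demo_std = scaler.transform(X_test_demo)

print(X_train_demo_std.mean(axis=0).round(6))  # training columns centred at 0
print(X_train_demo_std.std(axis=0).round(6))   # with unit (ddof=0) standard deviation
```

The test columns will generally not have exactly mean 0 and standard deviation 1, which is expected: they are expressed in the units of the training distribution.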

Our first model is ridge regression, a linear model with an L2 penalty on its coefficients.
We use RandomizedSearchCV to find the best value of its regularization hyperparameter `alpha`.
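When every entry of the search space is a plain list or array (no scipy distribution), scikit-learn's randomized search samples candidate settings *without replacement* from that grid. With 50 alpha values and `n_iter=50`, the ridge search below therefore evaluates every candidate, i.e. it behaves like an exhaustive grid search. A minimal check using `ParameterSampler`, the sampler that `RandomizedSearchCV` relies on, with the same search space and seed as the notebook:

```python
import numpy as np
from sklearn.model_selection import ParameterSampler

# Same search space, n_iter and seed as the ridge search in this notebook.
param_distributions = {'alpha': np.logspace(-4, 4, 50)}

# All values are an array, so candidates are drawn without replacement from the grid.
sampled = list(ParameterSampler(param_distributions, n_iter=50, random_state=2804))
sampled_alphas = sorted(params['alpha'] for params in sampled)

print(len(sampled_alphas), len(set(sampled_alphas)))  # prints: 50 50
```

In other words, the random seed only changes the order in which the 50 alphas are tried, not which ones are tried.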

In [16]:
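def ridge_regression(X_train, y_train, X_test, y_test):
    # Search alpha over 50 log-spaced values in [1e-4, 1e4], scoring each
    # candidate by R^2 with 5-fold cross-validation (scikit-learn default).
    random_search = RandomizedSearchCV(Ridge(),
                                       {'alpha': np.logspace(-4, 4, 50)},
                                       n_iter=50,
                                       scoring='r2',
                                       random_state=2804)
    random_search.fit(X_train, y_train)
    # best_estimator_ is refit on the whole training set with the best alpha.
    best_model = random_search.best_estimator_
    y_test_pred = best_model.predict(X_test)

    # Evaluate on the held-out test set.
    test_r2_score = r2_score(y_test, y_test_pred)
    return best_model, test_r2_score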
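Because `X_train_std` is scaled once on the whole training set before the search, each cross-validation fold used for validation has already contributed to the scaling statistics. This leakage is mild for a per-feature scaler, but it can be avoided by putting the scaler inside a scikit-learn `Pipeline`, so it is refitted on the training folds of every split. A sketch on synthetic data from `make_regression` (an assumption standing in for the sensor data; `pipe`, `search`, `X_tr`, `X_te` are illustrative names):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (hypothetical; not the wind farm data).
X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# The scaler is a pipeline step, so each CV split refits it on its own training folds.
pipe = make_pipeline(StandardScaler(), Ridge())

search = RandomizedSearchCV(
    pipe,
    {'ridge__alpha': np.logspace(-4, 4, 50)},  # 'ridge' is the step name make_pipeline generates
    n_iter=50,
    scoring='r2',
    random_state=2804,
)
search.fit(X_tr, y_tr)  # raw, unscaled features go in

# predict() uses the best pipeline, refit on all of X_tr.
test_r2 = r2_score(y_te, search.predict(X_te))
print(search.best_params_, round(test_r2, 3))
```

The pipeline is passed raw features for both training and testing; scaling happens inside it.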

In [None]:
b_ridge, ridge_r2 = ridge_regression(X_train_std, y_train, X_test_std, y_test)

print(f"Ridge Regression Test R^2 Score: {ridge_r2}")

### Interpretation of the Ridge results

We obtain a test R^2 score of 0.59, well below our target of 0.85.
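The score used throughout is the coefficient of determination, R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot the sum of squared deviations of the targets from their mean. A score of 1 means perfect predictions and 0 means no better than always predicting the mean, so 0.59 means the Ridge model explains roughly 59% of the variance of the test target. A small check on made-up values that the formula matches `r2_score`:

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical true and predicted values, for illustration only.
y_true = np.array([3.0, 5.0, 2.0, 7.0, 4.0])
y_pred = np.array([2.5, 5.5, 2.0, 6.0, 4.5])

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
r2_manual = 1.0 - ss_res / ss_tot

print(round(r2_manual, 4), round(r2_score(y_true, y_pred), 4))  # prints: 0.8818 0.8818
```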

In [None]:
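def lasso_regression(X_train, y_train, X_test, y_test):
    # Evaluate all 15 log-spaced alphas in [1e-4, 1e-1], scoring each by R^2
    # with 5-fold cross-validation (scikit-learn default).
    # max_iter is raised to help coordinate descent converge for small alphas.
    grid_search = GridSearchCV(Lasso(max_iter=10000),
                               {'alpha': np.logspace(-4, -1, 15)},
                               scoring='r2')
    grid_search.fit(X_train, y_train)
    # best_estimator_ is refit on the whole training set with the best alpha.
    best_model = grid_search.best_estimator_
    y_test_pred = best_model.predict(X_test)

    # Evaluate on the held-out test set.
    test_r2_score = r2_score(y_test, y_test_pred)
    return best_model, test_r2_score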
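For reference, the objectives minimized by scikit-learn's `Ridge` and `Lasso`, as given in the scikit-learn documentation (the intercept is fitted separately and not penalized):

$$
\text{Ridge:}\quad \min_{w}\; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2
$$

$$
\text{Lasso:}\quad \min_{w}\; \frac{1}{2\,n_{\text{samples}}} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$

The L1 norm in the Lasso objective is what allows some coefficients to be exactly zero. Note also that the data-fit term is scaled differently in the two objectives, so the same numerical `alpha` does not mean the same penalty strength for both models; this is presumably why the Lasso search range stops at 10^-1 while the Ridge range goes up to 10^4.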

In [None]:
b_lasso, lasso_r2 = lasso_regression(X_train_std, y_train, X_test_std, y_test)

print(f"Lasso Regression Test R^2 Score: {lasso_r2}")

### Interpretation of the Lasso results and comparison

We obtain a test R^2 score of 0.88, much higher than with Ridge (0.59) and above our 0.85 target.
A likely explanation is that the Lasso's L1 penalty can set the coefficients of uninformative features exactly to zero, so it keeps only the features most relevant for predicting the target, whereas Ridge only shrinks coefficients towards zero.
This difference in performance highlights the importance of choosing a regularization suited to the data.
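The feature-selection effect can be seen directly on synthetic data where only a few features matter. The sketch below assumes a `make_regression` dataset with 50 features of which 5 are informative, and uses an illustrative `alpha=5.0` for both models (not the tuned values from this notebook; the two alphas are not on the same scale, as noted for the objectives). It counts how many coefficients each model sets exactly to zero.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (an assumption): 50 standard-normal features, only 5 drive the target.
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)

# Illustrative penalty strengths, not tuned values.
lasso = Lasso(alpha=5.0, max_iter=10000).fit(X, y)
ridge = Ridge(alpha=5.0).fit(X, y)

n_zero_lasso = int(np.sum(lasso.coef_ == 0))  # L1 penalty: exact zeros are possible
n_zero_ridge = int(np.sum(ridge.coef_ == 0))  # L2 penalty: shrinks but does not zero out
print(f"Lasso zero coefficients: {n_zero_lasso}/50")
print(f"Ridge zero coefficients: {n_zero_ridge}/50")
```

On the real sensor data, the same count can be read from `b_lasso.coef_` to see how many sensor features the tuned Lasso discarded.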