Before starting, import the dataset into Google Colab: click the Files button in the left sidebar and upload mergedFile.csv.

That file can be downloaded here: https://drive.google.com/file/d/1IVWERcgPYE7YzwltaqrDJAW8VDo2rNcp/view?usp=sharing

Demo and presentation video: https://drive.google.com/file/d/10nr94PDvFLBCVJ3Z-7tShI3FiYYLRd_I/view?usp=sharing

In [None]:
#import essential libraries
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
import pandas as pd
import numpy as np


In [None]:
# get data from file
dataset = pd.read_csv("mergedFile.csv")
dataset.dropna(inplace = True)

#X holds the weather inputs; we build 3 predictor sets of increasing size to see how the models react to different inputs
X0 = dataset[['temp', 'windspeed', 'feelslike']]
X1 = dataset[['temp', 'windspeed', 'feelslike', 'humidity']]
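#NOTE: 'windspeed' is listed twice below, so X2 has six columns; the recorded outputs further down reflect that duplicate
X2 = dataset[['temp', 'windspeed', 'feelslike', 'humidity', 'precip', 'windspeed']]
#we could take multiple y's and fit a separate model on each one, but the total (GlobalActivePower) is the most important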
y = dataset['GlobalActivePower']

After successfully loading the data, perform the train/test split and then scale the features so that they are ready for the regression models.

In [None]:
#Split the data into training / test splits
X_train0, X_test0, y_train0, y_test0 = train_test_split(X0, y, test_size=0.2, random_state=11)
X_train1, X_test1, y_train1, y_test1 = train_test_split(X1, y, test_size=0.2, random_state=11)
X_train2, X_test2, y_train2, y_test2 = train_test_split(X2, y, test_size=0.2, random_state=11)

#Scale the features
scaler0 = StandardScaler()
X_train_scaled0 = scaler0.fit_transform(X_train0)
X_test_scaled0 = scaler0.transform(X_test0)

scaler1 = StandardScaler()
X_train_scaled1 = scaler1.fit_transform(X_train1)
X_test_scaled1 = scaler1.transform(X_test1)

scaler2 = StandardScaler()
X_train_scaled2 = scaler2.fit_transform(X_train2)
X_test_scaled2 = scaler2.transform(X_test2)
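
The split-then-scale steps above repeat the same pattern for each predictor set. As a minimal sketch of that pattern, the helper below (`split_and_scale` is a hypothetical name, not part of the notebook) wraps it in one function and runs it on synthetic data standing in for mergedFile.csv. The key point it illustrates is that the scaler is fitted on the training rows only and then reused on the test rows.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def split_and_scale(X, y, test_size=0.2, random_state=11):
    """Hypothetical helper: split first, then fit the scaler on training rows only."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)  # learns mean/std from training data
    X_test_scaled = scaler.transform(X_test)        # reuses the training mean/std
    return X_train_scaled, X_test_scaled, y_train, y_test


# Synthetic stand-in for the real dataset (values are made up for illustration)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "temp": rng.normal(15, 8, 200),
    "windspeed": rng.normal(10, 3, 200),
    "feelslike": rng.normal(14, 9, 200),
    "humidity": rng.uniform(30, 90, 200),
})
demo_y = pd.Series(rng.normal(100, 50, 200), name="GlobalActivePower")

feature_sets = {
    "set0": ["temp", "windspeed", "feelslike"],
    "set1": ["temp", "windspeed", "feelslike", "humidity"],
}
splits = {name: split_and_scale(demo[cols], demo_y)
          for name, cols in feature_sets.items()}

for name, (X_tr, X_te, y_tr, y_te) in splits.items():
    print(name, X_tr.shape, X_te.shape)
```

Because every call uses the same `random_state`, each predictor set is split on the same rows, which keeps the RMSE values comparable across sets.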


The block below performs the ridge regression. After it's done, it outputs the RMSE for each predictor set along with the coefficient it fitted for each predictor.

If curious, you can change the alpha value, which controls the regularization strength: larger values shrink the coefficients more strongly toward zero.
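
To get a feel for what alpha does before editing the cell below, here is a small sketch on synthetic data (the arrays `X_demo` and `y_demo` and the chosen alpha values are assumptions for illustration, not the notebook's data). It fits `Ridge` at several alpha values and records the size of the coefficient vector each time.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data with three predictors (made up for illustration)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([-12.0, 2.0, 1.0]) + rng.normal(scale=5.0, size=200)
X_demo_scaled = StandardScaler().fit_transform(X_demo)

coef_norms = {}
for alpha in [0.01, 1.0, 100.0, 10000.0]:
    model = Ridge(alpha=alpha).fit(X_demo_scaled, y_demo)
    # Euclidean length of the coefficient vector; it shrinks as alpha grows
    coef_norms[alpha] = float(np.linalg.norm(model.coef_))
    print(f"alpha={alpha:>8}: coef={np.round(model.coef_, 3)}")
```

With a very small alpha such as 0.01 the fit is close to ordinary least squares, so small changes around that value are unlikely to move the RMSE much.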

In [None]:
#0 small predictor set
ridge0 = Ridge(alpha=0.01)    #alpha is the regularization strength; change it as needed

ridge0.fit(X_train_scaled0, y_train0)

y_pred0 = ridge0.predict(X_test_scaled0)

#1 medium predictor set
ridge1 = Ridge(alpha=0.01)

ridge1.fit(X_train_scaled1, y_train1)

y_pred1 = ridge1.predict(X_test_scaled1)

#2 large predictor set
ridge2 = Ridge(alpha=0.01)

ridge2.fit(X_train_scaled2, y_train2)

y_pred2 = ridge2.predict(X_test_scaled2)

#find RMSE
rmse0 = np.sqrt(mean_squared_error(y_test0, y_pred0))
rmse1 = np.sqrt(mean_squared_error(y_test1, y_pred1))
rmse2 = np.sqrt(mean_squared_error(y_test2, y_pred2))
print("RMSE0: ", rmse0)
print("RMSE1: ", rmse1)
print("RMSE2: ", rmse2)

#print the coefficients for each predictor
print("\n [temp, windspeed, feelslike]\n", ridge0.coef_)

print("\n ['temp', 'windspeed', 'feelslike', 'humidity']\n", ridge1.coef_)

print("\n ['temp', 'windspeed', 'feelslike', 'humidity', 'precip', 'windspeed']\n", ridge2.coef_)

RMSE0:  51.78426523707939
RMSE1:  51.78538838731269
RMSE2:  51.775649807365

 [temp, windspeed, feelslike]
 [-14.90301928   2.32341238   1.24845071]

 ['temp', 'windspeed', 'feelslike', 'humidity']
 [-12.60148844   2.37009581  -0.44911697   1.14499342]

 ['temp', 'windspeed', 'feelslike', 'humidity', 'precip', 'windspeed']
 [-12.40893367   1.21858853  -0.53020171   1.33637422  -0.86424023
   1.21858853]
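
In the output above, the two 'windspeed' entries of the third predictor set receive the same coefficient (1.21858853 each). That is the expected behaviour of ridge regression when a column is duplicated: the weight is shared equally between the copies. A minimal sketch on synthetic data (all names and values here are assumptions for illustration) shows the same effect.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data: two independent predictors (made up for illustration)
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
y_demo = 3.0 * x1 - 2.0 * x2 + rng.normal(scale=0.5, size=200)

X_single = np.column_stack([x1, x2])      # each predictor once
X_dup = np.column_stack([x1, x2, x2])     # x2 appears twice

ridge_single = Ridge(alpha=0.01).fit(X_single, y_demo)
ridge_dup = Ridge(alpha=0.01).fit(X_dup, y_demo)

print("single:    ", ridge_single.coef_)
print("duplicated:", ridge_dup.coef_)
# The two copies of x2 get equal coefficients whose sum is close to
# the coefficient x2 receives when it appears only once.
dup_sum = ridge_dup.coef_[1] + ridge_dup.coef_[2]
```

Listing each predictor once gives one coefficient per variable, which is easier to interpret.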


The block below performs the lasso regression. After it's done, it will print the RMSE as well as the coefficients for each variable.

As with ridge regression, alpha can be changed here, though doing so has limited effect.
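
Unlike ridge, lasso can set coefficients exactly to zero as alpha grows. The sketch below illustrates this on synthetic data; the column names mirror the notebook's predictors, but the values, the correlation between `temp` and `feelslike`, and the alpha grid are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data: feelslike is built to be strongly correlated with temp
rng = np.random.default_rng(0)
temp = rng.normal(size=300)
feelslike = temp + 0.3 * rng.normal(size=300)
windspeed = rng.normal(size=300)
X_demo = StandardScaler().fit_transform(
    np.column_stack([temp, windspeed, feelslike])
)
y_demo = -12.0 * temp + 2.0 * windspeed + rng.normal(scale=5.0, size=300)

lasso_coefs = {}
for alpha in [0.01, 1.0, 5.0, 1000.0]:
    model = Lasso(alpha=alpha).fit(X_demo, y_demo)
    lasso_coefs[alpha] = model.coef_
    n_zero = int(np.sum(model.coef_ == 0))  # coefficients dropped entirely
    print(f"alpha={alpha:>7}: coef={np.round(model.coef_, 3)}  zeros={n_zero}")
```

Once alpha is large enough, every coefficient is zero and the model predicts the training mean for every row.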

In [None]:
#perform lasso regression on the small (0) and large (2) predictor sets
lasso0 = Lasso(alpha=0.01)
lasso0.fit(X_train_scaled0, y_train0)
y_predLasso0 = lasso0.predict(X_test_scaled0)

lasso2 = Lasso(alpha=0.01)
lasso2.fit(X_train_scaled2, y_train2)
y_predLasso2 = lasso2.predict(X_test_scaled2)

#print results
print("RMSE0: ",np.sqrt(mean_squared_error(y_test0, y_predLasso0)))
print("['temp', 'windspeed', 'feelslike']\n", lasso0.coef_)

print("\n")
print("RMSE2", np.sqrt(mean_squared_error(y_test2, y_predLasso2)))
print("['temp', 'windspeed', 'feelslike', 'humidity', 'precip', 'windspeed']\n", lasso2.coef_)

RMSE0:  51.78474971568032
['temp', 'windspeed', 'feelslike']
 [-13.65483375   2.2486321    0.        ]


RMSE2 51.77553991437759
['temp', 'windspeed', 'feelslike', 'humidity', 'precip', 'windspeed']
 [-12.48691594   2.26107685  -0.45062647   1.32230615  -0.85181616
   0.16797043]
