# NAME: OLUWATOMIWA ADELEKE
# HAMOYE_ID: 157090959641f000

Electrical grids require a balance between electricity supply and demand in order to be stable. Conventional systems achieve this balance through demand-driven electricity production. For future grids with a high share of inflexible (i.e., renewable) energy sources, the concept of demand response is a promising solution: consumers change their electricity consumption in response to changes in the electricity price. In this work, we’ll build a binary classification model to predict whether a grid is stable or unstable using the UCI Electrical Grid Stability Simulated dataset.

Dataset: https://archive.ics.uci.edu/ml/datasets/Electrical+Grid+Stability+Simulated+Data+

It has 12 primary predictive features and two dependent variables.

Predictive features:

- 'tau1' to 'tau4': the reaction time of each network participant, a real value within the range 0.5 to 10 ('tau1' corresponds to the supplier node, 'tau2' to 'tau4' to the consumer nodes);
- 'p1' to 'p4': nominal power produced (positive) or consumed (negative) by each network participant, a real value within the range -2.0 to -0.5 for consumers ('p2' to 'p4'). As the total power consumed equals the total power generated, p1 (supplier node) = - (p2 + p3 + p4);
- 'g1' to 'g4': price elasticity coefficient for each network participant, a real value within the range 0.05 to 1.00 ('g1' corresponds to the supplier node, 'g2' to 'g4' to the consumer nodes; 'g' stands for 'gamma').

Dependent variables:

- 'stab': the maximum real part of the characteristic differential equation root (if positive, the system is linearly unstable; if negative, linearly stable);
- 'stabf': a categorical (binary) label ('stable' or 'unstable').

Because of the direct relationship between 'stab' and 'stabf' ('stabf' = 'stable' if 'stab' <= 0, 'unstable' otherwise), 'stab' should be dropped and 'stabf' will remain as the sole dependent variable (binary classification).
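
The two relationships described above can be checked by hand. The sketch below uses plain Python and the first two rows of the dataset (values as they appear in the `df.head(10)` output of this notebook). The helper name `label_from_stab` is hypothetical and not part of the dataset or of any library.

```python
# First two rows of the dataset: (p1, p2, p3, p4, stab, stabf)
rows = [
    (3.763085, -0.782604, -1.257395, -1.723086, 0.055347, "unstable"),
    (5.067812, -1.940058, -1.872742, -1.255012, -0.005957, "stable"),
]


def label_from_stab(stab):
    """Hypothetical helper: derive the 'stabf' label from 'stab'."""
    return "stable" if stab <= 0 else "unstable"


for p1, p2, p3, p4, stab, stabf in rows:
    # Power balance: supplier output equals total consumer demand,
    # so p1 + (p2 + p3 + p4) should be (numerically) zero.
    balance_ok = abs(p1 + (p2 + p3 + p4)) < 1e-5
    # The label should follow directly from the sign of 'stab'.
    label_ok = label_from_stab(stab) == stabf
    print(balance_ok, label_ok)
```

Since 'stabf' can be recomputed exactly from 'stab', keeping 'stab' as a feature would leak the target into the model, which is why it is dropped later in the notebook.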

##### Importing Libraries

In [34]:
# importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import requests

%matplotlib inline

In [35]:
# load in the dataset into a pandas dataframe
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/00471/Data_for_UCI_named.csv')

In [36]:
# previewing the first 10 rows of the data with head()
df.head(10)

,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4,stab,stabf
0,2.95906,3.079885,8.381025,9.780754,3.763085,-0.782604,-1.257395,-1.723086,0.650456,0.859578,0.887445,0.958034,0.055347,unstable
1,9.304097,4.902524,3.047541,1.369357,5.067812,-1.940058,-1.872742,-1.255012,0.413441,0.862414,0.562139,0.78176,-0.005957,stable
2,8.971707,8.848428,3.046479,1.214518,3.405158,-1.207456,-1.27721,-0.920492,0.163041,0.766689,0.839444,0.109853,0.003471,unstable
3,0.716415,7.6696,4.486641,2.340563,3.963791,-1.027473,-1.938944,-0.997374,0.446209,0.976744,0.929381,0.362718,0.028871,unstable
4,3.134112,7.608772,4.943759,9.857573,3.525811,-1.125531,-1.845975,-0.554305,0.79711,0.45545,0.656947,0.820923,0.04986,unstable
5,6.999209,9.109247,3.784066,4.267788,4.429669,-1.857139,-0.670397,-1.902133,0.261793,0.07793,0.542884,0.469931,-0.017385,stable
6,6.710166,3.765204,6.929314,8.818562,2.397419,-0.61459,-1.208826,-0.574004,0.17789,0.397977,0.402046,0.37663,0.005954,unstable
7,6.953512,1.379125,5.7194,7.870307,3.224495,-0.748998,-1.186517,-1.28898,0.371385,0.633204,0.732741,0.380544,0.016634,unstable
8,4.689852,4.007747,1.478573,3.733787,4.0413,-1.410344,-1.238204,-1.392751,0.269708,0.250364,0.164941,0.482439,-0.038677,stable
9,9.841496,1.413822,9.769856,7.641616,4.727595,-1.991363,-0.857637,-1.878594,0.376356,0.544415,0.792039,0.116263,0.012383,unstable


In [37]:
# previewing 10 randomly chosen rows of the data with sample()
df.sample(10)

,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4,stab,stabf
576,0.9459,8.989633,6.393949,1.248278,4.135918,-0.517693,-1.958675,-1.659551,0.639012,0.549927,0.473618,0.115896,-0.021719,stable
3838,1.115596,7.308295,2.255499,6.057037,3.177657,-1.94427,-0.596276,-0.63711,0.654084,0.190668,0.753781,0.167548,-0.025272,stable
5327,9.54868,2.960753,1.315809,4.098552,3.23235,-1.691573,-0.90097,-0.639806,0.78475,0.113063,0.905644,0.149287,-0.035123,stable
8867,4.181031,3.134867,6.518596,8.740066,3.590285,-1.50752,-0.95205,-1.130714,0.805201,0.361625,0.736716,0.981867,0.070391,unstable
1256,5.635358,3.450787,3.907925,9.878558,3.713517,-1.666307,-0.522393,-1.524816,0.371773,0.840958,0.615648,0.74725,0.034932,unstable
5800,8.396269,8.660746,2.178458,2.963568,4.713401,-1.432445,-1.431356,-1.8496,0.970997,0.917364,0.66157,0.944067,0.054221,unstable
8247,6.653486,4.596156,3.994691,9.61907,4.44956,-1.731592,-1.98524,-0.732728,0.30956,0.323635,0.799119,0.702422,0.035135,unstable
3909,7.96691,7.938467,7.390104,9.480514,3.122507,-1.542557,-0.64089,-0.93906,0.303979,0.641188,0.908932,0.791567,0.081697,unstable
6039,9.568972,8.795044,0.732851,9.421741,4.082692,-1.861903,-0.566627,-1.654163,0.914798,0.556697,0.217123,0.173896,0.037633,unstable
9703,3.857986,2.646076,5.912322,9.307202,2.557142,-0.709996,-0.675244,-1.171902,0.435257,0.905199,0.513951,0.930928,0.038672,unstable


In [38]:
df.shape

(10000, 14)

In [39]:
# checking the overall information about the data
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   tau1    10000 non-null  float64
 1   tau2    10000 non-null  float64
 2   tau3    10000 non-null  float64
 3   tau4    10000 non-null  float64
 4   p1      10000 non-null  float64
 5   p2      10000 non-null  float64
 6   p3      10000 non-null  float64
 7   p4      10000 non-null  float64
 8   g1      10000 non-null  float64
 9   g2      10000 non-null  float64
 10  g3      10000 non-null  float64
 11  g4      10000 non-null  float64
 12  stab    10000 non-null  float64
 13  stabf   10000 non-null  object 
dtypes: float64(13), object(1)
memory usage: 1.1+ MB


In [40]:
#  descriptive statistics for numeric variables
df.describe()

,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4,stab
count,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0
mean,5.25,5.250001,5.250004,5.249997,3.75,-1.25,-1.25,-1.25,0.525,0.525,0.525,0.525,0.015731
std,2.742548,2.742549,2.742549,2.742556,0.75216,0.433035,0.433035,0.433035,0.274256,0.274255,0.274255,0.274255,0.036919
min,0.500793,0.500141,0.500788,0.500473,1.58259,-1.999891,-1.999945,-1.999926,0.050009,0.050053,0.050054,0.050028,-0.08076
25%,2.874892,2.87514,2.875522,2.87495,3.2183,-1.624901,-1.625025,-1.62496,0.287521,0.287552,0.287514,0.287494,-0.015557
50%,5.250004,5.249981,5.249979,5.249734,3.751025,-1.249966,-1.249974,-1.250007,0.525009,0.525003,0.525015,0.525002,0.017142
75%,7.62469,7.624893,7.624948,7.624838,4.28242,-0.874977,-0.875043,-0.875065,0.762435,0.76249,0.76244,0.762433,0.044878
max,9.999469,9.999837,9.99945,9.999443,5.864418,-0.500108,-0.500072,-0.500025,0.999937,0.999944,0.999982,0.99993,0.109403
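
The summary statistics are close to what uniform sampling over the stated ranges would give. This is an assumption made for illustration only (the feature description gives the ranges, not the sampling distribution): a uniform variable on [a, b] has mean (a + b) / 2 and standard deviation (b - a) / sqrt(12). A minimal sketch in plain Python:

```python
import math


def uniform_mean_std(a, b):
    """Mean and standard deviation of a uniform distribution on [a, b]."""
    return (a + b) / 2, (b - a) / math.sqrt(12)


# Ranges from the feature description; observed values from df.describe().
ranges = {
    "tau1": (0.5, 10.0),   # observed mean 5.25,  std 2.742548
    "p2": (-2.0, -0.5),    # observed mean -1.25, std 0.433035
    "g1": (0.05, 1.0),     # observed mean 0.525, std 0.274256
}

for name, (a, b) in ranges.items():
    mean, std = uniform_mean_std(a, b)
    print(f"{name}: mean={mean:.4f}, std={std:.4f}")
```

If the assumption holds, no single value range dominates any feature, which is consistent with the near-identical quartiles across 'tau1' to 'tau4' in the table.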


In [41]:
# checking for duplicate rows
df.duplicated().sum()

0

In [42]:
# checking for missing values
df.isnull().sum()

tau1     0
tau2     0
tau3     0
tau4     0
p1       0
p2       0
p3       0
p4       0
g1       0
g2       0
g3       0
g4       0
stab     0
stabf    0
dtype: int64

In [43]:
# dropping the 'stab' column because of its direct relationship with 'stabf';
# 'stabf' remains as the sole dependent variable
df.drop('stab', axis=1, inplace=True)

In [44]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 13 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   tau1    10000 non-null  float64
 1   tau2    10000 non-null  float64
 2   tau3    10000 non-null  float64
 3   tau4    10000 non-null  float64
 4   p1      10000 non-null  float64
 5   p2      10000 non-null  float64
 6   p3      10000 non-null  float64
 7   p4      10000 non-null  float64
 8   g1      10000 non-null  float64
 9   g2      10000 non-null  float64
 10  g3      10000 non-null  float64
 11  g4      10000 non-null  float64
 12  stabf   10000 non-null  object 
dtypes: float64(12), object(1)
memory usage: 1015.8+ KB


###### Preprocessing

In [45]:
# assigning x (features) and y (target) values
x = df.drop(columns='stabf')
y = df['stabf']

In [46]:
# splitting the dataset into training (80%) and testing (20%) sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)

In [47]:
# using StandardScaler: fit on the training set, then transform both the train and test sets
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
norm_train_df = scaler.fit_transform(x_train)
norm_train_df = pd.DataFrame(norm_train_df, columns = x_train.columns)

norm_test_df = scaler.transform(x_test)
norm_test_df = pd.DataFrame(norm_test_df, columns=x_test.columns)
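
As a rough illustration of what the scaling step does, the sketch below standardizes a single feature by hand. It assumes, as `StandardScaler` does by default, that each value is centered on the training mean and divided by the training (population) standard deviation. The values are 'tau1' entries from the first rows of the dataset; which of them count as train or test here is hypothetical, and the names `train_values` and `test_values` are not from the notebook.

```python
import statistics

# Illustrative values of one feature ('tau1' from the first rows of the data);
# the assignment of rows to train and test below is hypothetical.
train_values = [2.95906, 9.304097, 8.971707, 0.716415, 3.134112, 6.999209]
test_values = [6.710166, 6.953512]

# "fit": learn the mean and population standard deviation from the training data only
mean = statistics.fmean(train_values)
std = statistics.pstdev(train_values)

# "transform": apply the same training statistics to both sets
scaled_train = [(v - mean) / std for v in train_values]
scaled_test = [(v - mean) / std for v in test_values]

print(scaled_train)
print(scaled_test)
```

Fitting on the training data only, as the cell above does, keeps information from the test set out of the preprocessing step.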

###### Model Selection

###### Question 14
What is the accuracy on the test set using the random forest classifier, to 4 decimal places?

In [19]:
# using random forest classifier
from sklearn.ensemble import RandomForestClassifier
r_forest = RandomForestClassifier(random_state=1)
r_forest.fit(norm_train_df, y_train)
r_forest_pred = r_forest.predict(norm_test_df)

In [20]:
# checking the accuracy of the random forest classifier
from sklearn.metrics import classification_report
print(classification_report(y_test, r_forest_pred, digits=4))

              precision    recall  f1-score   support

      stable     0.9191    0.8778    0.8980       712
    unstable     0.9341    0.9573    0.9456      1288

    accuracy                         0.9290      2000
   macro avg     0.9266    0.9176    0.9218      2000
weighted avg     0.9288    0.9290    0.9286      2000
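
The accuracy asked for in Question 14 can be cross-checked from the per-class recall and support in the report: recall times support gives the number of correctly classified samples in each class. A minimal sketch in plain Python, with the numbers copied from the report above:

```python
# (recall, support) per class, copied from the random forest report
report = {"stable": (0.8778, 712), "unstable": (0.9573, 1288)}

# recall * support is the number of correct predictions per class;
# round() undoes the 4-decimal rounding of the printed recall
correct = {
    label: round(recall * support)
    for label, (recall, support) in report.items()
}
total = sum(support for _, support in report.values())

accuracy = sum(correct.values()) / total
print(correct, f"{accuracy:.4f}")
```

This recovers 625 correct 'stable' and 1233 correct 'unstable' predictions out of 2000 test samples, which agrees with the reported accuracy of 0.9290.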



In [21]:
# using the extra trees classifier
from sklearn.ensemble import ExtraTreesClassifier
tree = ExtraTreesClassifier(random_state=1)
tree.fit(norm_train_df, y_train)
tree_pred = tree.predict(norm_test_df)

In [22]:
# classification report for the extra trees classifier
from sklearn.metrics import classification_report
print(classification_report(y_test, tree_pred, zero_division=1, digits=6))

              precision    recall  f1-score   support

      stable   0.940994  0.851124  0.893805       712
    unstable   0.921829  0.970497  0.945537      1288

    accuracy                       0.928000      2000
   macro avg   0.931411  0.910810  0.919671      2000
weighted avg   0.928652  0.928000  0.927121      2000
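
The two average rows in the report differ because the classes are imbalanced (712 'stable' versus 1288 'unstable' test samples). As a small worked check, the sketch below recomputes both averages for the precision column in plain Python, using the numbers printed above:

```python
# (precision, support) per class, copied from the extra trees report
precision = {"stable": (0.940994, 712), "unstable": (0.921829, 1288)}

values = [p for p, _ in precision.values()]
supports = [s for _, s in precision.values()]

# macro average: unweighted mean over the classes
macro_avg = sum(values) / len(values)
# weighted average: mean weighted by each class's support
weighted_avg = sum(p * s for p, s in precision.values()) / sum(supports)

print(macro_avg, weighted_avg)
```

The weighted average is pulled toward the 'unstable' class because it has almost twice the support of the 'stable' class.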



In [24]:
# using the XGBoost classifier
from xgboost import XGBClassifier
extreme = XGBClassifier(random_state=1)
extreme.fit(norm_train_df, y_train)
extreme_pred = extreme.predict(norm_test_df)

In [25]:
# classification report for the XGBoost classifier
print(classification_report(y_test, extreme_pred, digits=4))

              precision    recall  f1-score   support

      stable     0.9206    0.8469    0.8822       712
    unstable     0.9190    0.9596    0.9389      1288

    accuracy                         0.9195      2000
   macro avg     0.9198    0.9033    0.9105      2000
weighted avg     0.9195    0.9195    0.9187      2000
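
Depending on the installed XGBoost version, `XGBClassifier.fit` may reject string class labels and expect integer labels starting at 0. Should that happen, one option is to map the labels to integers before fitting and to map the predictions back before building the report. The sketch below shows only the mapping, in plain Python; the dictionary `label_to_int` and the sample lists are assumptions for illustration and are not part of the original notebook.

```python
# Hypothetical mapping between class names and integer codes
label_to_int = {"stable": 0, "unstable": 1}
int_to_label = {code: label for label, code in label_to_int.items()}

y_sample = ["unstable", "stable", "unstable", "unstable"]  # illustrative labels

# encode the labels before fitting ...
y_encoded = [label_to_int[label] for label in y_sample]

# ... and decode integer predictions before building the report
pred_encoded = [1, 0, 0, 1]  # stand-in for model output, not real predictions
pred_labels = [int_to_label[code] for code in pred_encoded]

print(y_encoded, pred_labels)
```

Decoding the predictions keeps the class names in the report identical to those of the other models, so the reports stay directly comparable.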



In [26]:
# using the LightGBM classifier
from lightgbm import LGBMClassifier
light = LGBMClassifier(random_state=1)
light.fit(norm_train_df, y_train)
light_pred = light.predict(norm_test_df)

In [27]:
# classification report for the LightGBM classifier
print(classification_report(y_test, light_pred, digits=4))

              precision    recall  f1-score   support

      stable     0.9297    0.8919    0.9104       712
    unstable     0.9415    0.9627    0.9520      1288

    accuracy                         0.9375      2000
   macro avg     0.9356    0.9273    0.9312      2000
weighted avg     0.9373    0.9375    0.9372      2000



In [28]:
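# hyperparameter grid for the randomized search
# note: 'auto' (the same as 'sqrt' for classifiers) is deprecated in newer
# scikit-learn releases; remove it from the list if it raises an error there

n_estimators = [50, 100, 300, 500, 1000]
min_samples_split = [2, 3, 5, 7, 9]
min_samples_leaf = [1, 2, 4, 6, 8]
max_features = ['auto', 'sqrt', 'log2', None]
hyperparameter_grid = {'n_estimators': n_estimators,
                       'min_samples_leaf': min_samples_leaf,
                       'min_samples_split': min_samples_split,
                       'max_features': max_features}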

In [30]:
# randomized search with 5-fold cross-validation
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import ExtraTreesClassifier

tree2 = ExtraTreesClassifier(random_state=1)
clf = RandomizedSearchCV(tree2, hyperparameter_grid, cv=5, n_iter=10, scoring = 'accuracy', n_jobs = -1, verbose = 1, random_state=1)
search_result = clf.fit(norm_train_df, y_train)

Fitting 5 folds for each of 10 candidates, totalling 50 fits
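
The log line matches a quick count of the search space. The sketch below, in plain Python, multiplies out the grid defined above and compares it with the number of candidates the randomized search actually samples (`n_iter=10`, `cv=5`):

```python
import math

# Same grid as in the notebook cell above
hyperparameter_grid = {
    "n_estimators": [50, 100, 300, 500, 1000],
    "min_samples_leaf": [1, 2, 4, 6, 8],
    "min_samples_split": [2, 3, 5, 7, 9],
    "max_features": ["auto", "sqrt", "log2", None],
}

n_iter, cv = 10, 5  # values passed to RandomizedSearchCV

# number of distinct parameter combinations in the full grid
total_combinations = math.prod(len(v) for v in hyperparameter_grid.values())
# number of model fits the randomized search performs
total_fits = n_iter * cv
# share of the grid that is actually evaluated
sampled_fraction = n_iter / total_combinations

print(total_combinations, total_fits, sampled_fraction)
```

Only 10 of the 500 combinations (2%) are tried, so the reported best parameters are the best among the sampled candidates, not necessarily the best over the whole grid.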


In [31]:
# checking the best parameters found by the search
search_result.best_params_

{'n_estimators': 1000,
 'min_samples_split': 2,
 'min_samples_leaf': 8,
 'max_features': None}

In [32]:
# refitting the model with the best parameters to check its performance on the test set
tuned_tree = ExtraTreesClassifier(n_estimators=1000, min_samples_split=2, 
                                 min_samples_leaf=8, max_features=None, random_state=1)
tuned_tree.fit(norm_train_df, y_train)
tuned_tree_pred = tuned_tree.predict(norm_test_df)

In [33]:
# classification report for the tuned extra trees model
print(classification_report(y_test, tuned_tree_pred, digits=4))


              precision    recall  f1-score   support

      stable     0.9211    0.8694    0.8945       712
    unstable     0.9300    0.9589    0.9442      1288

    accuracy                         0.9270      2000
   macro avg     0.9256    0.9141    0.9193      2000
weighted avg     0.9268    0.9270    0.9265      2000
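
To compare the models side by side, the sketch below collects the test-set accuracies printed in the classification reports above and ranks them. The dictionary keys are labels chosen here for readability, not names used in the notebook.

```python
# Test-set accuracy copied from the classification reports above
accuracy = {
    "RandomForest": 0.9290,
    "ExtraTrees (default)": 0.9280,
    "XGBoost": 0.9195,
    "LightGBM": 0.9375,
    "ExtraTrees (tuned)": 0.9270,
}

# rank the models from highest to lowest accuracy
ranked = sorted(accuracy.items(), key=lambda item: item[1], reverse=True)
for name, acc in ranked:
    print(f"{name:22s} {acc:.4f}")

best_model = ranked[0][0]
# change in accuracy from tuning the extra trees classifier
tuning_gain = accuracy["ExtraTrees (tuned)"] - accuracy["ExtraTrees (default)"]
```

On this split, LightGBM has the highest accuracy, and the tuned extra trees model scores 0.001 lower than the default one, which corresponds to 2 of the 2000 test samples.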

