Stability of the Grid System (Classification Project)

Electrical grids are stable only when electricity supply and demand are balanced. Conventional systems achieve this balance through demand-driven electricity production. For future grids with a high share of inflexible (i.e., renewable) energy sources, demand response is a promising alternative: consumers adjust their electricity consumption in response to changes in the electricity price. In this work, we'll build a binary classification model that predicts whether a grid is stable or unstable, using the UCI Electrical Grid Stability Simulated dataset.

Dataset: https://archive.ics.uci.edu/ml/datasets/Electrical+Grid+Stability+Simulated+Data+

The dataset has 12 primary predictive features and two dependent variables.

Predictive features:

1. 'tau1' to 'tau4': the reaction time of each network participant, a real value within the range 0.5 to 10 ('tau1' corresponds to the supplier node, 'tau2' to 'tau4' to the consumer nodes);

2. 'p1' to 'p4': nominal power produced (positive) or consumed (negative) by each network participant, a real value within the range -2.0 to -0.5 for consumers ('p2' to 'p4'). As the total power consumed equals the total power generated, p1 (supplier node) = - (p2 + p3 + p4);

3. 'g1' to 'g4': price elasticity coefficient for each network participant, a real value within the range 0.05 to 1.00 ('g1' corresponds to the supplier node, 'g2' to 'g4' to the consumer nodes; 'g' stands for 'gamma').

Dependent variables:

1. 'stab': the maximum real part of the characteristic differential equation root (if positive, the system is linearly unstable; if negative, linearly stable);

2. 'stabf': a categorical (binary) label ('stable' or 'unstable').
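
The power-balance identity stated for 'p1' can be checked directly on the data. The following is a minimal sketch that uses only the first five rows printed by df.head() later in this notebook; the variable names (rows, expected_p1, balanced) are illustrative and not part of the original notebook. Because the printed values are rounded to six decimals, the comparison uses a small tolerance.

```python
import numpy as np
import pandas as pd

# First five rows of the dataset, copied from the notebook's df.head() output.
rows = pd.DataFrame({
    "p1": [3.763085, 5.067812, 3.405158, 3.963791, 3.525811],
    "p2": [-0.782604, -1.940058, -1.207456, -1.027473, -1.125531],
    "p3": [-1.257395, -1.872742, -1.277210, -1.938944, -1.845975],
    "p4": [-1.723086, -1.255012, -0.920492, -0.997374, -0.554305],
})

# Power balance: the supplier output equals the total consumer demand.
expected_p1 = -(rows["p2"] + rows["p3"] + rows["p4"])

# The printed values are rounded, so compare with an absolute tolerance.
balanced = np.isclose(rows["p1"], expected_p1, atol=1e-5)
print(balanced.all())
```

Since 'p1' is fully determined by 'p2' to 'p4', it carries no independent information, which is worth keeping in mind when reading feature importances.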

In [48]:
import numpy as np
import pandas as pd

In [49]:
df = pd.read_csv('Data_for_UCI_named.csv')

In [50]:
df.head()

,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4,stab,stabf
0,2.95906,3.079885,8.381025,9.780754,3.763085,-0.782604,-1.257395,-1.723086,0.650456,0.859578,0.887445,0.958034,0.055347,unstable
1,9.304097,4.902524,3.047541,1.369357,5.067812,-1.940058,-1.872742,-1.255012,0.413441,0.862414,0.562139,0.78176,-0.005957,stable
2,8.971707,8.848428,3.046479,1.214518,3.405158,-1.207456,-1.27721,-0.920492,0.163041,0.766689,0.839444,0.109853,0.003471,unstable
3,0.716415,7.6696,4.486641,2.340563,3.963791,-1.027473,-1.938944,-0.997374,0.446209,0.976744,0.929381,0.362718,0.028871,unstable
4,3.134112,7.608772,4.943759,9.857573,3.525811,-1.125531,-1.845975,-0.554305,0.79711,0.45545,0.656947,0.820923,0.04986,unstable


In [51]:
df.isna().sum()

tau1     0
tau2     0
tau3     0
tau4     0
p1       0
p2       0
p3       0
p4       0
g1       0
g2       0
g3       0
g4       0
stab     0
stabf    0
dtype: int64

Because 'stabf' is derived directly from 'stab' ('stabf' = 'stable' if 'stab' <= 0, 'unstable' otherwise), keeping 'stab' as a feature would reveal the label to the model. We therefore drop 'stab' and keep 'stabf' as the sole dependent variable (binary classification).
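
The relationship between the two dependent variables can be verified before dropping 'stab'. This is a small sketch on the five rows shown by df.head() above; the names sample, derived and matches are illustrative only.

```python
import numpy as np
import pandas as pd

# 'stab' and 'stabf' for the first five rows, copied from df.head().
sample = pd.DataFrame({
    "stab": [0.055347, -0.005957, 0.003471, 0.028871, 0.049860],
    "stabf": ["unstable", "stable", "unstable", "unstable", "unstable"],
})

# Apply the rule from the text: non-positive 'stab' means a stable grid.
derived = np.where(sample["stab"] <= 0, "stable", "unstable")

# True when the rule reproduces every label in the sample.
matches = bool((derived == sample["stabf"]).all())
print(matches)
```

On the full DataFrame the same comparison could be run against df['stab'] and df['stabf'], provided it is executed before the column is dropped.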

In [52]:
# dropping stab
df.drop('stab', axis = 1, inplace = True)

In [53]:
df['stabf'].value_counts()

unstable    6380
stable      3620
Name: stabf, dtype: int64

In [54]:
df

,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4,stabf
0,2.959060,3.079885,8.381025,9.780754,3.763085,-0.782604,-1.257395,-1.723086,0.650456,0.859578,0.887445,0.958034,unstable
1,9.304097,4.902524,3.047541,1.369357,5.067812,-1.940058,-1.872742,-1.255012,0.413441,0.862414,0.562139,0.781760,stable
2,8.971707,8.848428,3.046479,1.214518,3.405158,-1.207456,-1.277210,-0.920492,0.163041,0.766689,0.839444,0.109853,unstable
3,0.716415,7.669600,4.486641,2.340563,3.963791,-1.027473,-1.938944,-0.997374,0.446209,0.976744,0.929381,0.362718,unstable
4,3.134112,7.608772,4.943759,9.857573,3.525811,-1.125531,-1.845975,-0.554305,0.797110,0.455450,0.656947,0.820923,unstable
...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,2.930406,9.487627,2.376523,6.187797,3.343416,-0.658054,-1.449106,-1.236256,0.601709,0.779642,0.813512,0.608385,unstable
9996,3.392299,1.274827,2.954947,6.894759,4.349512,-1.663661,-0.952437,-1.733414,0.502079,0.567242,0.285880,0.366120,stable
9997,2.364034,2.842030,8.776391,1.008906,4.299976,-1.380719,-0.943884,-1.975373,0.487838,0.986505,0.149286,0.145984,stable
9998,9.631511,3.994398,2.757071,7.821347,2.514755,-0.966330,-0.649915,-0.898510,0.365246,0.587558,0.889118,0.818391,unstable


In [55]:
# feature variable
X = df.drop('stabf', axis = 1)
# target variable
y = df['stabf']

In [15]:
# split the data 80/20 into training and test sets with random_state=1
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=1)

In [19]:
# standardize the features; the scaler is fit on the training set only
# and then applied unchanged to the test set
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaled_X_train = pd.DataFrame(scaler.fit_transform(X_train), columns=X_train.columns)
scaled_X_test = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns)

In [20]:
# helper that fits a model (by default on the scaled training data)
def train_model(model, X=scaled_X_train, y=y_train):
    return model.fit(X, y)  # fit() returns the fitted estimator

In [26]:
y_train.value_counts() 

unstable    5092
stable      2908
Name: stabf, dtype: int64
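
The class counts printed so far are enough to work out the composition of the test set and a simple reference point for the accuracies reported below. The sketch uses only the counts from the two value_counts() outputs; the helper name unstable_share is hypothetical.

```python
# Class counts copied from the value_counts() outputs in this notebook.
full = {"unstable": 6380, "stable": 3620}
train = {"unstable": 5092, "stable": 2908}

# Whatever is not in the training set ends up in the test set.
test = {label: full[label] - train[label] for label in full}


def unstable_share(counts):
    """Fraction of samples labelled 'unstable'."""
    return counts["unstable"] / sum(counts.values())


for name, counts in [("full", full), ("train", train), ("test", test)]:
    print(name, counts, round(unstable_share(counts), 4))
```

The derived test counts (1288 unstable, 712 stable) agree with the supports in the classification report further down. A classifier that always predicted 'unstable' would therefore reach an accuracy of about 0.644 on the test set, which is the baseline the models need to beat.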

In [28]:
#Random Forest classifier
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier(random_state=1)
#fit train set
rfc.fit(scaled_X_train, y_train)
# predict test set
rfc_pred = rfc.predict(scaled_X_test)

# accuracy of test set
from sklearn.metrics import recall_score, accuracy_score, precision_score, f1_score, confusion_matrix
rfc_accuracy = accuracy_score(y_true=y_test, y_pred=rfc_pred)
print(f'Accuracy for Random Forest classifier is : {rfc_accuracy}')

Accuracy for Random Forest classifier is : 0.929


In [29]:
# importing extra tree classifier
from sklearn.ensemble import ExtraTreesClassifier
etc = ExtraTreesClassifier(random_state=1)

#fit train set
etc.fit(scaled_X_train, y_train)

# predict test set
etc_pred = etc.predict(scaled_X_test)

# accuracy of test set
etc_accuracy = accuracy_score(y_true=y_test, y_pred=etc_pred)
print(f'Accuracy for Extra Tree Classifier is {etc_accuracy}')

Accuracy for Extra Tree Classifier is 0.928


In [30]:
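# importing XGBoost classifier
# note: recent xgboost releases expect integer-encoded class labels; if fit()
# rejects the string labels used here, encode y first (e.g. with sklearn's LabelEncoder)
from xgboost import XGBClassifier
xgb = XGBClassifier(random_state=1)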

#fit train set
xgb.fit(scaled_X_train, y_train)

# predict test set
xgb_pred = xgb.predict(scaled_X_test)

# accuracy of test set
xgb_accuracy = accuracy_score(y_true=y_test, y_pred=xgb_pred)

print(f'Accuracy for XGBoost classifier is : {xgb_accuracy}')

Accuracy for XGBoost classifier is : 0.9195


In [39]:
#importing LGBM classifier
from lightgbm import LGBMClassifier
lgbm = LGBMClassifier(random_state=1)

#fit train set
lgbm.fit(scaled_X_train, y_train)

# predict test set
lgbm_pred = lgbm.predict(scaled_X_test)

# accuracy of test set
lgbm_accuracy = accuracy_score(y_true=y_test, y_pred=lgbm_pred)
print(f'Accuracy for LGBM classifier is : {lgbm_accuracy}')

Accuracy for LGBM classifier is : 0.9375


In [32]:
# feature importance function: returns the features sorted by ascending importance
def feature_importance(model, feature, col_name):
    feature_imp = pd.Series(model.feature_importances_, index=feature.columns).sort_values()
    feature_imp_df = pd.DataFrame(feature_imp).reset_index()
    feature_imp_df.columns = ['Features', col_name]
    # values are left unrounded; a bare .round(3) call has no effect unless
    # its result is assigned back to the column
    return feature_imp_df

In [34]:
feature_importance(etc, scaled_X_train, 'Feature Importance')

,Features,Feature Importance
0,p1,0.039507
1,p2,0.040371
2,p4,0.040579
3,p3,0.040706
4,g1,0.089783
5,g2,0.093676
6,g4,0.094019
7,g3,0.096883
8,tau3,0.113169
9,tau4,0.115466


In [37]:
# hyperparameter
# the number of trees in the forest/number of boosting rounds
n_estimators = [50, 100, 300, 500, 1000]
# the minimum number of samples required to split an internal node
min_samples_split = [2, 3, 5, 7, 9]
# the minimum number of samples required to be at a leaf node
min_samples_leaf = [1, 2, 4, 6, 8]
# the number of features to consider when looking for the best split
# note: 'auto' is accepted only by older scikit-learn versions (it was removed in 1.3)
max_features = ['auto', 'sqrt', 'log2', None]
# hyperparameter grid
hyperparameter = {'n_estimators': n_estimators,
                  'min_samples_leaf': min_samples_leaf,
                  'min_samples_split': min_samples_split,
                  'max_features': max_features}
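
To see why a randomized search is used in the next cell rather than an exhaustive one, it helps to count the combinations in this grid. The sketch below repeats the grid from the cell above and assumes the search settings used in the next cell (10 sampled candidates, 5-fold cross-validation); the variable names are illustrative.

```python
from math import prod

# Same grid as in the notebook cell above.
hyperparameter = {
    "n_estimators": [50, 100, 300, 500, 1000],
    "min_samples_leaf": [1, 2, 4, 6, 8],
    "min_samples_split": [2, 3, 5, 7, 9],
    "max_features": ["auto", "sqrt", "log2", None],
}

# Number of distinct parameter combinations in the full grid.
grid_size = prod(len(values) for values in hyperparameter.values())

# Search settings assumed from the randomized search cell.
n_iter, cv = 10, 5
randomized_fits = n_iter * cv      # models fitted by the randomized search
exhaustive_fits = grid_size * cv   # models an exhaustive grid search would fit

print(grid_size, randomized_fits, exhaustive_fits)
```

The randomized search therefore evaluates 10 of the 500 possible combinations, so the reported best parameters are the best of that sample rather than of the whole grid.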

In [56]:
# randomized search CV: sample 10 parameter combinations, each scored with 5-fold CV
from sklearn.model_selection import RandomizedSearchCV
randomized_cv = RandomizedSearchCV(etc, hyperparameter, cv=5, n_iter=10,
                                   scoring='accuracy', n_jobs=-1, verbose=1, random_state=1)
search_param = train_model(randomized_cv)
# best hyperparameters from the randomized search CV
search_param.best_params_

Fitting 5 folds for each of 10 candidates, totalling 50 fits


{'max_features': None,
 'min_samples_leaf': 8,
 'min_samples_split': 2,
 'n_estimators': 1000}
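
The next cell calls a helper named model_accuracy whose definition does not appear in this section. The following is only a guess at what such a helper might look like, inferred from the output it prints; the signature (with explicit X_test and y_test arguments) and the synthetic data from make_classification are assumptions made so that the sketch runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split


def model_accuracy(model, X_test, y_test):
    """Hypothetical helper: print a classification report and the accuracy."""
    y_pred = model.predict(X_test)
    report = classification_report(y_test, y_pred, digits=5)
    print(f"Classification report for {model} is:\n{report}")
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Accuracy is {accuracy}")
    return accuracy


# Synthetic stand-in data (NOT the grid stability dataset), 12 features as in the notebook.
X, y = make_classification(n_samples=500, n_features=12, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

model = ExtraTreesClassifier(random_state=1).fit(X_train, y_train)
accuracy = model_accuracy(model, X_test, y_test)
```

In the notebook the helper is called with the model alone, so the original presumably defaults to the scaled test set and its labels.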

In [46]:
# accuracy of the extra trees classifier (etc) without hyperparameter tuning
# (model_accuracy is a helper whose definition is not shown in this section)
model_accuracy(etc)

Classification report for ExtraTreesClassifier(random_state=1) is:
              precision    recall  f1-score   support

      stable    0.94099   0.85112   0.89381       712
    unstable    0.92183   0.97050   0.94554      1288

    accuracy                        0.92800      2000
   macro avg    0.93141   0.91081   0.91967      2000
weighted avg    0.92865   0.92800   0.92712      2000


Accuracy is 0.928
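
The figures in the report above are tied together by simple arithmetic. As a worked check, the sketch below recovers the number of correctly classified samples per class from the recall and support columns and then recomputes the accuracy and the precision for the 'stable' class. The counts are inferred by rounding and are not printed anywhere in the notebook.

```python
# Support and recall per class, copied from the classification report.
support = {"stable": 712, "unstable": 1288}
recall = {"stable": 0.85112, "unstable": 0.97050}

# recall = correct / support, so the correct counts follow by rounding.
correct = {label: round(recall[label] * support[label]) for label in support}

# Accuracy is the share of all test samples that were classified correctly.
accuracy = sum(correct.values()) / sum(support.values())

# Unstable samples that were wrongly predicted as stable.
unstable_as_stable = support["unstable"] - correct["unstable"]

# Precision for 'stable': correct stable predictions over all stable predictions.
precision_stable = correct["stable"] / (correct["stable"] + unstable_as_stable)

print(correct, accuracy, round(precision_stable, 5))
```

The recomputed values agree with the report: 606 of 712 stable and 1250 of 1288 unstable samples are classified correctly, so the model misses stable grids more often than unstable ones.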


In [47]:
# retrain the extra trees classifier with the best parameters from the randomized search
hypertuned_etc = ExtraTreesClassifier(**search_param.best_params_, random_state=1)
hypertuned_etc.fit(scaled_X_train, y_train)
hypertuned_y_pred = hypertuned_etc.predict(scaled_X_test)
accuracy = round(accuracy_score(y_test, hypertuned_y_pred), 4)
print(f'Accuracy is {accuracy}')

Accuracy is 0.927
