# HAMOYE STAGE C INTERNSHIP TASK
## Ashinze Emmanuel Chidi
## ID: 14f9fdffde81f000

Stability of the Grid System

Electrical grids require a balance between electricity supply and demand in order to be stable. Conventional systems achieve this balance through demand-driven electricity production. For future grids with a high share of inflexible (i.e., renewable) energy sources, the concept of demand response is a promising solution. This implies changes in electricity consumption in relation to electricity price changes. In this work, we’ll build a binary classification model to predict if a grid is stable or unstable using the UCI Electrical Grid Stability Simulated dataset.

Dataset: https://archive.ics.uci.edu/ml/datasets/Electrical+Grid+Stability+Simulated+Data+

It has 12 primary predictive features and two dependent variables.

Predictive features:

'tau1' to 'tau4': the reaction time of each network participant, a real value within the range 0.5 to 10 ('tau1' corresponds to the supplier node, 'tau2' to 'tau4' to the consumer nodes);
'p1' to 'p4': nominal power produced (positive) or consumed (negative) by each network participant, a real value within the range -2.0 to -0.5 for consumers ('p2' to 'p4'). As the total power consumed equals the total power generated, p1 (supplier node) = - (p2 + p3 + p4);
'g1' to 'g4': price elasticity coefficient for each network participant, a real value within the range 0.05 to 1.00 ('g1' corresponds to the supplier node, 'g2' to 'g4' to the consumer nodes; 'g' stands for 'gamma').

Dependent variables:

'stab': the maximum real part of the characteristic differential equation root (if positive, the system is linearly unstable; if negative, linearly stable);
'stabf': a categorical (binary) label ('stable' or 'unstable').

Because of the direct relationship between 'stab' and 'stabf' ('stabf' = 'stable' if 'stab' <= 0, 'unstable' otherwise), 'stab' should be dropped and 'stabf' will remain as the sole dependent variable (binary classification).
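The two relationships stated above (the power balance for 'p1' and the rule linking 'stab' to 'stabf') can be checked by hand. The sketch below is only an illustration: it uses three rows copied from the `df.head()` output shown later in this notebook, and the helper name `label_from_stab` is hypothetical.

```python
import math

# Three rows copied from the df.head() output of this notebook:
# (p1, p2, p3, p4, stab, stabf)
rows = [
    (3.763085, -0.782604, -1.257395, -1.723086, 0.055347, "unstable"),
    (5.067812, -1.940058, -1.872742, -1.255012, -0.005957, "stable"),
    (3.405158, -1.207456, -1.277210, -0.920492, 0.003471, "unstable"),
]


def label_from_stab(stab):
    """Hypothetical helper: 'stable' if stab <= 0, otherwise 'unstable'."""
    return "stable" if stab <= 0 else "unstable"


checks = []
for p1, p2, p3, p4, stab, stabf in rows:
    # Power balance: the supplier produces what the three consumers consume.
    balanced = math.isclose(p1, -(p2 + p3 + p4), abs_tol=1e-5)
    # Label rule: the sign of 'stab' determines 'stabf'.
    label_ok = label_from_stab(stab) == stabf
    checks.append((balanced, label_ok))

print(checks)
```

Since 'stabf' is fully determined by 'stab', keeping 'stab' as a feature would leak the label into the model, which is why it is dropped.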

Split the data into an 80-20 train-test split with random_state = 1. Fit the standard scaler on the training features (x_train) and use it to transform both x_train and x_test; the labels (y_train, y_test) are not scaled. Use scikit-learn to train a random forest and an extra trees classifier, and use xgboost and lightgbm to train an extreme gradient boosting model and a light gradient boosting model. Use random_state = 1 when training all models, and evaluate them on the test set.

In [1]:
# import basic libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
#read the dataset
df = pd.read_csv('/content/sample_data/Data_for_UCI_named.csv')
df.head()

Unnamed: 0,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4,stab,stabf
0,2.95906,3.079885,8.381025,9.780754,3.763085,-0.782604,-1.257395,-1.723086,0.650456,0.859578,0.887445,0.958034,0.055347,unstable
1,9.304097,4.902524,3.047541,1.369357,5.067812,-1.940058,-1.872742,-1.255012,0.413441,0.862414,0.562139,0.78176,-0.005957,stable
2,8.971707,8.848428,3.046479,1.214518,3.405158,-1.207456,-1.27721,-0.920492,0.163041,0.766689,0.839444,0.109853,0.003471,unstable
3,0.716415,7.6696,4.486641,2.340563,3.963791,-1.027473,-1.938944,-0.997374,0.446209,0.976744,0.929381,0.362718,0.028871,unstable
4,3.134112,7.608772,4.943759,9.857573,3.525811,-1.125531,-1.845975,-0.554305,0.79711,0.45545,0.656947,0.820923,0.04986,unstable


In [3]:
#check info of the data set
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   tau1    10000 non-null  float64
 1   tau2    10000 non-null  float64
 2   tau3    10000 non-null  float64
 3   tau4    10000 non-null  float64
 4   p1      10000 non-null  float64
 5   p2      10000 non-null  float64
 6   p3      10000 non-null  float64
 7   p4      10000 non-null  float64
 8   g1      10000 non-null  float64
 9   g2      10000 non-null  float64
 10  g3      10000 non-null  float64
 11  g4      10000 non-null  float64
 12  stab    10000 non-null  float64
 13  stabf   10000 non-null  object 
dtypes: float64(13), object(1)
memory usage: 1.1+ MB


In [4]:
#descriptive statistics
df.describe(include='all')

Unnamed: 0,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4,stab,stabf
count,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000
unique,,,,,,,,,,,,,,2
top,,,,,,,,,,,,,,unstable
freq,,,,,,,,,,,,,,6380
mean,5.25,5.250001,5.250004,5.249997,3.75,-1.25,-1.25,-1.25,0.525,0.525,0.525,0.525,0.015731,
std,2.742548,2.742549,2.742549,2.742556,0.75216,0.433035,0.433035,0.433035,0.274256,0.274255,0.274255,0.274255,0.036919,
min,0.500793,0.500141,0.500788,0.500473,1.58259,-1.999891,-1.999945,-1.999926,0.050009,0.050053,0.050054,0.050028,-0.08076,
25%,2.874892,2.87514,2.875522,2.87495,3.2183,-1.624901,-1.625025,-1.62496,0.287521,0.287552,0.287514,0.287494,-0.015557,
50%,5.250004,5.249981,5.249979,5.249734,3.751025,-1.249966,-1.249974,-1.250007,0.525009,0.525003,0.525015,0.525002,0.017142,
75%,7.62469,7.624893,7.624948,7.624838,4.28242,-0.874977,-0.875043,-0.875065,0.762435,0.76249,0.76244,0.762433,0.044878,


In [5]:
#check for missing values
df.isna().sum()

tau1     0
tau2     0
tau3     0
tau4     0
p1       0
p2       0
p3       0
p4       0
g1       0
g2       0
g3       0
g4       0
stab     0
stabf    0
dtype: int64

In [6]:
#check duplicates
df.duplicated().sum()

0

In [7]:
#drop 'stab' 
df = df.drop('stab', axis =1)

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 13 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   tau1    10000 non-null  float64
 1   tau2    10000 non-null  float64
 2   tau3    10000 non-null  float64
 3   tau4    10000 non-null  float64
 4   p1      10000 non-null  float64
 5   p2      10000 non-null  float64
 6   p3      10000 non-null  float64
 7   p4      10000 non-null  float64
 8   g1      10000 non-null  float64
 9   g2      10000 non-null  float64
 10  g3      10000 non-null  float64
 11  g4      10000 non-null  float64
 12  stabf   10000 non-null  object 
dtypes: float64(12), object(1)
memory usage: 1015.8+ KB


In [9]:
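# check correlation between the numeric features
# (numeric_only=True skips the string column 'stabf'; recent pandas versions raise an error without it)
df.corr(numeric_only=True)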

Unnamed: 0,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4
tau1,1.0,0.015586,-0.00597,-0.017265,0.027183,-0.015485,-0.015924,-0.015807,0.010521,0.01535,-0.001279,0.005494
tau2,0.015586,1.0,0.014273,-0.001965,-0.004769,0.006573,0.007673,-0.005963,-0.001742,0.015383,0.016508,-0.011764
tau3,-0.00597,0.014273,1.0,0.004354,0.016953,-0.003134,-0.00878,-0.017531,-0.011605,0.007671,0.014702,-0.011497
tau4,-0.017265,-0.001965,0.004354,1.0,-0.003173,0.010553,0.006169,-0.011211,-0.004149,0.008431,0.00326,-0.000491
p1,0.027183,-0.004769,0.016953,-0.003173,1.0,-0.573157,-0.584554,-0.579239,0.000721,0.015405,0.001069,-0.015451
p2,-0.015485,0.006573,-0.003134,0.010553,-0.573157,1.0,0.002388,-0.006844,0.015603,-0.018032,0.007555,0.019817
p3,-0.015924,0.007673,-0.00878,0.006169,-0.584554,0.002388,1.0,0.012953,-0.003219,-0.011575,-0.005897,-0.010485
p4,-0.015807,-0.005963,-0.017531,-0.011211,-0.579239,-0.006844,0.012953,1.0,-0.013636,0.00285,-0.003515,0.017505
g1,0.010521,-0.001742,-0.011605,-0.004149,0.000721,0.015603,-0.003219,-0.013636,1.0,0.007559,-0.005836,0.012431
g2,0.01535,0.015383,0.007671,0.008431,0.015405,-0.018032,-0.011575,0.00285,0.007559,1.0,-0.012809,-0.014909


In [10]:
# select the independent (x) and dependent (y) variables
x = df.drop(columns=['stabf'])

y = df['stabf']

In [11]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

In [12]:
#select training and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)

print('X_train shape: {}'.format(x_train.shape))
print('y_train shape: {}'.format(y_train.shape))
print('X_test shape: {}'.format(x_test.shape))
print('y_test shape: {}'.format(y_test.shape))

X_train shape: (8000, 12)
y_train shape: (8000,)
X_test shape: (2000, 12)
y_test shape: (2000,)


In [13]:
#scale the dataset and separate into dataframes
scaler = StandardScaler()
scaler.fit(x_train)

x_train_scaled = scaler.transform(x_train)
x_test_scaled = scaler.transform(x_test)
x_train_scaled = pd.DataFrame(x_train_scaled, columns = x_train.columns)
x_test_scaled = pd.DataFrame(x_test_scaled, columns = x_test.columns)
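The scaler is fitted on the training features only and then applied to both sets, so no information from the test set influences the transformation. As a rough illustration of what that fit/transform split amounts to, here is a minimal pure-Python sketch on one made-up feature (the numbers are hypothetical and not taken from the dataset). It standardizes with the population standard deviation, which is what `StandardScaler` uses.

```python
import statistics

# Hypothetical one-feature example; the values are made up for illustration.
train = [1.0, 2.0, 3.0, 4.0]
test = [2.5, 5.0]

# "fit": learn the mean and population standard deviation from the training data only.
mean = statistics.fmean(train)
std = statistics.pstdev(train)

# "transform": apply the same training statistics to both sets.
train_scaled = [(v - mean) / std for v in train]
test_scaled = [(v - mean) / std for v in test]

print(train_scaled)
print(test_scaled)
```

The scaled training values have mean 0 and unit standard deviation by construction; the scaled test values generally do not, because they are expressed relative to the training statistics.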


# Random Forest

In [14]:
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(random_state = 1)
rf.fit(x_train_scaled, y_train)
rf_pred = rf.predict(x_test_scaled)

In [15]:

from sklearn.metrics import accuracy_score,recall_score, precision_score, f1_score, confusion_matrix, classification_report

accuracy = accuracy_score(y_test, rf_pred)
print('Accuracy: {}'.format(round(accuracy,5)))

precision = precision_score(y_test, rf_pred, pos_label='stable')
print('Precision: {}'.format(precision))  

recall = recall_score(y_test, rf_pred, pos_label='stable')
print('Recall: {}'.format(recall))

f1 = f1_score(y_test, rf_pred, pos_label='stable')
print('F1: {}'.format(f1))

print('Classification Report:\n', classification_report(y_test,rf_pred,digits =5))

confusion_rf = confusion_matrix(y_test, rf_pred)
print('Confusion Matrix:\n', confusion_rf)

Accuracy: 0.929
Precision: 0.9191176470588235
Recall: 0.8778089887640449
F1: 0.8979885057471264
Classification Report:
               precision    recall  f1-score   support

      stable    0.91912   0.87781   0.89799       712
    unstable    0.93409   0.95730   0.94555      1288

    accuracy                        0.92900      2000
   macro avg    0.92660   0.91755   0.92177      2000
weighted avg    0.92876   0.92900   0.92862      2000

Confusion Matrix:
 [[ 625   87]
 [  55 1233]]
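The printed metrics can be recovered from the confusion matrix alone. In scikit-learn's layout, rows are true classes and columns are predicted classes, with labels in sorted order ('stable', then 'unstable'). The sketch below treats 'stable' as the positive class, as in the cell above; the variable names are illustrative.

```python
# Confusion matrix printed above: rows = true class, columns = predicted class,
# label order = ['stable', 'unstable'].
matrix = [[625, 87],
          [55, 1233]]

tp = matrix[0][0]  # true stable, predicted stable
fn = matrix[0][1]  # true stable, predicted unstable
fp = matrix[1][0]  # true unstable, predicted stable
tn = matrix[1][1]  # true unstable, predicted unstable

accuracy = (tp + tn) / (tp + fn + fp + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 5), round(precision, 5), round(recall, 5), round(f1, 5))
```

These values agree with the accuracy, precision, recall and F1 reported for the 'stable' class.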


In [16]:
print("Training set score: {:.3f}".format(rf.score(x_train_scaled, y_train)))
print("Test set score: {:.3f}".format(rf.score(x_test_scaled, y_test)))

Training set score: 1.000
Test set score: 0.929


# Extra Trees Classifier

In [17]:
from sklearn.ensemble import ExtraTreesClassifier

etc = ExtraTreesClassifier(random_state = 1)
etc.fit(x_train_scaled, y_train)
etc_pred = etc.predict(x_test_scaled)

In [18]:
accuracy = accuracy_score(y_test, etc_pred)
print('Accuracy: {}'.format(accuracy))

precision = precision_score(y_test, etc_pred, pos_label='stable')
print('Precision: {}'.format(precision))  

recall = recall_score(y_test, etc_pred, pos_label='stable')
print('Recall: {}'.format(recall))

f1 = f1_score(y_test, etc_pred, pos_label='stable')
print('F1: {}'.format(f1))

print('Classification Report:\n', classification_report(y_test,etc_pred,digits =5))

# use a separate name so the random forest's confusion matrix is not overwritten
confusion_etc = confusion_matrix(y_test, etc_pred)
print('Confusion Matrix:\n', confusion_etc)

Accuracy: 0.928
Precision: 0.9409937888198758
Recall: 0.851123595505618
F1: 0.8938053097345133
Classification Report:
               precision    recall  f1-score   support

      stable    0.94099   0.85112   0.89381       712
    unstable    0.92183   0.97050   0.94554      1288

    accuracy                        0.92800      2000
   macro avg    0.93141   0.91081   0.91967      2000
weighted avg    0.92865   0.92800   0.92712      2000

Confusion Matrix:
 [[ 606  106]
 [  38 1250]]


# Improve ETC

In [19]:
# candidate values for the randomized search
n_estimators = [50, 100, 300, 500, 1000]
min_samples_split = [2, 3, 5, 7, 9]
min_samples_leaf = [1, 2, 4, 6, 8]
# note: newer scikit-learn releases no longer accept 'auto'; drop it there
max_features = ['auto', 'sqrt', 'log2', None]

hyperparameter_grid = {'n_estimators': n_estimators,
                       'min_samples_leaf': min_samples_leaf,
                       'min_samples_split': min_samples_split,
                       'max_features': max_features}

In [20]:
from sklearn.model_selection import RandomizedSearchCV

In [21]:
randomcv = RandomizedSearchCV(estimator = etc, 
                              param_distributions = hyperparameter_grid, cv=5, n_iter=10, scoring = 'accuracy', n_jobs = -1, verbose = 1,
                              random_state = 1)

In [22]:
search = randomcv.fit(x_train_scaled, y_train)

Fitting 5 folds for each of 10 candidates, totalling 50 fits


In [23]:
search.best_params_

{'n_estimators': 1000,
 'min_samples_split': 2,
 'min_samples_leaf': 8,
 'max_features': None}

In [24]:
etc2 = ExtraTreesClassifier(max_features = None, 
                            min_samples_leaf= 8,
                            min_samples_split= 2,
                            n_estimators= 1000, 
                            random_state = 1)

etc2.fit(x_train_scaled, y_train)
etc2_pred = etc2.predict(x_test_scaled)

In [25]:
print('Classification Report:\n', classification_report(y_test,etc2_pred, digits =4))

Classification Report:
               precision    recall  f1-score   support

      stable     0.9211    0.8694    0.8945       712
    unstable     0.9300    0.9589    0.9442      1288

    accuracy                         0.9270      2000
   macro avg     0.9256    0.9141    0.9193      2000
weighted avg     0.9268    0.9270    0.9265      2000



In [26]:
importance = etc2.feature_importances_

In [27]:
for i, v in enumerate(importance):
    print('Feature: %0d, Score: %.5f' % (i, v))

Feature: 0, Score: 0.13724
Feature: 1, Score: 0.14051
Feature: 2, Score: 0.13468
Feature: 3, Score: 0.13542
Feature: 4, Score: 0.00368
Feature: 5, Score: 0.00534
Feature: 6, Score: 0.00543
Feature: 7, Score: 0.00496
Feature: 8, Score: 0.10256
Feature: 9, Score: 0.10758
Feature: 10, Score: 0.11306
Feature: 11, Score: 0.10954
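The feature indices follow the column order of `x_train_scaled` (tau1 to tau4, p1 to p4, g1 to g4). As a small sketch, the printed scores can be paired with those names and summed per feature group; the scores below are copied from the output above, and the grouping by name prefix is just one convenient way to summarize them.

```python
# Column order of x_train_scaled and the importance scores printed above.
feature_names = ["tau1", "tau2", "tau3", "tau4",
                 "p1", "p2", "p3", "p4",
                 "g1", "g2", "g3", "g4"]
scores = [0.13724, 0.14051, 0.13468, 0.13542,
          0.00368, 0.00534, 0.00543, 0.00496,
          0.10256, 0.10758, 0.11306, 0.10954]

by_name = dict(zip(feature_names, scores))

# Sum the importances per group by stripping the trailing digit from each name.
group_totals = {}
for name, score in by_name.items():
    group = name.rstrip("0123456789")
    group_totals[group] = group_totals.get(group, 0.0) + score

for group, total in sorted(group_totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{group}: {total:.5f}")
```

Read this way, the reaction times ('tau') and price elasticity coefficients ('g') carry almost all of the importance in this model, while the nominal power features ('p') contribute about 2% in total.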


In [28]:
x_train_scaled.head()

Unnamed: 0,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4
0,0.367327,-0.986042,0.650447,1.547527,-0.29149,0.061535,1.293862,-0.845074,0.160918,0.339859,0.585568,0.492239
1,-0.064659,0.089437,1.035079,-1.641494,0.619865,-0.067235,-1.502925,0.486613,-0.293143,-1.558488,1.429649,-1.443521
2,-1.46785,1.298418,-0.502536,1.166046,-0.180521,0.490603,0.68256,-0.855302,1.39935,1.451534,-1.045743,0.492489
3,0.820081,0.52992,1.299657,-1.141975,-0.812854,-0.763632,1.521579,0.65878,-0.958319,1.361958,1.60414,0.275303
4,0.665424,-1.425627,0.3123,0.919137,-1.614296,0.760315,1.422019,0.639243,1.676895,0.69566,1.137504,-1.312575


# XGBoost

In [29]:
from xgboost import XGBClassifier

xgb= XGBClassifier(random_state = 1)

xgb.fit(x_train_scaled, y_train)
xgb_pred = xgb.predict(x_test_scaled)
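The cell above passes the string labels straight to XGBoost, which worked in the environment used for this run. Recent XGBoost releases may reject string class labels and expect integer codes instead. A minimal sketch of an explicit encoding follows; the mapping and the names `label_to_int` and `int_to_label` are assumptions for illustration, not part of the original notebook.

```python
# Assumed mapping; any consistent assignment of 0 and 1 would do.
label_to_int = {"stable": 0, "unstable": 1}
int_to_label = {code: label for label, code in label_to_int.items()}

# Stand-in for y_train; the real labels are a pandas Series of strings.
y_example = ["unstable", "stable", "unstable"]

# Encode before fitting, then decode predictions back to strings.
y_encoded = [label_to_int[label] for label in y_example]
y_decoded = [int_to_label[code] for code in y_encoded]

print(y_encoded)
print(y_decoded)
```

With the real data, `y_train.map(label_to_int)` would perform the same encoding on the pandas Series, and predictions would be mapped back through `int_to_label` before computing metrics with `pos_label='stable'`.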

In [30]:
accuracy = accuracy_score(y_test, xgb_pred)
print('Accuracy: {}'.format(accuracy))

precision = precision_score(y_test, xgb_pred, pos_label='stable')
print('Precision: {}'.format(precision))  

recall = recall_score(y_test, xgb_pred, pos_label='stable')
print('Recall: {}'.format(recall))

f1 = f1_score(y_test, xgb_pred, pos_label='stable')
print('F1: {}'.format(f1))

print('Classification Report:\n', classification_report(y_test,xgb_pred,digits =5))

confusion = confusion_matrix(y_test, xgb_pred)
print('Confusion Matrix:\n', confusion)

Accuracy: 0.9195
Precision: 0.9206106870229007
Recall: 0.8469101123595506
F1: 0.8822238478419898
Classification Report:
               precision    recall  f1-score   support

      stable    0.92061   0.84691   0.88222       712
    unstable    0.91896   0.95963   0.93885      1288

    accuracy                        0.91950      2000
   macro avg    0.91978   0.90327   0.91054      2000
weighted avg    0.91955   0.91950   0.91869      2000

Confusion Matrix:
 [[ 603  109]
 [  52 1236]]


# LightGBM Classifier

In [31]:
from lightgbm import LGBMClassifier

lgbm= LGBMClassifier(random_state = 1)

lgbm.fit(x_train_scaled, y_train)
lgbm_pred = lgbm.predict(x_test_scaled)

In [32]:
accuracy = accuracy_score(y_test, lgbm_pred)
print('Accuracy: {}'.format(accuracy))

precision = precision_score(y_test, lgbm_pred, pos_label='stable')
print('Precision: {}'.format(precision))  

recall = recall_score(y_test, lgbm_pred, pos_label='stable')
print('Recall: {}'.format(recall))

f1 = f1_score(y_test, lgbm_pred, pos_label='stable')
print('F1: {}'.format(f1))

print('Classification Report:\n', classification_report(y_test,lgbm_pred,digits =5))

confusion = confusion_matrix(y_test, lgbm_pred)
print('Confusion Matrix:\n', confusion)

Accuracy: 0.9375
Precision: 0.9297218155197657
Recall: 0.8918539325842697
F1: 0.910394265232975
Classification Report:
               precision    recall  f1-score   support

      stable    0.92972   0.89185   0.91039       712
    unstable    0.94153   0.96273   0.95202      1288

    accuracy                        0.93750      2000
   macro avg    0.93563   0.92729   0.93120      2000
weighted avg    0.93733   0.93750   0.93720      2000

Confusion Matrix:
 [[ 635   77]
 [  48 1240]]
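
# Summary

To compare the models side by side, the test-set figures printed in the sections above can be collected in one place. This is only a sketch: the numbers are copied from the outputs above (accuracy and recall for the 'stable' class), and the dictionary layout is an arbitrary choice.

```python
# (test accuracy, recall for the 'stable' class), copied from the outputs above.
results = {
    "Random forest": (0.929, 0.87781),
    "Extra trees": (0.928, 0.85112),
    "Extra trees (tuned)": (0.927, 0.8694),
    "XGBoost": (0.9195, 0.84691),
    "LightGBM": (0.9375, 0.89185),
}

# Rank the models by test accuracy, highest first.
ranking = sorted(results.items(), key=lambda item: item[1][0], reverse=True)

for name, (accuracy, stable_recall) in ranking:
    print(f"{name:<20} accuracy={accuracy:.4f}  stable recall={stable_recall:.4f}")

best_model = ranking[0][0]
```

On this split, LightGBM has both the highest accuracy and the highest recall for the 'stable' class, and the tuned extra trees model scores slightly below the default one.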
