# Stability of Grid System

Instructions for Tag-Along Project: Stability of the Grid System

Electrical grids require a balance between electricity supply and demand in order to be stable. Conventional systems achieve this balance through demand-driven electricity production. For future grids with a high share of inflexible (i.e., renewable) energy sources, the concept of demand response is a promising solution. This implies changes in electricity consumption in relation to electricity price changes. In this work, we’ll build a binary classification model to predict if a grid is stable or unstable using the UCI Electrical Grid Stability Simulated dataset.

Dataset: https://archive.ics.uci.edu/ml/datasets/Electrical+Grid+Stability+Simulated+Data+

The dataset has 12 primary predictive features and two dependent variables.

Predictive features:

- 'tau1' to 'tau4': the reaction time of each network participant, a real value within the range 0.5 to 10 ('tau1' corresponds to the supplier node, 'tau2' to 'tau4' to the consumer nodes);
- 'p1' to 'p4': nominal power produced (positive) or consumed (negative) by each network participant, a real value within the range -2.0 to -0.5 for consumers ('p2' to 'p4'). As the total power consumed equals the total power generated, p1 (supplier node) = - (p2 + p3 + p4);
- 'g1' to 'g4': price elasticity coefficient for each network participant, a real value within the range 0.05 to 1.00 ('g1' corresponds to the supplier node, 'g2' to 'g4' to the consumer nodes; 'g' stands for 'gamma').

Dependent variables:

- 'stab': the maximum real part of the characteristic differential equation root (if positive, the system is linearly unstable; if negative, linearly stable);
- 'stabf': a categorical (binary) label ('stable' or 'unstable').

Because of the direct relationship between 'stab' and 'stabf' ('stabf' = 'stable' if 'stab' <= 0, 'unstable' otherwise), 'stab' should be dropped and 'stabf' will remain as the sole dependent variable (binary classification).
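
The two relationships stated above (the power balance for p1 and the rule linking 'stab' to 'stabf') can be checked on a couple of rows. The sketch below is only an illustration: the small `sample` frame is typed in by hand from the first two rows of the dataset preview, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Two rows copied by hand from the dataset preview (illustrative only).
sample = pd.DataFrame({
    "p1": [3.763085, 5.067812],
    "p2": [-0.782604, -1.940058],
    "p3": [-1.257395, -1.872742],
    "p4": [-1.723086, -1.255012],
    "stab": [0.055347, -0.005957],
    "stabf": ["unstable", "stable"],
})

# Power balance: the supplier output should equal the total consumption.
balance_ok = np.isclose(
    sample["p1"], -(sample["p2"] + sample["p3"] + sample["p4"]), atol=1e-6
)

# Label rule: 'stable' when stab <= 0, otherwise 'unstable'.
derived_label = np.where(sample["stab"] <= 0, "stable", "unstable")
labels_match = bool((derived_label == sample["stabf"]).all())
```

If both checks hold on the full dataset as well, 'stab' carries the label directly, which is why keeping both columns would leak the target into the features.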

Split the data into an 80-20 train-test split with a random state of 1. Use the standard scaler to transform the training features (x_train) and the test features (x_test), fitting it on the training set only; the labels are not scaled. Use scikit-learn to train a random forest classifier and an extra trees classifier, and use xgboost and lightgbm to train an extreme gradient boosting model and a light gradient boosting model. Use random_state = 1 for training all models and evaluate on the test set. Answer the following questions:
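
As a rough sketch of the workflow described above, the snippet below runs the seeded split, scaling, and training steps end to end. It makes two assumptions: a synthetic 12-feature dataset from `make_classification` stands in for the grid data, and only the two scikit-learn ensembles are shown. `XGBClassifier` and `LGBMClassifier` also accept a `random_state` argument and could be added to the dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 12 grid features (assumption: not the UCI data).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=12, n_informative=8, random_state=1
)

# 80-20 split with the seed requested in the instructions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1
)

# Fit the scaler on the training features only, then reuse it for the test set.
scaler_demo = StandardScaler()
X_tr_s = scaler_demo.fit_transform(X_tr)
X_te_s = scaler_demo.transform(X_te)

# Every model receives random_state=1 so repeated runs give the same scores.
seeded_models = {
    "random_forest": RandomForestClassifier(random_state=1),
    "extra_trees": ExtraTreesClassifier(random_state=1),
}
demo_scores = {}
for name, clf in seeded_models.items():
    clf.fit(X_tr_s, y_tr)
    demo_scores[name] = accuracy_score(y_te, clf.predict(X_te_s))
```

Seeding each estimator matters for the tree ensembles in particular, since an unseeded forest can report a slightly different accuracy on every run.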

In [1]:
from urllib.request import urlretrieve
import pandas as pd

In [2]:
url ='https://archive.ics.uci.edu/ml/machine-learning-databases/00471/Data_for_UCI_named.csv'

In [3]:
urlretrieve(url,'Hsd.csv')

In [4]:
df = pd.read_csv('Hsd.csv')
df.head().style.background_gradient()

Unnamed: 0,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4,stab,stabf
0,2.95906,3.079885,8.381025,9.780754,3.763085,-0.782604,-1.257395,-1.723086,0.650456,0.859578,0.887445,0.958034,0.055347,unstable
1,9.304097,4.902524,3.047541,1.369357,5.067812,-1.940058,-1.872742,-1.255012,0.413441,0.862414,0.562139,0.78176,-0.005957,stable
2,8.971707,8.848428,3.046479,1.214518,3.405158,-1.207456,-1.27721,-0.920492,0.163041,0.766689,0.839444,0.109853,0.003471,unstable
3,0.716415,7.6696,4.486641,2.340563,3.963791,-1.027473,-1.938944,-0.997374,0.446209,0.976744,0.929381,0.362718,0.028871,unstable
4,3.134112,7.608772,4.943759,9.857573,3.525811,-1.125531,-1.845975,-0.554305,0.79711,0.45545,0.656947,0.820923,0.04986,unstable


In [5]:
df.shape

(10000, 14)

In [6]:
df.isnull().sum()

tau1     0
tau2     0
tau3     0
tau4     0
p1       0
p2       0
p3       0
p4       0
g1       0
g2       0
g3       0
g4       0
stab     0
stabf    0
dtype: int64

In [7]:
df.describe().style.background_gradient()

Unnamed: 0,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4,stab
count,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0
mean,5.25,5.250001,5.250004,5.249997,3.75,-1.25,-1.25,-1.25,0.525,0.525,0.525,0.525,0.015731
std,2.742548,2.742549,2.742549,2.742556,0.75216,0.433035,0.433035,0.433035,0.274256,0.274255,0.274255,0.274255,0.036919
min,0.500793,0.500141,0.500788,0.500473,1.58259,-1.999891,-1.999945,-1.999926,0.050009,0.050053,0.050054,0.050028,-0.08076
25%,2.874892,2.87514,2.875522,2.87495,3.2183,-1.624901,-1.625025,-1.62496,0.287521,0.287552,0.287514,0.287494,-0.015557
50%,5.250004,5.249981,5.249979,5.249734,3.751025,-1.249966,-1.249974,-1.250007,0.525009,0.525003,0.525015,0.525002,0.017142
75%,7.62469,7.624893,7.624948,7.624838,4.28242,-0.874977,-0.875043,-0.875065,0.762435,0.76249,0.76244,0.762433,0.044878
max,9.999469,9.999837,9.99945,9.999443,5.864418,-0.500108,-0.500072,-0.500025,0.999937,0.999944,0.999982,0.99993,0.109403


In [8]:
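# 'stabf' holds strings, so restrict the correlation to the numeric columns
# (the numeric_only argument requires pandas 1.5 or later).
df.corr(numeric_only=True)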

Unnamed: 0,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4,stab
tau1,1.0,0.015586,-0.00597,-0.017265,0.027183,-0.015485,-0.015924,-0.015807,0.010521,0.01535,-0.001279,0.005494,0.275761
tau2,0.015586,1.0,0.014273,-0.001965,-0.004769,0.006573,0.007673,-0.005963,-0.001742,0.015383,0.016508,-0.011764,0.290975
tau3,-0.00597,0.014273,1.0,0.004354,0.016953,-0.003134,-0.00878,-0.017531,-0.011605,0.007671,0.014702,-0.011497,0.2807
tau4,-0.017265,-0.001965,0.004354,1.0,-0.003173,0.010553,0.006169,-0.011211,-0.004149,0.008431,0.00326,-0.000491,0.278576
p1,0.027183,-0.004769,0.016953,-0.003173,1.0,-0.573157,-0.584554,-0.579239,0.000721,0.015405,0.001069,-0.015451,0.010278
p2,-0.015485,0.006573,-0.003134,0.010553,-0.573157,1.0,0.002388,-0.006844,0.015603,-0.018032,0.007555,0.019817,0.006255
p3,-0.015924,0.007673,-0.00878,0.006169,-0.584554,0.002388,1.0,0.012953,-0.003219,-0.011575,-0.005897,-0.010485,-0.003321
p4,-0.015807,-0.005963,-0.017531,-0.011211,-0.579239,-0.006844,0.012953,1.0,-0.013636,0.00285,-0.003515,0.017505,-0.020786
g1,0.010521,-0.001742,-0.011605,-0.004149,0.000721,0.015603,-0.003219,-0.013636,1.0,0.007559,-0.005836,0.012431,0.282774
g2,0.01535,0.015383,0.007671,0.008431,0.015405,-0.018032,-0.011575,0.00285,0.007559,1.0,-0.012809,-0.014909,0.293601


In [9]:
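# Derive the binary label from the numeric 'stab' value. This overwrites 'stab'
# with the same 'stable'/'unstable' labels that 'stabf' holds, so the redundant
# 'stabf' column is dropped in the next cell and 'stab' becomes the target.
def stat(cols):
    if cols['stab'] <= 0:
        return 'stable'
    else:
        return 'unstable'

df['stab'] = df.apply(stat, axis=1)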

In [10]:
df.drop(columns=['stabf'],inplace=True)

In [11]:
df['stab'].value_counts(normalize=True)

unstable    0.638
stable      0.362
Name: stab, dtype: float64

In [12]:
X =df.drop('stab',axis=1)
y =df['stab']

In [13]:
from sklearn.model_selection import train_test_split,RandomizedSearchCV

In [14]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=1)

In [15]:
from sklearn.preprocessing import StandardScaler

In [16]:
scaler = StandardScaler()

In [17]:
X_scaled_train = scaler.fit_transform(X_train)
X_scaled_test = scaler.transform(X_test)
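
The cell above fits the scaler on the training features and only applies it to the test features. The sketch below illustrates what that means in practice, using random numbers in the 0.5 to 10 range as a hypothetical stand-in for the 'tau' columns: the scaled training set has zero mean and unit variance by construction, while the test set is only approximately centred because it reuses the training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data drawn from the same range as the 'tau' features.
rng = np.random.default_rng(1)
train_demo = rng.uniform(0.5, 10.0, size=(800, 4))
test_demo = rng.uniform(0.5, 10.0, size=(200, 4))

scaler_demo = StandardScaler()
train_scaled = scaler_demo.fit_transform(train_demo)  # learns mean and std here
test_scaled = scaler_demo.transform(test_demo)        # reuses the training mean and std

train_means = train_scaled.mean(axis=0)  # numerically zero
train_stds = train_scaled.std(axis=0)    # numerically one
test_means = test_scaled.mean(axis=0)    # close to zero, but not exactly
```

Fitting on the training set alone keeps information about the test set out of the preprocessing step.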

In [18]:
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score,recall_score,precision_score,classification_report,confusion_matrix

### XGBoost

In [24]:
le = LabelEncoder()
y_train_encod = le.fit_transform(y_train)
y_test_encod = le.transform(y_test)
model = XGBClassifier()
model.fit(X_scaled_train,y_train_encod)
y_pred = model.predict(X_scaled_test)
accuracy_score(y_test_encod,y_pred)
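
The XGBoost cell encodes the string labels as integers before fitting. A minimal sketch of what `LabelEncoder` does with the two class names is shown below; the toy `labels_demo` array is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Toy labels (illustrative only).
labels_demo = np.array(["unstable", "stable", "unstable", "unstable"])

le_demo = LabelEncoder()
encoded_demo = le_demo.fit_transform(labels_demo)

# Classes are sorted alphabetically, so 'stable' maps to 0 and 'unstable' to 1.
class_order = list(le_demo.classes_)

# inverse_transform recovers the original strings from the integer codes.
decoded_demo = le_demo.inverse_transform(encoded_demo)
```

Because the encoder is fitted on the training labels and then applied to the test labels, both sets share the same integer mapping, and predictions can be mapped back to class names with `inverse_transform`.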

0.9455

In [25]:
def create_model(model):
    """Fit `model` on the scaled training set and return its test-set accuracy."""
    model.fit(X_scaled_train, y_train)
    y_pred = model.predict(X_scaled_test)

    return accuracy_score(y_test, y_pred)

### LightGBM model

In [26]:
create_model(LGBMClassifier())

0.9395

### Random Forest Model

In [27]:
create_model(RandomForestClassifier())

0.9255

### Extra Trees Classifier

In [28]:
create_model(ExtraTreesClassifier())

0.927
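
The models above are compared on accuracy only, although `confusion_matrix`, `precision_score`, and `recall_score` were imported earlier. Since the classes are imbalanced (roughly 64% unstable), a per-class view can be informative. The sketch below uses made-up toy predictions, not the notebook's results, to show how these metrics could be read.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Toy labels and predictions (illustrative only, not model output).
y_true_demo = ["unstable"] * 6 + ["stable"] * 4
y_pred_demo = ["unstable"] * 5 + ["stable"] + ["stable"] * 3 + ["unstable"]

class_labels = ["stable", "unstable"]

# Rows are true classes, columns are predicted classes, in class_labels order.
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=class_labels)

accuracy_demo = accuracy_score(y_true_demo, y_pred_demo)
stable_precision = precision_score(y_true_demo, y_pred_demo, pos_label="stable")
stable_recall = recall_score(y_true_demo, y_pred_demo, pos_label="stable")
```

In this toy case the accuracy is 0.8, while precision and recall for the minority 'stable' class are both 0.75, which shows how an overall score can hide weaker performance on one class.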

### Hyperparameter Tuning

In [29]:
etr = ExtraTreesClassifier()

In [32]:
help(ExtraTreesClassifier())

Help on ExtraTreesClassifier in module sklearn.ensemble._forest object:

class ExtraTreesClassifier(ForestClassifier)
 |  ExtraTreesClassifier(n_estimators=100, *, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='sqrt', max_leaf_nodes=None, min_impurity_decrease=0.0, bootstrap=False, oob_score=False, n_jobs=None, random_state=None, verbose=0, warm_start=False, class_weight=None, ccp_alpha=0.0, max_samples=None)
 |  
 |  An extra-trees classifier.
 |  
 |  This class implements a meta estimator that fits a number of
 |  randomized decision trees (a.k.a. extra-trees) on various sub-samples
 |  of the dataset and uses averaging to improve the predictive accuracy
 |  and control over-fitting.
 |  
 |  Read more in the :ref:`User Guide <forest>`.
 |  
 |  Parameters
 |  ----------
 |  n_estimators : int, default=100
 |      The number of trees in the forest.
 |  
 |      .. versionchanged:: 0.22
 |         The default val

In [31]:
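# Note: scikit-learn rejects min_samples_split=1 (integer values must be >= 2), and
# max_features='auto' is deprecated (removed in scikit-learn 1.3), so candidates
# that draw either value may fail to fit, depending on the installed version.
param_distribution = {'n_estimators': [64, 100, 150, 200, 300, 500],
                      'max_features': ['sqrt', 'log2', None, 'auto'],
                      'min_samples_split': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                      'min_samples_leaf': [1, 2, 3, 4, 5, 6, 7, 8, 9]}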

In [33]:
random_model = RandomizedSearchCV(estimator=etr,param_distributions=param_distribution,
                                  cv=10,n_iter=10,n_jobs=-1,scoring='accuracy',verbose=1,random_state=1)

In [34]:
random_model.fit(X_scaled_train,y_train)

Fitting 10 folds for each of 10 candidates, totalling 100 fits
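
For reference, a randomized search of this kind can be sketched in a self-contained way as follows. This is an illustration under stated assumptions, not the notebook's run: the data is synthetic, the search space is a smaller hypothetical one restricted to values that current scikit-learn versions accept, and fewer folds and iterations are used to keep it quick.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the scaled training data (assumption).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=12, n_informative=8, random_state=1
)

# Hypothetical search space using only currently valid values.
param_space_demo = {
    "n_estimators": [50, 100],
    "max_features": ["sqrt", "log2", None],
    "min_samples_split": [2, 3, 5, 7, 9],
    "min_samples_leaf": [1, 2, 4, 6, 8],
}

search_demo = RandomizedSearchCV(
    estimator=ExtraTreesClassifier(random_state=1),
    param_distributions=param_space_demo,
    n_iter=5,          # number of sampled parameter combinations
    cv=3,              # folds per combination
    scoring="accuracy",
    random_state=1,    # makes the sampled combinations reproducible
)
search_demo.fit(X_demo, y_demo)

best_params_demo = search_demo.best_params_     # winning combination
best_cv_accuracy = search_demo.best_score_      # its mean cross-validated accuracy
```

After fitting, `best_params_` and `best_score_` summarise the selected configuration, and `best_estimator_` is the model refitted on the whole training set with those parameters.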


In [35]:
best_model =random_model.best_estimator_

In [36]:
best_model

In [37]:
create_model(best_model)

0.9345

The tuned Extra Trees model is more accurate than the default one: test accuracy rose from 0.927 to 0.9345.

### Feature Importance

In [39]:
best_model.feature_importances_

array([0.12929314, 0.13123506, 0.12672113, 0.12830687, 0.01286634,
       0.01514669, 0.01529447, 0.0144875 , 0.10164368, 0.10700382,
       0.10986452, 0.10813679])

In [40]:
X_train.columns

Index(['tau1', 'tau2', 'tau3', 'tau4', 'p1', 'p2', 'p3', 'p4', 'g1', 'g2',
       'g3', 'g4'],
      dtype='object')

In [45]:
feature_importance_df = pd.DataFrame({'features':X_train.columns,
                                     'coefficient':best_model.feature_importances_}).sort_values('coefficient').set_index('features')

In [46]:
feature_importance_df

features,coefficient
p1,0.012866
p4,0.014487
p2,0.015147
p3,0.015294
g1,0.101644
g2,0.107004
g4,0.108137
g3,0.109865
tau3,0.126721
tau4,0.128307
tau1,0.129293
tau2,0.131235


We can see that p1 is the least important feature (importance of about 0.013), while tau2 is the most important (about 0.131).
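
The ranking can also be read off programmatically. The sketch below copies the importance values from the `feature_importances_` output shown earlier into a pandas Series; the variable names and the grouping by feature family are illustrative additions.

```python
import pandas as pd

feature_names = ["tau1", "tau2", "tau3", "tau4",
                 "p1", "p2", "p3", "p4",
                 "g1", "g2", "g3", "g4"]

# Values copied from the feature_importances_ array printed above.
importances = [0.12929314, 0.13123506, 0.12672113, 0.12830687,
               0.01286634, 0.01514669, 0.01529447, 0.0144875,
               0.10164368, 0.10700382, 0.10986452, 0.10813679]

importance_series = pd.Series(importances, index=feature_names).sort_values(
    ascending=False
)

most_important = importance_series.idxmax()   # feature with the largest importance
least_important = importance_series.idxmin()  # feature with the smallest importance

# Sum importances per feature family by stripping the trailing participant digit.
group_totals = importance_series.groupby(lambda name: name.rstrip("1234")).sum()
```

Grouped this way, the reaction times ('tau') account for roughly 52% of the total importance, the elasticity coefficients ('g') for about 43%, and the nominal powers ('p') for about 6%.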