Predictive features:

* 'tau1' to 'tau4': the reaction time of each network participant, a real value within the range 0.5 to 10 ('tau1' corresponds to the supplier node, 'tau2' to 'tau4' to the consumer nodes);
* 'p1' to 'p4': nominal power produced (positive) or consumed (negative) by each network participant, a real value within the range -2.0 to -0.5 for consumers ('p2' to 'p4'). As the total power consumed equals the total power generated, p1 (supplier node) = - (p2 + p3 + p4);
* 'g1' to 'g4': price elasticity coefficient for each network participant, a real value within the range 0.05 to 1.00 ('g1' corresponds to the supplier node, 'g2' to 'g4' to the consumer nodes; 'g' stands for 'gamma');

Dependent variables:

* 'stab': the maximum real part of the characteristic differential equation root (if positive, the system is linearly unstable; if negative, linearly stable);
* 'stabf': a categorical (binary) label ('stable' or 'unstable').

* Because of the direct relationship between 'stab' and 'stabf' ('stabf' = 'stable' if 'stab' <= 0, 'unstable' otherwise), 'stab' should be dropped and 'stabf' will remain as the sole dependent variable (binary classification).

* Split the data into an 80-20 train-test split with a random state of “1”. Use the standard scaler to transform the train set (x_train, y_train) and the test set (x_test). Use scikit learn to train a random forest and extra trees classifier. And use xgboost and lightgbm to train an extreme boosting model and a light gradient boosting model. Use random_state = 1 for training all models and evaluate on the test set

In [1]:
#import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
#load dataset
df = pd.read_csv('grid_stability.csv')

In [3]:
#check first few rows
df.head()

Unnamed: 0,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4,stab,stabf
0,2.95906,3.079885,8.381025,9.780754,3.763085,-0.782604,-1.257395,-1.723086,0.650456,0.859578,0.887445,0.958034,0.055347,unstable
1,9.304097,4.902524,3.047541,1.369357,5.067812,-1.940058,-1.872742,-1.255012,0.413441,0.862414,0.562139,0.78176,-0.005957,stable
2,8.971707,8.848428,3.046479,1.214518,3.405158,-1.207456,-1.27721,-0.920492,0.163041,0.766689,0.839444,0.109853,0.003471,unstable
3,0.716415,7.6696,4.486641,2.340563,3.963791,-1.027473,-1.938944,-0.997374,0.446209,0.976744,0.929381,0.362718,0.028871,unstable
4,3.134112,7.608772,4.943759,9.857573,3.525811,-1.125531,-1.845975,-0.554305,0.79711,0.45545,0.656947,0.820923,0.04986,unstable


In [4]:
#dropping stab column to avoid multicolinearity
df = df.drop('stab',axis=1)

In [5]:
#checking new dataframe features
df.head()

Unnamed: 0,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4,stabf
0,2.95906,3.079885,8.381025,9.780754,3.763085,-0.782604,-1.257395,-1.723086,0.650456,0.859578,0.887445,0.958034,unstable
1,9.304097,4.902524,3.047541,1.369357,5.067812,-1.940058,-1.872742,-1.255012,0.413441,0.862414,0.562139,0.78176,stable
2,8.971707,8.848428,3.046479,1.214518,3.405158,-1.207456,-1.27721,-0.920492,0.163041,0.766689,0.839444,0.109853,unstable
3,0.716415,7.6696,4.486641,2.340563,3.963791,-1.027473,-1.938944,-0.997374,0.446209,0.976744,0.929381,0.362718,unstable
4,3.134112,7.608772,4.943759,9.857573,3.525811,-1.125531,-1.845975,-0.554305,0.79711,0.45545,0.656947,0.820923,unstable


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 13 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   tau1    10000 non-null  float64
 1   tau2    10000 non-null  float64
 2   tau3    10000 non-null  float64
 3   tau4    10000 non-null  float64
 4   p1      10000 non-null  float64
 5   p2      10000 non-null  float64
 6   p3      10000 non-null  float64
 7   p4      10000 non-null  float64
 8   g1      10000 non-null  float64
 9   g2      10000 non-null  float64
 10  g3      10000 non-null  float64
 11  g4      10000 non-null  float64
 12  stabf   10000 non-null  object 
dtypes: float64(12), object(1)
memory usage: 1015.8+ KB


In [7]:
#check for missing values
df.isnull().any()

tau1     False
tau2     False
tau3     False
tau4     False
p1       False
p2       False
p3       False
p4       False
g1       False
g2       False
g3       False
g4       False
stabf    False
dtype: bool

In [8]:
from sklearn.model_selection import train_test_split

In [9]:
X = df.drop('stabf',axis=1)
y= df['stabf']

In [10]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

In [11]:
from sklearn.preprocessing import StandardScaler

In [12]:
scaler = StandardScaler()

In [13]:
#encoding target variable
encoded_ytrain = pd.get_dummies(pd.DataFrame(y_train))

In [14]:
#encoding target variable
encoded_ytest = pd.get_dummies(pd.DataFrame(y_test))

In [15]:
#scaling features
scaled_xtrain = scaler.fit_transform(X_train)
scaled_xtest = scaler.fit_transform(X_test)

In [16]:
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

In [17]:
rfc = RandomForestClassifier(n_estimators = 100,random_state=1)
rfc.fit(scaled_xtrain,encoded_ytrain)

In [18]:
y_pred = rfc.predict(scaled_xtest)

In [19]:
from sklearn.metrics import accuracy_score

In [20]:
#accuracy of random forest model
accuracy = accuracy_score(encoded_ytest,y_pred)
print(accuracy.round(4))

0.924


In [21]:
etc = ExtraTreesClassifier(n_estimators=100, random_state=1)
etc.fit(scaled_xtrain,encoded_ytrain)

In [22]:
y_pred = etc.predict(scaled_xtest)

In [23]:
#accuracy of extra tree classifier model
accuracy = accuracy_score(encoded_ytest,y_pred)
print(accuracy)

0.9205


In [24]:
import xgboost
from xgboost import XGBClassifier
xgb = XGBClassifier(n_estimators=100, learning_rate=0.1, random_state=1)
xgb.fit(scaled_xtrain, encoded_ytrain)

In [25]:
y_pred = xgb.predict(scaled_xtest)

In [26]:
#accuracy of xgboost model
accuracy = accuracy_score(encoded_ytest,y_pred)
print(accuracy.round(4))

0.9395


In [27]:
import lightgbm
from lightgbm import LGBMClassifier

In [28]:
lgb = LGBMClassifier(n_estimators=100, learning_rate=0.1, random_state=1)

In [29]:
encoded_ytrain.head()

Unnamed: 0,stabf_stable,stabf_unstable
2694,0,1
5140,0,1
2568,0,1
3671,0,1
7427,0,1


In [30]:
import warnings

# ignore all warnings
warnings.filterwarnings("ignore")

lgb.fit(scaled_xtrain,encoded_ytrain.drop('stabf_stable',axis=1))

In [31]:
y_pred = lgb.predict(scaled_xtest)

In [32]:
#accuracy of light gradient boosting model
accuracy = accuracy_score(encoded_ytest.drop('stabf_stable',axis=1),y_pred)
print(accuracy.round(4))

0.9365


In [33]:
from sklearn.model_selection import RandomizedSearchCV

In [34]:
param_grid = {
    'n_estimators': [100, 300, 500,1000],
    'min_samples_split': [2, 5, 7],
    'min_samples_leaf': [4, 6, 8],
    'max_features': ['auto', 'log2']
}

In [35]:
etc = ExtraTreesClassifier()

In [36]:
#hyperparameter tuning of extra tree classifier model
rs = RandomizedSearchCV(estimator= etc, param_distributions=param_grid,
                                   cv=5,n_iter=10, scoring = 'accuracy',
                                    n_jobs = -1, verbose = 1, random_state = 1)

In [37]:
rs.fit(scaled_xtrain,encoded_ytrain)

Fitting 5 folds for each of 10 candidates, totalling 50 fits


In [38]:
print('Best Parameters:', rs.best_params_)

Best Parameters: {'n_estimators': 1000, 'min_samples_split': 2, 'min_samples_leaf': 4, 'max_features': 'log2'}


In [39]:
etc = ExtraTreesClassifier(n_estimators=1000, min_samples_split= 2, min_samples_leaf=4, max_features='log2',random_state = 1)

In [40]:
etc.fit(scaled_xtrain,encoded_ytrain)

In [41]:
y_pred = etc.predict(scaled_xtest)

In [42]:
#accuracy of optimal extra tree classifier 
accuracy = accuracy_score(encoded_ytest,y_pred)
print(accuracy)
#The accuracy of the improved extra tree classifier is higher

0.9255


In [44]:
# Obtain the feature importance scores
importances = etc.feature_importances_
print(importances)

[0.13511233 0.13845208 0.13066988 0.13268879 0.0142295  0.01780055
 0.01769912 0.01758956 0.09208005 0.1001093  0.10393372 0.09963512]


In [49]:
importances_df = pd.DataFrame(importances,index=X_train.columns,columns=['Feature_importance'])
importances_df


Unnamed: 0,Feature_importance
tau1,0.135112
tau2,0.138452
tau3,0.13067
tau4,0.132689
p1,0.014229
p2,0.017801
p3,0.017699
p4,0.01759
g1,0.09208
g2,0.100109


* The most important feature is tau2
* The least important feature is p1