# Instructions for Tag-Along Project


## Stability of the Grid System

Electrical grids are stable only when electricity supply and demand are balanced. Conventional systems achieve this balance through demand-driven electricity production. For future grids with a high share of inflexible (i.e., renewable) energy sources, demand response is a promising alternative: consumers change their electricity consumption in response to changes in the electricity price. In this work, we’ll build a binary classification model to predict whether a grid is stable or unstable, using the UCI Electrical Grid Stability Simulated dataset.

Dataset: https://archive.ics.uci.edu/ml/datasets/Electrical+Grid+Stability+Simulated+Data+

It has 12 primary predictive features and two dependent variables.

## Predictive features:

- 'tau1' to 'tau4': the reaction time of each network participant, a real value within the range 0.5 to 10 ('tau1' corresponds to the supplier node, 'tau2' to 'tau4' to the consumer nodes);
- 'p1' to 'p4': nominal power produced (positive) or consumed (negative) by each network participant, a real value within the range -2.0 to -0.5 for consumers ('p2' to 'p4'). As the total power consumed equals the total power generated, p1 (supplier node) = -(p2 + p3 + p4);
- 'g1' to 'g4': price elasticity coefficient for each network participant, a real value within the range 0.05 to 1.00 ('g1' corresponds to the supplier node, 'g2' to 'g4' to the consumer nodes; 'g' stands for 'gamma').

## Dependent variables:

- 'stab': the maximum real part of the characteristic differential equation root (if positive, the system is linearly unstable; if negative, linearly stable);
- 'stabf': a categorical (binary) label ('stable' or 'unstable').

Because of the direct relationship between 'stab' and 'stabf' ('stabf' = 'stable' if 'stab' <= 0, 'unstable' otherwise), 'stab' should be dropped and 'stabf' will remain as the sole dependent variable (binary classification).
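
The two relationships stated above (the power balance for 'p1' and the rule linking 'stab' to 'stabf') can be checked directly. The following is a minimal sketch that uses the first three rows from the `df.head()` output further down; the small frame `sample` and the tolerance `atol=1e-5` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# First three rows of the dataset, copied from the df.head() output
sample = pd.DataFrame({
    'p1': [3.763085, 5.067812, 3.405158],
    'p2': [-0.782604, -1.940058, -1.207456],
    'p3': [-1.257395, -1.872742, -1.277210],
    'p4': [-1.723086, -1.255012, -0.920492],
    'stab': [0.055347, -0.005957, 0.003471],
    'stabf': ['unstable', 'stable', 'unstable'],
})

# Power balance: p1 = -(p2 + p3 + p4), up to rounding in the printed values
balance_ok = np.isclose(sample['p1'],
                        -(sample['p2'] + sample['p3'] + sample['p4']),
                        atol=1e-5)

# Label rule: 'stable' if stab <= 0, 'unstable' otherwise
derived = np.where(sample['stab'] <= 0, 'stable', 'unstable')
labels_ok = derived == sample['stabf'].to_numpy()

print(balance_ok.all(), labels_ok.all())
```

Because 'p1' is fully determined by 'p2' to 'p4', it carries no independent information, which is worth keeping in mind when reading feature importances later.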

Split the data into an 80-20 train-test split with a random state of 1. Use the standard scaler to transform the training features (x_train) and the test features (x_test), fitting the scaler on the training set only; the labels (y_train, y_test) are not scaled. Use scikit-learn to train a random forest classifier and an extra trees classifier, and use xgboost and lightgbm to train an extreme gradient boosting model and a light gradient boosting model. Use random_state = 1 for training all models and evaluate on the test set. Answer the following questions:
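
The workflow described above can be sketched end to end before running it on the real data. This is only an outline under stated assumptions: the data is a synthetic stand-in produced by `make_classification` (not the grid dataset), and only the two scikit-learn models are included so that the example has no extra dependencies; `XGBClassifier` and `LGBMClassifier` could be added to the `models` dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the grid data: 12 features, binary target (assumption)
X, y = make_classification(n_samples=500, n_features=12, random_state=1)

# 80-20 split with random_state=1, as the task specifies
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

models = {
    'random_forest': RandomForestClassifier(random_state=1),
    'extra_trees': ExtraTreesClassifier(random_state=1),
}

scores = {}
for name, model in models.items():
    # The scaler is fitted on the training features only,
    # then reused unchanged on the test features
    pipe = make_pipeline(StandardScaler(), model)
    pipe.fit(x_train, y_train)
    scores[name] = pipe.score(x_test, y_test)  # accuracy on the test set

print(scores)
```

Wrapping the scaler and the model in one pipeline is a design choice, not what the notebook below does; it simply makes it harder to fit the scaler on test data by accident.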

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
df = pd.read_csv('C:/Users/HP/Downloads/Data_for_UCI_named.csv')

In [3]:
df.head()

,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4,stab,stabf
0,2.95906,3.079885,8.381025,9.780754,3.763085,-0.782604,-1.257395,-1.723086,0.650456,0.859578,0.887445,0.958034,0.055347,unstable
1,9.304097,4.902524,3.047541,1.369357,5.067812,-1.940058,-1.872742,-1.255012,0.413441,0.862414,0.562139,0.78176,-0.005957,stable
2,8.971707,8.848428,3.046479,1.214518,3.405158,-1.207456,-1.27721,-0.920492,0.163041,0.766689,0.839444,0.109853,0.003471,unstable
3,0.716415,7.6696,4.486641,2.340563,3.963791,-1.027473,-1.938944,-0.997374,0.446209,0.976744,0.929381,0.362718,0.028871,unstable
4,3.134112,7.608772,4.943759,9.857573,3.525811,-1.125531,-1.845975,-0.554305,0.79711,0.45545,0.656947,0.820923,0.04986,unstable


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   tau1    10000 non-null  float64
 1   tau2    10000 non-null  float64
 2   tau3    10000 non-null  float64
 3   tau4    10000 non-null  float64
 4   p1      10000 non-null  float64
 5   p2      10000 non-null  float64
 6   p3      10000 non-null  float64
 7   p4      10000 non-null  float64
 8   g1      10000 non-null  float64
 9   g2      10000 non-null  float64
 10  g3      10000 non-null  float64
 11  g4      10000 non-null  float64
 12  stab    10000 non-null  float64
 13  stabf   10000 non-null  object 
dtypes: float64(13), object(1)
memory usage: 1.1+ MB


In [5]:
df.columns

Index(['tau1', 'tau2', 'tau3', 'tau4', 'p1', 'p2', 'p3', 'p4', 'g1', 'g2',
       'g3', 'g4', 'stab', 'stabf'],
      dtype='object')

In [6]:
#drop stab
df = df.drop('stab', axis = 1)

In [7]:
df.columns

Index(['tau1', 'tau2', 'tau3', 'tau4', 'p1', 'p2', 'p3', 'p4', 'g1', 'g2',
       'g3', 'g4', 'stabf'],
      dtype='object')

In [8]:
df['stabf'].value_counts()

unstable    6380
stable      3620
Name: stabf, dtype: int64

In [9]:
df['stabf'] = df['stabf'].map({'unstable':0,'stable': 1 })

In [10]:
df

,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4,stabf
0,2.959060,3.079885,8.381025,9.780754,3.763085,-0.782604,-1.257395,-1.723086,0.650456,0.859578,0.887445,0.958034,0
1,9.304097,4.902524,3.047541,1.369357,5.067812,-1.940058,-1.872742,-1.255012,0.413441,0.862414,0.562139,0.781760,1
2,8.971707,8.848428,3.046479,1.214518,3.405158,-1.207456,-1.277210,-0.920492,0.163041,0.766689,0.839444,0.109853,0
3,0.716415,7.669600,4.486641,2.340563,3.963791,-1.027473,-1.938944,-0.997374,0.446209,0.976744,0.929381,0.362718,0
4,3.134112,7.608772,4.943759,9.857573,3.525811,-1.125531,-1.845975,-0.554305,0.797110,0.455450,0.656947,0.820923,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,2.930406,9.487627,2.376523,6.187797,3.343416,-0.658054,-1.449106,-1.236256,0.601709,0.779642,0.813512,0.608385,0
9996,3.392299,1.274827,2.954947,6.894759,4.349512,-1.663661,-0.952437,-1.733414,0.502079,0.567242,0.285880,0.366120,1
9997,2.364034,2.842030,8.776391,1.008906,4.299976,-1.380719,-0.943884,-1.975373,0.487838,0.986505,0.149286,0.145984,1
9998,9.631511,3.994398,2.757071,7.821347,2.514755,-0.966330,-0.649915,-0.898510,0.365246,0.587558,0.889118,0.818391,0


In [11]:
#splitting data into train, test
from sklearn.model_selection import train_test_split

In [12]:
df_train, df_test = train_test_split(df, test_size=0.2, random_state=1)

In [13]:
len(df_train), len(df_test)

(8000, 2000)

In [14]:
df_train = df_train.reset_index(drop=True)
df_test = df_test.reset_index(drop=True)

In [15]:
y_train = df_train['stabf']
y_test = df_test['stabf']

In [16]:
del df_train['stabf']
del df_test['stabf']

In [17]:
len(df_train), len(df_test), len(y_train), len(y_test)

(8000, 2000, 8000, 2000)

In [18]:
df_train.shape

(8000, 12)

In [19]:
#transform train and test set using standard scaler
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(df_train)

x_train_scaled = scaler.transform(df_train)
x_test_scaled = scaler.transform(df_test)

In [20]:
#put the scaled sets into a dataframe

x_train_scaled = pd.DataFrame(x_train_scaled, columns = df_train.columns)
x_test_scaled = pd.DataFrame(x_test_scaled, columns = df_test.columns)

In [21]:
x_train_scaled

,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4
0,0.367327,-0.986042,0.650447,1.547527,-0.291490,0.061535,1.293862,-0.845074,0.160918,0.339859,0.585568,0.492239
1,-0.064659,0.089437,1.035079,-1.641494,0.619865,-0.067235,-1.502925,0.486613,-0.293143,-1.558488,1.429649,-1.443521
2,-1.467850,1.298418,-0.502536,1.166046,-0.180521,0.490603,0.682560,-0.855302,1.399350,1.451534,-1.045743,0.492489
3,0.820081,0.529920,1.299657,-1.141975,-0.812854,-0.763632,1.521579,0.658780,-0.958319,1.361958,1.604140,0.275303
4,0.665424,-1.425627,0.312300,0.919137,-1.614296,0.760315,1.422019,0.639243,1.676895,0.695660,1.137504,-1.312575
...,...,...,...,...,...,...,...,...,...,...,...,...
7995,1.551314,0.007408,-1.177640,1.016898,-0.397177,0.759820,-0.636951,0.572703,-1.209413,0.313976,-1.625728,-0.637401
7996,1.015925,-0.223483,-1.489381,-1.479078,0.451468,-0.731994,0.990355,-1.048148,-1.094647,-0.755209,0.734821,-0.304433
7997,0.657609,-0.722756,-1.405888,-0.274301,-0.012584,1.438694,-0.364266,-1.046683,1.253539,0.293100,-1.550587,0.810344
7998,-0.059316,-1.260532,-1.010471,-0.877808,-0.779769,0.828824,0.516923,0.018984,-0.182448,-0.388255,-0.726781,1.667916
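
The scaling step fits the scaler on the training set and then applies the same transformation to the test set. A short sketch of what that implies, using randomly generated data as an assumption (the array names `train` and `test` are illustrative): the training columns end up with mean 0 and standard deviation 1 exactly, while the test columns are only approximately centred because they are scaled with the training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Illustrative data: 8000 training rows and 2000 test rows with 12 features
train = rng.uniform(0.5, 10.0, size=(8000, 12))
test = rng.uniform(0.5, 10.0, size=(2000, 12))

scaler = StandardScaler().fit(train)   # statistics come from the training set only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)   # reuses the training mean and std

train_centred = np.allclose(train_scaled.mean(axis=0), 0.0)
train_unit_var = np.allclose(train_scaled.std(axis=0), 1.0)
test_offset = np.abs(test_scaled.mean(axis=0)).max()

print(train_centred, train_unit_var)
print(test_offset)  # small, but not exactly zero
```

Tree-based models such as the ones trained below are generally insensitive to this kind of per-feature rescaling, so the step matters mainly because the task asks for it.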


# Model Build
## RandomForestClassifier

In [22]:
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(random_state = 1)

#fit on train set
rf.fit(x_train_scaled, y_train)

In [23]:
y_pred = rf.predict(x_test_scaled)

In [24]:
y_pred

array([0, 0, 1, ..., 1, 0, 0], dtype=int64)

In [27]:
#model accuracy
from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             f1_score, confusion_matrix, classification_report)

accuracy = accuracy_score(y_test, y_pred)
print('Accuracy: {:.2f}'.format(accuracy*100))

#precision
precision = precision_score(y_test, y_pred)
print('Precision: {:.2f}'.format(precision*100))

#recall
recall = recall_score(y_test, y_pred)
print('Recall: {:.2f}'.format(recall*100))

#F1 score
f1 = f1_score(y_test, y_pred)
print('F1: {:.2f}'.format(f1*100))

#classification report
print('Classification Report:\n', classification_report(y_test, y_pred, digits=4))

#confusion matrix
cm = confusion_matrix(y_test, y_pred)
print('Confusion Matrix:\n', cm)

Accuracy: 92.95
Precision: 92.55
Recall: 87.22
F1: 89.80
Classification Report:
               precision    recall  f1-score   support

           0     0.9315    0.9612    0.9461      1288
           1     0.9255    0.8722    0.8980       712

    accuracy                         0.9295      2000
   macro avg     0.9285    0.9167    0.9221      2000
weighted avg     0.9294    0.9295    0.9290      2000

Confusion Matrix:
 [[1238   50]
 [  91  621]]
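
The scalar metrics and the class 1 row of the report all follow from the confusion matrix. As a worked check, the sketch below recomputes them by hand from the random forest matrix printed above, treating class 1 ('stable') as the positive class, which is the default for the scikit-learn metric functions with 0/1 labels.

```python
import numpy as np

# Confusion matrix printed above for the random forest
# (rows: true class, columns: predicted class)
cm = np.array([[1238, 50],
               [91, 621]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)   # of the grids predicted stable, how many are stable
recall = tp / (tp + fn)      # of the truly stable grids, how many were found
f1 = 2 * precision * recall / (precision + recall)

print('accuracy  {:.4f}'.format(accuracy))
print('precision {:.4f}'.format(precision))
print('recall    {:.4f}'.format(recall))
print('f1        {:.4f}'.format(f1))
```

The results agree with the report: accuracy 0.9295, and precision 0.9255, recall 0.8722 and F1 0.8980 for class 1.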


In [28]:
print("Train score: {:.3f}".format(rf.score(x_train_scaled, y_train)))
print("Test score: {:.3f}".format(rf.score(x_test_scaled, y_test)))

Train score: 1.000
Test score: 0.929


## ExtraTreesClassifier

In [29]:
from sklearn.ensemble import ExtraTreesClassifier

ex_tree = ExtraTreesClassifier(random_state = 1)

#fit on the train set
ex_tree.fit(x_train_scaled, y_train)

In [30]:
y_pred1 = ex_tree.predict(x_test_scaled)

## Evaluating Model Performance for ExtraTreesClassifier

In [31]:
#model accuracy
from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             f1_score, confusion_matrix, classification_report)

accuracy = accuracy_score(y_test, y_pred1)
print('Accuracy: {:.2f}'.format(accuracy*100))

#precision
precision = precision_score(y_test, y_pred1)
print('Precision: {:.2f}'.format(precision*100))

#recall
recall = recall_score(y_test, y_pred1)
print('Recall: {:.2f}'.format(recall*100))

#F1 score
f1 = f1_score(y_test, y_pred1)
print('F1: {:.2f}'.format(f1*100))

#classification report
print('Classification Report:\n', classification_report(y_test, y_pred1, digits=4))

#confusion matrix
cm = confusion_matrix(y_test, y_pred1)
print('Confusion Matrix:\n', cm)

Accuracy: 92.85
Precision: 95.09
Recall: 84.27
F1: 89.35
Classification Report:
               precision    recall  f1-score   support

           0     0.9182    0.9759    0.9462      1288
           1     0.9509    0.8427    0.8935       712

    accuracy                         0.9285      2000
   macro avg     0.9345    0.9093    0.9199      2000
weighted avg     0.9298    0.9285    0.9274      2000

Confusion Matrix:
 [[1257   31]
 [ 112  600]]


In [32]:
print("Train score: {:.3f}".format(ex_tree.score(x_train_scaled, y_train)))
print("Test score: {:.3f}".format(ex_tree.score(x_test_scaled, y_test)))


Train score: 1.000
Test score: 0.928


## XGBoost

In [33]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y_train = le.fit_transform(y_train)

In [34]:
from xgboost import XGBClassifier

XGB= XGBClassifier(random_state = 1)

#fit on train set
XGB.fit(x_train_scaled, y_train)
     

In [35]:
y_pred2 = XGB.predict(x_test_scaled)

In [36]:
#model accuracy
from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             f1_score, confusion_matrix, classification_report)

accuracy = accuracy_score(y_test, y_pred2)
print('Accuracy: {:.2f}'.format(accuracy*100))

#precision
precision = precision_score(y_test, y_pred2)
print('Precision: {:.2f}'.format(precision*100))

#recall
recall = recall_score(y_test, y_pred2)
print('Recall: {:.2f}'.format(recall*100))

#F1 score
f1 = f1_score(y_test, y_pred2)
print('F1: {:.2f}'.format(f1*100))

#classification report
print('Classification Report:\n', classification_report(y_test, y_pred2, digits=4))

#confusion matrix
cm = confusion_matrix(y_test, y_pred2)
print('Confusion Matrix:\n', cm)

Accuracy: 94.55
Precision: 93.51
Recall: 91.01
F1: 92.24
Classification Report:
               precision    recall  f1-score   support

           0     0.9510    0.9651    0.9580      1288
           1     0.9351    0.9101    0.9224       712

    accuracy                         0.9455      2000
   macro avg     0.9430    0.9376    0.9402      2000
weighted avg     0.9453    0.9455    0.9453      2000

Confusion Matrix:
 [[1243   45]
 [  64  648]]


## LightGBM

In [38]:
from lightgbm import LGBMClassifier

lgbm= LGBMClassifier(random_state = 1)

#fit on train set
lgbm.fit(x_train_scaled, y_train)

In [39]:
y_pred3 = lgbm.predict(x_test_scaled)

In [40]:
#model accuracy
from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             f1_score, confusion_matrix, classification_report)

accuracy = accuracy_score(y_test, y_pred3)
print('Accuracy: {:.2f}'.format(accuracy*100))

#precision
precision = precision_score(y_test, y_pred3)
print('Precision: {:.2f}'.format(precision*100))

#recall
recall = recall_score(y_test, y_pred3)
print('Recall: {:.2f}'.format(recall*100))

#F1 score
f1 = f1_score(y_test, y_pred3)
print('F1: {:.2f}'.format(f1*100))

#classification report (for the LightGBM predictions, y_pred3)
print('Classification Report:\n', classification_report(y_test, y_pred3, digits=4))

#confusion matrix
cm = confusion_matrix(y_test, y_pred3)
print('Confusion Matrix:\n', cm)

Accuracy: 93.95
Precision: 92.76
Recall: 90.03
F1: 91.38
Classification Report:
               precision    recall  f1-score   support

           0     0.9458    0.9612    0.9534      1288
           1     0.9276    0.9003    0.9138       712

    accuracy                         0.9395      2000
   macro avg     0.9367    0.9307    0.9336      2000
weighted avg     0.9393    0.9395    0.9393      2000


Confusion Matrix:
 [[1238   50]
 [  71  641]]
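
With all four default models evaluated, their test-set results can be put side by side. The sketch below rebuilds a comparison table from the confusion matrices printed in the sections above; the dictionary `matrices` and the frame `summary` are illustrative names, and class 1 ('stable') is treated as the positive class.

```python
import numpy as np
import pandas as pd

# Confusion matrices copied from the outputs above
# (rows: true class, columns: predicted class)
matrices = {
    'RandomForest': [[1238, 50], [91, 621]],
    'ExtraTrees': [[1257, 31], [112, 600]],
    'XGBoost': [[1243, 45], [64, 648]],
    'LightGBM': [[1238, 50], [71, 641]],
}

rows = []
for name, m in matrices.items():
    tn, fp, fn, tp = np.array(m).ravel()
    rows.append({
        'model': name,
        'accuracy': (tp + tn) / (tn + fp + fn + tp),
        'precision': tp / (tp + fp),
        'recall': tp / (tp + fn),
    })

# Best accuracy first
summary = (pd.DataFrame(rows)
           .set_index('model')
           .sort_values('accuracy', ascending=False))
print(summary.round(4))
```

On accuracy, the two boosting models come out ahead of the two bagging-style ensembles on this split, with XGBoost first.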


## Tuning ExtraTreesClassifier

In [41]:
#combination of hyperparameters
n_estimators = [50, 100, 300, 500, 1000]
min_samples_split = [2, 3, 5, 7, 9]
min_samples_leaf = [1, 2, 4, 6, 8]
#note: 'auto' is accepted only by older scikit-learn releases (it was deprecated in 1.1); drop it on newer versions
max_features = ['auto', 'sqrt', 'log2', None]

hyperparameter_grid = {'n_estimators': n_estimators,
                       'min_samples_leaf': min_samples_leaf,
                       'min_samples_split': min_samples_split,
                       'max_features': max_features}


In [43]:
from sklearn.model_selection import RandomizedSearchCV


#set up randomized search with 5-fold cross-validation

randomcv = RandomizedSearchCV(estimator = ex_tree, 
                              param_distributions = hyperparameter_grid, cv=5, n_iter=10, 
                              scoring = 'accuracy', n_jobs = -1, verbose = 1,
                              random_state = 1)


In [44]:
ran_search = randomcv.fit(x_train_scaled, y_train)

Fitting 5 folds for each of 10 candidates, totalling 50 fits
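
The "50 fits" in the log line above follow from the search settings: 10 sampled candidates times 5 folds. The sketch below makes the arithmetic explicit and also shows how small a share of the full grid is explored. It uses scikit-learn's `ParameterGrid` and `ParameterSampler` utilities only to count combinations; no model is trained.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

hyperparameter_grid = {
    'n_estimators': [50, 100, 300, 500, 1000],
    'min_samples_leaf': [1, 2, 4, 6, 8],
    'min_samples_split': [2, 3, 5, 7, 9],
    'max_features': ['auto', 'sqrt', 'log2', None],
}

# Size of the full grid: 5 * 5 * 5 * 4 combinations
total = len(ParameterGrid(hyperparameter_grid))

# Randomized search draws n_iter of them instead of trying all
candidates = list(ParameterSampler(hyperparameter_grid, n_iter=10, random_state=1))

fits = len(candidates) * 5          # 5-fold cross-validation
share = len(candidates) / total     # fraction of the grid that is explored

print(total, len(candidates), fits, share)
```

Only 2% of the 500 possible combinations are tried, so the reported best parameters are the best of a small random sample rather than of the whole grid.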


In [45]:
ran_search.best_params_

{'n_estimators': 1000,
 'min_samples_split': 2,
 'min_samples_leaf': 8,
 'max_features': None}

In [46]:
#check feature importances (ex_tree is still the default model fitted in In [29], not the tuned one)
importance = ex_tree.feature_importances_

In [47]:
#print feature importances
for i,v in enumerate(importance):
    print('Feature: %0d, Score: %.5f' % (i,v))

Feature: 0, Score: 0.11740
Feature: 1, Score: 0.11844
Feature: 2, Score: 0.11317
Feature: 3, Score: 0.11547
Feature: 4, Score: 0.03951
Feature: 5, Score: 0.04037
Feature: 6, Score: 0.04071
Feature: 7, Score: 0.04058
Feature: 8, Score: 0.08978
Feature: 9, Score: 0.09368
Feature: 10, Score: 0.09688
Feature: 11, Score: 0.09402
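
The scores above are listed by column index. A small sketch of how they could be labelled and ranked, assuming the importances follow the column order of `x_train_scaled` ('tau1' to 'g4'); the values are copied from the output above, and the grouping by feature family is an illustrative addition.

```python
import pandas as pd

feature_names = ['tau1', 'tau2', 'tau3', 'tau4',
                 'p1', 'p2', 'p3', 'p4',
                 'g1', 'g2', 'g3', 'g4']

# Scores copied from the output above, in column order
scores = [0.11740, 0.11844, 0.11317, 0.11547,
          0.03951, 0.04037, 0.04071, 0.04058,
          0.08978, 0.09368, 0.09688, 0.09402]

# Label each score with its feature name and rank from most to least important
importance = pd.Series(scores, index=feature_names).sort_values(ascending=False)
print(importance)

# Total importance per feature family (tau, p, g)
family = importance.index.str.rstrip('1234')
by_family = importance.groupby(family).sum()
print(by_family.round(4))
```

Read this way, 'tau2' is the most important single feature and 'p1' the least, and the reaction times ('tau') together account for a larger share than the elasticity coefficients ('g') or the power values ('p').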


In [48]:
x_train_scaled.head()

,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4
0,0.367327,-0.986042,0.650447,1.547527,-0.29149,0.061535,1.293862,-0.845074,0.160918,0.339859,0.585568,0.492239
1,-0.064659,0.089437,1.035079,-1.641494,0.619865,-0.067235,-1.502925,0.486613,-0.293143,-1.558488,1.429649,-1.443521
2,-1.46785,1.298418,-0.502536,1.166046,-0.180521,0.490603,0.68256,-0.855302,1.39935,1.451534,-1.045743,0.492489
3,0.820081,0.52992,1.299657,-1.141975,-0.812854,-0.763632,1.521579,0.65878,-0.958319,1.361958,1.60414,0.275303
4,0.665424,-1.425627,0.3123,0.919137,-1.614296,0.760315,1.422019,0.639243,1.676895,0.69566,1.137504,-1.312575


In [49]:
#get best score
ran_search.best_score_

0.9241249999999999

In [50]:
#evaluate ExtraTreesClassifier on the test set using the best params
ex_tree = ExtraTreesClassifier(max_features = None, 
                            min_samples_leaf= 8,
                            min_samples_split= 2,
                            n_estimators= 1000, 
                            random_state = 1)

#fit on train set
ex_tree.fit(x_train_scaled, y_train)

In [51]:
y_pred4 = ex_tree.predict(x_test_scaled)

In [52]:
print('Classification Report:\n', classification_report(y_test,y_pred4, digits =4))

Classification Report:
               precision    recall  f1-score   support

           0     0.9300    0.9589    0.9442      1288
           1     0.9211    0.8694    0.8945       712

    accuracy                         0.9270      2000
   macro avg     0.9256    0.9141    0.9193      2000
weighted avg     0.9268    0.9270    0.9265      2000

