#Stage-C Tag-Along Project: Pragati Thakur
####Stability of the Grid System

#Note
This notebook contains the code used to answer all of the coding questions.

#Instructions for Tag-Along Project

###Stability of the Grid System

Electrical grids require a balance between electricity supply and demand in order to be stable. Conventional systems achieve this balance through demand-driven electricity production. For future grids with a high share of inflexible (i.e., renewable) energy sources, the concept of demand response is a promising solution. This implies changes in electricity consumption in relation to electricity price changes. In this work, we’ll build a binary classification model to predict if a grid is stable or unstable using the UCI Electrical Grid Stability Simulated dataset.

#Dataset: EGSS Data

It has 12 primary predictive features and two dependent variables.

Predictive features:

1. 'tau1' to 'tau4': the reaction time of each network participant, a real value within the range 0.5 to 10 ('tau1' corresponds to the supplier node, 'tau2' to 'tau4' to the consumer nodes);
2. 'p1' to 'p4': nominal power produced (positive) or consumed (negative) by each network participant, a real value within the range -2.0 to -0.5 for consumers ('p2' to 'p4'). As the total power consumed equals the total power generated, p1 (supplier node) = - (p2 + p3 + p4);
3. 'g1' to 'g4': price elasticity coefficient for each network participant, a real value within the range 0.05 to 1.00 ('g1' corresponds to the supplier node, 'g2' to 'g4' to the consumer nodes; 'g' stands for 'gamma');
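
As a quick sanity check of the power balance p1 = - (p2 + p3 + p4), the sketch below tests it on the first three rows of the dataset, with values copied from the `df.head()` output further down in this notebook. The `rows` list and the tolerance are illustrative choices, not part of the assignment.

```python
import math

# (p1, p2, p3, p4) for the first three rows of the dataset,
# copied from the df.head() output in this notebook.
rows = [
    (3.763085, -0.782604, -1.257395, -1.723086),
    (5.067812, -1.940058, -1.872742, -1.255012),
    (3.405158, -1.207456, -1.277210, -0.920492),
]

# The tolerance is an assumption: the displayed values are rounded to 6 decimals.
balanced = [
    math.isclose(p1, -(p2 + p3 + p4), abs_tol=1e-5)
    for p1, p2, p3, p4 in rows
]
print(balanced)
```

Because p1 is fully determined by the other three power values, it carries no independent information.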

Dependent variables:

1. 'stab': the maximum real part of the characteristic differential equation root (if positive, the system is linearly unstable; if negative, linearly stable);
2. 'stabf': a categorical (binary) label ('stable' or 'unstable').

Because of the direct relationship between 'stab' and 'stabf' ('stabf' = 'stable' if 'stab' <= 0, 'unstable' otherwise), 'stab' should be dropped and 'stabf' will remain as the sole dependent variable (binary classification).
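
The rule linking the two dependent variables can be written as a one-line function. This is only a sketch: the helper name `stability_label` is hypothetical, and the sample values are the 'stab' entries of the first three rows in the `df.head()` output of this notebook.

```python
def stability_label(stab):
    """Return 'stable' when stab <= 0 and 'unstable' otherwise."""
    return "stable" if stab <= 0 else "unstable"


# 'stab' values of the first three rows in the df.head() output.
stab_values = [0.055347, -0.005957, 0.003471]
labels = [stability_label(s) for s in stab_values]
print(labels)
```

Keeping 'stab' as a feature would let a model recover 'stabf' directly, which is why it is dropped before training.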

Split the data into an 80-20 train-test split with a random state of “1”. Use the standard scaler to transform the features of the train set (x_train) and the test set (x_test), fitting it on the train set only. Use scikit-learn to train a random forest and an extra trees classifier, and use xgboost and lightgbm to train an extreme gradient boosting model and a light gradient boosting model. Use random_state = 1 for training all models and evaluate on the test set. Answer the following questions:

In [102]:
#Mounting google drive to get csv file
from google.colab import drive
drive.mount('/content/drive')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [103]:
#importing the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

In [104]:
#Reading the dataset and glancing at the data.
df=pd.read_csv("/content/drive/MyDrive/Hamoye/Stage-C/Data_for_UCI_named.csv")
df.head()

index,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4,stab,stabf
0,2.95906,3.079885,8.381025,9.780754,3.763085,-0.782604,-1.257395,-1.723086,0.650456,0.859578,0.887445,0.958034,0.055347,unstable
1,9.304097,4.902524,3.047541,1.369357,5.067812,-1.940058,-1.872742,-1.255012,0.413441,0.862414,0.562139,0.78176,-0.005957,stable
2,8.971707,8.848428,3.046479,1.214518,3.405158,-1.207456,-1.27721,-0.920492,0.163041,0.766689,0.839444,0.109853,0.003471,unstable
3,0.716415,7.6696,4.486641,2.340563,3.963791,-1.027473,-1.938944,-0.997374,0.446209,0.976744,0.929381,0.362718,0.028871,unstable
4,3.134112,7.608772,4.943759,9.857573,3.525811,-1.125531,-1.845975,-0.554305,0.79711,0.45545,0.656947,0.820923,0.04986,unstable


#Preprocessing and cleaning the dataset

In [105]:
# checking for missing values
df.isna().sum()

tau1     0
tau2     0
tau3     0
tau4     0
p1       0
p2       0
p3       0
p4       0
g1       0
g2       0
g3       0
g4       0
stab     0
stabf    0
dtype: int64

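#Question 2

In [106]:
#F1 score from the counts used in the formulas below:
#255 true positives, 1380 false positives, 45 false negatives
Precision = (255/ ( 255+1380))
Recall =  (255/(255+45))
#F1 is the harmonic mean of precision and recall
F1_Score = 2 * (Precision*Recall)/(Precision + Recall)
print(round(F1_Score,4))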



0.2636
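
The same calculation can be wrapped in a small helper. This sketch assumes that 255, 1380, and 45 are the true positive, false positive, and false negative counts, which is how the formulas above use them. The function name `f1_from_counts` is hypothetical.

```python
def f1_from_counts(tp, fp, fn):
    """F1 score from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


f1 = f1_from_counts(tp=255, fp=1380, fn=45)
print(round(f1, 4))
```

The low score is driven by precision (255 / 1635), since recall alone is 0.85.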


#Answers to the modelling questions

In [107]:
# Because of the direct relationship between 'stab' and 'stabf', we drop the 'stab' column
df = df.drop(columns = 'stab')

In [108]:
# Splitting the data into the Predictors and Labels
X = df.drop(['stabf'],axis = 1)
y = df['stabf']

In [109]:
df.stabf.value_counts()

unstable    6380
stable      3620
Name: stabf, dtype: int64

In [110]:
#splitting the data to use for modelling
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.2, random_state=1)

In [111]:
#looking into data after splitting
y_train.value_counts()

unstable    5092
stable      2908
Name: stabf, dtype: int64

In [112]:
y_test.value_counts()

unstable    1288
stable       712
Name: stabf, dtype: int64
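
The split above does not pass `stratify`, so it is worth checking that the class balance is similar in both subsets. A minimal sketch using only the counts printed above:

```python
# Counts copied from the value_counts() outputs above.
counts = {
    "full": {"unstable": 6380, "stable": 3620},
    "train": {"unstable": 5092, "stable": 2908},
    "test": {"unstable": 1288, "stable": 712},
}

# Share of the 'unstable' class in each subset.
unstable_share = {
    name: c["unstable"] / (c["unstable"] + c["stable"])
    for name, c in counts.items()
}
for name, share in unstable_share.items():
    print(name, round(share, 4))
```

The shares differ by less than one percentage point. If exact proportions were required, `train_test_split` accepts a `stratify=y` argument, although that would change the split used in this notebook.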

In [113]:
#The columns are on different scales, so we standardize the features.
#As per the problem statement we use StandardScaler, fitted on the training features only.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
transformed_X_train = scaler.fit_transform(X_train)
transformed_X_train = pd.DataFrame(transformed_X_train, columns = X_train.columns)
# Transforming the X_test (feature)data
transformed_X_test = scaler.transform(X_test)
transformed_X_test = pd.DataFrame(transformed_X_test, columns = X_test.columns)
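
To make explicit what `fit_transform` and `transform` do in the cell above, the following plain-Python sketch learns the mean and population standard deviation from training values only and reuses them for the test values. The numbers are the `tau1` values from the `df.head()` output, split arbitrarily into "train" and "test" for illustration; they are not the notebook's actual split.

```python
import statistics

# Illustrative values only: tau1 from the df.head() output, split arbitrarily.
train = [2.959060, 9.304097, 8.971707, 0.716415]
test = [3.134112]

# "fit": learn the mean and population standard deviation from the training data.
mean = statistics.fmean(train)
std = statistics.pstdev(train)

# "transform": apply the training statistics to both sets.
train_scaled = [(x - mean) / std for x in train]
test_scaled = [(x - mean) / std for x in test]

print(statistics.pstdev(train_scaled))
print(test_scaled)
```

The scaled training values have zero mean and unit standard deviation. The test values generally do not, because they are scaled with the training statistics, which keeps test information out of the preprocessing step.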

In [114]:
#Now let's look at the data after scaling
transformed_X_train.head()

index,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4
0,0.367327,-0.986042,0.650447,1.547527,-0.29149,0.061535,1.293862,-0.845074,0.160918,0.339859,0.585568,0.492239
1,-0.064659,0.089437,1.035079,-1.641494,0.619865,-0.067235,-1.502925,0.486613,-0.293143,-1.558488,1.429649,-1.443521
2,-1.46785,1.298418,-0.502536,1.166046,-0.180521,0.490603,0.68256,-0.855302,1.39935,1.451534,-1.045743,0.492489
3,0.820081,0.52992,1.299657,-1.141975,-0.812854,-0.763632,1.521579,0.65878,-0.958319,1.361958,1.60414,0.275303
4,0.665424,-1.425627,0.3123,0.919137,-1.614296,0.760315,1.422019,0.639243,1.676895,0.69566,1.137504,-1.312575


In [115]:
transformed_X_test.head()

index,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4
0,0.593951,-0.412733,1.503924,1.116943,0.403423,-1.492971,-0.785033,1.566781,-0.901007,1.167203,-1.50733,1.084726
1,0.20219,0.374416,-0.1888,-0.522268,-0.225967,-1.058483,0.420047,1.028627,-1.625721,-0.39566,1.414651,1.226011
2,-1.079044,-0.313745,-0.884634,0.01708,-0.943122,0.112653,0.801335,0.733004,1.457108,-1.438495,0.651821,-1.682168
3,-0.08312,-1.107327,0.372805,-1.708152,0.75399,-1.637972,0.403805,-0.088036,0.083322,-1.672322,-0.357714,1.055865
4,0.873921,1.438466,0.086662,1.715037,-0.15388,-0.007015,-0.197053,0.472315,0.136549,-1.469731,0.956396,-0.819727


In [116]:
# Importing our classifier and fitting it on the training data
from sklearn.ensemble import RandomForestClassifier
Random_C = RandomForestClassifier(random_state=1)
Random_C.fit(transformed_X_train,y_train)

In [117]:
predict = Random_C.predict(transformed_X_test)

In [118]:
#Importing the metrics needed to evaluate model performance
from sklearn.metrics import recall_score, accuracy_score, precision_score, f1_score, confusion_matrix

print("Accuracy score {}".format(round(accuracy_score(y_test, predict), 4)))
print("Precision score for label stable %.3f" % (precision_score(y_test, predict, pos_label='stable')))
print("Recall score for label stable {}".format(round(recall_score(y_test, predict, pos_label='stable'), 4)))
print("F1 score %.3f" % (f1_score(y_test, predict, pos_label='stable')))

Accuracy score 0.929
Precision score for label stable 0.919
Recall score for label stable 0.8778
F1 score 0.898


In [119]:
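#Encoding the string labels as integers for the boosting models
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y_train = le.fit_transform(y_train)
#Reuse the mapping learned on the training labels instead of refitting on the test labels
y_test = le.transform(y_test)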
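
`LabelEncoder` assigns integer codes to the sorted class names, which is why the classification reports below show classes `0` and `1`. The plain-Python sketch below mimics that mapping rather than calling scikit-learn. The names `classes` and `code_of` are hypothetical, and the sample labels are the 'stabf' values from the `df.head()` output.

```python
# 'stabf' values of the five rows shown in the df.head() output.
labels = ["unstable", "stable", "unstable", "unstable", "unstable"]

# Sorted unique class names define the integer codes, as LabelEncoder does.
classes = sorted(set(labels))
code_of = {name: code for code, name in enumerate(classes)}

encoded = [code_of[name] for name in labels]
print(code_of)
print(encoded)
```

Class 0 is therefore 'stable' (support 712) and class 1 is 'unstable' (support 1288), matching the test-set counts printed earlier.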

In [120]:
#xgboost
from xgboost import XGBClassifier
extreme = XGBClassifier(random_state =1)
extreme.fit(transformed_X_train, y_train)
extreme_pred = extreme.predict(transformed_X_test)

In [121]:
#classification report
from sklearn.metrics import classification_report
print(classification_report(y_test, extreme_pred))

              precision    recall  f1-score   support

           0       0.94      0.91      0.92       712
           1       0.95      0.97      0.96      1288

    accuracy                           0.95      2000
   macro avg       0.94      0.94      0.94      2000
weighted avg       0.95      0.95      0.95      2000



In [122]:
round(accuracy_score(y_test,extreme_pred),4)

0.9455

In [123]:
import lightgbm as lgbm
#Use a separate name for the model so the lightgbm module is not overwritten
lgbm_model = lgbm.LGBMClassifier(random_state=1)
lgbm_model.fit(transformed_X_train,y_train)
lgbm_predict  = lgbm_model.predict(transformed_X_test)

In [124]:
round(accuracy_score(y_test, lgbm_predict),4)

0.9395

In [125]:
from sklearn.ensemble import ExtraTreesClassifier

Tree_CLass = ExtraTreesClassifier (random_state = 1)

We use the hyperparameter values given in the instructions.

In [126]:
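#Candidate values for each hyperparameter, as given in the instructions
n_estimators = [50, 100, 300, 500, 1000]
min_samples_split = [2, 3, 5, 7, 9]
min_samples_leaf = [1, 2, 4, 6, 8]
#Note: 'auto' was removed in scikit-learn 1.3; on newer versions it may need to be dropped
max_features = ['auto', 'sqrt', 'log2', None]

hyperparameter_grid = {'n_estimators': n_estimators,
                       'min_samples_leaf': min_samples_leaf,
                       'min_samples_split': min_samples_split,
                       'max_features': max_features}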

In [127]:
from sklearn.model_selection import RandomizedSearchCV

In [128]:
Rand_search = RandomizedSearchCV(estimator=Tree_CLass,
                                 param_distributions=hyperparameter_grid,
                                 n_iter=10, cv=5, scoring='accuracy',
                                 n_jobs=1, verbose=1, random_state=1)

In [129]:
search = Rand_search.fit(transformed_X_train,y_train)

Fitting 5 folds for each of 10 candidates, totalling 50 fits
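
The log line above follows from the search settings: 10 sampled candidates times 5 folds. For comparison, the sketch below also counts how many combinations an exhaustive grid search over the same values would have to try. The variable names `total_combinations` and `total_fits` are illustrative.

```python
import math

hyperparameter_grid = {
    "n_estimators": [50, 100, 300, 500, 1000],
    "min_samples_leaf": [1, 2, 4, 6, 8],
    "min_samples_split": [2, 3, 5, 7, 9],
    "max_features": ["auto", "sqrt", "log2", None],
}
n_iter, cv = 10, 5

# Size of the full grid versus what the randomized search actually fits.
total_combinations = math.prod(len(v) for v in hyperparameter_grid.values())
total_fits = n_iter * cv

print(total_combinations, total_fits)
```

Only 10 of the 500 combinations are evaluated, so the reported best parameters are the best among the sampled candidates, not necessarily the best in the whole grid.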


In [131]:
search.best_params_

{'n_estimators': 1000,
 'min_samples_split': 2,
 'min_samples_leaf': 8,
 'max_features': None}

In [130]:
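#Refit an extra trees classifier with the best parameters found by the search
#Note: no random_state is set here, so results can vary slightly between runs
best_Tree_Class = ExtraTreesClassifier(n_estimators=1000, min_samples_split=2,
                                 min_samples_leaf=8, max_features=None)
best_Tree_Class.fit(transformed_X_train, y_train)
#Keep the predictions in their own variable instead of overwriting the model
best_Tree_predict = best_Tree_Class.predict(transformed_X_test)

In [132]:
print(classification_report(y_test, best_Tree_predict, digits=4))
print('\n')
print("Accuracy score {}".format(accuracy_score(y_test, best_Tree_predict)))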

              precision    recall  f1-score   support

           0     0.9214    0.8722    0.8961       712
           1     0.9314    0.9589    0.9449      1288

    accuracy                         0.9280      2000
   macro avg     0.9264    0.9155    0.9205      2000
weighted avg     0.9278    0.9280    0.9275      2000



Accuracy score 0.928
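
The "macro avg" and "weighted avg" rows of the report can be reproduced from the per-class rows. The sketch below uses the rounded values printed above, so the results agree with the report only up to rounding.

```python
# Per-class values copied from the classification report above.
precision = {0: 0.9214, 1: 0.9314}
recall = {0: 0.8722, 1: 0.9589}
support = {0: 712, 1: 1288}
n = sum(support.values())

# Macro average: unweighted mean over classes.
macro_precision = sum(precision.values()) / len(precision)

# Weighted average: mean over classes weighted by support.
weighted_precision = sum(precision[c] * support[c] for c in support) / n
weighted_recall = sum(recall[c] * support[c] for c in support) / n

print(round(macro_precision, 4), round(weighted_precision, 4), round(weighted_recall, 4))
```

The support-weighted recall equals the overall accuracy, because each class recall times its support is the number of correctly classified samples of that class.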


In [133]:
# Comparing the result with the original extra trees classifier without tuning

Tree_CLass.fit(transformed_X_train,y_train)
Tree_predict = Tree_CLass.predict(transformed_X_test)

print(classification_report(y_test,Tree_predict))

              precision    recall  f1-score   support

           0       0.94      0.85      0.89       712
           1       0.92      0.97      0.95      1288

    accuracy                           0.93      2000
   macro avg       0.93      0.91      0.92      2000
weighted avg       0.93      0.93      0.93      2000
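
For a side-by-side view, the test accuracies reported in this notebook can be collected and ranked. This is only a summary of the printed outputs. The untuned extra trees model is left out because its report above gives the accuracy to two decimals only (0.93).

```python
# Test accuracies copied from the outputs in this notebook.
accuracy = {
    "random forest": 0.929,
    "xgboost": 0.9455,
    "lightgbm": 0.9395,
    "extra trees (tuned)": 0.928,
}

# Rank models from highest to lowest test accuracy.
ranking = sorted(accuracy.items(), key=lambda item: item[1], reverse=True)
for name, acc in ranking:
    print(f"{name}: {acc}")
```

On this split the two boosting models come out ahead of the two bagging-style tree ensembles.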



In [134]:
#feature importance
feature_importance = search.best_estimator_.feature_importances_
print ('Feature Importance :\n', feature_importance)

Feature Importance :
 [0.13723975 0.1405075  0.13468029 0.13541676 0.00368342 0.00533686
 0.00542927 0.00496249 0.10256244 0.10757765 0.11306268 0.10954089]


In [136]:
sorted(zip(feature_importance, X.columns), reverse=True)

[(0.14050750384993677, 'tau2'),
 (0.13723974766109256, 'tau1'),
 (0.1354167630909727, 'tau4'),
 (0.13468028520386593, 'tau3'),
 (0.11306267999167334, 'g3'),
 (0.10954089174337298, 'g4'),
 (0.10757764577478764, 'g2'),
 (0.10256244080927947, 'g1'),
 (0.005429268421191957, 'p3'),
 (0.005336864710946151, 'p2'),
 (0.004962486591192238, 'p4'),
 (0.003683422151688322, 'p1')]
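
Summing the importances per feature family makes the pattern in the sorted list easier to see. The sketch below uses the values printed above; grouping features by stripping the trailing digit from the name is an illustrative choice.

```python
# Importances copied from the printed feature importance array above.
importance = {
    "tau1": 0.13723975, "tau2": 0.14050750, "tau3": 0.13468029, "tau4": 0.13541676,
    "p1": 0.00368342, "p2": 0.00533686, "p3": 0.00542927, "p4": 0.00496249,
    "g1": 0.10256244, "g2": 0.10757765, "g3": 0.11306268, "g4": 0.10954089,
}

# Sum importances per feature family (feature name without the trailing digit).
group_total = {}
for name, value in importance.items():
    family = name.rstrip("0123456789")
    group_total[family] = group_total.get(family, 0.0) + value

for family, total in sorted(group_total.items(), key=lambda kv: kv[1], reverse=True):
    print(family, round(total, 4))
```

For this tuned extra trees model, the reaction times ('tau') and the price elasticity coefficients ('g') account for almost all of the importance, while the four power features ('p') together contribute about 2%.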