<h3>Electrical grid stability simulated data</h3>

Stability of the Grid System
Electrical grids require a balance between electricity supply and demand in order to be stable. Conventional systems achieve this balance through demand-driven electricity production. For future grids with a high share of inflexible (i.e., renewable) energy sources, demand response is a promising solution: consumers change their electricity consumption in response to changes in the electricity price. In this work, we'll build a binary classification model to predict whether a grid is stable or unstable using the UCI Electrical Grid Stability Simulated dataset.
The dataset has 12 primary predictive features and two dependent variables.



In [1]:
import numpy as np  # linear algebra
import pandas as pd  # data loading and preprocessing
import matplotlib.pyplot as plt  # plotting
import warnings
warnings.filterwarnings('ignore')  # suppress warnings

In [2]:
df = pd.read_csv(r'C:\Users\marry\Downloads\Electricity.csv', low_memory=False)  # load the dataset

In [3]:
df.head()

,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4,stab,stabf
0,2.95906,3.079885,8.381025,9.780754,3.763085,-0.782604,-1.257395,-1.723086,0.650456,0.859578,0.887445,0.958034,0.055347,unstable
1,9.304097,4.902524,3.047541,1.369357,5.067812,-1.940058,-1.872742,-1.255012,0.413441,0.862414,0.562139,0.78176,-0.005957,stable
2,8.971707,8.848428,3.046479,1.214518,3.405158,-1.207456,-1.27721,-0.920492,0.163041,0.766689,0.839444,0.109853,0.003471,unstable
3,0.716415,7.6696,4.486641,2.340563,3.963791,-1.027473,-1.938944,-0.997374,0.446209,0.976744,0.929381,0.362718,0.028871,unstable
4,3.134112,7.608772,4.943759,9.857573,3.525811,-1.125531,-1.845975,-0.554305,0.79711,0.45545,0.656947,0.820923,0.04986,unstable


In [4]:
df.shape

(10000, 14)

In [5]:
df.isnull().sum()

tau1     0
tau2     0
tau3     0
tau4     0
p1       0
p2       0
p3       0
p4       0
g1       0
g2       0
g3       0
g4       0
stab     0
stabf    0
dtype: int64

In [6]:
# From the description of the dataset, there are two target variables (stab and stabf).
# stab is numerical while stabf is a binary class label,
# so we drop stab and let the model predict the binary class.
df.drop(columns='stab', inplace=True)
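Dropping `stab` also matters for a second reason: in the five rows shown by `df.head()`, `stabf` is simply the sign of `stab` (negative means stable), so keeping `stab` as a feature would leak the label. The sketch below checks that rule on those five rows only; that it holds for the whole dataset is an assumption here, and `sample` and `derived` are illustrative names.

```python
import pandas as pd

# The stab and stabf values copied from the df.head() output above.
sample = pd.DataFrame({
    "stab": [0.055347, -0.005957, 0.003471, 0.028871, 0.04986],
    "stabf": ["unstable", "stable", "unstable", "unstable", "unstable"],
})

# Assumed rule: a negative stab value means the grid is stable.
derived = sample["stab"].lt(0).map({True: "stable", False: "unstable"})

# True if the rule reproduces stabf on every sampled row.
print((derived == sample["stabf"]).all())
```

On the real data, the same comparison can be run against `df` before `stab` is dropped.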

In [7]:
df.head()  # check the head of the DataFrame to confirm that stab has been dropped

,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4,stabf
0,2.95906,3.079885,8.381025,9.780754,3.763085,-0.782604,-1.257395,-1.723086,0.650456,0.859578,0.887445,0.958034,unstable
1,9.304097,4.902524,3.047541,1.369357,5.067812,-1.940058,-1.872742,-1.255012,0.413441,0.862414,0.562139,0.78176,stable
2,8.971707,8.848428,3.046479,1.214518,3.405158,-1.207456,-1.27721,-0.920492,0.163041,0.766689,0.839444,0.109853,unstable
3,0.716415,7.6696,4.486641,2.340563,3.963791,-1.027473,-1.938944,-0.997374,0.446209,0.976744,0.929381,0.362718,unstable
4,3.134112,7.608772,4.943759,9.857573,3.525811,-1.125531,-1.845975,-0.554305,0.79711,0.45545,0.656947,0.820923,unstable


In [8]:
df['stabf'].value_counts()  # count the stable and unstable samples in the stabf column

unstable    6380
stable      3620
Name: stabf, dtype: int64
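The classes are imbalanced at roughly 64% unstable to 36% stable. One option worth considering is a stratified split, which keeps that ratio the same in the training and test sets. The sketch below is not part of the original workflow: it uses synthetic labels with the same class counts and a placeholder feature column, and `X_demo`, `y_demo`, `y_tr` and `y_te` are illustrative names.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with the same class counts as the value_counts() output above.
y_demo = np.array(["unstable"] * 6380 + ["stable"] * 3620)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder feature column

# stratify=y_demo keeps the class proportions close to identical in both splits.
_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1, stratify=y_demo
)

# Share of 'stable' labels in the training and test splits.
print((y_tr == "stable").mean(), (y_te == "stable").mean())
```

Whether stratification changes the scores noticeably on this dataset is not something the notebook measures.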

In [9]:
# select the feature columns (x) and the target column (y)
x= df.drop(columns = 'stabf')
y= df['stabf']

In [10]:
#Splitting the data into training and testing sets using sklearn module
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)

In [11]:
y_test.unique()  # the target values are strings (object dtype)

array(['unstable', 'stable'], dtype=object)

In [12]:
# Scale the data with StandardScaler: subtract each column's mean and divide by its standard deviation.
# Note: once the data is standardized, no separate normalization step is needed.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

In [13]:
# Fit the scaler on the training data only (x_train),
# so that no statistics from the test set leak into it.
scaler = StandardScaler().fit(x_train)

In [14]:
# transform both the training and testing data
x_trainscaled=scaler.transform(x_train)
x_testscaled=scaler.transform(x_test)

In [15]:
#passing the scaled data into a pandas dataframe
x_trainscaled1=pd.DataFrame(x_trainscaled, columns=x_train.columns)
x_testscaled1=pd.DataFrame(x_testscaled, columns=x_test.columns)
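The fit-on-train, transform-both pattern above can also be expressed as a scikit-learn pipeline, which applies the same rule automatically. The sketch below is an alternative arrangement, not the notebook's own code: it runs on synthetic data from `make_classification` as a stand-in for the grid features, and `X_demo`, `y_demo` and `model` are illustrative names.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the grid data: 12 numeric features, binary target.
X_demo, y_demo = make_classification(n_samples=500, n_features=12, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=1)

# fit() learns the scaler statistics from the training data only;
# score() and predict() reuse them when transforming new data.
model = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=50, random_state=1),
)
model.fit(X_tr, y_tr)

# Accuracy on the held-out split of the synthetic data.
print(round(model.score(X_te, y_te), 3))
```

Tree-based models are generally insensitive to feature scaling, so the scaler here mainly keeps the preprocessing consistent if a different estimator is swapped in later.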

#### RandomForestClassifier

In [16]:
# Train a random forest classifier on the scaled training data
# and predict the labels of the scaled test data.
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier(random_state=1)
rfc.fit(x_trainscaled1, y_train)
rfc_pred=rfc.predict(x_testscaled1)

In [17]:
from sklearn.metrics import recall_score, accuracy_score, precision_score, f1_score, confusion_matrix
cnf_mat = confusion_matrix(y_test, rfc_pred, labels=['unstable', 'stable'])
cnf_mat

array([[1233,   55],
       [  87,  625]], dtype=int64)

In [18]:
print("Accuracy score {}".format(round(accuracy_score(y_test, rfc_pred), 4)))
print("Precision score for label stable %.4f" % (precision_score(y_test, rfc_pred, pos_label='stable')))
print("Recall score for label stable {}".format(round(recall_score(y_test, rfc_pred, pos_label='stable'), 4)))
print("F1 score %.4f" % (f1_score(y_test, rfc_pred, pos_label='stable')))


Accuracy score 0.929
Precision score for label stable 0.9191
Recall score for label stable 0.8778
F1 score 0.8980
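The four scores can be recomputed by hand from the confusion matrix, which is a useful check on how its rows and columns should be read. The sketch below takes the counts from the random forest matrix above (rows are true labels, columns are predicted labels, in the order `['unstable', 'stable']`) and treats `stable` as the positive class; the variable names are illustrative.

```python
# Counts copied from the random forest confusion matrix above,
# with 'stable' as the positive class.
tn = 1233  # true unstable, predicted unstable
fp = 55    # true unstable, predicted stable
fn = 87    # true stable, predicted unstable
tp = 625   # true stable, predicted stable

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 4), round(precision, 4), round(recall, 4), round(f1, 4))
```

The results agree with the values printed by scikit-learn in the cell above.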


#### ExtraTreesClassifier

In [19]:
# Extra Trees classifier
# Note: no random_state is set here, so the results vary slightly between runs.
from sklearn.ensemble import ExtraTreesClassifier
etc = ExtraTreesClassifier()
etc.fit(x_trainscaled1, y_train)
etc_pred = etc.predict(x_testscaled1)

In [20]:
# The metric functions were already imported in In [17], and etc_pred was computed in In [19].
# Without labels=..., rows and columns follow the sorted label order: ['stable', 'unstable'].
cnf_etc = confusion_matrix(y_test, etc_pred)
cnf_etc

array([[ 596,  116],
       [  35, 1253]], dtype=int64)

In [21]:
print("Accuracy score {}".format(round(accuracy_score(y_test, etc_pred), 4)))
print("Precision score for label stable %.4f" % (precision_score(y_test, etc_pred, pos_label='stable')))
print("Recall score for label stable {}".format(round(recall_score(y_test, etc_pred, pos_label='stable'), 4)))
print("F1 score %.4f" % (f1_score(y_test, etc_pred, pos_label='stable')))


Accuracy score 0.9245
Precision score for label stable 0.9445
Recall score for label stable 0.8371
F1 score 0.8876


#### XGBClassifier

In [22]:
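# XGBoost gradient boosting
# Note: newer XGBoost releases expect integer-encoded labels (e.g. via LabelEncoder);
# the string labels used here worked in the version this notebook was run with.
from xgboost import XGBClassifier
xgb = XGBClassifier(random_state=1)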
xgb.fit(x_trainscaled1, y_train)
xgb_pred = xgb.predict(x_testscaled1)

In [23]:
# The metric functions were already imported in In [17], and xgb_pred was computed in In [22].
# Rows and columns follow the sorted label order: ['stable', 'unstable'].
cnf_xgb = confusion_matrix(y_test, xgb_pred)
cnf_xgb

array([[ 648,   64],
       [  45, 1243]], dtype=int64)

In [24]:
print("Accuracy score {}".format(round(accuracy_score(y_test, xgb_pred), 4)))
print("Precision score for label stable %.4f" % (precision_score(y_test, xgb_pred, pos_label='stable')))
print("Recall score for label stable {}".format(round(recall_score(y_test, xgb_pred, pos_label='stable'), 4)))
print("F1 score %.4f" % (f1_score(y_test, xgb_pred, pos_label='stable')))

Accuracy score 0.9455
Precision score for label stable 0.9351
Recall score for label stable 0.9101
F1 score 0.9224


#### LGBMClassifier

In [25]:
# LightGBM gradient boosting
from lightgbm import LGBMClassifier
lgb = LGBMClassifier()
lgb.fit(x_trainscaled1, y_train)
lgb_pred = lgb.predict(x_testscaled1)

In [26]:
# The metric functions were already imported in In [17], and lgb_pred was computed in In [25].
# Rows and columns follow the sorted label order: ['stable', 'unstable'].
cnf_lgb = confusion_matrix(y_test, lgb_pred)
cnf_lgb

array([[ 635,   77],
       [  48, 1240]], dtype=int64)

In [27]:
print("Accuracy score {}".format(round(accuracy_score(y_test, lgb_pred), 4)))
print("Precision score for label stable %.4f" % (precision_score(y_test, lgb_pred, pos_label='stable')))
print("Recall score for label stable {}".format(round(recall_score(y_test, lgb_pred, pos_label='stable'), 4)))
print("F1 score %.4f" % (f1_score(y_test, lgb_pred, pos_label='stable')))

Accuracy score 0.9375
Precision score for label stable 0.9297
Recall score for label stable 0.8919
F1 score 0.9104


In [None]:
# To improve the Extra Trees Classifier, tune the following hyperparameters with a randomized search

In [28]:
n_estimators = [50, 100, 300, 500, 1000]
min_samples_split = [2, 3, 5, 7, 9]
min_samples_leaf = [1, 2, 4, 6, 8]
max_features = ['auto', 'sqrt', 'log2', None]  # note: 'auto' was removed in scikit-learn 1.3; drop it on newer versions
hyperparameter_grid = {'n_estimators': n_estimators,
                       'min_samples_leaf': min_samples_leaf,
                       'min_samples_split': min_samples_split,
                       'max_features': max_features}

In [29]:
from sklearn.model_selection import RandomizedSearchCV
rsv = RandomizedSearchCV(ExtraTreesClassifier(), hyperparameter_grid, cv=5, n_iter=10,
                         scoring='accuracy', n_jobs=-1, verbose=1, random_state=1)
search = rsv.fit(x_trainscaled1, y_train)

Fitting 5 folds for each of 10 candidates, totalling 50 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:   52.5s
[Parallel(n_jobs=-1)]: Done  50 out of  50 | elapsed:   58.8s finished


In [30]:
# best hyperparameter values found by the search (search.best_params_ shows the names as well)
search.best_params_.values()

dict_values([1000, 2, 8, None])
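Because `.values()` drops the parameter names, the output above has to be matched to the grid by position (here it most likely reads as `n_estimators=1000`, `min_samples_split=2`, `min_samples_leaf=8`, `max_features=None`, which is an inference from the value ranges). Printing the dictionary items avoids the guesswork. The sketch below shows this on a deliberately tiny grid and synthetic data so it runs in seconds; `small_grid` and `demo_search` are illustrative names, not the notebook's search.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the scaled training data.
X_demo, y_demo = make_classification(n_samples=300, n_features=12, random_state=1)

# Deliberately small grid for speed; not the grid used above.
small_grid = {
    "n_estimators": [10, 20],
    "min_samples_leaf": [1, 4],
    "max_features": ["sqrt", None],
}
demo_search = RandomizedSearchCV(
    ExtraTreesClassifier(random_state=1), small_grid,
    n_iter=4, cv=3, scoring="accuracy", random_state=1,
)
demo_search.fit(X_demo, y_demo)

# best_params_ is a plain dict, so each value stays attached to its name.
for name, value in demo_search.best_params_.items():
    print(f"{name} = {value}")
print("mean CV accuracy:", round(demo_search.best_score_, 4))
```

After the search, `best_estimator_` holds a model refit on the full training data with these parameters, which can be scored on the test set in the same way as the earlier classifiers.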

#### Some of the theory questions

1. ``Which of the following is not an Ensemble model?``

    Decision Tree



2. ``Why do we use weak learners in boosting?``

    To make the algorithm stronger


3. ``A data scientist is evaluating different binary classification models. A false positive result is 5 times more expensive (from a business perspective) than a false negative result. The models should be evaluated based on the following criteria:
1) Must have a recall rate of at least 80%
2) Must have a false positive rate of 10% or less
3) Must minimize business costs
After creating each binary classification model, the data scientist generates the corresponding confusion matrix. Which confusion matrix represents the model that satisfies the requirements?``

TN = 98%, FP = 2%, FN = 18%, TP = 82%
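
The chosen matrix can be checked against the first two criteria with a few lines of arithmetic. The sketch below uses only the percentages from the answer above; the other answer options are not reproduced in this notebook, so the cost figure is shown for reference and cannot be compared against them here. The unit costs (5 per false positive, 1 per false negative) follow the 5:1 ratio stated in the question.

```python
# Rates from the chosen answer, in percent.
tn, fp, fn, tp = 98, 2, 18, 82

recall = tp / (tp + fn)                # criterion 1: at least 0.80
false_positive_rate = fp / (fp + tn)   # criterion 2: at most 0.10
relative_cost = 5 * fp + 1 * fn        # a false positive costs 5x a false negative

print(recall >= 0.80, false_positive_rate <= 0.10, relative_cost)
```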
