# <font color =blue> UCI Electrical Grid Stability Simulated dataset

## Stability of the Grid System
Electrical grids require a balance between electricity supply and demand in order to be stable. Conventional systems achieve this balance through demand-driven electricity production. For future grids with a high share of inflexible (i.e., renewable) energy source, the concept of demand response is a promising solution. This implies changes in electricity consumption in relation to electricity price changes. In this work, we’ll build a binary classification model to predict if a grid is stable or unstable using the UCI Electrical Grid Stability Simulated dataset.

### ABOUT DATA

It has 12 primary predictive features and two dependent variables.

### Predictive features:

'tau1' to 'tau4': the reaction time of each network participant, a real value within the range 0.5 to 10 ('tau1' corresponds to the supplier node, 'tau2' to 'tau4' to the consumer nodes);
'p1' to 'p4': nominal power produced (positive) or consumed (negative) by each network participant, a real value within the range -2.0 to -0.5 for consumers ('p2' to 'p4'). As the total power consumed equals the total power generated, p1 (supplier node) = - (p2 + p3 + p4);
'g1' to 'g4': price elasticity coefficient for each network participant, a real value within the range 0.05 to 1.00 ('g1' corresponds to the supplier node, 'g2' to 'g4' to the consumer nodes; 'g' stands for 'gamma');

### Dependent variables:

'stab': the maximum real part of the characteristic differential equation root (if positive, the system is linearly unstable; if negative, linearly stable);
'stabf': a categorical (binary) label ('stable' or 'unstable').

Dataset: https://archive.ics.uci.edu/ml/datasets/Electrical+Grid+Stability+Simulated+Data+

In [12]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler


In [13]:
df= pd.read_csv(r"D:\Hamoye stage C\UCI Electrical Grid Stability Simulated dataset\Data_for_UCI_named.csv")

In [14]:
df.head()

Unnamed: 0,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4,stab,stabf
0,2.95906,3.079885,8.381025,9.780754,3.763085,-0.782604,-1.257395,-1.723086,0.650456,0.859578,0.887445,0.958034,0.055347,unstable
1,9.304097,4.902524,3.047541,1.369357,5.067812,-1.940058,-1.872742,-1.255012,0.413441,0.862414,0.562139,0.78176,-0.005957,stable
2,8.971707,8.848428,3.046479,1.214518,3.405158,-1.207456,-1.27721,-0.920492,0.163041,0.766689,0.839444,0.109853,0.003471,unstable
3,0.716415,7.6696,4.486641,2.340563,3.963791,-1.027473,-1.938944,-0.997374,0.446209,0.976744,0.929381,0.362718,0.028871,unstable
4,3.134112,7.608772,4.943759,9.857573,3.525811,-1.125531,-1.845975,-0.554305,0.79711,0.45545,0.656947,0.820923,0.04986,unstable


In [15]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   tau1    10000 non-null  float64
 1   tau2    10000 non-null  float64
 2   tau3    10000 non-null  float64
 3   tau4    10000 non-null  float64
 4   p1      10000 non-null  float64
 5   p2      10000 non-null  float64
 6   p3      10000 non-null  float64
 7   p4      10000 non-null  float64
 8   g1      10000 non-null  float64
 9   g2      10000 non-null  float64
 10  g3      10000 non-null  float64
 11  g4      10000 non-null  float64
 12  stab    10000 non-null  float64
 13  stabf   10000 non-null  object 
dtypes: float64(13), object(1)
memory usage: 1.1+ MB


In [16]:
df.isnull().sum()

tau1     0
tau2     0
tau3     0
tau4     0
p1       0
p2       0
p3       0
p4       0
g1       0
g2       0
g3       0
g4       0
stab     0
stabf    0
dtype: int64

## <font color =blue>  No missing values in the dataset

**Drop Stab column as it is direct correlated with 'stabf'**

In [17]:
df = df.drop('stab',axis=1)

In [18]:
df.head()

Unnamed: 0,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4,stabf
0,2.95906,3.079885,8.381025,9.780754,3.763085,-0.782604,-1.257395,-1.723086,0.650456,0.859578,0.887445,0.958034,unstable
1,9.304097,4.902524,3.047541,1.369357,5.067812,-1.940058,-1.872742,-1.255012,0.413441,0.862414,0.562139,0.78176,stable
2,8.971707,8.848428,3.046479,1.214518,3.405158,-1.207456,-1.27721,-0.920492,0.163041,0.766689,0.839444,0.109853,unstable
3,0.716415,7.6696,4.486641,2.340563,3.963791,-1.027473,-1.938944,-0.997374,0.446209,0.976744,0.929381,0.362718,unstable
4,3.134112,7.608772,4.943759,9.857573,3.525811,-1.125531,-1.845975,-0.554305,0.79711,0.45545,0.656947,0.820923,unstable


In [19]:
#check distribution of target variable

df['stabf'].value_counts()

unstable    6380
stable      3620
Name: stabf, dtype: int64

In [20]:
# #encode the categorical values
# label = LabelEncoder()
# df['stabf'] = label.fit(df['stabf']).transform(df['stabf'])
# df.head()

In [21]:
X= df.drop(columns=['stabf'])
y= df['stabf']

In [22]:
#split the data into training and testing sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size= 0.2 , random_state= 1 )
print('x_train sahpe :{}'.format(x_train.shape))
print('y_train sahpe :{}'.format(y_train.shape))
print('x_test sahpe :{}'.format(x_test.shape))
print('y_test sahpe :{}'.format(y_test.shape))

x_train sahpe :(8000, 12)
y_train sahpe :(8000,)
x_test sahpe :(2000, 12)
y_test sahpe :(2000,)


In [23]:
#transform train and test set using standard scaler
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train,y_train)
x_test_scaled = scaler.transform(x_test)


In [24]:
# put the scaled sets into a daataframe

x_train_scaled = pd.DataFrame(x_train_scaled, columns = x_train.columns)
x_test_scaled = pd.DataFrame(x_test_scaled, columns = x_test.columns)

# <font color = Red> *Train model using RandomForestClassifier*

In [25]:
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(random_state = 1)

# fit the train set 
rf.fit(x_train_scaled,y_train)

RandomForestClassifier(random_state=1)

In [26]:
# predict 

rf_pred = rf.predict(x_test_scaled)
rf_pred

array(['unstable', 'unstable', 'stable', ..., 'stable', 'stable',
       'unstable'], dtype=object)

### ***Measuring Model Performance for RandomForestClassifier***

### <font color = blue> *Accuracy of RandomForestClassifier*

In [27]:
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test,rf_pred)

print( 'Accuracy of RandomForestClassifier : {:.4f}' .format(accuracy))

Accuracy of RandomForestClassifier : 0.9290


### <font color = blue> *Precision of RandomForestClassifier*

In [28]:
from sklearn.metrics import recall_score, precision_score, f1_score, confusion_matrix, classification_report

#precision
precision = precision_score(y_test, rf_pred, pos_label='stable')
print('Precision of RandomForestClassifier: {:.2f}'.format((precision*100)))  


Precision of RandomForestClassifier: 91.91


### <font color = blue> *Recall for  RandomForestClassifier*

In [29]:
recall = recall_score(y_test, rf_pred, pos_label='stable')
print('Recall for  RandomForestClassifier: {}'.format(round(recall*100), 2))

Recall for  RandomForestClassifier: 88


### <font color = blue> *F1 score of RandomForestClassifier*

In [30]:
f1 = f1_score(y_test, rf_pred, pos_label='stable')
print('F1: {}'.format(round(f1*100), 2))

F1: 90


### <font color = blue> *classification Report  for RandomForestClassifier*

In [31]:
print('Classification Report:\n', classification_report(y_test,rf_pred, digits =4))

Classification Report:
               precision    recall  f1-score   support

      stable     0.9191    0.8778    0.8980       712
    unstable     0.9341    0.9573    0.9456      1288

    accuracy                         0.9290      2000
   macro avg     0.9266    0.9176    0.9218      2000
weighted avg     0.9288    0.9290    0.9286      2000



### <font color = blue> *confusion matrix for RandomForestClassifier*

In [32]:
#confusion matrix
rf_cnf_mat = confusion_matrix(y_test, rf_pred, labels=['unstable', 'stable'])
print('Confusion Matrix:\n', rf_cnf_mat)

Confusion Matrix:
 [[1233   55]
 [  87  625]]


### <font color = blue> *Training set score & Test set score RandomForestClassifier*

In [33]:
print("Training set score: {:.3f}".format(rf.score(x_train_scaled, y_train)))
print("Test set score: {:.3f}".format(rf.score(x_test_scaled, y_test)))

Training set score: 1.000
Test set score: 0.929


# <font color = Red> *Train model using XGBoost*

In [34]:
import warnings

from xgboost import XGBClassifier

xgb = XGBClassifier(random_state = 1)

# fit on train set
xgb.fit(x_train_scaled,y_train)





XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.300000012, max_delta_step=0, max_depth=6,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=100, n_jobs=4, num_parallel_tree=1, random_state=1,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

In [35]:
# predict the model
xgb_pred= xgb.predict(x_test_scaled)

In [37]:
### ***Measuring Model Performance for XGBoost***

In [38]:
### Accuracy of XGBoost*

accuracy = accuracy_score(y_test,xgb_pred)
print( 'Accuracy of XGBoost: {:.4f}' .format(accuracy))

## Precision of ExtraTreesClassifier*

precision = precision_score(y_test, xgb_pred, pos_label='stable')
print('Precision of XGBoost: {}'.format(round(precision*100), 2))  


### <font color = blue> *Recall for ExtraTreesClassifier*

recall = recall_score(y_test, xgb_pred, pos_label='stable')
print('Recall for  XGBoost: {}'.format(round(recall*100), 2))

### <font color = blue> *F1 score of ExtraTreesClassifier*

f1 = f1_score(y_test, xgb_pred, pos_label='stable')
print('F1: {}'.format(round(f1*100), 2))

### <font color = blue> *classification Report  for ExtraTreesClassifier*

print('Classification Report:\n', classification_report(y_test,xgb_pred, digits =4))

### <font color = blue> *confusion matrix for ExtraTreesClassifier*

etc_cnf_mat = confusion_matrix(y_test, xgb_pred, labels=['unstable', 'stable'])
print('Confusion Matrix:\n', etc_cnf_mat)




Accuracy of XGBoost: 0.9455
Precision of XGBoost: 94
Recall for  XGBoost: 91
F1: 92
Classification Report:
               precision    recall  f1-score   support

      stable     0.9351    0.9101    0.9224       712
    unstable     0.9510    0.9651    0.9580      1288

    accuracy                         0.9455      2000
   macro avg     0.9430    0.9376    0.9402      2000
weighted avg     0.9453    0.9455    0.9453      2000

Confusion Matrix:
 [[1243   45]
 [  64  648]]


In [36]:
### <font color = blue> *Training set score & Test set score ExtraTreesClassifier*
print("Training set score: {:.3f}".format(xgb.score(x_train_scaled, y_train)))
print("Test set score: {:.3f}".format(xgb.score(x_test_scaled, y_test)))

Training set score: 1.000
Test set score: 0.946


# <font color = Red> *Train model using LightGBM*

In [119]:
from lightgbm import LGBMClassifier

lgbm= LGBMClassifier(random_state = 1)

#fit on train set
lgbm.fit(x_train_scaled, y_train)

LGBMClassifier(random_state=1)

In [120]:
# predict the model
lgbm_pred= lgbm.predict(x_test_scaled)

### ***Measuring Model Performance for LightGBM***

In [148]:


### <font color = blue> *Accuracy of LightGBM*

accuracy = accuracy_score(y_test,lgbm_pred)
print( 'Accuracy of LightGBM: {:.4f}' .format(accuracy))

### <font color = blue> *Precision of ExtraTreesClassifier*

precision = precision_score(y_test, lgbm_pred, pos_label='stable')
print('Precision of LightGBM: {}'.format(round(precision*100), 2))  


### <font color = blue> *Recall for ExtraTreesClassifier*

recall = recall_score(y_test, lgbm_pred, pos_label='stable')
print('Recall for  LightGBM: {}'.format(round(recall*100), 2))

### <font color = blue> *F1 score of ExtraTreesClassifier*

f1 = f1_score(y_test, lgbm_pred, pos_label='stable')
print('F1: {}'.format(round(f1*100), 2))

### <font color = blue> *classification Report  for ExtraTreesClassifier*

print('Classification Report:\n', classification_report(y_test,xgb_pred, digits =4))

### <font color = blue> *confusion matrix for ExtraTreesClassifier*

etc_cnf_mat = confusion_matrix(y_test, lgbm_pred, labels=['unstable', 'stable'])
print('Confusion Matrix:\n', etc_cnf_mat)

### <font color = blue> *Training set score & Test set score ExtraTreesClassifier*

print("Training set score: {:.3f}".format(lgbm.score(x_train_scaled, y_train)))
print("Test set score: {:.3f}".format(lgbm.score(x_test_scaled, y_test)))

Accuracy of LightGBM: 0.9280
Precision of LightGBM: 94
Recall for  LightGBM: 85
F1: 89
Classification Report:
               precision    recall  f1-score   support

      stable     0.9351    0.9101    0.9224       712
    unstable     0.9510    0.9651    0.9580      1288

    accuracy                         0.9455      2000
   macro avg     0.9430    0.9376    0.9402      2000
weighted avg     0.9453    0.9455    0.9453      2000

Confusion Matrix:
 [[1250   38]
 [ 106  606]]
Training set score: 0.998
Test set score: 0.940


# <font color = Red> *Train model using extra trees classifier*

In [39]:
from sklearn.ensemble import ExtraTreesClassifier
etc = ExtraTreesClassifier(random_state = 1)

#fit on the trining set
etc.fit(x_train_scaled,y_train)

etc_pred = etc.predict(x_test_scaled)

### ***Measuring Model Performance for ExtraTreesClassifier***

### <font color = blue> *Accuracy of ExtraTreesClassifier*

In [40]:

### <font color = blue> *Accuracy of ExtraTreesClassifier*

accuracy = accuracy_score(y_test,etc_pred)
print( 'Accuracy of ExtraTreesClassifier: {:.4f}' .format(accuracy))

### <font color = blue> *Precision of ExtraTreesClassifier*

precision = precision_score(y_test, etc_pred, pos_label='stable')
print('Precision of ExtraTreesClassifierM: {}'.format(round(precision*100), 2))  


### <font color = blue> *Recall for ExtraTreesClassifier*

recall = recall_score(y_test, etc_pred, pos_label='stable')
print('Recall for  ExtraTreesClassifier: {}'.format(round(recall*100), 2))

### <font color = blue> *F1 score of ExtraTreesClassifier*

f1 = f1_score(y_test, etc_pred, pos_label='stable')
print('F1: {}'.format(round(f1*100), 2))

### <font color = blue> *classification Report  for ExtraTreesClassifier*

print('Classification Report:\n', classification_report(y_test,xgb_pred, digits =4))

### <font color = blue> *confusion matrix for ExtraTreesClassifier*

etc_cnf_mat = confusion_matrix(y_test, etc_pred, labels=['unstable', 'stable'])
print('Confusion Matrix:\n', etc_cnf_mat)

### <font color = blue> *Training set score & Test set score ExtraTreesClassifier*

print("Training set score: {:.3f}".format(etc.score(x_train_scaled, y_train)))
print("Test set score: {:.3f}".format(etc.score(x_test_scaled, y_test)))

Accuracy of ExtraTreesClassifier: 0.9280
Precision of ExtraTreesClassifierM: 94
Recall for  ExtraTreesClassifier: 85
F1: 89
Classification Report:
               precision    recall  f1-score   support

      stable     0.9351    0.9101    0.9224       712
    unstable     0.9510    0.9651    0.9580      1288

    accuracy                         0.9455      2000
   macro avg     0.9430    0.9376    0.9402      2000
weighted avg     0.9453    0.9455    0.9453      2000

Confusion Matrix:
 [[1250   38]
 [ 106  606]]
Training set score: 1.000
Test set score: 0.928


# <font color = Red> *Improving ExtraTreesClassifier*

## <font color = Red> Question 17

### <font color = Red>To improve the Extra Trees Classifier, you will use the following parameters (number of estimators, minimum number of samples, minimum number of samples for leaf node and the number of features to consider when looking for the best split) for the hyperparameter grid needed to run a Randomized Cross Validation Search (RandomizedSearchCV).

### <font color = Red> n_estimators = [50, 100, 300, 500, 1000]

### <font color = Red>min_samples_split = [2, 3, 5, 7, 9]

### <font color = Red>min_samples_leaf = [1, 2, 4, 6, 8]

### <font color = Red>max_features = ['auto', 'sqrt', 'log2', None]

### <font color = Red>hyperparameter_grid = {'n_estimators': n_estimators,

### <font color = Red>'min_samples_leaf': min_samples_leaf,

### <font color = Red>'min_samples_split': min_samples_split,

### <font color = Red>'max_features': max_features}

### <font color = Red>Using the ExtraTreesClassifier as your estimator with cv=5, n_iter=10, scoring = 'accuracy', n_jobs = -1, verbose = 1 and random_state = 1. What are the best hyperparameters from the randomized search CV?</font>

In [41]:
#combination of hyperparameters
n_estimators = [50, 100, 300, 500, 1000]

min_samples_split = [2, 3, 5, 7, 9]

min_samples_leaf = [1, 2, 4, 6, 8]

max_features = ['auto', 'sqrt', 'log2', None] 

hyperparameter_grid = {'n_estimators': n_estimators,

                       'min_samples_leaf': min_samples_leaf,

                       'min_samples_split': min_samples_split,

                       'max_features': max_features}

In [42]:
from sklearn.model_selection import RandomizedSearchCV

# Random Search for hyper-parameter optimization
randomscv = RandomizedSearchCV(estimator = etc, param_distributions = hyperparameter_grid, 
                          cv = 5, n_iter = 10, scoring = 'accuracy', 
                          n_jobs = -1, verbose = 1, random_state = 1)


In [43]:

#fit on the training data
search = randomscv.fit(x_train_scaled, y_train)

Fitting 5 folds for each of 10 candidates, totalling 50 fits


In [48]:
# best hyperparameters from the randomized search CV

print('best hyperparameters from the randomized search CV: {}'.format(randomscv.best_params_))

best hyperparameters from the randomized search CV: {'n_estimators': 1000, 'min_samples_split': 2, 'min_samples_leaf': 8, 'max_features': None}


### <font color = Red > Answer for Question 17 : best hyperparameters from the randomized search CV: 
### <font color = Blue>{'n_estimators': 1000, 'min_samples_split': 2, 'min_samples_leaf': 8, 'max_features': None}

In [126]:
#get best score
search.best_score_

0.9241249999999999

In [54]:
#Evaluate ExtraTreesClassifier on test set using  best params
extra_etc = ExtraTreesClassifier(max_features = None, 
                            min_samples_leaf= 8,
                            min_samples_split= 2,
                            n_estimators= 1000, 
                            random_state = 1)

#fit on train set
extra_etc.fit(x_train_scaled, y_train)

ExtraTreesClassifier(max_features=None, min_samples_leaf=8, n_estimators=1000,
                     random_state=1)

In [55]:
#predict on test set
extra_etc_pred = extra_etc.predict(x_test_scaled)

### ***Measuring Model Performance for ExtraTreesClassifier***

### <font color = blue> *Accuracy of ExtraTreesClassifier*

In [56]:


accuracy = accuracy_score(y_test,extra_etc_pred)
print( 'Accuracy of extra_etc: {:.4f}' .format((accuracy)))

# ### <font color = blue> *Precision of ExtraTreesClassifier*

# # precision = precision_score(y_test, extra_etc_pred, pos_label='stable')
# # print('Precision of ExtraTreesClassifier: {}'.format(round(precision*100), 2))  
 


# ### <font color = blue> *Recall for ExtraTreesClassifier*

# recall = recall_score(y_test, extra_etc_pred, pos_label='stable')
# print('Recall for  ExtraTreesClassifier: {}'.format(round(recall*100), 2))

# ### <font color = blue> *F1 score of ExtraTreesClassifier*

# f1 = f1_score(y_test, extra_etc_pred, pos_label='stable')
# print('F1: {}'.format(round(f1*100), 2))

# ### <font color = blue> *classification Report  for ExtraTreesClassifier*

# print('Classification Report:\n', classification_report(y_test,extra_etc_pred, digits =4))

# ### <font color = blue> *confusion matrix for ExtraTreesClassifier*

# etc_cnf_mat = confusion_matrix(y_test, extra_etc_pred, labels=['unstable', 'stable'])
# print('Confusion Matrix:\n', etc_cnf_mat)

# ### <font color = blue> *Training set score & Test set score ExtraTreesClassifier*

# print("Training set score: {:.3f}".format(extra_etc.score(x_train_scaled, y_train)))
# print("Test set score: {:.3f}".format(extra_etc.score(x_test_scaled, y_test)))

Accuracy of extra_etc: 0.9270


## <font color = Red> Question 18
## <font color = Red> Train a new ExtraTreesClassifier Model with the new Hyperparameters from the RandomizedSearchCV (with random_state = 1). Is the accuracy of the new optimal model higher or lower than the initial ExtraTreesClassifier model with no hyperparameter tuning?
    
### <font color = Blue> Answer : Accuracy of extra_etc with hyper tuning : 0.9270 < Accuracy of ExtraTreesClassifier: 0.9280

## <font color = Red> Question 20

### <font color = Red> Find the feature importance using the optimal ExtraTreesClassifier model. Which features are the most and least important respectively?

In [202]:
feature = X.columns

# features importance
feat_importance = pd.DataFrame(etc.feature_importances_,index=feature)
feat = feat_importance.sort_values(0)
feat

Unnamed: 0,0
p1,0.039507
p2,0.040371
p4,0.040579
p3,0.040706
g1,0.089783
g2,0.093676
g4,0.094019
g3,0.096883
tau3,0.113169
tau4,0.115466


In [203]:
# most important feature
print('most important feature: {}'.format(feat.idxmax()))

# least important feature
print('least important feature: {}'.format(feat.idxmin()))

most important feature: 0    tau2
dtype: object
least important feature: 0    p1
dtype: object


### <font color = Red> Answer for Question20  : most important feature:    tau2
 
### <font color = Red> least important feature:    p1

## <font color = Red> Question 1
## <font color = Red>You are working on a spam classification system using regularized logistic regression. “Spam” is a positive class (y = 1) and “not spam” is the negative class (y = 0). You have trained your classifier and there are n = 2000 examples in the test set. The confusion matrix of predicted class vs. actual class is:</font>



In [3]:
# Show the correlation matrix
corr_data = [[355, 1480], [45, 120]]
corr_matrix = pd.DataFrame(corr_data, index=['Predicted Spam', 'Predicted Not Spam'],columns=['Actual Spam', 'Actual Not Spam'])

corr_matrix

Unnamed: 0,Actual Spam,Actual Not Spam
Predicted Spam,355,1480
Predicted Not Spam,45,120


What is the F1 score of this classifier?

In [6]:

#calculate f1_score
tp = 355
tn = 120
fp = 1480
fn = 45

precision = tp / (tp + fp)
recall= tp / (tp + fn)
f1_score=  2*((precision* recall)/(precision+recall))
round(f1_score,4)

0.3177

## <font color = Red> Question 14

## <font color = Red> What is the accuracy on the test set using the random forest classifier? In 4 decimal places.</font>

In [57]:
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(random_state = 1)

# fit the train set 
rf.fit(x_train_scaled,y_train)

rf_pred = rf.predict(x_test_scaled)
accuracy = accuracy_score(y_test,rf_pred)

print( 'Accuracy of ExtraTreesClassifier: {:.4f}' .format((accuracy)))

Accuracy of ExtraTreesClassifier: 0.9290


## <font color = Red> Question 15

## <font color = Red> What is the accuracy on the test set using the xgboost classifier? In 4 decimal places..</font>

In [58]:
import warnings

from xgboost import XGBClassifier

xgb = XGBClassifier(random_state = 1)

# fit on train set
xgb.fit(x_train_scaled,y_train)
xgb_pred = xgb.predict(x_test_scaled)
accuracy = accuracy_score(y_test,xgb_pred)
print( 'Accuracy of ExtraTreesClassifier: {:.4f}' .format((accuracy)))





Accuracy of ExtraTreesClassifier: 0.9455


## <font color = Red> Question 16

## <font color = Red> What is the accuracy on the test set using the LGBM classifier? In 4 decimal places.</font>



In [60]:
from lightgbm import LGBMClassifier

lgbm= LGBMClassifier(random_state = 1)

#fit on train set
lgbm.fit(x_train_scaled, y_train)
lgbm_pred = etc.predict(x_test_scaled)
accuracy = accuracy_score(y_test,lgbm_pred)

print( 'Accuracy of ExtraTreesClassifier: {:.4f}' .format((accuracy)))

Accuracy of ExtraTreesClassifier: 0.9280
