<a href="https://colab.research.google.com/github/MadeaRiggs/Classification-for-Machine-Learning/blob/main/Electrical_grids.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Predictive features:

* 'tau1' to 'tau4': the reaction time of each network participant, a real value in the range 0.5 to 10 ('tau1' corresponds to the supplier node, 'tau2' to 'tau4' to the consumer nodes);
* 'p1' to 'p4': the nominal power produced (positive) or consumed (negative) by each network participant. For the consumers ('p2' to 'p4') this is a real value in the range -2.0 to -0.5. Because the total power consumed equals the total power generated, the supplier node satisfies p1 = -(p2 + p3 + p4);
* 'g1' to 'g4': the price elasticity coefficient of each network participant, a real value in the range 0.05 to 1.00 ('g1' corresponds to the supplier node, 'g2' to 'g4' to the consumer nodes; 'g' stands for 'gamma').
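
The power-balance constraint on 'p1' can be checked by hand. The sketch below uses the 'p1' to 'p4' values of rows 0 and 1 from the dataset preview further down; the variable names are illustrative only.

```python
# (p1, p2, p3, p4) for rows 0 and 1 of the dataset preview
rows = [
    (3.763085, -0.782604, -1.257395, -1.723086),
    (5.067812, -1.940058, -1.872742, -1.255012),
]

# Residual of p1 = -(p2 + p3 + p4); it should be ~0, up to the
# six-decimal rounding of the printed values and float error.
residuals = [p1 + (p2 + p3 + p4) for p1, p2, p3, p4 in rows]
for residual in residuals:
    print(f"{abs(residual):.6f}")
```

Because 'p1' is fully determined by 'p2' to 'p4', it adds no information beyond those three columns.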

## Dependent variables:

* 'stab': the maximum real part of the characteristic differential equation root (if positive, the system is linearly unstable; if negative, linearly stable);
* 'stabf': a categorical (binary) label ('stable' or 'unstable').

'stabf' is derived directly from 'stab' ('stabf' = 'stable' if 'stab' <= 0, 'unstable' otherwise), so 'stab' is dropped from the features and 'stabf' remains as the sole dependent variable (binary classification).
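
The rule linking the two dependent variables can be written as a one-line function. This is a minimal sketch; `label_from_stab` is a hypothetical helper name, and the sample values are the 'stab' entries of rows 0 to 2 in the dataset preview.

```python
def label_from_stab(stab: float) -> str:
    """Map the continuous 'stab' value to the binary 'stabf' label."""
    return "stable" if stab <= 0 else "unstable"


# 'stab' values taken from rows 0, 1 and 2 of the dataset preview
for stab in (0.055347, -0.005957, 0.003471):
    print(stab, label_from_stab(stab))
```

Keeping 'stab' among the features would hand the label to the classifier, which is why it is removed before training.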

In [164]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [165]:
electrical_grid= pd.read_csv('/content/electrical grid data.csv')
electrical_grid

,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4,stab,stabf
0,2.959060,3.079885,8.381025,9.780754,3.763085,-0.782604,-1.257395,-1.723086,0.650456,0.859578,0.887445,0.958034,0.055347,unstable
1,9.304097,4.902524,3.047541,1.369357,5.067812,-1.940058,-1.872742,-1.255012,0.413441,0.862414,0.562139,0.781760,-0.005957,stable
2,8.971707,8.848428,3.046479,1.214518,3.405158,-1.207456,-1.277210,-0.920492,0.163041,0.766689,0.839444,0.109853,0.003471,unstable
3,0.716415,7.669600,4.486641,2.340563,3.963791,-1.027473,-1.938944,-0.997374,0.446209,0.976744,0.929381,0.362718,0.028871,unstable
4,3.134112,7.608772,4.943759,9.857573,3.525811,-1.125531,-1.845975,-0.554305,0.797110,0.455450,0.656947,0.820923,0.049860,unstable
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,2.930406,9.487627,2.376523,6.187797,3.343416,-0.658054,-1.449106,-1.236256,0.601709,0.779642,0.813512,0.608385,0.023892,unstable
9996,3.392299,1.274827,2.954947,6.894759,4.349512,-1.663661,-0.952437,-1.733414,0.502079,0.567242,0.285880,0.366120,-0.025803,stable
9997,2.364034,2.842030,8.776391,1.008906,4.299976,-1.380719,-0.943884,-1.975373,0.487838,0.986505,0.149286,0.145984,-0.031810,stable
9998,9.631511,3.994398,2.757071,7.821347,2.514755,-0.966330,-0.649915,-0.898510,0.365246,0.587558,0.889118,0.818391,0.037789,unstable


In [166]:
#checking distribution of the dependent variable
#'stab' is continuous, so value_counts() is only informative for the class label 'stabf'
electrical_grid['stabf'].value_counts()

unstable    6380
stable      3620
Name: stabf, dtype: int64
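
The class counts above give a useful reference point for the accuracy scores reported later. As a rough sketch, the share of the majority class is the accuracy of a classifier that always predicts 'unstable':

```python
# Class counts copied from the value_counts() output above
counts = {"unstable": 6380, "stable": 3620}

total = sum(counts.values())
proportions = {label: n / total for label, n in counts.items()}

# Accuracy of always predicting the most frequent class
majority_baseline = max(proportions.values())
print(proportions)
print(majority_baseline)
```

Any model worth keeping should score clearly above this baseline of roughly 64%.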

In [167]:
#checking for null values
electrical_grid.isna().sum()

tau1     0
tau2     0
tau3     0
tau4     0
p1       0
p2       0
p3       0
p4       0
g1       0
g2       0
g3       0
g4       0
stab     0
stabf    0
dtype: int64

In [168]:
electrical_grid.describe()

,tau1,tau2,tau3,tau4,p1,p2,p3,p4,g1,g2,g3,g4,stab
count,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0
mean,5.25,5.250001,5.250004,5.249997,3.75,-1.25,-1.25,-1.25,0.525,0.525,0.525,0.525,0.015731
std,2.742548,2.742549,2.742549,2.742556,0.75216,0.433035,0.433035,0.433035,0.274256,0.274255,0.274255,0.274255,0.036919
min,0.500793,0.500141,0.500788,0.500473,1.58259,-1.999891,-1.999945,-1.999926,0.050009,0.050053,0.050054,0.050028,-0.08076
25%,2.874892,2.87514,2.875522,2.87495,3.2183,-1.624901,-1.625025,-1.62496,0.287521,0.287552,0.287514,0.287494,-0.015557
50%,5.250004,5.249981,5.249979,5.249734,3.751025,-1.249966,-1.249974,-1.250007,0.525009,0.525003,0.525015,0.525002,0.017142
75%,7.62469,7.624893,7.624948,7.624838,4.28242,-0.874977,-0.875043,-0.875065,0.762435,0.76249,0.76244,0.762433,0.044878
max,9.999469,9.999837,9.99945,9.999443,5.864418,-0.500108,-0.500072,-0.500025,0.999937,0.999944,0.999982,0.99993,0.109403


In [169]:
electrical_grid.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   tau1    10000 non-null  float64
 1   tau2    10000 non-null  float64
 2   tau3    10000 non-null  float64
 3   tau4    10000 non-null  float64
 4   p1      10000 non-null  float64
 5   p2      10000 non-null  float64
 6   p3      10000 non-null  float64
 7   p4      10000 non-null  float64
 8   g1      10000 non-null  float64
 9   g2      10000 non-null  float64
 10  g3      10000 non-null  float64
 11  g4      10000 non-null  float64
 12  stab    10000 non-null  float64
 13  stabf   10000 non-null  object 
dtypes: float64(13), object(1)
memory usage: 1.1+ MB


In [170]:
from sklearn.preprocessing import LabelEncoder
encoder= LabelEncoder()
electrical_grid['stabf']= encoder.fit_transform(electrical_grid['stabf'])
electrical_grid['stabf']

0       1
1       0
2       1
3       1
4       1
       ..
9995    1
9996    0
9997    0
9998    1
9999    1
Name: stabf, Length: 10000, dtype: int64
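
To see which integer stands for which class, the fitted encoder can be inspected. The sketch below refits a `LabelEncoder` on a three-element toy list rather than on the real column; `classes_` holds the labels in sorted order, which is why 'stable' becomes 0 and 'unstable' becomes 1.

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
codes = encoder.fit_transform(["unstable", "stable", "unstable"])

# classes_ is sorted, so index 0 is 'stable' and index 1 is 'unstable'
mapping = {label: code for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform recovers the original string labels
print(encoder.inverse_transform([1, 0]))
```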

In [171]:
#splitting dataset for training
from sklearn.model_selection import train_test_split
X= electrical_grid.drop(columns= ['stab', 'stabf'])
y= electrical_grid['stabf']

x_train, x_test, y_train, y_test= train_test_split(X, y, test_size=0.2, random_state=1)

In [172]:
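#the dependent variable is imbalanced (6380 unstable vs 3620 stable), so oversample the minority class
from imblearn.over_sampling import SMOTE
smote = SMOTE(random_state=1)

#resample the data
#note: this resamples the full dataset, and the result is not used by the models below,
#which are fit on x_train/y_train; to balance the training data without leaking
#synthetic samples into the test set, resample x_train/y_train only
X_resampled, y_resampled = smote.fit_resample(X, y)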

#check the class distribution after balancing
print("After balancing:", y_resampled.value_counts())


After balancing: 1    6380
0    6380
Name: stabf, dtype: int64
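
Balancing is usually applied to the training split only, so that the test set keeps the original class ratio. The sketch below illustrates the idea on hypothetical toy data. It uses plain random oversampling with NumPy as a simpler stand-in for SMOTE; with imblearn, the same pattern would call `fit_resample` on the training split instead.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Hypothetical toy data: 1000 rows, 12 features, roughly 64% of class 1
X_toy = rng.normal(size=(1000, 12))
y_toy = (rng.random(1000) < 0.638).astype(int)

# Split first; stratify keeps the class ratio equal in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=1, stratify=y_toy
)

# Random oversampling (stand-in for SMOTE): duplicate minority rows
# of the training split until both classes have the same count.
counts = np.bincount(y_tr)
minority = int(np.argmin(counts))
deficit = int(counts.max() - counts.min())
minority_idx = np.flatnonzero(y_tr == minority)
extra = rng.choice(minority_idx, size=deficit, replace=True)

X_bal = np.vstack([X_tr, X_tr[extra]])
y_bal = np.concatenate([y_tr, y_tr[extra]])

print("train before:", counts, "train after:", np.bincount(y_bal))
print("test (untouched):", np.bincount(y_te))
```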


In [173]:
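#standardize values using StandardScaler (fit on the training set only, then reuse its statistics on the test set)
#note: the tree-based models below are fit on the unscaled x_train/x_test, so these scaled arrays are not used downstream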
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

#fit and transform the training set
x_train_scaled = scaler.fit_transform(x_train)

#transform the test set
x_test_scaled = scaler.transform(x_test)
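
What `fit_transform` and `transform` do can be verified on a small example. The sketch below uses hypothetical uniform toy arrays in place of `x_train` and `x_test`: the training columns end up with mean 0 and standard deviation 1, while the test columns are shifted and scaled with the training statistics and so are only approximately standardized.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Hypothetical stand-ins for x_train / x_test, drawn from the 'tau' range
toy_train = rng.uniform(0.5, 10.0, size=(800, 4))
toy_test = rng.uniform(0.5, 10.0, size=(200, 4))

scaler = StandardScaler()
toy_train_scaled = scaler.fit_transform(toy_train)  # learns mean_ and scale_
toy_test_scaled = scaler.transform(toy_test)        # reuses the training statistics

print(toy_train_scaled.mean(axis=0).round(6))
print(toy_train_scaled.std(axis=0).round(6))
print(toy_test_scaled.mean(axis=0).round(3))
```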


In [174]:
x_train.shape

(8000, 12)

In [175]:
y_train.shape

(8000,)

In [176]:
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.metrics import accuracy_score

#random forest classifier
rf_clf = RandomForestClassifier(n_estimators=100, random_state=1)
rf_clf.fit(x_train, y_train)
rf_predictions = rf_clf.predict(x_test)
rf_accuracy = accuracy_score(y_test, rf_predictions)
print("Random Forest Accuracy:", rf_accuracy)

#extra trees classifier
et_clf = ExtraTreesClassifier(n_estimators=100, random_state=1)
et_clf.fit(x_train, y_train)
et_predictions = et_clf.predict(x_test)
et_accuracy = accuracy_score(y_test, et_predictions)
print("Extra Trees Accuracy:", et_accuracy)


Random Forest Accuracy: 0.929
Extra Trees Accuracy: 0.928
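
Since the classes are imbalanced, accuracy alone can hide how each class is handled. As a hedged sketch on hypothetical toy data (not the grid dataset, and with fewer trees to keep it quick), the same two classifiers can be compared in a loop that also reports the F1 score and the confusion matrix:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced toy data standing in for the grid dataset
X_toy, y_toy = make_classification(
    n_samples=1000, n_features=12, weights=[0.36, 0.64], random_state=1
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=1
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=1),
    "Extra Trees": ExtraTreesClassifier(n_estimators=50, random_state=1),
}

results = {}
for name, clf in models.items():
    pred = clf.fit(X_tr, y_tr).predict(X_te)
    results[name] = {
        "accuracy": accuracy_score(y_te, pred),
        "f1": f1_score(y_te, pred),
        # rows are true classes, columns are predicted classes
        "confusion": confusion_matrix(y_te, pred),
    }
    print(name, results[name]["accuracy"], results[name]["f1"])
    print(results[name]["confusion"])
```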


In [177]:
import xgboost as xgb
import lightgbm as lgb

#xgboost model
xgb_model = xgb.XGBClassifier(n_estimators=100, random_state=1)
xgb_model.fit(x_train, y_train)
xgb_predictions = xgb_model.predict(x_test)
xgb_accuracy = accuracy_score(y_test, xgb_predictions)
print("XGBoost Accuracy:", xgb_accuracy)

#lightgbm model
lgb_model = lgb.LGBMClassifier(n_estimators=100, random_state=1)
lgb_model.fit(x_train, y_train)
lgb_predictions = lgb_model.predict(x_test)
lgb_accuracy = accuracy_score(y_test, lgb_predictions)
print("LightGBM Accuracy:", lgb_accuracy)


XGBoost Accuracy: 0.9455
LightGBM Accuracy: 0.939


In [178]:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint

#define the parameter distributions to sample from (scipy's randint excludes the upper bound)
param_grid = {
    'n_estimators': randint(100, 1000),  # Number of trees in the forest
    'max_features': randint(1, 10),  # Maximum number of features to consider at each split
    'min_samples_split': randint(2, 10),  # Minimum number of samples required to split an internal node
    'min_samples_leaf': randint(1, 10),  # Minimum number of samples required to be at a leaf node
}

#create the ExtraTreesClassifier model
model = ExtraTreesClassifier(random_state=1)

#perform randomized search CV
random_search = RandomizedSearchCV(
    model, param_distributions=param_grid, n_iter=10, scoring='accuracy', n_jobs=-1, verbose=1, cv=5, random_state=1
)

#fit the model on the training data
random_search.fit(x_train, y_train)

#obtain the best hyperparameters
best_params = random_search.best_params_
print("Best Hyperparameters:", best_params)


Fitting 5 folds for each of 10 candidates, totalling 50 fits
Best Hyperparameters: {'max_features': 6, 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 229}
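
With the default `refit=True`, `RandomizedSearchCV` also keeps a copy of the best model refit on the full data passed to `fit`, available as `best_estimator_`. The sketch below shows this on hypothetical toy data with a deliberately small search; the ranges and iteration counts are illustrative, not the ones used above.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical toy data; small on purpose so the search runs quickly
X_toy, y_toy = make_classification(n_samples=300, n_features=12, random_state=1)

search = RandomizedSearchCV(
    ExtraTreesClassifier(random_state=1),
    param_distributions={
        "n_estimators": randint(10, 50),  # samples 10..49 (upper bound exclusive)
        "max_features": randint(1, 10),   # samples 1..9
    },
    n_iter=3,
    scoring="accuracy",
    cv=3,
    random_state=1,
)
search.fit(X_toy, y_toy)

# Already refit on all of X_toy with best_params_, ready for predict()
best_model = search.best_estimator_
print(search.best_params_, round(search.best_score_, 3))
print(best_model.predict(X_toy[:5]))
```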


In [179]:
#create a new ExtraTreesClassifier model with the best hyperparameters
model = ExtraTreesClassifier(
    n_estimators=best_params['n_estimators'],
    max_features=best_params['max_features'],
    min_samples_split=best_params['min_samples_split'],
    min_samples_leaf=best_params['min_samples_leaf'],
    random_state=1
)

#fit the model on the training data
model.fit(x_train, y_train)

#use the trained model to make predictions
y_pred = model.predict(x_test)

newet_accuracy = accuracy_score(y_test, y_pred)
print("Extra Trees Accuracy:", newet_accuracy)

Extra Trees Accuracy: 0.9395


In [180]:
#access feature importances from the trained model
importance_scores = model.feature_importances_

#create a DataFrame to display the feature importances
feature_importances = pd.DataFrame({'Feature': X.columns, 'Importance': importance_scores})
feature_importances = feature_importances.sort_values('Importance', ascending=False)

#display the most and least important features
most_important_feature = feature_importances['Feature'].iloc[0]
least_important_feature = feature_importances['Feature'].iloc[-1]

print("Most important feature:", most_important_feature)
print("Least important feature:", least_important_feature)


Most important feature: tau2
Least important feature: p1
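
Looking at the whole ranking, rather than only its first and last entries, shows how far apart the features actually are. A minimal sketch on hypothetical toy data with made-up feature names `f0` to `f5`:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

names = [f"f{i}" for i in range(6)]  # hypothetical feature names
X_toy, y_toy = make_classification(n_samples=300, n_features=6, random_state=1)

clf = ExtraTreesClassifier(n_estimators=50, random_state=1).fit(X_toy, y_toy)

# Impurity-based importances, normalized by scikit-learn to sum to 1
ranking = pd.Series(clf.feature_importances_, index=names).sort_values(ascending=False)
print(ranking)
```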
