In [3]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load the dataset
file_path = 'Task7.1.csv'
data = pd.read_csv(file_path)

# Separate features and target variable
X = data.drop(columns=['stabf'])
y = data['stabf']

# Splitting the Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model Training
knn_classifier = KNeighborsClassifier()
knn_classifier.fit(X_train, y_train)

# Model Evaluation
y_pred = knn_classifier.predict(X_test)

# Performance Metrics
accuracy = accuracy_score(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)

# Report performance (the model uses scikit-learn's default hyperparameters: n_neighbors=5, Euclidean distance)
print("Model Performance:")
print(f"Accuracy: {accuracy:.4f}")
print("\nClassification Report:")
print(classification_rep)

"""Comparing the two models (this week's KNN and the SVM from week 7, question 2), the SVM outperforms the KNN on accuracy and on the F1-scores of both classes (stable and unstable), and it shows a better precision-recall balance for each class. This suggests that the SVM with a linear kernel is better suited to this classification task than the KNN with default settings."""


Model Performance:
Accuracy: 0.7895

Classification Report:
              precision    recall  f1-score   support

      stable       0.70      0.68      0.69       693
    unstable       0.83      0.85      0.84      1307

    accuracy                           0.79      2000
   macro avg       0.77      0.76      0.77      2000
weighted avg       0.79      0.79      0.79      2000



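The comparison above is only meaningful if both models are scored on the same train-test split. The following is a minimal sketch of such a side-by-side evaluation. It is not the week 7 code: the data is a synthetic stand-in generated with make_classification (an assumption, since Task7.1.csv is not loaded here), and the names `models` and `results` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for Task7.1.csv (assumption: 12 numeric features, binary target)
X, y = make_classification(
    n_samples=1000, n_features=12, n_informative=6, random_state=42
)

# One shared split, so both models see identical training and test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "KNN (defaults)": KNeighborsClassifier(),
    "SVM (linear kernel)": SVC(kernel="linear"),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "macro_f1": f1_score(y_test, y_pred, average="macro"),
    }

for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.4f}, macro F1={scores['macro_f1']:.4f}")
```

The numbers printed by this sketch describe the synthetic data only and say nothing about the grid stability dataset; the point is the structure of a like-for-like comparison.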

In [4]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load the dataset
file_path = 'Task7.1.csv'
data = pd.read_csv(file_path)

# Separate features and target variable
X = data.drop(columns=['stabf'])
y = data['stabf']

# Model Training and Evaluation with 50-50% data splitting
X_train_50, X_test_50, y_train_50, y_test_50 = train_test_split(X, y, test_size=0.5, random_state=42)
dt_model_50 = DecisionTreeClassifier(random_state=42)
dt_model_50.fit(X_train_50, y_train_50)
y_pred_50 = dt_model_50.predict(X_test_50)
accuracy_50 = accuracy_score(y_test_50, y_pred_50)
report_50 = classification_report(y_test_50, y_pred_50)

# Model Training and Evaluation with 80-20% data splitting
X_train_80, X_test_80, y_train_80, y_test_80 = train_test_split(X, y, test_size=0.2, random_state=42)
dt_model_80 = DecisionTreeClassifier(random_state=42)
dt_model_80.fit(X_train_80, y_train_80)
y_pred_80 = dt_model_80.predict(X_test_80)
accuracy_80 = accuracy_score(y_test_80, y_pred_80)
report_80 = classification_report(y_test_80, y_pred_80)

# Print the performances and comparison
print("Model Performance with 50-50% Split:")
print("Accuracy:", accuracy_50)
print("Classification Report:")
print(report_50)
print("\nModel Performance with 80-20% Split:")
print("Accuracy:", accuracy_80)
print("Classification Report:")
print(report_80)


Model Performance with 50-50% Split:
Accuracy: 0.9998
Classification Report:
              precision    recall  f1-score   support

      stable       1.00      1.00      1.00      1795
    unstable       1.00      1.00      1.00      3205

    accuracy                           1.00      5000
   macro avg       1.00      1.00      1.00      5000
weighted avg       1.00      1.00      1.00      5000


Model Performance with 80-20% Split:
Accuracy: 0.9995
Classification Report:
              precision    recall  f1-score   support

      stable       1.00      1.00      1.00       693
    unstable       1.00      1.00      1.00      1307

    accuracy                           1.00      2000
   macro avg       1.00      1.00      1.00      2000
weighted avg       1.00      1.00      1.00      2000
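Accuracy this close to 1.0 is worth a check for target leakage, where a retained feature column encodes the label directly. The sketch below illustrates the effect on synthetic data. Everything in it is an assumption made for illustration: the column `stab` is a hypothetical numeric score from which the label `stabf` is derived by sign, and nothing here confirms that Task7.1.csv contains such a column.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
n = 2000

# Synthetic features plus a hypothetical numeric score that determines the label
features = pd.DataFrame(rng.normal(size=(n, 4)), columns=["f1", "f2", "f3", "f4"])
stab = features.sum(axis=1) + rng.normal(scale=2.0, size=n)
data = features.assign(stab=stab, stabf=np.where(stab > 0, "unstable", "stable"))


def tree_accuracy(X, y):
    """Fit a decision tree on an 80-20 split and return its test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


y = data["stabf"]
# Dropping only the label leaves the leaking score among the features
acc_with_score = tree_accuracy(data.drop(columns=["stabf"]), y)
# Dropping the score as well removes the leak
acc_without_score = tree_accuracy(data.drop(columns=["stabf", "stab"]), y)

print(f"Accuracy with the score column:    {acc_with_score:.4f}")
print(f"Accuracy without the score column: {acc_without_score:.4f}")
```

If the real dataset shows a similar gap when a suspect column is dropped, the near-perfect accuracy reflects the leak rather than the quality of the model.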



In [5]:
""" Both models achieve very high accuracy, precision, recall, and F1-score values on the electrical grid stability data.

The two models differ only in how the 10,000 samples are divided between training and testing:

1. 50-50% Split Model:
   - Accuracy: 99.98%
   - Training Set Size: 5000 samples
   - Testing Set Size: 5000 samples
   - An accuracy of 0.9998 on 5000 test samples corresponds to a single misclassified sample.
   - The classification report shows 1.00 for precision, recall, and F1-score in both classes, but only because the values are rounded to two decimals.

2. 80-20% Split Model:
   - Accuracy: 99.95%
   - Training Set Size: 8000 samples
   - Testing Set Size: 2000 samples
   - An accuracy of 0.9995 on 2000 test samples also corresponds to a single misclassified sample.
   - This model is trained on the larger training set (8000 samples versus 5000). Its accuracy is marginally lower only because the one error is divided by a smaller test set.

Overall, each model misclassifies exactly one test sample, so the 0.03 percentage point gap in accuracy reflects the size of the test set rather than a real difference in quality. The split ratio therefore has no measurable effect on the decision tree here. Accuracy this close to 100% is also unusual, and it is worth checking whether one of the retained feature columns encodes the target directly."""

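The set sizes and error counts behind the two accuracies can be recovered with a few lines of arithmetic. This is a small sketch using only figures from the outputs above; the total of 10,000 samples is inferred from the test supports (5000 at test_size=0.5, 2000 at test_size=0.2), and the names `splits` and `summary` are illustrative.

```python
# Total sample count inferred from the reported supports (assumption stated above)
n_samples = 10_000

# Test fraction and printed accuracy for each split, taken from the outputs
splits = {
    "50-50": {"test_size": 0.5, "accuracy": 0.9998},
    "80-20": {"test_size": 0.2, "accuracy": 0.9995},
}

summary = {}
for name, s in splits.items():
    n_test = round(n_samples * s["test_size"])
    n_train = n_samples - n_test
    # Number of misclassified test samples implied by the accuracy
    n_errors = round((1 - s["accuracy"]) * n_test)
    summary[name] = (n_train, n_test, n_errors)
    print(f"{name}: train={n_train}, test={n_test}, misclassified={n_errors}")
```

This prints `50-50: train=5000, test=5000, misclassified=1` and `80-20: train=8000, test=2000, misclassified=1`, so both models make one error and the 80-20 model is the one with the larger training set.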

In [6]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load the dataset
file_path = 'Task7.1.csv'
data = pd.read_csv(file_path)

# Separate features and target variable
X = data.drop(columns=['stabf'])
y = data['stabf']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train KNN classifiers with different distance metrics
# (the default is Euclidean; 'cityblock' and 'manhattan' both name the L1 distance)
knn_euclidean = KNeighborsClassifier()
knn_euclidean.fit(X_train, y_train)

knn_cityblock = KNeighborsClassifier(metric='cityblock')
knn_cityblock.fit(X_train, y_train)

knn_manhattan = KNeighborsClassifier(metric='manhattan')
knn_manhattan.fit(X_train, y_train)

# Evaluate the models
def evaluate_model(model, X_test, y_test):
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    report = classification_report(y_test, y_pred)
    return accuracy, report

accuracy_euclidean, report_euclidean = evaluate_model(knn_euclidean, X_test, y_test)
accuracy_cityblock, report_cityblock = evaluate_model(knn_cityblock, X_test, y_test)
accuracy_manhattan, report_manhattan = evaluate_model(knn_manhattan, X_test, y_test)

# Report the performances
print("Performance with Euclidean Distance Metric:")
print("Accuracy:", accuracy_euclidean)
print("Classification Report:")
print(report_euclidean)

print("\nPerformance with Cityblock Distance Metric:")
print("Accuracy:", accuracy_cityblock)
print("Classification Report:")
print(report_cityblock)

print("\nPerformance with Manhattan Distance Metric:")
print("Accuracy:", accuracy_manhattan)
print("Classification Report:")
print(report_manhattan)


Performance with Euclidean Distance Metric:
Accuracy: 0.7895
Classification Report:
              precision    recall  f1-score   support

      stable       0.70      0.68      0.69       693
    unstable       0.83      0.85      0.84      1307

    accuracy                           0.79      2000
   macro avg       0.77      0.76      0.77      2000
weighted avg       0.79      0.79      0.79      2000


Performance with Cityblock Distance Metric:
Accuracy: 0.819
Classification Report:
              precision    recall  f1-score   support

      stable       0.74      0.73      0.74       693
    unstable       0.86      0.87      0.86      1307

    accuracy                           0.82      2000
   macro avg       0.80      0.80      0.80      2000
weighted avg       0.82      0.82      0.82      2000


Performance with Manhattan Distance Metric:
Accuracy: 0.819
Classification Report:
              precision    recall  f1-score   support

      stable       0.74      0.73      

In [7]:
""" 

The cityblock and Manhattan models produce identical results (accuracy 0.819), which is expected: 'cityblock' and 'manhattan' are two names for the same L1 distance in scikit-learn, so this is effectively one model trained twice. The Euclidean model (the KNeighborsClassifier default) reaches an accuracy of 0.7895.

The L1 distance therefore improves accuracy by about 3 percentage points over Euclidean (0.819 versus 0.7895). The classification reports show the same pattern per class: the F1-score rises from 0.69 to 0.74 for stable and from 0.84 to 0.86 for unstable.

Overall, the comparison covers two distinct metrics rather than three. The choice of metric has a modest but consistent effect on this dataset, with the L1 distance ahead of Euclidean on every reported measure. Because the result comes from a single train-test split, the size of the gain should be treated as indicative rather than definitive."""

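The equivalence of the two metric names can be checked directly. The sketch below trains KNeighborsClassifier with 'euclidean', 'cityblock', and 'manhattan' on a synthetic stand-in dataset (an assumption, since Task7.1.csv is not loaded here) and compares the predictions. It also computes the L1 and L2 distances by hand for one pair of points to show what the two metrics measure.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the grid stability data (assumption)
X, y = make_classification(
    n_samples=500, n_features=12, n_informative=6, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit one classifier per metric name and keep its test predictions
predictions = {}
for metric in ("euclidean", "cityblock", "manhattan"):
    model = KNeighborsClassifier(metric=metric).fit(X_train, y_train)
    predictions[metric] = model.predict(X_test)

# 'cityblock' and 'manhattan' name the same distance, so predictions must match
same_l1 = np.array_equal(predictions["cityblock"], predictions["manhattan"])
print("cityblock and manhattan give identical predictions:", same_l1)

# The two distinct metrics, written out for one pair of test points
a, b = X_test[0], X_test[1]
l1 = np.abs(a - b).sum()            # L1 (Manhattan / cityblock): sum of absolute differences
l2 = np.sqrt(((a - b) ** 2).sum())  # L2 (Euclidean): root of summed squared differences
print(f"L1 distance: {l1:.4f}, L2 distance: {l2:.4f}")
```

Since the two L1 models always agree, a metric comparison needs only one of them alongside the Euclidean default.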