
In [1]:
import numpy as np
import time
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, roc_auc_score

In [2]:
# Load the breast cancer dataset
df = load_breast_cancer()

X = df.data

y = df.target


In [3]:
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)


In [4]:
# Feature Scaling is critical for KNN and SVM
scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)

X_test_scaled = scaler.transform(X_test)
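
The comment above says scaling is critical for KNN and SVM. As a rough illustration, the sketch below compares KNN with and without `StandardScaler` using 5-fold cross-validation on the same dataset. The `_demo` variable names and the use of `make_pipeline` are choices made for this sketch, not part of the original notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Separate names so the notebook's own X and y are not overwritten
X_demo, y_demo = load_breast_cancer(return_X_y=True)

# Same classifier, once on raw features and once behind a scaler
raw_knn = KNeighborsClassifier(n_neighbors=5)
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# The pipeline refits the scaler inside each fold, so no test-fold
# statistics leak into training
raw_scores = cross_val_score(raw_knn, X_demo, y_demo, cv=5)
scaled_scores = cross_val_score(scaled_knn, X_demo, y_demo, cv=5)

print(f"KNN without scaling: {raw_scores.mean():.4f}")
print(f"KNN with scaling:    {scaled_scores.mean():.4f}")
```

Wrapping the scaler and the model in one pipeline is also a convenient way to keep the fit-on-train, transform-on-test discipline used in the cell above.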

# K-Nearest Neighbors (KNN) Classifier
KNN is a simple, instance-based learning algorithm that classifies each new data point by the majority class among its k nearest neighbors in the training set. Its performance depends heavily on the choice of k (the number of neighbors). This example uses k = 5, a common starting point.
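
To make the majority-vote idea concrete, here is a minimal from-scratch sketch in NumPy. The function name `knn_predict` and the toy data are hypothetical and exist only for illustration; the notebook itself uses scikit-learn's `KNeighborsClassifier`, which is far more efficient.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=5):
    """Predict labels by majority vote among the k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_query = np.asarray(X_query, dtype=float)
    predictions = []
    for query in X_query:
        # Euclidean distance from the query to every training point
        distances = np.linalg.norm(X_train - query, axis=1)
        # Indices of the k closest training points
        nearest = np.argsort(distances)[:k]
        # Count the labels among those neighbors and pick the most frequent;
        # on a tie, argmax returns the smallest label
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)

# Two well-separated toy clusters (illustrative data only)
toy_X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
toy_y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(toy_X, toy_y, [[0.05, 0.05], [5.0, 5.1]], k=3))  # prints [0 1]
```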

In [5]:
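# Initialize and fit the KNN classifier (fitting only stores/indexes the
# training data; the neighbor search itself happens at predict time)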
start_time_knn = time.time()

knn_model = KNeighborsClassifier(n_neighbors=5)

knn_model.fit(X_train_scaled, y_train)

end_time_knn = time.time()


In [6]:
# Make predictions and evaluate
y_pred_knn = knn_model.predict(X_test_scaled)

accuracy_knn = accuracy_score(y_test, y_pred_knn)

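# Note: roc_auc_score receives hard class labels here, not scores or
# probabilities, so for this binary problem it equals the balanced accuracy.
roc_auc_knn = roc_auc_score(y_test, y_pred_knn)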

training_time_knn = end_time_knn - start_time_knn


In [7]:
print("=== K-Nearest Neighbors (KNN) ===")
print(f"Accuracy: {accuracy_knn:.4f}")

print(f"ROC AUC Score: {roc_auc_knn:.4f}")

print(f"Training Time: {training_time_knn:.4f} seconds")

print("\nClassification Report:")
print(classification_report(y_test, y_pred_knn))

=== K-Nearest Neighbors (KNN) ===
Accuracy: 0.9591
ROC AUC Score: 0.9453
Training Time: 0.0016 seconds

Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.89      0.94        64
           1       0.94      1.00      0.97       107

    accuracy                           0.96       171
   macro avg       0.97      0.95      0.96       171
weighted avg       0.96      0.96      0.96       171
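
The ROC AUC reported above is computed from hard class predictions, which reduces the ROC curve to a single operating point. ROC AUC is usually computed from continuous scores instead. The sketch below, which rebuilds the same split under separate (assumed) `_demo` variable names, shows both variants for KNN using `predict_proba`. For `SVC`, `decision_function` could play the same role.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo)

# Fit the scaler on the training split only, then reuse it for the test split
scaler_demo = StandardScaler().fit(X_tr)
X_te_scaled = scaler_demo.transform(X_te)
knn_demo = KNeighborsClassifier(n_neighbors=5).fit(scaler_demo.transform(X_tr), y_tr)

hard_labels = knn_demo.predict(X_te_scaled)
# Fraction of the 5 neighbors voting for class 1, used as a continuous score
positive_scores = knn_demo.predict_proba(X_te_scaled)[:, 1]

auc_from_labels = roc_auc_score(y_te, hard_labels)
auc_from_scores = roc_auc_score(y_te, positive_scores)

print(f"AUC from hard labels: {auc_from_labels:.4f}")
print(f"Balanced accuracy:    {balanced_accuracy_score(y_te, hard_labels):.4f}")
print(f"AUC from scores:      {auc_from_scores:.4f}")
```

The first two printed values coincide because, with binary hard labels, the area under the single-point ROC curve is the mean of the two class recalls.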



# Support Vector Machine (SVM) Classifier
SVM finds the maximum-margin hyperplane that separates the classes. scikit-learn's SVC (Support Vector Classifier) supports several kernels, including the RBF (Radial Basis Function) kernel used here, which allows non-linear decision boundaries.
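
For reference, the RBF kernel is K(x, z) = exp(-gamma * ||x - z||^2). The sketch below computes it by hand on a small random matrix and checks the result against `sklearn.metrics.pairwise.rbf_kernel`. The data is synthetic and the variable names are illustrative; the `gamma` value follows the documented meaning of `gamma='scale'`, namely 1 / (n_features * X.var()).

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Small synthetic matrix: 6 samples, 4 features (illustrative only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6, 4))

# What gamma='scale' resolves to for this data
gamma_scale = 1.0 / (X_demo.shape[1] * X_demo.var())

# Pairwise squared Euclidean distances via broadcasting
squared_distances = ((X_demo[:, None, :] - X_demo[None, :, :]) ** 2).sum(axis=-1)

# K(x, z) = exp(-gamma * ||x - z||^2)
manual_kernel = np.exp(-gamma_scale * squared_distances)

library_kernel = rbf_kernel(X_demo, gamma=gamma_scale)
print(np.allclose(manual_kernel, library_kernel))  # prints True
```

A larger `gamma` makes the kernel decay faster with distance, so each training point influences a smaller neighborhood and the decision boundary can become more irregular.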

In [8]:
# Initialize and train the SVM Classifier
start_time_svm = time.time()
svm_model = SVC(kernel='rbf',
                C=1.0,
                gamma='scale',
                random_state=42)

svm_model.fit(X_train_scaled, y_train)

end_time_svm = time.time()

In [9]:
# Make predictions and evaluate
y_pred_svm = svm_model.predict(X_test_scaled)

accuracy_svm = accuracy_score(y_test, y_pred_svm)

# As with KNN, this ROC AUC is computed from hard class labels.
roc_auc_svm = roc_auc_score(y_test, y_pred_svm)

training_time_svm = end_time_svm - start_time_svm

In [10]:
print("\n=== Support Vector Machine (SVM) ===")
print(f"Accuracy: {accuracy_svm:.4f}")

print(f"ROC AUC Score: {roc_auc_svm:.4f}")

print(f"Training Time: {training_time_svm:.4f} seconds")

print("\nClassification Report:")
print(classification_report(y_test, y_pred_svm))


=== Support Vector Machine (SVM) ===
Accuracy: 0.9766
ROC AUC Score: 0.9750
Training Time: 0.0092 seconds

Classification Report:
              precision    recall  f1-score   support

           0       0.97      0.97      0.97        64
           1       0.98      0.98      0.98       107

    accuracy                           0.98       171
   macro avg       0.98      0.98      0.98       171
weighted avg       0.98      0.98      0.98       171



On this split, both models score well: KNN reaches an accuracy of 0.9591 and SVM 0.9766. SVM's edge here is consistent with its ability to find a wide-margin decision boundary in a high-dimensional feature space (this dataset has 30 features). KNN is simpler to implement, but its performance is more sensitive to the choice of k and the distance metric. These figures come from a single train/test split, so the gap between the two models may differ with another random_state.
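
One way to probe KNN's sensitivity to k and the distance metric is a small cross-validated grid search. The sketch below is one possible setup: the candidate values, the step names `scale` and `knn`, and the 5-fold setting are all assumptions chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = load_breast_cancer(return_X_y=True)

# Scaling lives inside the pipeline so it is refit on each training fold
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Illustrative candidate values, not a recommendation
param_grid = {
    "knn__n_neighbors": [3, 5, 7, 9, 11],
    "knn__metric": ["euclidean", "manhattan"],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print(f"Best mean CV accuracy: {search.best_score_:.4f}")
```

`search.cv_results_` holds the mean score for every combination, which shows how much accuracy actually moves across the grid.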

In [11]:
import pandas as pd
import numpy as np
import time
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, roc_auc_score

In [13]:
# Load the mushroom dataset (expects mushrooms.csv in the working directory)
df = pd.read_csv('mushrooms.csv')

In [14]:
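# Encode every categorical column (features and the 'class' target) as integers.
# Note: LabelEncoder assigns codes in sorted order, which imposes an arbitrary
# ordering on nominal features that distance-based models will pick up.
label_encoders = {}
for column in df.columns:
    le = LabelEncoder()
    df[column] = le.fit_transform(df[column])
    label_encoders[column] = le  # keep each encoder for inverse_transform later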


In [15]:
# Separate the features from the target column
X = df.drop('class', axis=1)
y = df['class']

In [16]:
# Stratified 70/30 split so both classes keep their proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

In [17]:
# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [18]:
start_time_knn = time.time()
knn_model = KNeighborsClassifier(n_neighbors=5)
knn_model.fit(X_train_scaled, y_train)
end_time_knn = time.time()

In [19]:
y_pred_knn = knn_model.predict(X_test_scaled)
accuracy_knn = accuracy_score(y_test, y_pred_knn)
roc_auc_knn = roc_auc_score(y_test, y_pred_knn)
training_time_knn = end_time_knn - start_time_knn

In [20]:
print("=== K-Nearest Neighbors (KNN) ===")
print(f"Accuracy: {accuracy_knn:.4f}")

=== K-Nearest Neighbors (KNN) ===
Accuracy: 1.0000


In [21]:
print(f"ROC AUC Score: {roc_auc_knn:.4f}")

ROC AUC Score: 1.0000


In [22]:
print(f"Training Time: {training_time_knn:.4f} seconds")

Training Time: 0.0066 seconds


In [24]:
print("\nClassification Report:")
print(classification_report(y_test, y_pred_knn))


Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      1263
           1       1.00      1.00      1.00      1175

    accuracy                           1.00      2438
   macro avg       1.00      1.00      1.00      2438
weighted avg       1.00      1.00      1.00      2438



In [25]:
start_time_svm = time.time()
svm_model = SVC(kernel='rbf', C=1.0, gamma='scale', random_state=42)
svm_model.fit(X_train_scaled, y_train)
end_time_svm = time.time()

In [26]:
y_pred_svm = svm_model.predict(X_test_scaled)
accuracy_svm = accuracy_score(y_test, y_pred_svm)
roc_auc_svm = roc_auc_score(y_test, y_pred_svm)
training_time_svm = end_time_svm - start_time_svm

In [27]:
print("\n=== Support Vector Machine (SVM) ===")
print(f"Accuracy: {accuracy_svm:.4f}")


=== Support Vector Machine (SVM) ===
Accuracy: 1.0000


In [28]:
print(f"ROC AUC Score: {roc_auc_svm:.4f}")

ROC AUC Score: 1.0000


In [29]:
print(f"Training Time: {training_time_svm:.4f} seconds")

Training Time: 0.5006 seconds


In [30]:
print("\nClassification Report:")
print(classification_report(y_test, y_pred_svm))



Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      1263
           1       1.00      1.00      1.00      1175

    accuracy                           1.00      2438
   macro avg       1.00      1.00      1.00      2438
weighted avg       1.00      1.00      1.00      2438

