# Predict Student Pass/Fail Using KNN & SVM


**Objective:** Classify students as Pass/Fail using **K-Nearest Neighbors (KNN)** and **Support Vector Machine (SVM)** models.  

**Dataset:** Student performance data (features like attendance, marks, assignments).  

**Goal:** Compare KNN and SVM models, evaluate performance, visualize results, and determine the best classifier.


In [None]:
# Data handling
import pandas as pd
import numpy as np

# Model building
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Evaluation metrics
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# For inline plots
%matplotlib inline


In [None]:
# Load dataset (in Colab, upload student_performance_dataset.csv to the working directory first)
df = pd.read_csv("student_performance_dataset.csv")
df.head()


In [None]:
# Check missing values
print(df.isnull().sum())

# Split features and target
X = df.drop("Pass_Fail", axis=1)
y = df["Pass_Fail"]

# Train-test split (stratified so both sets keep the same Pass/Fail ratio)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)


In [None]:
# Train KNN
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_scaled, y_train)

# Predict
y_pred_knn = knn.predict(X_test_scaled)

# Accuracy
print("KNN Accuracy:", accuracy_score(y_test, y_pred_knn))

# Detailed report
print(classification_report(y_test, y_pred_knn))
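
The cell above fixes K at 5. One way to check whether another K would do better is a cross-validated grid search; the following is a minimal sketch, not part of the original notebook. It assumes a synthetic stand-in dataset from `make_classification` (the real CSV is not available here), and the names `X_demo`, `y_demo`, `pipe`, and `search` are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the student data (assumption: 4 numeric features, binary target)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=1, random_state=42
)

# Keep the scaler inside the pipeline so each CV fold is scaled using only its own training part
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Odd values of K avoid tied votes in a two-class problem
param_grid = {"knn__n_neighbors": [1, 3, 5, 7, 9, 11, 15]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_demo, y_demo)

best_k = search.best_params_["knn__n_neighbors"]
print("Best K:", best_k)
print("Mean CV accuracy: %.3f" % search.best_score_)
```

To try this on the real data, pass the unscaled `X_train` and `y_train` to `search.fit`, since the pipeline does its own scaling.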


In [None]:
cm_knn = confusion_matrix(y_test, y_pred_knn)

# Heatmap
plt.figure(figsize=(6,5))
sns.heatmap(cm_knn, annot=True, fmt='d', cmap='Blues')
plt.title("KNN Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()


In [None]:
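# Use only the first 2 scaled features so the boundary can be drawn in 2D
X_plot = X_train_scaled[:, :2]
# Encode labels as integers so plotting also works if Pass_Fail holds strings
y_plot = pd.factorize(y_train, sort=True)[0]
knn_plot = KNeighborsClassifier(n_neighbors=5)
knn_plot.fit(X_plot, y_plot)

# Create meshgrid covering the feature range with a 1-unit margin
x_min, x_max = X_plot[:, 0].min() - 1, X_plot[:, 0].max() + 1
y_min, y_max = X_plot[:, 1].min() - 1, X_plot[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                     np.arange(y_min, y_max, 0.1))

# Predict the class of every grid point
Z = knn_plot.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Plot the predicted regions and the training points
plt.contourf(xx, yy, Z, alpha=0.4)
plt.scatter(X_plot[:, 0], X_plot[:, 1], c=y_plot, s=20, edgecolor='k')
plt.title("KNN Decision Boundary (2 features)")
plt.xlabel("Feature 1 (scaled)")
plt.ylabel("Feature 2 (scaled)")
plt.show()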


In [None]:
# Train Linear SVM
svm_linear = SVC(kernel='linear', C=1)
svm_linear.fit(X_train_scaled, y_train)

# Predict
y_pred_linear = svm_linear.predict(X_test_scaled)

# Accuracy & report
print("Linear SVM Accuracy:", accuracy_score(y_test, y_pred_linear))
print(classification_report(y_test, y_pred_linear))


In [None]:
# Train RBF SVM
svm_rbf = SVC(kernel='rbf', C=1, gamma='scale')
svm_rbf.fit(X_train_scaled, y_train)

# Predict
y_pred_rbf = svm_rbf.predict(X_test_scaled)

# Accuracy & report
print("RBF SVM Accuracy:", accuracy_score(y_test, y_pred_rbf))
print(classification_report(y_test, y_pred_rbf))


In [None]:
# Linear SVM
cm_linear = confusion_matrix(y_test, y_pred_linear)
plt.figure(figsize=(6,5))
sns.heatmap(cm_linear, annot=True, fmt='d', cmap='Oranges')
plt.title("Linear SVM Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()

# RBF SVM
cm_rbf = confusion_matrix(y_test, y_pred_rbf)
plt.figure(figsize=(6,5))
sns.heatmap(cm_rbf, annot=True, fmt='d', cmap='Greens')
plt.title("RBF SVM Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()


## Model Comparison

| Model        | Accuracy | Notes |
|--------------|----------|-------|
| KNN          | XX%      | Trained with K=5 (`n_neighbors=5`) |
| SVM Linear   | XX%      | `C=1`; suited to linearly separable data |
| SVM RBF      | XX%      | `C=1`, `gamma='scale'`; can model non-linear boundaries |
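
The accuracy column can be produced programmatically instead of copied by hand. The sketch below shows one possible way, using the same three model settings as the notebook. It runs on a synthetic stand-in dataset (an assumption, since the CSV is not available here), so its numbers say nothing about the student data; the names `X_syn`, `models`, and `comparison` are illustrative.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the student data (assumption: 4 numeric features, binary target)
X_syn, y_syn = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=1, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42, stratify=y_syn
)

# Fit the scaler on the training split only, then apply it to both splits
scaler_syn = StandardScaler()
X_tr_s = scaler_syn.fit_transform(X_tr)
X_te_s = scaler_syn.transform(X_te)

# Same hyperparameters as the notebook cells
models = {
    "KNN (K=5)": KNeighborsClassifier(n_neighbors=5),
    "SVM Linear (C=1)": SVC(kernel="linear", C=1),
    "SVM RBF (C=1, gamma='scale')": SVC(kernel="rbf", C=1, gamma="scale"),
}

# Train each model and record its test accuracy
rows = []
for name, model in models.items():
    model.fit(X_tr_s, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te_s))
    rows.append({"Model": name, "Accuracy": acc})

comparison = pd.DataFrame(rows).sort_values("Accuracy", ascending=False)
print(comparison.to_string(index=False))
```

In the notebook itself, the loop could run on `X_train_scaled`, `y_train`, `X_test_scaled`, and `y_test` directly.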

### Key Insights:
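- Feature scaling is **essential** for both KNN and SVM: KNN relies on distances between points and SVM on distances or dot products, so a feature with a large numeric range would otherwise dominate.
- Linear SVM is faster to train and easier to interpret (one weight per feature), and works well when the classes are close to linearly separable.
- RBF SVM can fit non-linear decision boundaries, so it tends to do better when the classes are not linearly separable.
- Confusion matrices show not only how many errors each model makes but also which kind: students wrongly predicted to pass versus students wrongly predicted to fail.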
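
The point about feature scaling can be illustrated with a small experiment. This is a sketch on synthetic data, not a result from the student dataset: one uninformative feature is inflated to a huge numeric range (an assumption made purely for illustration), and KNN is cross-validated with and without `StandardScaler`. The names `X_toy`, `raw_acc`, and `scaled_acc` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# With shuffle=False the first 2 columns are informative and the last 2 are pure noise
X_toy, y_toy = make_classification(
    n_samples=400, n_features=4, n_informative=2, n_redundant=0,
    shuffle=False, random_state=0
)

# Blow up one noise column so it dominates any unscaled distance computation
X_toy = X_toy.copy()
X_toy[:, 3] *= 1e6

raw_knn = KNeighborsClassifier(n_neighbors=5)
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# 5-fold cross-validated accuracy for each variant
raw_acc = cross_val_score(raw_knn, X_toy, y_toy, cv=5).mean()
scaled_acc = cross_val_score(scaled_knn, X_toy, y_toy, cv=5).mean()

print("KNN without scaling: %.3f" % raw_acc)
print("KNN with scaling:    %.3f" % scaled_acc)
```

Without scaling, the nearest neighbors are chosen almost entirely by the inflated noise column, which is why the unscaled score should sit near chance level.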
