## Data Loading and Preprocessing

In [2]:
import pandas as pd

data = pd.read_csv('student_data.csv')
X = data[['Hours_Studied', 'Review_Session']]
y = data['Results']


## Fit SVM Model with Linear Kernel

In [4]:
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

linear_svm = SVC(kernel='linear')
linear_svm.fit(X, y)

# Predict on the same dataset
y_pred_linear = linear_svm.predict(X)

# Evaluate accuracy
accuracy_linear = accuracy_score(y, y_pred_linear)
print(f"Accuracy (Linear Kernel): {accuracy_linear:.2f}")


Accuracy (Linear Kernel): 0.92


## Fit SVM Model with RBF Kernel & GridSearchCV for gamma

In [6]:
from sklearn.model_selection import GridSearchCV

# Define parameter grid for gamma
param_grid = {'gamma': [0.001, 0.01, 0.1, 1, 10, 100]}

# GridSearch with 5-fold cross-validation
grid_search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
grid_search.fit(X, y)

# Best gamma parameter
best_gamma = grid_search.best_params_['gamma']
print(f"Best gamma value: {best_gamma}")

# Fit the best RBF model
rbf_svm = SVC(kernel='rbf', gamma=best_gamma)
rbf_svm.fit(X, y)

# Predict and evaluate accuracy
y_pred_rbf = rbf_svm.predict(X)
accuracy_rbf = accuracy_score(y, y_pred_rbf)
print(f"Accuracy (RBF Kernel, gamma={best_gamma}): {accuracy_rbf:.2f}")


Best gamma value: 0.1
Accuracy (RBF Kernel, gamma=0.1): 0.93
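
The grid search above already computes held-out accuracy for every gamma during its 5-fold cross-validation, and those scores are a less optimistic reference point than the training accuracy printed above. The sketch below shows one way to read them. Because `student_data.csv` is not available here, it assumes a synthetic stand-in dataset (`X_demo`, `y_demo`, generated with `make_classification`); these names and the data are illustrative only, and the numbers it prints will not match the notebook's.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for student_data.csv (assumption: two numeric
# features and a binary label, as in the notebook).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0,
    random_state=0,
)

# Same gamma grid and 5-fold setup as the cell above
param_grid = {'gamma': [0.001, 0.01, 0.1, 1, 10, 100]}
grid_search_demo = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
grid_search_demo.fit(X_demo, y_demo)

# Mean held-out accuracy across the 5 folds, one entry per gamma
for gamma, mean_score in zip(
    grid_search_demo.cv_results_['param_gamma'],
    grid_search_demo.cv_results_['mean_test_score'],
):
    print(f"gamma={gamma}: mean CV accuracy = {mean_score:.2f}")

# best_score_ is the mean CV accuracy of the selected gamma
print(f"Best CV accuracy: {grid_search_demo.best_score_:.2f}")

# With refit=True (the default), best_estimator_ has already been refit
# on all the data passed to fit(), so a second manual fit is optional.
train_accuracy = grid_search_demo.best_estimator_.score(X_demo, y_demo)
print(f"Training accuracy of refit model: {train_accuracy:.2f}")
```

Comparing the best CV accuracy with the training accuracy of the refit model gives a first indication of how optimistic the training figure is.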


## Comparison of Results and Evaluation of Overfitting

Both Support Vector Machine (SVM) models were trained and evaluated on the entire dataset, so the scores below are training accuracies:

- Linear Kernel SVM: accuracy 0.92. Its linear decision boundary reduces the risk of overfitting but may not fully capture nonlinear relationships.
- RBF Kernel SVM (gamma = 0.1, selected by 5-fold grid search): accuracy 0.93. The RBF kernel can form curved decision boundaries and so capture nonlinear relationships in the data, but the gain over the linear kernel is only one percentage point.

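
For reference, the role of gamma can be read off the standard RBF kernel definition, which is the form scikit-learn documents for `SVC(kernel='rbf')`:

$$
K(x, x') = \exp\left(-\gamma \, \lVert x - x' \rVert^2\right)
$$

With the selected gamma = 0.1, two points at squared distance 10 have similarity exp(-1) ≈ 0.37. A larger gamma makes the similarity decay faster, so each support vector influences only its close neighbors and the boundary can bend around individual points. A smaller gamma gives a smoother boundary that behaves more like a linear one.
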
### Discussion on Overfitting

Evaluating a model on the same data it was trained on typically yields overly optimistic metrics, so the 0.92 and 0.93 reported above are not estimates of performance on new data. A flexible model such as the RBF SVM can reach a high training accuracy by learning the noise or peculiarities of this particular dataset rather than only its general patterns. That is overfitting, and an overfitted model may perform poorly on unseen data. The one-point advantage of the RBF kernel may partly reflect this effect, and training accuracy alone cannot show which model generalizes better.

To assess generalizability, evaluate on data the model has not seen during training, using either a train-test split or cross-validation. Held-out accuracy gives a more realistic picture of the model’s predictive capability, and a large gap between training and held-out accuracy is the clearest sign of overfitting.
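
A minimal sketch of that recommendation follows. It compares training accuracy, held-out test accuracy, and 5-fold cross-validated accuracy for both kernels. It assumes a synthetic stand-in dataset (`X_demo`, `y_demo`) because `student_data.csv` is not available here, and it reuses gamma = 0.1 from the grid search output; the split size and random seeds are arbitrary choices, not part of the original analysis.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for student_data.csv (assumption: two numeric
# features and a binary label).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0,
    random_state=0,
)

# Hold out 25% of the rows; stratify keeps the class balance in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=42
)

models = {
    'linear': SVC(kernel='linear'),
    'rbf': SVC(kernel='rbf', gamma=0.1),  # gamma taken from the grid search output
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    # cross_val_score clones the model and refits it on each fold
    cv_acc = cross_val_score(model, X_demo, y_demo, cv=5).mean()
    results[name] = (train_acc, test_acc, cv_acc)
    print(f"{name}: train={train_acc:.2f}, test={test_acc:.2f}, cv={cv_acc:.2f}")
```

A training accuracy well above the test or cross-validated accuracy suggests overfitting. In a stricter setup, gamma would also be tuned on the training portion only, so that the test set stays untouched until the final evaluation.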

### Conclusion

- Linear Kernel: simpler, less prone to overfitting, slightly lower training accuracy (0.92).
- RBF Kernel: more flexible, slightly higher training accuracy (0.93), but prone to overfitting if gamma is not properly tuned.

The RBF kernel therefore offers greater flexibility and fits this dataset slightly better, but it requires careful validation on held-out data to avoid overfitting in practical applications.