Evaluate 3 SVM Models

Import Required Libraries

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix


Load and Inspect the Dataset

In [None]:

df = pd.read_csv('/content/sample_data/social_media_vs_productivity.csv')
print(df.head())
print(df.info())


   age  gender    job_type  daily_social_media_time  \
0   56    Male  Unemployed                 4.180940   
1   46    Male      Health                 3.249603   
2   32    Male     Finance                      NaN   
3   60  Female  Unemployed                      NaN   
4   25    Male          IT                      NaN   

  social_platform_preference  number_of_notifications  work_hours_per_day  \
0                   Facebook                       61            6.753558   
1                    Twitter                       59            9.169296   
2                    Twitter                       57            7.910952   
3                   Facebook                       59            6.355027   
4                   Telegram                       66            6.214096   

   perceived_productivity_score  actual_productivity_score  stress_level  \
0                      8.040464                   7.291555           4.0   
1                      5.063368                   5.16

Preprocess the Data

In [None]:
# Impute missing values in numerical columns with the mean
numerical_cols = df.select_dtypes(include=np.number).columns
df[numerical_cols] = df[numerical_cols].fillna(df[numerical_cols].mean())

# Create a binary target variable based on 'actual_productivity_score'
# Using the median as a threshold
median_productivity = df['actual_productivity_score'].median()
df['productivity_category'] = (df['actual_productivity_score'] >= median_productivity).astype(int)

# Define features (X) and target (y)
# Drop the source column of the target, the target itself, and the non-numeric/categorical columns
drop_cols = [
    'actual_productivity_score', 'productivity_category',
    'gender', 'job_type', 'social_platform_preference',
    'uses_focus_apps', 'has_digital_wellbeing_enabled',
]
X = df.drop(columns=drop_cols)
y = df['productivity_category']

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

Define Parameter Grids and Train Models

Linear Kernel

In [None]:
linear_params = {'C': [0.1, 1, 10]}
linear_svc = GridSearchCV(SVC(kernel='linear'), linear_params, cv=5)
linear_svc.fit(X_train_scaled, y_train)

print("Best Linear SVM Params:", linear_svc.best_params_)
print("Linear Kernel Accuracy:", linear_svc.score(X_test_scaled, y_test))


Best Linear SVM Params: {'C': 0.1}
Linear Kernel Accuracy: 0.9001666666666667


Polynomial Kernel

In [None]:
poly_params = {
    'C': [0.1, 1, 10],
    'degree': [2, 3, 4],
    'gamma': ['scale', 'auto']
}
poly_svc = GridSearchCV(SVC(kernel='poly'), poly_params, cv=5)
poly_svc.fit(X_train_scaled, y_train)

print("Best Polynomial SVM Params:", poly_svc.best_params_)
print("Polynomial Kernel Accuracy:", poly_svc.score(X_test_scaled, y_test))


Best Polynomial SVM Params: {'C': 0.1, 'degree': 3, 'gamma': 'auto'}
Polynomial Kernel Accuracy: 0.8945


RBF Kernel

In [None]:
rbf_params = {
    'C': [0.1, 1, 10],
    'gamma': ['scale', 0.01, 0.001]
}
rbf_svc = GridSearchCV(SVC(kernel='rbf'), rbf_params, cv=5)
rbf_svc.fit(X_train_scaled, y_train)

print("Best RBF SVM Params:", rbf_svc.best_params_)
print("RBF Kernel Accuracy:", rbf_svc.score(X_test_scaled, y_test))


Best RBF SVM Params: {'C': 10, 'gamma': 0.01}
RBF Kernel Accuracy: 0.8996666666666666


Evaluate All Models

In [None]:
for name, model in zip(['Linear', 'Polynomial', 'RBF'],
                       [linear_svc, poly_svc, rbf_svc]):
    print(f"\n=== {name} Kernel Results ===")
    y_pred = model.predict(X_test_scaled)
    print("Accuracy:", accuracy_score(y_test, y_pred))
    print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
    print("Classification Report:\n", classification_report(y_test, y_pred))



=== Linear Kernel Results ===
Accuracy: 0.9001666666666667
Confusion Matrix:
 [[2492  276]
 [ 323 2909]]
Classification Report:
               precision    recall  f1-score   support

           0       0.89      0.90      0.89      2768
           1       0.91      0.90      0.91      3232

    accuracy                           0.90      6000
   macro avg       0.90      0.90      0.90      6000
weighted avg       0.90      0.90      0.90      6000


=== Polynomial Kernel Results ===
Accuracy: 0.8945
Confusion Matrix:
 [[2468  300]
 [ 333 2899]]
Classification Report:
               precision    recall  f1-score   support

           0       0.88      0.89      0.89      2768
           1       0.91      0.90      0.90      3232

    accuracy                           0.89      6000
   macro avg       0.89      0.89      0.89      6000
weighted avg       0.89      0.89      0.89      6000


=== RBF Kernel Results ===
Accuracy: 0.8996666666666666
Confusion Matrix:
 [[2565  203]
 [ 39

Lab Report (221-35-901)

# Introduction

In this lab, we implemented Support Vector Machine (SVM) models with three kernel types (Linear, Polynomial, and RBF, the Radial Basis Function) on a binary classification task derived from the social media vs. productivity dataset. The target marks whether a person's actual productivity score is at or above the median. We used GridSearchCV to find the best hyperparameters for each kernel and compared performance using accuracy, precision, recall, F1-score, and the confusion matrix.



The following hyperparameters were tuned:

| Parameter | Description                   | Why Tuned?                                                                                           |
| --------- | ----------------------------- | ---------------------------------------------------------------------------------------------------- |
| C         | Regularization parameter      | Controls the trade-off between a smooth decision boundary and classifying training points correctly |
| gamma     | Kernel coefficient (RBF/Poly) | Defines the influence of a single training example                                                   |
| degree    | Degree of polynomial kernel   | Controls model complexity in the polynomial kernel                                                   |
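
To make the roles of gamma and degree concrete, the sketch below evaluates the three kernel functions by hand and compares them with scikit-learn's pairwise helpers. The two points and the parameter values are illustrative assumptions, not values taken from the lab data.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

# Two small made-up points (hypothetical, not from the lab dataset).
x = np.array([[1.0, 2.0]])
z = np.array([[0.5, -1.0]])

# Illustrative hyperparameters; coef0=0.0 matches the SVC default.
gamma, degree, coef0 = 0.01, 3, 0.0

# Linear kernel: k(x, z) = <x, z>
k_lin = x[0] @ z[0]
# Polynomial kernel: k(x, z) = (gamma * <x, z> + coef0) ** degree
k_poly = (gamma * k_lin + coef0) ** degree
# RBF kernel: k(x, z) = exp(-gamma * ||x - z||^2)
k_rbf = np.exp(-gamma * np.sum((x[0] - z[0]) ** 2))

# Compare the hand-computed values with scikit-learn's implementations.
print("linear:", k_lin, linear_kernel(x, z)[0, 0])
print("poly:  ", k_poly, polynomial_kernel(x, z, degree=degree, gamma=gamma, coef0=coef0)[0, 0])
print("rbf:   ", k_rbf, rbf_kernel(x, z, gamma=gamma)[0, 0])
```

A larger gamma makes the RBF value fall off faster with distance, so each training example influences a smaller neighbourhood. A larger degree raises the order of the feature interactions in the polynomial kernel.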


# Linear Kernel:

Only C was tuned, because the linear kernel has no gamma or degree parameter.

Best Params: C = 0.1


# Polynomial Kernel:

C, degree, and gamma were tuned to balance the bias-variance trade-off.

Best Params: C = 0.1, degree = 3, gamma = auto
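
The polynomial grid searched gamma over 'scale' and 'auto'. In scikit-learn, 'auto' means 1 / n_features and 'scale' means 1 / (n_features * X.var()). Because the features were standardized before training, the two settings should come out nearly identical, so the choice between them is close to a tie. A minimal sketch on synthetic data (an assumption standing in for the lab's features):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic features with very different spreads (stand-in for the lab data).
X_raw = rng.normal(loc=5.0, scale=[1.0, 10.0, 0.1, 3.0], size=(200, 4))
X_scaled = StandardScaler().fit_transform(X_raw)

n_features = X_scaled.shape[1]

# gamma='auto' depends only on the number of features.
gamma_auto = 1.0 / n_features
# gamma='scale' also divides by the overall variance of the input array.
gamma_scale_raw = 1.0 / (n_features * X_raw.var())
gamma_scale_scaled = 1.0 / (n_features * X_scaled.var())

print("auto:              ", gamma_auto)
print("scale (raw data):  ", gamma_scale_raw)
print("scale (scaled data):", gamma_scale_scaled)
```

Inside each cross-validation fold the variance of the training subset differs slightly from 1, so the two settings can still produce marginally different scores.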

# RBF Kernel:

C controls how heavily margin violations are penalized, and gamma controls how far the influence of a single training example reaches.

Best Params: C = 10, gamma = 0.01

# Evaluation Metrics

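**Evaluation was performed in two stages:**

Hyperparameter selection: 5-fold cross-validation on the training set during GridSearchCV

Final metrics on the held-out test set (20% of the data, 6,000 samples): Accuracy, Precision, Recall, F1-Score, Confusion Matrix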
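
One way to keep the two stages cleanly separated is to put the scaler inside a Pipeline, so that each cross-validation fold fits its own scaler on that fold's training portion only. The sketch below is an alternative arrangement, not the code used in this lab, and it runs on synthetic data from make_classification as a stand-in for the CSV.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption); the lab used the productivity CSV.
X, y = make_classification(n_samples=400, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scaling is a pipeline step, so it is refit inside every CV fold.
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC(kernel="rbf"))])

# Step-name prefixes route each parameter to the SVC step.
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01, 0.001]}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)  # stage 1: 5-fold CV on the training set

cv_accuracy = search.best_score_              # mean CV accuracy of the best setting
test_accuracy = search.score(X_test, y_test)  # stage 2: held-out test set

print("Best params:", search.best_params_)
print("CV accuracy:", cv_accuracy)
print("Test accuracy:", test_accuracy)
```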

# Performance Summary

| Kernel     | Accuracy | Precision (macro avg) | Recall (macro avg) | F1-Score (macro avg) |
| ---------- | -------- | --------------------- | ------------------ | -------------------- |
| Linear     | 0.9002   | 0.90                  | 0.90               | 0.90                 |
| Polynomial | 0.8945   | 0.89                  | 0.89               | 0.89                 |
| RBF        | 0.8997   | 0.90                  | 0.90               | 0.90                 |
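
The figures in this table can be re-derived from a confusion matrix. As a check, the sketch below recomputes the Linear Kernel row from its printed confusion matrix, assuming scikit-learn's layout (rows are true classes, columns are predicted classes).

```python
import numpy as np

# Linear Kernel confusion matrix as printed in the notebook output.
# Rows: true class (0, 1). Columns: predicted class (0, 1).
cm = np.array([[2492, 276],
               [323, 2909]])

correct = np.diag(cm)                # correctly classified samples per class
precision = correct / cm.sum(axis=0)  # divide by column sums (predicted counts)
recall = correct / cm.sum(axis=1)     # divide by row sums (true counts)
f1 = 2 * precision * recall / (precision + recall)
accuracy = correct.sum() / cm.sum()

print("Accuracy:", round(accuracy, 4))
print("Precision per class:", np.round(precision, 2), "macro:", round(precision.mean(), 2))
print("Recall per class:   ", np.round(recall, 2), "macro:", round(recall.mean(), 2))
print("F1 per class:       ", np.round(f1, 2), "macro:", round(f1.mean(), 2))
```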


# Conclusion


**From the results:**

**Linear Kernel** achieved the highest accuracy (90.02%), narrowly ahead of RBF, with balanced precision, recall, and F1-score (all ≈ 0.90).

**Polynomial Kernel** had the lowest accuracy (89.45%) and lower macro precision/recall/F1 (≈ 0.89), suggesting it was marginally less effective for this dataset.

**RBF Kernel** scored an accuracy of 89.97% and showed strong recall for class 0 (0.93) and strong precision for class 1 (0.93). Its errors are distributed differently from the other kernels: it produced fewer false positives (203, versus 276 for Linear and 300 for Polynomial) at the cost of more false negatives.
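
The accuracy gaps between the kernels are small, so it helps to compare them with the sampling noise of a 6,000-sample test set. The sketch below uses a simple binomial standard error. This is a rough approximation that treats each test prediction as an independent trial and ignores that all three models were scored on the same samples.

```python
import math

n_test = 6000  # test-set size from the classification reports
accuracies = {"Linear": 0.9002, "Polynomial": 0.8945, "RBF": 0.8997}


def binomial_se(p, n):
    """Standard error of a proportion p estimated from n independent trials."""
    return math.sqrt(p * (1 - p) / n)


for name, acc in accuracies.items():
    half_width = 1.96 * binomial_se(acc, n_test)  # approx. 95% interval
    print(f"{name}: {acc:.4f} +/- {half_width:.4f}")

# Largest gap between any two kernels, for comparison with the interval width.
max_gap = max(accuracies.values()) - min(accuracies.values())
print(f"Largest accuracy gap: {max_gap:.4f}")
```

Under these assumptions, the largest gap is smaller than the half-width of a single interval, so the accuracy ranking of the three kernels should be read as tentative.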

# **Key Insights:**



*   The Linear Kernel worked well, indicating the dataset may have a near-linear decision boundary.
*   The Polynomial Kernel may have slightly overfitted: its higher complexity brought no accuracy gain.
*   The RBF Kernel provided a good trade-off between precision and recall for both classes, which can be valuable in scenarios where both false positives and false negatives are costly.


# **Final Choice Recommendation:**
**If overall accuracy is the priority, choose the Linear Kernel.**

**If balanced performance across classes matters more, especially on imbalanced datasets, the RBF Kernel is preferable.**


