<a href="https://colab.research.google.com/github/TamilselviMunusamy007/MachineLearning_M606/blob/main/support_vector_machines.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Support Vector Machines
You should build a machine learning pipeline using a support vector machine model. In particular, you should do the following:
- Load the `mnist` dataset using [Pandas](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html). You can find this dataset in the datasets folder.
- Split the dataset into training and test sets using [Scikit-Learn](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html).
- Conduct data exploration, data preprocessing, and feature engineering if necessary.
- Train and test a support vector machine model using [Scikit-Learn](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html).
- Check the documentation to identify the most important hyperparameters, attributes, and methods of the model. Use them in practice.
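
The last bullet can be made concrete with a minimal sketch of the main `SVC` hyperparameters, fitted attributes, and methods. To stay self-contained it uses scikit-learn's bundled 8x8 digits as stand-in data (an assumption, not the course `mnist` file), and all variable names are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): scikit-learn's bundled 8x8 digits, not the course mnist.csv.
X_demo, y_demo = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Fit the scaler on the training split only, then reuse it on the test split.
scaler_demo = StandardScaler()
X_tr_s = scaler_demo.fit_transform(X_tr)
X_te_s = scaler_demo.transform(X_te)

# Hyperparameters: kernel, C (inverse regularization strength), gamma (RBF kernel coefficient).
clf_demo = SVC(kernel="rbf", C=5, gamma="scale")
clf_demo.fit(X_tr_s, y_tr)

# Attributes available after fitting.
print("classes_:", clf_demo.classes_)
print("n_support_ (support vectors per class):", clf_demo.n_support_)
print("support_vectors_ shape:", clf_demo.support_vectors_.shape)

# Methods: predict, decision_function, score.
demo_pred = clf_demo.predict(X_te_s)
demo_scores = clf_demo.decision_function(X_te_s)  # one column per class with the default 'ovr' shape
demo_accuracy = clf_demo.score(X_te_s, y_te)      # mean accuracy on the held-out split
print("test accuracy:", demo_accuracy)
```

Comparing `n_support_` across different values of `C` and `gamma` is one simple way to see how the hyperparameters change the fitted model.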

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler

In [None]:


url = "https://raw.githubusercontent.com/m-mahdavi/teaching/refs/heads/main/datasets/mnist.csv"
df = pd.read_csv(url)

df.head()

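|  | id | class | pixel1 | pixel2 | pixel3 | pixel4 | pixel5 | pixel6 | pixel7 | pixel8 | ... | pixel775 | pixel776 | pixel777 | pixel778 | pixel779 | pixel780 | pixel781 | pixel782 | pixel783 | pixel784 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 31953 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 34452 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 60897 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 36953 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 1981 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |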


Separate features and target

In [None]:

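print(df.columns)
# Use only the pixel columns as features; `id` is a row identifier, not a predictor.
X = df.drop(columns=['id', 'class'])
y = df['class']  # digit label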

Index(['id', 'class', 'pixel1', 'pixel2', 'pixel3', 'pixel4', 'pixel5',
       'pixel6', 'pixel7', 'pixel8',
       ...
       'pixel775', 'pixel776', 'pixel777', 'pixel778', 'pixel779', 'pixel780',
       'pixel781', 'pixel782', 'pixel783', 'pixel784'],
      dtype='object', length=786)
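
The printed index lists `id` and `class` ahead of the 784 pixel columns. As a minimal sketch on a hypothetical three-row frame (the values are made up for illustration), the features can be restricted to the pixel columns so that the row identifier does not end up as a model input:

```python
import pandas as pd

# Hypothetical miniature frame with the same column layout as the printed index.
toy = pd.DataFrame({
    "id": [31953, 34452, 60897],
    "class": [5, 8, 5],
    "pixel1": [0, 0, 0],
    "pixel2": [0, 12, 255],
})

# Keep only pixel columns as features; `id` identifies a row and carries no image content.
toy_X = toy.drop(columns=["id", "class"])
toy_y = toy["class"]

print(list(toy_X.columns))
print(toy_y.value_counts())
```

Printing `value_counts()` on the target is a quick check that it holds digit labels rather than unique identifiers.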


Split into training and test sets

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42
)
print(y.value_counts())

id
43287    1
16920    1
16753    1
43411    1
14392    1
        ..
1981     1
36953    1
60897    1
34452    1
31953    1
Name: count, Length: 4000, dtype: int64
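
For classification it can help to keep the class proportions the same in both splits. A minimal sketch using the `stratify` argument of `train_test_split`, on hypothetical imbalanced labels (all names and data here are illustrative, not taken from the notebook):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Hypothetical imbalanced labels: 80 samples of class 0 and 20 of class 1.
labels = np.array([0] * 80 + [1] * 20)
features = rng.normal(size=(100, 4))

f_train, f_test, l_train, l_test = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels
)

# With stratify, both splits keep the original 80/20 class ratio.
print("train counts:", np.bincount(l_train))
print("test counts:", np.bincount(l_test))
```

Without `stratify`, a random split of a small dataset may leave a rare class under-represented in the test set.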


Data Preprocessing - Missing values

In [None]:
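train_means = X_train.mean()  # statistics from the training split only, to avoid test-set leakage
X_train, X_test = X_train.fillna(train_means), X_test.fillna(train_means)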
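
Imputation statistics are usually learned from the training split and then reused on the test split. A minimal sketch with scikit-learn's `SimpleImputer`, on small hypothetical arrays (the values are invented for illustration):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical training and test matrices with missing entries (np.nan).
train = np.array([[1.0, 10.0], [3.0, np.nan], [np.nan, 30.0]])
test = np.array([[np.nan, 5.0]])

imputer = SimpleImputer(strategy="mean")
train_filled = imputer.fit_transform(train)  # learns the column means from the training rows
test_filled = imputer.transform(test)        # reuses the training means on the test rows

print("learned means:", imputer.statistics_)
print(train_filled)
print(test_filled)
```

Because the imputer is a transformer, it can also be placed as a step in a scikit-learn pipeline ahead of the scaler and the model.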

Standardize data

In [None]:

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
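
To illustrate what the scaler does, here is a minimal sketch on hypothetical pixel-like data. It assumes one column is constant, as an always-black border pixel would be, and shows that `StandardScaler` leaves such a column at zero instead of dividing by zero.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: column 0 is constant, column 1 varies.
train_px = np.array([[0.0, 0.0], [0.0, 100.0], [0.0, 200.0]])
test_px = np.array([[0.0, 300.0]])

sc = StandardScaler()
train_z = sc.fit_transform(train_px)  # z = (x - training mean) / training standard deviation
test_z = sc.transform(test_px)        # test data is scaled with the training statistics

print("means:", sc.mean_)
print("scales:", sc.scale_)  # a zero-variance column gets scale 1.0
print(train_z)
print(test_z)
```

Test values can fall outside the range seen during training, which is expected because the statistics come from the training split alone.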

SVM - training

In [None]:
svm_model = SVC(kernel='rbf', C=5, gamma='scale', random_state=42)
svm_model.fit(X_train_scaled, y_train)
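
The values `C=5` and `gamma='scale'` are one choice among many. A minimal sketch of searching over both with `GridSearchCV` follows; the grid values are arbitrary assumptions, and scikit-learn's bundled digits stand in for the course data.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): bundled 8x8 digits instead of mnist.csv.
X_g, y_g = load_digits(return_X_y=True)
Xg_train, Xg_test, yg_train, yg_test = train_test_split(
    X_g, y_g, test_size=0.2, random_state=42, stratify=y_g
)

# Scaling sits inside the pipeline so each cross-validation fold fits its own scaler.
pipe_g = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {
    "svc__C": [1, 5, 10],           # illustrative values only
    "svc__gamma": ["scale", 0.01],  # illustrative values only
}
search = GridSearchCV(pipe_g, param_grid, cv=3)
search.fit(Xg_train, yg_train)

print("best parameters:", search.best_params_)
print("cross-validated accuracy:", search.best_score_)
print("held-out accuracy:", search.score(Xg_test, yg_test))
```

The held-out score is computed once, after the search, so the test split plays no part in choosing the hyperparameters.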

Prediction

In [None]:
y_pred = svm_model.predict(X_test_scaled)
print(y_pred)


[48246  6324 54055 63546  6770 25506  2766  2784 13348 41464 53815  3539
 18528 56569 15407 42496 34217 56898 52893 68024 16175 38925 59445 40325
 69266 41230 69345 48672 39273 42499 40817 65225 54739 10188 55209 40817
 44332 42333 66659 44706 66153 24635 52787 59865  5727 59474 54288 61631
 56687  4560  3663 29707 39378 64884 10114 22780  5656 29389 40840 23288
 16576  1299 15708 29389 64261 43371 62434  7037 11716 17644 64611 56111
 45706 67851 61949 66140 54237 61311 36988 63949 14362 64214 67977  2938
 58202  2784  2304 25553 62924 69313 21162 63967 22492 25523 65084  5137
 35773 40920  9612 63242 62919 33166 54978 20923 34509  4702 43322 13604
  9281  2424 33917 24826 23323 69272 39706 18063 22280 48344  1391 13619
 17219 68555 69626 17970 26904 18284  3663 62902 39720  5727  2649 46552
   639 17624 16768  6800 50029 27253 42521 34178   939  3440 63512 12777
 63546 67501 34877 37087 22280 24235 49383  3663 39941 18906  8943 33628
 14113 67379  7086 66853  8193 51975 47826  9219 55