# Support Vector Machine

### Text classification with an SVM on Sentence Transformers embeddings

This notebook trains an SVM text classifier on a custom CSV dataset. It is based on the [scikit-learn](https://scikit-learn.org/stable/auto_examples/svm/plot_iris_svc.html#sphx-glr-auto-examples-svm-plot-iris-svc-py) SVC example.

In [71]:
%pip install scikit-learn
%pip install pandas
%pip install numpy
%pip install matplotlib
%pip install sentence_transformers
%pip install datasets

Note: you may need to restart the kernel to use updated packages.


## Load the Dataset
1. Read the CSV file into a pandas DataFrame.
2. Convert the DataFrame into a Hugging Face `Dataset` (`hf_dataset`).
3. Split `hf_dataset` into train and test sets (90% / 10%).

In [100]:
from datasets import Dataset
import pandas as pd

df = pd.read_csv(r"C:\Users\ISS-User1\Documents\Eugene\Glowing-Torch\datasets\custom_dataset.csv", encoding='latin1')

hf_dataset = Dataset.from_pandas(df)
hf_dataset = hf_dataset.train_test_split(
    test_size=1-0.90, shuffle=True)

print(hf_dataset)

DatasetDict({
    train: Dataset({
        features: ['name', 'category'],
        num_rows: 625
    })
    test: Dataset({
        features: ['name', 'category'],
        num_rows: 70
    })
})


## Load the embedding model

The sentence transformer maps each input string to a 384-dimensional embedding vector. These embeddings are the features used for classification.

In [129]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('paraphrase-MiniLM-L6-v2')


## Encode the dataset using the sentence transformer model

In [130]:
X = model.encode(hf_dataset['train']['name'])
y = hf_dataset['train']['category']

## Instantiate an SVM and fit the model

Create a `LinearSVC` classifier and fit it on the training embeddings.

In [149]:
from sklearn.metrics import hinge_loss
from sklearn import svm

svm_clf = svm.LinearSVC(C=3, dual="auto")
svm_clf.fit(X, y)
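
The value `C=3` above is set by hand. As a minimal sketch of how a value for `C` could instead be chosen by cross-validation, the example below runs a small grid search. It uses synthetic data from `make_classification` as a stand-in for the embeddings, and the candidate values in `param_grid` are assumptions for illustration, not values taken from this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Synthetic stand-in for the embedding matrix and its labels (hypothetical data)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=10,
    n_classes=3, random_state=0,
)

# Candidate regularisation strengths (illustrative choices)
param_grid = {"C": [0.1, 1, 3, 10]}

# 5-fold cross-validation over the grid, scored by accuracy
search = GridSearchCV(LinearSVC(dual=False, max_iter=5000), param_grid, cv=5)
search.fit(X_demo, y_demo)

best_c = search.best_params_["C"]
print("Best C:", best_c)
print("Cross-validated accuracy:", round(search.best_score_, 3))
```

With the real data, `X_demo` and `y_demo` would be replaced by `X` and `y` from the encoding step.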

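Compute the hinge loss of the fitted model on the training data. A loss of 0.0 means every training example is scored for its correct class with a margin of at least 1; it does not measure performance on unseen data.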

In [139]:
decision_scores = svm_clf.decision_function(X)

hinge_loss_value = hinge_loss(y, decision_scores)
print("Hinge loss:", hinge_loss_value)

Hinge loss: 0.0
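
To make the 0.0 above concrete, here is a minimal NumPy sketch of the multiclass hinge loss in its Crammer-Singer form, which is the form the scikit-learn documentation describes for multiclass `hinge_loss`. The function name `multiclass_hinge` and the score arrays are hypothetical and only for illustration.

```python
import numpy as np

def multiclass_hinge(scores, true_idx):
    """Mean multiclass hinge loss (Crammer-Singer form)."""
    scores = np.asarray(scores, dtype=float)
    true_idx = np.asarray(true_idx)
    rows = np.arange(len(true_idx))
    true_scores = scores[rows, true_idx]
    # Mask out the true class, then take the best competing score per row
    others = scores.copy()
    others[rows, true_idx] = -np.inf
    margins = true_scores - others.max(axis=1)
    # A sample contributes zero loss once its margin reaches 1
    return float(np.mean(np.maximum(0.0, 1.0 - margins)))

# Hypothetical decision scores for 2 samples and 3 classes
confident = [[2.0, 0.5, -1.0], [-0.5, 0.2, 1.8]]
borderline = [[0.6, 0.5, -1.0], [-0.5, 0.2, 1.8]]
labels = [0, 2]  # index of the true class for each sample

loss_confident = multiclass_hinge(confident, labels)
loss_borderline = multiclass_hinge(borderline, labels)
print(loss_confident)   # all margins are at least 1, so the loss is 0.0
print(loss_borderline)  # first sample has margin 0.1, so the loss is about 0.45
```

Both score sets predict the correct class, but only the first reaches zero loss, which shows that the loss reflects margins rather than accuracy alone.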


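## Predicting the data

`LinearSVC` does not provide `predict_proba`, so the cell below applies a softmax to the decision scores to obtain one normalised score per class. These scores sum to 1 but are not calibrated probabilities.
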
In [151]:
import numpy as np

query = ["Vintage Straight High Jeans"]
encoding = model.encode(query)

def softmax(logits):
    # Subtract the row-wise max before exponentiating for numerical stability
    exp_logits = np.exp(logits - np.max(logits, axis=1, keepdims=True))
    return exp_logits / np.sum(exp_logits, axis=1, keepdims=True)

decision_scores = svm_clf.decision_function(encoding)
prediction = svm_clf.predict(encoding)
# Normalise the decision scores so they sum to 1 for each query
probabilities = softmax(decision_scores)
for i, name in enumerate(query):
    print(f"Query: {name}")
    print(f"Predicted: {prediction[i]}")
    print('='*30)
    # classes_ is in the same order as the columns of decision_function
    for label, p in zip(svm_clf.classes_, probabilities[i]):
        print(f"{label.strip()}: {round(p, 6)}")

Query: Vintage Straight High Jeans
Predicted: Regular and Straight Jeans
Plain and Short Sleeve T-Shirts: 0.035593
All other Polo Shirts: 0.007797
All other shorts: 0.042346
Basic Tank Tops or Vest Tops: 0.017731
Basic, Cotton, Plain and Short Sleeve T-Shirts: 0.009171
Chino Shorts: 0.023531
None of the given categories: 0.011121
Other Jeans: 0.059083
Other T-Shirts: 0.000993
Other Tank Tops or Vest Tops: 0.063354
Regular and Straight Jeans: 0.710262
Regular, Plain, Short Sleeve Polo Shirt: 0.019018
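
The scores above come from a softmax over raw decision scores. If calibrated probabilities are needed, one alternative (a swapped-in technique, not what this notebook does) is to wrap the SVM in scikit-learn's `CalibratedClassifierCV`, which adds `predict_proba`. The sketch below assumes synthetic data from `make_classification` in place of the embeddings.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic stand-in for the embedding matrix and its labels (hypothetical data)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=10,
    n_classes=3, random_state=0,
)

# Fit the SVM inside a calibrator using 3-fold cross-validation
calibrated_clf = CalibratedClassifierCV(
    LinearSVC(dual=False, max_iter=5000), cv=3
)
calibrated_clf.fit(X_demo, y_demo)

# One row per sample, one column per class
proba = calibrated_clf.predict_proba(X_demo[:2])
print(proba.shape)
print(proba.sum(axis=1))
```

Calibration needs several examples per class in every fold, so it may not suit very small categories.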


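## Evaluate on the test set

Score the classifier on the held-out test split and list the misclassified items.

In [154]: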
X_test = model.encode(hf_dataset['test']['name'])
y_test = hf_dataset['test']['category']

# Mean accuracy on the held-out test split
score = svm_clf.score(X_test, y_test)
print(score)

# Predict the whole test set once, then list the misclassified items
test_names = hf_dataset['test']['name']
predictions = svm_clf.predict(X_test)
for name, predicted, truth in zip(test_names, predictions, y_test):
    if predicted != truth:
        print(f"Name: {name}")
        print(f"Predicted: {predicted}")
        print(f"True: {truth}")
        print('='*30)

0.8714285714285714
Name: Slim Fit Waffled polo shirt
Predicted: Regular, Plain, Short Sleeve Polo Shirt
True: All other Polo Shirts
Name: Loose Fit Print polo shirt
Predicted: Other T-Shirts
True: All other Polo Shirts
Name: KIDS Ultra Stretch Dry Sweat Hoodie
Predicted: None of the given categories
True: Other T-Shirts
Name: Heidi Twist Back Tank
Predicted: All other Polo Shirts
True: Other Tank Tops or Vest Tops
Name: Dry Colour V Neck Short Sleeve T-Shirt
Predicted: Other T-Shirts
True: Basic, Cotton, Plain and Short Sleeve T-Shirts
Name: Regular Fit Piqué polo shirt
Predicted: All other Polo Shirts
True: Regular, Plain, Short Sleeve Polo Shirt
Name: Regular Fit Piqué sports top
Predicted: All other Polo Shirts
True: Regular, Plain, Short Sleeve Polo Shirt
Name: Patch pocket Jeans
Predicted: None of the given categories
True: Other Jeans
Name: Slim Jeans
Predicted: Regular and Straight Jeans
True: Other Jeans
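
The errors above are spread unevenly across categories, so a per-class view can say more than the single accuracy figure. The following is a minimal sketch that computes per-class recall with the standard library only. The helper `per_class_recall` and the label lists are hypothetical; with the real data, `y_test` and the model's predictions would be passed in instead.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / total[label] for label in total}

# Hypothetical labels for illustration, not taken from the test split
y_true_demo = ["Other Jeans", "Other Jeans", "Chino Shorts", "Chino Shorts"]
y_pred_demo = ["Other Jeans", "Regular and Straight Jeans",
               "Chino Shorts", "Chino Shorts"]

recall = per_class_recall(y_true_demo, y_pred_demo)
for label, value in recall.items():
    print(f"{label}: {value:.2f}")
```

scikit-learn's `classification_report` gives a fuller per-class summary with precision, recall and F1.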
