# Wine Classification - SVM (Multiple classes)
A demonstration of how to use an SVM model for multiclass classification.

We'll use the wine classification dataset from: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_wine.html#sklearn.datasets.l

It behaves similarly to the breast-cancer dataset: depending on the
`return_X_y` and `as_frame` arguments, it returns either a Bunch object
or a (DataFrame, Series) tuple.
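
As a quick illustration of the two return styles, here is a minimal sketch. It assumes a scikit-learn version that supports `as_frame` and that pandas is installed; the variable names (`bunch`, `X_df`, `y_ser`) are only for this example.

```python
from sklearn.datasets import load_wine

# Default call: a dict-like Bunch with .data, .target, .feature_names, ...
bunch = load_wine()

# return_X_y=True together with as_frame=True: a (DataFrame, Series) tuple
X_df, y_ser = load_wine(return_X_y=True, as_frame=True)

print(type(bunch).__name__)
print(type(X_df).__name__, X_df.shape)
print(type(y_ser).__name__, y_ser.shape)
```

The rest of this notebook uses the Bunch form with plain NumPy arrays.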



# Step 1: Loading the dataset and performing basic analysis

In [2]:
from sklearn.datasets import load_wine

# This will get us a Bunch object (dict-like)
wine_data = load_wine(return_X_y = False, as_frame = False)
# Splitting it into X and Y (features, label)
X = wine_data.data
Y = wine_data.target

print(f"Input data size: {X.shape}")
print(f"Output data size: {Y.shape}")

Input data size: (178, 13)
Output data size: (178,)


In [3]:
# Getting the feature names and the target classes
print(f"Feature names: {wine_data.feature_names}")

print(f"\nLabel names: {wine_data.target_names}")

Feature names: ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']

Label names: ['class_0' 'class_1' 'class_2']


In [4]:
# Getting the counts of our classes
n_class0 = (Y==0).sum()
n_class1 = (Y==1).sum()
n_class2 = (Y==2).sum()
# Percentages are relative to the total number of samples (Y.size)
print(f'{n_class0} class0 samples - {n_class0*100/Y.size:.2f}%'
      f'\n{n_class1} class1 samples - {n_class1*100/Y.size:.2f}%'
      f'\n{n_class2} class2 samples - {n_class2*100/Y.size:.2f}%')

59 class0 samples - 33.15%
71 class1 samples - 39.89%
48 class2 samples - 26.97%


# Step 2: Splitting the data

In [5]:
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state = 42)

In [6]:
# Number of samples in the training and test sets
Y_train.size, Y_test.size

(133, 45)
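
The 133/45 split comes from the default `test_size=0.25`. Since the classes are not perfectly balanced, one possible variation is to pass `stratify=Y` so that each class keeps roughly the same proportion in both subsets. The sketch below is an assumption about what might be useful here, not what this notebook does; the remaining steps use the unstratified split above.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, Y = load_wine(return_X_y=True)

# test_size=0.25 is the default, written out here for clarity;
# stratify=Y is an addition that the notebook's own split does not use
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, stratify=Y, random_state=42
)

print(Y_train.size, Y_test.size)
# Class proportions in the test set, to compare with the full dataset
print(np.bincount(Y_test) / Y_test.size)
```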

# Step 3: Training the model
`SVC` handles multiclass problems implicitly with a one-vs-one scheme: it trains one binary classifier for every pair of classes, so the three wine classes give three pairwise classifiers.
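
To make the one-vs-one scheme visible, the sketch below requests the raw pairwise scores with `decision_function_shape='ovo'` and checks that there is one column per class pair. It fits on the full dataset purely for illustration, and the names `clf_ovo`, `pairs` and `scores` are introduced only for this example.

```python
from itertools import combinations
from sklearn.datasets import load_wine
from sklearn.svm import SVC

X, Y = load_wine(return_X_y=True)

# 'ovo' exposes the raw pairwise scores; the default 'ovr' only reshapes
# them, since training is one-vs-one in both cases
clf_ovo = SVC(kernel='linear', C=1.0, decision_function_shape='ovo').fit(X, Y)

# Every unordered pair of classes gets its own binary classifier
pairs = list(combinations(clf_ovo.classes_, 2))
scores = clf_ovo.decision_function(X[:5])

print(pairs)
print(scores.shape)   # (n_samples, n_pairs)
```

With three classes there happen to be exactly three pairs, so the column count equals the class count; with four classes there would be six pairs.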



In [7]:
from sklearn.svm import SVC
# random_state only matters when probability=True; it is ignored here
clf = SVC(kernel = 'linear', C=1.0, random_state = 42)
clf.fit(X_train, Y_train)

In [8]:
# Getting the predictions
predictions = clf.predict(X_test)
predictions

array([0, 0, 2, 0, 1, 0, 1, 2, 1, 2, 0, 2, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1,
       1, 2, 2, 2, 1, 1, 1, 0, 0, 1, 2, 0, 0, 0, 2, 2, 1, 2, 0, 1, 1, 2,
       2])

In [12]:
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

report = classification_report(Y_test, predictions)
print(report)
print(confusion_matrix(Y_test, predictions))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       1.00      0.94      0.97        18
           2       0.92      1.00      0.96        12

    accuracy                           0.98        45
   macro avg       0.97      0.98      0.98        45
weighted avg       0.98      0.98      0.98        45

[[15  0  0]
 [ 0 17  1]
 [ 0  0 12]]
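
The figures in the classification report can be recomputed by hand from the confusion matrix, which helps when reading both side by side. A small sketch, with the matrix copied from the output above (rows are true classes, columns are predicted classes):

```python
import numpy as np

# Confusion matrix copied from the output above
cm = np.array([[15, 0, 0],
               [0, 17, 1],
               [0, 0, 12]])

true_pos = np.diag(cm)
precision = true_pos / cm.sum(axis=0)   # column sums: samples predicted as each class
recall = true_pos / cm.sum(axis=1)      # row sums: samples truly in each class
accuracy = true_pos.sum() / cm.sum()

print(np.round(precision, 2))
print(np.round(recall, 2))
print(round(accuracy, 3))
```

The single misclassified sample is a class 1 wine predicted as class 2, which lowers the recall of class 1 and the precision of class 2.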


In [15]:
print(f'The accuracy is: {clf.score(X_test, Y_test)*100:.1f}%')


The accuracy is: 97.8%
