# Exercise - 10

Train an `SVM` classifier on the Wine dataset, which you can load using `sklearn.datasets.load_wine()`. This dataset contains the chemical analysis of 178 wine samples produced by 3 different cultivators: the goal is to train a classification model capable of predicting the cultivator based on the wine's chemical analysis. Since SVM classifiers are binary classifiers, you will need to use one-versus-all to classify all 3 classes. What accuracy can you reach?
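As a hedged illustration of the one-versus-all idea mentioned above, the sketch below wraps a binary `LinearSVC` in `OneVsRestClassifier`, so that one binary model is trained per class. The names `X_demo`, `y_demo`, and `ovr_clf` are chosen here for illustration and are not part of the original solution. Note that scikit-learn's `SVC` already handles multiclass targets on its own (using one-versus-one internally), so the explicit wrapper is optional.

```python
from sklearn.datasets import load_wine
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Illustrative names; the full dataset is used here only to show the mechanics.
X_demo, y_demo = load_wine(return_X_y=True)

# Scale the features, then train one binary LinearSVC per class.
ovr_clf = make_pipeline(
    StandardScaler(),
    OneVsRestClassifier(LinearSVC(random_state=42, max_iter=10_000)),
)
ovr_clf.fit(X_demo, y_demo)

# The wrapper stores one fitted binary estimator per class.
n_binary_models = len(ovr_clf[-1].estimators_)
print(n_binary_models)
```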

In [27]:
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler 
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split, cross_val_score, RandomizedSearchCV
from scipy.stats import loguniform, uniform

## Loading Dataset

In [28]:
wine = load_wine(as_frame= True)

In [29]:
print(wine.DESCR)

.. _wine_dataset:

Wine recognition dataset
------------------------

**Data Set Characteristics:**

    :Number of Instances: 178
    :Number of Attributes: 13 numeric, predictive attributes and the class
    :Attribute Information:
 		- Alcohol
 		- Malic acid
 		- Ash
		- Alcalinity of ash  
 		- Magnesium
		- Total phenols
 		- Flavanoids
 		- Nonflavanoid phenols
 		- Proanthocyanins
		- Color intensity
 		- Hue
 		- OD280/OD315 of diluted wines
 		- Proline

    - class:
            - class_0
            - class_1
            - class_2
		
    :Summary Statistics:
    
                                   Min   Max   Mean     SD
    Alcohol:                      11.0  14.8    13.0   0.8
    Malic Acid:                   0.74  5.80    2.34  1.12
    Ash:                          1.36  3.23    2.36  0.27
    Alcalinity of Ash:            10.6  30.0    19.5   3.3
    Magnesium:                    70.0 162.0    99.7  14.3
    Total Phenols:                0.98  3.88    2.29  0.63
    Fl

In [30]:
X, y = wine.data, wine.target

In [31]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state= 42)

## First model

In [32]:
svm_clf = Pipeline([
    ('scaler', StandardScaler()),
    ('SVC', SVC(random_state= 42))
])
svm_clf.fit(X_train, y_train)

In [33]:
# Mean accuracy over the default 5-fold cross-validation on the training set
cross_val_score(svm_clf, X_train, y_train).mean()

0.9698005698005698

The default model already performs well, with a mean cross-validation accuracy of about 97%, but let's fine-tune its hyperparameters.

## Fine-Tuning

In [34]:
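param_dist = {
    # loguniform samples gamma evenly across orders of magnitude in [0.001, 1]
    'SVC__gamma': loguniform(0.001, 1),
    # uniform(loc, scale) samples C from [loc, loc + scale], i.e. [1, 11]
    'SVC__C': uniform(1, 10)
}

# Try 100 random hyperparameter combinations, each scored with the default 5-fold cross-validation
rnd_search_cv = RandomizedSearchCV(svm_clf, param_dist, n_iter=100, random_state=42)
rnd_search_cv.fit(X_train, y_train)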
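The search ranges depend on how SciPy parameterizes its distributions, which is easy to misread: `uniform(1, 10)` takes a location and a scale, not a lower and an upper bound. The following sketch draws samples from both distributions to check their ranges empirically. The variable names (`c_dist`, `gamma_dist`, and so on) are illustrative only.

```python
from scipy.stats import loguniform, uniform

# uniform(loc, scale) covers [loc, loc + scale], so this spans [1, 11], not [1, 10].
c_dist = uniform(1, 10)
# loguniform(a, b) covers [a, b] and spreads samples evenly on a log scale.
gamma_dist = loguniform(0.001, 1)

c_samples = c_dist.rvs(size=1000, random_state=42)
gamma_samples = gamma_dist.rvs(size=1000, random_state=42)

print("C range:    ", c_samples.min(), c_samples.max())
print("gamma range:", gamma_samples.min(), gamma_samples.max())
```

If an upper bound of 10 for `C` were intended, `uniform(1, 9)` would be the matching call.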

In [35]:
best_model = rnd_search_cv.best_estimator_

In [36]:
rnd_search_cv.best_params_

{'SVC__C': 9.287375091519294, 'SVC__gamma': 0.011756010900231853}

In [37]:
rnd_search_cv.best_score_

0.9925925925925926

## On Test Set

In [38]:
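# Caution: this refits clones of best_model on folds of the test set instead of scoring the fitted model
cross_val_score(best_model, X_test, y_test).mean()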

0.9777777777777779
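Calling `cross_val_score` on the test set trains fresh copies of the pipeline on test folds, so the number above does not measure the model that was fitted on the training set. A more conventional check is to score the already-fitted model once on the held-out data. The sketch below is self-contained and assumes the same split (`random_state=42`) and the hyperparameters reported by `best_params_` above; in the notebook itself, `best_model.score(X_test, y_test)` would do the same job. The names `X_tr`, `X_te`, `final_model`, and `test_accuracy` are illustrative.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_all, y_all = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_all, y_all, random_state=42)

# Hyperparameters copied from the best_params_ output shown earlier.
final_model = Pipeline([
    ('scaler', StandardScaler()),
    ('SVC', SVC(C=9.287375091519294, gamma=0.011756010900231853, random_state=42)),
])

# Fit on the training set only, then score once on the untouched test set.
final_model.fit(X_tr, y_tr)
test_accuracy = final_model.score(X_te, y_te)
print(test_accuracy)
```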