# Exercise 6

## SVM & Regularization


For this homework we consider a set of observations on a number of red and white wine varieties involving their chemical properties and ranking by tasters. Wine industry shows a recent growth spurt as social drinking is on the rise. The price of wine depends on a rather abstract concept of wine appreciation by wine tasters, opinion among whom may have a high degree of variability. Pricing of wine depends on such a volatile factor to some extent. Another key factor in wine certification and quality assessment is physicochemical tests which are laboratory-based and takes into account factors like acidity, pH level, presence of sugar and other chemical properties. For the wine market, it would be of interest if human quality of tasting can be related to the chemical properties of wine so that certification and quality assessment and assurance process is more controlled.

Two datasets are available of which one dataset is on red wine and have 1599 different varieties and the other is on white wine and have 4898 varieties. All wines are produced in a particular area of Portugal. Data are collected on 12 different properties of the wines one of which is Quality, based on sensory data, and the rest are on chemical properties of the wines including density, acidity, alcohol content etc. All chemical properties of wines are continuous variables. Quality is an ordinal variable with possible ranking from 1 (worst) to 10 (best). Each variety of wine is tasted by three independent tasters and the final rank assigned is the median rank given by the tasters.

A predictive model developed on this data is expected to provide guidance to vineyards regarding quality and price expected on their produce without heavy reliance on volatility of wine tasters.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
data_r = pd.read_csv('https://github.com/albahnsen/PracticalMachineLearningClass/raw/master/datasets/Wine_data_red.csv')
data_w = pd.read_csv('https://github.com/albahnsen/PracticalMachineLearningClass/raw/master/datasets/Wine_data_white.csv')

In [3]:
data = data_w.assign(type = 'white')

data = data.append(data_r.assign(type = 'red'), ignore_index=True)
data.sample(5)

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality,type
736,6.6,0.25,0.3,14.4,0.052,40.0,183.0,0.998,3.02,0.5,9.1,6,white
3119,5.9,0.19,0.37,0.8,0.027,3.0,21.0,0.9897,3.09,0.31,10.8,5,white
5302,7.7,0.69,0.05,2.7,0.075,15.0,27.0,0.9974,3.26,0.61,9.1,5,red
267,5.3,0.58,0.07,6.9,0.043,34.0,149.0,0.9944,3.34,0.57,9.7,5,white
5382,10.6,0.44,0.68,4.1,0.114,6.0,24.0,0.997,3.06,0.66,13.4,6,red


# Exercise 6.1

Show the frecuency table of the quality by type of wine

In [4]:
data[["type","quality"]].pivot_table(index="type", columns="quality", aggfunc=len, fill_value = 0)

quality,3,4,5,6,7,8,9
type,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
red,10,53,681,638,199,18,0
white,20,163,1457,2198,880,175,5


# Exercise 6.2

* Standarized the features (not the quality)
* Create a binary target for each type of wine
* Create two Linear SVM's for the white and red wines, repectively.


In [4]:
import warnings
warnings.filterwarnings('ignore')

data['quality2'] = data['quality'] > 5
data_red = data[data['type']=="red"]
data_white = data[data['type']=="white"]
data.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality,type,quality2
0,7.0,0.27,0.36,20.7,0.045,45.0,170.0,1.001,3.0,0.45,8.8,6,white,True
1,6.3,0.3,0.34,1.6,0.049,14.0,132.0,0.994,3.3,0.49,9.5,6,white,True
2,8.1,0.28,0.4,6.9,0.05,30.0,97.0,0.9951,3.26,0.44,10.1,6,white,True
3,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6,white,True
4,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6,white,True


# Exercise 6.3

Test the two SVM's using the different kernels (‘poly’, ‘rbf’, ‘sigmoid’)


In [5]:
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, f1_score
from sklearn import preprocessing
from sklearn.model_selection import train_test_split  
from sklearn.svm import SVC # "Support Vector Classifier"

y =  data_red['quality2']
X = data_red.drop(['quality','type', 'quality2'], axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)  

In [6]:
std_X = preprocessing.scale(X_train)

clf = SVC(kernel='poly')
clf.fit(std_X, y_train)
y_pred = clf.predict(preprocessing.scale(X_test))  
print("Kernel: Poly\n")
print(classification_report(y_test,y_pred))
print(accuracy_score(y_test,y_pred))


Kernel: Poly

             precision    recall  f1-score   support

      False       0.71      0.72      0.71       152
       True       0.74      0.74      0.74       168

avg / total       0.73      0.73      0.73       320

0.728125


In [7]:
clf = SVC(kernel='rbf')
clf.fit(std_X, y_train)
y_pred = clf.predict(preprocessing.scale(X_test))  
print("Kernel: Rbf\n")
print(classification_report(y_test,y_pred))
print(accuracy_score(y_test,y_pred))


Kernel: Rbf

             precision    recall  f1-score   support

      False       0.69      0.72      0.70       152
       True       0.73      0.70      0.72       168

avg / total       0.71      0.71      0.71       320

0.709375


In [8]:
clf = SVC(kernel='sigmoid')
clf.fit(std_X, y_train)
y_pred = clf.predict(preprocessing.scale(X_test))  
print("Kernel: Sigmoid\n")
print(classification_report(y_test,y_pred))
print(accuracy_score(y_test,y_pred))


Kernel: Sigmoid

             precision    recall  f1-score   support

      False       0.65      0.65      0.65       152
       True       0.68      0.68      0.68       168

avg / total       0.67      0.67      0.67       320

0.66875


In [9]:
y =  data_white['quality2']
X = data_white.drop(['quality','type', 'quality2'], axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)
std_X = preprocessing.scale(X_train)


In [10]:
clf = SVC(kernel='poly')
clf.fit(std_X, y_train)
y_pred = clf.predict(preprocessing.scale(X_test))  
print("Kernel: Poly\n")
print(classification_report(y_test,y_pred))
print(accuracy_score(y_test,y_pred))


Kernel: Poly

             precision    recall  f1-score   support

      False       0.78      0.33      0.47       328
       True       0.74      0.95      0.83       652

avg / total       0.75      0.74      0.71       980

0.7448979591836735


In [11]:
clf = SVC(kernel='rbf')
clf.fit(std_X, y_train)
y_pred = clf.predict(preprocessing.scale(X_test))  
print("Kernel: Rbf\n")
print(classification_report(y_test,y_pred))
print(accuracy_score(y_test,y_pred))


Kernel: Rbf

             precision    recall  f1-score   support

      False       0.77      0.55      0.64       328
       True       0.80      0.92      0.86       652

avg / total       0.79      0.79      0.78       980

0.7938775510204081


In [12]:
clf = SVC(kernel='sigmoid')
clf.fit(std_X, y_train)
y_pred = clf.predict(preprocessing.scale(X_test))  
print("Kernel: Sigmoid\n")
print(classification_report(y_test,y_pred))
print(accuracy_score(y_test,y_pred))
 

Kernel: Sigmoid

             precision    recall  f1-score   support

      False       0.47      0.45      0.46       328
       True       0.73      0.74      0.74       652

avg / total       0.64      0.65      0.64       980

0.6459183673469387


# Exercise 6.4
Using the best SVM find the parameters that gives the best performance

'C': [0.1, 1, 10, 100, 1000], 'gamma': [0.01, 0.001, 0.0001]

In [13]:
C=[ 0.1, 1, 10, 100, 1000]
gamma=[0.01, 0.001, 0.0001]

y_r =  data_red['quality2']
X_r = data_red.drop(['quality','type', 'quality2'], axis=1)
X_train_r, X_test_r, y_train_r, y_test_r = train_test_split(X_r, y_r, test_size = 0.20)  

y_w =  data_white['quality2']
X_w = data_white.drop(['quality','type', 'quality2'], axis=1)
X_train_w, X_test_w, y_train_w, y_test_w = train_test_split(X_w, y_w, test_size = 0.20)

std_X_w = preprocessing.scale(X_train_w)
for c in C:
    for g in gamma:
        clf = SVC(kernel='rbf',degree=7,C=c,gamma=g)
        clf.fit(std_X_w, y_train_w)
        y_pred_w = clf.predict(preprocessing.scale(X_test_w))  
        print("Parametro C: ",c)
        print("Parametro gamma: ",g)
        print("Accuracy: ",accuracy_score(y_test_w,y_pred_w))  
       # print(classification_report(y_test_w,y_pred_w))

Parametro C:  0.1
Parametro gamma:  0.01
Accuracy:  0.6928571428571428
Parametro C:  0.1
Parametro gamma:  0.001
Accuracy:  0.6489795918367347
Parametro C:  0.1
Parametro gamma:  0.0001
Accuracy:  0.6489795918367347
Parametro C:  1
Parametro gamma:  0.01
Accuracy:  0.7520408163265306
Parametro C:  1
Parametro gamma:  0.001
Accuracy:  0.689795918367347
Parametro C:  1
Parametro gamma:  0.0001
Accuracy:  0.6489795918367347
Parametro C:  10
Parametro gamma:  0.01
Accuracy:  0.7612244897959184
Parametro C:  10
Parametro gamma:  0.001
Accuracy:  0.7448979591836735
Parametro C:  10
Parametro gamma:  0.0001
Accuracy:  0.6908163265306122
Parametro C:  100
Parametro gamma:  0.01
Accuracy:  0.763265306122449
Parametro C:  100
Parametro gamma:  0.001
Accuracy:  0.7510204081632653
Parametro C:  100
Parametro gamma:  0.0001
Accuracy:  0.7387755102040816
Parametro C:  1000
Parametro gamma:  0.01
Accuracy:  0.7642857142857142
Parametro C:  1000
Parametro gamma:  0.001
Accuracy:  0.7591836734693878
Pa

En el caso del vino blanco, se utiliza el kernel 'rbf' y se encuentra que los parámetros que generan mejor desempeño son: C = 1000 y gamma = 0.01.

In [14]:
std_X_r = preprocessing.scale(X_train_r)
for c in C:
    for g in gamma:
        clf = SVC(kernel='rbf',degree=7,C=c,gamma=g)
        clf.fit(std_X_r, y_train_r)
        y_pred_r = clf.predict(preprocessing.scale(X_test_r))  
        print("Parametro C: ",c)
        print("Parametro gamma: ",g)
        print("Accuracy: ",accuracy_score(y_test_r,y_pred_r))  
       # print(classification_report(y_test_w,y_pred_w))

Parametro C:  0.1
Parametro gamma:  0.01
Accuracy:  0.746875
Parametro C:  0.1
Parametro gamma:  0.001
Accuracy:  0.540625
Parametro C:  0.1
Parametro gamma:  0.0001
Accuracy:  0.540625
Parametro C:  1
Parametro gamma:  0.01
Accuracy:  0.765625
Parametro C:  1
Parametro gamma:  0.001
Accuracy:  0.74375
Parametro C:  1
Parametro gamma:  0.0001
Accuracy:  0.540625
Parametro C:  10
Parametro gamma:  0.01
Accuracy:  0.765625
Parametro C:  10
Parametro gamma:  0.001
Accuracy:  0.759375
Parametro C:  10
Parametro gamma:  0.0001
Accuracy:  0.74375
Parametro C:  100
Parametro gamma:  0.01
Accuracy:  0.76875
Parametro C:  100
Parametro gamma:  0.001
Accuracy:  0.753125
Parametro C:  100
Parametro gamma:  0.0001
Accuracy:  0.753125
Parametro C:  1000
Parametro gamma:  0.01
Accuracy:  0.796875
Parametro C:  1000
Parametro gamma:  0.001
Accuracy:  0.7625
Parametro C:  1000
Parametro gamma:  0.0001
Accuracy:  0.765625


# Exercise 6.5

Compare the results with other methods

In [18]:
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression()

logreg.fit(X_train_w, y_train_w)
y_pred_w = logreg.predict(X_test_w)
print(logreg.coef_)
print("Accuracy: ",accuracy_score(y_test_w,y_pred_w))
print("f1 score: ",f1_score(y_test_w,y_pred_w))

logreg.fit(X_train_r, y_train_r)
y_pred_r = logreg.predict(X_test_r)
print(logreg.coef_)
print("Accuracy: ",accuracy_score(y_test_r,y_pred_r))
print("f1 score: ",f1_score(y_test_r,y_pred_r))

[[-2.01172761e-01 -5.50638295e+00  2.88424205e-01  4.88418172e-02
  -4.32367411e-01  1.33369069e-02 -3.85570445e-03 -2.48427106e+00
  -6.34930606e-01  1.14567841e+00  9.38654714e-01]]
Accuracy:  0.7591836734693878
f1 score:  0.8287373004354137
[[-0.00309141 -2.67589309 -0.72183535  0.01010696 -1.33735789  0.03713407
  -0.02325091 -1.0386632  -1.76800114  1.93727826  0.8735114 ]]
Accuracy:  0.721875
f1 score:  0.729483282674772


Al utilizar un modelo de regresión logística se obtiene que, en el caso del vino blanco, el resultado tiene mayor precisión utilizando SVM mientras que para el vino tinto, tiene mayor precisión el modelo de regresión logística.  

# Regularization

# Exercise 6.6


* Train a linear regression to predict wine quality (Continous)

* Analyze the coefficients

* Evaluate the RMSE

In [19]:
from sklearn.linear_model import LinearRegression
linreg = LinearRegression()
y =  data['quality']
X = data.drop(['quality','type', 'quality2'], axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)  

linreg.fit(X_train, y_train)
y_pred = linreg.predict(X_test)
print(linreg.coef_)

from sklearn import metrics
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
print(list(X))

[ 7.87881988e-02 -1.28602726e+00 -9.01442263e-02  4.41014354e-02
 -5.24707457e-01  6.43018676e-03 -2.42265299e-03 -6.54145838e+01
  4.59238536e-01  7.76986647e-01  2.56883012e-01]
RMSE: 0.7492115737970361
['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar', 'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density', 'pH', 'sulphates', 'alcohol']


# Exercise 6.7

* Estimate a ridge regression with alpha equals 0.1 and 1.
* Compare the coefficients with the linear regression
* Evaluate the RMSE

In [20]:
from sklearn.linear_model import Ridge
ridgereg = Ridge(alpha=0.1, normalize=True)
ridgereg.fit(X_train, y_train)
y_pred = ridgereg.predict(X_test)
print(np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
ridgereg.coef_

0.7516526097330757


array([ 3.28670584e-02, -1.14429320e+00,  3.44173462e-02,  2.50056860e-02,
       -1.04936858e+00,  5.00919910e-03, -1.74294551e-03, -3.50102676e+01,
        2.46528658e-01,  6.49967615e-01,  2.56662584e-01])

En general, se observa una reducción en la magnitud de los coeficientes al aplicar la ridge regresión con respecto a la regresión lineal anterior.

In [21]:
from sklearn.linear_model import Ridge
ridgereg = Ridge(alpha=1, normalize=True)
ridgereg.fit(X_train, y_train)
y_pred = ridgereg.predict(X_test)
print(np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
ridgereg.coef_

0.7840037650695478


array([ 7.53905132e-05, -5.79585252e-01,  1.82874511e-01,  4.80181550e-03,
       -1.33815551e+00,  1.78811662e-03, -5.33721132e-04, -2.47755484e+01,
        8.16162403e-02,  2.97054409e-01,  1.38919852e-01])

# Exercise 6.8

* Estimate a lasso regression with alpha equals 0.01, 0.1 and 1.
* Compare the coefficients with the linear regression
* Evaluate the RMSE

In [22]:
from sklearn.linear_model import Lasso
lassoreg = Lasso(alpha=0.01, normalize=False)
lassoreg.fit(X_train, y_train)
y_pred = lassoreg.predict(X_test)

print(lassoreg.coef_)
print(np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

[ 5.25176256e-04 -8.95711656e-01  0.00000000e+00  1.63531443e-02
 -0.00000000e+00  7.09433568e-03 -2.00412840e-03 -0.00000000e+00
  0.00000000e+00  1.45701115e-02  3.36024110e-01]
0.758860434170095


In [23]:
from sklearn.linear_model import Lasso
lassoreg = Lasso(alpha=0.1, normalize=False)
lassoreg.fit(X_train, y_train)
y_pred = lassoreg.predict(X_test)

print(lassoreg.coef_)
print(np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

[-0.         -0.          0.          0.00560169 -0.          0.00775869
 -0.00093088 -0.         -0.          0.          0.2771603 ]
0.7883868704348472


In [24]:
from sklearn.linear_model import Lasso
lassoreg = Lasso(alpha=1, normalize=False)
lassoreg.fit(X_train, y_train)
y_pred = lassoreg.predict(X_test)

print(lassoreg.coef_)
print(np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

[-0.         -0.          0.         -0.         -0.          0.0008104
 -0.00040419 -0.          0.          0.          0.        ]
0.8724987119586692


La 
regresión lasso genera coeficientes aun más suavizados que la regresión ridge, con respecto a la lineal. En particular, se oobserva que a mayor alpha, los coeficientes se reducen. 

# Exercise 6.9

* Create a binary target

* Train a logistic regression to predict wine quality (binary)

* Analyze the coefficients

* Evaluate the f1score

In [25]:
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression()
y = data['quality2']
X = data.drop(['quality','type', 'quality2'], axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20) 

logreg.fit(X_train, y_train)
y_pred = logreg.predict(X_test)
print(logreg.coef_)

from sklearn.metrics import f1_score
f1_score(y_test, y_pred)


[[-0.02107634 -4.02894716 -0.58016863  0.05689132 -1.52035686  0.01727209
  -0.00845489 -3.39855554 -0.11914079  1.82950374  0.84410775]]


0.810344827586207

# Exercise 6.10

* Estimate a regularized logistic regression using:
* C = 0.01, 0.1 & 1.0
* penalty = ['l1, 'l2']
* Compare the coefficients and the f1score

In [26]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = X_train.astype(float)
X_test = X_test.astype(float)
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

logreg = LogisticRegression(C=0.1, penalty='l1',solver='liblinear')
logreg.fit(X_train_scaled, y_train)
print(logreg.coef_)

[[ 0.01980796 -0.67719415 -0.06650948  0.26926473 -0.08056314  0.27075642
  -0.40108407  0.          0.07391007  0.28762564  1.04147213]]


In [29]:
C=[ 0.01, 0.1, 1.0]
penalty=['l1', 'l2']

for c in C:
    for p in penalty:
        logreg = LogisticRegression(C= c, penalty=p, solver='liblinear')
        logreg.fit(X_train_scaled, y_train)
        y_pred = logreg.predict(preprocessing.scale(X_test_scaled))  
        y_pred_prob = logreg.predict_proba(X_test_scaled)
        #print(metrics.log_loss(y_test, y_pred_prob))
        print("Parametro C: ",c)
        print("Penalty: ",p)
        print(logreg.coef_)
        print("Accuracy: ",accuracy_score(y_test,y_pred))  
        print("F1 score: ",f1_score(y_test,y_pred))  


Parametro C:  0.01
Penalty:  l1
[[ 0.         -0.4715298   0.          0.05028495  0.          0.
  -0.01380268  0.          0.          0.12599567  0.85891715]]
Accuracy:  0.7423076923076923
F1 score:  0.8057971014492753
Parametro C:  0.01
Penalty:  l2
[[ 0.07676207 -0.55250823 -0.03714039  0.27837915 -0.10889693  0.21754247
  -0.32994422 -0.15708738  0.10794533  0.26875093  0.80334581]]
Accuracy:  0.7484615384615385
F1 score:  0.8104347826086956
Parametro C:  0.1
Penalty:  l1
[[ 0.01981395 -0.67717652 -0.06651255  0.26924605 -0.0805422   0.27071666
  -0.4010074   0.          0.07391683  0.28761124  1.04148349]]
Accuracy:  0.7484615384615385
F1 score:  0.811527377521614
Parametro C:  0.1
Penalty:  l2
[[ 0.09778954 -0.67977    -0.08566517  0.35213997 -0.0881744   0.29727818
  -0.43424601 -0.11924105  0.11948656  0.31219682  0.98767491]]
Accuracy:  0.7453846153846154
F1 score:  0.8087810514153667
Parametro C:  1.0
Penalty:  l1
[[ 0.07631949 -0.70302944 -0.09104198  0.3325842  -0.0859883

Se observa que al aplicar regularización en los datos, no solamente cambia la magnitud del peso de las variables sobre la calidad, sino que también cambia la dirección de algunas de estas relaciones. Dentro de las diferentes opciones mostradas en el paso anterior, aquella que genera mejor precisión y f1_score está dada por la combinación C=0.1 y P=l1.