# Regularized Classification 

- Wine dataset from the UCI Machine Learning Repository: [data](http://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data), [data dictionary](http://archive.ics.uci.edu/ml/datasets/Wine)
- **Goal:** Predict the origin of wine using chemical analysis

### Load and prepare the wine dataset

In [1]:
# read in the dataset
import pandas as pd
url = './Datasets/wine.data'
wine = pd.read_csv(url, header=None)
wine.head()

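| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 |
| 1 | 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050 |
| 2 | 1 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 |
| 3 | 1 | 14.37 | 1.95 | 2.5 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480 |
| 4 | 1 | 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 |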


In [2]:
# examine the response variable
wine[0].value_counts()

2    71
1    59
3    48
Name: 0, dtype: int64
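
The class counts above give a useful reference point for the log loss values computed later. The sketch below works out the log loss of a "no-information" model that always predicts the observed class frequencies. It assumes the evaluation data has roughly the same class balance as the full dataset, which a particular train/test split may not match exactly. The names `class_counts`, `priors`, and `baseline_log_loss` are introduced here for illustration only.

```python
import math

# Class counts copied from the value_counts() output above.
class_counts = {2: 71, 1: 59, 3: 48}
n_samples = sum(class_counts.values())

# A "no-information" model: always predict the observed class frequencies.
priors = {label: count / n_samples for label, count in class_counts.items()}

# Its expected log loss (natural log) on data with the same class balance
# is the entropy of the class distribution.
baseline_log_loss = -sum(p * math.log(p) for p in priors.values())

print(n_samples)
print(baseline_log_loss)
```

Any model worth keeping should score well below this baseline, which itself sits just under `log(3)`, the log loss of guessing uniformly across three classes.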

In [3]:
# define X and y
X = wine.drop(0, axis=1)
y = wine[0]

In [4]:
# split into training and testing sets
# (sklearn.cross_validation was removed; train_test_split now lives in model_selection)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

### Logistic regression (unregularized)

In [5]:
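# build a logistic regression model with default settings
# (per the repr below, the default still applies an L2 penalty with C=1.0)
from sklearn.linear_model import LogisticRegression
# solver='liblinear' matches the solver shown in the repr below
logreg = LogisticRegression(solver='liblinear')
logreg.fit(X_train, y_train)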

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [6]:
# examine the coefficients
print(logreg.coef_)

[[ -3.99124151e-01   7.19279747e-01   8.37693694e-01  -6.15868229e-01
   -3.70377994e-02  -2.82463938e-03   1.16591034e+00   5.50062272e-02
   -3.01116872e-01  -1.99445473e-01  -9.27764817e-02   9.50749025e-01
    1.62914206e-02]
 [  9.25208053e-01  -1.26103740e+00  -8.33651892e-01   2.26252556e-01
    2.04570203e-02   3.54760905e-01   5.13094123e-01   1.77967019e-01
    7.92779927e-01  -1.69458682e+00   5.05970942e-01  -3.15504616e-01
   -1.38557568e-02]
 [ -3.48621049e-01   7.05375606e-01   1.11108903e-01   2.02375183e-01
   -1.34734683e-02  -6.06889861e-01  -1.85854093e+00  -3.86951442e-02
   -7.05962480e-01   1.08803431e+00  -3.93735372e-01  -9.41034620e-01
    1.02444278e-03]]


In [7]:
# generate predicted probabilities
y_pred_prob = logreg.predict_proba(X_test)
print(y_pred_prob)

[[  4.18229062e-03   1.38301949e-02   9.81987514e-01]
 [  3.84533237e-05   9.98810545e-01   1.15100193e-03]
 [  9.85324034e-01   1.30108815e-02   1.66508475e-03]
 [  1.71289398e-02   9.81687969e-01   1.18309145e-03]
 [  9.88618919e-01   9.93735465e-06   1.13711441e-02]
 [  2.09961449e-03   2.82895116e-02   9.69610874e-01]
 [  7.79537995e-02   8.22834161e-01   9.92120391e-02]
 [  9.98943529e-01   3.24740043e-07   1.05614645e-03]
 [  5.93224358e-04   1.27953805e-03   9.98127238e-01]
 [  3.22030032e-04   9.67960020e-01   3.17179503e-02]
 [  9.92198282e-01   3.92162652e-04   7.40955538e-03]
 [  1.65750500e-01   8.28262756e-01   5.98674422e-03]
 [  1.90619682e-04   9.99075617e-01   7.33763075e-04]
 [  9.98316173e-01   1.36901471e-03   3.14811982e-04]
 [  1.56559131e-02   9.83714953e-01   6.29133867e-04]
 [  3.77712021e-04   9.94445307e-01   5.17698120e-03]
 [  5.03260379e-07   4.13444103e-01   5.86555394e-01]
 [  9.28367834e-01   6.90614123e-02   2.57075413e-03]
 [  1.30362752e-04   9.91327
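
Each row of the output above holds one probability per class, and the predicted label is the class with the largest probability. The sketch below shows that step on the first three rows, copied from the output. It assumes the columns follow the sorted class labels `[1, 2, 3]`, as in `logreg.classes_`. The helper `most_likely_class` is a hypothetical name used only for this illustration.

```python
# First three rows of the predict_proba output above, copied verbatim.
probs = [
    [4.18229062e-03, 1.38301949e-02, 9.81987514e-01],
    [3.84533237e-05, 9.98810545e-01, 1.15100193e-03],
    [9.85324034e-01, 1.30108815e-02, 1.66508475e-03],
]

# Assumed column order: the sorted class labels, as in logreg.classes_.
classes = [1, 2, 3]

def most_likely_class(row, labels):
    """Return the label whose probability is largest in this row."""
    best = max(range(len(row)), key=lambda j: row[j])
    return labels[best]

predicted = [most_likely_class(row, classes) for row in probs]
row_sums = [sum(row) for row in probs]  # each should be very close to 1

print(predicted)
print(row_sums)
```

This should agree with what `logreg.predict(X_test)` returns for the same rows.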

In [8]:
# calculate log loss
from sklearn import metrics
print(metrics.log_loss(y_test, y_pred_prob))

0.125095797474
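
To make the metric concrete, here is a minimal pure-Python sketch of multiclass log loss: the mean negative log of the probability assigned to the true class. The data is made up for illustration and is not taken from the wine test set. The function name `multiclass_log_loss` is hypothetical. Unlike `metrics.log_loss`, this sketch does not guard against probabilities of exactly zero.

```python
import math

def multiclass_log_loss(y_true, probs, labels):
    """Mean negative log of the probability given to the true class."""
    column = {label: j for j, label in enumerate(labels)}
    total = 0.0
    for truth, row in zip(y_true, probs):
        total -= math.log(row[column[truth]])
    return total / len(y_true)

# Made-up example: three samples whose true classes are 1, 2 and 3.
labels = [1, 2, 3]
y_true = [1, 2, 3]

# Reasonable predictions: the true class always gets the largest probability.
reasonable = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]

# Same, except the third sample is confidently wrong (true class gets 0.01).
overconfident = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.6, 0.39, 0.01]]

loss_reasonable = multiclass_log_loss(y_true, reasonable, labels)
loss_overconfident = multiclass_log_loss(y_true, overconfident, labels)

print(loss_reasonable)
print(loss_overconfident)
```

A single confident mistake dominates the average, which is why log loss rewards well-calibrated probabilities rather than just correct labels.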


### Logistic regression (regularized)

- [LogisticRegression](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) documentation
- **C:** inverse of regularization strength; must be positive, decrease for more regularization
- **penalty:** `l1` (lasso) or `l2` (ridge)
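
For reference, the two penalties can be written out as objectives. This is a sketch of the binary formulation as I understand it from the scikit-learn user guide, with labels $y_i \in \{-1, +1\}$, weights $w$, and intercept $c$. With the one-vs-rest setting shown in the model repr earlier (`multi_class='ovr'`), one such problem would be solved per class.

```latex
% L2 (ridge) penalty
\min_{w,\,c} \; \frac{1}{2} w^\top w
  + C \sum_{i=1}^{n} \log\!\left( \exp\!\left( -y_i \left( x_i^\top w + c \right) \right) + 1 \right)

% L1 (lasso) penalty
\min_{w,\,c} \; \lVert w \rVert_1
  + C \sum_{i=1}^{n} \log\!\left( \exp\!\left( -y_i \left( x_i^\top w + c \right) \right) + 1 \right)
```

Because `C` multiplies the data-fit term rather than the penalty, a smaller `C` gives the penalty relatively more weight, which is why decreasing `C` means more regularization.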

In [9]:
# standardize X_train and X_test
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
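
The key detail in the cell above is that the scaler is fit on the training data only, and the same training mean and standard deviation are then applied to the test data. The sketch below shows that arithmetic for a single column in plain Python. It assumes the population standard deviation (`ddof=0`), which is what `StandardScaler` uses. The training values are the first five entries of column 1 from `wine.head()`; the test values and the helper names `fit_scaler` and `transform` are hypothetical.

```python
import statistics

def fit_scaler(column):
    """Learn the mean and population standard deviation of one column."""
    return statistics.mean(column), statistics.pstdev(column)

def transform(column, mean, std):
    """Center and scale a column using previously learned statistics."""
    return [(value - mean) / std for value in column]

# Column 1 of the first five rows shown by wine.head().
train_col = [14.23, 13.2, 13.16, 14.37, 13.24]
# Hypothetical held-out values, for illustration only.
test_col = [13.5, 12.9]

# Statistics come from the training column only...
mean, std = fit_scaler(train_col)

# ...and are reused unchanged for both training and test values.
train_scaled = transform(train_col, mean, std)
test_scaled = transform(test_col, mean, std)

print(train_scaled)
print(test_scaled)
```

The scaled training column has mean 0 and standard deviation 1 by construction; the scaled test column generally does not, and that is expected.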

In [10]:
# try C=0.1 with L1 penalty
# (the L1 penalty needs a solver that supports it, such as 'liblinear')
logreg = LogisticRegression(C=0.1, penalty='l1', solver='liblinear')
logreg.fit(X_train_scaled, y_train)
print(logreg.coef_)

[[ 0.21044295  0.          0.          0.          0.          0.
   0.48771781  0.          0.          0.          0.          0.15291818
   1.47735975]
 [-0.65721045 -0.05610412 -0.11396509  0.          0.          0.          0.
   0.          0.         -0.73818854  0.24416656  0.         -0.63410758]
 [ 0.          0.          0.          0.          0.          0.
  -0.84110528  0.          0.          0.61503491 -0.49042752 -0.30554287
   0.        ]]
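
The zeros in the coefficient matrix above are the feature selection effect of the L1 penalty. The sketch below counts them, using the printed coefficients copied verbatim. Positions are 0-based offsets into the columns of `coef_`; the variable names are introduced here for illustration.

```python
# L1-penalized coefficients copied from the output above (one row per class).
l1_coef = [
    [0.21044295, 0, 0, 0, 0, 0, 0.48771781, 0, 0, 0, 0, 0.15291818, 1.47735975],
    [-0.65721045, -0.05610412, -0.11396509, 0, 0, 0, 0, 0, 0,
     -0.73818854, 0.24416656, 0, -0.63410758],
    [0, 0, 0, 0, 0, 0, -0.84110528, 0, 0, 0.61503491, -0.49042752,
     -0.30554287, 0],
]

n_features = len(l1_coef[0])

# How many features each one-vs-rest classifier actually uses.
nonzero_per_class = [sum(1 for w in row if w != 0) for row in l1_coef]

# Features whose coefficient is zero for every class are dropped entirely.
unused = [j for j in range(n_features) if all(row[j] == 0 for row in l1_coef)]

print(nonzero_per_class)
print(unused)
```

At this value of `C`, each classifier relies on only a handful of the 13 features, and several features are not used by any of the three.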


In [None]:
# generate predicted probabilities and calculate log loss
y_pred_prob = logreg.predict_proba(X_test_scaled)
print(metrics.log_loss(y_test, y_pred_prob))

In [11]:
# try C=0.1 with L2 penalty
# solver='liblinear' keeps the fit comparable with the L1 model above
logreg = LogisticRegression(C=0.1, penalty='l2', solver='liblinear')
logreg.fit(X_train_scaled, y_train)
print(logreg.coef_)

[[ 0.59163934  0.06886667  0.33592964 -0.49616684  0.111539    0.21570086
   0.40524509 -0.15526139 -0.02534651  0.05399014  0.14877346  0.42327938
   0.89815007]
 [-0.73545676 -0.32942948 -0.47995296  0.294866   -0.1500246   0.04264373
   0.14500586  0.07250763  0.17409795 -0.70726652  0.4128986   0.09997212
  -0.81284365]
 [ 0.20136567  0.30989025  0.15977925  0.18867218  0.04204443 -0.27108109
  -0.55886639  0.07486943 -0.17471153  0.68266464 -0.52385748 -0.49566967
  -0.02565631]]


In [None]:
# generate predicted probabilities and calculate log loss
y_pred_prob = logreg.predict_proba(X_test_scaled)
print(metrics.log_loss(y_test, y_pred_prob))

In [None]:
# pipeline of StandardScaler and LogisticRegression
# (solver='liblinear' supports both the l1 and l2 penalties searched over later)
from sklearn.pipeline import make_pipeline
pipe = make_pipeline(StandardScaler(), LogisticRegression(solver='liblinear'))

In [None]:
import numpy as np

# grid search for best combination of C and penalty
# (sklearn.grid_search was removed; GridSearchCV now lives in model_selection)
from sklearn.model_selection import GridSearchCV
C_range = 10.**np.arange(-2, 3)
penalty_options = ['l1', 'l2']
param_grid = dict(logisticregression__C=C_range, logisticregression__penalty=penalty_options)
# 'neg_log_loss' is the negated log loss, so higher (closer to 0) is better
grid = GridSearchCV(pipe, param_grid, cv=10, scoring='neg_log_loss')
grid.fit(X, y)
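
To get a feel for the size of this search, the sketch below enumerates the same grid in plain Python: five values of `C` times two penalties, each evaluated with 10-fold cross-validation. The names `candidates`, `n_folds`, and `n_cv_fits` are introduced here for illustration.

```python
from itertools import product

# Same values as 10.**np.arange(-2, 3): 0.01, 0.1, 1, 10, 100.
C_range = [10.0 ** k for k in range(-2, 3)]
penalty_options = ['l1', 'l2']
n_folds = 10  # matches cv=10 in the grid search above

# Every (C, penalty) pair that the grid search evaluates.
candidates = list(product(C_range, penalty_options))

# Each candidate is fit once per fold during cross-validation.
n_cv_fits = len(candidates) * n_folds

for C, penalty in candidates:
    print(C, penalty)
print(n_cv_fits)
```

With the default `refit=True`, `GridSearchCV` also refits the best candidate once on the full data after the search. Because scaling sits inside the pipeline, the scaler is refit on each training fold, so no information from the validation fold leaks into the standardization.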

In [None]:
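# print all cross-validated scores (grid_scores_ was replaced by cv_results_)
pd.DataFrame(grid.cv_results_)[['params', 'mean_test_score', 'std_test_score']]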

In [None]:
# examine the best model
# (the score is the negated log loss, so values closer to 0 are better)
print(grid.best_score_)
print(grid.best_params_)

## Comparing regularized linear models with unregularized linear models

**Advantages of regularized linear models:**

- Often better out-of-sample performance, since the penalty reduces overfitting
- L1 regularization performs automatic feature selection
- Useful for high-dimensional problems (p > n)
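
The feature selection point in the list above can be illustrated with a one-dimensional toy problem. This is a sketch of a simple quadratic loss, not of the logistic regression objective itself. If `z` is the unpenalized estimate of a single coefficient, adding an L1 penalty yields soft thresholding, which sets small coefficients exactly to zero, while an L2 penalty only shrinks them proportionally. The function names, the values in `unpenalized`, and the strength `lam` are hypothetical.

```python
import math

def l1_shrink(z, lam):
    """Minimizer of 0.5*(w - z)**2 + lam*abs(w): soft thresholding."""
    return math.copysign(max(abs(z) - lam, 0.0), z)

def l2_shrink(z, lam):
    """Minimizer of 0.5*(w - z)**2 + 0.5*lam*w**2: proportional shrinkage."""
    return z / (1.0 + lam)

# Hypothetical unpenalized coefficients: one large, two small.
unpenalized = [1.5, -0.3, 0.05]
lam = 0.5  # hypothetical penalty strength

l1_result = [l1_shrink(z, lam) for z in unpenalized]
l2_result = [l2_shrink(z, lam) for z in unpenalized]

print(l1_result)  # small coefficients become exactly zero
print(l2_result)  # all coefficients shrink but stay nonzero
```

This matches the pattern in the coefficient matrices printed earlier: many exact zeros under the L1 penalty, and small but nonzero values everywhere under the L2 penalty.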

**Disadvantages of regularized linear models:**

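- Tuning is required (C and the penalty type must be chosen, for example by cross-validation)
- Feature scaling is recommended, because the penalty depends on the scale of each coefficient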