# Regularization and Generalization
Textbook by Charu Aggarwal, Chapter 4, Exercise 3: implement a perceptron on the Ionosphere data and test the effect of regularization.

## Get the data
First, obtain the Ionosphere data, published by Johns Hopkins in 1989.

In [113]:
# I had not used datahub before so this took a while to figure out.
from datapackage import Package
# https://datahub.io/machine-learning/ionosphere#python
package = Package('https://datahub.io/machine-learning/ionosphere/datapackage.json')

# print list of all resources:
print(package.resource_names)

['validation_report', 'ionosphere_csv', 'ionosphere_json', 'ionosphere_zip', 'ionosphere_arff', 'ionosphere']


In [97]:
# Pick out the processed CSV resource (datahub type 'derived/csv').
for resource in package.resources:
    if resource.descriptor['datahub']['type'] == 'derived/csv':
        ion_resource = resource

# This data consists of 34 features, 'a01'..'a34', all decimal numbers,
# plus 'class' (see ion_resource.headers).
# read() returns each row as a list, not a dict.

In [115]:
ion_table = ion_resource.read()
label_counts = {}
# Seed the running min/max with floats so the reported values are floats.
min_val = max_val = float(ion_table[0][0])
for one_ion in ion_table:
    ion_class = one_ion[-1]
    label_counts[ion_class] = label_counts.get(ion_class, 0) + 1
    for x in range(34):
        value = float(one_ion[x])
        one_ion[x] = value  # convert in place; the raw values are not all floats
        min_val = min(min_val, value)
        max_val = max(max_val, value)

# All the features lie between -1 and +1 (correlations?).
# Note: the negative values were loaded as strings, hence the float() conversion.
# There are two imbalanced classes, labeled 'g' (good) and 'b' (bad).
min_val, max_val, label_counts

(-1.0, 1.0, {'g': 225, 'b': 126})
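The element-by-element loop above can also be written with NumPy and `collections.Counter`. This is a minimal sketch, assuming each row holds the feature values (of mixed types such as `Decimal`, `str` or `float`) followed by the class label. The helper name `summarize_rows` and the three toy rows are made up for illustration and stand in for `ion_table`.

```python
from collections import Counter
from decimal import Decimal

import numpy as np


def summarize_rows(rows, n_features=34):
    """Convert feature columns to float; return (X, min, max, label counts).

    Assumes each row holds n_features numeric-like values followed by a label.
    """
    # float() accepts float, int, Decimal and numeric strings alike
    X = np.array([[float(v) for v in row[:n_features]] for row in rows])
    labels = Counter(row[n_features] for row in rows)
    return X, float(X.min()), float(X.max()), dict(labels)


# Hypothetical toy rows with 3 features each (not real Ionosphere data)
toy_rows = [
    [Decimal("1"), "-0.5", 0.25, "g"],
    [Decimal("0"), "0.75", -1.0, "b"],
    [Decimal("1"), "0.1", 0.3, "g"],
]
X, lo, hi, counts = summarize_rows(toy_rows, n_features=3)
print(X.shape, lo, hi, counts)
```

Building the float matrix once also gives the `X` that the next cells need, instead of converting values inside the list of rows.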

## Data Prep
We know this much: we must use 34 correlation features to predict a binary class.
Create train and test sets.
Apply a scaler.
Before using the Perceptron, try a Logistic Regression classifier as a baseline.
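Applying a scaler means learning a per-feature mean and standard deviation from the training set only, then reusing those statistics on the test set so nothing about the test rows leaks into training. This NumPy sketch shows that computation, assuming (as scikit-learn's `StandardScaler` does by default) the population standard deviation and a scale of 1 for constant columns. The function names and the tiny arrays are hypothetical.

```python
import numpy as np


def fit_standardizer(X_train):
    """Return per-column mean and scale learned from the training data only."""
    mean = X_train.mean(axis=0)
    scale = X_train.std(axis=0)      # population std (ddof=0)
    scale[scale == 0.0] = 1.0        # constant columns: avoid division by zero
    return mean, scale


def apply_standardizer(X, mean, scale):
    """Standardize any split with the training-set statistics."""
    return (X - mean) / scale


X_train = np.array([[1.0, 0.0, -1.0],
                    [3.0, 0.0,  1.0]])
X_test = np.array([[2.0, 0.0, 3.0]])

mean, scale = fit_standardizer(X_train)
print(apply_standardizer(X_train, mean, scale))
print(apply_standardizer(X_test, mean, scale))
```

The constant middle column stays at 0 after scaling, and a test value outside the training range simply maps outside [-1, 1].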

In [124]:
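from sklearn.model_selection import train_test_split
import numpy as np
np_ion = np.array(ion_table)     # floats mixed with labels, so NumPy stores strings
X = np_ion[:, :34].astype(float)  # convert the 34 features back to numbers
y = np_ion[:, 34:35]              # class labels 'g' / 'b'
X.shape, y.shape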

((351, 34), (351, 1))

In [125]:
X_train,X_test,y_train,y_test = train_test_split(X,y)
y_train=y_train.ravel()
y_test=y_test.ravel()
X_train.shape,y_train.shape

((263, 34), (263,))

In [126]:
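from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)                  # learn mean/std from the training set only
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

from sklearn.linear_model import LogisticRegression
log = LogisticRegression()           # baseline linear classifier (L2-regularized by default)
log.fit(X_train, y_train)
log.score(X_test, y_test)            # accuracy on the held-out test set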

0.8522727272727273
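Because the classes are imbalanced, a classifier that always predicts 'g' is already right most of the time, so this score is worth reading against that baseline. The arithmetic below uses the label counts and the test score printed above. Since the split was not stratified, the majority share of any particular test set will differ somewhat from the overall figure.

```python
label_counts = {"g": 225, "b": 126}          # from the label-count cell above
logreg_test_accuracy = 0.8522727272727273    # LogisticRegression score above

n_total = sum(label_counts.values())
majority_label = max(label_counts, key=label_counts.get)
baseline = label_counts[majority_label] / n_total   # always predict the majority class

print(majority_label, n_total, round(baseline, 3))  # g 351 0.641
print(round(logreg_test_accuracy - baseline, 3))    # 0.211
```

So logistic regression gains roughly 21 percentage points over always guessing 'g'.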

## Train a Perceptron
Start with no regularization. Then try regularization.
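Since the exercise asks for a perceptron implementation, here is a from-scratch NumPy sketch of stochastic perceptron training with an optional L1 or L2 penalty. It is not scikit-learn's implementation (sklearn's `Perceptron` runs SGD and handles L1 its own way). The update rules, the parameter names `alpha`, `eta`, `epochs`, and the synthetic data are assumptions for illustration, and labels are assumed to be -1/+1.

```python
import numpy as np


def soft_threshold(w, t):
    """Proximal step for an L1 penalty: shrink each weight toward 0 by t, clipping at 0."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)


def train_perceptron(X, y, penalty=None, alpha=0.01, eta=1.0, epochs=20, seed=0):
    """Stochastic perceptron with an optional 'l1' or 'l2' weight penalty.

    y must hold -1/+1 labels. The bias b is not penalized. The penalty step runs
    after every sample, as in SGD on the regularized perceptron criterion.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            # Perceptron criterion: only misclassified (or on-boundary) samples update
            if y[i] * (X[i] @ w + b) <= 0:
                w += eta * y[i] * X[i]
                b += eta * y[i]
            if penalty == "l2":
                w *= 1.0 - eta * alpha        # gradient step on (alpha/2)*||w||^2
            elif penalty == "l1":
                w = soft_threshold(w, eta * alpha)
    return w, b


def predict(X, w, b):
    return np.where(X @ w + b > 0, 1, -1)


# Synthetic, linearly separable data with a margin (illustration only)
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.0, 1.5]        # only the first 3 features matter
scores = X @ w_true
keep = np.abs(scores) > 0.5          # drop points too close to the boundary
X, y = X[keep], np.where(scores[keep] > 0, 1, -1)

for penalty in (None, "l2", "l1"):
    w, b = train_perceptron(X, y, penalty=penalty, alpha=0.001)
    acc = np.mean(predict(X, w, b) == y)
    print(penalty, "train accuracy:", round(acc, 3), "zero weights:", int(np.sum(w == 0)))
```

The soft-threshold step is what lets L1 set weights to exactly zero, while L2 only shrinks them. How much either penalty helps depends heavily on `alpha` and `eta`.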

In [159]:
from sklearn.linear_model import Perceptron
# With penalty=None, alpha has no effect (it only scales the regularization term).
peep = Perceptron(penalty=None, alpha=0.01)
peep.fit(X_train, y_train)
score1 = peep.score(X_train, y_train)  # optimistic: accuracy on the training set

In [160]:
score2=peep.score(X_test,y_test)
print("Vanilla Perceptron accuracy on train set, test set:")
score1, score2

Vanilla Perceptron accuracy on train set, test set:


(0.908745247148289, 0.8181818181818182)

In [161]:
peep = Perceptron(penalty='l1',alpha=0.01)
peep.fit(X_train,y_train)
score3=peep.score(X_test,y_test)

In [162]:
peep = Perceptron(penalty='l2',alpha=0.01)
peep.fit(X_train,y_train)
score4=peep.score(X_test,y_test)

In [163]:
print("Perceptron+L1, Perceptron+L2")
score3,score4

Perceptron+L1, Perceptron+L2


(0.875, 0.8295454545454546)

## Results
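No random seed was set: neither `train_test_split` nor `Perceptron` (which shuffles the training data by default) was given a `random_state`, so results change on every run.
Generally, Perceptron accuracy was at or below Logistic Regression's (the L1 Perceptron above, at 0.875, is an exception).
Both are linear classifiers, so this suggests the Perceptron is doing about as well as can be expected.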
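The comparison is fair because both models predict the sign of $\mathbf{w}\cdot\mathbf{x}+b$ and differ mainly in the loss they minimize. For labels $y_i \in \{-1,+1\}$ and an optional penalty $\alpha R(\mathbf{w})$, the two training objectives are, up to scaling conventions:

$$
L_{\text{perceptron}}(\mathbf{w}, b) = \sum_{i} \max\bigl(0,\; -y_i(\mathbf{w}\cdot\mathbf{x}_i + b)\bigr) + \alpha R(\mathbf{w})
$$

$$
L_{\text{logistic}}(\mathbf{w}, b) = \sum_{i} \log\bigl(1 + e^{-y_i(\mathbf{w}\cdot\mathbf{x}_i + b)}\bigr) + \alpha R(\mathbf{w})
$$

with $R(\mathbf{w}) = \lVert\mathbf{w}\rVert_1$ for L1 or $R(\mathbf{w}) = \tfrac{1}{2}\lVert\mathbf{w}\rVert_2^2$ for L2. scikit-learn's `Perceptron` exposes the penalty weight as `alpha`; `LogisticRegression` instead uses `C` (an inverse weight) and applies an L2 penalty by default, so the logistic baseline above was already regularized.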

Generally, L1 and L2 regularization were no help or actually hurt,
unless alpha was set to 0.01, which seems to be a magic number for this data.
(I tuned this hyperparameter on the test data.
For a real experiment, I would tune it on a validation set,
not the test set.)
With alpha = 0.01, L2 regularization helps a little
and L1 helps a lot. On the 88-row test set of the run shown above,
that is 73 (L2) and 77 (L1) correct predictions versus 72 with no penalty.
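Tuning alpha without touching the test set means splitting the non-test data once more. This NumPy sketch does a fixed-seed train/validation/test split, picks alpha on the validation rows, refits on train plus validation, and scores the test rows exactly once. To keep it self-contained, the model is a stand-in, a closed-form ridge classifier, not the Perceptron. In the notebook you would fit `Perceptron(penalty='l1', alpha=alpha)` in its place, or use scikit-learn's `GridSearchCV` for a cross-validated search. The helper names and the synthetic data are hypothetical.

```python
import numpy as np


def three_way_split(n, val_frac=0.2, test_frac=0.25, seed=0):
    """Shuffle row indices once (fixed seed) and cut them into train/val/test."""
    idx = np.random.default_rng(seed).permutation(n)
    n_test = int(round(n * test_frac))
    n_val = int(round(n * val_frac))
    return idx[n_test + n_val:], idx[n_test:n_test + n_val], idx[:n_test]


def fit_ridge_classifier(X, y, alpha):
    """Stand-in model (NOT a perceptron): ridge regression on -1/+1 labels."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)


def accuracy(X, y, w):
    return np.mean(np.where(X @ w > 0, 1, -1) == y)


# Synthetic data with the Ionosphere shape (351 x 34), for illustration only
rng = np.random.default_rng(1)
X = rng.normal(size=(351, 34))
y = np.where(X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=351) > 0, 1, -1)

train, val, test = three_way_split(len(y))
alphas = [0.001, 0.01, 0.1, 1.0, 10.0]

# Choose alpha using the validation rows only
val_scores = {a: accuracy(X[val], y[val], fit_ridge_classifier(X[train], y[train], a))
              for a in alphas}
best_alpha = max(val_scores, key=val_scores.get)

# Refit on train + validation, then evaluate on the test rows once
train_val = np.concatenate([train, val])
w = fit_ridge_classifier(X[train_val], y[train_val], best_alpha)
print(best_alpha, round(accuracy(X[test], y[test], w), 3))
```

Fixing the seed also makes the split, and therefore the reported scores, repeatable from run to run.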

### Did L1 regularization reduce feature dimensions?
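This checks whether L1 regularization zeroes out many feature weights, as the book claims.
The answer is yes: in the run below, 23 of the 34 L1 weights are exactly zero, while the L2 run leaves only one weight at zero.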

In [167]:
peep = Perceptron(penalty='l1',alpha=0.01)
peep.fit(X_train,y_train)
peep.coef_

array([[ 4.68421204,  0.        ,  1.87586143,  0.        ,  1.21036935,
         2.01586437,  0.        ,  3.50675494,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.66519366,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        , -0.11598845,  0.        ,  0.        ,  0.91146394,
         0.        , -3.1340329 ,  0.        ,  0.        ,  0.        ,
         1.03483693,  0.        ,  0.        , -1.64605064]])
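Counting the zeros makes the answer concrete. This sketch copies the L1 weights printed above into an array, so it runs without refitting, and maps them back to the 'a01'..'a34' names. It assumes the columns of `coef_` follow the CSV column order, which holds here because `X` was sliced from the first 34 columns. A different run (no seed) will zero out a different set of features.

```python
import numpy as np

# Weights copied from the peep.coef_ output above (L1 penalty, alpha=0.01)
coef_l1 = np.array([
    4.68421204, 0., 1.87586143, 0., 1.21036935,
    2.01586437, 0., 3.50675494, 0., 0.,
    0., 0., 0., 0., 0.66519366,
    0., 0., 0., 0., 0.,
    0., -0.11598845, 0., 0., 0.91146394,
    0., -3.1340329, 0., 0., 0.,
    1.03483693, 0., 0., -1.64605064,
])
feature_names = [f"a{i:02d}" for i in range(1, 35)]   # 'a01'..'a34'

# Keep the names of features whose weight survived the L1 penalty
kept = [name for name, w in zip(feature_names, coef_l1) if w != 0]
print(len(coef_l1) - len(kept), "of", len(coef_l1), "weights are exactly zero")  # 23 of 34 weights are exactly zero
print("features kept:", kept)
```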

In [168]:
peep = Perceptron(penalty='l2',alpha=0.01)
peep.fit(X_train,y_train)
peep.coef_

array([[ 2.65499021,  0.        ,  3.20379293,  3.72405769,  3.37091218,
        -1.30186641,  2.39037932,  5.15498796,  2.58934064,  1.53373799,
        -0.70795674,  0.12886974,  0.32139925,  1.554493  ,  2.07367581,
         0.71903772,  0.60032163,  0.6624338 ,  2.23543353, -0.52072256,
         3.00765067, -4.23203806,  1.1359442 , -2.59423479, -0.60161967,
        -2.84845442, -3.19964909, -0.65817278,  0.66307534,  0.62286416,
         1.35601047, -0.38172148,  0.95966684, -0.37273396]])