
In [0]:
import numpy as np
import pandas as pd
import keras

# plots
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# processing & model selection
from keras.utils import to_categorical
from sklearn.model_selection import GridSearchCV, train_test_split, StratifiedShuffleSplit

# models
from keras import regularizers
from keras.models import Sequential
from keras.layers import Dense, Activation, BatchNormalization
from keras.wrappers.scikit_learn import KerasClassifier

# other
import warnings
warnings.filterwarnings('ignore')
np.random.seed(42)

# Homework 5

### Task 1

The task is to train a multilayer perceptron with two hidden layers and ReLU activations on the iris dataset, using the Keras Sequential interface. We also tune the regularization strength and the number of hidden units.

Importing the data:

In [2]:
df = sns.load_dataset('iris')
df.head()

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


Map the target variable:

In [3]:
print(df.species.unique())

['setosa' 'versicolor' 'virginica']


In [0]:
mapping = {'setosa': 0, 'versicolor': 1, 'virginica': 2}
# assign the mapped column back rather than relying on an in-place replace through attribute access
df['species'] = df['species'].map(mapping)

Split our dataset into training and test sets. 

In [0]:
X_train, X_test, y_train, y_test = train_test_split(df.drop('species', axis=1), df.species, test_size=0.2)

Encode the target:

In [0]:
num_species = len(df['species'].unique())

y_train = to_categorical(y_train, num_species)
y_test = to_categorical(y_test, num_species)
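For reference, the one-hot encoding that `to_categorical` produces can be reproduced in plain NumPy. This is only an illustrative sketch; the helper name `one_hot` is hypothetical and not part of Keras.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return an (n_samples, num_classes) array with a 1.0 in each label's column."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype="float32")
    # Row i gets a 1.0 in the column given by labels[i].
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

# Three toy labels (setosa=0, virginica=2, versicolor=1 under the mapping above).
example = one_hot([0, 2, 1], num_classes=3)
print(example)
```

Each row sums to 1, which is the target format that `categorical_crossentropy` expects.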

Make our model:

In [7]:
X_train.shape

(120, 4)

In [0]:
def make_model(hidden_size, reg_strength, optimizer='adam'):
    """
    Create the model to pass to the Keras Classifier
    """
    
    model = Sequential([
        Dense(hidden_size, input_shape=(4,), activation='relu',   # first layer
              kernel_regularizer=regularizers.l2(reg_strength)),
        Dense(hidden_size, activation='relu',                     # second layer
              kernel_regularizer=regularizers.l2(reg_strength)),
        Dense(3, activation='softmax')                            # output layer
    ])
    
    # compile the above model
    model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
    
    return model
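To make concrete what this architecture computes, here is a minimal NumPy sketch of the forward pass and of the L2 penalty that `regularizers.l2(reg_strength)` adds to the loss for the two hidden kernels. The weights are random placeholders and the function names (`forward`, `l2_penalty`) are hypothetical; this is an illustration of the idea, not the Keras implementation.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X, params):
    """Two ReLU hidden layers followed by a softmax output layer."""
    W1, b1, W2, b2, W3, b3 = params
    h1 = relu(X @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return softmax(h2 @ W3 + b3)

def l2_penalty(params, reg_strength):
    """reg_strength * sum of squared weights, for the two hidden kernels only."""
    W1, _, W2, _, _, _ = params
    return reg_strength * (np.sum(W1 ** 2) + np.sum(W2 ** 2))

rng = np.random.default_rng(0)
hidden_size = 32  # one of the candidate sizes; chosen here only for the demo
params = (
    rng.normal(scale=0.1, size=(4, hidden_size)), np.zeros(hidden_size),
    rng.normal(scale=0.1, size=(hidden_size, hidden_size)), np.zeros(hidden_size),
    rng.normal(scale=0.1, size=(hidden_size, 3)), np.zeros(3),
)
X_demo = rng.normal(size=(5, 4))       # 5 fake samples with 4 features each
probs = forward(X_demo, params)        # shape (5, 3), each row sums to 1
penalty = l2_penalty(params, reg_strength=0.01)
print(probs.shape, penalty)
```

A larger `reg_strength` increases the cost of large weights, which is why it is tuned jointly with `hidden_size`.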

Before running the grid search, we define the cross-validation splitter it will use: three stratified shuffle splits, each holding out 20% of the training data for validation.

In [0]:
sss = StratifiedShuffleSplit(n_splits=3, test_size=0.2, random_state=42)

In [10]:
clf = KerasClassifier(make_model, epochs=10, verbose=0)

param_grid = {'hidden_size': [32, 64, 128],
              'reg_strength': np.logspace(-3, -1, 5)}

grid = GridSearchCV(clf, param_grid=param_grid, cv=sss, n_jobs=-1)
grid.fit(X_train, y_train)





GridSearchCV(cv=StratifiedShuffleSplit(n_splits=3, random_state=42, test_size=0.2,
            train_size=None),
       error_score='raise-deprecating',
       estimator=<keras.wrappers.scikit_learn.KerasClassifier object at 0x7f8c6da0a588>,
       fit_params=None, iid='warn', n_jobs=-1,
       param_grid={'hidden_size': [32, 64, 128], 'reg_strength': array([0.001  , 0.00316, 0.01   , 0.03162, 0.1    ])},
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring=None, verbose=0)
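As a quick sanity check on the size of this search, the grid can be enumerated by hand. The sketch below only counts candidates and fits; it does not train anything, and the variable names are chosen for illustration.

```python
from itertools import product

import numpy as np

hidden_sizes = [32, 64, 128]
reg_strengths = np.logspace(-3, -1, 5)   # 5 values from 1e-3 to 1e-1, evenly spaced in log10
n_splits = 3                             # matches the StratifiedShuffleSplit above

candidates = list(product(hidden_sizes, reg_strengths))
n_cv_fits = len(candidates) * n_splits

print(np.round(reg_strengths, 5))
print(len(candidates))   # 15
print(n_cv_fits)         # 45
```

With `refit=True`, the grid search trains one more model on the full training set using the best combination.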

In [11]:
print('Best score: {}'.format(np.round(grid.best_score_, 2)))
print('Best params: {}'.format(grid.best_params_))

Best score: 0.97
Best params: {'hidden_size': 128, 'reg_strength': 0.0031622776601683794}


In [12]:
score = grid.score(X_test, y_test)
print('Test score: {}'.format(np.round(score, 2)))

Test score: 0.97
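The score reported here is classification accuracy. With one-hot targets, accuracy amounts to comparing the argmax of each target row with the argmax of the predicted probabilities. The sketch below uses toy data rather than the notebook's predictions, and the helper name `accuracy` is hypothetical. Since the test set has 30 samples (150 rows minus the 120 training rows), a rounded score of 0.97 is consistent with a single misclassification.

```python
import numpy as np

def accuracy(y_true_one_hot, y_prob):
    """Fraction of rows whose predicted class (argmax) matches the true class."""
    true_class = np.argmax(y_true_one_hot, axis=1)
    pred_class = np.argmax(y_prob, axis=1)
    return float(np.mean(true_class == pred_class))

# Toy data: 30 samples, 3 classes, exactly one forced mistake.
n_test = 30
true_labels = np.arange(n_test) % 3
y_true_demo = np.eye(3)[true_labels]
pred_labels = true_labels.copy()
pred_labels[0] = (pred_labels[0] + 1) % 3   # misclassify the first sample
y_prob_demo = np.eye(3)[pred_labels]

demo_score = accuracy(y_true_demo, y_prob_demo)
print(round(demo_score, 2))  # 0.97
```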


### Task 2

Train a multilayer perceptron (fully connected) on the Fashion MNIST dataset. 
1. Use 10000 samples from the training set for model selection and to compute learning curves (accuracy vs epochs).
2. Compare the following models and plot their learning curves:
    - Vanilla Model
    - Model using dropout
    - Model using batch normalization and residual connections (but no dropout)
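One possible reading of step 1 is to draw 10000 random training indices and reserve them for model selection and learning curves. The sketch below works on indices only, so it runs without downloading the data; the names `selection_idx` and `remaining_idx` are hypothetical.

```python
import numpy as np

n_train_total = 60000   # size of the Fashion MNIST training set
n_selection = 10000     # samples reserved for model selection

rng = np.random.default_rng(42)
perm = rng.permutation(n_train_total)   # shuffled indices 0..59999, each exactly once

selection_idx = perm[:n_selection]      # used for tuning and learning curves
remaining_idx = perm[n_selection:]      # the other 50000 samples

# With the real arrays this would be applied as, for example:
# X_sel, y_sel = X_train[selection_idx], y_train[selection_idx]
print(len(selection_idx), len(remaining_idx))
```

Shuffling before slicing avoids any ordering in the original arrays leaking into the subset.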





In [0]:
# loading the data
(X_train, y_train), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()

In [15]:
print('Training size: {}, Test size: {}'.format(X_train.shape[0],X_test.shape[0]))
print('Each image has a shape of: {}x{}'.format(X_train.shape[1], X_train.shape[2]))

Training size: 60000, Test size: 10000
Each image has a shape of: 28x28


Before modeling, we flatten each 28x28 image into a 784-dimensional vector, scale the pixel values to [0, 1], and one-hot encode the labels.

In [0]:
# flatten each 28x28 image into a 784-vector and cast to float
X_train = X_train.reshape(60000, 784).astype('float32')
X_test = X_test.reshape(10000, 784).astype('float32')

# scale pixel values from [0, 255] to [0, 1]
X_train /= 255
X_test /= 255

# one-hot encode the target
n_classes = len(np.unique(y_train))
y_train = to_categorical(y_train, n_classes)
y_test = to_categorical(y_test, n_classes)
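The same flatten-and-scale steps can be checked on a small synthetic batch without loading the dataset. The array `fake_images` below is random data standing in for real images, used only to verify shapes and value ranges.

```python
import numpy as np

rng = np.random.default_rng(0)
# 8 fake grayscale "images" with integer pixel values in [0, 255]
fake_images = rng.integers(0, 256, size=(8, 28, 28), dtype=np.uint8)

# flatten to (n_samples, 784), cast to float32, then scale to [0, 1]
flat = fake_images.reshape(len(fake_images), -1).astype("float32") / 255

print(flat.shape, flat.dtype, flat.min(), flat.max())
```

Casting before dividing matters: dividing the `uint8` array in place would fail, because the result is not an integer.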

**Vanilla Model**

We start by tuning and then training our vanilla model. The term "vanilla" typically describes a simple network with a single hidden layer, so here we only optimize the number of hidden units.

In [0]:
def make_model(hidden_size, optimizer='adam'):
    """
    Create the vanilla (one-hidden-layer) model to pass to the Keras Classifier
    """

    model = Sequential([
        Dense(hidden_size, input_shape=(784,), activation='relu'),  # hidden layer
        Dense(10, activation='softmax')                             # output layer: 10 Fashion MNIST classes
    ])

    # compile the above model
    model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

    return model
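To get a feel for how the number of hidden units drives model size, the sketch below counts the trainable parameters of a one-hidden-layer network with 784 inputs and 10 output classes. The helper `count_mlp_params` is hypothetical, and the hidden sizes are borrowed from the Task 1 grid purely for illustration.

```python
def count_mlp_params(n_inputs, hidden_size, n_classes):
    """Weights plus biases for one hidden Dense layer and one output Dense layer."""
    hidden = n_inputs * hidden_size + hidden_size    # hidden kernel + hidden bias
    output = hidden_size * n_classes + n_classes     # output kernel + output bias
    return hidden + output

for h in (32, 64, 128):
    print(h, count_mlp_params(784, h, 10))
# 32 25450
# 64 50890
# 128 101770
```

The count grows almost linearly with `hidden_size`, because the 784-wide input kernel dominates.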