# Homework 2: Multilayer Perceptron 

## Context
You are given the task of writing a multi-layer perceptron that determines whether a user's interactions with a website lead to a purchase (class 1) or not (class 0). Other data scientists have already collected the interactions and quantized them into vectors of 100 numbers, which will serve as the inputs to your model. Your model should predict whether future interactions will result in a purchase.

## Procedure
In this assignment you'll use Keras to write and tune a multi-layer perceptron. You are provided with a training dataset, and your model will be evaluated on a testing set that you won't have access to. 

The architecture of the model and how you fit it are up to you. However, do not use any recurrent or convolutional layers yet. Write at least two models and compare them using cross-validation (see the cross-validation notebook for reference). For your own sanity, use K <= 4 for KFold if you go that route.
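
If the idea of K folds is new to you, the following is a minimal NumPy sketch of how the row indices get partitioned. It is only an illustration: the helper name `kfold_indices`, the seed, and the toy size of 10 rows are assumptions, and in the assignment you would normally use the KFold class from scikit-learn, as in the cross-validation notebook.

```python
import numpy as np


def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k roughly equal folds."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)  # shuffle the row indices once
    folds = np.array_split(shuffled, k)    # k chunks of near-equal size
    for i in range(k):
        val_idx = folds[i]                 # the i-th chunk is held out
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx


# toy example: 10 rows, 4 folds
splits = list(kfold_indices(n_samples=10, k=4))
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx))
```

Every row lands in exactly one validation chunk, so each model is scored on data it did not see during that fold's training.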

In [None]:
%matplotlib inline

# feel free to add more imports here
import numpy as np
import matplotlib.pyplot as plt
import sklearn.model_selection as ms
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD  # use SGD in your model
from keras.utils import to_categorical
from sklearn.metrics import accuracy_score


## Step 0: Load data

Load the data below. Set the path so that it points to "data.csv".

In [None]:
### YOUR CODE HERE ###
path_to_data = "/tmp/data.csv"  # modify this
######################

# load data
ds = np.loadtxt(path_to_data, delimiter=",")
y_train = ds[:, 0]  # first column is the labels
X_train = ds[:, 1:]  # rest of the columns are the data
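
To see what the slicing above does without touching the real file, here is a small sketch that runs the same steps on made-up data. The three rows and four feature columns are invented for illustration (the real file has 100 features per row), and the `demo_` names are hypothetical so they do not overwrite your variables.

```python
import io

import numpy as np

# Three made-up rows: the label comes first, followed by four features.
fake_csv = io.StringIO(
    "1,0.5,0.1,0.0,2.0\n"
    "0,1.5,0.3,1.0,0.0\n"
    "1,0.2,0.9,3.0,1.0\n"
)

demo = np.loadtxt(fake_csv, delimiter=",")
demo_y = demo[:, 0]   # first column holds the labels
demo_X = demo[:, 1:]  # remaining columns hold the features

print(demo_X.shape, demo_y.shape)  # (3, 4) (3,)
print(np.unique(demo_y))           # [0. 1.]
```

Running the same two prints on `X_train` and `y_train` is a quick way to confirm the path is right and the labels contain only the two classes.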

In [None]:
# convert y to a categorical (one-hot) representation; this is done for you
y_train_vectorized = to_categorical(y_train)
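
As a rough picture of what the categorical conversion produces, the sketch below builds a one-hot matrix by hand with NumPy. It assumes integer class labels starting at 0, and the four labels are made up; it is not the Keras implementation, just an equivalent for this simple case.

```python
import numpy as np

labels = np.array([0.0, 1.0, 1.0, 0.0])  # made-up labels, stored as floats like y_train
n_classes = 2

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(n_classes)[labels.astype(int)]
print(one_hot)

# argmax along each row recovers the original class index
recovered = np.argmax(one_hot, axis=1)
print(recovered)
```

This is why the output layer needs one unit per class, and why the predictions are later turned back into class labels with argmax.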

## Step 1: Specify model architecture

It's good practice to specify the architecture of your model in a list to stay organized. Set "model1_layer_sizes" and "model2_layer_sizes" below to architectures of your choosing. As in the commented example, the first element must match the input dimensionality and the last must match the output dimensionality, so enter your hidden layers in between.

In [None]:
### YOUR CODE HERE ###
# example:
# layer_sizes = [X_train.shape[1], 5, y_train_vectorized.shape[1]]  
# remember the first and last layers need to have the same dimensionality as your input and output
model1_layer_sizes = []
model2_layer_sizes = []
# feel free to add more models if you want to explore
######################
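
When comparing candidate architectures, it can help to know how many trainable parameters each one has. The helper below is a hypothetical sketch (the name `count_parameters` and both example lists are assumptions); it counts one weight per connection plus one bias per unit for a fully connected network with 100 inputs and 2 outputs.

```python
def count_parameters(layer_sizes):
    """Number of weights plus biases in a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per unit
    return total


small = count_parameters([100, 5, 2])
larger = count_parameters([100, 32, 16, 2])
print(small)   # 517
print(larger)  # 3794
```

A larger count means more capacity but also more risk of overfitting, which is what the cross-validation in Step 3 is meant to reveal.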

## Step 2: Build models
Define a build function for each model below. You can use the following cell as a template.

In [None]:
def build_model1():  # make sure you change the function name for each model!
    model = Sequential()

    ### YOUR CODE HERE ###
    # build your model. remember the input dimension of the first layer must be model1_layer_sizes[0]
    model.add(Dense(input_dim=model1_layer_sizes[0],
                    units=model1_layer_sizes[1],
                    kernel_initializer="uniform",
                    activation="relu"))
    

    
    ######################
    # we write the last layer for you.
    # Finally, add a readout layer, mapping to output units using the softmax function
    model.add(Dense(units=model1_layer_sizes[-1], # last layer
                    kernel_initializer='uniform',
                    activation="softmax"))
    
    sgd = SGD(lr=0.001, decay=1e-7, momentum=.9)  # Stochastic gradient descent, leave these parameters fixed
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=["accuracy"])  
    # we'll have the categorical crossentropy as the loss function
    # we also want the model to automatically calculate accuracy
    return model

In [None]:
def build_model2():  # make sure you change the function name for each model!
    model = Sequential()

    ### YOUR CODE HERE ###
    # build your model. remember the input dimension of the first layer must be model2_layer_sizes[0]
    model.add(Dense(input_dim=model2_layer_sizes[0],
                    units=model2_layer_sizes[1],
                    kernel_initializer="uniform",
                    activation="relu"))
    

    
    ######################
    # we write the last layer for you.
    # Finally, add a readout layer, mapping to output units using the softmax function
    model.add(Dense(units=model2_layer_sizes[-1], # last layer
                    kernel_initializer='uniform',
                    activation="softmax"))
    
    sgd = SGD(lr=0.001, decay=1e-7, momentum=.9)  # Stochastic gradient descent, leave these parameters fixed
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=["accuracy"])  
    # we'll have the categorical crossentropy as the loss function
    # we also want the model to automatically calculate accuracy
    return model
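
To make concrete what the stacked Dense layers compute, here is a minimal NumPy sketch of the forward pass: ReLU on the hidden layers and softmax on the readout layer. It is an illustration only, not how Keras is implemented. The layer sizes, the random input, and the initialization range are assumptions, and no training happens here.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward(X, weights, biases):
    """Apply ReLU hidden layers, then a softmax readout layer."""
    a = X
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)
    return softmax(a @ weights[-1] + biases[-1])


rng = np.random.default_rng(0)
sizes = [100, 5, 2]  # hypothetical architecture: input, one hidden layer, output
weights = [rng.uniform(-0.05, 0.05, size=(m, n))  # small random weights (assumed range)
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

X_demo = rng.normal(size=(3, 100))  # three made-up input rows
proba_demo = forward(X_demo, weights, biases)
print(proba_demo.shape)  # (3, 2)
```

Each output row is a probability distribution over the two classes, which is what the categorical crossentropy loss expects.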

## Step 3: Cross validate and train models

You will write the entire cross-validation process for all of your proposed models. Print the average accuracies so you can compare them.

For each .fit() call, use epochs=500, batch_size=50, verbose=0 (as in the keras-cv notebook).
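
To get a feel for how much work these settings imply, the sketch below counts the gradient updates in one fold. The dataset size of 1000 rows and K = 4 are hypothetical numbers chosen for illustration, and `gradient_updates` is a made-up helper; substitute the shape of your own training data.

```python
import math


def gradient_updates(n_train, batch_size, epochs):
    """Total number of mini-batch updates performed by one training run."""
    batches_per_epoch = math.ceil(n_train / batch_size)  # the last batch may be partial
    return batches_per_epoch * epochs


n_rows = 1000                             # hypothetical dataset size
k = 4                                     # number of folds
n_train_per_fold = n_rows * (k - 1) // k  # rows used for training in each fold

updates = gradient_updates(n_train_per_fold, batch_size=50, epochs=500)
print(updates)  # 7500
```

That cost is paid once per fold and per model, which is why a small K keeps the total run time manageable.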

In [None]:
# define kfold here:
kf = 

In [None]:
# cross validation for model 1 (refer to keras-cv)
# we can pass any dataset into kf.split(), and it will return the indices of the "train" and "validation" partitions
accuracies = []

# STEP 1: partition the data chunks and iterate through them
# write the for loop here

    
    # build your model below
    model1 = build_model1()
    
    # STEP 2: train the model on the k-1 chunks using the options above    
    model1.fit()  # fill in the arguments for .fit()
    
    # STEP 3: predict the kth chunk and evaluate accuracy 
    # this is implemented for you
    proba = model1.predict_proba(X_train[val_idx], batch_size=32)  # predicted class probabilities for the validation set
    classes = np.argmax(proba, axis=1)
    
    # save the accuracy (implemented for you)
    accuracies.append(accuracy_score(y_train[val_idx], classes))

# STEP 4: average across the k accuracies
model1_accuracy = np.array(accuracies).mean()  # the mean performance of model 1
print(model1_accuracy)
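
The prediction and scoring steps in the loop above can be checked by hand on a tiny example. The probabilities and labels below are made up, and accuracy is computed directly with NumPy rather than with accuracy_score, under the assumption that accuracy is simply the fraction of matching labels.

```python
import numpy as np

# made-up softmax outputs for four validation rows (columns: class 0, class 1)
probs_example = np.array([[0.9, 0.1],
                          [0.3, 0.7],
                          [0.6, 0.4],
                          [0.2, 0.8]])
true_example = np.array([0, 1, 1, 1])

pred_example = np.argmax(probs_example, axis=1)      # most probable class per row
fold_accuracy = np.mean(pred_example == true_example)  # fraction of correct predictions
print(pred_example, fold_accuracy)  # [0 1 0 1] 0.75
```

The third row is misclassified, so three of four predictions are correct.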

In [None]:
# cross validation for model 2 (refer to keras-cv)
accuracies = []

# STEP 1: partition the data chunks and iterate through them
# write the for loop here


    # build your model below
    model2 = build_model2()

    # STEP 2: train the model on the k-1 chunks using the options above    
    model2.fit()  # fill in the arguments for .fit()
    
    # STEP 3: predict the kth chunk and evaluate accuracy 
    # this is implemented for you
    proba = model2.predict_proba(X_train[val_idx], batch_size=32)  # predicted class probabilities for the validation set
    classes = np.argmax(proba, axis=1)
    
    # save the accuracy (implemented for you)
    accuracies.append(accuracy_score(y_train[val_idx], classes))

# STEP 4: average across the k accuracies
model2_accuracy = np.array(accuracies).mean()  # the mean performance of model 2
print(model2_accuracy)

In [None]:
# cross validation for other models if necessary

## Step 4: Select and train the final model

In [None]:
final_model =  # use the build function for the model you selected

final_model.fit(X_train, y_train_vectorized, 
                epochs=1000, batch_size=50, verbose = 0)  

## Step 5: Export (save) the final model

In [None]:
output_name =  # set the final file name here
final_model.save(output_name)  # write the trained model to disk