# Intensive Module in Machine Learning
# Problem set 7: Neural Networks

If you are looking at the pdf/html version of this document, start by running the command `jupyter notebook` to launch an interactive notebook and then navigate to the correct folder and open this file `problem-set-7.ipynb`. Import your default packages for manipulating data and plotting:

In [2]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline 

## 1. Concepts

## 1.1 Backpropagation by hand (difficult)

If you really want to understand neural networks, once in your life, you have to have impolemented backpropagation by hand. You might take this exercise home.

Look at the logistic regression example from last week usin the Spam data. View the model as a one-layer neural network. Implement a NeuralNetwork class in python which stores the wights and biases in form of numpy arrays. Your class should have methods such as `fit` and `predict`. For the fit method implement backpropagation by hand to obtain the gradients and then run gradient descent.

In [3]:
data2 = pd.read_csv('Spam_Data.txt', sep=",", header=None)

#Split data into training and testing data
mask = np.random.rand(len(data2)) < 0.8
train = data2[mask]
test = data2[~mask]

#Split training data into training and cross validation data
mask = np.random.rand(len(train)) < 0.8
cvset = train[~mask]
core_train = train[mask]

X2 = train.loc[:, range(0,57)]
y2 = train.loc[:, 57]
X2_core = core_train.loc[:, range(0,57)]
y2_core = core_train.loc[:, 57]
X2_cv = cvset.loc[:, range(0,57)]
y2_cv = cvset.loc[:, 57]
X2_test = test.loc[:, range(0,57)]
y2_test = test.loc[:, 57]

print(X2.head())
print(y2.head())

     0     1     2    3     4     5     6     7     8     9   ...    47    48  \
0  0.00  0.64  0.64  0.0  0.32  0.00  0.00  0.00  0.00  0.00  ...   0.0  0.00   
2  0.06  0.00  0.71  0.0  1.23  0.19  0.19  0.12  0.64  0.25  ...   0.0  0.01   
3  0.00  0.00  0.00  0.0  0.63  0.00  0.31  0.63  0.31  0.63  ...   0.0  0.00   
4  0.00  0.00  0.00  0.0  0.63  0.00  0.31  0.63  0.31  0.63  ...   0.0  0.00   
5  0.00  0.00  0.00  0.0  1.85  0.00  0.00  1.85  0.00  0.00  ...   0.0  0.00   

      49   50     51     52    53     54   55    56  
0  0.000  0.0  0.778  0.000  0.00  3.756   61   278  
2  0.143  0.0  0.276  0.184  0.01  9.821  485  2259  
3  0.137  0.0  0.137  0.000  0.00  3.537   40   191  
4  0.135  0.0  0.135  0.000  0.00  3.537   40   191  
5  0.223  0.0  0.000  0.000  0.00  3.000   15    54  

[5 rows x 57 columns]
0    1
2    1
3    1
4    1
5    1
Name: 57, dtype: int64


## 1.2 Neural Networks with Tensorflow

Implement the above model using tensorflow. To understand the functionalitites better, please implement explicitly the placeholders for all the tensors as well as the loop iterating over the data.

In [4]:
import tensorflow as tf

#Initialise random seed for reproducability
random_state = 2000
np.random.seed(random_state)
tf.set_random_seed(random_state)

# Initialize placeholders 
x = tf.placeholder(dtype = tf.float32, shape = [None, 57])
y = tf.placeholder(dtype = tf.int32, shape = [None])
# Have 1 fully connected layer 
logits = tf.contrib.layers.fully_connected(x, 2)
# Define a loss function
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels = y, logits = logits))
# Define an optimizer 
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss)
# Convert logits to label indexes
correct_pred = tf.argmax(logits, 1)
# Define an accuracy metric
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

print("logits: ", logits)
print("loss: ", loss)
print("predicted_labels: ", correct_pred)

  return f(*args, **kwds)
  from ._conv import register_converters as _register_converters


logits:  Tensor("fully_connected/Relu:0", shape=(?, 2), dtype=float32)
loss:  Tensor("Mean:0", shape=(), dtype=float32)
predicted_labels:  Tensor("ArgMax:0", shape=(?,), dtype=int64)


In [5]:
sess = tf.Session()

sess.run(tf.global_variables_initializer())

for i in range(1001):
        _, accuracy_val = sess.run([train_op, accuracy], feed_dict={x: X2, y: y2})

In [6]:
# Run predictions against the full test set.
predicted = sess.run([correct_pred], feed_dict={x: X2_test})[0]
# Calculate correct matches 
match_count = sum([int(y == y_) for y, y_ in zip(y2_test, predicted)])
# Calculate the accuracy
accuracy = match_count / len(y2_test)
# Print the accuracy
print("Accuracy: {:.3f}".format(accuracy))

# Close the session
sess.close()

Accuracy: 0.918


## 1.3 High-level APIs for Tensorflow: Keras

In practice, we can use high-level APIs such as Keras to do the above. Now building the network becomes a one-liner and you can use `model.fit(X)` and `model.predict(X,y)` similar to Sklean. Build the above neural network using Keras.

In [7]:
import keras
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(units=1, activation='sigmoid',input_dim=57))
model.compile(loss='binary_crossentropy', optimizer='sgd',metrics=["accuracy"])
# summarize the model
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 1)                 58        
Total params: 58
Trainable params: 58
Non-trainable params: 0
_________________________________________________________________


Using TensorFlow backend.


In [9]:
#Early_Stop = keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1, mode='auto')
history = model.fit(X2.values, y2, epochs=100, batch_size=32, verbose=0)
#tl,vl = historyBNN.history['loss'], historyBNN.history['val_loss']           

In [18]:
ypred = model.predict(X2_test.values, batch_size=32)
ypred[ypred > 0.5] = 1
ypred[ypred <= 0.5] = 0
test_acc = np.mean(np.equal(ypred, np.expand_dims(y2_test,axis=1)))
print('Test accuracy:', test_acc)

Test accuracy: 0.9218585005279831


# 2. Practice

## 2.1 Spam classification

Above you already implemented a one-layer neural network in Keras for the Spam data.

a) Now add one hidden-layer to the model. Optimise your hyper-parameters and report on the best model. 

In [20]:
from keras.layers.core import Lambda
from keras import backend as K
from keras.optimizers import SGD

hidden_units = [128,256,512,1024]
dropout_ratios = [0.2,0.4,0.6]

scores = np.zeros(len(hidden_units))
dropout_value = np.zeros(len(hidden_units))
        
for ii in range(0,len(hidden_units)):
    best_score = 0.0
    for d in dropout_ratios:
        model = Sequential()
        model.add(Dense(units=hidden_units[ii], activation='relu', input_dim=57))
        model.add(Lambda(lambda x: K.dropout(x,level=d)))
        model.add(Dense(units=1))
        sgd = SGD(lr=0.01, momentum=0.9, nesterov=True)
        model.compile(loss='binary_crossentropy', optimizer='sgd',metrics=["accuracy"])
        history = model.fit(X2_core.values, y2_core, epochs=1000, batch_size=32, verbose=0)
        ypred = model.predict(X2_cv.values, batch_size=32)
        ypred[ypred > 0.5] = 1
        ypred[ypred <= 0.5] = 0
        valid_acc = np.mean(np.equal(ypred, np.expand_dims(y2_cv,axis=1)))
        if valid_acc > best_score:
            best_score = valid_acc
            scores[ii] = valid_acc
            dropout_value[ii] = d
    print(hidden_units[ii])

128
256
512
1024


In [23]:
print(scores)

[0.61538462 0.61538462 0.61538462 0.61538462]


b) Try different architectures, e.g. with 2 hidden layers.

In [25]:
model = Sequential()
model.add(Dense(units=512, activation='relu',input_dim=57))
model.add(Lambda(lambda x: K.dropout(x,level=0.2)))
model.add(Dense(units=256, activation='relu'))
model.add(Lambda(lambda x: K.dropout(x,level=0.2)))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='sgd',metrics=["accuracy"])
# summarize the model
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_31 (Dense)             (None, 512)               29696     
_________________________________________________________________
lambda_16 (Lambda)           (None, 512)               0         
_________________________________________________________________
dense_32 (Dense)             (None, 256)               131328    
_________________________________________________________________
lambda_17 (Lambda)           (None, 256)               0         
_________________________________________________________________
dense_33 (Dense)             (None, 1)                 257       
Total params: 161,281
Trainable params: 161,281
Non-trainable params: 0
_________________________________________________________________


## 2.2 Continue practising ...

Look at another example from last week, like the heart data, and try to build the classification model using a neural network.