## Final Movie Genre Multilabel Prediction Model (MLP Neural Net)

This model is a multi-layer perceptron (MLP) whose inputs are the per-genre label probabilities produced by the three previous models: the movie poster CNN, the movie overview text MLP, and the random forest trained on the movie metadata.

### Import Anticipated Libraries

In [1]:
#!pip install keras 
#!pip install tensorflow  # tensorflow.python ships inside this package; it is not installed separately
#!pip install h5py

In [2]:
from __future__ import print_function
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('white')
import numpy as np
import pandas as pd
from scipy import misc
import time
import pickle

In [3]:
import keras
from keras.datasets import mnist
from keras.models import Sequential, Model
from keras.layers import Dense, Activation, Conv2D, MaxPooling2D, Flatten, Dropout, GlobalAveragePooling2D
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from keras.callbacks import EarlyStopping
from keras.optimizers import SGD
from keras import backend as K
from keras.applications.inception_v3 import InceptionV3

Using TensorFlow backend.


### Load the prepared data.

In [4]:
print('Loading data...')

with open('full_dataset.pickle', 'rb') as handle:
    dataset = pickle.load(handle)
    
print(len(dataset['train_x']), 'train sequences')
print(len(dataset['test_x']), 'test sequences')
 
num_classes = len(dataset['labels'])
print(num_classes, 'classes.')

print('Data loaded.')

Loading data...
5120 train sequences
1704 test sequences
8 classes.
Data loaded.


### Stack the various model outputs into a set of inputs 

In [5]:
train_x = np.concatenate((dataset['MLP_Overview_Train_Probabilities'], 
                          dataset['CNN_Poster_Train_Probabilities'],
                          dataset['Metadata_Train_Probabilities']),
                         axis=1)

test_x = np.concatenate((dataset['MLP_Overview_Test_Probabilities'],
                         dataset['CNN_Poster_Test_Probabilities'],
                         dataset['Metadata_Test_Probabilities']),
                       axis=1)
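
A quick sanity check on the stacked matrix can catch a misaligned input early. The sketch below uses small random arrays in place of the real probability matrices and assumes each base model emits one probability per genre; the toy array names and sizes are hypothetical.

```python
import numpy as np

# Toy stand-ins for the three base-model outputs (hypothetical sizes):
# 4 movies, 8 genres, one probability per genre from each model.
n_movies, n_genres = 4, 8
rng = np.random.RandomState(0)
overview_probs = rng.rand(n_movies, n_genres)
poster_probs = rng.rand(n_movies, n_genres)
metadata_probs = rng.rand(n_movies, n_genres)

# Same stacking pattern as above: concatenate along the feature axis.
stacked = np.concatenate((overview_probs, poster_probs, metadata_probs), axis=1)

# Rows stay aligned per movie; columns grow to 3 * n_genres.
assert stacked.shape == (n_movies, 3 * n_genres)
# The first block of columns is still the overview model's output.
assert np.array_equal(stacked[:, :n_genres], overview_probs)
print(stacked.shape)
```

If the three matrices were produced from differently ordered movie lists, the shapes would still match, so row order is worth verifying separately.
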

### Build a model for prediction on the stacked inputs.

In [6]:
print('Building model...')
model = Sequential()
model.add(Dense(512, input_shape=(train_x.shape[1],)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('sigmoid'))
 
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

Building model...
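
To get a feel for the size of this network, the trainable parameters can be counted by hand. This is a rough sketch that assumes each of the three base models contributes 8 probabilities, giving 24 input features; `dense_params` is a hypothetical helper, and the `Activation` and `Dropout` layers add no parameters.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units

# Assumption: 3 base models x 8 genre probabilities = 24 stacked features.
n_features = 3 * 8
hidden = dense_params(n_features, 512)   # 24 * 512 weights + 512 biases
output = dense_params(512, 8)            # 512 * 8 weights + 8 biases
total = hidden + output
print(hidden, output, total)             # 12800 4104 16904
```
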


In [7]:
batch_size = 32
epochs = 200

#### Early stopping criterion: stop training once validation accuracy has not improved for 5 epochs.

In [8]:
callbacks = [
    # Note: newer Keras releases name this metric 'val_accuracy' instead of 'val_acc'.
    EarlyStopping(monitor='val_acc', patience=5, verbose=0)
]
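
The patience rule can be illustrated without Keras. The following is a simplified sketch of the idea, not Keras's implementation (the exact epoch at which training halts can differ by one between library versions); `epochs_run` and the score list are hypothetical and not taken from this notebook's run.

```python
def epochs_run(val_scores, patience):
    """Return how many epochs run before a patience-based stop.

    Simplified rule: stop once `patience` consecutive epochs fail to
    beat the best validation score seen so far.
    """
    best = float('-inf')
    wait = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:
            best, wait = score, 0      # strict improvement resets the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_scores)             # never triggered: all epochs ran

# Hypothetical validation accuracies: the best value first appears at epoch 2.
scores = [0.80, 0.85, 0.84, 0.85, 0.83, 0.84, 0.82, 0.81]
print(epochs_run(scores, patience=5))  # 7
```

Note that tying the best score (epoch 4 here) does not reset the counter under this rule, because only a strict improvement counts.
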

#### Fit the model.

In [9]:
history = model.fit(train_x, dataset['train_y'],
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_split=0.2,
                    callbacks=callbacks)

Train on 4096 samples, validate on 1024 samples
Epoch 1/200
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200


In [10]:
score = model.evaluate(test_x, dataset['test_y'],
                       batch_size=batch_size, verbose=1)
print('Test score:', score[0])
print('Test accuracy:', score[1])

Test accuracy: 0.862235915493


### Calculate the Multi-label Accuracy Metrics on the Unseen (by all models) Test Data

In [11]:
test_predictions = model.predict(test_x)
test_predictions_results_bayesian = (test_predictions > 0.5) * 1.0
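
The thresholding step above turns each sigmoid output into a hard 0/1 label. As a small illustration on hypothetical probabilities, note that the comparison is strict, so a value of exactly 0.5 maps to 0:

```python
import numpy as np

# Hypothetical sigmoid outputs for 2 movies and 3 genres.
probs = np.array([[0.91, 0.50, 0.07],
                  [0.49, 0.51, 0.50]])

# Boolean mask multiplied by 1.0 gives a float array of 0.0 / 1.0 labels.
labels = (probs > 0.5) * 1.0
print(labels)
```
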

In [12]:
def calculate_multilabel_accuracy_metrics(y_predicted, y_true):
    """Return example-averaged (accuracy, precision, recall) for 0/1 label matrices.

    Per example, with T the true label set and P the predicted label set:
      accuracy  = |T intersect P| / |T union P|
      precision = |T intersect P| / |P|
      recall    = |T intersect P| / |T|
    A term whose denominator is zero contributes 0 to its average.
    """
    observations = y_predicted.shape[0]
    accuracy = 0.0
    precision = 0.0
    recall = 0.0
    for index in range(observations):
        t_int_p = float(sum((y_predicted[index] + y_true[index]) == 2))
        t_uni_p = float(sum((y_predicted[index] + y_true[index]) >= 1))
        p = float(sum(y_predicted[index]))
        t = float(sum(y_true[index]))
        # Guard every ratio: an example with no true and no predicted labels
        # would otherwise raise a division-by-zero error.
        if t_uni_p > 0:
            accuracy += t_int_p / t_uni_p
        if p > 0:
            precision += t_int_p / p
        if t > 0:
            recall += t_int_p / t
    return accuracy / observations, precision / observations, recall / observations
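
A worked example for a single movie may make the three set-based ratios concrete. The genre names below are hypothetical and chosen only for illustration.

```python
# Hypothetical single movie: true genres T and predicted genres P.
true_genres = {'Drama', 'Romance'}
predicted_genres = {'Drama', 'Comedy', 'Action'}

overlap = true_genres & predicted_genres     # T intersect P: {'Drama'}
combined = true_genres | predicted_genres    # T union P: 4 distinct genres

accuracy = len(overlap) / len(combined)           # 1 / 4
precision = len(overlap) / len(predicted_genres)  # 1 / 3
recall = len(overlap) / len(true_genres)          # 1 / 2
print(accuracy, precision, recall)
```

The function above computes the same three ratios per movie from 0/1 label vectors and then averages them over all movies.
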

In [13]:
accuracy, precision, recall = calculate_multilabel_accuracy_metrics(test_predictions_results_bayesian, 
                                                                    dataset['test_y'])

In [14]:
print('Total multilabel accuracy is : ', accuracy)
print('Total multilabel precision is : ', precision)
print('Total multilabel recall is : ', recall)

Total multilabel accuracy is :  0.581201095462
Total multilabel precision is :  0.738605242567
Total multilabel recall is :  0.685299295775
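
The multilabel accuracy (about 0.58) is much lower than the Keras test accuracy reported earlier (about 0.86). A likely explanation is that Keras's `'accuracy'` metric, when paired with sigmoid outputs and binary cross-entropy, scores every individual label slot, so the many correctly predicted absent genres inflate it. The sketch below contrasts the two measures on hypothetical toy labels; it is an illustration, not a reproduction of the numbers above.

```python
import numpy as np

# Toy labels and thresholded predictions (hypothetical): 2 movies, 8 genres.
y_true = np.array([[1, 0, 0, 0, 0, 0, 0, 0],
                   [1, 1, 0, 0, 0, 0, 0, 0]])
y_pred = np.array([[0, 1, 0, 0, 0, 0, 0, 0],
                   [1, 0, 0, 0, 0, 0, 0, 0]])

# Per-label accuracy: fraction of the 16 individual label slots that match.
per_label_accuracy = np.mean(y_true == y_pred)

# Set-based accuracy per movie: |T intersect P| / |T union P|, then averaged.
intersection = np.logical_and(y_true, y_pred).sum(axis=1).astype(float)
union = np.logical_or(y_true, y_pred).sum(axis=1).astype(float)
set_based_accuracy = np.mean(intersection / union)

print(per_label_accuracy, set_based_accuracy)  # 0.8125 0.25
```
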
