# MLP:
The first half of this notebook trains an MLP. Training for the RNN models is in the second half.

Note that this notebook requires the train_df.pkl and test_df.pkl files, which are generated by the preprocess.ipynb notebook. If you have not run preprocess.ipynb, you will not have the data needed to proceed.

In [1]:
# Load Dependencies
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from src.utils import get_batches, shuffle, train_val_split, preds_to_scores, scores_to_preds, plot_train_loss
from src.mlp import MLP
from src.rnn import RNN
import seaborn as sns
# Note: plotly.plotly requires plotly < 4; later versions moved it to the chart_studio package
import plotly.plotly as py
import plotly.graph_objs as go

%load_ext autoreload
%autoreload 2

In [2]:
# Define the path to the data. This is the training dataframe saved from the preprocessing notebook.
# If you have not run the preprocessing notebook, go back and do so now.
data_path = './data/train_df.pkl'
train_df = pd.read_pickle(data_path)

In [3]:
# To further isolate our data, we will only examine essays from a single set
# Feel free to experiment with different essay sets by choosing a different value
# for the set variable. Sets 1, 3, 4, 5, and 6 are supported!

set = 1

df = train_df.loc[train_df['essay_set'] == set]
df.head()

| | essay_id | essay_set | rater1_domain1 | rater2_domain1 | domain1_score | essays_embed | word_count | min_score | max_score | rater1_domain1_norm | rater2_domain1_norm | norm_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 4 | 4 | 8 | [[0.1285, 0.68849, 0.83504, -0.16483, -0.36831... | 299 | 2.0 | 12.0 | 4 | 4 | 8 |
| 1 | 2 | 1 | 5 | 4 | 9 | [[0.1285, 0.68849, 0.83504, -0.16483, -0.36831... | 349 | 2.0 | 12.0 | 5 | 4 | 9 |
| 2 | 3 | 1 | 4 | 3 | 7 | [[0.1285, 0.68849, 0.83504, -0.16483, -0.36831... | 236 | 2.0 | 12.0 | 4 | 3 | 7 |
| 4 | 5 | 1 | 4 | 4 | 8 | [[0.1285, 0.68849, 0.83504, -0.16483, -0.36831... | 387 | 2.0 | 12.0 | 4 | 4 | 8 |
| 5 | 6 | 1 | 4 | 4 | 8 | [[0.1285, 0.68849, 0.83504, -0.16483, -0.36831... | 204 | 2.0 | 12.0 | 4 | 4 | 8 |


In [4]:
# In order to avoid bias toward more common scores, we will limit the number
# of essays from each scoring bucket to a set value
score_df = None
min_score = int(df['min_score'].min())
max_score = int(df['max_score'].max())

n_max = 100

for i in range(min_score,max_score+1):
    if score_df is None:
        score_df = df.loc[df['domain1_score'] == i][:n_max]
    else:
        temp_df = df.loc[df['domain1_score'] == i][:n_max]
        score_df = pd.concat([score_df, temp_df])
df = score_df
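
The capping loop above can also be expressed with a pandas groupby. The following is a minimal sketch on toy data, not the notebook's real dataframe; `cap_per_score` is a hypothetical helper name. Unlike the loop, `groupby(...).head(n)` keeps rows in their original order instead of grouping them by score, and it covers every score present in the column rather than only those between `min_score` and `max_score`.

```python
import pandas as pd

def cap_per_score(df, score_col, n_max):
    """Keep at most n_max rows for each distinct value in score_col."""
    return df.groupby(score_col).head(n_max)

# Toy data: score 8 is heavily over-represented.
toy = pd.DataFrame({
    'essay_id': range(10),
    'domain1_score': [8, 8, 8, 8, 8, 8, 7, 7, 9, 10],
})

capped = cap_per_score(toy, 'domain1_score', n_max=3)
print(capped['domain1_score'].value_counts().sort_index())
```

Because the data is shuffled before training, the difference in row order should not matter for this notebook.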

In [5]:
# Extract essay vectors and corresponding scores
X = np.array(df['essays_embed'])
y = np.array(df['domain1_score'])
X = np.stack(X, axis=0)
print('There are {} training essays, each of shape {} x {}'.format(X.shape[0], X.shape[1], X.shape[2]))

There are 566 training essays, each of shape 426 x 200


These essays are the wrong shape to feed directly into the MLP. Therefore, each essay matrix needs to be flattened into a 1-D vector.

In [6]:
X_flatten = np.reshape(X, [X.shape[0], -1])
print('There are {} training essays, each a vector of length {}'.format(X_flatten.shape[0], X_flatten.shape[1]))

There are 566 training essays, each a vector of length 85200
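
To make the flattening step concrete, here is a small sketch on a toy array (the names `X_toy` and `X_toy_flat` are illustrative only). It shows that each essay's word vectors are laid end to end, which is why 426 words of 200 dimensions become a vector of length 85200.

```python
import numpy as np

# Toy batch: 2 "essays", each 3 words long with 4-dimensional embeddings.
X_toy = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# -1 lets NumPy infer the flattened length (3 * 4 = 12).
X_toy_flat = np.reshape(X_toy, [X_toy.shape[0], -1])
print(X_toy_flat.shape)

# The first 4 entries of a flattened essay are its first word vector.
first_word = X_toy_flat[0, :4]
print(first_word)
```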


The next step is to shuffle the data and separate it into training and validation sets.

In [7]:
X, y = shuffle(X_flatten, y)

X_train, y_train, X_val, y_val = train_val_split(X, y, train_prop=0.85)
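
The implementations of `shuffle` and `train_val_split` live in `src/utils` and are not shown here. As a rough sketch of what such helpers typically do, the hypothetical functions below (`shuffle_pairs` and `split_train_val`, both invented for illustration) apply one shared permutation to inputs and labels and then cut the arrays at a fixed proportion. The real helpers may differ, for example in how they round the split point.

```python
import numpy as np

def shuffle_pairs(X, y, seed=0):
    """Shuffle X and y with the same permutation so pairs stay aligned."""
    perm = np.random.default_rng(seed).permutation(len(X))
    return X[perm], y[perm]

def split_train_val(X, y, train_prop=0.85):
    """Use the first train_prop share for training, the rest for validation."""
    n_train = int(len(X) * train_prop)
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]

# Toy data where each row of X is derived from its label (row i = [2i, 2i+1]).
X_toy = np.arange(40).reshape(20, 2)
y_toy = np.arange(20)

X_s, y_s = shuffle_pairs(X_toy, y_toy)
X_tr, y_tr, X_va, y_va = split_train_val(X_s, y_s)
print(len(X_tr), len(X_va))
```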

Next, the labels are transformed into the form that the network predicts. For example, essays in set 1 are graded on a scale of 2-12, so there are 11 classes into which the network will try to classify each essay. The network, however, outputs class indices 0-10, so this step shifts the labels down by the minimum score. If the scoring range already starts at 0, no shift is performed.

In [8]:
if min_score != 0:
    y_train_adj = scores_to_preds(y_train, min_score)
    print('Training labels shifted from a scale of ({},{}) to ({},{})'\
          .format(min(y_train),max(y_train), min(y_train_adj), max(y_train_adj)))
    y_val_adj = scores_to_preds(y_val, min_score)
    print('Validation labels shifted from a scale of ({},{}) to ({},{})'\
          .format(min(y_val),max(y_val), min(y_val_adj), max(y_val_adj)))
else:
    print('No score adjustment necessary')
    y_train_adj = y_train
    y_val_adj = y_val

Training labels shifted from a scale of (2,12) to (0,10)
Validation labels shifted from a scale of (2,12) to (0,10)
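
The shift printed above amounts to subtracting the minimum score. The sketch below uses hypothetical stand-ins (`to_class_indices` and `to_scores`) for `scores_to_preds` and `preds_to_scores`, whose actual implementations in `src/utils` are not shown and may differ in detail.

```python
import numpy as np

def to_class_indices(scores, min_score):
    """Map raw scores (e.g. 2..12) to zero-based class indices (0..10)."""
    return np.asarray(scores) - min_score

def to_scores(class_indices, min_score):
    """Inverse mapping: class indices back to the original score scale."""
    return np.asarray(class_indices) + min_score

# Set 1 uses a 2-12 scale, which gives 11 classes.
min_score_toy, max_score_toy = 2, 12
num_classes_toy = max_score_toy - min_score_toy + 1

scores = np.array([2, 7, 12])
classes = to_class_indices(scores, min_score_toy)
print(classes, num_classes_toy)
```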


### Initial MLP
Here we define an MLP model to train. The parameters below were the initial parameters tested on the dataset. This model learns the training set well, but is unable to generalize to the validation set. You may skip training this model to save time.

In [9]:
# User Defined Parameters
model_name = 'mlp_set1_bad'
hidden_dims = [128,64]
weight_scale = 1e-2
batch_size = 16
n_epochs = 20
l2_reg = 1e-4
keep_prob = 1
reg = False
lr = 1e-3

# Derived Parameters
input_dim = X_train.shape[1]
num_classes = max_score-min_score + 1
n_batches = round(X_train.shape[0]/batch_size)
batch_gen = get_batches(X_train, y_train_adj, batch_size, net_type='mlp')

mlp_net = MLP(input_dim=input_dim, hidden_dims=hidden_dims, num_classes=num_classes, weight_scale=weight_scale,\
              l2_reg=l2_reg, keep_prob=keep_prob, regression=reg)

In [10]:
print('Training Network...')
train_loss_hist, val_loss_hist = mlp_net.train(gen=batch_gen, X_val=X_val, y_val=y_val_adj, n_epochs=n_epochs, n_batches=n_batches, lr=lr,\
                                               save_every_n=5, model_name=model_name)

Training Network...


---------- Training epoch: 1 ----------
Epoch 1, Batch 1 -- Loss: 2.475 Validation accuracy: 0.119
Sample Grade Predictions: 
Preds:   4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
Actual:  6 5 4 5 6 4 5 7 8 8 7 10 4 6 6 3 4 6 7 4
Best validation accuracy! - Saving Model


---------- Training epoch: 2 ----------
Epoch 2, Batch 1 -- Loss: 1.746 Validation accuracy: 0.238
Sample Grade Predictions: 
Preds:   7 7 4 4 7 4 7 7 7 7 7 7 4 7 7 4 4 7 7 4
Actual:  6 5 4 5 6 4 5 7 8 8 7 10 4 6 6 3 4 6 7 4
Best validation accuracy! - Saving Model


---------- Training epoch: 3 ----------
Epoch 3, Batch 1 -- Loss: 1.934 Validation accuracy: 0.381
Sample Grade Predictions: 
Preds:   6 7 4 4 5 4 5 7 8 7 8 8 4 7 5 4 4 8 7 4
Actual:  6 5 4 5 6 4 5 7 8 8 7 10 4 6 6 3 4 6 7 4
Best validation accuracy! - Saving Model


---------- Training epoch: 4 ----------
Epoch 4, Batch 1 -- Loss: 1.229 Validation accuracy: 0.417
Sample Grade Predictions: 
Preds:   6 7 4 5 5 5 5 7 7 7 7 7 4 7 5 4 5 7 7 4

In [11]:
fig = plot_train_loss(train_loss_hist, val_loss_hist, n_batches, model_name)
py.iplot(fig, filename='basic-area')


### Train your own MLP
The MLP above learns the training set but does not generalize to the validation set. Below is another MLP model definition. You may change the model name and parameters, or leave the model definition as is. The model will be saved to the 'model/' directory of this project. The following parameters may be set by the user: learning rate, number of training epochs, L2 regularization, dropout keep probability, and regression vs. classification.

After many iterations, we found that the following MLP parameters yielded the best results on both the training and validation sets. Note that this model is much larger and requires a GPU to train in a reasonable amount of time.

In [12]:
# User Defined Parameters
model_name = 'mlp_set'+'{}'.format(set)
hidden_dims = [1024,256]
weight_scale = 1e-2
batch_size = 16
n_epochs = 20
l2_reg = 1e-4
keep_prob = 0.6
reg = False
lr = 1e-4

# Derived Parameters
input_dim = X_train.shape[1]
num_classes = max_score-min_score + 1
n_batches = round(X_train.shape[0]/batch_size)
batch_gen = get_batches(X_train, y_train_adj, batch_size, net_type='mlp')

mlp_net = MLP(input_dim=input_dim, hidden_dims=hidden_dims, num_classes=num_classes, weight_scale=weight_scale,\
              l2_reg=l2_reg, keep_prob=keep_prob, regression=reg)
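
To give a rough sense of how much larger this model is than the initial one, the sketch below counts weights and biases for both configurations. It assumes plain fully connected layers with one bias per unit, which is an assumption about the `MLP` class and not something confirmed by its source. `dense_param_count` is a hypothetical helper.

```python
def dense_param_count(input_dim, hidden_dims, num_classes):
    """Count weights plus biases for a stack of fully connected layers."""
    dims = [input_dim] + list(hidden_dims) + [num_classes]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(dims[:-1], dims[1:]))

# Flattened essays in set 1: 426 words x 200 embedding dimensions, 11 classes.
input_dim_toy = 426 * 200

small = dense_param_count(input_dim_toy, [128, 64], 11)
large = dense_param_count(input_dim_toy, [1024, 256], 11)
print(small, large, round(large / small, 2))
```

Under these assumptions almost all parameters sit in the first layer, because the flattened input is so wide.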

In [13]:
print('Training Network...')
train_loss_hist, val_loss_hist = mlp_net.train(gen=batch_gen, X_val=X_val, y_val=y_val_adj, n_epochs=n_epochs, n_batches=n_batches, lr=lr,\
                                               save_every_n=5, model_name=model_name)

Training Network...


---------- Training epoch: 1 ----------
Epoch 1, Batch 1 -- Loss: 5.514 Validation accuracy: 0.119
Sample Grade Predictions: 
Preds:   8 8 4 7 2 4 8 8 8 7 8 4 8 8 2 7 0 8 1 2
Actual:  6 5 4 5 6 4 5 7 8 8 7 10 4 6 6 3 4 6 7 4
Best validation accuracy! - Saving Model


---------- Training epoch: 2 ----------
Epoch 2, Batch 1 -- Loss: 2.409 Validation accuracy: 0.202
Sample Grade Predictions: 
Preds:   8 8 8 7 4 8 7 6 8 8 8 7 7 8 8 8 8 8 8 8
Actual:  6 5 4 5 6 4 5 7 8 8 7 10 4 6 6 3 4 6 7 4
Best validation accuracy! - Saving Model


---------- Training epoch: 3 ----------
Epoch 3, Batch 1 -- Loss: 1.730 Validation accuracy: 0.179
Sample Grade Predictions: 
Preds:   7 8 8 8 5 7 5 8 7 7 7 7 5 8 7 5 6 7 7 7
Actual:  6 5 4 5 6 4 5 7 8 8 7 10 4 6 6 3 4 6 7 4


---------- Training epoch: 4 ----------
Epoch 4, Batch 1 -- Loss: 1.732 Validation accuracy: 0.357
Sample Grade Predictions: 
Preds:   6 7 5 4 5 5 4 7 8 6 8 8 5 8 6 4 5 8 8 5
Actual:  6 5 4 5 6 4 5 7 8 8 7 10 4 6 6 

In [15]:
fig = plot_train_loss(train_loss_hist, val_loss_hist, n_batches, model_name)
py.iplot(fig, filename='basic-area')


## Test the QWK of the trained model
Now we can use essays from the test dataset to obtain a quadratic weighted
kappa (QWK) score for the model. This metric is used to quantify how well
the model predicted the essay scores relative to random guessing. A value
of 0 indicates that the predictions were no better than random guessing,
while a value of 1 indicates perfect matching between predictions and labels.
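
For readers unfamiliar with the metric, here is a minimal NumPy sketch of the standard QWK definition. It is not the project's `quadratic_weighted_kappa` from `src/utils`, which may differ in details; the function name `qwk_sketch` is hypothetical. Disagreements are penalized by the squared distance between the two class indices, and the observed penalty is compared with the penalty expected if predictions and labels were independent.

```python
import numpy as np

def qwk_sketch(y_true, y_pred, num_classes):
    """Quadratic weighted kappa for integer class indices in [0, num_classes)."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)

    # Observed confusion matrix: rows are true classes, columns are predictions.
    observed = np.zeros((num_classes, num_classes))
    np.add.at(observed, (y_true, y_pred), 1)

    # Quadratic penalty: 0 on the diagonal, 1 for the largest possible miss.
    idx = np.arange(num_classes)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (num_classes - 1) ** 2

    # Expected matrix under independence, scaled to the same total count.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

    # Undefined (division by zero) if all labels and predictions share one class.
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

perfect = qwk_sketch([0, 1, 2, 3], [0, 1, 2, 3], num_classes=4)
chance = qwk_sketch([0, 0, 2, 2], [0, 2, 0, 2], num_classes=3)
print(perfect, chance)
```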

In [16]:
data_path = './data/test_df.pkl'
test_df = pd.read_pickle(data_path)
df = test_df.loc[test_df['essay_set'] == set]
X_test = np.array(df['essays_embed'])
y_test = np.array(df['domain1_score'])
X_test = np.stack(X_test, axis=0)
X_test = np.reshape(X_test, [X_test.shape[0], -1])
print('There are {} testing essays'.format(X_test.shape[0]))
      
if min_score != 0:
    y_test_adj = scores_to_preds(y_test, min_score)
    print('Testing labels shifted from a scale of ({},{}) to ({},{})'\
          .format(min(y_test),max(y_test), min(y_test_adj), max(y_test_adj)))
else:
    print('No score adjustment necessary')
    y_test_adj = y_test

There are 298 testing essays
Testing labels shifted from a scale of (4,12) to (2,10)


In [17]:
preds = mlp_net.predict('./model/'+model_name, X_test)

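# The predictions are left as class indices so they can be compared against y_test_adj below.
# Uncomment the next line to map them back to the original score scale instead.
#preds = preds_to_scores(preds, min_score=min_score)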

INFO:tensorflow:Restoring parameters from ./model/mlp_set1


In [18]:
from src.utils import quadratic_weighted_kappa
y_test_adj = scores_to_preds(y_test, min_score)
k = quadratic_weighted_kappa(y_test_adj, preds, num_classes)

print('The quadratic weighted kappa score for set {} using {} is : {}'\
     .format(set, model_name, k))

The quadratic weighted kappa score for set 1 using mlp_set1 is : 0.7032761357170665


# RNN:
The second half of this notebook may be used to train an RNN, specifically an LSTM or a GRU.

In [19]:
# Define the path to the data
data_path = './data/train_df.pkl'
train_df = pd.read_pickle(data_path)

# To further isolate our data, we will only examine essays from a single set
# Feel free to experiment with different essay sets!
set = 1
df = train_df.loc[train_df['essay_set'] == set]
df.head()

| | essay_id | essay_set | rater1_domain1 | rater2_domain1 | domain1_score | essays_embed | word_count | min_score | max_score | rater1_domain1_norm | rater2_domain1_norm | norm_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 4 | 4 | 8 | [[0.1285, 0.68849, 0.83504, -0.16483, -0.36831... | 299 | 2.0 | 12.0 | 4 | 4 | 8 |
| 1 | 2 | 1 | 5 | 4 | 9 | [[0.1285, 0.68849, 0.83504, -0.16483, -0.36831... | 349 | 2.0 | 12.0 | 5 | 4 | 9 |
| 2 | 3 | 1 | 4 | 3 | 7 | [[0.1285, 0.68849, 0.83504, -0.16483, -0.36831... | 236 | 2.0 | 12.0 | 4 | 3 | 7 |
| 4 | 5 | 1 | 4 | 4 | 8 | [[0.1285, 0.68849, 0.83504, -0.16483, -0.36831... | 387 | 2.0 | 12.0 | 4 | 4 | 8 |
| 5 | 6 | 1 | 4 | 4 | 8 | [[0.1285, 0.68849, 0.83504, -0.16483, -0.36831... | 204 | 2.0 | 12.0 | 4 | 4 | 8 |


In [20]:
# In order to avoid bias toward more common scores, we will limit the number
# of essays from each scoring bucket to a set value
score_df = None
min_score = int(df['min_score'].min())
max_score = int(df['max_score'].max())

n_max = 100
for i in range(min_score,max_score+1):
    if score_df is None:
        score_df = df.loc[df['domain1_score'] == i][:n_max]
    else:
        temp_df = df.loc[df['domain1_score'] == i][:n_max]
        score_df = pd.concat([score_df, temp_df])
df = score_df

In [21]:
# Extract essay vectors and corresponding scores
X = np.array(df['essays_embed'])
y = np.array(df['domain1_score'])
X = np.stack(X, axis=0)
print('There are {} training essays, each of shape {} x {}'.format(X.shape[0], X.shape[1], X.shape[2]))

There are 566 training essays, each of shape 426 x 200


The next step is to shuffle the data and separate it into training and validation sets.

In [22]:
X, y = shuffle(X, y)
X_train, y_train, X_val, y_val = train_val_split(X, y, train_prop=0.85)

Next, the labels are transformed into the form that the network predicts. For example, essays in set 1 are graded on a scale of 2-12, so there are 11 classes into which the network will try to classify each essay. The network, however, outputs class indices 0-10, so this step shifts the labels down by the minimum score. If the scoring range already starts at 0, no shift is performed.

In [23]:
if min_score != 0:
    y_train_adj = scores_to_preds(y_train, min_score)
    print('Training labels shifted from a scale of ({},{}) to ({},{})'\
          .format(min(y_train),max(y_train), min(y_train_adj), max(y_train_adj)))
    y_val_adj = scores_to_preds(y_val, min_score)
    print('Validation labels shifted from a scale of ({},{}) to ({},{})'\
          .format(min(y_val),max(y_val), min(y_val_adj), max(y_val_adj)))
else:
    print('No score adjustment necessary')
    y_train_adj = y_train
    y_val_adj = y_val

Training labels shifted from a scale of (2,12) to (0,10)
Validation labels shifted from a scale of (2,12) to (0,10)


### Initial RNN
Here we define an RNN model to train. The parameters below were the initial parameters tested on the dataset. This model learns both the training and validation sets well, and it serves as a good baseline from which you can design your own RNN. If you'd like, you may skip training this model to save time and move directly to training your own model with tunable parameters.

In [24]:
# User Defined Parameters
batch_size = 32
cell_type = 'lstm'
rnn_size = 128
lr = 1e-3
n_epochs = 20
keep_prob = 1

# Derived Parameters
model_name = cell_type+'_set'+'{}'.format(set)
num_classes = max_score-min_score + 1
n_batches = round(X_train.shape[0]/batch_size)
seq_length = X_train.shape[1]
embed_size = X_train.shape[2]

X_val_t = X_val[:batch_size]
y_val_t = y_val_adj[:batch_size]
batch_gen = get_batches(X_train, y_train_adj, batch_size, net_type=cell_type)

# Pass keep_prob through so that changing it above actually takes effect
rnn_net = RNN(num_classes, batch_size, seq_length, embed_size, cell_type=cell_type,
              rnn_size=rnn_size, learning_rate=lr, train_keep_prob=keep_prob)

In [25]:
print('Training Network...')
train_loss_hist, val_loss_hist = rnn_net.train(batch_gen, X_val_t, y_val_t,\
                                              n_epochs, n_batches, save_every_n=5,\
                                              model_name=model_name)

Training Network...


---------- Training epoch: 1 ----------
Epoch 1, step 5 loss: 2.3478  validation accuracy: 0.21875  0.3637 sec/batch
Best validation accuracy! - Saving Model
Sample Grade Predictions: 
Preds:   5 5 5 5 7 5 5 5 8 5 5 5 5 5 5 7 5 7 8 5
Actual:  7 5 6 5 7 4 6 4 10 9 4 4 7 4 6 8 8 8 10 5
Epoch 1, step 10 loss: 2.1765  validation accuracy: 0.15625  0.4166 sec/batch
Sample Grade Predictions: 
Preds:   8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
Actual:  7 5 6 5 7 4 6 4 10 9 4 4 7 4 6 8 8 8 10 5
Epoch 1, step 15 loss: 2.3023  validation accuracy: 0.21875  0.4170 sec/batch
Sample Grade Predictions: 
Preds:   5 5 5 5 8 5 5 5 8 5 5 5 5 5 5 8 7 7 8 5
Actual:  7 5 6 5 7 4 6 4 10 9 4 4 7 4 6 8 8 8 10 5


---------- Training epoch: 2 ----------
Epoch 2, step 5 loss: 1.9614  validation accuracy: 0.3125  0.4130 sec/batch
Best validation accuracy! - Saving Model
Sample Grade Predictions: 
Preds:   7 5 5 5 8 5 5 5 8 7 5 5 7 5 5 8 8 8 8 5
Actual:  7 5 6 5 7 4 6 4 10 9 4 4 7 4 6 8 8 8 10

Epoch 13, step 10 loss: 1.5181  validation accuracy: 0.5  0.4148 sec/batch
Sample Grade Predictions: 
Preds:   7 5 6 5 8 4 6 5 8 7 5 5 6 5 6 8 7 7 8 5
Actual:  7 5 6 5 7 4 6 4 10 9 4 4 7 4 6 8 8 8 10 5
Epoch 13, step 15 loss: 1.3879  validation accuracy: 0.53125  0.3989 sec/batch
Sample Grade Predictions: 
Preds:   7 4 6 4 8 4 6 4 8 7 5 4 7 4 6 8 7 8 8 5
Actual:  7 5 6 5 7 4 6 4 10 9 4 4 7 4 6 8 8 8 10 5


---------- Training epoch: 14 ----------
Epoch 14, step 5 loss: 1.3106  validation accuracy: 0.375  0.3919 sec/batch
Sample Grade Predictions: 
Preds:   7 5 7 5 8 4 6 5 8 7 5 5 7 5 7 8 7 8 8 6
Actual:  7 5 6 5 7 4 6 4 10 9 4 4 7 4 6 8 8 8 10 5
Epoch 14, step 10 loss: 1.3413  validation accuracy: 0.46875  0.3932 sec/batch
Sample Grade Predictions: 
Preds:   7 4 6 4 8 4 6 4 8 7 4 4 6 4 6 8 7 7 8 4
Actual:  7 5 6 5 7 4 6 4 10 9 4 4 7 4 6 8 8 8 10 5
Epoch 14, step 15 loss: 1.0459  validation accuracy: 0.46875  0.3943 sec/batch
Sample Grade Predictions: 
Preds:   7 5 6 5 8 4 6 5 8 7 5 5 6

In [26]:
fig = plot_train_loss(train_loss_hist, val_loss_hist, n_batches, model_name)
py.iplot(fig, filename='basic-area')


### Train your own RNN
The LSTM above learns the training set, and its performance on the validation set is comparable. These preliminary results are promising, but changing the hyperparameters can yield even better results. Below is another RNN model definition. Again, many parameters can be modified by the user or left at the values that yielded our best results. The model will be saved to the 'model/' directory of this project.

After many iterations, we found that the following RNN parameters yielded the best results on both the training and validation sets:

In [27]:
# User Defined Parameters

batch_size = 32
cell_type = 'gru'
rnn_size = 256
lr = 1e-3
n_epochs = 20
keep_prob = 1

# Derived Parameters
model_name = cell_type+'_set'+'{}'.format(set)
num_classes = max_score-min_score + 1
n_batches = round(X_train.shape[0]/batch_size)
seq_length = X_train.shape[1]
embed_size = X_train.shape[2]

X_val_t = X_val[:batch_size]
y_val_t = y_val_adj[:batch_size]
batch_gen = get_batches(X_train, y_train_adj, batch_size, net_type=cell_type)

# Pass keep_prob through so that changing it above actually takes effect
rnn_net = RNN(num_classes, batch_size, seq_length, embed_size, cell_type=cell_type,
              rnn_size=rnn_size, learning_rate=lr, train_keep_prob=keep_prob)

In [28]:
print('Training Network...')
train_loss_hist, val_loss_hist = rnn_net.train(batch_gen, X_val_t, y_val_t,\
                                              n_epochs, n_batches, save_every_n=2,\
                                              model_name=model_name)

Training Network...


---------- Training epoch: 1 ----------
Epoch 1, step 5 loss: 2.3286  validation accuracy: 0.28125  0.4493 sec/batch
Best validation accuracy! - Saving Model
Sample Grade Predictions: 
Preds:   8 5 8 5 8 5 5 5 8 8 5 5 8 5 8 8 8 8 8 5
Actual:  7 5 6 5 7 4 6 4 10 9 4 4 7 4 6 8 8 8 10 5
Epoch 1, step 10 loss: 2.2615  validation accuracy: 0.1875  0.4598 sec/batch
Sample Grade Predictions: 
Preds:   5 5 5 5 8 5 5 5 8 5 5 5 5 5 5 8 5 6 8 5
Actual:  7 5 6 5 7 4 6 4 10 9 4 4 7 4 6 8 8 8 10 5
Epoch 1, step 15 loss: 2.1431  validation accuracy: 0.25  0.4661 sec/batch
Sample Grade Predictions: 
Preds:   6 5 5 5 8 5 5 5 8 6 5 5 6 5 5 8 7 8 8 5
Actual:  7 5 6 5 7 4 6 4 10 9 4 4 7 4 6 8 8 8 10 5


---------- Training epoch: 2 ----------
Epoch 2, step 5 loss: 1.6735  validation accuracy: 0.4375  0.4393 sec/batch
Best validation accuracy! - Saving Model
Sample Grade Predictions: 
Preds:   7 5 6 5 8 5 6 5 8 7 5 5 7 5 6 8 8 8 8 5
Actual:  7 5 6 5 7 4 6 4 10 9 4 4 7 4 6 8 8 8 10 5
E

Epoch 13, step 10 loss: 1.4762  validation accuracy: 0.46875  0.4502 sec/batch
Sample Grade Predictions: 
Preds:   6 5 6 4 8 4 6 5 8 6 5 5 6 4 6 8 7 7 8 5
Actual:  7 5 6 5 7 4 6 4 10 9 4 4 7 4 6 8 8 8 10 5
Epoch 13, step 15 loss: 1.1396  validation accuracy: 0.375  0.4629 sec/batch
Sample Grade Predictions: 
Preds:   8 5 7 5 8 4 7 5 8 8 6 5 8 5 7 8 8 8 8 6
Actual:  7 5 6 5 7 4 6 4 10 9 4 4 7 4 6 8 8 8 10 5


---------- Training epoch: 14 ----------
Epoch 14, step 5 loss: 1.2130  validation accuracy: 0.4375  0.4615 sec/batch
Sample Grade Predictions: 
Preds:   7 4 5 4 7 4 5 4 9 7 4 4 7 4 5 7 7 7 9 4
Actual:  7 5 6 5 7 4 6 4 10 9 4 4 7 4 6 8 8 8 10 5
Epoch 14, step 10 loss: 1.3048  validation accuracy: 0.46875  0.4618 sec/batch
Sample Grade Predictions: 
Preds:   7 5 6 5 8 4 6 5 9 7 5 5 7 5 6 9 8 8 9 5
Actual:  7 5 6 5 7 4 6 4 10 9 4 4 7 4 6 8 8 8 10 5
Epoch 14, step 15 loss: 1.2880  validation accuracy: 0.5  0.4631 sec/batch
Sample Grade Predictions: 
Preds:   7 4 6 4 8 4 6 4 8 7 5 4 7 

In [30]:
fig = plot_train_loss(train_loss_hist, val_loss_hist, n_batches, model_name)
py.iplot(fig, filename='basic-area')


## Test the QWK of the trained model
Now we can use essays from the test dataset to obtain a quadratic weighted
kappa (QWK) score for the model. This metric is used to quantify how well
the model predicted the essay scores relative to random guessing. A value
of 0 indicates that the predictions were no better than random guessing,
while a value of 1 indicates perfect matching between predictions and labels.

In [31]:
data_path = './data/test_df.pkl'
test_df = pd.read_pickle(data_path)
df = test_df.loc[test_df['essay_set'] == set]
X_test = np.array(df['essays_embed'])
y_test = np.array(df['domain1_score'])
X_test = np.stack(X_test, axis=0)

print('There are {} testing essays'.format(X_test.shape[0]))
      
if min_score != 0:
    y_test_adj = scores_to_preds(y_test, min_score)
    print('Testing labels shifted from a scale of ({},{}) to ({},{})'\
          .format(min(y_test),max(y_test), min(y_test_adj), max(y_test_adj)))
else:
    print('No score adjustment necessary')
    y_test_adj = y_test

There are 298 testing essays
Testing labels shifted from a scale of (4,12) to (2,10)


In [32]:
batch_size = X_test.shape[0]
seq_length = X_test.shape[1]
embed_size = X_test.shape[2]

pred_net = RNN(num_classes, batch_size, seq_length, embed_size, cell_type=cell_type,
                 rnn_size=rnn_size, learning_rate=lr, train_keep_prob=1)
preds = pred_net.predict('./model/'+model_name, X_test)


INFO:tensorflow:Restoring parameters from ./model/gru_set1
Running network predictions


In [33]:
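# Imported again here so the RNN half can be run without running the MLP half first
from src.utils import quadratic_weighted_kappa
k = quadratic_weighted_kappa(preds[0], y_test_adj, num_classes)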

print('The quadratic weighted kappa score for set {} using {} is : {}'\
     .format(set, model_name, k))

The quadratic weighted kappa score for set 1 using gru_set1 is : 0.6883838173986511


# Results Visualization

In [34]:
sets=['set1','set3','set4','set5','set6']

In [35]:
# Training time in seconds for each model on each essay set
MLP_training_time = [170.3, 25.5, 11.7, 66.4, 33.1]
LSTM_training_time = [157.0, 35.0, 30.0, 39.1, 52.3]
GRU_training_time = [177.3, 31.4, 33.1, 42.1, 55.4]

trace1 = go.Bar(x=sets,y=MLP_training_time,name='MLP')
trace2 = go.Bar(x=sets,y=LSTM_training_time,name='LSTM')
trace3 = go.Bar(x=sets,y=GRU_training_time,name='GRU')

data = [trace1, trace2, trace3]

layout = go.Layout(barmode='group',
              title='Training Times for each Network and Essay Set',
              xaxis=dict(
                  title='Essay Set'),
              yaxis=dict(
                  title='Training Time (s)'),
              showlegend=True)


fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='grouped-bar')



In [37]:
# Quadratic weighted kappa for each model on each essay set
MLP_kappa = [0.725, 0.546, 0.600, 0.626, 0.512]
LSTM_kappa = [0.69, 0.579, 0.551, 0.658, 0.688]
GRU_kappa = [0.69, 0.506, 0.689, 0.664, 0.736]

trace1 = go.Bar(x=sets,y=MLP_kappa,name='MLP')
trace2 = go.Bar(x=sets,y=LSTM_kappa,name='LSTM')
trace3 = go.Bar(x=sets,y=GRU_kappa,name='GRU')

data = [trace1, trace2, trace3]

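layout = go.Layout(barmode='group',
              title='Quadratic Weighted Kappa for each Network and Essay Set',
              xaxis=dict(
                  title='Essay Set'),
              yaxis=dict(
                  title='Quadratic Weighted Kappa'),
              showlegend=True)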

fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='grouped-bar')
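
To compare the models with a single number per model, the kappa values listed above can be averaged across essay sets. The sketch below copies those values into a dictionary (the names `kappas` and `mean_kappa` are illustrative). An unweighted mean is only one possible summary; it ignores differences in the number of essays per set.

```python
import numpy as np

# Values copied from the kappa lists above, in the order set1, set3, set4, set5, set6.
kappas = {
    'MLP':  [0.725, 0.546, 0.600, 0.626, 0.512],
    'LSTM': [0.69, 0.579, 0.551, 0.658, 0.688],
    'GRU':  [0.69, 0.506, 0.689, 0.664, 0.736],
}

mean_kappa = {name: float(np.mean(vals)) for name, vals in kappas.items()}

# Print models from highest to lowest mean kappa.
for name, value in sorted(mean_kappa.items(), key=lambda kv: kv[1], reverse=True):
    print('{:<5} mean QWK: {:.3f}'.format(name, value))
```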