# DKT Model

This notebook trains a Deep Knowledge Tracing (DKT) model on ASSISTments data (or, as in the run shown here, a synthetic data set) and then tests the model.

In [72]:
# Modules
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn import metrics
import random
import math

Read in the data we pre-processed. It contains each student's action sequence.

In [73]:
# Limit the data size
skill_cut = 150        # limit skill amounts
student_cut = 5000    # limit sequences

#dataset = pd.read_csv("Assistments/assistment_for_dkt.csv")
dataset = pd.read_csv("synthetic/set_10.csv")

dataset = dataset[dataset['skill'] < skill_cut]
print dataset.columns
num_records = len(dataset)
num_skills = len(dataset['skill'].value_counts())
num_actions = 2 * num_skills    # action: every skill correct/wrong
num_labels = num_skills + 1     # one-hot question, plus one bit for correct/wrong
num_students = len(dataset['student'].value_counts())
print str(num_records) + " problem records"
print str(num_skills) + " skills"
print str(num_students) + " students"
print str(np.sum(dataset['correct'].values)) + " correct answers"

Index([u'student', u'skill', u'correct'], dtype='object')
200000 problem records
5 skills
4000 students
121560 correct answers
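To make the two encodings in the cell above concrete, here is a minimal sketch. It assumes the action index is `2 * skill + correct`, which is how the `DataGenerator` later in this notebook fills its input vectors. The helper names `encode_action` and `encode_label` are hypothetical and are not used elsewhere in the notebook.

```python
import numpy as np

def encode_action(skill, correct, num_skills):
    """One-hot input vector of length 2 * num_skills (hypothetical helper)."""
    vec = np.zeros(2 * num_skills, dtype=np.float32)
    vec[2 * skill + correct] = 1.0
    return vec

def encode_label(skill, correct, num_skills):
    """One-hot skill plus a final correct/wrong bit, length num_skills + 1."""
    vec = np.zeros(num_skills + 1, dtype=np.float32)
    vec[skill] = 1.0
    vec[num_skills] = correct
    return vec

# With 5 skills (as in the synthetic set), a correct answer on skill 3:
action = encode_action(3, 1, 5)   # single 1 at index 2 * 3 + 1 = 7
label = encode_label(3, 1, 5)     # 1 at index 3 (skill) and at index 5 (correct)
```

The input vector tells the network what the student just did, while the label vector tells it which skill is asked next and whether the answer was right.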


The following LSTM is based on the one in the Udacity assignment. Its structure is the one introduced in this [article](http://colah.github.io/posts/2015-08-Understanding-LSTMs/).
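As a rough illustration of that structure, the sketch below runs a single LSTM time step in plain NumPy. It mirrors the gate layout used in the TensorFlow graph later in this notebook (input, forget, update, and output gates, each with an input weight, a recurrent weight, and a bias), but the function `lstm_step`, the `params` dictionary, and the toy sizes are assumptions made for this example only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step for a batch (illustrative sketch).

    x:       [batch, num_inputs]  current input
    h_prev:  [batch, num_hidden]  previous output
    c_prev:  [batch, num_hidden]  previous cell state
    params:  gate name -> (W_x, W_h, b)
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return x.dot(w_x) + h_prev.dot(w_h) + b

    input_gate = sigmoid(pre_activation("input"))
    forget_gate = sigmoid(pre_activation("forget"))
    update = np.tanh(pre_activation("update"))
    output_gate = sigmoid(pre_activation("output"))

    c = forget_gate * c_prev + input_gate * update   # new cell state
    h = output_gate * np.tanh(c)                     # new output
    return h, c

# Toy sizes, chosen only for the example.
rng = np.random.RandomState(0)
num_inputs, num_hidden, batch = 10, 4, 2
params = {}
for name in ("input", "forget", "update", "output"):
    params[name] = (rng.normal(0, 0.001, (num_inputs, num_hidden)),
                    rng.normal(0, 0.001, (num_hidden, num_hidden)),
                    np.zeros((1, num_hidden)))

x = np.zeros((batch, num_inputs))
x[0, 7] = 1.0   # one-hot action for the first student
x[1, 2] = 1.0   # one-hot action for the second student
h, c = lstm_step(x, np.zeros((batch, num_hidden)), np.zeros((batch, num_hidden)), params)
```

Calling `lstm_step` repeatedly, feeding each returned `h` and `c` back in, unrolls the network over a sequence.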

In [74]:
# Hyper parameters to Tune
num_hidden = 200
init_mean = 0
init_stddev = 0.001
# batch_size sequences, with the length of time_window
batch_size = 100
time_window = 50
# Training
# We use the Adam optimizer with its default settings, so there is no optimizer hyperparameter to set here.
clipping_norm = 2
dropout_keep = 1.0

train_ratio = 0.5

#### Assistments Tuning
|set| num hidden | init mean | init stddev | batch size |time window|clipping norm|Dropout| AUC    | Overfit After |
|:-:|:----------:|:---------:|:-----------:|:----------:|:---------:|:-----------:|:-----:|:------:|:-------------:|
| 1 |     200    |     0     |    0.001    |     50     |    50     |     10      |   1   | 0.8152 | epoch 8       |
| 2 |     200    |     0     |    0.001    |    100     |    50     |     10      |   1   | 0.8172 | epoch 9       |
| 3 |     200    |     0     |    0.001    |    100     |    50     |      5      |   1   | 0.8173 | epoch 9       |
| 4 |     200    |     0     |    0.001    |    100     |    50     |      2      |   1   | 0.8177 | epoch 8       | 
| 5 |     200    |   0.01    |    0.001    |    100     |    50     |      2      |   1   | 0.8152 | epoch 10      |
| 6 |     200    |     0     |    0.001    |    100     |   100     |      2      |   1   | 0.8169 | epoch 19      |
| 7 |     200    |     0     |    0.001    |    100     |    50     |      2      |  0.5  | 0.8174 | epoch 13      | 
| 8 |     200    |     0     |    0.001    |    100     |    50     |      2      |  0.5  | 0.8250 | epoch 15      | 
| 9 |     200    |     0     |    0.001    |    100     |    50     |      2      |  0.2  | 0.8185 | epoch 20+     |
|10 |     200    |     0     |    0.001    |    100     |    50     |      2      |   1   | 0.8248 | epoch 8       |

*Sets 1-7 use 60% of the data for training and the rest for testing.*

*Sets 8 and later use 80% of the data for training and the rest for testing (as Piech did).*

So far, none of the hyperparameters seems to have a major influence on performance, so we will probably leave them as they are. Note that we are using the default AdamOptimizer and have not tuned it at all.

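Padding seems to matter, since batch size and time window (which together determine how much padding is added) both affect training. A larger time window converges more slowly and appears less prone to overfitting: in the table, Set 6 (time window 100) peaks at epoch 19, while Set 4 (time window 50) peaks at epoch 8.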
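One way to see why padding can matter: in the graph defined later in this notebook, a padded step has an all-zero question one-hot, so its logit of interest is 0, and its target is also 0. The sketch below, which uses only the standard library, shows that such a step contributes a constant ln 2 to the binary cross entropy and therefore pulls the averaged loss toward ln 2 as padding grows. The function names and the toy loss values are assumptions for illustration.

```python
import math

def sigmoid_cross_entropy(logit, target):
    """Numerically stable binary cross entropy on a raw logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

# A padded step: logit 0, target 0.
padded_loss = sigmoid_cross_entropy(0.0, 0.0)   # equals ln 2

def mean_loss(real_losses, num_padded):
    """Mean over real and padded steps, as a plain mean over all entries would give."""
    total = sum(real_losses) + num_padded * padded_loss
    return total / (len(real_losses) + num_padded)

real = [0.5, 0.5, 0.5, 0.5]        # toy losses for four real steps
few_pads = mean_loss(real, 1)      # little padding
many_pads = mean_loss(real, 12)    # heavy padding
```

Because the reported loss is a mean over real and padded entries alike, losses from runs with different batch sizes or time windows are not directly comparable.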

A drop in AUC over a single epoch does not necessarily mean that the model has overfitted. However, our model seems to overfit after only about 10 epochs, so we need regularization tricks such as dropout.
In this table, the Dropout column gives keep_prob. According to Set 7, dropout does help to delay overfitting, yet it does not seem to improve performance.
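Since a single-epoch drop is not conclusive, one common option is to stop only after the test AUC has failed to improve for several epochs in a row. The sketch below applies such a patience rule to the Set 1 test AUCs from the Output Log at the end of this notebook (rounded to six decimals). The function `early_stop` and the patience value are assumptions; the notebook itself does not implement early stopping, and the "Overfit After" column only appears to coincide with the epoch of the highest test AUC.

```python
def early_stop(aucs, patience):
    """Return (best_epoch, stop_epoch) for a list of per-epoch test AUCs.

    Training would stop once `patience` consecutive epochs fail to beat
    the best AUC seen so far, or at the last epoch if that never happens.
    """
    best_epoch, waited = 0, 0
    for epoch in range(1, len(aucs)):
        if aucs[epoch] > aucs[best_epoch]:
            best_epoch, waited = epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, len(aucs) - 1

# Test AUCs of Set 1, epochs 0 to 10, copied from the Output Log.
set1_auc = [0.769160, 0.792353, 0.802843, 0.809475, 0.813102, 0.814559,
            0.814911, 0.815194, 0.815226, 0.814296, 0.810077]

best, stopped = early_stop(set1_auc, patience=2)
```

For Set 1 this picks epoch 8 as the best epoch, which matches the "Overfit After" entry in the table, and would stop training at epoch 10.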


#### Synthetic Tuning
The results below are for the synthetic data. We use 50% of the data for training and the rest for testing (as Mozer's work did).

In [75]:
# LSTM Model
graph = tf.Graph()
with graph.as_default():
    # Parameters: _x for new input, _m for old output, _b for bias
    # Input gate
    input_x = tf.Variable(tf.truncated_normal([num_actions, num_hidden], init_mean, init_stddev))
    input_m = tf.Variable(tf.truncated_normal([num_hidden, num_hidden], init_mean, init_stddev))
    input_b = tf.Variable(tf.zeros([1, num_hidden]))
    # Forget gate
    forget_x = tf.Variable(tf.truncated_normal([num_actions, num_hidden], init_mean, init_stddev))
    forget_m = tf.Variable(tf.truncated_normal([num_hidden, num_hidden], init_mean, init_stddev))
    forget_b = tf.Variable(tf.zeros([1, num_hidden]))
    # Update cell:                             
    update_x = tf.Variable(tf.truncated_normal([num_actions, num_hidden], init_mean, init_stddev))
    update_m = tf.Variable(tf.truncated_normal([num_hidden, num_hidden], init_mean, init_stddev))
    update_b = tf.Variable(tf.zeros([1, num_hidden]))
    # Output gate:
    output_x = tf.Variable(tf.truncated_normal([num_actions, num_hidden], init_mean, init_stddev))
    output_m = tf.Variable(tf.truncated_normal([num_hidden, num_hidden], init_mean, init_stddev))
    output_b = tf.Variable(tf.zeros([1, num_hidden]))
    # Variables saving state across the sequence (length: time_window).
    saved_output = tf.Variable(tf.zeros([batch_size, num_hidden]), trainable=False)
    saved_state = tf.Variable(tf.zeros([batch_size, num_hidden]), trainable=False)
    # Classifier weights and biases.
    classify_w = tf.Variable(tf.truncated_normal([num_hidden, num_skills], init_mean, init_stddev))
    classify_b = tf.Variable(tf.zeros([num_skills]))
  
    def lstm_train_cell(i, o, state):
        # input, last/saved_output, last/saved_state
        input_gate = tf.sigmoid(tf.nn.dropout(tf.matmul(i, input_x), dropout_keep) + tf.matmul(o, input_m) + input_b)
        forget_gate = tf.sigmoid(tf.nn.dropout(tf.matmul(i, forget_x), dropout_keep) + tf.matmul(o, forget_m) + forget_b)
        update = tf.tanh(tf.nn.dropout(tf.matmul(i, update_x), dropout_keep) + tf.matmul(o, update_m) + update_b)
        state = forget_gate * state + input_gate * update
        output_gate = tf.sigmoid(tf.nn.dropout(tf.matmul(i, output_x), dropout_keep) + tf.matmul(o, output_m) + output_b)
        # return new_output, new_state
        return output_gate * tf.tanh(state), state

    # Input data.
    inputs = list()
    question_labels = list()
    action_labels = list()    # only when training
    for _ in range(time_window):
        inputs.append(tf.placeholder(tf.float32, shape=[batch_size, num_actions]))
        question_labels.append(tf.placeholder(tf.float32, shape=[batch_size, num_skills]))
        action_labels.append(tf.placeholder(tf.float32, shape=[batch_size, ]))
    
    # State resets when starting a new sequence
    reset_state = tf.group(saved_output.assign(tf.zeros([batch_size, num_hidden])),
                           saved_state.assign(tf.zeros([batch_size, num_hidden])))
    
    outputs = list()
    output = saved_output
    state = saved_state
    for i in inputs:
        output, state = lstm_train_cell(i, output, state)
        outputs.append(output)

    # Save the state across consecutive segments (windows) of a sequence
    with tf.control_dependencies([saved_output.assign(output), saved_state.assign(state)]):
        logits = tf.nn.xw_plus_b(tf.concat(0, outputs), classify_w, classify_b)
        # logits of the actual encountered problem:
        logits_of_interest = tf.reduce_sum(tf.mul(logits, tf.concat(0, question_labels)), 1)
        truth = tf.reshape(tf.concat(0, action_labels), [-1])    # flatten
        # binary cross entropy: each padded step has logit 0 and target 0, so it adds a constant ln(2)
        loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits_of_interest, truth))
    
    optimizer = tf.train.AdamOptimizer()
    gradients, var = zip(*optimizer.compute_gradients(loss))
    gradients, _ = tf.clip_by_global_norm(gradients, clipping_norm)
    optimizer = optimizer.apply_gradients(zip(gradients, var))
    
    prediction = tf.sigmoid(logits_of_interest)
    
    # Testing
    def lstm_test_cell(i, o, state):
        # no dropout
        input_gate = tf.sigmoid(tf.matmul(i, input_x) + tf.matmul(o, input_m) + input_b)
        forget_gate = tf.sigmoid(tf.matmul(i, forget_x) + tf.matmul(o, forget_m) + forget_b)
        update = tf.tanh(tf.matmul(i, update_x) + tf.matmul(o, update_m) + update_b)
        state = forget_gate * state + input_gate * update
        output_gate = tf.sigmoid(tf.matmul(i, output_x) + tf.matmul(o, output_m) + output_b)
        return output_gate * tf.tanh(state), state
    
    test_outputs = list()
    test_output = saved_output
    test_state = saved_state
    for i in inputs:
        test_output, test_state = lstm_test_cell(i, test_output, test_state)
        test_outputs.append(test_output)

    with tf.control_dependencies([saved_output.assign(test_output), saved_state.assign(test_state)]):
        test_logits = tf.nn.xw_plus_b(tf.concat(0, test_outputs), classify_w, classify_b)
        test_logits_of_interest = tf.reduce_sum(tf.mul(test_logits, tf.concat(0, question_labels)), 1)
    
    test_status = tf.sigmoid(test_logits)
    test_prediction = tf.sigmoid(test_logits_of_interest)

Generating input sequences for the LSTM is a bit complicated. The general idea is to first take a batch of students and pad their sequences to the same length. We then feed the LSTM one "window" (time interval) at a time.
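The padding and windowing idea can be sketched on plain lists as follows. This is a simplified illustration, not the generator used in this notebook: it pads at the front (the generator fills its arrays from the back, which has the same effect), uses `None` as a stand-in for an all-zero row, and leaves out the one-hot encoding and the one-step shift between inputs and labels. The function name `pad_and_window` is hypothetical.

```python
import math

def pad_and_window(seqs, time_window, pad=None):
    """Front-pad sequences to a common multiple of time_window, then cut windows.

    Returns windows[w][t][b]: the record of student b at step t of window w.
    """
    maxlen = max(len(s) for s in seqs)
    num_window = int(math.ceil(float(maxlen) / time_window))
    total = num_window * time_window
    # Real records are aligned to the end; padding goes in front.
    padded = [[pad] * (total - len(s)) + list(s) for s in seqs]

    windows = []
    for w in range(num_window):
        frames = []
        for t in range(time_window):
            frames.append([p[w * time_window + t] for p in padded])
        windows.append(frames)
    return windows

# Two students with 5 and 2 records, and a window of 3 time steps.
seqs = [[1, 2, 3, 4, 5], [6, 7]]
windows = pad_and_window(seqs, time_window=3)
# windows[0] == [[None, None], [1, None], [2, None]]
# windows[1] == [[3, None], [4, 6], [5, 7]]
```

Each window is fed to the network in one run, and the saved LSTM state carries over from one window of a batch to the next.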

In [76]:
class DataGenerator(object):
    def __init__(self, dataset, train_ratio):
        # convert file to sequence
        dataset = dataset.values
        seqs = list()
        last_student = -1
        print dataset.shape
        for i in range(len(dataset)):
            if dataset[i][0] != last_student:    # a new student
                last_student = dataset[i][0]
                seqs.append([(dataset[i][1], dataset[i][2])])  # (skill, correct)
            else:     # same student
                seqs[-1].append((dataset[i][1], dataset[i][2]))
        del dataset
        
        tot_seqs = min(len(seqs), student_cut)
        print "total: %d sequences" % tot_seqs
        
        # split train and test
        train_size = int(tot_seqs * train_ratio)
        train_seq_cnt = 0
        self._train_seqs = list()
        for i in range(train_size):
            self._train_seqs.append(seqs[i])
            train_seq_cnt += len(seqs[i])
        test_seq_cnt = 0
        self._test_seqs = list()
        for i in range(train_size, tot_seqs):
            self._test_seqs.append(seqs[i])
            test_seq_cnt += len(seqs[i])
        print "%d records for train" % train_seq_cnt
        print "%d records for test" % test_seq_cnt
        self._tot_train_record = train_seq_cnt
        self._tot_test_record = test_seq_cnt
        
        # takes around 2GB memory:
        self._train_inputs = []
        self._train_labels = []
        self.generate_batch(self._train_seqs, self._train_inputs, self._train_labels)
        
        self._test_inputs = []
        self._test_labels = []
        self.generate_batch(self._test_seqs, self._test_inputs, self._test_labels)
        
        print "all batch generated"
        
        self._train_cursor = -1
        self._test_cursor = -1
        
    def get_train_batch_num(self):
        return len(self._train_inputs)
    
    def get_test_batch_num(self):
        return len(self._test_inputs)
    
    def get_train_batch(self):
        self._train_cursor += 1
        if self._train_cursor == len(self._train_inputs):
            self._train_cursor = 0
        return self._train_inputs[self._train_cursor], self._train_labels[self._train_cursor]
    
    def get_test_batch(self):
        self._test_cursor += 1
        if self._test_cursor == len(self._test_inputs):
            self._test_cursor = 0
        return self._test_inputs[self._test_cursor], self._test_labels[self._test_cursor]
    
    def generate_batch(self, seqs_pool, inputs, labels):
        seq_count = len(seqs_pool)
        num_batch = int(math.ceil(float(seq_count) / batch_size))
        correct_cnt = 0
        for start in range(0, seq_count, batch_size):            
            end = min(seq_count, start + batch_size)
            maxlen = 0
            for i in range(start, end):
                if maxlen < len(seqs_pool[i]):
                    maxlen = len(seqs_pool[i])
            num_window = int(math.ceil(float(maxlen) / time_window))
            
            # setup empty data (i.e., padded with full 0s)
            inputs.append([])
            labels.append([])
            for _ in range(num_window):
                inputs[-1].append([])
                labels[-1].append([])
                for _ in range(time_window):
                    inputs[-1][-1].append(np.zeros([batch_size, num_actions], dtype=np.float32))
                    labels[-1][-1].append(np.zeros([batch_size, num_labels], dtype=np.float32))
            
            # fill in data
            for i in range(start, end):
                pos_in_batch = i - start    # position in batch
                seq = seqs_pool[i]
                # from back to front
                for back_offset in range(1, len(seq) + 1):
                    # find the row of the record
                    window_offset = - int(math.ceil(float(back_offset) / time_window))
                    frame_offset = - back_offset % time_window
                    if frame_offset == 0:
                        frame_offset = - time_window
                    # code the record by setting ones
                    record = seq[- back_offset]
                    labels[-1][window_offset][frame_offset][pos_in_batch][record[0]] = 1
                    labels[-1][window_offset][frame_offset][pos_in_batch][num_skills] = record[1]
                    
                    input_back_offset = back_offset - 1    # shift by one time step: the record at step t is the input at step t + 1
                    if input_back_offset == 0:
                        continue
                    input_window_offset = - int(math.ceil(float(input_back_offset) / time_window))
                    input_frame_offset = - input_back_offset % time_window
                    if input_frame_offset == 0:
                        input_frame_offset = - time_window
                    inputs[-1][input_window_offset][input_frame_offset][pos_in_batch][2 * record[0] + record[1]] = 1

The following session trains and runs the LSTM.

In [77]:
# Running Specifications
num_epochs = 50
test_frequency = 1
output_frequency = 3

data_generator = DataGenerator(dataset, train_ratio)

(200000, 3)
total: 4000 sequences
100000 records for train
100000 records for test
all batch generated


In [78]:
with tf.Session(graph=graph) as session:
    # Initialize
    tf.initialize_all_variables().run()
    mean_loss = 0
    for epoch in range(num_epochs):
        pred_all = []
        truth_all = []
        for batch_no in range(data_generator.get_train_batch_num()):
            batch_inputs, batch_labels = data_generator.get_train_batch()
            reset_state.run()    # new sequence
            for input_window, label_window in zip(batch_inputs, batch_labels):
                
                feed_dict = dict()
                for i in range(time_window):
                    feed_dict[inputs[i]] = input_window[i]
                    feed_dict[question_labels[i]] = label_window[i][:, 0:num_skills]
                    feed_dict[action_labels[i]] = label_window[i][:, num_skills]
                
                _, l, pred = session.run([optimizer, loss, prediction], feed_dict=feed_dict)
                mean_loss += l
                label_all = np.concatenate(label_window, axis=0)
                # Exclude padded actions
                for i in range(len(pred)):
                    if np.sum(label_all[i]) != 0:
                        pred_all.append(pred[i])
                        truth_all.append(label_all[i][num_skills])
        
        assert len(pred_all) == data_generator._tot_train_record
        assert len(truth_all) == data_generator._tot_train_record
        print "epoch " + str(epoch) + ": loss = " + str(mean_loss)
        print "Train AUC = " + str(metrics.roc_auc_score(truth_all, pred_all))
        mean_loss = 0
        
        if epoch % test_frequency == 0:
            pred_all = []
            truth_all = []
            for batch_no in range(data_generator.get_test_batch_num()):
                batch_inputs, batch_labels = data_generator.get_test_batch()
                reset_state.run()
                for input_window, label_window in zip(batch_inputs, batch_labels):
                    feed_dict = dict()
                    for i in range(time_window):
                        feed_dict[inputs[i]] = input_window[i]
                        feed_dict[question_labels[i]] = label_window[i][:, 0:num_skills]
                        feed_dict[action_labels[i]] = np.zeros([batch_size, ])      # No need to give the target
                    
                    pred = test_prediction.eval(feed_dict)
                    label_all = np.concatenate(label_window, axis=0)
                    # Exclude padded actions
                    for i in range(len(pred)):
                        if np.sum(label_all[i]) != 0:
                            pred_all.append(pred[i])
                            truth_all.append(label_all[i][num_skills])
            assert len(pred_all) == data_generator._tot_test_record
            assert len(truth_all) == data_generator._tot_test_record
            print "Test AUC = " + str(metrics.roc_auc_score(truth_all, pred_all)) + "    "
            
            if epoch % output_frequency == 0:
                pred_action = file("prediction@epoch_" + str(epoch) + '.csv', 'w')
                pred_action.write('pred,truth\n')
                for i in range(len(pred_all)):
                    pred_action.write(str(pred_all[i]) + ',' + str(truth_all[i]) + '\n')
                pred_action.flush()
                pred_action.close()

epoch 0: loss = 13.6827976704
Train AUC = 0.515000529531
Test AUC = 0.584912453392    
epoch 1: loss = 13.1435196996
Train AUC = 0.600681757112
Test AUC = 0.612999938733    
epoch 2: loss = 12.9146568179
Train AUC = 0.633367989542
Test AUC = 0.64319886649    
epoch 3: loss = 12.7293676734
Train AUC = 0.651073460989
Test AUC = 0.65011028235    
epoch 4: loss = 12.6690826416
Train AUC = 0.652727859873
Test AUC = 0.64929161729    
epoch 5: loss = 12.6387900114
Train AUC = 0.65455916542
Test AUC = 0.651058855692    
epoch 6: loss = 12.6218128204
Train AUC = 0.655307708842
Test AUC = 0.651616834941    
epoch 7: loss = 12.6067184806
Train AUC = 0.656217494866
Test AUC = 0.652427379945    
epoch 8: loss = 12.5932086706
Train AUC = 0.656948762235
Test AUC = 0.653249392922    
epoch 9: loss = 12.5792400837
Train AUC = 0.657703359308
Test AUC = 0.65422925813    
epoch 10: loss = 12.5626182556
Train AUC = 0.658566439879
Test AUC = 0.655813939798    
epoch 11: loss = 12.5413085818
Train AUC = 0.65

#### Output Log
[Set 1]  
epoch 0: loss = 567.766662002    Test AUC = 0.769160009625    
epoch 1: loss = 561.678450704    Test AUC = 0.792352762107    
epoch 2: loss = 559.633854687    Test AUC = 0.802843348663    
epoch 3: loss = 558.392439961    Test AUC = 0.809475230732    
epoch 4: loss = 557.570638657    Test AUC = 0.813102115054    
epoch 5: loss = 556.914604604    Test AUC = 0.814558511321    
epoch 6: loss = 556.38486594     Test AUC = 0.814910906558    
epoch 7: loss = 555.848721504    Test AUC = 0.815193798888    
epoch 8: loss = 555.357498288    Test AUC = 0.815226028808    
epoch 9: loss = 554.890223265    Test AUC = 0.814296198834    
epoch 10: loss = 554.397933245   Test AUC = 0.81007721421    

[Set 2]  
epoch 0: loss = 332.910510778
Test AUC = 0.674394648502    
epoch 1: loss = 330.410343766
Test AUC = 0.769532576628    
epoch 2: loss = 328.604476213
Test AUC = 0.789874732514    
epoch 3: loss = 327.884168208
Test AUC = 0.799412435034    
epoch 4: loss = 327.37021488
Test AUC = 0.80563279165    
epoch 5: loss = 326.957753122
Test AUC = 0.809847454097    
epoch 6: loss = 326.599114776
Test AUC = 0.814033516832    
epoch 7: loss = 326.286964297
Test AUC = 0.816342685584    
epoch 8: loss = 326.063906908
Test AUC = 0.816886560374    
epoch 9: loss = 325.838874578
Test AUC = 0.817185123433    
epoch 10: loss = 325.609891176
Test AUC = 0.817061722639    
epoch 11: loss = 325.378716528
Test AUC = 0.816330687684    
epoch 12: loss = 325.160521328
Test AUC = 0.815502604237  

[Set 3]   
epoch 0: loss = 332.835750043
Test AUC = 0.718647725378    
epoch 1: loss = 330.070697308
Test AUC = 0.777552152737    
epoch 2: loss = 328.405154288
Test AUC = 0.790903209804    
epoch 3: loss = 327.849017143
Test AUC = 0.800270027838    
epoch 4: loss = 327.271425068
Test AUC = 0.807289911952    
epoch 5: loss = 326.908729076
Test AUC = 0.811421508002    
epoch 6: loss = 326.619960666
Test AUC = 0.813599784297    
epoch 7: loss = 326.336533487
Test AUC = 0.815815462967    
epoch 8: loss = 326.063607395
Test AUC = 0.817161858418    
epoch 9: loss = 325.825604141
Test AUC = 0.81731912728    
epoch 10: loss = 325.611338079
Test AUC = 0.816776815415    
epoch 11: loss = 325.375428975
Test AUC = 0.815700919001    
epoch 12: loss = 325.164224088
Test AUC = 0.814964095224    
epoch 13: loss = 324.97730583
Test AUC = 0.81452376418   

[Set 4] *Current Best Performance!*       
epoch 0: loss = 332.829924166
Test AUC = 0.664068842513    
epoch 1: loss = 330.335905254
Test AUC = 0.773178666242    
epoch 2: loss = 328.584171534
Test AUC = 0.789058629209    
epoch 3: loss = 327.841807842
Test AUC = 0.800505674067    
epoch 4: loss = 327.341259539
Test AUC = 0.806546229115    
epoch 5: loss = 326.919404149
Test AUC = 0.810932691401    
epoch 6: loss = 326.596299648
Test AUC = 0.813966169368    
epoch 7: loss = 326.293343902
Test AUC = 0.816576452397    
epoch 8: loss = 326.05825007
Test AUC = 0.817669991178    
epoch 9: loss = 325.900897801
Test AUC = 0.817252342778    
epoch 10: loss = 325.680461705
Test AUC = 0.817112690406    
epoch 11: loss = 325.424844742
Test AUC = 0.816096301841 

[Set 5]
Not as good, so the log was not copied.

[Set 6]
epoch 0: loss = 169.199989498
Test AUC = 0.694248942971    
epoch 1: loss = 168.180951178
Test AUC = 0.758151402947    
epoch 2: loss = 167.458352327
Test AUC = 0.776980935197    
epoch 3: loss = 167.064126432
Test AUC = 0.78581568835    
epoch 4: loss = 166.804977894
Test AUC = 0.792739664357    
epoch 5: loss = 166.616363168
Test AUC = 0.79761515851    
epoch 6: loss = 166.471256316
Test AUC = 0.802961889688    
epoch 7: loss = 166.332275212
Test AUC = 0.806399543644    
epoch 8: loss = 166.206695676
Test AUC = 0.80953400805    
epoch 9: loss = 166.08298558
Test AUC = 0.811898844887    
epoch 10: loss = 166.017994046
Test AUC = 0.811594235753    
epoch 11: loss = 165.930454969
Test AUC = 0.814531147132    
epoch 12: loss = 165.844239295
Test AUC = 0.81505496762    
epoch 13: loss = 165.779064178
Test AUC = 0.815770047319    
epoch 14: loss = 165.706398845
Test AUC = 0.816363040141    
epoch 15: loss = 165.610882163
Test AUC = 0.816819624032    
epoch 16: loss = 165.540887356
Test AUC = 0.816358639571    
epoch 17: loss = 165.505313337
Test AUC = 0.816286493099    
epoch 18: loss = 165.440673649
Test AUC = 0.816875850513    
epoch 19: loss = 165.342892051
Test AUC = 0.816913585871    
epoch 20: loss = 165.273579121
Test AUC = 0.815895911042    
epoch 21: loss = 165.200304389
Test AUC = 0.815416721379    
epoch 22: loss = 165.138483346
Test AUC = 0.814756179908 

[Set 7] dropout    
epoch 0: loss = 332.870533764
Test AUC = 0.691292566637    
epoch 1: loss = 330.289458454
Test AUC = 0.776335239547    
epoch 2: loss = 328.70829308
Test AUC = 0.789275108075    
epoch 3: loss = 327.995635629
Test AUC = 0.799685036618    
epoch 4: loss = 327.551552176
Test AUC = 0.805448125492    
epoch 5: loss = 327.15153569
Test AUC = 0.809385595509    
epoch 6: loss = 326.881620884
Test AUC = 0.812742414623    
epoch 7: loss = 326.648236334
Test AUC = 0.814646328776    
epoch 8: loss = 326.412108302
Test AUC = 0.81654314133    
epoch 9: loss = 326.326880515
Test AUC = 0.816486054169    
epoch 10: loss = 326.117108941
Test AUC = 0.816923694333    
epoch 11: loss = 325.971155286
Test AUC = 0.816806584182    
epoch 12: loss = 325.817107141
Test AUC = 0.816793382851    
epoch 13: loss = 325.739503741
Test AUC = 0.817429746999    
epoch 14: loss = 325.577210069
Test AUC = 0.816653405995    
epoch 15: loss = 325.47225219
Test AUC = 0.815456828841    
epoch 16: loss = 325.354509473
Test AUC = 0.814910316196  

[Set 8]   More Training Data      
epoch 0: loss = 442.598678648
Test AUC = 0.714518576918    
epoch 1: loss = 438.640891492
Test AUC = 0.784920791465    
epoch 2: loss = 436.994219601
Test AUC = 0.796291819461    
epoch 3: loss = 436.168569326
Test AUC = 0.80236383621    
epoch 4: loss = 435.593809307
Test AUC = 0.811327798837    
epoch 5: loss = 435.098297596
Test AUC = 0.815025850644    
epoch 6: loss = 434.73207444
Test AUC = 0.818647892153    
epoch 7: loss = 434.444538713
Test AUC = 0.819475078187    
epoch 8: loss = 434.154552817
Test AUC = 0.823203870896    
epoch 9: loss = 434.021283925
Test AUC = 0.823570042129    
epoch 10: loss = 433.785658121
Test AUC = 0.824657541134    
epoch 11: loss = 433.603366315
Test AUC = 0.823718093871    
epoch 12: loss = 433.42709285
Test AUC = 0.824673245364    
epoch 13: loss = 433.285766006
Test AUC = 0.824757491193    
epoch 14: loss = 433.117334008
Test AUC = 0.824185458514    
epoch 15: loss = 432.980303049
Test AUC = 0.825087745808    
epoch 16: loss = 432.827882051
Test AUC = 0.823600250167    
epoch 17: loss = 432.721944571
Test AUC = 0.822377446865    
epoch 18: loss = 432.531647742
Test AUC = 0.822054276727    
epoch 19: loss = 432.439513385
Test AUC = 0.822682079411    
epoch 20: loss = 432.288996994
Test AUC = 0.821480176982    
epoch 21: loss = 432.167292416
Test AUC = 0.821338178869    
epoch 22: loss = 431.995352268
Test AUC = 0.820607282061    
epoch 23: loss = 431.919500768
Test AUC = 0.819470860341    

[Set 9]   
epoch 0: loss = 442.436189771
Test AUC = 0.753455622017    
epoch 1: loss = 438.8392694
Test AUC = 0.777074121609    
epoch 2: loss = 437.405785859
Test AUC = 0.78850172991    
epoch 3: loss = 436.760501921
Test AUC = 0.795736220818    
epoch 4: loss = 436.279414892
Test AUC = 0.800484407744    
epoch 5: loss = 435.988939106
Test AUC = 0.801635255998    
epoch 6: loss = 435.712391376
Test AUC = 0.806506550885    
epoch 7: loss = 435.445997953
Test AUC = 0.809259519829    
epoch 8: loss = 435.241165936
Test AUC = 0.810997590907    
epoch 9: loss = 435.003818512
Test AUC = 0.813937628272    
epoch 10: loss = 434.881675243
Test AUC = 0.814369496473    
epoch 11: loss = 434.726149261
Test AUC = 0.81523300481    
epoch 12: loss = 434.620957196
Test AUC = 0.815983006419    
epoch 13: loss = 434.479012251
Test AUC = 0.816051364493    
epoch 14: loss = 434.409355283
Test AUC = 0.816258395236    
epoch 15: loss = 434.268797874
Test AUC = 0.818516805702    
epoch 16: loss = 434.227459311
Test AUC = 0.817730055713    
epoch 17: loss = 434.108650446
Test AUC = 0.817608971351    
epoch 18: loss = 434.019179344
Test AUC = 0.816177514025    
epoch 19: loss = 433.954689682
Test AUC = 0.817096863738    
epoch 20: loss = 433.933077633
Test AUC = 0.817612623597    
epoch 21: loss = 433.827710927
Test AUC = 0.818286525829    
epoch 22: loss = 433.756803811
Test AUC = 0.817414341931  

[Set 10]    
epoch 0: loss = 442.226243258
Train AUC = 0.639660286102
Test AUC = 0.759812569788    
epoch 1: loss = 438.143710256
Train AUC = 0.760106085819
Test AUC = 0.786862651556    
epoch 2: loss = 436.413987935
Train AUC = 0.787821990708
Test AUC = 0.802491135186    
epoch 3: loss = 435.555881798
Train AUC = 0.800702039632
Test AUC = 0.809645335167    
epoch 4: loss = 434.980609298
Train AUC = 0.808693899847
Test AUC = 0.816356077532    
epoch 5: loss = 434.456144571
Train AUC = 0.816129886448
Test AUC = 0.819674311283    
epoch 6: loss = 434.063976288
Train AUC = 0.821731286341
Test AUC = 0.82338996213    
epoch 7: loss = 433.697606802
Train AUC = 0.82697030444
Test AUC = 0.82482351228    
epoch 8: loss = 433.43130964
Train AUC = 0.830989103501
Test AUC = 0.824790458538    
epoch 9: loss = 433.12830174
Train AUC = 0.835590257196
Test AUC = 0.824643925436    
epoch 10: loss = 432.815392196
Train AUC = 0.840240887547
Test AUC = 0.823530914905 
