# DCMN Explore Notebook

## Step 1: Load data and see what our problems are like!

In [1]:
import os
from data import read_examples

data_path = './RACE/'
train_high, train_middle = read_examples(os.path.join(data_path, 'train'))
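`read_examples` comes from the project's own `data.py`, which is not shown here. The sketch below shows roughly what such a reader has to do. It assumes the standard RACE layout: `<split>/high/*.txt` and `<split>/middle/*.txt`, where each file holds one JSON object with `article`, `questions`, `options`, `answers` and `id` keys. The name `read_race_split` and the output dict keys are hypothetical, chosen to mirror the fields `print_example` displays. This is not the project's implementation.

```python
import json
import os


def read_race_split(split_dir):
    """Hypothetical RACE reader: returns (high, middle) lists of example dicts."""
    splits = {}
    for level in ('high', 'middle'):
        level_dir = os.path.join(split_dir, level)
        examples = []
        # Sort file names so the example order is the same on every run
        for name in sorted(os.listdir(level_dir)):
            if not name.endswith('.txt'):
                continue
            with open(os.path.join(level_dir, name), encoding='utf-8') as f:
                raw = json.load(f)  # each RACE file is assumed to be one JSON object
            examples.append({
                'id': raw['id'],
                'passage': raw['article'],
                'questions': raw['questions'],
                'options': raw['options'],
                'answers': raw['answers'],
            })
        splits[level] = examples
    return splits['high'], splits['middle']
```

It would be called as `read_race_split(os.path.join(data_path, 'train'))`, the same way as `read_examples` above.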

In [2]:
from utils import print_example

print_example(train_high, 0)

id: 
 high1000.txt
passage: 
 When newspapers and radio describe the damage caused by a hurricane  named Hazel, girls named Hazel are probably teased  by their friends. To keep out of trouble, the Weather Bureau says,"Any _ between hurricane names and the names of particular girls is purely accidental."
Some women became angry because hurricanes are given their names, but many other women are proud to see their names make headlines. They don't even care that they are the names of destructive storms. Because more women seem to like it than dislike it, the Weather Bureau has decided to continue using girl's names for hurricanes.
In some ways a hurricane is like a person. After it is born, it grows and develops, then becomes old and dies. Each hurricane has a character of its own. Each follows its own path through the world, and people remember it long after it gone. So it is natural to give hurricanes' names, and to talk about them almost if they were alive.
questions: 
 ['What happens t

## Step 2: Getting a BERT layer and its corresponding preprocessor
We found BERT-base too large for our hardware, so we tried two smaller BERT variants. Of these, the 2-layer, 128-hidden model (L-2_H-128) also has a `max-seq-length` of only 128 instead of the original 512. Given the length of our passages, we use the larger of the two small models (L-4_H-256) instead.
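One quick way to check the passage-length argument is to count how many passages exceed a given token budget. The sketch below uses whitespace splitting as a stand-in for BERT's WordPiece tokenizer. Each whitespace-separated word becomes at least one WordPiece token, so the reported share is a lower bound on the true share. The helper name `fraction_over_limit` is hypothetical, and special tokens such as [CLS] and [SEP] are ignored.

```python
def fraction_over_limit(passages, limit):
    """Share of passages whose whitespace-token count exceeds `limit`.

    Whitespace words are a lower bound on WordPiece tokens, so the true
    share under BERT tokenization is at least this large.
    """
    if not passages:
        return 0.0
    over = sum(1 for p in passages if len(p.split()) > limit)
    return over / len(passages)
```

For example, running it over the passage texts with `limit=128` and again with `limit=512` shows how much each sequence length would truncate.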

In [3]:
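import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text as text  # registers the ops used by the BERT preprocessing model

# NOTE: `arguments` is not used anywhere in this cell. Calling hub.KerasLayer(preprocess)
# directly packs inputs to the preprocessor's default length of 128 tokens; a length of 512
# would require wrapping preprocess.bert_pack_inputs with arguments=arguments.
arguments = dict(seq_length=512)

preprocess = hub.load('https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3')
bert_preprocess = hub.KerasLayer(preprocess)

# Smaller variant that was also tried (2 layers, hidden size 128):
#bert = hub.load('https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-128_A-2/1')
bert = hub.load('https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-256_A-4/2')
bert_layer = hub.KerasLayer(bert)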
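The sequence length controls how a passage and a question/option pair are packed into one fixed-length BERT input. The following is a simplified pure-Python illustration of that packing. It builds `[CLS] a [SEP] b [SEP]`, trims round-robin (each segment gets one token in turn until the budget runs out), and pads with zeros. It is not the TF Hub implementation. The function name `pack_pair` is hypothetical. The default ids 101/102/0 are assumed to be the [CLS]/[SEP]/[PAD] ids of the uncased BERT vocabulary.

```python
def pack_pair(tokens_a, tokens_b, seq_length, cls_id=101, sep_id=102, pad_id=0):
    """Pack two token-id lists as [CLS] a [SEP] b [SEP], trimmed and padded.

    Returns (input_word_ids, input_mask, input_type_ids), each of length seq_length.
    """
    budget = seq_length - 3  # room left after [CLS] and the two [SEP] tokens
    if budget < 0:
        raise ValueError('seq_length must be at least 3')
    keep_a = keep_b = 0
    # Round-robin: grant one token to each segment in turn while budget remains
    while budget > 0 and (keep_a < len(tokens_a) or keep_b < len(tokens_b)):
        if keep_a < len(tokens_a):
            keep_a += 1
            budget -= 1
        if budget > 0 and keep_b < len(tokens_b):
            keep_b += 1
            budget -= 1
    ids = [cls_id] + tokens_a[:keep_a] + [sep_id] + tokens_b[:keep_b] + [sep_id]
    type_ids = [0] * (keep_a + 2) + [1] * (keep_b + 1)  # segment A incl. [CLS]/[SEP], then B
    mask = [1] * len(ids)  # 1 for real tokens, 0 for padding
    padding = seq_length - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding, type_ids + [0] * padding
```

With `seq_length=128`, for instance, a 300-token passage paired with a 2-token question keeps both question tokens but only 123 of the passage tokens.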

In [18]:
from preprocessing import dcmn_preprocess

sample_size = 1000
sample_data = train_high[:sample_size]
# The next 100 examples serve as a held-out validation set
validation_data = train_high[sample_size:sample_size + 100]

sample_inputs, sample_answers = dcmn_preprocess(sample_data)
valid_inputs, valid_answers = dcmn_preprocess(validation_data)

# One-hot encode the answer letters to match the 4-way softmax classifier
answer_dict = {'A': tf.constant([1, 0, 0, 0]),
               'B': tf.constant([0, 1, 0, 0]),
               'C': tf.constant([0, 0, 1, 0]),
               'D': tf.constant([0, 0, 0, 1])}
sample_answers_encoded = tf.convert_to_tensor([answer_dict[a] for a in sample_answers])
valid_answers_encoded = tf.convert_to_tensor([answer_dict[a] for a in valid_answers])
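`answer_dict` spells out one one-hot row per answer letter. The same encoding can be derived from each letter's position, which also fails loudly on an unexpected label. The pure-Python helper below (`one_hot_answers`, a hypothetical name) is a sketch of that idea. In TensorFlow, the equivalent would be `tf.one_hot` applied to the letter indices.

```python
def one_hot_answers(letters, choices='ABCD'):
    """Encode answer letters such as ['B', 'D'] as one-hot rows over `choices`."""
    rows = []
    for letter in letters:
        index = choices.index(letter)  # raises ValueError for an unknown letter
        rows.append([1 if i == index else 0 for i in range(len(choices))])
    return rows
```

The resulting nested list can be passed to `tf.convert_to_tensor` just like the list built from `answer_dict`.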

## Step 3: Compiling the model and training!

In [7]:
import tensorflow as tf
import numpy as np
from tensorflow.keras.layers import Dense
from model import DCMN, GateLayers

config = {'n_choices': 4,
          'hidden_size': 256,  # must match the BERT encoder's hidden size (H-256 for the model loaded above)
          'dropout': 0.1,
          'bert_preprocess': bert_preprocess,
          'bert': bert_layer,
          'gate_layer': GateLayers,
          'classifier': Dense(4, activation='softmax')}  # one output per answer choice

model = DCMN(config)
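`hidden_size` has to match the encoder, so one way to keep the two in sync is to read the hidden size from the TF Hub handle. The small-BERT handle names encode the number of layers (L), the hidden size (H) and the attention heads (A). The parser below is a hypothetical helper, not part of the project or of TF Hub. It relies only on that naming pattern.

```python
import re


def bert_dims_from_handle(handle):
    """Parse (layers, hidden_size, heads) from a handle like '..._L-4_H-256_A-4/2'."""
    match = re.search(r'L-(\d+)_H-(\d+)_A-(\d+)', handle)
    if match is None:
        raise ValueError(f'no L-/H-/A- pattern in {handle!r}')
    layers, hidden, heads = (int(g) for g in match.groups())
    return layers, hidden, heads
```

For example, `bert_dims_from_handle(handle)[1]` yields 256 for the L-4_H-256 handle loaded in Step 2. That value could be used for `config['hidden_size']` instead of a hard-coded number.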

In [8]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(5e-3),
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=[tf.keras.metrics.CategoricalAccuracy()],
)

model.fit(x=sample_inputs, y=sample_answers_encoded, batch_size=10, epochs=5,
          validation_data=(valid_inputs, valid_answers_encoded))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x1e3d00addf0>
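Each `model.fit` call continues from the current weights, but Keras restarts the epoch counter at 1, which is why every training log in this notebook starts again at `Epoch 1`. Passing `initial_epoch` makes the counter cumulative. In that case Keras treats `epochs` as the index of the final epoch rather than a count. The helper below (hypothetical name `fit_epoch_ranges`) turns a list of per-call epoch counts into those keyword arguments.

```python
def fit_epoch_ranges(epochs_per_call):
    """Convert per-call epoch counts into cumulative fit() keyword arguments.

    Keras trains from epoch index `initial_epoch` up to (but excluding) `epochs`,
    so each call starts where the previous one stopped.
    """
    done = 0
    ranges = []
    for n in epochs_per_call:
        ranges.append({'initial_epoch': done, 'epochs': done + n})
        done += n
    return ranges
```

Each dict can be unpacked into `model.fit(..., **kwargs)`. The progress bar of the second call then starts at, say, `Epoch 6/15` instead of `Epoch 1/10`.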

In [9]:
model.fit(x=sample_inputs, y=sample_answers_encoded, batch_size=10, epochs=10,
          validation_data=(valid_inputs, valid_answers_encoded))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1e3e6c973d0>

In [10]:
model.fit(x=sample_inputs, y=sample_answers_encoded, batch_size=10, epochs=20,
          validation_data=(valid_inputs, valid_answers_encoded))

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x1e3e6c9bb20>

In [11]:
model.fit(x=sample_inputs, y=sample_answers_encoded, batch_size=10, epochs=20,
          validation_data=(valid_inputs, valid_answers_encoded))

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x1e3e71b6fa0>

In [16]:
model.fit(x=sample_inputs, y=sample_answers_encoded, batch_size=10, epochs=20,
          validation_data=(valid_inputs, valid_answers_encoded))

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x1e40d8c6d00>

In [17]:
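# Saves only the weights; to restore, rebuild the model with DCMN(config) and call model.load_weights(chkpt_path)
chkpt_path = './trained_models/dcmn_bert_4_256_epoch_80/'
model.save_weights(chkpt_path)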