<h1><center>READ: Recurrent Encoder Neural Language Model with Hierarchical Attention Decoder Fine-Tuning for Text Classification</center></h1>

This work is inspired by two recent advances in NLP:

1- ULMFiT: transfer learning from a pre-trained language model (LM), fine-tuned on a downstream NLP task

2- HATT: Hierarchical Attention classifier

__What's in READ but not in ULMFiT__
- Hierarchy: reviews are encoded sentence by sentence, which suits sentiment prediction
- Attention


__What's in ULMFiT but not in READ__
- AWD-LSTM
- LM pre-trained on WikiText, then fine-tuned on IMDb
- LRFind
- Bidirectional

__References__
- https://github.com/fastai/fastai/blob/master/courses/dl2/imdb.ipynb
- https://arxiv.org/abs/1801.06146
- https://www.cs.cmu.edu/~hovy/papers/16HLT-hierarchical-attention-networks.pdf



In [1]:
import numpy as np
import pandas as pd
from numpy import array
from collections import defaultdict
import re

from bs4 import BeautifulSoup

import sys
import os
import string
os.environ['KERAS_BACKEND']='tensorflow'

from keras.preprocessing.text import Tokenizer, text_to_word_sequence
from keras.preprocessing.sequence import pad_sequences
from keras.utils.np_utils import to_categorical

from keras.layers import Embedding
from keras.layers import Dense, Input, Flatten
from keras.layers import Conv1D, MaxPooling1D, Embedding, Dropout, LSTM, GRU, Bidirectional, TimeDistributed
from keras.models import Model, Sequential
from keras.callbacks import ModelCheckpoint, Callback
from keras.models import load_model

from keras import backend as K
from keras.engine.topology import Layer, InputSpec
from keras import initializers

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
from pickle import dump

Using TensorFlow backend.


# IMDb data

## Data loading

In [2]:
from pathlib import Path

DATA_PATH=Path('../dat/')
DATA_PATH.mkdir(exist_ok=True)
!curl -O http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz 
!tar -xf aclImdb_v1.tar.gz -C {DATA_PATH}




In [3]:
BOS = 'xbos'  # beginning-of-sentence tag
FLD = 'xfld'  # data field tag
UNK_TOK = '_UNK_'
UNK_ID = 0 # 0 index is reserved for the UNK in both Keras Tokenizer and Embedding

PATH=Path('../dat/aclImdb/')

In [4]:
CLAS_PATH=Path('../dat/imdb_clas/')
CLAS_PATH.mkdir(exist_ok=True)

LM_PATH=Path('../dat/imdb_lm/')
LM_PATH.mkdir(exist_ok=True)

In [5]:
import re
re1 = re.compile(r'  +')
import html

def fixup(x):
    x = x.replace('#39;', "'").replace('amp;', '&').replace('#146;', "'").replace(
        'nbsp;', ' ').replace('#36;', '$').replace('\\n', "\n").replace('quot;', "'").replace(
        '<br />', "\n").replace('\\"', '"').replace('<unk>','u_n').replace(' @.@ ','.').replace(
        ' @-@ ','-').replace('\\', ' \\ ')
    return re1.sub(' ', html.unescape(x)) # Do not lower() so that capitalized words still hold a sentiment
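
The clean-up step can be illustrated on a single string. The sketch below is a reduced, hypothetical `clean_review` helper that applies only a few of the replacements from `fixup`; the sample sentence is made up for illustration.

```python
import html
import re

MULTI_SPACE = re.compile(r'  +')

def clean_review(text):
    """Hypothetical, reduced version of fixup(): only a subset of its replacements."""
    text = text.replace('<br />', '\n').replace(' @-@ ', '-').replace(' @.@ ', '.')
    # html.unescape turns entities such as &amp; back into characters.
    text = html.unescape(text)
    # Collapse runs of two or more spaces into one; case is left untouched.
    return MULTI_SPACE.sub(' ', text)

sample = "Tom &amp; Jerry<br />GREAT   fun, a well @-@ made film"
cleaned = clean_review(sample)
print(repr(cleaned))
```

Capitalisation survives this step, but the Keras tokenizer used later lowercases by default, so the distinction is only preserved if the tokenizer is configured accordingly.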

In [6]:
import numpy as np
CLASSES = ['neg', 'pos']#, 'unsup']

def get_texts(path):
    texts,labels = [],[]
    for idx,label in enumerate(CLASSES):
        for fname in (path/label).glob('*.*'):
            texts.append(fixup(fname.open('r', encoding='utf-8').read()))
            labels.append(idx)
    return np.array(texts),np.array(labels)
    #return texts, labels

trn_texts,trn_labels = get_texts(PATH/'train')
val_texts,val_labels = get_texts(PATH/'test')

In [7]:
len(trn_texts),len(val_texts)

(25000, 25000)

In [8]:
for t in trn_texts[:10]:
  print(t)
  print('\n')

In the autobiographical coming-of-age tale "Romulus, My Father," Eric Bana, of "Munich" fame, plays an impoverished German émigré struggling to raise his son, Raymond (Kodi Smit-McPhee), in rural 1960's Australia. The major obstacle to the family's stability and happiness is his wife, Christina (Franka Potente), who flagrantly violates her wedding vows by shamelessly shacking up with other men. Despite her highly unconventional behavior, Romulus refuses to grant her a divorce, masochistically torturing himself in the vain hope that she will one day return to him. It is, unfortunately, the good-hearted and good-natured Raimond who must bear witness to all this marital turmoil - and it is his memoir that serves as the basis for the movie (Raimond Gaita would later grow up to be an author).

Even though I admire "Romulus, My Father" for what it is trying to do, I can't honestly say I enjoyed it, for while the film has some fine performances and serious intentions going for it, these simpl

In [9]:
np.random.seed(42)
trn_idx = np.random.permutation(len(trn_texts))
val_idx = np.random.permutation(len(val_texts))

In [10]:
trn_texts = trn_texts[trn_idx]
val_texts = val_texts[val_idx]

trn_labels = trn_labels[trn_idx]
val_labels = val_labels[val_idx]
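
Indexing texts and labels with the same permutation keeps every review paired with its label. A minimal sketch with made-up data, using the standard-library `random` module in place of `np.random.permutation`:

```python
import random

# Made-up reviews and labels, for illustration only.
texts = ['bad film', 'great film', 'dull plot', 'loved it']
labels = [0, 1, 0, 1]

rng = random.Random(42)
perm = list(range(len(texts)))
rng.shuffle(perm)  # one permutation, reused for both lists

shuffled_texts = [texts[i] for i in perm]
shuffled_labels = [labels[i] for i in perm]

# The (text, label) pairs are the same set before and after shuffling.
original_pairs = set(zip(texts, labels))
shuffled_pairs = set(zip(shuffled_texts, shuffled_labels))
print(shuffled_pairs == original_pairs)
```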

## Fit tokenizer

In [11]:
from keras.preprocessing.text import Tokenizer, text_to_word_sequence
VOCAB_SIZE = 60000
tokenizer = Tokenizer(num_words=VOCAB_SIZE)  # nb_words is the deprecated name of this argument
tokenizer.fit_on_texts(np.concatenate([trn_texts, val_texts]))



In [12]:
# Insert UNK
tokenizer.word_index[UNK_TOK] = UNK_ID

In [13]:
str2int = tokenizer.word_index
int2str = dict([(value, key) for (key, value) in str2int.items()])
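
How the two dictionaries and the reserved UNK index work together can be sketched with a toy vocabulary. `toy_word_index`, `TOY_VOCAB_SIZE` and `encode` are hypothetical names used only here; the cap uses the same strict `<` comparison as the notebook.

```python
UNK_ID = 0          # same convention as the notebook: index 0 means "unknown"
TOY_VOCAB_SIZE = 4  # hypothetical cap, standing in for VOCAB_SIZE

# Toy word_index in the Keras style: 1-based, most frequent word first.
toy_word_index = {'the': 1, 'film': 2, 'was': 3, 'great': 4, 'masochistically': 5}
toy_index_word = {idx: word for word, idx in toy_word_index.items()}

def encode(words):
    """Map words to ids, sending unseen or out-of-cap words to UNK_ID."""
    ids = []
    for w in words:
        idx = toy_word_index.get(w, UNK_ID)
        ids.append(idx if idx < TOY_VOCAB_SIZE else UNK_ID)
    return ids

ids = encode(['the', 'film', 'was', 'masochistically', 'odd'])
print(ids)
print([toy_index_word.get(i, '_UNK_') for i in ids])
```

Both the rare word and the unseen word collapse to the same id, so they cannot be told apart after encoding.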

# Utils

In [14]:
GLOVE_DIR = "../dat/glove"
os.makedirs(GLOVE_DIR, exist_ok=True)  # create the folder if it does not exist yet
!wget -P {GLOVE_DIR} https://github.com/ahmadelsallab/HierarichalAttentionClassifier_HATT_Sentiment/raw/master/data/glove/glove.6B.100d.txt


--2019-03-19 16:54:12--  https://github.com/ahmadelsallab/HierarichalAttentionClassifier_HATT_Sentiment/raw/master/data/glove/glove.6B.100d.txt
Resolving github.com (github.com)... 140.82.118.3, 140.82.118.4
Connecting to github.com (github.com)|140.82.118.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/ahmadelsallab/HierarichalAttentionClassifier_HATT_Sentiment/master/data/glove/glove.6B.100d.txt [following]
--2019-03-19 16:54:16--  https://raw.githubusercontent.com/ahmadelsallab/HierarichalAttentionClassifier_HATT_Sentiment/master/data/glove/glove.6B.100d.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.240.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.240.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6339063 (6.0M) [text/plain]
Saving to: ‘../dat/glove/glove.6B.100d.txt.1’


2019-03-19 16:54:49 (192 KB/s) - ‘../dat/glove/g

In [15]:

def load_embeddings(embeddings_file):
    # Parse the GloVe text file: each line is a word followed by its vector.
    embeddings_index = {}
    with open(embeddings_file, encoding='utf-8') as f:
        for line in f:
            values = line.split()
            word = values[0]
            coefs = np.asarray(values[1:], dtype='float32')
            embeddings_index[word] = coefs

    print('Total %s word vectors.' % len(embeddings_index))

    # Rows are randomly initialised, so words missing from the embedding
    # index keep their random vector (they are not zeroed).
    embedding_matrix = np.random.random((VOCAB_SIZE+1, EMBEDDING_DIM))
    for word, i in str2int.items():
        if i < VOCAB_SIZE:
            embedding_vector = embeddings_index.get(word)
            if embedding_vector is not None:
                embedding_matrix[i] = embedding_vector
    return embedding_matrix
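
The same idea on a toy scale, as a sketch in plain Python: parse GloVe-style lines, start from a random matrix, and overwrite the rows of words that have a pre-trained vector. The two "GloVe" lines, `TOY_DIM` and `toy_word_index` are invented for illustration.

```python
import random

TOY_DIM = 3  # hypothetical embedding size
toy_glove_lines = [
    'film 0.1 0.2 0.3',
    'great 0.4 0.5 0.6',
]
toy_word_index = {'film': 1, 'great': 2, 'zzyzx': 3}

# Parse "word v1 v2 ..." lines into a dict of float vectors.
vectors = {}
for line in toy_glove_lines:
    parts = line.split()
    vectors[parts[0]] = [float(v) for v in parts[1:]]

rng = random.Random(0)
# One row per index (row 0 is kept for UNK), randomly initialised.
matrix = [[rng.random() for _ in range(TOY_DIM)]
          for _ in range(len(toy_word_index) + 1)]
for word, idx in toy_word_index.items():
    if word in vectors:
        matrix[idx] = vectors[word]  # overwrite with the pre-trained vector

print(matrix[1], matrix[2])
```

The row for `zzyzx` keeps its random initialisation because no pre-trained vector exists for it.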

In [16]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /home/ahmad/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

# Params

In [17]:
LM_DATA_SIZE = 200000
LM_SEQ_LEN = 50
VOCAB_SIZE = 60000
MAX_SENT_LENGTH = LM_SEQ_LEN
MAX_SENTS = 15
MAX_NB_WORDS = VOCAB_SIZE
EMBEDDING_DIM = 100
GLOVE_DIR = "../dat/glove"  # same folder the GloVe file was downloaded to

# NLM

https://machinelearningmastery.com/how-to-develop-a-word-level-neural-language-model-in-keras/

## Data preparation

In [18]:
def prepare_lm_data(in_texts, seq_len, size):
    # Slide a window of seq_len + 1 tokens over the corpus:
    # seq_len input tokens followed by the token to predict.
    length = seq_len + 1
    sequences = list()
    # Only window end positions below `size` are used.
    for i in range(length, min(size, len(in_texts))):
        # select sequence of tokens
        seq = in_texts[i-length:i]
        sequences.append(seq)

    return sequences

In [19]:
def binarize_lm_data(in_texts, tokenizer):
    # Map each token to its index; unknown words and words beyond the
    # VOCAB_SIZE cap become UNK_ID.
    sequences = []
    for t in in_texts:
        words_idx = []
        for w in t:
            idx = tokenizer.word_index.get(w, UNK_ID)
            words_idx.append(idx if idx < VOCAB_SIZE else UNK_ID)
        sequences.append(words_idx)

    return np.array(sequences)

In [20]:

# Note: text_to_word_sequence lowercases and strips punctuation by default
texts = text_to_word_sequence(' '.join(list(trn_texts)))


text_sequences = prepare_lm_data(texts, LM_SEQ_LEN, size=LM_DATA_SIZE)



In [21]:
for t in text_sequences[:10]:
  print(t)

['a', 'man', 'brings', 'his', 'new', 'wife', 'to', 'his', 'home', 'where', 'his', 'former', 'wife', 'died', 'of', 'an', 'accident', 'his', 'new', 'wife', 'has', 'just', 'been', 'released', 'from', 'an', 'institution', 'and', 'is', 'also', 'very', 'rich', 'all', 'of', 'the', 'sudden', 'she', 'starts', 'hearing', 'noises', 'and', 'seeing', 'skulls', 'all', 'over', 'the', 'place', 'is', 'she', 'going', 'crazy']
['man', 'brings', 'his', 'new', 'wife', 'to', 'his', 'home', 'where', 'his', 'former', 'wife', 'died', 'of', 'an', 'accident', 'his', 'new', 'wife', 'has', 'just', 'been', 'released', 'from', 'an', 'institution', 'and', 'is', 'also', 'very', 'rich', 'all', 'of', 'the', 'sudden', 'she', 'starts', 'hearing', 'noises', 'and', 'seeing', 'skulls', 'all', 'over', 'the', 'place', 'is', 'she', 'going', 'crazy', 'again']
['brings', 'his', 'new', 'wife', 'to', 'his', 'home', 'where', 'his', 'former', 'wife', 'died', 'of', 'an', 'accident', 'his', 'new', 'wife', 'has', 'just', 'been', 'releas

In [22]:
sequences = binarize_lm_data(text_sequences, tokenizer)

sz_limit = LM_DATA_SIZE# len(sequences)

# separate into input and output
sequences = array(sequences[:sz_limit])
X, y = sequences[:,:-1], sequences[:,-1]

#y = to_categorical(y, num_classes=vocab_size)

In [23]:
print(X.shape)
print(y.shape)

(199949, 50)
(199949,)
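
The row count follows from the loop bounds: window end positions run from `seq_len + 1` up to, but not including, `size`, which gives 200000 - 51 = 199949 windows. A small sketch of the same windowing on a nine-token sentence (`sliding_windows` is a hypothetical helper mirroring `prepare_lm_data`):

```python
def sliding_windows(tokens, seq_len, size):
    """Windows of seq_len + 1 tokens; mirrors the loop bounds used above."""
    length = seq_len + 1
    stop = min(size, len(tokens))
    return [tokens[i - length:i] for i in range(length, stop)]

tokens = 'a man brings his new wife to his home'.split()
windows = sliding_windows(tokens, seq_len=3, size=8)

# Inputs are all tokens but the last one; the target is the last token.
X = [w[:-1] for w in windows]
y = [w[-1] for w in windows]

print(len(windows))
print(X[0], '->', y[0])
print(200000 - (50 + 1))
```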


## Model

In [24]:
# define model
GLOVE_DIR = "../dat/glove"

embeddings_file = os.path.join(GLOVE_DIR, 'glove.6B.100d.txt')
embedding_matrix = load_embeddings(embeddings_file)

embedding_layer = Embedding(VOCAB_SIZE+1,
                            EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=LM_SEQ_LEN,
                            trainable=True)
sentence_input = Input(shape=(LM_SEQ_LEN,), dtype='int32')
embedded_sequences = embedding_layer(sentence_input)
# Word encoder: bidirectional GRU followed by a per-time-step dense layer
l_gru = Bidirectional(GRU(100, return_sequences=True))(embedded_sequences)
l_word_enc = TimeDistributed(Dense(200))(l_gru)
# LM head: an LSTM summarises the encoded sequence, then a softmax predicts the next word
l_lstm_2 = LSTM(100)(l_word_enc)
# Note: this dense layer is not connected to the output, so it is not part of the model
l_dense = Dense(100, activation='relu')(l_word_enc)
output = Dense(VOCAB_SIZE+1, activation='softmax')(l_lstm_2)
model = Model(sentence_input, output)
model.summary()
# The word encoder shares its layers with the LM, so training the LM trains it too
word_enc = Model(sentence_input, l_word_enc)

# compile model
model.compile(loss='sparse_categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

Total 7396 word vectors.
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 50)                0         
_________________________________________________________________
embedding_1 (Embedding)      (None, 50, 100)           6000100   
_________________________________________________________________
bidirectional_1 (Bidirection (None, 50, 200)           120600    
_________________________________________________________________
time_distributed_1 (TimeDist (None, 50, 200)           40200     
_________________________________________________________________
lstm_1 (LSTM)                (None, 100)               120400    
_________________________________________________________________
dense_3 (Dense)              (None, 60001)             6060101   
Total params: 12,341,401
Trainable params: 12,341,401
Non-trainable params: 0
_______________________________________
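
The parameter counts in the summary can be re-derived by hand. The sketch below assumes the classic GRU and LSTM layouts with a single bias vector per gate (an assumption about the Keras version used here, although it reproduces the printed numbers); the helper names are hypothetical.

```python
def dense_params(n_in, n_out):
    # weight matrix plus bias
    return n_in * n_out + n_out

def gru_params(n_in, units):
    # Assumes the classic GRU layout: 3 gates, one bias vector per gate.
    return 3 * (units * (n_in + units) + units)

def lstm_params(n_in, units):
    # 4 gates, one bias vector per gate.
    return 4 * (units * (n_in + units) + units)

VOCAB_ROWS, EMB_DIM = 60000 + 1, 100

counts = {
    'embedding': VOCAB_ROWS * EMB_DIM,
    'bidirectional_gru': 2 * gru_params(EMB_DIM, 100),
    'time_distributed_dense': dense_params(200, 200),
    'lstm': lstm_params(200, 100),
    'softmax_dense': dense_params(100, VOCAB_ROWS),
}
for name, n in counts.items():
    print(f'{name:24s}{n:>12,}')
print('total', f'{sum(counts.values()):,}')
```

The embedding and the softmax layer each hold about six million weights, so the vocabulary size dominates the model.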

In [25]:
# Mount GDrive
#from google.colab import drive
#drive.mount('/content/gdrive')
#gdrive_path = 'gdrive/My Drive'
gdrive_path = '../dat'

In [26]:
# save the model to file
lm_model_file_name = 'lm_model.h5'
word_enc_model_file_name = 'word_enc_model.h5'


In [73]:
load_prev_model = True
filepath = os.path.join(gdrive_path, lm_model_file_name)
if load_prev_model and os.path.exists(filepath):
  model = load_model(filepath)  
  #word_enc = load_model(os.path.join(gdrive_path, word_enc_model_file_name))  
  print('Existing LM loaded')
  

Existing LM loaded


In [28]:
import os
class SaveWordEncoder(Callback):
    # Save the word encoder after every epoch so the classifier can reuse it.
    def on_epoch_end(self, epoch, logs=None):
        word_enc.save(os.path.join(gdrive_path, word_enc_model_file_name))
        

word_enc_model_cbk = SaveWordEncoder()


In [74]:
checkpoint = ModelCheckpoint(filepath, monitor='acc', verbose=1, save_best_only=True, mode='max')

In [75]:
callbacks_lst = [word_enc_model_cbk, checkpoint]

In [31]:
'''
import keras.backend as K
K.clear_session()
model = load_model(filepath)
'''


In [76]:

# fit model
model.fit(X, y, batch_size=128, epochs=100, callbacks=callbacks_lst)



Epoch 1/100

Epoch 00001: acc improved from -inf to 0.25485, saving model to ../dat/lm_model.h5
Epoch 2/100

Epoch 00002: acc improved from 0.25485 to 0.25983, saving model to ../dat/lm_model.h5
Epoch 3/100

Epoch 00003: acc improved from 0.25983 to 0.26414, saving model to ../dat/lm_model.h5
Epoch 4/100

Epoch 00004: acc improved from 0.26414 to 0.26828, saving model to ../dat/lm_model.h5
Epoch 5/100

Epoch 00005: acc improved from 0.26828 to 0.27248, saving model to ../dat/lm_model.h5
Epoch 6/100

Epoch 00006: acc improved from 0.27248 to 0.27787, saving model to ../dat/lm_model.h5
Epoch 7/100

Epoch 00007: acc improved from 0.27787 to 0.28275, saving model to ../dat/lm_model.h5
Epoch 8/100

Epoch 00008: acc improved from 0.28275 to 0.28610, saving model to ../dat/lm_model.h5
Epoch 9/100

Epoch 00009: acc improved from 0.28610 to 0.28974, saving model to ../dat/lm_model.h5
Epoch 10/100

Epoch 00010: acc improved from 0.28974 to 0.29076, saving model to ../dat/lm_model.h5
Epoch 11/100


Epoch 00045: acc did not improve from 0.36287
Epoch 46/100

Epoch 00046: acc did not improve from 0.36287
Epoch 47/100

Epoch 00047: acc did not improve from 0.36287
Epoch 48/100

Epoch 00048: acc did not improve from 0.36287
Epoch 49/100

Epoch 00049: acc improved from 0.36287 to 0.36397, saving model to ../dat/lm_model.h5
Epoch 50/100

Epoch 00050: acc improved from 0.36397 to 0.36654, saving model to ../dat/lm_model.h5
Epoch 51/100

Epoch 00051: acc improved from 0.36654 to 0.37172, saving model to ../dat/lm_model.h5
Epoch 52/100

Epoch 00052: acc improved from 0.37172 to 0.37452, saving model to ../dat/lm_model.h5
Epoch 53/100

Epoch 00053: acc did not improve from 0.37452
Epoch 54/100

Epoch 00054: acc improved from 0.37452 to 0.37587, saving model to ../dat/lm_model.h5
Epoch 55/100

Epoch 00055: acc did not improve from 0.37587
Epoch 56/100

Epoch 00056: acc did not improve from 0.37587
Epoch 57/100

Epoch 00057: acc did not improve from 0.37587
Epoch 58/100

Epoch 00058: acc di


Epoch 00094: acc did not improve from 0.39596
Epoch 95/100

Epoch 00095: acc did not improve from 0.39596
Epoch 96/100

Epoch 00096: acc improved from 0.39596 to 0.39625, saving model to ../dat/lm_model.h5
Epoch 97/100

Epoch 00097: acc did not improve from 0.39625
Epoch 98/100

Epoch 00098: acc did not improve from 0.39625
Epoch 99/100

Epoch 00099: acc did not improve from 0.39625
Epoch 100/100

Epoch 00100: acc did not improve from 0.39625


<keras.callbacks.History at 0x7faa084a7a90>

In [33]:
word_enc.save(os.path.join(gdrive_path, word_enc_model_file_name))

# HATT

## Data preparation

In [34]:
from nltk import tokenize
def prepare_hier_data(in_texts, in_labels):
    # Split every labelled review into a list of sentences.
    reviews = []
    labels = []

    for idx in range(len(in_texts)):
        text = in_texts[idx]
        label = in_labels[idx]
        if label != 2:  # label 2 marks unlabelled ('unsup') reviews
            sentences = tokenize.sent_tokenize(text)
            reviews.append(sentences)
            labels.append(label)
    return reviews, labels

In [35]:

from keras.utils.np_utils import to_categorical

def binarize_hier_data(reviews, labels, tokenizer):
    data_lst = []
    labels_lst = []
    for i, sentences in enumerate(reviews):
        # One (MAX_SENTS, MAX_SENT_LENGTH) grid per review, initialised to UNK
        data = UNK_ID * np.ones((MAX_SENTS, MAX_SENT_LENGTH), dtype='int32')
        for j, sent in enumerate(sentences):
            if j < MAX_SENTS:
                wordTokens = text_to_word_sequence(sent)
                k = 0
                for word in wordTokens:
                    # Words beyond the vocabulary cap are skipped, not written as UNK
                    if word in tokenizer.word_index:
                        if k < MAX_SENT_LENGTH and tokenizer.word_index[word] < MAX_NB_WORDS:
                            data[j,k] = tokenizer.word_index[word]
                            k = k+1
        data_lst.append(data)
        labels_lst.append(labels[i])
    data = np.array(data_lst)
    targets = to_categorical(np.asarray(labels_lst))
    return data, targets

In [77]:
train_texts_, train_labels_ = prepare_hier_data(trn_texts, trn_labels)
train_data, train_targets = binarize_hier_data(train_texts_, train_labels_, tokenizer)


In [78]:
# Only 25k of the 75k training reviews are used, because 50k are unsup --> label = 2
print('Shape of data tensor:', train_data.shape)
print('Shape of label tensor:', train_targets.shape)

Shape of data tensor: (25000, 15, 50)
Shape of label tensor: (25000, 2)
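
Each review thus becomes a fixed grid of word ids, one row per sentence. The sketch below builds such a grid for one made-up review; the regex sentence splitter is a crude stand-in for nltk's `sent_tokenize`, and `toy_word_index`, `TOY_MAX_SENTS` and `TOY_MAX_LEN` are hypothetical.

```python
import re

UNK_ID = 0
TOY_MAX_SENTS, TOY_MAX_LEN = 3, 4  # stand-ins for MAX_SENTS / MAX_SENT_LENGTH
toy_word_index = {'i': 1, 'loved': 2, 'it': 3, 'the': 4,
                  'cast': 5, 'was': 6, 'superb': 7}

def naive_sentences(text):
    # Crude stand-in for nltk's sent_tokenize: split after ., ! or ?
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def to_grid(text):
    # Start from an all-UNK grid, then fill known words left to right.
    grid = [[UNK_ID] * TOY_MAX_LEN for _ in range(TOY_MAX_SENTS)]
    for j, sent in enumerate(naive_sentences(text)[:TOY_MAX_SENTS]):
        known = [toy_word_index[w]
                 for w in re.findall(r"[a-z']+", sent.lower())
                 if w in toy_word_index]
        for k, idx in enumerate(known[:TOY_MAX_LEN]):
            grid[j][k] = idx
    return grid

grid = to_grid('I loved it. The cast was superb and funny!')
for row in grid:
    print(row)
```

Short sentences and missing sentences are padded with the UNK id, and long sentences are truncated, which is why every review ends up with the same shape.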


## Split train/val

In [79]:

VALIDATION_SPLIT = 0.2

indices = np.arange(train_data.shape[0])
np.random.shuffle(indices)
train_data = train_data[indices]
train_targets = train_targets[indices]
nb_validation_samples = int(VALIDATION_SPLIT * train_data.shape[0])

x_train = train_data[:-nb_validation_samples]
y_train = train_targets[:-nb_validation_samples]
x_val = train_data[-nb_validation_samples:]
y_val = train_targets[-nb_validation_samples:]

print('Number of negative and positive reviews in training and validation set')
print(y_train.sum(axis=0))
print(y_val.sum(axis=0))

Number of negative and positive reviews in training and validation set
[10012.  9988.]
[2488. 2512.]


## Model

In [80]:
from keras.models import load_model
def load_word_enc_model(word_enc_model_file_name):
    # Reload the word encoder that was saved during LM training.
    word_enc_model = load_model(word_enc_model_file_name)
    print(word_enc_model.summary())
    return word_enc_model

In [87]:

word_enc_model = load_word_enc_model(os.path.join(gdrive_path, word_enc_model_file_name))


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 50)                0         
_________________________________________________________________
embedding_1 (Embedding)      (None, 50, 100)           6000100   
_________________________________________________________________
bidirectional_1 (Bidirection (None, 50, 200)           120600    
_________________________________________________________________
time_distributed_1 (TimeDist (None, 50, 200)           40200     
Total params: 6,160,900
Trainable params: 6,160,900
Non-trainable params: 0
_________________________________________________________________
None




In [88]:

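embeddings_file_name = os.path.join(GLOVE_DIR, 'glove.6B.100d.txt')

# building Hierarchical Attention network
embedding_matrix = load_embeddings(embeddings_file_name)

# Not used below: the pre-trained word encoder already contains its own embedding.
# The row count must match embedding_matrix, which has VOCAB_SIZE+1 rows.
embedding_layer = Embedding(VOCAB_SIZE+1,
                            EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=MAX_SENT_LENGTH,
                            trainable=True)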

class AttLayer(Layer):
    # Attention pooling: scores each time step, normalises the scores with a
    # softmax over time, and returns the weighted sum of the inputs.
    def __init__(self, **kwargs):
        super(AttLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        assert len(input_shape)==3
        # Create a trainable weight variable for this layer.
        self.W = self.add_weight(name='kernel', 
                                      shape=(input_shape[-1], 1),
                                      initializer='uniform',
                                      trainable=True)
        super(AttLayer, self).build(input_shape)  # Be sure to call this at the end
    
    def call(self, x, mask=None):
        # eij: (batch, steps, 1), one unnormalised score per time step
        eij = K.tanh(K.dot(x, self.W))
        
        # Softmax over the time axis
        ai = K.exp(eij)
        weights = ai/K.sum(ai, axis=1, keepdims=True)
        
        weighted_input = x*weights
        return K.sum(weighted_input, axis=1)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[-1])

sentence_input = Input(shape=(MAX_SENT_LENGTH,), dtype='int32')
# Reuse the word encoder pre-trained with the LM instead of a fresh embedding + GRU stack
l_dense = word_enc_model(sentence_input)
l_att = AttLayer()(l_dense)
sentEncoder = Model(sentence_input, l_att)

review_input = Input(shape=(MAX_SENTS,MAX_SENT_LENGTH), dtype='int32')
review_encoder = TimeDistributed(sentEncoder)(review_input)
l_lstm_sent = Bidirectional(GRU(100, return_sequences=True))(review_encoder)
l_dense_sent = TimeDistributed(Dense(200))(l_lstm_sent)
l_att_sent = AttLayer()(l_dense_sent)
preds = Dense(2, activation='softmax')(l_att_sent)
model = Model(review_input, preds)

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])

Total 7396 word vectors.
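
The computation inside `AttLayer` can be traced on a tiny example without Keras. The sketch below is plain Python for a single sequence (no batch axis); `attention_pool` and the numbers are made up for illustration.

```python
import math

def attention_pool(x, w):
    """x: list of T vectors of size D; w: list of D weights. Returns (pooled, alphas)."""
    # Score each time step: e_t = tanh(x_t . w)
    scores = [math.tanh(sum(xi * wi for xi, wi in zip(x_t, w))) for x_t in x]
    # Softmax over the time axis turns the scores into weights that sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Weighted sum of the time steps gives one vector of size D.
    dim = len(x[0])
    pooled = [sum(a * x_t[d] for a, x_t in zip(alphas, x)) for d in range(dim)]
    return pooled, alphas

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three time steps, two features
w = [0.5, -0.5]
pooled, alphas = attention_pool(x, w)
print(alphas, pooled)
```

The first time step aligns best with `w` and therefore receives the largest weight. In the model, the same pooling is applied once over words and once over sentences.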


In [89]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_17 (InputLayer)        (None, 15, 50)            0         
_________________________________________________________________
time_distributed_16 (TimeDis (None, 15, 200)           6161100   
_________________________________________________________________
bidirectional_10 (Bidirectio (None, 15, 200)           180600    
_________________________________________________________________
time_distributed_17 (TimeDis (None, 15, 200)           40200     
_________________________________________________________________
att_layer_14 (AttLayer)      (None, 200)               200       
_________________________________________________________________
dense_19 (Dense)             (None, 2)                 402       
Total params: 6,382,502
Trainable params: 221,602
Non-trainable params: 6,160,900
____________________________________________________________
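
The split between trainable and non-trainable parameters in this summary can be checked by hand: with the word encoder frozen, everything except its 6,160,900 weights is trainable. The sketch assumes the classic GRU layout with one bias vector per gate; the variable names are hypothetical.

```python
def dense_params(n_in, n_out):
    return n_in * n_out + n_out

def gru_params(n_in, units):
    # Assumes the classic GRU layout: 3 gates, one bias vector per gate.
    return 3 * (units * (n_in + units) + units)

word_encoder = 6_160_900                    # taken from the word encoder summary
word_attention = 200                        # one weight per encoder feature
sentence_bigru = 2 * gru_params(200, 100)   # bidirectional GRU over sentences
sentence_dense = dense_params(200, 200)
sentence_attention = 200
classifier = dense_params(200, 2)

total = (word_encoder + word_attention + sentence_bigru
         + sentence_dense + sentence_attention + classifier)
trainable_when_frozen = total - word_encoder
print(f'{total:,} total, {trainable_when_frozen:,} trainable with the encoder frozen')
```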

In [65]:
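hatt_model = 'hatt_model.h5'
load_prev_model = False
filepath = os.path.join(gdrive_path, hatt_model)
if load_prev_model and os.path.exists(filepath):
    # The custom layer must be passed explicitly, otherwise load_model
    # fails with "ValueError: Unknown layer: AttLayer"
    model = load_model(filepath, custom_objects={'AttLayer': AttLayer})
    print('HATT model loaded')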

In [90]:
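# Checkpoint to the HATT file explicitly: if cells are run out of order, `filepath`
# may still point at the LM checkpoint (the logs below show lm_model.h5 being overwritten)
checkpoint = ModelCheckpoint(os.path.join(gdrive_path, hatt_model), monitor='val_acc', verbose=1, save_best_only=True, mode='max')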

In [91]:
callbacks_lst = [checkpoint]

In [None]:
word_enc_model.trainable = False
model.summary()
# Must call compile for trainable to take effect
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])

In [92]:

NUM_EPOCHS = 100
BATCH_SIZE = 50
print("model fitting - Hierarchical attention network")
model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=NUM_EPOCHS, batch_size=BATCH_SIZE, callbacks=callbacks_lst)

model fitting - Hierarchical attention network


Train on 20000 samples, validate on 5000 samples
Epoch 1/100

Epoch 00001: val_acc improved from -inf to 0.53960, saving model to ../dat/lm_model.h5
Epoch 2/100

Epoch 00002: val_acc improved from 0.53960 to 0.55420, saving model to ../dat/lm_model.h5
Epoch 3/100

Epoch 00003: val_acc did not improve from 0.55420
Epoch 4/100
 2450/20000 [==>...........................] - ETA: 25s - loss: 0.6823 - acc: 0.5567

KeyboardInterrupt: 

In [100]:
word_enc_model.trainable = True
model.summary()
# Must call compile for trainable to take effect
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_17 (InputLayer)        (None, 15, 50)            0         
_________________________________________________________________
time_distributed_16 (TimeDis (None, 15, 200)           6161100   
_________________________________________________________________
bidirectional_10 (Bidirectio (None, 15, 200)           180600    
_________________________________________________________________
time_distributed_17 (TimeDis (None, 15, 200)           40200     
_________________________________________________________________
att_layer_14 (AttLayer)      (None, 200)               200       
_________________________________________________________________
dense_19 (Dense)             (None, 2)                 402       
Total params: 221,602
Trainable params: 221,602
Non-trainable params: 0
_________________________________________________________________


  'Discrepancy between trainable weights and collected trainable'


In [101]:

NUM_EPOCHS = 100
BATCH_SIZE = 50
print("model fitting - Hierarchical attention network")
model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=NUM_EPOCHS, batch_size=BATCH_SIZE, callbacks=callbacks_lst)

model fitting - Hierarchical attention network


Train on 20000 samples, validate on 5000 samples
Epoch 1/100

Epoch 00001: val_acc improved from 0.59380 to 0.79940, saving model to ../dat/lm_model.h5
Epoch 2/100

Epoch 00002: val_acc improved from 0.79940 to 0.81240, saving model to ../dat/lm_model.h5
Epoch 3/100

Epoch 00003: val_acc improved from 0.81240 to 0.82140, saving model to ../dat/lm_model.h5
Epoch 4/100

Epoch 00004: val_acc improved from 0.82140 to 0.85120, saving model to ../dat/lm_model.h5
Epoch 5/100

Epoch 00005: val_acc improved from 0.85120 to 0.85380, saving model to ../dat/lm_model.h5
Epoch 6/100

Epoch 00006: val_acc improved from 0.85380 to 0.86220, saving model to ../dat/lm_model.h5
Epoch 7/100

Epoch 00007: val_acc did not improve from 0.86220
Epoch 8/100

Epoch 00008: val_acc did not improve from 0.86220
Epoch 9/100

KeyboardInterrupt: 

# Test

In [None]:
test_texts_, test_labels_ = prepare_hier_data(val_texts, val_labels)
test_data, test_targets = binarize_hier_data(test_texts_, test_labels_, tokenizer)

In [None]:

print('Shape of data tensor:', test_data.shape)
print('Shape of label tensor:', test_targets.shape)

In [None]:
for i, rev in enumerate(test_texts_):
    print(rev)
    test_input = test_data[i].copy()
    test_input = np.reshape(test_input, (1,test_input.shape[0], test_input.shape[1]))
    prediction = model.predict(test_input)
    print('Prediction: ', prediction)
    sentiment = np.argmax(prediction)
    print('Sentiment: ' + str(sentiment))
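
Printing individual predictions does not give an overall score. One way to summarise them, sketched here with made-up softmax outputs and one-hot targets rather than the real `model.predict` results, is to compare the arg-max class with the target class:

```python
def argmax(values):
    # Index of the largest value in a list.
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical softmax outputs and one-hot targets for four reviews.
toy_predictions = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
toy_targets = [[1, 0], [0, 1], [0, 1], [0, 1]]

predicted = [argmax(p) for p in toy_predictions]
actual = [argmax(t) for t in toy_targets]
accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(actual)
print(predicted, accuracy)
```

With the real data, `toy_predictions` would be replaced by the model output for `test_data` and `toy_targets` by `test_targets`.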