## Neural part of speech tagging. 
This is an optional assignment. Turn back whilst thou still can
We're now going to solve the same problem of POS tagging with neural networks.
From a deep learning perspective, this is the task of predicting a sequence of outputs aligned to a sequence of inputs. There are several problems that match this formulation:
 * Part of Speech Tagging  - an auxiliary task for many NLP problems
 * Named Entity Recognition - for chat bots and web crawlers
 * Protein structure prediction - for bioinformatics

In [3]:
import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
# enable GPU memory growth so TensorFlow does not reserve all GPU memory up front
if gpus:
    tf.config.experimental.set_memory_growth(gpus[0], True)

In [2]:
import nltk
import sys
import numpy as np

nltk.download('brown')
nltk.download('universal_tagset')
data = nltk.corpus.brown.tagged_sents(tagset='universal')
all_tags = ['#EOS#','#UNK#','ADV', 'NOUN', 'ADP', 'PRON', 'DET', '.', 'PRT', 'VERB', 'X', 'NUM', 'CONJ', 'ADJ']

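# sentences have different lengths, so recent NumPy versions need an explicit object dtype
data = np.array([ [(word.lower(),tag) for word,tag in sentence] for sentence in data ], dtype=object)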

[nltk_data] Downloading package brown to
[nltk_data]     C:\Users\Vera_Romantsova\AppData\Roaming\nltk_data...
[nltk_data]   Package brown is already up-to-date!
[nltk_data] Downloading package universal_tagset to
[nltk_data]     C:\Users\Vera_Romantsova\AppData\Roaming\nltk_data...
[nltk_data]   Package universal_tagset is already up-to-date!


In [3]:
from sklearn.model_selection import train_test_split
train_data, test_data = train_test_split(data,test_size=0.25,random_state=42)

In [4]:
from IPython.display import HTML, display
def draw(sentence):
    words,tags = zip(*sentence)
    display(HTML('<table><tr>{tags}</tr><tr>{words}</tr></table>'.format(
                words = '<td>{}</td>'.format('</td><td>'.join(words)),
                tags = '<td>{}</td>'.format('</td><td>'.join(tags)))))
    
    
draw(data[11])
draw(data[10])
draw(data[7])

draw(data[11]) tags: NOUN ADP NOUN NOUN NOUN NOUN VERB ADV VERB ADP DET ADJ NOUN .
draw(data[10]) tags: PRON VERB ADP DET NOUN . VERB NOUN PRT VERB . DET NOUN .
draw(data[7]) tags: NOUN VERB


In [5]:
data[11]

[('implementation', 'NOUN'),
 ('of', 'ADP'),
 ("georgia's", 'NOUN'),
 ('automobile', 'NOUN'),
 ('title', 'NOUN'),
 ('law', 'NOUN'),
 ('was', 'VERB'),
 ('also', 'ADV'),
 ('recommended', 'VERB'),
 ('by', 'ADP'),
 ('the', 'DET'),
 ('outgoing', 'ADJ'),
 ('jury', 'NOUN'),
 ('.', '.')]

### Building vocabularies

Just like before, we have to build a mapping from tokens to integer ids. This time around, our model operates on a word level,
processing one word per RNN step. This means we'll have to deal with a far larger vocabulary.
Luckily for us, we only receive those words as input, i.e. we don't have to predict them. This means we can have a large vocabulary
for free by using word embeddings.

In [6]:
from collections import Counter
word_counts = Counter()
for sentence in data:
    words,tags = zip(*sentence)
    word_counts.update(words)

all_words = ['#EOS#','#UNK#'] + list(list(zip(*word_counts.most_common(10000)))[0])

#let's measure what fraction of data words are in the dictionary

print("Coverage = %.5f" % (float(sum(word_counts[w] for w in all_words)) / sum(word_counts.values())))

Coverage = 0.92876


In [7]:
from collections import defaultdict
word_to_id = defaultdict(lambda:1, { word: i for i, word in enumerate(all_words) })
tag_to_id = { tag: i for i, tag in enumerate(all_tags)}
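
The mapping above can be illustrated on a toy corpus. This is a minimal sketch, not the notebook's actual data: `toy_sentences`, `toy_vocab`, `toy_word_to_id` and `encode` are hypothetical names, and the vocabulary is capped at 3 words instead of 10000 so that the `#UNK#` fallback is visible.

```python
from collections import Counter

# Toy corpus (hypothetical); each sentence is a list of lowercase tokens.
toy_sentences = [
    ["the", "jury", "said", "the", "law", "was", "fair"],
    ["the", "law", "was", "recommended"],
]

EOS, UNK = "#EOS#", "#UNK#"

counts = Counter(tok for sent in toy_sentences for tok in sent)

# Two special tokens first, then the 3 most frequent words.
toy_vocab = [EOS, UNK] + [w for w, _ in counts.most_common(3)]
toy_word_to_id = {w: i for i, w in enumerate(toy_vocab)}
UNK_ID = toy_word_to_id[UNK]


def encode(tokens):
    """Map tokens to ids; out-of-vocabulary tokens fall back to UNK_ID."""
    return [toy_word_to_id.get(tok, UNK_ID) for tok in tokens]


# Fraction of corpus tokens covered by the vocabulary.
coverage = sum(counts[w] for w in toy_vocab) / sum(counts.values())

encoded = encode(["the", "law", "is", "fair"])
```

One design note: indexing a `defaultdict` with an unseen key stores that key as a side effect, whereas `dict.get` with a default leaves the mapping unchanged. Both give the same ids.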

Convert words and tags into a fixed-size matrix:

In [8]:
def to_matrix(lines, token_to_id, max_len=None, pad=0, dtype='int32', time_major=False):
    """Converts a list of token sequences into an RNN-digestible matrix, padded after the end of each sequence
    """
    max_len = max_len or max(map(len,lines))
    matrix = np.empty([len(lines), max_len],dtype)
    matrix.fill(pad)

    for i in range(len(lines)):
        line_ix = list(map(token_to_id.__getitem__,lines[i]))[:max_len]
        matrix[i,:len(line_ix)] = line_ix

    return matrix.T if time_major else matrix
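
The `max_len` and `time_major` arguments are easier to see on toy input. The sketch below is a self-contained re-implementation working directly on integer ids; `pad_to_matrix` and `toy_ids` are hypothetical names, not part of the notebook.

```python
import numpy as np


def pad_to_matrix(sequences, max_len=None, pad=0, time_major=False):
    """Pad (or truncate) integer sequences to a common length."""
    max_len = max_len or max(len(seq) for seq in sequences)
    matrix = np.full((len(sequences), max_len), pad, dtype="int32")
    for i, seq in enumerate(sequences):
        trimmed = seq[:max_len]
        matrix[i, :len(trimmed)] = trimmed
    return matrix.T if time_major else matrix


toy_ids = [[5, 6, 7], [8], [9, 10, 11, 12]]

batch_major = pad_to_matrix(toy_ids)                   # shape (3, 4), padded with 0
truncated = pad_to_matrix(toy_ids, max_len=2)          # long rows are cut to 2 ids
time_first = pad_to_matrix(toy_ids, time_major=True)   # shape (4, 3)
```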

In [9]:
batch_words, batch_tags = zip(*[zip(*sentence) for sentence in data[-3:]])

print("Word ids:")
print(to_matrix(batch_words, word_to_id))
print("Tag ids:")
print(to_matrix(batch_tags, tag_to_id))

Word ids:
[[   2 3057    5    2 2238 1334 4238 2454    3    6   19   26 1070   69
     8 2088    6    3    1    3  266   65  342    2    1    3    2  315
     1    9   87  216 3322   69 1558    4    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0]
 [  45   12    8  511 8419    6   60 3246   39    2    1    1    3    2
   845    1    3    1    3   10 9910    2    1 3470    9   43    1    1
     3    6    2 1046  385   73 4562    3    9    2    1    1 3250    3
    12   10    2  861 5240   12    8 8936  121    1    4]
 [  33   64   26   12  445    7 7346    9    8 3337    3    1 2811    3
     2  463  572    2    1    1 1649   12    1    4    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0]]
Tag ids:
[[ 6  3  4  6  3  3  9  9  7 12  4  5  9  4  6  3 12  7  9  7  9  8  4  6
   3  7  6 13  3  4  6  3  9  4  3  7  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0

In [10]:
data[-3:]

array([list([('the', 'DET'), ('doors', 'NOUN'), ('of', 'ADP'), ('the', 'DET'), ('d', 'NOUN'), ('train', 'NOUN'), ('slid', 'VERB'), ('shut', 'VERB'), (',', '.'), ('and', 'CONJ'), ('as', 'ADP'), ('i', 'PRON'), ('dropped', 'VERB'), ('into', 'ADP'), ('a', 'DET'), ('seat', 'NOUN'), ('and', 'CONJ'), (',', '.'), ('exhaling', 'VERB'), (',', '.'), ('looked', 'VERB'), ('up', 'PRT'), ('across', 'ADP'), ('the', 'DET'), ('aisle', 'NOUN'), (',', '.'), ('the', 'DET'), ('whole', 'ADJ'), ('aviary', 'NOUN'), ('in', 'ADP'), ('my', 'DET'), ('head', 'NOUN'), ('burst', 'VERB'), ('into', 'ADP'), ('song', 'NOUN'), ('.', '.')]),
       list([('she', 'PRON'), ('was', 'VERB'), ('a', 'DET'), ('living', 'VERB'), ('doll', 'NOUN'), ('and', 'CONJ'), ('no', 'DET'), ('mistake', 'NOUN'), ('--', '.'), ('the', 'DET'), ('blue-black', 'ADJ'), ('bang', 'NOUN'), (',', '.'), ('the', 'DET'), ('wide', 'ADJ'), ('cheekbones', 'NOUN'), (',', '.'), ('olive-flushed', 'ADJ'), (',', '.'), ('that', 'PRON'), ('betrayed', 'VERB'), ('the',

### Build model
Unlike our previous lab, this time we'll focus on a high-level keras interface to recurrent neural networks. It is as simple as you can get with RNN, albeit somewhat constraining for complex tasks like seq2seq.

All keras RNNs apply to a whole sequence of inputs and produce either a sequence of hidden states (return_sequences=True) or just the last hidden state (return_sequences=False, the default). All the recurrence happens under the hood.
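
To make the two output modes concrete, here is a minimal NumPy sketch of a tanh recurrence of the kind SimpleRNN computes. It is an illustration under simplifying assumptions (zero initial state, no masking), not Keras's implementation; `simple_rnn_forward` and the weight names are hypothetical.

```python
import numpy as np


def simple_rnn_forward(x, W_in, W_rec, b, return_sequences=True):
    """x: [batch, time, features]. Returns all hidden states or only the last."""
    batch, time, _ = x.shape
    h = np.zeros((batch, W_rec.shape[0]))
    states = []
    for t in range(time):
        # one recurrence step: new state from current input and previous state
        h = np.tanh(x[:, t] @ W_in + h @ W_rec + b)
        states.append(h)
    all_states = np.stack(states, axis=1)  # [batch, time, units]
    return all_states if return_sequences else all_states[:, -1]


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 3))
W_in = rng.normal(size=(3, 4))
W_rec = rng.normal(size=(4, 4))
b = np.zeros(4)

seq_out = simple_rnn_forward(x, W_in, W_rec, b, return_sequences=True)    # (2, 5, 4)
last_out = simple_rnn_forward(x, W_in, W_rec, b, return_sequences=False)  # (2, 4)
```

For tagging we need one prediction per word, which is why the models in this notebook set return_sequences=True.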

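At the top of our model we need to apply a Dense layer to each time-step independently. In the Keras versions used here, keras.layers.Dense applied to a [batch, time, units] tensor already acts on the last axis only, so every time-step gets the same transformation (this is why the later models in this notebook use a bare Dense). Wrapping it in keras.layers.TimeDistributed makes that intent explicit: the layer is applied across both batch and time axes.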
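
A small NumPy check of what "applying a dense layer to each time-step independently" means: multiplying a [batch, time, units] tensor by one weight matrix along the last axis gives the same result as looping over time-steps. The names `hidden`, `W`, `b` and `softmax` are hypothetical stand-ins, not Keras internals.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 5, 4))   # [batch, time, units]
W = rng.normal(size=(4, 3))           # shared weights: units -> n_tags
b = rng.normal(size=3)


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


# All time-steps at once: matmul acts on the last axis only.
probs_all = softmax(hidden @ W + b)

# One time-step at a time, then stacked back along the time axis.
probs_loop = np.stack(
    [softmax(hidden[:, t] @ W + b) for t in range(hidden.shape[1])], axis=1
)
```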

In [11]:
import keras
import keras.layers as L

model = keras.models.Sequential()
model.add(L.InputLayer([None],dtype='int32'))
model.add(L.Embedding(len(all_words),50))
model.add(L.SimpleRNN(64,return_sequences=True))

#add top layer that predicts tag probabilities
stepwise_dense = L.Dense(len(all_tags),activation='softmax')
stepwise_dense = L.TimeDistributed(stepwise_dense)
model.add(stepwise_dense)

In [12]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, None, 50)          500100    
_________________________________________________________________
simple_rnn (SimpleRNN)       (None, None, 64)          7360      
_________________________________________________________________
time_distributed (TimeDistri (None, None, 14)          910       
Total params: 508,370
Trainable params: 508,370
Non-trainable params: 0
_________________________________________________________________
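
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the standard formulas (embedding: vocabulary size times dimension; SimpleRNN: input, recurrent and bias weights per unit; Dense: weights plus biases) and a vocabulary of 10000 words plus the two special tokens.

```python
vocab_size = 10000 + 2   # most common words + '#EOS#' and '#UNK#'
emb_dim = 50
rnn_units = 64
n_tags = 14

embedding_params = vocab_size * emb_dim                     # one vector per word
simple_rnn_params = rnn_units * (emb_dim + rnn_units + 1)   # input + recurrent + bias
dense_params = (rnn_units + 1) * n_tags                     # weights + bias per tag

total_params = embedding_params + simple_rnn_params + dense_params
print(embedding_params, simple_rnn_params, dense_params, total_params)
```

Almost all parameters sit in the embedding matrix, which is why the vocabulary size dominates the model size.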


Training: in this case we don't want to prepare the whole training dataset in advance. The main cause is that the length of every batch depends on the maximum sentence length within the batch. This leaves us two options: use custom training code as in previous seminar or use generators.

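Keras models have a model.fit_generator method that accepts a python generator yielding one batch at a time (since TensorFlow 2.1 it is deprecated in favour of model.fit, which accepts generators directly). But first we need to implement such a generator: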

In [13]:
from keras.utils import to_categorical
BATCH_SIZE=32
def generate_batches(sentences,batch_size=BATCH_SIZE,max_len=None,pad=0):
    assert isinstance(sentences,np.ndarray),"Make sure sentences is a numpy array"
    
    while True:
        indices = np.random.permutation(np.arange(len(sentences)))
        for start in range(0,len(indices),batch_size):  # also covers a final batch of a single sentence
            batch_indices = indices[start:start+batch_size]
            batch_words,batch_tags = [],[]
            for sent in sentences[batch_indices]:
                words,tags = zip(*sent)
                batch_words.append(words)
                batch_tags.append(tags)

            batch_words = to_matrix(batch_words,word_to_id,max_len,pad)
            batch_tags = to_matrix(batch_tags,tag_to_id,max_len,pad)

            batch_tags_1hot = to_categorical(batch_tags,len(all_tags)).reshape(batch_tags.shape+(-1,))
            yield batch_words,batch_tags_1hot
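
The one-hot step at the end of the generator can be pictured with plain NumPy. This is a sketch of the same idea, not what `to_categorical` does internally; `batch_tags` here is a hypothetical toy matrix with 4 tags.

```python
import numpy as np

n_tags = 4
# Padded tag ids for two sentences of lengths 3 and 1 (0 is the padding id).
batch_tags = np.array([[2, 3, 1, 0],
                       [1, 0, 0, 0]], dtype="int32")

# Row k of the identity matrix is the one-hot vector for tag k;
# indexing with a [batch, time] matrix yields [batch, time, n_tags].
one_hot = np.eye(n_tags, dtype="float32")[batch_tags]
```

Note that padded positions become ordinary one-hot targets for id 0 (`#EOS#` in this notebook), so they contribute to the training loss like real tokens.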

Callbacks: Another thing we need is to measure model performance. The tricky part is to avoid counting accuracy after the sentence ends (on padding) and to make sure we count all the validation data exactly once.

While it isn't impossible to persuade Keras to do all of that, we may as well write our own callback that does that. Keras callbacks allow you to write custom code to be run once every epoch or every minibatch. We'll define one by subclassing keras.callbacks.Callback.

In [14]:
def compute_test_accuracy(model):
    test_words,test_tags = zip(*[zip(*sentence) for sentence in test_data])
    test_words,test_tags = to_matrix(test_words,word_to_id),to_matrix(test_tags,tag_to_id)

    #predict tag probabilities of shape [batch,time,n_tags]
    predicted_tag_probabilities = model.predict(test_words,verbose=1)
    predicted_tags = predicted_tag_probabilities.argmax(axis=-1)

    #compute accuracy excluding padding
    numerator = np.sum(np.logical_and((predicted_tags == test_tags),(test_words != 0)))
    denominator = np.sum(test_words != 0)
    return float(numerator)/denominator


class EvaluateAccuracy(keras.callbacks.Callback):
    def on_epoch_end(self,epoch,logs=None):
        sys.stdout.flush()
        print("\nMeasuring validation accuracy...")
        acc = compute_test_accuracy(self.model)
        print("\nValidation accuracy: %.5f\n"%acc)
        sys.stdout.flush()
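
Why the padding mask matters can be seen on a toy batch. The sketch below mirrors the numerator and denominator logic of `compute_test_accuracy` on hand-made arrays; `masked_accuracy` and the arrays are hypothetical.

```python
import numpy as np


def masked_accuracy(predicted_tags, true_tags, word_ids, pad_id=0):
    """Accuracy over real tokens only; padded positions are ignored."""
    mask = word_ids != pad_id
    correct = (predicted_tags == true_tags) & mask
    return correct.sum() / mask.sum()


word_ids = np.array([[7, 3, 9, 0, 0],
                     [4, 5, 0, 0, 0]])
true_tags = np.array([[2, 1, 3, 0, 0],
                      [1, 2, 0, 0, 0]])
pred_tags = np.array([[2, 1, 1, 0, 0],
                      [1, 2, 0, 0, 0]])

acc = masked_accuracy(pred_tags, true_tags, word_ids)   # 4 of 5 real tokens correct
naive = (pred_tags == true_tags).mean()                 # 9 of 10 positions, padding included
```

The naive figure is inflated because predicting the padding tag on padded positions is trivially correct.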
        

In [15]:
model.compile('adam','categorical_crossentropy')

model.fit_generator(generate_batches(train_data),len(train_data)/BATCH_SIZE,
                    callbacks=[EvaluateAccuracy()], epochs=5,)



Epoch 1/5

Measuring validation accuracy...

Validation accuracy: 0.93845

Epoch 2/5

Measuring validation accuracy...

Validation accuracy: 0.94447

Epoch 3/5

Measuring validation accuracy...

Validation accuracy: 0.94609

Epoch 4/5

Measuring validation accuracy...

Validation accuracy: 0.94650

Epoch 5/5

Measuring validation accuracy...

Validation accuracy: 0.94467



<tensorflow.python.keras.callbacks.History at 0x21dc9aef1c0>

Measure final accuracy on the whole test set.

In [16]:
acc = compute_test_accuracy(model)
print("Final accuracy: %.5f"%acc)
assert acc>0.94, "Keras has gone on a rampage again, please contact course staff."

Final accuracy: 0.94467


### Going bidirectional
Since we're analyzing a full sequence, it's legal for us to look into future data.

A simple way to achieve that is to go both directions at once, making a bidirectional RNN.

In Keras you can achieve that both manually (using two LSTMs and Concatenate) and by using keras.layers.Bidirectional.

This one works just like the TimeDistributed we saw before: you wrap it around a recurrent layer (SimpleRNN now and LSTM/GRU later) and it actually creates two layers under the hood.

Your first task is to use such a layer for our POS-tagger.
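
The "two layers under the hood" idea can be sketched in NumPy: one recurrence reads the sequence left to right, another reads it right to left, and their states are concatenated per time-step. This is an illustration with a plain tanh recurrence and hypothetical names (`rnn_states`, `bidirectional_states`, `make_params`), not the Keras implementation.

```python
import numpy as np


def rnn_states(x, W_in, W_rec, b):
    """Hidden states of a tanh recurrence for x of shape [batch, time, features]."""
    h = np.zeros((x.shape[0], W_rec.shape[0]))
    out = []
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t] @ W_in + h @ W_rec + b)
        out.append(h)
    return np.stack(out, axis=1)


def bidirectional_states(x, fwd, bwd):
    """Concatenate forward states with time-aligned backward states."""
    forward = rnn_states(x, *fwd)
    # run on the reversed sequence, then flip back so step t lines up with step t
    backward = rnn_states(x[:, ::-1], *bwd)[:, ::-1]
    return np.concatenate([forward, backward], axis=-1)


rng = np.random.default_rng(0)


def make_params(n_in, units):
    return (0.5 * rng.normal(size=(n_in, units)),
            0.5 * rng.normal(size=(units, units)),
            np.zeros(units))


x = rng.normal(size=(2, 6, 3))
fwd, bwd = make_params(3, 4), make_params(3, 4)
states = bidirectional_states(x, fwd, bwd)   # (2, 6, 8): 4 forward + 4 backward units

# Changing only the LAST word affects the backward half of the FIRST position.
x_changed = x.copy()
x_changed[:, -1] += 1.0
states_changed = bidirectional_states(x_changed, fwd, bwd)
```

This is also why the Bidirectional layer in the summary below outputs 128 features for 64 units per direction.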

In [15]:
#Define a model that utilizes bidirectional SimpleRNN
model2 = keras.models.Sequential()
model2.add(L.InputLayer([None],dtype='int32'))
model2.add(L.Embedding(len(all_words),50))
model2.add(L.Bidirectional(L.SimpleRNN(64,return_sequences=True))) #Bidirectional
model2.add(L.Dense(len(all_tags),activation='softmax'))

In [16]:
model2.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, None, 50)          500100    
_________________________________________________________________
bidirectional (Bidirectional (None, None, 128)         14720     
_________________________________________________________________
dense_1 (Dense)              (None, None, 14)          1806      
Total params: 516,626
Trainable params: 516,626
Non-trainable params: 0
_________________________________________________________________


In [19]:
model2.compile('adam','categorical_crossentropy')

model2.fit_generator(generate_batches(train_data),len(train_data)/BATCH_SIZE,
                    callbacks=[EvaluateAccuracy()], epochs=5,)

Epoch 1/5

Measuring validation accuracy...

Validation accuracy: 0.95658

Epoch 2/5

Measuring validation accuracy...

Validation accuracy: 0.96131

Epoch 3/5

Measuring validation accuracy...

Validation accuracy: 0.96270

Epoch 4/5

Measuring validation accuracy...

Validation accuracy: 0.96270

Epoch 5/5

Measuring validation accuracy...

Validation accuracy: 0.96223



<tensorflow.python.keras.callbacks.History at 0x21dea5b4e50>

In [20]:
acc = compute_test_accuracy(model2)
print("\nFinal accuracy: %.5f"%acc)

assert acc>0.96, "Bidirectional RNNs are better than this!"
print("Well done!")


Final accuracy: 0.96223
Well done!


### Task I: Structured loss functions (more bonus points)

Since we're tagging the whole sequence at once, we might as well train our network to do so. Remember linear CRF from the lecture? You can also use it as a loss function for your RNN

There's more than one way to do so, but we'd recommend starting with Conditional Random Fields
You can plug CRF as a loss function and still train by backprop. There's even some neat tensorflow implementation for you.
Alternatively, you can condition your model on previous tags (make it autoregressive) and perform beam search over that model.
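
To give a feel for what a linear-chain CRF adds on top of per-step scores, here is a minimal NumPy sketch of Viterbi decoding: given per-step tag scores (emissions) and tag-to-tag transition scores, it finds the highest-scoring tag sequence. It covers only decoding, not the CRF training loss, and it is not the keras_contrib implementation; all names are hypothetical.

```python
import itertools
import numpy as np


def viterbi_decode(emissions, transitions):
    """emissions: [time, n_tags]; transitions[i, j]: score of moving from tag i to tag j."""
    n_steps, n_tags = emissions.shape
    score = emissions[0].copy()
    backpointers = np.zeros((n_steps, n_tags), dtype=int)
    for t in range(1, n_steps):
        # candidate[i, j]: best path ending in tag i at t-1, extended with tag j at t
        candidate = score[:, None] + transitions + emissions[t][None, :]
        backpointers[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0)
    path = [int(score.argmax())]
    for t in range(n_steps - 1, 0, -1):
        path.append(int(backpointers[t, path[-1]]))
    return path[::-1], float(score.max())


def path_score(path, emissions, transitions):
    """Total score of one tag sequence."""
    total = emissions[0, path[0]]
    for t in range(1, len(path)):
        total += transitions[path[t - 1], path[t]] + emissions[t, path[t]]
    return float(total)


rng = np.random.default_rng(0)
emissions = rng.normal(size=(5, 3))
transitions = rng.normal(size=(3, 3))

best_path, best_score = viterbi_decode(emissions, transitions)

# Sanity check against brute force over all 3**5 tag sequences.
brute_force_best = max(
    path_score(p, emissions, transitions)
    for p in itertools.product(range(3), repeat=5)
)
```

Dynamic programming makes the search linear in sentence length, while brute force grows exponentially and is only usable for a check like this one.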

In [16]:
# !pip install git+https://www.github.com/keras-team/keras-contrib.git
from keras_contrib.layers import CRF

In [17]:
# TimeDistributed + CRF
model3 = keras.models.Sequential()
model3.add(L.InputLayer([None],dtype='int32'))
model3.add(L.Embedding(len(all_words),50))
model3.add(L.Bidirectional(L.SimpleRNN(64,return_sequences=True)))
model3.add(L.TimeDistributed(L.Dense(len(all_tags), activation="relu")))
crf_layer = CRF(len(all_tags))
model3.add(crf_layer)

In [18]:
model3.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, None, 50)          500100    
_________________________________________________________________
bidirectional_1 (Bidirection (None, None, 128)         14720     
_________________________________________________________________
time_distributed_1 (TimeDist (None, None, 14)          1806      
_________________________________________________________________
crf (CRF)                    (None, None, 14)          434       
Total params: 517,060
Trainable params: 517,060
Non-trainable params: 0
_________________________________________________________________


In [86]:
model3.compile(optimizer='adam', loss = crf_layer.loss_function, metrics=[crf_layer.accuracy, 'accuracy'])

In [89]:
model3.fit_generator(generate_batches(train_data),len(train_data)/BATCH_SIZE,
                    callbacks=[EvaluateAccuracy()], epochs=5,)

Epoch 1/5


AttributeError: in user code:

    C:\Users\Vera_Romantsova\anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py:805 train_function  *
        return step_function(self, iterator)
    C:\Users\Vera_Romantsova\anaconda3\lib\site-packages\keras_contrib\losses\crf_losses.py:54 crf_loss  *
        crf, idx = y_pred._keras_history[:2]

    AttributeError: 'Tensor' object has no attribute '_keras_history'


https://github.com/keras-team/keras/issues/14464
    
I am on tf 2.4.1, and the issue linked above is still unresolved, so I could not finish this part.

In [20]:
# Replace SimpleRNN with LSTM
model4 = keras.models.Sequential()
model4.add(L.InputLayer([None],dtype='int32'))
model4.add(L.Embedding(len(all_words),50))
model4.add(L.Bidirectional(L.LSTM(64,return_sequences=True))) #LSTM
model4.add(L.Dense(len(all_tags),activation='softmax'))

In [18]:
model4.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, None, 50)          500100    
_________________________________________________________________
bidirectional_1 (Bidirection (None, None, 128)         58880     
_________________________________________________________________
dense_2 (Dense)              (None, None, 14)          1806      
Total params: 560,786
Trainable params: 560,786
Non-trainable params: 0
_________________________________________________________________


In [19]:
model4.compile('adam','categorical_crossentropy')

model4.fit_generator(generate_batches(train_data),len(train_data)/BATCH_SIZE,
                    callbacks=[EvaluateAccuracy()], epochs=5,)



Epoch 1/5

Measuring validation accuracy...

Validation accuracy: 0.95334

Epoch 2/5

Measuring validation accuracy...

Validation accuracy: 0.95977

Epoch 3/5

Measuring validation accuracy...

Validation accuracy: 0.96345

Epoch 4/5

Measuring validation accuracy...

Validation accuracy: 0.96454

Epoch 5/5

Measuring validation accuracy...

Validation accuracy: 0.96500



<tensorflow.python.keras.callbacks.History at 0x137be90e550>

In [21]:
acc4 = compute_test_accuracy(model4)
print("\nFinal accuracy: %.5f"%acc4)

assert acc4>0.96223, "last model are better than this!"
print("Well done!")


Final accuracy: 0.96500
Well done!


In [None]:
# Accuracy improved slightly, while training time grew noticeably because the recurrent layer is heavier; let's try raising the number of epochs to 10

In [23]:
model5 = keras.models.Sequential()
model5.add(L.InputLayer([None],dtype='int32'))
model5.add(L.Embedding(len(all_words),50))
model5.add(L.Bidirectional(L.LSTM(64,return_sequences=True)))
model5.add(L.Dense(len(all_tags),activation='softmax'))
model5.compile('adam','categorical_crossentropy')
model5.fit_generator(generate_batches(train_data),len(train_data)/BATCH_SIZE,
                    callbacks=[EvaluateAccuracy()], epochs=10,) # epochs=10

Epoch 1/10

Measuring validation accuracy...

Validation accuracy: 0.95370

Epoch 2/10

Measuring validation accuracy...

Validation accuracy: 0.95984

Epoch 3/10

Measuring validation accuracy...

Validation accuracy: 0.96355

Epoch 4/10

Measuring validation accuracy...

Validation accuracy: 0.96387

Epoch 5/10

Measuring validation accuracy...

Validation accuracy: 0.96521

Epoch 6/10

Measuring validation accuracy...

Validation accuracy: 0.96546

Epoch 7/10

Measuring validation accuracy...

Validation accuracy: 0.96482

Epoch 8/10

Measuring validation accuracy...

Validation accuracy: 0.96437

Epoch 9/10

Measuring validation accuracy...

Validation accuracy: 0.96346

Epoch 10/10

Measuring validation accuracy...

Validation accuracy: 0.96222



<tensorflow.python.keras.callbacks.History at 0x1380b503760>

In [24]:
acc5 = compute_test_accuracy(model5)
print("\nFinal accuracy: %.5f"%acc5)

assert acc5>acc4, "last model are better than this!"
print("Well done!")


Final accuracy: 0.96222


AssertionError: last model are better than this!

In [None]:
# With more epochs the model overfits (the loss keeps decreasing and val_accuracy drops as well), so we stay with 5 epochs for now
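
Instead of fixing the number of epochs by hand, one could stop when validation accuracy stops improving. The sketch below replays the ten validation accuracies printed above through a simple patience rule; `early_stopping_epoch` is a hypothetical helper, and the patience value of 2 is an arbitrary choice.

```python
# Validation accuracies of model5, epochs 1..10, copied from the log above.
val_acc = [0.95370, 0.95984, 0.96355, 0.96387, 0.96521,
           0.96546, 0.96482, 0.96437, 0.96346, 0.96222]


def early_stopping_epoch(history, patience=2):
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops once `patience` consecutive epochs bring no improvement.
    """
    best, best_epoch, waited = float("-inf"), 0, 0
    for epoch, acc in enumerate(history, start=1):
        if acc > best:
            best, best_epoch, waited = acc, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(history), best_epoch


stop_epoch, best_epoch = early_stopping_epoch(val_acc, patience=2)
print(stop_epoch, best_epoch)
```

Keras ships keras.callbacks.EarlyStopping, which applies the same idea to metrics recorded in the training logs. Ideally the monitored score comes from a separate validation split rather than from the final test set.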

In [25]:
# Let's try GRU instead of LSTM
model6 = keras.models.Sequential()
model6.add(L.InputLayer([None],dtype='int32'))
model6.add(L.Embedding(len(all_words),50))
model6.add(L.Bidirectional(L.GRU(64,return_sequences=True))) #GRU
model6.add(L.Dense(len(all_tags),activation='softmax'))
model6.compile('adam','categorical_crossentropy')
model6.fit_generator(generate_batches(train_data),len(train_data)/BATCH_SIZE,
                    callbacks=[EvaluateAccuracy()], epochs=5,)

Epoch 1/5

Measuring validation accuracy...

Validation accuracy: 0.95752

Epoch 2/5

Measuring validation accuracy...

Validation accuracy: 0.96256

Epoch 3/5

Measuring validation accuracy...

Validation accuracy: 0.96428

Epoch 4/5

Measuring validation accuracy...

Validation accuracy: 0.96543

Epoch 5/5

Measuring validation accuracy...

Validation accuracy: 0.96623



<tensorflow.python.keras.callbacks.History at 0x1380ead0f70>

In [26]:
acc6 = compute_test_accuracy(model6)
print("\nFinal accuracy: %.5f"%acc6)

assert acc6>acc4, "last model are better than this!" # compared with model4, since model5 was rejected
print("Well done!")


Final accuracy: 0.96623
Well done!


In [27]:
# Accuracy improved slightly over LSTM, so we keep GRU
# Add Dropout
model7 = keras.models.Sequential()
model7.add(L.InputLayer([None],dtype='int32'))
model7.add(L.Embedding(len(all_words),50))
model7.add(L.SpatialDropout1D(0.2))
model7.add(L.Bidirectional(L.GRU(64,return_sequences=True)))
model7.add(L.Dense(len(all_tags),activation='softmax'))
model7.compile('adam','categorical_crossentropy')
model7.fit_generator(generate_batches(train_data),len(train_data)/BATCH_SIZE,
                    callbacks=[EvaluateAccuracy()], epochs=5,)

Epoch 1/5

Measuring validation accuracy...

Validation accuracy: 0.95638

Epoch 2/5

Measuring validation accuracy...

Validation accuracy: 0.96215

Epoch 3/5

Measuring validation accuracy...

Validation accuracy: 0.96447

Epoch 4/5

Measuring validation accuracy...

Validation accuracy: 0.96596

Epoch 5/5

Measuring validation accuracy...

Validation accuracy: 0.96642



<tensorflow.python.keras.callbacks.History at 0x13819b8cf70>

In [28]:
acc7 = compute_test_accuracy(model7)
print("\nFinal accuracy: %.5f"%acc7)

assert acc7>acc6, "last model are better than this!"
print("Well done!")


Final accuracy: 0.96642
Well done!


In [40]:
# Accuracy improved slightly, so we keep it
# Add Conv1D
model9 = keras.models.Sequential()
model9.add(L.InputLayer([None],dtype='int32'))
model9.add(L.Embedding(len(all_words),50))
model9.add(L.SpatialDropout1D(0.2))
model9.add(L.Conv1D(64, kernel_size = 1)) #Conv1D
model9.add(L.Bidirectional(L.GRU(64,return_sequences=True)))
model9.add(L.Dropout(0.2))
model9.add(L.Bidirectional(L.GRU(64,return_sequences=True)))
model9.add(L.Dense(len(all_tags),activation='softmax'))

In [41]:
model9.summary()

Model: "sequential_11"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_11 (Embedding)     (None, None, 50)          500100    
_________________________________________________________________
spatial_dropout1d_7 (Spatial (None, None, 50)          0         
_________________________________________________________________
conv1d_12 (Conv1D)           (None, None, 64)          3264      
_________________________________________________________________
bidirectional_15 (Bidirectio (None, None, 128)         49920     
_________________________________________________________________
dropout_6 (Dropout)          (None, None, 128)         0         
_________________________________________________________________
bidirectional_16 (Bidirectio (None, None, 128)         74496     
_________________________________________________________________
dense_10 (Dense)             (None, None, 14)        

In [42]:
model9.compile('adam','categorical_crossentropy')
model9.fit_generator(generate_batches(train_data),len(train_data)/BATCH_SIZE,
                    callbacks=[EvaluateAccuracy()], epochs=5,)

Epoch 1/5

Measuring validation accuracy...

Validation accuracy: 0.95803

Epoch 2/5

Measuring validation accuracy...

Validation accuracy: 0.96255

Epoch 3/5

Measuring validation accuracy...

Validation accuracy: 0.96496

Epoch 4/5

Measuring validation accuracy...

Validation accuracy: 0.96642

Epoch 5/5

Measuring validation accuracy...

Validation accuracy: 0.96712



<tensorflow.python.keras.callbacks.History at 0x1c6ad686760>

In [44]:
acc9 = compute_test_accuracy(model9)
print("\nFinal accuracy: %.5f"%acc9)

assert acc9>0.96642, "last model are better than this!" # the notebook was restarted; to avoid rerunning everything, the accuracy of the previous model (model8) is hard-coded
print("Well done!")


Final accuracy: 0.96712
Well done!


In [60]:
model10 = keras.models.Sequential()
model10.add(L.InputLayer([None],dtype='int32'))
model10.add(L.Embedding(len(all_words),50))
model10.add(L.SpatialDropout1D(0.2))
model10.add(L.Conv1D(64, kernel_size = 1)) 
model10.add(L.Bidirectional(L.GRU(64,return_sequences=True, dropout = 0.2))) #dropout
model10.add(L.Bidirectional(L.GRU(64,return_sequences=True, dropout = 0.1))) #dropout
model10.add(L.Dense(len(all_tags),activation='softmax'))

In [61]:
model10.summary()

Model: "sequential_17"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_17 (Embedding)     (None, None, 50)          500100    
_________________________________________________________________
spatial_dropout1d_13 (Spatia (None, None, 50)          0         
_________________________________________________________________
conv1d_18 (Conv1D)           (None, None, 64)          3264      
_________________________________________________________________
bidirectional_27 (Bidirectio (None, None, 128)         49920     
_________________________________________________________________
bidirectional_28 (Bidirectio (None, None, 128)         74496     
_________________________________________________________________
dense_16 (Dense)             (None, None, 14)          1806      
Total params: 629,586
Trainable params: 629,586
Non-trainable params: 0
_______________________________________________

In [62]:
model10.compile('adam','categorical_crossentropy')

model10.fit_generator(generate_batches(train_data),len(train_data)/BATCH_SIZE,
                    callbacks=[EvaluateAccuracy()], epochs=10,) #epochs=10

Epoch 1/10

Measuring validation accuracy...

Validation accuracy: 0.95828

Epoch 2/10

Measuring validation accuracy...

Validation accuracy: 0.96340

Epoch 3/10

Measuring validation accuracy...

Validation accuracy: 0.96518

Epoch 4/10

Measuring validation accuracy...

Validation accuracy: 0.96653

Epoch 5/10

Measuring validation accuracy...

Validation accuracy: 0.96729

Epoch 6/10

Measuring validation accuracy...

Validation accuracy: 0.96756

Epoch 7/10

Measuring validation accuracy...

Validation accuracy: 0.96828

Epoch 8/10

Measuring validation accuracy...

Validation accuracy: 0.96808

Epoch 9/10

Measuring validation accuracy...

Validation accuracy: 0.96806

Epoch 10/10

Measuring validation accuracy...

Validation accuracy: 0.96775



<tensorflow.python.keras.callbacks.History at 0x1c6b8885d00>

In [63]:
acc10 = compute_test_accuracy(model10)
print("\nFinal accuracy: %.5f"%acc10)

assert acc10>acc9, "last model are better than this!"
print("Well done!")


Final accuracy: 0.96775
Well done!


In [64]:
model11 = keras.models.Sequential()
model11.add(L.InputLayer([None],dtype='int32'))
model11.add(L.Embedding(len(all_words),50))
model11.add(L.SpatialDropout1D(0.2))
model11.add(L.Conv1D(64, kernel_size = 1)) 
model11.add(L.Bidirectional(L.GRU(128,return_sequences=True, dropout = 0.2))) #128
model11.add(L.Bidirectional(L.GRU(64,return_sequences=True, dropout = 0.1))) #dropout
model11.add(L.Dense(len(all_tags),activation='softmax'))

In [65]:
model11.summary()

Model: "sequential_18"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_18 (Embedding)     (None, None, 50)          500100    
_________________________________________________________________
spatial_dropout1d_14 (Spatia (None, None, 50)          0         
_________________________________________________________________
conv1d_19 (Conv1D)           (None, None, 64)          3264      
_________________________________________________________________
bidirectional_29 (Bidirectio (None, None, 256)         148992    
_________________________________________________________________
bidirectional_30 (Bidirectio (None, None, 128)         123648    
_________________________________________________________________
dense_17 (Dense)             (None, None, 14)          1806      
Total params: 777,810
Trainable params: 777,810
Non-trainable params: 0
_______________________________________________

In [66]:
model11.compile('adam','categorical_crossentropy')

model11.fit_generator(generate_batches(train_data),len(train_data)/BATCH_SIZE,
                    callbacks=[EvaluateAccuracy()], epochs=10,) #epochs=10

Epoch 1/10

Measuring validation accuracy...

Validation accuracy: 0.95928

Epoch 2/10

Measuring validation accuracy...

Validation accuracy: 0.96347

Epoch 3/10

Measuring validation accuracy...

Validation accuracy: 0.96596

Epoch 4/10

Measuring validation accuracy...

Validation accuracy: 0.96669

Epoch 5/10

Measuring validation accuracy...

Validation accuracy: 0.96756

Epoch 6/10

Measuring validation accuracy...

Validation accuracy: 0.96787

Epoch 7/10

Measuring validation accuracy...

Validation accuracy: 0.96817

Epoch 8/10

Measuring validation accuracy...

Validation accuracy: 0.96822

Epoch 9/10

Measuring validation accuracy...

Validation accuracy: 0.96844

Epoch 10/10

Measuring validation accuracy...

Validation accuracy: 0.96838



<tensorflow.python.keras.callbacks.History at 0x1c6ef2b1520>

In [67]:
acc11 = compute_test_accuracy(model11)
print("\nFinal accuracy: %.5f"%acc11)

assert acc11>acc10, "last model are better than this!"
print("Well done!")


Final accuracy: 0.96838
Well done!


### Some tips
Here are a few more tips on how to improve training that are a bit trickier to implement. We strongly suggest that you try them after you've got a good initial model.

* Use pre-trained embeddings: you can use pre-trained weights from there to kickstart your Embedding layer.
* Embedding layer has a matrix which contains word embeddings for each word in the dictionary (layer.embeddings in current Keras, layer.W in old versions). You can overwrite it with layer.set_weights (or tf.assign in TF1-style code).
* When using pre-trained embeddings, pay attention to the fact that model's dictionary is different from your own.
* You may want to switch trainable=False for embedding layer in first few epochs as in regular fine-tuning.
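
The dictionary mismatch mentioned above is usually handled by building a matrix aligned with your own vocabulary. A minimal sketch, assuming the pre-trained vectors are available as a plain dict from word to vector; `build_embedding_matrix`, `toy_vocab` and `toy_pretrained` are hypothetical.

```python
import numpy as np


def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Row i holds the pre-trained vector of vocab[i], or small random noise if missing."""
    rng = np.random.default_rng(seed)
    matrix = rng.normal(scale=0.1, size=(len(vocab), dim)).astype("float32")
    n_found = 0
    for idx, word in enumerate(vocab):
        vec = pretrained.get(word)
        if vec is not None:
            matrix[idx] = vec
            n_found += 1
    return matrix, n_found


toy_vocab = ["#EOS#", "#UNK#", "the", "jury", "law"]
toy_pretrained = {
    "the": np.ones(3),
    "law": np.full(3, 2.0),
    "zebra": np.zeros(3),   # present in the pre-trained set but not in our vocabulary
}

emb_matrix, n_found = build_embedding_matrix(toy_vocab, toy_pretrained, dim=3)
```

With a real model the result would be loaded into a built Embedding layer, for example via `embedding_layer.set_weights([emb_matrix])`, provided the embedding dimension matches the pre-trained vectors.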

* Go beyond SimpleRNN: there's keras.layers.LSTM and keras.layers.GRU
* If you want to use a custom recurrent Cell, read this
* You can also use 1D Convolutions (keras.layers.Conv1D). They are often as good as recurrent layers but with less overfitting.

* Stack more layers: if there is a common motif to this course it's about stacking layers
* You can just add recurrent and 1dconv layers on top of one another and keras will understand it
* Just remember that bigger networks may need more epochs to train

* Regularization: you can apply dropouts as usual but also in an RNN-specific way
* keras.layers.Dropout works in between RNN layers
* Recurrent layers also have recurrent_dropout parameter

* Gradient clipping: If your training isn't as stable as you'd like, set clipnorm in your optimizer.
* Which is to say, it's a good idea to watch over your loss curve at each minibatch. Try tensorboard callback or something similar.
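
What clipping by norm does to a single gradient tensor can be sketched in a few lines of NumPy. This illustrates the rule, not the optimizer's internals; `clip_by_norm` is a hypothetical helper.

```python
import numpy as np


def clip_by_norm(grad, clipnorm):
    """Rescale grad so that its L2 norm does not exceed clipnorm."""
    norm = np.linalg.norm(grad)
    if norm > clipnorm:
        return grad * (clipnorm / norm)
    return grad


big = np.array([3.0, 4.0])                          # L2 norm 5
clipped = clip_by_norm(big, 1.0)                    # same direction, norm 1
small = clip_by_norm(np.array([0.3, 0.4]), 1.0)     # norm 0.5, left unchanged
```

In Keras this is switched on through the optimizer, for example `keras.optimizers.Adam(clipnorm=1.0)`.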

* Word Dropout: tl;dr randomly replace words with UNK during training.
* This can also simulate increased amount of unknown words in the test set
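
Word dropout fits naturally into a batch generator, applied to the word id matrix before it is yielded. A minimal sketch, assuming id 0 is padding and id 1 is `#UNK#` as in this notebook; `word_dropout` is a hypothetical helper.

```python
import numpy as np


def word_dropout(word_ids, rate, unk_id=1, pad_id=0, rng=None):
    """Replace each non-padding id with unk_id with probability `rate`."""
    if rng is None:
        rng = np.random.default_rng()
    drop = (rng.random(word_ids.shape) < rate) & (word_ids != pad_id)
    return np.where(drop, unk_id, word_ids)


batch = np.array([[5, 8, 13, 0, 0],
                  [21, 34, 55, 89, 144]])

noisy = word_dropout(batch, rate=0.3, rng=np.random.default_rng(0))
```

Only the inputs are corrupted; the tag targets stay untouched, so the model has to predict the tag of a hidden word from its context.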

* Larger vocabulary: You can obtain greater performance by expanding your model's input dictionary from 10000 to up to every single word!
* Just make sure your model doesn't overfit due to so many parameters.
* Combined with regularizers or pre-trained word-vectors this could be really good because right now our model is blind to about 7% of word occurrences.

* More efficient batching: right now TF spends a lot of time iterating over "0"s
* This happens because batch is always padded to the length of a longest sentence
* You can speed things up by pre-generating batches of similar lengths and feeding it with randomly chosen pre-generated batch.
* This technically breaks the i.i.d. assumption, but it works unless you come up with some insane rnn architectures.
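
The batching idea above can be sketched with the standard library: sort sentence indices by length, cut them into batches, then shuffle the order of the batches. The helper names and the toy lengths are hypothetical; `padding_fraction` measures how much of the padded matrices is wasted on padding.

```python
import random


def bucketed_batches(sentences, batch_size, seed=0):
    """Batches of indices whose sentences have similar lengths, in random order."""
    rng = random.Random(seed)
    order = sorted(range(len(sentences)), key=lambda i: len(sentences[i]))
    batches = [order[s:s + batch_size] for s in range(0, len(order), batch_size)]
    rng.shuffle(batches)
    return batches


def padding_fraction(sentences, batches):
    """Share of matrix cells that are padding when each batch is padded to its longest sentence."""
    padded = sum(len(b) * max(len(sentences[i]) for i in b) for b in batches)
    real = sum(len(s) for s in sentences)
    return 1 - real / padded


# Toy "sentences" of mixed short and long lengths.
toy = [[0] * n for n in [2, 30, 3, 28, 4, 29, 2, 31]]

bucketed = bucketed_batches(toy, batch_size=4)
in_order = [list(range(0, 4)), list(range(4, 8))]   # batches in original order

print(padding_fraction(toy, bucketed), padding_fraction(toy, in_order))
```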

* The most important advice: don't cram in everything at once!
* If you stuff in a lot of modifications, some of them will almost inevitably be detrimental and you'll never know which ones.
* Try to instead go in small iterations and record experiment results to guide further search.
Good hunting!