# Домашнее задание к лекции «Введение в рекуррентные НС»

## Neural Part Of Speech Tagging

We're now going to solve the same problem of POS tagging with neural networks.
<img src=https://i.stack.imgur.com/6pdIT.png width=320>

From deep learning perspective, this is a task of predicting a sequence of outputs aligned to a sequence of inputs. There are several problems that match this formulation:
* Part Of Speech Tagging -  an auxuliary task for many NLP problems
* Named Entity Recognition - for chat bots and web crawlers
* Protein structure prediction - for bioinformatics

In [2]:
import nltk
import sys
import numpy as np

import tensorflow as tf
import keras
from keras.utils.np_utils import to_categorical

from collections import Counter
from collections import defaultdict

from sklearn.model_selection import train_test_split


from IPython.display import HTML, display

import warnings
warnings.filterwarnings('ignore')

In [3]:
nltk.download('brown')
nltk.download('universal_tagset')
data = nltk.corpus.brown.tagged_sents(tagset='universal')
all_tags = ['#EOS#','#UNK#','ADV', 'NOUN', 'ADP', 'PRON', 'DET', '.', 'PRT', 'VERB', 'X', 'NUM', 'CONJ', 'ADJ']

data = np.array([ [(word.lower(),tag) for word,tag in sentence] for sentence in data ])

[nltk_data] Downloading package brown to
[nltk_data]     C:\Users\Elena\AppData\Roaming\nltk_data...
[nltk_data]   Package brown is already up-to-date!
[nltk_data] Downloading package universal_tagset to
[nltk_data]     C:\Users\Elena\AppData\Roaming\nltk_data...
[nltk_data]   Package universal_tagset is already up-to-date!


In [5]:
train_data, test_data = train_test_split(data,test_size=0.25,random_state=42)

In [6]:
from IPython.display import HTML, display
def draw(sentence):
    words,tags = zip(*sentence)
    display(HTML('<table><tr>{tags}</tr>{words}<tr></table>'.format(
                words = '<td>{}</td>'.format('</td><td>'.join(words)),
                tags = '<td>{}</td>'.format('</td><td>'.join(tags)))))
    
    
draw(data[11])
draw(data[10])
draw(data[7])

0,1,2,3,4,5,6,7,8,9,10,11,12,13
NOUN,ADP,NOUN,NOUN,NOUN,NOUN,VERB,ADV,VERB,ADP,DET,ADJ,NOUN,.
,,,,,,,,,,,,,


0,1,2,3,4,5,6,7,8,9,10,11,12,13
PRON,VERB,ADP,DET,NOUN,.,VERB,NOUN,PRT,VERB,.,DET,NOUN,.
,,,,,,,,,,,,,


0,1
NOUN,VERB
,


### Building vocabularies

Just like before, we have to build a mapping from tokens to integer ids. This time around, our model operates on a word level, processing one word per RNN step. This means we'll have to deal with far larger vocabulary.

Luckily for us, we only receive those words as input i.e. we don't have to predict them. This means we can have a large vocabulary for free by using word embeddings.

In [7]:
word_counts = Counter()
for sentence in data:
    words,tags = zip(*sentence)
    word_counts.update(words)

all_words = ['#EOS#','#UNK#'] + list(list(zip(*word_counts.most_common(10000)))[0])

#let's measure what fraction of data words are in the dictionary
print("Coverage = %.5f" % (float(sum(word_counts[w] for w in all_words)) / sum(word_counts.values())))

Coverage = 0.92876


In [8]:
word_to_id = defaultdict(lambda:1, { word: i for i, word in enumerate(all_words) })
tag_to_id = { tag: i for i, tag in enumerate(all_tags)}

In [9]:
def to_matrix(lines, token_to_id, max_len=None, pad=0, dtype='int32', time_major=False):
    """Converts a list of names into rnn-digestable matrix with paddings added after the end"""
    
    max_len = max_len or max(map(len,lines))
    matrix = np.empty([len(lines), max_len],dtype)
    matrix.fill(pad)

    for i in range(len(lines)):
        line_ix = list(map(token_to_id.__getitem__,lines[i]))[:max_len]
        matrix[i,:len(line_ix)] = line_ix

    return matrix.T if time_major else matrix

In [10]:
batch_words, batch_tags = zip(*[zip(*sentence) for sentence in data[-3:]])

print("Word ids:")
print(to_matrix(batch_words, word_to_id))
print("Tag ids:")
print(to_matrix(batch_tags, tag_to_id))

Word ids:
[[   2 3057    5    2 2238 1334 4238 2454    3    6   19   26 1070   69
     8 2088    6    3    1    3  266   65  342    2    1    3    2  315
     1    9   87  216 3322   69 1558    4    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0]
 [  45   12    8  511 8419    6   60 3246   39    2    1    1    3    2
   845    1    3    1    3   10 9910    2    1 3470    9   43    1    1
     3    6    2 1046  385   73 4562    3    9    2    1    1 3250    3
    12   10    2  861 5240   12    8 8936  121    1    4]
 [  33   64   26   12  445    7 7346    9    8 3337    3    1 2811    3
     2  463  572    2    1    1 1649   12    1    4    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0]]
Tag ids:
[[ 6  3  4  6  3  3  9  9  7 12  4  5  9  4  6  3 12  7  9  7  9  8  4  6
   3  7  6 13  3  4  6  3  9  4  3  7  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0

In [11]:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Input([None],dtype='int32'))
model.add(tf.keras.layers.Embedding(len(all_words),50))
model.add(tf.keras.layers.SimpleRNN(64,return_sequences=True))

#add top layer that predicts tag probabilities
stepwise_dense = tf.keras.layers.Dense(len(all_tags),activation='softmax')
stepwise_dense = tf.keras.layers.TimeDistributed(stepwise_dense)
model.add(stepwise_dense)

__Training:__ in this case we don't want to prepare the whole training dataset in advance. The main cause is that the length of every batch depends on the maximum sentence length within the batch. This leaves us two options: use custom training code as in previous seminar or use generators.

Keras models have a __`model.fit_generator`__ method that accepts a python generator yielding one batch at a time. But first we need to implement such generator:

In [12]:
BATCH_SIZE=32

def generate_batches(sentences,batch_size=BATCH_SIZE,max_len=None,pad=0):
    assert isinstance(sentences,np.ndarray),"Make sure sentences is q numpy array"
    
    while True:
        indices = np.random.permutation(np.arange(len(sentences)))
        for start in range(0,len(indices)-1,batch_size):
            batch_indices = indices[start:start+batch_size]
            batch_words,batch_tags = [],[]
            for sent in sentences[batch_indices]:
                words,tags = zip(*sent)
                batch_words.append(words)
                batch_tags.append(tags)

            batch_words = to_matrix(batch_words,word_to_id,max_len,pad)
            batch_tags = to_matrix(batch_tags,tag_to_id,max_len,pad)

            batch_tags_1hot = to_categorical(batch_tags,len(all_tags)).reshape(batch_tags.shape+(-1,))
            yield batch_words,batch_tags_1hot

__Callbacks:__ Another thing we need is to measure model performance. The tricky part is not to count accuracy after sentence ends (on padding) and making sure we count all the validation data exactly once.

While it isn't impossible to persuade Keras to do all of that, we may as well write our own callback that does that.
Keras callbacks allow you to write a custom code to be ran once every epoch or every minibatch. We'll define one via LambdaCallback

In [13]:
def compute_test_accuracy(model):
    test_words,test_tags = zip(*[zip(*sentence) for sentence in test_data])
    test_words,test_tags = to_matrix(test_words,word_to_id),to_matrix(test_tags,tag_to_id)

    #predict tag probabilities of shape [batch,time,n_tags]
    predicted_tag_probabilities = model.predict(test_words,verbose=1)
    predicted_tags = predicted_tag_probabilities.argmax(axis=-1)
    
    #compute accurary excluding padding
    numerator = np.sum(np.logical_and((predicted_tags == test_tags),(test_words != 0)))
    denominator = np.sum(test_words != 0)
    return float(numerator)/denominator


class EvaluateAccuracy(keras.callbacks.Callback):
    def on_epoch_end(self,epoch,logs=None):
        sys.stdout.flush()
        print("\nMeasuring validation accuracy...")
        acc = compute_test_accuracy(self.model)
        print("\nValidation accuracy: %.5f\n"%acc)
        sys.stdout.flush()

In [318]:
model.compile('adam','categorical_crossentropy')

model.fit_generator(generate_batches(train_data), len(train_data)/BATCH_SIZE, callbacks=[EvaluateAccuracy()], epochs=5,)

Epoch 1/5

Measuring validation accuracy...

Validation accuracy: 0.94026

Epoch 2/5

Measuring validation accuracy...

Validation accuracy: 0.94374

Epoch 3/5

Measuring validation accuracy...

Validation accuracy: 0.94509

Epoch 4/5

Measuring validation accuracy...

Validation accuracy: 0.94589

Epoch 5/5

Measuring validation accuracy...

Validation accuracy: 0.94524



<keras.callbacks.History at 0x285373e7940>

In [16]:
acc = compute_test_accuracy(model)
print("Final accuracy: %.5f"%acc)

assert acc>0.94, "Keras has gone on a rampage again, please contact course staff."

Final accuracy: 0.94565


### Going bidirectional (Задание 1)

Since we're analyzing a full sequence, it's legal for us to look into future data.

A simple way to achieve that is to go both directions at once, making a __bidirectional RNN__.

In Keras you can achieve that both manually (using two LSTMs and Concatenate) and by using __`keras.layers.Bidirectional`__. 

This one works just as `TimeDistributed` we saw before: you wrap it around a recurrent layer (SimpleRNN now and LSTM/GRU later) and it actually creates two layers under the hood.

Your first task is to use such a layer our POS-tagger.

In [17]:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Input([None],dtype='int32'))
model.add(tf.keras.layers.Embedding(len(all_words),50))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.SimpleRNN(64,return_sequences=True)))

stepwise_dense = tf.keras.layers.Dense(len(all_tags),activation='softmax')
stepwise_dense = tf.keras.layers.TimeDistributed(stepwise_dense)
model.add(stepwise_dense)

In [18]:
model.compile('adam','categorical_crossentropy')

model.fit_generator(generate_batches(train_data), len(train_data)/BATCH_SIZE, callbacks=[EvaluateAccuracy()], epochs=5,)

Epoch 1/5

Measuring validation accuracy...

Validation accuracy: 0.95649

Epoch 2/5

Measuring validation accuracy...

Validation accuracy: 0.96094

Epoch 3/5

Measuring validation accuracy...

Validation accuracy: 0.96288

Epoch 4/5

Measuring validation accuracy...

Validation accuracy: 0.96315

Epoch 5/5

Measuring validation accuracy...

Validation accuracy: 0.96148



<keras.callbacks.History at 0x2868ec04490>

Measure final accuracy on the whole test set.

In [19]:
acc = compute_test_accuracy(model)
print("\nFinal accuracy: %.5f"%acc)

assert acc>0.96, "Bidirectional RNNs are better than this!"
print("Well done!")


Final accuracy: 0.96148
Well done!


### Task I: Structured loss functions (more bonus points) (Задание 2)

Since we're tagging the whole sequence at once, we might as well train our network to do so. Remember linear CRF from the lecture? You can also use it as a loss function for your RNN


  * There's more than one way to do so, but we'd recommend starting with [Conditional Random Fields](http://blog.echen.me/2012/01/03/introduction-to-conditional-random-fields/)
  * You can plug CRF as a loss function and still train by backprop. There's even some neat tensorflow [implementation](https://www.tensorflow.org/addons/api_docs/python/tfa/layers/CRF) for you.
  * Alternatively, you can condition your model on previous tags (make it autoregressive) and perform __beam search__ over that model.

Так как задание необязательное, я его пропускаю. 

Лучшая моя попытка подключить CRF выглядит так:

In [None]:
import tensorflow_addons as tfa

In [None]:
inputs = tf.keras.layers.Input([None], dtype='int32')
x = tf.keras.layers.Embedding(len(all_words), 50)(inputs)
x = tf.keras.layers.SimpleRNN(64, return_sequences=True)(x)
x = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(len(all_tags), activation='softmax'))(x)  

crf = tfa.CRF(len(all_tags))

decoded_sequence, potentials, sequence_length, chain_kernel = crf(x)

model = tf.keras.Model(inputs=inputs, outputs=potentials, name="our_first_model")

model.add_loss(tf.abs(tf.reduce_mean(inputs)))

model.summary()

In [None]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.CategoricalCrossentropy(),
)

In [None]:
model.fit_generator(generate_batches(train_data), len(train_data)/BATCH_SIZE, callbacks=[EvaluateAccuracy()], epochs=2,)

Слой добавлен, но с моделью явно что-то не так. Скорее всего я либо неправильно передаю функцию в add_loss, либо outputs при создании модели должен быть другой. В документации tfa.CRF нет нормального примера (либо я не смогла его применить).

Буду благодарна за ссылку, где можно посмотреть нормальный пример, как правильно настраивать tfa.CRF (или хотя  бы куда копать)

#### Some tips
Here there are a few more tips on how to improve training that are a bit trickier to impliment. We strongly suggest that you try them _after_ you've got a good initial model.
* __Use pre-trained embeddings__: you can use pre-trained weights from [there](http://ahogrammer.com/2017/01/20/the-list-of-pretrained-word-embeddings/) to kickstart your Embedding layer.
  * Embedding layer has a matrix W (layer.W) which contains word embeddings for each word in the dictionary. You can just overwrite them with tf.assign.
  * When using pre-trained embeddings, pay attention to the fact that model's dictionary is different from your own.
  * You may want to switch trainable=False for embedding layer in first few epochs as in regular fine-tuning.  
* __Go beyond SimpleRNN__: there's `keras.layers.LSTM` and `keras.layers.GRU`
  * If you want to use a custom recurrent Cell, read [this](https://keras.io/layers/recurrent/#rnn)
  * You can also use 1D Convolutions (`keras.layers.Conv1D`). They are often as good as recurrent layers but with less overfitting.
* __Stack more layers__: if there is a common motif to this course it's about stacking layers
  * You can just add recurrent and 1dconv layers on top of one another and keras will understand it
  * Just remember that bigger networks may need more epochs to train
* __Regularization__: you can apply dropouts as usual but also in an RNN-specific way
  * `keras.layers.Dropout` works inbetween RNN layers
  * Recurrent layers also have `recurrent_dropout` parameter
* __Gradient clipping__: If your training isn't as stable as you'd like, set `clipnorm` in your optimizer.
  * Which is to say, it's a good idea to watch over your loss curve at each minibatch. Try tensorboard callback or something similar.
* __Word Dropout__: tl;dr randomly replace words with UNK during training. 
  * This can also simulate increased amount of unknown words in the test set
* __Larger vocabulary__: You can obtain greater performance by expanding your model's input dictionary from 5000 to up to every single word!
  * Just make sure your model doesn't overfit due to so many parameters.
  * Combined with regularizers or pre-trained word-vectors this could be really good cuz right now our model is blind to >5% of words.  
* __More efficient batching__: right now TF spends a lot of time iterating over "0"s
  * This happens because batch is always padded to the length of a longest sentence
  * You can speed things up by pre-generating batches of similar lengths and feeding it with randomly chosen pre-generated batch.
  * This technically breaks the i.i.d. assumption, but it works unless you come up with some insane rnn architectures.
* __The most important advice__: don't cram in everything at once!
  * If you stuff in a lot of modiffications, some of them almost inevitably gonna be detrimental and you'll never know which of them are.
  * Try to instead go in small iterations and record experiment results to guide further search.
    
Good hunting!


### LSTM

Для чистоты эксперимента пока без Bidirectional

In [319]:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Input([None],dtype='int32'))
model.add(tf.keras.layers.Embedding(len(all_words),50))
model.add(tf.keras.layers.LSTM(64,return_sequences=True))

stepwise_dense = tf.keras.layers.Dense(len(all_tags),activation='softmax')
stepwise_dense = tf.keras.layers.TimeDistributed(stepwise_dense)
model.add(stepwise_dense)

In [320]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.CategoricalCrossentropy(),
)

model.fit_generator(generate_batches(train_data), len(train_data)/BATCH_SIZE, callbacks=[EvaluateAccuracy()], epochs=5,)

Epoch 1/5

Measuring validation accuracy...

Validation accuracy: 0.93931

Epoch 2/5

Measuring validation accuracy...

Validation accuracy: 0.94527

Epoch 3/5

Measuring validation accuracy...

Validation accuracy: 0.94729

Epoch 4/5

Measuring validation accuracy...

Validation accuracy: 0.94884

Epoch 5/5

Measuring validation accuracy...

Validation accuracy: 0.94913



<keras.callbacks.History at 0x28531c2ad90>

LSTM показала результат чуть лучше, чем RNN, но обучилась намного быстрее

### GRU

In [322]:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Input([None],dtype='int32'))
model.add(tf.keras.layers.Embedding(len(all_words),50))
model.add(tf.keras.layers.GRU(64, return_sequences=True))

stepwise_dense = tf.keras.layers.Dense(len(all_tags),activation='softmax')
stepwise_dense = tf.keras.layers.TimeDistributed(stepwise_dense)
model.add(stepwise_dense)

In [323]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.CategoricalCrossentropy(),
)

model.fit_generator(generate_batches(train_data), len(train_data)/BATCH_SIZE, callbacks=[EvaluateAccuracy()], epochs=5,)

Epoch 1/5

Measuring validation accuracy...

Validation accuracy: 0.93931

Epoch 2/5

Measuring validation accuracy...

Validation accuracy: 0.94583

Epoch 3/5

Measuring validation accuracy...

Validation accuracy: 0.94763

Epoch 4/5

Measuring validation accuracy...

Validation accuracy: 0.94891

Epoch 5/5

Measuring validation accuracy...

Validation accuracy: 0.94952



<keras.callbacks.History at 0x2869f3ad6d0>

GRU сеть показала такой же результат, как LSTM

### LSTM / trainable=True

In [39]:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Input([None],dtype='int32'))
model.add(tf.keras.layers.Embedding(len(all_words),50, trainable=True))
model.add(tf.keras.layers.LSTM(64,return_sequences=True))

stepwise_dense = tf.keras.layers.Dense(len(all_tags),activation='softmax')
stepwise_dense = tf.keras.layers.TimeDistributed(stepwise_dense)
model.add(stepwise_dense)

In [42]:
model.summary()

Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_6 (Embedding)      (None, None, 50)          500100    
_________________________________________________________________
lstm_3 (LSTM)                (None, None, 64)          29440     
_________________________________________________________________
time_distributed_5 (TimeDist (None, None, 14)          910       
Total params: 530,450
Trainable params: 530,450
Non-trainable params: 0
_________________________________________________________________


In [325]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.CategoricalCrossentropy(),
)

model.fit_generator(generate_batches(train_data), len(train_data)/BATCH_SIZE, callbacks=[EvaluateAccuracy()], epochs=5,)

Epoch 1/5

Measuring validation accuracy...

Validation accuracy: 0.93959

Epoch 2/5

Measuring validation accuracy...

Validation accuracy: 0.94468

Epoch 3/5

Measuring validation accuracy...

Validation accuracy: 0.94765

Epoch 4/5

Measuring validation accuracy...

Validation accuracy: 0.94842

Epoch 5/5

Measuring validation accuracy...

Validation accuracy: 0.94903



<keras.callbacks.History at 0x286c4647b50>

Результат даже немного ухудшился..

### LSTM / trainable=True / Bidirectional

In [38]:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Input([None],dtype='int32'))
model.add(tf.keras.layers.Embedding(len(all_words),50, trainable=True))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64,return_sequences=True)))

stepwise_dense = tf.keras.layers.Dense(len(all_tags),activation='softmax')
stepwise_dense = tf.keras.layers.TimeDistributed(stepwise_dense)
model.add(stepwise_dense)

In [23]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.CategoricalCrossentropy(),
)

model.fit_generator(generate_batches(train_data), len(train_data)/BATCH_SIZE, callbacks=[EvaluateAccuracy()], epochs=5,)

Epoch 1/5

Measuring validation accuracy...

Validation accuracy: 0.95417

Epoch 2/5

Measuring validation accuracy...

Validation accuracy: 0.96013

Epoch 3/5

Measuring validation accuracy...

Validation accuracy: 0.96309

Epoch 4/5

Measuring validation accuracy...

Validation accuracy: 0.96405

Epoch 5/5

Measuring validation accuracy...

Validation accuracy: 0.96547



<keras.callbacks.History at 0x28af93de8e0>

Bidirectional по прежнему работает

### LSTM / trainable=True / Bidirectional / pre-trained embeddings

In [15]:
from keras.preprocessing.text import Tokenizer
from gensim.models import KeyedVectors

In [16]:
X = []

for sentence in train_data:
    X_sentence = []
    for entity in sentence:         
        X_sentence.append(entity[0])
        
    X.append(X_sentence)

In [17]:
word_tokenizer = Tokenizer()
word_tokenizer.fit_on_texts(X) 

In [18]:
path = 'C:/Users/Elena/gensim-data/word2vec-google-news-300/GoogleNews-vectors-negative300.bin'

word2vec = KeyedVectors.load_word2vec_format(path, binary=True)

In [19]:
EMBEDDING_SIZE  = 300
VOCABULARY_SIZE = len(word_tokenizer.word_index) + 1

embedding_weights = np.zeros((VOCABULARY_SIZE, EMBEDDING_SIZE))

word2id = word_tokenizer.word_index

for word, index in word2id.items():
    try:
        embedding_weights[index, :] = word2vec[word]
    except KeyError:
        pass

In [34]:
embedding_weights.shape

(43851, 300)

In [43]:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Input([None],dtype='int32'))
model.add(tf.keras.layers.Embedding(VOCABULARY_SIZE, EMBEDDING_SIZE, trainable=True, weights=[embedding_weights]))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64,return_sequences=True)))

stepwise_dense = tf.keras.layers.Dense(len(all_tags),activation='softmax')
stepwise_dense = tf.keras.layers.TimeDistributed(stepwise_dense)
model.add(stepwise_dense)

In [45]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.CategoricalCrossentropy(),
)

model.fit_generator(generate_batches(train_data), len(train_data)/BATCH_SIZE, callbacks=[EvaluateAccuracy()], epochs=10,)

Epoch 1/10

Measuring validation accuracy...

Validation accuracy: 0.96536

Epoch 2/10

Measuring validation accuracy...

Validation accuracy: 0.96432

Epoch 3/10

Measuring validation accuracy...

Validation accuracy: 0.96394

Epoch 4/10

Measuring validation accuracy...

Validation accuracy: 0.96208

Epoch 5/10

Measuring validation accuracy...

Validation accuracy: 0.96179

Epoch 6/10

Measuring validation accuracy...

Validation accuracy: 0.96200

Epoch 7/10

Measuring validation accuracy...

Validation accuracy: 0.96058

Epoch 8/10

Measuring validation accuracy...

Validation accuracy: 0.96016

Epoch 9/10

Measuring validation accuracy...

Validation accuracy: 0.96050

Epoch 10/10

Measuring validation accuracy...

Validation accuracy: 0.96032



<keras.callbacks.History at 0x28bec3c7550>

Что-то пошло не так.. Пока оставим

### Добавлю ещё один слой Bidirectional

In [46]:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Input([None],dtype='int32'))
model.add(tf.keras.layers.Embedding(VOCABULARY_SIZE, EMBEDDING_SIZE, trainable=True, weights=[embedding_weights]))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64,return_sequences=True)))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64,return_sequences=True)))
stepwise_dense = tf.keras.layers.Dense(len(all_tags),activation='softmax')
stepwise_dense = tf.keras.layers.TimeDistributed(stepwise_dense)
model.add(stepwise_dense)

In [47]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.CategoricalCrossentropy(),
)

model.fit_generator(generate_batches(train_data), len(train_data)/BATCH_SIZE, callbacks=[EvaluateAccuracy()], epochs=5,)

Epoch 1/5

Measuring validation accuracy...

Validation accuracy: 0.95942

Epoch 2/5

Measuring validation accuracy...

Validation accuracy: 0.96371

Epoch 3/5

Measuring validation accuracy...

Validation accuracy: 0.96514

Epoch 4/5

Measuring validation accuracy...

Validation accuracy: 0.96703

Epoch 5/5

Measuring validation accuracy...

Validation accuracy: 0.96708



<keras.callbacks.History at 0x28c0c6d64c0>

Стало чуть лучше

### Добавлю Dropout1D

In [48]:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Input([None],dtype='int32'))
model.add(tf.keras.layers.Embedding(VOCABULARY_SIZE, EMBEDDING_SIZE, trainable=True, weights=[embedding_weights]))
model.add(tf.keras.layers.SpatialDropout1D(0.2))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64,return_sequences=True)))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64,return_sequences=True)))
stepwise_dense = tf.keras.layers.Dense(len(all_tags),activation='softmax')
stepwise_dense = tf.keras.layers.TimeDistributed(stepwise_dense)
model.add(stepwise_dense)

In [49]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.CategoricalCrossentropy(),
)

model.fit_generator(generate_batches(train_data), len(train_data)/BATCH_SIZE, callbacks=[EvaluateAccuracy()], epochs=5,)

Epoch 1/5

Measuring validation accuracy...

Validation accuracy: 0.95965

Epoch 2/5

Measuring validation accuracy...

Validation accuracy: 0.96418

Epoch 3/5

Measuring validation accuracy...

Validation accuracy: 0.96659

Epoch 4/5

Measuring validation accuracy...

Validation accuracy: 0.96707

Epoch 5/5

Measuring validation accuracy...

Validation accuracy: 0.96774



<keras.callbacks.History at 0x28bec3c7e80>

Модель не улучшилась, но и хуже не стало

### Добавлю свёртку

In [20]:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Input([None],dtype='int32'))
model.add(tf.keras.layers.Embedding(VOCABULARY_SIZE, EMBEDDING_SIZE, trainable=True, weights=[embedding_weights]))
model.add(tf.keras.layers.SpatialDropout1D(0.2))
model.add(tf.keras.layers.Conv1D(128, kernel_size=1))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64,return_sequences=True)))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64,return_sequences=True)))
stepwise_dense = tf.keras.layers.Dense(len(all_tags),activation='softmax')
stepwise_dense = tf.keras.layers.TimeDistributed(stepwise_dense)
model.add(stepwise_dense)

In [22]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.CategoricalCrossentropy(),
)

model.fit_generator(generate_batches(train_data), len(train_data)/BATCH_SIZE, callbacks=[EvaluateAccuracy()], epochs=10,)

Epoch 1/10

Measuring validation accuracy...

Validation accuracy: 0.96771

Epoch 2/10

Measuring validation accuracy...

Validation accuracy: 0.96788

Epoch 3/10

Measuring validation accuracy...

Validation accuracy: 0.96705

Epoch 4/10

Measuring validation accuracy...

Validation accuracy: 0.96796

Epoch 5/10

Measuring validation accuracy...

Validation accuracy: 0.96726

Epoch 6/10

Measuring validation accuracy...

Validation accuracy: 0.96698

Epoch 7/10

Measuring validation accuracy...

Validation accuracy: 0.96707

Epoch 8/10

Measuring validation accuracy...

Validation accuracy: 0.96655

Epoch 9/10

Measuring validation accuracy...

Validation accuracy: 0.96644

Epoch 10/10

Measuring validation accuracy...

Validation accuracy: 0.96600



<keras.callbacks.History at 0x265dd511f70>

По итогу, удалось улучшить качесвто модели, но не сильно