# Make LSTMs Great Again

# Named Entity Recognition on Twitter Data

## Reading the Data

The corpus contains tweets and their named entity tags. Each line holds one token and its tag, separated by a space.

Tweets are separated by an empty line.

Usernames (tokens starting with `@`) are replaced with `<USR>`, and URLs (tokens starting with `http://` or `https://`) are replaced with `<URL>`.

In [1]:
def read_data(file_path):
    tokens = []  # list of tweets, each a list of tokens
    tags = []    # list of tweets, each a list of tags aligned with its tokens

    tweet_tokens = []
    tweet_tags = []
    with open(file_path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()  # remove leading and trailing whitespace
            if not line:
                # An empty line ends the current tweet
                if tweet_tokens:
                    tokens.append(tweet_tokens)
                    tags.append(tweet_tags)
                tweet_tokens = []
                tweet_tags = []
            else:
                token, tag = line.split()
                if token.startswith("@"):
                    token = "<USR>"  # replace usernames with <USR>
                elif token.startswith("http://") or token.startswith("https://"):
                    token = "<URL>"  # replace links with <URL>
                tweet_tokens.append(token)
                tweet_tags.append(tag)
    # Keep the last tweet if the file does not end with an empty line
    if tweet_tokens:
        tokens.append(tweet_tokens)
        tags.append(tweet_tags)
    return tokens, tags
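To make the expected file layout concrete, here is a minimal self-contained sketch that applies the same parsing rules to an in-memory string instead of a file. The function name `parse_tagged_tweets` and the two-tweet sample are made up for illustration; the real corpus lives in `Data/*.txt`.

```python
import io

def parse_tagged_tweets(stream):
    """Hypothetical in-memory twin of read_data: one 'token tag' pair per line,
    tweets separated by empty lines; @-mentions -> <USR>, links -> <URL>."""
    tokens, tags = [], []
    tweet_tokens, tweet_tags = [], []
    for line in stream:
        line = line.strip()
        if not line:
            # An empty line closes the current tweet
            if tweet_tokens:
                tokens.append(tweet_tokens)
                tags.append(tweet_tags)
            tweet_tokens, tweet_tags = [], []
            continue
        token, tag = line.split()
        if token.startswith("@"):
            token = "<USR>"
        elif token.startswith(("http://", "https://")):
            token = "<URL>"
        tweet_tokens.append(token)
        tweet_tags.append(tag)
    if tweet_tokens:  # flush the last tweet if there is no trailing empty line
        tokens.append(tweet_tokens)
        tags.append(tweet_tags)
    return tokens, tags

# Invented two-tweet sample in the corpus format
sample = "RT O\n@someone O\nGhostland B-musicartist\nObservatory I-musicartist\n\nsee O\nhttp://t.co/x O\n"
toks, tgs = parse_tagged_tweets(io.StringIO(sample))
print(toks)
print(tgs)
```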

### Loading the Train, Validation and Test Data

In [2]:
train_tokens, train_tags = read_data('Data/train.txt')
validation_tokens, validation_tags = read_data('Data/validation.txt')
test_tokens, test_tags = read_data('Data/test.txt')

### Exploring the Data

In [3]:
for word in train_tokens[0]: print(word, end=" ")

RT <USR> : Online ticket sales for Ghostland Observatory extended until 6 PM EST due to high demand . Get them before they sell out ... 

In [4]:
for tag in train_tags[0]: print(tag, end=" ")

O O O O O O O B-musicartist I-musicartist O O O O O O O O O O O O O O O O O 

Each element of `train_tokens` is a tweet, which is in turn a list of tokens.

In [5]:
print("We have", len(train_tokens), "tweets")

We have 5795 tweets


Check that the train, validation and test sets have no tweets with missing tags.

In [6]:
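for name, toks, tgs in [('train', train_tokens, train_tags),
                        ('validation', validation_tokens, validation_tags),
                        ('test', test_tokens, test_tags)]:
    # Same number of tweets and tag sequences...
    assert len(toks) == len(tgs), name + " mismatch: number of tweets"
    # ...and the same number of tokens and tags inside every tweet
    assert all(len(a) == len(b) for a, b in zip(toks, tgs)), name + " mismatch: tweet length"
print("Data all set")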

Data all set


## Preparing the Dictionaries

We need two mappings to train the network:

1. token --> tokenID
2. tag --> tagID

A tokenID indexes a row of the embedding matrix.

A tagID is the integer index of a tag; it is one-hot encoded (turned into a dummy vector) when the loss is computed.
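A minimal NumPy sketch of what the two mappings are used for, with toy sizes chosen purely for illustration: a token ID selects a row of the embedding matrix, and a tag ID selects a one-hot row of an identity matrix.

```python
import numpy as np

# Toy sizes, assumptions for illustration only
vocabulary_size, embedding_dim, n_tags = 6, 4, 3
rng = np.random.default_rng(0)
embedding_matrix = rng.standard_normal((vocabulary_size, embedding_dim))

token_ids = np.array([2, 5, 2])   # a 3-token sentence, already mapped to IDs
tag_ids = np.array([0, 1, 2])     # its tags, already mapped to IDs

# tokenID -> row of the embedding matrix
embedded = embedding_matrix[token_ids]   # shape (3, embedding_dim)
# tagID -> one-hot ("dummy") vector over the tag set
one_hot = np.eye(n_tags)[tag_ids]        # shape (3, n_tags)

print(embedded.shape, one_hot.shape)
```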

In [7]:
from collections import defaultdict

def build_dict(tokens_or_tags, special_tokens):
    """Build token <-> ID mappings.

    tokens_or_tags: list of lists of tokens (or tags)
    special_tokens: tokens that receive the first IDs, in order
    """
    # Dict with default value 0, so any unseen key maps to the first special token
    tok2idx = defaultdict(lambda: 0)
    idx2tok = []
    k = 0
    
    for line in special_tokens:
        tok2idx[line] = k
        k += 1
        idx2tok.append(line)
        
    for tokens in tokens_or_tags:
        for token in tokens:
            if token not in tok2idx:
                tok2idx[token] = k
                k += 1
                idx2tok.append(token)
    return tok2idx, idx2tok

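Special tokens:

`<UNK>`: unknown tokens, i.e. those outside the vocabulary. Because `build_dict` uses a `defaultdict` with default value 0 and `<UNK>` is the first special token, every unseen token maps to `<UNK>`.

`<PAD>`: padding token used to bring all sentences in a batch to the same length.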
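The effect of the `defaultdict` default can be seen on a toy vocabulary. This is a simplified re-implementation of the same idea as `build_dict`, not the notebook's function itself, and the sentences are invented.

```python
from collections import defaultdict

special_tokens = ['<UNK>', '<PAD>']
toy_sentences = [['RT', '<USR>', ':'], ['new', 'album', 'out']]

# Same idea as build_dict: special tokens first, then every new token in order of appearance
tok2idx = defaultdict(lambda: 0)   # any unseen token maps to 0, i.e. <UNK>
idx2tok = []
for tok in special_tokens + [t for sent in toy_sentences for t in sent]:
    if tok not in tok2idx:         # membership tests do not insert keys
        tok2idx[tok] = len(idx2tok)
        idx2tok.append(tok)

# Looking up an unseen token returns 0 and, as a side effect, inserts it into tok2idx
print(tok2idx['album'], tok2idx['<PAD>'], tok2idx['never-seen'])
```

Reading a missing key from a `defaultdict` also inserts it, so lookups of unknown words add entries (all mapped to 0) to the dictionary, while the list `idx2tok` is unaffected.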

In [8]:
special_tokens = ['<UNK>', '<PAD>']
special_tags = ['O']

# Create the Dictionaries
token2idx, idx2token = build_dict(train_tokens + validation_tokens, special_tokens) # for tokens
tag2idx, idx2tag = build_dict(train_tags, special_tags) # for tags

In [9]:
tag2idx

defaultdict(<function __main__.build_dict.<locals>.<lambda>()>,
            {'O': 0,
             'B-musicartist': 1,
             'I-musicartist': 2,
             'B-product': 3,
             'I-product': 4,
             'B-company': 5,
             'B-person': 6,
             'B-other': 7,
             'I-other': 8,
             'B-facility': 9,
             'I-facility': 10,
             'B-sportsteam': 11,
             'B-geo-loc': 12,
             'I-geo-loc': 13,
             'I-company': 14,
             'I-person': 15,
             'B-movie': 16,
             'I-movie': 17,
             'B-tvshow': 18,
             'I-tvshow': 19,
             'I-sportsteam': 20})

We have 20 named entity tags (the `B-` and `I-` variants of 10 entity types) plus the non-entity tag `O`, 21 tags in total.
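The count can be verified from the tag list printed above by stripping the `B-`/`I-` prefixes:

```python
tags = ['O', 'B-musicartist', 'I-musicartist', 'B-product', 'I-product',
        'B-company', 'B-person', 'B-other', 'I-other', 'B-facility',
        'I-facility', 'B-sportsteam', 'B-geo-loc', 'I-geo-loc', 'I-company',
        'I-person', 'B-movie', 'I-movie', 'B-tvshow', 'I-tvshow', 'I-sportsteam']

entity_tags = [t for t in tags if t != 'O']
entity_types = sorted({t[2:] for t in entity_tags})  # strip the 'B-'/'I-' prefix
print(len(tags), len(entity_tags), len(entity_types))
print(entity_types)
```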

Helper functions that convert a sentence's tokens or tags to IDs and back:

In [10]:
def words2idxs(tokens_list):
    return [token2idx[word] for word in tokens_list]

def tags2idxs(tags_list):
    return [tag2idx[tag] for tag in tags_list]

def idxs2words(idxs):
    return [idx2token[idx] for idx in idxs]

def idxs2tags(idxs):
    return [idx2tag[idx] for idx in idxs]

## Generating Batches

In [11]:
import numpy as np

def batches_generator(batch_size, tokens, tags,
                      shuffle=True, allow_smaller_last_batch=True):
    """Generates padded batches of tokens and tags."""
    
    n_samples = len(tokens)
    if shuffle:
        order = np.random.permutation(n_samples)
    else:
        order = np.arange(n_samples)

    n_batches = n_samples // batch_size
    if allow_smaller_last_batch and n_samples % batch_size:
        n_batches += 1

    for k in range(n_batches):
        batch_start = k * batch_size
        batch_end = min((k + 1) * batch_size, n_samples)
        current_batch_size = batch_end - batch_start
        x_list = []
        y_list = []
        max_len_token = 0
        for idx in order[batch_start: batch_end]:
            x_list.append(words2idxs(tokens[idx]))
            y_list.append(tags2idxs(tags[idx]))
            max_len_token = max(max_len_token, len(tags[idx]))
            
        # Fill in the data into numpy nd-arrays filled with padding indices.
        x = np.ones([current_batch_size, max_len_token], dtype=np.int32) * token2idx['<PAD>']
        y = np.ones([current_batch_size, max_len_token], dtype=np.int32) * tag2idx['O']
        lengths = np.zeros(current_batch_size, dtype=np.int32)
        for n in range(current_batch_size):
            utt_len = len(x_list[n])
            x[n, :utt_len] = x_list[n]
            lengths[n] = utt_len
            y[n, :utt_len] = y_list[n]
        yield x, y, lengths
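The padding step inside `batches_generator` can be illustrated on a hand-made batch of two tweets. The token and tag IDs are made up, and `PAD_ID = 1`, `O_ID = 0` mirror the dictionaries built above (`<PAD>` is the second special token, `O` the first tag); `np.full` is used here in place of `np.ones(...) * pad`, which yields the same array.

```python
import numpy as np

PAD_ID, O_ID = 1, 0              # assumed IDs, matching token2idx['<PAD>'] and tag2idx['O'] above
x_list = [[5, 7, 9], [4, 2]]     # two tweets, already converted to token IDs
y_list = [[0, 3, 4], [0, 0]]     # their tag IDs

max_len = max(len(s) for s in x_list)
# Start from arrays filled with the padding values, then copy each sequence into its row
x = np.full((len(x_list), max_len), PAD_ID, dtype=np.int32)
y = np.full((len(y_list), max_len), O_ID, dtype=np.int32)
lengths = np.array([len(s) for s in x_list], dtype=np.int32)
for n, (xs, ys) in enumerate(zip(x_list, y_list)):
    x[n, :len(xs)] = xs
    y[n, :len(ys)] = ys

print(x)
print(y)
print(lengths)
```

Padded tag positions are filled with `O`, so the loss must ignore them explicitly; the `lengths` array tells the RNN where each tweet really ends.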

## Building a Bidirectional RNN with TensorFlow

The tag of a token depends on both its left and its right context, so we use a bidirectional RNN.
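To show what "both contexts" means, here is a toy NumPy sketch of a bidirectional RNN. For brevity it uses a plain tanh (Elman) cell instead of the LSTM cell used in the model, and all sizes and weights are arbitrary. The backward pass runs over the reversed sequence and its outputs are flipped back, so both directions line up per token before concatenation.

```python
import numpy as np

def simple_rnn(inputs, W_x, W_h, b):
    """Run a tanh RNN over inputs of shape (T, D); return all hidden states, shape (T, H)."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in inputs:
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
T, D, H = 5, 3, 4                        # toy sequence length, input size, hidden size
inputs = rng.standard_normal((T, D))
fw = [rng.standard_normal(s) * 0.1 for s in [(D, H), (H, H), (H,)]]  # W_x, W_h, b
bw = [rng.standard_normal(s) * 0.1 for s in [(D, H), (H, H), (H,)]]

out_fw = simple_rnn(inputs, *fw)              # state at t has seen tokens 0..t
out_bw = simple_rnn(inputs[::-1], *bw)[::-1]  # state at t has seen tokens t..T-1
output = np.concatenate([out_fw, out_bw], axis=-1)  # (T, 2H): both contexts per token
print(output.shape)
```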

In [12]:
import tensorflow as tf
import numpy as np

class BiLSTMModel():
    pass

### Creating the Placeholders

Placeholders are created for the following inputs to the RNN:

input_batch: sequences of token IDs (shape [batch_size, sequence_len]);

ground_truth_tags: sequences of tag IDs (shape [batch_size, sequence_len]);

lengths: lengths of the unpadded sequences (shape [batch_size]);

dropout_ph: dropout keep probability; this placeholder defaults to 1 (no dropout);

learning_rate_ph: learning rate; it is a placeholder because we want to change its value during training.


In [13]:
# Function that declares the placeholders to be fed into the model
def declare_placeholders(self):
    # Placeholders for input and ground truth output.
    self.input_batch = tf.placeholder(dtype=tf.int32, shape=[None, None], name='input_batch') 
    self.ground_truth_tags = tf.placeholder(dtype=tf.int32, shape=[None, None], name='ground_truth_tags')
  
    # Placeholder for lengths of the sequences.
    self.lengths = tf.placeholder(dtype=tf.int32, shape=[None], name='lengths') 
    
    # Placeholder for the dropout keep probability; defaults to 1.0 (no dropout), e.g. at prediction time
    self.dropout_ph = tf.placeholder_with_default(tf.cast(1.0, tf.float32), shape=[])
    
    # Placeholder for a learning rate (tf.float32).
    self.learning_rate_ph = tf.placeholder(dtype=tf.float32, shape=[], name='learning_rate_placeholder')

In [14]:
# add the declare placeholder function to the BiLSTMModel class
BiLSTMModel.__declare_placeholders = classmethod(declare_placeholders)

### Preparatory Steps for the TensorFlow BiRNN Cells

1. Create the embedding matrix (a TensorFlow Variable) with random initialization

2. Create the forward and backward RNN cells using `BasicLSTMCell`

3. Wrap each cell in a `DropoutWrapper`

4. Build the computational graph
    
    Look up the embeddings of the input batch in the embedding matrix
    
    Pass the embeddings through `bidirectional_dynamic_rnn` with the forward and backward cells
    (the lengths placeholder avoids computation on padding tokens and makes the backward direction start at each sentence's last real token)
    
    Add a dense layer on top whose outputs (the logits) feed the loss function
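The shapes flowing through steps 1 and 4 can be traced with a NumPy sketch. Sizes and IDs are toy values, the BiRNN output is replaced by random numbers, and `1` is assumed to be the `<PAD>` ID as in the dictionaries above.

```python
import numpy as np

rng = np.random.default_rng(0)
vocabulary_size, embedding_dim, n_hidden_rnn, n_tags = 10, 6, 4, 21  # toy sizes
batch = np.array([[2, 5, 7, 1], [3, 4, 1, 1]])  # token IDs, 1 = <PAD> (assumed)

# Step 1: random embedding matrix, scaled by 1/sqrt(embedding_dim) as in build_layers
E = rng.standard_normal((vocabulary_size, embedding_dim)) / np.sqrt(embedding_dim)
# Step 4: embedding lookup is plain row indexing
embeddings = E[batch]                                     # (batch, time, embedding_dim)
# Stand-in for the BiRNN output: forward and backward states concatenated
rnn_output = rng.standard_normal(batch.shape + (2 * n_hidden_rnn,))
# Step 4: dense layer without activation, applied at every time step -> logits
W = rng.standard_normal((2 * n_hidden_rnn, n_tags))
b = np.zeros(n_tags)
logits = rnn_output @ W + b                               # (batch, time, n_tags)
print(embeddings.shape, logits.shape)
```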

In [15]:
def build_layers(self, vocabulary_size, embedding_dim, n_hidden_rnn, n_tags):
    # Embedding matrix, randomly initialized and scaled by 1/sqrt(embedding_dim)
    initial_embedding_matrix = np.random.randn(vocabulary_size, embedding_dim) / np.sqrt(embedding_dim)
    embedding_matrix_variable = tf.Variable(dtype=tf.float32, initial_value=initial_embedding_matrix,
                                            name='embeddings_matrix')

    # Forward and backward LSTM cells, each with dropout on inputs, outputs and state
    forward_cell = tf.nn.rnn_cell.DropoutWrapper(tf.nn.rnn_cell.BasicLSTMCell(num_units=n_hidden_rnn),
                                                 input_keep_prob=self.dropout_ph,
                                                 output_keep_prob=self.dropout_ph,
                                                 state_keep_prob=self.dropout_ph)
    backward_cell = tf.nn.rnn_cell.DropoutWrapper(tf.nn.rnn_cell.BasicLSTMCell(num_units=n_hidden_rnn),
                                                  input_keep_prob=self.dropout_ph,
                                                  output_keep_prob=self.dropout_ph,
                                                  state_keep_prob=self.dropout_ph)

    # Look up the embeddings for self.input_batch: [batch, time, embedding_dim]
    embeddings = tf.nn.embedding_lookup(embedding_matrix_variable, self.input_batch)

    # Run the bidirectional dynamic RNN; self.lengths marks where each sequence really ends
    (rnn_output_fw, rnn_output_bw), _ = tf.nn.bidirectional_dynamic_rnn(cell_fw=forward_cell,
                                                                        cell_bw=backward_cell,
                                                                        inputs=embeddings,
                                                                        sequence_length=self.lengths,
                                                                        dtype=tf.float32)
    # Concatenate both directions: [batch, time, 2 * n_hidden_rnn]
    rnn_output = tf.concat([rnn_output_fw, rnn_output_bw], axis=2)

    # Dense layer on top (no activation) produces the logits: [batch, time, n_tags]
    self.logits = tf.layers.dense(rnn_output, n_tags, activation=None)

In [16]:
BiLSTMModel.__build_layers = classmethod(build_layers)

Apply softmax to the logits of the last layer, then take the argmax over the tag axis to get the predicted tag IDs.
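Because softmax is monotonic, taking the argmax of the probabilities gives the same tags as taking the argmax of the raw logits. A quick NumPy check on random toy logits:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
logits = rng.standard_normal((2, 5, 21))      # (batch, time, n_tags), toy values
probs = softmax(logits)
predictions = probs.argmax(axis=-1)           # predicted tag IDs, shape (2, 5)

print(predictions.shape)
print(np.array_equal(predictions, logits.argmax(axis=-1)))
```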

In [17]:
def compute_predictions(self):
    softmax_output = tf.nn.softmax(self.logits)
    self.predictions = tf.argmax(softmax_output, axis=-1)
    
BiLSTMModel.__compute_predictions = classmethod(compute_predictions)

### Loss functions

We use the cross-entropy loss computed from the logits (not from the softmax probabilities), and mask out the PAD positions before averaging.
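A NumPy sketch of the masked loss, assuming `<PAD>` has ID 1 as in the dictionaries above. The mask is derived from the input token IDs, and the sum of masked losses is divided by the number of real tokens so that padding does not dilute the average. The function name `masked_cross_entropy` is hypothetical.

```python
import numpy as np

def masked_cross_entropy(logits, tag_ids, token_ids, pad_index):
    """Mean cross-entropy over non-PAD positions. logits: (B, T, n_tags); ids: (B, T)."""
    # Numerically stable log-softmax
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    # Cross-entropy against a one-hot label is -log p(true tag)
    per_token = -np.take_along_axis(log_probs, tag_ids[..., None], axis=-1)[..., 0]
    mask = (token_ids != pad_index).astype(np.float64)  # 1 for real tokens, 0 for PAD
    return (per_token * mask).sum() / mask.sum()

PAD = 1
token_ids = np.array([[4, 6, 1]])   # the last position is padding
tag_ids = np.array([[0, 2, 0]])
logits = np.zeros((1, 3, 3))        # uniform predictions over 3 tags
loss = masked_cross_entropy(logits, tag_ids, token_ids, PAD)
print(loss, np.log(3))
```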

In [18]:
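def compute_loss(self, n_tags, PAD_index):
    ground_truth_tags_one_hot = tf.one_hot(self.ground_truth_tags, n_tags)
    # Per-token cross-entropy computed from the logits: [batch, time]
    loss_tensor = tf.nn.softmax_cross_entropy_with_logits_v2(labels=ground_truth_tags_one_hot,
                                                             logits=self.logits)
    # Mask is 1 for real tokens and 0 for <PAD>; it is built from the input IDs, not from the loss values
    mask = tf.cast(tf.not_equal(self.input_batch, PAD_index), tf.float32)
    # Average only over the real (non-PAD) tokens
    self.loss = tf.reduce_sum(loss_tensor * mask) / tf.reduce_sum(mask)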
    
BiLSTMModel.__compute_loss = classmethod(compute_loss)

### Optimizers

We use the Adam optimizer with its default β1, β2 and ε and a user-defined learning rate η.
To prevent exploding gradients, each gradient is clipped with `tf.clip_by_norm`.
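The rule documented for `tf.clip_by_norm` (if the L2 norm of `t` exceeds `clip_norm`, rescale `t` to `t * clip_norm / l2norm(t)`, otherwise leave it unchanged) is easy to reproduce in NumPy. This is a re-implementation for illustration, not TensorFlow code.

```python
import numpy as np

def clip_by_norm(t, clip_norm):
    """Rescale t so that its L2 norm is at most clip_norm, keeping its direction."""
    norm = np.sqrt(np.sum(t ** 2))
    return t * clip_norm / max(norm, clip_norm)

g_big = np.array([3.0, 4.0])     # norm 5 is rescaled to norm 1
g_small = np.array([0.3, 0.4])   # norm 0.5 is left unchanged
print(clip_by_norm(g_big, 1.0), clip_by_norm(g_small, 1.0))
```

Clipping is applied to each gradient tensor separately; `tf.clip_by_global_norm` would instead rescale all gradients jointly by their combined norm.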

In [19]:
def perform_optimization(self):
    # Adam optimizer with its default beta1, beta2 and epsilon; the learning rate is fed at run time
    self.optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate_ph, beta1=0.9, beta2=0.999, epsilon=1e-8)
    self.grads_and_vars = self.optimizer.compute_gradients(self.loss)
    # Gradient clipping: rescale each gradient tensor to an L2 norm of at most clip_norm,
    # keeping (None, var) pairs for variables that receive no gradient
    clip_norm = tf.cast(1.0, tf.float32)
    self.grads_and_vars = [(None, var) if grad is None else (tf.clip_by_norm(grad, clip_norm), var)
                           for grad, var in self.grads_and_vars]
    self.train_op = self.optimizer.apply_gradients(self.grads_and_vars)
    
BiLSTMModel.__perform_optimization = classmethod(perform_optimization)

### Class Constructor

In [20]:
def init_model(self, vocabulary_size, n_tags, embedding_dim, n_hidden_rnn, PAD_index):
    self.__declare_placeholders()
    self.__build_layers(vocabulary_size, embedding_dim, n_hidden_rnn, n_tags)
    self.__compute_predictions()
    self.__compute_loss(n_tags, PAD_index)
    self.__perform_optimization()

BiLSTMModel.__init__ = classmethod(init_model)

## Training the Network and Predicting the Named Entity Tags

In [21]:
# Function to train on a batch
def train_on_batch(self, session, x_batch, y_batch, lengths, learning_rate, dropout_keep_probability):
    feed_dict = {self.input_batch: x_batch,
                 self.ground_truth_tags: y_batch,
                 self.learning_rate_ph: learning_rate,
                 self.dropout_ph: dropout_keep_probability,
                 self.lengths: lengths}
    
    session.run(self.train_op, feed_dict=feed_dict)
BiLSTMModel.train_on_batch = classmethod(train_on_batch)

# Function to predict from a batch
def predict_for_batch(self, session, x_batch, lengths):
    feed_dict={self.input_batch:x_batch,
               self.lengths:lengths}
    predictions = session.run(self.predictions, feed_dict=feed_dict)
    return predictions
BiLSTMModel.predict_for_batch = classmethod(predict_for_batch)

## Model Evaluation Functions

In [25]:
## CODE TAKEN FROM
## https://github.com/hse-aml/natural-language-processing/blob/master/week2/evaluation.py
from collections import OrderedDict

def _update_chunk(candidate, prev, current_tag, current_chunk, current_pos, prediction=False):
    if candidate == 'B-' + current_tag:
        if len(current_chunk) > 0 and len(current_chunk[-1]) == 1:
            current_chunk[-1].append(current_pos - 1)
        current_chunk.append([current_pos])
    elif candidate == 'I-' + current_tag:
        if prediction and (current_pos == 0 or current_pos > 0 and prev.split('-', 1)[-1] != current_tag):
            current_chunk.append([current_pos])
        if not prediction and (current_pos == 0 or current_pos > 0 and prev == 'O'):
            current_chunk.append([current_pos])
    elif current_pos > 0 and prev.split('-', 1)[-1] == current_tag:
        if len(current_chunk) > 0:
            current_chunk[-1].append(current_pos - 1)

def _update_last_chunk(current_chunk, current_pos):
    if len(current_chunk) > 0 and len(current_chunk[-1]) == 1:
        current_chunk[-1].append(current_pos - 1)

def _tag_precision_recall_f1(tp, fp, fn):
    precision, recall, f1 = 0, 0, 0
    if tp + fp > 0:
        precision = tp / (tp + fp) * 100
    if tp + fn > 0:
        recall = tp / (tp + fn) * 100
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def _aggregate_metrics(results, total_correct):
    total_true_entities = 0
    total_predicted_entities = 0
    total_precision = 0
    total_recall = 0
    total_f1 = 0
    for tag, tag_metrics in results.items():
        n_pred = tag_metrics['n_predicted_entities']
        n_true = tag_metrics['n_true_entities']
        total_true_entities += n_true
        total_predicted_entities += n_pred
        total_precision += tag_metrics['precision'] * n_pred
        total_recall += tag_metrics['recall'] * n_true
    
    accuracy = 0
    if total_true_entities > 0:
        accuracy = total_correct / total_true_entities * 100
    else:
        print('CAUTION! Accuracy equals zero because there are no '\
              'correct entities. Check the correctness of your data.')
    if total_predicted_entities > 0:
        total_precision = total_precision / total_predicted_entities
    if total_true_entities > 0:
        total_recall = total_recall / total_true_entities
    if total_precision + total_recall > 0:
        total_f1 = 2 * total_precision * total_recall / (total_precision + total_recall)
    return total_true_entities, total_predicted_entities, \
           total_precision, total_recall, total_f1, accuracy

def _print_info(n_tokens, total_true_entities, total_predicted_entities, total_correct):
    print('processed {len} tokens ' \
          'with {tot_true} phrases; ' \
          'found: {tot_pred} phrases; ' \
          'correct: {tot_cor}.\n'.format(len=n_tokens,
                                         tot_true=total_true_entities,
                                         tot_pred=total_predicted_entities,
                                         tot_cor=total_correct))

def _print_metrics(accuracy, total_precision, total_recall, total_f1):
    print('precision:  {tot_prec:.2f}%; ' \
          'recall:  {tot_recall:.2f}%; ' \
          'F1:  {tot_f1:.2f}\n'.format(acc=accuracy,
                                           tot_prec=total_precision,
                                           tot_recall=total_recall,
                                           tot_f1=total_f1))

def _print_tag_metrics(tag, tag_results):
    print(('\t%12s' % tag) + ': precision:  {tot_prec:6.2f}%; ' \
                               'recall:  {tot_recall:6.2f}%; ' \
                               'F1:  {tot_f1:6.2f}; ' \
                               'predicted:  {tot_predicted:4d}\n'.format(tot_prec=tag_results['precision'],
                                                                         tot_recall=tag_results['recall'],
                                                                         tot_f1=tag_results['f1'],
                                                                         tot_predicted=tag_results['n_predicted_entities']))

def precision_recall_f1(y_true, y_pred, print_results=True, short_report=False):
    # Find all tags
    tags = sorted(set(tag[2:] for tag in y_true + y_pred if tag != 'O'))

    results = OrderedDict((tag, OrderedDict()) for tag in tags)
    n_tokens = len(y_true)
    total_correct = 0

    # For each tag we find all chunks in the ground truth and in the prediction
    # For each chunk we store starting and ending indices
    for tag in tags:
        true_chunk = list()
        predicted_chunk = list()
        for position in range(n_tokens):
            _update_chunk(y_true[position], y_true[position - 1], tag, true_chunk, position)
            _update_chunk(y_pred[position], y_pred[position - 1], tag, predicted_chunk, position, True)

        _update_last_chunk(true_chunk, position)
        _update_last_chunk(predicted_chunk, position)

        # Then we find all correctly classified intervals
        # True positive results
        tp = sum(chunk in predicted_chunk for chunk in true_chunk)
        total_correct += tp

        # And then just calculate errors of the first and second kind
        # False negative
        fn = len(true_chunk) - tp
        # False positive
        fp = len(predicted_chunk) - tp
        precision, recall, f1 = _tag_precision_recall_f1(tp, fp, fn)

        results[tag]['precision'] = precision
        results[tag]['recall'] = recall
        results[tag]['f1'] = f1
        results[tag]['n_predicted_entities'] = len(predicted_chunk)
        results[tag]['n_true_entities'] = len(true_chunk)

    total_true_entities, total_predicted_entities, \
           total_precision, total_recall, total_f1, accuracy = _aggregate_metrics(results, total_correct)

    if print_results:
        _print_info(n_tokens, total_true_entities, total_predicted_entities, total_correct)
        _print_metrics(accuracy, total_precision, total_recall, total_f1)

        if not short_report:
            for tag, tag_results in results.items():
                _print_tag_metrics(tag, tag_results)
    return results

def predict_tags(model, session, token_idxs_batch, lengths):
    """Performs predictions and transforms indices to tokens and tags."""
    
    tag_idxs_batch = model.predict_for_batch(session, token_idxs_batch, lengths)
    
    tags_batch, tokens_batch = [], []
    for tag_idxs, token_idxs in zip(tag_idxs_batch, token_idxs_batch):
        tags, tokens = [], []
        for tag_idx, token_idx in zip(tag_idxs, token_idxs):
            tags.append(idx2tag[tag_idx])
            tokens.append(idx2token[token_idx])
        tags_batch.append(tags)
        tokens_batch.append(tokens)
        
    return tags_batch, tokens_batch
    
    
def eval_conll(model, session, tokens, tags, short_report=True):
    """Computes NER quality measures using CONLL shared task script."""
    
    y_true, y_pred = [], []
    for x_batch, y_batch, lengths in batches_generator(1, tokens, tags):
        tags_batch, tokens_batch = predict_tags(model, session, x_batch, lengths)
        if len(x_batch[0]) != len(tags_batch[0]):
            raise Exception("Incorrect length of prediction for the input, "
                            "expected length: %i, got: %i" % (len(x_batch[0]), len(tags_batch[0])))
        predicted_tags = []
        ground_truth_tags = []
        for gt_tag_idx, pred_tag, token in zip(y_batch[0], tags_batch[0], tokens_batch[0]): 
            if token != '<PAD>':
                ground_truth_tags.append(idx2tag[gt_tag_idx])
                predicted_tags.append(pred_tag)

        # We extend every prediction and ground truth sequence with 'O' tag
        # to indicate a possible end of entity.
        y_true.extend(ground_truth_tags + ['O'])
        y_pred.extend(predicted_tags + ['O'])
        
    results = precision_recall_f1(y_true, y_pred, print_results=True, short_report=short_report)
    return results
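The metric above is chunk-level: an entity counts as correct only if both its type and its exact span match. A simplified sketch of that idea, with hypothetical function names, using strict BIO decoding (a chunk starts only at a `B-` tag). It is not identical to the upstream script, which, for predictions, also opens a chunk at an `I-` tag whose type differs from the previous tag.

```python
def bio_chunks(tags):
    """Return a set of (type, start, end) spans from a BIO tag list (a chunk starts only at B-)."""
    chunks, start, kind = set(), None, None
    for i, tag in enumerate(tags + ['O']):       # a trailing 'O' closes any open chunk
        if start is not None and tag != 'I-' + kind:
            chunks.add((kind, start, i - 1))
            start, kind = None, None
        if tag.startswith('B-'):
            start, kind = i, tag[2:]
    return chunks

def chunk_f1(y_true, y_pred):
    """Micro-averaged chunk precision, recall and F1, in percent."""
    true_c, pred_c = bio_chunks(y_true), bio_chunks(y_pred)
    tp = len(true_c & pred_c)                     # exact (type, start, end) matches
    precision = 100 * tp / len(pred_c) if pred_c else 0.0
    recall = 100 * tp / len(true_c) if true_c else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = ['B-person', 'I-person', 'O', 'B-geo-loc', 'O']
y_pred = ['B-person', 'I-person', 'O', 'B-company', 'O']
print(chunk_f1(y_true, y_pred))
```

The second entity has the right span but the wrong type, so it counts as both a false positive and a false negative.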

## Running the Training and Prediction

In [23]:
tf.reset_default_graph()

model = BiLSTMModel(vocabulary_size=len(idx2token), n_tags=len(idx2tag),
                    embedding_dim=200, n_hidden_rnn=200, PAD_index=token2idx['<PAD>'])

batch_size = 32
n_epochs = 5
learning_rate = 0.007
learning_rate_decay = np.sqrt(2)
dropout_keep_probability = 0.5

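With `learning_rate_decay = np.sqrt(2)`, dividing the rate by the decay once per epoch halves it every two epochs. A small sketch of the resulting schedule, using `math.sqrt` so it runs without NumPy:

```python
import math

learning_rate = 0.007
learning_rate_decay = math.sqrt(2)
n_epochs = 5

schedule = []
for epoch in range(n_epochs):
    schedule.append(learning_rate)                   # rate used for this epoch's updates
    learning_rate = learning_rate / learning_rate_decay

print([round(lr, 5) for lr in schedule])
```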


### Train the Net

In [26]:
%%time

sess = tf.Session()
sess.run(tf.global_variables_initializer())

print('Start training... \n')
for epoch in range(n_epochs):
    # For each epoch evaluate the model on train and validation data
    print('-' * 20 + ' Epoch {} '.format(epoch+1) + 'of {} '.format(n_epochs) + '-' * 20)
    print('Train data evaluation:')
    eval_conll(model, sess, train_tokens, train_tags, short_report=True)
    print('Validation data evaluation:')
    eval_conll(model, sess, validation_tokens, validation_tags, short_report=True)
    
    # Train the model
    for x_batch, y_batch, lengths in batches_generator(batch_size, train_tokens, train_tags):
        model.train_on_batch(sess, x_batch, y_batch, lengths, learning_rate, dropout_keep_probability)
        
    # Decaying the learning rate
    learning_rate = learning_rate / learning_rate_decay
    
print('...training finished.')

Start training... 

-------------------- Epoch 1 of 5 --------------------
Train data evaluation:
processed 105778 tokens with 4489 phrases; found: 80755 phrases; correct: 204.

precision:  0.25%; recall:  4.54%; F1:  0.48

Validation data evaluation:
processed 12836 tokens with 537 phrases; found: 9701 phrases; correct: 30.

precision:  0.31%; recall:  5.59%; F1:  0.59

-------------------- Epoch 2 of 5 --------------------
Train data evaluation:
processed 105778 tokens with 4489 phrases; found: 1793 phrases; correct: 561.

precision:  31.29%; recall:  12.50%; F1:  17.86

Validation data evaluation:
processed 12836 tokens with 537 phrases; found: 163 phrases; correct: 59.

precision:  36.20%; recall:  10.99%; F1:  16.86

-------------------- Epoch 3 of 5 --------------------
Train data evaluation:
processed 105778 tokens with 4489 phrases; found: 4570 phrases; correct: 2282.

precision:  49.93%; recall:  50.84%; F1:  50.38

Validation data evaluation:
processed 12836 tokens with 537 p

## Evaluation on the Train, Validation and Test Sets

In [27]:
print('-' * 20 + ' Train set quality: ' + '-' * 20)
train_results = eval_conll(model, sess, train_tokens, train_tags, short_report=False)

print('-' * 20 + ' Validation set quality: ' + '-' * 20)
validation_results = eval_conll(model, sess, validation_tokens, validation_tags, short_report=False)

print('-' * 20 + ' Test set quality: ' + '-' * 20)
test_results = eval_conll(model, sess, test_tokens, test_tags, short_report=False)

-------------------- Train set quality: --------------------
processed 105778 tokens with 4489 phrases; found: 4608 phrases; correct: 4166.

precision:  90.41%; recall:  92.80%; F1:  91.59

	     company: precision:   92.60%; recall:   95.33%; F1:   93.95; predicted:   662

	    facility: precision:   87.12%; recall:   90.45%; F1:   88.75; predicted:   326

	     geo-loc: precision:   94.30%; recall:   97.99%; F1:   96.11; predicted:  1035

	       movie: precision:   63.86%; recall:   77.94%; F1:   70.20; predicted:    83

	 musicartist: precision:   80.48%; recall:   87.07%; F1:   83.64; predicted:   251

	       other: precision:   88.12%; recall:   89.17%; F1:   88.64; predicted:   766

	      person: precision:   94.97%; recall:   95.94%; F1:   95.45; predicted:   895

	     product: precision:   87.80%; recall:   92.77%; F1:   90.21; predicted:   336

	  sportsteam: precision:   88.63%; recall:   86.18%; F1:   87.38; predicted:   211

	      tvshow: precision:   72.09%; recall:  

# Conclusion

The final train, validation and test scores are reported in the evaluation output above.

# Future Work

Get GPUs from Negi Sir

Use GRUs instead of LSTMs and see the difference