# Train and visualize a model in TensorFlow - Part 5: Document embeddings

How can you tell how good your representation of the data is, and how well it corresponds to the true classes you are trying to predict?

TensorBoard has a (very popular) embeddings tab that projects and visualizes any matrix in 2D or 3D. It also lets you attach metadata and plot the points with that additional information. In this notebook we show how to load the dataset instances and labels and store them in the format TensorBoard expects.

In [None]:
import numpy as np
import os
import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector

First we load the dataset as before:

In [None]:
newsgroups = np.load('./resources/newsgroup.npz')
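
Before building the variable, it can help to check what the archive contains. The following is a minimal sketch, not part of the original notebook: it builds a tiny stand-in `.npz` file (the file name `demo_newsgroup.npz` and the shapes are hypothetical) with the same two keys used later, `train_data` and `train_target`, and prints the shape and dtype of each array.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for ./resources/newsgroup.npz:
# 4 documents with 3 features each, and one integer label per document.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'demo_newsgroup.npz')
np.savez(demo_path,
         train_data=np.random.rand(4, 3),
         train_target=np.array([0, 2, 1, 2]))

demo = np.load(demo_path)
for key in demo.files:
    # Print each stored array's name, shape and dtype.
    print(key, demo[key].shape, demo[key].dtype)

# The projector needs exactly one label per embedding row.
assert demo['train_data'].shape[0] == demo['train_target'].shape[0]
```

Running the same loop on the real `newsgroups` object should show whether the rows of `train_data` and the entries of `train_target` line up.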

Once we have the dataset, we have to create a TensorFlow variable to hold it. We use the function `tf.get_variable`, specifying the name of the new variable. Why don't we use the `tf.Variable` class directly? In this case we don't have a complex graph, but in real models `tf.get_variable` avoids name collisions and allows the program to access and reuse any variable in the graph.

To give a value to the variable, we first transform the dataset into a Tensor with `tf.constant`, and then pass it as the initializer to `tf.get_variable`. We can't log the constant directly: the projector reads the embeddings from a model checkpoint, and `tf.train.Saver` only saves variables.

In [None]:
tf.reset_default_graph()
sess = tf.InteractiveSession()
embedding_var = tf.get_variable('document_embeddings', dtype=tf.float32,
                                initializer=tf.constant(newsgroups['train_data'], dtype=tf.float32))
sess.run(tf.global_variables_initializer())

In the next cell we write the labels as metadata for the embedding points. The metadata has to be stored in a .tsv file, which can have multiple columns. In our case we only have the label to store, so we create a file with a single value per row and no header line.

In [None]:
# Add the metadata
embeddings_dir = os.path.join('20news_mlp_summaries', 'embeddings')
# Create the directory if it does not exist yet
os.makedirs(embeddings_dir, exist_ok=True)
metadata_filename = os.path.join(embeddings_dir, 'metadata.tsv')
with open(metadata_filename, 'w') as f:
    for label in newsgroups['train_target']:
        f.write(str(label) + '\n')
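
If you want more than one column, the projector expects the first row of the file to be a header with the column names. The sketch below is an illustration under assumptions, not part of the original notebook: the labels, the `label_names` list and the column names `label` and `name` are all hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical labels and class names, for illustration only.
labels = [0, 2, 1, 2]
label_names = ['class_a', 'class_b', 'class_c']

demo_dir = tempfile.mkdtemp()
demo_metadata = os.path.join(demo_dir, 'metadata.tsv')
with open(demo_metadata, 'w', newline='') as f:
    writer = csv.writer(f, delimiter='\t', lineterminator='\n')
    # Multi-column metadata needs a header row with the column names.
    writer.writerow(['label', 'name'])
    for label in labels:
        writer.writerow([label, label_names[label]])

with open(demo_metadata) as f:
    rows = f.read().splitlines()

# One header row plus one row per embedding point.
assert len(rows) == len(labels) + 1
```

Each data row has to appear in the same order as the rows of the embedding matrix, since the projector matches them by position.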

The last thing to do is to write the embeddings with a `tf.summary.FileWriter` instance and add the metadata to the embedding projector. It is important to use a different directory from the one holding the training summaries, as this operation will rewrite the model graph.

In [None]:
summary_writer = tf.summary.FileWriter(embeddings_dir, sess.graph)
config = projector.ProjectorConfig()
embedding_conf = config.embeddings.add()
embedding_conf.tensor_name = embedding_var.name
embedding_conf.metadata_path = metadata_filename
projector.visualize_embeddings(summary_writer, config)

# Save a checkpoint; the projector reads the embedding values from it
saver = tf.train.Saver()
saver.save(sess, os.path.join(embeddings_dir, 'embedding_model.ckpt'))