<a href="https://colab.research.google.com/github/c-w-m/pnlp/blob/master/Ch03/10_Visualizing_Embeddings_using_Tensorboard.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Chapter 3.10: Visualizing Embeddings Using TensorBoard

In this notebook we demonstrate how to use TensorBoard to visualize the word embeddings we created in the Training_embeddings_using_gensim.ipynb notebook.

In [1]:
#installing the required libraries
!pip install tensorflow==1.14.0


In [2]:
#making the required imports
import warnings #ignoring the generated warnings
warnings.filterwarnings('ignore')

import tensorflow as tf
# tf.contrib only exists in TensorFlow 1.x (it was removed in 2.0), hence the pinned version above
from tensorflow.contrib.tensorboard.plugins import projector
tf.logging.set_verbosity(tf.logging.ERROR)

import numpy as np
from gensim.models import KeyedVectors
import os

In [3]:
# FOR GOOGLE COLAB USERS
# upload the "word2vec_cbow.bin" file from the repository; it can be found in the same folder as this notebook.
isCoLab = True
try:
    from google.colab import files
    uploaded = files.upload()
except ModuleNotFoundError:
    print("Not using colab")
    isCoLab = False

Saving word2vec_cbow.bin to word2vec_cbow.bin


In [4]:
# load model
if isCoLab:
    model = KeyedVectors.load_word2vec_format('word2vec_cbow.bin', binary=True)
else:
    # os.path.join builds a path that also works outside Windows
    model = KeyedVectors.load_word2vec_format(os.path.join('Models', 'word2vec_cbow.bin'), binary=True)
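With binary=True, load_word2vec_format reads the binary layout of the original C word2vec tool. The file starts with a text header holding the vocabulary size and the vector dimension. Each word follows as its bytes, a space, and vector_size 32-bit floats. The sketch below writes a toy file in that layout and reads it back using only the standard library, to illustrate what the loader consumes. It is a sketch, not gensim's implementation. write_w2v_binary and read_w2v_binary are hypothetical helpers, the code assumes little-endian float32 (what typical x86 machines produce), and it skips the edge cases a real loader handles.

```python
import io
import struct

def write_w2v_binary(fout, vectors):
    """Write {word: floats} in the word2vec binary layout (hypothetical helper)."""
    dim = len(next(iter(vectors.values())))
    # header line: "<number of words> <vector size>"
    fout.write(f"{len(vectors)} {dim}\n".encode("ascii"))
    for word, vec in vectors.items():
        # UTF-8 word, one space, dim little-endian float32 values, then a newline
        fout.write(word.encode("utf-8") + b" ")
        fout.write(struct.pack(f"<{dim}f", *vec))
        fout.write(b"\n")

def read_w2v_binary(fin):
    """Read that layout back into {word: tuple of floats} (hypothetical helper)."""
    n_words, dim = map(int, fin.readline().split())
    vectors = {}
    for _ in range(n_words):
        word = bytearray()
        while True:
            ch = fin.read(1)
            if not ch:
                raise ValueError("unexpected end of file")
            if ch == b" ":
                break
            if ch != b"\n":  # skip the newline left after the previous vector
                word += ch
        # the vector is a fixed-size block, so its bytes never confuse the parser
        vectors[word.decode("utf-8")] = struct.unpack(f"<{dim}f", fin.read(4 * dim))
    return vectors

buf = io.BytesIO()
write_w2v_binary(buf, {"human": [0.5, -1.0, 2.0], "robot": [1.5, 0.25, -0.75]})
buf.seek(0)
print(read_w2v_binary(buf))
```

The toy values are exactly representable in float32, so they round-trip unchanged. Real embeddings will show small float32 rounding.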

In [5]:
#number of words to export: the vocabulary size minus one (the last word in the vocabulary is left out)
max_size = len(model.wv.vocab)-1

In [6]:
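#make a numpy array of 0s with one row per exported word and one column per embedding dimension
#float32 matches the loaded word2vec vectors and takes half the memory of NumPy's default float64
w2v = np.zeros((max_size, model.wv.vector_size), dtype=np.float32)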

In [7]:
#Now we create a new file called metadata.tsv where we save all the words in our model,
#and we store the embedding of each word in the corresponding row of the w2v matrix
os.makedirs('projections', exist_ok=True)

with open("projections/metadata.tsv", 'w', encoding="utf-8") as file_metadata:

    for i, word in enumerate(model.wv.index2word[:max_size]):

        #store the embedding of the word in row i
        w2v[i] = model.wv[word]

        #write the word on line i of the metadata file
        file_metadata.write(word + '\n')
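Each line of metadata.tsv labels the matrix row with the same index, which is why the loop fills w2v[i] and writes the i-th word together. With a single column, the projector expects no header. With several tab-separated columns, the first line must be a header naming them. The hedged sketch below covers both cases. It assumes you might want a hypothetical extra rank column (the word's position in index2word) and that any tabs or newlines inside a word should become spaces so they cannot break the TSV layout. write_metadata and the "word"/"rank" column names are illustrative, not part of the original notebook.

```python
import os
import tempfile

def write_metadata(path, words, extra_columns=None):
    """Write projector metadata: one line per embedding row, in the same order.

    extra_columns is a hypothetical {column_name: values} mapping; when it is
    given, a header line is written first, as multi-column metadata requires.
    """
    def clean(value):
        # a tab or newline inside a label would shift the columns or rows
        return str(value).replace("\t", " ").replace("\n", " ")

    with open(path, "w", encoding="utf-8") as f:
        if extra_columns:
            f.write("\t".join(["word", *extra_columns]) + "\n")
        for i, word in enumerate(words):
            row = [clean(word)]
            if extra_columns:
                row += [clean(values[i]) for values in extra_columns.values()]
            f.write("\t".join(row) + "\n")

words = ["the", "human", "new\tline"]
out_dir = tempfile.mkdtemp()
single = os.path.join(out_dir, "metadata.tsv")
multi = os.path.join(out_dir, "metadata_ranked.tsv")
write_metadata(single, words)  # one column: no header line
write_metadata(multi, words, {"rank": list(range(len(words)))})  # two columns: header first
with open(multi, encoding="utf-8") as f:
    print(f.read())
```

If you use extra columns, point embed.metadata_path at that file instead. The row order must still match w2v.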

In [8]:
#initializing tf session
sess = tf.InteractiveSession()

In [9]:
#Create a non-trainable TensorFlow variable named 'embedding' (placed on the CPU) that holds the word embeddings:
with tf.device("/cpu:0"):
    embedding = tf.Variable(w2v, trainable=False, name='embedding')

In [10]:
#Initialize all variables
tf.global_variables_initializer().run()

In [11]:
#a Saver writes variables to checkpoint files and restores them from those checkpoints
saver = tf.train.Saver()

In [12]:
#FileWriter writes summaries and events (here, the session graph) to an event file in 'projections'
writer = tf.summary.FileWriter('projections', sess.graph)

In [13]:
# Create a projector config and add an entry for our embeddings
config = projector.ProjectorConfig()
embed = config.embeddings.add()

In [14]:
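#tensor_name must match the name of our variable ('embedding'); a relative metadata_path
#is resolved against the log directory ('projections'), so this points to projections/metadata.tsv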
embed.tensor_name = 'embedding'
embed.metadata_path = 'metadata.tsv'

In [15]:
#write projector_config.pbtxt to the log directory, then save the embedding variable to a checkpoint
projector.visualize_embeddings(writer, config)

saver.save(sess, 'projections/model.ckpt', global_step=max_size)

'projections/model.ckpt-161017'
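projector.visualize_embeddings(writer, config) serializes config in protobuf text format to a projector_config.pbtxt file in the writer's log directory, and the projector reads it from there. To inspect or hand-edit that file, it helps to know roughly what it contains. The sketch below writes an equivalent minimal file by hand with the standard library. write_projector_config is a hypothetical helper, not a TensorFlow API, and it only covers the two fields this notebook sets, with simple names that need no escaping.

```python
import os
import tempfile

def write_projector_config(logdir, tensor_name, metadata_path):
    """Hand-write a minimal projector_config.pbtxt (hypothetical helper).

    Mirrors, in protobuf text format, the two fields the notebook sets on
    config.embeddings.add(); visualize_embeddings() writes this file for you.
    """
    text = (
        "embeddings {\n"
        f'  tensor_name: "{tensor_name}"\n'
        f'  metadata_path: "{metadata_path}"\n'
        "}\n"
    )
    path = os.path.join(logdir, "projector_config.pbtxt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path

logdir = tempfile.mkdtemp()  # stand-in for the 'projections' directory
config_path = write_projector_config(logdir, "embedding", "metadata.tsv")
with open(config_path, encoding="utf-8") as f:
    print(f.read())
```

Comparing this with the file the notebook produced in projections is a quick way to check that tensor_name and metadata_path came out as intended.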

Open a terminal window and run the following command:

tensorboard --logdir=projections --port=8000

If TensorBoard does not work for you, try passing the absolute path of the projections directory to --logdir and re-run the command above.
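One way to get the absolute path right is to let Python build the command. The sketch below uses only the --logdir and --port flags from the command above. tensorboard_command is a hypothetical helper, and it assumes the tensorboard executable is on your PATH.

```python
import os
import shlex

def tensorboard_command(logdir="projections", port=8000):
    """Build the tensorboard call with an absolute --logdir (hypothetical helper)."""
    absolute_logdir = os.path.abspath(logdir)
    return ["tensorboard", f"--logdir={absolute_logdir}", f"--port={port}"]

cmd = tensorboard_command()
# a quoted, copy-pasteable version for the terminal
print(" ".join(shlex.quote(part) for part in cmd))
```

Run this from the directory that contains projections, or pass the path explicitly. To start TensorBoard from Python instead of a terminal, subprocess.Popen(cmd) would launch the same command.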

If you've done everything right so far, the terminal will print a link through which you can access TensorBoard. Click the link or copy and paste it into your browser. You should see something similar to this:
![TensorBoard-1](https://github.com/c-w-m/pnlp/blob/master/Ch03/Images/TensorBoard-1.png?raw=1)
<br>
In the top right corner, next to "INACTIVE", click the dropdown arrow and select PROJECTOR from the dropdown menu.
![TensorBoard-2](https://github.com/c-w-m/pnlp/blob/master/Ch03/Images/TensorBoard-2.png?raw=1)
<br>
Wait a few seconds for it to load. You can now see your embeddings, along with many settings you can experiment with, such as the projection method (PCA or t-SNE).
![TensorBoard-3](https://github.com/c-w-m/pnlp/blob/master/Ch03/Images/TensorBoard-3.png?raw=1)
<br>
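By default, the projector shows a PCA projection of the embedding matrix onto its top principal components, along with how much of the total variance they describe. The sketch below is a rough offline version of that PCA step using NumPy's SVD. It assumes a matrix shaped like w2v and uses random toy data in its place. The real projector may subsample large inputs and offers a "sphereize data" option, so its coordinates will not match this exactly. pca_project is a hypothetical helper.

```python
import numpy as np

def pca_project(matrix, n_components=3):
    """Project the rows of `matrix` onto its top principal components."""
    centered = matrix - matrix.mean(axis=0)
    # rows of vt are the principal directions, ordered by singular value
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:n_components].T
    # fraction of total variance captured by each component
    explained = singular_values**2 / np.sum(singular_values**2)
    return coords, explained[:n_components]

rng = np.random.default_rng(0)
toy = rng.normal(size=(100, 50))  # stand-in for w2v: 100 "words", 50 dimensions
coords, explained = pca_project(toy)
print(coords.shape, explained.round(3))
```

On real word2vec vectors, the first few components usually capture only a small share of the variance. That is one reason neighborhoods that look close in the 3D view should be checked in the original space.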
Output when we search for a specific word, in this case "human", and isolate only the matching points:
![TensorBoard-4](https://github.com/c-w-m/pnlp/blob/master/Ch03/Images/TensorBoard-4.png?raw=1)
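Two projector features are at work here. The search box filters points by their metadata label, and it also has a regular-expression mode. Selecting a point lists its nearest neighbors in the original embedding space, measured by cosine or Euclidean distance. The sketch below approximates both offline with NumPy and the re module, using cosine similarity. The words and vectors are made-up toy stand-ins for index2word and w2v, and search_labels and nearest_neighbors are hypothetical helpers.

```python
import re
import numpy as np

def search_labels(words, pattern):
    """Indices of labels matching a regular expression (rough stand-in for the search box)."""
    regex = re.compile(pattern)
    return [i for i, word in enumerate(words) if regex.search(word)]

def nearest_neighbors(matrix, words, query, k=3):
    """The k words closest to `query` by cosine similarity, excluding the query itself.

    Assumes no all-zero rows, since those have no direction to compare.
    """
    unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    q = words.index(query)
    similarities = unit @ unit[q]
    ranked = [i for i in np.argsort(-similarities) if i != q]
    return [(words[i], float(similarities[i])) for i in ranked[:k]]

# toy stand-ins for the exported words and the w2v matrix
words = ["human", "humans", "person", "banana", "inhuman"]
toy_vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.7, 0.3], [0.0, 1.0], [-1.0, 0.1]])

print(search_labels(words, "human"))   # unanchored pattern also matches "inhuman"
print(search_labels(words, "^human"))  # anchored pattern
print(nearest_neighbors(toy_vectors, words, "human", k=2))
```

As the unanchored search shows, a label match is not the same as semantic closeness. "inhuman" matches the text "human" but points the opposite way in this toy space.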