
This notebook visualizes the structure of the high-dimensional space of BERT word embeddings. Reducing the embeddings to 2D and 3D with t-SNE shows which words cluster together and which sit apart, giving a rough picture of how the model relates word meanings. A table of the first few embedding dimensions gives a closer look at the raw numerical values behind those relationships.

## Installing necessary libraries

In [None]:
!pip install jupyter numpy matplotlib seaborn scikit-learn transformers torch ipykernel

## Importing necessary libraries

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.manifold import TSNE
from transformers import AutoTokenizer, AutoModel
import torch
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older Matplotlib versions


## Load the tokenizer and model


This code loads a pre-trained BERT tokenizer and model from the 'bert-base-uncased' variant using the Hugging Face Transformers library. The tokenizer is used to convert text into tokens that the model can process, and the model is used to generate embeddings from these tokenized inputs.

In [None]:
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')

## Wordlist

Define the list of words to embed.

In [None]:
words = ["king", "kitten", "men", "houses", "peach", "apple", "woman", "cat", "queen", "fox", "sleep", "lazy", "jumps", "dog", "banana", "fish", "water", "spider", "castle", "fly", "web"]

## Tokenize the words and compute embeddings

This code tokenizes the word list into PyTorch tensors, padding and truncating as needed, using the BERT tokenizer. It then runs the tokenized inputs through the BERT model inside `torch.no_grad()`, which disables gradient tracking since no training takes place, and builds one embedding per word by averaging the last hidden state across the sequence dimension. Note that this plain average also includes the special `[CLS]` and `[SEP]` tokens and any padding positions. Finally, it converts the averaged embeddings to a NumPy array.

In [None]:
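# Tokenize all words as one padded batch of PyTorch tensors
inputs = tokenizer(words, return_tensors='pt', padding=True, truncation=True)
# Forward pass without gradient tracking (inference only)
with torch.no_grad():
    outputs = model(**inputs)
# Average over the sequence dimension: one vector per word
embeddings = outputs.last_hidden_state.mean(dim=1).numpy()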
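
The plain average above treats every position equally. A common alternative is masked mean pooling, which weights each position by the attention mask so that padding does not dilute the result. The sketch below is an illustration, not part of the original notebook: it uses NumPy and made-up toy values so it runs without downloading the model, and `masked_mean` is a hypothetical helper name.

```python
import numpy as np

def masked_mean(hidden, mask):
    """Average token vectors per sequence, ignoring positions where mask == 0.

    hidden: (batch, seq_len, dim) array; mask: (batch, seq_len) array of 0/1.
    """
    mask = mask[:, :, None].astype(hidden.dtype)   # (batch, seq_len, 1)
    summed = (hidden * mask).sum(axis=1)           # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1.0, None)  # avoid division by zero
    return summed / counts

# Toy batch (made-up numbers): two sequences, three positions, two dimensions.
hidden = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],
                   [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]])
mask = np.array([[1, 1, 0],    # last position is padding
                 [1, 1, 1]])   # no padding

plain = hidden.mean(axis=1)         # padding leaks into the first row
pooled = masked_mean(hidden, mask)  # padding is ignored
```

In the notebook, the same arithmetic would apply to `outputs.last_hidden_state` with `inputs['attention_mask']` as the mask. The attention mask still covers `[CLS]` and `[SEP]`, so those tokens would remain in the average unless they are sliced off separately.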

## Create table

This code renders the first 7 of the 768 embedding dimensions as a table, rounded to two decimal places, with one row per word. The table uses a fixed font size and row scaling, and each row label is drawn in bold in that word's color from the seaborn `husl` palette.

In [None]:
fig, ax_table = plt.subplots(figsize=(10, 4))
ax_table.axis('off')
table_data = np.round(embeddings[:, :7], 2)  # Example selection of 7 dimensions
table = ax_table.table(cellText=table_data, rowLabels=words, colLabels=[f'dim_{i+1}' for i in range(table_data.shape[1])], cellLoc='center', loc='center')
table.auto_set_font_size(False)
table.set_fontsize(10)
table.scale(1.0, 1.2)
palette = sns.color_palette("husl", len(words))
for i, (word, color) in enumerate(zip(words, palette)):
    table[(i + 1, -1)].set_text_props(color=color, fontweight='bold')
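
Individual dimensions in the table are hard to compare by eye. One way to summarize how close two embeddings are is cosine similarity, computed over the full vectors. The following is a minimal sketch on made-up toy vectors; `cosine_similarity_matrix` is a hypothetical helper, not something defined elsewhere in this notebook.

```python
import numpy as np

def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of a 2D array."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T

# Toy vectors (made-up): the first two point the same way, the third is orthogonal.
toy_vectors = np.array([[1.0, 0.0],
                        [2.0, 0.0],
                        [0.0, 3.0]])
sim = cosine_similarity_matrix(toy_vectors)
```

Calling `cosine_similarity_matrix(embeddings)` on the notebook's array would give a 21 x 21 matrix, which could be passed to `sns.heatmap` with `xticklabels=words` and `yticklabels=words` for a quick overview.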

## Create 2D plot

This code uses t-SNE to reduce the high-dimensional embeddings to 2D, then plots each word as a scatter point with a text label, using a distinct color per word from the palette. The plot has labeled axes, a title, and a legend. In scikit-learn, `perplexity` must be smaller than the number of samples (21 words here), and `random_state=42` fixes the random seed.

In [None]:
tsne = TSNE(n_components=2, perplexity=2, random_state=42)
embeddings_2d = tsne.fit_transform(embeddings)

fig, ax2d = plt.subplots(figsize=(10, 10))
palette = sns.color_palette("husl", len(words))
for i, (word, color) in enumerate(zip(words, palette)):
    ax2d.scatter(embeddings_2d[i, 0], embeddings_2d[i, 1], color=color, label=word)
    ax2d.annotate(word, (embeddings_2d[i, 0], embeddings_2d[i, 1]), fontsize=14, color=color)
ax2d.set_title("2D Visualization of BERT Embeddings using t-SNE")
ax2d.set_xlabel("t-SNE component 1")
ax2d.set_ylabel("t-SNE component 2")
ax2d.legend(loc='best')
plt.tight_layout()
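
t-SNE preserves local neighborhoods better than global distances, so it can help to check how many of each word's nearest neighbors survive the reduction. The sketch below is one simple way to do that, assuming Euclidean distance in both spaces. The helper names `knn_indices` and `neighbor_overlap` are hypothetical, and the data is random toy data rather than real embeddings.

```python
import numpy as np

def knn_indices(points, k):
    """Indices of the k nearest neighbors of each row (Euclidean, excluding itself)."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    return np.argsort(dists, axis=1)[:, :k]

def neighbor_overlap(high, low, k=3):
    """Mean fraction of each point's k nearest neighbors shared by both spaces."""
    high_nn = knn_indices(high, k)
    low_nn = knn_indices(low, k)
    shared = [len(set(h) & set(l)) / k for h, l in zip(high_nn, low_nn)]
    return float(np.mean(shared))

# Toy data (random, for illustration only): 10 points in 5 dimensions.
rng = np.random.default_rng(0)
toy_high = rng.normal(size=(10, 5))
toy_low = toy_high[:, :2]  # crude "reduction": keep the first two axes

same = neighbor_overlap(toy_high, toy_high.copy(), k=3)  # identical layouts
score = neighbor_overlap(toy_high, toy_low, k=3)         # lossy layout
```

In the notebook, `neighbor_overlap(embeddings, embeddings_2d, k=3)` would give a score between 0 and 1, where values near 1 mean the 2D layout keeps most of each word's closest neighbors.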

## Create 3D plot

This code uses t-SNE again, this time with three components, and plots each word as a point with a text label in a 3D scatter plot, reusing the per-word colors. The plot has labeled axes, a title, and a legend mapping points to words.

In [None]:
tsne_3d = TSNE(n_components=3, perplexity=2, random_state=42)
embeddings_3d = tsne_3d.fit_transform(embeddings)

fig = plt.figure(figsize=(10, 10))
ax3d = fig.add_subplot(111, projection='3d')
for i, (word, color) in enumerate(zip(words, palette)):
    ax3d.scatter(embeddings_3d[i, 0], embeddings_3d[i, 1], embeddings_3d[i, 2], color=color, label=word)
    ax3d.text(embeddings_3d[i, 0], embeddings_3d[i, 1], embeddings_3d[i, 2], word, color=color)
ax3d.set_title("3D Visualization of BERT Embeddings using t-SNE")
ax3d.set_xlabel("t-SNE component 1")
ax3d.set_ylabel("t-SNE component 2")
ax3d.set_zlabel("t-SNE component 3")
ax3d.legend(loc='best')