# Visualising Embeddings: Paris, Berlin, Germany, France

In the blog post I mention the relationship between the embeddings of Paris, Berlin, Germany, and France. I wanted to demonstrate this visually using Python. 

In [1]:
import json
import boto3

client = boto3.client(service_name="bedrock-runtime")
raw_texts = ['paris', 'berlin', 'germany', 'france']
# No trailing comma here: a trailing comma would turn model_id into a tuple
model_id = 'cohere.embed-english-v3'
input_type = 'clustering'

response = client.invoke_model(modelId=model_id, body=json.dumps({"texts": raw_texts, "input_type": input_type}))

embeddings = json.loads(response["body"].read())['embeddings']

Once we have these embeddings, we reduce them to two dimensions so that we can plot them in 2D. We do this with UMAP (Uniform Manifold Approximation and Projection for Dimension Reduction), a dimension reduction technique that takes a set of high-dimensional vectors and projects them into fewer dimensions while retaining some of their structural similarities.

In [None]:
import umap.umap_ as umap
import numpy as np

matrix =  np.array(embeddings)
umap_instance = umap.UMAP(n_components=2, random_state=42)
vis_dims = umap_instance.fit_transform(matrix)
vis_dims.shape

In [None]:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

def drawCountryCapitalDiagram():
    # Capitals (paris, berlin) are red; countries (germany, france) are gold
    colors = ["red", "red", "gold", "gold"]
    topic_to_color = dict(zip(raw_texts, colors))

    xs = vis_dims[:, 0]
    ys = vis_dims[:, 1]

    fig, ax = plt.subplots()

    ax.scatter(xs, ys, color=[topic_to_color[t] for t in raw_texts], alpha=0.5)

    # Label each point slightly above its marker
    for text, (x, y) in zip(raw_texts, vis_dims):
        ax.annotate(text, (x, y + .03), size=8)

    # Build the legend once, outside the annotation loop
    handles = [mpatches.Patch(color='gold', label='country'), mpatches.Patch(color='red', label='capital')]
    ax.legend(handles=handles, prop={'size': 8})
    return fig, ax

fig, ax = drawCountryCapitalDiagram()

We can already see the similarities between the countries and their capitals here. The distance and direction between a country and its capital are similar for France and Paris and for Germany and Berlin.
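
One way to put a number on "similar distance and direction" is to compare the two offset vectors directly. The sketch below is a minimal illustration only: the coordinates in `points` are made up (they are not the UMAP output above), and `offset` and `cosine_similarity` are hypothetical helper names. It reports the cosine similarity of the two offsets and the ratio of their lengths.

```python
import math

# Hypothetical 2D coordinates for illustration only (not real UMAP output).
points = {
    "france": (0.0, 0.0),
    "paris": (1.0, 2.0),
    "germany": (3.0, 0.5),
    "berlin": (4.2, 2.4),
}

def offset(start, end):
    """Vector pointing from start to end."""
    return (end[0] - start[0], end[1] - start[1])

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (close to 1.0 means same direction)."""
    dot = u[0] * v[0] + u[1] * v[1]
    return dot / (math.hypot(*u) * math.hypot(*v))

france_to_paris = offset(points["france"], points["paris"])
germany_to_berlin = offset(points["germany"], points["berlin"])

# Direction: how closely the two country-to-capital arrows are aligned
direction_match = cosine_similarity(france_to_paris, germany_to_berlin)
# Distance: how similar the two arrows are in length
length_ratio = math.hypot(*germany_to_berlin) / math.hypot(*france_to_paris)

print(f"cosine similarity of offsets: {direction_match:.3f}")
print(f"length ratio: {length_ratio:.3f}")
```

With the notebook's data, the same helpers could be applied to the rows of `vis_dims` instead of the toy coordinates.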

In [None]:
fig, ax = drawCountryCapitalDiagram()
(paris, berlin, germany, france) = vis_dims
arr1 = ax.arrow(france[0], france[1], paris[0] - france[0], paris[1] - france[1])
arr2 = ax.arrow(germany[0], germany[1], berlin[0] - germany[0], berlin[1] - germany[1])


Next, we demonstrate this property with the operation `paris - france + germany`. The result should land close to Berlin.

This is because the embeddings capture the relationships between the words 'Paris' and 'France,' and 'Germany' and 'Berlin' – specifically, that Paris and Berlin are both capital cities of their respective countries. When we subtract 'France' from 'Paris,' we are essentially removing the country semantics, and the resulting vector represents the concept of 'capital city.' By then adding 'Germany' to this vector we are left with something closely resembling 'Berlin,' the capital of Germany.
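
The arithmetic can be sketched on a toy example before applying it to real data. The vectors below are hand-made 3-D stand-ins, not output from the embedding model, with axes chosen to mean roughly "capital-ness", "France-ness" and "Germany-ness"; `analogy` and `nearest` are hypothetical helper names. Real embeddings are far noisier, so the result would only be expected to land near Berlin rather than exactly on it.

```python
import math

# Hypothetical toy vectors, NOT real model output.
# Axes: [capital-ness, France-ness, Germany-ness]
toy = {
    "paris":   [1.0, 1.0, 0.0],
    "france":  [0.0, 1.0, 0.0],
    "berlin":  [1.0, 0.0, 1.0],
    "germany": [0.0, 0.0, 1.0],
}

def analogy(a, b, c):
    """Return a - b + c, computed element by element."""
    return [ai - bi + ci for ai, bi, ci in zip(a, b, c)]

def nearest(query, vectors):
    """Name of the vector closest to query by Euclidean distance."""
    return min(vectors, key=lambda name: math.dist(query, vectors[name]))

# paris - france removes the France component and keeps the "capital" component;
# adding germany then yields a capital-plus-Germany vector.
result = analogy(toy["paris"], toy["france"], toy["germany"])
closest = nearest(result, toy)

print(result, closest)
```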

In [None]:
# Here we operate on the 2D embeddings that we produced earlier
fig, ax = drawCountryCapitalDiagram()

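# The analogy result computed directly: paris - france + germany
operated_x = paris[0] - france[0] + germany[0]
operated_y = paris[1] - france[1] + germany[1]

# For the plot, draw the offset (paris - france) starting at paris, then add (germany - paris).
# The end point equals paris - france + germany, i.e. (operated_x, operated_y).
paris_min_france = paris - (france - paris)
paris_min_france_plus_germany = paris_min_france + (germany - paris)

ax.scatter(paris_min_france[0], paris_min_france[1], marker='x', color='red')
ax.scatter(paris_min_france_plus_germany[0], paris_min_france_plus_germany[1], marker='x', color='red')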
ax.annotate('paris - france', (paris_min_france[0], paris_min_france[1] + .03), size=8)
ax.annotate('paris - france + germany', (paris_min_france_plus_germany[0], paris_min_france_plus_germany[1] - .03), size=8)

paris_france_arrow = ax.arrow(paris[0], paris[1], paris[0] - france[0], paris[1] - france[1])
paris_france_germany_arrow = ax.arrow(paris_min_france[0], paris_min_france[1], germany[0] - paris[0], germany[1] - paris[1])