# Lab: Experiment with Embeddings
## Purpose:
- Compute similarities between the embeddings in a Gemma model
- Experiment with embedding visualization techniques
- Refreshing my memory of matrix multiplication
- Fun with numpy

### Topics:
- Token Embeddings
- Cosine similarity
- Vectors
- Matrices
- NumPy

### Steps
* Load a part of the Gemma embedding matrix.
* Implement functions to extract embeddings from the embedding matrix and compute dot products.
* Implement a function that prints similarities for pairs of tokens.
* Visualize individual embedding dimensions.
* Experiment with dimensionality-reduction techniques such as t-SNE.

Date: 2026-02-20

Source: https://colab.research.google.com/github/google-deepmind/ai-foundations/blob/master/course_2/gdm_lab_2_5_experiment_with_embeddings.ipynb

References: https://github.com/google-deepmind/ai-foundations
- GDM GH repo used in AI training courses at the university & college level.

In [None]:
%%capture
# Install the custom package for this course.
!pip install "git+https://github.com/google-deepmind/ai-foundations.git@main"

import numpy as np # For working with vectors and matrices.
# For loading and projecting embeddings.
from ai_foundations import embeddings as emb
# For providing feedback.
from ai_foundations.feedback.course_2 import embeddings as emb_feedback

## Load Gemma embeddings for 24 tokens.

Gemma's tokenizer uses a vocabulary of more than 260,000 tokens, which means its embedding table also has entries for more than 260,000 tokens.
This lab works with 24 token embeddings to reduce the memory requirements and speed up computations.
```
The token labels are:
  king
  queen
  man
  woman
  apple
  etc.
```

In [None]:
# Load Gemma embeddings.
embeddings, labels = emb.load_gemma_embeddings("https://storage.googleapis.com/dm-educational/assets/ai_foundations/gemma_embeddings.npz")

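print(f"The number of tokens is {len(labels)}.")
# Build the separator outside the f-string: backslashes inside f-string
# expressions are a SyntaxError before Python 3.12.
separator = "\n  "
print(f"The token labels are:\n  {separator.join(labels)}")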

## Working with matrices and vectors using `numpy`

------
>The table that stores the embeddings is usually a matrix where each row is a vector that contains the embedding for one token.
>Machine learning models involve a lot of vector and matrix operations.
>
> Define a 3-dimensional vector `v` with the elements 1, 2, 3:
>
> `v = np.array([1, 2, 3])`
>
> Define a 2x3 matrix `M` (2 rows, 3 columns) with the following elements:
>
> $$M = \begin{pmatrix} 6 & 1 & 4 \\ 9 & 0 & 2 \end{pmatrix}$$
>
> `M = np.array([[6, 1, 4], [9, 0, 2]])`

### 1: Define vectors and matrices
------
> Define the following vectors and matrices using `np.array`.
>
> $$\mathbf{a} = \begin{pmatrix} 7 \\ 3 \\ 1 \\ 4  \end{pmatrix} \ \ \  \ \ \ \ \mathbf{b} = \begin{pmatrix} 1.5 \\ -2.5 \end{pmatrix} \ \ \ \ \ \ \mathbf{c} = \begin{pmatrix} 4 \\ 4 \\ 4 \end{pmatrix}$$
>
> <br>
> $$P = \begin{pmatrix} 7 & 4\\ 3 & 5 \\ 1 & 6 \\ 4 & 7  \end{pmatrix} \ \ \  \ \ \ \ Q = \begin{pmatrix} 7 & 3 & 1 & 4 \\ 4 & 5 & 6 & 7 \end{pmatrix} \ \ \ \ \ \ R = \begin{pmatrix} 4 & 4 & 4 \end{pmatrix}$$
-----

In [None]:
a = np.array([7, 3, 1, 4])
b = np.array([1.5, -2.5])
c = np.array([4, 4, 4])

P = np.array([[7,4], [3,5], [1,6], [4,7]])
Q = np.array([[7, 3, 1, 4], [4, 5, 6, 7]])
R = np.array([[4, 4, 4]])

### 2: The shape of vectors and matrices

**Shape**: the dimensions of a vector or a matrix.
- For vectors, it's the number of elements in the vector.
- For matrices, it's the number of rows and columns in the matrix.

The `.shape` attribute of a NumPy array is a tuple: `(n,)` for a vector with `n` elements, and `(rows, columns)` for a matrix.

In [None]:
# Print the shapes of the above vectors & matrices.
print(f"a = {a}")
print(f"Shape of a: {a.shape}")
print("-" * 20)

print(f"b = {b}")
print(f"Shape of b: {b.shape}")
print("-" * 20)

print(f"c = {c}")
print(f"Shape of c: {c.shape}")
print("-" * 20)

print(f"P =\n{P}")
print(f"Shape of P: {P.shape}")
print("-" * 20)

print(f"Q =\n{Q}")
print(f"Shape of Q: {Q.shape}")
print("-" * 20)

print(f"R =\n{R}")
print(f"Shape of R: {R.shape}")
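The vector `c` and the matrix `R` contain the same three values, but NumPy treats them differently: `c` is 1-D with shape `(3,)`, while `R` is 2-D with shape `(1, 3)`. A minimal sketch (restating both so it runs on its own) of checking this and converting between the two with `reshape`:

```python
import numpy as np

c = np.array([4, 4, 4])      # 1-D vector
R = np.array([[4, 4, 4]])    # 2-D matrix with a single row

print(c.shape, c.ndim)  # (3,) 1
print(R.shape, R.ndim)  # (1, 3) 2

# reshape changes the shape without changing the values.
print(np.array_equal(c.reshape(1, 3), R))  # True
print(np.array_equal(R.reshape(-1), c))    # True
```

The distinction matters for matrix multiplication below, where the shapes of the two arguments have to line up.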

In [None]:
# Get the number of columns in the embedding matrix (the embedding dimension).
embedding_dim = embeddings.shape[1]
print(f"Embedding dimension: {embedding_dim}")

### 3: Access rows and columns


```python
M[row_or_rows, column_or_columns]
```

Use `:` to select all rows or all columns. Python indices start at 0, so the third row has index 2.

For example, all columns of the third row of `M`:

```python
M[2, :]
```

All rows of the fourth column of `M`:

```python
M[:, 3]
```
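Applied to the 2x3 matrix `M` defined earlier (restated so the snippet runs on its own), a few indexing patterns look like this. Note that this `M` has only two rows and three columns, so the generic examples `M[2, :]` and `M[:, 3]` would raise an `IndexError` on it.

```python
import numpy as np

M = np.array([[6, 1, 4],
              [9, 0, 2]])   # 2 rows, 3 columns

second_row = M[1, :]            # [9 0 2]
third_column = M[:, 2]          # [4 2]
single_element = M[0, 1]        # 1
first_two_columns = M[:, 0:2]   # slice of columns 0 and 1: [[6 1] [9 0]]
rows_reordered = M[[1, 0], :]   # list of row indices: [[9 0 2] [6 1 4]]

print(second_row, third_column, single_element)
print(first_two_columns.shape, rows_reordered.shape)  # (2, 2) (2, 3)
```

Slices such as `0:2` exclude the end index, and passing a list of indices selects several rows (or columns) in the order given.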

In [None]:
third_row = embeddings[2, :]
seventh_column = embeddings[:, 6]

print(f"Shape of third_row: {third_row.shape}")
print(f"Shape of seventh_column: {seventh_column.shape}")

### Dot product

The dot product between $K$-dimensional vectors $\mathbf{u} \in \mathbb{R}^K$ and $\mathbf{v} \in \mathbb{R}^K$ is defined as:

$$
\mathbf{u} \cdot \mathbf{v} = \sum_{k=1}^K u_k v_k
\;=\;
u_1 v_1 + u_2 v_2 + \cdots + u_K v_K $$


It is also sometimes written as $\mathbf{u}^T \mathbf{v}$. The superscript ${}^T$ indicates that the vector should be transposed, that is, a column vector should be transformed into a row vector.

In Python, you can compute the dot product between the vectors `u` and `v` using either the `np.dot` function or `np.matmul`, the function for matrix multiplication:

```python
dot_product = np.dot(u, v)

dot_product = np.matmul(u.T, v)
```

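Note that `np.matmul` requires the second dimension of the first argument to match the first dimension of the second argument. For 2-D column vectors of shape `(K, 1)`, this means transposing the first one with `u.T`, which gives a `(1, 1)` result. For 1-D NumPy arrays of shape `(K,)`, `u.T` returns the array unchanged, and `np.matmul(u, v)` already returns the dot product as a scalar.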
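A quick check with small hypothetical vectors that the explicit sum from the formula, `np.dot`, `np.matmul`, and Python's `@` operator (which calls matrix multiplication) agree, and what the transpose does for 2-D column vectors:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, -5.0, 6.0])

# Explicit sum of element-wise products: 1*4 + 2*(-5) + 3*6 = 12
manual = sum(u_k * v_k for u_k, v_k in zip(u, v))

print(manual)            # 12.0
print(np.dot(u, v))      # 12.0
print(np.matmul(u, v))   # 12.0  (u.T would be the same as u for a 1-D array)
print(u @ v)             # 12.0

# With 2-D column vectors of shape (3, 1), the transpose is required.
u_col = u.reshape(3, 1)
v_col = v.reshape(3, 1)
result = np.matmul(u_col.T, v_col)   # (1, 3) @ (3, 1) -> (1, 1)
print(result.shape, result.item())   # (1, 1) 12.0
```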

In [None]:
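# Compute the dot product between the third and fourth rows of embeddings.

third_row = embeddings[2, :]    # row 3 (index 2)
fourth_row = embeddings[3, :]   # row 4 (index 3)

# For 1-D arrays, .T has no effect; np.dot(third_row, fourth_row) is equivalent.
dot_product = np.matmul(third_row.T, fourth_row)

print(f"Dot product: {dot_product:.4f}")

# Output from a previous run: Dot product: 0.4967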

The sign of the dot product indicates whether two vectors point in similar directions:
![](https://storage.googleapis.com/dm-educational/assets/ai_foundations/inner-products.png)

- Negative dot product: when $\mathbf{u}^T \mathbf{v} < 0$, the angle between the vectors is greater than 90 degrees. They point in broadly opposite directions, which indicates dissimilarity.
- Zero dot product: when $\mathbf{u}^T \mathbf{v} = 0$, the vectors are orthogonal, and the angle between them is exactly 90 degrees. For embeddings, this usually indicates unrelated tokens.
- Positive dot product: when $\mathbf{u}^T \mathbf{v} > 0$, the angle between the vectors is less than 90 degrees. They point in broadly similar directions, meaning the embeddings are similar.
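A small 2D sketch with made-up vectors that connects the sign of the dot product to the angle. It borrows the cosine formula from the next section to recover the angle with `np.arccos`; `np.clip` only guards against rounding pushing the value slightly outside [-1, 1].

```python
import numpy as np

u = np.array([1.0, 0.0])
others = {
    "similar": np.array([1.0, 1.0]),      # 45 degrees from u
    "orthogonal": np.array([0.0, 2.0]),   # 90 degrees from u
    "opposite": np.array([-1.0, 1.0]),    # 135 degrees from u
}

results = {}
for name, v in others.items():
    dot = np.dot(u, v)
    # Recover the angle from cos(theta) = u . v / (||u|| ||v||).
    cos_theta = dot / (np.linalg.norm(u) * np.linalg.norm(v))
    angle = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
    results[name] = (dot, angle)
    print(f"{name:>10}: dot = {dot:+.1f}, angle = {angle:.0f} degrees")
```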

When you placed the embeddings for "apple" and "banana" on a 2D plane, you most likely placed them so that the angle between the two embeddings is small, and intuitively placed them so that $\mathbf{u}^T \mathbf{v} > 0$.

## Cosine similarity

The dot product indicates the similarity of two embeddings, but its value also depends on the lengths of the vectors, and it can grow very large in magnitude (positive or negative) when the vectors have many dimensions, because it sums over many products. **Normalizing** by the vector lengths makes the similarity independent of magnitude and always places it between -1 and +1.

The **cosine similarity** does exactly that:

$$
\text{cosine}\ \bigl(\mathbf{u},\mathbf{v}\bigr)
\;=\;
\frac{\mathbf{u}\,\cdot\,\mathbf{v}}
     {\lVert \mathbf{u} \rVert \,\lVert \mathbf{v} \rVert}
$$

where $\mathbf{u}\cdot\mathbf{v}$ is the dot product of the two vectors, and $\lVert \mathbf{u} \rVert$ and $\lVert \mathbf{v} \rVert$ are the magnitudes (lengths) of $\mathbf{u}$ and $\mathbf{v}$, respectively.

Cosine similarity divides the dot product by the product of the two lengths. The result is the cosine of the angle between the vectors, so it measures how closely their directions align and ignores their magnitudes.

The cosine similarity is +1 for vectors pointing in the same direction, 0 for orthogonal vectors (e.g., embeddings of unrelated tokens), and -1 for vectors pointing in opposite directions. In practice, antonyms such as "good" and "bad" tend to occur in similar contexts, so their embeddings can still have a clearly positive cosine similarity, as the results in this lab show.
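A small numeric check with hypothetical vectors (not embeddings): scaling a vector changes the dot product but not the cosine similarity, and orthogonal or opposite vectors give 0 and -1. The helper `cosine` is a stand-alone restatement of the formula, separate from the lab's `cos_sim`.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity: dot product divided by the product of the L2 norms."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

u = np.array([1.0, 2.0, 3.0])
v = 10 * u                       # same direction, 10 times longer
w = np.array([-2.0, 1.0, 0.0])   # orthogonal to u: 1*(-2) + 2*1 + 3*0 = 0

print(np.dot(u, u), np.dot(u, v))                      # 14.0 140.0  (grows with length)
print(round(cosine(u, u), 6), round(cosine(u, v), 6))  # 1.0 1.0
print(cosine(u, w))                                    # 0.0
print(round(cosine(u, -u), 6))                         # -1.0
```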

In [None]:
# Compute the cosine similarity between two vectors

def cos_sim(u: np.ndarray, v: np.ndarray) -> float:
    """Computes the cosine similarity between two 1-D numpy arrays u and v.

    Args:
      u: A vector of dimension (k,).
      v: A vector of dimension (k,).

    Returns:
      The cosine similarity between u and v, a float between -1 and +1.
    """

    dot_uv = np.matmul(u.T, v)

    # np.linalg.norm(u, 2) computes the length of the vector u (its L2-norm).
    len_u = np.linalg.norm(u, 2)
    len_v = np.linalg.norm(v, 2)

    # u . v / (||u|| * ||v||).
    cosine_sim = dot_uv / (len_u * len_v)

    # Convert the NumPy scalar into a plain Python float.
    cosine_sim = cosine_sim.item()

    return cosine_sim
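Since this lab is also about matrix multiplication: the cosine similarity of every pair of rows can be computed at once by scaling each row to length 1 and multiplying the matrix by its transpose. The function name `cosine_similarity_matrix` and the toy matrix are illustrative, not part of `ai_foundations`.

```python
import numpy as np

def cosine_similarity_matrix(X: np.ndarray) -> np.ndarray:
    """Returns the (n, n) matrix of cosine similarities between the rows of X (n, d).

    Assumes no row of X is all zeros (its length would be 0).
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True)  # (n, 1): length of each row
    X_unit = X / norms                                # broadcasting divides each row by its length
    return X_unit @ X_unit.T                          # (n, d) @ (d, n) -> (n, n)

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 3.0]])
S = cosine_similarity_matrix(X)
print(np.round(S, 3))
```

Applied to the lab's `embeddings`, entry `[i, j]` of the result should match `cos_sim` for the i-th and j-th tokens in `labels`, up to floating-point rounding.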

### 5: Access the embedding for a token
Before you can compute the cosine similarity between two token embeddings, you need to write a function that returns the embedding for a specific token, e.g., "apple".

To do this, you need the index of the token in the embedding matrix. The list `labels`, loaded at the top of this lab, contains all tokens that have embeddings in `embeddings`, in the same order: the embedding of the first element of `labels` is the first row of `embeddings`, the embedding of the second element is the second row, and so on.

To find the index, use the list's `.index` method. For example, `labels.index("apple")` returns the index of the row that holds the embedding for "apple".

In [None]:
def get_embedding(
    token: str, embeddings: np.ndarray = embeddings, labels: list[str] = labels
) -> np.ndarray:
    """Returns the embedding for `token` from `embeddings`.

    Args:
      token: The token for which the embedding should be retrieved.
      embeddings: The embedding matrix with embeddings for all tokens in
        `labels`.
      labels: The list of tokens indicating the order of embeddings in
        `embeddings`.

    Returns:
      The token embedding (a vector) for `token`.

    Raises:
      ValueError: If no embedding for `token` exists.
    """

    if token not in labels:
        raise ValueError(f"No embedding for {token} exists.")

    token_idx = labels.index(token)
    embedding = embeddings[token_idx]

    return embedding
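`labels.index(token)` (and `token in labels`) scan the list from the start on every call. That is fine for 24 tokens, but for a vocabulary of more than 260,000 tokens a dictionary built once gives constant-time lookups. A minimal sketch; `get_embedding_fast`, `token_to_index`, and the toy data are hypothetical names chosen so the snippet does not overwrite the lab's `embeddings` and `labels`:

```python
import numpy as np

# Toy stand-ins so the sketch runs on its own (not the lab's real data).
toy_labels = ["king", "queen", "apple"]
toy_embeddings = np.arange(6, dtype=float).reshape(3, 2)

# Build the token -> row-index lookup once; each lookup is then a dict access.
token_to_index = {token: i for i, token in enumerate(toy_labels)}

def get_embedding_fast(
    token: str, embeddings: np.ndarray, token_to_index: dict[str, int]
) -> np.ndarray:
    """Hypothetical dict-based variant of get_embedding."""
    if token not in token_to_index:
        raise ValueError(f"No embedding for {token} exists.")
    return embeddings[token_to_index[token]]

print(get_embedding_fast("queen", toy_embeddings, token_to_index))  # [2. 3.]
```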


### 6: Compute the cosine similarity
Use the `cos_sim` implementation to define a function that prints the similarity between the embeddings of two tokens.

In [None]:
def print_similarity(
    token1: str,
    token2: str,
    embeddings: np.ndarray = embeddings,
    labels: list[str] = labels,
) -> float:
    """
    Computes and prints the cosine similarity between the embeddings of `token1`
      and `token2`.

    Args:
      token1: The first token for the similarity computation.
      token2: The second token for the similarity computation.
      embeddings: The embedding matrix with embeddings for `token1` and
        `token2`.
      labels: The list of tokens indicating the order of embeddings in
        `embeddings`.

    Returns:
      The cosine similarity between `token1` and `token2`.

    Raises:
      ValueError if no embedding for `token1` or `token2` exists.

    """

    embedding1 = get_embedding(token1, embeddings, labels)
    embedding2 = get_embedding(token2, embeddings, labels)

    similarity = cos_sim(embedding1, embedding2)
    print(
        f'Cosine similarity between "{token1}" and "{token2}" '
        f'\t= {similarity:.2f}'
    )
    return similarity

In [None]:
print_similarity("king", "king")
print_similarity("king", "queen")
print_similarity("queen", "king")
print_similarity("joy", "happy")
print_similarity("good", "bad")
print_similarity("sad", "happy")
print_similarity("king", "bus")
print_similarity("car", "banana")
print()

Expected output
```
Cosine similarity between "king" and "king" 	= 1.00
Cosine similarity between "king" and "queen" 	= 0.42
Cosine similarity between "queen" and "king" 	= 0.42
Cosine similarity between "joy" and "happy" 	= 0.29
Cosine similarity between "good" and "bad" 	= 0.42
Cosine similarity between "sad" and "happy" 	= 0.22
Cosine similarity between "king" and "bus" 	= 0.04
Cosine similarity between "car" and "banana" 	= 0.07
```
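Instead of checking pairs one at a time, the same cosine similarity can rank every token against a query. The helper `most_similar` below is hypothetical (not part of the lab or `ai_foundations`), and it runs on a tiny made-up matrix so the ranking is easy to verify by hand; with the lab's data you would pass `embeddings` and `labels` instead.

```python
import numpy as np

def most_similar(
    query: str, embeddings: np.ndarray, labels: list[str], top_k: int = 3
) -> list[tuple[str, float]]:
    """Hypothetical helper: the top_k tokens most similar to `query` by cosine similarity."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)  # rows scaled to length 1
    scores = unit @ unit[labels.index(query)]  # cosine similarity of every row with the query
    order = np.argsort(-scores)                # row indices from most to least similar
    ranked = [(labels[i], float(scores[i])) for i in order if labels[i] != query]
    return ranked[:top_k]

toy_labels = ["king", "queen", "apple", "banana"]
toy_embeddings = np.array([[1.0, 0.9, 0.0],
                           [0.9, 1.0, 0.1],
                           [0.0, 0.1, 1.0],
                           [0.1, 0.0, 0.9]])
print(most_similar("king", toy_embeddings, toy_labels, top_k=2))
```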

### Visualizing high-dimensional embeddings


In [None]:
# @title Plot individual dimensions
# Choose two dimension indices between 0 and 1151 with the sliders. Individual
# dimensions rarely separate related tokens well.
dimension_1 = 0  #@param {type: 'slider', min:0, max:1151}
dimension_2 = 1151  #@param {type: 'slider', min:0, max:1151}
emb.plot_embeddings_dimensions(embeddings,
                               labels,
                               dim_x=dimension_1,
                               dim_y=dimension_2)
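As far as the point positions go, plotting two individual dimensions amounts to taking two columns of the embedding matrix as x and y coordinates. The sketch below shows only that column selection on random stand-in data; how `emb.plot_embeddings_dimensions` draws and labels the points is not shown in this lab, and the 1152 columns are an assumption based on the slider's maximum index of 1151.

```python
import numpy as np

# Hypothetical stand-in: 24 tokens x 1152 dimensions of random values (not Gemma's).
rng = np.random.default_rng(seed=0)
toy_embeddings = rng.normal(size=(24, 1152))

dim_x, dim_y = 0, 1151
# Indexing with a list of column indices keeps both dimensions at once.
points = toy_embeddings[:, [dim_x, dim_y]]   # one (x, y) point per token
print(points.shape)  # (24, 2)

# To draw them, e.g. with matplotlib:
# plt.scatter(points[:, 0], points[:, 1])
```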

### Visualizing embeddings with t-SNE

Meaningful word relationships typically arise from combinations of many embedding dimensions.

Dimensionality-reduction techniques make a high-dimensional embedding space more interpretable. They compress the embedding space into fewer dimensions while trying to retain as much information from the original space as possible.

[t-SNE](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html) is particularly well suited to projecting embeddings for visualization. It tries to **preserve the pairwise similarities** between data points in a lower-dimensional space, with the emphasis on local neighborhoods. The result is an "at a glance" map of neighborhoods hidden in the high-dimensional space: closely related tokens show up as tight clusters that are easy to label and debug, while distances between far-apart clusters are less reliable.

The resulting 2D points can be plotted to show how the tokens are distributed in the high-dimensional space. Points that are close together in the high-dimensional space tend to appear close together in the 2D t-SNE plot.
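A minimal sketch of the projection step with scikit-learn's `TSNE` (the class linked above), run on random stand-in data rather than the Gemma embeddings. The perplexity of 5 is an arbitrary choice for 24 points; the settings `emb.plot_embeddings_tsne` uses internally are not shown in this lab, so this is not a reproduction of it.

```python
import numpy as np
from sklearn.manifold import TSNE

# Random stand-in data: 24 "tokens" with 64 dimensions (not the Gemma embeddings).
rng = np.random.default_rng(seed=0)
toy_embeddings = rng.normal(size=(24, 64))

# perplexity must be smaller than the number of samples; the default (30)
# is rejected for only 24 points.
tsne = TSNE(n_components=2, perplexity=5, random_state=0)
points_2d = tsne.fit_transform(toy_embeddings)
print(points_2d.shape)  # (24, 2)
```

Because t-SNE is stochastic, fixing `random_state` makes the layout repeatable; different seeds or perplexities can produce noticeably different maps of the same data.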

In [None]:
# Compared with plotting individual dimensions, this projection groups related tokens much more clearly.

emb.plot_embeddings_tsne(embeddings, labels)