# Pairwise cosine similarity of a tensor

How can we easily compute the pairwise cosine similarity between every pair of
elements in a batch of vectors?

The input is a 2-dimensional tensor of shape (N, D): the first dimension
indexes the vectors in the batch, and the second dimension holds the D
components of each vector.

The original idea comes from this PyTorch issue: https://github.com/pytorch/pytorch/issues/48306

This notebook merely tries to dissect the terse notation and explain what's
going on under the hood.

Terse notation:

```
cosine_similarity(x[:,:,None], x.t()[None,:,:])
```

In this notebook, we use a slightly different notation, which is the following:

```
cosine_similarity(x[None,:,:], x[:,None,:], dim=-1)
```
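
The two notations should produce the same matrix. The following is a minimal sketch of that check, assuming a small random input (the sizes 4 and 3 are arbitrary choices for illustration) and relying on `cosine_similarity` reducing over `dim=1` by default:

```python
import torch
from torch.nn import functional as F

x = torch.randn(4, 3)  # 4 vectors of size 3 (illustrative sizes)

# Terse form: shapes (4, 3, 1) and (1, 3, 4) broadcast to (4, 3, 4).
# cosine_similarity reduces over its default dim=1, leaving (4, 4).
terse = F.cosine_similarity(x[:, :, None], x.t()[None, :, :])

# Form used in this notebook: shapes (1, 4, 3) and (4, 1, 3) broadcast
# to (4, 4, 3), and the reduction is over the last dimension.
ours = F.cosine_similarity(x[None, :, :], x[:, None, :], dim=-1)

print(terse.shape, ours.shape)
print(torch.allclose(terse, ours, atol=1e-6))
```

The only difference is where the vector components live: in the middle dimension for the terse form, in the last dimension for the form used here.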

In [67]:
import torch
from torch import nn
from torch.nn import functional as F

In [68]:
_ = torch.manual_seed(21)

## Simple usage of cosine_similarity()

In [69]:
x, y = torch.randn(10, 4, 5), torch.randn(10, 4, 5)
print(F.cosine_similarity(x, y, dim=2).shape)

torch.Size([10, 4])
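
The dimension passed as `dim` is the one that gets reduced, which is why the size-5 dimension disappears from the output shape. As a rough sketch of what is computed along that dimension, the snippet below compares the built-in function against the textbook formula (dot product divided by the product of the norms). The tensors `a` and `b` are made up for this example:

```python
import torch
from torch.nn import functional as F

a, b = torch.randn(4, 5), torch.randn(4, 5)

# cos(a, b) = (a . b) / (||a|| * ||b||), computed independently for each row.
manual = (a * b).sum(dim=1) / (a.norm(dim=1) * b.norm(dim=1))
builtin = F.cosine_similarity(a, b, dim=1)

print(manual.shape, builtin.shape)
print(torch.allclose(manual, builtin, atol=1e-5))
```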


## Unsqueeze to add a dimension

In [70]:
x = torch.randn(3)
# Indexing with None does the same thing as unsqueezing the tensor
# at that dimension. After this indexing operation, the tensors
# x_row_dup and x_col_dup will have 1 additional dimension at
# dimensions 0 and 1 respectively.
x_row_dup, x_col_dup = x[None,:], x[:,None]

print(x, x.shape)
print(x_row_dup, x_row_dup.shape)
print(x_col_dup, x_col_dup.shape)

tensor([-1.2756,  1.1559, -0.0660]) torch.Size([3])
tensor([[-1.2756,  1.1559, -0.0660]]) torch.Size([1, 3])
tensor([[-1.2756],
        [ 1.1559],
        [-0.0660]]) torch.Size([3, 1])
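
To confirm that indexing with `None` matches an explicit `unsqueeze`, here is a small check. The variable name `v` is only for illustration:

```python
import torch

v = torch.randn(3)

# v[None, :] inserts a new dimension at position 0, like v.unsqueeze(0).
same_row = torch.equal(v[None, :], v.unsqueeze(0))

# v[:, None] inserts a new dimension at position 1, like v.unsqueeze(1).
same_col = torch.equal(v[:, None], v.unsqueeze(1))

print(same_row, same_col)
print(v[None, :].shape, v[:, None].shape)
```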


## Expand the tensor at the newly added dimensions

In [71]:
x_row_dup, x_col_dup = x_row_dup.expand(3, 3), x_col_dup.expand(3, 3)

print("x stretched across rows")
print("-----------------------")
print(x_row_dup, x_row_dup.shape)
print("")
print("x stretched across columns")
print("--------------------------")
print(x_col_dup, x_col_dup.shape)

x stretched across rows
-----------------------
tensor([[-1.2756,  1.1559, -0.0660],
        [-1.2756,  1.1559, -0.0660],
        [-1.2756,  1.1559, -0.0660]]) torch.Size([3, 3])

x stretched across columns
--------------------------
tensor([[-1.2756, -1.2756, -1.2756],
        [ 1.1559,  1.1559,  1.1559],
        [-0.0660, -0.0660, -0.0660]]) torch.Size([3, 3])
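
Note that `.expand(...)` does not copy any data. It returns a view in which the stretched dimension has a stride of 0, so every "copy" reads the same memory. A small sketch to inspect this (the names `v`, `row` and `col` are illustrative):

```python
import torch

v = torch.randn(3)
row = v[None, :].expand(3, 3)  # stretched along dimension 0
col = v[:, None].expand(3, 3)  # stretched along dimension 1

# The expanded dimension has stride 0 in each view.
print(row.stride(), col.stride())

# Both views still point at the storage of the original tensor.
print(row.data_ptr() == v.data_ptr(), col.data_ptr() == v.data_ptr())
```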


## Compute pairwise cosine similarity

In [72]:
# Add a dummy dimension at the end so that we can perform cosine
# similarity on that last dimension.
x_row_dup = x_row_dup.reshape(3, 3, 1)
x_col_dup = x_col_dup.reshape(3, 3, 1)
x_cosine_similarity = F.cosine_similarity(x_row_dup, x_col_dup, dim=-1)
print(x_cosine_similarity)

tensor([[ 1., -1.,  1.],
        [-1.,  1., -1.],
        [ 1., -1.,  1.]])


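It turns out that the cosine similarity of 2 vectors of size 1 is always
either 1 or -1.

That's because no rotation is possible in a 1-dimensional space: two non-zero
single-element vectors either point in the same direction (an angle of 0
degrees, cosine 1) or in opposite directions (an angle of 180 degrees,
cosine -1).

Hence the magnitudes don't matter; only the signs of the 2 values do. In the
matrix above, an entry is 1 where the two values have the same sign and -1
where the signs differ.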
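
As a quick check of the 1-dimensional case, the sketch below reuses the values printed earlier in this notebook and compares the pairwise result against the sign of the product of each pair of values. The name `expected` is introduced just for this example:

```python
import torch
from torch.nn import functional as F

v = torch.tensor([-1.2756, 1.1559, -0.0660])  # values printed earlier

# Treat every scalar as a vector of size 1 and compare all pairs.
sim = F.cosine_similarity(v[None, :, None], v[:, None, None], dim=-1)

# For non-zero scalars, the cosine similarity is the sign of their product.
expected = torch.sign(v[None, :] * v[:, None])

print(sim)
print(torch.allclose(sim, expected))
```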

Let's try the same thing with a vector of size 2 instead of 1.


## Cosine similarity of a batch of vectors

Our batch has 3 vectors, each of size 2.

In [73]:
def str_pair(x):
    """Returns a readable string for a pair of floats 'x' (a list or
    tensor with exactly 2 elements).
    """
    s0, s1 = "" if x[0] < 0 else " ", "" if x[1] < 0 else " "
    return f"({s0}{x[0]:.4f},{s1}{x[1]:.4f})"

def display_3d_tensor(x):
    """Displays a 3d tensor whose last dimension has size 2 as
    a 2d matrix of tuples for easy visualization.
    """
    assert len(x.size()) == 3
    assert x.size(2) == 2
    for i in range(x.size(0)):
        elements = []
        for j in range(x.size(1)):
            elements.append(str_pair(x[i][j].tolist()))
        # end for j
        print(f"[{', '.join(elements)}]")
    # end for i
# end def


x = torch.randn(3, 2)
x_row_dup, x_col_dup = x[None,:,:], x[:,None,:]

print(x, x.shape)
print(x_row_dup, x_row_dup.shape)
print(x_col_dup, x_col_dup.shape)

# Let's pretty print these tensors. Each element of the 2d matrix is a
# tensor with 2 elements (a pair of values).

print("\nx with unsqueezed row dimension")
display_3d_tensor(x_row_dup)
print("\nx with unsqueezed column dimension")
display_3d_tensor(x_col_dup)

tensor([[1.8099, 1.0222],
        [0.6217, 0.6769],
        [0.9647, 0.8141]]) torch.Size([3, 2])
tensor([[[1.8099, 1.0222],
         [0.6217, 0.6769],
         [0.9647, 0.8141]]]) torch.Size([1, 3, 2])
tensor([[[1.8099, 1.0222]],

        [[0.6217, 0.6769]],

        [[0.9647, 0.8141]]]) torch.Size([3, 1, 2])

x with unsqueezed row dimension
[( 1.8099, 1.0222), ( 0.6217, 0.6769), ( 0.9647, 0.8141)]

x with unsqueezed column dimension
[( 1.8099, 1.0222)]
[( 0.6217, 0.6769)]
[( 0.9647, 0.8141)]


## Stretch the tensors along row and column using .expand(...)

In [74]:
# Now, let's expand the tensors to stretch them across rows and columns.
# Note that .expand(...) returns a new view; it does not modify the
# tensor in-place.
x_row_dup, x_col_dup = x_row_dup.expand(3, 3, 2), x_col_dup.expand(3, 3, 2)

# Pretty-print them so that the "2d matrix of pairs" representation
# becomes clear.
print("x stretched across rows")
print("-----------------------")
display_3d_tensor(x_row_dup)
print("")
print("x stretched across columns")
print("--------------------------")
display_3d_tensor(x_col_dup)

x stretched across rows
-----------------------
[( 1.8099, 1.0222), ( 0.6217, 0.6769), ( 0.9647, 0.8141)]
[( 1.8099, 1.0222), ( 0.6217, 0.6769), ( 0.9647, 0.8141)]
[( 1.8099, 1.0222), ( 0.6217, 0.6769), ( 0.9647, 0.8141)]

x stretched across columns
--------------------------
[( 1.8099, 1.0222), ( 1.8099, 1.0222), ( 1.8099, 1.0222)]
[( 0.6217, 0.6769), ( 0.6217, 0.6769), ( 0.6217, 0.6769)]
[( 0.9647, 0.8141), ( 0.9647, 0.8141), ( 0.9647, 0.8141)]


## Pairwise cosine similarity on expanded tensors

In [75]:
# Since each element of the 2d matrix is already a vector
# (of size 2), we don't need to add a dummy dimension at
# the end of the tensors.
x_cosine_similarity = F.cosine_similarity(x_row_dup, x_col_dup, dim=-1)
print(x_cosine_similarity)

tensor([[1.0000, 0.9512, 0.9826],
        [0.9512, 1.0000, 0.9920],
        [0.9826, 0.9920, 1.0000]])
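
The matrix is symmetric with ones on the diagonal, as expected: every vector has similarity 1 with itself. A different way to arrive at the same matrix, not the approach this notebook follows, is to normalize each vector to unit length and take a matrix product. The sketch below compares that alternative with the `None`-indexing form on a fresh random input, so the printed values will differ from the ones above:

```python
import torch
from torch.nn import functional as F

x = torch.randn(3, 2)

# Alternative technique: scale each row to unit L2 norm, then all pairwise
# dot products are obtained with a single matrix multiplication.
x_unit = F.normalize(x, dim=-1)
via_matmul = x_unit @ x_unit.t()

# The None-indexing form used in this notebook.
via_indexing = F.cosine_similarity(x[None, :, :], x[:, None, :], dim=-1)

print(torch.allclose(via_matmul, via_indexing, atol=1e-6))
```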


## Pairwise cosine similarity using broadcasting

In [76]:
# Now that we know how to compute the self-cosine similarity of every
# vector in an array of vectors, let's do the same thing using
# PyTorch's broadcasting semantics (i.e. without an explicit call
# to .expand(...)).
x_row_dup, x_col_dup = x[None,:,:], x[:,None,:]
x_cosine_similarity = F.cosine_similarity(x_row_dup, x_col_dup, dim=-1)

# This should print the same matrix as above.
print(x_cosine_similarity)

tensor([[1.0000, 0.9512, 0.9826],
        [0.9512, 1.0000, 0.9920],
        [0.9826, 0.9920, 1.0000]])
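
For reuse, the broadcasting version can be wrapped in a small function. This is only a sketch: the name `pairwise_cosine_similarity` and the shape check are assumptions made for this example, not part of PyTorch or of the original post.

```python
import torch
from torch.nn import functional as F


def pairwise_cosine_similarity(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: maps an (N, D) batch of vectors to the
    (N, N) matrix of cosine similarities between every pair of rows.
    """
    if x.dim() != 2:
        raise ValueError(f"expected a 2d tensor, got {x.dim()} dimension(s)")
    # Shapes (1, N, D) and (N, 1, D) broadcast to (N, N, D); the
    # reduction over the last dimension leaves an (N, N) matrix.
    return F.cosine_similarity(x[None, :, :], x[:, None, :], dim=-1)


x = torch.randn(5, 4)
sim = pairwise_cosine_similarity(x)

print(sim.shape)
print(torch.allclose(sim, sim.t()))                    # symmetric
print(torch.allclose(sim.diagonal(), torch.ones(5)))   # ones on the diagonal
```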


## Links

* https://stackoverflow.com/questions/69797614/indexing-a-tensor-with-none-in-pytorch
* https://discuss.pytorch.org/t/pairwise-cosine-distance/30961
* https://github.com/pytorch/pytorch/issues/48306
