# 0. About

This notebook explores the `cosine_similarity` function from scikit-learn (`sklearn.metrics.pairwise`).

In [1]:
import numpy as np

from sklearn.metrics.pairwise import cosine_similarity

# fixed seed so the random matrices below are reproducible
rs_num = 10
rng = np.random.default_rng(rs_num)

# 1. Dataset Construction

In [2]:
# number of features (columns) shared by both matrices
dim_num = 3

# number of samples (rows) in each matrix
A_sample_num = 2
B_sample_num = 3

# draw both matrices from a standard normal distribution
A = rng.normal(0.0, 1.0, (A_sample_num, dim_num))
B = rng.normal(0.0, 1.0, (B_sample_num, dim_num))

# 2. Compute Cosine Similarity

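Both inputs must have shape `(n_samples, n_features)` and share the same number of features. The output kernel matrix has shape `(n_samples_A, n_samples_B)`, and its entry `(i, j)` is the cosine similarity between row `i` of `A` and row `j` of `B`.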

In [3]:
print('A shape:', A.shape)
print('B shape:', B.shape)

A shape: (2, 3)
B shape: (3, 3)


In [4]:
print('cosine_similarity: ')
print(cosine_similarity(A, B))

cosine_similarity: 
[[-0.9694305   0.91431303  0.74275982]
 [ 0.14401097 -0.07722639 -0.50670406]]
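
The matrix above can also be reproduced with plain NumPy. Cosine similarity is the dot product of two vectors divided by the product of their L2 norms, so normalizing every row and taking one matrix product gives all pairs at once. The following is a minimal sketch, not scikit-learn's actual implementation; `manual_cosine_similarity` is a hypothetical helper name, and the sketch assumes that no row is all zeros.

```python
import numpy as np

# Assumption: same seed and shapes as the cells above; any two 2-D float
# arrays with the same number of columns would work.
rng = np.random.default_rng(10)
A = rng.normal(0.0, 1.0, (2, 3))
B = rng.normal(0.0, 1.0, (3, 3))


def manual_cosine_similarity(X, Y):
    """Pairwise cosine similarity between the rows of X and the rows of Y."""
    # Scale each row to unit L2 norm (assumes no all-zero rows).
    X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    Y_unit = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    # Dot products of unit vectors are the cosines of the angles between them.
    return X_unit @ Y_unit.T


S = manual_cosine_similarity(A, B)
print(S.shape)  # (2, 3), i.e. (n_samples_A, n_samples_B)
print(S)
```

To compare the two approaches in the notebook, `np.allclose(S, cosine_similarity(A, B))` is one option.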


In [5]:
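# Recompute every entry one pair at a time; each value should match
# the corresponding entry of the matrix printed above.
for A_sample_i in range(A_sample_num):
    for B_sample_i in range(B_sample_num):
        print(
            f'For No.{A_sample_i + 1} sample from A and No.{B_sample_i + 1} '
            'sample from B, the cosine similarity is:'
        )
        # cosine_similarity expects 2-D inputs, so reshape each row to (1, n_features)
        pair_similarity = cosine_similarity(
            A[A_sample_i].reshape(1, -1),
            B[B_sample_i].reshape(1, -1),
        )[0, 0]
        print('  ', pair_similarity)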

For No.1 sample from A and No.1 sample from B, the cosine similarity is:
   -0.9694304974733258
For No.1 sample from A and No.2 sample from B, the cosine similarity is:
   0.9143130302775656
For No.1 sample from A and No.3 sample from B, the cosine similarity is:
   0.7427598240016121
For No.2 sample from A and No.1 sample from B, the cosine similarity is:
   0.14401097449553169
For No.2 sample from A and No.2 sample from B, the cosine similarity is:
   -0.077226387303547
For No.2 sample from A and No.3 sample from B, the cosine similarity is:
   -0.5067040646651788
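
Each value printed by the loop is the cosine of the angle between a single pair of vectors. As a rough illustration of what one such entry means, the sketch below computes it with the standard library only. `cosine_pair` is a hypothetical helper, the example vectors are made up for readability, and both vectors are assumed to be non-zero and of equal length.

```python
import math


def cosine_pair(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


print(cosine_pair([3, 4], [3, 4]))    # same direction      -> 1.0
print(cosine_pair([3, 4], [-3, -4]))  # opposite direction  -> -1.0
print(cosine_pair([1, 0], [0, 1]))    # orthogonal          -> 0.0
print(cosine_pair([3, 4], [4, 3]))    # in between          -> 0.96
```

This reading matches the signs in the output above: negative entries correspond to pairs pointing in roughly opposite directions, and entries near zero to pairs that are nearly orthogonal.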
