# Vectors and Similarity

Converting a data set into a set of vectors is called __vectorization__.<br>
We will explore several methods of vectorization.

Objectives: 
- understand the basic concepts of vectors in linear algebra
- practice implementing vectorization algorithms in Python

Tasks: 
1. Create a function that estimates the similarity between two vectors using the _cosine similarity_ measure
2. Test the function on a variety of numeric test data
3. Test the function on text data
4. Apply the function in a Q&A (question answering) application

## 1. Cosine Similarity
See the figure below. Make sure you understand the notation: a and b are __vectors__, and |a| and |b| are their __magnitudes__ (lengths).
Recall the Pythagorean theorem for the sides of a right triangle: a vector's magnitude is the square root of the sum of its squared components.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

We are looking for the angle between the two vectors: <br>
a __smaller angle__ means __closer vectors__, which means __greater similarity__.

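$$\text{cosine similarity} = \cos(\theta) = \frac{a \cdot b}{\|a\| \, \|b\|}$$

Here $\theta$ is the angle between the two vectors. The value ranges from 1 (same direction) through 0 (perpendicular) to -1 (opposite direction).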

According to the formula, __cos(angle)__ is a fraction with two parts: the __numerator__ (above the division line, __a·b__) and the __denominator__ (below the division line, __||a||*||b||__).<br>

- The first part requires the _dot product_ of the two vectors.
- The second part requires calculating their magnitudes.

In the function, we calculate each part separately and then divide the numerator by the denominator.

In [1]:
%%writefile cosimfunc.py
# Cosine similarity: the dot product of two vectors divided by the product of their magnitudes
import math

import numpy as np


# calculate the coefficient of cosine similarity between two vectors
def cosim(vector1, vector2) -> float:

    # numerator: the dot product
    # (zip() stops at the shorter vector, so pass vectors of equal length)
    numerator = sum(i * j for i, j in zip(vector1, vector2))

    # denominator: the product of the two magnitudes,
    # computed by magnitude() defined below
    mag1 = magnitude(vector1)
    mag2 = magnitude(vector2)
    denominator = mag1 * mag2

    # divide, unless the denominator is zero (at least one vector has zero length)
    if not denominator:
        sim = 0.0
    else:
        sim = float(numerator) / denominator
    print('Cosine similarity: ', sim)
    return sim


# calculate the magnitude (Euclidean length) of one vector
def magnitude(v) -> float:
    # np.square() returns the element-wise square of the input
    # math.sqrt() returns the square root of a number
    mag = math.sqrt(sum(np.square(v)))
    return mag

Overwriting cosimfunc.py
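
For comparison, the same calculation can be sketched with NumPy's built-in `np.dot` and `np.linalg.norm`. The name `cosim_np` is hypothetical and is not part of `cosimfunc.py`. Unlike `cosim`, this sketch does not print, and it rejects vectors of different lengths, which is an added assumption about the desired behavior.

```python
import numpy as np


def cosim_np(vector1, vector2) -> float:
    """Cosine similarity using NumPy (illustrative alternative to cosim)."""
    v1 = np.asarray(vector1, dtype=float)
    v2 = np.asarray(vector2, dtype=float)

    # Assumption: vectors of different lengths are treated as an error
    if v1.shape != v2.shape:
        raise ValueError("vectors must have the same length")

    # Product of the two Euclidean lengths
    denominator = np.linalg.norm(v1) * np.linalg.norm(v2)

    # Same convention as cosim: return 0.0 when a vector has zero length
    if denominator == 0.0:
        return 0.0
    return float(np.dot(v1, v2) / denominator)


print(cosim_np([1, 2, 3], [1, 2, 3]))
print(cosim_np([1, 2, 3], [-1, -2, -3]))
```

Converting the inputs with `np.asarray` lets the function accept plain Python lists as well as NumPy arrays.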


## 2. Numeric Tests

In [1]:
import cosimfunc
from cosimfunc import cosim

In [2]:
import importlib 
importlib.reload(cosimfunc)

<module 'cosimfunc' from '/Users/tdi/Documents/Teaching/AI/2024/Code/NLP/cosimfunc.py'>

#### Test 1

In [3]:
# define two vectors
a = [1, 2, 3]
b = [1, 2, 3]

In [4]:
# calculate similarity
k = cosim(a, b)

Cosine similarity:  1.0


#### Test 2

In [5]:
# define two vectors
a = [1, 2, 3]
b = [-1, -2, -3]

In [6]:
# calculate similarity
k = cosim(a, b)

Cosine similarity:  -1.0
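
The two results so far, 1.0 and -1.0, correspond to angles of 0° and 180° between the vectors. As a rough illustration, the sketch below turns a cosine value back into an angle with `math.acos`. The helper name `angle_degrees` is hypothetical and is not part of `cosimfunc.py`.

```python
import math


def angle_degrees(cos_value: float) -> float:
    """Return the angle in degrees that has the given cosine."""
    # Clamp to [-1, 1] to guard against tiny floating-point overshoots
    clamped = max(-1.0, min(1.0, cos_value))
    return math.degrees(math.acos(clamped))


# identical direction, perpendicular, opposite direction
for c in (1.0, 0.0, -1.0):
    print(c, angle_degrees(c))
```

The clamping step matters in practice because rounding can produce values such as 1.0000000000000002, which `math.acos` would reject.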


Add more tests of your own here!
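
A few cases worth trying are perpendicular vectors, vectors that point the same way but differ in length, and a zero vector. The sketch below checks them with assertions. It uses a compact stand-in named `cos_sim` (hypothetical, without the `print` call) so that the block runs on its own; the same checks could be made with `cosim`.

```python
import math


def cos_sim(v1, v2) -> float:
    # Stand-in for cosim so that this block is self-contained
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2))
    return dot / norm if norm else 0.0


# Perpendicular vectors: the dot product is 0, so the similarity is 0
assert cos_sim([1, 0], [0, 1]) == 0.0

# Same direction, different length: similarity is 1 up to rounding error
assert math.isclose(cos_sim([1, 2, 3], [2, 4, 6]), 1.0)

# Zero vector: the denominator is 0, so the function falls back to 0.0
assert cos_sim([0, 0, 0], [1, 2, 3]) == 0.0

print("all checks passed")
```

Comparing floats with `math.isclose` rather than `==` avoids false failures caused by rounding in the square roots.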

In [7]:
# define two vectors
a = [1, 2, 3]
b = [45, -20, 33]

In [8]:
# calculate similarity
k = cosim(a, b)

Cosine similarity:  0.46888727087265386
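
This result can be checked by hand. The worked calculation below uses the vectors a = [1, 2, 3] and b = [45, -20, 33] from the cell above, with intermediate values rounded to four decimal places.

$$a \cdot b = 1 \cdot 45 + 2 \cdot (-20) + 3 \cdot 33 = 45 - 40 + 99 = 104$$

$$\|a\| = \sqrt{1^2 + 2^2 + 3^2} = \sqrt{14} \approx 3.7417$$

$$\|b\| = \sqrt{45^2 + (-20)^2 + 33^2} = \sqrt{2025 + 400 + 1089} = \sqrt{3514} \approx 59.2790$$

$$\cos(\theta) = \frac{a \cdot b}{\|a\| \, \|b\|} \approx \frac{104}{3.7417 \cdot 59.2790} \approx \frac{104}{221.80} \approx 0.4689$$

The hand calculation agrees with the printed value to four decimal places.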


![image.png](attachment:image.png)