# **Numeric Vector Similarity**

```python
import numpy as np
from scipy.stats import pearsonr
import gensim.downloader as downloader
```

- **Cosine similarity**: the cosine of the angle between two non-zero vectors in an inner product space; it ranges from -1 to 1 **(higher means more similar)**
- **Euclidean distance**: the straight-line distance between two points in Euclidean space **(lower means more similar)**
- **Manhattan distance**: the sum of the absolute coordinate differences, i.e. the distance traveled along a grid-like path **(lower means more similar)**
- **Pearson correlation**: a statistical measure of the linear relationship between two continuous variables; it ranges from -1 to 1 **(higher means more similar)**
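
For reference, these are the standard formulas behind the four measures, written for two score vectors $\mathbf{a}, \mathbf{b}$ of length $n$ (here $n = 7$), where $\bar{a}$ and $\bar{b}$ denote the means of their entries:

```latex
\begin{aligned}
\cos(\mathbf{a}, \mathbf{b}) &= \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}} \\
d_{\text{Euclid}}(\mathbf{a}, \mathbf{b}) &= \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2} \\
d_{\text{Manhattan}}(\mathbf{a}, \mathbf{b}) &= \sum_{i=1}^{n} \lvert a_i - b_i \rvert \\
r(\mathbf{a}, \mathbf{b}) &= \frac{\sum_{i=1}^{n} (a_i - \bar{a})(b_i - \bar{b})}{\sqrt{\sum_{i=1}^{n} (a_i - \bar{a})^2}\,\sqrt{\sum_{i=1}^{n} (b_i - \bar{b})^2}}
\end{aligned}
```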

```python
hassan = np.array([9, 8, 7, 6, 7, 8, 6])
red_bull = np.array([10, 9, 6, 7, 6, 9, 5])
ferrari = np.array([9, 7, 6, 6, 7, 7, 5])
mercedes = np.array([8, 6, 8, 9, 9, 5, 9])
```

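```python
def cosine_similarity(v1, v2):
    # Dot product divided by the product of the vector norms (undefined if either vector is all zeros)
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
```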
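
The helper above divides by zero when either vector is all zeros, which is why the definition requires non-zero vectors. A minimal sketch of a guarded variant follows; the name `safe_cosine_similarity` is introduced here for illustration and is not part of the notebook. It also shows that cosine similarity ignores magnitude: a vector and its double point in the same direction, even though they are far apart in Euclidean terms.

```python
import numpy as np

def safe_cosine_similarity(v1, v2):
    """Cosine similarity that refuses zero vectors instead of dividing by zero."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    norm_product = np.linalg.norm(v1) * np.linalg.norm(v2)
    if norm_product == 0:
        # The unguarded version would divide by zero here and return nan.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(v1, v2) / norm_product)

a = np.array([9, 8, 7, 6, 7, 8, 6])       # Hassan's scores from the notebook
print(safe_cosine_similarity(a, 2 * a))   # 1.0 up to rounding: doubling keeps the direction
print(np.linalg.norm(a - 2 * a))          # the Euclidean distance, however, equals |a|, about 19.47
```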

```python
print("Cosine Similarity:")
print(f"Red Bull: {cosine_similarity(hassan, red_bull):.4f}")
print(f"Ferrari: {cosine_similarity(hassan, ferrari):.4f}")
print(f"Mercedes: {cosine_similarity(hassan, mercedes):.4f}")
```

```
Cosine Similarity:
Red Bull: 0.9918
Ferrari: 0.9973
Mercedes: 0.9564
```


```python
def euclidean_dist(v1, v2):
    # L2 norm of the element-wise difference: square root of the summed squared differences
    return np.linalg.norm(v1 - v2)
```
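
To make the definition concrete, here is a short sketch that recomputes the Hassan vs Red Bull distance step by step. The arrays are copied from the notebook; the intermediate variable names are illustrative only.

```python
import numpy as np

hassan = np.array([9, 8, 7, 6, 7, 8, 6])
red_bull = np.array([10, 9, 6, 7, 6, 9, 5])

diff = hassan - red_bull         # every position differs by exactly 1: [-1 -1 1 -1 1 -1 1]
squared_sum = np.sum(diff ** 2)  # seven squared differences of 1 each, so 7
by_hand = np.sqrt(squared_sum)   # sqrt(7)
via_norm = np.linalg.norm(diff)  # what euclidean_dist computes

print(squared_sum, round(float(by_hand), 4), round(float(via_norm), 4))
```

Both routes give $\sqrt{7} \approx 2.6458$, matching the Red Bull value printed by the notebook.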

```python
print("Euclidean Distance:")
print(f"Red Bull: {euclidean_dist(hassan, red_bull):.4f}")
print(f"Ferrari: {euclidean_dist(hassan, ferrari):.4f}")
print(f"Mercedes: {euclidean_dist(hassan, mercedes):.4f}")
```

```
Euclidean Distance:
Red Bull: 2.6458
Ferrari: 2.0000
Mercedes: 6.0828
```


```python
print("\nPearson Correlation:")
print(f"Red Bull: {pearsonr(hassan, red_bull)[0]:.4f}")
print(f"Ferrari: {pearsonr(hassan, ferrari)[0]:.4f}")
print(f"Mercedes: {pearsonr(hassan, mercedes)[0]:.4f}")
```


```
Pearson Correlation:
Red Bull: 0.8773
Ferrari: 0.9047
Mercedes: -0.6005
```
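
Mercedes has a cosine similarity of about 0.96 but a Pearson correlation of about -0.60. The two are closely related: Pearson's $r$ is the cosine similarity of the mean-centered vectors. Raw cosine similarity on all-positive scores is dominated by their shared positive offset, while centering exposes the opposing up-and-down pattern. A minimal sketch, using the notebook's arrays and NumPy's `np.corrcoef` as the reference:

```python
import numpy as np

hassan = np.array([9, 8, 7, 6, 7, 8, 6], dtype=float)
mercedes = np.array([8, 6, 8, 9, 9, 5, 9], dtype=float)

def cosine(v1, v2):
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

raw_cosine = cosine(hassan, mercedes)  # high, because every score is positive
# Subtract each vector's mean first; the cosine of the centered vectors is Pearson's r
centered = cosine(hassan - hassan.mean(), mercedes - mercedes.mean())
reference = float(np.corrcoef(hassan, mercedes)[0, 1])  # NumPy's Pearson correlation

print(round(raw_cosine, 4), round(centered, 4), round(reference, 4))
```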


```python
def manhattan_dist(v1, v2):
    # L1 distance: sum of the absolute element-wise differences
    return np.sum(np.abs(v1 - v2))
```

```python
print("\nManhattan Distance:")
print(f"Red Bull: {manhattan_dist(hassan, red_bull):.4f}")
print(f"Ferrari: {manhattan_dist(hassan, ferrari):.4f}")
print(f"Mercedes: {manhattan_dist(hassan, mercedes):.4f}")
```


```
Manhattan Distance:
Red Bull: 7.0000
Ferrari: 4.0000
Mercedes: 15.0000
```
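
The hand-written helpers can be cross-checked against SciPy's `scipy.spatial.distance` module, which provides `euclidean`, `cityblock` (Manhattan), and `cosine`. Note that SciPy's `cosine` returns a cosine *distance* (1 minus the similarity), so this sketch converts it back before printing.

```python
import numpy as np
from scipy.spatial import distance

hassan = np.array([9, 8, 7, 6, 7, 8, 6])
teams = {
    "Red Bull": np.array([10, 9, 6, 7, 6, 9, 5]),
    "Ferrari": np.array([9, 7, 6, 6, 7, 7, 5]),
    "Mercedes": np.array([8, 6, 8, 9, 9, 5, 9]),
}

for name, scores in teams.items():
    # SciPy's cosine() is a distance (1 - similarity), so convert it back to a similarity
    cos_sim = 1 - distance.cosine(hassan, scores)
    euclid = distance.euclidean(hassan, scores)
    manhattan = distance.cityblock(hassan, scores)
    print(f"{name}: cosine={cos_sim:.4f} euclidean={euclid:.4f} manhattan={float(manhattan):.4f}")
```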


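- **Based on the numerical analysis, Ferrari is the best match for Hassan: it has the highest cosine similarity and Pearson correlation and the lowest Euclidean and Manhattan distances.**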
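
Because two of the metrics are similarities (higher is closer) and two are distances (lower is closer), picking a winner requires flipping the comparison per metric. The sketch below makes that explicit. The `metrics` table and the short helper names are illustrative and not taken from the notebook; `np.corrcoef` stands in for `pearsonr`.

```python
import numpy as np

hassan = np.array([9, 8, 7, 6, 7, 8, 6])
teams = {
    "Red Bull": np.array([10, 9, 6, 7, 6, 9, 5]),
    "Ferrari": np.array([9, 7, 6, 6, 7, 7, 5]),
    "Mercedes": np.array([8, 6, 8, 9, 9, 5, 9]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pearson(a, b):
    return float(np.corrcoef(a, b)[0, 1])

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

def manhattan(a, b):
    return float(np.sum(np.abs(a - b)))

# Each entry: (metric function, True if a larger value means a closer match)
metrics = {
    "cosine": (cosine, True),
    "pearson": (pearson, True),
    "euclidean": (euclidean, False),
    "manhattan": (manhattan, False),
}

best = {}
for metric_name, (fn, higher_is_closer) in metrics.items():
    scores = {team: fn(hassan, vec) for team, vec in teams.items()}
    pick = max if higher_is_closer else min
    best[metric_name] = pick(scores, key=scores.get)

print(best)  # Ferrari wins under every metric
```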

# **Word Embedding Analysis**

```python
# Load pre-trained 100-dimensional GloVe vectors (Wikipedia + Gigaword) as a gensim KeyedVectors object
glove = downloader.load('glove-wiki-gigaword-100')
```



```python
hassan_traits = [
    "speed", "aggression", "adaptability", "technical skill",
    "teamwork", "risk-taking", "consistency",
]

red_bull_traits = [
    "speed", "aggression", "adaptability", "technical skill",
    "teamwork", "risk-taking", "inconsistency",
]

ferrari_traits = [
    "passion", "emotion", "adaptability", "technical skill",
    "teamwork", "risk-taking", "inconsistency",
]

mercedes_traits = [
    "precision", "discipline", "adaptability", "technical skill",
    "teamwork", "control", "consistency",
]
```
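
Because the similarity below compares traits by list position, any position where both lists contain the identical word contributes a perfect 1.0. This sketch counts those positions for each team. `shared_positions` is a hypothetical helper written for this illustration.

```python
hassan_traits = ["speed", "aggression", "adaptability", "technical skill",
                 "teamwork", "risk-taking", "consistency"]
red_bull_traits = ["speed", "aggression", "adaptability", "technical skill",
                   "teamwork", "risk-taking", "inconsistency"]
ferrari_traits = ["passion", "emotion", "adaptability", "technical skill",
                  "teamwork", "risk-taking", "inconsistency"]
mercedes_traits = ["precision", "discipline", "adaptability", "technical skill",
                   "teamwork", "control", "consistency"]

def shared_positions(a, b):
    """Count the positions where both lists hold exactly the same trait string."""
    return sum(x == y for x, y in zip(a, b))

for name, traits in [("Red Bull", red_bull_traits),
                     ("Ferrari", ferrari_traits),
                     ("Mercedes", mercedes_traits)]:
    print(name, shared_positions(hassan_traits, traits))
```

Red Bull matches Hassan at six of seven positions; Ferrari and Mercedes match at four.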

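```python
def get_embeddings(word):
    # Look up the GloVe vector for a single token. Out-of-vocabulary entries,
    # including multi-word phrases such as "technical skill" (GloVe tokens
    # contain no spaces), fall back to the vector for the word "unknown".
    try:
        return glove[word]
    except KeyError:
        return glove['unknown']

def semantic_similarity(traits1, traits2):
    # Pair the traits by list position and average the cosine similarities of their vectors
    vec1 = [get_embeddings(word) for word in traits1]
    vec2 = [get_embeddings(word) for word in traits2]
    similarities = [cosine_similarity(w1, w2) for w1, w2 in zip(vec1, vec2)]
    return np.mean(similarities)
```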
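
With the fallback above, "technical skill" is represented by the vector for the word "unknown" on both sides. One common alternative is to average the vectors of a phrase's individual tokens. The sketch below assumes that approach; `phrase_vector` is a hypothetical helper, and the 3-dimensional toy vectors are made up for illustration and are not GloVe values. A gensim `KeyedVectors` object such as `glove` supports the same `in` and `[]` lookups, so it should be usable in place of the toy dictionary.

```python
import numpy as np

def phrase_vector(phrase, vectors):
    """Average the vectors of the in-vocabulary tokens of `phrase`.

    `vectors` can be any mapping that supports `in` and `[]` lookups.
    Returns None when none of the tokens are in the vocabulary.
    """
    tokens = [t for t in phrase.lower().split() if t in vectors]
    if not tokens:
        return None
    return np.mean([vectors[t] for t in tokens], axis=0)

# Toy vectors for illustration only; these are NOT real GloVe embeddings.
toy = {
    "technical": np.array([1.0, 0.0, 0.0]),
    "skill": np.array([0.0, 1.0, 0.0]),
}

print(phrase_vector("technical skill", toy))  # element-wise mean of the two token vectors
print(phrase_vector("risk-taking", toy))      # None: not in the toy vocabulary
```

A caller using this helper would need to decide how to handle a `None` result, for example by skipping that trait pair instead of substituting another word's vector.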

```python
print("\nSemantic Similarity:")
print(f"Red Bull: {semantic_similarity(hassan_traits, red_bull_traits):.4f}")
print(f"Ferrari: {semantic_similarity(hassan_traits, ferrari_traits):.4f}")
print(f"Mercedes: {semantic_similarity(hassan_traits, mercedes_traits):.4f}")
```


```
Semantic Similarity:
Red Bull: 0.9380
Ferrari: 0.7442
Mercedes: 0.6635
```


- **Based on the embeddings, Red Bull is the best match for Hassan.**

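### **Why the two approaches disagree**
- **The numerical approach compares the seven scores directly, so it rewards teams whose scores are close to Hassan's in size and pattern; Ferrari comes out on top under all four metrics.**
- **The semantic approach compares the trait words pairwise by list position and averages the cosine similarities of their GloVe vectors, so it reflects how related the words' meanings are rather than any score values.**
- **Because identical words score 1.0, the semantic result also rewards vocabulary overlap: Red Bull's list matches Hassan's at six of seven positions, while Ferrari's and Mercedes' match at four. The remaining Red Bull pair, "consistency" vs "inconsistency", still contributes roughly 0.57 ((0.9380 × 7) − 6), because word embeddings often place antonyms that occur in similar contexts close together.**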
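
Positional pairing also makes the semantic score depend on list order. The sketch below contrasts the notebook-style positional average with an order-insensitive "best match" average, in which each trait is scored against its closest counterpart in the other list. The best-match scoring is a swapped-in alternative, not the notebook's method. The function names are hypothetical, and the 2-dimensional toy vectors are made up for illustration.

```python
import numpy as np

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def positional_score(traits_a, traits_b, vectors):
    """Notebook-style scoring: pair words by list position, then average."""
    return float(np.mean([cos(vectors[x], vectors[y]) for x, y in zip(traits_a, traits_b)]))

def best_match_score(traits_a, traits_b, vectors):
    """Alternative: score each trait in traits_a against its closest trait in traits_b."""
    return float(np.mean([max(cos(vectors[x], vectors[y]) for y in traits_b) for x in traits_a]))

# Toy vectors for illustration only; these are NOT GloVe embeddings.
toy = {"speed": np.array([1.0, 0.0]), "teamwork": np.array([0.0, 1.0])}
a = ["speed", "teamwork"]
b = ["teamwork", "speed"]  # the same traits in a different order

print(positional_score(a, b, toy))  # 0.0: each word is paired with an unrelated one
print(best_match_score(a, b, toy))  # 1.0: every trait finds its exact match
```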