Cosine Similarity Function: A Python function calculates the cosine similarity between two vectors. Its docstring explains the measure, and a guard handles zero vectors so the function never divides by zero. An example demonstrates its usage.

Vector Operations: Fundamental NumPy operations for vectors are explored, including calculating magnitude (L2 norm), dot products, and scalar multiplication.

Cosine Similarity Comparison: Cosine similarity is calculated for two students' marks, first step by step and then with NumPy's built-in functions. Both methods yield the same result; the NumPy version is more concise.



In [3]:
import numpy as np # Ensure numpy is imported if this cell is run independently

def cosine_similarity(a, b):
    """
    Calculates the cosine similarity between two vectors.

    Cosine similarity is the cosine of the angle between two non-zero vectors
    of an inner product space. It ranges from -1 (opposite directions) to 1
    (same direction): the cosine of 0° is 1, and for any other angle it is
    less than 1.

    Args:
        a (list or np.array): The first vector.
        b (list or np.array): The second vector.

    Returns:
        float: The cosine similarity between vectors a and b, or 0.0 if either vector has zero norm.
    """
    # Calculate the dot product of the two vectors
    dot_product = np.dot(a, b)
    # Calculate the L2 norm (magnitude) of each vector
    norm_a = np.linalg.norm(a)
    norm_b = np.linalg.norm(b)

    # Avoid division by zero if either norm is zero, which would mean a zero vector
    if norm_a == 0 or norm_b == 0:
        return 0.0 # Cosine similarity is undefined for zero vectors; returning 0.0 is a common convention

    # Calculate and return the cosine similarity
    return dot_product / (norm_a * norm_b)

# Example usage of the cosine_similarity function
vector1 = np.array([88, 90, 75, 78, 93])
vector2 = np.array([44, 27, 56, 19, 62])
similarity_score = cosine_similarity(vector1, vector2)
print(f"Cosine similarity between vector1 and vector2: {similarity_score:.4f}")

Cosine similarity between vector1 and vector2: 0.9326
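
The zero-vector guard and the range of possible scores can be checked with a few hand-picked inputs. The sketch below restates the function in compact form so that it runs on its own; the test vectors are illustrative assumptions, not data from the notebook.

```python
import numpy as np

def cosine_similarity(a, b):
    """Compact restatement of the function above, including the zero-vector guard."""
    norm_a = np.linalg.norm(a)
    norm_b = np.linalg.norm(b)
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention used above: the undefined case is mapped to 0.0
    return np.dot(a, b) / (norm_a * norm_b)

# Illustrative inputs (assumed for this sketch)
same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])  # parallel vectors
orthogonal = cosine_similarity([1, 0], [0, 1])            # perpendicular vectors
opposite = cosine_similarity([1, 2], [-1, -2])            # anti-parallel vectors
with_zero = cosine_similarity([0, 0, 0], [1, 2, 3])       # hits the guard

print(np.isclose(same_direction, 1.0))
print(np.isclose(orthogonal, 0.0))
print(np.isclose(opposite, -1.0))
print(with_zero)
```

The comparisons use np.isclose because floating-point rounding can leave a result a hair away from exactly 1 or -1.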


In [None]:
import numpy as np

a = np.array([2, 3])
b = np.array([4, 1])

# Magnitude (L2 norm) of a
C = np.linalg.norm(a)
print(C)

# Dot product of a and b, computed manually as the sum of element-wise products
sum(a[i] * b[i] for i in range(len(a)))

3.605551275463989


np.int64(11)
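
The two quantities above can be cross-checked in the other direction: the magnitude from its definition, and the dot product with NumPy's built-ins. This is a minimal sketch that reuses a and b from the cell above; the other variable names are assumptions introduced for the example.

```python
import numpy as np

a = np.array([2, 3])
b = np.array([4, 1])

# Magnitude from the definition: square root of the sum of squared components
magnitude_by_hand = np.sqrt(np.sum(a ** 2))  # sqrt(2**2 + 3**2) = sqrt(13)

# Dot product via NumPy: np.dot and the @ operator agree for 1-D arrays
dot_with_np = np.dot(a, b)
dot_with_operator = a @ b

print(np.isclose(magnitude_by_hand, np.linalg.norm(a)))
print(dot_with_np, dot_with_operator)
```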

In [5]:
import numpy as np

# Define student marks as NumPy arrays for easier vector operations.
# The marks represent scores in subjects like history, geography, math, English, and arts.
student_1_marks = np.array([45, 60, 50, 55, 50])
student_2_marks = np.array([50, 70, 65, 40, 55])

print("--- Manual Calculation of Cosine Similarity ---")
# 1. Calculate the dot product manually
# The dot product is the sum of the products of the corresponding entries of the two sequences of numbers.
dot_product_manual = sum(student_1_marks[i] * student_2_marks[i] for i in range(len(student_1_marks)))
print(f"Manual Dot Product: {dot_product_manual}")

# 2. Calculate the magnitude (L2 norm) of each vector manually
# The magnitude of a vector is its length, calculated as the square root of the sum of the squares of its components.
magnitude_1_manual = sum(x ** 2 for x in student_1_marks) ** 0.5
magnitude_2_manual = sum(x ** 2 for x in student_2_marks) ** 0.5
print(f"Magnitude of Student 1 (Manual): {magnitude_1_manual:.4f}")
print(f"Magnitude of Student 2 (Manual): {magnitude_2_manual:.4f}")

# 3. Calculate the cosine similarity
# Cosine Similarity = (Dot Product) / (Magnitude A * Magnitude B)
cosine_similarity_manual = dot_product_manual / (magnitude_1_manual * magnitude_2_manual)
print(f"Manual Cosine Similarity: {cosine_similarity_manual:.4f}")

print("\n--- Optimized Calculation using NumPy Functions ---")
# NumPy provides highly optimized functions for vector operations, which are generally faster and more concise.

# Calculate the dot product using np.dot()
dot_product_numpy = np.dot(student_1_marks, student_2_marks)
print(f"NumPy Dot Product: {dot_product_numpy}")

# Calculate the magnitudes using np.linalg.norm()
magnitude_1_numpy = np.linalg.norm(student_1_marks)
magnitude_2_numpy = np.linalg.norm(student_2_marks)
print(f"Magnitude of Student 1 (NumPy): {magnitude_1_numpy:.4f}")
print(f"Magnitude of Student 2 (NumPy): {magnitude_2_numpy:.4f}")

# Calculate the cosine similarity using NumPy functions directly
cosine_similarity_numpy = dot_product_numpy / (magnitude_1_numpy * magnitude_2_numpy)
print(f"NumPy Cosine Similarity: {cosine_similarity_numpy:.4f}")

# Verify that both methods yield approximately the same result
print(f"\nAre manual and NumPy results approximately equal? {np.isclose(cosine_similarity_manual, cosine_similarity_numpy)}")

--- Manual Calculation of Cosine Similarity ---
Manual Dot Product: 14650
Magnitude of Student 1 (Manual): 116.8332
Magnitude of Student 2 (Manual): 127.4755
Manual Cosine Similarity: 0.9837

--- Optimized Calculation using NumPy Functions ---
NumPy Dot Product: 14650
Magnitude of Student 1 (NumPy): 116.8332
Magnitude of Student 2 (NumPy): 127.4755
NumPy Cosine Similarity: 0.9837

Are manual and NumPy results approximately equal? True
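
On five-element vectors any speed difference between the two approaches is negligible; it tends to show only on longer inputs. The sketch below is one way to compare them, assuming synthetic random vectors and two hypothetical helper functions (cosine_loop and cosine_numpy) that are not part of the notebook.

```python
import timeit
import numpy as np

# Synthetic data (an assumption for this sketch): two long random vectors
rng = np.random.default_rng(0)
x = rng.random(100_000)
y = rng.random(100_000)

def cosine_loop(u, v):
    """Cosine similarity using explicit Python loops."""
    dot = sum(u[i] * v[i] for i in range(len(u)))
    norm_u = sum(val ** 2 for val in u) ** 0.5
    norm_v = sum(val ** 2 for val in v) ** 0.5
    return dot / (norm_u * norm_v)

def cosine_numpy(u, v):
    """Cosine similarity using NumPy's vectorized functions."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Both implementations should agree up to floating-point rounding
results_match = np.isclose(cosine_loop(x, y), cosine_numpy(x, y))
print(f"Results match: {results_match}")

# Timings depend on the machine, so no specific numbers are claimed here
loop_seconds = timeit.timeit(lambda: cosine_loop(x, y), number=3)
numpy_seconds = timeit.timeit(lambda: cosine_numpy(x, y), number=3)
print(f"Loop version:  {loop_seconds:.4f} s for 3 runs")
print(f"NumPy version: {numpy_seconds:.4f} s for 3 runs")
```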


In [None]:
import numpy as np

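A = np.array([34, 56])
B = np.array([1, -6])

# Manual dot product of A and B; the value is discarded here because it is
# neither assigned nor the last expression in the cell
sum(A[i] * B[i] for i in range(len(B)))

# Scalar multiplication scales every component of the vector
D = 3 * A
print(D)

# Print the first 25 multiples of A
for n in range(1, 26):
    print(A * n)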

[102 168]
[34 56]
[ 68 112]
[102 168]
[136 224]
[170 280]
[204 336]
[238 392]
[272 448]
[306 504]
[340 560]
[374 616]
[408 672]
[442 728]
[476 784]
[510 840]
[544 896]
[578 952]
[ 612 1008]
[ 646 1064]
[ 680 1120]
[ 714 1176]
[ 748 1232]
[ 782 1288]
[ 816 1344]
[ 850 1400]
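
The loop above can also be written as a single broadcast multiplication, which ties scalar multiplication back to cosine similarity. This is a minimal sketch; the names multipliers, multiples, and cos_A_D are assumptions introduced for the example.

```python
import numpy as np

A = np.array([34, 56])

# Column of multipliers 1..25 with shape (25, 1); broadcasting against A's
# shape (2,) yields a (25, 2) array with one multiple of A per row
multipliers = np.arange(1, 26).reshape(-1, 1)
multiples = multipliers * A

print(multiples.shape)
print(multiples[:3])

# Scaling by a positive number changes a vector's length but not its direction,
# so the cosine similarity between A and 3 * A should be 1 up to rounding
D = 3 * A
cos_A_D = np.dot(A, D) / (np.linalg.norm(A) * np.linalg.norm(D))
print(np.isclose(cos_A_D, 1.0))
```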
