### 1. You are working as a data analyst in an e-commerce company. The company wants to compare customer preferences to recommend products. Each customer’s preferences are represented as a vector of features (e.g., ratings of different product categories).

Customer ratings for 5 product categories (scale 1-5):

- `customer_A = [4, 5, 2, 3, 4]`
- `customer_B = [5, 3, 2, 4, 5]`

Binary preferences (1 = liked, 0 = disliked):

- `customer_A_binary = [1, 0, 1, 1, 0, 1]`
- `customer_B_binary = [1, 1, 1, 0, 0, 1]`

To measure how similar or different two customers are, write a Python program that computes the following measures between their preference vectors:

1. Euclidean Distance – to measure the overall (straight-line) difference in ratings.
2. Manhattan Distance – to measure the sum of absolute differences in ratings.
3. Cosine Similarity – to check alignment of customer interests regardless of magnitude.
4. Hamming Distance – to count the positions where binary (liked/disliked) preferences differ.
5. Jaccard Similarity – to measure the share of items liked by either customer that both customers liked.
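
For reference, the five measures listed above can be written out as follows. The notation is an assumption made here for illustration: $x$ and $y$ stand for the two preference vectors and $n$ for their length. Hamming distance is given as a count of differing positions rather than a fraction.

```latex
\begin{aligned}
d_{\text{Euclidean}}(x, y) &= \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \\
d_{\text{Manhattan}}(x, y) &= \sum_{i=1}^{n} \lvert x_i - y_i \rvert \\
\operatorname{sim}_{\text{cosine}}(x, y) &= \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}} \\
d_{\text{Hamming}}(x, y) &= \bigl\lvert \{\, i : x_i \neq y_i \,\} \bigr\rvert \\
\operatorname{sim}_{\text{Jaccard}}(x, y) &= \frac{\bigl\lvert \{\, i : x_i = 1 \text{ and } y_i = 1 \,\} \bigr\rvert}{\bigl\lvert \{\, i : x_i = 1 \text{ or } y_i = 1 \,\} \bigr\rvert}
\end{aligned}
```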


In [2]:
import numpy as np
from scipy.spatial import distance

# Customer ratings for 5 product categories
customer_A = np.array([4, 5, 2, 3, 4])
customer_B = np.array([5, 3, 2, 4, 5])

# Binary preferences
customer_A_binary = np.array([1, 0, 1, 1, 0, 1])
customer_B_binary = np.array([1, 1, 1, 0, 0, 1])

# 1. Euclidean Distance
euclidean_dist = np.linalg.norm(customer_A - customer_B)

# 2. Manhattan Distance
manhattan_dist = np.sum(np.abs(customer_A - customer_B))

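# 3. Cosine Similarity
# distance.cosine returns the cosine *distance*, i.e. 1 - similarity
cosine_sim = 1 - distance.cosine(customer_A, customer_B)

# 4. Hamming Distance (for binary preferences)
# distance.hamming returns the fraction of differing positions,
# so multiply by the vector length to get the count
hamming_dist = distance.hamming(customer_A_binary, customer_B_binary) * len(customer_A_binary)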

# 5. Jaccard Similarity (for binary preferences)
intersection = np.sum(np.logical_and(customer_A_binary, customer_B_binary))
union = np.sum(np.logical_or(customer_A_binary, customer_B_binary))
jaccard_sim = intersection / union


print(f"Euclidean Distance: {euclidean_dist:.4f}")
print(f"Manhattan Distance: {manhattan_dist}")
print(f"Cosine Similarity: {cosine_sim:.4f}")
print(f"Hamming Distance: {hamming_dist}")
print(f"Jaccard Similarity: {jaccard_sim:.4f}")


Euclidean Distance: 2.6458
Manhattan Distance: 5
Cosine Similarity: 0.9548
Hamming Distance: 2.0
Jaccard Similarity: 0.6000
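
As a cross-check on the results above, the same five measures can be computed directly from their definitions using only the standard library. This is a minimal sketch: the helper names (`euclidean`, `manhattan`, `cosine_similarity`, `hamming_count`, `jaccard_similarity`) are hypothetical and not part of NumPy or SciPy, and returning 0.0 when neither customer liked anything is a convention assumed here.

```python
import math

# Same data as in the exercise.
customer_A = [4, 5, 2, 3, 4]
customer_B = [5, 3, 2, 4, 5]
customer_A_binary = [1, 0, 1, 1, 0, 1]
customer_B_binary = [1, 1, 1, 0, 0, 1]


def euclidean(u, v):
    # Square root of the sum of squared differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    # Sum of absolute differences.
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_similarity(u, v):
    # Dot product divided by the product of the vector norms.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def hamming_count(u, v):
    # Number of positions where the two vectors differ.
    return sum(a != b for a, b in zip(u, v))


def jaccard_similarity(u, v):
    # Items liked by both, divided by items liked by at least one.
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    # Assumed convention: similarity is 0.0 if nobody liked anything.
    return both / either if either else 0.0


print(f"Euclidean Distance: {euclidean(customer_A, customer_B):.4f}")
print(f"Manhattan Distance: {manhattan(customer_A, customer_B)}")
print(f"Cosine Similarity: {cosine_similarity(customer_A, customer_B):.4f}")
print(f"Hamming Distance: {hamming_count(customer_A_binary, customer_B_binary)}")
print(f"Jaccard Similarity: {jaccard_similarity(customer_A_binary, customer_B_binary):.4f}")
```

The only visible difference from the library-based version is that the Hamming distance comes out as the integer 2 rather than the float 2.0, because it is counted directly instead of being scaled up from a fraction.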


### 2. An online movie platform wants to analyze how similar two users are based on their movie ratings. Each user rates movies on a scale of 1 to 5.

Write a Python program to compute the following between two users:

1. Chebyshev Distance – to measure the maximum difference in their ratings.
2. Minkowski Distance (with p=3) – a generalized distance measure.

Use the following movie ratings (scale 1-5):

- `user1 = [5, 3, 4, 4, 2]`
- `user2 = [4, 2, 5, 4, 3]`

In [4]:
import numpy as np
from scipy.spatial import distance

# Movie ratings (scale 1-5)
user1 = np.array([5, 3, 4, 4, 2])
user2 = np.array([4, 2, 5, 4, 3])

# 1. Chebyshev Distance
chebyshev_dist = distance.chebyshev(user1, user2)

# 2. Minkowski Distance (p=3)
minkowski_dist = distance.minkowski(user1, user2, 3)


print(f"Chebyshev Distance: {chebyshev_dist}")
print(f"Minkowski Distance (p=3): {minkowski_dist:.4f}")


Chebyshev Distance: 1
Minkowski Distance (p=3): 1.5874
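
To illustrate why Minkowski distance is called a generalized measure, the sketch below evaluates it for several values of p on the same ratings. With p=1 it is the Manhattan distance, with p=2 the Euclidean distance, and as p grows it approaches the Chebyshev distance. The helpers `minkowski` and `chebyshev` are hypothetical pure-Python functions written for this illustration, not the SciPy ones used above.

```python
def minkowski(u, v, p):
    # Minkowski distance of order p (p >= 1):
    # the p-th root of the sum of absolute differences raised to p.
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def chebyshev(u, v):
    # Largest absolute difference across all positions.
    return max(abs(a - b) for a, b in zip(u, v))


# Same data as in the exercise.
user1 = [5, 3, 4, 4, 2]
user2 = [4, 2, 5, 4, 3]

# p=1 is Manhattan, p=2 is Euclidean, large p approaches Chebyshev.
for p in (1, 2, 3, 10, 50):
    print(f"Minkowski Distance (p={p}): {minkowski(user1, user2, p):.4f}")

print(f"Chebyshev Distance: {chebyshev(user1, user2)}")
```

For this particular pair of users, four of the five absolute differences are 1 and one is 0, so the Minkowski distance is the p-th root of 4: it equals 4 at p=1 and 2 at p=2, and it shrinks toward the Chebyshev value of 1 as p increases.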
