In [1]:
import numpy as np

# Mini Recommendation Engine using NumPy


## Objective
Build a small-scale recommendation system using NumPy to understand
user–item matrices, similarity metrics, and vector-based predictions.


## User–Item Rating Matrix Design

The following table represents a small-scale user–item interaction matrix
used for building a recommendation system.

- Rows represent users
- Columns represent songs
- Ratings range from 1 to 5
- Missing values (❓) indicate no interaction between the user and the song

| User ↓ / Song → | Song A | Song B | Song C | Song D | Song E | Song F |
|---------------|--------|--------|--------|--------|--------|--------|
| User 1 | 5 | 4 | ❓ | 1 | ❓ | 2 |
| User 2 | 4 | ❓ | 5 | 2 | 1 | ❓ |
| User 3 | ❓ | 2 | 4 | ❓ | 3 | 5 |
| User 4 | 1 | ❓ | 2 | 4 | 5 | ❓ |
| User 5 | 2 | 5 | ❓ | 3 | ❓ | 4 |

The matrix intentionally includes missing values to simulate the sparsity
typical of real-world recommendation data.


In [None]:
ratings = np.array([
    [5, 4, np.nan, 1, np.nan, 2],
    [4, np.nan, 5, 2, 1, np.nan],
    [np.nan, 2, 4, np.nan, 3, 5],
    [1, np.nan, 2, 4, 5, np.nan],
    [2, 5, np.nan, 3, np.nan, 4]
])

print(ratings.shape)
ratings

(5, 6)


array([[ 5.,  4., nan,  1., nan,  2.],
       [ 4., nan,  5.,  2.,  1., nan],
       [nan,  2.,  4., nan,  3.,  5.],
       [ 1., nan,  2.,  4.,  5., nan],
       [ 2.,  5., nan,  3., nan,  4.]])
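
To put a number on the sparsity of this matrix, one can count its missing entries. The following is a minimal sketch that rebuilds the same matrix so it runs on its own. The names `rated`, `sparsity`, `ratings_per_user`, and `ratings_per_song` are illustrative and not part of the original notebook.

```python
import numpy as np

ratings = np.array([
    [5, 4, np.nan, 1, np.nan, 2],
    [4, np.nan, 5, 2, 1, np.nan],
    [np.nan, 2, 4, np.nan, 3, 5],
    [1, np.nan, 2, 4, 5, np.nan],
    [2, 5, np.nan, 3, np.nan, 4]
])

# True where a user has rated a song, False where the rating is missing
rated = ~np.isnan(ratings)

# Fraction of user-song pairs that have no rating
sparsity = 1 - rated.mean()

# Number of ratings per user (rows) and per song (columns)
ratings_per_user = rated.sum(axis=1)
ratings_per_song = rated.sum(axis=0)

print(f"Sparsity: {sparsity:.2f}")
print("Ratings per user:", ratings_per_user)
print("Ratings per song:", ratings_per_song)
```

In this toy matrix 10 of the 30 entries are missing. Real rating data is usually far sparser.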

## Rating Normalization (Mean-Centering)

Users have different rating habits: some rate generously, while others are stricter.
To make similarity calculations reflect relative preferences rather than absolute
rating scales, we mean-center the ratings per user: each user's average rating
(computed over the items they rated) is subtracted from each of their ratings.
Missing values stay missing.
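
As a sketch in symbols (the notation $r_{u,i}$, $\bar{r}_u$, and $I_u$ is introduced here for illustration and is not used elsewhere in the notebook), the centered rating of user $u$ for song $i$ is:

$$
\tilde{r}_{u,i} = r_{u,i} - \bar{r}_u,
\qquad
\bar{r}_u = \frac{1}{|I_u|} \sum_{i \in I_u} r_{u,i}
$$

where $I_u$ is the set of songs rated by user $u$. For User 1, $\bar{r}_1 = (5 + 4 + 1 + 2) / 4 = 3$, so the ratings 5, 4, 1, 2 become 2, 1, -2, -1.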


In [14]:
mean_per_user = np.nanmean(ratings, axis=1, keepdims=True)
ratings_normalized = ratings - mean_per_user
ratings_normalized

array([[ 2. ,  1. ,  nan, -2. ,  nan, -1. ],
       [ 1. ,  nan,  2. , -1. , -2. ,  nan],
       [ nan, -1.5,  0.5,  nan, -0.5,  1.5],
       [-2. ,  nan, -1. ,  1. ,  2. ,  nan],
       [-1.5,  1.5,  nan, -0.5,  nan,  0.5]])
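
A quick sanity check on the centering step: after subtracting the per-user mean, each user's rated items should average to zero, and the missing entries should sit in the same positions as before. The sketch below rebuilds the matrix so it runs on its own. `centered_means` is an illustrative name, not one from the notebook.

```python
import numpy as np

ratings = np.array([
    [5, 4, np.nan, 1, np.nan, 2],
    [4, np.nan, 5, 2, 1, np.nan],
    [np.nan, 2, 4, np.nan, 3, 5],
    [1, np.nan, 2, 4, 5, np.nan],
    [2, 5, np.nan, 3, np.nan, 4]
])

# Same centering as in the cell above
mean_per_user = np.nanmean(ratings, axis=1, keepdims=True)
ratings_normalized = ratings - mean_per_user

# Each user's centered ratings should average to (approximately) zero
centered_means = np.nanmean(ratings_normalized, axis=1)
print(np.allclose(centered_means, 0))

# NaN positions are unchanged by the subtraction
print(np.array_equal(np.isnan(ratings), np.isnan(ratings_normalized)))
```

`keepdims=True` keeps `mean_per_user` as a column of shape `(5, 1)`, which lets NumPy broadcast the subtraction across each row.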

## User–User Cosine Similarity

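To measure how similar two users' rating patterns are, we use cosine
similarity. It compares the angle between two vectors rather than their
magnitudes, which suits mean-centered ratings. Because users rate different
songs, the similarity for each pair is computed only over the songs both
users have rated. If there are no such songs, or either vector has zero
length, the similarity is set to 0.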

#### Formula

$$
\text{Cosine Similarity}(A, B)
=
\frac{A \cdot B}{\|A\| \, \|B\|}
$$


Similarity values range from -1 to 1:
- 1   → very similar preferences
- 0   → no correlation
- -1  → opposite preferences
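
The three reference values can be reproduced on toy vectors. This is a minimal sketch: `cosine_similarity` is a hypothetical helper written for this illustration, not a NumPy function, and the vectors are made up.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two 1-D vectors (0.0 if either has zero length)."""
    norm_a = np.linalg.norm(a)
    norm_b = np.linalg.norm(b)
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return float(np.dot(a, b) / (norm_a * norm_b))

a = np.array([1.0, -1.0])

same_direction = cosine_similarity(a, np.array([2.0, -2.0]))   # parallel, different length
orthogonal = cosine_similarity(a, np.array([1.0, 1.0]))        # at a right angle
opposite = cosine_similarity(a, np.array([-3.0, 3.0]))         # pointing the other way

print(round(same_direction, 3), round(orthogonal, 3), round(opposite, 3))
```

The first and third vectors differ from `a` only in length and sign, which shows that cosine similarity ignores magnitude.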


In [None]:
num_users = ratings_normalized.shape[0]

# Initialize the similarity matrix; entries stay 0 where similarity is undefined
cosine_sim = np.zeros((num_users, num_users))

for user1 in range(num_users):
    v1 = ratings_normalized[user1]
    for user2 in range(num_users):
        v2 = ratings_normalized[user2]

        # Keep only the items both users have rated
        mask = ~np.isnan(v1) & ~np.isnan(v2)
        v1_masked = v1[mask]
        v2_masked = v2[mask]

        # Magnitudes of the co-rated vectors (0 for an empty vector)
        norm1 = np.linalg.norm(v1_masked)
        norm2 = np.linalg.norm(v2_masked)

        # No co-rated items or a zero-length vector: leave the similarity at 0
        if norm1 == 0 or norm2 == 0:
            continue

        cosine_sim[user1, user2] = np.dot(v1_masked, v2_masked) / (norm1 * norm2)

print(cosine_sim)

[[ 1.          1.         -1.         -0.9486833  -0.14142136]
 [ 1.          1.          1.         -0.9        -0.4472136 ]
 [-1.          1.          1.         -0.9486833  -0.4472136 ]
 [-0.9486833  -0.9        -0.9486833   1.          0.70710678]
 [-0.14142136 -0.4472136  -0.4472136   0.70710678  1.        ]]
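
The same co-rated cosine similarity can also be computed without explicit loops. The sketch below is one possible vectorized formulation, not the notebook's own method: it zero-fills the missing entries and uses a rating mask so that each norm is taken only over co-rated songs. The names `centered`, `Z`, `M`, `cosine_sim_vec`, and `overlap` are illustrative.

```python
import numpy as np

ratings = np.array([
    [5, 4, np.nan, 1, np.nan, 2],
    [4, np.nan, 5, 2, 1, np.nan],
    [np.nan, 2, 4, np.nan, 3, 5],
    [1, np.nan, 2, 4, 5, np.nan],
    [2, 5, np.nan, 3, np.nan, 4]
])

# Mean-center per user, as in the normalization step
centered = ratings - np.nanmean(ratings, axis=1, keepdims=True)

rated = ~np.isnan(centered)           # True where a rating exists
M = rated.astype(float)               # 1.0 / 0.0 rating mask
Z = np.where(rated, centered, 0.0)    # centered ratings with missing entries set to 0

# Dot products: zero-filled entries contribute nothing, so only co-rated songs count
dot = Z @ Z.T

# sq_norm[u, v] = sum of u's squared centered ratings over songs that v also rated
sq_norm = (Z ** 2) @ M.T
denom = np.sqrt(sq_norm * sq_norm.T)

# Leave similarity at 0 wherever the denominator is 0
cosine_sim_vec = np.divide(dot, denom, out=np.zeros_like(dot), where=denom > 0)

# Number of co-rated songs for every pair of users
overlap = (M @ M.T).astype(int)

print(np.round(cosine_sim_vec, 4))
print(overlap)
```

The `overlap` matrix is useful context for reading the similarities. Most pairs here share only two songs, so a value such as the 1.0 between User 1 and User 2 rests on very little evidence.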
