
# CSC14116 - Group 7 - Parallel Collaborative Filtering Recommender System
**Week 1 (6/9/2025 – 6/14/2025)**  
**Member**: Đăng Hoàn Mỹ - 19127216  
**Project**: User-user Neighborhood-based Collaborative Filtering (NBCF) Recommender System using MovieLens 100K dataset.  
**Objective**: Build a movie recommender system with sequential (V1), Numba (V2), CUDA (V3), and CUDA with shared memory (V4) implementations, targeting a 10× speedup and MAE < 1.2.

**References**
* MovieLens Datasets: https://grouplens.org/datasets/movielens/
* Viblo Tutorial: Basics of Collaborative Filtering.
* Machine Learning Cơ Bản: NBCF with MovieLens examples.
* Lei Mao’s Blog: Cosine Similarity vs. Pearson Correlation.

## 1. Environment Setup
Set up Google Colab with the necessary libraries (`pandas`, `numpy`, `scipy`, `scikit-learn`, `numba`, `cupy`) and mount Google Drive for data storage. The cells below also verify that a GPU is available for the CUDA implementations (V3, V4).

In [1]:
import pandas as pd
import numpy as np
import scipy.sparse as sp
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
from numba import jit, prange, cuda
import time
import cupy as cp
import cupyx.scipy.sparse as cp_sp
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
# Verify environment
import numba
print("Numba version:", numba.__version__)  # Check compatibility (e.g., 0.61.2)
!nvcc --version  # CUDA toolkit version (12.x on this runtime)
!nvidia-smi  # Confirm T4 GPU

Numba version: 0.60.0
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:18:23_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0
Tue Jul  8 14:50:08 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   37C    P8              9W /   70W |       0MiB /  15360MiB |      0%      Default |
| 

In [3]:
!pip install cupy-cuda12x



In [4]:
import cupy as cp
print("CuPy version:", cp.__version__)
# Test GPU array
a = cp.array([1, 2, 3])
print("CuPy array:", a)

CuPy version: 13.3.0
CuPy array: [1 2 3]


In [5]:
import numba.cuda as cuda
print("GPU available:", cuda.is_available())  # Should return True
if cuda.is_available():
    print("Detected GPUs:", cuda.detect())

GPU available: True
Found 1 CUDA devices
id 0             b'Tesla T4'                              [SUPPORTED]
                      Compute Capability: 7.5
                           PCI Device ID: 4
                              PCI Bus ID: 0
                                    UUID: GPU-d63a2369-5403-6df7-5470-1d147338f059
                                Watchdog: Disabled
             FP32/FP64 Performance Ratio: 32
Summary:
	1/1 devices are supported
Detected GPUs: True


## 2. Understanding the NBCF Algorithm

### Overview

User-user Neighborhood-based Collaborative Filtering (NBCF) predicts a user’s movie ratings based on ratings from similar users. It’s suitable for MovieLens 100K (943 users, 1682 movies) due to fewer users than items, reducing similarity computation cost compared to `item-item` NBCF.

### Steps
1. **Load Dataset**: Create user-item matrix `R` (943×1682) from MovieLens 100K.
2. **Normalization**: Mean-center ratings to remove user bias, producing `R_norm`.
3. **Similarity Computation**: Compute user-user cosine similarities (V1: sequential, V2: Numba, V3: CUDA, V4: CUDA with shared memory).
4. **K-Nearest Neighbors (K-NN)**: Select 20 most similar users per user.
5. **Recommendation**: Predict ratings for unrated movies, recommend top-10.
6. **Evaluation**: Compute MAE (<1.2) and Precision@10 (~4%) on `u1.test`.

### Cosine Similarity

For users `u` and `v`, cosine similarity is:

$$ \text{sim}(u,v) = \frac{R_{\text{norm},u} \cdot R_{\text{norm},v}}{|R_{\text{norm},u}| |R_{\text{norm},v}|} $$

where $R_{\text{norm},u}$ is the mean-centered rating vector. This measures rating pattern similarity, ignoring magnitude.
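
As a small illustration of the formula, the sketch below mean-centers two rating vectors and computes their cosine similarity with NumPy. The vectors and the helper name `centered_cosine` are made up for this example, zeros stand for unrated movies, and centering over rated entries only is an assumption of this sketch.

```python
import numpy as np

def centered_cosine(r_u, r_v):
    """Cosine similarity of two rating vectors after mean-centering rated entries.

    Zeros are treated as 'unrated' and stay zero after centering.
    """
    def center(r):
        r = np.asarray(r, dtype=float)
        rated = r != 0
        out = np.zeros_like(r)
        if rated.any():
            out[rated] = r[rated] - r[rated].mean()
        return out

    a, b = center(r_u), center(r_v)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Hypothetical users: u and v agree on what is good, w has the opposite taste
u = [5, 3, 0, 1]
v = [4, 2, 0, 1]
w = [1, 3, 0, 5]
print("sim(u, v):", centered_cosine(u, v))
print("sim(u, w):", centered_cosine(u, w))
```

Users `u` and `v` rank the movies the same way and get a similarity close to 1, while `u` and `w` have reversed preferences and get a similarity close to -1.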

## 3. Dataset Description

### MovieLens 100K
* Source: https://grouplens.org/datasets/movielens/100k/
* Files:
    * `u.data`: 100,000 ratings (tab-separated, columns: `user_id`, `item_id`, `rating`, `timestamp`).
    * `u.item`: 1682 movies (pipe-separated, columns: `item_id`, `title`, ...; use first two).
    * `u1.test`: ~20,000 test ratings (same format as `u.data`).
* Stats:
    * Users: 943
    * Movies: 1682
    * Ratings: ~100,000 (1–5 scale)
    * Density: ~6.3% non-zero entries ($\frac{100{,}000}{943 \times 1682} \approx 0.063$), i.e. ~93.7% of the matrix is empty.
* Relevance: Ideal for user-user NBCF due to fewer users than movies, reducing similarity matrix size (943×943 vs. 1682×1682).

### Why Sparse Matrix?

The user-item matrix R (943×1682) has \~6.3% non-zero entries, making dense storage (\~12MB) inefficient.

A sparse CSR (Compressed Sparse Row) matrix reduces memory usage to \~1.2MB, which matters for CUDA (V3, V4) on Colab’s T4 GPU (\~15GB VRAM, as reported by `nvidia-smi` above).
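
The two memory figures can be reproduced with a few lines of arithmetic. The sketch assumes 8-byte values (`float64`) and 4-byte (`int32`) CSR index arrays, which are common SciPy defaults but are an assumption here.

```python
n_users, n_items, n_ratings = 943, 1682, 100_000

# Dense: one 8-byte value for every (user, item) cell
dense_bytes = n_users * n_items * 8

# CSR: an 8-byte value plus a 4-byte column index per rating,
# and a 4-byte row pointer per user (plus one)
csr_bytes = n_ratings * (8 + 4) + (n_users + 1) * 4

print(f"dense: {dense_bytes / 1e6:.1f} MB, CSR: {csr_bytes / 1e6:.1f} MB")
# dense: 12.7 MB, CSR: 1.2 MB
```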

# Step 1: Load Data and Create User-Item Matrix
Load MovieLens 100K (`u.data`, `u.item`) into pandas DataFrames, create sparse CSR matrix `R` (943×1682), and save to Google Drive for reuse.

In [6]:
# Load data
data_url = 'https://files.grouplens.org/datasets/movielens/ml-100k/u.data'
item_url = 'https://files.grouplens.org/datasets/movielens/ml-100k/u.item'
ratings = pd.read_csv(data_url, sep='\t', names=['user_id', 'item_id', 'rating', 'timestamp'])
movies = pd.read_csv(item_url, sep='|', encoding='latin-1', usecols=[0, 1], names=['item_id', 'title'])
print("Ratings shape:", ratings.shape)  # (100000, 4)
print("Movies shape:", movies.shape)  # (1682, 2)

Ratings shape: (100000, 4)
Movies shape: (1682, 2)


In [7]:
# Save to Drive
ratings.to_csv('/content/drive/MyDrive/2025/HK3/LTSSUD/Data/ml-100k_ratings.csv', index=False)
movies.to_csv('/content/drive/MyDrive/2025/HK3/LTSSUD/Data/ml-100k_movies.csv', index=False)

In [8]:
# Create user-item matrix
n_users, n_items = 943, 1682
R = sp.csr_matrix((ratings['rating'], (ratings['user_id'] - 1, ratings['item_id'] - 1)), shape=(n_users, n_items))
print("User-item matrix shape:", R.shape, "Non-zero entries:", R.nnz)  # (943, 1682), ~100000
print("Sparsity:", R.nnz / (n_users * n_items))  # ~0.063
sp.save_npz('/content/drive/MyDrive/2025/HK3/LTSSUD/Data/R_sparse.npz', R)  # SciPy's native format for sparse matrices

User-item matrix shape: (943, 1682) Non-zero entries: 100000
Sparsity: 0.06304669364224531


In [9]:
# Load test data (for later use)
test_url = 'https://files.grouplens.org/datasets/movielens/ml-100k/u1.test'
test_ratings = pd.read_csv(test_url, sep='\t', names=['user_id', 'item_id', 'rating', 'timestamp'])
test_ratings.to_csv('/content/drive/MyDrive/2025/HK3/LTSSUD/Data/ml-100k_test.csv', index=False)
print("Test ratings shape:", test_ratings.shape)  # (~20000, 4)

Test ratings shape: (20000, 4)


In [10]:
print("Sample ratings (user 1):", R[0].toarray()[0, :5])  # First 5 items

Sample ratings (user 1): [5 3 4 3 3]


# Step 2: Normalize User-Item Matrix
Mean-center ratings to remove user bias (e.g., picky users giving lower scores), producing `R_norm` and `user_means` for similarity computation.

In [11]:
def normalize_matrix(R):
    user_means = np.array(R.mean(axis=1)).flatten()  # Mean rating per user
    R_norm = R.copy()  # Preserve original R
    row_indices, col_indices = R_norm.nonzero()
    R_norm.data = R_norm.data - user_means[row_indices]  # Subtract mean from non-zero ratings
    return R_norm, user_means

In [12]:
R_norm, user_means = normalize_matrix(R)
print("Normalized matrix non-zero count:", R_norm.nnz)  # ~100000
print("User means shape:", user_means.shape)  # (943,)
print("Sample user means:", user_means[:5])  # First 5 users
print("Sample normalized ratings (user 1):", R_norm[0].toarray()[0, :5])  # First 5 items
sp.save_npz('/content/drive/MyDrive/2025/HK3/LTSSUD/Data/R_norm_sparse.npz', R_norm)  # SciPy's native format for sparse matrices
np.save('/content/drive/MyDrive/2025/HK3/LTSSUD/Data/user_means.npy', user_means)

Normalized matrix non-zero count: 100000
User means shape: (943,)
Sample user means: [0.58382878 0.13674197 0.08977408 0.06183115 0.29904875]
Sample normalized ratings (user 1): [4.41617122 2.41617122 3.41617122 2.41617122 2.41617122]
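
`R.mean(axis=1)` divides each row sum by all 1,682 columns, so unrated zeros pull the means down (0.58 for user 1 above, although every rating is at least 1). If the intent is to center on each user's average *given* rating, a variant along the following lines may be closer. `normalize_rated_only` is a hypothetical name, and the outputs recorded in this notebook were produced with the original function.

```python
import numpy as np
import scipy.sparse as sp

def normalize_rated_only(R):
    """Mean-center each user's stored ratings using the mean over rated items only."""
    R = sp.csr_matrix(R, dtype=np.float64)
    counts = np.diff(R.indptr)                    # number of stored ratings per user
    sums = np.asarray(R.sum(axis=1)).ravel()      # sum of ratings per user
    # Users without ratings keep a mean of 0 instead of dividing by zero
    user_means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    R_norm = R.copy()
    # Repeat each user's mean once per stored rating, matching CSR data order
    R_norm.data -= np.repeat(user_means, counts)
    return R_norm, user_means

# Toy matrix: user 0 rated three items, user 1 rated one, user 2 rated none
toy = sp.csr_matrix(np.array([[5, 3, 0, 1],
                              [0, 0, 4, 0],
                              [0, 0, 0, 0]]))
toy_norm, toy_means = normalize_rated_only(toy)
print("means:", toy_means)
print(toy_norm.toarray())
```

One side effect worth noting: a rating equal to the user's mean becomes a stored zero, so code that treats `0` as "unrated" would need the original matrix's sparsity pattern instead.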


## Checking step

In [13]:
row, col = R_norm.nonzero()
print("Sample check:", np.isclose(R_norm[row[0], col[0]], R[row[0], col[0]] - user_means[row[0]]))  # True

Sample check: True


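# Step 3: Implement Train/Validation Split
Split `ratings` into training and validation sets with `train_test_split`, then build sparse CSR matrices for the training, validation, and test ratings.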

In [14]:
train_ratings, val_ratings = train_test_split(ratings, test_size=0.176, random_state=42)  # 17.6% of 100,000 ratings -> 17,600 validation, 82,400 training
train_R = sp.csr_matrix((train_ratings['rating'], (train_ratings['user_id'] - 1, train_ratings['item_id'] - 1)), shape=(943, 1682))
val_R = sp.csr_matrix((val_ratings['rating'], (val_ratings['user_id'] - 1, val_ratings['item_id'] - 1)), shape=(943, 1682))
test_R = sp.csr_matrix((test_ratings['rating'], (test_ratings['user_id'] - 1, test_ratings['item_id'] - 1)), shape=(943, 1682))

In [15]:
# Save with SciPy's native format for sparse matrices
sp.save_npz('/content/drive/MyDrive/2025/HK3/LTSSUD/Data/train_R_sparse.npz', train_R)
sp.save_npz('/content/drive/MyDrive/2025/HK3/LTSSUD/Data/val_R_sparse.npz', val_R)
sp.save_npz('/content/drive/MyDrive/2025/HK3/LTSSUD/Data/test_R_sparse.npz', test_R)
print("Train matrix non-zero count:", train_R.nnz)  # 82,400
print("Validation matrix non-zero count:", val_R.nnz)  # 17,600
print("Test matrix non-zero count:", test_R.nnz)  # 20,000

Train matrix non-zero count: 82400
Validation matrix non-zero count: 17600
Test matrix non-zero count: 20000
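
Because `u1.test` is one of the 80%/20% splits of `u.data`, splitting the full `ratings` frame lets test ratings land in the training set. One possible way to avoid that overlap, sketched here on a toy frame, is to drop the test (user, item) pairs before splitting. `split_without_test` is a hypothetical helper, and it uses `DataFrame.sample` rather than the notebook's `train_test_split` to stay pandas-only.

```python
import pandas as pd

def split_without_test(ratings, test_ratings, val_size=0.15, seed=42):
    """Drop (user_id, item_id) pairs found in the test set, then split the rest."""
    keys = ['user_id', 'item_id']
    # Left-join against the test keys; '_merge' tells us which rows matched
    merged = ratings.merge(test_ratings[keys].drop_duplicates(), on=keys,
                           how='left', indicator=True)
    remaining = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
    val = remaining.sample(frac=val_size, random_state=seed)
    train = remaining.drop(val.index)
    return train, val

# Toy data: 10 ratings, 2 of which are held out as the "test" set
toy = pd.DataFrame({'user_id': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
                    'item_id': [1, 2, 1, 3, 2, 3, 1, 4, 2, 4],
                    'rating':  [5, 3, 4, 2, 1, 5, 3, 4, 2, 5]})
toy_test = toy.iloc[[0, 5]]
toy_train, toy_val = split_without_test(toy, toy_test, val_size=0.25)
print(len(toy_train), len(toy_val), len(toy_test))
```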


# Step 4: Normalize Training Matrix
Mean-center the training user-item matrix (`train_R`) to remove user bias for similarity computation.

In [16]:
# Normalize training matrix (same function as in Step 2)
def normalize_matrix(R):
    user_means = np.array(R.mean(axis=1)).flatten()  # Mean rating per user
    R_norm = R.copy()  # Preserve original R
    row_indices, col_indices = R_norm.nonzero()
    R_norm.data = R_norm.data - user_means[row_indices]  # Subtract mean from non-zero ratings
    return R_norm, user_means

In [17]:
train_R_norm, train_user_means = normalize_matrix(train_R)
print("Train normalized matrix non-zero count:", train_R_norm.nnz)  # ~82,400
print("Train user means shape:", train_user_means.shape)  # (943,)
print("Sample train user means:", train_user_means[:5])  # First 5 users
print("Sample train normalized ratings (user 1):", train_R_norm[0].toarray()[0, :5])  # First 5 items

Train normalized matrix non-zero count: 82400
Train user means shape: (943,)
Sample train user means: [0.46313912 0.10998811 0.07491082 0.05588585 0.24375743]
Sample train normalized ratings (user 1): [0.         2.53686088 3.53686088 0.         2.53686088]


In [18]:
# Verify normalization
row, col = train_R_norm.nonzero()
print("Normalization check:", np.isclose(train_R_norm[row[0], col[0]], train_R[row[0], col[0]] - train_user_means[row[0]]))

Normalization check: True


# Step 5: V1 Sequential Cosine Similarity
Compute the user-user cosine similarity matrix on the CPU with scikit-learn's `cosine_similarity`, applied to the dense form of `train_R_norm`. This serves as the sequential baseline.

In [19]:
# V1: Sequential cosine similarity
start = time.time()
similarity_matrix_v1 = cosine_similarity(train_R_norm.toarray())
np.fill_diagonal(similarity_matrix_v1, 0)  # Exclude self-similarity
v1_time = time.time() - start

In [20]:
print(f"Cosine similarity (V1) time: {v1_time} s")
print("Similarity matrix shape:", similarity_matrix_v1.shape)  # (943, 943)
print("Sample similarities (user 1):", similarity_matrix_v1[0, 1:5])  # First few similarities

Cosine similarity (V1) time: 0.20035719871520996 s
Similarity matrix shape: (943, 943)
Sample similarities (user 1): [0.13289824 0.02755322 0.02830419 0.29438097]


# Step 6: V2 Numba Cosine Similarity
Compute user-user cosine similarity using Numba with parallel loops on sparse `train_R_norm`.

In [21]:
# V2: Numba cosine similarity
@jit(nopython=True, parallel=True)
def cosine_similarity_numba(R_data, R_indices, R_indptr, n_users, result):
    for i in prange(n_users):
        for j in range(i + 1, n_users):
            dot = 0.0
            norm_i = 0.0
            norm_j = 0.0
            start_i, end_i = R_indptr[i], R_indptr[i + 1]
            start_j, end_j = R_indptr[j], R_indptr[j + 1]
            idx_i = R_indices[start_i:end_i]
            idx_j = R_indices[start_j:end_j]
            ratings_i = R_data[start_i:end_i]
            ratings_j = R_data[start_j:end_j]
            k, l = 0, 0
            while k < len(idx_i) and l < len(idx_j):
                if idx_i[k] == idx_j[l]:
                    dot += ratings_i[k] * ratings_j[l]
                    norm_i += ratings_i[k] ** 2
                    norm_j += ratings_j[l] ** 2
                    k += 1
                    l += 1
                elif idx_i[k] < idx_j[l]:
                    norm_i += ratings_i[k] ** 2
                    k += 1
                else:
                    norm_j += ratings_j[l] ** 2
                    l += 1
            while k < len(idx_i):
                norm_i += ratings_i[k] ** 2
                k += 1
            while l < len(idx_j):
                norm_j += ratings_j[l] ** 2
                l += 1
            if norm_i * norm_j > 0:
                result[i, j] = dot / (np.sqrt(norm_i) * np.sqrt(norm_j))
                result[j, i] = result[i, j]

In [None]:
@jit(nopython=True, parallel=True)
def cosine_similarity_numba_opt(R_data, R_indices, R_indptr, n_users, result):
    norms = np.zeros(n_users, dtype=np.float32)
    for i in prange(n_users):
        start, end = R_indptr[i], R_indptr[i + 1]
        norms[i] = np.sqrt(np.sum(R_data[start:end] ** 2))
    for i in prange(n_users):
        for j in range(i + 1, n_users):
            start_i, end_i = R_indptr[i], R_indptr[i + 1]
            start_j, end_j = R_indptr[j], R_indptr[j + 1]
            common = np.intersect1d(R_indices[start_i:end_i], R_indices[start_j:end_j])
            if len(common) > 0:
                dot = np.sum(R_data[start_i:end_i][np.isin(R_indices[start_i:end_i], common)] *
                             R_data[start_j:end_j][np.isin(R_indices[start_j:end_j], common)])
                if norms[i] * norms[j] > 0:
                    result[i, j] = dot / (norms[i] * norms[j])
                    result[j, i] = result[i, j]

In [22]:
start = time.time()
similarity_matrix_v2 = np.zeros((n_users, n_users), dtype=np.float32)
cosine_similarity_numba(train_R_norm.data, train_R_norm.indices, train_R_norm.indptr, n_users, similarity_matrix_v2)
v2_time = time.time() - start

In [23]:
print(f"Cosine similarity (V2 Numba) time: {v2_time} s")
print("V1-V2 difference (max):", np.max(np.abs(similarity_matrix_v1 - similarity_matrix_v2)))
print("Similarity matrix shape:", similarity_matrix_v2.shape)  # (943, 943)

Cosine similarity (V2 Numba) time: 4.35981822013855 s
V1-V2 difference (max): 2.9625619513140578e-08
Similarity matrix shape: (943, 943)
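
The 4.36 s above is measured on the first call, and Numba compiles a `@jit` function the first time it is called, so that figure includes compilation time. A hedged way to separate the two is to run one untimed warm-up call before timing. The `time_after_warmup` helper below is an illustrative assumption and is demonstrated on a plain Python function so that it runs without Numba.

```python
import time

def time_after_warmup(fn, *args, repeats=3):
    """Call fn once untimed (warm-up), then return the best of `repeats` timed calls."""
    fn(*args)  # warm-up: triggers JIT compilation when fn is a Numba function
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

def sum_of_squares(n):
    """Stand-in workload for the demonstration."""
    return sum(i * i for i in range(n))

elapsed = time_after_warmup(sum_of_squares, 100_000)
print(f"best of 3: {elapsed:.6f} s")
```

In this notebook the call would look like `time_after_warmup(cosine_similarity_numba, train_R_norm.data, train_R_norm.indices, train_R_norm.indptr, n_users, similarity_matrix_v2)`.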


# Step 7: V3 CUDA Cosine Similarity
Compute user-user cosine similarity on the GPU from `train_R_norm`. A Numba CUDA kernel is defined first; the timed run uses a CuPy sparse matrix product as the fallback for a persistent PTX error.

In [24]:
# V3: Numba CUDA cosine similarity (try first)
@cuda.jit
def cosine_similarity_cuda(R_data, R_indices, R_indptr, n_users, result):
    i, j = cuda.grid(2)
    if i < n_users and j < n_users and i < j:
        dot = 0.0
        norm_i = 0.0
        norm_j = 0.0
        start_i, end_i = R_indptr[i], R_indptr[i + 1]
        start_j, end_j = R_indptr[j], R_indptr[j + 1]
        idx_i = R_indices[start_i:end_i]
        idx_j = R_indices[start_j:end_j]
        ratings_i = R_data[start_i:end_i]
        ratings_j = R_data[start_j:end_j]
        k, l = 0, 0
        while k < len(idx_i) and l < len(idx_j):
            if idx_i[k] == idx_j[l]:
                dot += ratings_i[k] * ratings_j[l]
                norm_i += ratings_i[k] * ratings_i[k]
                norm_j += ratings_j[l] * ratings_j[l]
                k += 1
                l += 1
            elif idx_i[k] < idx_j[l]:
                norm_i += ratings_i[k] * ratings_i[k]
                k += 1
            else:
                norm_j += ratings_j[l] * ratings_j[l]
                l += 1
        while k < len(idx_i):
            norm_i += ratings_i[k] * ratings_i[k]
            k += 1
        while l < len(idx_j):
            norm_j += ratings_j[l] * ratings_j[l]
            l += 1
        if norm_i * norm_j > 0:
            result[i, j] = dot / (cuda.libdevice.sqrt(norm_i) * cuda.libdevice.sqrt(norm_j))
            result[j, i] = result[i, j]
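
The kernel above is defined but not launched in this notebook. Launching it would need a 2D grid that covers every (i, j) user pair, and the sketch below computes one such configuration. The 16×16 block size and the helper name `grid_for_pairs` are assumptions, and the commented launch lines are untested here.

```python
import math

def grid_for_pairs(n_users, threads_per_block=(16, 16)):
    """Return (blocks_per_grid, threads_per_block) covering an n_users x n_users grid."""
    blocks_x = math.ceil(n_users / threads_per_block[0])
    blocks_y = math.ceil(n_users / threads_per_block[1])
    return (blocks_x, blocks_y), threads_per_block

blocks, threads = grid_for_pairs(943)
print(blocks, threads)  # (59, 59) (16, 16)

# Hypothetical launch, assuming a CUDA-capable GPU and the kernel defined above:
# d_data = cuda.to_device(train_R_norm.data)
# d_indices = cuda.to_device(train_R_norm.indices)
# d_indptr = cuda.to_device(train_R_norm.indptr)
# d_result = cuda.to_device(np.zeros((n_users, n_users), dtype=np.float64))
# cosine_similarity_cuda[blocks, threads](d_data, d_indices, d_indptr, n_users, d_result)
# similarity_matrix_v3_numba = d_result.copy_to_host()
```

The grid is slightly larger than 943×943, which is why the kernel's bounds check on `i` and `j` is needed.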


In [25]:
start = time.time()
# Convert sparse matrix to GPU
train_R_norm_gpu = cp_sp.csr_matrix(train_R_norm)
# Compute norms for all users
norms = cp.sqrt(cp.array(train_R_norm_gpu.power(2).sum(axis=1)).flatten())
# Compute dot products via matrix multiplication
dot_products = train_R_norm_gpu.dot(train_R_norm_gpu.T).toarray()
# Compute cosine similarity: sim(u,v) = dot(u,v) / (norm(u) * norm(v))
norm_products = norms[:, None] * norms[None, :]
similarity_matrix_v3 = cp.where(norm_products > 0, dot_products / norm_products, 0)
# Set diagonal to zero
cp.fill_diagonal(similarity_matrix_v3, 0)
# Transfer to host
similarity_matrix_v3_host = cp.asnumpy(similarity_matrix_v3)
v3_time = time.time() - start


In [26]:
print(f"Cosine similarity (V3 CuPy) time: {v3_time} s")
print("V1-V3 difference (max):", np.max(np.abs(similarity_matrix_v1 - similarity_matrix_v3_host)))
print("Similarity matrix shape:", similarity_matrix_v3_host.shape)  # (943, 943)

Cosine similarity (V3 CuPy) time: 6.058340787887573 s
V1-V3 difference (max): 1.5543122344752192e-15
Similarity matrix shape: (943, 943)


# Step 8: K-Nearest Neighbors (K-NN)
Select top-20 similar users per user from the similarity matrix for prediction and recommendation.

In [27]:
# K-Nearest Neighbors (k=20)
def get_top_k_neighbors(sim_matrix, k=20):
    top_k = np.argsort(sim_matrix, axis=1)[:, -k:][:, ::-1]  # Top-k indices
    top_k_scores = np.take_along_axis(sim_matrix, top_k, axis=1)  # Corresponding scores
    return top_k, top_k_scores



In [28]:
top_k, top_k_scores = get_top_k_neighbors(similarity_matrix_v1)
print("Top-k neighbors shape:", top_k.shape)  # (943, 20)
print("Top-k scores shape:", top_k_scores.shape)  # (943, 20)
print("Sample neighbors (user 1):", top_k[0, :5])  # First 5 neighbors
print("Sample scores (user 1):", top_k_scores[0, :5])  # Their similarities

Top-k neighbors shape: (943, 20)
Top-k scores shape: (943, 20)
Sample neighbors (user 1): [513 456 434 267 428]
Sample scores (user 1): [0.45720621 0.44657831 0.44394793 0.4432999  0.44212913]
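
`np.argsort` sorts every row in full even though only 20 entries are kept. For larger user counts, a partial sort with `np.argpartition` may be cheaper. The following is a sketch of that alternative, with `top_k_argpartition` as a hypothetical name; tie-breaking between equal similarities may differ from the `argsort` version.

```python
import numpy as np

def top_k_argpartition(sim_matrix, k=20):
    """Top-k neighbor indices and scores per row, sorted by descending similarity."""
    # argpartition puts the k largest entries of each row in the last k columns (unordered)
    part = np.argpartition(sim_matrix, -k, axis=1)[:, -k:]
    part_scores = np.take_along_axis(sim_matrix, part, axis=1)
    # Sort only those k candidates per row, largest first
    order = np.argsort(-part_scores, axis=1)
    top_k = np.take_along_axis(part, order, axis=1)
    top_k_scores = np.take_along_axis(part_scores, order, axis=1)
    return top_k, top_k_scores

# Small random similarity matrix for demonstration
rng = np.random.default_rng(0)
sim = rng.random((6, 6))
np.fill_diagonal(sim, 0)
idx, scores = top_k_argpartition(sim, k=3)
print(idx.shape, scores.shape)
```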


# Step 9: Rating Prediction
Predict ratings using weighted averages of top-20 neighbors’ normalized ratings, adjusted by user mean.
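
Written out, the prediction that `predict_rating` computes appears to be the following, where $N_k(u,i)$ (a symbol introduced here) denotes the subset of user $u$'s top-20 neighbors with a non-zero normalized rating for item $i$, and $\bar{r}_u$ is the user's entry in `user_means`:

$$ \hat{r}_{u,i} = \bar{r}_u + \frac{\sum_{v \in N_k(u,i)} \text{sim}(u,v)\, R_{\text{norm},v,i}}{\sum_{v \in N_k(u,i)} \text{sim}(u,v)} $$

If no neighbor has rated item $i$, or the weights do not sum to a positive value, the prediction falls back to $\bar{r}_u$.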

In [29]:
# Predict ratings
def predict_rating(user, item, sim_matrix, top_k, top_k_scores, R_norm, user_means):
    neighbors = top_k[user]
    weights = top_k_scores[user]
    ratings = R_norm[neighbors, item].toarray().flatten()
    valid = ratings != 0
    if np.sum(valid) == 0:
        return user_means[user]
    weighted_sum = np.sum(ratings[valid] * weights[valid])
    weight_sum = np.sum(weights[valid])
    return user_means[user] + (weighted_sum / weight_sum if weight_sum > 0 else 0)


In [30]:
# Test prediction
sample_user, sample_item = 0, 0
pred = predict_rating(sample_user, sample_item, similarity_matrix_v1, top_k, top_k_scores, train_R_norm, train_user_means)
print(f"Predicted rating for user {sample_user + 1}, item {sample_item + 1}: {pred}")

Predicted rating for user 1, item 1: 3.673782048395862


# Step 10: Top-10 Movie Recommendation
Recommend top-10 unrated movies per user based on predicted ratings.

In [37]:
# # Recommend top-10 movies
# def recommend_movies(user, sim_matrix, top_k, top_k_scores, R_norm, user_means, movies, n=10):
#     unrated_items = np.where(R_norm[user].toarray().flatten() == 0)[0]
#     if len(unrated_items) == 0:
#         return []
#     predictions = [predict_rating(user, item, sim_matrix, top_k, top_k_scores, R_norm, user_means) for item in unrated_items]
#     top_indices = np.argsort(predictions)[-n:][::-1]
#     top_items = unrated_items[top_indices]
#     return movies.iloc[top_items][['title']].values.flatten()

In [41]:
def recommend_movies(user, sim_matrix, top_k, top_k_scores, R_norm, user_means, movies, n=10):
    # Convert inputs to GPU
    R_norm_gpu = cp_sp.csr_matrix(R_norm)
    sim_matrix_gpu = cp.array(sim_matrix)
    top_k_gpu = cp.array(top_k)
    top_k_scores_gpu = cp.array(top_k_scores)
    user_means_gpu = cp.array(user_means)

    # Get unrated items
    unrated_items = cp.where(R_norm_gpu[user].toarray().flatten() == 0)[0]
    if len(unrated_items) == 0:
        return []

    # Vectorized prediction for all unrated items
    neighbors = top_k_gpu[user]
    weights = top_k_scores_gpu[user]
    # Ensure sparse indexing and convert to dense for computation
    ratings = R_norm_gpu[neighbors][:, unrated_items].toarray()  # Shape: (20, len(unrated_items))
    valid = ratings != 0  # Shape: (20, len(unrated_items))
    weights = weights.reshape(-1, 1)  # Shape: (20, 1) for broadcasting
    weighted_sums = cp.sum(ratings * valid * weights, axis=0)  # Shape: (len(unrated_items),)
    weight_sums = cp.sum(valid * weights, axis=0)  # Shape: (len(unrated_items),)
    predictions = user_means_gpu[user] + cp.where(weight_sums > 0, weighted_sums / weight_sums, 0)

    # Get top-n items
    top_indices = cp.argsort(predictions)[-n:][::-1]
    top_items = unrated_items[top_indices]
    return movies.iloc[cp.asnumpy(top_items)][['title']].values.flatten()

In [42]:
# Test recommendation
sample_user = 0
recs = recommend_movies(sample_user, similarity_matrix_v1, top_k, top_k_scores, train_R_norm, train_user_means, movies)
print(f"Top-10 recommendations for user {sample_user + 1}:", recs)

Top-10 recommendations for user 1: ['Seven Years in Tibet (1997)' 'War Room, The (1993)' 'Flirt (1995)'
 'Thin Blue Line, The (1988)' 'Down by Law (1986)'
 'Philadelphia Story, The (1940)' 'Schizopolis (1996)' 'Hard Eight (1996)'
 'Paths of Glory (1957)'
 'Englishman Who Went Up a Hill, But Came Down a Mountain, The (1995)']


# Step 11: Evaluation
Evaluate MAE (target < 1.2), RMSE, and Precision@10 (target ~4%) on the validation and test sets using V1’s similarity matrix.

In [43]:
from sklearn.metrics import mean_squared_error, mean_absolute_error

In [44]:
# # Evaluate MAE and Precision@10
# def evaluate_model(sim_matrix, top_k, top_k_scores, R_norm, user_means, eval_ratings, dataset_name):
#     predictions = []
#     actuals = []
#     for _, row in eval_ratings.iterrows():
#         user, item, rating = row['user_id'] - 1, row['item_id'] - 1, row['rating']
#         pred = predict_rating(user, item, sim_matrix, top_k, top_k_scores, R_norm, user_means)
#         predictions.append(pred)
#         actuals.append(rating)
#     mae = mean_absolute_error(actuals, predictions)
#     print(f"{dataset_name} MAE:", mae)

#     precision = 0
#     n_users = len(eval_ratings['user_id'].unique())
#     for user in eval_ratings['user_id'].unique():
#         user = user - 1
#         recommendations = recommend_movies(user, sim_matrix, top_k, top_k_scores, R_norm, user_means, movies)
#         if len(recommendations) == 0:
#             continue
#         user_eval = eval_ratings[eval_ratings['user_id'] == user + 1]
#         relevant = set(user_eval[user_eval['rating'] >= 4]['item_id'].values - 1)
#         recommended = set(movies[movies['title'].isin(recommendations)]['item_id'].values - 1)
#         if recommended:
#             precision += len(relevant & recommended) / len(recommended)
#     precision = precision / n_users if n_users > 0 else 0
#     print(f"{dataset_name} Precision@10:", precision)
#     return mae, precision


In [45]:
def evaluate_model(sim_matrix, top_k, top_k_scores, R_norm, user_means, eval_ratings, dataset_name):
    start = time.time()
    predictions = []
    actuals = []
    for _, row in eval_ratings.iterrows():
        user, item, rating = row['user_id'] - 1, row['item_id'] - 1, row['rating']
        pred = predict_rating(user, item, sim_matrix, top_k, top_k_scores, R_norm, user_means)
        predictions.append(pred)
        actuals.append(rating)
    mae = mean_absolute_error(actuals, predictions)
    rmse = np.sqrt(mean_squared_error(actuals, predictions))  # Add RMSE
    print(f"{dataset_name} MAE: {mae:.4f}")
    print(f"{dataset_name} RMSE: {rmse:.4f}")

    precision = 0
    n_users = len(eval_ratings['user_id'].unique())
    for user in eval_ratings['user_id'].unique():
        user = user - 1
        recommendations = recommend_movies(user, sim_matrix, top_k, top_k_scores, R_norm, user_means, movies)
        if len(recommendations) == 0:
            continue
        user_eval = eval_ratings[eval_ratings['user_id'] == user + 1]
        relevant = set(user_eval[user_eval['rating'] >= 4]['item_id'].values - 1)
        recommended = set(movies[movies['title'].isin(recommendations)]['item_id'].values - 1)
        if recommended:
            precision += len(relevant & recommended) / len(recommended)
    precision = precision / n_users if n_users > 0 else 0
    print(f"{dataset_name} Precision@10: {precision:.4f}")
    print(f"{dataset_name} evaluation time: {time.time() - start:.2f} s")
    return mae, rmse, precision

In [46]:
# Evaluate V1
val_mae, val_rmse, val_precision = evaluate_model(similarity_matrix_v1, top_k, top_k_scores, train_R_norm, train_user_means, val_ratings, "Validation")
test_mae, test_rmse, test_precision = evaluate_model(similarity_matrix_v1, top_k, top_k_scores, train_R_norm, train_user_means, test_ratings, "Test")

Validation MAE: 0.9312
Validation RMSE: 1.2244
Validation Precision@10: 0.0135
Validation evaluation time: 18.22 s
Test MAE: 0.8948
Test RMSE: 1.1758
Test Precision@10: 0.0044
Test evaluation time: 12.13 s
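
For reference, the per-user quantity averaged in `evaluate_model` is the share of recommended items that the user rated 4 or higher in the evaluation set. A minimal standalone version, with hypothetical item IDs, might look like this:

```python
def precision_at_k(recommended, relevant, k=10):
    """Fraction of the first k recommended items that appear in `relevant`."""
    top = list(recommended)[:k]
    if not top:
        return 0.0
    relevant = set(relevant)
    hits = sum(1 for item in top if item in relevant)
    return hits / len(top)

# Hypothetical example: 5 recommended items, 2 of them relevant
recommended_items = [10, 4, 7, 1, 9]
relevant_items = {4, 9, 30}
p = precision_at_k(recommended_items, relevant_items, k=5)
print(p)  # 0.4
```

Recommended items that the user never rated in the evaluation set count as misses, which may partly explain why the measured values are low.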


# Week 6 Summary
- Normalized training matrix (`train_R_norm`, ~82,400 non-zero entries).
- Implemented V1 (sequential, ~0.20 s), V2 (Numba, ~4.36 s), V3 (CuPy, ~6.06 s) cosine similarity.
- Verified V2/V3 accuracy (max differences <1e-6).
- Added K-NN (k=20), rating prediction, and top-10 recommendation.
- Evaluated on validation (MAE: 0.9312, Precision@10: 0.0135) and test (MAE: 0.8948, Precision@10: 0.0044) sets.
- Speedup: V2/V1 = 0.05x, V3/V1 = 0.03x, so both are currently slower than V1.
- **Next Steps**: Optimize V2/V3, implement V4 (CUDA with shared memory), tune k (10, 20, 30, 50), compare cosine vs. Euclidean similarity.