# Product Recommendation System - MovieLens (Filtering)

This notebook implements **item-based collaborative filtering** using cosine similarity.
We fit the model on the `train` interactions, then generate Top-K recommendations for a given user.

In [3]:
import sys
from pathlib import Path

PROJECT_ROOT = Path('..').resolve()

if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

In [4]:
import pandas as pd
import numpy as np

from src.data import load_movielens, time_based_split
from src.filtering import fit_item_cf, recommend_for_user_item_cf

In [10]:
DATA_DIR = Path('..') / 'data'
ratings, movies = load_movielens(DATA_DIR)

train, test = time_based_split(
    ratings,
    test_ratio = 0.2,
    min_ratings_per_user = 5
)

print('----- Train: -----')
print(train)
print('\n----- Test: -----')
print(test)

----- Train: -----
        userId  movieId  rating   timestamp
43           1      804     4.0   964980499
73           1     1210     5.0   964980499
120          1     2018     5.0   964980523
171          1     2628     4.0   964980523
183          1     2826     4.0   964980523
...        ...      ...     ...         ...
100272     610    55067     3.5  1493848671
100629     610   103219     3.5  1493848674
100231     610    51666     2.0  1493848680
100699     610   112727     3.0  1493848682
100407     610    71732     3.5  1493848688

[80672 rows x 4 columns]

----- Test: -----
        userId  movieId  rating   timestamp
76           1     1219     2.0   964983393
91           1     1348     4.0   964983393
174          1     2644     4.0   964983393
176          1     2654     5.0   964983393
83           1     1258     3.0   964983414
...        ...      ...     ...         ...
100612     610   101739     3.5  1495959269
99540      610       70     4.0  1495959282
99556      6
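
The implementation of `time_based_split` lives in `src.data` and is not shown in this notebook. As a rough sketch, assuming it splits each user's ratings chronologically (the timestamps printed above are consistent with that), the logic might look like the following. The name `time_based_split_sketch` and the rounding rule are assumptions made for illustration only.

```python
import pandas as pd


def time_based_split_sketch(ratings, test_ratio=0.2, min_ratings_per_user=5):
    """Hypothetical per-user chronological split (not the actual src.data code)."""
    # Drop users with too few ratings to split meaningfully.
    counts = ratings.groupby('userId')['movieId'].transform('count')
    kept = ratings[counts >= min_ratings_per_user]

    # Order each user's ratings from oldest to newest and number them 0, 1, 2, ...
    kept = kept.sort_values(['userId', 'timestamp'])
    position = kept.groupby('userId').cumcount()
    n_user = kept.groupby('userId')['movieId'].transform('count')

    # Assumption: the newest `test_ratio` share of each user's ratings goes to test.
    n_train = (n_user * (1 - test_ratio)).round().astype(int)
    is_train = position < n_train
    return kept[is_train], kept[~is_train]


# Toy data: user 1 has five ratings, user 2 has only two and is filtered out.
toy = pd.DataFrame({
    'userId':    [1, 1, 1, 1, 1, 2, 2],
    'movieId':   [10, 11, 12, 13, 14, 10, 11],
    'rating':    [4.0, 5.0, 3.0, 4.0, 2.0, 5.0, 1.0],
    'timestamp': [5, 1, 3, 2, 4, 1, 2],
})
toy_train, toy_test = time_based_split_sketch(toy)
print(toy_train)
print(toy_test)
```

Splitting per user rather than at one global cutoff keeps every retained user in both sets, which matters for a model that can only score users it has seen.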

## Fit item-based CF model

We use an implicit interaction matrix by default:
- Any rating counts as an interaction (value = 1), regardless of the score given
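
To make the idea concrete, here is a minimal dense sketch of building a binary user-item matrix and computing item-item cosine similarity from it. It is an illustration under assumptions, not the code in `src.filtering`; the function name `implicit_item_cosine` is hypothetical.

```python
import numpy as np
import pandas as pd


def implicit_item_cosine(ratings):
    """Toy dense version: binary user-item matrix, then item-item cosine."""
    user_ids = np.sort(ratings['userId'].unique())
    movie_ids = np.sort(ratings['movieId'].unique())
    user_pos = {u: i for i, u in enumerate(user_ids)}
    movie_pos = {m: j for j, m in enumerate(movie_ids)}

    # Implicit feedback: every (user, movie) rating becomes a 1.
    X = np.zeros((len(user_ids), len(movie_ids)))
    for u, m in zip(ratings['userId'], ratings['movieId']):
        X[user_pos[u], movie_pos[m]] = 1.0

    # cosine(i, j) = (x_i . x_j) / (||x_i|| * ||x_j||), taken over item columns.
    # Every item here has at least one rating, so no norm is zero.
    norms = np.linalg.norm(X, axis=0)
    sim = (X.T @ X) / np.outer(norms, norms)
    np.fill_diagonal(sim, 0.0)  # an item should not recommend itself
    return movie_ids, sim


toy_ratings = pd.DataFrame({
    'userId':  [1, 1, 2, 2, 3, 3],
    'movieId': [10, 20, 10, 20, 20, 30],
    'rating':  [4.0, 2.0, 5.0, 3.5, 1.0, 4.5],
})
toy_movie_ids, toy_sim = implicit_item_cosine(toy_ratings)
print(toy_movie_ids)
print(np.round(toy_sim, 3))
```

Movies 10 and 20 share two users, so they come out as the most similar pair even though the ratings themselves differ. For a catalogue of several thousand items, a sparse matrix (for example from `scipy.sparse`) would likely be a better fit than this dense array.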

In [11]:
model = fit_item_cf(
    train_ratings = train,
    use_implicit = True,
    shrinkage = 0.0
)

print(f'Items: {len(model.movie_ids)}')
print(f'Users: {len(model.user_ids)}')
print(f'Similarity matrix shape: {model.item_item_sim.shape}')

Items: 8239
Users: 610
Similarity matrix shape: (8239, 8239)
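
The call above passes `shrinkage = 0.0`, which presumably leaves the similarities untouched. How `fit_item_cf` applies a non-zero value is not shown here; one common form, assumed in this sketch, scales each similarity by `n / (n + shrinkage)`, where `n` is the number of users who interacted with both items. The helper name `shrink_similarity` is hypothetical.

```python
import numpy as np


def shrink_similarity(sim, co_counts, shrinkage):
    """Damp similarities that rest on few co-rating users (one common form)."""
    if shrinkage == 0:
        # Avoid 0 / 0 for item pairs that share no users.
        return sim.copy()
    return sim * co_counts / (co_counts + shrinkage)


# Binary user-item matrix: 3 users x 3 items.
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 1, 1]], dtype=float)
co_counts = X.T @ X                      # users who interacted with both items
norms = np.sqrt(np.diag(co_counts))      # diagonal holds per-item user counts
sim = co_counts / np.outer(norms, norms)

shrunk = shrink_similarity(sim, co_counts, shrinkage=2.0)
print(np.round(sim, 3))
print(np.round(shrunk, 3))
```

Under this form, a pair supported by two shared users keeps half of its similarity at `shrinkage = 2.0`, while a pair supported by a single user keeps only a third, so weakly supported pairs are penalized more.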


## Recommend for a user (sanity check)

We'll pick a user from the training set and display their Top-10 recommendations.

In [13]:
# Sample one training row and take its user (users with more ratings are more likely to be drawn)
user_id = int(train['userId'].sample(1, random_state = 7).iloc[0])
user_id

1

In [15]:
recommended = recommend_for_user_item_cf(
    model = model,
    user_id = user_id,
    train_ratings = train,
    k = 10,
    candidate_pool = 200
)

recommended

[1240, 1265, 1391, 2011, 1036, 2716, 2918, 3033, 1387, 1270]
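
For intuition, item-based CF usually scores each unseen movie by its total similarity to the movies the user has already interacted with. The sketch below assumes that scoring rule; `recommend_sketch` is a hypothetical name, and it ignores `candidate_pool`, whose exact role inside `recommend_for_user_item_cf` is not shown in this notebook.

```python
import numpy as np


def recommend_sketch(sim, movie_ids, seen_movie_ids, k=10):
    """Score items by summed similarity to the user's seen items."""
    movie_ids = np.asarray(movie_ids)
    seen_mask = np.isin(movie_ids, list(seen_movie_ids))

    # Each item's score is its total similarity to everything the user has seen.
    scores = sim[:, seen_mask].sum(axis=1)
    scores[seen_mask] = -np.inf  # never re-recommend seen items

    # Highest score first; drop the masked (seen) items if k exceeds the catalogue.
    top = np.argsort(-scores, kind='stable')[:k]
    return [int(movie_ids[i]) for i in top if np.isfinite(scores[i])]


toy_ids = [10, 20, 30, 40]
toy_sim = np.array([
    [0.0, 0.8, 0.1, 0.3],
    [0.8, 0.0, 0.5, 0.2],
    [0.1, 0.5, 0.0, 0.6],
    [0.3, 0.2, 0.6, 0.0],
])
print(recommend_sketch(toy_sim, toy_ids, seen_movie_ids={10}, k=2))  # [20, 40]
```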

In [16]:
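# Show titles. Note: `isin` returns rows in `movies` order (by movieId here),
# not in recommendation rank order.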
rec_titles = movies[movies['movieId'].isin(recommended)][['movieId', 'title', 'genres']]
rec_titles

      movieId  title                                       genres
793      1036  Die Hard (1988)                             Action|Crime|Thriller
939      1240  Terminator, The (1984)                      Action|Sci-Fi|Thriller
964      1265  Groundhog Day (1993)                        Comedy|Fantasy|Romance
969      1270  Back to the Future (1985)                   Adventure|Comedy|Sci-Fi
1067     1387  Jaws (1975)                                 Action|Horror
1071     1391  Mars Attacks! (1996)                        Action|Comedy|Sci-Fi
1486     2011  Back to the Future Part II (1989)           Adventure|Comedy|Sci-Fi
2038     2716  Ghostbusters (a.k.a. Ghost Busters) (1984)  Action|Comedy|Sci-Fi
2195     2918  Ferris Bueller's Day Off (1986)             Comedy
2286     3033  Spaceballs (1987)                           Comedy|Sci-Fi
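
The title table comes back in `movies` order rather than in the order of the `recommended` list, so the top-ranked movie is not necessarily the first row. One way to keep the ranking, sketched here on toy stand-ins for `movies` and `recommended`, is to index by `movieId` and select with `.loc`:

```python
import pandas as pd

# Toy stand-ins for the notebook's `movies` frame and `recommended` list.
movies_toy = pd.DataFrame({
    'movieId': [1036, 1240, 1265],
    'title': ['Die Hard (1988)', 'Terminator, The (1984)', 'Groundhog Day (1993)'],
})
recommended_toy = [1240, 1265, 1036]

# Index by movieId, then select rows in recommendation order.
ranked = (
    movies_toy.set_index('movieId')
    .loc[recommended_toy]
    .reset_index()
)
ranked.insert(0, 'rank', range(1, len(ranked) + 1))
print(ranked)
```

Note that `.loc` with a list raises a `KeyError` if any recommended id is missing from the frame, which can be useful as a consistency check.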


## What did the user already rate?

This gives some intuition about why the recommended movies might make sense.

In [18]:
cols = ['movieId', 'title', 'genres']

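# Attach titles and genres to this user's training ratings, oldest first
user_history = (
    train[train['userId'] == user_id]
    .merge(movies[cols], on = 'movieId', how = 'left')
    .sort_values('timestamp')
)

# Most recent ratings within the training window
user_history.tail(15)[cols]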

     movieId  title                    genres
171     2268  Few Good Men, A (1992)   Crime|Drama|Thriller
172     2580  Go (1999)                Comedy|Crime
173     1396  Sneakers (1992)          Action|Comedy|Crime|Drama|Sci-Fi
174     1804  Newton Boys, The (1998)  Crime|Drama
175     2985  RoboCop (1987)           Action|Crime|Drama|Sci-Fi|Thriller
176     1620  Kiss the Girls (1997)    Crime|Drama|Mystery|Thriller
177     1805  Wild Things (1998)       Crime|Drama|Mystery|Thriller
178     2616  Dick Tracy (1990)        Action|Crime
179     2389  Psycho (1998)            Crime|Horror|Thriller
180     3247  Sister Act (1992)        Comedy|Crime


## Notes and Next Steps

- Add a popularity fallback for true cold-start users
- Evaluate filtering vs. the baseline in `evaluation.ipynb`
- Tune parameters:
  - `candidate_pool`
  - `shrinkage`
  - implicit vs explicit matrix
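
As a starting point for the popularity fallback mentioned above, a minimal sketch could rank movies by how many training interactions they have and use that list whenever a user is unknown to the model. The names `popular_items` and `recommend_or_fallback` are hypothetical, and `personalized` stands in for whatever function produces the item-CF recommendations.

```python
import pandas as pd


def popular_items(train_ratings, k=10, exclude=()):
    """Most-interacted-with movies in train, skipping any ids in `exclude`."""
    counts = train_ratings['movieId'].value_counts()  # sorted, most frequent first
    counts = counts[~counts.index.isin(list(exclude))]
    return [int(m) for m in counts.index[:k]]


def recommend_or_fallback(user_id, known_users, personalized, train_ratings, k=10):
    """Use the personalized recommender for known users, popularity otherwise."""
    if user_id in known_users:
        return personalized(user_id)
    return popular_items(train_ratings, k=k)


toy_train = pd.DataFrame({
    'userId':  [1, 2, 3, 1, 2, 1],
    'movieId': [10, 10, 10, 20, 20, 30],
})
print(popular_items(toy_train, k=2))                # [10, 20]
print(popular_items(toy_train, k=2, exclude={10}))  # [20, 30]
print(recommend_or_fallback(99, {1, 2, 3}, lambda u: [], toy_train, k=2))  # [10, 20]
```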