<a href="https://colab.research.google.com/github/Firojpaudel/Machine-Learning-Notes/blob/main/Practical%20Deep%20Learning%20For%20Coders/Chapter_8.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Collaborative Filtering: _Movie Recommendation_

**Key Idea:** Recommend items based on user behavior patterns, not item features.

**Example:** Netflix suggests movies by finding users with similar viewing histories.

**Latent Factors:** Hidden preferences (e.g., genre, era) inferred from data, not explicitly stated.
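
To make the "similar viewing histories" idea concrete, here is a minimal sketch that compares users by the cosine similarity of their rating vectors. The ratings matrix is made-up toy data, and `cosine_similarity` is a helper defined only for this illustration. This is a neighbourhood-style comparison, not the latent-factor model built later in the notebook, and treating unrated movies as 0 is a simplification.

```python
import numpy as np

# Toy ratings matrix (hypothetical data): rows are users, columns are movies.
# A 0 stands for "not rated", which is a crude simplification.
ratings_matrix = np.array([
    [5, 4, 0, 1],   # user 0
    [4, 5, 0, 2],   # user 1: tastes close to user 0
    [1, 0, 5, 4],   # user 2: tastes mostly different from user 0
], dtype=float)

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (illustrative helper)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_01 = cosine_similarity(ratings_matrix[0], ratings_matrix[1])
sim_02 = cosine_similarity(ratings_matrix[0], ratings_matrix[2])

print(f"user 0 vs user 1: {sim_01:.3f}")
print(f"user 0 vs user 2: {sim_02:.3f}")
```

User 1 scores as far more similar to user 0 than user 2 does, so movies that user 1 liked would be the more plausible recommendations for user 0.
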
***

#### **Dataset: MovieLens**
We use the MovieLens 100k subset (100,000 ratings). Each row contains:

- `User ID, Movie ID, Rating, Timestamp`

In [1]:
## Setting up the notebook first

%reload_ext autoreload
%autoreload 2
%matplotlib inline

## Installing the dependencies
!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()

## Importing the necessary libraries
from fastbook import *
from fastai.collab import *
from fastai.tabular.all import *
from fastai.vision.all import *


##### A. Getting the Dataset

In [2]:
path = untar_data(URLs.ML_100k)

ratings = pd.read_csv(
    path/'u.data', delimiter='\t', header=None,
    names= ['user', 'movie', 'rating', 'timestamp']
)

ratings.head()

| | user | movie | rating | timestamp |
|---|---|---|---|---|
| 0 | 196 | 242 | 3 | 881250949 |
| 1 | 186 | 302 | 3 | 891717742 |
| 2 | 22 | 377 | 1 | 878887116 |
| 3 | 244 | 51 | 2 | 880606923 |
| 4 | 166 | 346 | 1 | 886397596 |


In [5]:
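##@ Optional block: match score between one movie and one user,
##  computed as the dot product of their latent-factor vectors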
last_skywalker = np.array([0.98, 0.9, -0.9])
user1 = np.array([0.9, 0.8, -0.6])
(user1 * last_skywalker).sum() ## Getting the dot product

2.1420000000000003
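
The same dot product can score several movies for one user at once by stacking the movie vectors into a matrix. The sketch below reuses `user1` and the `last_skywalker` vector from the cell above. The second movie vector and the name `movie_factors` are hypothetical, added only to show what a poor match looks like.

```python
import numpy as np

user1 = np.array([0.9, 0.8, -0.6])

# One row of latent factors per movie.
movie_factors = np.array([
    [0.98, 0.9, -0.9],    # last_skywalker, as in the cell above
    [-0.99, -0.3, 0.8],   # hypothetical movie with roughly opposite traits
])

# Matrix-vector product: one dot product (match score) per movie.
scores = movie_factors @ user1

# Index of the movie with the highest predicted match.
best = int(np.argmax(scores))

print(scores)
print("best match index:", best)
```

A large positive score suggests a good match between user and movie, while a negative score suggests the user would probably not enjoy it.
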

##### B. Creating DataLoaders

In [11]:
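#@ Loading the movie titles (u.item is pipe-delimited; keep only the ID and title columns)
movies = pd.read_csv(path/'u.item', delimiter='|', encoding='latin-1', usecols=(0,1),
                     names=['movie', 'title'], header=None)
## Join on the shared 'movie' column so that each rating carries its title
ratings = ratings.merge(movies)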
ratings.head()

| | user | movie | rating | timestamp | title |
|---|---|---|---|---|---|
| 0 | 196 | 242 | 3 | 881250949 | Kolya (1996) |
| 1 | 186 | 302 | 3 | 891717742 | L.A. Confidential (1997) |
| 2 | 22 | 377 | 1 | 878887116 | Heavyweights (1994) |
| 3 | 244 | 51 | 2 | 880606923 | Legends of the Fall (1994) |
| 4 | 166 | 346 | 1 | 886397596 | Jackie Brown (1997) |
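
Calling `merge` without arguments joins on the columns the two frames share, which here is `movie`. The sketch below reproduces that join on three rows copied from the output above, with the join key spelled out. The `ratings_demo` and `movies_demo` names are illustrative, and `validate="many_to_one"` is an optional safety check, not something the notebook uses.

```python
import pandas as pd

# Three ratings taken from the head() output above (timestamp omitted).
ratings_demo = pd.DataFrame({
    "user": [196, 186, 22],
    "movie": [242, 302, 377],
    "rating": [3, 3, 1],
})

# The matching movie-ID -> title rows.
movies_demo = pd.DataFrame({
    "movie": [242, 302, 377],
    "title": ["Kolya (1996)", "L.A. Confidential (1997)", "Heavyweights (1994)"],
})

# Explicit inner join on 'movie'; validate raises an error if a movie ID
# appears more than once in movies_demo.
merged = ratings_demo.merge(movies_demo, on="movie", how="inner",
                            validate="many_to_one")

print(merged)
```

Naming the key explicitly guards against an accidental join on some other column that both frames happen to share.
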


In [9]:
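#@ Constructing DataLoaders
## By default the first three columns are used as user, item and rating;
## item_name='title' selects the readable title instead of the numeric movie ID
dls = CollabDataLoaders.from_df(ratings, item_name='title', bs=64)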
dls.show_batch()

| | user | title | rating |
|---|---|---|---|
| 0 | 782 | Starship Troopers (1997) | 2 |
| 1 | 943 | Judge Dredd (1995) | 3 |
| 2 | 758 | Mission: Impossible (1996) | 4 |
| 3 | 94 | Farewell My Concubine (1993) | 5 |
| 4 | 23 | Psycho (1960) | 4 |
| 5 | 296 | Secrets & Lies (1996) | 5 |
| 6 | 940 | American President, The (1995) | 4 |
| 7 | 334 | Star Trek VI: The Undiscovered Country (1991) | 1 |
| 8 | 380 | Braveheart (1995) | 4 |
| 9 | 690 | So I Married an Axe Murderer (1993) | 1 |
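
Before a model can look up latent factors, each user ID and each title has to become a contiguous integer index. The sketch below illustrates that idea with plain pandas on three rows from the batch above. It is a conceptual stand-in under that assumption, not fastai's implementation, whose vocabulary handling differs in detail.

```python
import pandas as pd

# Three rows copied from the show_batch() output above.
batch = pd.DataFrame({
    "user": [782, 943, 758],
    "title": ["Starship Troopers (1997)", "Judge Dredd (1995)",
              "Mission: Impossible (1996)"],
    "rating": [2, 3, 4],
})

# factorize returns integer codes plus the vocabulary they index into;
# sort=True orders the vocabulary so the codes are reproducible.
user_codes, user_vocab = pd.factorize(batch["user"], sort=True)
title_codes, title_vocab = pd.factorize(batch["title"], sort=True)

print(list(user_codes), list(user_vocab))
print(list(title_codes), list(title_vocab))
```

The codes can then serve as row indices into a table of latent factors, and the vocabulary maps each index back to the original ID or title.
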
