In [11]:
import pandas as pd
import numpy as np
import tensorflow as tf
import tensorflow_recommenders as tfrs

In [33]:
df = pd.read_csv('../datasets/ratings.csv')

In [34]:
df.head()

   userId  movieId  rating   timestamp
0       1      110     1.0  1425941529
1       1      147     4.5  1425942435
2       1      858     5.0  1425941523
3       1     1221     5.0  1425941546
4       1     1246     5.0  1425941556


In [35]:
df['userId'].unique()

array([     1,      2,      3, ..., 270894, 270895, 270896])

In [36]:
df['rating'].unique()

array([1. , 4.5, 5. , 4. , 3.5, 2.5, 0.5, 3. , 2. , 1.5])

In [37]:
df['rating'].nunique()

10

In [38]:
df['rating'].max()

np.float64(5.0)

In [39]:
df['rating'].min()

np.float64(0.5)

In [40]:
df['rating'].value_counts()

rating
4.0    6998802
3.0    5256722
5.0    3812499
3.5    3116213
4.5    2170441
2.0    1762440
2.5    1255358
1.0     843310
0.5     404897
1.5     403607
Name: count, dtype: int64

In [41]:
df.isnull().sum()

userId       0
movieId      0
rating       0
timestamp    0
dtype: int64

In [42]:
df.drop(['timestamp'], axis=1, inplace=True)

In [43]:
df.head()

   userId  movieId  rating
0       1      110     1.0
1       1      147     4.5
2       1      858     5.0
3       1     1221     5.0
4       1     1246     5.0


In [44]:
df = df[0:1000000]

In [45]:
df.shape

(1000000, 3)

In [46]:
df

        userId  movieId  rating
0            1      110     1.0
1            1      147     4.5
2            1      858     5.0
3            1     1221     5.0
4            1     1246     5.0
...        ...      ...     ...
999995   10183      380     4.0
999996   10183      381     4.0
999997   10183      410     3.0
999998   10183      415     4.0
999999   10183      419     2.0

[1000000 rows x 3 columns]

In [47]:
df['rating'].value_counts()

rating
4.0    269794
3.0    208032
5.0    144849
3.5    122558
4.5     82503
2.0     66423
2.5     48573
1.0     30815
1.5     13609
0.5     12844
Name: count, dtype: int64

In [48]:
df.duplicated().sum()

np.int64(0)

In [49]:
df.isnull().sum()

userId     0
movieId    0
rating     0
dtype: int64

# Recommendation System

A **Recommendation System** is a machine learning model designed to suggest items to users based on their preferences, interests, and item similarity. It reduces the time users spend manually searching for items they are interested in. Recommendation systems are widely used in domains such as e-commerce, streaming platforms, and social media.
Recommendation systems fall into three main categories:
1. **Popularity-Based Recommendation**
2. **Content-Based Recommendation**
3. **Collaborative Filtering**

## 1. **Popularity-Based Recommendation:**
This type of recommendation system suggests items that are trending or popular among all users, irrespective of their individual preferences.
For example:

- The trending section on Netflix recommends shows or movies that are widely viewed by other users.
- The "Trending Now" section on X (formerly Twitter) highlights trending posts or topics.
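
A popularity ranking needs no per-user information, only aggregate statistics per item. The sketch below assumes ratings in the same `userId`/`movieId`/`rating` layout as `df`; the toy data, the `most_popular` helper, and its `min_ratings` threshold are hypothetical. It ranks by average rating among movies with enough ratings, which is one common variant; ranking by raw count (how many users watched an item) is another.

```python
import pandas as pd

# Hypothetical toy ratings in the same shape as the notebook's df.
ratings = pd.DataFrame({
    "userId":  [1, 2, 3, 1, 2, 3, 1, 2, 4],
    "movieId": [10, 10, 10, 20, 20, 30, 40, 40, 40],
    "rating":  [4.0, 5.0, 4.5, 3.0, 2.0, 5.0, 4.0, 3.5, 4.5],
})

def most_popular(ratings: pd.DataFrame, min_ratings: int = 2, top_n: int = 10) -> pd.DataFrame:
    """Rank movies by mean rating, ignoring movies with too few ratings."""
    stats = ratings.groupby("movieId")["rating"].agg(
        num_ratings="count", mean_rating="mean"
    )
    # Drop rarely rated movies so a single 5-star vote cannot top the chart.
    stats = stats[stats["num_ratings"] >= min_ratings]
    return stats.sort_values(["mean_rating", "num_ratings"], ascending=False).head(top_n)

top = most_popular(ratings)
print(top)
```

Every user receives the same list, which is exactly what makes this approach simple and also what makes it non-personalised.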
    
## 2. **Content-Based Filtering:**
This model uses the features of the item the user is currently viewing or has liked. For example, if a user likes a movie tagged action and comedy, the model recommends similar movies that carry the same tags (action and comedy). To do so, item descriptions are turned into vectors with a vectorization technique (e.g. TF-IDF or `CountVectorizer`) and compared with cosine similarity.
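
A minimal sketch of content-based matching, assuming each movie is described by a list of genre tags (the catalogue below is hypothetical; the ratings file loaded above has no tags). For brevity it swaps TF-IDF for plain binary tag counts built by hand with NumPy, rather than scikit-learn's `TfidfVectorizer` or `CountVectorizer`, and then ranks movies by cosine similarity.

```python
import numpy as np

# Hypothetical movie catalogue with genre tags.
movies = {
    "Movie A": ["action", "comedy"],
    "Movie B": ["action", "comedy", "romance"],
    "Movie C": ["drama", "romance"],
    "Movie D": ["action", "thriller"],
}

# Shared vocabulary and one binary count vector per movie (bag of tags).
vocab = sorted({tag for tags in movies.values() for tag in tags})
titles = list(movies)
vectors = np.array(
    [[1.0 if tag in movies[title] else 0.0 for tag in vocab] for title in titles]
)

def cosine_similarity_matrix(x: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of x."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return unit @ unit.T

sim = cosine_similarity_matrix(vectors)

def recommend_similar(title: str, top_n: int = 2) -> list:
    """Return the top_n movies whose tag vectors are closest to `title`."""
    i = titles.index(title)
    order = np.argsort(-sim[i])  # most similar first
    return [titles[j] for j in order if j != i][:top_n]

print(recommend_similar("Movie A"))  # ['Movie B', 'Movie D']
```

With TF-IDF weighting instead of raw counts, tags shared by many movies would contribute less to the similarity score.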
## 3. **Collaborative Filtering:**
It uses similarity between users' preferences and the co-occurrence of items.
It is called collaborative because it combines the interactions of many users with many items to suggest items.
Collaborative filtering is further divided into:

1. **Neighbourhood-Based Approach**
2. **Model-Based Approaches**

### 3.1 **Neighbourhood-Based Approach:**
This method identifies users or items with similar preferences.
- **User-Based Filtering:**

    - Finds users (neighbors) with similar preferences or behavior.
    - Predicts how likely a user is to like an item based on the preferences of similar users.
    - Example: If User A and User B have rated the same movies highly, a movie liked by User A but not yet watched by User B may be recommended to User B.

- **Item-Based Filtering:**

    - Focuses on the similarity between items based on user interactions.
    - Uses a co-occurrence matrix that records how often items are interacted with together.
    - Items are ranked and sorted based on similarity scores.
    - Example: If many users who liked Movie A also liked Movie B, a new user who likes Movie A might be recommended Movie B.
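
The co-occurrence idea from the item-based bullet can be sketched with a small, hypothetical binary user-item matrix: multiplying its transpose by itself counts, for every pair of movies, how many users interacted with both. Raw counts favour popular items; real systems often normalise them (for example with cosine or Jaccard similarity), which this sketch omits. Ties are broken by column order through a stable sort.

```python
import numpy as np

# Hypothetical binary interactions: rows = users, columns = movies,
# 1 means the user liked or interacted with the movie.
movies = ["Movie A", "Movie B", "Movie C", "Movie D"]
interactions = np.array([
    [1, 1, 0, 0],  # user 0
    [1, 1, 1, 0],  # user 1
    [1, 0, 0, 1],  # user 2
    [0, 1, 1, 0],  # user 3
])

# Entry (i, j) counts the users who interacted with both movie i and movie j.
co_occurrence = interactions.T @ interactions
np.fill_diagonal(co_occurrence, 0)  # a movie should not recommend itself

def similar_items(movie: str, top_n: int = 2) -> list:
    """Rank other movies by how often they co-occur with `movie`."""
    i = movies.index(movie)
    order = np.argsort(-co_occurrence[i], kind="stable")
    return [movies[j] for j in order if j != i][:top_n]

print(co_occurrence)
print(similar_items("Movie A"))  # ['Movie B', 'Movie C']
```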

User-based and item-based filtering both rely on a nearest-neighbour algorithm (**KNN**). KNN is a lazy learner: it does no training up front and instead compares the query against the stored data at prediction time. With millions or billions of users and items, this becomes memory-intensive and slow, so neighbourhood methods scale poorly for real-time recommendation.
Both are often framed as classification (will the user like this item?), although predicting a numeric rating, as in this dataset, is a regression task.
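
The user-based variant can be sketched as a brute-force KNN over a hypothetical rating matrix, assuming 0 marks a missing rating and using cosine similarity over co-rated movies with `k = 2` neighbours (both illustrative choices). The loop compares the target user against every other user at prediction time, which is exactly the runtime cost described above.

```python
import numpy as np

# Hypothetical user x movie rating matrix; 0 means "not rated".
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],  # user 0 (has not rated movie 2)
    [5.0, 5.0, 4.0, 1.0],  # user 1
    [4.0, 4.0, 5.0, 2.0],  # user 2
    [1.0, 1.0, 2.0, 5.0],  # user 3
])

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity computed only over movies both users rated."""
    mask = (u > 0) & (v > 0)
    if not mask.any():
        return 0.0
    a, b = u[mask], v[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict_rating(user: int, movie: int, k: int = 2) -> float:
    """Similarity-weighted mean rating of the k most similar users
    who have rated `movie`."""
    candidates = [
        (cosine(ratings[user], ratings[other]), other)
        for other in range(len(ratings))
        if other != user and ratings[other, movie] > 0
    ]
    neighbours = sorted(candidates, reverse=True)[:k]
    weights = np.array([s for s, _ in neighbours])
    values = np.array([ratings[o, movie] for _, o in neighbours])
    return float(weights @ values / weights.sum())

print(round(predict_rating(user=0, movie=2), 2))  # 4.5
```

User 0 gets a predicted rating of about 4.5 for movie 2 because the two nearest neighbours, users 1 and 2, rated it 4 and 5; user 3, whose tastes are opposite, is left out.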

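### 3.2 **Model-Based Approaches:**
These approaches train a model on the interaction data ahead of time, so producing a recommendation does not require scanning every user or item. This makes them more scalable than neighbourhood methods, and often more accurate. Common model-based approaches include:

1. **Clustering-Based Filtering**
2. **Matrix Factorization**
3. **Deep Learning-Based Approaches**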
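
Matrix factorization, which the next cell implements, represents each user $u$ by a vector $\mathbf{p}_u$ and each movie $i$ by a vector $\mathbf{q}_i$, both of length $d$ (`embedding_dim`). The predicted rating is their dot product, and training minimizes the mean squared error over a batch $\mathcal{B}$ of observed ratings. This formulation matches that cell, which uses no bias terms or regularization:

```latex
\hat{r}_{ui} = \mathbf{p}_u^\top \mathbf{q}_i = \sum_{k=1}^{d} p_{uk}\, q_{ik},
\qquad
\mathcal{L} = \frac{1}{|\mathcal{B}|} \sum_{(u,i) \in \mathcal{B}} \left( \hat{r}_{ui} - r_{ui} \right)^2
```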

In [50]:
class MatrixFactorizationModel(tfrs.Model):
    """Plain matrix factorization: predicted rating = <user vector, movie vector>."""

    def __init__(self, num_users, num_items, embedding_dim):
        super().__init__()
        # Raw IDs are used directly as embedding indices, so num_users / num_items
        # must exceed the largest userId / movieId (e.g. df['userId'].max() + 1).
        self.user_embedding = tf.keras.layers.Embedding(num_users, embedding_dim)
        self.item_embedding = tf.keras.layers.Embedding(num_items, embedding_dim)

    def call(self, features, training=False):
        user_embeddings = self.user_embedding(features['userId'])
        item_embeddings = self.item_embedding(features['movieId'])
        # Row-wise dot product: one predicted rating per (user, movie) pair.
        return tf.reduce_sum(user_embeddings * item_embeddings, axis=1)

    def compute_loss(self, features, training=False):
        # Reuse call() instead of duplicating the embedding lookups.
        predictions = self(features, training=training)
        ratings = tf.cast(features['rating'], tf.float32)
        # Mean squared error between predicted and observed ratings.
        return tf.reduce_mean(tf.square(predictions - ratings))
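
The cell above defines the model but does not train it. To show what minimizing that loss does, here is a framework-free sketch of the same objective (dot-product prediction, mean squared error, no biases) trained with full-batch gradient descent in NumPy. The toy triples, `dim`, learning rate, and epoch count are hypothetical choices, and this is not how `tfrs.Model.fit` runs internally.

```python
import numpy as np

# Hypothetical (userId, movieId, rating) triples with small integer IDs.
users = np.array([0, 0, 1, 1, 2, 2])
items = np.array([0, 1, 0, 2, 1, 2])
ratings = np.array([5.0, 3.0, 4.0, 1.0, 2.0, 4.0])

num_users, num_items, dim = 3, 3, 4
rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(num_users, dim))  # user embeddings
Q = rng.normal(scale=0.1, size=(num_items, dim))  # movie embeddings

def mse(P, Q):
    predictions = np.sum(P[users] * Q[items], axis=1)  # row-wise dot product
    return float(np.mean((predictions - ratings) ** 2))

lr = 0.05
initial_loss = mse(P, Q)
for epoch in range(2000):
    errors = np.sum(P[users] * Q[items], axis=1) - ratings
    # Gradient of the mean squared error w.r.t. each embedding row;
    # np.add.at accumulates contributions when an ID appears several times.
    grad_P = np.zeros_like(P)
    grad_Q = np.zeros_like(Q)
    np.add.at(grad_P, users, 2 * errors[:, None] * Q[items] / len(ratings))
    np.add.at(grad_Q, items, 2 * errors[:, None] * P[users] / len(ratings))
    P -= lr * grad_P
    Q -= lr * grad_Q

final_loss = mse(P, Q)
print(f"MSE before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

Because `P[users]` and `Q[items]` are plain array lookups, the IDs must be integers in `[0, num_users)` and `[0, num_items)`; the Keras `Embedding` layers above impose the same constraint.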