# What Is Item-Based Collaborative Filtering?

- Item-based collaborative filtering is a technique used in recommender systems to provide personalized recommendations based on the interaction patterns of the whole user community. It is a form of collaborative filtering that focuses on the similarity between items rather than between users.

- In item-based collaborative filtering, the recommendations are generated by identifying items that are similar to the ones a user has already shown interest in. The underlying assumption is that if a user likes or interacts with a particular item, they are likely to have similar preferences for other similar items.

- The process of item-based collaborative filtering typically involves the following steps:

- Data collection: Gather data on user-item interactions, such as ratings, reviews, or purchase history.

- Item similarity calculation: Calculate the similarity between items based on various metrics, such as cosine similarity or Pearson correlation. The similarity is usually determined by comparing the ratings or preferences of users who have interacted with both items.

- Neighborhood selection: Identify a subset of similar items for each item in the system. This subset, known as the item's neighborhood, consists of items that are most similar to the item in question.

- Recommendation generation: Once the item neighborhoods are established, the system generates recommendations from the items the user has already rated or interacted with. For a given user, it looks up the neighborhoods of those items, filters out items the user has already interacted with, and recommends the remaining ones, on the assumption that the user is likely to be interested in them.

- Item-based collaborative filtering has several advantages. Item similarities can be precomputed offline, which keeps recommendation time low even for large datasets and item catalogs. It also needs only one or two interactions from a new user to look up similar items. It does not solve the "cold start" problem for new items, however: an item with no interactions has no rating pattern from which to compute similarity.

- However, item-based collaborative filtering can suffer from the "sparsity" problem, where the user-item interaction matrix is sparse, meaning that most users have only interacted with a small fraction of the available items. In such cases, it can be challenging to find a sufficient number of similar items for recommendation.

- Overall, item-based collaborative filtering is a popular and effective approach to building recommender systems, particularly in scenarios where item similarities are well-defined and easily calculated.
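
The steps above can be sketched on a toy rating matrix. This is a minimal illustration, not the method used later in the notebook: the users, items, ratings, the `recommend` helper, and the choice of a similarity-weighted average for scoring are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Toy user-item rating matrix (rows: users, columns: items); NaN = not rated.
# All names and values are made up for illustration.
ratings = pd.DataFrame(
    {
        "A": [5.0, 4.0, 1.0, np.nan],
        "B": [4.0, 5.0, 2.0, 5.0],
        "C": [1.0, 2.0, 5.0, np.nan],
    },
    index=["u1", "u2", "u3", "u4"],
)

# Item similarity: Pearson correlation over the users who rated both items.
item_sim = ratings.corr(method="pearson")


def recommend(user, k=2):
    """Score unseen items by a similarity-weighted average of the user's ratings."""
    rated = ratings.loc[user].dropna()
    unseen = ratings.columns.difference(rated.index)
    scores = {}
    for item in unseen:
        # Neighborhood: the k rated items most similar to the candidate item.
        neighbors = item_sim.loc[item, rated.index].nlargest(k)
        neighbors = neighbors[neighbors > 0]  # keep positively similar items only
        if neighbors.empty:
            continue
        # Recommendation score: weighted average of the user's own ratings.
        scores[item] = (neighbors * rated[neighbors.index]).sum() / neighbors.sum()
    return pd.Series(scores, dtype=float).sort_values(ascending=False)


recs = recommend("u4")
print(recs)
```

Here `u4` has rated only item B. Item A correlates positively with B and is recommended, while item C correlates negatively with B and is dropped.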

![](https://predictivehacks.com/wp-content/uploads/2020/06/recommenders_systems.png)

# Business Problem

- An online film viewing platform wants to develop a recommendation system with collaborative filtering. The company, which has been experimenting with content-based recommendation systems, now wants recommendations that take the opinions of the community into account.
- When a user likes a film, the platform wants to recommend other films that have a similar rating pattern to that film.

# Dataset Story

- It contains films and the ratings given to these films.
- The data set contains about 20,000,000 ratings for about 27,000 films.

**The dataset consists of two CSV files**

- **1st CSV file: movie.csv**
- movieId: Unique film number
- title: Film name
- genres: Genres of the film, separated by `|`

- **2nd CSV file: rating.csv**

- userId: Unique user number
- movieId: Unique film number
- rating: Rating given to the film by the user
- timestamp: Evaluation date
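
To make the relationship between the two files concrete, here is a small sketch with made-up stand-ins for `movie.csv` and `rating.csv`, joined on `movieId` with a left merge as the notebook does. The film titles and ratings are hypothetical.

```python
import pandas as pd

# Toy stand-ins for movie.csv and rating.csv (values are made up).
movie = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Film A (1995)", "Film B (1999)", "Film C (2001)"],
})
rating = pd.DataFrame({
    "userId": [10, 11, 10],
    "movieId": [1, 1, 2],
    "rating": [4.0, 5.0, 3.5],
})

# A left join from movie keeps every film, including films with no ratings.
df = movie.merge(rating, how="left", on="movieId")

# Film C has no rating row, so its userId and rating are NaN after the merge.
# The NaN forces userId to a float column even though the ids are integers.
print(df)
print(df.dtypes)
```

This is one plausible reason `userId` shows up as `float64` in the merged data: any film without ratings introduces missing values into that column.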

# Road Map

- 1 Preparation of Data Set
- 2 Creating User Movie Df
- 3 Making Item-Based Film Suggestions
- 4 Preparation of Study Script

# 1. Preparation of Data Set

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
# import Required Libraries

import pandas as pd
import numpy as np

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [3]:
# Adjusting Row Column Settings

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', 500)
pd.set_option('display.expand_frame_repr', False)

In [4]:
# Loading the Data Set

movie = pd.read_csv('/content/drive/MyDrive/movie.csv')
rating = pd.read_csv('/content/drive/MyDrive/rating.csv')

In [5]:
# Merging movie and rating data sets

df = movie.merge(rating, how="left", on="movieId")

In [6]:
# Preliminary examination of the data set

def check_df(dataframe, head=5):
    print('##################### Shape #####################')
    print(dataframe.shape)
    print('##################### Types #####################')
    print(dataframe.dtypes)
    print('##################### Head #####################')
    print(dataframe.head(head))
    print('##################### Tail #####################')
    print(dataframe.tail(head))
    print('##################### NA #####################')
    print(dataframe.isnull().sum())
    print('##################### Quantiles #####################')
    print(dataframe.describe([0, 0.05, 0.50, 0.95, 0.99, 1]).T)

check_df(df)

##################### Shape #####################
(20000797, 6)
##################### Types #####################
movieId        int64
title         object
genres        object
userId       float64
rating       float64
timestamp     object
dtype: object
##################### Head #####################
   movieId             title                                       genres  userId  rating            timestamp
0        1  Toy Story (1995)  Adventure|Animation|Children|Comedy|Fantasy     3.0     4.0  1999-12-11 13:36:47
1        1  Toy Story (1995)  Adventure|Animation|Children|Comedy|Fantasy     6.0     5.0  1997-03-13 17:50:52
2        1  Toy Story (1995)  Adventure|Animation|Children|Comedy|Fantasy     8.0     4.0  1996-06-05 13:37:51
3        1  Toy Story (1995)  Adventure|Animation|Children|Comedy|Fantasy    10.0     4.0  1999-11-25 02:44:47
4        1  Toy Story (1995)  Adventure|Animation|Children|Comedy|Fantasy    11.0     4.5  2009-01-02 01:13:41
##################### Tail ####

# 2. Creating User Movie Df

In [7]:
# Remove the year at the end of each title

df['title'] = df['title'].str.replace(r"\s\(\d{4}\)$", "", regex=True)
# Create a DataFrame with the number of ratings for each title
comment_counts = pd.DataFrame(df["title"].value_counts())


In [8]:
print(comment_counts.columns)


Index(['count'], dtype='object')


In [9]:
rare_movies = comment_counts[comment_counts['count'] <= 1000].index

In [10]:
df['title'] = df['title'].str.replace(r"\s\(\d{4}\)$", "", regex=True)

In [11]:
rare_movies

Index(['Ted', 'Bear, The (Ours, L')', 'Rosewood', 'One Night at McCool's', 'Marked for Death', 'Three to Tango', 'Adam's Rib', 'Frankie and Johnny', 'Stakeout', 'Someone Like You',
       ...
       'Whitecoats', 'L'antisémite', 'Night Walker, The', 'Shadow Puppets', 'Stille nacht', 'Into the Mind', 'Road to Ruin, The', 'Prize of Peril, The (Prix du danger, Le)', 'Nukes in Space', 'Rentun Ruusu'], dtype='object', name='title', length=23082)

In [12]:
common_movies = df[~df["title"].isin(rare_movies)]

In [13]:
common_movies["title"].nunique()

3134

In [14]:
df["title"].nunique()

26216
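
The filtering step above can be reproduced on a tiny frame. The sketch below keeps `value_counts()` as a Series, which avoids depending on the name of the counts column (`count` in pandas 2.x, as printed above). The threshold `min_ratings` and the data are hypothetical; the notebook uses 1000.

```python
import pandas as pd

# Toy long-format ratings: film A has 3 ratings, B has 1, C has 2.
df = pd.DataFrame({
    "title": ["A", "A", "A", "B", "C", "C"],
    "rating": [4, 5, 3, 2, 5, 4],
})

min_ratings = 2  # hypothetical threshold for this toy data

# Number of ratings per title, as a Series indexed by title.
counts = df["title"].value_counts()

# Titles with at most min_ratings ratings are treated as rare and removed.
rare = counts[counts <= min_ratings].index
common = df[~df["title"].isin(rare)]

print(common["title"].nunique(), "of", df["title"].nunique(), "titles kept")
```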

In [None]:
# Creating User Movie Df
user_movie_df = common_movies.pivot_table(index=["userId"], columns=["title"], values="rating")

In [None]:
user_movie_df.head()

In [None]:
user_movie_df.shape

In [None]:
user_movie_df.columns
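
As a small illustration of what the pivot produces, the sketch below reshapes a made-up long table into a user-by-title matrix and measures how sparse it is. Note that `pivot_table` aggregates with the mean by default, so if two different films end up with the same title after the year is stripped, a user's ratings for them would be averaged into one cell.

```python
import numpy as np
import pandas as pd

# Toy long-format ratings (values are made up).
long_df = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "title": ["A", "B", "A", "C", "B"],
    "rating": [4.0, 3.0, 5.0, 2.0, 4.5],
})

# One row per user, one column per title; unrated cells become NaN.
user_movie = long_df.pivot_table(index="userId", columns="title", values="rating")

# Fraction of empty cells in the matrix.
sparsity = user_movie.isna().to_numpy().mean()

print(user_movie)
print(f"sparsity: {sparsity:.2f}")
```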

# 3. Making Item-Based Film Suggestions

In [None]:
movie_name1 = "Toy Story"

In [None]:
movie_name = user_movie_df[movie_name1]

In [None]:
user_movie_df.corrwith(movie_name).sort_values(ascending=False).head(10)


In [None]:
movie_name2 = "Ocean's Twelve"

In [None]:
movie_name = user_movie_df[movie_name2]

In [None]:
user_movie_df.corrwith(movie_name).sort_values(ascending=False).head(10)

In [None]:
movie_name = pd.Series(user_movie_df.columns).sample(1).values[0]


In [None]:
movie_name = user_movie_df[movie_name]

In [None]:
user_movie_df.corrwith(movie_name).sort_values(ascending=False).head(10)
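
`corrwith` computes a Pearson correlation for each column using only the users who rated both films. On a sparse matrix, some of those correlations rest on very few shared raters. The sketch below, on made-up data, shows the pairwise behavior and one possible safeguard: counting the overlap and keeping only films with enough shared raters. The `min_overlap` value is an assumption for the example, not something the notebook uses.

```python
import numpy as np
import pandas as pd

# Toy user-movie matrix (rows: users, columns: films); NaN = not rated.
user_movie = pd.DataFrame({
    "A": [5.0, 4.0, 1.0, 2.0, np.nan],
    "B": [4.0, 5.0, 2.0, 1.0, 3.0],
    "C": [1.0, np.nan, 5.0, 4.0, 2.0],
})

target = user_movie["A"]

# Pearson correlation of every film with A, over co-rating users only.
corr = user_movie.corrwith(target)

# Number of users who rated both A and each other film.
overlap = user_movie[target.notna()].notna().sum()

min_overlap = 4  # hypothetical minimum number of shared raters
ranked = corr[overlap >= min_overlap].drop("A").sort_values(ascending=False)

print(corr)
print(overlap)
print(ranked)
```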

In [None]:
def check_film(keyword, user_movie_df):
    return [col for col in user_movie_df.columns if keyword in col]

In [None]:
check_film("Insomnia", user_movie_df)

# 4. Preparation of Study Script

In [None]:
"""
def create_user_movie_df():
    import pandas as pd
    movie = pd.read_csv('/kaggle/input/movielens-20m-dataset/movie.csv')
    rating = pd.read_csv('/kaggle/input/movielens-20m-dataset/rating.csv')
    df = movie.merge(rating, how="left", on="movieId")
    # Keep the counts as a Series so the filter does not depend on the column name
    comment_counts = df["title"].value_counts()
    rare_movies = comment_counts[comment_counts <= 1000].index
    common_movies = df[~df["title"].isin(rare_movies)]
    user_movie_df = common_movies.pivot_table(index=["userId"], columns=["title"], values="rating")
    return user_movie_df

"""


In [None]:
# user_movie_df = create_user_movie_df()

In [None]:
"""
def item_based_recommender(movie_name, user_movie_df):
    movie_name = user_movie_df[movie_name]
    return user_movie_df.corrwith(movie_name).sort_values(ascending=False).head(10)

"""

In [None]:
# item_based_recommender("Matrix, The (1999)", user_movie_df)

In [None]:
# movie_name = pd.Series(user_movie_df.columns).sample(1).values[0]

In [None]:
# item_based_recommender(movie_name, user_movie_df)

In [None]:
!pip install --no-deps flask

In [None]:
rating_sample = rating.sample(frac=0.5, random_state=42)  # Use 50% of the ratings

# Merge with the movie data and strip the year from the titles
df_sample = movie.merge(rating_sample, how="left", on="movieId")
df_sample['title'] = df_sample['title'].str.replace(r"\s\(\d{4}\)$", "", regex=True)

# Filter out rarely rated films and build the user-movie DataFrame
comment_counts = df_sample['title'].value_counts()
rare_movies = comment_counts[comment_counts <= 1000].index
common_movies_sample = df_sample[~df_sample['title'].isin(rare_movies)]
user_movie_df = common_movies_sample.pivot_table(index="userId", columns="title", values="rating")

In [None]:
from flask import Flask, render_template, request, send_from_directory
import pandas as pd
import os
import shutil
from google.colab import drive
from google.colab.output import eval_js

# Mount Google Drive
drive.mount('/content/drive', force_remount=True)

# Path to posters in Google Drive
POSTER_PATH = '/content/drive/MyDrive/posters/posters'
# Path to static folder in Colab
STATIC_POSTER_PATH = '/content/posters'

# Create static folder if it doesn't exist
os.makedirs(STATIC_POSTER_PATH, exist_ok=True)

# Copy posters to static folder
for filename in os.listdir(POSTER_PATH):
    if filename.endswith('.jpg'):
        shutil.copy(os.path.join(POSTER_PATH, filename), STATIC_POSTER_PATH)

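# Placeholder data for a quick demo.
# Note: these assignments overwrite the df_sample and user_movie_df built in the
# previous cell; remove them to serve recommendations from the real data.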
df_sample = pd.DataFrame({
    'title': ['Movie A', 'Movie B', 'Movie C'],
    'genres': ['Action', 'Comedy', 'Drama']
})
user_movie_df = pd.DataFrame({
    'Movie A': [5, 0, 0],
    'Movie B': [0, 3, 0],
    'Movie C': [0, 0, 4]
}, index=['User 1', 'User 2', 'User 3'])

# Initialize Flask app
app = Flask(__name__)

# Function to get movie info
def get_movie_info(movie_title):
    if movie_title in df_sample['title'].values:
        movie_info = df_sample[df_sample['title'] == movie_title].iloc[0]
        poster = f'/posters/{movie_title}.jpg'
        return {
            'title': movie_info['title'],
            'genres': movie_info['genres'],
            'poster': poster
        }
    return None

# Recommendation function
def recommend_movies(movie_name):
    if movie_name not in user_movie_df.columns:
        return []
    movie_series = user_movie_df[movie_name]
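    # Drop the film itself (self-correlation is 1.0) before taking the top 10
    similar_movies = user_movie_df.corrwith(movie_series).drop(labels=movie_name, errors="ignore").sort_values(ascending=False).head(10)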
    recommended_movies = []
    for title in similar_movies.index:
        if title != movie_name:
            info = get_movie_info(title)
            if info:
                recommended_movies.append(info)
    return recommended_movies

# Main route
@app.route('/', methods=['GET', 'POST'])
def index():
    recommendations = []
    movie_info = None
    if request.method == 'POST':
        movie_name = request.form.get('movie_name')
        movie_info = get_movie_info(movie_name)
        if movie_info:
            recommendations = recommend_movies(movie_name)
    return render_template('index.html', movie_info=movie_info, recommendations=recommendations)

# Route to serve static files
@app.route('/posters/<path:filename>')
def send_poster(filename):
    return send_from_directory(STATIC_POSTER_PATH, filename)

# Run the app
if __name__ == '__main__':
    print(eval_js("google.colab.kernel.proxyPort(5000)"))
    app.run(port=5000)
