<a href="https://colab.research.google.com/github/PrateekCoder/Recommendation-Systems/blob/main/Content_Based_Movie_Recommendation_System_Using_Binary_Feature_Matrix.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## YouTube

https://youtu.be/fsdjFdBbbpI

## Connect the Colab File with Google Drive

In [25]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [4]:
# Import all the required packages
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

In [5]:
# Load the movies.csv file into a Pandas dataframe
movies = pd.read_csv('gdrive/My Drive/datasets/movielens-10m/movies.csv')

In [6]:
movies

|       | Unnamed: 0 | movieId | title | genres |
|-------|------------|---------|-------|--------|
| 0 | 0 | 2 | Jumanji (1995) | Adventure\|Children\|Fantasy |
| 1 | 1 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 2 | 2 | 4 | Waiting to Exhale (1995) | Comedy\|Drama\|Romance |
| 3 | 3 | 5 | Father of the Bride Part II (1995) | Comedy |
| 4 | 4 | 6 | Heat (1995) | Action\|Crime\|Thriller |
| ... | ... | ... | ... | ... |
| 10675 | 10675 | 65088 | Bedtime Stories (2008) | Adventure\|Children\|Comedy |
| 10676 | 10676 | 65091 | Manhattan Melodrama (1934) | Crime\|Drama\|Romance |
| 10677 | 10677 | 65126 | Choke (2008) | Comedy\|Drama |
| 10678 | 10678 | 65130 | Revolutionary Road (2008) | Drama\|Romance |


A binary feature matrix for genres encodes each movie's genre information numerically so that it can serve as input to a recommendation system. Each movie is represented by a set of binary features, one per genre, that indicate whether the movie belongs to that genre. The system can then compare movies purely by genre overlap; other attributes such as director, cast, or release year are ignored.

The result is a matrix with one row per movie and one column per genre: an entry is 1 if the movie belongs to that genre and 0 otherwise. For example, with 5 movies and 3 genres (Action, Drama, and Comedy), the binary feature matrix might look like this:



```
         Action  Drama  Comedy
Movie 1       1      0       0
Movie 2       0      1       1
Movie 3       1      1       0
Movie 4       0      0       1
Movie 5       1      1       1
```
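As a minimal sketch, assuming genres are stored as pipe-separated strings like the `genres` column of movies.csv, pandas' `Series.str.get_dummies` can reproduce the toy matrix above. The movie names and genre assignments are the illustrative ones from the table, not rows of the real dataset. Note that pandas orders the columns alphabetically (Action, Comedy, Drama).

```python
import pandas as pd

# Illustrative genre strings for the five toy movies in the table above,
# pipe-separated in the same style as the genres column of movies.csv
toy = pd.Series(
    ["Action", "Drama|Comedy", "Action|Drama", "Comedy", "Action|Drama|Comedy"],
    index=[f"Movie {i}" for i in range(1, 6)],
)

# One 0/1 column per distinct genre; pandas sorts the column names alphabetically
toy_matrix = toy.str.get_dummies(sep="|")
print(toy_matrix)
```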


This is one of the most basic ways to implement a content-based recommendation system. A common alternative is term frequency-inverse document frequency (TF-IDF), which replaces the 0/1 entries with weights so that rare genres (such as Film-Noir) count for more than common ones (such as Drama) when movies are compared.


In [7]:
# Create a binary feature matrix for the genres: one 0/1 column per genre.
# Series.str.get_dummies splits each string on "|"; it replaces the older
# pd.get_dummies(...stack()).sum(level=0) idiom, which fails on pandas 2.x.
genre_matrix = movies['genres'].str.get_dummies(sep='|')


In [8]:
genre_matrix

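|  | (no genres listed) | Action | Adventure | Animation | Children | Comedy | Crime | Documentary | Drama | Fantasy | Film-Noir | Horror | IMAX | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10675 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 10676 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 10677 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 10678 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |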
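Because every row of `genre_matrix` is a 0/1 vector, cosine similarity has a simple set interpretation. Writing $A$ and $B$ for the genre sets of two movies, with indicator vectors $\mathbf{a}$ and $\mathbf{b}$:

$$
\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert} = \frac{|A \cap B|}{\sqrt{|A| \, |B|}}
$$

For example, Grumpier Old Men (Comedy, Romance) and Waiting to Exhale (Comedy, Drama, Romance) share two genres, giving $2/\sqrt{2 \cdot 3} \approx 0.8165$, which is the value at row 1, column 2 of the similarity array computed next.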


In [12]:
# Compute the cosine similarity matrix
similarity = cosine_similarity(genre_matrix)
similarity

array([[1.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 1.        , 0.81649658, ..., 0.5       , 0.5       ,
        0.70710678],
       [0.        , 0.81649658, 1.        , ..., 0.81649658, 0.81649658,
        0.57735027],
       ...,
       [0.        , 0.5       , 0.81649658, ..., 1.        , 0.5       ,
        0.70710678],
       [0.        , 0.5       , 0.81649658, ..., 0.5       , 1.        ,
        0.        ],
       [0.        , 0.70710678, 0.57735027, ..., 0.70710678, 0.        ,
        1.        ]])
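The full matrix holds one float64 score for every pair of movies: with 10,679 movies that is about 114 million entries, roughly 0.9 GB of memory. When recommendations are needed for one movie at a time, a lighter option is to compare only that movie's row against all rows. In this sketch `similar_movies` and the five-row `sample` frame (copied from the movies output above) are hypothetical names for illustration; the query movie is dropped by position rather than assumed to sort first.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Five rows copied from the movies output above, as a small stand-in dataset
sample = pd.DataFrame({
    "title": ["Jumanji (1995)", "Grumpier Old Men (1995)", "Waiting to Exhale (1995)",
              "Father of the Bride Part II (1995)", "Heat (1995)"],
    "genres": ["Adventure|Children|Fantasy", "Comedy|Romance", "Comedy|Drama|Romance",
               "Comedy", "Action|Crime|Thriller"],
})
sample_matrix = sample["genres"].str.get_dummies(sep="|")

def similar_movies(genre_matrix, titles, idx, top_n=5):
    """Hypothetical helper: rank movies by cosine similarity to row position idx."""
    # Compare one row against all rows: shape (1, n_movies) instead of (n, n)
    scores = cosine_similarity(genre_matrix.iloc[[idx]], genre_matrix)[0]
    # Attach the scores to the title index and drop the query movie itself
    ranked = pd.Series(scores, index=titles.index).drop(titles.index[idx])
    top = ranked.sort_values(ascending=False).head(top_n)
    return titles.loc[top.index]

print(similar_movies(sample_matrix, sample["title"], 2, top_n=2))
```

With the notebook's data this would be called as `similar_movies(genre_matrix, movies['title'], idx)`.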

In [13]:
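# Function to get the recommended movies
def get_recommendations(title, top_n=5):
    # Find the movie with the given title; titles must match exactly,
    # e.g. "Prestige, The (2006)"
    matches = movies.index[movies['title'] == title]
    if len(matches) == 0:
        raise ValueError(f"No movie titled {title!r} in the dataset")
    # read_csv gave movies a default RangeIndex, so the label is also the row position
    idx = matches[0]
    
    # Pair each movie's row position with its cosine similarity to the chosen movie
    similarity_scores = list(enumerate(similarity[idx]))
    
    # Drop the chosen movie itself: if other movies also score 1.0, it is not
    # guaranteed to come first after sorting, so skipping position 0 is unreliable
    similarity_scores = [s for s in similarity_scores if s[0] != idx]
    
    # Sort the similarity scores in descending order
    similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)
    
    # Get the top_n movie positions
    movie_indices = [i for i, _ in similarity_scores[:top_n]]
    
    # Return the titles of the top_n most similar movies
    return movies['title'].iloc[movie_indices]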

In [14]:
# Ask the user for the movie name
title = input("Enter the title of your favorite movie: ")

Enter the title of your favorite movie: Prestige, The (2006)
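`get_recommendations` only finds a movie when the input matches the dataset title exactly, including MovieLens conventions such as a trailing article and the year (`Prestige, The (2006)`). A small sketch of a more forgiving lookup, assuming fuzzy matching from Python's standard `difflib` is acceptable; `resolve_title` is a hypothetical helper, and 0.6 is `difflib`'s default cutoff rather than a tuned value.

```python
import difflib

def resolve_title(query, titles, cutoff=0.6):
    """Hypothetical helper: map free-form input to the closest catalogue title, or None."""
    # Compare case-insensitively but return the title in its original spelling
    lookup = {t.lower(): t for t in titles}
    key = query.strip().lower()
    if key in lookup:
        return lookup[key]
    # Best fuzzy match whose similarity ratio is at least `cutoff`
    matches = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None

sample_titles = ["Prestige, The (2006)", "Heat (1995)", "Choke (2008)"]
print(resolve_title("prestige, the", sample_titles))
```

In the notebook, `resolve_title(title, movies['title'].tolist())` could run before `get_recommendations`. Reordered input such as "The Prestige" may still fall below the cutoff.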


In [15]:
# Get the recommended movies
print("Top 5 similar movies:")
print(get_recommendations(title))

Top 5 similar movies:
947          39 Steps, The (1935)
1991           Blue Velvet (1986)
2123    Lady Vanishes, The (1938)
2420                   8MM (1999)
2676      Sixth Sense, The (1999)
Name: title, dtype: object
