## Content-Based Recommender Systems

Recommender systems suggest relevant items, such as products, movies, books, or articles, to users based on the users' preferences and past behavior. Common approaches include content-based filtering and collaborative filtering. This notebook builds a simple content-based recommender system for movies.

A content-based recommender system suggests items based on the features of the items and the user's preferences. For example, if a user likes action movies, the system recommends other action movies.

## Dataset Overview
The dataset used in this notebook is loaded from `movies.csv` and contains movie metadata, including each movie's title and genres.

## Steps to Build the Content-Based Recommender System
To build a simple content-based recommender system for movies, using the movie genres as features, we follow these steps:

1. Import the necessary libraries: pandas for data manipulation and sklearn for vectorization and similarity computation.
2. Load the movie dataset and preprocess it.
3. Gather the item features (here, the genres) into a single text column and vectorize it with TF-IDF.
4. Calculate the cosine similarity between movies based on their TF-IDF vectors.
5. Generate top-N recommendations for a given movie.
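
The steps above can be sketched end to end on a tiny in-memory catalogue before working with the real file. The titles and genres below are made up for illustration, and the pipe-separated genre format is an assumption about how the dataset stores genres.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy catalogue; titles and genres are illustrative only.
toy_movies = pd.DataFrame({
    "title": ["Space Quest", "Galaxy Raiders", "Quiet Hearts", "Laugh Track"],
    "genres": ["Action|Adventure", "Action|Adventure|Thriller",
               "Drama|Romance", "Action|Comedy"],
})

# Steps 3-4: vectorize the genre strings and compare every pair of movies.
toy_matrix = TfidfVectorizer().fit_transform(toy_movies["genres"])
toy_sim = cosine_similarity(toy_matrix)

# Step 5: rank the other movies by similarity to the first one.
query_pos = 0
ranked = toy_sim[query_pos].argsort()[::-1]
ranked = [pos for pos in ranked if pos != query_pos]
top_titles = toy_movies["title"].iloc[ranked[:2]].tolist()
print(top_titles)
```

The movie sharing both genres with the query ranks first, followed by the one sharing a single genre.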

## Implementing Content-Based Recommendations

##### Import necessary libraries

In [1]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


##### Load movie dataset

In [2]:
movies = pd.read_csv('movies.csv', low_memory=False)

##### Data Preprocessing

In [3]:
# Replace missing titles and genres with empty strings so the vectorizer receives only text
movies['title'] = movies['title'].fillna('')
movies['genres'] = movies['genres'].fillna('')

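##### Subset the dataset

In [4]:
# Keep only the first 30,000 rows (a slice, not a random sample); the similarity matrix grows quadratically with the row count
movies = movies.iloc[0:30000]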
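
If a random subset is preferred over the first rows, one possible approach is `DataFrame.sample`. The sketch below uses a hypothetical stand-in frame. It also resets the index, because the later steps assume that index labels equal row positions in the similarity matrix.

```python
import pandas as pd

# Hypothetical stand-in for the movies DataFrame.
demo = pd.DataFrame({
    "title": [f"Movie {i}" for i in range(10)],
    "genres": ["Drama"] * 10,
})

# Draw a reproducible random sample instead of taking the first rows.
sampled = demo.sample(n=4, random_state=42)

# Renumber the rows 0..n-1 so index labels match row positions in the
# TF-IDF and similarity matrices built later.
sampled = sampled.reset_index(drop=True)
print(sampled.index.tolist())
```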

##### Vectorize the genres column using TF-IDF

In [5]:
# Turn each movie's genre string into a TF-IDF vector (one row per movie, one column per token)
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(movies['genres'])
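
It can help to check which tokens the vectorizer actually extracts. The sketch below assumes pipe-separated genre strings (the sample values are hypothetical) and compares the default tokenizer, which also splits on hyphens, with an alternative `token_pattern` that treats each pipe-separated genre as one token.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical genre strings in the pipe-separated style this sketch assumes.
sample_genres = ["Action|Sci-Fi", "Film-Noir|Thriller", "Comedy"]

# Default tokenizer: splits on any non-word character, including hyphens.
default_vec = TfidfVectorizer(stop_words='english')
default_vec.fit(sample_genres)
default_tokens = sorted(default_vec.vocabulary_)

# Alternative: treat each pipe-separated genre as a single token.
pipe_vec = TfidfVectorizer(token_pattern=r"[^|]+")
pipe_vec.fit(sample_genres)
pipe_tokens = sorted(pipe_vec.vocabulary_)

print(default_tokens)
print(pipe_tokens)
```

With the default settings, a hyphenated genre contributes two separate tokens, which slightly changes how much weight it carries in the similarity. Whether that matters depends on the dataset.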


##### Compute cosine similarity between movies based on the TF-IDF vector

In [6]:
# Pairwise cosine similarity: entry [i, j] compares movie i with movie j
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
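
As a small worked check, cosine similarity is the dot product of two vectors divided by the product of their norms. The sketch below uses hypothetical genre strings to verify this against sklearn. It also shows that, because `TfidfVectorizer` L2-normalizes rows by default, `linear_kernel` yields the same values, and it estimates the memory a dense float64 result needs for 30,000 movies.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical genre strings, for illustration only.
sample_genres = ["Action|Adventure", "Action|Thriller", "Drama"]
sample_matrix = TfidfVectorizer().fit_transform(sample_genres)

# Cosine similarity: dot product divided by the product of the vector norms.
dense = sample_matrix.toarray()
norms = np.linalg.norm(dense, axis=1)
manual_sim = (dense @ dense.T) / np.outer(norms, norms)

# TfidfVectorizer L2-normalizes rows by default (norm='l2'), so the plain
# dot product from linear_kernel gives the same values.
assert np.allclose(manual_sim, cosine_similarity(sample_matrix))
assert np.allclose(manual_sim, linear_kernel(sample_matrix, sample_matrix))

# The result is a dense n x n array; at float64 that is n * n * 8 bytes.
n_movies = 30000
print(f"{n_movies * n_movies * 8 / 1e9:.1f} GB")
```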

##### Build a reverse map of indices and movie titles

In [7]:
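# Map each title to its row index, keeping the first row when a title is repeated
indices = pd.Series(movies.index, index=movies['title'])
indices = indices[~indices.index.duplicated(keep='first')]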
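
Titles are not guaranteed to be unique, so it is worth checking how the title-to-index lookup behaves when a title repeats. The sketch below uses a hypothetical three-row frame: `Series.drop_duplicates()` compares the values (the row numbers), not the titles, whereas filtering on the index keeps one row per title.

```python
import pandas as pd

# Hypothetical frame with one repeated title.
demo = pd.DataFrame({"title": ["Alpha", "Beta", "Alpha"]})

# Series.drop_duplicates() compares values (here the row numbers), so the
# repeated title survives and a lookup returns two rows.
by_value = pd.Series(demo.index, index=demo["title"]).drop_duplicates()
print(len(by_value["Alpha"]))

# Filtering on the index keeps only the first row for each title.
lookup = pd.Series(demo.index, index=demo["title"])
lookup = lookup[~lookup.index.duplicated(keep="first")]
print(int(lookup["Alpha"]))
```

A lookup that returns several rows would break the recommender function, which expects a single integer position.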

##### Building Content-based Recommender

In [8]:
def get_recommendations(title, cosine_sim=cosine_sim):
    # Get the index of the movie that matches the title
    idx = indices[title]

    # Get the pairwise similarity scores of all movies with that movie
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort the movies based on the similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Drop the query movie itself, then keep the 10 most similar movies
    sim_scores = [s for s in sim_scores if s[0] != idx][:10]

    # Get the movie indices
    movie_indices = [i[0] for i in sim_scores]

    # Return the top 10 most similar movies
    return movies['title'].iloc[movie_indices]
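
A possible variant, sketched under assumptions: the helper `recommend_top_n` and its parameters are hypothetical and not part of the notebook. It makes N configurable, raises a clear error for unknown titles, and removes the query movie by position, so a movie with identical genres cannot displace it in a tie.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def recommend_top_n(title, titles, sim_matrix, top_n=10):
    """Return up to top_n titles most similar to `title` (hypothetical helper)."""
    matches = np.flatnonzero(titles.to_numpy() == title)
    if len(matches) == 0:
        raise KeyError(f"Title not found: {title!r}")
    pos = matches[0]  # use the first row if the title is repeated

    # Sort positions by descending similarity, then drop the query itself.
    order = np.argsort(-sim_matrix[pos], kind="stable")
    order = order[order != pos][:top_n]
    return titles.iloc[order]

# Hypothetical toy data to exercise the helper.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genres": ["Action|Comedy", "Action|Comedy", "Action", "Drama"],
})
toy_sim = cosine_similarity(TfidfVectorizer().fit_transform(toy["genres"]))
result = recommend_top_n("B", toy["title"], toy_sim, top_n=2).tolist()
print(result)
```

Here "A" and "B" have identical genres, so both score the maximum similarity against "B". Filtering by position removes "B" itself from the results regardless of how the tie is ordered.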

##### Generating Top-N movie recommendations based on similarity for a specific movie

In [9]:
movie_title = 'Toy Story (1995)'
recommendations = get_recommendations(movie_title)

recommendations_df = pd.DataFrame(recommendations).reset_index().rename(columns={'index': 'Movie_id', 'title': 'Movie_title'})

recommendations_df

| | Movie_id | Movie_title |
|---|---|---|
| 0 | 2203 | Antz (1998) |
| 1 | 3021 | Toy Story 2 (1999) |
| 2 | 3653 | Adventures of Rocky and Bullwinkle, The (2000) |
| 3 | 3912 | Emperor's New Groove, The (2000) |
| 4 | 4780 | Monsters, Inc. (2001) |
| 5 | 9949 | DuckTales: The Movie - Treasure of the Lost La... |
| 6 | 10773 | Wild, The (2006) |
| 7 | 11604 | Shrek the Third (2007) |
| 8 | 12969 | Tale of Despereaux, The (2008) |
| 9 | 17431 | Asterix and the Vikings (Astérix et les Viking... |
