# **Content-Based Recommendation System Notebook**
- In this notebook, we will explore and implement a content-based recommendation system. Content-based recommendation systems suggest items to users based on the characteristics of the items and a profile of the user's preferences. 
- This approach is particularly useful when we have a lot of information about the items and the users' preferences. We will build a simple content-based recommendation system using Python and the scikit-learn library.

## **1. Introduction**
- **What is a Content-Based Recommendation System?**
    - A content-based recommendation system recommends items to users based on the content or characteristics of the items. This type of recommendation system focuses on understanding the properties of items and learning user preferences from the items they have interacted with in the past.


- **How Does it Work?**
    - The working principle of a content-based recommendation system can be summarized in a few steps:
        1. **Feature Extraction**: Extract relevant features from the items. For example, in a movie recommendation system, features could include genre, director, actors, and plot keywords.

        2. **User Profile**: Create a user profile based on their interactions with items. This profile is essentially a summary of the features of items the user has liked or interacted with in the past.

        3. **Recommendation**: Calculate the similarity between the user profile and each item's features. Items that are most similar to the user profile are recommended.
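The three steps above can be sketched on toy data. This is only an illustration, not the pipeline built later in this notebook: the item names, the hand-made binary feature vectors, and the helper names `build_profile`, `cosine`, and `recommend_for_user` are all made up for the example, and the user profile is assumed to be the plain average of the liked items' vectors.

```python
import math

# Step 1: feature extraction (hypothetical, hand-made binary features)
# Columns: [action, comedy, sci-fi, romance]
item_features = {
    "Movie A": [1, 0, 1, 0],
    "Movie B": [1, 0, 0, 0],
    "Movie C": [0, 1, 0, 1],
    "Movie D": [1, 0, 1, 1],
}

def build_profile(liked_items):
    """Step 2: average the feature vectors of the items the user liked."""
    vectors = [item_features[name] for name in liked_items]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend_for_user(liked_items, top_n=2):
    """Step 3: rank unseen items by similarity to the user profile."""
    profile = build_profile(liked_items)
    candidates = [name for name in item_features if name not in liked_items]
    scored = [(name, cosine(profile, item_features[name])) for name in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

recommendations = recommend_for_user(["Movie A", "Movie B"])
print(recommendations)
```

A user who liked two action movies gets "Movie D" (action, sci-fi, romance) ranked above "Movie C" (comedy, romance), because "Movie C" shares no features with the profile.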

## **2. Data Preparation**
**Dataset**
   - We will use a dataset containing movie information. Among other columns, it includes each movie's title, genres, keywords, tagline, cast, and director.

In [None]:
# Import needed modules
import numpy as np
import pandas as pd
import difflib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [None]:
# Read data
df = pd.read_csv('/kaggle/input/movies/movies.csv')

- Let's take a first look at the data.

In [None]:
# printing the first 5 rows of the dataframe
df.head()

In [None]:
# Get data information
df.info()

**Feature Extraction**
- We will select the text columns that describe each movie's content: genres, keywords, tagline, cast, and director.

In [None]:
# Selecting the relevant features for recommendation
selected_features = ['genres','keywords','tagline','cast','director']
print(selected_features)

**Data Preprocessing**
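- Before building the recommendation system, we need to preprocess the data. In general this may include text cleaning, handling missing values, and tokenization. Here we only replace missing values with empty strings so that the selected columns can be concatenated into one text per movie.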

In [None]:
# Replacing the null values with an empty string
for feature in selected_features:
    df[feature] = df[feature].fillna('')

In [None]:
# combining all the 5 selected features
combined_features = df['genres'] + ' ' + df['keywords'] + ' ' + df['tagline'] + ' ' + df['cast'] + ' ' + df['director']
combined_features

## **3. Building the Content-Based Recommendation System**
**TF-IDF Vectorization**
- We use TF-IDF (Term Frequency-Inverse Document Frequency) vectorization to convert the combined text features of each movie into a numerical vector.
- TF-IDF gives more weight to terms that occur often in a specific document but rarely in the rest of the collection, and less weight to terms that are common across many documents.
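To make the weighting concrete, here is a small pure-Python sketch of TF-IDF on three toy documents. It is not the scikit-learn implementation. It uses the smoothed formula idf(t) = ln((1 + n) / (1 + df(t))) + 1 followed by L2 normalization, which is our understanding of the `TfidfVectorizer` defaults, and it simplifies tokenization to lowercasing and splitting on whitespace (an assumption made for brevity). The corpus and the function name `tfidf` are made up.

```python
import math
from collections import Counter

# Toy corpus (hypothetical); each string plays the role of one movie's combined features
docs = [
    "action space hero",
    "action comedy hero",
    "romance comedy drama",
]

def tfidf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # document frequency: number of documents that contain each term
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # smoothed inverse document frequency
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {t: c * idf[t] for t, c in counts.items()}
        # L2-normalize so every document vector has length 1
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors

vectors = tfidf(docs)
for doc, vec in zip(docs, vectors):
    print(doc, {t: round(w, 3) for t, w in vec.items()})
```

In the first document, `space` occurs in only one document and therefore receives a larger weight than `action` and `hero`, which each occur in two.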


In [None]:
# converting the text data to feature vectors
vectorizer = TfidfVectorizer()

feature_vectors = vectorizer.fit_transform(combined_features)

In [None]:
print(feature_vectors)

**Cosine Similarity**
- We compute the cosine similarity between the TF-IDF vectors of items. Cosine similarity measures the cosine of the angle between two non-zero vectors and is used to determine how similar two items are based on their feature vectors.
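For reference, this is the standard definition for two vectors $a$ and $b$, followed by a small worked example on made-up vectors:

$$
\cos(\theta) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
= \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}}
$$

$$
a = (1, 1, 0),\quad b = (1, 0, 1):\qquad
\cos(\theta) = \frac{1 \cdot 1 + 1 \cdot 0 + 0 \cdot 1}{\sqrt{2}\,\sqrt{2}} = \frac{1}{2}
$$

Because TF-IDF weights are never negative, the scores in this notebook fall between 0 (no shared terms) and 1 (same direction).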

In [None]:
# getting the similarity scores using cosine similarity
similarity = cosine_similarity(feature_vectors, feature_vectors)

In [None]:
print(similarity)

**Test your Recommendation System**

In [None]:
# creating a list with all the movie names given in the dataset
list_of_all_titles = df['title'].tolist()
print(list_of_all_titles)

In [None]:
# getting the movie name from the user
movie_name = input(' Enter your favourite movie name : ')

In [None]:
# finding the close match for the movie name given by the user
find_close_match = difflib.get_close_matches(movie_name, list_of_all_titles)
print(find_close_match)
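`difflib.get_close_matches` returns up to `n` candidates (3 by default) whose similarity ratio is at least `cutoff` (0.6 by default), best match first, and an empty list when nothing qualifies. The sketch below shows both outcomes on a made-up title list. `best_match` is a hypothetical helper, not part of this notebook.

```python
import difflib

# Hypothetical title list standing in for list_of_all_titles
titles = ["Iron Man", "Iron Man 2", "Batman Begins", "Avatar"]

def best_match(name, candidates, cutoff=0.6):
    """Return the closest title, or None when no candidate passes the cutoff."""
    matches = difflib.get_close_matches(name, candidates, n=3, cutoff=cutoff)
    return matches[0] if matches else None

found = best_match("iron man", titles)
missing = best_match("zzzzqqqq", titles)
print(found)
print(missing)
```

The comparison is case-sensitive, so lowercasing both the input and the titles before matching may give better results for casual user input.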

In [None]:
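# finding the index of the movie with title
# (assumes the CSV has an 'index' column that matches each row's position)
if not find_close_match:
    raise ValueError(f'No close match found for {movie_name!r}')
close_match = find_close_match[0]
index_of_the_movie = df[df.title == close_match]['index'].values[0]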

In [None]:
index_of_the_movie

In [None]:
# getting a list of similar movies
similarity_score = list(enumerate(similarity[index_of_the_movie]))
print(similarity_score)

In [None]:
# sorting the movies based on their similarity score
sorted_similar_movies = sorted(similarity_score, key = lambda x:x[1], reverse = True) 
print(sorted_similar_movies)

In [None]:
# keeping the 10 highest-scoring movies (the query movie itself usually comes first)
top_sim = sorted_similar_movies[:10]
top_sim

In [None]:
# printing the titles of the most similar movies
for movie_index, score in top_sim:
    title_from_index = df.iloc[movie_index]['title']
    print(title_from_index)
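The lookup, sorting, and printing steps above can be wrapped into a single function. The version below is a sketch that runs on a small hand-written similarity matrix rather than on the notebook's `similarity` array. The function name `recommend`, the toy titles, and the scores are made up. It also skips the query movie itself, which would otherwise typically rank first because a movie with a non-empty feature vector has similarity 1.0 with itself.

```python
# Hypothetical titles and a symmetric toy similarity matrix
# (rows and columns follow the order of `titles`)
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
similarity = [
    [1.0, 0.8, 0.1, 0.4],
    [0.8, 1.0, 0.2, 0.3],
    [0.1, 0.2, 1.0, 0.6],
    [0.4, 0.3, 0.6, 1.0],
]

def recommend(title, titles, similarity, top_n=2):
    """Return the top_n (title, score) pairs most similar to `title`, excluding itself."""
    query_index = titles.index(title)  # raises ValueError for an unknown title
    scores = [
        (other_index, score)
        for other_index, score in enumerate(similarity[query_index])
        if other_index != query_index
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [(titles[i], score) for i, score in scores[:top_n]]

result = recommend("Movie A", titles, similarity)
print(result)
```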