# **Introduction** 

Most recommendation systems use **content-based filtering**, **collaborative filtering**, or both to recommend items that match a user's interests. **Content-based filtering** generates recommendations from the attributes of the items a user has already interacted with, such as a movie's plot description. In this article, I will walk you through what content-based filtering is in machine learning and how to implement it using Python.

A recommendation system generates personalized recommendations by learning a user's preferences from data such as the user's history and the time of viewing or reading. Many applications rely on recommendation systems; the most common categories are:

- Online shopping and food delivery (Amazon, Zomato, etc.)
- Audio (songs, audiobooks, podcasts, etc.)
- Video (YouTube, Netflix, Amazon Prime, etc.)

There are two main types of recommendation systems:

- Collaborative Filtering
- Content-Based Filtering

Collaborative filtering uses the behaviour of other users whose interests are similar to yours and recommends what those users liked. A content-based recommendation system instead relies on your own behaviour: it recommends items similar to the ones you have already liked. In the section below, I'll walk you through how content-based filtering works in detail, and then we'll see how to implement it using Python.
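To make the contrast concrete, here is a minimal sketch of the collaborative side on a made-up rating matrix (all ratings are invented for illustration). It finds the user whose ratings most resemble the active user's and predicts a missing rating as a similarity-weighted average of other users' ratings. Notice that it never looks at what the movies are about; that is exactly the information content-based filtering uses.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical ratings: rows are users, columns are movies, 0 means "not rated"
ratings = np.array([
    [5, 4, 0, 1],   # user 0 (the active user)
    [4, 5, 3, 1],   # user 1
    [1, 0, 5, 4],   # user 2
    [5, 4, 4, 0],   # user 3
])

# Collaborative filtering: similarity between users, computed from their rating rows
user_sim = cosine_similarity(ratings)
np.fill_diagonal(user_sim, 0)  # ignore each user's similarity to themselves
most_similar = int(user_sim[0].argmax())
print("User most similar to user 0:", most_similar)

# Predict user 0's rating for movie 2 as a similarity-weighted average
# of the ratings given by the users who did rate movie 2
rated = ratings[:, 2] > 0
weights = user_sim[0, rated]
prediction = weights @ ratings[rated, 2] / weights.sum()
print(f"Predicted rating of movie 2 for user 0: {prediction:.2f}")
```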

A recommendation system based on content-based filtering analyzes the descriptions of the items a user has rated. It represents each item by features extracted from its description, measures how similar other items are to the ones the user liked, and recommends the most similar items to that user.
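A minimal sketch of this idea, assuming a tiny made-up catalogue and made-up ratings: each item is described by the TF-IDF weights of its description words, the user's profile is the rating-weighted average of the items they rated, and unrated items are ranked by cosine similarity to that profile. The titles, descriptions and ratings are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical catalogue: made-up titles and descriptions for illustration only
titles = ["Space Rescue", "Galaxy War", "Kitchen Love", "Paris Romance"]
descriptions = [
    "astronaut stranded on mars waits for a space rescue",
    "rebel pilots fight a space war against the empire",
    "romantic comedy about two rival chefs in a kitchen",
    "romantic story of a painter who falls in love in paris",
]

# Describe each item by the TF-IDF weights of its description words
item_vectors = TfidfVectorizer(stop_words="english").fit_transform(descriptions).toarray()

# Hypothetical ratings from the active user: item index -> rating on a 1-5 scale
user_ratings = {0: 5.0, 2: 1.0}

# User profile: rating-weighted average of the vectors of the rated items
idx = list(user_ratings)
w = np.array([user_ratings[i] for i in idx])
profile = (w[:, None] * item_vectors[idx]).sum(axis=0) / w.sum()

# Rank unrated items by cosine similarity between their vector and the profile
scores = cosine_similarity(profile.reshape(1, -1), item_vectors).ravel()
candidates = [i for i in range(len(titles)) if i not in user_ratings]
ranking = sorted(candidates, key=lambda i: scores[i], reverse=True)
print([titles[i] for i in ranking])
```

In this sketch a low rating still pulls the profile slightly toward similar items; subtracting the user's mean rating before weighting is one common refinement.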

Let's look at the steps content-based filtering follows to generate recommendations for a user:

- It starts by extracting the keywords that describe each item's content, ignoring uninformative words such as stop words ("the", "and", "of").
- It then compares items by these keywords. To measure how similar two items are, the content-based method commonly uses cosine similarity.
- It compares each item the user has not seen yet with the items the active user has already rated.
- Finally, it predicts how much the user will like each unseen item as a weighted average of the active user's own ratings, using the item similarities as weights, and recommends the highest-scoring items.
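A minimal sketch of these steps on four made-up movie descriptions, assuming the active user has rated two of them. It follows the item-similarity reading of the last step: an unseen item's predicted rating is the average of the user's own ratings, weighted by how similar that item is to each rated one. The descriptions, ratings and the `predict` helper are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical item descriptions (made up for illustration)
descriptions = [
    "a detective hunts a serial killer in a rainy city",                  # item 0
    "a rookie detective investigates a killer on the subway",             # item 1
    "two friends go on a road trip across the desert",                    # item 2
    "a detective takes a road trip across the desert to solve a murder",  # item 3
]

# Step 1: keywords as TF-IDF vectors, with English stop words removed
vectors = TfidfVectorizer(stop_words="english").fit_transform(descriptions)

# Step 2: cosine similarity between every pair of items
item_sim = cosine_similarity(vectors)

# Hypothetical ratings from the active user: item index -> rating (1-5)
user_ratings = {0: 5.0, 2: 2.0}

# Steps 3-4: similarity-weighted average of the active user's own ratings
def predict(item):
    rated = list(user_ratings)
    w = item_sim[item, rated]
    if w.sum() == 0:
        return np.nan  # the item shares no keywords with anything the user rated
    return float(w @ np.array([user_ratings[i] for i in rated]) / w.sum())

predictions = {}
for item in range(len(descriptions)):
    if item not in user_ratings:
        predictions[item] = predict(item)
        print(f"item {item}: predicted rating {predictions[item]:.2f}")
```

Item 1 only resembles the highly rated item 0, so its prediction equals that rating; item 3 resembles both rated items and lands in between.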

I hope you now understand what recommendation systems are and how the content-based method generates recommendations for a user. Now let's see how to implement it with Python. For this task, I will use movie metadata (`movies_metadata.csv`) covering the films in the MovieLens dataset to build a movie recommendation system with content-based filtering.

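Let's start this task by mounting Google Drive (the notebook runs on Google Colab and the dataset is stored on Drive), then importing the necessary Python libraries and the dataset: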

```python
# Google Colab only: mount Google Drive so the dataset path below can be read
from google.colab import drive
drive.mount('/content/drive')
```

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# low_memory=False parses the file in one pass, avoiding pandas' mixed-types DtypeWarning
movies = pd.read_csv('/content/drive/MyDrive/Datasets/Movie Recommendation/movies_metadata.csv', low_memory=False)
print(movies.head())
```

```
   adult  ... vote_count
0  False  ...     5415.0
1  False  ...     2413.0
2  False  ...       92.0
3  False  ...       34.0
4  False  ...      173.0

[5 rows x 24 columns]
```


Now, I'm going to implement the content-based filtering steps described above in Python. First I will prepare the data and select the column that describes each movie's content (its plot `overview`), then I will remove the stop words and, finally, I will compute cosine similarities to generate recommendations:

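```python
# Represent each overview as a TF-IDF vector, removing English stop words
tfidf = TfidfVectorizer(stop_words='english')
movies['overview'] = movies['overview'].fillna('')  # missing overviews become empty strings
overview_matrix = tfidf.fit_transform(movies['overview'])

# Map each title to its row position; keep only the first row when a title repeats,
# otherwise mapping[title] returns a Series instead of a single index
mapping = pd.Series(range(len(movies)), index=movies['title'])
mapping = mapping[~mapping.index.duplicated(keep='first')]
print(mapping)
```

Now let's create a function and have a look at how the recommendation system works. Since `TfidfVectorizer` normalises every row to unit length, `linear_kernel` gives the cosine similarity. Scoring one movie against all the others at query time avoids building the full movies-by-movies similarity matrix, which for the tens of thousands of rows in this file would need many gigabytes of memory:

```python
def recommend_movies(movie_input, n=14):
    movie_index = mapping[movie_input]  # raises KeyError if the title is not in the dataset
    # Cosine similarity between the chosen movie and every movie
    similarity_score = linear_kernel(overview_matrix[movie_index], overview_matrix).flatten()
    similarity_score[movie_index] = -1  # exclude the input movie itself
    movie_indices = similarity_score.argsort()[::-1][:n]  # positions of the n highest scores
    return movies['title'].iloc[movie_indices]

print(recommend_movies('Life Begins for Andy Hardy'))
```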

# **References**

[Content-Based Filtering in Machine Learning](https://thecleverprogrammer.com/2021/02/10/content-based-filtering-in-machine-learning/)