### About Dataset

https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset

**The Movies Dataset** contains metadata for roughly 45,000 movies listed in the Full MovieLens dataset. Features include posters, backdrops, budget, revenue, release dates, languages, and production countries and companies. This notebook uses only the `title` and `overview` columns of `movies_metadata.csv`.

<br>

<hr>

### Import Libraries

In [1]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

<br>

<hr>

### Functions

In [6]:
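def content_based_recommender(title, cosine_sim, dataframe):
    # Map each title to its row label; for duplicate titles keep the last occurrence.
    indices = pd.Series(dataframe.index, index=dataframe['title'])
    indices = indices[~indices.index.duplicated(keep='last')]
    # Raises KeyError if the title is not in the dataset.
    # Assumes the default RangeIndex, so the row label is also the matrix row.
    movie_index = indices[title]
    # Similarity of the chosen movie to every movie in the dataset.
    similarity_scores = pd.DataFrame(cosine_sim[movie_index], columns=["score"])
    # Rank by score; the first entry is normally the movie itself, so take the next ten.
    movie_indices = similarity_scores.sort_values("score", ascending=False)[1:11].index
    return dataframe['title'].iloc[movie_indices]

def calculate_cosine_sim(dataframe):
    # TF-IDF over the plot overviews, ignoring English stop words.
    tfidf = TfidfVectorizer(stop_words='english')
    # Treat missing overviews as empty text without modifying the caller's DataFrame.
    overviews = dataframe['overview'].fillna('')
    tfidf_matrix = tfidf.fit_transform(overviews)
    # Dense n x n matrix of pairwise cosine similarities.
    cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
    return cosine_sim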
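
To see what the two functions above compute, here is a minimal sketch on a made-up four-movie table. The titles and overviews are hypothetical stand-ins for `movies_metadata.csv`; only the TF-IDF and cosine-similarity calls mirror the notebook.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data; the real notebook reads movies_metadata.csv instead.
toy = pd.DataFrame({
    "title": ["Space One", "Space Two", "Farm Story", "No Plot"],
    "overview": [
        "astronauts travel to a distant planet",
        "astronauts return from a distant planet",
        "a farmer raises chickens in the countryside",
        None,  # missing overview, handled by fillna('') as in the notebook
    ],
})

overviews = toy["overview"].fillna("")
tfidf_matrix = TfidfVectorizer(stop_words="english").fit_transform(overviews)

# sim[i, j] is the cosine similarity between the overviews of movies i and j.
sim = cosine_similarity(tfidf_matrix)

# Scores for the first movie against all others, excluding itself.
scores = pd.Series(sim[0], index=toy["title"]).drop("Space One")
best_match = scores.idxmax()
print(scores.sort_values(ascending=False))
```

Movies that share no non-stop-word terms, and movies with an empty overview, get a score of 0, so they can only appear in the recommendations when fewer than ten movies have any overlap with the query.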

<br>

<hr>

### Read Dataset 

In [4]:
df_ = pd.read_csv("dataset/movies_metadata.csv", low_memory=False)
df = df_.copy()

<br>

<hr>

### Recommendation 

In [47]:
movie_title = 'Grumpier Old Men'

cosine_sim = calculate_cosine_sim(df)
content_based_recommender(movie_title,
                          cosine_sim,
                          df)

9207     An Extremely Goofy Movie
35575                         Max
443                      Fearless
235                 A Goofy Movie
4101                Heartbreakers
24576               The Guardians
31705        The Phantom of Paris
1617                         Bent
35304            The Zohar Secret
2282                     Rushmore
Name: title, dtype: object
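
`calculate_cosine_sim` builds a dense n x n matrix. With about 45,000 rows stored as 64-bit floats, that is roughly 16 GB. If memory is tight, one possible alternative is to compare only the queried movie's TF-IDF row against all others. The sketch below assumes that approach; `recommend_row_wise`, its `top_n` parameter, and the toy DataFrame are hypothetical names introduced for illustration and are not part of the notebook.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def recommend_row_wise(title, dataframe, top_n=10):
    """Hypothetical variant: score one movie against all others (1 x n, not n x n)."""
    overviews = dataframe["overview"].fillna("")
    tfidf_matrix = TfidfVectorizer(stop_words="english").fit_transform(overviews)

    # Map each title to its row position; keep the last occurrence of duplicates.
    positions = pd.Series(range(len(dataframe)), index=dataframe["title"])
    positions = positions[~positions.index.duplicated(keep="last")]
    pos = positions[title]  # raises KeyError for unknown titles

    # Similarity of the single query row to every row: a 1 x n result.
    row_scores = cosine_similarity(tfidf_matrix[pos], tfidf_matrix).ravel()

    # Drop the query movie explicitly, then keep the top_n highest scores.
    scores = pd.Series(row_scores).drop(pos)
    top = scores.sort_values(ascending=False).head(top_n).index
    return dataframe["title"].iloc[top]

# Hypothetical toy data for a quick check.
toy = pd.DataFrame({
    "title": ["Space One", "Space Two", "Farm Story", "No Plot"],
    "overview": [
        "astronauts travel to a distant planet",
        "astronauts return from a distant planet",
        "a farmer raises chickens in the countryside",
        None,
    ],
})
result = recommend_row_wise("Space One", toy, top_n=1)
print(result)
```

With the real data the call would look like `recommend_row_wise('Grumpier Old Men', df)`. The trade-off is that the TF-IDF matrix is refit on every call in this sketch; fitting it once and passing it in would avoid that.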