<div style="font-family: Ubuntu; font-size: 42px; font-weight:800; text-align: center; padding: 20px 0;">
    Movie Recommendation System
</div>
<div style="font-family: Ubuntu; font-size: 30px; font-weight:600; text-align: center; padding: 20px 0;">
    Top 5 movie recommendations 
</div>


> Over **100,000 ratings** and `1 million tags` applied to **9,000 movies by 600 users**

   >  **Author:**  [Mithamo Beth](https://github.com/Mythamor)

   ![movies.jpg](rs.jpg)

## Business Problem

Cinema evolves quickly, yet many classic and culturally significant films were made before 2018. Younger viewers may overlook this heritage and miss timeless, influential films. To bridge this gap and keep these films in circulation, Mithamo Beth aims to develop a movie recommendation system tailored to the tastes of a younger audience.

### Problem Statement

Design and implement a movie recommendation system that targets the younger generation and recommends movies made before the year 2018. This recommendation system should consider the preferences and viewing habits of younger users to introduce them to classic movies that are artistically significant and culturally enriching.

### Project Goals

1. Develop a user-friendly web application that allows younger users to input their movie preferences, genres of interest, or favorite recent films.
2. Build a recommendation engine that utilizes collaborative filtering, content-based filtering, or hybrid methods to suggest classic movies released before 2018.
3. Ensure the recommendations align with the younger generation's taste preferences, considering factors such as genre, director, and historical significance.
4. Provide informative movie descriptions and highlights to engage users and spark their interest in classic cinema.
5. Implement a user feedback mechanism to collect data on the effectiveness of recommendations and continuously improve the system.

### Expected Outcomes

- Increased awareness and appreciation of classic movies among the younger audience.
- Enhanced user engagement and satisfaction with the recommendation system.
- A platform that encourages exploration of classic cinema while catering to modern tastes.
- Valuable insights into the preferences and viewing behaviors of younger users for future content recommendations.

By addressing this business problem and developing an effective recommendation system, we aim to bridge the generation gap in cinematic appreciation and ensure that timeless movies made before 2018 continue to find an audience among the younger generation.

## Objectives
- Recommendations based on genre
- Recommendations based on year of release
- Use a bigger dataset after running the smaller dataset
- Deploy and create a web application for movie recommendations

> I will create two recommendation models:
>
> 1. NLP-based, using genres and tags
> 2. Collaborative filtering, using item ratings

## Scrape data from TMDB

In [1]:
# Import libraries
import pandas as pd
import numpy as np

# Visualization libraries
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# Import surprise
import surprise

# Warnings
from warnings import filterwarnings

# Other imports
import requests
import os

In [2]:
api_key = ''

In [3]:
# Function to get genre names for a movie
def get_genres(tmdb_id):
    genre_url = f'https://api.themoviedb.org/3/movie/{tmdb_id}?api_key={api_key}'
    response = requests.get(genre_url)
    if response.status_code == 200:
        data = response.json()
        genres = [genre['name'] for genre in data.get('genres', [])]
        return genres
    else:
        print(f"Error fetching genres for movie with TMDB ID {tmdb_id}: {response.status_code}, {response.text}")
        return []

# Search for movies with the keyword 'Christmas' in the title
search_url = f'https://api.themoviedb.org/3/search/movie?api_key={api_key}&query=Christmas&page='

movies_data = []
page = 1

while True:
    response = requests.get(search_url + str(page))
    
    if response.status_code == 200:
        data = response.json()
        results = data.get('results', [])
        
        if not results:
            break  # No more pages, exit the loop
        
        # Extract and append movie data
        for movie in results:
            tmdb_id = movie.get('id')
            genres = get_genres(tmdb_id)
            
            movie_data = {
                'title': movie.get('title'),
                'genres': genres,
                'tmdb_id': tmdb_id,
                'overview': movie.get('overview'),
                'release_year': movie.get('release_date', '')[:4],  # Extracting the year from release_date
            }
            movies_data.append(movie_data)
        
        # Check if there are more pages
        total_pages = data.get('total_pages', 0)
        if page >= total_pages:
            break
        
        page += 1
    else:
        print(f"Error: {response.status_code}, {response.text}")
        break

# Create a DataFrame
movies_df = pd.DataFrame(movies_data)

# Print or export the DataFrame
movies_df.head()

Error fetching genres for movie with TMDB ID 1186471: 404, {"success":false,"status_code":34,"status_message":"The resource you requested could not be found."}
Error fetching genres for movie with TMDB ID 1192248: 404, {"success":false,"status_code":34,"status_message":"The resource you requested could not be found."}
Error fetching genres for movie with TMDB ID 1205914: 404, {"success":false,"status_code":34,"status_message":"The resource you requested could not be found."}
Error fetching genres for movie with TMDB ID 1202007: 404, {"success":false,"status_code":34,"status_message":"The resource you requested could not be found."}


Unnamed: 0,title,genres,tmdb_id,overview,release_year
0,Christmas Bloody Christmas,[Horror],1019836,It's Christmas Eve and Tori just wants to get ...,2022
1,How the Grinch Stole Christmas,"[Family, Comedy, Fantasy]",8871,The Grinch decides to rob Whoville of Christma...,2000
2,Dealing with Christmas,"[Comedy, Adventure, Action]",1204912,"On Christmas Eve, Greg, a solitary and tacitur...",2023
3,Christmas,"[Drama, TV Movie]",373519,"An adaptation of Dickens' ""A Christmas Carol ""...",1986
4,Christmas As Usual,"[Romance, Comedy, Drama]",1202584,"To celebrate their engagement, Thea takes Jash...",2023
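The scraping cell keeps the API key in the notebook and builds the query string by hand. A minimal sketch of one alternative, assuming the key is stored in an environment variable: the variable name `TMDB_API_KEY` and the helpers `build_search_params` and `fetch_search_page` are hypothetical names introduced here, and `session` stands for any object with a requests-style `get()` method, such as `requests.Session()`.

```python
import os

# Same search endpoint as the cell above, without the hand-built query string
SEARCH_URL = "https://api.themoviedb.org/3/search/movie"


def build_search_params(query, page, api_key=None):
    """Build the query parameters for one search page."""
    # TMDB_API_KEY is a hypothetical variable name chosen for this sketch.
    key = api_key if api_key is not None else os.environ.get("TMDB_API_KEY", "")
    return {"api_key": key, "query": query, "page": page}


def fetch_search_page(session, query, page, api_key=None, timeout=10):
    """Fetch one page; returns (results, total_pages), or ([], 0) on a non-200 reply."""
    params = build_search_params(query, page, api_key)
    # The timeout keeps a stalled connection from blocking the whole scrape
    response = session.get(SEARCH_URL, params=params, timeout=timeout)
    if response.status_code != 200:
        return [], 0
    data = response.json()
    return data.get("results", []), data.get("total_pages", 0)
```

Under these assumptions the paging loop would call `fetch_search_page(requests.Session(), "Christmas", page)` and stop once `page` reaches the returned `total_pages`.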


In [4]:
# Investigate the dataset
movies_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3575 entries, 0 to 3574
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   title         3575 non-null   object
 1   genres        3575 non-null   object
 2   tmdb_id       3575 non-null   int64 
 3   overview      3575 non-null   object
 4   release_year  3575 non-null   object
dtypes: int64(1), object(4)
memory usage: 139.8+ KB


In [5]:
# Investigate the dataset
movies_df.sample(10)

Unnamed: 0,title,genres,tmdb_id,overview,release_year
2494,A Christmas Journey to Freedom,[],1056852,Viewers go on an exciting adventure as they tr...,2009.0
2665,Christmas on the North Pole Express,[],1121649,To be added,
698,My Dad's Christmas Date,"[Romance, Comedy]",754053,"It’s Christmas and the charming city of York, ...",2020.0
894,Christmas with Elisabeth,"[Drama, Romance]",325711,Surly solitary truck driver is given an assist...,1968.0
1577,Project: Puppies for Christmas,[Family],727057,Two young girls attempt to bring Christmas joy...,2019.0
2826,A Joint Custody Christmas,[Comedy],1100302,With the holidays just around the corner a div...,2022.0
2753,A Christmas Tree,[Animation],566263,"With the help of Charles Dickens, two young ch...",1972.0
57,The Christmas Retreat,"[TV Movie, Romance, Comedy]",1003485,When Kim's boyfriend breaks up with her instea...,2022.0
849,Last Christmas,"[Comedy, Drama, Horror]",651506,Six people celebrate Christmas together close ...,2015.0
825,Christmas at Camp 119,"[Drama, Comedy]",401577,Second World War . Field 119 in California (US...,1947.0


In [6]:
# The shape of the dataframe
movies_df.shape

(3575, 5)

In [7]:
# Function to convert list to comma-separated string
def convert_to_string(genres_list):
    return ', '.join(genres_list)

# Apply the convert_to_string function to the 'genres' column
movies_df['genres'] = movies_df['genres'].apply(convert_to_string)

# Print the updated DataFrame
movies_df.sample(10)

Unnamed: 0,title,genres,tmdb_id,overview,release_year
871,Cicada Crowing Christmas,"Comedy, Drama",849602,A group of four girlfriends stash a letter in ...,2021
123,I'll Be Home for Christmas,"Comedy, Family",17037,"Estranged from his father, college student Jak...",1998
2469,Broncho Billy's Christmas Spirit,Western,918036,"It is Christmas Eve, and a humble prospector h...",1914
3370,Single and Ready to Jingle,"TV Movie, Comedy, Romance",1003487,Emma Warner feels like she lives Christmas yea...,2022
2025,A Norman Rockwell Christmas Story,"Drama, Family",412401,This film brings to life a famous Norman Rockw...,1995
2853,White Bloody Christmas,Horror,695761,,2006
83,Christmas at the Drive-In,"TV Movie, Comedy, Romance",1026941,Sadie Walker is starting over in her hometown ...,2022
2965,Dairy Farmers of Canada 'Merry Christmas',Animation,720647,Re-imagine the holidays in a felted universe.,2018
126,Christmas in Summer,"Family, Fantasy",1174940,"Soon to be married, Eun-su heads to Gangneung ...",2023
2534,Christmas Bells,Animation,660924,This cute little short was made for the Austra...,1957


In [8]:
# Create a copy of the dataset
copy = movies_df.copy()
copy.head()

Unnamed: 0,title,genres,tmdb_id,overview,release_year
0,Christmas Bloody Christmas,Horror,1019836,It's Christmas Eve and Tori just wants to get ...,2022
1,How the Grinch Stole Christmas,"Family, Comedy, Fantasy",8871,The Grinch decides to rob Whoville of Christma...,2000
2,Dealing with Christmas,"Comedy, Adventure, Action",1204912,"On Christmas Eve, Greg, a solitary and tacitur...",2023
3,Christmas,"Drama, TV Movie",373519,"An adaptation of Dickens' ""A Christmas Carol ""...",1986
4,Christmas As Usual,"Romance, Comedy, Drama",1202584,"To celebrate their engagement, Thea takes Jash...",2023


In [18]:
# export the data
import csv
movies_df.to_csv('movie_data.csv', index=False, encoding='utf-8', quoting=csv.QUOTE_NONNUMERIC)

In [19]:
# Check the validity of the exported file
x = pd.read_csv('movie_data.csv')
x.head()

Unnamed: 0,title,genres,tmdb_id,overview,release_year
0,Christmas Bloody Christmas,Horror,1019836,It's Christmas Eve and Tori just wants to get ...,2022.0
1,How the Grinch Stole Christmas,"Family, Comedy, Fantasy",8871,The Grinch decides to rob Whoville of Christma...,2000.0
2,Dealing with Christmas,"Comedy, Adventure, Action",1204912,"On Christmas Eve, Greg, a solitary and tacitur...",2023.0
3,Christmas,"Drama, TV Movie",373519,"An adaptation of Dickens' ""A Christmas Carol ""...",1986.0
4,Christmas As Usual,"Romance, Comedy, Drama",1202584,"To celebrate their engagement, Thea takes Jash...",2023.0
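After the round trip through CSV, `release_year` comes back as a float (`2022.0`) because rows with a missing year force a floating-point column. A minimal sketch of restoring whole-number years with a nullable integer dtype; the two-row CSV text is an illustrative stand-in, not data from `movie_data.csv`.

```python
import io

import pandas as pd

# Illustrative stand-in for movie_data.csv: one row has no release year
csv_text = 'title,release_year\n"Movie A","2022"\n"Movie B",""\n'

df = pd.read_csv(io.StringIO(csv_text))

# Coerce anything unparseable to NaN, then switch to the nullable Int64 dtype
# so years stay integers and missing values become <NA>
df["release_year"] = pd.to_numeric(df["release_year"], errors="coerce").astype("Int64")

print(df.dtypes)
```

A filter that compares `release_year` with the string `"2015"` would then need an integer argument instead, so the choice of dtype should match how the recommender is called.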


In [86]:
movies_df['overview'].head(10)

0    On Christmas Eve, Greg, a solitary and tacitur...
1    The Grinch decides to rob Whoville of Christma...
2    An adaptation of Dickens' "A Christmas Carol "...
3    To celebrate their engagement, Thea takes Jash...
4    Many symbols and legends have become associate...
5    "This singular, bleakly funny, R-rated vision ...
6    On Christmas Eve, Kelly is reluctant to go to ...
7    The winter holidays are turning out to be espe...
8    A young girl wishes Christmas was over for goo...
9    It's Christmastime, and the Griswolds are prep...
Name: overview, dtype: object

In [90]:
import re
from collections import Counter
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Download NLTK stop words data
nltk.download('stopwords')
nltk.download('punkt')

[nltk_data] Downloading package stopwords to
[nltk_data]     /home/mythamor/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /home/mythamor/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [93]:
# Combine all overviews into a single string
combined_overviews = ' '.join(movies_df['overview'])

# Remove everything except letters and whitespace, then convert to lowercase
cleaned_text = re.sub(r'[^a-zA-Z\s]', '', combined_overviews).lower()

# Tokenize the text into words using NLTK
words = word_tokenize(cleaned_text)

# Remove stop words
stop_words = set(stopwords.words('english'))
filtered_words = [word for word in words if word not in stop_words]

# Calculate word frequencies using Counter
word_frequencies = Counter(filtered_words)

# Collect the top N words
top_words = word_frequencies.most_common(100)  # Change 100 to the desired number of top words

print("Top words:")
for word, frequency in top_words:
    print(f"{word}: {frequency}")

Top words:
christmas: 4069
holiday: 730
family: 683
santa: 510
new: 480
one: 431
time: 421
home: 409
love: 363
special: 358
eve: 347
life: 333
find: 333
help: 326
town: 298
get: 297
two: 295
together: 275
young: 273
holidays: 267
season: 259
friends: 258
story: 248
little: 238
year: 237
tree: 225
make: 218
back: 216
finds: 214
first: 214
way: 210
years: 209
must: 207
old: 206
true: 206
claus: 198
show: 198
day: 197
father: 196
spirit: 194
world: 190
save: 186
children: 184
man: 183
night: 181
music: 172
come: 169
film: 163
best: 153
three: 153
take: 150
small: 149
classic: 147
daughter: 145
house: 144
past: 143
gets: 142
takes: 138
around: 137
festive: 136
also: 136
city: 135
songs: 135
mother: 134
magic: 134
perfect: 132
go: 130
work: 130
boy: 130
presents: 129
woman: 124
santas: 122
along: 121
join: 121
musical: 119
scrooge: 117
comes: 117
party: 116
goes: 115
celebrate: 114
friend: 114
gift: 113
like: 111
parents: 111
ever: 110
live: 110
wish: 108
girl: 108
kids: 108
big: 106
school

In [21]:
# Make a copy of the movies_df
final_df = movies_df.copy()

In [22]:
# Concatenate genres and overview into a single "tags" column
final_df["tags"] = final_df["genres"] + " " + final_df['overview'] 

# Drop the overview column, which is now part of "tags"
final_df = final_df.drop(["overview"], axis=1)

final_df

Unnamed: 0,title,genres,tmdb_id,release_year,tags
0,Christmas Bloody Christmas,Horror,1019836,2022,Horror It's Christmas Eve and Tori just wants ...
1,How the Grinch Stole Christmas,"Family, Comedy, Fantasy",8871,2000,"Family, Comedy, Fantasy The Grinch decides to ..."
2,Dealing with Christmas,"Comedy, Adventure, Action",1204912,2023,"Comedy, Adventure, Action On Christmas Eve, Gr..."
3,Christmas,"Drama, TV Movie",373519,1986,"Drama, TV Movie An adaptation of Dickens' ""A C..."
4,Christmas As Usual,"Romance, Comedy, Drama",1202584,2023,"Romance, Comedy, Drama To celebrate their enga..."
...,...,...,...,...,...
3570,Freeze Frame,"Animation, Comedy",197768,1979,"Animation, Comedy Wile E. Coyote chases the Ro..."
3571,No One Stays Good,"Drama, Thriller",999021,2022,"Drama, Thriller Two brothers, Kurt and Ron Gil..."
3572,Turkeys for Xmas,Documentary,1205433,1915,Documentary Feathered friends prepare to meet ...
3573,Post Haste,Comedy,362901,1943,Comedy A brief documentary about the history o...


In [66]:
# Investigate the final_df
final_df.iloc[1]["tags"][:200]

'Family, Comedy, Fantasy The Grinch decides to rob Whoville of Christmas - but a dash of kindness from little Cindy Lou Who and her family may be enough to melt his heart...'

In [67]:
# Replace commas in the tags column with spaces in preparation for modeling
final_df["tags"] = final_df["tags"].apply(lambda x: str(x).replace(",", " "))

# Change the tags column to lowercase only
final_df["tags"] = final_df["tags"].apply(lambda x: x.lower())
final_df

Unnamed: 0,title,genres,tmdb_id,release_year,tags
0,Dealing with Christmas,"Comedy, Adventure, Action",1204912,2023,comedy adventure action on christmas eve gr...
1,How the Grinch Stole Christmas,"Family, Comedy, Fantasy",8871,2000,family comedy fantasy the grinch decides to ...
2,Christmas,"Drama, TV Movie",373519,1986,"drama tv movie an adaptation of dickens' ""a c..."
3,Christmas As Usual,"Romance, Comedy, Drama",1202584,2023,romance comedy drama to celebrate their enga...
4,Christmas,,488525,1994,many symbols and legends have become associat...
...,...,...,...,...,...
3566,Freeze Frame,"Animation, Comedy",197768,1979,animation comedy wile e. coyote chases the ro...
3567,Post Haste,Comedy,362901,1943,comedy a brief documentary about the history o...
3568,No One Stays Good,"Drama, Thriller",999021,2022,drama thriller two brothers kurt and ron gil...
3569,Turkeys for Xmas,Documentary,1205433,1915,documentary feathered friends prepare to meet ...


## NLP Modeling - based on genres, tags and keywords

In [23]:
# Import the NLP modelling libraries
from nltk.stem import PorterStemmer

In [24]:
# Instantiate the stemmer
ps = PorterStemmer()

In [25]:
# Stem every whitespace-separated token and rejoin the result into one string
def stems(text):
    return " ".join(ps.stem(word) for word in text.split())

In [26]:
# apply stems
final_df["tags"] = final_df["tags"].apply(stems) 

In [27]:
# Investigate one of the rows
final_df.iloc[0]["tags"][:100]

"horror it' christma eve and tori just want to get drunk and party, but when a robot santa clau at a "

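The stemmed sample still carries punctuation (`it'`, `party,`) because `stems` splits on whitespace only. A minimal sketch of stripping punctuation before stemming; `clean_and_stem` is a hypothetical helper, and its `stem` argument defaults to an identity function so the block runs without NLTK.

```python
import re


def clean_and_stem(text, stem=lambda token: token):
    """Lowercase, drop apostrophes, keep alphanumeric tokens, then stem each one."""
    lowered = re.sub(r"['’]", "", text.lower())      # "it's" becomes "its"
    tokens = re.findall(r"[a-z0-9]+", lowered)       # punctuation is discarded
    return " ".join(stem(token) for token in tokens)


print(clean_and_stem("Horror It's Christmas Eve, and Tori just wants to party,"))
```

Passing `stem=ps.stem` would apply the Porter stemmer instantiated above to the cleaned tokens.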
In [28]:
# Import the count vectorizer
from sklearn.feature_extraction.text import CountVectorizer

In [29]:
# Instantiate the count vectorizer and build the bag-of-words matrix
cv = CountVectorizer(max_features=5000, stop_words="english")
vector = cv.fit_transform(final_df["tags"]).toarray()
print(vector)
print(vector.shape)

[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]
(3575, 5000)


In [30]:
# Use a distance metric to measure similarity
from sklearn.metrics.pairwise import cosine_similarity

In [31]:
# Compute pairwise cosine similarity between all movie vectors
similarity = cosine_similarity(vector)
print(similarity)
print(similarity.shape)

[[1.         0.04950738 0.12468867 ... 0.         0.         0.        ]
 [0.04950738 1.         0.2074131  ... 0.         0.         0.12524486]
 [0.12468867 0.2074131  1.         ... 0.03863337 0.         0.12617606]
 ...
 [0.         0.         0.03863337 ... 1.         0.14142136 0.        ]
 [0.         0.         0.         ... 0.14142136 1.         0.        ]
 [0.         0.12524486 0.12617606 ... 0.         0.         1.        ]]
(3575, 3575)
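Each entry of the matrix is the cosine of the angle between two count vectors, cos(u, v) = u·v / (‖u‖ ‖v‖), so row *i* ranks every movie by how similar its tags are to movie *i*. A minimal sketch of turning such a matrix into a top-N lookup; `recommend_similar` is a hypothetical helper, and the three toy movies and their counts are made up for illustration.

```python
import numpy as np

# Toy bag-of-words counts for three hypothetical movies (rows) over three terms
titles = ["Movie A", "Movie B", "Movie C"]
counts = np.array([[1, 1, 0],
                   [1, 1, 1],
                   [0, 0, 1]], dtype=float)

# Cosine similarity computed by hand: dot products of L2-normalised rows
unit = counts / np.linalg.norm(counts, axis=1, keepdims=True)
toy_similarity = unit @ unit.T


def recommend_similar(title, titles, similarity, n=5):
    """Return up to n titles ranked by similarity to `title`, excluding itself."""
    idx = titles.index(title)                     # first match if a title repeats
    ranked = np.argsort(similarity[idx])[::-1]    # indices, most similar first
    return [titles[i] for i in ranked if i != idx][:n]


print(recommend_similar("Movie A", titles, toy_similarity, n=2))
```

With the notebook's objects the call would look like `recommend_similar("How the Grinch Stole Christmas", final_df["title"].tolist(), similarity)`. Several titles repeat in this dataset (for example "Christmas"), so a lookup by `tmdb_id` may be a safer key than the title.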


In [47]:
# Recommend up to five random titles released in a given year.
# release_year is stored as a string here, so pass the year as a string, e.g. "2015".
def recommend_year(year):
    # Filter movies based on the specified release year
    matching_movies = final_df[final_df["release_year"] == year]

    if not matching_movies.empty:
        # Shuffle the DataFrame to get random order
        shuffled_movies = matching_movies.sample(frac=1)

        # Select and print a random sample of 5 movies
        random_movies = shuffled_movies['title'].tolist()[:5]
        print("Recommended movies:")
        print(random_movies)
    else:
        print("No movies found for the given release year.")

In [33]:
# View available movies 
final_df.sample(10)

Unnamed: 0,title,genres,tmdb_id,release_year,tags
1754,Scream Queens Naked Christmas,"Horror, TV Movie",30401,1996,"horror, TV movi scream queen celebr christma w..."
1909,Dot & Spot's Magical Christmas Adventure,Animation,114019,1996,"anim christma is a time of magic, of wonder an..."
1013,Christmas CEO,"TV Movie, Romance",874193,2021,"TV movie, romanc A small toy compani ceo get a..."
898,The Christmas Visit,"Animation, Adventure, Family",743167,1959,"animation, adventure, famili santa clau lend h..."
852,Christmas,,846811,2021,dongdong reluctantli follow in hi father song'...
1246,How The Grinch Stole Christmas!,"Animation, Family, Fantasy",372205,1992,"animation, family, fantasi thi short video fea..."
3295,Christmas Carols on ITV,,1195203,2022,the show featur perform from brit and mobo awa...
1114,A Christmas for the Books,"Romance, TV Movie",565599,2018,"romance, TV movi lifestyl guru and romanc expe..."
1686,Remember It's Christmas,,1145314,2023,between a lifelong career of play santa at pri...
2183,Christmas Greetings,,899541,1983,An amateur home movi made by betti cook of a c...


In [64]:
def recommend_genre(genre):
    # Filter movies whose genres string contains the given text (case-insensitive)
    matching_movies = final_df[final_df["genres"].str.contains(genre, case=False)]

    if not matching_movies.empty:
        # Shuffle the DataFrame to get random order
        shuffled_movies = matching_movies.sample(frac=1)

        # Select and print a random sample of 5 movies
        random_movies = shuffled_movies['title'].tolist()[:5]
        print("Recommended movies:")
        print(random_movies)
    else:
        print(f"No movies found for the given genre: {genre}")

In [69]:
n = pd.read_csv('movie_data.csv')
n.head()

Unnamed: 0,title,genres,tmdb_id,overview,release_year
0,Christmas Bloody Christmas,Horror,1019836,It's Christmas Eve and Tori just wants to get ...,2022.0
1,How the Grinch Stole Christmas,"Family, Comedy, Fantasy",8871,The Grinch decides to rob Whoville of Christma...,2000.0
2,Dealing with Christmas,"Comedy, Adventure, Action",1204912,"On Christmas Eve, Greg, a solitary and tacitur...",2023.0
3,Christmas,"Drama, TV Movie",373519,"An adaptation of Dickens' ""A Christmas Carol ""...",1986.0
4,Christmas As Usual,"Romance, Comedy, Drama",1202584,"To celebrate their engagement, Thea takes Jash...",2023.0


In [49]:
# Test the year-based recommender
recommend_year("2015")

Recommended movies:
['A Prince for Christmas', 'Christmas Dreams', 'The Spirit of Christmas', "Thomas & Friends: Thomas' Christmas Carol", 'A Luchagore Christmas']


In [66]:
# Test the genre-based recommender
recommend_genre("Animation, Adventure, Family")

Recommended movies:
['A Trash Truck Christmas', "Timmy's Special Delivery: A Precious Moments Christmas", 'Christmas in New York', 'A Freezerburnt Christmas', "Davey and Goliath's Snowboard Christmas"]
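`recommend_genre` treats its argument as one substring, so `"Animation, Adventure, Family"` only finds movies whose genres are listed in exactly that order. A minimal sketch of an order-independent check; `has_all_genres` is a hypothetical helper that compares the two comma-separated lists as sets.

```python
def has_all_genres(movie_genres, wanted):
    """True if every requested genre appears in the movie's comma-separated genres."""
    have = {g.strip().lower() for g in movie_genres.split(",") if g.strip()}
    want = {g.strip().lower() for g in wanted.split(",") if g.strip()}
    # An empty request matches nothing instead of matching everything
    return bool(want) and want <= have


print(has_all_genres("Family, Comedy, Fantasy", "comedy, family"))
print(has_all_genres("Comedy", "Comedy, Family"))
```

Applied to the notebook's data, this could serve as the row filter: `final_df[final_df["genres"].apply(has_all_genres, wanted="Family, Animation")]`.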


In [68]:
# Pickle the df and similarity 
import pickle

with open('movie_list.pkl', 'wb') as f:
    pickle.dump(final_df, f)
with open('similarity.pkl', 'wb') as f:
    pickle.dump(similarity, f)
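A web application would need to load these two files back at start-up. A minimal sketch of the round trip, using a temporary directory and a small nested list in place of the real objects; `save_pickle` and `load_pickle` are hypothetical helper names.

```python
import os
import pickle
import tempfile


def save_pickle(obj, path):
    """Write obj to path; the context manager closes the file handle."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)


def load_pickle(path):
    """Read one pickled object back. Only unpickle files you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "similarity_demo.pkl")
    save_pickle([[1.0, 0.5], [0.5, 1.0]], demo_path)
    restored = load_pickle(demo_path)

print(restored)
```

In the application the same `load_pickle` call would be pointed at `movie_list.pkl` and `similarity.pkl`.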

In [120]:
from sklearn.feature_extraction.text import CountVectorizer

# Tokenize and count word frequencies
cv = CountVectorizer(max_features=30, stop_words="english")
word_freq = cv.fit_transform(final_df["title"]).toarray()

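# Get feature names (words) as a plain list
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
words = cv.get_feature_names_out().tolist()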

In [121]:
print(words)

['carol', 'christmas', 'country', 'day', 'disney', 'eve', 'family', 'gift', 'holiday', 'home', 'little', 'live', 'love', 'magic', 'magical', 'merry', 'miracle', 'new', 'night', 'royal', 'santa', 'special', 'star', 'story', 'tale', 'time', 'tree', 'white', 'wish', 'year']


In [118]:
word_freq_df = pd.DataFrame(word_freq, columns=words)
word_freq_df.head()

Unnamed: 0,carol,christmas,day,eve,family,holiday,home,little,love,merry,miracle,new,night,santa,special,star,story,time,tree,wish
0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


## Movie Recommendation based on Themes

In [137]:
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

# Tokenize the titles and count word frequencies
word_frequencies = {}
for title in final_df['title']:
    for word in title.split():
        # Use lowercase and strip leading/trailing punctuation
        word = word.lower().strip(".,!?'\"()-")
        if word not in ENGLISH_STOP_WORDS:
            word_frequencies[word] = word_frequencies.get(word, 0) + 1

# Identify the most frequent words (top 30 in this example)
most_frequent_words = sorted(word_frequencies.items(), key=lambda x: x[1], reverse=True)[:30]
most_frequent_words = [word for word, _ in most_frequent_words]

import random
def recommend_movies_by_theme(theme_words):
    # theme_words is expected to be a list of words, e.g. ['magic', 'miracle']
    # Find movies that contain any of the theme words in their titles
    matching_movies = final_df[final_df['title'].str.contains('|'.join(theme_words), case=False)]

    if not matching_movies.empty:
        # Select and print a random sample of 5 movies
        random_movies = random.sample(matching_movies['title'].tolist(), min(5, len(matching_movies)))
        print("Recommended movies:")
        print(random_movies)
    else:
        print("No movies found for the given theme words.")

In [138]:
# Print the most frequent words
most_frequent_words

['christmas',
 'carol',
 'merry',
 'special',
 '',
 '&',
 'tree',
 'story',
 'night',
 'home',
 'eve',
 'love',
 'christmas:',
 'family',
 'little',
 'time',
 'wish',
 'holiday',
 'new',
 'miracle',
 'day',
 'gift',
 "it's",
 'royal',
 'country',
 'star',
 'white',
 'live',
 'tale',
 'magic']

In [144]:
fd = pd.DataFrame(most_frequent_words)
fd

Unnamed: 0,0
0,christmas
1,carol
2,merry
3,special
4,
5,&
6,tree
7,story
8,night
9,home


In [142]:
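# Example usage
# Caution: theme_words should be a list such as ['magic']. A bare string is
# joined character by character ('m|a|g|i|c'), so the call below matches almost
# every title, which is why the output is unrelated to "magic".
recommend_movies_by_theme('magic')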

Recommended movies:
['Christmas Without You', 'Turkeys for Xmas', 'A Tuna Christmas', 'Christmas at Hampton Court', 'Christmas']
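Joining theme words with `|` and matching them as a regular expression also matches inside longer words, and a bare string is split into single characters. A minimal sketch of a stricter matcher that accepts either a string or a list, escapes each word, and requires whole-word matches; `titles_matching_themes` is a hypothetical helper and the sample titles are illustrative.

```python
import re


def titles_matching_themes(titles, theme_words):
    """Return the titles that contain any theme word as a whole word."""
    if isinstance(theme_words, str):
        theme_words = [theme_words]          # treat a bare string as one word
    if not theme_words:
        return []                            # no themes, no matches
    alternatives = "|".join(re.escape(word) for word in theme_words)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return [title for title in titles if pattern.search(title)]


sample_titles = ["A Magic Christmas", "Magical Night", "Christmas at Hampton Court"]
print(titles_matching_themes(sample_titles, "magic"))
```

Whether "Magical Night" should count as a match for "magic" is a design choice; dropping the trailing `\b` would turn this into prefix matching.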
