# <div style="text-align: center; background-color: orange; color: white; padding: 14px; line-height: 1; border-radius: 10px">Movie Recommender System
</div>

![image](https://cdn-images-1.medium.com/v2/resize:fit:1500/1*leuI7fVkeOrKAIGOOj_T9A.png)

# <center> Building a Movie Recommender System using TMDb Dataset
</center>

<hr>



## Statement

#### The objective of this project is to create a movie recommendation system using the TMDb dataset that provides movie recommendations based on user input. The system will allow users to search for a specific movie, and based on that input, it will suggest a list of similar movies. The recommendations will be generated by analyzing the movie metadata, such as genres, keywords, and production companies, to identify movies with similar characteristics. The system aims to assist users in discovering new movies that align with their preferences and interests.

## Objective
The main objective of this project is to develop a movie recommendation system that:

- Utilizes the TMDb dataset to extract movie information, including titles, genres, keywords, and production companies.
- Provides a user interface where users can search for a specific movie.
- Analyzes the user's input and identifies movies with similar characteristics to the searched movie.
- Ranks the recommended movies based on their similarity to the searched movie.
- Presents the recommended movies to the user, along with their relevant details such as title, genre, and rating.
- Allows users to explore the recommended movies further, such as viewing additional information or watching trailers.

#### By achieving these objectives, we aim to build a system that helps users find movies similar to the ones they search for, improving the user experience through relevant suggestions and easier movie discovery.

<hr>


In [75]:
# Importing Libraries

import numpy as np              # Numerical operations on arrays and matrices
import pandas as pd             # Data manipulation and analysis
import matplotlib.pyplot as plt # Plots and charts
import seaborn as sns           # Statistical data visualization
import ast                      # Provides literal_eval for parsing Python literals stored as strings
import json                     # Working with JSON data
from sklearn.feature_extraction.text import CountVectorizer  # Class that turns text into token-count vectors
from sklearn.metrics.pairwise import cosine_similarity      # Function that computes pairwise cosine similarity
import nltk                     # Natural Language Toolkit
from nltk.stem.porter import PorterStemmer                   # Porter stemming algorithm for text processing
import warnings
warnings.filterwarnings('ignore')  # Suppress all warnings in the notebook output


# Summary of the imports above:
# - NumPy (np): numerical operations on arrays and matrices.
# - pandas (pd): data manipulation and analysis, built around the DataFrame structure.
# - matplotlib.pyplot (plt): plots and charts.
# - seaborn (sns): a higher-level interface on top of matplotlib for statistical graphics.
# - ast: provides literal_eval, which safely evaluates strings containing Python literal structures.
# - json: reading and writing JSON data.
# - CountVectorizer (sklearn.feature_extraction.text): a class that converts a collection of text documents into a matrix of token counts.
# - cosine_similarity (sklearn.metrics.pairwise): a function that computes the cosine similarity between vectors; it is often used in recommendation systems and text analysis.
# - nltk: the Natural Language Toolkit, with tools and resources for working with human language data.
# - PorterStemmer (nltk.stem.porter): a stemming algorithm that reduces words to a common stem, which is not always a dictionary word.


In [76]:
# Loading datasets

# Loading the movie dataset from the 'tmdb_5000_movies.csv' file.
movies = pd.read_csv('./tmdb_5000_movies.csv')

# Loading the credits dataset from the 'tmdb_5000_credits.csv' file.
credits = pd.read_csv('./tmdb_5000_credits.csv')

# Comments:
# The code snippet above loads two datasets: 'tmdb_5000_movies.csv' and 'tmdb_5000_credits.csv'.
# The 'movies' dataset contains information about various movies, such as title, genres, runtime, budget, and revenue.
# It is loaded into a pandas DataFrame named 'movies'.

# The 'credits' dataset contains information about the movie credits, including cast and crew details.
# It is also loaded into a pandas DataFrame named 'credits'.

# Both datasets are loaded from CSV files using the pd.read_csv() function and stored in separate DataFrames for further analysis and processing.


In [77]:
# Displaying top 3 rows in the 'movies' dataset

movies.head(3)

# Comments:
# The code snippet above displays the top 3 rows of the 'movies' dataset.
# This gives a glimpse of the dataset by showing the first few rows, including the column headers and corresponding data.

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.sonypictures.com/movies/spectre/,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""...",2015-10-26,880674609,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466


In [78]:
# Getting the shape of each dataset
# `shape` is a DataFrame attribute (not a function) holding a tuple: (number of rows, number of columns).
movies_shape = movies.shape
credits_shape = credits.shape

# Print results
print("Shape of the movies dataset:", movies_shape)
print("Shape of the credits dataset:", credits_shape)

Shape of the movies dataset: (4803, 20)
Shape of the credits dataset: (4803, 4)


<hr>


#### Merging the datasets

In [79]:
# Merging movies dataset with credits dataset based on the 'title' column
movies = movies.merge(credits, on='title')

# Comments:
# The code snippet above merges the 'movies' dataset with the 'credits' dataset based on the common 'title' column.
# This operation combines the information from both datasets into a single dataset, allowing access to movie details and credits in one consolidated DataFrame.

# Displaying the first 2 rows of the merged dataset
movies.head(2)

# Comments:
# The code snippet above displays the first 2 rows of the merged dataset.
# This provides a preview of the merged DataFrame, showing the combined information from both datasets for the respective movie entries.

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,movie_id,cast,crew
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,19995,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...",...,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,285,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
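
Each input file has 4,803 rows, yet the `keywords` column printed later reports 4,806 rows after three rows are dropped, which implies that the merge produced 4,809 rows. Merging on `title` pairs every combination of rows that share a title, so films with the same name are duplicated. The sketch below illustrates the effect on made-up data and shows one possible alternative: joining on the identifier columns instead. It assumes that `id` in the movies file and `movie_id` in the credits file refer to the same film (they agree in the rows shown above); it is not the approach used in this notebook.

```python
import pandas as pd

# Toy stand-ins for the two files; the values are made up for illustration
movies_demo = pd.DataFrame({
    "id": [1, 2, 3],
    "title": ["Batman", "Batman", "Avatar"],  # two different films share a title
})
credits_demo = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Batman", "Batman", "Avatar"],
    "cast": ["cast A", "cast B", "cast C"],
})

# Merging on the title matches each "Batman" row with both credits rows
by_title = movies_demo.merge(credits_demo, on="title")

# Merging on the identifier keeps exactly one row per film
by_id = movies_demo.merge(
    credits_demo.drop(columns="title"), left_on="id", right_on="movie_id"
)

print(len(by_title))  # 5
print(len(by_id))     # 3
```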


In [80]:
# Retrieving the columns in the combined dataset
columns_combined = movies.columns

# Comments:
# The code snippet above retrieves the column names in the combined 'movies' dataset.
# The columns attribute returns an Index object containing the column labels of the DataFrame.

# Printing the columns in the combined dataset
print("Columns in the combined dataset:")
print(columns_combined)

Columns in the combined dataset:
Index(['budget', 'genres', 'homepage', 'id', 'keywords', 'original_language',
       'original_title', 'overview', 'popularity', 'production_companies',
       'production_countries', 'release_date', 'revenue', 'runtime',
       'spoken_languages', 'status', 'tagline', 'title', 'vote_average',
       'vote_count', 'movie_id', 'cast', 'crew'],
      dtype='object')


#### Dropping unnecessary columns

In [81]:
# Selecting specific columns in the movies dataset
movies = movies[['movie_id', 'title', 'overview', 'genres', 'keywords', 'cast', 'crew']]

# Comments:
# The code snippet above selects specific columns from the 'movies' dataset and reassigns the DataFrame with the selected columns.
# The selected columns include 'movie_id', 'title', 'overview', 'genres', 'keywords', 'cast', and 'crew'.
# This operation helps to focus on the relevant columns for the movie recommender system.

# Displaying the first 2 rows of the updated dataset
movies.head(2)

# Comments:
# The code snippet above displays the first 2 rows of the updated 'movies' dataset.
# By selecting specific columns, the updated dataset now contains only the chosen columns, providing a concise view of the relevant movie information.

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...","[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."


In [82]:
# Checking for missing values in the movies dataset
missing_values = movies.isnull().sum()

# Comments:
# The code snippet above checks for missing values in the 'movies' dataset using the isnull() function.
# The sum() function is then applied to calculate the total number of missing values for each column.

# Printing the missing values count
print("Missing values count in the movies dataset:")
print(missing_values)

# Comments:
# The code snippet above prints the count of missing values in the 'movies' dataset.
# The missing values count provides information about the number of missing values in each column of the dataset.

Missing values count in the movies dataset:
movie_id    0
title       0
overview    3
genres      0
keywords    0
cast        0
crew        0
dtype: int64


In [83]:
# Dropping rows with missing values
movies.dropna(inplace=True)

# Comments:
# dropna() removes every row that contains at least one missing value (here, the 3 rows without an overview).
# inplace=True applies the change to the 'movies' DataFrame itself instead of returning a new one.

# Checking for duplicated rows
duplicate_count = movies.duplicated().sum()

# Comments:
# duplicated() flags each row that repeats an earlier row, and sum() counts the flagged rows.

# Printing the number of duplicated rows
print("Duplicate count in the movies dataset:")
print(duplicate_count)

Duplicate count in the movies dataset:
0
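
One side effect of `dropna()` is worth keeping in mind: it removes rows but keeps the original index labels, so labels and row positions no longer line up afterwards. The sketch below uses a small made-up DataFrame (not the TMDb data) to show the gap and how `reset_index(drop=True)` closes it.

```python
import pandas as pd

# Hypothetical four-row frame with one missing overview
demo = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "overview": ["text", None, "text", "text"],
})

cleaned = demo.dropna()
print(list(cleaned.index))  # [0, 2, 3]  (label 1 is gone, the others are unchanged)

renumbered = cleaned.reset_index(drop=True)
print(list(renumbered.index))  # [0, 1, 2]
```

This matters whenever an index label is used to look up a row in a NumPy array, because NumPy arrays are indexed purely by position.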


# <div style="text-align: center; background-color: orange; color: white; padding: 14px; line-height: 1; border-radius: 10px">Text Preprocessing
</div>


In [84]:
def convert(text):
    """
    Converts a string representation of a list of dictionaries into a list of names.
    
    Parameters:
    text (str): String representation of a list of dictionaries.
    
    Returns:
    list: List of names extracted from the dictionaries.
    """
    L = []
    for i in ast.literal_eval(text):
        L.append(i['name'])
    return L
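
To make the transformation concrete, here is a small sketch that applies the same logic to a shortened version of Avatar's `genres` value. The sample string is abbreviated from the table above, and the function is repeated in compact form so the block runs on its own.

```python
import ast

def convert(text):
    # Parse the string into a list of dictionaries and keep each 'name' value
    return [item["name"] for item in ast.literal_eval(text)]

sample = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'
print(convert(sample))  # ['Action', 'Adventure']
```

`ast.literal_eval` works here because these strings happen to be valid Python literals. `json.loads(text)` would parse the same string and also accepts JSON-only values such as `null`, which `ast.literal_eval` rejects.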

In [85]:
# Applying the convert function to the 'genres' column
movies['genres'] = movies['genres'].apply(convert)

# Comments:
# The code snippet above applies the 'convert' function to the 'genres' column in the 'movies' dataset.
# The apply() function is used to apply the specified function to each element in the 'genres' column.
# This operation converts the string representation of a list of dictionaries into a list of genre names.

# Applying the convert function to the 'keywords' column
movies['keywords'] = movies['keywords'].apply(convert)

# Comments:
# The code snippet above applies the 'convert' function to the 'keywords' column in the 'movies' dataset.
# Similarly, the apply() function is used to apply the 'convert' function to each element in the 'keywords' column.
# This operation converts the string representation of a list of dictionaries into a list of keyword names.

In [86]:
movies['genres']  # Evaluated but not displayed: a notebook cell shows only the value of its last expression

movies['keywords']  # Displaying the 'keywords' column of the 'movies' dataset

0       [culture clash, future, space war, space colon...
1       [ocean, drug abuse, exotic island, east india ...
2       [spy, based on novel, secret agent, sequel, mi...
3       [dc comics, crime fighter, terrorist, secret i...
4       [based on novel, mars, medallion, space travel...
                              ...                        
4804    [united states–mexico barrier, legs, arms, pap...
4805                                                   []
4806    [date, love at first sight, narration, investi...
4807                                                   []
4808            [obsession, camcorder, crush, dream girl]
Name: keywords, Length: 4806, dtype: object

In [87]:
def convert(obj):
    """
    Returns the names of the first three entries in a string representation of a list of dictionaries.
    """
    # Parse the string with ast.literal_eval(), keep at most the first 3 items, and collect their 'name' values
    return [i['name'] for i in ast.literal_eval(obj)[:3]]


In [88]:
movies['cast'] = movies['cast'].apply(convert)  # Applying the 'convert' function to the 'cast' column and updating the column with the converted values

movies.head()  # Displaying the first few rows of the updated 'movies' dataset

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[Sam Worthington, Zoe Saldana, Sigourney Weaver]","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","[Johnny Depp, Orlando Bloom, Keira Knightley]","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[spy, based on novel, secret agent, sequel, mi...","[Daniel Craig, Christoph Waltz, Léa Seydoux]","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[dc comics, crime fighter, terrorist, secret i...","[Christian Bale, Michael Caine, Gary Oldman]","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[based on novel, mars, medallion, space travel...","[Taylor Kitsch, Lynn Collins, Samantha Morton]","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [89]:
def convert(obj):
    l = []  # Create an empty list to store the names
    
    # Iterate over the items in the evaluated object using ast.literal_eval()
    for i in ast.literal_eval(obj):
        if i['job'] == 'Director':  # Check if the value of the 'job' key is 'Director'
            l.append(i['name'])  # Append the 'name' value to the list
        break  # Note: this break is unconditional, so only the first crew entry is ever checked
        
    # The list contains the director only when the director is the first crew entry; otherwise it is empty
    return l

In [90]:
movies['crew'] = movies['crew'].apply(convert)  # Applying the 'convert' function to the 'crew' column and updating the column with the converted values

movies.head()  # Displaying the first few rows of the updated 'movies' dataset

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[Sam Worthington, Zoe Saldana, Sigourney Weaver]",[]
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","[Johnny Depp, Orlando Bloom, Keira Knightley]",[]
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[spy, based on novel, secret agent, sequel, mi...","[Daniel Craig, Christoph Waltz, Léa Seydoux]",[]
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[dc comics, crime fighter, terrorist, secret i...","[Christian Bale, Michael Caine, Gary Oldman]",[]
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[based on novel, mars, medallion, space travel...","[Taylor Kitsch, Lynn Collins, Samantha Morton]",[]
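
Every row shown above has an empty `crew` list, because the loop in `convert` stops after the first crew entry whether or not that entry is the director. A minimal sketch of a variant that scans the whole list follows. The name `fetch_director` and the sample crew string are hypothetical and not part of the notebook. Applying such a variant to the dataset would add director names to the tags, so the similarity scores and recommendations in later cells, which were produced with the original function, would no longer match.

```python
import ast

def first_entry_only(obj):
    """Same logic as the notebook's function: inspects only the first crew entry."""
    names = []
    for member in ast.literal_eval(obj):
        if member["job"] == "Director":
            names.append(member["name"])
        break
    return names

def fetch_director(obj):
    """Hypothetical variant: returns the names of all crew members whose job is 'Director'."""
    return [m["name"] for m in ast.literal_eval(obj) if m["job"] == "Director"]

# Made-up crew string in the same shape as the 'crew' column
sample_crew = (
    '[{"job": "Editor", "name": "Person A"}, '
    '{"job": "Director", "name": "Person B"}]'
)

print(first_entry_only(sample_crew))  # []
print(fetch_director(sample_crew))    # ['Person B']
```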


In [91]:
movies['overview'] = movies['overview'].apply(lambda x: x.split())  # Splitting each overview string into a list of words

# Removing spaces inside each name so that multi-word names become single tokens (e.g. 'Science Fiction' -> 'ScienceFiction')
movies['genres'] = movies['genres'].apply(lambda x: [i.replace(" ","") for i in x])

movies['keywords'] = movies['keywords'].apply(lambda x: [i.replace(" ","") for i in x])

movies['cast'] = movies['cast'].apply(lambda x: [i.replace(" ","") for i in x])

movies['crew'] = movies['crew'].apply(lambda x: [i.replace(" ","") for i in x])

# Concatenating the 'overview', 'genres', 'cast', and 'crew' lists into a new 'tags' column
# Note: 'keywords' is cleaned above but is not included in 'tags'
movies['tags'] = movies['overview'] + movies['genres'] + movies['cast'] + movies['crew']

# Keeping only 'movie_id', 'title', and 'tags'; .copy() makes 'new_df' independent of 'movies' so later assignments modify it directly
new_df = movies[['movie_id', 'title', 'tags']].copy()

new_df  # Displaying the 'new_df' DataFrame

Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin..."
1,285,Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d..."
2,206647,Spectre,"[A, cryptic, message, from, Bond’s, past, send..."
3,49026,The Dark Knight Rises,"[Following, the, death, of, District, Attorney..."
4,49529,John Carter,"[John, Carter, is, a, war-weary,, former, mili..."
...,...,...,...
4804,9367,El Mariachi,"[El, Mariachi, just, wants, to, play, his, gui..."
4805,72766,Newlyweds,"[A, newlywed, couple's, honeymoon, is, upended..."
4806,231617,"Signed, Sealed, Delivered","[""Signed,, Sealed,, Delivered"", introduces, a,..."
4807,126186,Shanghai Calling,"[When, ambitious, New, York, attorney, Sam, is..."


In [92]:
new_df['tags'] = new_df['tags'].apply(lambda x: " ".join(x))  # Joining the elements of each list in the 'tags' column into a single space-separated string


In [93]:
new_df['tags'][0]  # Accessing the 'tags' value at index 0 in the 'new_df' DataFrame


'In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization. Action Adventure Fantasy ScienceFiction SamWorthington ZoeSaldana SigourneyWeaver'

In [94]:
new_df['tags'] = new_df['tags'].apply(lambda x: x.lower())  # Converting the 'tags' column to lowercase

new_df['tags']  # Displaying the updated 'tags' column

0       in the 22nd century, a paraplegic marine is di...
1       captain barbossa, long believed to be dead, ha...
2       a cryptic message from bond’s past sends him o...
3       following the death of district attorney harve...
4       john carter is a war-weary, former military ca...
                              ...                        
4804    el mariachi just wants to play his guitar and ...
4805    a newlywed couple's honeymoon is upended by th...
4806    "signed, sealed, delivered" introduces a dedic...
4807    when ambitious new york attorney sam is sent t...
4808    ever since the second grade when he first saw ...
Name: tags, Length: 4806, dtype: object

In [95]:
# Creating a PorterStemmer instance
ps = PorterStemmer()

In [96]:
def stem(text):
    y = []  # Create an empty list to store the stemmed words
    
    # Iterate over each word in the text after splitting it
    for i in text.split():
        y.append(ps.stem(i))  # Apply stemming to each word using the 'ps.stem()' function and add the stemmed word to the list
    
    return " ".join(y)  # Join the stemmed words with a space and return the resulting string

In [97]:
new_df['tags'] = new_df['tags'].apply(stem)  # Applying the 'stem' function to each element in the 'tags' column and updating the column with the stemmed values
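
As a quick illustration of what stemming does, the sketch below runs a compact version of the same `stem` logic on three words that appear in the overviews. The expected stems are the ones visible in the stemmed `tags` output later in this notebook.

```python
from nltk.stem.porter import PorterStemmer

ps = PorterStemmer()

def stem(text):
    # Stem each whitespace-separated word and join the results back into one string
    return " ".join(ps.stem(word) for word in text.split())

print(stem("believed wants message"))  # believ want messag
```

Because the text is split on whitespace only, punctuation stays attached to words. In the stemmed output further down, "century," keeps its comma and is left unstemmed, while "military" becomes "militari".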

### <div style="text-align: center; background-color: orange; color: white; padding: 14px; line-height: 1; border-radius: 10px">Vectorization
</div>


In [98]:
cv = CountVectorizer(max_features=5000, stop_words='english')  # Creating a CountVectorizer object with a maximum of 5000 features and English stop words

vectors = cv.fit_transform(new_df['tags']).toarray()  # Transforming the 'tags' column into a matrix of token counts using CountVectorizer and converting it to a dense array
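
To see what the resulting matrix looks like, here is a minimal sketch on a three-document toy corpus. The documents and the name `cv_demo` are made up for illustration. Each row is a document, each column is a vocabulary term, and each cell counts how often the term occurs in the document.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up corpus; none of these words are in the English stop-word list
docs = ["space war hero", "space hero hero", "romantic comedy"]

cv_demo = CountVectorizer(stop_words="english")
counts = cv_demo.fit_transform(docs).toarray()

# vocabulary_ maps each term to its column index; sort the terms by that index
terms = sorted(cv_demo.vocabulary_, key=cv_demo.vocabulary_.get)

print(terms)   # ['comedy', 'hero', 'romantic', 'space', 'war']
print(counts)
# [[0 1 0 1 1]
#  [0 2 0 1 0]
#  [1 0 1 0 0]]
```

In the notebook, `max_features=5000` additionally limits the vocabulary to the 5,000 most frequent terms across all tags.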


In [99]:
print(vectors.shape)  # Printing the shape of the 'vectors' matrix: (number of movies, number of features)

similarity = cosine_similarity(vectors)  # Computing the pairwise cosine similarity between all rows of 'vectors'
print(similarity)

# Sorting the similarity scores for the first movie in descending order, skipping the first element (the movie itself), and keeping the top 5 (index, score) pairs
similar_docs = sorted(list(enumerate(similarity[0])), reverse=True, key=lambda x: x[1])[1:6]


(4806, 5000)
[[1.         0.14638501 0.0855921  ... 0.         0.         0.        ]
 [0.14638501 1.         0.0877058  ... 0.03333333 0.         0.03627381]
 [0.0855921  0.0877058  1.         ... 0.02923527 0.         0.        ]
 ...
 [0.         0.03333333 0.02923527 ... 1.         0.04444444 0.02418254]
 [0.         0.         0.         ... 0.04444444 1.         0.09673017]
 [0.         0.03627381 0.         ... 0.02418254 0.09673017 1.        ]]
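
The matrix above is symmetric with ones on the diagonal, which follows from the definition: the cosine similarity of two vectors a and b is their dot product divided by the product of their lengths, so every vector has similarity 1 with itself. A minimal sketch that computes the measure by hand with NumPy, using three small count vectors made up for illustration:

```python
import numpy as np

def cosine(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([0, 1, 0, 1, 1])
b = np.array([0, 2, 0, 1, 0])
c = np.array([1, 0, 1, 0, 0])

print(round(cosine(a, b), 4))  # 0.7746  (3 / sqrt(3 * 5))
print(cosine(a, c))            # 0.0     (no terms in common)
print(round(cosine(a, a), 4))  # 1.0     (a vector compared with itself)
```

Since count vectors contain no negative values, the score always lies between 0 and 1, and a higher score means the two tag strings share a larger proportion of terms.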


In [100]:
# Let's have a look at the data once again
new_df

Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"in the 22nd century, a parapleg marin is dispa..."
1,285,Pirates of the Caribbean: At World's End,"captain barbossa, long believ to be dead, ha c..."
2,206647,Spectre,a cryptic messag from bond’ past send him on a...
3,49026,The Dark Knight Rises,follow the death of district attorney harvey d...
4,49529,John Carter,"john carter is a war-weary, former militari ca..."
...,...,...,...
4804,9367,El Mariachi,el mariachi just want to play hi guitar and ca...
4805,72766,Newlyweds,a newlyw couple' honeymoon is upend by the arr...
4806,231617,"Signed, Sealed, Delivered","""signed, sealed, delivered"" introduc a dedic q..."
4807,126186,Shanghai Calling,when ambiti new york attorney sam is sent to s...


In [101]:
def recommend(movie):
    # Find the row position (not the index label) of the movie: 'similarity' is indexed by position,
    # and the index labels of 'new_df' have gaps because rows were dropped earlier
    matches = np.flatnonzero((new_df['title'] == movie).to_numpy())
    if matches.size == 0:
        print(f"Movie '{movie}' was not found in the dataset")
        return
    movie_index = matches[0]
    
    # Similarity scores between the chosen movie and every movie (higher means more similar)
    distances = similarity[movie_index]
    
    # Sort the scores in descending order, skip the first element (the movie itself), and keep the top 5
    movies_list = sorted(list(enumerate(distances)), reverse=True, key=lambda x: x[1])[1:6]
    
    # Print the title of each recommended movie, using the position stored in 'movies_list'
    for i in movies_list:
        print(new_df.iloc[i[0]].title)


In [102]:
# Let's try it out
recommend('The Dark Knight')

The Dark Knight Rises
Amidst the Devil's Wings
Batman Begins
Nine Queens
Gangster's Paradise: Jerusalema
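
For use outside the notebook it can be more convenient to return the titles instead of printing them. The sketch below is one possible variant under stated assumptions: the function name `top_k_similar`, its parameters, and the toy data are hypothetical. It looks up the movie by row position and excludes the movie explicitly, rather than assuming it is always the first element after sorting, which may not hold when another row has an identical score.

```python
import numpy as np
import pandas as pd

def top_k_similar(titles, similarity, movie, k=5):
    """Return up to k titles most similar to `movie`, excluding the movie itself."""
    positions = np.flatnonzero((titles == movie).to_numpy())
    if positions.size == 0:
        return []  # unknown title
    pos = positions[0]
    # Negate the scores so that argsort yields the highest score first
    order = np.argsort(-similarity[pos], kind="stable")
    return [titles.iloc[i] for i in order if i != pos][:k]

# Toy data: four titles whose index labels have gaps, and a made-up similarity matrix
titles = pd.Series(["A", "B", "C", "D"], index=[0, 2, 3, 5])
similarity_demo = np.array([
    [1.0, 0.2, 0.9, 0.1],
    [0.2, 1.0, 0.3, 0.8],
    [0.9, 0.3, 1.0, 0.4],
    [0.1, 0.8, 0.4, 1.0],
])

print(top_k_similar(titles, similarity_demo, "D", k=2))  # ['B', 'C']
```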


In [103]:
import pickle

In [104]:
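# Saving the DataFrame as a plain dictionary; the 'with' block closes the file once writing is done
with open('movies_dict.pkl', 'wb') as f:
    pickle.dump(new_df.to_dict(), f)

In [105]:
# Saving the DataFrame itself
with open('movies.pkl', 'wb') as f:
    pickle.dump(new_df, f)

In [106]:
# Saving the similarity matrix
with open('similarity.pkl', 'wb') as f:
    pickle.dump(similarity, f)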
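
The notebook does not show how these files are read back. As a rough sketch of the loading side, the block below writes a made-up two-row DataFrame and similarity matrix to a temporary directory and restores them. The file names mirror the ones above; everything else is a stand-in for illustration.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd

# Stand-ins for new_df and similarity
demo_df = pd.DataFrame({"movie_id": [1, 2], "title": ["A", "B"]})
demo_similarity = np.array([[1.0, 0.5], [0.5, 1.0]])

with tempfile.TemporaryDirectory() as tmp:
    dict_path = os.path.join(tmp, "movies_dict.pkl")
    sim_path = os.path.join(tmp, "similarity.pkl")

    # Write the files the same way as in the cells above
    with open(dict_path, "wb") as f:
        pickle.dump(demo_df.to_dict(), f)
    with open(sim_path, "wb") as f:
        pickle.dump(demo_similarity, f)

    # Read them back: the dictionary is turned into a DataFrame again
    with open(dict_path, "rb") as f:
        restored_df = pd.DataFrame(pickle.load(f))
    with open(sim_path, "rb") as f:
        restored_similarity = pickle.load(f)

print(restored_df["title"].tolist())                         # ['A', 'B']
print(np.array_equal(restored_similarity, demo_similarity))  # True
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.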