#**Title and Introduction**

 **Title**

Movie Recommendation System with Content-Based and Collaborative Filtering



**Introduction**


The goal of this project is to build a movie recommendation system that combines content-based and collaborative filtering. The content-based stage describes each movie by its overview, genres, keywords, cast and crew (director, writers and producers), and it suggests the movies whose descriptions are most similar. The movies suggested by this stage then go through a collaborative filtering stage, which refines the suggestions using the ratings of those similar movies.
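The notebook below does not show how the two stages are connected. The sketch here assumes the simplest possible link: the second stage re-ranks the content-based candidates by their rating. This re-ranking is a simplified stand-in, not collaborative filtering proper. The candidate titles, scores and the helper `rerank_by_rating` are hypothetical.

```python
# Hypothetical output of the content-based stage for one query movie:
# (title, cosine similarity to the query, vote_average).
candidates = [
    ("Movie B", 0.62, 6.6),
    ("Movie C", 0.55, 4.2),
    ("Movie D", 0.40, 7.5),
]

def rerank_by_rating(cands, k=2):
    """Simplified second stage: keep the k candidates with the highest rating."""
    ranked = sorted(cands, key=lambda c: c[2], reverse=True)
    return [title for title, _similarity, _rating in ranked[:k]]

print(rerank_by_rating(candidates))  # ['Movie D', 'Movie B']
```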




#**Problem Statement**

**Problem Definition**

This project addresses the problem of building a reliable and effective movie recommendation system. The system aims to offer users personalised movie recommendations based on their prior tastes and on the characteristics of the movies.



**Importance and Real-World Relevance**

On a variety of platforms, such as streaming services and movie recommender services, recommendation systems are crucial for improving customer engagement and experience. The user experience and content consumption can both be considerably improved by a well-designed movie recommendation system.



**Goals and Objectives**

Implement content-based and collaborative filtering recommendation techniques.
Offer accurate and relevant movie suggestions, which may increase user engagement and satisfaction.

#Content based filtering

##Data Collection and Preprocessing

###Data Sources

The dataset used for this project is the TMDB 5000 Movie Dataset from Kaggle. It includes movie metadata such as overviews, genres, keywords and average ratings, together with cast and crew credits.
Dataset link: https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata





In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
import numpy as np
import pandas as pd
import ast  # parses the JSON-like strings stored in several CSV columns
!pip install nltk
import nltk
from nltk.stem.porter import PorterStemmer  # reduces words to their stems
from sklearn.feature_extraction.text import CountVectorizer  # bag-of-words vectors
from sklearn.metrics.pairwise import cosine_similarity



In [None]:
#Reading the datasets
movies = pd.read_csv('/content/drive/MyDrive/kaggle_api/tmdb_5000_movies.csv')
credits = pd.read_csv('/content/drive/MyDrive/kaggle_api/tmdb_5000_credits.csv')

###Data Collection

The data comes from Kaggle as two files, movies and credits. The movies dataset contains 4,803 movies with attributes such as overview, keywords, genres and vote average. The credits dataset contains each movie's ID, title, cast and crew.



In [None]:
#Movies Dataset
movies.head()

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.sonypictures.com/movies/spectre/,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""...",2015-10-26,880674609,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466
3,250000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",http://www.thedarkknightrises.com/,49026,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.31295,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-07-16,1084939099,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106
4,260000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://movies.disney.com/john-carter,49529,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...",en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-03-07,284139100,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124


In [None]:
#Credits dataset
credits.head()

Unnamed: 0,movie_id,title,cast,crew
0,19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,"[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [None]:
#Merge movies and credits dataset based on title
movies.merge(credits,on='title').shape

(4809, 23)

In [None]:
#Save the merged dataset in the movies dataframe
movies = movies.merge(credits, on='title')

In [None]:
movies.head(2)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,movie_id,cast,crew
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,19995,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...",...,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,285,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."


In [None]:
#Select only related columns for the system
movies= movies[['movie_id','vote_average','title','overview','genres','keywords','cast','crew']]

In [None]:
movies.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4809 entries, 0 to 4808
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   movie_id      4809 non-null   int64  
 1   vote_average  4809 non-null   float64
 2   title         4809 non-null   object 
 3   overview      4806 non-null   object 
 4   genres        4809 non-null   object 
 5   keywords      4809 non-null   object 
 6   cast          4809 non-null   object 
 7   crew          4809 non-null   object 
dtypes: float64(1), int64(1), object(6)
memory usage: 338.1+ KB


In [None]:
movies.isnull().sum()

movie_id        0
vote_average    0
title           0
overview        3
genres          0
keywords        0
cast            0
crew            0
dtype: int64

In [None]:
movies.duplicated().sum()

0

###Data Preprocessing

The datasets used were already clean. The movies and credits datasets are merged on the 'title' column into a single dataframe. A few titles are shared by more than one film, and the title-based merge pairs each of those films with every credits row of the same title. This is why the merged dataframe has 4,809 rows instead of 4,803. Only the columns needed by the system are kept for further preprocessing: 'movie_id', 'vote_average', 'title', 'overview', 'genres', 'keywords', 'cast' and 'crew'. The result is stored back in the 'movies' dataframe.
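The effect of merging on a shared title can be reproduced on two tiny hypothetical frames. All IDs, ratings and cast strings here are made up. Matching the numeric IDs (`id` in movies, `movie_id` in credits) avoids the duplication. This is a sketch of an alternative join key, not what the notebook runs.

```python
import pandas as pd

# Hypothetical miniature versions of the two files: two different films
# share the title "Batman" (IDs and values are made up).
toy_movies = pd.DataFrame({
    "id": [1, 2, 3],
    "title": ["Batman", "Batman", "Avatar"],
    "vote_average": [5.0, 6.0, 7.0],
})
toy_credits = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Batman", "Batman", "Avatar"],
    "cast": ["cast of film 1", "cast of film 2", "cast of film 3"],
})

# Merging on title pairs each "Batman" movie row with both "Batman" credits
# rows, giving 2 x 2 = 4 rows for that title (two with the wrong cast).
by_title = toy_movies.merge(toy_credits, on="title")

# Merging on the numeric IDs keeps exactly one row per film.
by_id = toy_movies.merge(toy_credits, left_on="id", right_on="movie_id",
                         suffixes=("", "_credits"))

print(len(by_title), len(by_id))  # 5 3
```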

The duplicate check finds no duplicate rows. The three movies with a missing overview are kept; their overview becomes the single token 'nan' after `str(x).split()`. The columns 'overview', 'genres', 'keywords', 'cast' and 'crew' are combined into a new column named 'tags' for each movie. This column contains the following, stored as a list of tokens:

- the words of the overview
- the genres
- the keywords
- the names of the first 10 listed cast members
- the names of the director, writers and producers

Spaces are removed from multi-word names (for example, 'Sam Worthington' becomes 'SamWorthington') so that each name is treated as a single token. The 'tags' column is used for the content-based filtering method.
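Here is a compact, self-contained sketch of the tag construction on one hypothetical row written in the same JSON-like string format as the TMDB columns. The helper names `names` and `squash` are illustrative and not part of the notebook. The parsing mirrors `my_convert`, `my_convert2` and `my_fetch_crew`.

```python
import ast

# Hypothetical raw values in the same JSON-like string format as the TMDB columns.
raw = {
    "overview": "A paraplegic Marine is dispatched to the moon Pandora.",
    "genres": '[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]',
    "keywords": '[{"name": "culture clash"}, {"name": "alien"}]',
    "cast": '[{"name": "Sam Worthington"}, {"name": "Zoe Saldana"}]',
    "crew": '[{"job": "Director", "name": "James Cameron"}, '
            '{"job": "Editor", "name": "Some Editor"}]',
}

def names(text, jobs=None, limit=None):
    """Parse a JSON-like list of dicts and return the 'name' fields.

    If `jobs` is given, keep only entries whose 'job' is in that set;
    if `limit` is given, keep at most that many names.
    """
    items = ast.literal_eval(text)
    if jobs is not None:
        items = [d for d in items if d.get("job") in jobs]
    return [d["name"] for d in items][:limit]

def squash(tokens):
    """Remove spaces so each multi-word name becomes a single token."""
    return [t.replace(" ", "") for t in tokens]

tags = (
    raw["overview"].split()
    + squash(names(raw["genres"]))
    + squash(names(raw["keywords"]))
    + squash(names(raw["cast"], limit=10))
    + squash(names(raw["crew"], jobs={"Director", "Writer", "Producer"}))
)
print(" ".join(tags).lower())
```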

In [None]:
movies.iloc[0].genres

'[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'

In [None]:
def my_convert(obj):
    """Return the 'name' of every dict in a JSON-like string such as '[{"id": 28, "name": "Action"}, ...]'."""
    return [item['name'] for item in ast.literal_eval(obj)]

###Convert necessary columns to list of strings

In [None]:
movies['genres']=movies['genres'].apply(my_convert)

In [None]:
movies['keywords']=movies['keywords'].apply(my_convert)

In [None]:
movies.head(2)

Unnamed: 0,movie_id,vote_average,title,overview,genres,keywords,cast,crew
0,19995,7.2,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,6.9,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."


In [None]:
def my_convert2(obj):
    """Return the names of the first 10 cast members listed in a JSON-like string."""
    return [item['name'] for item in ast.literal_eval(obj)[:10]]

In [None]:
movies['cast']=movies['cast'].apply(my_convert2)

In [None]:
def my_fetch_crew(obj):
    """Return the names of crew members whose job is Director, Writer or Producer."""
    wanted_jobs = {'Director', 'Writer', 'Producer'}
    return [item['name'] for item in ast.literal_eval(obj) if item['job'] in wanted_jobs]

In [None]:
movies['crew']=movies['crew'].apply(my_fetch_crew)

In [None]:
#Split each overview into words (a missing overview, NaN, becomes ['nan'])
movies['overview']=movies['overview'].apply(lambda x:str(x).split())

In [None]:
movies.dtypes

movie_id          int64
vote_average    float64
title            object
overview         object
genres           object
keywords         object
cast             object
crew             object
dtype: object

In [None]:
#Remove spaces so multi-word genres become one token (e.g. 'Science Fiction' -> 'ScienceFiction')
movies['genres']=movies['genres'].apply(lambda x:[i.replace(' ','') for i in x])

In [None]:
#Do the same for keywords, cast and crew names (e.g. 'Sam Worthington' -> 'SamWorthington')
movies['keywords']=movies['keywords'].apply(lambda x:[i.replace(' ','') for i in x])
movies['cast']=movies['cast'].apply(lambda x:[i.replace(' ','') for i in x])
movies['crew']=movies['crew'].apply(lambda x:[i.replace(' ','') for i in x])

In [None]:
movies.head()

Unnamed: 0,movie_id,vote_average,title,overview,genres,keywords,cast,crew
0,19995,7.2,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin...","[Action, Adventure, Fantasy, ScienceFiction]","[cultureclash, future, spacewar, spacecolony, ...","[SamWorthington, ZoeSaldana, SigourneyWeaver, ...","[JamesCameron, JamesCameron, JamesCameron, Jon..."
1,285,6.9,Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d...","[Adventure, Fantasy, Action]","[ocean, drugabuse, exoticisland, eastindiatrad...","[JohnnyDepp, OrlandoBloom, KeiraKnightley, Ste...","[GoreVerbinski, JerryBruckheimer, EricMcLeod, ..."
2,206647,6.3,Spectre,"[A, cryptic, message, from, Bond’s, past, send...","[Action, Adventure, Crime]","[spy, basedonnovel, secretagent, sequel, mi6, ...","[DanielCraig, ChristophWaltz, LéaSeydoux, Ralp...","[SamMendes, BarbaraBroccoli, MichaelG.Wilson]"
3,49026,7.6,The Dark Knight Rises,"[Following, the, death, of, District, Attorney...","[Action, Crime, Drama, Thriller]","[dccomics, crimefighter, terrorist, secretiden...","[ChristianBale, MichaelCaine, GaryOldman, Anne...","[CharlesRoven, ChristopherNolan, ChristopherNo..."
4,49529,6.1,John Carter,"[John, Carter, is, a, war-weary,, former, mili...","[Action, Adventure, ScienceFiction]","[basedonnovel, mars, medallion, spacetravel, p...","[TaylorKitsch, LynnCollins, SamanthaMorton, Wi...","[AndrewStanton, ColinWilson, JimMorris, Lindse..."


In [None]:
#Create a new column named tags to store all the necessary information about each movie
movies['tags'] = movies['overview'] + movies['genres'] + movies['keywords'] + movies['cast'] + movies['crew']

In [None]:
movies.head()

Unnamed: 0,movie_id,vote_average,title,overview,genres,keywords,cast,crew,tags
0,19995,7.2,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin...","[Action, Adventure, Fantasy, ScienceFiction]","[cultureclash, future, spacewar, spacecolony, ...","[SamWorthington, ZoeSaldana, SigourneyWeaver, ...","[JamesCameron, JamesCameron, JamesCameron, Jon...","[In, the, 22nd, century,, a, paraplegic, Marin..."
1,285,6.9,Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d...","[Adventure, Fantasy, Action]","[ocean, drugabuse, exoticisland, eastindiatrad...","[JohnnyDepp, OrlandoBloom, KeiraKnightley, Ste...","[GoreVerbinski, JerryBruckheimer, EricMcLeod, ...","[Captain, Barbossa,, long, believed, to, be, d..."
2,206647,6.3,Spectre,"[A, cryptic, message, from, Bond’s, past, send...","[Action, Adventure, Crime]","[spy, basedonnovel, secretagent, sequel, mi6, ...","[DanielCraig, ChristophWaltz, LéaSeydoux, Ralp...","[SamMendes, BarbaraBroccoli, MichaelG.Wilson]","[A, cryptic, message, from, Bond’s, past, send..."
3,49026,7.6,The Dark Knight Rises,"[Following, the, death, of, District, Attorney...","[Action, Crime, Drama, Thriller]","[dccomics, crimefighter, terrorist, secretiden...","[ChristianBale, MichaelCaine, GaryOldman, Anne...","[CharlesRoven, ChristopherNolan, ChristopherNo...","[Following, the, death, of, District, Attorney..."
4,49529,6.1,John Carter,"[John, Carter, is, a, war-weary,, former, mili...","[Action, Adventure, ScienceFiction]","[basedonnovel, mars, medallion, spacetravel, p...","[TaylorKitsch, LynnCollins, SamanthaMorton, Wi...","[AndrewStanton, ColinWilson, JimMorris, Lindse...","[John, Carter, is, a, war-weary,, former, mili..."


In [None]:
movies['tags'][0]

['In',
 'the',
 '22nd',
 'century,',
 'a',
 'paraplegic',
 'Marine',
 'is',
 'dispatched',
 'to',
 'the',
 'moon',
 'Pandora',
 'on',
 'a',
 'unique',
 'mission,',
 'but',
 'becomes',
 'torn',
 'between',
 'following',
 'orders',
 'and',
 'protecting',
 'an',
 'alien',
 'civilization.',
 'Action',
 'Adventure',
 'Fantasy',
 'ScienceFiction',
 'cultureclash',
 'future',
 'spacewar',
 'spacecolony',
 'society',
 'spacetravel',
 'futuristic',
 'romance',
 'space',
 'alien',
 'tribe',
 'alienplanet',
 'cgi',
 'marine',
 'soldier',
 'battle',
 'loveaffair',
 'antiwar',
 'powerrelations',
 'mindandsoul',
 '3d',
 'SamWorthington',
 'ZoeSaldana',
 'SigourneyWeaver',
 'StephenLang',
 'MichelleRodriguez',
 'GiovanniRibisi',
 'JoelDavidMoore',
 'CCHPounder',
 'WesStudi',
 'LazAlonso',
 'JamesCameron',
 'JamesCameron',
 'JamesCameron',
 'JonLandau']

In [None]:
#Create a new dataframe with columns 'movie_id','vote_average','title' and 'tags'
#.copy() makes it an independent dataframe rather than a slice of movies
new_df=movies[['movie_id','vote_average','title','tags']].copy()

In [None]:
new_df.head()

Unnamed: 0,movie_id,vote_average,title,tags
0,19995,7.2,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin..."
1,285,6.9,Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d..."
2,206647,6.3,Spectre,"[A, cryptic, message, from, Bond’s, past, send..."
3,49026,7.6,The Dark Knight Rises,"[Following, the, death, of, District, Attorney..."
4,49529,6.1,John Carter,"[John, Carter, is, a, war-weary,, former, mili..."


In [None]:
#Convert tags to string from list
new_df['tags']=new_df['tags'].apply(lambda x:' '.join(x))



In [None]:
new_df.head()

Unnamed: 0,movie_id,vote_average,title,tags
0,19995,7.2,Avatar,"In the 22nd century, a paraplegic Marine is di..."
1,285,6.9,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha..."
2,206647,6.3,Spectre,A cryptic message from Bond’s past sends him o...
3,49026,7.6,The Dark Knight Rises,Following the death of District Attorney Harve...
4,49529,6.1,John Carter,"John Carter is a war-weary, former military ca..."


In [None]:
new_df['tags'][0]

'In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization. Action Adventure Fantasy ScienceFiction cultureclash future spacewar spacecolony society spacetravel futuristic romance space alien tribe alienplanet cgi marine soldier battle loveaffair antiwar powerrelations mindandsoul 3d SamWorthington ZoeSaldana SigourneyWeaver StephenLang MichelleRodriguez GiovanniRibisi JoelDavidMoore CCHPounder WesStudi LazAlonso JamesCameron JamesCameron JamesCameron JonLandau'

In [None]:
#Convert the string to lowercase
new_df['tags']=new_df['tags'].apply(lambda x:x.lower())



In [None]:
new_df.head()

Unnamed: 0,movie_id,vote_average,title,tags
0,19995,7.2,Avatar,"in the 22nd century, a paraplegic marine is di..."
1,285,6.9,Pirates of the Caribbean: At World's End,"captain barbossa, long believed to be dead, ha..."
2,206647,6.3,Spectre,a cryptic message from bond’s past sends him o...
3,49026,7.6,The Dark Knight Rises,following the death of district attorney harve...
4,49529,6.1,John Carter,"john carter is a war-weary, former military ca..."


##Methodology

**Recommendation Methods**

The tags are lowercased and every word is reduced to its stem with the Porter stemmer from the NLTK library, so that different forms of a word share one token (for example, 'following' becomes 'follow' and 'orders' becomes 'order'). CountVectorizer then keeps the 5,000 most frequent words in the tags column, excluding English stop words, as features. Each movie becomes a 5,000-dimensional vector of word counts, and these vectors are stored in the array named vectors. The cosine similarity between every pair of vectors is computed with `cosine_similarity` from the scikit-learn library. The higher the cosine similarity between two movies (equivalently, the smaller the cosine distance, 1 − similarity), the more similar their content. This is the mechanism of the content-based filtering method.
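Here is a minimal pure-Python sketch of the bag-of-words and cosine-similarity steps. The documents and the tiny stop-word set are hypothetical. A plain whitespace split stands in for CountVectorizer's tokenizer, which uses a regular expression and, by default, drops one-character tokens. The 5,000-feature cap is omitted.

```python
import math
from collections import Counter

# Tiny hypothetical 'tags' strings; in the notebook these come from new_df['tags'].
docs = {
    "A": "space alien war jamescameron",
    "B": "space alien romance",
    "C": "pirate ocean romance",
}
STOP_WORDS = {"the", "a", "an", "and"}  # small stand-in for sklearn's English list

def bag_of_words(text):
    """Count each non-stop-word token, like one row of CountVectorizer's output."""
    return Counter(t for t in text.lower().split() if t not in STOP_WORDS)

def cosine(u, v):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

vecs = {name: bag_of_words(text) for name, text in docs.items()}
sim_ab = cosine(vecs["A"], vecs["B"])  # two shared words out of 4 and 3
sim_ac = cosine(vecs["A"], vecs["C"])  # no shared words
print(round(sim_ab, 3), round(sim_ac, 3))           # 0.577 0.0
print("cosine distance A-B:", round(1 - sim_ab, 3))  # 0.423
```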

When a movie title is provided, the system looks up that movie's row of the similarity matrix and returns the 5 movies with the highest cosine similarity to it, that is, the smallest cosine distance. Every movie has a similarity of 1 with itself, so the first entry of this list is normally the provided movie. The remaining entries are the recommendations produced by content-based filtering.
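Here is a small sketch of the lookup step that skips the query's own row, so that only other movies are returned. The 4 × 4 similarity matrix, the titles and the helper `top_k` are hypothetical stand-ins for the notebook's `similarity` array and `recommend_content`.

```python
# Hypothetical 4 x 4 cosine-similarity matrix (symmetric, 1.0 on the diagonal)
# standing in for the notebook's `similarity` array.
toy_titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
toy_similarity = [
    [1.00, 0.62, 0.55, 0.05],
    [0.62, 1.00, 0.48, 0.04],
    [0.55, 0.48, 1.00, 0.03],
    [0.05, 0.04, 0.03, 1.00],
]

def top_k(query, k=2):
    """Return the k titles most similar to `query`, excluding the query's own row."""
    i = toy_titles.index(query)
    scores = toy_similarity[i]
    others = [j for j in range(len(scores)) if j != i]
    # Highest similarity first, i.e. smallest cosine distance (1 - similarity) first.
    others.sort(key=lambda j: scores[j], reverse=True)
    return [toy_titles[j] for j in others[:k]]

print(top_k("Movie A"))  # ['Movie B', 'Movie C']
```

Skipping by row index removes only the query row itself. Rows duplicated by a title-based merge would still appear and would need a separate filter, for example on the title.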



In [None]:
#Bag-of-words model: keep the 5000 most frequent words in tags, ignoring English stop words
cv = CountVectorizer(max_features=5000, stop_words='english')

In [None]:
#Convert each movie's tags into a vector of word counts (one row per movie)
vectors = cv.fit_transform(new_df['tags']).toarray()

In [None]:
vectors.shape

(4809, 5000)

In [None]:
vectors

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

In [None]:
ps=PorterStemmer()

In [None]:
#Define a function that replaces every word of a string with its Porter stem
def stem(text):
    return ' '.join(ps.stem(word) for word in str(text).split())

In [None]:
new_df['tags']=new_df['tags'].apply(stem)



In [None]:
new_df['tags'][0]

'in the 22nd century, a parapleg marin is dispatch to the moon pandora on a uniqu mission, but becom torn between follow order and protect an alien civilization. action adventur fantasi sciencefict cultureclash futur spacewar spacecoloni societi spacetravel futurist romanc space alien tribe alienplanet cgi marin soldier battl loveaffair antiwar powerrel mindandsoul 3d samworthington zoesaldana sigourneyweav stephenlang michellerodriguez giovanniribisi joeldavidmoor cchpounder wesstudi lazalonso jamescameron jamescameron jamescameron jonlandau'

In [None]:
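#Re-fit the vectorizer on the stemmed tags (the vectors above were built before stemming),
#then calculate the cosine similarity between every pair of movie vectors
vectors = cv.fit_transform(new_df['tags']).toarray()
similarity = cosine_similarity(vectors)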

###Content Based Recommender System

In [None]:
#Function for content-based recommendation
def recommend_content(movie):
    # Row index of the first movie with this title
    movie_index = new_df[new_df['title'] == movie].index[0]
    # Cosine similarity of this movie to every movie (including itself)
    scores = similarity[movie_index]
    # Top 5 (index, score) pairs by descending similarity; the first is usually the movie itself
    movies_list = sorted(enumerate(scores), reverse=True, key=lambda x: x[1])[:5]
    title = [new_df.iloc[i].title for i, _ in movies_list]
    rating = [new_df.iloc[i].vote_average for i, _ in movies_list]
    return pd.DataFrame({'title': title, 'rating': rating})

In [None]:
#Pass a movie name through the function and get 5 movies recommended by content
recommended_movies=recommend_content('Batman')
recommended_movies

Unnamed: 0,title,rating
0,Batman,7.0
1,Batman,7.0
2,Batman & Robin,4.2
3,Batman Returns,6.6
4,Batman,6.1


##Results and Discussion
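The system measures how similar the content of two movies is by the cosine similarity of their tag vectors, and it returns the 5 most similar movies. For 'Batman', the list is dominated by other Batman films, as expected. The query title itself appears in the list, and it appears more than once because the dataset contains more than one film with that title and the title-based merge duplicated their rows.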


#Collaborative filtering



##Methodology
This part builds an item similarity matrix by computing the Pearson correlation between the columns of the user_ratings DataFrame constructed below. The resulting item_similarity_df has movie titles as both its rows and its columns, and each cell (i, j) holds the Pearson correlation between movie i and movie j. The code then displays the first 50 rows of this matrix for inspection. An item similarity matrix of this kind is the fundamental component of item-item collaborative filtering.
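Here is a minimal pure-Python sketch of item-item collaborative filtering on an invented user × movie rating matrix, where rows are users and columns are movies. The Pearson correlation is computed between every pair of movie columns, which is the statistic `DataFrame.corr(method='pearson')` computes for each pair of columns. The titles, ratings and helper names are hypothetical.

```python
import math

# Hypothetical ratings: rows are users, columns are movies.
items = ["Movie A", "Movie B", "Movie C"]
ratings = [
    [5, 4, 1],
    [4, 5, 2],
    [1, 2, 5],
    [2, 1, 4],
]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")

columns = list(zip(*ratings))  # one tuple of user ratings per movie
item_similarity = {
    (items[i], items[j]): pearson(columns[i], columns[j])
    for i in range(len(items)) for j in range(len(items))
}

def most_similar(item):
    """The other movie whose ratings correlate most strongly with `item`."""
    return max((o for o in items if o != item), key=lambda o: item_similarity[(item, o)])

print(round(item_similarity[("Movie A", "Movie B")], 2))  # 0.8
print(most_similar("Movie A"))                            # Movie B
```

Users who liked Movie A also liked Movie B, so the two columns correlate positively. A user who enjoyed Movie A would therefore be recommended Movie B.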

In [None]:
import pandas as pd

In [None]:
movies = pd.read_csv('/content/drive/MyDrive/kaggle_api/tmdb_5000_movies.csv')
credits = pd.read_csv('/content/drive/MyDrive/kaggle_api/tmdb_5000_credits.csv')

In [None]:
movies = movies.merge(credits,on='title')

In [None]:
movies.head(100)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,movie_id,cast,crew
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,19995,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...",...,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,285,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.sonypictures.com/movies/spectre/,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...",...,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466,206647,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,250000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",http://www.thedarkknightrises.com/,49026,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.312950,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...",...,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106,49026,"[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,260000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://movies.disney.com/john-carter,49529,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...",en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]",...,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124,49529,"[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,165000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 18, ""...",http://www.interstellarmovie.net/,157336,"[{""id"": 83, ""name"": ""saving the world""}, {""id""...",en,Interstellar,Interstellar chronicles the adventures of a gr...,724.247784,"[{""name"": ""Paramount Pictures"", ""id"": 4}, {""na...",...,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,Mankind was born on Earth. It was never meant ...,Interstellar,8.1,10867,157336,"[{""cast_id"": 9, ""character"": ""Joseph Cooper"", ...","[{""credit_id"": ""52fe4bbf9251416c910e4801"", ""de..."
96,160000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 53, ""nam...",http://inceptionmovie.warnerbros.com/,27205,"[{""id"": 1014, ""name"": ""loss of lover""}, {""id"":...",en,Inception,"Cobb, a skilled thief who commits corporate es...",167.583710,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...",...,148.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Your mind is the scene of the crime.,Inception,8.1,13752,27205,"[{""cast_id"": 1, ""character"": ""Dom Cobb"", ""cred...","[{""credit_id"": ""56e8462cc3a368408400354c"", ""de..."
97,15000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",,315011,"[{""id"": 1299, ""name"": ""monster""}, {""id"": 7671,...",ja,シン・ゴジラ,From the mind behind Evangelion comes a hit la...,9.476999,"[{""name"": ""Cine Bazar"", ""id"": 5896}, {""name"": ...",...,120.0,"[{""iso_639_1"": ""it"", ""name"": ""Italiano""}, {""is...",Released,A god incarnate. A city doomed.,Shin Godzilla,6.5,143,315011,"[{""cast_id"": 4, ""character"": ""Rando Yaguchi : ...","[{""credit_id"": ""5921d321c3a368799b05933f"", ""de..."
98,250000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://www.thehobbit.com/,49051,"[{""id"": 483, ""name"": ""riddle""}, {""id"": 603, ""n...",en,The Hobbit: An Unexpected Journey,"Bilbo Baggins, a hobbit enjoying his quiet lif...",108.849621,"[{""name"": ""WingNut Films"", ""id"": 11}, {""name"":...",...,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,From the smallest beginnings come the greatest...,The Hobbit: An Unexpected Journey,7.0,8297,49051,"[{""cast_id"": 6, ""character"": ""Gandalf"", ""credi...","[{""credit_id"": ""52fe4783c3a36847f8139fa5"", ""de..."


In [None]:
movies.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4809 entries, 0 to 4808
Data columns (total 23 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   budget                4809 non-null   int64  
 1   genres                4809 non-null   object 
 2   homepage              1713 non-null   object 
 3   id                    4809 non-null   int64  
 4   keywords              4809 non-null   object 
 5   original_language     4809 non-null   object 
 6   original_title        4809 non-null   object 
 7   overview              4806 non-null   object 
 8   popularity            4809 non-null   float64
 9   production_companies  4809 non-null   object 
 10  production_countries  4809 non-null   object 
 11  release_date          4808 non-null   object 
 12  revenue               4809 non-null   int64  
 13  runtime               4807 non-null   float64
 14  spoken_languages      4809 non-null   object 
 15  status               

In [None]:
movies.isnull().sum()

budget                     0
genres                     0
homepage                3096
id                         0
keywords                   0
original_language          0
original_title             0
overview                   3
popularity                 0
production_companies       0
production_countries       0
release_date               1
revenue                    0
runtime                    2
spoken_languages           0
status                     0
tagline                  844
title                      0
vote_average               0
vote_count                 0
movie_id                   0
cast                       0
crew                       0
dtype: int64

In [None]:
movies.duplicated().sum()

0

In [None]:
movies= movies[['movie_id','vote_average','title']]

###Data Preprocessing

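The code below reshapes the movie data into a pivot table. Its rows are movie IDs, its columns are movie titles, and its cells hold the average vote (vote_average); where several rows share a movie ID and title, pivot_table takes their mean. Columns containing only NaNs are then dropped, and the remaining NaNs are replaced with zeros. The TMDB files contain one aggregate vote_average per movie rather than ratings from individual users. As a result, the rows of this table are movies rather than users, and each column has only one non-zero cell (or a few, for duplicated titles).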
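The consequence of this table structure for the Pearson step can be checked on three hypothetical movies. Each title column contains a single non-zero value, and every pair of such columns gets the same correlation. The rows, titles and ratings below are invented; the pivot, dropna and fillna calls mirror the notebook.

```python
import pandas as pd

# Hypothetical rows shaped like movies[['movie_id', 'vote_average', 'title']].
toy = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "vote_average": [7.2, 6.9, 6.3],
    "title": ["Movie A", "Movie B", "Movie C"],
})

# Same reshaping as the notebook: rows = movie_id, columns = title.
table = toy.pivot_table(index="movie_id", columns="title", values="vote_average")
table = table.dropna(thresh=1, axis=1).fillna(0)

# Each title column has exactly one non-zero cell (the movie's own vote_average).
nonzero_per_column = (table != 0).sum().tolist()

# Pearson correlation between title columns, as in item_similarity_df.
corr = table.corr(method="pearson")
print(nonzero_per_column)  # [1, 1, 1]
print(corr.round(2))       # 1.0 on the diagonal, -0.5 everywhere else
```

With n rows, any two columns that each hold a single non-zero value have a Pearson correlation of exactly −1/(n − 1), whatever the ratings are. The matrix therefore cannot tell movies apart. Genuine item-item collaborative filtering needs ratings from individual users.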

In [None]:
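# Rows: movie_id, columns: title, cells: vote_average
user_ratings = movies.pivot_table(index='movie_id', columns='title', values='vote_average')
# Drop columns that are entirely NaN, then replace the remaining NaNs with 0
user_ratings = user_ratings.dropna(thresh=1, axis=1).fillna(0)
user_ratings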

title,#Horror,(500) Days of Summer,10 Cloverfield Lane,10 Days in a Madhouse,10 Things I Hate About You,102 Dalmatians,10th & Wolf,11:14,12 Angry Men,12 Rounds,...,Zoolander,Zoolander 2,Zoom,Zulu,[REC],[REC]²,eXistenZ,xXx,xXx: State of the Union,Æon Flux
movie_id
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
11,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
13,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
14,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
426067,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
426469,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
433715,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
447027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
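The pivot, drop, and fill steps used for `user_ratings` can be tried on a small table. The sketch below assumes a hypothetical long-format ratings table with the columns `user_id`, `title`, and `rating` (none of these come from the TMDB data). With one row per user rather than per movie, the same three calls produce the user-by-movie layout that item-based collaborative filtering usually works on.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, movie) pair
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "title":   ["Batman", "Zoom", "Batman", "Zulu", "Zoom"],
    "rating":  [5.0, 2.0, 4.0, 3.0, 1.0],
})

# Rows = users, columns = titles (sorted), cells = rating (mean if a pair repeats)
matrix = ratings.pivot_table(index="user_id", columns="title", values="rating")

# Drop columns with no ratings at all, then treat "not rated" as 0
matrix = matrix.dropna(thresh=1, axis=1).fillna(0)
print(matrix)
```

Here several rows have more than one non-zero cell, which is what gives the later correlation step something to compare.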


In [None]:
# Pearson correlation between every pair of title columns
item_similarity_df = user_ratings.corr(method='pearson')
item_similarity_df.head(50)

title,#Horror,(500) Days of Summer,10 Cloverfield Lane,10 Days in a Madhouse,10 Things I Hate About You,102 Dalmatians,10th & Wolf,11:14,12 Angry Men,12 Rounds,...,Zoolander,Zoolander 2,Zoom,Zulu,[REC],[REC]²,eXistenZ,xXx,xXx: State of the Union,Æon Flux
title
#Horror,1.0,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,...,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208
(500) Days of Summer,-0.000208,1.0,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,...,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208
10 Cloverfield Lane,-0.000208,-0.000208,1.0,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,...,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208
10 Days in a Madhouse,-0.000208,-0.000208,-0.000208,1.0,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,...,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208
10 Things I Hate About You,-0.000208,-0.000208,-0.000208,-0.000208,1.0,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,...,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208
102 Dalmatians,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,1.0,-0.000208,-0.000208,-0.000208,-0.000208,...,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208
10th & Wolf,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,1.0,-0.000208,-0.000208,-0.000208,...,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208
11:14,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,1.0,-0.000208,-0.000208,...,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208
12 Angry Men,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,1.0,-0.000208,...,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208
12 Rounds,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,1.0,...,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208,-0.000208
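The constant off-diagonal value of -0.000208 follows from the shape of `user_ratings`. Each title column holds a single non-zero entry and zeros elsewhere, and for two such columns of length n (non-zero in different rows) the Pearson correlation is -1/(n-1), independent of the actual ratings. With the 4809 rows reported by `movies.info()`, that is -1/4808, about -0.000208. A small check, where `one_hot_corr` is a hypothetical helper and the two ratings are arbitrary:

```python
import numpy as np
import pandas as pd


def one_hot_corr(n: int) -> float:
    """Pearson correlation between two length-n columns that each hold a
    single positive rating in different rows (all other entries are 0)."""
    a = np.zeros(n)
    b = np.zeros(n)
    a[0] = 7.2  # arbitrary positive ratings: the result does not depend on them
    b[1] = 5.9
    return pd.Series(a).corr(pd.Series(b))  # Series.corr defaults to Pearson


n_rows = 4809  # number of movie_id rows reported by movies.info()
print(one_hot_corr(n_rows), -1 / (n_rows - 1))  # both values are about -0.000208
```

Titles that occur under two `movie_id`s have two non-zero entries, which is consistent with the slightly larger magnitudes (for example -0.001483) in the ranked output further down.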


In [None]:
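def get_similar_movies(movie_name, rating):
    # Weight the movie's correlation column by the user's rating, centred on 2.5
    # so that high ratings raise and low ratings lower correlated movies' scores
    similar_score = item_similarity_df[movie_name] * (rating - 2.5)
    return similar_score.sort_values(ascending=False)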

In [None]:
action_lover = [("Batman", 5), ("102 Dalmatians", 4)]

# DataFrame.append was removed in pandas 2.0, so collect one weighted-score
# Series per rated movie and build the DataFrame in a single step
similar_movies = pd.DataFrame(
    [get_similar_movies(movie, rating) for movie, rating in action_lover]
)

# Sum the scores across the rated movies and rank all titles
similar_movies.sum().sort_values(ascending=False)



title
Batman                           2.499558
102 Dalmatians                   1.499264
Light from the Darkroom          0.000000
Hum To Mohabbat Karega           0.000000
House at the End of the Drive    0.000000
                                   ...   
The Interpreter                 -0.001049
Star Trek: The Motion Picture   -0.001049
War of the Worlds               -0.001049
Out of the Blue                 -0.001483
The Host                        -0.001483
Length: 4800, dtype: float64

##Results and Discussion
The system computes the Pearson correlation between the rating columns of every pair of movies, weights each correlation by the user's rating of a movie they have already rated (offset by 2.5), and ranks all movies by the summed score.
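The same weighting-and-summing procedure can be sketched end to end on a toy matrix where users, not movie IDs, are the rows. The ratings, user labels, and the helper `weighted_similarity` (which mirrors `get_similar_movies`) are all hypothetical; the 0-5 scale is assumed so that 2.5 is its midpoint.

```python
import pandas as pd

# Hypothetical user x movie ratings on a 0-5 scale (0 = not rated)
ratings = pd.DataFrame(
    {
        "Batman":         [5, 4, 1, 0, 4],
        "Batman Returns": [4, 5, 2, 1, 0],
        "102 Dalmatians": [1, 0, 5, 4, 1],
        "101 Dalmatians": [0, 1, 4, 5, 2],
    },
    index=["u1", "u2", "u3", "u4", "u5"],
)

# Item-item Pearson correlation between the movie columns
item_similarity = ratings.corr(method="pearson")


def weighted_similarity(movie_name: str, rating: float) -> pd.Series:
    # Centre the rating on 2.5: liked movies push correlated titles up,
    # disliked movies push them down
    return item_similarity[movie_name] * (rating - 2.5)


# A new user who loved "Batman" and disliked "102 Dalmatians"
new_user = [("Batman", 5), ("102 Dalmatians", 1)]

# One row of weighted scores per rated movie, summed per title
scores = pd.DataFrame([weighted_similarity(m, r) for m, r in new_user]).sum()

# Drop the movies the user has already rated, then rank the rest
recommendations = scores.drop([m for m, _ in new_user]).sort_values(ascending=False)
print(recommendations)
```

Unlike the notebook output, this toy version removes already-rated titles before ranking, so the seed movies do not top the list.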


#Conclusion
Recommendation systems are now used across many platforms to help people find items that match their taste. Here, the system successfully recommends 5 movies to a user by analyzing the content of a movie.

However, the system could be developed further by applying collaborative filtering to the movie ratings. A hybrid method that combines content-based and collaborative filtering could make the recommender more effective.

#Code Repository
GitHub Link- https://github.com/anika072/AI-Project-CSE366

#Acknowledgments
We would like to express our gratitude to our course instructor, Dr. Mohammad Rifat Ahmmad Rashid, for his guidance throughout the project.

#Student Details
Anika Tabassum Nafisa

ID: 2019-3-60-072

E-mail: anikatnafisa1999@gmail.com

D.M. Rafiun Bin Masud

ID: 2019-3-60-137

E-mail: dmrafiun@gmail.com

#Streamlit App

In [None]:
import pickle

In [None]:
# Save the movie table as a plain dict for the Streamlit app
with open('movies_dct.pkl', 'wb') as f:
    pickle.dump(movies.to_dict(), f)

In [None]:
# Save the similarity matrix computed in the content-based section
with open('similarity.pkl', 'wb') as f:
    pickle.dump(similarity, f)
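The app code itself is not shown in this notebook. Assuming the app reads `movies_dct.pkl` back with `pickle.load`, the sketch below illustrates how the saved dictionary converts back into a DataFrame. It uses a temporary directory and a tiny stand-in table built from the first three `movie_id`/`title` pairs shown in the outputs below; it is not the project's app.

```python
import os
import pickle
import tempfile

import pandas as pd

# Stand-in for the movies table (first three ids and titles from the outputs below)
movies = pd.DataFrame({
    "movie_id": [19995, 285, 206647],
    "title": ["Avatar", "Pirates of the Caribbean: At World's End", "Spectre"],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "movies_dct.pkl")

    # Notebook side: save the table as a plain {column: {row: value}} dict
    with open(path, "wb") as f:
        pickle.dump(movies.to_dict(), f)

    # App side: load the dict and rebuild the DataFrame
    with open(path, "rb") as f:
        restored = pd.DataFrame(pickle.load(f))

print(restored["title"].values)
```

Storing a plain dict rather than the DataFrame object keeps the pickle readable without depending on the exact pandas version used in the notebook, although the app still needs pandas to rebuild the table.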

In [None]:
movies['title'].values

array(['Avatar', "Pirates of the Caribbean: At World's End", 'Spectre',
       ..., 'Signed, Sealed, Delivered', 'Shanghai Calling',
       'My Date with Drew'], dtype=object)

In [None]:
movies.to_dict()

{'movie_id': {0: 19995,
  1: 285,
  2: 206647,
  3: 49026,
  4: 49529,
  5: 559,
  6: 38757,
  7: 99861,
  8: 767,
  9: 209112,
  10: 1452,
  11: 10764,
  12: 58,
  13: 57201,
  14: 49521,
  15: 2454,
  16: 24428,
  17: 1865,
  18: 41154,
  19: 122917,
  20: 1930,
  21: 20662,
  22: 57158,
  23: 2268,
  24: 254,
  25: 597,
  26: 271110,
  27: 44833,
  28: 135397,
  29: 37724,
  30: 558,
  31: 68721,
  32: 12155,
  33: 36668,
  34: 62211,
  35: 8373,
  36: 91314,
  37: 68728,
  38: 102382,
  39: 20526,
  40: 49013,
  41: 44912,
  42: 10193,
  43: 534,
  44: 168259,
  45: 72190,
  46: 127585,
  47: 54138,
  48: 81005,
  49: 64682,
  50: 9543,
  51: 68726,
  52: 38356,
  53: 217,
  54: 105864,
  55: 62177,
  56: 188927,
  57: 10681,
  58: 5174,
  59: 14161,
  60: 17979,
  61: 76757,
  62: 258489,
  63: 411,
  64: 246655,
  65: 155,
  66: 14160,
  67: 15512,
  68: 1726,
  69: 44826,
  70: 8487,
  71: 1735,
  72: 297761,
  73: 2698,
  74: 137113,
  75: 9804,
  76: 14869,
  77: 150540,
  78: