## Movie Recommender System

This notebook builds a content-based movie recommender system. The data comes from the TMDB movie metadata dataset on Kaggle: https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata?select=tmdb_5000_credits.csv. The end goal is an end-to-end system that anyone can use.

### Importing datasets and Preprocessing

In [1]:
# Import libraries
import numpy as np 
import pandas as pd 

In [2]:
# Reading in the initial datasets
movies = pd.read_csv('tmdb_5000_movies.csv')
credits = pd.read_csv('tmdb_5000_credits.csv')

In [3]:
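# Join the credits (cast and crew) onto the movie metadata by title.
# Titles are not guaranteed to be unique, so films that share a title can produce extra rows.
movies = movies.merge(credits, on='title')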

In [4]:
# Removing unnecessary columns, then dropping rows with missing values
movies = movies[['movie_id','title','overview','genres','keywords','cast','crew']]
movies.dropna(inplace=True)

In [5]:
movies.head(1)

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."


In [6]:
# Need to convert the genres, keywords, cast, crew columns to usable data
import ast 
def keywords(text):
    '''Returns the 'name' of every entry in a stringified list of dicts.
    Used for both the genres and the keywords columns.
    '''
    return [i['name'] for i in ast.literal_eval(text)]

def cast(text):
    '''Returns the first 3 actors listed for the movie
    '''
    return [i['name'] for i in ast.literal_eval(text)[:3]]

def director(text):
    '''Returns the names of the crew members whose job is Director
    '''
    return [i['name'] for i in ast.literal_eval(text) if i['job'] == 'Director']


In [7]:
movies['genres'] = movies['genres'].apply(keywords)
movies['keywords'] = movies['keywords'].apply(keywords)

In [8]:
movies['cast'] = movies['cast'].apply(cast)
movies['crew'] = movies['crew'].apply(director)
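
To make the parsing step above concrete, here is a minimal sketch of what `ast.literal_eval` does to one raw cell. The two sample strings are made up for illustration (they only mimic the shape of the `genres` and `crew` columns), and the names in the crew sample are placeholders rather than real credits.

```python
import ast

# Hypothetical raw cells, shaped like the stringified lists in the dataset.
raw_genres = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'
raw_crew = '[{"job": "Director", "name": "Some Director"}, {"job": "Editor", "name": "Some Editor"}]'

# literal_eval turns the string into a real Python list of dicts.
parsed_genres = ast.literal_eval(raw_genres)
genre_names = [entry["name"] for entry in parsed_genres]

# For the crew, keep only the entries whose job is exactly "Director".
director_names = [
    member["name"]
    for member in ast.literal_eval(raw_crew)
    if member["job"] == "Director"
]

print(genre_names)     # ['Action', 'Adventure']
print(director_names)  # ['Some Director']
```

One caveat: `ast.literal_eval` only understands Python literals, so a cell containing JSON-only tokens such as `null` or `true` would raise an error; `json.loads` would be the alternative in that case.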

In [9]:
movies['overview'] = movies['overview'].apply(lambda x:x.split())

In [10]:
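# Remove spaces inside each name so it becomes a single token
# (e.g. 'Sam Worthington' -> 'SamWorthington')
movies['cast'] = movies['cast'].apply(lambda x:[i.replace(" ","") for i in x])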
movies['crew'] = movies['crew'].apply(lambda x:[i.replace(" ","") for i in x])
movies['genres'] = movies['genres'].apply(lambda x:[i.replace(" ","") for i in x])
movies['keywords'] = movies['keywords'].apply(lambda x:[i.replace(" ","") for i in x])

In [11]:
movies.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin...","[Action, Adventure, Fantasy, ScienceFiction]","[cultureclash, future, spacewar, spacecolony, ...","[SamWorthington, ZoeSaldana, SigourneyWeaver]",[JamesCameron]
1,285,Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d...","[Adventure, Fantasy, Action]","[ocean, drugabuse, exoticisland, eastindiatrad...","[JohnnyDepp, OrlandoBloom, KeiraKnightley]",[GoreVerbinski]
2,206647,Spectre,"[A, cryptic, message, from, Bond’s, past, send...","[Action, Adventure, Crime]","[spy, basedonnovel, secretagent, sequel, mi6, ...","[DanielCraig, ChristophWaltz, LéaSeydoux]",[SamMendes]
3,49026,The Dark Knight Rises,"[Following, the, death, of, District, Attorney...","[Action, Crime, Drama, Thriller]","[dccomics, crimefighter, terrorist, secretiden...","[ChristianBale, MichaelCaine, GaryOldman]",[ChristopherNolan]
4,49529,John Carter,"[John, Carter, is, a, war-weary,, former, mili...","[Action, Adventure, ScienceFiction]","[basedonnovel, mars, medallion, spacetravel, p...","[TaylorKitsch, LynnCollins, SamanthaMorton]",[AndrewStanton]


In [12]:
import nltk
from nltk.stem.porter import PorterStemmer
ps = PorterStemmer()

In [13]:
# Concatenating all the preprocessed columns into one 
movies['tags'] = movies['overview'] + movies['genres'] + movies['keywords'] + movies['cast'] + movies['crew']
movies_new = movies.drop(columns=['overview','genres','keywords','cast','crew'])
movies_new['tags'] = movies_new['tags'].apply(lambda x: " ".join(x))
movies_new['tags'] = movies_new['tags'].apply(lambda x:x.lower())

In [15]:
def stem(text):
    '''Reduces every word in the text to its Porter stem
    '''
    return " ".join(ps.stem(i) for i in text.split())

In [16]:
movies_new['tags'] = movies_new['tags'].apply(stem)

In [17]:
movies_new['tags'][0]

'in the 22nd century, a parapleg marin is dispatch to the moon pandora on a uniqu mission, but becom torn between follow order and protect an alien civilization. action adventur fantasi sciencefict cultureclash futur spacewar spacecoloni societi spacetravel futurist romanc space alien tribe alienplanet cgi marin soldier battl loveaffair antiwar powerrel mindandsoul 3d samworthington zoesaldana sigourneyweav jamescameron'

### Algorithm Building

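Next, I turn each movie's tags into a vector using bag-of-words text vectorization. After English stop words are removed, each of the 5000 most frequent words becomes one column, and each entry counts how often that word appears in a movie's tags.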

In [18]:
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer(max_features=5000, stop_words='english')

In [19]:
vectors = cv.fit_transform(movies_new['tags']).toarray()
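
The bag-of-words idea used above can be sketched by hand with the standard library. This is only an illustration on three made-up tag strings, not a reimplementation of `CountVectorizer`, which additionally lowercases the text, applies its own tokenizer, drops stop words and caps the vocabulary at `max_features`.

```python
from collections import Counter

# Toy tag strings, made up for illustration (not rows from the dataset).
toy_tags = [
    "action space alien",
    "action crime drama",
    "space alien alien",
]

# Vocabulary: every distinct word, sorted, one vector column per word.
vocab = sorted({word for tag in toy_tags for word in tag.split()})

def to_vector(tag):
    """Count how often each vocabulary word occurs in one tag string."""
    counts = Counter(tag.split())
    return [counts[word] for word in vocab]

toy_vectors = [to_vector(tag) for tag in toy_tags]

print(vocab)  # ['action', 'alien', 'crime', 'drama', 'space']
for vector in toy_vectors:
    print(vector)
```

Each row has one slot per vocabulary word, so movies that share many words end up with vectors that point in similar directions.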

In [20]:
from sklearn.metrics.pairwise import cosine_similarity

In [21]:
similarities = cosine_similarity(vectors)
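
For intuition, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below computes it by hand on small made-up count vectors; the function name `cosine` and the choice to return 0.0 for an all-zero vector are assumptions of this sketch, not part of the notebook.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption of this sketch: an empty vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)

# Made-up word-count vectors for three movies.
movie_a = [1, 1, 0, 0, 1]
movie_b = [1, 0, 1, 1, 0]
movie_c = [0, 2, 0, 0, 1]

print(round(cosine(movie_a, movie_b), 3))  # 0.333 (one shared word)
print(round(cosine(movie_a, movie_c), 3))  # 0.775 (more overlap)
```

Because the counts are never negative, the score always lies between 0 (no words in common) and 1 (same direction), and a movie compared with itself scores 1.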

In [25]:
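def recommend(movie):
    '''Prints the 5 movies whose tags are most similar to the given title
    '''
    # Use the row position rather than the index label: dropna() may have left
    # gaps in the index, while similarities is indexed by position
    movie_index = np.flatnonzero((movies_new['title'] == movie).to_numpy())[0]
    distances = similarities[movie_index]
    # Sort by similarity, highest first, and skip the first entry (normally the movie itself)
    movies_list = sorted(list(enumerate(distances)), reverse=True, key=lambda x:x[1])[1:6]
    
    for i in movies_list:
        print(movies_new.iloc[i[0]].title)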
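
The ranking step inside `recommend` can be sketched on its own with a made-up row of similarity scores. The names `scores`, `query`, `top_by_slice` and `top_by_filter` are hypothetical and exist only in this sketch.

```python
# Made-up similarity scores of the movie at position 0 against five movies.
scores = [1.0, 0.2, 0.7, 0.0, 0.5]
query = 0

# Pair each score with its position, then sort by score, highest first.
ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)

# Approach used in recommend: assume the query ranks first and slice it off.
top_by_slice = [index for index, _ in ranked[1:4]]

# Alternative: drop the query by position, which also works when another
# movie has exactly the same score as the query.
top_by_filter = [index for index, _ in ranked if index != query][:3]

print(top_by_slice)   # [2, 4, 1]
print(top_by_filter)  # [2, 4, 1]
```

The two approaches agree here. They can differ when another row has identical tags, since a tie for the top score means the query is not guaranteed to sit in the first position.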

In [27]:
recommend('Batman Begins')

The Dark Knight
Batman
Batman
The Dark Knight Rises
10th & Wolf
