# Movie Recommendation System Project - MVP

The goal of my project is to create a content-based movie recommendation system using Wikipedia movie plots. 

After trying different vectorizers and models, I settled on a TF-IDF vectorizer and NMF (non-negative matrix factorization) topic modeling with 30 topics. I then used cosine similarity between the movies' topic weights to find movies with similar topic profiles.
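To make the similarity step concrete, here is a minimal sketch of cosine similarity applied to topic profiles. The three movies and their topic weights are made up for illustration and are not taken from the dataset.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical topic weights for three movies over three topics
# (rows = movies, columns = topics); the values are made up for illustration.
topic_profiles = np.array([
    [0.9, 0.1, 0.0],   # movie A: mostly topic 0
    [0.8, 0.2, 0.0],   # movie B: close to movie A
    [0.0, 0.1, 0.9],   # movie C: mostly topic 2
])

# Pairwise cosine similarity: entry [i, j] compares movie i with movie j.
sim = cosine_similarity(topic_profiles)

print(sim.round(3))
```

Movies A and B point in nearly the same direction in topic space, so their score is close to 1, while A and C share almost no topic weight and score close to 0.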

See below for the first iteration of my recommendation system, which takes in a movie and outputs the top 10 most similar movies based on their topic profiles.

Next, I hope to add filtering by movie origin/language to the recommendation system.
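One possible way to approach that filter is sketched here on toy data. The `Origin` column name, the `recommend_by_origin` helper, and the topic weights are all assumptions for illustration; the sketch also assumes that titles in the index are unique.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins: 'Origin' is a hypothetical column name, and the topic
# weights are made up for illustration.
toy_df = pd.DataFrame(
    {"Origin": ["American", "Bollywood", "American", "British"]},
    index=["Movie A 2001", "Movie B 2002", "Movie C 2003", "Movie D 2004"],
)
toy_topics = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.7, 0.3],
    [0.1, 0.9],
])
toy_sim = cosine_similarity(toy_topics)

def recommend_by_origin(title, origin, df=toy_df, sim=toy_sim, n=10):
    """Return up to n titles most similar to `title`, restricted to `origin`."""
    pos = df.index.get_loc(title)                 # row position of the query movie
    scores = pd.Series(sim[pos], index=df.index)  # similarity to every movie
    scores = scores.drop(title)                   # never recommend the query itself
    scores = scores[df["Origin"].drop(title) == origin]  # keep matching origin only
    return scores.sort_values(ascending=False).head(n).index.tolist()

print(recommend_by_origin("Movie A 2001", "American"))
```

Filtering after scoring keeps a single similarity matrix for the whole catalogue, so the same matrix can serve any origin or language filter.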

### Importing packages

In [1]:
import numpy as np 
import pandas as pd 

from sklearn.feature_extraction import text
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from sklearn.metrics.pairwise import cosine_similarity

### Reading the CSV

In [2]:
movie_df = pd.read_csv('movie_model.csv')

In [3]:
#Making movie title with year the index
movie_df.set_index('Title year', inplace=True)

## Creating the recommendation system 
### 1. Creating the cosine similarity matrix for the function

In [5]:
#Creating TF-IDF vectors from the preprocessed plot text
#(stop_words is passed as a list, which recent scikit-learn versions require)
my_stop_words = list(text.ENGLISH_STOP_WORDS.union(['film', 'wa', 'ha', 'asks', 'say', 'tell', 'live']))
vectorizer = TfidfVectorizer(stop_words=my_stop_words)
doc_word = vectorizer.fit_transform(movie_df['Plot modeling'])

#Creating topics from vectors 
nmf_model = NMF(n_components=30, random_state=10, max_iter=1000)
doc_topic = nmf_model.fit_transform(doc_word)

#Creating matrix with the degree to which movies belong to different topics 
doc_topic_nmf = pd.DataFrame(doc_topic.round(5),index = movie_df.index)

#Creating matrix with similarity between movies
cosine_sim = cosine_similarity(doc_topic_nmf, doc_topic_nmf)



### 2. Creating the recommendation system function

In [6]:
# creating a Series that maps each row position to its movie title (with year),
# used to look up a movie's row in the cosine similarity matrix
indices = pd.Series(movie_df.index)

# defining the function that takes in a movie title with year as input and returns the top 10 recommended movies
def recommendations(title, cosine_sim = cosine_sim):

    # getting the row position of the movie that matches the title (and year)
    idx = indices[indices == title].index[0]

    # creating a Series with the similarity scores in descending order,
    # dropping the input movie itself so it is never recommended
    score_series = pd.Series(cosine_sim[idx]).drop(idx).sort_values(ascending = False)

    # getting the row positions of the 10 most similar movies
    top_10_indexes = score_series.iloc[:10].index

    # returning the titles of the 10 best-matching movies
    return indices.iloc[top_10_indexes].tolist()

### 3. Testing the recommendation system

In [7]:
recommendations('The Godfather 1972')

['Crooked House 2017',
 'Crooked House 2017',
 'Addams Family Values 1993',
 'The Godfather Part II 1974',
 'The Godfather Part III 1990',
 'Gotti 1996',
 'Sin of a Family 2011',
 'Chor Police 1983',
 'The Romanovs: A Crowned Family 2000',
 'Hungry Hill 1947']
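'Crooked House 2017' appears twice in this list, which may mean the same title occurs more than once in the index. A minimal sketch of how to check for and drop repeated index entries is shown here on a toy frame; the plot strings are placeholders, and whether `movie_df` really contains duplicate rows is an assumption that would need checking.

```python
import pandas as pd

# Toy frame with one repeated title, mimicking the suspected situation.
toy_df = pd.DataFrame(
    {"Plot modeling": ["plot a", "plot a", "plot b"]},
    index=pd.Index(
        ["Crooked House 2017", "Crooked House 2017", "Gotti 1996"],
        name="Title year",
    ),
)

# Titles that occur more than once in the index.
dupes = toy_df.index[toy_df.index.duplicated()].tolist()
print(dupes)

# Keep only the first row for each title.
deduped = toy_df[~toy_df.index.duplicated(keep="first")]
print(deduped.index.tolist())
```

Deduplication would need to happen before vectorizing, so that the rows of the similarity matrix still line up with the index.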

In [8]:
recommendations('The Notebook 2004')

["Let's Live a Little 1948",
 'Humko Tumse Pyaar Hai 2006',
 'Rhythm in the Air 1936',
 'More Than Blue 2009',
 'Stonewall 1995',
 'Hell Is Sold Out 1951',
 'And Now Tomorrow 1944',
 'Chandni 1989',
 'Pyar ka Punchnama 2011',
 'Moulin Rouge 1952']