# MOVIE RECOMMENDATION SYSTEM

Netflix, one of the most popular movie streaming sites, uses a recommendation system to keep its users looping through an endless cycle of movies: watching one action movie usually leads to another action movie of a similar genre.

The question here is: how can a movie best be recommended to a Netflix user based on its genre?

In [1]:
# Import required libraries
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from fuzzywuzzy import process
from sklearn.neighbors import NearestNeighbors



In [2]:
# Dataset from Kaggle
path = 'netflix_titles.csv'
# Load only the needed columns into the 'movies' DataFrame, reading all of them as strings
movies = pd.read_csv(path, usecols=['show_id', 'title', 'cast', 'listed_in'], dtype=str)
# Add a sequential integer id (0 to n-1) that matches each row's position
movies['new_id'] = range(movies.shape[0])
movies = movies[['show_id', 'title', 'cast', 'listed_in', 'new_id']]
# Replace missing values (e.g. an empty cast) with empty strings
movies = movies.fillna('')

In [3]:
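# Clean the genre strings before vectorising them
movies['listed_in'] = (
    movies['listed_in']
    .str.replace('&', '', regex=False)            # drop ampersands, e.g. 'Action & Adventure'
    .str.replace(',', '', regex=False)            # drop the commas separating genres
    .str.replace('Sci-Fi', 'SciFi', regex=False)  # keep 'Sci-Fi' as one token instead of 'sci' + 'fi'
    .str.replace('TV', '', regex=False)           # 'TV Dramas' -> ' Dramas'
    .str.replace('Movies', '', regex=False)       # 'Horror Movies' -> 'Horror '
)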
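
To see what the cleaning changes, the sketch below tokenizes a few genre strings before and after the same replacements. The three strings and the helper name clean_genres are illustrative assumptions, not rows taken from the dataset; build_analyzer() is the scikit-learn method that returns the tokenizer TfidfVectorizer applies internally (lowercasing, words of two or more characters).

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Illustrative genre strings written in the style of the 'listed_in' column
sample = pd.Series([
    'Action & Adventure, Sci-Fi & Fantasy',
    'International TV Shows, TV Dramas',
    'Horror Movies, Independent Movies',
])

def clean_genres(s: pd.Series) -> pd.Series:
    """Hypothetical helper applying the same replacements as the cleaning cell."""
    return (s.str.replace('&', '', regex=False)
             .str.replace(',', '', regex=False)
             .str.replace('Sci-Fi', 'SciFi', regex=False)
             .str.replace('TV', '', regex=False)
             .str.replace('Movies', '', regex=False))

# build_analyzer() returns the callable TfidfVectorizer uses to split text into tokens
analyze = TfidfVectorizer().build_analyzer()
before = [analyze(g) for g in sample]
after = [analyze(g) for g in clean_genres(sample)]
for b, a in zip(before, after):
    print(b, '->', a)
```

Without the cleaning, 'Sci-Fi' would split into the tokens 'sci' and 'fi', and the frequent 'tv' and 'movies' tokens would add similarity that says little about the actual genre.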

Next, TfidfVectorizer converts the 'listed_in' text column into numerical vectors, because the nearest-neighbors model used below can only compare numbers, not raw text.
TF-IDF stands for Term Frequency-Inverse Document Frequency. With default settings, it creates one feature per distinct word in the 'listed_in' column. A word's weight in a given row increases with the number of times the word appears in that row and decreases as the word appears in more rows, so genres shared by many titles count for less than rarer ones.
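
As a sketch of how those weights come out, the snippet below recomputes scikit-learn's default TF-IDF by hand on three made-up genre strings and checks it against TfidfVectorizer. With the defaults (smooth_idf=True, norm='l2'), the inverse document frequency is idf(t) = ln((1 + n) / (1 + df(t))) + 1, each entry is count × idf, and every row is then scaled to unit length. The strings and the helper manual_tfidf are illustrative only.

```python
import math
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up genre strings (not taken from the Netflix data)
docs = ['Dramas Thrillers', 'Dramas Comedies', 'Comedies Comedies Romantic']

tfv = TfidfVectorizer()  # defaults: smooth_idf=True, sublinear_tf=False, norm='l2'
X = tfv.fit_transform(docs).toarray()
vocab = tfv.vocabulary_  # maps each word to its column index

def manual_tfidf(doc_idx):
    """Recompute one row of X with scikit-learn's default TF-IDF formula."""
    n = len(docs)
    tokens = docs[doc_idx].lower().split()
    row = np.zeros(len(vocab))
    for word, col in vocab.items():
        tf = tokens.count(word)                            # raw count in this document
        df = sum(word in d.lower().split() for d in docs)  # documents containing the word
        idf = math.log((1 + n) / (1 + df)) + 1             # smoothed inverse document frequency
        row[col] = tf * idf
    return row / np.linalg.norm(row)                       # L2-normalise the row

print(sorted(vocab))
for i, doc in enumerate(docs):
    print(doc, '->', np.round(X[i], 3))
```

In the first row 'thrillers' outweighs 'dramas' because it occurs in fewer rows, while in the third row 'comedies' outweighs 'romantic' because it occurs twice.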

In [4]:
tfv=TfidfVectorizer()

In [5]:
tfv_matrix=tfv.fit_transform(movies['listed_in'])

In [6]:
model_knn=NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=10)

In [7]:
model_knn.fit(tfv_matrix)

NearestNeighbors(algorithm='brute', metric='cosine', n_neighbors=10)
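
With metric='cosine', NearestNeighbors ranks rows by cosine distance (1 minus cosine similarity), and algorithm='brute' compares the query with every row. The sketch below checks this on four made-up genre strings by repeating the brute-force search with sklearn.metrics.pairwise.cosine_similarity; the data is illustrative, not from the dataset.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.neighbors import NearestNeighbors

# Made-up genre strings standing in for the cleaned 'listed_in' column
genres = ['Dramas Thrillers', 'Dramas Romantic', 'Comedies Romantic', 'Thrillers Horror']
X = TfidfVectorizer().fit_transform(genres)

knn = NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=3).fit(X)
distances, indices = knn.kneighbors(X[0])  # 3 nearest rows to 'Dramas Thrillers'

# Brute force by hand: cosine distance from the query to every row, then sort
manual = 1 - cosine_similarity(X[0], X).ravel()
manual_order = np.argsort(manual)[:3]

print(indices[0], np.round(distances[0], 3))
print(manual_order, np.round(manual[manual_order], 3))
```

The query row comes back as its own nearest neighbour at distance 0, which is why a recommender built on kneighbors has to filter out the searched title.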

In [8]:
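def recommender(n_recommendations=10):
    """Ask for a title, confirm the fuzzy match, then print titles with similar genres."""
    while True:
        name = input('Enter name of movie: ')
        # With a Series as choices, extractOne returns (title, score, index label)
        idx = process.extractOne(name, movies['title'])[2]
        print('Did you mean?: ', movies['title'][idx])
        print('Options: Yes/No/Quit')
        choice = input('> ').strip().lower()
        if choice == 'yes':
            print('Searching for similar movies....')
            # Request one extra neighbour because the chosen title is normally among them
            distances, indices = model_knn.kneighbors(
                tfv_matrix[idx], n_neighbors=n_recommendations + 1)
            # kneighbors returns 2-D arrays (one row per query); drop the chosen title itself
            similar = [i for i in indices[0] if i != idx][:n_recommendations]
            print(movies['title'].iloc[similar])
        elif choice == 'no':
            print('Sorry, movie not available. Please enter another movie name!')
        elif choice == 'quit':
            print('Recommender Exited')
            break
        else:
            # Unrecognised answer: ask again instead of exiting
            print('I do not understand. Please answer Yes, No or Quit.')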

In [9]:
recommender()

Enter name of movie: Blood and Water
Did you mean?:  Blood & Water
Options: Yes/No/Quit
> Yes
Searching for similar movies....
4030      Disappearance
4985        Tabula Rasa
3755               Jinn
1880        To the Lake
637                 Ray
699               Katla
703            The Gift
4741           Switched
225      Open Your Eyes
2999    The Ghost Bride
Name: title, dtype: object
Enter name of movie: up
Did you mean?:  Grown Ups
Options: Yes/No/Quit
> QUIT
Recommender Exited
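
Because recommender() reads from input(), it is awkward to test. Below is a hedged sketch of a non-interactive variant under several assumptions: the function name recommend, the five-title catalogue and its genres are made up for illustration, and the standard library's difflib.get_close_matches is swapped in for fuzzywuzzy's process.extractOne, so match scores will differ from fuzzywuzzy's.

```python
import difflib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Hypothetical mini catalogue; genres are already cleaned as in the notebook
catalogue = pd.DataFrame({
    'title': ['Night Shift', 'Dark Harbor', 'Laugh Track', 'Love Letters', 'Ocean Deep'],
    'genres': ['Crime Dramas Thrillers', 'Crime Thrillers', 'Comedies',
               'Comedies Romantic', 'Documentaries'],
})

def recommend(query, catalogue, k=5):
    """Return (matched title, up to k titles with the most similar genres)."""
    titles = catalogue['title'].tolist()
    lowered = [t.lower() for t in titles]
    # difflib stands in for fuzzywuzzy's process.extractOne in this sketch
    match = difflib.get_close_matches(query.lower(), lowered, n=1, cutoff=0.0)
    if not match:
        return None, []
    idx = lowered.index(match[0])
    X = TfidfVectorizer().fit_transform(catalogue['genres'])
    knn = NearestNeighbors(metric='cosine', algorithm='brute').fit(X)
    # One extra neighbour, because the matched title is normally returned as well
    n = min(k + 1, len(titles))
    _, indices = knn.kneighbors(X[idx], n_neighbors=n)
    picks = [titles[i] for i in indices[0] if i != idx][:k]
    return titles[idx], picks

print(recommend('night shft', catalogue, k=1))
print(recommend('lov leters', catalogue, k=1))
```

Refitting the vectorizer on every call keeps the sketch self-contained; for the full dataset, fitting the TF-IDF matrix and the nearest-neighbours model once and reusing them, as the notebook does with tfv_matrix and model_knn, is the better choice.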
