# Quick Movie Recommender
See how QuickRecommender can be used to build a simple movie recommender.
In this notebook, I've loaded my custom version of MovieLens, selected a random subset (due to the memory limit), and applied a simple TF-IDF vectorization to the titles, overviews, cast lists and genres of the movies, with LSA and normalization on top. The result is a dense matrix containing all features. This matrix is fed into QuickRecommender, which recommends movies randomly at first and then recommends more relevant items as you keep selecting movies you like.
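For readers without `output.csv`, here is a miniature of the per-field pipeline described above, run on a made-up four-document corpus. It is a sketch, not the notebook's exact configuration: it uses 2 LSA components instead of 128 so that the tiny vocabulary can support the decomposition. It shows the two properties the rest of the notebook relies on: every document becomes a short dense vector, and every row has unit length.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import Normalizer

# Made-up "overviews"; the notebook uses the real MovieLens fields instead
docs = [
    "pirates sail the caribbean sea",
    "a pirate captain hunts treasure at sea",
    "an animated chameleon in a desert town",
    "a desert town needs a new sheriff",
]

pipe = Pipeline([
    ("tfidfvectorizer", TfidfVectorizer()),  # sparse TF-IDF term weights
    ("lsa", TruncatedSVD(n_components=2, algorithm="arpack", random_state=0)),  # 2 instead of 128
    ("normalizer", Normalizer()),  # rescale each row to unit L2 norm
])
X_toy = pipe.fit_transform(docs)

print(X_toy.shape)                    # (4, 2): one dense 2-d vector per document
print(np.linalg.norm(X_toy, axis=1))  # each row has unit length
```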

## Import dependencies & movies dataset

In [1]:
from quickrecommender import QuickRecommender
import pandas as pd
import numpy as np

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD as LSA
from sklearn.preprocessing import Normalizer

In [2]:
movie_db = pd.read_csv('output.csv')
movie_db = movie_db.sample(frac=0.6198).reset_index(drop=True)
len(movie_db.index)

20000

We'll be continuing with 20000 movies.
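The `frac=0.6198` above works out to 20000 rows only for this particular file. A more direct alternative, assuming the goal is exactly 20000 movies, is to pass `n=` to `DataFrame.sample`. Adding `random_state` also makes the subset (and therefore every movie index printed below) reproducible. The original cell sets no seed, so each rerun picks a different subset. The sketch uses a toy DataFrame and draws 20 rows:

```python
import pandas as pd

# Toy stand-in for movie_db (the notebook reads output.csv instead)
movie_db = pd.DataFrame({"title": ["Movie {}".format(i) for i in range(100)]})

# n= draws an exact number of rows (20000 in the notebook);
# random_state fixes the subset so reruns keep the same movie indices
subset = movie_db.sample(n=20, random_state=0).reset_index(drop=True)
print(len(subset.index))  # 20
```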

In [3]:
title_corpus = movie_db['title'].astype(str).values.tolist()
description_corpus = movie_db['desc'].astype(str).values.tolist()
cast_corpus = movie_db['cast'].astype(str).values.tolist()
genres_corpus = movie_db['genres'].astype(str).values.tolist()
keywords_corpus = movie_db['keywords'].astype(str).values.tolist()

## Vectorization, dim-reduction and normalization

In [4]:
pipe_sm = Pipeline([
    ('tfidfvectorizer', TfidfVectorizer()),
    ('lsa', LSA(n_components=16, algorithm='arpack', tol=1e-10, random_state=0)),
    ('normalizer', Normalizer())])
pipe_lg = Pipeline([
    ('tfidfvectorizer', TfidfVectorizer()),
    ('lsa', LSA(n_components=128, algorithm='arpack', tol=1e-10, random_state=0)),
    ('normalizer', Normalizer())])

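# Fit a separate TF-IDF -> LSA -> Normalizer pipeline on each text field.
# fit_transform refits every step each time, so reusing pipe_lg is safe.
X_titles = pipe_lg.fit_transform(title_corpus)
X_desc = pipe_lg.fit_transform(description_corpus)
X_cast = pipe_lg.fit_transform(cast_corpus)
X_keywords = pipe_lg.fit_transform(keywords_corpus)  # computed but not used in X below
X_genres = pipe_sm.fit_transform(genres_corpus)  # genres use the smaller 16-component pipeline

# Concatenate the per-field blocks and re-normalize each row to unit length
X = Normalizer().fit_transform(np.concatenate((X_titles, X_desc, X_cast, X_genres), axis=1))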
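In the cell above, each field's block is already unit-normalized by its own pipeline (apart from any all-zero rows). Re-normalizing the concatenation therefore gives every field equal weight: the cosine similarity between two movies equals the average of their per-field cosines, whether the block has 16 or 128 LSA components. Here is a quick numeric check with two random stand-in blocks (not the notebook's data):

```python
import numpy as np
from sklearn.preprocessing import Normalizer

rng = np.random.default_rng(0)
# Two random stand-in blocks for 3 items, e.g. a 4-dim "titles" block and a 2-dim "genres" block
A = Normalizer().fit_transform(rng.normal(size=(3, 4)))
B = Normalizer().fit_transform(rng.normal(size=(3, 2)))

# Same construction as the notebook: concatenate, then re-normalize rows
X = Normalizer().fit_transform(np.concatenate((A, B), axis=1))

print(np.linalg.norm(X, axis=1))  # every row has unit length again

# Cosine of the combined rows equals the mean of the per-block cosines
combined = X[0] @ X[1]
per_block_mean = (A[0] @ A[1] + B[0] @ B[1]) / 2
print(np.isclose(combined, per_block_mean))  # True
```

With the notebook's four blocks, each field contributes one quarter of the final similarity. To weight a field differently, one option is to scale its block before concatenating.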

## Fitting QuickRecommender
Let's start with a 20-nearest-neighbors graph. More neighbors make the learning quicker, possibly at the cost of worse results.
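QuickRecommender builds its neighbor graph internally, and this notebook doesn't show how. As a stand-in, here is a k-nearest-neighbors graph built with scikit-learn's `NearestNeighbors` on unit-normalized random vectors. On unit vectors, Euclidean distance ranks neighbors in the same order as cosine similarity. When you query the fitted data itself, each item comes back as its own first neighbor (distance 0). This is likely why the cell further down slices `neighbors[movie_idx, 1:6]`.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import Normalizer

rng = np.random.default_rng(0)
X = Normalizer().fit_transform(rng.normal(size=(100, 8)))  # 100 unit-length stand-in items

nn = NearestNeighbors(n_neighbors=6).fit(X)
# Passing X explicitly means each item is included in its own neighbor list
dist, neighbors = nn.kneighbors(X)

print(neighbors.shape)            # (100, 6)
print(neighbors[3, 0])            # 3: column 0 is the item itself
top5_of_item_3 = neighbors[3, 1:6]  # the 5 most similar *other* items
```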

In [5]:
qr = QuickRecommender(n_neighbors=20)
qr.fit(X)

## Time for some recommendations
The first recommendations are completely random, so you can search the movies first and select your favorites to get meaningful recommendations on the first try.
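The exact update rule QuickRecommender uses is not shown here. The following is a hypothetical toy that illustrates how a neighbor graph can turn random picks into relevant ones; `toy_update` and `toy_recommend` are made-up names, not the library's API. Every item starts with equal weight, so sampling is uniform. Each selection then adds weight to the selected item's neighbors, making similar items more likely to be recommended.

```python
import numpy as np

# HYPOTHETICAL toy, not QuickRecommender's actual algorithm.

def toy_update(weights, neighbors, selections, boost=1.0):
    """Return new weights with `boost` added to the neighbors of each selected item."""
    weights = weights.copy()
    for idx in selections:
        # column 0 is the item itself, so only its neighbors get boosted
        weights[neighbors[idx, 1:]] += boost
    return weights

def toy_recommend(weights, n, rng):
    """Sample n distinct items with probability proportional to their weights."""
    p = weights / weights.sum()
    return rng.choice(len(weights), size=n, replace=False, p=p)

n_items = 10
# Hypothetical neighbor table: row i is item i followed by its 2 "nearest" items
neighbors = np.array([[i, (i + 1) % n_items, (i + 2) % n_items] for i in range(n_items)])

rng = np.random.default_rng(0)
weights = np.ones(n_items)             # uniform weights: first picks are random
print(toy_recommend(weights, 3, rng))

weights = toy_update(weights, neighbors, selections=[0])
print(weights)                         # items 1 and 2 now have weight 2.0, the rest 1.0
print(toy_recommend(weights, 3, rng))  # items 1 and 2 are now twice as likely to be drawn
```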

In [6]:
# MovieDB cast search: case-sensitive substring match on each cast list, first 20 hits
query = input("Search cast: ")
["{}: {}".format(i, title_corpus[i]) for i in range(len(title_corpus)) if query in cast_corpus[i]][:20]

Search cast: Depp


['6100: Lost in La Mancha',
 '6421: Fear and Loathing in Las Vegas',
 '6487: Secret Window',
 '6644: Fantastic Beasts and Where to Find Them',
 '7845: The Tourist',
 '8098: Dead Man',
 '8313: Transcendence',
 '8499: Charlie and the Chocolate Factory',
 '8512: Cry-Baby',
 '8715: Donnie Brasco',
 '9020: The Ninth Gate',
 '9964: Pirates of the Caribbean: On Stranger Tides',
 '10039: Buy the Ticket, Take the Ride',
 '11157: Chocolat',
 '12490: Public Enemies',
 '12751: Mortdecai',
 '13170: Made for Each Other',
 '13220: Rango',
 '14380: Charlie: The Life and Art of Charles Chaplin',
 '14604: Private Resort']

In [7]:
# MovieDB title search: case-sensitive substring match on each title, first 20 hits
query = input("Search movies: ")
["{}: {}".format(i, title_corpus[i]) for i in range(len(title_corpus)) if query in title_corpus[i]][:20]

Search movies: car


['269: Running Scared',
 '458: Scooby-Doo! Camp Scare',
 '589: The Scarecrow',
 '1467: Madagascar Skin',
 '1471: Scary Movie 3',
 '1547: Dr. Syn, Alias the Scarecrow',
 '1755: Scared Shrekless',
 "2155: The First Annual 'On Cinema' Oscar Special",
 '2480: Dimenticare Palermo',
 '2495: Sylvia Scarlett',
 '2779: Apex: The Story of the Hypercar',
 "2989: Ricardo O'Farrill: Abrazo Genial",
 '3497: Sicario',
 '4292: Maggie Simpson in The Longest Daycare',
 '4334: Vivien Leigh: Scarlett And Beyond',
 '4499: The Scarlet Clue',
 '5103: The Year of Living Vicariously',
 '5147: Madly Madagascar',
 '5259: Lisa Picard Is Famous',
 '5285: Bullet Scars']
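The title search above is a case-sensitive substring match. That is why "car" also returns titles such as "Scared Shrekless" and "Madagascar Skin". Below is a hypothetical helper (`search` is our own name, not part of QuickRecommender) that matches whole words case-insensitively, shown on made-up titles:

```python
import re

def search(corpus, titles, query, limit=20):
    """Return 'index: title' for entries whose text contains `query` as a whole word, ignoring case."""
    pattern = re.compile(r"\b{}\b".format(re.escape(query)), re.IGNORECASE)
    hits = ["{}: {}".format(i, titles[i]) for i, text in enumerate(corpus) if pattern.search(text)]
    return hits[:limit]

# Made-up titles for illustration
titles = ["Running Scared", "Cars", "The Car", "Car Wash", "Madagascar Skin"]
print(search(titles, titles, "car"))  # ['2: The Car', '3: Car Wash']
```

In the notebook it would be called as `search(title_corpus, title_corpus, "car")` for titles, or `search(cast_corpus, title_corpus, "Depp")` for cast members.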

In [8]:
selections = [9964, 12751, 13220, 8715, 8313, 1755, 5147]
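for movie_idx in selections:
    print("Most similar items to {} are:".format(title_corpus[movie_idx]))
    # Skip column 0 (the movie itself) and show its next 5 neighbors in the graph
    for idx in list(qr.get_nn_graph().neighbors[movie_idx, 1:6]):
        print("    {}: {}".format(idx, title_corpus[idx]))
# Build a user profile from the selected movies
my_user = qr.update(selections=selections)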

Most similar items to Pirates of the Caribbean: On Stranger Tides are:
    18740: Pirates of the Caribbean: Dead Man's Chest
    17737: Pirates of the Caribbean: At World's End
    660: The Mummy: Tomb of the Dragon Emperor
    3246: The Wisdom of Crocodiles
    14044: The Brothers Grimm
Most similar items to Mortdecai are:
    10157: Amelia
    14894: Spymate
    15815: Spectre
    16993: Casanova
    849: Centurion
Most similar items to Rango are:
    7504: Paddington
    6534: Hugo
    12986: Andre
    12968: Trolls
    12136: MouseHunt
Most similar items to Donnie Brasco are:
    11533: Subconscious Cruelty
    317: Perfect Sisters
    3758: The Big I Am
    6713: Money for Nothing
    19106: Nick of Time
Most similar items to Transcendence are:
    19131: Elysium
    16146: Blackhat
    397: Residue
    14664: Taboo
    8274: Lunopolis
Most similar items to Scared Shrekless are:
    6085: Ernest Scared Stupid
    9456: Now You See Him, Now You Don't
    16617: Please Don't Eat the

In [9]:
recomms = qr.recommend(my_user, n_recommendations=20)
for movie_idx in list(recomms):
    print("{} : {} -- {}".format(movie_idx, title_corpus[movie_idx], genres_corpus[movie_idx]))

2743 : Spies of Warsaw -- Action Adventure Drama
18735 : Arthur Christmas -- Drama Animation Family Comedy
577 : Dimensions -- Drama ScienceFiction
18135 : Animal Kingdom -- Drama Thriller Crime
10157 : Amelia -- Adventure
9456 : Now You See Him, Now You Don't -- Comedy Family
1310 : Harry Potter and the Goblet of Fire -- Adventure Fantasy Family
6596 : Once Upon a Time in America -- Drama Crime
317 : Perfect Sisters -- Thriller Drama Crime
15247 : Madagascar 3: Europe's Most Wanted -- Animation Family
14708 : Don Verdean -- Comedy
9768 : Gus -- Comedy Family
4513 : Hitman -- Action Crime Drama Thriller
12636 : A Christmas Wish -- Family Drama Comedy
2927 : Jesse Stone: Sea Change -- TVMovie Drama Thriller Crime
6085 : Ernest Scared Stupid -- Horror Comedy Family
8274 : Lunopolis -- Thriller ScienceFiction Mystery
17737 : Pirates of the Caribbean: At World's End -- Adventure Fantasy Action
7666 : Limitless -- Thriller Mystery ScienceFiction
3375 : Christmas Mail -- Comedy Family


In [10]:
my_user = qr.update(my_user, selections=[15247, 17737])

In [11]:
recomms = qr.recommend(my_user, n_recommendations=20)
for movie_idx in list(recomms):
    print("{} : {} -- {}".format(movie_idx, title_corpus[movie_idx], genres_corpus[movie_idx]))

10907 : Advantageous -- ScienceFiction Drama Family
4577 : Utu -- War Adventure Drama History
15247 : Madagascar 3: Europe's Most Wanted -- Animation Family
14708 : Don Verdean -- Comedy
2217 : Shoot on Sight -- Crime Thriller Drama
16475 : In the Heart of the Sea -- Thriller Drama Adventure Action History
4773 : Grabbers -- ScienceFiction Comedy Thriller Horror
7570 : Malaya -- Adventure Drama
6596 : Once Upon a Time in America -- Drama Crime
5989 : Coraline -- Animation Family
8715 : Donnie Brasco -- Crime Drama Thriller
5843 : Kalamity -- Mystery Thriller
9456 : Now You See Him, Now You Don't -- Comedy Family
660 : The Mummy: Tomb of the Dragon Emperor -- Adventure Action Fantasy
865 : Marley & Me -- Comedy Family
18776 : Sgt. Bilko -- Comedy Family
8731 : Tangled Ever After -- Animation Comedy Action Family
8274 : Lunopolis -- Thriller ScienceFiction Mystery
14894 : Spymate -- Adventure Comedy
3193 : Jesse Stone: Night Passage -- TVMovie Drama Thriller Crime
