# Movie Recommendation System
This project builds a movie recommendation system from the movie data available here:

https://www.kaggle.com/rounakbanik/movie-recommender-systems/data

We build the system using the following approaches:
1. Simple rating-based
2. Content-based filtering
3. User-based collaborative filtering
4. Item-based collaborative filtering
5. Hybrid recommendation system that uses content-based filtering and user-based collaborative filtering

#### Load and explore the data

In [280]:
import pandas as pd
import numpy as np
import seaborn as sns
%matplotlib inline


In [281]:
# read movie metadata
movie_md = pd.read_csv('./data/movies_metadata.csv', low_memory=False)


In [282]:
df = movie_md.copy()
df.shape


(45466, 24)

In [283]:
df.describe()

| | revenue | runtime | vote_average | vote_count |
|---|---|---|---|---|
| count | 45460.0 | 45203.0 | 45460.0 | 45460.0 |
| mean | 11209350.0 | 94.128199 | 5.618207 | 109.897338 |
| std | 64332250.0 | 38.40781 | 1.924216 | 491.310374 |
| min | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 85.0 | 5.0 | 3.0 |
| 50% | 0.0 | 95.0 | 6.0 | 10.0 |
| 75% | 0.0 | 107.0 | 6.8 | 34.0 |
| max | 2787965000.0 | 1256.0 | 10.0 | 14075.0 |


In [284]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45466 entries, 0 to 45465
Data columns (total 24 columns):
adult                    45466 non-null object
belongs_to_collection    4494 non-null object
budget                   45466 non-null object
genres                   45466 non-null object
homepage                 7782 non-null object
id                       45466 non-null object
imdb_id                  45449 non-null object
original_language        45455 non-null object
original_title           45466 non-null object
overview                 44512 non-null object
popularity               45461 non-null object
poster_path              45080 non-null object
production_companies     45463 non-null object
production_countries     45463 non-null object
release_date             45379 non-null object
revenue                  45460 non-null float64
runtime                  45203 non-null float64
spoken_languages         45460 non-null object
status                   45379 non-null object

In [285]:
df.head(2)

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,release_date,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,1995-10-30,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,1995-12-15,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0


## Rating-based recommendation

In [286]:
from ast import literal_eval
def process_genres(df):
    '''
    parse the stringified genres column with literal_eval
    and keep only the genre names
    '''
    df['genres'] = df['genres'].fillna('[]').apply(literal_eval)
    # extract the 'name' value of genres
    df['genres'] = df['genres'].apply(lambda x: [i['name'] for i in x])
    
print('before processing genres: \n', df['genres'][0])
process_genres(df)
print('after processing genres: \n', df['genres'][0])


before processing genres: 
 [{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]
after processing genres: 
 ['Animation', 'Comedy', 'Family']
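
The parsing step can be tried in isolation on a single cell. The sketch below is illustrative only: `genre_names` is a hypothetical helper, not part of the notebook, and it assumes a missing cell should become an empty list, mirroring the `fillna('[]')` call.

```python
from ast import literal_eval

# A raw `genres` cell as stored in the CSV: a string, not a Python list
raw = "[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]"

def genre_names(cell):
    """Return the genre names from a stringified list of dicts (hypothetical helper)."""
    # Anything that is not a string (e.g. a float NaN) is treated as "no genres"
    parsed = literal_eval(cell) if isinstance(cell, str) else []
    return [g['name'] for g in parsed]

names = genre_names(raw)
missing = genre_names(float('nan'))
print(names)
print(missing)
```

`literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for text read from a file.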


In [287]:
# process release_date to keep the year only
# (missing dates are float NaN, not the string 'NaN', so test with pd.notna)
df['year'] = df['release_date'].apply(lambda x: str(x)[:4] if pd.notna(x) else np.nan)
print('the release_date column:')
print(df['release_date'].head())
print()
print('the year column:')
print(df['year'].head())

the release_date column:
0    1995-10-30
1    1995-12-15
2    1995-12-22
3    1995-12-22
4    1995-02-10
Name: release_date, dtype: object

the year column:
0    1995
1    1995
2    1995
3    1995
4    1995
Name: year, dtype: object
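
One detail worth checking when slicing the year out of `release_date`: pandas reads a missing date as a float NaN, not as the string `'NaN'`, so a string comparison does not detect it. The standard-library sketch below illustrates the pitfall; `extract_year` is a hypothetical helper and returning `None` for missing dates is an assumption.

```python
import math

def extract_year(release_date):
    """Return the 4-character year prefix, or None when the date is missing."""
    # A missing value arrives as float('nan'), which never equals the string 'NaN'
    if isinstance(release_date, float) and math.isnan(release_date):
        return None
    return str(release_date)[:4]

# What plain slicing does to a missing value: it yields a bogus "year"
naive = str(float('nan'))[:4]
years = [extract_year(d) for d in ['1995-10-30', float('nan')]]
print(repr(naive), years)
```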


In [288]:
def process_vote(df, vote_count_cutoff_percentile=0.95):
    '''
    calculate a weighted rating instead of the raw rating
    weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C

    Where: 
    R = average rating for the movie (vote_average)
    v = number of votes for the movie (vote_count)
    m = minimum votes required to be listed in the Top Rated list (default: 95th percentile of vote_count)
    C = the mean vote across the whole dataset
    '''
    # work on an explicit copy so the assignments below modify our own frame
    # instead of triggering pandas' SettingWithCopyWarning
    df = df.dropna(subset=['vote_count', 'vote_average']).copy()
    df['vote_count'] = df['vote_count'].astype('int')
    # note: the int cast truncates the average (e.g. 7.7 becomes 7)
    df['vote_average'] = df['vote_average'].astype('int')
    
    mean_vote_average = df['vote_average'].mean()
    vote_count_cutoff = df['vote_count'].quantile(vote_count_cutoff_percentile)
    # keep only movies with at least m votes
    df = df.loc[df['vote_count'] >= vote_count_cutoff].copy()
    df['weighted_rating'] = (df.vote_average * df.vote_count/(df.vote_count + vote_count_cutoff)) + \
                            (mean_vote_average * vote_count_cutoff/(df.vote_count + vote_count_cutoff))
    df = df.sort_values('weighted_rating', ascending=False)
    return df
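
To see how the weighted rating formula behaves, it helps to plug in small numbers. The values below are hypothetical, chosen only to make the arithmetic easy to follow; they are not taken from the dataset.

```python
def weighted_rating(R, v, m, C):
    """WR = v/(v+m) * R + m/(v+m) * C, the formula from the docstring."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# Hypothetical values for illustration
C = 5.0   # mean vote across all movies
m = 400   # vote-count cutoff

# Many votes: the result stays close to the movie's own average R
popular = weighted_rating(R=8.0, v=3600, m=m, C=C)
# Exactly m votes: the result lands halfway between R and C
niche = weighted_rating(R=8.0, v=400, m=m, C=C)
print(round(popular, 2), round(niche, 2))
```

The fewer votes a movie has, the more its score is pulled toward the global mean `C`, which keeps movies with a handful of perfect votes from topping the chart.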

In [289]:
df = process_vote(df)





In [290]:
FEATURES = ['title', 'year', 'vote_count', 'vote_average', 'popularity', \
            'genres', 'weighted_rating', 'tagline', 'overview', 'id']
df = df[FEATURES]


In [291]:
links = pd.read_csv('./data/links_small.csv')
df['id'] = df['id'].astype(int)
df = df.merge(links, left_on='id', right_on='tmdbId')
df.shape

(2005, 13)

### Filter top rated movies by genres

In [292]:

def get_tops_by_genres(df, *genres, intersect=True, top=10):
    if not genres:
        return df.head(top)
    elif not intersect:
        return df[df['genres'].apply(lambda x: not set(genres).isdisjoint(x))].head(top)
    else:
        return df[df['genres'].apply(lambda x: set(genres).issubset(x))].head(top)
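
The two filtering modes come down to two set operations. The following sketch uses a small hypothetical catalogue (plain dict, not the notebook's DataFrame) to show the difference between requiring every requested genre and requiring at least one.

```python
# Hypothetical toy catalogue: title -> list of genres
movies = {
    'Movie A': ['Animation', 'Comedy', 'Family'],
    'Movie B': ['Adventure', 'Fantasy', 'Family'],
    'Movie C': ['Action', 'Crime', 'Drama'],
}
wanted = {'Family', 'Animation'}

# intersect=True: the movie must carry every requested genre
match_all = [t for t, g in movies.items() if wanted.issubset(g)]
# intersect=False: the movie must carry at least one requested genre
match_any = [t for t, g in movies.items() if not wanted.isdisjoint(g)]
print(match_all)
print(match_any)
```

Because `intersect` defaults to `True`, calling the function with and without `intersect=True` returns the same rows; passing `intersect=False` is what switches to the "any genre" behaviour.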

In [293]:
get_tops_by_genres(df, 'Family').head()

Unnamed: 0,title,year,vote_count,vote_average,popularity,genres,weighted_rating,tagline,overview,id,movieId,imdbId,tmdbId
11,Back to the Future,1985,6239,8,25.778509,"[Adventure, Comedy, Science Fiction, Family]",7.820813,He's the only kid ever to get into trouble bef...,Eighties teenager Marty McFly is accidentally ...,105,1270,88763,105.0
16,The Lion King,1994,5520,8,21.605761,"[Family, Animation, Drama]",7.799175,Life's greatest adventure is finding your plac...,A young lion cub named Simba can't wait to be ...,8587,364,110357,8587.0
26,Spirited Away,2001,3968,8,41.048867,"[Fantasy, Adventure, Animation, Family]",7.72837,The tunnel led Chihiro to a mysterious town...,A ten year old girl who wanders away from her ...,129,5618,245429,129.0
49,My Neighbor Totoro,1988,1730,8,13.507299,"[Fantasy, Animation, Family]",7.447452,These strange creatures still exist in Japan. ...,Two sisters move to the country with their fat...,8392,5971,96283,8392.0
56,It's a Wonderful Life,1946,1103,8,15.031588,"[Drama, Family, Fantasy]",7.222046,It's a wonderful laugh! It's a wonderful love!,George Bailey has spent his entire life giving...,1585,953,38650,1585.0


In [294]:
get_tops_by_genres(df, 'Family', 'Animation').head()

Unnamed: 0,title,year,vote_count,vote_average,popularity,genres,weighted_rating,tagline,overview,id,movieId,imdbId,tmdbId
16,The Lion King,1994,5520,8,21.605761,"[Family, Animation, Drama]",7.799175,Life's greatest adventure is finding your plac...,A young lion cub named Simba can't wait to be ...,8587,364,110357,8587.0
26,Spirited Away,2001,3968,8,41.048867,"[Fantasy, Adventure, Animation, Family]",7.72837,The tunnel led Chihiro to a mysterious town...,A ten year old girl who wanders away from her ...,129,5618,245429,129.0
49,My Neighbor Totoro,1988,1730,8,13.507299,"[Fantasy, Animation, Family]",7.447452,These strange creatures still exist in Japan. ...,Two sisters move to the country with their fat...,8392,5971,96283,8392.0
63,Paperman,2012,734,8,7.198633,"[Animation, Family, Romance]",6.976272,"Delicate, charming and sweet.",An urban office worker finds that paper airpla...,140420,98491,2388725,140420.0
83,Up,2009,7048,7,19.330884,"[Animation, Comedy, Family, Adventure]",6.898194,,Carl Fredricksen spent his entire life dreamin...,14160,68954,1049413,14160.0


In [295]:
get_tops_by_genres(df, 'Family', 'Animation', intersect=True).head()

Unnamed: 0,title,year,vote_count,vote_average,popularity,genres,weighted_rating,tagline,overview,id,movieId,imdbId,tmdbId
16,The Lion King,1994,5520,8,21.605761,"[Family, Animation, Drama]",7.799175,Life's greatest adventure is finding your plac...,A young lion cub named Simba can't wait to be ...,8587,364,110357,8587.0
26,Spirited Away,2001,3968,8,41.048867,"[Fantasy, Adventure, Animation, Family]",7.72837,The tunnel led Chihiro to a mysterious town...,A ten year old girl who wanders away from her ...,129,5618,245429,129.0
49,My Neighbor Totoro,1988,1730,8,13.507299,"[Fantasy, Animation, Family]",7.447452,These strange creatures still exist in Japan. ...,Two sisters move to the country with their fat...,8392,5971,96283,8392.0
63,Paperman,2012,734,8,7.198633,"[Animation, Family, Romance]",6.976272,"Delicate, charming and sweet.",An urban office worker finds that paper airpla...,140420,98491,2388725,140420.0
83,Up,2009,7048,7,19.330884,"[Animation, Comedy, Family, Adventure]",6.898194,,Carl Fredricksen spent his entire life dreamin...,14160,68954,1049413,14160.0


### Filter top rated movies by year

In [296]:
def get_tops_by_year(df, year, top=10):
    return df[df.year == str(year)].head(top)

In [297]:
get_tops_by_year(df,2013).head()

Unnamed: 0,title,year,vote_count,vote_average,popularity,genres,weighted_rating,tagline,overview,id,movieId,imdbId,tmdbId
85,The Wolf of Wall Street,2013,6768,7,16.382422,"[Crime, Drama, Comedy]",6.894236,EARN. SPEND. PARTY.,A New York stockbroker refuses to cooperate in...,106646,106782,993846,106646.0
87,The Hunger Games: Catching Fire,2013,6656,7,25.309139,"[Adventure, Action, Science Fiction]",6.892565,Every revolution begins with a spark.,Katniss Everdeen has returned home safe after ...,101299,106487,1951264,101299.0
107,Gravity,2013,5879,7,18.50194,"[Science Fiction, Thriller, Drama]",6.879342,Don't Let Go,"Dr. Ryan Stone, a brilliant medical engineer o...",49047,104841,1454468,49047.0
110,Now You See Me,2013,5635,7,17.852022,"[Thriller, Crime]",6.874491,4 amazing magicians. 3 impossible heists. 1 bi...,An FBI agent and an Interpol detective track a...,75656,102903,1670345,75656.0
115,Frozen,2013,5440,7,24.248243,"[Animation, Adventure, Family]",6.870324,Only the act of true love will thaw a frozen h...,Young princess Anna of Arendelle dreams about ...,109445,106696,2294629,109445.0


## Content-based filtering

### Use tagline and overview columns to build cosine similarity matrix

In [298]:
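# assign the result back instead of using inplace=True on a column accessor,
# which newer pandas versions may not propagate to the DataFrame
df['tagline'] = df['tagline'].fillna('')
df['overview'] = df['overview'].fillna('')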

In [299]:
df['description'] = df.overview + df.tagline
df['description'].head(1).values

array(['Cobb, a skilled thief who commits corporate espionage by infiltrating the subconscious of his targets is offered a chance to regain his old life as payment for a task considered to be impossible: "inception", the implantation of another person\'s idea into a target\'s subconscious.Your mind is the scene of the crime.'],
      dtype=object)

In [300]:
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import linear_kernel, cosine_similarity
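def cal_similarity_matrix(data):
    # unigrams and bigrams with English stop words removed; min_df=1 keeps every term
    # (recent scikit-learn versions reject an integer min_df of 0)
    tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 2), min_df=1, stop_words='english')
    tfidf_matrix = tf.fit_transform(data)
    # TF-IDF rows are L2-normalized by default, so the dot product equals the cosine similarity
    cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
    return cosine_sim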

In [301]:
cosine_sim_description = cal_similarity_matrix(df['description'])
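
`TfidfVectorizer` L2-normalizes each row by default (`norm='l2'`), which is why a plain dot product (`linear_kernel`) can stand in for cosine similarity. The standard-library sketch below checks that identity on two hypothetical term-weight vectors; it does not use scikit-learn itself.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]

def cosine(a, b):
    """Cosine similarity computed directly from the definition."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Hypothetical term-weight vectors for two documents
doc_a = [1.0, 2.0, 0.0]
doc_b = [2.0, 1.0, 1.0]

dot_on_normalized = dot(l2_normalize(doc_a), l2_normalize(doc_b))
cosine_on_raw = cosine(doc_a, doc_b)
print(round(dot_on_normalized, 6), round(cosine_on_raw, 6))
```

Skipping the explicit normalization step is the reason `linear_kernel` is usually the cheaper call on TF-IDF output.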


In [302]:
# map each title to its row position (for duplicate titles the last row wins)
title_to_idx = {title: idx for idx, title in enumerate(df.title)}

#### Get recommendations by movie title

In [303]:
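def get_recommendation_by_title(df, title, cosine_sim, top=10):
    '''
    return the `top` movies most similar to `title`, ordered by weighted rating
    '''
    idx = title_to_idx[title]
    scores = sorted(enumerate(cosine_sim[idx]), key=lambda x: x[1], reverse=True)
    # scores[0] is normally the movie itself (similarity 1), so skip it
    movie_indices = [i for i, _ in scores[1:top+1]]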
    return df.iloc[movie_indices].sort_values('weighted_rating', ascending=False)
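
The lookup logic can be followed on a tiny hand-made matrix. Everything in this sketch is hypothetical (titles, similarity values, and the `most_similar` helper); it also excludes the query movie by index rather than assuming it sorts first, which is one possible way to guard against ties at similarity 1.

```python
titles = ['A', 'B', 'C', 'D']
# Hypothetical symmetric similarity matrix; the diagonal is each movie against itself
sim = [
    [1.0, 0.2, 0.7, 0.1],
    [0.2, 1.0, 0.3, 0.9],
    [0.7, 0.3, 1.0, 0.4],
    [0.1, 0.9, 0.4, 1.0],
]

def most_similar(title, top=2):
    """Return the `top` titles most similar to `title`, best first."""
    idx = titles.index(title)
    ranked = sorted(enumerate(sim[idx]), key=lambda pair: pair[1], reverse=True)
    # Drop the query movie explicitly, then keep the best `top` remaining
    return [titles[i] for i, _ in ranked if i != idx][:top]

result = most_similar('A')
print(result)
```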
    

In [304]:
get_recommendation_by_title(df, 'Toy Story', cosine_sim_description).head()

Unnamed: 0,title,year,vote_count,vote_average,popularity,genres,weighted_rating,tagline,overview,id,movieId,imdbId,tmdbId,description
6,The Shawshank Redemption,1994,8358,8,51.645403,"[Drama, Crime]",7.864,Fear can hold you prisoner. Hope can set you f...,Framed in the 1940s for the double murder of h...,278,318,111161,278.0,Framed in the 1940s for the double murder of h...
133,Toy Story 3,2010,4710,7,16.96647,"[Animation, Family, Comedy]",6.851922,No toy gets left behind.,"Woody, Buzz, and the rest of Andy's toys haven...",10193,78499,435761,10193.0,"Woody, Buzz, and the rest of Andy's toys haven..."
168,Toy Story 2,1999,3914,7,17.547693,"[Animation, Comedy, Family]",6.824813,The toys are back!,"Andy heads off to Cowboy Camp, leaving his toy...",863,3114,120363,863.0,"Andy heads off to Cowboy Camp, leaving his toy..."
202,The Devil Wears Prada,2006,3198,7,13.102384,"[Comedy, Drama, Romance]",6.790277,Meet Andy Sachs. A million girls would kill to...,The Devil Wears Prada is about a young journal...,350,45720,458352,350.0,The Devil Wears Prada is about a young journal...
312,Pretty Woman,1990,1807,7,13.348451,"[Romance, Comedy]",6.6601,Who knew it was so much fun to be a hooker?,When millionaire wheeler-dealer Edward Lewis e...,114,597,100405,114.0,When millionaire wheeler-dealer Edward Lewis e...


In [305]:
get_recommendation_by_title(df, 'The Dark Knight', cosine_sim_description).head()

Unnamed: 0,title,year,vote_count,vote_average,popularity,genres,weighted_rating,tagline,overview,id,movieId,imdbId,tmdbId,description
70,The Dark Knight Rises,2012,9263,7,20.58258,"[Action, Crime, Drama, Thriller]",6.921448,The Legend Ends,Following the death of District Attorney Harve...,49026,91529,1345836,49026.0,Following the death of District Attorney Harve...
78,Batman Begins,2005,7511,7,28.505341,"[Action, Crime, Drama]",6.904127,Evil fears the knight.,"Driven by tragedy, billionaire Bruce Wayne ded...",272,33794,372784,272.0,"Driven by tragedy, billionaire Bruce Wayne ded..."
161,Sherlock Holmes: A Game of Shadows,2011,3971,7,18.695329,"[Adventure, Action, Crime, Mystery]",6.827079,The game is afoot.,There is a new criminal mastermind at large (P...,58574,91542,1515091,58574.0,There is a new criminal mastermind at large (P...
269,Batman,1989,2145,7,19.10673,"[Fantasy, Action]",6.704647,Have you ever danced with the devil in the pal...,The Dark Knight of Gotham City begins his war ...,268,592,96895,268.0,The Dark Knight of Gotham City begins his war ...
356,Law Abiding Citizen,2009,1522,7,16.639047,"[Drama, Crime, Thriller]",6.610575,The System Must Pay.,A frustrated man decides to take justice into ...,22803,71838,1197624,22803.0,A frustrated man decides to take justice into ...


### Use tagline, overview, keywords, director, and actors to build cosine similarity matrix

In [306]:
# add more features to calculate similarity between movies
credits = pd.read_csv('./data/credits.csv')
keywords = pd.read_csv('./data/keywords.csv')

# remove duplicates
credits = credits.drop_duplicates(subset='id')
keywords = keywords.drop_duplicates(subset='id')

In [307]:

df = df.merge(credits, on='id')
df = df.merge(keywords, on='id')

In [308]:
# extract names of the top 3 actors from the cast column
df['cast'] = df['cast'].fillna('[]').apply(lambda x: [str.lower(i['name'].replace(" ","")) for i in literal_eval(x)][:3])

In [309]:
def get_director(x):
    '''
    extract director's name from crew column of x
    '''
    for i in x:
        if i['job'] == 'Director':
            return str.lower(i['name'].replace(' ', ''))
    return np.nan


In [310]:
df['director'] = df['crew'].fillna('[]').apply(lambda x: get_director(literal_eval(x)))
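
The director extraction can be checked on a single cell. The `crew` string below is a shortened, hypothetical example (real cells carry more keys per person), and `director_token` is an illustrative variant that returns `None` rather than `np.nan` when no director is listed.

```python
from ast import literal_eval

# Hypothetical, shortened `crew` cell
raw_crew = ("[{'job': 'Producer', 'name': 'Emma Thomas'}, "
            "{'job': 'Director', 'name': 'Christopher Nolan'}]")

def director_token(crew):
    """Return the first director's name as one lowercase token, or None."""
    name = next((p['name'] for p in crew if p['job'] == 'Director'), None)
    return name.replace(' ', '').lower() if name else None

token = director_token(literal_eval(raw_crew))
no_director = director_token([])
print(token, no_director)
```

Collapsing the name into a single token keeps the vectorizer from matching two different people who merely share a first name.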

In [311]:
# process keywords
from nltk.stem.snowball import SnowballStemmer
stemmer = SnowballStemmer('english')
df['keywords'] = df['keywords'].fillna('[]').apply(lambda x: [str.lower(stemmer.stem(i['name'].replace(' ', ''))) for i in literal_eval(x)])

In [312]:
df.head()[['cast', 'director', 'keywords']]

| | cast | director | keywords |
|---|---|---|---|
| 0 | [leonardodicaprio, josephgordon-levitt, ellenp... | christophernolan | [lossoflov, dream, kidnap, sleep, subconsci, h... |
| 1 | [christianbale, michaelcaine, heathledger] | christophernolan | [dccomic, crimefight, secretident, scarecrow, ... |
| 2 | [matthewmcconaughey, jessicachastain, annehath... | christophernolan | [savingtheworld, artificialintellig, fatherson... |
| 3 | [edwardnorton, bradpitt, meatloaf] | davidfincher | [supportgroup, dualident, nihil, rageandh, ins... |
| 4 | [elijahwood, ianmckellen, cateblanchett] | peterjackson | [elv, dwarv, orc, middle-earth(tolkien), hobbi... |


In [313]:
# use title, genres, top 3 actors, director (* 3 to make director a more significant factor), 
# and keyword to calculate similarity
df['mixed_credits'] = df['title'].apply(lambda x: [x]) + df['genres'] + df['cast'] + df['director'].apply(lambda x: [x]) * 3 + df['keywords']
print(df['mixed_credits'].head())
df['mixed_credits'] = df['mixed_credits'].apply(lambda x: ' '.join(x))
print(df['mixed_credits'].head())

0    [Inception, Action, Thriller, Science Fiction,...
1    [The Dark Knight, Drama, Action, Crime, Thrill...
2    [Interstellar, Adventure, Drama, Science Ficti...
3    [Fight Club, Drama, edwardnorton, bradpitt, me...
4    [The Lord of the Rings: The Fellowship of the ...
Name: mixed_credits, dtype: object
0    Inception Action Thriller Science Fiction Myst...
1    The Dark Knight Drama Action Crime Thriller ch...
2    Interstellar Adventure Drama Science Fiction m...
3    Fight Club Drama edwardnorton bradpitt meatloa...
4    The Lord of the Rings: The Fellowship of the R...
Name: mixed_credits, dtype: object
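
The `* 3` on the director list works because multiplying a Python list repeats its elements, and `*` binds more tightly than `+`, so only the director is repeated. A minimal sketch with hypothetical input lists shows the effect on term counts:

```python
from collections import Counter

# Hypothetical per-movie features
genres = ['Drama', 'Crime']
cast = ['christianbale', 'michaelcaine']
director = 'christophernolan'

# Repeating the one-element list three times triples the director's term count
tokens = genres + cast + [director] * 3
soup = ' '.join(tokens)
counts = Counter(soup.split())
print(soup)
print(counts['christophernolan'], counts['christianbale'])
```

A higher term count raises the director's weight in the TF-IDF vector, so movies sharing a director score as more similar than movies sharing a single cast member.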


In [314]:
# tf2 = TfidfVectorizer(analyzer='word',ngram_range=(1, 2),min_df=0, stop_words='english')
# tfidf_matrix2 = tf2.fit_transform(df['mixed_credits'])
# tfidf_matrix2.shape

In [315]:
cosine_sim_mixed = cal_similarity_matrix(df['mixed_credits'])

#### Get recommendations by title

In [316]:
get_recommendation_by_title(df, 'The Dark Knight', cosine_sim_mixed)

Unnamed: 0,title,year,vote_count,vote_average,popularity,genres,weighted_rating,tagline,overview,id,movieId,imdbId,tmdbId,description,cast,crew,keywords,director,mixed_credits
0,Inception,2010,14075,8,29.108149,"[Action, Thriller, Science Fiction, Mystery, A...",7.917588,Your mind is the scene of the crime.,"Cobb, a skilled thief who commits corporate es...",27205,79132,1375666,27205.0,"Cobb, a skilled thief who commits corporate es...","[leonardodicaprio, josephgordon-levitt, ellenp...","[{'credit_id': '56e8462cc3a368408400354c', 'de...","[lossoflov, dream, kidnap, sleep, subconsci, h...",christophernolan,Inception Action Thriller Science Fiction Myst...
2,Interstellar,2014,11187,8,32.213481,"[Adventure, Drama, Science Fiction]",7.897107,Mankind was born on Earth. It was never meant ...,Interstellar chronicles the adventures of a gr...,157336,109487,816692,157336.0,Interstellar chronicles the adventures of a gr...,"[matthewmcconaughey, jessicachastain, annehath...","[{'credit_id': '54cba75b925141678e014d1a', 'de...","[savingtheworld, artificialintellig, fatherson...",christophernolan,Interstellar Adventure Drama Science Fiction m...
20,The Prestige,2006,4510,8,16.94556,"[Drama, Mystery, Thriller]",7.758148,Are You Watching Closely?,A mysterious story of two magicians whose inte...,1124,48780,482571,1124.0,A mysterious story of two magicians whose inte...,"[hughjackman, christianbale, michaelcaine]","[{'credit_id': '52fe42e8c3a36847f802bef9', 'de...","[competit, secret, obsess, magic, dyinganddeat...",christophernolan,The Prestige Drama Mystery Thriller hughjackma...
24,Memento,2000,4168,8,15.450789,"[Mystery, Thriller]",7.740175,Some memories are best forgotten.,Suffering short-term memory loss after a head ...,77,4226,209144,77.0,Suffering short-term memory loss after a head ...,"[guypearce, carrie-annemoss, joepantoliano]","[{'credit_id': '52fe4214c3a36847f80024cb', 'de...","[individu, insulin, tattoo, waitress, amnesia,...",christophernolan,Memento Mystery Thriller guypearce carrie-anne...
70,The Dark Knight Rises,2012,9263,7,20.58258,"[Action, Crime, Drama, Thriller]",6.921448,The Legend Ends,Following the death of District Attorney Harve...,49026,91529,1345836,49026.0,Following the death of District Attorney Harve...,"[christianbale, michaelcaine, garyoldman]","[{'credit_id': '52fe4781c3a36847f81398c3', 'de...","[dccomic, crimefight, terrorist, secretident, ...",christophernolan,The Dark Knight Rises Action Crime Drama Thril...
78,Batman Begins,2005,7511,7,28.505341,"[Action, Crime, Drama]",6.904127,Evil fears the knight.,"Driven by tragedy, billionaire Bruce Wayne ded...",272,33794,372784,272.0,"Driven by tragedy, billionaire Bruce Wayne ded...","[christianbale, michaelcaine, liamneeson]","[{'credit_id': '52fe4230c3a36847f800ac6d', 'de...","[himalaya, martialart, dccomic, crimefight, se...",christophernolan,Batman Begins Action Crime Drama christianbale...
716,Batman: Under the Red Hood,2010,459,7,7.039325,"[Action, Animation]",6.147016,Dare to Look Beneath the Hood.,Batman faces his ultimate challenge as the mys...,40662,79274,1569923,40662.0,Batman faces his ultimate challenge as the mys...,"[brucegreenwood, jensenackles, neilpatrickharris]","[{'credit_id': '589f8b1ac3a3684fe40031cb', 'de...","[martialart, dccomic, vigilant, joker, superhe...",brandonvietti,Batman: Under the Red Hood Action Animation br...
957,Batman Returns,1992,1706,6,15.001681,"[Action, Fantasy]",5.846862,"The Bat, the Cat, the Penguin.","Having defeated the Joker, Batman now faces th...",364,1377,103776,364.0,"Having defeated the Joker, Batman now faces th...","[michaelkeaton, dannydevito, michellepfeiffer]","[{'credit_id': '52fe423cc3a36847f800e513', 'de...","[holiday, corrupt, doublelif, dccomic, crimefi...",timburton,Batman Returns Action Fantasy michaelkeaton da...
1105,Insomnia,2002,1181,6,11.424974,"[Crime, Mystery, Thriller]",5.797081,A tough cop. A brilliant killer. An unspeakabl...,Two Los Angeles homicide detectives are dispat...,320,5388,278504,320.0,Two Los Angeles homicide detectives are dispat...,"[alpacino, robinwilliams, hilaryswank]","[{'credit_id': '52fe4237c3a36847f800ced5', 'de...","[detect, confess, fbi, homicid, blackmail, sus...",christophernolan,Insomnia Crime Mystery Thriller alpacino robin...
2000,Batman & Robin,1997,1447,4,17.038824,"[Action, Crime, Fantasy]",4.287233,Strength. Courage. Honor. And loyalty.,Along with crime-fighting partner Robin and ne...,415,1562,118688,415.0,Along with crime-fighting partner Robin and ne...,"[georgeclooney, chriso'donnell, arnoldschwarze...","[{'credit_id': '59b66a169251417cbc011ec4', 'de...","[doublelif, dccomic, dualident, crimefight, fi...",joelschumacher,Batman & Robin Action Crime Fantasy georgecloo...


## Collaborative filtering

In [317]:
##### Collaborative filtering
from surprise import Reader, Dataset, SVD, NormalPredictor, KNNBasic
from surprise.model_selection import cross_validate
import heapq
from collections import defaultdict
from operator import itemgetter

In [318]:
test_user_id = '85'


In [319]:
ratings = pd.read_csv('./data/ratings_small.csv')

In [320]:
print(ratings.shape)
# ratings_small = ratings.sample(frac=0.1, random_state=1)


(100004, 4)


In [321]:
reader = Reader(line_format='user item rating timestamp', sep=',', skip_lines=1)
# data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)
data = Dataset.load_from_file('./data/ratings_small.csv', reader=reader)
# data.split(n_folds=5)
train_set = data.build_full_trainset()

In [324]:
def collaborative_filtering(user_id, train_set, user_based=True, k=10):
    sim_options = {'name': 'cosine', 'user_based': user_based}
    model = KNNBasic(sim_options=sim_options)
    model.fit(train_set)
    # fit() already computes the similarity matrix; this call computes it again,
    # which is why the log message is printed twice
    sims_matrix = model.compute_similarities()
    # convert the raw user id into the inner id used by the train set
    test_user_inner_id = train_set.to_inner_uid(user_id)
    
    if user_based:
        # top k users most similar to the target user, excluding the user itself
        similarity_row = sims_matrix[test_user_inner_id]
        similar_users = []
        for inner_id, score in enumerate(similarity_row):
            if inner_id != test_user_inner_id:
                similar_users.append((inner_id, score))

        kNeighbors = heapq.nlargest(k, similar_users, key=lambda t: t[1])
    else:
        # top k items the target user rated highest
        test_user_ratings = train_set.ur[test_user_inner_id]
        kNeighbors = heapq.nlargest(k, test_user_ratings, key=lambda t: t[1])
    
    # score each candidate item
    candidates = defaultdict(float)
    
    if user_based:
        # add up the neighbors' ratings, weighted by user similarity
        for inner_id, user_similarity_score in kNeighbors:
            for item_id, rating in train_set.ur[inner_id]:
                candidates[item_id] += (rating / 5.0) * user_similarity_score
    else:
        # add up item similarities, weighted by the user's rating of each top item
        for item_id, rating in kNeighbors:
            similarity_row = sims_matrix[item_id]
            for inner_id, score in enumerate(similarity_row):
                candidates[inner_id] += score * (rating / 5.0)

    # items the user has already rated
    watched = {item_id for item_id, _ in train_set.ur[test_user_inner_id]}

    movie_ids = []
    scores = []
    for item_id, rating_sum in sorted(candidates.items(), key=itemgetter(1), reverse=True):
        if item_id not in watched:
            movie_id = train_set.to_raw_iid(item_id)
            movie_ids.append(int(movie_id))
            scores.append(rating_sum)
    
    # attach movie metadata from the global df; movies missing from df are dropped
    res = pd.DataFrame({'movieId': movie_ids, 'ratings': scores})
    res = res.merge(df, on='movieId')
    return res.head(10)
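
The user-based branch can be traced by hand on a few users. The sketch below is a simplified, standard-library stand-in: the similarities and ratings are hypothetical, plain dicts replace the Surprise train set, and `k` is fixed at 2.

```python
import heapq
from collections import defaultdict

# Hypothetical data: similarity of each user to the target, and each user's ratings
target = 'u0'
similarity = {'u0': 1.0, 'u1': 0.9, 'u2': 0.5, 'u3': 0.1}
ratings = {
    'u0': {'m1': 5.0},
    'u1': {'m1': 4.0, 'm2': 5.0},
    'u2': {'m2': 2.0, 'm3': 4.0},
    'u3': {'m4': 5.0},
}

# Keep the k most similar users, excluding the target user
others = [(u, s) for u, s in similarity.items() if u != target]
neighbors = heapq.nlargest(2, others, key=lambda t: t[1])

# Sum the neighbors' ratings, scaled to [0, 1] and weighted by similarity
candidates = defaultdict(float)
for user, score in neighbors:
    for movie, rating in ratings[user].items():
        candidates[movie] += (rating / 5.0) * score

# Drop what the target has already rated and rank the rest
seen = set(ratings[target])
ranked = sorted(((m, s) for m, s in candidates.items() if m not in seen),
                key=lambda t: t[1], reverse=True)
print(ranked)
```

Because the scores are sums rather than averages, a movie rated by several neighbors can outrank one that a single neighbor loved, which favours widely watched titles.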



### User-based collaborative filtering

In [325]:
collaborative_filtering('85', train_set)

Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.


Unnamed: 0,movieId,ratings,title,year,vote_count,vote_average,popularity,genres,weighted_rating,tagline,overview,id,imdbId,tmdbId,description,cast,crew,keywords,director,mixed_credits
0,79132,3.3,Inception,2010,14075,8,29.108149,"[Action, Thriller, Science Fiction, Mystery, A...",7.917588,Your mind is the scene of the crime.,"Cobb, a skilled thief who commits corporate es...",27205,1375666,27205.0,"Cobb, a skilled thief who commits corporate es...","[leonardodicaprio, josephgordon-levitt, ellenp...","[{'credit_id': '56e8462cc3a368408400354c', 'de...","[lossoflov, dream, kidnap, sleep, subconsci, h...",christophernolan,Inception Action Thriller Science Fiction Myst...
1,1196,2.4,The Empire Strikes Back,1980,5998,8,19.470959,"[Adventure, Action, Science Fiction]",7.814099,The Adventure Continues...,"The epic saga continues as Luke Skywalker, in ...",1891,80684,1891.0,"The epic saga continues as Luke Skywalker, in ...","[markhamill, harrisonford, carriefisher]","[{'credit_id': '566e19f292514169e200d46f', 'de...","[rebel, android, asteroid, spacebattl, snowsto...",irvinkershner,The Empire Strikes Back Adventure Action Scien...
2,3996,2.0,"Crouching Tiger, Hidden Dragon",2000,949,7,19.944953,"[Adventure, Drama, Action, Romance]",6.44923,"A timeless story of strength, secrets and two ...",Two warriors in pursuit of a stolen sword and ...,146,190332,146.0,Two warriors in pursuit of a stolen sword and ...,"[chowyun-fat, michelleyeoh, zhangziyi]","[{'credit_id': '52fe421ec3a36847f80055ed', 'de...","[fli, martialart, taskmast, comb, tiger, deser...",anglee,"Crouching Tiger, Hidden Dragon Adventure Drama..."
3,58559,2.0,The Dark Knight,2008,12269,8,123.167259,"[Drama, Action, Crime, Thriller]",7.905871,Why So Serious?,Batman raises the stakes in his war on crime. ...,155,468569,155.0,Batman raises the stakes in his war on crime. ...,"[christianbale, michaelcaine, heathledger]","[{'credit_id': '55a0eb4a925141296b0010f8', 'de...","[dccomic, crimefight, secretident, scarecrow, ...",christophernolan,The Dark Knight Drama Action Crime Thriller ch...
4,1201,1.9,"The Good, the Bad and the Ugly",1966,2371,8,16.788787,[Western],7.57372,For three men the Civil War wasn't hell. It wa...,While the Civil War rages between the Union an...,429,60196,429.0,While the Civil War rages between the Union an...,"[eliwallach, clinteastwood, leevancleef]","[{'credit_id': '52fe4242c3a36847f80105d3', 'de...","[bountyhunt, refuge, gold, antihero, gallow, h...",sergioleone,"The Good, the Bad and the Ugly Western eliwall..."
5,48516,1.9,The Departed,2006,4455,7,18.515448,"[Drama, Thriller, Crime]",6.844198,Lies. Betrayal. Sacrifice. How far will you ta...,"To take down South Boston's Irish Mafia, the p...",1422,407887,1422.0,"To take down South Boston's Irish Mafia, the p...","[leonardodicaprio, mattdamon, jacknicholson]","[{'credit_id': '52fe42f5c3a36847f802fed5', 'de...","[undercov, boston, polic, friend, mafia, under...",martinscorsese,The Departed Drama Thriller Crime leonardodica...
6,91529,1.9,The Dark Knight Rises,2012,9263,7,20.58258,"[Action, Crime, Drama, Thriller]",6.921448,The Legend Ends,Following the death of District Attorney Harve...,49026,1345836,49026.0,Following the death of District Attorney Harve...,"[christianbale, michaelcaine, garyoldman]","[{'credit_id': '52fe4781c3a36847f81398c3', 'de...","[dccomic, crimefight, terrorist, secretident, ...",christophernolan,The Dark Knight Rises Action Crime Drama Thril...
7,1270,1.9,Back to the Future,1985,6239,8,25.778509,"[Adventure, Comedy, Science Fiction, Family]",7.820813,He's the only kid ever to get into trouble bef...,Eighties teenager Marty McFly is accidentally ...,105,88763,105.0,Eighties teenager Marty McFly is accidentally ...,"[michaelj.fox, christopherlloyd, leathompson]","[{'credit_id': '52fe4218c3a36847f80039c7', 'de...","[clocktow, carrac, terrorist, delorean, lightn...",robertzemeckis,Back to the Future Adventure Comedy Science Fi...
8,104841,1.8,Gravity,2013,5879,7,18.50194,"[Science Fiction, Thriller, Drama]",6.879342,Don't Let Go,"Dr. Ryan Stone, a brilliant medical engineer o...",49047,1454468,49047.0,"Dr. Ryan Stone, a brilliant medical engineer o...","[sandrabullock, georgeclooney, edharris]","[{'credit_id': '52fe4783c3a36847f8139de3', 'de...","[spacemiss, loss, space, astronaut, trappedins...",alfonsocuarón,Gravity Science Fiction Thriller Drama sandrab...
9,2959,1.8,Fight Club,1999,9678,8,63.869599,[Drama],7.881753,Mischief. Mayhem. Soap.,A ticking-time-bomb insomniac and a slippery s...,550,137523,550.0,A ticking-time-bomb insomniac and a slippery s...,"[edwardnorton, bradpitt, meatloaf]","[{'credit_id': '55731b8192514111610027d7', 'de...","[supportgroup, dualident, nihil, rageandh, ins...",davidfincher,Fight Club Drama edwardnorton bradpitt meatloa...


### Item-based collaborative filtering

In [326]:
collaborative_filtering('85', train_set, False)

Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.


Unnamed: 0,movieId,ratings,title,year,vote_count,vote_average,popularity,genres,weighted_rating,tagline,overview,id,imdbId,tmdbId,description,cast,crew,keywords,director,mixed_credits
0,1665,9.87691,Bean,1997,602,6,12.799853,[Comedy],5.683673,One Man. One Masterpiece. One Very Big Mistake.,Bean works as a caretaker at Britain's formida...,1281,118689,1281.0,Bean works as a caretaker at Britain's formida...,"[rowanatkinson, petermacnicol, johnmills]","[{'credit_id': '52fe42edc3a36847f802d79b', 'de...","[pari, londonengland, california, airport, vau...",melsmith,Bean Comedy rowanatkinson petermacnicol johnmi...
1,1237,9.800141,The Seventh Seal,1957,569,7,6.47828,"[Fantasy, Drama]",6.240563,,When disillusioned Swedish knight Antonius Blo...,490,50976,490.0,When disillusioned Swedish knight Antonius Blo...,"[maxvonsydow, gunnarbjörnstrand, bengtekerot]","[{'credit_id': '52fe4249c3a36847f80127a7', 'de...","[chess, countrysid, witch, blacksmith, allegor...",ingmarbergman,The Seventh Seal Fantasy Drama maxvonsydow gun...
2,3105,9.788161,Awakenings,1990,568,7,13.201595,[Drama],6.239806,There is no such thing as a simple miracle.,"Dr. Malcolm Sayer, a shy research physician, u...",11005,99077,11005.0,"Dr. Malcolm Sayer, a shy research physician, u...","[robertdeniro, robinwilliams, johnheard]","[{'credit_id': '58c99d86c3a3685c47000263', 'de...","[coma, basedonnovel, miracl, frustrat, hope, b...",pennymarshall,Awakenings Drama robertdeniro robinwilliams jo...
3,2058,9.782882,The Negotiator,1998,593,6,7.65502,"[Action, Adventure, Crime, Drama, Mystery, Thr...",5.680901,He frees hostages for a living. Now he's takin...,The police try to arrest expert hostage negoti...,9631,120768,9631.0,The police try to arrest expert hostage negoti...,"[samuell.jackson, kevinspacey, davidmorse]","[{'credit_id': '52fe4514c3a36847f80bb201', 'de...","[corrupt, hostag, pension, innoc, polic, hosta...",f.garygray,The Negotiator Action Adventure Crime Drama My...
4,4005,9.773931,The Living Daylights,1987,447,6,12.189038,"[Action, Adventure, Thriller]",5.628019,Licensed to thrill.,James Bond helps a Russian General escape into...,708,93428,708.0,James Bond helps a Russian General escape into...,"[timothydalton, maryamd'abo, jeroenkrabbé]","[{'credit_id': '52fe426ec3a36847f801df5f', 'de...","[londonengland, smugglingofarm, prison, englan...",johnglen,The Living Daylights Action Adventure Thriller...
5,3252,9.773181,Scent of a Woman,1992,763,7,14.505987,[Drama],6.363647,Col. Frank Slade has a very special plan for t...,Charlie Simms (Chris O'Donnell) is a student a...,9475,105323,9475.0,Charlie Simms (Chris O'Donnell) is a student a...,"[alpacino, chriso'donnell, jamesrebhorn]","[{'credit_id': '52fe44fcc3a36847f80b5e55', 'de...","[suicideattempt, blindnessandimpairedvis, than...",martinbrest,Scent of a Woman Drama alpacino chriso'donnell...
6,1994,9.769713,Poltergeist,1982,811,7,11.776326,[Horror],6.388181,They're here.,"Steve Freeling lives with his wife, Diane, and...",609,84516,609.0,"Steve Freeling lives with his wife, Diane, and...","[craigt.nelson, jobethwilliams, beatricestraight]","[{'credit_id': '52fe425dc3a36847f8018967', 'de...","[parentchildrelationship, medium, ghostbust, p...",tobehooper,Poltergeist Horror craigt.nelson jobethwilliam...
7,1249,9.765301,La Femme Nikita,1990,511,7,6.586401,"[Action, Thriller]",6.193953,She murders. So she can live.,"A beautiful felon, sentenced to life in prison...",9322,100263,9322.0,"A beautiful felon, sentenced to life in prison...","[anneparillaud, marcduret, patrickfontana]","[{'credit_id': '52fe44e7c3a36847f80b0f97', 'de...","[secretident, specialunit, romanc, governmenta...",lucbesson,La Femme Nikita Action Thriller anneparillaud ...
8,628,9.760015,Primal Fear,1996,644,7,10.326213,"[Crime, Drama, Mystery, Thriller]",6.2934,"Sooner or later, a man who wears two faces for...","An arrogant, high-powered attorney takes on th...",1592,117381,1592.0,"An arrogant, high-powered attorney takes on th...","[richardgere, edwardnorton, lauralinney]","[{'credit_id': '52fe4302c3a36847f80339a1', 'de...","[corrupt, bishop, courtcas, pornographicvideo,...",gregoryhoblit,Primal Fear Crime Drama Mystery Thriller richa...
9,2194,9.745433,The Untouchables,1987,1424,7,11.062203,"[Crime, Drama, History, Thriller]",6.590035,What are you prepared to do?,Young Treasury Agent Elliot Ness arrives in Ch...,117,94226,117.0,Young Treasury Agent Elliot Ness arrives in Ch...,"[kevincostner, seanconnery, charlesmartinsmith]","[{'credit_id': '52fe421ac3a36847f8004273', 'de...","[whitesuit, alcapon, toughcop, treasuryag, unt...",briandepalma,The Untouchables Crime Drama History Thriller ...
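
The log lines above mention a cosine similarity matrix. As a rough illustration of what item-to-item cosine similarity measures, here is a minimal pure-Python sketch. It is not the implementation behind `collaborative_filtering`, which is not shown here; `item_cosine` and the toy ratings are hypothetical names and values chosen for this example, and restricting the computation to users who rated both items is an assumption of the sketch.

```python
from math import sqrt

def item_cosine(ratings, item_a, item_b):
    """Cosine similarity between two items, using only users who rated both."""
    common = [u for u in ratings if item_a in ratings[u] and item_b in ratings[u]]
    if not common:
        return 0.0  # no overlap: treat the items as unrelated
    dot = sum(ratings[u][item_a] * ratings[u][item_b] for u in common)
    norm_a = sqrt(sum(ratings[u][item_a] ** 2 for u in common))
    norm_b = sqrt(sum(ratings[u][item_b] ** 2 for u in common))
    return dot / (norm_a * norm_b)

# Toy ratings (made up): user -> {item: rating}
toy_ratings = {
    "u1": {"A": 4, "B": 4},
    "u2": {"A": 2, "B": 2},
    "u3": {"A": 5},
    "u4": {"C": 3},
}

sim_ab = item_cosine(toy_ratings, "A", "B")  # rated together by u1 and u2
sim_ac = item_cosine(toy_ratings, "A", "C")  # never rated by the same user
print(sim_ab, sim_ac)
```

An item-based recommender would then score an unseen movie for a user from that user's ratings of the most similar movies.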


In [327]:
# Use SVD (matrix factorization) for the user-based collaborative filtering step
svd = SVD()
cross_validate(svd, data, measures=['RMSE', 'MAE'])

{'test_rmse': array([0.90089579, 0.89102355, 0.90160724, 0.89512355, 0.89324352]),
 'test_mae': array([0.69266441, 0.68752865, 0.69412094, 0.69188095, 0.68685786]),
 'fit_time': (6.753072023391724,
  6.590444803237915,
  6.888653993606567,
  6.6858978271484375,
  6.8308258056640625),
 'test_time': (0.21761465072631836,
  0.4660532474517822,
  0.2733309268951416,
  0.2105391025543213,
  0.20268607139587402)}
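
The per-fold scores are easier to compare once averaged. The sketch below copies the `test_rmse` and `test_mae` values from the output above into plain lists and summarizes them; `cv_results` and `summary` are names introduced only for this example.

```python
from statistics import mean, stdev

# Per-fold scores copied from the cross_validate output above
cv_results = {
    "test_rmse": [0.90089579, 0.89102355, 0.90160724, 0.89512355, 0.89324352],
    "test_mae": [0.69266441, 0.68752865, 0.69412094, 0.69188095, 0.68685786],
}

# Mean and sample standard deviation across the five folds
summary = {name: (mean(scores), stdev(scores)) for name, scores in cv_results.items()}

for name, (avg, spread) in summary.items():
    print(f"{name}: {avg:.4f} +/- {spread:.4f}")
```

On these numbers the mean RMSE comes out at roughly 0.896 and the mean MAE at roughly 0.691.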

In [328]:
svd.fit(data.build_full_trainset())

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x1a2ff0eac8>

In [329]:
svd.predict(1, 302)

Prediction(uid=1, iid=302, r_ui=None, est=3.543608255669773, details={'was_impossible': False})

In [330]:
svd.predict(1, 1029)

Prediction(uid=1, iid=1029, r_ui=None, est=3.543608255669773, details={'was_impossible': False})
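
The two predictions above are identical, which is worth a second look. One possible explanation, offered as an assumption rather than a confirmed diagnosis: Surprise's SVD estimates a rating as the global mean plus a user bias, an item bias, and the dot product of the user and item factors, and it treats the biases and factors of an unknown user or item as zero. If the raw ids in the ratings data are strings (the earlier call `collaborative_filtering('85', ...)` passes its id as a string), then the integers `1`, `302`, and `1029` would be unknown to the model and every estimate would fall back to the global mean. The sketch below mimics that fallback with made-up numbers; `svd_estimate` and all values in it are hypothetical and are not Surprise code.

```python
def svd_estimate(global_mean, user_bias, item_bias, user_factors, item_factors, uid, iid):
    """Biased matrix-factorization estimate: mu + b_u + b_i + q_i . p_u.

    Terms belonging to an unknown user or item are dropped (treated as zero).
    """
    known_user = uid in user_bias
    known_item = iid in item_bias
    est = global_mean
    if known_user:
        est += user_bias[uid]
    if known_item:
        est += item_bias[iid]
    if known_user and known_item:
        est += sum(p * q for p, q in zip(user_factors[uid], item_factors[iid]))
    return est

# Toy model (made-up values) whose raw ids are strings
model = dict(
    global_mean=3.5,
    user_bias={"1": 0.25},
    item_bias={"1029": -0.5},
    user_factors={"1": [0.5, 1.0]},
    item_factors={"1029": [1.0, 0.5]},
)

est_str_ids = svd_estimate(**model, uid="1", iid="1029")  # ids match the model
est_int_ids = svd_estimate(**model, uid=1, iid=1029)      # ids unknown to the model
print(est_str_ids, est_int_ids)
```

If this is indeed the cause, calling `svd.predict(str(user_id), str(movie_id))` with MovieLens `movieId` values should produce estimates that differ by user and by movie.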

## Hybrid recommendation system
1. Use content-based filtering to identify top-rated movies similar to the movies a user selects
2. For each movie, use the trained SVD model to predict that user's rating
3. Sort the movies by estimated rating and return the top 10
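
The three steps above can be sketched independently of pandas and Surprise. In this minimal example `hybrid_rank`, `similar_titles`, and `predict_rating` are hypothetical stand-ins for the content-based lookup and the trained model, and the ratings are invented for illustration.

```python
def hybrid_rank(selected_titles, similar_titles, predict_rating, user_id, top_n=10):
    # Step 1: pool the content-based candidates for every selected title
    candidates = set()
    for title in selected_titles:
        candidates.update(similar_titles.get(title, []))
    # Step 2: predict this user's rating for each candidate
    scored = [(title, predict_rating(user_id, title)) for title in candidates]
    # Step 3: highest estimate first (ties broken by title), keep the top N
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]

# Toy content-based neighbours and a toy rating model (made up)
similar_titles = {
    "Toy Story": ["Toy Story 2", "Cars", "Big"],
    "Avatar": ["Aliens", "Titanic", "Toy Story"],
}
toy_ratings = {("u1", "Toy Story 2"): 4.5, ("u1", "Cars"): 3.0, ("u1", "Aliens"): 4.0}

def predict_rating(user_id, title):
    # Fall back to a fixed default when the pair is unknown
    return toy_ratings.get((user_id, title), 3.5)

top3 = hybrid_rank(["Toy Story", "Avatar"], similar_titles, predict_rating, "u1", top_n=3)
print(top3)
```

Using a set for the candidate pool means a movie suggested by several selected titles is scored only once.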


In [331]:
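def hybrid(df, u_id, movies):
    # Step 1: content-based candidates for each selected movie
    cosine_sim = cal_similarity_matrix(df['mixed_credits'])
    recommended_ids = set()
    for movie in movies:
        rec_movies = get_recommendation_by_title(df, movie, cosine_sim, 10)
        recommended_ids.update(rec_movies['id'])
    # .copy() avoids pandas' SettingWithCopyWarning when the 'est' column is added
    recommended = df[np.isin(df['id'], list(recommended_ids))].copy()
    # Step 2: predicted rating for this user from the trained SVD model.
    # NOTE: every 'est' in the outputs below is 3.543608, the same value as the
    # svd.predict calls above, so the model probably does not recognize these ids.
    # Likely causes (unverified): 'id' is the TMDB id while the ratings use movieId,
    # and/or the raw ids in the ratings data are strings rather than ints.
    recommended['est'] = recommended['id'].apply(lambda x: svd.predict(u_id, x).est)
    # Step 3: rank by predicted rating
    recommended = recommended.sort_values('est', ascending=False)
    return recommended[['title', 'year', 'est', 'weighted_rating']]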

In [332]:
hybrid(df, 1, ['Avatar'])


Unnamed: 0,title,year,est,weighted_rating
77,Titanic,1997,3.543608,6.907153
141,Alien,1979,3.543608,6.847596
146,Star Trek Into Darkness,2013,3.543608,6.844959
152,Terminator 2: Judgment Day,1991,3.543608,6.838208
155,The Terminator,1984,3.543608,6.835908
197,Aliens,1986,3.543608,6.795018
474,Treasure Planet,2002,3.543608,6.461305
537,The Abyss,1989,3.543608,6.393539
968,Alien³,1992,3.543608,5.843797
1115,True Lies,1994,3.543608,5.79153


In [333]:
hybrid(df, 300, ['Avatar'])


Unnamed: 0,title,year,est,weighted_rating
77,Titanic,1997,3.543608,6.907153
141,Alien,1979,3.543608,6.847596
146,Star Trek Into Darkness,2013,3.543608,6.844959
152,Terminator 2: Judgment Day,1991,3.543608,6.838208
155,The Terminator,1984,3.543608,6.835908
197,Aliens,1986,3.543608,6.795018
474,Treasure Planet,2002,3.543608,6.461305
537,The Abyss,1989,3.543608,6.393539
968,Alien³,1992,3.543608,5.843797
1115,True Lies,1994,3.543608,5.79153


In [334]:
hybrid(df, 2, ['Toy Story'])


Unnamed: 0,title,year,est,weighted_rating
97,"Monsters, Inc.",2001,3.543608,6.884308
133,Toy Story 3,2010,3.543608,6.851922
168,Toy Story 2,1999,3.543608,6.824813
210,The Lego Movie,2014,3.543608,6.786095
265,Hugo,2011,3.543608,6.710485
782,Cars,2006,3.543608,5.92594
862,A Bug's Life,1998,3.543608,5.8835
1167,Big,1988,3.543608,5.774921
1221,Monster House,2006,3.543608,5.756527
1934,Cars 2,2011,3.543608,5.042143


In [336]:
hybrid(df, 3, ['Toy Story']).head(20)


Unnamed: 0,title,year,est,weighted_rating
97,"Monsters, Inc.",2001,3.543608,6.884308
133,Toy Story 3,2010,3.543608,6.851922
168,Toy Story 2,1999,3.543608,6.824813
210,The Lego Movie,2014,3.543608,6.786095
265,Hugo,2011,3.543608,6.710485
782,Cars,2006,3.543608,5.92594
862,A Bug's Life,1998,3.543608,5.8835
1167,Big,1988,3.543608,5.774921
1221,Monster House,2006,3.543608,5.756527
1934,Cars 2,2011,3.543608,5.042143


In [337]:
get_recommendation_by_title(df, 'Toy Story', cosine_sim_mixed)

Unnamed: 0,title,year,vote_count,vote_average,popularity,genres,weighted_rating,tagline,overview,id,movieId,imdbId,tmdbId,description,cast,crew,keywords,director,mixed_credits
97,"Monsters, Inc.",2001,6150,7,26.419962,"[Animation, Comedy, Family]",6.884308,We Scare Because We Care.,"James Sullivan and Mike Wazowski are monsters,...",585,4886,198781,585.0,"James Sullivan and Mike Wazowski are monsters,...","[johngoodman, billycrystal, marygibbs]","[{'credit_id': '52fe4256c3a36847f8016869', 'de...","[monster, infant, energysuppli, compani, rival...",petedocter,"Monsters, Inc. Animation Comedy Family johngoo..."
133,Toy Story 3,2010,4710,7,16.96647,"[Animation, Family, Comedy]",6.851922,No toy gets left behind.,"Woody, Buzz, and the rest of Andy's toys haven...",10193,78499,435761,10193.0,"Woody, Buzz, and the rest of Andy's toys haven...","[tomhanks, timallen, nedbeatty]","[{'credit_id': '5770143fc3a3683733000f3a', 'de...","[hostag, colleg, toy, barbi, anim, escap, dayc...",leeunkrich,Toy Story 3 Animation Family Comedy tomhanks t...
168,Toy Story 2,1999,3914,7,17.547693,"[Animation, Comedy, Family]",6.824813,The toys are back!,"Andy heads off to Cowboy Camp, leaving his toy...",863,3114,120363,863.0,"Andy heads off to Cowboy Camp, leaving his toy...","[tomhanks, timallen, joancusack]","[{'credit_id': '52fe4284c3a36847f8025073', 'de...","[museum, prosecut, identitycrisi, airplan, fle...",johnlasseter,Toy Story 2 Animation Comedy Family tomhanks t...
210,The Lego Movie,2014,3127,7,16.418133,"[Adventure, Animation, Comedy, Family, Fantasy]",6.786095,The story of a nobody who saved everybody.,"An ordinary Lego mini-figure, mistakenly thoug...",137106,108932,1490017,137106.0,"An ordinary Lego mini-figure, mistakenly thoug...","[chrispratt, willferrell, elizabethbanks]","[{'credit_id': '53a44a6cc3a3682a4200128a', 'de...","[fathersonrelationship, creativ, friendship, p...",phillord,The Lego Movie Adventure Animation Comedy Fami...
265,Hugo,2011,2197,7,14.046164,"[Adventure, Drama, Family]",6.710485,One of the most legendary directors of our tim...,Hugo is an orphan boy living in the walls of a...,44826,90866,970179,44826.0,Hugo is an orphan boy living in the walls of a...,"[benkingsley, sachabaroncohen, asabutterfield]","[{'credit_id': '52fe469ec3a36847f8108b6f', 'de...","[librari, clock, filmdirector, key, toy, boy, ...",martinscorsese,Hugo Adventure Drama Family benkingsley sachab...
782,Cars,2006,3991,6,18.907948,"[Animation, Adventure, Comedy, Family]",5.92594,Ahhh... it's got that new movie smell.,"Lightning McQueen, a hotshot rookie race car d...",920,45517,317219,920.0,"Lightning McQueen, a hotshot rookie race car d...","[owenwilson, paulnewman, bonniehunt]","[{'credit_id': '52fe428dc3a36847f8027841', 'de...","[carrac, carjourney, auto, route66, wrecker, p...",johnlasseter,Cars Animation Adventure Comedy Family owenwil...
862,A Bug's Life,1998,2379,6,16.869209,"[Adventure, Animation, Comedy, Family]",5.8835,An epic presentation of miniature proportions.,"On behalf of ""oppressed bugs everywhere,"" an i...",9487,2355,120623,9487.0,"On behalf of ""oppressed bugs everywhere,"" an i...","[kevinspacey, julialouis-dreyfus, haydenpanett...","[{'credit_id': '52fe44fec3a36847f80b64e5', 'de...","[winter, fight, ant, invent, collector, ant-hi...",johnlasseter,A Bug's Life Adventure Animation Comedy Family...
1167,Big,1988,1022,6,9.562292,"[Fantasy, Drama, Comedy, Romance, Family]",5.774921,You're Only Young Once But For Josh It Might J...,"A young boy, Josh Baskin makes a wish at a car...",2280,2797,94737,2280.0,"A young boy, Josh Baskin makes a wish at a car...","[tomhanks, elizabethperkins, robertloggia]","[{'credit_id': '52fe4349c3a36847f8048add', 'de...","[basebal, co-work, bronx, pinballmachin, toyma...",pennymarshall,Big Fantasy Drama Comedy Romance Family tomhan...
1221,Monster House,2006,912,6,15.402378,"[Animation, Comedy, Family, Fantasy]",5.756527,The House is . . . ALIVE!,"Monsters under the bed are scary enough, but w...",9297,46948,385880,9297.0,"Monsters under the bed are scary enough, but w...","[ryannewman, stevebuscemi, mitchelmusso]","[{'credit_id': '55469f70c3a3680ce80074c7', 'de...","[monster, secret, toy, children, neighbor, mis...",gilkenan,Monster House Animation Comedy Family Fantasy ...
1934,Cars 2,2011,2088,5,13.693002,"[Animation, Family, Adventure, Comedy]",5.042143,Ka-ciao!,Star race car Lightning McQueen and his pal Ma...,49013,87876,1216475,49013.0,Star race car Lightning McQueen and his pal Ma...,"[owenwilson, larrythecableguy, michaelcaine]","[{'credit_id': '52fe477fc3a36847f8139271', 'de...","[carrac, sequel, comedi, anthropomorph, bestfr...",johnlasseter,Cars 2 Animation Family Adventure Comedy owenw...
