<img style="float: right; height:30px" src="Data/tmdb.logo.png"> 

# Movie Recommendation System

<div style="text-align: right">Author: Phil Li</div>

## Project Background
As a huge movie and anime (yes, I said it) enthusiast, I've always been proud of my taste in movies and pretty picky about what I watch (two hours feels like a big commitment…). Because of this, I often find existing movie recommendation systems unconvincing: they reveal very little about how a huge catalog gets reduced to the handful of titles they suggest. So, while working on making my models more sophisticated, I decided to build this dashboard, based on Dr. Christian Richthammer's research [Interactive Visualization of Recommender Systems Data](https://core.ac.uk/download/pdf/211567607.pdf), and to use interactive visualizations to make that process transparent.

## Load Data

In [1]:
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
# %matplotlib inline
%matplotlib widget
import plotly.graph_objects as go
import ipyvuetify as v
pd.set_option('display.max_columns',40)
import warnings
# def ignore_warnings():
#     pass
# warnings.warn=ignore_warnings

In [None]:
# os.chmod('/opt/conda/lib/python3.8/site-packages/.wh.certifi-2020.6.20-py3.6.egg-info',0o755)
# os.chmod('/opt/conda/lib/python3.8/site-packages/.wh.certifi-2020.6.20-py3.8.egg-info',0o755)
# os.chmod('/opt/conda/lib/python3.8/site-packages/.wh.certifi-2020.12.5-py3.8.egg-info',0o755)

In [2]:
movies_rating=pd.read_csv('Data/movies_rating.csv')
movies_data=pd.read_csv('Data/movies_data.csv')

### TMDB API (poster URLs)

In [3]:
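# One-off fetch of poster URLs from the TMDB search API; kept commented out because
# the results are already stored in the 'poster' column of movies_data.csv.
# import requests
# API_KEY = os.environ['TMDB_API_KEY']  # hypothetical variable name; never hard-code the key
# for title in movies_data.title:
#     response = requests.get('https://api.themoviedb.org/3/search/movie',
#                             params={'api_key': API_KEY, 'language': 'en-US', 'query': title,
#                                     'page': 1, 'include_adult': 'false'})
#     results = response.json()['results']
#     if results and results[0]['poster_path']:
#         # poster_path already starts with '/', so no extra slash is added
#         movies_data.loc[movies_data['title'] == title, 'poster'] = \
#             f"https://image.tmdb.org/t/p/w500{results[0]['poster_path']}"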
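For reference, the TMDB search endpoint returns JSON with a `results` list whose entries carry a `poster_path` such as `/abc.jpg` (already starting with a slash), which can also be `null`. A minimal sketch of turning one parsed response into a poster URL, using a hypothetical helper `poster_url` and fake response dicts so that no network call is needed:

```python
from typing import Optional

IMAGE_BASE = "https://image.tmdb.org/t/p/w500"  # same image base and size as the cell above

def poster_url(search_response: dict) -> Optional[str]:
    """Return the w500 poster URL of the first search hit, or None if there is none.

    `poster_url` is a hypothetical helper; `search_response` is the parsed JSON
    of a TMDB /search/movie call.
    """
    results = search_response.get("results") or []
    if not results or not results[0].get("poster_path"):
        return None  # no match, or the match has no poster
    # poster_path already begins with "/", so concatenate without an extra slash
    return IMAGE_BASE + results[0]["poster_path"]

# fake responses shaped like TMDB search results (values are made up)
hit = {"results": [{"title": "Tangled", "poster_path": "/example.jpg"}]}
miss = {"results": []}
print(poster_url(hit))   # https://image.tmdb.org/t/p/w500/example.jpg
print(poster_url(miss))  # None
```

Returning `None` instead of indexing `results[0]` blindly keeps a single unmatched title from stopping the whole loop.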

## Content-based filtering (credits, genres, and keywords)

In [4]:
# use CountVectorizer() rather than TF-IDF: we do not want to down-weight an actor or director just because he or she appears in relatively many movies.
from sklearn.feature_extraction.text import CountVectorizer
count=CountVectorizer(stop_words='english')
count_matrix=count.fit_transform(movies_data['soup'])
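The comment above contrasts raw counts with TF-IDF weighting. On three made-up "soups" in which one actor appears in every movie, the difference looks like this (toy data, not from the dataset):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# toy "soups" (made up): one prolific actor appears in every movie
docs = [
    "tomhanks pixar animation",
    "tomhanks spielberg drama",
    "tomhanks zemeckis drama",
]

count = CountVectorizer(stop_words="english")
counts = count.fit_transform(docs).toarray()
tfidf = TfidfVectorizer(stop_words="english")
weights = tfidf.fit_transform(docs).toarray()

hanks = count.vocabulary_["tomhanks"]

# CountVectorizer: the actor keeps a raw count of 1 in every movie
print(counts[:, hanks])  # [1 1 1]
# TF-IDF: the same actor is down-weighted relative to a rarer token in movie 0
print(weights[0, tfidf.vocabulary_["tomhanks"]] < weights[0, tfidf.vocabulary_["pixar"]])  # True
```

With TF-IDF, sharing a very prolific actor would contribute less to the similarity between two movies, which is exactly what the comment wants to avoid.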

In [5]:
# compute the cosine similarity matrix based on the count_matrix
from sklearn.metrics.pairwise import cosine_similarity
cos_sim=cosine_similarity(count_matrix,count_matrix)
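`cosine_similarity` computes, for every pair of rows $a, b$ of the count matrix, $\cos(a,b) = \frac{a \cdot b}{\lVert a\rVert\,\lVert b\rVert}$, so two movies score 1 when their soups have proportional token counts and 0 when they share no tokens. A small check of that definition on made-up count vectors:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# made-up count vectors for three movies over a 4-token vocabulary
X = np.array([
    [1, 1, 0, 0],
    [2, 2, 0, 0],   # same proportions as movie 0
    [0, 0, 1, 1],   # shares no tokens with movie 0
])

sim = cosine_similarity(X, X)

# the same quantity computed from the definition a.b / (|a||b|)
norms = np.linalg.norm(X, axis=1)
manual = (X @ X.T) / np.outer(norms, norms)

print(np.allclose(sim, manual))                   # True
print(round(sim[0, 1], 6), round(sim[0, 2], 6))   # 1.0 0.0
```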

## Dashboard and Hybrid Engine

In [6]:
# map each movie title to its row index (keep the first row if a title is duplicated)
indices = pd.Series(movies_data.index, index=movies_data['title'])
indices = indices[~indices.index.duplicated(keep='first')]
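`Series.drop_duplicates()` removes duplicate *values*. Since every row index is unique, it removes nothing, and a title that occurs twice makes `indices[title]` return a Series instead of a single integer. A small illustration on a made-up frame, comparing that with keeping the first occurrence of each title via `Index.duplicated`:

```python
import pandas as pd

# made-up data with a repeated title (e.g. a remake sharing its name)
movies = pd.DataFrame({"title": ["Tangled", "Heat", "Heat"]})

naive = pd.Series(movies.index, index=movies["title"]).drop_duplicates()
print(type(naive["Heat"]).__name__)  # Series: both rows survive because their values 1 and 2 differ

dedup = pd.Series(movies.index, index=movies["title"])
dedup = dedup[~dedup.index.duplicated(keep="first")]
print(dedup["Heat"])  # 1
```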

In [7]:
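def hybrid(movie, n=20):
    """Return the n movies most similar to `movie`, re-ranked by predicted rating (rating_2)."""
    idx = indices[movie]
    # rank all movies by cosine similarity, most similar first, and drop the query movie itself
    ranked = np.argsort(cos_sim[idx])[::-1]
    ranked = ranked[ranked != idx][:n]
    # sort the candidates by predicted rating; sort_values returns a copy (no SettingWithCopyWarning)
    return movies_data.loc[ranked].sort_values('rating_2', ascending=False).reset_index(drop=True)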
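`hybrid` is a two-stage ranker: content similarity chooses the candidate pool (the 20 nearest neighbours of the query movie), and the `rating_2` column, which appears to hold the predicted rating of each movie, decides their order. The same idea on made-up numbers, with `k=3` instead of 20 and a hypothetical standalone function `two_stage_rank`:

```python
import numpy as np

def two_stage_rank(sim_row, ratings, query_idx, k):
    """Hypothetical standalone version of the hybrid() logic.

    Stage 1: keep the k rows most similar to query_idx (excluding itself).
    Stage 2: order those candidates by predicted rating, highest first.
    """
    order = np.argsort(sim_row)[::-1]              # most similar first
    candidates = order[order != query_idx][:k]     # drop the query movie, keep top k
    return candidates[np.argsort(-ratings[candidates], kind="stable")]

sim_row = np.array([1.0, 0.9, 0.8, 0.7, 0.1])   # similarity of each movie to movie 0
ratings = np.array([4.0, 3.0, 4.5, 4.0, 5.0])   # made-up predicted ratings
print(two_stage_rank(sim_row, ratings, query_idx=0, k=3))  # [2 3 1]
```

Movie 4 has the best predicted rating but is never shown, because it is not similar enough to make the candidate pool; that is the trade-off this design makes.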

In [8]:
df=hybrid('Tangled')

In [9]:
def update_click(widget, event, data):
    """Refresh the top-5 list from the ranges brushed on the parallel coordinates plot."""
    selected = None  # None means no axis has a brush yet
    for dim in figure.data[0]['dimensions']:
        ranges = dim['constraintrange']
        if not isinstance(ranges, (list, tuple)):
            continue  # this axis has no brush
        if not isinstance(ranges[0], (list, tuple)):
            ranges = [ranges]  # a single brush: wrap it so it looks like several
        col = df[dim['label'].lower()]
        # union of all brushed ranges on this axis (inclusive bounds)
        matches = set()
        for low, high in ranges:
            matches |= set(df.index[(col >= low) & (col <= high)])
        # intersection across axes
        selected = matches if selected is None else selected & matches

    # no brush at all: show the first five rows; otherwise the five best-ranked matches
    top = list(range(min(5, len(df)))) if selected is None else sorted(selected)[:5]
    if not top:
        movies_lst['lst'].children = []  # nothing matches the current brushes
        return

    for i, row in enumerate(top):
        movies_num[f'{i}'].children = [f'{row + 1}']
        movies_name[f'{i}'].children = [f"{df['title'].loc[row]}"]
        movies_genre[f'{i}'].children = [f"Genre: {df['genre_name'].loc[row]}"]
        movies_lan[f'{i}'].children = [f"Language: {df['language_name'].loc[row]}"]
        movies_run[f'{i}'].children = [f"Running time: {df['runtime_name'].loc[row]}"]
        movies_rate[f'{i}'].children = [f"{df['rating_2'].loc[row]}"]

    movies_lst['lst'].children = [movies_rows[f'row_{i}'] for i in range(len(top))]
    movies_title.children = [movies_name['0'].children[0]]
    movies_poster.src = df.loc[top[0]].poster
    movies_cast.children = [df.loc[top[0]].cast]
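The filtering in `update_click` follows the usual brushing semantics of a parallel coordinates plot: multiple brushed ranges on one axis are combined with a union (OR), and the results of different axes are intersected (AND). A pure-pandas sketch of that rule, independent of the widgets, with a hypothetical `filter_by_ranges` helper and made-up data:

```python
import pandas as pd

def filter_by_ranges(df, constraints):
    """Return index labels of rows satisfying every axis constraint.

    `constraints` maps a column name to a list of (low, high) ranges; a row passes
    an axis if it falls in ANY of that axis' ranges (inclusive) and must pass ALL axes.
    """
    mask = pd.Series(True, index=df.index)
    for col, ranges in constraints.items():
        axis_mask = pd.Series(False, index=df.index)
        for low, high in ranges:
            axis_mask |= df[col].between(low, high)  # union within one axis
        mask &= axis_mask                            # intersection across axes
    return df.index[mask].tolist()

movies = pd.DataFrame({
    "year":    [1994, 2001, 2010, 2016],
    "runtime": [142, 125, 100, 90],
})
# two brushes on "year" and one on "runtime"
print(filter_by_ranges(movies, {"year": [(1990, 1999), (2009, 2016)],
                                "runtime": [(95, 150)]}))  # [0, 2]
```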

In [10]:
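def update_title(widget, event, data):
    """Show the title, poster and cast of the clicked row in the detail card."""
    # widget is a ListItem; children[1] is the ListItemContent whose first child holds the title
    title = widget.children[1].children[0].children[0]
    row = df[df['title'] == title].iloc[0]
    movies_title.children = [title]  # wrap in a list; list(title) would split it into characters
    movies_poster.src = row['poster']
    movies_cast.children = [row['cast']]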

In [11]:
def update_everything(widget,event,movie):
    global df
    df=hybrid(movie)
    
    # update the parallel coordinates: new values, brushes cleared
    for dim, col in zip(figure.data[0]['dimensions'], ['genre', 'year', 'language', 'runtime']):
        dim.values = df[col]
        dim.constraintrange = None
    figure.data[0]['line'].color = df['genre']
    
    #update the list
    for i in range(5):
        movies_num[f'{i}'].children=[f'{i+1}']
        movies_name[f'{i}'].children=[f"{df['title'][i]}"]
        movies_genre[f'{i}'].children=[f"Genre: {df['genre_name'][i]}"]
        movies_lan[f'{i}'].children=[f"Language: {df['language_name'][i]}"]
        movies_run[f'{i}'].children=[f"Running time: {df['runtime_name'][i]}"]
        movies_rate[f'{i}'].children=[f"{df['rating_2'][i]}"]

    movies_lst['lst'].children=[movies_rows[f'row_{i}'] for i in range(5)]
    movies_title.children=[movies_name['0'].children[0]]
    movies_poster.src=df['poster'][0]
    movies_cast.children=[df['cast'][0]]

In [12]:
# parallel_coordinates widget
fig = go.Figure(data=go.Parcoords(
    line=dict(color=df['genre'],
              colorscale='Inferno',
              showscale=True,
              cmin=1,
              cmax=6),
    dimensions=[
        dict(range=[1, 6],
             tickvals=[1, 2, 3, 4, 5, 6],
             ticktext=['Action', 'Horror', 'Comedy', 'Animation', 'Drama', 'Others'],
             label='Genre', values=df['genre']),
        dict(range=[1916, 2016],
             label='Year', values=df['year']),
        dict(range=[1, 5],
             tickvals=[1, 2, 3, 4, 5],
             ticktext=['English', 'French', 'Spanish', 'Japanese', 'Others'],
             label='Language', values=df['language']),
        dict(range=[25, 254],
             label='Runtime', values=df['runtime']),
    ],
))
fig.update_layout(
    # height=400,
    autosize=True,
    plot_bgcolor='white',
    paper_bgcolor='white',
);
figure=go.FigureWidget(fig)
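The Genre and Language axes are categorical: each movie is plotted at an integer code (1 to 6 for genre, 1 to 5 for language) and `ticktext` supplies the labels. The CSV already contains these `genre` and `language` code columns; the sketch below only illustrates one way such codes could be derived, assuming each movie has a single primary genre name and using a hypothetical `genre_code` helper:

```python
import pandas as pd

# axis order used by the Genre dimension's tickvals/ticktext; anything else maps to "Others" (6)
GENRES = ["Action", "Horror", "Comedy", "Animation", "Drama"]

def genre_code(primary_genre):
    """Hypothetical: map a primary genre name to the 1-6 code plotted on the Genre axis."""
    return GENRES.index(primary_genre) + 1 if primary_genre in GENRES else len(GENRES) + 1

names = pd.Series(["Drama", "Animation", "Western"])   # made-up primary genres
print(names.map(genre_code).tolist())  # [5, 4, 6]
```

Because `cmin=1` and `cmax=6` match this code range, the line colour doubles as a genre legend.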

In [13]:
# movies list widget
movies_num={f'{i}': v.Html(tag='span',class_='white--text',children=[f'{i+1}']) for i in range(5)}
movies_name={f'{i}': v.ListItemTitle(class_='deep-purple--text text--accent-4',children=[f"{df['title'][i]}"]) for i in range(5)}
movies_genre={f'{i}': v.ListItemSubtitle(children=[f"Genre: {df['genre_name'][i]}"]) for i in range(5)}
movies_lan={f'{i}': v.ListItemTitle(children=[f"Language: {df['language_name'][i]}"]) for i in range(5)}
movies_run={f'{i}': v.ListItemSubtitle(children=[f"Running time: {df['runtime_name'][i]}"]) for i in range(5)}
movies_rate={f'{i}': v.ListItemTitle(children=[v.Html(tag='p',class_='font-weight-bold font-italic',children=[f"{df['rating_2'][i]}"])]) for i in range(5)}

movies_rows={f'row_{i}':    v.ListItem(v_model=1,two_line=True,children=[
                            v.ListItemIcon(children=[v.Avatar(color='teal',size=25,children=[movies_num[f'{i}']])]),
                            v.ListItemContent(children=[movies_name[f'{i}'],
                                                       movies_genre[f'{i}']]),
                            v.ListItemContent(class_='text-center',children=[movies_lan[f'{i}'],
                                                       movies_run[f'{i}']]),
                            v.ListItemContent(class_='text-right',children=[movies_rate[f'{i}']]),
                            v.ListItemIcon(class_='text-right',children=[v.Icon(medium=True,color='yellow darken-1',children=['mdi-star'])]),
                        ])  for i in range(5)}

movies_lst={'lst':v.ListItemGroup(children=[movies_rows[f'row_{i}'] for i in range(5)])}
for i in range(5):
    movies_rows[f'row_{i}'].on_event('click',update_title)

In [14]:
# movie title widget & img widget & genres widget
movies_title=v.ListItemTitle(class_='deep-purple--text',children=[f"{df['title'][0]}"])
movies_poster=v.Img(width=200,src=f"{df['poster'][0]}",contain=True)
movies_cast=v.Chip(color='deep-purple',small=True,dark=True,children=[f"{df['cast'][0]}"])

In [15]:
# select movie button widget
select_w=v.Select(dense=True,solo=True,label='Movies',hint='Pick your favorite movie',prepend_icon="mdi-movie-roll",persistent_hint=True,items=list(movies_data['title']))
select_w.on_event('change',update_everything)

In [16]:
# update button widget
plot_update=v.Btn(outlined=True,color='deep-purple',small=True,children=[v.Icon(children=['mdi-sync'])])
plot_update.on_event('click',update_click)

In [17]:
# assemble the dashboard: selector and plot on top, movie list and detail card below
v.Html(tag='div', children=[
    v.Row(children=[
        v.Col(children=[
            select_w,
            v.Card(elevation=4, class_='overflow-y-auto', children=[figure]),
        ]),
    ]),
    v.Row(children=[
        # left: top-5 movie list with the refresh button
        v.Col(xl=11, lg=10, md=10, sm=9, cols=9, class_='pb-2', children=[
            v.Card(children=[
                v.ListItem(children=[
                    v.ListItemContent(children=[
                        v.ListItemTitle(class_='deep-purple--text text-uppercase', children=['Movies']),
                    ]),
                    plot_update,
                ]),
                v.Divider(),
                v.List(dense=True, shaped=True, children=[movies_lst['lst']]),
            ]),
        ]),
        # right: title, poster and cast of the selected movie
        v.Col(xl=1, lg=2, md=2, sm=3, cols=3, class_='pb-2', children=[
            v.Card(children=[
                v.ListItem(children=[v.ListItemContent(children=[movies_title])]),
                v.Divider(),
                movies_poster,
                v.Divider(),
                v.CardText(children=[
                    v.Html(tag='span', class_='subheading', children=['Cast:  ']),
                    movies_cast,
                ]),
            ]),
        ]),
    ]),
])



## Usage Instruction
This system picks the 20 movies most similar to your selection and sorts them by their predicted rating, which is based on previous ratings. You can then add constraints by brushing on the parallel coordinates plot and press the refresh button; the list will show the five movies predicted to get the highest ratings.

Features:
* `Selection` - Select a movie you like.
* `Parallel Coordinates Plot` - Brush along any axis to narrow the list by genre, year, language, or runtime; each axis accepts multiple ranges, and a range can be dragged or resized.
* `Movies List` - Click the refresh button after you finish brushing on the plot to update the list; click a row to see that movie's poster and cast.

NOTES:
* Because the system needs a user's own ratings to predict their preferences (among other reasons, such as the overall complexity), it currently reflects only my preferences. Also, since I mostly watch drama, animation, and action movies, its predictions for genres I rarely watch (horror, thriller, etc.) are less accurate.
* <p style='color:red'> It may take a minute or two for the dashboard to appear. Try reloading the page if nothing shows up.</p>