<h1 style='color:navy'> Recommender System Using Neighbourhood Similarity

In [1]:
import numpy as np
import pandas as pd
from plotly.offline import init_notebook_mode,iplot
import cufflinks as cf
import plotly.figure_factory as ff
init_notebook_mode(connected=True)
cf.go_offline(connected=True)
cf.set_config_file(offline=False, world_readable=True, theme='pearl')

Download Dataset [here](https://raw.githubusercontent.com/sureshgorakala/RecommenderSystems_R/master/movie_rating.csv)

In [2]:
df=pd.read_csv('movie_rating.csv')

In [3]:
fig=ff.create_table(df,colorscale='Cividis')
iplot(fig)

    All Ratings are in range of 1-5.

In [4]:
dff=df.groupby(['critic']).count()
dff['index']=dff.index
dff.iplot(kind='bar',x='index',y='rating',colors=['darksalmon'])

    There are 6 users in total, and each of them rated at most 6 movies.
    Toby rated only 3 movies.

### Pivot the Data into a Movie-by-User Matrix with Ratings as Values

In [5]:
df_r=df.pivot_table(index='title',values='rating',columns=['critic'])
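
The reshaping step can be tried on a small table first. The following is a minimal sketch on hypothetical data (the critics `A`, `B`, `C` and titles `M1`, `M2`, `M3` are made up for illustration and are not part of the movie rating data set); it only shows that `pivot_table` turns one row per rating into a movie-by-critic grid, leaving `NaN` where a critic did not rate a movie.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (critic, title) pair
toy = pd.DataFrame({
    'critic': ['A', 'A', 'B', 'B', 'C'],
    'title':  ['M1', 'M2', 'M1', 'M3', 'M2'],
    'rating': [3.0, 4.0, 2.5, 5.0, 1.0],
})

# Rows become movies, columns become critics; unrated pairs become NaN
wide = toy.pivot_table(index='title', columns='critic', values='rating')
print(wide)
```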

In [6]:
df_r.insert(0,'Movies',df_r.index)
fig=ff.create_table(df_r,colorscale='Hot')
df_r.drop(columns=['Movies'],inplace=True)
iplot(fig)

<h1 style='color:purple'> Aim: Recommend a Movie for Toby

<h2 style='color:ivory;background-color:black'> User-Based Collaborative Filtering Using Pearson Correlation

In [7]:
corr=df_r.corr()

In [8]:
fig=ff.create_annotated_heatmap(z=corr.values.round(3),x=list(corr.index),y=list(corr.columns),
                                colorscale='Blackbody')
iplot(fig)

    From the correlation matrix above:
    Lisa Rose, Mick LaSalle, and Claudia Puig are the users most strongly correlated with Toby.
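
To make the similarity measure concrete, here is a minimal sketch of a Pearson correlation computed only over the movies that both users rated, which is how `DataFrame.corr()` treats missing values by default. The helper name `pearson_on_shared` and the users `U1` and `U2` are hypothetical and used for illustration only.

```python
import numpy as np
import pandas as pd

def pearson_on_shared(u, v):
    """Pearson correlation of two rating Series over items both users rated."""
    shared = u.notna() & v.notna()
    x = u[shared].to_numpy(dtype=float)
    y = v[shared].to_numpy(dtype=float)
    xc = x - x.mean()  # centre each user's ratings on their own mean
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Hypothetical ratings: U1 has not rated M4
toy = pd.DataFrame({
    'U1': [1.0, 2.0, 3.0, np.nan],
    'U2': [2.0, 4.0, 6.5, 1.0],
}, index=['M1', 'M2', 'M3', 'M4'])

manual = pearson_on_shared(toy['U1'], toy['U2'])
builtin = toy.corr().loc['U1', 'U2']  # pandas also drops NaN pairs
print(manual, builtin)
```

The two values should agree, since both ignore `M4`, the movie only one of the users rated.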

#### Separate Movies not rated by Toby

In [9]:
df_nr=df_r[df_r['Toby'].isnull()].drop(columns=['Toby'])

In [10]:
df_nr.insert(0,'Movies',df_nr.index)
fig=ff.create_table(df_nr,colorscale='YlOrRd')
df_nr.drop(columns=['Movies'],inplace=True)
iplot(fig)

    Toby rated only 3 of the 6 movies.
    
    
    To predict his rating for each unrated movie, take a weighted average of the other users' ratings,
    using each user's correlation with Toby as the weight.
    
    As a first step,
    -> For each movie,
        -> Multiply the rating given by each user by that user's correlation with Toby.
        
        
    Ex: For the movie 'Just My Luck', Claudia Puig gave a rating of 3.0.
        Multiplying by the correlation of (Toby, Claudia Puig) = 0.893 gives 3.0 * 0.893, approx. 2.679.

In [11]:
# Correlation of every user with Toby, selected by name rather than by column position
rat_toby=corr['Toby'].values

In [12]:
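# Scale each user's column by that user's correlation with Toby (NaN ratings stay NaN).
# This avoids chained assignment such as df.iloc[i][j]*=..., which may not write back to the frame.
df_rr=df_r.mul(rat_toby,axis='columns')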

In [13]:
df_rr=df_rr[df_rr['Toby'].isnull()].copy()

In [14]:
df_rr.drop(columns=['Toby'],inplace=True)

In [15]:
df_rr.insert(0,'Movies',df_rr.index)
fig=ff.create_table(df_rr.round(5),colorscale='Greys')
df_rr.drop(columns=['Movies'],inplace=True)
iplot(fig)

    Now sum the weighted ratings for each movie and store the result in a column named 'sim_rating_sum'.

In [16]:
df_rr['sim_rating_sum']=df_rr.sum(axis=1)

    For each movie, calculate the sum of the similarity (correlation) values of all users who rated it.
    Ignore users who have not rated the movie.
    Store these values in a column named 'similarity_sum'.

In [17]:
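sim_sum=[]
for i in range(df_rr.shape[0]):
    sc=0
    # Skip the last column ('sim_rating_sum'); the remaining columns are the other users
    for j in range(df_rr.shape[1]-1):
        # Count a user's similarity only if that user rated this movie
        if pd.notna(df_rr.iloc[i,j]):
            sc+=rat_toby[j]
    sim_sum.append(sc)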

In [18]:
df_rr['similarity_sum']=sim_sum

    Divide 'sim_rating_sum' by 'similarity_sum' to get the rating Toby is predicted to give each movie.
    Store this in a column named 'pred_rating'.
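
The whole user-based prediction (multiply, sum, divide) can be written in a few vectorised lines. This is a minimal sketch under stated assumptions: the function name `predict_user_based` and the toy users `U1` and `U2` with their similarities are hypothetical, and the real notebook works on `df_rr` and `rat_toby` instead.

```python
import numpy as np
import pandas as pd

def predict_user_based(ratings, sim):
    """Similarity-weighted mean rating per movie, skipping missing ratings."""
    # Numerator: sum of (rating * similarity); NaN ratings are skipped by sum()
    weighted_sum = ratings.mul(sim, axis='columns').sum(axis=1)
    # Denominator: only users who rated the movie contribute their similarity
    rated = ratings.notna().astype(float)
    sim_sum = rated.mul(sim, axis='columns').sum(axis=1)
    return weighted_sum / sim_sum

# Hypothetical data: rows are movies, columns are other users
ratings = pd.DataFrame({'U1': [3.0, np.nan], 'U2': [4.0, 2.0]}, index=['M1', 'M2'])
sim = pd.Series({'U1': 0.5, 'U2': 0.25})  # similarity of each user with the target user

pred = predict_user_based(ratings, sim)
print(pred)
```

For `M1` both users count, giving (3.0 * 0.5 + 4.0 * 0.25) / 0.75; for `M2` only `U2` rated it, so the prediction is simply 2.0.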

In [19]:
df_rr['pred_rating']=df_rr['sim_rating_sum'] / df_rr['similarity_sum']

In [20]:
df_rr.insert(0,'Movies',df_nr.index)
fig=ff.create_table(df_rr.round(5),colorscale='Greens')
df_rr.drop(columns=['Movies'],inplace=True)
iplot(fig)

    The column 'pred_rating' shows the approximate rating Toby would be expected to give 
    each movie he has not yet rated.

#### Making a Recommendation for Toby Based on the Weighted Average Rating (pred_rating)

In [21]:
# mean() skips NaN, so this averages only the movies Toby actually rated
avg_rat_toby=df_r['Toby'].mean()

In [22]:
avg_rat_toby

3.1666666666666665

    Toby's average rating is about 3.17.

#### Assume Toby Likes a Movie When Its Predicted Rating Is Above His Average (> 3.16667)

In [23]:
df_rr[df_rr['pred_rating']> avg_rat_toby].index.tolist()

['The Night Listener']

    The predicted rating for 'The Night Listener' is 3.347, which is above Toby's average,
    so this movie can be recommended to Toby based on his previous ratings and his similarity to other users.
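
The selection rule used above (recommend whatever is predicted above the user's own average) can be sketched as a small helper. The name `recommend` and the titles and values below are hypothetical; this is one simple thresholding choice, not the only reasonable one.

```python
import pandas as pd

def recommend(pred, seen):
    """Titles whose predicted rating exceeds the user's mean observed rating."""
    threshold = seen.mean()  # mean() ignores NaN entries
    return pred[pred > threshold].index.tolist()

# Hypothetical data: ratings the user already gave, and predictions for unseen titles
seen = pd.Series({'M1': 5.0, 'M2': 3.0, 'M3': 1.0})
pred = pd.Series({'M4': 3.4, 'M5': 2.8})

picks = recommend(pred, seen)
print(picks)
```

With a mean of 3.0 over the seen titles, only `M4` passes the threshold.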

<h2 style='color:ivory;background-color:black'> Item-Based Collaborative Filtering Using Cosine-Similarity

In [24]:
df_i=df.pivot_table(index='critic',columns=['title'],values='rating')

In [25]:
df_i.insert(0,'Movies',df_i.index)
fig=ff.create_table(df_i,colorscale='Portland')
df_i.drop(columns=['Movies'],inplace=True)
iplot(fig)

#### Calculate Movie Similarity Using Cosine Similarity

In [26]:
mov_name=df_i.columns

In [27]:
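import math
def dot_prod(a,b):
    # Sum of element-wise products of two equal-length sequences
    summ=0
    for i,j in zip(a,b):
        summ+= i*j
    return summ
def cosine(a,b):
    # Cosine similarity: dot product divided by the product of the vector norms
    dot=dot_prod(a,b)
    norm_a=math.sqrt(dot_prod(a,a))
    norm_b=math.sqrt(dot_prod(b,b))
    return dot / (norm_a * norm_b)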

In [28]:
cos_d=[]
for i in mov_name:
    coss=[]
    for j in mov_name:
        # Compare the two movies only over critics who rated both of them
        both=df_i[i].notna() & df_i[j].notna()
        coss.append(cosine(df_i.loc[both,i],df_i.loc[both,j]))
    cos_d.append(coss)
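
For comparison, the same idea (cosine similarity restricted to critics who rated both movies) can be sketched with NumPy. The helper name `cosine_on_shared` and the vectors `m1` and `m2` are hypothetical and only illustrate the masking step.

```python
import numpy as np

def cosine_on_shared(a, b):
    """Cosine similarity over positions where both vectors are non-NaN."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    shared = ~np.isnan(a) & ~np.isnan(b)  # keep only co-rated positions
    a, b = a[shared], b[shared]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical ratings of two movies by three critics; the second critic skipped m1
m1 = [3.0, np.nan, 4.0]
m2 = [6.0, 1.0, 8.0]

sim = cosine_on_shared(m1, m2)
print(sim)
```

Here the shared ratings are (3, 4) and (6, 8), which point in the same direction, so the similarity is 1.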

In [29]:
df_cs=pd.DataFrame(data=cos_d,columns=df_i.columns,index=df_i.columns)

In [30]:
fig=ff.create_annotated_heatmap(z=df_cs.values.round(3),x=list(df_cs.columns),y=list(df_cs.columns),
                                colorscale='Hot')
iplot(fig)

    For each movie Toby has not rated,
    calculate a weighted average of the ratings Toby gave to the movies he has rated,
    using the similarity between the unrated movie and each rated movie as the weight.
    
    
    Rating for unrated Movie ==> 
    Sum(similarity_with_other_movies * rating_of_other_movies_by_toby) / Sum(similarity_with_other_movies)
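
The formula above can be sketched for a single unrated movie as follows. The function name `predict_item_based`, the titles `M1` to `M3`, and the similarity values are hypothetical; in the notebook the similarities come from `df_cs` and the ratings from `df_i.loc['Toby']`.

```python
import numpy as np
import pandas as pd

def predict_item_based(sim_to_target, user_ratings):
    """Similarity-weighted mean of the user's own ratings for one unrated movie."""
    rated = user_ratings.dropna()          # movies the user has rated
    weights = sim_to_target[rated.index]   # similarity of the target movie to each of them
    return float((weights * rated).sum() / weights.sum())

# Hypothetical data: the user has not rated M3
user_ratings = pd.Series({'M1': 4.0, 'M2': 2.0, 'M3': np.nan})
sim_to_m3 = pd.Series({'M1': 0.75, 'M2': 0.25, 'M3': 1.0})

pred_m3 = predict_item_based(sim_to_m3, user_ratings)
print(pred_m3)
```

The result is (0.75 * 4.0 + 0.25 * 2.0) / (0.75 + 0.25); the target movie's similarity to itself is never used because the user has no rating for it.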

In [32]:
mov=[]
rat=[]
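for k,f in zip(df_i.columns,df_i.loc['Toby'].isnull().tolist()):
    # f is True when Toby has not rated movie k; skip movies he already rated
    if not f:
        continue
    den_sum=0
    num_sum=0
    # i: similarity of movie k to another movie, j: Toby's rating of that other movie
    for i,j in zip(df_cs.loc[k].values,df_i.loc['Toby']):
        if pd.notna(j):
            num_sum+= i*j
            den_sum+= i
    mov.append(k)
    rat.append(num_sum/den_sum)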

In [33]:
fig=ff.create_table(pd.DataFrame(data=np.array([mov,rat]).T,columns=['Movie','Rating']),colorscale='Viridis')
iplot(fig)

    All predicted ratings for the unrated movies are > 3.16667, which is Toby's average rating, 
    so all of these movies can be recommended to Toby.

<p> <font size=2><b>Note:</b> Why are all of these movies recommended to Toby here when they were not by <b>User-Based Collaborative Filtering</b>? Because the data is limited and the recommendations are generated from ratings alone.</font>