# ITEM BASED COLLABORATIVE FILTERING RECOMMENDER

To recommend movies to a user based on the ratings that user has previously given to other movies, I have used collaborative filtering, a technique that is popular for mining large datasets.

Collaborative Filtering recommends items based on similarity measures between users and/or items. The basic assumption behind the algorithm is that users with similar interests have common preferences.
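As a small illustration of a similarity measure between items, the sketch below computes the cosine similarity between rating vectors. The three movies and their ratings are made-up values, and the helper name cosine_similarity is only for this example.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two rating vectors (0.0 if either is all zeros)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / norm) if norm > 0 else 0.0

# Hypothetical ratings given by four users (0 = not rated).
movie_a = [5, 4, 0, 1]
movie_b = [4, 5, 0, 1]
movie_c = [1, 0, 5, 4]

sim_ab = cosine_similarity(movie_a, movie_b)  # rated alike by the same users
sim_ac = cosine_similarity(movie_a, movie_c)  # rated very differently
print(sim_ab, sim_ac)
```

Movies that the same users rate in a similar way end up with a similarity close to 1, which is the property the item-based recommender relies on.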

In [75]:
import pandas as pd
import numpy as np
from sklearn.neighbors import NearestNeighbors

The dataset is from MovieLens.

In [76]:
ratings = pd.read_csv('ratings.csv', usecols=['userId','movieId','rating'])
movies = pd.read_csv('movies.csv', usecols=['movieId','title'])
merged_ratings = pd.merge(ratings, movies, on='movieId')  

In [77]:
df = merged_ratings.pivot_table(index='title',columns='userId',values='rating').fillna(0)
df1 = df.copy()
df.head()

title \ userId,1,2,3,4,5,6,7,8,9,10,...,601,602,603,604,605,606,607,608,609,610
'71 (2014),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0
'Hellboy': The Seeds of Creation (2004),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
'Round Midnight (1986),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
'Salem's Lot (2004),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
'Til There Was You (1997),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
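To make the shape of this matrix easier to see, here is a minimal sketch of the same pivot on a tiny, made-up ratings table. The frame toy and its values are assumptions for illustration only, not part of the MovieLens data.

```python
import pandas as pd

# Hypothetical toy ratings: three users, three movies.
toy = pd.DataFrame({
    'userId': [1, 1, 2, 3],
    'title': ['Movie A', 'Movie B', 'Movie A', 'Movie C'],
    'rating': [4.0, 3.5, 5.0, 2.0],
})

# Rows are movies, columns are users; a missing rating becomes 0.
toy_matrix = toy.pivot_table(index='title', columns='userId', values='rating').fillna(0)
print(toy_matrix)
```

Each row is one movie's rating vector across all users, and a 0 stands for "not rated" rather than for a real rating of zero.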


## Algorithm for recommending movies to a user

Here we recommend movies based on the rating that the user is predicted to give to each unwatched movie.

#### Algorithm
Step 1: Similarities between movies are calculated using the k-nearest-neighbours (KNN) algorithm (the default value of k is 10)\
Step 2: For each movie that the user has not seen, calculate the predicted rating\
Step 3: Recommend the movies with the highest predicted ratings
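Step 1 can be sketched on a small matrix as follows. The 4x4 matrix is made up for illustration, and algorithm='brute' is set explicitly here as an assumption about how the cosine metric is searched; the notebook itself leaves that choice to scikit-learn.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical movie-by-user matrix (rows = movies, columns = users, 0 = unrated).
item_matrix = np.array([
    [5.0, 4.0, 0.0, 1.0],   # movie 0
    [4.0, 5.0, 0.0, 1.0],   # movie 1
    [1.0, 0.0, 5.0, 4.0],   # movie 2
    [0.0, 1.0, 5.0, 4.0],   # movie 3
])

# Fit on the movie vectors, then ask for each movie's 3 nearest movies.
knn = NearestNeighbors(metric='cosine', algorithm='brute')
knn.fit(item_matrix)
distances, indices = knn.kneighbors(item_matrix, n_neighbors=3)

# Cosine distance is 1 - cosine similarity, so convert back for use as weights.
similarities = 1 - distances
print(indices[0], similarities[0])
```

Because the query points are the training points themselves, each movie normally comes back as its own nearest neighbour at distance 0, which is why the recommender removes it before predicting.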

#### Predict a rating
1) Find the movies closest to the unrated movie, based on the similarity calculated using KNN.\
2) For these similar movies, calculate the weighted average of the user's ratings, using the similarity between movies as the weight. The weight is 1 - distance, which for the cosine metric used here is the cosine similarity.\
3) The weighted average is used as the predicted rating.

##### Formula
predicted rating = sum(user's rating for similar movie * similarity with unrated movie) / sum(similarity), where both sums run only over the similar movies that the user has rated
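A worked example of the formula, as a minimal sketch: the function name predict_rating and the numbers are assumptions made for illustration. Neighbours with a rating of 0 are treated as unrated and left out of both sums.

```python
def predict_rating(neighbor_ratings, neighbor_similarities):
    """Similarity-weighted average over the neighbours the user has rated."""
    score = 0.0
    weight_sum = 0.0
    for rating, similarity in zip(neighbor_ratings, neighbor_similarities):
        if rating != 0:  # 0 means "not rated", so this neighbour is skipped
            score += rating * similarity
            weight_sum += similarity
    # With no rated neighbours (or zero total similarity) fall back to 0.
    return score / weight_sum if weight_sum > 0 else 0.0

# The user rated the first and third similar movies, but not the second.
ratings = [4.0, 0.0, 2.0]
similarities = [0.9, 0.8, 0.3]
predicted = predict_rating(ratings, similarities)
print(predicted)  # (4.0*0.9 + 2.0*0.3) / (0.9 + 0.3), approximately 3.5
```

The prediction always lies between the lowest and highest rating the user gave to the rated neighbours, and it is pulled towards the ratings of the most similar ones.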

In [78]:
def Sort(li):
    # sort (title, predicted rating) pairs by rating, highest first
    return sorted(li, key=lambda x: x[1], reverse=True)

def recommend_movies(user, num_recommended_movies):
    # a rating of 0 marks an unwatched movie, so watched movies have rating > 0
    watched_movies = df[df[user] > 0].index.tolist()

    print('The list of the movies user id '+str(user)+' has Watched')
    for title in watched_movies:
        print(title)

    print('\n')

    # movies not rated by the user in the original matrix (df);
    # their predicted ratings are read from the working copy (df1)
    not_rated_movies = df[df[user] == 0].index.tolist()
    recommended_movies = [(title, df1.loc[title, user]) for title in not_rated_movies]

    # sort by decreasing predicted rating and keep only the requested number
    lim_recomend_movies = Sort(recommended_movies)[:num_recommended_movies]

    print('Recommended movies for '+str(user)+' are:')
    for rank, (title, rating) in enumerate(lim_recomend_movies, start=1):
        print(str(rank)+': '+str(title)+' (predicted rating: '+str(rating)+')')

The default number of recommendations is 10, but it can be changed through the value of num_recommendation. The number of neighbours used by KNN is set the same way through number_neighbors.

In [79]:
def recommender_system(user):
    number_neighbors=10
    num_recommendation=10
    
    # finding nearest neighbours to given item using NearestNeighbors
    knn = NearestNeighbors(metric='cosine')
    knn.fit(df.values)
    distances, indices = knn.kneighbors(df.values, n_neighbors=number_neighbors)
    
    # converting user_name to user_index 
    user_index = df.columns.tolist()
    user_index=user_index.index(user)

    for rn_title,title in list(enumerate(df.index)):
        
        # selecting movies that are not rated by user
        if df.iloc[rn_title, user_index] == 0:
            similar_movies = indices[rn_title].tolist()
            movie_distances = distances[rn_title].tolist()
            
            # the selected movie itself is usually returned first in its own neighbour list;
            # in that case it is removed together with its distance
            if rn_title in similar_movies:
                id_movie = similar_movies.index(rn_title)
                similar_movies.remove(rn_title)
                movie_distances.pop(id_movie) 
                
            # sometimes the movie is missing from its own neighbour list, e.g. when rows are
            # identical (such as all-0 ratings) and NearestNeighbors cannot tell them apart;
            # in that case the last (farthest) neighbour is dropped instead
            else:
                movie_distances = movie_distances[:number_neighbors-1]
                similar_movies = similar_movies[:number_neighbors-1]
           
            # similarity between movies is calculated as 1 - distance
            movie_similarity = [1 - d for d in movie_distances]

            # accumulate the weighted sum of ratings and the sum of weights,
            # using only similar movies the user has rated (rating != 0)
            score = 0
            weight_sum = 0
            for neighbor, similarity in zip(similar_movies, movie_similarity):
                rating = df.iloc[neighbor, user_index]
                if rating != 0:
                    score = score + similarity * rating
                    weight_sum = weight_sum + similarity

            # if no similar movie was rated, or the total similarity is 0,
            # the predicted rating is given as 0
            predicted_r = score / weight_sum if weight_sum > 0 else 0
                
            # predicted rating is put into original dataset
            df1.iloc[rn_title,user_index] = predicted_r
    recommend_movies(user,num_recommendation)

In [80]:
# call this function with the userId for whom movies should be recommended
# valid userId values range from 1 to 610

recommender_system(12)

The list of the movies user id 12 has Watched
'burbs, The (1989)
10 Things I Hate About You (1999)
Beavis and Butt-Head Do America (1996)
Big Daddy (1999)
Billy Elliot (2000)
Circle of Friends (1995)
Clueless (1995)
Deep Impact (1998)
Emma (1996)
First Knight (1995)
First Wives Club, The (1996)
Four Weddings and a Funeral (1994)
Ghostbusters II (1989)
Gone with the Wind (1939)
Groundhog Day (1993)
Junior (1994)
Karate Kid, Part II, The (1986)
Little Women (1994)
Love Actually (2003)
Miracle on 34th Street (1994)
Never Been Kissed (1999)
Notebook, The (2004)
Pride & Prejudice (2005)
Romeo and Juliet (1968)
She's All That (1999)
Shine (1996)
So I Married an Axe Murderer (1993)
Splash (1984)
Sweet Home Alabama (2002)
Titanic (1997)
Twilight (2008)
What Women Want (2000)


Recommended movies for 12 are:
1: French Kiss (1995) (predicted rating: 5.000000000000001)
2: I.Q. (1994) (predicted rating: 5.000000000000001)
3: 21 (2008) (predicted rating: 5.0)
4: Back to School (1986) (predicted rat