# Recommendation System

This project builds a model that recommends movies to a user based on the movies that user has already rated. The model is then extended to use the user's personal details, such as age and gender, with the aim of improving the predictions. The recommendations are produced with a K-Nearest Neighbours model. We achieved an average accuracy score of 53% for the model.

### Retrieving Data from the MovieLens Folder Using Pandas DataFrames

#### 1. Importing All Genres from u.genre as genre

In [1]:
import pandas

genre = pandas.read_csv('u.genre',  sep='|', names=['Genre', 'Genre Identifier'])
genre


Unnamed: 0,Genre,Genre Identifier
0,unknown,0
1,Action,1
2,Adventure,2
3,Animation,3
4,Children's,4
5,Comedy,5
6,Crime,6
7,Documentary,7
8,Drama,8
9,Fantasy,9


#### 2. Importing All Users from u.user as users

In [2]:
users = pandas.read_csv('u.user',  sep='|', names=['User ID', 'Age', 'Gender', 'Occupation', 'Zipcode'])
users

Unnamed: 0,User ID,Age,Gender,Occupation,Zipcode
0,1,24,M,technician,85711
1,2,53,F,other,94043
2,3,23,M,writer,32067
3,4,24,M,technician,43537
4,5,33,F,other,15213
5,6,42,M,executive,98101
6,7,57,M,administrator,91344
7,8,36,M,administrator,05201
8,9,29,M,student,01002
9,10,53,M,lawyer,90703


#### 3. Importing All Ratings from u.data as rating_data

In [3]:
rating_data = pandas.read_csv('u.data',  sep='\t', names=['User ID', 'Item ID', 'Rating', 'Timestamp'])
rating_data


Unnamed: 0,User ID,Item ID,Rating,Timestamp
0,196,242,3,881250949
1,186,302,3,891717742
2,22,377,1,878887116
3,244,51,2,880606923
4,166,346,1,886397596
5,298,474,4,884182806
6,115,265,2,881171488
7,253,465,5,891628467
8,305,451,3,886324817
9,6,86,3,883603013


#### 4. Importing All Movies from u[convertedToUTF8].item as movies

In [4]:
movies = pandas.read_csv('u[convertedToUTF8].item', sep='|', names = ['Movie ID', 'Movie Title', 'Release Date', 'Video Release Date',
              'IMDb URL', 'unknown', 'Action', 'Adventure', 'Animation', 'Children\'s', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy',
              'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western'])
movies

Unnamed: 0,Movie ID,Movie Title,Release Date,Video Release Date,IMDb URL,unknown,Action,Adventure,Animation,Children's,...,Fantasy,Film-Noir,Horror,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
0,1,Toy Story (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,...,0,0,0,0,0,0,0,0,0,0
1,2,GoldenEye (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?GoldenEye%20(...,0,1,1,0,0,...,0,0,0,0,0,0,0,1,0,0
2,3,Four Rooms (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Four%20Rooms%...,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
3,4,Get Shorty (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Get%20Shorty%...,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,5,Copycat (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Copycat%20(1995),0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
5,6,Shanghai Triad (Yao a yao yao dao waipo qiao) ...,01-Jan-1995,,http://us.imdb.com/Title?Yao+a+yao+yao+dao+wai...,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,7,Twelve Monkeys (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Twelve%20Monk...,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
7,8,Babe (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Babe%20(1995),0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
8,9,Dead Man Walking (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Dead%20Man%20...,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,10,Richard III (1995),22-Jan-1996,,http://us.imdb.com/M/title-exact?Richard%20III...,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
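
The file name `u[convertedToUTF8].item` suggests that the movie file was converted to UTF-8 before loading. As an alternative sketch, `pandas.read_csv` accepts an `encoding` argument, so a file could be decoded while it is read. The example below assumes the source file is Latin-1 encoded (the notebook does not state the original encoding) and uses a hypothetical two-row sample file with only three columns rather than the real data.

```python
import os
import tempfile

import pandas

# Hypothetical two-row excerpt in a pipe-separated layout similar to u.item,
# written as Latin-1 so that it contains a non-ASCII byte (the "é").
sample = "1|Toy Story (1995)|01-Jan-1995\n2|Misérables, Les (1995)|01-Jan-1995\n"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.item")
    with open(path, "w", encoding="latin-1") as handle:
        handle.write(sample)

    # Passing the encoding lets pandas decode the file without a prior conversion.
    sample_movies = pandas.read_csv(
        path,
        sep="|",
        encoding="latin-1",
        names=["Movie ID", "Movie Title", "Release Date"],
    )

print(sample_movies)
```

If the encoding passed to `read_csv` does not match the file, titles with accented characters will either raise a decoding error or come out garbled, so the value should be checked against the actual file.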


### Task 1: Find Highest Rated Movies in Every Genre

#### Identify Highest Rated movies in every genre:

1. Import the pandas library to read the data files u.user, u.genre, u.data and u.item as users, genre, rating_data and movies.


2. Convert the u.item file to UTF-8 so that it opens correctly in Jupyter Notebook.


3. Import the numpy library for mathematical operations.


4. Group rating_data by Item ID using the groupby() function.


5. Aggregate the grouped ratings of every movie (item) to calculate its mean rating.


6. Define the helper function find_highest_rated_movie(genre).


7. Finally, loop through all genres and call find_highest_rated_movie(genre) for each one; the movies it returns are then printed together with their rating.
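
The steps above can be sketched on a small, made-up dataset. This is an illustrative sketch, not the notebook's cell: `toy_ratings`, `toy_movies` and `top_rated` are hypothetical names, and the sketch looks titles up by Movie ID (via `set_index`) instead of relying on the row position of each movie.

```python
import pandas

# Toy stand-ins for rating_data and movies (values are made up for illustration).
toy_ratings = pandas.DataFrame({
    "Item ID": [1, 1, 2, 2, 3],
    "Rating":  [4, 5, 3, 4, 5],
})
toy_movies = pandas.DataFrame({
    "Movie ID":    [1, 2, 3],
    "Movie Title": ["Movie A", "Movie B", "Movie C"],
    "Action":      [1, 1, 0],
    "Comedy":      [0, 1, 1],
})

# Steps 4 and 5: mean rating per movie, indexed by Item ID.
mean_ratings = toy_ratings.groupby("Item ID")["Rating"].mean().round(2)

def top_rated(genre_name):
    """Return (highest mean rating, titles that reach it) for one genre."""
    ids = toy_movies.loc[toy_movies[genre_name] == 1, "Movie ID"]
    genre_means = mean_ratings.loc[ids]
    best = genre_means.max()
    best_ids = genre_means[genre_means == best].index
    # Look titles up by Movie ID rather than by row position.
    titles = toy_movies.set_index("Movie ID").loc[best_ids, "Movie Title"]
    return best, list(titles)

# Step 7: loop through the genres and print the result for each one.
for name in ["Action", "Comedy"]:
    print(name, top_rated(name))
```

Because a plain mean is used, a movie with a single 5-star rating outranks a widely rated movie with a slightly lower mean; a minimum number of ratings per movie would be one way to temper that, if desired.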


In [5]:
##### Task 1: Find Highest rated movie in each genre #####

import numpy

#Rating grouped
grouped = rating_data.groupby('Item ID')                       # Groups all ratings by Item ID
ratings_all = grouped['Rating'].agg(numpy.mean)                # Finds mean of all the different grouped ratings
ratings_all = round(ratings_all,2)                       


        
def group_movies(genre):
    grouped_by_genre = movies.groupby(genre)                   # Group all movies by Genre
    movies_index = grouped_by_genre['Movie ID'].groups[1]      # Returns index value of Movie ID in movies dataset
    movies_id = movies_index+1                                 # Movie ID is +1 of the index value
    pass

def find_highest_rated_movie(genre):
    
    """Takes a genre as input and returns the highest rating value together 
    with the indices and titles of the highest rated movies. The movies dataset 
    is first grouped by the genre column to collect all movie ids 
    (movie index + 1) for that genre. The maximum mean rating among those movies 
    is then computed, and the movies that reach that rating are selected."""
    
    grouped_by_genre = movies.groupby(genre)
    movies_index = grouped_by_genre['Movie ID'].groups[1] 
    movies_id = movies_index+1
    highest_rating = max(ratings_all.loc[movies_id])                   # Finds the maximum rating value in each genre
    highest_rated_movie = ratings_all.loc[movies_id]==highest_rating   # Finds movie id of all movies with highest rating 

    highest_rated_movie_index = highest_rated_movie[highest_rated_movie == True].index  # Finds index values of all movies with highest rating
    
    return highest_rating, highest_rated_movie_index, movies['Movie Title'][highest_rated_movie_index]
    pass

ratings = []
movie_indices = []
movie_titles = []
for i in range(len(genre)):                          # Loops through each genre to print the highest rated movie in each
    group_movies(genre['Genre'][i])
    rating, movie_index, movie_title = find_highest_rated_movie(genre['Genre'][i])
    ratings.append(rating)
    movie_indices.append(movie_index)
    movie_titles.append(movie_title[movie_index])
    

print('Highest Rated Movies in each Genre')
print()

g = 0
for m in movie_titles:
    print('Genre: ', genre['Genre'][g])
    print(m, ' ', ratings[g])
    print()
    g += 1
    


Highest Rated Movies in each Genre

Genre:  unknown
267    Chasing Amy (1997)
Name: Movie Title, dtype: object   3.44

Genre:  Action
50    Legends of the Fall (1994)
Name: Movie Title, dtype: object   4.36

Genre:  Adventure
1293    Ayn Rand: A Sense of Life (1997)
Name: Movie Title, dtype: object   5.0

Genre:  Animation
408    Jack (1996)
Name: Movie Title, dtype: object   4.49

Genre:  Children's
1293    Ayn Rand: A Sense of Life (1997)
Name: Movie Title, dtype: object   5.0

Genre:  Comedy
1500    Prisoner of the Mountains (Kavkazsky Plennik) ...
Name: Movie Title, dtype: object   5.0

Genre:  Crime
1122    Last Time I Saw Paris, The (1954)
Name: Movie Title, dtype: object   5.0

Genre:  Documentary
814                             One Fine Day (1996)
1201    Maybe, Maybe Not (Bewegte Mann, Der) (1994)
Name: Movie Title, dtype: object   5.0

Genre:  Drama
1122    Last Time I Saw Paris, The (1954)
1189              That Old Feeling (1997)
1467                     Cure, The (1995)
15

### Task 2: Movie Recommendation using K Nearest Neighbours Algorithm

#### 1. Create Helper Functions

In [6]:
###### Helper functions for KNN ######

import numpy
import pandas
from sklearn.metrics.pairwise import cosine_similarity

def find_all_movie_ratings_by_user(user_ID):
    
    """Groups rating_data by User ID and collects the rows that belong to 
    the given user. The Movie IDs and the ratings in those rows are returned 
    as a numpy array with two rows: Movie IDs first, ratings second."""
    
    group_by_user = rating_data.groupby('User ID')                                # Group all ratings by User ID
    grouped_movie_id = group_by_user['Item ID'].groups                            # Row labels of the ratings of every user
    
    user_movies_id = rating_data.loc[grouped_movie_id[user_ID], ['Item ID']]      # All Movie IDs
    user_movies_rating = rating_data.loc[grouped_movie_id[user_ID], ['Rating']]   # All Movie Ratings
    
    ratings = numpy.array(user_movies_rating['Rating'])
    movie_id = numpy.array(user_movies_id['Item ID'])
        
    user_data = numpy.vstack([movie_id, ratings])
    
    return user_data                                                              # Returns Movie IDs and Ratings for a particular user
    pass


def arrange_two_user_ratings(user_ID1, user_ID2):
    
    """Creates a ratings matrix over all the movies rated by either of two 
    users. The Movie IDs of both users are merged with numpy.union1d(). 
    A loop then runs through the merged array and checks whether user 1 
    and user 2 have rated each movie: if so, the corresponding rating is 
    added to that user's list, otherwise the entry is set to zero."""
    
    user1_data = find_all_movie_ratings_by_user(user_ID1)                      # Gets all Movie ID and Ratings for two users
    user2_data = find_all_movie_ratings_by_user(user_ID2)

    same_movies_rated = numpy.intersect1d(user1_data[0], user2_data[0])        # Groups all common movies rated by two users 
    movies_union = numpy.union1d(user1_data[0], user2_data[0])                 # Merges all movies rated by both users without duplication
    
    u1_rating = []
    u2_rating = []

    for i in range(len(movies_union)):
        if movies_union[i] in user1_data[0]:                                   # If user has rated a particular movie then append that to rating matrix
            index = numpy.where(user1_data[0]==movies_union[i])
            u1_rating.append(user1_data[1][index][0])
        else:                                                                  # Otherwise append 0 for that movie to the rating matrix
            u1_rating.append(0)
            
        if movies_union[i] in user2_data[0]:
            index = numpy.where(user2_data[0]==movies_union[i])
            u2_rating.append(user2_data[1][index][0])  
        else:
            u2_rating.append(0)   
            
    
    return movies_union, u1_rating, u2_rating
    pass


def calc_cosine_similarity(target_user, user_x):
    similarity_score = cosine_similarity([target_user], [user_x])
    
    return similarity_score
    pass


def euclideanDistance(data1, data2, length):
    
    """Returns the Euclidean distance between two data sets by calculating 
    the total sum of the difference of each corresponding value squared. 
    The returned value is the square root of total sum."""
    
    distance = 0
    for x in range(length):
        distance += numpy.square(data1[x] - data2[x])
    return numpy.sqrt(distance)


def arrange_user_ratings(user, nearest_users, k):
    
    """Works on the same principle as arrange_two_user_ratings() 
    and returns a ratings matrix for k neighbours."""
    
    user_data = [0]*k
    for i in range(0,k):
        user_data[i] = find_all_movie_ratings_by_user(nearest_users[i])
    
        
    movies_union = user[0]
    for i in range(0,k):
        movies_union = numpy.union1d(movies_union, user[i])
    
    users_ratings = []
    for i in range(len(movies_union)):
        single_user_ratings = []
        for j in range(0,k):
            if movies_union[i] in user[j]:
                index = numpy.where(user_data[j][0]==movies_union[i])
                single_user_ratings.append(user_data[j][1][index][0])
            else:
                single_user_ratings.append(0) 
         
        users_ratings.append(single_user_ratings)
        
    user_ratings = numpy.transpose(users_ratings)
    
    return movies_union, user_ratings 
    pass


def find_weighted_average(user_ratings, nearest_users_scores, k):
    
    """Returns the weighted average for every movie rated by the k 
    neighbours but not by the target user. Each neighbour’s rating is 
    multiplied by the Euclidean distance ratio, which is that neighbour’s 
    Euclidean distance divided by the sum of all neighbours’ Euclidean 
    distances, and the products are summed per movie."""
    
    user_weighted_score = [0]*k
    nearest_users_score_sum = numpy.sum(nearest_users_scores)
    
    for i in range(0, k):
        user_weighted_score[i] = user_ratings[i]*nearest_users_scores[i]/nearest_users_score_sum
        
    weighted_score = numpy.sum(user_weighted_score, axis=0)   
    
    return weighted_score
    pass
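
The two numeric helpers above, the Euclidean distance and the weighted average, can also be written without explicit loops. The following is a minimal sketch with hypothetical function names and made-up inputs; it is meant to mirror the arithmetic described in the docstrings, not to replace the notebook's functions.

```python
import numpy

def euclidean_distance_vectorised(ratings_a, ratings_b):
    """Euclidean distance between two equally long rating vectors."""
    a = numpy.asarray(ratings_a, dtype=float)
    b = numpy.asarray(ratings_b, dtype=float)
    return numpy.sqrt(numpy.sum((a - b) ** 2))

def weighted_average(ratings_matrix, scores):
    """Weight each neighbour's row of ratings by score / sum(scores) and add the rows."""
    weights = numpy.asarray(scores, dtype=float)
    weights = weights / weights.sum()
    return weights @ numpy.asarray(ratings_matrix, dtype=float)

u1 = [5, 0, 3]          # made-up ratings, 0 meaning "not rated"
u2 = [1, 0, 0]
distance = euclidean_distance_vectorised(u1, u2)
print(distance)         # sqrt(16 + 0 + 9)

neighbour_ratings = [[4, 0], [2, 5]]   # one row per neighbour, one column per movie
neighbour_scores = [1.0, 3.0]          # made-up scores
combined = weighted_average(neighbour_ratings, neighbour_scores)
print(combined)
```

In the second example the weights are 0.25 and 0.75, so the first movie scores 0.25 * 4 + 0.75 * 2 and the second 0.75 * 5.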

#### 2. Preparing Training and Test Datasets 

In [7]:
###### Preparing the Training and Test Sets ######

total_count = len(users['User ID'])
print (total_count)

split = int(total_count * 0.6)

# Shuffle the data to avoid any ordering bias.
numpy.random.seed(0)
shuffle = numpy.random.permutation(total_count)

x = users['User ID'][shuffle]

x_train = x[:split]

x_test = x[split:]


print('Training set size:', x_train.shape[0])
print('Test set size:', x_test.shape[0])

943
Training set size: 565
Test set size: 378


#### 3. K Nearest Neighbours Implementation

In [8]:
###### KNN Main Implementation ######

def k_nearest_neighbours(target_user, target_user_data, n, k):

    scores = []
    for user in x_train:
        combined_data = arrange_two_user_ratings(target_user, user)                    # Returns rating matrix for target user and current training user
        length = len(combined_data[1])
        similarity_score = euclideanDistance(combined_data[1],combined_data[2],length) # Calculates Similarity score using Euclidean Distance approach
        scores.append((similarity_score, user))
   

    ###### Score Management: Finding Nearest Users ######

    scores_only = [i[0] for i in scores]                                               # Separates scores
    users_only = [i[1] for i in scores]                                                # Separates User ID 
    sorted_scores = numpy.sort(scores_only)
    des_sorted_scores = sorted_scores[::-1]                                            # Sorts scores in descending order

    indices = []
    for i in range(0,k):                                                               # Finds K nearest neighbours
        index = numpy.where(des_sorted_scores[i]==scores_only)
        indices.append(index[0][0])

    nearest_users = [users_only[i] for i in indices]
    nearest_users_scores = [scores_only[i] for i in indices]


    ###### Nearest Users Analysis ######

    target_user_data = find_all_movie_ratings_by_user(target_user)

    all_new_movies = []

    for i in range(len(nearest_users)):
        user_data = find_all_movie_ratings_by_user(nearest_users[i])                 # Finds all movies rated by nearest user

        same_movies_rated = numpy.intersect1d(user_data[0], target_user_data[0])     # Returns only movies rated by both target and current nearest neighbour 

        new_movies = []
        for i in range(len(user_data[0])):
            if user_data[0][i] not in same_movies_rated:
                new_movies.append(user_data[0][i])                                   # Movies rated by the nearest neighbour but not by the target user are added to the new movies list

        all_new_movies.append(new_movies)



    movies_union, user_ratings = arrange_user_ratings(all_new_movies, nearest_users, k) # Returns rating matrix for k nearest neighbours


    weighted_score = find_weighted_average(user_ratings, nearest_users_scores, k)    # Calculates weighted average


    ###### Final Recommendations ######

    t_movies_union = numpy.transpose(movies_union)
    t_weighted_score = numpy.transpose(weighted_score)
    df_recommended_movies = pandas.DataFrame(t_movies_union, columns=['Movie ID'])
    df_recommended_movies['Weighted Score'] = t_weighted_score
    movie_titles = movies['Movie Title'][t_movies_union]

    df_recommended_movies_sorted = df_recommended_movies.sort_values(by=['Weighted Score'], ascending=False)   # Sorts all weighted averages in descending order

    top_movie_id = []
    selected_movies = []
    for i in range(0,n):
        top_movie_id.append(df_recommended_movies_sorted['Movie ID'][df_recommended_movies_sorted.index[i]])   # Stores top n movies to recommend


    print('Top ',n , ' Recommended Movies')  

    selected_movies.append(movies['Movie Title'][top_movie_id])
    selected_moviesdf = pandas.DataFrame(numpy.transpose(selected_movies), columns = ['Movie Title'])
    selected_moviesdf['Movie ID'] = top_movie_id
    return selected_moviesdf
    pass


target_user = 346
target_user_data = find_all_movie_ratings_by_user(target_user)
selected_moviesdf = k_nearest_neighbours(target_user, target_user_data, 10, 3)
selected_moviesdf

Top  10  Recommended Movies


Unnamed: 0,Movie Title,Movie ID
0,Lost in Space (1998),915
1,Harold and Maude (1971),427
2,Willy Wonka and the Chocolate Factory (1971),150
3,Sleepless in Seattle (1993),87
4,M (1931),655
5,"Magnificent Seven, The (1954)",509
6,"Secret of Roan Inish, The (1994)",462
7,Schindler's List (1993),317
8,"Postino, Il (1994)",13
9,Ridicule (1996),223
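
Picking k neighbours out of a list of scores can also be sketched with `numpy.argsort`, which returns positions directly and does not return the same user twice when two users share a score. The sketch below is illustrative only: `k_smallest` and the toy values are hypothetical, and it assumes the scores are Euclidean distances, for which a smaller value means more similar users, so it keeps the k smallest.

```python
import numpy

def k_smallest(scores, user_ids, k):
    """Return the k user IDs with the smallest scores, plus those scores."""
    scores = numpy.asarray(scores, dtype=float)
    user_ids = numpy.asarray(user_ids)
    # Positions of the k smallest scores; a stable sort keeps tied users in input order.
    order = numpy.argsort(scores, kind="stable")[:k]
    return user_ids[order].tolist(), scores[order].tolist()

toy_scores = [12.0, 3.5, 3.5, 20.0]     # made-up distances
toy_users = [10, 11, 12, 13]            # made-up User IDs
nearest, nearest_scores = k_smallest(toy_scores, toy_users, 3)
print(nearest, nearest_scores)
```

Reversing the order (for example with `numpy.argsort(-scores)`) would select the largest scores instead, which is the appropriate direction for a similarity measure such as cosine similarity.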


### Task 3: Additional Implementation of user data to improve recommendations

#### 1. Defining User Parameters

In [9]:
###### Defining user parameters ######

import numpy

age_max = max(users['Age'])

n_age = users['Age']/(age_max)         # normalized age parameter

n_gender = []
for i in range(users['Gender'].shape[0]):
    if users['Gender'][i]=='M':
        n_gender.append(1)             # +1  for Male
    else:
        n_gender.append(-1)            # -1 for Female


user_details_df = pandas.DataFrame(users['User ID'], columns = ['User ID']) 
user_details_df['Age Parameter'] = n_age
user_details_df['Gender Parameter'] = n_gender
user_details_df['User Weightage Factor'] = n_age*n_gender          # User Attribute Weightage Factor
user_details_df

Unnamed: 0,User ID,Age Parameter,Gender Parameter,User Weightage Factor
0,1,0.328767,1,0.328767
1,2,0.726027,-1,-0.726027
2,3,0.315068,1,0.315068
3,4,0.328767,1,0.328767
4,5,0.452055,-1,-0.452055
5,6,0.575342,1,0.575342
6,7,0.780822,1,0.780822
7,8,0.493151,1,0.493151
8,9,0.397260,1,0.397260
9,10,0.726027,1,0.726027
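
The same user parameters can be derived with column operations instead of a loop. This is a sketch on a made-up three-user table (`toy_users` and `details` are hypothetical names), using `Series.map` for the gender encoding.

```python
import pandas

# Made-up users for illustration.
toy_users = pandas.DataFrame({
    "User ID": [1, 2, 3],
    "Age":     [24, 53, 73],
    "Gender":  ["M", "F", "M"],
})

details = pandas.DataFrame({"User ID": toy_users["User ID"]})
details["Age Parameter"] = toy_users["Age"] / toy_users["Age"].max()       # age scaled by the maximum age
details["Gender Parameter"] = toy_users["Gender"].map({"M": 1, "F": -1})  # +1 for male, -1 for female
details["User Weightage Factor"] = details["Age Parameter"] * details["Gender Parameter"]
print(details)
```

One difference from a loop with an `else` branch: `map` yields NaN for any value that is not in the dictionary, rather than silently assigning -1, which makes unexpected gender codes visible.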


#### 2. Modified KNN Main Implementation


In [10]:
###### KNN Main Implementation ######

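def modified_k_nearest_neighbours(target_user):
    
    """Works like k_nearest_neighbours(), except that every similarity score 
    is multiplied by the user's weightage factor before the neighbours are 
    selected. k is fixed at 3 and the top 10 movies are returned."""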
    
    target_user_data = find_all_movie_ratings_by_user(target_user)
    scores = []
    for user in x_train:
        combined_data = arrange_two_user_ratings(target_user, user)
        length = len(combined_data[1])
        similarity_score = euclideanDistance(combined_data[1],combined_data[2],length)
        scores.append((similarity_score, user))
    

    ###### Score Management: Finding Nearest Users ######

    scores_only = [i[0] for i in scores]   
    users_only = [i[1] for i in scores]


    ###### Updating Score with new parameters ######

    user_ids = [users_only[i] - 1 for i in range(len(users_only))]
    new_scores = scores_only*user_details_df['User Weightage Factor'][user_ids]       # Update each score by multiplying it by the user attribute weightage factor
    
    
    ###### Sorting Scores according to gender: Descending for Male, Ascending for Female #####

    new_sorted_scores = []
    if users['Gender'][target_user] == 'F':
        new_sorted_scores = numpy.sort(new_scores)         
    elif users['Gender'][target_user] == 'M':
        new_sorted_scores = numpy.sort(new_scores)
        new_sorted_scores = new_sorted_scores[::-1]
      

    ###### Find New Nearest Neighbours ######

    k = 3
    new_indices = []
    for i in range(0,k):
        index = numpy.where(new_sorted_scores[i]==new_scores)
        new_indices.append(index[0][0])

    new_nearest_users = [user_ids[i] for i in new_indices]
    new_nearest_users_scores = numpy.array(new_scores[new_nearest_users])


    ###### New Nearest Neighbour Analysis ######

    target_user_data = find_all_movie_ratings_by_user(target_user)

    new_all_new_movies = []

    for i in range(len(new_nearest_users)):
        user_data = find_all_movie_ratings_by_user(new_nearest_users[i])

        new_same_movies_rated = numpy.intersect1d(user_data[0], target_user_data[0])

        new_movies = []
        for i in range(len(user_data[0])):
            if user_data[0][i] not in new_same_movies_rated:
                new_movies.append(user_data[0][i])

        new_all_new_movies.append(new_movies)


    ###### Create Ratings Matrix ######

    new_movies_union, new_user_ratings = arrange_user_ratings(new_all_new_movies, new_nearest_users, k)


    ###### Calculate New Weighted Average ######

    new_weighted_score = find_weighted_average(new_user_ratings, new_nearest_users_scores, k)  



    ###### Final Recommendations ######

    t_new_movies_union = numpy.transpose(new_movies_union)
    t_new_weighted_score = numpy.transpose(new_weighted_score)
    df_new_recommended_movies = pandas.DataFrame(t_new_movies_union, columns=['Movie ID'])
    df_new_recommended_movies['Weighted Score'] = t_new_weighted_score
    new_movie_titles = movies['Movie Title'][t_new_movies_union]

    df_new_recommended_movies_sorted = df_new_recommended_movies.sort_values(by=['Weighted Score'], ascending=False)

    new_top_movie_id = []
    new_selected_movies = []
    for i in range(0,10):
        new_top_movie_id.append(df_new_recommended_movies_sorted['Movie ID'][df_new_recommended_movies_sorted.index[i]])


    print('Top 10 Recommended Movies')  

    new_selected_movies.append(movies['Movie Title'][new_top_movie_id])
    new_selected_moviesdf = pandas.DataFrame(numpy.transpose(new_selected_movies), columns = ['Movie Title'])
    new_selected_moviesdf['Movie ID'] = new_top_movie_id
    return new_selected_moviesdf
    pass


target_user = 346
new_selected_moviesdf = modified_k_nearest_neighbours(target_user)
new_selected_moviesdf

Top 10 Recommended Movies


Unnamed: 0,Movie Title,Movie ID
0,Nikita (La Femme Nikita) (1990),197
1,Leaving Las Vegas (1995),275
2,Gattaca (1997),269
3,Mr. Holland's Opus (1995),14
4,Contact (1997),257
5,Unforgiven (1992),202
6,"First Wives Club, The (1996)",475
7,"Secret of Roan Inish, The (1994)",462
8,"Maltese Falcon, The (1941)",483
9,"Thin Man, The (1934)",492


### Task 4: Accuracy Testing

#### Test 1: Compare Movies rated by and recommended to the Target User 

The accuracy test checks the validity of the results obtained by the recommendation system. The test follows these steps:

1. A target user is selected, and the user's ‘Movie ID’ and ‘Rating’ data are obtained from the ratings database.


2. A small part of the target user's data is removed to create a new dataset (in this case, 32 of the 193 entries are removed).


3. This new dataset is passed to the k_nearest_neighbours() function to recommend the top 300 movies for the user (using the recommender built in Task 2).


4. Each removed movie is checked for presence in the recommended list.


5. The number of removed movies found in the list is divided by the total number of removed movies to obtain the accuracy value.
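
Steps 4 and 5 amount to a hit rate. A minimal sketch on made-up Movie IDs follows; `hit_rate` is a hypothetical helper, and it compares values with `numpy.isin` because the `in` operator applied to a pandas Series tests the index labels rather than the values.

```python
import numpy
import pandas

def hit_rate(removed_ids, recommended_ids):
    """Return the removed Movie IDs found among the recommendations and their share in percent."""
    removed_array = numpy.asarray(removed_ids)
    hits = numpy.isin(removed_array, numpy.asarray(recommended_ids))
    return removed_array[hits].tolist(), 100.0 * hits.sum() / len(removed_array)

# Made-up IDs for illustration only.
removed = pandas.Series([1258, 203, 161, 68], index=[160, 161, 162, 163])
recommended = pandas.Series([915, 203, 68, 13])

found, accuracy = hit_rate(removed, recommended)
print(found, accuracy)

# `in` on a Series looks at the index labels (0..3 here), not at the values.
print(203 in recommended, 203 in recommended.values)
```

With a long recommendation list, a check against index labels would report any small Movie ID as present regardless of what was recommended, so comparing values is the safer choice for this test.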


In [11]:
###### Get Data for Target User ######

target_user_data = find_all_movie_ratings_by_user(target_user)
target_user_data_df = pandas.DataFrame(target_user_data[0], columns = ['Movie ID'])

target_user_data_df['Rating'] = target_user_data[1]

In [12]:
###### Remove movies ######

new_data = numpy.vstack([target_user_data_df['Movie ID'][0:160], target_user_data_df['Rating'][0:160]])
removed_data = [target_user_data_df['Movie ID'][160:192], target_user_data_df['Rating'][160:192]]
removed_data_df = pandas.DataFrame(removed_data[0], columns = ['Movie ID'])
removed_data_df['Rating'] = removed_data[1]
removed_data_df

Unnamed: 0,Movie ID,Rating
160,1258,4
161,203,4
162,1217,4
163,161,3
164,188,4
165,233,4
166,68,3
167,177,4
168,79,5
169,164,3


In [13]:
###### Finding recommendations on the new ratings ######

target_user = 346
selected_movies_testdf = k_nearest_neighbours(target_user, new_data, 300, 3)
selected_movies_testdf

Top  300  Recommended Movies


Unnamed: 0,Movie Title,Movie ID
0,Lost in Space (1998),915
1,Harold and Maude (1971),427
2,Willy Wonka and the Chocolate Factory (1971),150
3,Sleepless in Seattle (1993),87
4,M (1931),655
5,"Magnificent Seven, The (1954)",509
6,"Secret of Roan Inish, The (1994)",462
7,Schindler's List (1993),317
8,"Postino, Il (1994)",13
9,Ridicule (1996),223


In [14]:
###### Check the number of removed movies present in the recommended list ######

print('Movie ID of Recommended Movies:')
no_of_movies_recommended = 0
for i in range(len(removed_data[0])):
    if removed_data[0][i+160] in selected_movies_testdf['Movie ID']:
        no_of_movies_recommended += 1
        print(removed_data[0][i+160])


Movie ID of Recommended Movies:
203
161
188
233
68
177
79
164
182
159
196
88
96
175
58
226
195


In [15]:
###### Accuracy Score ######

accuracy = no_of_movies_recommended/len(removed_data[0])*100
print('Accuracy of Recommendation System: ', accuracy, '%')

Accuracy of Recommendation System:  53.125 %
