# Project: Simple Recommendation System

## Introduction

In this project, we use a user-based recommendation system and collaborative filtering to demonstrate how to clean a dataset and how to recommend a movie to a user.

1. In this project, the user-based recommendation system starts from the movie a given user rated highest and recommends the movie whose ratings are most similar to it.

2. The collaborative filtering system finds the user whose ratings are most similar to the given user's and recommends the movie that this similar user rated highest.

For both systems, we must first import the needed modules, which give us functions that are not built into core Python and that we then use to perform both systems.

Dataset Source:[link](https://www.kaggle.com/shubhammehta21/movie-lens-small-latest-dataset?select=ratings.csv)

In [None]:

#Import the modules that provide the functions and keywords used below;
#without them, this project wouldn't be possible.

import numpy as np
import pandas as pd
from math import sqrt
import matplotlib.pyplot as plt
from numpy import linalg as LA
%matplotlib inline

After importing all the needed modules, we load each dataset into its own variable for easy access and usage.

In [None]:
#Loading the datasets into DataFrames

ratings_df=pd.read_csv('/content/ratings.csv')
movies= pd.read_csv('/content/movies.csv')

#ratings_df contains the userId, the movieId, the user's rating of the movie and a timestamp
ratings_df.head()


Before performing either system on this data, we must deal with null values, which are values that, in a sense, don't exist. These are usually shown in a data table as "NaN" values. Here they appear once the ratings are arranged into a movie-by-user table, because most users have not rated most movies.

Here we reassign the table to itself after removing a column that is not needed in this case, the "timestamp" column.
Next, we create the table that both systems will use, with one row per movieId and one column per userId.

In [None]:
#Dealing With Missing Values
ratings_df=ratings_df.drop(columns='timestamp')

missing_vals = ratings_df.pivot(index='movieId',columns='userId',values='rating')

missing_vals.head()


#Below, NaN values are values that are missing or not present

movieId \ userId,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,...,571,572,573,574,575,576,577,578,579,580,581,582,583,584,585,586,587,588,589,590,591,592,593,594,595,596,597,598,599,600,601,602,603,604,605,606,607,608,609,610
1,4.0,,,,4.0,,4.5,,,,,,,,2.5,,4.5,3.5,4.0,,3.5,,,,,,3.0,,,,5.0,3.0,3.0,,,,,,,5.0,...,,4.0,5.0,,,,,,4.0,3.0,,,,5.0,,,5.0,,,4.0,,,,,,4.0,4.0,,3.0,2.5,4.0,,4.0,3.0,4.0,2.5,4.0,2.5,3.0,5.0
2,,,,,,4.0,,4.0,,,,,,,,,,3.0,3.0,3.0,3.5,,,,,,4.0,,,,,,,,,,,,,,...,,,4.5,,,,,,,,,,,,,4.0,,,,2.5,,4.0,,4.0,,,,,2.5,4.0,,4.0,,5.0,3.5,,,2.0,,
3,4.0,,,,,5.0,,,,,,,,,,,,,3.0,,,,,,,,,,,,,3.0,,,,,,,,,...,,,,,,,,,,,,,,,,,,3.0,,3.0,,,,4.0,,,,,1.5,,,,,,,,,2.0,,
4,,,,,,3.0,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.5,,,,,,,,,,
5,,,,,,5.0,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,2.0,,,,,,,,,,2.5,,,,3.0,,,,,,
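To see why the pivot step produces NaN values, here is a minimal sketch on a tiny made-up ratings table. The names `toy_ratings` and `toy_matrix` and all of the values are hypothetical and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy ratings in the same long format as ratings.csv
# (one row per rating that a user actually gave).
toy_ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 3],
    "movieId": [10, 20, 10, 30],
    "rating":  [4.0, 3.5, 5.0, 2.0],
})

# One row per movie and one column per user; every (movie, user)
# pair without a rating becomes NaN.
toy_matrix = toy_ratings.pivot(index="movieId", columns="userId", values="rating")
print(toy_matrix)

# Count how many cells of the table are missing.
missing_cells = int(toy_matrix.isna().sum().sum())
print(missing_cells, "of", toy_matrix.size, "cells are missing")
```

Even in this small example most cells are empty, which is the same pattern seen in the real table above.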


Further below, we use the fillna() function, which replaces all NaN values with a value chosen by the writer. The fill value should alter each user's data as little as possible, so a neutral value such as a mean is the usual choice. For example, filling every NaN with 5 would alter the users' individual data too much: not all users rate all movies, and if they did, the ratings would not all be 5/5.

First, we keep only the movies rated by more than 10 users and the users who rated more than 50 movies. This increases our confidence in the similarities computed later.

In [None]:
#  Keep the movies rated by more than 10 users and the users who rated more than 50 movies. This will increase our confidence.

no_user = ratings_df.groupby('movieId')['rating'].agg('count')
no_movies = ratings_df.groupby('userId')['rating'].agg('count')
missing_vals = missing_vals.loc[no_user[no_user > 10].index,:]
missing_vals = missing_vals.loc[:,no_movies[no_movies > 50].index]
missing_vals.head()

movieId \ userId,1,4,6,7,10,11,15,16,17,18,19,20,21,22,23,24,27,28,29,32,33,34,36,38,39,40,41,42,43,45,47,50,51,52,57,58,59,62,63,64,...,559,560,561,562,563,564,566,567,570,571,572,573,577,579,580,582,583,584,585,586,587,588,590,591,592,593,594,596,597,599,600,601,602,603,604,605,606,607,608,610
1,4.0,,,4.5,,,2.5,,4.5,3.5,4.0,,3.5,,,,3.0,,,3.0,3.0,,,,,5.0,,,5.0,4.0,,3.0,,,5.0,,,,5.0,4.0,...,5.0,3.0,4.0,4.5,,,,3.5,4.0,,4.0,5.0,,4.0,3.0,,,5.0,,,5.0,,4.0,,,,,4.0,4.0,3.0,2.5,4.0,,4.0,3.0,4.0,2.5,4.0,2.5,5.0
2,,,4.0,,,,,,,3.0,3.0,3.0,3.5,,,,4.0,,,,,,,,,,,,,,,,4.5,,,,,4.0,,,...,4.0,,4.0,,2.5,,4.0,,3.5,,,4.5,,,,,,,,4.0,,,2.5,,4.0,,4.0,,,2.5,4.0,,4.0,,5.0,3.5,,,2.0,
3,4.0,,5.0,,,,,,,,3.0,,,,,,,,,3.0,,,,,,,,4.0,5.0,,,,4.0,,,3.0,,,,3.5,...,,,,,,,,,,,,,,,,,,,,,,3.0,3.0,,,,4.0,,,1.5,,,,,,,,,2.0,
5,,,5.0,,,,,,,,,,,,,,,,,,,,,,,,,,5.0,3.0,,,,,,4.0,,,,,...,,,3.0,,,,,,,,,,,,,,,,,,,,2.0,,,,,,,,2.5,,,,3.0,,,,,
6,4.0,,4.0,,,5.0,,,,4.0,,,,,4.0,4.5,,3.5,,3.0,,,,,,,,,,4.0,,,,,3.0,,,4.5,,4.5,...,5.0,,4.0,,,,,,,,,4.5,4.0,,4.0,,,,,,,5.0,3.5,,3.0,,,,3.0,4.5,,,3.0,4.0,3.0,,,,,5.0
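The filtering step can be tried on a small made-up table. The sketch below mirrors the same groupby-and-count pattern; the data and the threshold of 1 are hypothetical and chosen only so that the toy example filters something out.

```python
import pandas as pd

# Hypothetical toy ratings (not from the dataset).
toy_ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 3],
    "movieId": [10, 20, 30, 10, 20, 10],
    "rating":  [4.0, 3.5, 2.0, 5.0, 3.0, 4.5],
})
toy_matrix = toy_ratings.pivot(index="movieId", columns="userId", values="rating")

# How many ratings each movie received, and how many each user gave.
ratings_per_movie = toy_ratings.groupby("movieId")["rating"].count()
ratings_per_user = toy_ratings.groupby("userId")["rating"].count()

# Assumed toy thresholds; the notebook itself uses 10 and 50.
min_movie_ratings = 1
min_user_ratings = 1

# Keep only the rows (movies) and columns (users) above the thresholds.
keep_movies = ratings_per_movie[ratings_per_movie > min_movie_ratings].index
keep_users = ratings_per_user[ratings_per_user > min_user_ratings].index
filtered = toy_matrix.loc[keep_movies, keep_users]
print(filtered)
```

Movie 30 and user 3 each have a single rating, so both drop out and the remaining table has no gaps at all.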


We fill the missing values with 2.5, roughly the midpoint of the rating scale (MovieLens ratings run from 0.5 to 5.0 in half-star steps).

In [None]:
#Fillna replaces missing values with a number chosen by the coder
#This code is filling in missing values in the table so when
#we go to recommend a movie to a user, we don't have missing values,
#which would make it very difficult to recommend movies



missing_vals_table=missing_vals.fillna(2.5)

In [None]:
# Print table after data treatment

missing_vals_table

movieId \ userId,1,4,6,7,10,11,15,16,17,18,19,20,21,22,23,24,27,28,29,32,33,34,36,38,39,40,41,42,43,45,47,50,51,52,57,58,59,62,63,64,...,559,560,561,562,563,564,566,567,570,571,572,573,577,579,580,582,583,584,585,586,587,588,590,591,592,593,594,596,597,599,600,601,602,603,604,605,606,607,608,610
1,4.0,2.5,2.5,4.5,2.5,2.5,2.5,2.5,4.5,3.5,4.0,2.5,3.5,2.5,2.5,2.5,3.0,2.5,2.5,3.0,3.0,2.5,2.5,2.5,2.5,5.0,2.5,2.5,5.0,4.0,2.5,3.0,2.5,2.5,5.0,2.5,2.5,2.5,5.0,4.0,...,5.0,3.0,4.0,4.5,2.5,2.5,2.5,3.5,4.0,2.5,4.0,5.0,2.5,4.0,3.0,2.5,2.5,5.0,2.5,2.5,5.0,2.5,4.0,2.5,2.5,2.5,2.5,4.0,4.0,3.0,2.5,4.0,2.5,4.0,3.0,4.0,2.5,4.0,2.5,5.0
2,2.5,2.5,4.0,2.5,2.5,2.5,2.5,2.5,2.5,3.0,3.0,3.0,3.5,2.5,2.5,2.5,4.0,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,4.5,2.5,2.5,2.5,2.5,4.0,2.5,2.5,...,4.0,2.5,4.0,2.5,2.5,2.5,4.0,2.5,3.5,2.5,2.5,4.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,4.0,2.5,2.5,2.5,2.5,4.0,2.5,4.0,2.5,2.5,2.5,4.0,2.5,4.0,2.5,5.0,3.5,2.5,2.5,2.0,2.5
3,4.0,2.5,5.0,2.5,2.5,2.5,2.5,2.5,2.5,2.5,3.0,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,3.0,2.5,2.5,2.5,2.5,2.5,2.5,2.5,4.0,5.0,2.5,2.5,2.5,4.0,2.5,2.5,3.0,2.5,2.5,2.5,3.5,...,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,3.0,3.0,2.5,2.5,2.5,4.0,2.5,2.5,1.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.0,2.5
5,2.5,2.5,5.0,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,5.0,3.0,2.5,2.5,2.5,2.5,2.5,4.0,2.5,2.5,2.5,2.5,...,2.5,2.5,3.0,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.0,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,3.0,2.5,2.5,2.5,2.5,2.5
6,4.0,2.5,4.0,2.5,2.5,5.0,2.5,2.5,2.5,4.0,2.5,2.5,2.5,2.5,4.0,4.5,2.5,3.5,2.5,3.0,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,4.0,2.5,2.5,2.5,2.5,3.0,2.5,2.5,4.5,2.5,4.5,...,5.0,2.5,4.0,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,4.5,4.0,2.5,4.0,2.5,2.5,2.5,2.5,2.5,2.5,5.0,3.5,2.5,3.0,2.5,2.5,2.5,3.0,4.5,2.5,2.5,3.0,4.0,3.0,2.5,2.5,2.5,2.5,5.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
174055,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,...,2.5,2.5,2.5,2.5,2.5,2.5,2.5,3.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,4.0,2.5,4.0,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5
176371,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,3.0,2.5,2.5,2.5,2.5,2.5,4.0,2.5,2.5,...,2.5,2.5,2.5,2.5,2.5,2.5,2.5,5.0,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,3.5,2.5,4.0,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5
177765,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,...,2.5,2.5,2.5,2.5,2.5,2.5,2.5,1.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,4.0,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,4.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5
179819,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,3.5,2.5,2.5,...,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.0,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,5.0,2.5,2.5,2.5,2.5,2.5,2.5,2.5,3.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5


## User-based recommendation system

In the user-based system, we recommend the movie whose rating vector has the highest cosine similarity with a given movie, here the movie that the user rated highest.

<img src="https://github.com/Dovahkiin1638/Project_1-Recommendation-System/blob/main/image_content_based.png?raw=true"/>

In the following table, `A`, `B`, `C` and `D` are people, and each row holds their ratings for one film.


|   | A   | B   | C   | D   |
|---|---|---|---|---|
| Film 1  | 0.3   | 0.1   | 0.4   | 0.9   |
| Film 2  | 0.3   | 0.8   | 0.7   | 0.7   |
| Film 3  | 0.4   | 0.2   | 0.6   | 0.4   |
| Film 4  | 0.6   | 0.4   | 0.3   | 0.2   |

Cosine similarity is given by:

$$\cos(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|},$$

where

$x, y$ are vectors,

$x \cdot y$ is their dot product,

$\|x\|$ is the norm of $x$,

$\|y\|$ is the norm of $y$.



**Example:** Here, Film 1 and Film 2 have ratings $(0.3, 0.1, 0.4, 0.9)$ and $(0.3, 0.8, 0.7, 0.7)$ from users $A, B, C$ and $D$ respectively.

Let, 

$x=[ 0.3, 0.1, 0.4, 0.9 ]$ and

$y=[ 0.3, 0.8, 0.7, 0.7 ]$

Their dot product and norms are:

 $x \cdot y = 0.3 \times 0.3 + 0.1 \times 0.8 + 0.4 \times 0.7 + 0.9 \times 0.7 = 1.08$

 $\|x\| = \sqrt{(0.3)^2 + (0.1)^2 + (0.4)^2 + (0.9)^2} \approx 1.03$

 $\|y\| = \sqrt{(0.3)^2 + (0.8)^2 + (0.7)^2 + (0.7)^2} \approx 1.31$

 Hence, the cosine similarity is $\frac{x \cdot y}{\|x\| \, \|y\|} \approx 0.80$
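The hand calculation can be checked with a few lines of NumPy. This is only a sketch: the vectors are the Film 1 and Film 2 rows of the table, and the variable names are chosen here for illustration.

```python
import numpy as np

x = np.array([0.3, 0.1, 0.4, 0.9])  # Film 1 ratings by A, B, C, D
y = np.array([0.3, 0.8, 0.7, 0.7])  # Film 2 ratings by A, B, C, D

dot = np.dot(x, y)             # x . y
norm_x = np.linalg.norm(x)     # ||x||
norm_y = np.linalg.norm(y)     # ||y||
cosine = dot / (norm_x * norm_y)

# Print each intermediate value rounded to two decimals.
print(round(float(dot), 2), round(float(norm_x), 2),
      round(float(norm_y), 2), round(float(cosine), 2))
```

Note that the denominator is the product of the two norms, so the parentheses around `norm_x * norm_y` matter.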

Let us now consider the first user and recommend a movie similar to the one they liked most.

So, which movie did User $1$ like most?

In [None]:
# Table to be used for calculation
data_table=missing_vals_table

In [None]:
# Row position (not the movieId) of the movie rated highest by User 1

movie_liked_1=np.argmax(np.array(data_table[1]))

# Position of the movie most liked by User 1
movie_liked_1

34

In [None]:
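# np.argmax returned a row position, so select the row with .iloc.
# .loc[movie_liked_1] would instead return the row labeled movieId 34,
# a different movie; the array shown below, where User 1's entry is the
# 2.5 fill value, corresponds to that label-based lookup.
most_liked_movie = np.array(data_table.iloc[movie_liked_1])

#Ratings of most liked movie
most_liked_movie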

array([2.5, 2.5, 4. , 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 4. , 4. , 2.5,
       2.5, 2.5, 2.5, 5. , 2.5, 2.5, 2.5, 3. , 2.5, 2.5, 2.5, 2.5, 5. ,
       2.5, 2.5, 5. , 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 3. ,
       4.5, 2.5, 1. , 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2. , 2.5,
       4. , 2.5, 2.5, 2.5, 2.5, 2.5, 5. , 4. , 4. , 5. , 2.5, 2.5, 2.5,
       2.5, 2.5, 4. , 2.5, 2.5, 2.5, 3. , 2.5, 2.5, 2.5, 2.5, 2. , 2.5,
       3. , 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 1.5, 2.5, 2. , 2.5, 2.5,
       4. , 2.5, 2.5, 3. , 2.5, 2.5, 2.5, 2.5, 4. , 2.5, 5. , 2.5, 2.5,
       2.5, 2.5, 4. , 2.5, 4. , 2.5, 2.5, 5. , 5. , 2.5, 2.5, 2.5, 2.5,
       2.5, 2.5, 4. , 2.5, 4. , 2.5, 2.5, 5. , 2.5, 2.5, 2.5, 2.5, 2.5,
       2.5, 2.5, 5. , 4. , 2.5, 4.5, 2.5, 2.5, 1. , 2.5, 2.5, 2.5, 2.5,
       5. , 2. , 2.5, 2.5, 3. , 4. , 2.5, 5. , 2.5, 2.5, 2.5, 2.5, 2.5,
       2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 4. , 2.5, 2.5, 2.5, 2.5, 2.5, 4. ,
       4. , 5. , 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 4. , 4. , 3.5, 2.

In [None]:
movie_array=np.array(data_table)

In [None]:
# Total number of movies

len(movie_array)

2121

In [None]:
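# Find most similar movies: cosine similarity between every movie (row)
# and the movie most liked by User 1 (row position movie_liked_1, i.e. 34)

from numpy import linalg as LA

liked_ratings=movie_array[movie_liked_1]

best_movies=[]

for i in range(len(movie_array)):
  best_movies.append(np.dot(movie_array[i], liked_ratings)/(LA.norm(movie_array[i])*LA.norm(liked_ratings)))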

In [None]:
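# Row positions of the two highest similarities, most similar first
# (the top entry is the liked movie itself, whose similarity is 1)

best_movies_arguments=np.array(best_movies).argsort()[-2:][::-1]

In [None]:
# Positions of the liked movie and of the movie most similar to it
print(best_movies_arguments)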

[  34 1644]
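The loop above can also be written without an explicit for loop. The sketch below is one possible vectorized version, shown on the four-film table from earlier rather than on the real data. The helper names `cosine_to_all_rows` and `most_similar_rows` are hypothetical, and the code assumes that no row is all zeros (which holds after filling with 2.5).

```python
import numpy as np

def cosine_to_all_rows(matrix, row_position):
    """Cosine similarity between one row and every row of a 2-D array."""
    matrix = np.asarray(matrix, dtype=float)
    target = matrix[row_position]
    norms = np.linalg.norm(matrix, axis=1)   # one norm per row
    return (matrix @ target) / (norms * norms[row_position])

def most_similar_rows(matrix, row_position, k=1):
    """Positions of the k most similar rows, excluding the query row itself."""
    sims = cosine_to_all_rows(matrix, row_position)
    sims[row_position] = -np.inf             # never return the query row
    return np.argsort(sims)[::-1][:k]

# The four films from the example table (columns are users A, B, C, D).
films = np.array([
    [0.3, 0.1, 0.4, 0.9],
    [0.3, 0.8, 0.7, 0.7],
    [0.4, 0.2, 0.6, 0.4],
    [0.6, 0.4, 0.3, 0.2],
])

top_two = most_similar_rows(films, 0, k=2)
print(top_two)  # row positions, most similar first
```

Masking the query row avoids having to skip the first entry of the sorted result by hand, as the `[-2:]` slice above does.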


In [None]:

# Show the row at position 1644 to read off its movieId
print(data_table[1644:1645])


userId   1    4    6    7    10   11   15   ...  603  604  605  606  607  608  610
movieId                                     ...                                   
33162    2.5  2.5  2.5  3.5  2.5  2.5  2.5  ...  2.5  2.5  2.5  2.5  2.5  2.5  3.0

[1 rows x 378 columns]


In [None]:
movies.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [None]:
print("Recommended movie for User 1 is: ")

movies.loc[movies['movieId'] == 33162]



Recommended movie for User 1 is: 


Unnamed: 0,movieId,title,genres
5883,33162,Kingdom of Heaven (2005),Action|Drama|Romance|War


Hence, the movie recommended to User 1 by the user-based system is **Kingdom of Heaven (2005)**.

## Collaborative filtering

Here we now use our collaborative filtering system, which computes the cosine similarity between pairs of columns (one column per user) in a for loop.

<img src="https://github.com/Dovahkiin1638/Project_1-Recommendation-System/blob/main/image_collaborative.png?raw=true"/>

In [None]:
# Find the user most similar to User 1 (User 1 is the column at position 0)

most_similar_user=[]


for i in range(len(movie_array[0])):
  most_similar_user.append(np.dot(movie_array[:,i], movie_array[:,0])/(LA.norm(movie_array[:,i])*LA.norm(movie_array[:,0])))

In [None]:
most_similar_user_arguments=np.array(most_similar_user).argsort()[-2:][::-1]

Here we find the user whose ratings are most similar to User 1's. As in the previous section, a for loop applies the cosine similarity formula and collects the results in a list, this time comparing columns (users) instead of rows (movies). We then recommend a movie to User 1 based on the ratings of the user most similar to him/her.

In [None]:
print(most_similar_user_arguments)

[  0 308]


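Hence, the user at column position 308 is the most similar to User 1 (position 0 is User 1 itself). Note that this is a position in the table rather than a userId: the filtered table skips some userIds, so position 308 need not hold the user labeled 308.

Next, we find the movie rated highest by User 308.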
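Because `argsort` and `argmax` return positions while `data_table[...]` and `.loc` expect labels, it can help to convert explicitly between the two. The following is a minimal sketch on a hypothetical table (`toy`) whose userId labels have gaps, as in the filtered table.

```python
import pandas as pd

# Hypothetical toy table: userId labels 1, 4, 6 sit at positions 0, 1, 2.
toy = pd.DataFrame(
    [[4.0, 2.5, 5.0], [2.5, 3.0, 2.5]],
    index=pd.Index([10, 20], name="movieId"),
    columns=pd.Index([1, 4, 6], name="userId"),
)

position = 2                        # e.g. a value returned by argsort
user_id = toy.columns[position]     # the label stored at that position
print(user_id)

by_position = toy.iloc[:, position]  # select the column by position
by_label = toy[user_id]              # select the same column by label

# idxmax returns the movieId label directly, so no second lookup is needed.
best_movie_id = by_position.idxmax()
print(best_movie_id)
```

Here position 2 holds the user labeled 6, so `toy[2]` would raise a `KeyError` and `toy.iloc[:, 2]` is the correct position-based selection.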

In [None]:
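#Row position of the movie rated highest by the user labeled 308
#Caution: data_table[308] selects the column labeled userId 308; the column at
#position 308 found above would be data_table.iloc[:, 308]
movie1=np.argmax(np.array(data_table[308]))
movie1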

483

In [None]:
print(data_table[483:484])

userId   1    4    6    7    10   11   15   ...  603  604  605  606  607  608  610
movieId                                     ...                                   
1246     2.5  2.5  2.5  1.5  2.5  2.5  2.5  ...  3.0  2.5  2.5  4.0  2.5  2.5  2.5

[1 rows x 378 columns]


In [None]:
print("Recommended movie for User 1 is: ")

movies.loc[movies['movieId'] == 1246]

Recommended movie for User 1 is: 


Unnamed: 0,movieId,title,genres
945,1246,Dead Poets Society (1989),Drama


Hence, the movie recommended for User 1 by collaborative filtering is **Dead Poets Society (1989)**.

## Possible Improvements

1. Using normalization to tackle missing values.

2. Using average user rating to tackle missing values.

3. Using average movie rating to tackle missing values.
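As a rough sketch of what these three ideas could look like in pandas, the example below uses a small hypothetical table (`toy`). Reading "normalization" as subtracting each user's mean rating is an assumption made here for illustration; other normalizations are possible.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table with gaps (rows are movies, columns are users).
toy = pd.DataFrame(
    [[4.0, np.nan, 5.0], [np.nan, 3.0, 2.0], [1.0, 2.0, np.nan]],
    index=pd.Index([10, 20, 30], name="movieId"),
    columns=pd.Index([1, 4, 6], name="userId"),
)

# Idea 3: fill each movie's gaps with that movie's mean rating (row mean).
filled_by_movie = toy.apply(lambda row: row.fillna(row.mean()), axis=1)

# Idea 2: fill each user's gaps with that user's mean rating (column mean).
filled_by_user = toy.fillna(toy.mean())

# Idea 1 (assumed reading): subtract each user's mean, then fill gaps with 0,
# so that a missing rating means "no deviation from this user's average".
centered = (toy - toy.mean()).fillna(0.0)

print(filled_by_movie)
print(filled_by_user)
print(centered)
```

Any of these tables could be passed to the same cosine similarity code in place of the table filled with a constant 2.5.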
