<a href="https://www.kaggle.com/code/anucoolchandra/recommender-systems?scriptVersionId=110358977" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [13]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/movielens-100k-dataset/ml-100k/u.occupation
/kaggle/input/movielens-100k-dataset/ml-100k/u1.base
/kaggle/input/movielens-100k-dataset/ml-100k/u.info
/kaggle/input/movielens-100k-dataset/ml-100k/u4.test
/kaggle/input/movielens-100k-dataset/ml-100k/u.item
/kaggle/input/movielens-100k-dataset/ml-100k/README
/kaggle/input/movielens-100k-dataset/ml-100k/u1.test
/kaggle/input/movielens-100k-dataset/ml-100k/ua.test
/kaggle/input/movielens-100k-dataset/ml-100k/u.data
/kaggle/input/movielens-100k-dataset/ml-100k/u5.test
/kaggle/input/movielens-100k-dataset/ml-100k/mku.sh
/kaggle/input/movielens-100k-dataset/ml-100k/u5.base
/kaggle/input/movielens-100k-dataset/ml-100k/u.user
/kaggle/input/movielens-100k-dataset/ml-100k/ub.base
/kaggle/input/movielens-100k-dataset/ml-100k/u4.base
/kaggle/input/movielens-100k-dataset/ml-100k/u2.test
/kaggle/input/movielens-100k-dataset/ml-100k/ua.base
/kaggle/input/movielens-100k-dataset/ml-100k/u3.test
/kaggle/input/movielens-100k-dataset/ml-100k/u.

## What is a recommender system?

A recommender system is a framework for recommending things that a person might be interested in buying or using, based on that person's past behaviour. Example: the "*People who bought this also bought*" section on Amazon.com.

### Collaborative Filtering

Collaborative filtering is just a fancy name for 'recommending things based on a combination of what you did and what everybody else did'. It looks at your behaviour and compares it with everyone else's behaviour to arrive at things that might interest you but that you might not have heard of.

### User Based Collaborative Filtering

This technique finds similarities among users and suggests what similar users are using (watching, in our case of movies).

### Item Based Collaborative Filtering

This technique finds similarities among items and suggests items that are similar to the ones the user already liked.
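
To make the difference between the two approaches concrete, here is a minimal sketch on a made-up rating matrix. The users (`u1` to `u4`), the movie names and the ratings are all hypothetical; the only point is that item-based similarity compares columns while user-based similarity compares rows.

```python
import pandas as pd

# Hypothetical toy ratings: rows are users, columns are movies.
ratings = pd.DataFrame(
    {
        "Movie A": [5.0, 4.0, 1.0, 2.0],
        "Movie B": [4.0, 5.0, 2.0, 1.0],
        "Movie C": [1.0, 2.0, 5.0, 4.0],
    },
    index=["u1", "u2", "u3", "u4"],
)

# Item-based: correlate the columns (movies) with each other.
item_sim = ratings.corr(method="pearson")

# User-based: correlate the rows (users) by transposing first.
user_sim = ratings.T.corr(method="pearson")

print(item_sim.round(2))  # Movie A vs Movie B is 0.8, Movie A vs Movie C is -1.0
print(user_sim.round(2))  # u1 and u3 have opposite tastes, so they score -1.0
```

The rest of this notebook follows the item-based route, so only the column-wise correlation is used from here on.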

## Pipeline for Movie Recommender System

1. Dataset loading
2. Selecting a user for whom movies are to be recommended
3. Creating a matrix of movies which the user has watched
4. Sorting the movies on the basis of ratings given by that user
5. Creating a correlation matrix of (movies watched by this user) vs (other movies in the dataset)
6. Creating a list of highly correlated movies
7. Dropping the movies already watched by the user from the list

In this way, we get a list of the top N movies (say 5 or 10) which the user should watch according to their movie watching and rating behaviour.
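
The steps above can be sketched end to end on a tiny dataset. This is only an illustration under assumptions: the function name `recommend`, the single-letter movie titles, the ratings and the `min_periods=2` threshold are all made up for the example and are not part of the MovieLens data.

```python
import pandas as pd

def recommend(user_ratings, user_id, top_n=5, min_periods=2):
    """Item-based recommendations for one user from a user x movie matrix."""
    # Steps 2-4: movies this user rated, highest rated first.
    watched = user_ratings.loc[user_id].dropna().sort_values(ascending=False)
    # Step 5: movie-to-movie Pearson correlation over users who rated both.
    corr = user_ratings.corr(method="pearson", min_periods=min_periods)
    # Step 6: weight each watched movie's correlations by the user's rating,
    # then add up the scores that land on the same candidate movie.
    parts = [corr[title].dropna() * rating for title, rating in watched.items()]
    scores = pd.concat(parts).groupby(level=0).sum()
    # Step 7: drop what the user has already seen and keep the best candidates.
    scores = scores.drop(watched.index, errors="ignore")
    return scores.sort_values(ascending=False).head(top_n)

# Step 1: hypothetical ratings in long format (userId, title, rating).
rows = [
    (1, "A", 5), (1, "B", 4), (1, "C", 5), (1, "D", 1),
    (2, "A", 4), (2, "B", 5), (2, "C", 4), (2, "D", 2),
    (3, "A", 1), (3, "B", 2), (3, "C", 1), (3, "D", 5),
    (4, "A", 2), (4, "B", 1), (4, "C", 2), (4, "D", 4),
    (5, "A", 5), (5, "B", 5),
]
long_ratings = pd.DataFrame(rows, columns=["userId", "title", "rating"])
user_ratings = long_ratings.pivot_table(index="userId", columns="title", values="rating")

recs = recommend(user_ratings, user_id=5)
print(recs)  # "C" ranks first, "D" last; "A" and "B" are excluded
```

User 5 liked A and B, and C is rated much like A and B by the other users, so C comes out on top while D (rated the opposite way) gets a negative score.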

In [14]:
#Importing relevant libraries
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

In [15]:
#Loading the Movielens dataset
data1 = pd.read_csv('../input/movielens100k/movies.csv')
data1.head()

| | movieId | title | genres |
|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy |
| 1 | 2 | Jumanji (1995) | Adventure\|Children\|Fantasy |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama\|Romance |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy |


In [16]:
data2 = pd.read_csv('../input/movielens100k/ratings.csv')
data2.head()

| | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 31 | 2.5 | 1260759144 |
| 1 | 1 | 1029 | 3.0 | 1260759179 |
| 2 | 1 | 1061 | 3.0 | 1260759182 |
| 3 | 1 | 1129 | 2.0 | 1260759185 |
| 4 | 1 | 1172 | 4.0 | 1260759205 |


In [34]:
#Merging both datasets to one
data = pd.merge(data1,data2,on = 'movieId')
data.head()

| | movieId | title | genres | userId | rating | timestamp |
|---|---|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 7 | 3.0 | 851866703 |
| 1 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 9 | 4.0 | 938629179 |
| 2 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 13 | 5.0 | 1331380058 |
| 3 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 15 | 2.0 | 997938310 |
| 4 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy | 19 | 3.0 | 855190091 |


In [18]:
#Creating a pivot table with one row per user ID, holding the rating that
#particular user gave to each of the movies in the dataset.

userRatings = data.pivot_table(index=['userId'],columns=['title'],values='rating')
userRatings

title,"""Great Performances"" Cats (1998)",$9.99 (2008),'Hellboy': The Seeds of Creation (2004),'Neath the Arizona Skies (1934),'Round Midnight (1986),'Salem's Lot (2004),'Til There Was You (1997),"'burbs, The (1989)",'night Mother (1986),(500) Days of Summer (2009),...,Zulu (1964),Zulu (2013),[REC] (2007),eXistenZ (1999),loudQUIETloud: A Film About the Pixies (2006),xXx (2002),xXx: State of the Union (2005),¡Three Amigos! (1986),À nous la liberté (Freedom for Us) (1931),İtirazım Var (2014)
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,,,,,,,,,,,...,,,,,,,,,,
2,,,,,,,,,,,...,,,,,,,,,,
3,,,,,,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
5,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
667,,,,,,,,,,,...,,,,,,,,,,
668,,,,,,,,,,,...,,,,,,,,,,
669,,,,,,,,,,,...,,,,,,,,,,
670,,,,,,,,,,,...,,,,,,,,,,


This table has one row for every user and one column for every movie, so it holds every rating that every user gave. The NaN values indicate that the particular user did not rate the particular movie.
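
As a small illustration of how the long ratings table turns into this user-by-movie layout, here is a sketch on hypothetical data (the three users, two movie names and their ratings are invented for the example).

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, movie) rating.
long_ratings = pd.DataFrame(
    {
        "userId": [1, 1, 2, 3],
        "title": ["Movie A", "Movie B", "Movie A", "Movie B"],
        "rating": [4.0, 3.0, 5.0, 2.0],
    }
)

# One row per user, one column per movie; missing ratings become NaN.
wide = long_ratings.pivot_table(index="userId", columns="title", values="rating")
print(wide)

# Fraction of user/movie cells with no rating.
sparsity = wide.isna().to_numpy().mean()
print(f"{sparsity:.0%} of the cells are empty")
```

Note that `pivot_table` aggregates with the mean by default, so if a user had rated the same title twice the cell would hold the average of those ratings.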


The concept here is item-based collaborative filtering. We will be finding the relationships between the columns: the correlation score between any two columns is the correlation score between the corresponding pair of movies.

We are going to use pandas' corr function, which computes the correlation score for every pair of columns in the matrix.
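
One detail worth seeing on a small case is how missing ratings are handled: for each pair of columns, pandas only uses the rows where both values are present. The sketch below uses hypothetical ratings and checks the result against a manual calculation with `numpy.corrcoef`.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings; the fourth user did not rate Movie A.
ratings = pd.DataFrame(
    {
        "Movie A": [5.0, 4.0, 1.0, np.nan],
        "Movie B": [4.0, 5.0, 2.0, 3.0],
    }
)

# Pairwise Pearson correlation as computed by pandas.
r = ratings.corr(method="pearson").loc["Movie A", "Movie B"]

# The same value by hand, using only the three users who rated both movies.
both = ratings.dropna()
manual = np.corrcoef(both["Movie A"], both["Movie B"])[0, 1]

print(round(r, 4), round(manual, 4))
```

Both numbers agree, which confirms that the fourth user's rating of Movie B is simply ignored for this pair.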

In [19]:
corrMatrix = userRatings.corr(method='pearson')
corrMatrix

title,"""Great Performances"" Cats (1998)",$9.99 (2008),'Hellboy': The Seeds of Creation (2004),'Neath the Arizona Skies (1934),'Round Midnight (1986),'Salem's Lot (2004),'Til There Was You (1997),"'burbs, The (1989)",'night Mother (1986),(500) Days of Summer (2009),...,Zulu (1964),Zulu (2013),[REC] (2007),eXistenZ (1999),loudQUIETloud: A Film About the Pixies (2006),xXx (2002),xXx: State of the Union (2005),¡Three Amigos! (1986),À nous la liberté (Freedom for Us) (1931),İtirazım Var (2014)
title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
"""Great Performances"" Cats (1998)",1.0,,,,,,,,,,...,,,,,,,,,,
$9.99 (2008),,1.0,,,,,,,,1.000000,...,,,,,,,,,,
'Hellboy': The Seeds of Creation (2004),,,,,,,,,,,...,,,,,,,,,,
'Neath the Arizona Skies (1934),,,,,,,,,,,...,,,,,,,,,,
'Round Midnight (1986),,,,,1.0,,,,,1.000000,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
xXx (2002),,,,,,,,0.944911,,0.424179,...,,,,1.000000,,1.000000,,-0.461163,,
xXx: State of the Union (2005),,,,,,,,,,,...,,,,,,,,,,
¡Three Amigos! (1986),,,,,,,,0.404226,,-0.617213,...,,,,0.043321,,-0.461163,,1.000000,,
À nous la liberté (Freedom for Us) (1931),,,,,,,,,,,...,,,,,,,,,,


In [20]:
#Correlations based on only a few common ratings are unreliable, so we only keep
#movie pairs that have been rated together by at least 50 people,
#using the min_periods parameter
corrMatrix = userRatings.corr(method='pearson',min_periods=50)
corrMatrix

title,"""Great Performances"" Cats (1998)",$9.99 (2008),'Hellboy': The Seeds of Creation (2004),'Neath the Arizona Skies (1934),'Round Midnight (1986),'Salem's Lot (2004),'Til There Was You (1997),"'burbs, The (1989)",'night Mother (1986),(500) Days of Summer (2009),...,Zulu (1964),Zulu (2013),[REC] (2007),eXistenZ (1999),loudQUIETloud: A Film About the Pixies (2006),xXx (2002),xXx: State of the Union (2005),¡Three Amigos! (1986),À nous la liberté (Freedom for Us) (1931),İtirazım Var (2014)
title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
"""Great Performances"" Cats (1998)",,,,,,,,,,,...,,,,,,,,,,
$9.99 (2008),,,,,,,,,,,...,,,,,,,,,,
'Hellboy': The Seeds of Creation (2004),,,,,,,,,,,...,,,,,,,,,,
'Neath the Arizona Skies (1934),,,,,,,,,,,...,,,,,,,,,,
'Round Midnight (1986),,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
xXx (2002),,,,,,,,,,,...,,,,,,,,,,
xXx: State of the Union (2005),,,,,,,,,,,...,,,,,,,,,,
¡Three Amigos! (1986),,,,,,,,,,,...,,,,,,,,,,
À nous la liberté (Freedom for Us) (1931),,,,,,,,,,,...,,,,,,,,,,
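
The effect of `min_periods` is easier to see on a small case. In this sketch (hypothetical ratings, and a threshold of 3 instead of 50), one pair of movies shares only two raters: with the default threshold the pair gets a perfect but meaningless score, and with the stricter threshold it becomes NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings; only two users rated Movie C.
ratings = pd.DataFrame(
    {
        "Movie A": [5.0, 4.0, 1.0, 2.0],
        "Movie B": [4.0, 5.0, 2.0, 1.0],
        "Movie C": [5.0, 3.0, np.nan, np.nan],
    }
)

loose = ratings.corr(method="pearson", min_periods=1)
strict = ratings.corr(method="pearson", min_periods=3)

# Two shared ratings always give a correlation of +1 or -1.
print(loose.loc["Movie A", "Movie C"])
# With at least 3 shared ratings required, the pair is dropped.
print(strict.loc["Movie A", "Movie C"])
# Pairs with enough shared ratings are unaffected.
print(strict.loc["Movie A", "Movie B"])
```

This is also why so many cells in the matrix above are NaN: most movie pairs in the dataset have fewer than 50 users in common.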


### Understanding Movie Recommendations with an Example
For example, suppose we want to find movie suggestions for the user with ID 65.



In [21]:
user65Rating = userRatings.loc[65].dropna()
user65Rating

title
2001: A Space Odyssey (1968)                         4.0
All That Jazz (1979)                                 5.0
Annie Hall (1977)                                    5.0
Apartment, The (1960)                                5.0
Back to the Future (1985)                            3.0
Bananas (1971)                                       5.0
Being There (1979)                                   5.0
Bonnie and Clyde (1967)                              5.0
Butch Cassidy and the Sundance Kid (1969)            4.0
Deliverance (1972)                                   4.0
Duck Soup (1933)                                     4.0
Godfather, The (1972)                                5.0
Godfather: Part II, The (1974)                       4.0
Gone with the Wind (1939)                            2.0
Hard Day's Night, A (1964)                           5.0
Killing Fields, The (1984)                           5.0
Local Hero (1983)                                    2.0
M*A*S*H (a.k.a. MASH) (19

In [22]:
#Creating a series of candidate movies by going through every movie that
#user65 rated. (The name simUsers is kept for the cells below, although the
#series actually holds similar movies, not similar users.)
simUsers = pd.Series(dtype='float64')
for i in range(len(user65Rating.index)):
    print('Adding Sims for ',user65Rating.index[i],'...')
    #Retrieving the movies similar to the one user 65 rated
    sims = corrMatrix[user65Rating.index[i]].dropna()
    #Scaling each similarity by how well user 65 rated this movie
    #(.iloc is used because i is a position, not an index label)
    sims = sims * user65Rating.iloc[i]
    #Adding the scores to the running series
    #(Series.append was removed in pandas 2.0, so pd.concat is used instead)
    simUsers = pd.concat([simUsers, sims])

#Glance of our results so far
print('sorting...')
simUsers.sort_values(inplace = True, ascending = False)
print(simUsers.head(10))

Adding Sims for  2001: A Space Odyssey (1968) ...
Adding Sims for  All That Jazz (1979) ...
Adding Sims for  Annie Hall (1977) ...
Adding Sims for  Apartment, The (1960) ...
Adding Sims for  Back to the Future (1985) ...
Adding Sims for  Bananas (1971) ...
Adding Sims for  Being There (1979) ...
Adding Sims for  Bonnie and Clyde (1967) ...
Adding Sims for  Butch Cassidy and the Sundance Kid (1969) ...
Adding Sims for  Deliverance (1972) ...
Adding Sims for  Duck Soup (1933) ...
Adding Sims for  Godfather, The (1972) ...
Adding Sims for  Godfather: Part II, The (1974) ...
Adding Sims for  Gone with the Wind (1939) ...
Adding Sims for  Hard Day's Night, A (1964) ...
Adding Sims for  Killing Fields, The (1984) ...
Adding Sims for  Local Hero (1983) ...
Adding Sims for  M*A*S*H (a.k.a. MASH) (1970) ...
Adding Sims for  Midnight Cowboy (1969) ...
Adding Sims for  Ordinary People (1980) ...
Adding Sims for  Player, The (1992) ...
Adding Sims for  Roger & Me (1989) ...
Adding Sims for  Sleepe

### Using the groupby command to combine rows

We are going to use the groupby command to group together all the rows that refer to the same movie. Next we will sum up their correlation scores and look at the totals.
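
A minimal sketch of this step, using a hypothetical series in which the same movie appears more than once (as happens when several watched movies are all correlated with it):

```python
import pandas as pd

# Hypothetical scores; "Movie X" was reached from two different watched movies.
scores = pd.Series(
    [2.0, 1.5, 3.0, 0.5],
    index=["Movie X", "Movie Y", "Movie X", "Movie Z"],
)

# Group rows that share an index label and add their scores together.
combined = scores.groupby(scores.index).sum().sort_values(ascending=False)
print(combined)  # Movie X totals 5.0, followed by Movie Y (1.5) and Movie Z (0.5)
```

Summing rewards movies that are similar to many of the user's highly rated movies, not just to one of them.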

In [23]:
simUsers = simUsers.groupby(simUsers.index).sum()
simUsers.sort_values(inplace = True, ascending = False)
simUsers.head()

Star Wars: Episode IV - A New Hope (1977)                11.508255
Godfather, The (1972)                                    11.386950
Godfather: Part II, The (1974)                           10.891982
Star Wars: Episode V - The Empire Strikes Back (1980)     9.968625
Star Wars: Episode VI - Return of the Jedi (1983)         9.282729
dtype: float64

### Filtering out already rated movies

We need to filter out the movies which user 65 has already rated, because it doesn't make sense to recommend movies which the user has already watched.
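
The role of `errors='ignore'` in this step can be shown with a short sketch on hypothetical data: some rated movies may not appear among the candidates at all, and without that argument `drop` would raise a `KeyError` for them.

```python
import pandas as pd

# Hypothetical candidate scores and the movies the user has already rated.
scores = pd.Series({"Movie X": 5.0, "Movie Y": 1.5, "Movie Z": 0.5})
already_rated = pd.Index(["Movie X", "Movie W"])  # "Movie W" is not a candidate

# Labels that are missing from the series are skipped instead of raising.
filtered = scores.drop(already_rated, errors="ignore")
print(filtered)  # only Movie Y and Movie Z remain
```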

In [33]:
filteredSims = simUsers.drop(user65Rating.index, errors = 'ignore')
print('Top 10 Movies user65 should watch...\n\n',filteredSims.head(10))

Top 10 Movies user65 should watch...

 Star Wars: Episode V - The Empire Strikes Back (1980)                             9.968625
One Flew Over the Cuckoo's Nest (1975)                                            8.756187
Goodfellas (1990)                                                                 7.459060
Braveheart (1995)                                                                 7.172702
Amadeus (1984)                                                                    6.771609
Casablanca (1942)                                                                 6.201660
E.T. the Extra-Terrestrial (1982)                                                 6.173970
Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)    6.108919
Men in Black (a.k.a. MIB) (1997)                                                  6.095050
Groundhog Day (1993)                                                              5.927587
dtype: float64
