### Item Based Collaborative Filtering
In this approach, instead of finding relationships between users, we compare the items themselves, such as movies or products.
In user-based recommendation systems, users' habits can change over time, which makes recommendation harder. Items such as movies or products do not change, so item-based recommendation tends to be easier and more stable.
In addition, there are billions of people in the world, and usually far more users than items. Comparing users with each other therefore takes much more computation than comparing items.
Item-based recommendation systems use the same user vs. item matrix that user-based recommender systems use.
Each row is a user and each column is an item, such as a movie, product, or website.
This time, however, instead of calculating the similarity between rows, we calculate the similarity between columns, that is, between items.
Let's look at how it works.
For example, The Lord of the Rings and The Hobbit are similar because many of the same people like both. This gives a similarity score between the two movies.
If the similarity is high enough, we can recommend The Hobbit to people who have only watched The Lord of the Rings.
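
To make the idea of column similarity concrete, here is a minimal sketch on a made-up user vs. item matrix. The ratings, user labels, and the third movie are hypothetical and exist only for illustration; Pearson correlation is used because the notebook relies on correlation later on.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings: rows are users, columns are movies, NaN = not rated.
toy = pd.DataFrame(
    {
        "Lord of the Rings": [5.0, 4.0, 1.0, 2.0, np.nan],
        "Hobbit": [5.0, 4.0, 2.0, 1.0, 4.0],
        "Romantic Comedy": [1.0, 2.0, 5.0, 4.0, np.nan],
    },
    index=pd.Index(["u1", "u2", "u3", "u4", "u5"], name="userId"),
)

# Item-item similarity: Pearson correlation between columns,
# computed for each pair over the users who rated both movies.
item_similarity = toy.corr(method="pearson")

# Movies most similar to "Lord of the Rings", excluding the movie itself.
lotr_similar = (
    item_similarity["Lord of the Rings"]
    .drop("Lord of the Rings")
    .sort_values(ascending=False)
)
print(lotr_similar)
```

In this toy data "Hobbit" ranks first, so it would be the candidate to recommend to user `u5` if that user had only seen the other movie.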

In [1]:
# Import libraries and packages
import pandas as pd

In [2]:
# read the downloaded file
movie_full = pd.read_csv('movie.csv', low_memory=False)

rating_full = pd.read_csv('rating.csv', low_memory=False)

In [3]:
movie_full.tail()

Unnamed: 0,movieId,title,genres
27273,131254,Kein Bund für's Leben (2007),Comedy
27274,131256,"Feuer, Eis & Dosenbier (2002)",Comedy
27275,131258,The Pirates (2014),Adventure
27276,131260,Rentun Ruusu (2001),(no genres listed)
27277,131262,Innocence (2014),Adventure|Fantasy|Horror


In [4]:
rating_full.tail()

Unnamed: 0,userId,movieId,rating,timestamp
20000258,138493,68954,4.5,2009-11-13 15:42:00
20000259,138493,69526,4.5,2009-12-03 18:31:48
20000260,138493,69644,3.0,2009-12-07 18:10:57
20000261,138493,70286,5.0,2009-11-13 15:42:24
20000262,138493,71619,2.5,2009-10-17 20:25:36


In [5]:
# Extract selected columns from movie and rating tables

movie = movie_full[['movieId','title']]

rating = rating_full[['userId','movieId','rating']]

In [6]:
# Merge movie and rating data
data_full = pd.merge(movie,rating)
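
When `pd.merge` is called without `on`, it joins on the columns the two tables share (here `movieId`) and performs an inner join by default. The sketch below spells this out on small, hypothetical tables; `movie_toy` and `rating_toy` are illustrative names, not part of the dataset.

```python
import pandas as pd

# Hypothetical miniature versions of the movie and rating tables.
movie_toy = pd.DataFrame(
    {"movieId": [1, 2, 3], "title": ["A (1995)", "B (1995)", "C (1996)"]}
)
rating_toy = pd.DataFrame(
    {"userId": [10, 11, 10], "movieId": [1, 1, 2], "rating": [4.0, 5.0, 3.5]}
)

# Explicit join key and join type; validate checks that each movieId
# appears at most once in the left (movie) table.
merged_toy = pd.merge(
    movie_toy, rating_toy, on="movieId", how="inner", validate="one_to_many"
)
print(merged_toy)
```

Because the join is an inner join, a movie with no ratings (movie 3 in this sketch) does not appear in the result.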

In [7]:
data_full.describe()

Unnamed: 0,movieId,userId,rating
count,20000260.0,20000260.0,20000260.0
mean,9041.567,69045.87,3.525529
std,19789.48,40038.63,1.051989
min,1.0,1.0,0.5
25%,902.0,34395.0,3.0
50%,2167.0,69141.0,3.5
75%,4770.0,103637.0,4.0
max,131262.0,138493.0,5.0


In [8]:
data_full.head()

Unnamed: 0,movieId,title,userId,rating
0,1,Toy Story (1995),3,4.0
1,1,Toy Story (1995),6,5.0
2,1,Toy Story (1995),8,4.0
3,1,Toy Story (1995),10,4.0
4,1,Toy Story (1995),11,4.5


In [9]:
data_full.shape

(20000263, 4)

In [10]:
# Work with only the first 1,000,000 rows of the merged table
data_sub = data_full.iloc[:1000000,:]

In [11]:
# Create table with user-title (aka user-item matrix)

pivot_table = data_sub.pivot_table(index = ["userId"],columns = ["title"],values = "rating")
pivot_table.head(10)

userId,Ace Ventura: When Nature Calls (1995),Across the Sea of Time (1995),"Amazing Panda Adventure, The (1995)","American President, The (1995)",Angela (1995),Angels and Insects (1995),Anne Frank Remembered (1995),Antonia's Line (Antonia) (1995),Assassins (1995),Babe (1995),...,Unforgettable (1996),Up Close and Personal (1996),"Usual Suspects, The (1995)",Vampire in Brooklyn (1995),Waiting to Exhale (1995),When Night Is Falling (1995),"White Balloon, The (Badkonake sefid) (1995)",White Squall (1996),Wings of Courage (1995),"Young Poisoner's Handbook, The (1995)"
1,,,,,,,,,,,...,,,3.5,,,,,,,
2,,,,,,,,,,,...,,,,,,,,,,
3,,,,,,,,,,,...,,,5.0,,,,,,,
4,3.0,,,,,,,,,,...,,,,,,,,,,
5,,,,5.0,,,,,,,...,,2.0,,,,,,,,
6,,,,,,,,,,,...,,4.0,,,,,,,,
7,,,,4.0,,,,,,,...,,,,,,,,,,
8,1.0,,,,,,,,,,...,,,,,,,,,,
10,,,,4.0,,,,,,,...,,,,,,,,,,
11,3.5,,,,,,,,,,...,,,,,,,,,,
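
Most cells in the user-item matrix are `NaN` because each user rates only a few movies. The following sketch, on a small hypothetical ratings table (`long_toy` is an illustrative name), shows how `pivot_table` turns long-format ratings into this wide matrix and how the share of missing cells could be measured.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, movie) rating.
long_toy = pd.DataFrame(
    {
        "userId": [1, 1, 2, 3, 3],
        "title": ["A", "B", "A", "B", "C"],
        "rating": [4.0, 3.0, 5.0, 2.0, 4.5],
    }
)

# Wide user-item matrix: one row per user, one column per title.
# If a (user, title) pair occurred more than once, pivot_table would
# average the ratings, since its default aggregation is the mean.
wide_toy = long_toy.pivot_table(index="userId", columns="title", values="rating")

# Fraction of cells with no rating.
sparsity = wide_toy.isna().to_numpy().mean()
print(wide_toy)
print(f"missing cells: {sparsity:.2%}")
```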


In [16]:
# Ratings column of the movie the user has watched
movie_watched = pivot_table["Up Close and Personal (1996)"]
# Find the correlation between "Up Close and Personal (1996)" and every other movie
similarity_with_other_movies = pivot_table.corrwith(movie_watched)
similarity_with_other_movies = similarity_with_other_movies.sort_values(ascending=False)
similarity_with_other_movies.head()

title
Up Close and Personal (1996)                           1.000000
Guardian Angel (1994)                                  0.898546
Wings of Courage (1995)                                0.889001
Race the Sun (1996)                                    0.626593
Silences of the Palace, The (Saimt el Qusur) (1994)    0.589256
dtype: float64
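
A high correlation can rest on only a handful of users who rated both movies, in which case it may not be a reliable signal. One possible refinement, sketched here under that assumption, is to count the shared raters and drop movies below a threshold. The helper `similar_movies`, its `min_common` parameter, and the toy matrix are hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd

def similar_movies(pivot, title, min_common=2):
    """Correlate `title` with every other column of `pivot`, keeping only
    movies that share at least `min_common` raters with it."""
    target = pivot[title]
    corr = pivot.corrwith(target)
    # Number of users who rated both the target movie and each other movie.
    rated = pivot.notna().astype(int)
    common = rated.mul(target.notna().astype(int), axis=0).sum()
    result = pd.DataFrame({"correlation": corr, "common_raters": common})
    result = result[result["common_raters"] >= min_common]
    result = result.drop(index=title, errors="ignore")
    return result.sort_values("correlation", ascending=False)

# Hypothetical toy user-item matrix; "C" shares only two raters with "A".
toy_pivot = pd.DataFrame(
    {
        "A": [5.0, 4.0, 1.0, 2.0, 3.0],
        "B": [4.0, 5.0, 2.0, 1.0, np.nan],
        "C": [np.nan, np.nan, np.nan, 1.0, 3.0],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="userId"),
)

loose = similar_movies(toy_pivot, "A", min_common=2)
strict = similar_movies(toy_pivot, "A", min_common=3)
print(loose)
print(strict)
```

With the looser threshold, "C" ranks first even though only two users rated both movies; the stricter threshold keeps only "B". On the real `pivot_table` the call would look like `similar_movies(pivot_table, "Up Close and Personal (1996)", min_common=...)`, with the threshold chosen by experiment.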