<a href="https://colab.research.google.com/github/cijagani/phd-work/blob/master/item_based_rs_demo_cijagani.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Attribute Information**

This study uses two files from the dataset: ratings.csv and movies.csv.

ratings.csv contains the ratings that users gave to movies:

*   userId
*   movieId
*   rating
*   timestamp 

movies.csv contains movie information:
*   movieId
*   title
*   genres
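
A quick way to confirm that both files carry the columns listed above is to compare each header against the expected names. The sketch below uses small in-memory stand-ins instead of the real files, and `check_columns` is a hypothetical helper, not part of the notebook.

```python
import io
import pandas as pd

# Tiny in-memory stand-ins for ratings.csv and movies.csv (illustrative rows only).
ratings_csv = io.StringIO("userId,movieId,rating,timestamp\n1,1,4.0,964982703\n")
movies_csv = io.StringIO("movieId,title,genres\n1,Toy Story (1995),Adventure|Animation\n")

EXPECTED_RATINGS = ["userId", "movieId", "rating", "timestamp"]
EXPECTED_MOVIES = ["movieId", "title", "genres"]

def check_columns(frame, expected):
    """Return the expected columns that are missing from the frame (hypothetical helper)."""
    return [col for col in expected if col not in frame.columns]

ratings_demo = pd.read_csv(ratings_csv)
movies_demo = pd.read_csv(movies_csv)

missing_ratings = check_columns(ratings_demo, EXPECTED_RATINGS)
missing_movies = check_columns(movies_demo, EXPECTED_MOVIES)
print(missing_ratings, missing_movies)
```

An empty list for each file means every expected column is present.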


**Task Details**

> Step 1: Preparing the Data Set
  
>  Step 2: Creating User Movie Df

>  Step 3: Making Item-Based Movie Suggestions






# Step 1: Preparing the Data Set

**load required libraries and tools**

In [1]:
#import scientific computing package
import numpy as np

#for data manipulation and analysis
import pandas as pd

#Visualization library mainly for charts
import matplotlib.pyplot as plt


**get movies.csv file from user**

In [2]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

Saving movies.csv to movies.csv
User uploaded file "movies.csv" with length 484688 bytes


**read uploaded CSV file**

In [3]:
movies = pd.read_csv("movies.csv")
movies.head()

,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


**get ratings.csv file from user**

In [4]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

Saving ratings.csv to ratings.csv
User uploaded file "ratings.csv" with length 2382886 bytes


**read uploaded CSV file**

In [5]:
ratings = pd.read_csv('ratings.csv')
ratings.head()

,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931
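
The timestamp column holds plain integers. Assuming they are seconds since the Unix epoch (the notebook does not say so, so treat this as an assumption), they can be turned into readable datetimes as sketched below. The `rated_at` column name is illustrative; the two rows are copied from the output above.

```python
import pandas as pd

# First two rows of the ratings output shown above.
ratings_demo = pd.DataFrame({
    "userId": [1, 1],
    "movieId": [1, 3],
    "rating": [4.0, 4.0],
    "timestamp": [964982703, 964981247],
})

# Assumption: timestamps are seconds since 1970-01-01 UTC.
ratings_demo["rated_at"] = pd.to_datetime(ratings_demo["timestamp"], unit="s")
print(ratings_demo[["timestamp", "rated_at"]])
```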


**apply left join**

In [52]:
df = movies.merge(ratings, how="left", on="movieId")
df.head()

,movieId,title,genres,userId,rating,timestamp
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,1.0,4.0,964982700.0
1,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,5.0,4.0,847435000.0
2,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,7.0,4.5,1106636000.0
3,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,15.0,2.5,1510578000.0
4,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,17.0,4.5,1305696000.0
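
Because the merge is a left join starting from movies, a movie without any rating still appears once, with missing values in userId, rating, and timestamp. That is also why userId shows up as a float (1.0, 5.0, ...) in the output above. A minimal sketch on toy data, using the `indicator` option of `merge` to flag such rows (the demo frames and titles are made up):

```python
import pandas as pd

movies_demo = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Movie A", "Movie B", "Movie C"],
})
ratings_demo = pd.DataFrame({
    "userId": [10, 11, 10],
    "movieId": [1, 1, 2],
    "rating": [4.0, 3.5, 5.0],
})

# indicator=True adds a "_merge" column: "both" or "left_only" for a left join.
merged_demo = movies_demo.merge(ratings_demo, how="left", on="movieId", indicator=True)

# Movies that matched no rating row.
unrated = merged_demo[merged_demo["_merge"] == "left_only"]
print(merged_demo)
print(unrated["title"].tolist())
```

Movie C has no ratings, so it survives the join as a single row of missing values.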


# Step 2: Creating User Movie Df
Our main goal is to create the user_movie matrix with users in rows and movies in columns.

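The merged DataFrame has 100854 rows: one per rating, plus one row of missing values for any movie that has no ratings (a consequence of the left join).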

In [53]:
df.shape

(100854, 6)

In [54]:
df["title"].nunique()

9737

**find unique movies in the dataset**

The number of unique movie titles is 9737.

In [59]:
rating_counts = pd.DataFrame(df["title"].value_counts())
rating_counts.head()

,title
Forrest Gump (1994),329
"Shawshank Redemption, The (1994)",317
Pulp Fiction (1994),307
"Silence of the Lambs, The (1991)",279
"Matrix, The (1999)",278


The output above shows the number of ratings per movie.
Movies that received few ratings can be excluded to narrow the scope of the study.
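
Two caveats apply to counting with `value_counts()`. It counts rows, so a movie that has no ratings but survived the left join is counted once. And from pandas 2.0 on, the result is named `count`, so the column would be `count` rather than `title`, and a lookup such as `rating_counts["title"]` would raise a KeyError. A sketch that sidesteps both by counting non-missing ratings per title (toy data; the name `n_ratings` is illustrative):

```python
import pandas as pd

df_demo = pd.DataFrame({
    "title": ["Movie A", "Movie A", "Movie A", "Movie B", "Movie C"],
    "rating": [4.0, 3.0, 5.0, 2.5, None],  # Movie C has no rating (left-join row)
})

# count() ignores missing values, so unrated movies get 0 rather than 1.
rating_counts_demo = (
    df_demo.groupby("title")["rating"]
    .count()
    .sort_values(ascending=False)
    .rename("n_ratings")
)
print(rating_counts_demo)
```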

In [60]:
rare_movies = rating_counts[rating_counts["title"] <= 50].index

common_movies = df[~df["title"].isin(rare_movies)]

common_movies.shape

(40712, 6)

After narrowing the scope to movies with more than 50 ratings, 40712 ratings and 437 movies remain.
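
Note that the condition `<= 50` treats a movie with exactly 50 ratings as rare, so only movies with strictly more than 50 ratings are kept. The boundary behaviour can be sketched on toy data with a threshold parameter (`keep_common_movies` and `min_ratings` are hypothetical names, not part of the notebook):

```python
import pandas as pd

def keep_common_movies(frame, min_ratings):
    """Keep rows whose title has more than `min_ratings` ratings (hypothetical helper)."""
    # transform("count") repeats each title's rating count on every one of its rows.
    counts = frame.groupby("title")["rating"].transform("count")
    return frame[counts > min_ratings]

df_demo = pd.DataFrame({
    "title": ["Movie A"] * 3 + ["Movie B"] * 2 + ["Movie C"],
    "userId": [1, 2, 3, 1, 2, 1],
    "rating": [4.0, 3.0, 5.0, 2.5, 4.5, 3.0],
})

# Movie B has exactly 2 ratings, so it is dropped along with Movie C.
common_demo = keep_common_movies(df_demo, min_ratings=2)
print(common_demo["title"].unique())
print(common_demo.shape)
```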

The next step is creating the user_movie matrix with users in rows and movies in columns.

In [61]:
user_movie_df = common_movies.pivot_table(index=["userId"], columns=["title"], values="rating")

user_movie_df.shape

(606, 437)

In [62]:
user_movie_df.head(10)

title,10 Things I Hate About You (1999),12 Angry Men (1957),2001: A Space Odyssey (1968),28 Days Later (2002),300 (2007),"40-Year-Old Virgin, The (2005)",A.I. Artificial Intelligence (2001),"Abyss, The (1989)",Ace Ventura: Pet Detective (1994),Ace Ventura: When Nature Calls (1995),Addams Family Values (1993),Air Force One (1997),Airplane! (1980),Aladdin (1992),Alien (1979),Aliens (1986),Almost Famous (2000),Amadeus (1984),"Amelie (Fabuleux destin d'Amélie Poulain, Le) (2001)",American Beauty (1999),American History X (1998),American Pie (1999),"American President, The (1995)",American Psycho (2000),Anchorman: The Legend of Ron Burgundy (2004),Animal House (1978),Annie Hall (1977),Apocalypse Now (1979),Apollo 13 (1995),Arachnophobia (1990),Armageddon (1998),Army of Darkness (1993),As Good as It Gets (1997),Austin Powers in Goldmember (2002),Austin Powers: International Man of Mystery (1997),Austin Powers: The Spy Who Shagged Me (1999),Avatar (2009),"Avengers, The (2012)",Babe (1995),Back to the Future (1985),...,Toy Story (1995),Toy Story 2 (1999),Toy Story 3 (2010),Traffic (2000),Training Day (2001),Trainspotting (1996),True Lies (1994),True Romance (1993),"Truman Show, The (1998)","Truth About Cats & Dogs, The (1996)",Twelve Monkeys (a.k.a. 12 Monkeys) (1995),Twister (1996),Unbreakable (2000),"Untouchables, The (1987)",Up (2009),"Usual Suspects, The (1995)",V for Vendetta (2006),Vertigo (1958),WALL·E (2008),Wallace & Gromit: The Wrong Trousers (1993),War of the Worlds (2005),Waterworld (1995),Wayne's World (1992),Wedding Crashers (2005),"Wedding Singer, The (1998)",What Women Want (2000),What's Eating Gilbert Grape (1993),When Harry Met Sally... (1989),While You Were Sleeping (1995),Who Framed Roger Rabbit? (1988),Wild Wild West (1999),Willy Wonka & the Chocolate Factory (1971),"Wizard of Oz, The (1939)","Wolf of Wall Street, The (2013)",X-Men (2000),X-Men: The Last Stand (2006),X2: X-Men United (2003),Young Frankenstein (1974),Zombieland (2009),Zoolander (2001)
userId
1.0,,,,,,,,4.0,,,,,,,4.0,,,,,5.0,5.0,,,,,,,4.0,,,,,,,5.0,,,,,5.0,...,4.0,,,,,,,,,,,3.0,,,,5.0,,,,,,,5.0,,4.0,,,,,5.0,,5.0,5.0,,5.0,,,5.0,,
2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,5.0,,,,,3.0,
3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4.0,,5.0,,,,,,,,,,,5.0,4.0,,,4.0,4.0,,5.0,,,,,,,,,,,,,,,4.0,4.0,,,,,...,,,,5.0,,,,,,4.0,2.0,,,,,,,,,,,,,,,,,,,,,4.0,5.0,,,,,,,
5.0,,,,,,,,,3.0,,3.0,,,4.0,,,,,,,,,,,,,,,3.0,,,,,,,,,,4.0,,...,4.0,,,,,,2.0,,,,,,,,,4.0,,,,,,,,,,,,,,,,,,,,,,,,
6.0,,,,,,,,,3.0,2.0,3.0,,,5.0,,,,,,,,,4.0,,,,,,4.0,,,,,,,,,,4.0,,...,,,,,,,4.0,,,4.0,4.0,5.0,,,,1.0,,,,,,3.0,,,,,5.0,,4.0,,,3.0,,,,,,,,
7.0,,,4.0,,,,4.5,,,,,,,3.0,,,,,,4.0,,,,,,,,4.0,4.5,,4.0,,0.5,,3.5,2.0,,,,5.0,...,4.5,4.5,,,,,3.0,,3.0,,,,3.0,,,4.5,,,,,3.0,,,,,4.0,,,,,1.5,,,,3.5,4.0,4.0,,,
8.0,,,,,,,,,,,,,,,,,,,,,,,4.0,,,,,,4.0,,,,,,,,,,5.0,,...,,,,,,,5.0,,,,3.0,,,,,5.0,,,,,,3.0,,,,,,,3.0,,,,,,,,,,,
9.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,5.0,,,,,,5.0,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
10.0,,,,,3.0,,,,,,,,,4.0,,,,,,1.0,,,,,,,,,,,,,3.5,,,,2.5,,,,...,,,,,,,,,,,,,,,4.0,,,,,,,,,,,,,3.0,,,,,,1.0,,,,,,
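
Most cells are empty because each user rates only a small share of the movies: at most 40712 of the 606 × 437 = 264822 cells can be filled, roughly 15%. The sketch below measures this density on toy data. It also relies on the fact that `pivot_table` averages the values by default if the same user and title occur more than once.

```python
import pandas as pd

common_demo = pd.DataFrame({
    "userId": [1, 1, 2, 3, 3],
    "title": ["Movie A", "Movie B", "Movie A", "Movie B", "Movie C"],
    "rating": [4.0, 5.0, 3.0, 2.0, 4.5],
})

# Users in rows, titles in columns, ratings as values (missing pairs become NaN).
user_movie_demo = common_demo.pivot_table(index="userId", columns="title", values="rating")

filled = user_movie_demo.notna().sum().sum()
total = user_movie_demo.shape[0] * user_movie_demo.shape[1]
density = filled / total
print(user_movie_demo)
print(f"{filled} of {total} cells filled ({density:.1%})")
```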


# Step 3: Making Item-Based Movie Suggestions

Now that we have the user-movie matrix, we can calculate correlations between movies. Each column of user_movie_df is a movie title, so selecting a column returns every user's rating for that movie, indexed by userId. Below, that column is assigned to a variable named movie_name.

In [63]:
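# Title of the movie to find recommendations for
movie_name = "Die Hard (1988)"
# Replace the title with that movie's ratings column (a Series indexed by userId)
movie_name = user_movie_df[movie_name]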
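
The `corrwith` call in the next cell uses the Pearson coefficient by default and, for each movie, only considers users who rated both that movie and the selected one. A small sketch on made-up ratings that checks this against a manual calculation:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(
    {
        "Target": [5.0, 4.0, 2.0, 1.0, 3.0],
        "Other": [4.5, 4.0, 2.5, 1.0, np.nan],  # user 5 did not rate "Other"
    },
    index=pd.Index([1, 2, 3, 4, 5], name="userId"),
)

# corrwith ignores users who are missing either rating.
via_pandas = demo.corrwith(demo["Target"])["Other"]

# Manual Pearson coefficient over the users who rated both movies.
both = demo.dropna()
dx = both["Target"] - both["Target"].mean()
dy = both["Other"] - both["Other"].mean()
manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(via_pandas, manual)
```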

In [48]:
user_movie_df.corrwith(movie_name).sort_values(ascending=False).head(10)

title
Die Hard (1988)                                1.000000
Crimson Tide (1995)                            0.645903
City of God (Cidade de Deus) (2002)            0.641139
Dark Knight Rises, The (2012)                  0.612144
Wallace & Gromit: The Wrong Trousers (1993)    0.610160
Outbreak (1995)                                0.593011
Social Network, The (2010)                     0.591668
Home Alone (1990)                              0.562301
Batman Begins (2005)                           0.555534
While You Were Sleeping (1995)                 0.547138
dtype: float64

The first entry is the movie itself (its correlation with itself is 1), so it is excluded from the recommendations.

Below are the top five movies that the item-based recommendation system suggests for Die Hard (1988).

In [49]:
user_movie_df.corrwith(movie_name).sort_values(ascending=False)[1:6]

title
Crimson Tide (1995)                            0.645903
City of God (Cidade de Deus) (2002)            0.641139
Dark Knight Rises, The (2012)                  0.612144
Wallace & Gromit: The Wrong Trousers (1993)    0.610160
Outbreak (1995)                                0.593011
dtype: float64
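
The steps above can be wrapped in a small function. The sketch below is one possible arrangement, not the notebook's own code: `similar_movies`, `top_n`, and `min_overlap` are hypothetical names. It removes the selected movie by name instead of slicing with `[1:6]`, and it can optionally ignore movies that share too few raters with the selected one, since a correlation based on two or three users is not very reliable.

```python
import numpy as np
import pandas as pd

def similar_movies(user_movie, title, top_n=5, min_overlap=1):
    """Rank other titles by correlation with `title` (hypothetical helper)."""
    if title not in user_movie.columns:
        raise KeyError(f"{title!r} is not a column of the user-movie matrix")
    target = user_movie[title]
    corr = user_movie.corrwith(target)
    # Number of users who rated both the target and each candidate movie.
    overlap = user_movie[target.notna()].notna().sum()
    # Drop the target by name rather than assuming it sorts first.
    keep = (overlap >= min_overlap) & (corr.index != title)
    return corr[keep].sort_values(ascending=False).head(top_n)

# Made-up ratings: Movie D shares only two raters with Movie A.
user_movie_demo = pd.DataFrame(
    {
        "Movie A": [5.0, 4.0, 2.0, 1.0, 3.0],
        "Movie B": [4.0, 5.0, 1.0, 2.0, np.nan],
        "Movie C": [1.0, 2.0, 5.0, 4.0, np.nan],
        "Movie D": [np.nan, np.nan, 2.0, 1.0, np.nan],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="userId"),
)

top_all = similar_movies(user_movie_demo, "Movie A", top_n=2)
top_trusted = similar_movies(user_movie_demo, "Movie A", top_n=2, min_overlap=3)
print(top_all)
print(top_trusted)
```

Without the overlap filter, Movie D ranks first on the strength of just two shared raters; with `min_overlap=3` it is left out.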