# Item-Based Recommendation System Using the 'movie' and 'rating' Datasets

# Business problem

### An online movie viewing platform wants to develop a recommendation system based on collaborative filtering. Having tried content-based recommendation systems, the company now wants recommendations that reflect the opinions of its user community. When users like a movie, the platform should suggest other movies that show a similar pattern of ratings.

# Dataset story

### The dataset is provided by MovieLens. It contains the movies and the ratings given to these movies: more than 20,000,000 ratings for approximately 27,000 movies.

# Variables

### This dataset contains several tables, but only 2 CSV files are used here.

#### movie.csv

* movieId - Unique movie number

* title - movie name

#### rating.csv

* userId - Unique user number

* movieId - Unique movie number

* rating - the rating given to the movie by the user

* timestamp - date and time of the rating

# Importing the libraries

In [1]:
import pandas as pd
pd.set_option('display.max_columns', None)

# Reading and combining the datasets

In [2]:
rating = pd.read_csv('/kaggle/input/movie-ratingcsv/rating.csv')
movie = pd.read_csv('/kaggle/input/movie-ratingcsv/movie.csv')
df_ = movie.merge(rating, how='left', on='movieId')
df = df_.copy()
df.columns = [col.lower() for col in df.columns]
df.head()

Unnamed: 0,movieid,title,genres,userid,rating,timestamp
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,3.0,4.0,1999-12-11 13:36:47
1,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,6.0,5.0,1997-03-13 17:50:52
2,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,8.0,4.0,1996-06-05 13:37:51
3,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,10.0,4.0,1999-11-25 02:44:47
4,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,11.0,4.5,2009-01-02 01:13:41


In [3]:
df.shape

(20000797, 6)
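Because the merge uses `how='left'` with `movie` on the left side, a movie without any rating still produces one row, with missing values in the rating columns. The sketch below illustrates this on a tiny made-up table; the toy data and the use of the `indicator=True` option of `merge` are assumptions added for illustration, not part of the original notebook.

```python
import pandas as pd

# Toy stand-ins for movie.csv and rating.csv (hypothetical data).
movie = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["A (1990)", "B (1991)", "C (1992)"],
})
rating = pd.DataFrame({
    "userId": [10, 11, 10],
    "movieId": [1, 1, 2],
    "rating": [4.0, 5.0, 3.0],
})

# indicator=True adds a '_merge' column telling where each row came from.
merged = movie.merge(rating, how="left", on="movieId", indicator=True)

# Movies that matched no rating are flagged as 'left_only'.
unrated = merged.loc[merged["_merge"] == "left_only", "title"].tolist()

print(merged.shape)
print(unrated)
```

On the real data, the same check would show how many of the rows are placeholder rows for unrated movies rather than actual ratings.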

# Creation of user movie dataframe

## Determination of the unique movie count

In [4]:
df.title.nunique()

27262

## Determination of rating counts of each movie

In [5]:
df.title.value_counts().head()

Pulp Fiction (1994)                 67310
Forrest Gump (1994)                 66172
Shawshank Redemption, The (1994)    63366
Silence of the Lambs, The (1991)    63299
Jurassic Park (1993)                59715
Name: title, dtype: int64

## Removing the movies with 1000 or fewer ratings

In [6]:
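# Count ratings per title. Naming the column 'title' explicitly keeps the
# lookups below working across pandas versions (pandas >= 2.0 would call it 'count').
comments_count = df['title'].value_counts().rename('title').to_frame()
comments_count[comments_count['title'] <= 1000]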

Unnamed: 0,title
"Bear, The (Ours, L') (1988)",999
Rosewood (1997),999
Ted (2012),999
One Night at McCool's (2001),999
Marked for Death (1990),998
...,...
Rapture (Arrebato) (1980),1
"Education of Mohammad Hussein, The (2013)",1
Satanas (2007),1
Psychosis (2010),1


### This gives the movies with 1000 or fewer ratings. Since the movie names are stored in the index, we can retrieve them as follows.

In [7]:
comments_count[comments_count['title'] <= 1000].index

Index(['Bear, The (Ours, L') (1988)', 'Rosewood (1997)', 'Ted (2012)',
       'One Night at McCool's (2001)', 'Marked for Death (1990)',
       'Three to Tango (1999)', 'Adam's Rib (1949)',
       'I Now Pronounce You Chuck and Larry (2007)',
       'Italian for Beginners (Italiensk for begyndere) (2000)',
       'Husbands and Wives (1992)',
       ...
       'Satan's Sword (Daibosatsu tôge) (1960)',
       'Blind Massage (Tui na) (2014)', 'Prêt à tout (2014)',
       'Ditchdigger's Daughters, The (1997)', 'A.K. (1985)',
       'Rapture (Arrebato) (1980)',
       'Education of Mohammad Hussein, The (2013)', 'Satanas (2007)',
       'Psychosis (2010)', 'Innocence (2014)'],
      dtype='object', length=24103)

### We store these rarely rated movies in 'rare_movies' as follows:

In [8]:
rare_movies = comments_count[comments_count['title'] <= 1000].index

### We remove the rare movies from the dataframe and name the new dataframe 'common_movies'

In [9]:
common_movies = df[~df['title'].isin(rare_movies)]

In [10]:
common_movies.shape

(17766015, 6)
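The filtering step can be tried on a small made-up table. This is a minimal sketch: the helper name `drop_rare_titles`, its `threshold` parameter, and the toy data are assumptions for illustration, while the `<=` comparison mirrors the cells above.

```python
import pandas as pd

def drop_rare_titles(ratings: pd.DataFrame, threshold: int) -> pd.DataFrame:
    """Keep only rows whose title has more than `threshold` ratings."""
    counts = ratings["title"].value_counts()
    # Titles with `threshold` or fewer ratings are treated as rare.
    rare = counts[counts <= threshold].index
    return ratings[~ratings["title"].isin(rare)]

# Hypothetical data: A has 3 ratings, B has 2, C has 1.
toy = pd.DataFrame({
    "title": ["A", "A", "A", "B", "B", "C"],
    "rating": [4.0, 5.0, 3.0, 2.0, 4.0, 1.0],
})

kept = drop_rare_titles(toy, threshold=2)
print(kept["title"].unique().tolist())
```

With `threshold=2`, only title A survives, because B and C have two or fewer ratings.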

### Comparing the two shapes, about 2.2 million rows (not movies) were removed from the dataframe. Roughly 17.8 million ratings belong to about 3,000 movies, while only about 2.2 million ratings are spread over about 24,000 movies, so dropping the rarely rated movies loses little data. Next, we check the number of unique movies again.

In [11]:
common_movies.title.nunique()

3159

### So, the number of movies was reduced from 27262 to 3159 by the steps above.
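The effect of the filter can be checked with simple arithmetic on the shapes and counts printed above. The snippet below only re-uses those reported numbers; the variable names are chosen here for illustration.

```python
# Numbers copied from the outputs of df.shape, common_movies.shape
# and the two nunique() calls above.
rows_before, rows_after = 20_000_797, 17_766_015
titles_before, titles_after = 27_262, 3_159

rows_removed = rows_before - rows_after
titles_removed = titles_before - titles_after

print(rows_removed, titles_removed)
print(f"{rows_after / rows_before:.1%} of rows kept")
print(f"{titles_after / titles_before:.1%} of titles kept")
```

The number of removed titles matches the `length=24103` shown for the rare-movie index.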

### Now we can create the user_movie_df using the method 'pivot_table'. In the resulting table, the user ids form the rows, the movie titles form the columns, and each cell holds the rating that the user gave to the movie.

In [12]:
user_movie_df = common_movies.pivot_table(index=['userid'], columns=['title'], values='rating')
user_movie_df.shape

(138493, 3159)

### So, there are 138493 observations (one per userid) and 3159 variables (one per movie title).
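To see what `pivot_table` produces, it can help to run it on a few hand-written rows. The toy data below is hypothetical. Note that `pivot_table` aggregates with the mean by default, so a user who rated the same title twice would get the average of both ratings.

```python
import pandas as pd

# Long format: one row per (user, movie) rating, as in common_movies.
toy = pd.DataFrame({
    "userid": [1, 1, 2, 2, 3],
    "title": ["A", "B", "A", "C", "B"],
    "rating": [4.0, 3.0, 5.0, 2.0, 1.0],
})

# Wide format: users in rows, titles in columns, ratings in cells.
user_movie = toy.pivot_table(index="userid", columns="title", values="rating")
print(user_movie)

# Movies a user did not rate stay missing (NaN), so the matrix is sparse.
filled_share = user_movie.notna().to_numpy().mean()
print(filled_share)
```

Here 5 of the 9 cells are filled. On the real data the filled share is far lower, since most users rate only a small fraction of the 3159 movies.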

# Making Item-based Movie Recommendation

### Each movie is a column of user ratings, so the correlation between two movies is computed just like the correlation between two variables. A high correlation means that users tend to rate the two movies similarly, which we use as the measure of similarity between movies.

In [13]:
movie_name = 'Matrix, The (1999)'
movie_name = user_movie_df[movie_name]

### Let's find the correlation between this selected movie and other movies

In [14]:
user_movie_df.corrwith(movie_name).sort_values(ascending=False).head(10)

title
Matrix, The (1999)                                           1.000000
Matrix Reloaded, The (2003)                                  0.516906
Matrix Revolutions, The (2003)                               0.449588
Animatrix, The (2003)                                        0.367151
Blade (1998)                                                 0.334493
Terminator 2: Judgment Day (1991)                            0.333882
Minority Report (2002)                                       0.332434
Edge of Tomorrow (2014)                                      0.326762
Mission: Impossible (1996)                                   0.320815
Lord of the Rings: The Fellowship of the Ring, The (2001)    0.318726
dtype: float64

### The results are sensible: the two sequels top the list, followed mostly by science-fiction and action titles. The recommendation is backed by the rating behavior of a large community of users, which is the essence of collaborative filtering.
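The correlation lookup can be wrapped in a small helper. This is a sketch under stated assumptions: the function name `similar_movies` and its `top_n` and `min_overlap` parameters are hypothetical and not part of the notebook. The overlap filter is an addition that skips movies sharing too few raters with the target, where a correlation would be unreliable.

```python
import pandas as pd
import numpy as np

def similar_movies(user_movie, title, top_n=5, min_overlap=2):
    """Rank other movies by Pearson correlation with `title`."""
    target = user_movie[title]
    corr = user_movie.corrwith(target)
    # Number of users who rated both the target and each movie.
    overlap = user_movie.loc[target.notna()].notna().sum()
    # Drop weakly supported movies and the target itself.
    corr = corr[overlap >= min_overlap].drop(title)
    return corr.sort_values(ascending=False).head(top_n)

# Hypothetical user-by-movie matrix (rows: users, columns: titles).
toy = pd.DataFrame(
    {
        "A": [5.0, 4.0, 1.0, 2.0, np.nan],
        "B": [5.0, 5.0, 1.0, 1.0, 3.0],
        "C": [1.0, 2.0, 5.0, 4.0, np.nan],
        "D": [np.nan, np.nan, np.nan, 3.0, 4.0],
    },
    index=[1, 2, 3, 4, 5],
)

result = similar_movies(toy, "A")
print(result)
```

In this toy matrix, B is rated much like A, C is rated in the opposite way, and D shares only one rater with A, so it is left out.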

### Let's take another movie

In [15]:
movie_name = "Ocean's Twelve (2004)"
movie_name = user_movie_df[movie_name]

In [16]:
user_movie_df.corrwith(movie_name).sort_values(ascending=False).head(10)

title
Ocean's Twelve (2004)                                 1.000000
Ocean's Thirteen (2007)                               0.681654
Ocean's Eleven (2001)                                 0.551280
Eddie (1996)                                          0.474808
National Treasure: Book of Secrets (2007)             0.474230
Eagle Eye (2008)                                      0.473061
Pirates of the Caribbean: On Stranger Tides (2011)    0.472446
Ocean's Eleven (a.k.a. Ocean's 11) (1960)             0.470412
Analyze That (2002)                                   0.459010
Bad Boys II (2003)                                    0.458827
dtype: float64

### Let's choose a random movie from the data set and make suggestions

In [17]:
movie_name = pd.Series(user_movie_df.columns).sample(1).values[0]
movie_name = user_movie_df[movie_name]
user_movie_df.corrwith(movie_name).sort_values(ascending=False).head(10)

title
Brady Bunch Movie, The (1995)                   1.000000
Very Brady Sequel, A (1996)                     0.779913
Stuck on You (2003)                             0.433953
Adventures of Tintin, The (2011)                0.433164
Spy Kids 2: The Island of Lost Dreams (2002)    0.432114
Flipper (1996)                                  0.417800
Spy Kids (2001)                                 0.414020
Beethoven (1992)                                0.409241
Kindergarten Cop (1990)                         0.407017
Hot Chick, The (2002)                           0.403465
dtype: float64

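### In the run shown above, the randomly selected movie was 'Brady Bunch Movie, The (1995)', the title with a correlation of 1.0. The selection changes on every run.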
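Since `sample(1)` draws a different title on each run, the text and the output can drift apart. One way to make the draw repeatable is to pass `random_state`, a standard parameter of `Series.sample`. The sketch below uses a made-up list of titles, and the seed value 42 is arbitrary.

```python
import pandas as pd

# Hypothetical titles standing in for user_movie_df.columns.
titles = pd.Series(["A (1990)", "B (1991)", "C (1992)", "D (1993)"])

# The same seed yields the same pick on every run.
first = titles.sample(1, random_state=42).values[0]
second = titles.sample(1, random_state=42).values[0]

print(first, first == second)
```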

### Let's list the movies whose titles contain a given keyword

In [18]:
def check_film(dataframe, keyword):
    return [col for col in dataframe.columns if keyword in col]

### For example, let's list the movies containing the word 'Sherlock'

In [19]:
check_film(user_movie_df, 'Sherlock')

['Sherlock Holmes (2009)',
 'Sherlock Holmes: A Game of Shadows (2011)',
 'Young Sherlock Holmes (1985)']

### Now, let's list the movies containing the word 'Insomnia'

In [20]:
check_film(user_movie_df, 'Insomnia')

['Insomnia (1997)', 'Insomnia (2002)']

### Now, let's list the movies containing the word 'Spider'

In [21]:
check_film(user_movie_df, 'Spider')

['Along Came a Spider (2001)',
 'Amazing Spider-Man, The (2012)',
 'Spider (2002)',
 'Spider-Man (2002)',
 'Spider-Man 2 (2004)',
 'Spider-Man 3 (2007)']
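The keyword search above is case-sensitive, so 'spider' would return nothing. A case-insensitive variant could look like the sketch below. The name `check_film_ci` is hypothetical, and the toy dataframe only provides column names for the demonstration.

```python
import pandas as pd

def check_film_ci(dataframe, keyword):
    """Return column names containing `keyword`, ignoring case."""
    cols = dataframe.columns
    # regex=False treats the keyword as plain text, not as a pattern.
    mask = cols.str.contains(keyword, case=False, regex=False)
    return cols[mask].tolist()

# Hypothetical dataframe with movie titles as columns.
toy = pd.DataFrame(columns=[
    "Spider-Man (2002)",
    "Along Came a Spider (2001)",
    "Insomnia (2002)",
])

matches = check_film_ci(toy, "spider")
print(matches)
```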

# Thank you very much for checking my notebook!