# Movie Recommendation System

A movie recommendation system, also called a movie recommender system, uses machine learning (ML) to predict or filter a user's movie interests based on their prior decisions and actions. It is an advanced form of filtering that predicts which items in a specific domain, here movies, a given user is likely to prefer and select.



#### In this project we will:
<ul>
    <li>Build a movie recommendation system</li>
</ul>

<br>

Dataset: <a href="https://www.kaggle.com/datasets/prajitdatta/movielens-100k-dataset">MovieLens 100k Dataset from Kaggle</a>

#### We are going to use two files from the dataset:

<ul>
    <li>u.item: contains the movie ID and title</li>
    <li>u.data: contains user ratings (user ID, item ID, rating, timestamp)</li>
</ul>

In [1]:
# importing libraries
import pandas as pd

## For u.data

In [2]:
# u.data is tab-separated and has no header row, so the column names are supplied explicitly
column_names = ['user_id', 'item_id', 'rating', 'timestamp']
df = pd.read_csv('dataset/u.data', sep="\t", names=column_names)

In [3]:
df.head()

,user_id,item_id,rating,timestamp
0,196,242,3,881250949
1,186,302,3,891717742
2,22,377,1,878887116
3,244,51,2,880606923
4,166,346,1,886397596


In [4]:
len(df)

100000

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 4 columns):
 #   Column     Non-Null Count   Dtype
---  ------     --------------   -----
 0   user_id    100000 non-null  int64
 1   item_id    100000 non-null  int64
 2   rating     100000 non-null  int64
 3   timestamp  100000 non-null  int64
dtypes: int64(4)
memory usage: 3.1 MB


## For u.item

In [8]:
# u.item is pipe-separated; only its first two fields (movie ID and title) are read
m_cols = ['item_id', 'title']
movie_titles = pd.read_csv('dataset/u.item', sep="|", names=m_cols, usecols=range(2), encoding='latin-1')
movie_titles.head()

,item_id,title
0,1,Toy Story (1995)
1,2,GoldenEye (1995)
2,3,Four Rooms (1995)
3,4,Get Shorty (1995)
4,5,Copycat (1995)


In [9]:
movie_titles.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1682 entries, 0 to 1681
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   item_id  1682 non-null   int64 
 1   title    1682 non-null   object
dtypes: int64(1), object(1)
memory usage: 26.4+ KB


In [10]:
len(movie_titles)

1682

In [12]:
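# Attach each rating's movie title by joining on item_id.
# Run this cell only once: merging an already-merged df again duplicates the title column as title_x / title_y, as in the output below.
df = pd.merge(df, movie_titles, on='item_id')
df.tail()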

,user_id,item_id,rating,timestamp,title_x,title_y
99995,840,1674,4,891211682,Mamma Roma (1962),Mamma Roma (1962)
99996,655,1640,3,888474646,"Eighth Day, The (1996)","Eighth Day, The (1996)"
99997,655,1637,3,888984255,Girls Town (1996),Girls Town (1996)
99998,655,1630,3,887428735,"Silence of the Palace, The (Saimt el Qusur) (1...","Silence of the Palace, The (Saimt el Qusur) (1..."
99999,655,1641,3,887427810,Dadetown (1995),Dadetown (1995)
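
The `title_x` and `title_y` columns above are what `pd.merge` produces when both inputs already carry a `title` column, which is what happens if the merge cell is executed a second time. The following is a minimal sketch of that effect on made-up toy frames; the names `ratings_toy` and `titles_toy` and all values are illustrative assumptions, not part of the dataset. It also uses the `validate` argument as one possible guard on the join.

```python
import pandas as pd

# Toy stand-ins for u.data and u.item (values are made up for illustration).
ratings_toy = pd.DataFrame({
    "user_id": [1, 2, 3],
    "item_id": [10, 10, 20],
    "rating": [4, 5, 3],
})
titles_toy = pd.DataFrame({
    "item_id": [10, 20],
    "title": ["Movie A", "Movie B"],
})

# First merge: every rating row picks up exactly one title.
# validate="many_to_one" raises if item_id is duplicated in titles_toy.
merged_once = pd.merge(ratings_toy, titles_toy, on="item_id", validate="many_to_one")

# Merging the already-merged frame again: both sides now have `title`,
# so pandas disambiguates them with its default suffixes _x and _y.
merged_twice = pd.merge(merged_once, titles_toy, on="item_id", validate="many_to_one")

print(list(merged_once.columns))
print(list(merged_twice.columns))
```

With a single merge the column is simply called `title`; the later cells in this notebook use `title_x` because they operate on the twice-merged frame.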


## Movie recommender

In [15]:
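# Build a user x movie matrix: one row per user, one column per title, NaN where the user has not rated the movie
moviepivots = df.pivot_table(index='user_id', columns='title_x', values='rating')
moviepivots.head()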

title_x,'Til There Was You (1997),1-900 (1994),101 Dalmatians (1996),12 Angry Men (1957),187 (1997),2 Days in the Valley (1996),"20,000 Leagues Under the Sea (1954)",2001: A Space Odyssey (1968),3 Ninjas: High Noon At Mega Mountain (1998),"39 Steps, The (1935)",...,Yankee Zulu (1994),Year of the Horse (1997),You So Crazy (1994),Young Frankenstein (1974),Young Guns (1988),Young Guns II (1990),"Young Poisoner's Handbook, The (1995)",Zeus and Roxanne (1997),unknown,Á köldum klaka (Cold Fever) (1994)
user_id,,,,,,,,,,,,,,,,,,,,,
1,,,2.0,5.0,,,3.0,4.0,,,...,,,,5.0,3.0,,,,4.0,
2,,,,,,,,,1.0,,...,,,,,,,,,,
3,,,,,2.0,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
5,,,2.0,,,,,4.0,,,...,,,,4.0,,,,,4.0,


In [16]:
type(moviepivots)

pandas.core.frame.DataFrame

In [18]:
movie_user_rating = moviepivots['101 Dalmatians (1996)']
movie_user_rating.head(10)

user_id
1     2.0
2     NaN
3     NaN
4     NaN
5     2.0
6     NaN
7     NaN
8     NaN
9     NaN
10    NaN
Name: 101 Dalmatians (1996), dtype: float64

In [19]:
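# Correlate every movie's rating column with the target movie's ratings, using only users who rated both
similar_to_movie = moviepivots.corrwith(movie_user_rating)
similar_to_movie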


title_x
'Til There Was You (1997)               -1.000000
1-900 (1994)                                  NaN
101 Dalmatians (1996)                    1.000000
12 Angry Men (1957)                     -0.049890
187 (1997)                               0.269191
                                           ...   
Young Guns II (1990)                     0.680414
Young Poisoner's Handbook, The (1995)    0.000000
Zeus and Roxanne (1997)                  0.707107
unknown                                       NaN
Á köldum klaka (Cold Fever) (1994)            NaN
Length: 1664, dtype: float64
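
`corrwith` compares each column with the target using only the users who rated both movies, so a correlation can rest on very few data points. The sketch below illustrates this on a made-up toy matrix (the frame `pivot_toy`, its column names, and the threshold of 3 are assumptions for illustration only): a movie sharing just two raters with the target still receives a perfect score.

```python
import numpy as np
import pandas as pd

# Toy user x movie rating matrix (made-up values); NaN means "not rated".
pivot_toy = pd.DataFrame(
    {
        "Target": [5.0, 4.0, 2.0, 1.0, np.nan],
        "Well rated": [4.0, 5.0, 1.0, 2.0, 3.0],          # 4 raters shared with Target
        "Barely rated": [np.nan, np.nan, 3.0, 1.0, 5.0],  # only 2 raters shared with Target
    },
    index=pd.Index([1, 2, 3, 4, 5], name="user_id"),
)

target = pivot_toy["Target"]

# Pearson correlation of every column with the target column.
corr = pivot_toy.corrwith(target)

# Number of users who rated both the target and each movie.
overlap = pivot_toy.loc[target.notna()].count()

# One possible safeguard: keep only correlations backed by at least 3 shared raters.
reliable = corr[overlap >= 3]

print(pd.DataFrame({"Correlation": corr, "shared_raters": overlap}))
print(reliable)
```

Two shared ratings always lie on a straight line, which is why sparsely rated movies tend to show correlations of exactly 1.0 or -1.0.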

In [20]:
corrmovies = pd.DataFrame(similar_to_movie, columns=['Correlation'])
# Drop movies with no computable correlation (too few users in common with the target)
corrmovies.dropna(inplace=True)
corrmovies.head()

title_x,Correlation
'Til There Was You (1997),-1.0
101 Dalmatians (1996),1.0
12 Angry Men (1957),-0.04989
187 (1997),0.269191
2 Days in the Valley (1996),0.048973


In [21]:
corrmovies.sort_values('Correlation',ascending=False).head(10)

title_x,Correlation
Hard Rain (1998),1.0
"Winter Guest, The (1997)",1.0
Fatal Instinct (1993),1.0
Faithful (1996),1.0
Trial by Jury (1994),1.0
April Fool's Day (1986),1.0
House Party 3 (1994),1.0
Grateful Dead (1995),1.0
"Tie That Binds, The (1995)",1.0
Frisk (1995),1.0


In [25]:
# Drop the columns that are no longer needed
df = df.drop(columns=['timestamp', 'title_y'])


In [26]:
df.head(10)

,user_id,item_id,rating,title_x
0,196,242,3,Kolya (1996)
1,63,242,3,Kolya (1996)
2,226,242,5,Kolya (1996)
3,154,242,3,Kolya (1996)
4,306,242,5,Kolya (1996)
5,296,242,4,Kolya (1996)
6,34,242,5,Kolya (1996)
7,271,242,4,Kolya (1996)
8,201,242,4,Kolya (1996)
9,209,242,4,Kolya (1996)


In [28]:
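# Average rating per movie
ratings = pd.DataFrame(df.groupby('title_x')['rating'].mean())
# sort_values returns a new DataFrame; the result is not assigned here, so `ratings` keeps its alphabetical order
ratings.sort_values('rating', ascending=False)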

ratings.head(10)


title_x,rating
'Til There Was You (1997),2.333333
1-900 (1994),2.6
101 Dalmatians (1996),2.908257
12 Angry Men (1957),4.344
187 (1997),3.02439
2 Days in the Valley (1996),3.225806
"20,000 Leagues Under the Sea (1954)",3.5
2001: A Space Odyssey (1968),3.969112
3 Ninjas: High Noon At Mega Mountain (1998),1.0
"39 Steps, The (1935)",4.050847


In [30]:
# Number of ratings per movie; the grouped Series aligns with `ratings` on the title index
ratings['rating_count'] = df.groupby('title_x')['rating'].count()
ratings.head()

title_x,rating,rating_count
'Til There Was You (1997),2.333333,9
1-900 (1994),2.6,5
101 Dalmatians (1996),2.908257,109
12 Angry Men (1957),4.344,125
187 (1997),3.02439,41
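
The mean rating and the rating count can also be computed in a single pass. As a hedged alternative to the two cells above, here is a small sketch using pandas named aggregation on a made-up toy frame (`df_toy` and its values are illustrative assumptions):

```python
import pandas as pd

# Toy long-format ratings (made-up values), mirroring the title_x / rating columns.
df_toy = pd.DataFrame({
    "title_x": ["Movie A", "Movie A", "Movie B", "Movie A", "Movie B"],
    "rating": [4, 5, 2, 3, 3],
})

# Named aggregation: each keyword becomes an output column built from one groupby.
ratings_toy = df_toy.groupby("title_x")["rating"].agg(
    rating="mean",
    rating_count="count",
)

print(ratings_toy)
```

This yields the same two columns, `rating` and `rating_count`, indexed by title.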


In [31]:
ratings.sort_values('rating_count',ascending=False).head()

title_x,rating,rating_count
Star Wars (1977),4.358491,583
Contact (1997),3.803536,509
Fargo (1996),4.155512,508
Return of the Jedi (1983),4.00789,507
Liar Liar (1997),3.156701,485


In [32]:
corrmovies.sort_values('Correlation',ascending=False).head()

title_x,Correlation
Hard Rain (1998),1.0
"Winter Guest, The (1997)",1.0
Fatal Instinct (1993),1.0
Faithful (1996),1.0
Trial by Jury (1994),1.0


In [33]:
corrmovies = corrmovies.join(ratings['rating_count'])
corrmovies.head(10)


title_x,Correlation,rating_count
'Til There Was You (1997),-1.0,9
101 Dalmatians (1996),1.0,109
12 Angry Men (1957),-0.04989,125
187 (1997),0.269191,41
2 Days in the Valley (1996),0.048973,93
"20,000 Leagues Under the Sea (1954)",0.266928,72
2001: A Space Odyssey (1968),-0.043407,259
"39 Steps, The (1935)",0.111111,59
8 1/2 (1963),0.522233,38
8 Seconds (1994),1.0,4


## The Result:

In [35]:
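# Keep only movies with more than 150 ratings, so that correlations based on a handful of shared raters
# (such as the 1.0 scores seen earlier) are filtered out, then list the 10 most correlated titles
corrmovies[corrmovies['rating_count'] > 150].sort_values('Correlation', ascending=False).head(10)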

title_x,Correlation,rating_count
Murder at 1600 (1997),0.663965,218
Gone with the Wind (1939),0.512581,171
"Piano, The (1993)",0.498792,168
Top Gun (1986),0.492492,220
"Hunt for Red October, The (1990)",0.483807,227
Liar Liar (1997),0.469765,485
George of the Jungle (1997),0.458967,162
Good Will Hunting (1997),0.430095,198
"Lion King, The (1994)",0.426573,220
Dragonheart (1996),0.413787,158
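
The steps above (pivot, correlate, count, filter, sort) can be gathered into one reusable function. The following is only a sketch under assumptions: the function name `recommend_similar`, its parameters, and the toy data are hypothetical, and unlike the cells above it explicitly drops the target movie from its own result.

```python
import pandas as pd


def recommend_similar(ratings_df, title, min_ratings=150, top_n=10,
                      user_col="user_id", title_col="title_x", rating_col="rating"):
    """Rank movies by rating correlation with `title`, keeping only well-rated ones."""
    # User x movie matrix, NaN where a user has not rated a movie.
    pivot = ratings_df.pivot_table(index=user_col, columns=title_col, values=rating_col)
    # Correlation of every movie with the target movie.
    corr = pivot.corrwith(pivot[title]).rename("Correlation")
    # Number of ratings each movie received.
    counts = ratings_df.groupby(title_col)[rating_col].count().rename("rating_count")
    result = pd.concat([corr, counts], axis=1).dropna(subset=["Correlation"])
    # Discard sparsely rated movies and the target itself (a choice made in this sketch).
    result = result[result["rating_count"] > min_ratings]
    result = result.drop(index=title, errors="ignore")
    return result.sort_values("Correlation", ascending=False).head(top_n)


# Made-up toy ratings as (user_id, title_x, rating) rows.
toy = pd.DataFrame(
    [
        (1, "Target", 5), (2, "Target", 4), (3, "Target", 2), (4, "Target", 1),
        (1, "Close", 4), (2, "Close", 5), (3, "Close", 1), (4, "Close", 2), (5, "Close", 3),
        (1, "Opposite", 1), (2, "Opposite", 2), (3, "Opposite", 4), (4, "Opposite", 5),
        (3, "Rare", 3), (4, "Rare", 1),
    ],
    columns=["user_id", "title_x", "rating"],
)

top = recommend_similar(toy, "Target", min_ratings=3, top_n=2)
print(top)
```

On the notebook's data, a call such as `recommend_similar(df, '101 Dalmatians (1996)')` would follow the same steps as the cells above, with the 150-rating threshold as the default.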
