# Making Movie Recommendations Based on Popularity

These datasets are hosted on: https://archive.ics.uci.edu/ml/datasets/Restaurant+%26+consumer+data

They were originally published by: Blanca Vargas-Govea, Juan Gabriel González-Serna, Rafael Ponce-Medellín. Effects of relevant contextual features in the performance of a restaurant recommender system. In RecSys11: Workshop on Context Aware Recommender Systems (CARS-2011), Chicago, IL, USA, October 23, 2011.

## Movies data

In [None]:
import numpy as np
import pandas as pd

In [None]:
# rating.csv
url = 'https://drive.google.com/file/d/1wkuzj5jw4LL0UKBb5PtXUC5lxXLyNlma/view?usp=sharing'
path = 'https://drive.google.com/uc?export=download&id=' + url.split('/')[-2]
frame = pd.read_csv(path)  # movie ratings

# movies.csv
url = 'https://drive.google.com/file/d/1Orz5l34W7ZydmmdH7IlNUEBM_ylvKSDd/view?usp=sharing'
path = 'https://drive.google.com/uc?export=download&id=' + url.split('/')[-2]
movies = pd.read_csv(path)  # movie titles and genres

# links.csv
url = 'https://drive.google.com/file/d/1yZfbSchTk3Ong5vaDrxk2-8dNgEvKcAN/view?usp=sharing'
path = 'https://drive.google.com/uc?export=download&id=' + url.split('/')[-2]
links = pd.read_csv(path)  # IMDb and TMDb identifiers

# tags.csv
# The url assignment must be active here; otherwise `path` still points at links.csv.
url = 'https://drive.google.com/file/d/1vv_OO0D2zN0vDwBusUE3El7rbPrGDyG-/view?usp=sharing'
path = 'https://drive.google.com/uc?export=download&id=' + url.split('/')[-2]
tags = pd.read_csv(path, encoding='CP1252')  # change encoding to 'mbcs' in Windows
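
The loading cell repeats the same pattern for every file: take the share link, pull out the file ID, and build a download URL. A minimal sketch of a helper that wraps this step is shown here. The name `drive_download_url` is hypothetical (it is not part of pandas or any Google library), and it assumes the share link has the form `https://drive.google.com/file/d/<id>/view?...`.

```python
def drive_download_url(share_url: str) -> str:
    """Turn a Google Drive share link into a direct-download URL.

    Assumes the link looks like https://drive.google.com/file/d/<id>/view?...,
    so the file ID is the second-to-last segment when splitting on '/'.
    """
    file_id = share_url.split('/')[-2]
    return 'https://drive.google.com/uc?export=download&id=' + file_id


# Example with the rating.csv share link from the loading cell
example = drive_download_url(
    'https://drive.google.com/file/d/1wkuzj5jw4LL0UKBb5PtXUC5lxXLyNlma/view?usp=sharing'
)
print(example)
```

With such a helper, each file could be loaded in one line, for example `pd.read_csv(drive_download_url(url))`.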

In [None]:
frame.head(3)

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224


The `movies` dataset holds information about each movie. We will use the `title` column.

In [None]:
movies.head(2)

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy


In [None]:
links.head(3)

Unnamed: 0,movieId,imdbId,tmdbId
0,1,114709,862.0
1,2,113497,8844.0
2,3,113228,15602.0


## Popularity/Quality based recommender system

Let's group the ratings by movie and look at each movie's average rating. This is an **explicit** rating given by users.

In [None]:
rating = pd.DataFrame(frame.groupby('movieId')['rating'].mean())
rating.sort_values("rating", ascending=False).head()

Unnamed: 0_level_0,rating
movieId,Unnamed: 1_level_1
88448,5.0
100556,5.0
143031,5.0
143511,5.0
143559,5.0


The top-rated movies have a perfect score of 5/5. But how many ratings do these movies have?

In [None]:
frame.query("movieId==88448")

Unnamed: 0,userId,movieId,rating,timestamp
77875,483,88448,5.0,1315437602


Only one person rated this movie.

We can also count how many times each movie has been rated. This count acts as an **implicit** rating: it reflects how much attention a movie received, not how much users liked it.

In [None]:
rating['rating_count'] = frame.groupby('movieId')['rating'].count()
rating.sort_values("rating_count", ascending=False).head()

Unnamed: 0_level_0,rating,rating_count
movieId,Unnamed: 1_level_1,Unnamed: 2_level_1
356,4.164134,329
318,4.429022,317
296,4.197068,307
593,4.16129,279
2571,4.192446,278
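
The average and the count can also be computed in a single pass with pandas named aggregation. The sketch below uses a small toy table with illustrative values, not the real data; `toy` and `summary` are names introduced only for this example.

```python
import pandas as pd

# Toy ratings table with the same columns as `frame` (illustrative values)
toy = pd.DataFrame({
    'userId':  [1, 2, 3, 1, 2, 3],
    'movieId': [10, 10, 10, 20, 20, 30],
    'rating':  [4.0, 5.0, 3.0, 2.0, 3.0, 5.0],
})

# Named aggregation: mean (explicit signal) and count (implicit signal) together
summary = toy.groupby('movieId')['rating'].agg(rating='mean', rating_count='count')
print(summary.sort_values('rating_count', ascending=False))
```

The result has the same shape as the `rating` table built above: one row per `movieId`, with `rating` and `rating_count` columns.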


Some movies have been rated around 300 times. They are more popular than the top-rated movies, but received lower explicit ratings.
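
One simple way to reconcile the two views is to rank by average rating only among movies that have a minimum number of ratings. This is a sketch under stated assumptions, not part of the original analysis: `MIN_RATINGS` is a hypothetical threshold, and `toy_rating` is a small illustrative table shaped like `rating`.

```python
import pandas as pd

# Toy per-movie summary shaped like `rating` (illustrative values)
toy_rating = pd.DataFrame(
    {'rating': [5.0, 4.4, 4.2, 3.1], 'rating_count': [1, 317, 329, 49]},
    index=pd.Index([88448, 318, 356, 5], name='movieId'),
)

MIN_RATINGS = 50  # hypothetical threshold; tune it to the dataset

# Keep only movies with enough ratings, then rank by average rating
reliable = toy_rating[toy_rating['rating_count'] >= MIN_RATINGS]
ranked = reliable.sort_values('rating', ascending=False)
print(ranked)
```

The movie with a single 5.0 rating drops out, so the ranking is no longer led by averages based on one opinion.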

Let's locate the most popular movie and get some information about it:

In [None]:
rating.sort_values('rating_count', ascending=False).head()

Unnamed: 0_level_0,rating,rating_count
movieId,Unnamed: 1_level_1,Unnamed: 2_level_1
356,4.164134,329
318,4.429022,317
296,4.197068,307
593,4.16129,279
2571,4.192446,278


In [None]:
# movieId of the most popular movie
top_popular_placeID = rating.sort_values('rating_count', ascending=False).head(1).index[0]
top_popular_placeID  # 356, the most popular movie

# title of the most popular movie
movies[movies['movieId'] == top_popular_placeID]  # the most popular is Forrest Gump (1994)

Unnamed: 0,movieId,title,genres
314,356,Forrest Gump (1994),Comedy|Drama|Romance|War


The most popular movie is "Forrest Gump (1994)", a comedy/drama released in 1994. It has an average rating of 4.16 and received 329 ratings.



Below is a hybrid system to sort movies, so that you can recommend the "best" movies: movies that are both highly rated and popular.

In [None]:
rating.head()

Unnamed: 0_level_0,rating,rating_count
movieId,Unnamed: 1_level_1,Unnamed: 2_level_1
1,3.92093,215
2,3.431818,110
3,3.259615,52
4,2.357143,7
5,3.071429,49


In [None]:
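# NOTE: rating['rating'].mean() is a single number (the mean of the per-movie averages),
# so every row gets the same rounded value and the sort below cannot reorder the movies.
rating['overall_rating'] = round(rating['rating'].mean())
rating.sort_values(by='overall_rating', ascending=False).head()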

In [None]:
overall_df = rating.merge(movies, on='movieId')
overall_df.sort_values(by='rating_count', ascending=False).head(10)

Unnamed: 0,movieId,rating,rating_count,overall_rating,title,genres
314,356,4.164134,329,3,Forrest Gump (1994),Comedy|Drama|Romance|War
277,318,4.429022,317,3,"Shawshank Redemption, The (1994)",Crime|Drama
257,296,4.197068,307,3,Pulp Fiction (1994),Comedy|Crime|Drama|Thriller
510,593,4.16129,279,3,"Silence of the Lambs, The (1991)",Crime|Horror|Thriller
1938,2571,4.192446,278,3,"Matrix, The (1999)",Action|Sci-Fi|Thriller
224,260,4.231076,251,3,Star Wars: Episode IV - A New Hope (1977),Action|Adventure|Sci-Fi
418,480,3.75,238,3,Jurassic Park (1993),Action|Adventure|Sci-Fi|Thriller
97,110,4.031646,237,3,Braveheart (1995),Action|Drama|War
507,589,3.970982,224,3,Terminator 2: Judgment Day (1991),Action|Sci-Fi
461,527,4.225,220,3,Schindler's List (1993),Drama|War
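
In the table above, `overall_rating` is 3 for every movie, so it cannot separate good movies from bad ones. A score that does combine quality and popularity is a weighted (Bayesian-style) average, which pulls the average of rarely rated movies toward the global mean:

$$\text{score} = \frac{v}{v+m}\,R + \frac{m}{v+m}\,C$$

Here $R$ is the movie's average rating, $v$ its rating count, $C$ the mean over all individual ratings, and $m$ a prior weight. The sketch below is one possible choice, not necessarily what the notebook intended; `m = 50` is a hypothetical value and `toy_rating` holds illustrative data.

```python
import pandas as pd

# Toy per-movie summary shaped like `rating` (illustrative values)
toy_rating = pd.DataFrame(
    {'rating': [5.0, 4.0, 3.0], 'rating_count': [1, 300, 100]},
    index=pd.Index([101, 102, 103], name='movieId'),
)

m = 50  # hypothetical prior weight: how many ratings before we trust the average

R = toy_rating['rating']
v = toy_rating['rating_count']

# C: mean across all individual ratings (count-weighted mean of the averages)
C = (R * v).sum() / v.sum()

# Weighted rating: close to R when v is large, close to C when v is small
toy_rating['weighted_rating'] = (v * R + m * C) / (v + m)
print(toy_rating.sort_values('weighted_rating', ascending=False))
```

With these toy values, the movie with a single 5.0 rating falls below the movie with 300 ratings averaging 4.0, which is the behaviour a hybrid ranking is meant to produce.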
