# Making Recommendations Based on Popularity

These datasets are hosted on: https://archive.ics.uci.edu/ml/datasets/Restaurant+%26+consumer+data

They were originally published by: Blanca Vargas-Govea, Juan Gabriel González-Serna, Rafael Ponce-Medellín. Effects of relevant contextual features in the performance of a restaurant recommender system. In RecSys11: Workshop on Context Aware Recommender Systems (CARS-2011), Chicago, IL, USA, October 23, 2011.

## Restaurants data

In [1]:
```python
import numpy as np
import pandas as pd
```

In [2]:
```python
# rating_final.csv
url = 'https://drive.google.com/file/d/1ptu4AlEXO4qQ8GytxKHoeuS1y4l_zWkC/view?usp=sharing'
path = 'https://drive.google.com/uc?export=download&id=' + url.split('/')[-2]
frame = pd.read_csv(path)

# chefmozcuisine.csv
url = 'https://drive.google.com/file/d/1S0_EGSRERIkSKW4D8xHPGZMqvlhuUzp1/view?usp=sharing'
path = 'https://drive.google.com/uc?export=download&id=' + url.split('/')[-2]
cuisine = pd.read_csv(path)

# geoplaces2.csv
url = 'https://drive.google.com/file/d/1ee3ib7LqGsMUksY68SD9yBItRvTFELxo/view?usp=sharing'
path = 'https://drive.google.com/uc?export=download&id=' + url.split('/')[-2]
geodata = pd.read_csv(path, encoding='CP1252')  # change encoding to 'mbcs' on Windows if needed
```
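Each file above is loaded with the same three lines, which turn a Google Drive share link into a direct-download URL by pulling the file ID out of the link. Below is a minimal sketch of a helper that does the same thing. It assumes the links keep the `/file/d/<id>/view` shape used here, and `drive_download_url` is a hypothetical name, not part of pandas or any Google API:

```python
from urllib.parse import urlparse


def drive_download_url(share_url: str) -> str:
    """Build the 'uc?export=download' URL used above from a Drive share link.

    Assumes the link looks like https://drive.google.com/file/d/<id>/view?...
    """
    # urlparse separates the '?usp=sharing' query, leaving '/file/d/<id>/view'
    segments = urlparse(share_url).path.strip('/').split('/')
    # The file ID is the path segment that follows 'd'
    file_id = segments[segments.index('d') + 1]
    return 'https://drive.google.com/uc?export=download&id=' + file_id


url = 'https://drive.google.com/file/d/1ptu4AlEXO4qQ8GytxKHoeuS1y4l_zWkC/view?usp=sharing'
print(drive_download_url(url))
```

With such a helper, each load becomes a single line, e.g. `frame = pd.read_csv(drive_download_url(url))`.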

The `frame` dataset holds the ratings that users have given to places. Ratings go from 0 to 2. Besides the overall `rating`, there are separate `food_rating` and `service_rating` columns.

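In [3]:
```python
frame.head(50)
```

|  | userID | placeID | rating | food_rating | service_rating |
|---|---|---|---|---|---|
| 0 | U1077 | 135085 | 2 | 2 | 2 |
| 1 | U1077 | 135038 | 2 | 2 | 1 |
| 2 | U1077 | 132825 | 2 | 2 | 2 |
| 3 | U1077 | 135060 | 1 | 2 | 2 |
| 4 | U1068 | 135104 | 1 | 1 | 2 |
| 5 | U1068 | 132740 | 0 | 0 | 0 |
| 6 | U1068 | 132663 | 1 | 1 | 1 |
| 7 | U1068 | 132732 | 0 | 0 | 0 |
| 8 | U1068 | 132630 | 1 | 1 | 1 |
| 9 | U1067 | 132584 | 2 | 2 | 2 |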


The `geodata` dataset holds information about each place. We will only use the `name` column, together with `placeID` to match places to their ratings.

In [4]:
```python
geodata
```

|  | placeID | latitude | longitude | the_geom_meter | name | address | city | state | country | fax | ... | alcohol | smoking_area | dress_code | accessibility | price | url | Rambience | franchise | area | other_services |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 134999 | 18.915421 | -99.184871 | 0101000020957F000088568DE356715AC138C0A525FC46... | Kiku Cuernavaca | Revolucion | Cuernavaca | Morelos | Mexico | ? | ... | No_Alcohol_Served | none | informal | no_accessibility | medium | kikucuernavaca.com.mx | familiar | f | closed | none |
| 1 | 132825 | 22.147392 | -100.983092 | 0101000020957F00001AD016568C4858C1243261274BA5... | puesto de tacos | esquina santos degollado y leon guzman | s.l.p. | s.l.p. | mexico | ? | ... | No_Alcohol_Served | none | informal | completely | low | ? | familiar | f | open | none |
| 2 | 135106 | 22.149709 | -100.976093 | 0101000020957F0000649D6F21634858C119AE9BF528A3... | El Rincón de San Francisco | Universidad 169 | San Luis Potosi | San Luis Potosi | Mexico | ? | ... | Wine-Beer | only at bar | informal | partially | medium | ? | familiar | f | open | none |
| 3 | 132667 | 23.752697 | -99.163359 | 0101000020957F00005D67BCDDED8157C1222A2DC8D84D... | little pizza Emilio Portes Gil | calle emilio portes gil | victoria | tamaulipas | ? | ? | ... | No_Alcohol_Served | none | informal | completely | low | ? | familiar | t | closed | none |
| 4 | 132613 | 23.752903 | -99.165076 | 0101000020957F00008EBA2D06DC8157C194E03B7B504E... | carnitas_mata | lic. Emilio portes gil | victoria | Tamaulipas | Mexico | ? | ... | No_Alcohol_Served | permitted | informal | completely | medium | ? | familiar | t | closed | none |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 125 | 132866 | 22.141220 | -100.931311 | 0101000020957F000013871838EC4A58C1B5DF74F8E396... | Chaires | Ricardo B. Anaya | San Luis Potosi | San Luis Potosi | Mexico | ? | ... | No_Alcohol_Served | not permitted | informal | completely | medium | ? | familiar | f | closed | none |
| 126 | 135072 | 22.149192 | -101.002936 | 0101000020957F0000E7B79B1DB94758C1D29BC363D8AA... | Sushi Itto | Venustiano Carranza 1809 C Polanco | San Luis Potosi | SLP | Mexico | ? | ... | No_Alcohol_Served | none | informal | no_accessibility | medium | sushi-itto.com.mx | familiar | f | closed | none |
| 127 | 135109 | 18.921785 | -99.235350 | 0101000020957F0000A6BF695F136F5AC1DADF87B20556... | Paniroles | ? | ? | ? | ? | ? | ... | Wine-Beer | not permitted | informal | no_accessibility | medium | ? | quiet | f | closed | Internet |
| 128 | 135019 | 18.875011 | -99.159422 | 0101000020957F0000B49B2E5C6E785AC12F9D58435241... | Restaurant Bar Coty y Pablo | Paseo de Las Fuentes 24 Pedregal de Las Fuentes | Jiutepec | Morelos | Mexico | ? | ... | No_Alcohol_Served | none | informal | completely | low | ? | familiar | f | closed | none |


In [5]:
```python
places = geodata[['placeID', 'name']]
places.head()
```

|  | placeID | name |
|---|---|---|
| 0 | 134999 | Kiku Cuernavaca |
| 1 | 132825 | puesto de tacos |
| 2 | 135106 | El Rincón de San Francisco |
| 3 | 132667 | little pizza Emilio Portes Gil |
| 4 | 132613 | carnitas_mata |
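Because `places` pairs each `placeID` with a `name`, turning it into a Series indexed by `placeID` makes a name lookup a single label access, and `Series.map` can then translate a whole column of IDs at once. A small sketch using the five rows shown above; it assumes each `placeID` appears only once in `geodata` (otherwise a lookup would return several rows):

```python
import pandas as pd

# The five rows of places.head() shown above
places = pd.DataFrame({
    'placeID': [134999, 132825, 135106, 132667, 132613],
    'name': ['Kiku Cuernavaca', 'puesto de tacos', 'El Rincón de San Francisco',
             'little pizza Emilio Portes Gil', 'carnitas_mata'],
})

# placeID -> name lookup table (assumes placeID values are unique)
place_names = places.set_index('placeID')['name']

print(place_names[132825])                                    # puesto de tacos
print(pd.Series([132613, 134999]).map(place_names).tolist())  # ['carnitas_mata', 'Kiku Cuernavaca']
```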


The `cuisine` dataset lists the type of cuisine (`Rcuisine`) that each restaurant offers.

In [6]:
```python
cuisine
```

|  | placeID | Rcuisine |
|---|---|---|
| 0 | 135110 | Spanish |
| 1 | 135109 | Italian |
| 2 | 135107 | Latin_American |
| 3 | 135106 | Mexican |
| 4 | 135105 | Fast_Food |
| ... | ... | ... |
| 911 | 132005 | Seafood |
| 912 | 132004 | Seafood |
| 913 | 132003 | International |
| 914 | 132002 | Seafood |


## Popularity/Quality based recommender system

Let's group the ratings by place and look at each place's average rating. This is an **explicit** rating given by users.

In [7]:
```python
frame.groupby('placeID')['rating'].mean()
```

```
placeID
132560    0.500000
132561    0.750000
132564    1.250000
132572    1.000000
132583    1.000000
            ...   
135088    1.000000
135104    0.857143
135106    1.200000
135108    1.181818
135109    1.000000
Name: rating, Length: 130, dtype: float64
```

In [8]:
```python
rating = pd.DataFrame(frame.groupby('placeID')['rating'].mean())
rating.sort_values("rating", ascending=False).head()
```

| placeID | rating |
|---|---|
| 132955 | 2.0 |
| 135034 | 2.0 |
| 134986 | 2.0 |
| 132922 | 1.833333 |
| 132755 | 1.8 |


The top-rated places have a perfect score of 2 out of 2. But how many ratings have these places received?

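In [10]:
```python
frame.query("placeID==132955")
```

|  | userID | placeID | rating | food_rating | service_rating |
|---|---|---|---|---|---|
| 934 | U1004 | 132955 | 2 | 2 | 2 |
| 960 | U1061 | 132955 | 2 | 2 | 2 |
| 996 | U1059 | 132955 | 2 | 1 | 2 |
| 1014 | U1097 | 132955 | 2 | 2 | 1 |
| 1080 | U1096 | 132955 | 2 | 2 | 2 |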


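In [11]:
```python
frame.query("placeID==135034")
```

|  | userID | placeID | rating | food_rating | service_rating |
|---|---|---|---|---|---|
| 61 | U1083 | 135034 | 2 | 2 | 2 |
| 582 | U1095 | 135034 | 2 | 2 | 2 |
| 721 | U1048 | 135034 | 2 | 2 | 1 |
| 912 | U1061 | 135034 | 2 | 2 | 2 |
| 1017 | U1097 | 135034 | 2 | 2 | 1 |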


Looks like only 5 people rated each of these places. Maybe they're just the owner's friends! Or maybe they really are top-quality places, but too niche to recommend to the masses.

We can also look at how many times each restaurant has been rated. The number of ratings is an **implicit** rating: it reflects popularity rather than a score given by users.

In [12]:
```python
frame.groupby('placeID')['rating'].count()
```

```
placeID
132560     4
132561     4
132564     4
132572    15
132583     4
          ..
135088     6
135104     7
135106    10
135108    11
135109     4
Name: rating, Length: 130, dtype: int64
```
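The average and the count come from the same grouping, so both can also be computed in one pass with named aggregation (keyword arguments to `SeriesGroupBy.agg`). A sketch on a handful of rows copied from the outputs above (the ten rows of `frame.head()` plus the five ratings of place 132955); the output column names `rating` and `rating_count` match the ones used in this notebook:

```python
import pandas as pd

# Rows copied from frame.head() and frame.query("placeID==132955") above
frame = pd.DataFrame({
    'userID': ['U1077', 'U1077', 'U1077', 'U1077', 'U1068', 'U1068', 'U1068',
               'U1068', 'U1068', 'U1067',
               'U1004', 'U1061', 'U1059', 'U1097', 'U1096'],
    'placeID': [135085, 135038, 132825, 135060, 135104, 132740, 132663,
                132732, 132630, 132584] + [132955] * 5,
    'rating': [2, 2, 2, 1, 1, 0, 1, 0, 1, 2] + [2] * 5,
})

# Mean (explicit rating) and count (implicit rating) per place, in one groupby
summary = (
    frame.groupby('placeID')['rating']
         .agg(rating='mean', rating_count='count')
         .sort_values('rating_count', ascending=False)
)
print(summary.head())
```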

In [14]:
```python
rating['rating_count'] = frame.groupby('placeID')['rating'].count()
rating.sort_values("rating_count", ascending=False).head()
```

| placeID | rating | rating_count |
|---|---|---|
| 135085 | 1.333333 | 36 |
| 132825 | 1.28125 | 32 |
| 135032 | 1.178571 | 28 |
| 135052 | 1.28 | 25 |
| 132834 | 1.0 | 25 |


The most-rated places have received between 25 and 36 ratings. They are more popular than the top-rated places, but their average (explicit) ratings are lower.
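One simple way to stop niche places with a handful of perfect scores from topping the list is to require a minimum number of ratings before ranking by average. A sketch using per-place values copied from the outputs above; the cut-off of 10 ratings is an arbitrary illustration, not a tuned value:

```python
import pandas as pd

# (average rating, number of ratings) per place, copied from the outputs above
rating = pd.DataFrame(
    {'rating': [2.0, 2.0, 1.333333, 1.28125, 1.178571, 1.28, 1.0, 1.2, 0.5],
     'rating_count': [5, 5, 36, 32, 28, 25, 25, 10, 4]},
    index=pd.Index([132955, 135034, 135085, 132825, 135032, 135052,
                    132834, 135106, 132560], name='placeID'),
)

MIN_COUNT = 10  # illustrative threshold

# Drop places with too few ratings, then rank the rest by average rating
trusted = rating[rating['rating_count'] >= MIN_COUNT]
ranked = trusted.sort_values('rating', ascending=False)
print(ranked.head(3))
```

The threshold trades coverage for reliability: a higher cut-off gives steadier averages but hides newer or smaller places entirely.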

Let's locate the most popular place and get some information about it:

In [15]:
```python
rating.sort_values('rating_count', ascending=False).head()
```

| placeID | rating | rating_count |
|---|---|---|
| 135085 | 1.333333 | 36 |
| 132825 | 1.28125 | 32 |
| 135032 | 1.178571 | 28 |
| 135052 | 1.28 | 25 |
| 132834 | 1.0 | 25 |


In [16]:
```python
rating.sort_values('rating_count', ascending=False).head(1)
```

| placeID | rating | rating_count |
|---|---|---|
| 135085 | 1.333333 | 36 |


In [17]:
```python
rating.sort_values('rating_count', ascending=False).head(1).index[0]
```

```
135085
```
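Sorting the whole table just to read its first label works, but `Series.idxmax` returns the label of the maximum directly, and `DataFrame.nlargest` returns the top *n* rows by a column. A sketch on the five most-rated places shown above:

```python
import pandas as pd

# The five most-rated places from the outputs above
rating = pd.DataFrame(
    {'rating': [1.333333, 1.28125, 1.178571, 1.28, 1.0],
     'rating_count': [36, 32, 28, 25, 25]},
    index=pd.Index([135085, 132825, 135032, 135052, 132834], name='placeID'),
)

# Label of the row with the largest rating_count (the first one, on ties)
top_popular_placeID = rating['rating_count'].idxmax()
print(top_popular_placeID)  # 135085

# Top 3 rows by rating_count
print(rating.nlargest(3, 'rating_count'))
```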

In [18]:
```python
# placeID of the most popular place
top_popular_placeID = rating.sort_values('rating_count', ascending=False).head(1).index[0]

# name of the most popular place
places[places['placeID'] == top_popular_placeID]
```

|  | placeID | name |
|---|---|---|
| 121 | 135085 | Tortas Locas Hipocampo |


In [19]:
```python
# cuisine of the most popular place
cuisine[cuisine['placeID'] == top_popular_placeID]
```

|  | placeID | Rcuisine |
|---|---|---|
| 44 | 135085 | Fast_Food |


The most popular place is "Tortas Locas Hipocampo", a fast-food place that has received 36 ratings, with an average score of 1.33.
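Instead of filtering `places` and `cuisine` one place at a time, the top of the popularity ranking can be annotated with names and cuisines through joins. A sketch on rows copied from the outputs above. Cuisines are collapsed into one string per place in case a place has several rows in `cuisine`, and the left join keeps places with no listed cuisine (no cuisine row for 132825 appears in the excerpt above, so it comes out missing here):

```python
import pandas as pd

# Excerpts of the tables shown above
rating = pd.DataFrame(
    {'rating': [1.333333, 1.28125, 1.2], 'rating_count': [36, 32, 10]},
    index=pd.Index([135085, 132825, 135106], name='placeID'),
)
places = pd.DataFrame({
    'placeID': [135085, 132825, 135106],
    'name': ['Tortas Locas Hipocampo', 'puesto de tacos', 'El Rincón de San Francisco'],
})
cuisine = pd.DataFrame({'placeID': [135085, 135106],
                        'Rcuisine': ['Fast_Food', 'Mexican']})

# One comma-separated string of cuisines per place
cuisines_per_place = (cuisine.groupby('placeID')['Rcuisine']
                             .agg(', '.join)
                             .rename('cuisines'))

top = (rating.nlargest(3, 'rating_count')
             .reset_index()
             .merge(places, on='placeID', how='left')   # add names
             .join(cuisines_per_place, on='placeID'))   # add cuisines (left join)
print(top)
```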

### Challenge 1:

Find a hybrid system to sort restaurants, so that you can recommend the "best" places: restaurants that are both high rated and popular.

Create a new column that combines `rating` and `rating_count` into a single ranking score, e.g. `rating * (rating_count / 5)`.

Join with the corresponding table to show the name of each restaurant.


In [None]:
```python
# your code here
```
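One possible starting point for the challenge, using the example score from the prompt, `rating * (rating_count / 5)`, and a merge with `places` for the names. It runs on a few rows copied from the outputs above rather than on the full tables; with the notebook's data you would use the `rating` and `places` frames built earlier:

```python
import pandas as pd

# Per-place averages and counts copied from the outputs above
rating = pd.DataFrame(
    {'rating': [1.333333, 1.28125, 1.2, 1.0],
     'rating_count': [36, 32, 10, 4]},
    index=pd.Index([135085, 132825, 135106, 135109], name='placeID'),
)
places = pd.DataFrame({
    'placeID': [135085, 132825, 135106, 135109],
    'name': ['Tortas Locas Hipocampo', 'puesto de tacos',
             'El Rincón de San Francisco', 'Paniroles'],
})

# Example hybrid score from the challenge: quality scaled by popularity
rating['score'] = rating['rating'] * (rating['rating_count'] / 5)

ranked = (rating.reset_index()
                .merge(places, on='placeID', how='left')
                .sort_values('score', ascending=False, ignore_index=True))
print(ranked[['name', 'rating', 'rating_count', 'score']])
```

Because this score grows linearly with `rating_count`, popularity weighs heavily: a place with a perfect 2.0 from 5 ratings scores 2.0, less than El Rincón de San Francisco's 2.4 (an average of 1.2 from 10 ratings).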

### Challenge 2:

Find a hybrid system to sort movies, so that you can recommend the "best" movies: movies that are both high rated and popular.

Join the tables to show the names of the movies.

In [None]:
```python
import pandas as pd

url = 'https://drive.google.com/file/d/1S0CtDB8NYUs94KgO0VDv6b2R1CShQcLF/view?usp=sharing'
path = 'https://drive.google.com/uc?export=download&id=' + url.split('/')[-2]
links = pd.read_csv(path)

url = 'https://drive.google.com/file/d/1sW3zww6gMzoln0-U0Zs7HW_bKYjtH99i/view?usp=sharing'
path = 'https://drive.google.com/uc?export=download&id=' + url.split('/')[-2]
movies = pd.read_csv(path)

url = 'https://drive.google.com/file/d/1nUpoWkhzhnYtUFvGYTR317RHiq7XtTx9/view?usp=sharing'
path = 'https://drive.google.com/uc?export=download&id=' + url.split('/')[-2]
ratings = pd.read_csv(path)

url = 'https://drive.google.com/file/d/1F9szBIzHvE9sk-p89sk1zpxVEG_gJezg/view?usp=sharing'
path = 'https://drive.google.com/uc?export=download&id=' + url.split('/')[-2]
tags = pd.read_csv(path)
```

In [None]:
```python
ratings
```

|  | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 1 | 4.0 | 964982703 |
| 1 | 1 | 3 | 4.0 | 964981247 |
| 2 | 1 | 6 | 4.0 | 964982224 |
| 3 | 1 | 47 | 5.0 | 964983815 |
| 4 | 1 | 50 | 5.0 | 964982931 |
| ... | ... | ... | ... | ... |
| 100831 | 610 | 166534 | 4.0 | 1493848402 |
| 100832 | 610 | 168248 | 5.0 | 1493850091 |
| 100833 | 610 | 168250 | 5.0 | 1494273047 |
| 100834 | 610 | 168252 | 5.0 | 1493846352 |
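The restaurant recipe carries over to the movies. Below is a sketch of it as a reusable function. It assumes `ratings` has `movieId` and `rating` columns (as in the output above) and that `movies` has `movieId` and `title` columns (not shown in this notebook, so treat that as an assumption). `hybrid_rank`, `min_count` and `count_scale` are hypothetical names, and the toy rows and titles are placeholders, not real data:

```python
import pandas as pd


def hybrid_rank(ratings, movies, min_count=1, count_scale=5):
    """Rank movies by mean rating * (number of ratings / count_scale).

    Assumes ratings has 'movieId' and 'rating' columns and movies has
    'movieId' and 'title' columns.
    """
    stats = (ratings.groupby('movieId')['rating']
                    .agg(rating='mean', rating_count='count'))
    # Optionally ignore movies with very few ratings
    stats = stats[stats['rating_count'] >= min_count]
    stats = stats.assign(score=stats['rating'] * (stats['rating_count'] / count_scale))
    return (stats.reset_index()
                 .merge(movies[['movieId', 'title']], on='movieId', how='left')
                 .sort_values('score', ascending=False, ignore_index=True))


# Toy rows: the first three mirror ratings.head() above, the rest are made up
ratings = pd.DataFrame({
    'userId': [1, 1, 1, 2, 3, 2],
    'movieId': [1, 3, 6, 1, 1, 3],
    'rating': [4.0, 4.0, 4.0, 5.0, 3.0, 2.0],
})
# Placeholder titles, not the real titles from movies.csv
movies = pd.DataFrame({'movieId': [1, 3, 6],
                       'title': ['Movie A', 'Movie B', 'Movie C']})

print(hybrid_rank(ratings, movies))
print(hybrid_rank(ratings, movies, min_count=2))
```

Since `count_scale` divides every score by the same constant, it does not change the order; `min_count` is the parameter that actually filters out rarely rated movies.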
