
___
# Chapter 1 - Simple Approaches to Recommender Systems
## Segment 2 - Popularity-Based Recommenders

In [1]:
import pandas as pd
import numpy as np

These datasets are hosted on: https://archive.ics.uci.edu/ml/datasets/Restaurant+%26+consumer+data

They were originally published by: Blanca Vargas-Govea, Juan Gabriel González-Serna, Rafael Ponce-Medellín. Effects of relevant contextual features in the performance of a restaurant recommender system. In RecSys’11: Workshop on Context Aware Recommender Systems (CARS-2011), Chicago, IL, USA, October 23, 2011.

In [2]:
frame = pd.read_csv('rating_final.csv')
cuisine = pd.read_csv('chefmozcuisine.csv')

In [3]:
frame.head()

   userID  placeID  rating  food_rating  service_rating
0   U1077   135085       2            2               2
1   U1077   135038       2            2               1
2   U1077   132825       2            2               2
3   U1077   135060       1            2               2
4   U1068   135104       1            1               2


In [4]:
cuisine.head()

   placeID        Rcuisine
0   135110         Spanish
1   135109         Italian
2   135107  Latin_American
3   135106         Mexican
4   135105       Fast_Food


## Recommending based on counts

To find the most popular place, we count the number of ratings each place has received and convert the result to a data frame.

To do that, we call frame.groupby: we group frame by placeID and, for each unique place ID, count the entries in the rating column.

In [8]:
rating_count = pd.DataFrame(frame.groupby('placeID')['rating'].count())

# Sort the places in descending order of the number of ratings they received:
# call sort_values on rating_count, sorting by the 'rating' column with
# ascending=False, then look at the first few records.
rating_count.sort_values('rating', ascending=False).head()

         rating
placeID
135085       36
132825       32
135032       28
135052       25
132834       25


The most-rated place is the one with ID 135085, which has a total of 36 ratings.
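A hedged aside on the groupby/count step: `count()` tallies only non-missing values in the rating column, whereas `size()` (or `value_counts()` on `placeID`) tallies rows. The two agree whenever no rating is missing. The sketch below uses a small, invented ratings table (not the real `rating_final.csv`) with one deliberately missing rating to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings with the same column names as rating_final.csv;
# the values (including the missing rating) are made up for illustration.
ratings = pd.DataFrame({
    "userID": ["U1", "U2", "U3", "U1", "U2", "U4"],
    "placeID": [101, 101, 101, 102, 102, 103],
    "rating": [2, 1, np.nan, 2, 0, 1],
})

# count() on the rating column counts only non-missing ratings,
# so the NaN for place 101 is not counted.
counts = ratings.groupby("placeID")["rating"].count().sort_values(ascending=False)

# size() counts rows per group whether or not the rating is missing;
# value_counts() on placeID gives the same row counts.
sizes = ratings.groupby("placeID").size().sort_values(ascending=False)
row_counts = ratings["placeID"].value_counts()

print(counts)
print(sizes)
```

Here place 101 has three rows but only two ratings, so it ranks first by `size()` while tying with place 102 by `count()`.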

Let's take the five most often rated places and see whether the cuisines they serve have anything in common.

To do that, we first make a data frame of the place IDs of the most often rated places, then merge it with the cuisine data frame.

We create the data frame by listing the IDs of the five most-reviewed places from the sorted output above: 135085, 132825, 135032, 135052 and 132834.

In [6]:
most_rated_places = pd.DataFrame([135085, 132825, 135032, 135052, 132834], index=np.arange(5), columns=['placeID'])


# Merge most_rated_places with the cuisine data frame to see whether the
# most popular places in town serve similar cuisines.
summary = pd.merge(most_rated_places, cuisine, on='placeID')
summary

   placeID         Rcuisine
0   135085        Fast_Food
1   132825          Mexican
2   135032        Cafeteria
3   135032     Contemporary
4   135052              Bar
5   135052  Bar_Pub_Brewery
6   132834          Mexican
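The five place IDs were typed in by hand from the sorted output. As a minimal sketch, assuming toy stand-ins for `rating_final.csv` and `chefmozcuisine.csv` (the IDs and cuisines below are invented), the top-N IDs can instead be taken with `DataFrame.nlargest` and merged directly. `top_n` is a hypothetical parameter name, not something from the original notebook.

```python
import pandas as pd

# Hypothetical toy inputs mirroring rating_final.csv and chefmozcuisine.csv;
# the IDs and cuisines are made up for illustration.
ratings = pd.DataFrame({
    "placeID": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4],
    "rating":  [2, 1, 2, 0, 1, 2, 2, 1, 0, 1],
})
cuisine = pd.DataFrame({
    "placeID":  [1, 2, 3, 3, 5],
    "Rcuisine": ["Mexican", "Bar", "Cafeteria", "Contemporary", "Italian"],
})

top_n = 3  # hypothetical: how many of the most-rated places to keep

# Count ratings per place, then keep the top_n places by that count.
rating_count = ratings.groupby("placeID")["rating"].count().to_frame()
top_places = rating_count.nlargest(top_n, "rating").reset_index()[["placeID"]]

# Inner merge (the pd.merge default): places with no cuisine row are dropped,
# and a place listed under several cuisines yields one row per cuisine.
summary = pd.merge(top_places, cuisine, on="placeID")
print(summary)
```

With the default `keep='first'`, `nlargest` breaks ties at the cut-off by order of appearance; if several places tie there (in the sorted output above, 135052 and 132834 both have 25 ratings), `keep='all'` keeps every tied place. The one-row-per-cuisine behavior of the merge is also why five places produced seven rows: 135032 and 135052 are each listed under two cuisines.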


Let's see how many cuisine types the places in this data set offer in total.

In [7]:
cuisine['Rcuisine'].describe()

count         916
unique         59
top       Mexican
freq          239
Name: Rcuisine, dtype: object
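`describe()` on a text column reports only four numbers: `count`, `unique`, `top` and `freq`. A sketch on an invented cuisine table (one row per place-cuisine pair, like `chefmozcuisine.csv`) shows how `value_counts()` gives the full frequency table from which `top` and `freq` come, and why counting distinct `placeID` values can give fewer places than rows.

```python
import pandas as pd

# Hypothetical toy cuisine table: one row per place-cuisine pair (values invented).
cuisine = pd.DataFrame({
    "placeID":  [1, 2, 3, 3, 4, 5],
    "Rcuisine": ["Mexican", "Mexican", "Cafeteria", "Mexican", "Bar", "Bar_Pub_Brewery"],
})

summary = cuisine["Rcuisine"].describe()          # count, unique, top, freq
frequencies = cuisine["Rcuisine"].value_counts()  # every cuisine, most frequent first

# count = total rows, unique = number of distinct cuisines,
# top/freq = the first entry of value_counts().
print(summary)
print(frequencies)

# Rows are place-cuisine pairs, so distinct places can be fewer than rows.
print(cuisine["placeID"].nunique(), "places,", len(cuisine), "rows")
```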

You can see that two of the five most often rated places (132825 and 132834) both serve Mexican food.

The recommender is suggesting that Mexican food is popular and that places serving it are good candidates for recommendation.

The description of the cuisine data frame supports this: of the 59 distinct cuisines, Mexican is the most frequent, appearing in 239 of the 916 rows.

In effect, our recommender is saying that places serving the most popular types of cuisine are more likely to be appreciated by the average restaurant-goer in the city.
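The reasoning above can be turned into a small popularity-based recommender. The function below is a hypothetical helper, not part of the original notebook: it takes the most frequent cuisine (the `top` value from `describe()`), keeps the places that serve it, and ranks them by rating count. The toy data is invented for illustration.

```python
import pandas as pd


def recommend_by_popular_cuisine(ratings, cuisine, n=3):
    """Hypothetical helper: return the most frequent cuisine and up to n
    places serving it, ranked by how many ratings each place received."""
    # Most frequent cuisine by number of place-cuisine rows (describe()'s "top").
    top_cuisine = cuisine["Rcuisine"].value_counts().index[0]
    # Place IDs that list that cuisine.
    candidates = cuisine.loc[cuisine["Rcuisine"] == top_cuisine, "placeID"].unique()
    # Popularity of each place = number of non-missing ratings it received.
    counts = ratings.groupby("placeID")["rating"].count()
    ranked = counts[counts.index.isin(candidates)].sort_values(ascending=False)
    return top_cuisine, ranked.head(n)


# Toy stand-ins for rating_final.csv and chefmozcuisine.csv (values invented).
ratings = pd.DataFrame({
    "placeID": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5, 5, 5, 5],
    "rating":  [2, 1, 2, 0, 1, 2, 2, 1, 0, 1, 2, 2, 2, 1, 0],
})
cuisine = pd.DataFrame({
    "placeID":  [1, 2, 3, 4, 5],
    "Rcuisine": ["Mexican", "Mexican", "Cafeteria", "Mexican", "Bar"],
})

top_cuisine, recommendations = recommend_by_popular_cuisine(ratings, cuisine, n=2)
print(top_cuisine)      # Mexican
print(recommendations)  # places 1 and 2, with 3 and 2 ratings
```

In this toy data, place 5 has the most ratings overall but is not recommended because it does not serve the most frequent cuisine. That is the trade-off of filtering on cuisine popularity before ranking by place popularity.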