![title](Chapter-1.png "Header")
___
# Simple Approaches to Recommender Systems
## Segment 2 - Popularity-Based Recommenders

In [1]:
import pandas as pd
import numpy as np

These datasets are hosted on: https://archive.ics.uci.edu/ml/datasets/Restaurant+%26+consumer+data

They were originally published by: Blanca Vargas-Govea, Juan Gabriel González-Serna, Rafael Ponce-Medellín. Effects of relevant contextual features in the performance of a restaurant recommender system. In RecSys’11: Workshop on Context Aware Recommender Systems (CARS-2011), Chicago, IL, USA, October 23, 2011.

**Abstract:** The dataset was obtained from a recommender system prototype. The task was to generate a top-n list of restaurants according to the consumer preferences.

**Data Set Information:**

Two approaches were tested: a collaborative filter technique and a contextual approach.

(i) The `collaborative filter technique` used a single file, `rating_final.csv`, which contains the user, item, and rating attributes.

(ii) The `contextual approach` generated recommendations using the remaining eight data files.



In [2]:
frame = pd.read_csv('rating_final.csv')
cuisine = pd.read_csv('chefmozcuisine.csv')

#### rating_final.csv dataset (loaded as `frame`)

* Instances: 1161
* Attributes: 5
* userID: Nominal
* placeID: Nominal
* rating: Numeric, 3 [0,1,2]
* food_rating: Numeric, 3 [0,1,2]
* service_rating: Numeric, 3 [0,1,2]

#### chefmozcuisine.csv dataset

* Instances: 916
* Attributes: 2
* placeID: Nominal
* Rcuisine: Nominal, 59 [Afghan, African, American, Armenian, Asian, Bagels, Bakery, Bar, Bar_Pub_Brewery, Barbecue, Brazilian, Breakfast-Brunch, Burgers, Cafe-Coffee_Shop, Cafeteria, California, Caribbean, Chinese, Contemporary, Continental-European, Deli-Sandwiches, Dessert-Ice_Cream, Diner, Dutch-Belgian, Eastern_European, Ethiopian, Family, Fast_Food, Fine_Dining, French, Game, German, Greek, Hot_Dogs, International, Italian, Japanese, Juice, Korean, Latin_American, Mediterranean, Mexican, Mongolian, Organic-Healthy, Persian, Pizzeria, Polish, Regional, Seafood, Soup, Southern, Southwestern, Spanish, Steaks, Sushi, Thai, Turkish, Vegetarian, Vietnamese]

In [3]:
frame.head()

| | userID | placeID | rating | food_rating | service_rating |
|---|---|---|---|---|---|
| 0 | U1077 | 135085 | 2 | 2 | 2 |
| 1 | U1077 | 135038 | 2 | 2 | 1 |
| 2 | U1077 | 132825 | 2 | 2 | 2 |
| 3 | U1077 | 135060 | 1 | 2 | 2 |
| 4 | U1068 | 135104 | 1 | 1 | 2 |


In [77]:
frame_rate = pd.DataFrame(frame['rating'].value_counts())
frame_rate

| | rating |
|---|---|
| 2 | 486 |
| 1 | 421 |
| 0 | 254 |


In [78]:
frame_food = pd.DataFrame(frame['food_rating'].value_counts())
frame_food

| | food_rating |
|---|---|
| 2 | 516 |
| 1 | 379 |
| 0 | 266 |


In [79]:
frame_service = pd.DataFrame(frame['service_rating'].value_counts())
frame_service

| | service_rating |
|---|---|
| 1 | 426 |
| 2 | 420 |
| 0 | 315 |


**Ratings:**

* 0 = worst rating
* 1 = middle rating
* 2 = best rating

In [101]:
# Align the three value-count tables on the rating value (0, 1, 2)
merger = frame_rate.join([frame_food, frame_service], how='outer')
merger

| | rating | food_rating | service_rating |
|---|---|---|---|
| 0 | 254 | 266 | 315 |
| 1 | 421 | 379 | 426 |
| 2 | 486 | 516 | 420 |
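
The same table can be built in one step, without creating and joining three separate frames. The following is a minimal sketch on made-up toy data (the `toy` frame and its values are assumptions, not the contents of `rating_final.csv`). It may also help on newer pandas releases, where the column produced by `value_counts` is named `count` rather than after the source column, so the three frames above would share a column name.

```python
import pandas as pd

# Toy stand-in for `frame`; the values are invented for illustration only.
toy = pd.DataFrame({
    'rating':         [2, 2, 1, 0, 2, 1],
    'food_rating':    [2, 1, 1, 0, 2, 2],
    'service_rating': [1, 1, 2, 0, 2, 0],
})

rating_cols = ['rating', 'food_rating', 'service_rating']

# Count how often each score occurs in every rating column.
# Scores missing from a column would show up as NaN, so fill them with 0.
score_counts = (toy[rating_cols]
                .apply(pd.Series.value_counts)
                .fillna(0)
                .astype(int)
                .sort_index())

print(score_counts)
```

Each row is a score and each column holds how many times that score was given, which is the same layout as `merger`.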


In [102]:
frame.shape

(1161, 6)

In [103]:
frame.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1161 entries, 0 to 1160
Data columns (total 6 columns):
userID            1161 non-null object
placeID           1161 non-null int64
rating            1161 non-null int64
food_rating       1161 non-null int64
service_rating    1161 non-null int64
ratings_data      3 non-null float64
dtypes: float64(1), int64(4), object(1)
memory usage: 54.5+ KB


In [104]:
frame.isnull().sum()

userID               0
placeID              0
rating               0
food_rating          0
service_rating       0
ratings_data      1158
dtype: int64

In [105]:
cuisine.head()

| | placeID | Rcuisine |
|---|---|---|
| 0 | 135110 | Spanish |
| 1 | 135109 | Italian |
| 2 | 135107 | Latin_American |
| 3 | 135106 | Mexican |
| 4 | 135105 | Fast_Food |


In [106]:
cuisine.shape

(916, 2)

In [107]:
cuisine.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 916 entries, 0 to 915
Data columns (total 2 columns):
placeID     916 non-null int64
Rcuisine    916 non-null object
dtypes: int64(1), object(1)
memory usage: 14.4+ KB


In [108]:
cuisine.isnull().sum()

placeID     0
Rcuisine    0
dtype: int64

## Recommending based on counts

In [109]:
rating_count = pd.DataFrame(frame.groupby('placeID')['rating'].count())

rating_count.sort_values('rating', ascending=False).head()

| placeID | rating |
|---|---|
| 135085 | 36 |
| 132825 | 32 |
| 135032 | 28 |
| 135052 | 25 |
| 132834 | 25 |


In [110]:
rating_food = pd.DataFrame(frame.groupby('placeID')['food_rating'].count())

rating_food.sort_values('food_rating', ascending=False).head()

| placeID | food_rating |
|---|---|
| 135085 | 36 |
| 132825 | 32 |
| 135032 | 28 |
| 135052 | 25 |
| 132834 | 25 |


In [111]:
rating_service = pd.DataFrame(frame.groupby('placeID')['service_rating'].count())

rating_service.sort_values('service_rating', ascending=False).head()

| placeID | service_rating |
|---|---|
| 135085 | 36 |
| 132825 | 32 |
| 135032 | 28 |
| 135052 | 25 |
| 132834 | 25 |
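
The three tables are identical because `count()` counts non-null entries and none of the rating columns has missing values. As a rough sketch on toy data (the frame below is invented, not taken from `rating_final.csv`), all three counts can be computed in a single `groupby`:

```python
import pandas as pd

# Toy ratings with no missing values; IDs and scores are made up.
toy = pd.DataFrame({
    'userID':         ['U1', 'U1', 'U2', 'U2', 'U3', 'U3'],
    'placeID':        [10, 20, 10, 30, 10, 20],
    'rating':         [2, 1, 2, 0, 1, 2],
    'food_rating':    [2, 1, 1, 0, 2, 2],
    'service_rating': [1, 1, 2, 0, 2, 0],
})

rating_cols = ['rating', 'food_rating', 'service_rating']

# One row per place, one count per rating column, most-rated place first.
counts = (toy.groupby('placeID')[rating_cols]
             .count()
             .sort_values('rating', ascending=False))

print(counts)
```

With complete data every column carries the same number, so a single count per place is enough to rank by popularity.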


### Popularity Based Recommendation

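A popularity-based recommender ranks items by how often they have been rated (the popularity count) and recommends the items at the top of that ranking. Here it is used to identify the most popular places and their cuisines. This approach:

* Relies on historical data, such as purchase history or, in this case, ratings
* Does not produce personalised output: every user gets the same list

Result: A list of the most popular places in town, and a list of the Rcuisine of each of them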
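
The ranking step can be sketched as a small function. This is only an illustration under stated assumptions: `top_n_places` is a hypothetical helper name, and the toy ratings are invented rather than read from `rating_final.csv`.

```python
import pandas as pd


def top_n_places(ratings, n=5):
    """Return the n most-rated placeIDs with their rating counts.

    Hypothetical helper: the result does not depend on any user,
    which is what makes the recommendation non-personalised.
    """
    counts = ratings.groupby('placeID')['rating'].count()
    return (counts.sort_values(ascending=False)
                  .head(n)
                  .rename('rating_count')
                  .reset_index())


# Toy ratings; IDs and scores are made up for illustration.
toy_ratings = pd.DataFrame({
    'userID':  ['U1', 'U2', 'U3', 'U1', 'U2', 'U1'],
    'placeID': [10, 10, 10, 20, 20, 30],
    'rating':  [2, 1, 2, 0, 1, 2],
})

top = top_n_places(toy_ratings, n=2)
print(top)
```

Note that the ranking uses only how many ratings a place received, not how high those ratings were.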

In [112]:
# The five most-rated placeIDs, copied from the rating_count output above
most_rated_places = pd.DataFrame([135085, 132825, 135032, 135052, 132834], index=np.arange(5), columns=['placeID'])

summary = pd.merge(most_rated_places, cuisine, on='placeID')
summary

| | placeID | Rcuisine |
|---|---|---|
| 0 | 135085 | Fast_Food |
| 1 | 132825 | Mexican |
| 2 | 135032 | Cafeteria |
| 3 | 135032 | Contemporary |
| 4 | 135052 | Bar |
| 5 | 135052 | Bar_Pub_Brewery |
| 6 | 132834 | Mexican |

In [113]:
cuisine['Rcuisine'].describe()

count         916
unique         59
top       Mexican
freq          239
Name: Rcuisine, dtype: object

* There are 59 unique cuisines in the `cuisine` dataframe (across all 916 entries, not only the most-rated places)
* The most common cuisine is Mexican, with 239 of the 916 entries
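
`describe()` reports only the single most frequent value. To see the full frequency ranking, `value_counts` can be used instead. The sketch below runs on a made-up toy frame (not the real `chefmozcuisine.csv`) and shows both absolute counts and shares:

```python
import pandas as pd

# Toy stand-in for `cuisine`; the rows are invented for illustration.
toy_cuisine = pd.DataFrame({
    'placeID':  [1, 2, 3, 4, 5],
    'Rcuisine': ['Mexican', 'Mexican', 'Bar', 'Italian', 'Mexican'],
})

# Absolute number of entries per cuisine, most frequent first.
cuisine_counts = toy_cuisine['Rcuisine'].value_counts()

# The same ranking expressed as a fraction of all entries.
cuisine_shares = toy_cuisine['Rcuisine'].value_counts(normalize=True)

print(cuisine_counts)
print(cuisine_shares)
```

On the real data the first entry of `cuisine_counts` would correspond to the `top` and `freq` values that `describe()` reports.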