Import the **Pandas** and **SpaCy** libraries

In [30]:
#!pip install pandas
#!pip install spacy
import pandas as pd
import spacy
from spacy.matcher import PhraseMatcher
from collections import defaultdict

Load **Data**

In [31]:
data = pd.read_json('data/restaurant.json')
data.head()

Unnamed: 0,review_id,user_id,business_id,stars,useful,funny,cool,text,date
109,lDJIaF4eYRF4F7g6Zb9euw,lb0QUR5bc4O-Am4hNq9ZGg,r5PLDU-4mSbde5XekTXSCA,4,2,0,0,I used to work food service and my manager at ...,2013-01-27 17:54:54
1013,vvIzf3pr8lTqE_AOsxmgaA,MAmijW4ooUzujkufYYLMeQ,r5PLDU-4mSbde5XekTXSCA,4,0,0,0,We have been trying Eggplant sandwiches all ov...,2015-04-15 04:50:56
1204,UF-JqzMczZ8vvp_4tPK3bQ,slfi6gf_qEYTXy90Sw93sg,r5PLDU-4mSbde5XekTXSCA,5,1,0,0,Amazing Steak and Cheese... Better than any Ph...,2011-03-20 00:57:45
1251,geUJGrKhXynxDC2uvERsLw,N_-UepOzAsuDQwOUtfRFGw,r5PLDU-4mSbde5XekTXSCA,1,0,0,0,Although I have been going to DeFalco's for ye...,2018-07-17 01:48:23
1354,aPctXPeZW3kDq36TRm-CqA,139hD7gkZVzSvSzDPwhNNw,r5PLDU-4mSbde5XekTXSCA,2,0,0,0,"Highs: Ambience, value, pizza and deserts. Thi...",2018-01-21 10:52:58


Create a **list** that contains all menu items in the restaurant

In [32]:
menu = ["Cheese Steak", "Cheesesteak", "Steak and Cheese", "Italian Combo", "Tiramisu", "Cannoli",
        "Chicken Salad", "Chicken Spinach Salad", "Meatball", "Pizza", "Pizzas", "Spaghetti",
        "Bruchetta", "Eggplant", "Italian Beef", "Purista", "Pasta", "Calzones",  "Calzone",
        "Italian Sausage", "Chicken Cutlet", "Chicken Parm", "Chicken Parmesan", "Gnocchi",
        "Chicken Pesto", "Turkey Sandwich", "Turkey Breast", "Ziti", "Portobello", "Reuben",
        "Mozzarella Caprese",  "Corned Beef", "Garlic Bread", "Pastrami", "Roast Beef",
        "Tuna Salad", "Lasagna", "Artichoke Salad", "Fettuccini Alfredo", "Chicken Parmigiana",
        "Grilled Veggie", "Grilled Veggies", "Grilled Vegetable", "Mac and Cheese", "Macaroni",  
         "Prosciutto", "Salami"]

Create a blank English **SpaCy** pipeline (tokenizer only, which is all the matcher needs)

In [33]:
nlp = spacy.blank('en')

Create a **PhraseMatcher** object (`attr='LOWER'` makes the matching case-insensitive)

In [34]:
matcher = PhraseMatcher(nlp.vocab, attr='LOWER')

Create a **Doc** (a sequence of tokens) for each item in the menu

In [35]:
menu_tokens_list = [nlp(item) for item in menu]

Add the **item patterns** to the **matcher**

In [36]:
matcher.add("MENU", menu_tokens_list)
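To see what the matcher returns before running it on every review, here is a minimal sketch on a single sentence. The two-item menu and the sample sentence are made up for illustration and are not taken from the dataset.

```python
import spacy
from spacy.matcher import PhraseMatcher

# Blank English pipeline: tokenizer only
nlp = spacy.blank("en")

# attr="LOWER" compares lowercased token text, so "PIZZA" matches "Pizza"
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")

# Hypothetical two-item menu, for illustration only
matcher.add("MENU", [nlp(item) for item in ["Garlic Bread", "Pizza"]])

# Hypothetical review sentence, for illustration only
doc = nlp("The PIZZA was great, and so was the garlic bread.")

# Each match is a (match_id, start, end) tuple of token offsets into doc
found = sorted({doc[start:end].text.lower() for _, start, end in matcher(doc)})
print(found)
```

Lowercasing the matched text means that differently capitalised mentions of the same item end up under one key.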

Matching on the **whole dataset**

In [37]:
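# Map each menu item (lowercased) to the star ratings of the reviews that mention it
item_ratings = defaultdict(list)

for idx, review in data.iterrows():
    doc = nlp(review.text)
    matches = matcher(doc)  # list of (match_id, start, end) tuples

    # Use a set so an item mentioned several times in one review is counted once
    found_items = {doc[start:end].text.lower() for _, start, end in matches}

    for item in found_items:
        item_ratings[item].append(review.stars)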
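The same aggregation can be sketched on a couple of toy reviews to show the effect of de-duplicating matches within a review. The reviews and the two-item menu below are invented for illustration. Using `nlp.pipe` to batch the texts is an alternative to calling `nlp` row by row, not what the notebook does.

```python
from collections import defaultdict

import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
matcher.add("MENU", [nlp(item) for item in ["Pizza", "Cannoli"]])

# Hypothetical (text, stars) pairs, invented for illustration only
reviews = [
    ("Great pizza. Best pizza in town!", 5),
    ("The cannoli was stale and the pizza was cold.", 2),
]
texts = [text for text, _ in reviews]
stars = [star for _, star in reviews]

item_ratings = defaultdict(list)

# nlp.pipe yields one Doc per text, in the same order as the input
for doc, star in zip(nlp.pipe(texts), stars):
    # Set comprehension: "pizza" appears twice in the first review but counts once
    found_items = {doc[start:end].text.lower() for _, start, end in matcher(doc)}
    for item in found_items:
        item_ratings[item].append(star)

print(dict(item_ratings))
```

Each review contributes at most one rating per item, so a review that repeats an item name does not weigh more heavily in that item's average.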

Inspecting **item_ratings**

*   **item_ratings** is a Python dictionary (a `defaultdict`)
*   **menu items** (lowercased) are the **keys**
*   **lists of ratings (stars)** are the **values**

In [38]:
print(item_ratings)

defaultdict(<class 'list'>, {'chicken parmigiana': [4, 5, 4, 5, 5, 5, 5, 5, 4, 4, 4, 3, 4, 5, 5, 4, 5], 'eggplant': [4, 3, 1, 5, 4, 3, 4, 3, 4, 5, 4, 5, 5, 5, 3, 5, 5, 5, 4, 5, 4, 5, 4, 4, 4, 5, 5, 5, 2, 5, 5, 5, 5, 4, 3, 5, 5, 5, 5, 5, 5, 4, 2, 4, 3, 5, 5, 5, 3, 4, 4, 5, 5, 2, 4, 4, 5, 5, 2, 5, 2, 5, 4, 4, 3, 5, 1, 5, 5], 'pizza': [5, 2, 3, 5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 2, 2, 5, 1, 4, 3, 5, 5, 5, 4, 5, 5, 5, 5, 3, 3, 4, 4, 5, 5, 5, 5, 2, 3, 5, 5, 5, 5, 5, 4, 5, 4, 4, 4, 5, 4, 4, 3, 5, 5, 5, 4, 5, 5, 5, 4, 5, 3, 3, 5, 4, 4, 5, 5, 5, 4, 5, 4, 5, 1, 4, 5, 3, 5, 5, 4, 5, 5, 5, 5, 5, 5, 4, 2, 5, 4, 5, 3, 5, 5, 5, 5, 5, 1, 4, 4, 5, 3, 4, 5, 5, 5, 5, 4, 2, 5, 5, 2, 5, 5, 2, 5, 5, 5, 4, 5, 5, 5, 1, 5, 5, 5, 5, 5, 4, 5, 5, 4, 4, 5, 5, 5, 5, 5, 4, 5, 5, 4, 5, 1, 5, 1, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 5, 3, 4, 5, 5, 5, 2, 3, 5, 5, 5, 5, 4, 4, 4, 3, 5, 5, 5, 3, 2, 5, 1, 5, 5, 3, 5, 5, 4, 4, 5, 5, 4, 3, 4, 4, 4, 4, 2, 5, 3, 5, 4, 4, 3, 5, 4, 3, 4, 5, 5, 4, 5, 1, 5, 5, 4, 5, 5, 5,

**Question 1: What is the average rating for each menu item?**<br>
**Question 2: Which menu item is most popular?**<br>
**Question 3: Which menu item is least popular?**

In [43]:
# Average the star ratings collected for each menu item
mean_ratings = {item: sum(ratings) / len(ratings) for item, ratings in item_ratings.items()}
df = pd.DataFrame(list(mean_ratings.items()), columns=['Menu Item', 'Average Rating'])
# Sort ascending so the lowest-rated items come first
df = df.sort_values('Average Rating').reset_index(drop=True)
df

Unnamed: 0,Menu Item,Average Rating
0,chicken cutlet,3.4
1,turkey sandwich,3.8
2,spaghetti,3.888889
3,italian beef,3.92
4,macaroni,4.0
5,tuna salad,4.0
6,italian combo,4.047619
7,garlic bread,4.128205
8,roast beef,4.142857
9,eggplant,4.15942


**df** contains the average rating for each menu item, sorted from lowest to highest.<br>
By inspecting **df**, it is clear that **fettuccini alfredo**, **artichoke salad**, **turkey breast** & **corned beef** are the most popular menu items (by average rating).<br>
And **chicken cutlet** is the least popular menu item.
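The highest- and lowest-rated items can also be pulled out programmatically, which handles ties such as several items sharing the top average. The sketch below uses invented ratings rather than the notebook's results. The `Mentions` column is an addition that does not appear in the original notebook, included because an average built from very few reviews is less reliable.

```python
import pandas as pd

# Hypothetical ratings, invented for illustration; not taken from the dataset
item_ratings = {
    "pizza": [5, 4, 5],
    "cannoli": [5, 5],
    "ziti": [2, 4],
    "reuben": [5],
}

# One row per item: mean rating plus the number of reviews behind it
df = pd.DataFrame(
    [(item, sum(r) / len(r), len(r)) for item, r in item_ratings.items()],
    columns=["Menu Item", "Average Rating", "Mentions"],
)

# Boolean masks keep every item tied at the maximum or minimum average
best = df[df["Average Rating"] == df["Average Rating"].max()]
worst = df[df["Average Rating"] == df["Average Rating"].min()]

print(best)
print(worst)
```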