## **04 Run NLP Model on Reviews**
This notebook takes the processed reviews from an earlier notebook, runs the spaCy NLP model on the text to identify brewery offerings, and generates a CSV file containing the data needed by the Streamlit app.

### **Notebook Objectives**
1. Load the spaCy NLP model previously trained by following the project.yml file in the main directory
2. Apply the model to all reviews to find mentions of brewery offerings
3. Group by brewery and sum all mentions of each brewery offering
4. Merge the grouped dataframe with the table that contains location information (city, address, lat/lon, etc.) for Streamlit app purposes
5. Clean missing lat/lon info
6. Export the final table as a CSV for use by the Streamlit app

In [1]:
import json
import math
import random
import time
from collections import Counter
from pathlib import Path

import numpy as np
import pandas as pd
import requests
import spacy
from dotenv import dotenv_values
from spacy import displacy

# Seed the RNG so the random inspection cells are reproducible
random.seed(11)

# Load environment variables from the project's .env file
config = dotenv_values(dotenv_path=Path('../.env'))

In [2]:
# Import brewery review data into dataframe
filepath = Path('../assets/brewery_reviews.csv') 
reviews_df = pd.read_csv(filepath)
reviews_df.shape

(16727, 10)

In [3]:
# Load labels from Doccano export
labels_filepath = Path('../configs/label_config.json')
labels = []
with open(labels_filepath) as f:
    labels_file = json.load(f)
    labels = [label['text'] for label in labels_file]
print(labels)

['VIBE', 'LOCATION', 'BEER', 'FEATURE', 'FOOD', 'GAMES', 'OUTDOOR', 'MUSIC', 'TOUR', 'DOG']


In [4]:
# Load spacy model
nlp = spacy.load(Path('../training/model-pattern/'))

# Extract entity matches in lists
def get_entities(nlp, labels, review):
    """Return one list of lowercased entity texts per label, in `labels` order."""
    doc = nlp(review)
    entities = [[] for _ in labels]
    for ent in doc.ents:
        for index, label in enumerate(labels):
            if ent.label_ == label:
                entities[index].append(ent.text.lower())
    return entities

# Build one column per label, holding a Counter of matches for each review
columns = [[] for _ in labels]
for review in reviews_df['review']:
    entities = get_entities(nlp, labels, review)
    for index, matches in enumerate(entities):
        columns[index].append(Counter(matches))
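
The inner loop in `get_entities` compares every entity against every label. As a minimal sketch of an alternative, the same grouping can be done in a single pass with a label-to-index lookup. The `Ent` namedtuple and the `group_entities` name are hypothetical stand-ins so the example runs without a trained model; with spaCy, `doc.ents` would take the place of the `ents` list.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for a spaCy entity span (only the two attributes used here)
Ent = namedtuple('Ent', ['text', 'label_'])

def group_entities(ents, labels):
    """Return one Counter of lowercased entity texts per label, in `labels` order."""
    position = {label: i for i, label in enumerate(labels)}
    counts = [Counter() for _ in labels]
    for ent in ents:
        i = position.get(ent.label_)
        if i is not None:  # skip entities whose label is not in the config
            counts[i][ent.text.lower()] += 1
    return counts

example_labels = ['BEER', 'FOOD', 'DOG']
example_ents = [Ent('Stout', 'BEER'), Ent('IPAs', 'BEER'),
                Ent('food truck', 'FOOD'), Ent('stout', 'BEER'),
                Ent('Boston', 'GPE')]
example_counts = group_entities(example_ents, example_labels)
print(example_counts)
```

Returning Counters directly also removes the separate `Counter(x)` conversion step, though whether that matters at roughly 17k reviews is an open question; the model call itself is likely the dominant cost.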

In [5]:
# Inspect results
for index, label in enumerate(labels):
    reviews_df[label] = columns[index]
reviews_df.head(2)

Unnamed: 0,obdb_id,name,state,city,street,longitude,latitude,website_url,rating,review,VIBE,LOCATION,BEER,FEATURE,FOOD,GAMES,OUTDOOR,MUSIC,TOUR,DOG
0,10th-district-brewing-company-abington,10th District Brewing Company,Massachusetts,Abington,491 Washington St,-70.945941,42.105918,http://www.10thdistrictbrewing.com,5.0,Stout and Double IPAs...WOW!!. Only open W and...,{},{},"{'stout': 1, 'ipas': 1, 'growler': 1}",{},{},{},{},{},{},{}
1,10th-district-brewing-company-abington,10th District Brewing Company,Massachusetts,Abington,491 Washington St,-70.945941,42.105918,http://www.10thdistrictbrewing.com,5.0,"Good Local Beer!. Good beer, brewed right on s...",{},{},{'growler': 1},{},{'food truck': 1},{},{},{},{},{}


In [331]:
# Inspect review and feature matches
# randrange excludes the upper bound; randint(0, len(df)) could return an out-of-range index
index = random.randrange(len(reviews_df))
print(f'Index: {index}')
print(reviews_df['review'][index])
for label in labels:
    print(reviews_df[label][index])

Index: 14823
Somewhat strange experience. I’ve been to the Tap a few times, and it is a nice local hangout. The beer menu is great and the food is moderately good. However, the last time I went with a few friends, we sat out on the deck on a Friday...More
Counter()
Counter()
Counter()
Counter()
Counter({'food': 1})
Counter()
Counter()
Counter()
Counter()
Counter()


In [10]:
# Group by brewery
info_columns = ['name', 'state', 'city', 'street', 'longitude', 'latitude', 'website_url']
# Keep one value per brewery for the descriptive columns
aggregate = {column: 'max' for column in info_columns}
aggregate['review'] = 'count'  # number of reviews per brewery
aggregate['rating'] = 'mean'   # average rating per brewery
# Sum the per-review Counters to get total mentions per label
for label in labels:
    aggregate[label] = 'sum'
brewery_features = reviews_df.groupby(by='obdb_id').agg(aggregate).reset_index()
brewery_features['rating'] = brewery_features['rating'].map('{:.1f}'.format)
print(brewery_features.shape)
brewery_features.tail(2)

(664, 20)


Unnamed: 0,obdb_id,name,state,city,street,longitude,latitude,website_url,review,rating,VIBE,LOCATION,BEER,FEATURE,FOOD,GAMES,OUTDOOR,MUSIC,TOUR,DOG
662,zero-gravity-craft-brewery-burlington-1,Zero Gravity Craft Brewery,Vermont,Burlington,716 Pinte Street,,,,30,4.3,"{'fun': 9, 'friendly': 5, 'amazing': 1, 'comfo...",{'spacious': 1},"{'ipas': 5, 'lagers': 2, 'stouts': 1, 'porters...",{},"{'chicken sandwich': 2, 'fries': 4, 'food': 12...",{'games': 1},"{'outside': 6, 'outdoor': 4, 'garden': 1, 'pat...",{'music': 1},{'tour': 1},{'dogs': 2}
663,zero-gravity-craft-brewery-burlington-2,Zero Gravity Craft Brewery,Vermont,Burlington,716 Pine St,-73.214036,44.459546,http://www.zerogravitybeer.com,30,4.3,"{'fun': 9, 'comfortable': 1, 'friendly': 5, 'a...",{'spacious': 1},"{'lager': 6, 'ipa': 3, 'pilsner': 1, 'stout': ...",{},"{'food': 12, 'fries': 4, 'burger': 1, 'cheese'...",{'games': 1},"{'outdoor': 4, 'outside': 6, 'patio': 2, 'gard...",{'music': 1},{'tour': 1},{'dogs': 2}
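
The `'sum'` aggregation works on the label columns because `Counter` objects support `+`, which merges counts key by key. A minimal sketch of that behavior outside pandas, using the BEER matches from the two review rows shown earlier (the variable names are illustrative only):

```python
from collections import Counter

# Per-review BEER matches for one brewery
review_counts = [
    Counter({'stout': 1, 'ipas': 1, 'growler': 1}),
    Counter({'growler': 1}),
    Counter(),  # a review with no BEER matches
]

# Start from an empty Counter so the result is always a Counter
total = sum(review_counts, Counter())
print(total['growler'])  # 2
print(total['stout'])    # 1
```

Note that `Counter` addition drops keys whose resulting count is zero or negative, which is harmless here because all per-review counts are positive.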


In [11]:
# Remove any remaining missing lat/lon
brewery_features = brewery_features.dropna(subset=['latitude'])
print(f'Shape after removing missing lat/lon: {brewery_features.shape}')
brewery_features.head(5)

Shape after removing missing lat/lon: (565, 20)


Unnamed: 0,obdb_id,name,state,city,street,longitude,latitude,website_url,review,rating,VIBE,LOCATION,BEER,FEATURE,FOOD,GAMES,OUTDOOR,MUSIC,TOUR,DOG
0,10th-district-brewing-company-abington,10th District Brewing Company,Massachusetts,Abington,491 Washington St,-70.945941,42.105918,http://www.10thdistrictbrewing.com,7,4.6,{'friendly': 4},{},"{'stout': 1, 'ipas': 1, 'growler': 2, 'pale al...",{},"{'food truck': 2, 'lunch': 2, 'food': 2}",{},{},{},{},{}
2,14th-star-brewing-saint-albans,14th Star Brewing,Vermont,Saint Albans,133 N Main St Ste 7,-73.082451,44.814365,http://www.14thstarbrewing.com,30,4.1,"{'amazing': 4, 'friendly': 5, 'comfortable': 2...",{'warehouse': 1},"{'ipa': 4, 'growlers': 1, 'growler': 2, 'stout...",{},"{'food': 33, 'burgers': 1, 'fries': 9, 'burger...",{},"{'outside': 5, 'garden': 1}",{'music': 4},{'tour': 2},{}
3,16-stone-brewpub-holland-patent,16 Stone Brewpub,New York,Holland Patent,9542 Main St,-75.256519,43.242112,http://www.16stonebrewpub.com,15,5.0,{'friendly': 9},{},{},{},"{'food': 3, 'nachos': 3, 'pretzel': 3, 'chicke...",{},{},{},{},{}
4,1940s-brewing-company-holbrook,1940's Brewing Company,New York,Holbrook,1337 Lincoln Ave Unit 1,-73.085702,40.799628,http://www.1940sbrewingcompany.com,6,4.5,"{'friendly': 4, 'comfortable': 1}",{},"{'ipa': 1, 'stout': 1}",{'tv.my': 1},{},{},{'outside': 1},{},{'tour': 1},{}
5,1st-republic-brewing-co-essex-junction,1st Republic Brewing Co,Vermont,Essex Junction,39 River Rd Ste 6,-73.084476,44.483702,http://www.1strepublic-homebrew.com,29,4.9,"{'friendly': 13, 'fun': 8, 'cool': 3, 'relaxed...",{},"{'porter': 1, 'ipa': 2, 'ipas': 3, 'growler': 1}",{'tv': 1},"{'pizza': 3, 'chicken': 1, 'dinner': 1}",{'games': 2},{},{},{'tour': 2},{}


In [12]:
# Add columns for app filters, defaulting to 'No'
filt_columns = ['outdoor_', 'food_', 'dog_', 'games_', 'music_', 'tour_']
for col in filt_columns:
    brewery_features[col] = 'No'
brewery_features.tail(2)

Unnamed: 0,obdb_id,name,state,city,street,longitude,latitude,website_url,review,rating,...,OUTDOOR,MUSIC,TOUR,DOG,outdoor_,food_,dog_,games_,music_,tour_
661,young-lion-brewing-co-canandaigua,Young Lion Brewing Co,New York,Canandaigua,24 Lakeshore Dr,-77.26939,42.87561,http://www.younglionbrewing.com,35,3.9,...,"{'outdoor': 6, 'outdoor patio': 2, 'outside': 2}",{},{'tour': 1},{},No,No,No,No,No,No
663,zero-gravity-craft-brewery-burlington-2,Zero Gravity Craft Brewery,Vermont,Burlington,716 Pine St,-73.214036,44.459546,http://www.zerogravitybeer.com,30,4.3,...,"{'outdoor': 4, 'outside': 6, 'patio': 2, 'gard...",{'music': 1},{'tour': 1},{'dogs': 2},No,No,No,No,No,No


In [13]:
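# Inspect an individual brewery
# Row labels have gaps after dropna, so pick a random label from the index
# rather than a random integer, which may not exist as a label
index = random.choice(brewery_features.index.tolist())
index = 241  # override with a fixed brewery for a repeatable example
brewery_features.loc[index, 'FOOD']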

Counter({'lunch': 3,
         'vegetables': 6,
         'chicken': 6,
         'food': 9,
         'fries': 3,
         'salad': 3,
         'burger': 3,
         'cheese': 3,
         'sandwiches': 3})

In [14]:
# (Re)initialize the app filter columns so this cell can be re-run on its own
filt_columns = ['outdoor_', 'food_', 'dog_', 'games_', 'music_', 'tour_']
for col in filt_columns:
    brewery_features[col] = 'No'

# Assign filter values
label_columns = [col.strip('_').upper() for col in filt_columns]
min_reviews = 6  # breweries with this many reviews or fewer can be 'Unknown'
min_count = 2  # min count (or number of distinct matches) to say an offering is available

# Matches to remove. Note e.g. a 'no dogs' match does NOT also produce a second
# match on 'dog' because the 'no dogs' match rule comes after
# TODO handle beer names like 'Old Brown Dog'
stopwords = ['no dogs']

for index, row in brewery_features.iterrows():
    for label, filt_col in zip(label_columns, filt_columns):
        counter = row[label]
        if counter:
            # Remove stopwords (this mutates the Counter stored in the dataframe)
            for word in stopwords:
                if word in counter:
                    print(f"Removing '{word}' from {index}: {counter}")
                    counter.pop(word)
        # An offering is present if one term is matched at least min_count
        # times, or at least min_count distinct terms are matched
        if counter and (counter.most_common(1)[0][1] >= min_count
                        or len(counter) >= min_count):
            brewery_features.at[index, filt_col] = 'Yes'
        elif row['review'] <= min_reviews:
            # Too few reviews to conclude that the offering is absent
            brewery_features.at[index, filt_col] = 'Unknown'

Removing 'no dogs' from 109: Counter({'dog': 1, 'no dogs': 1})
Removing 'no dogs' from 127: Counter({'dogs': 1, 'no dogs': 1})
Removing 'no dogs' from 135: Counter({'dogs': 1, 'dog friendly': 1, 'no dogs': 1})
Removing 'no dogs' from 324: Counter({'pups': 1, 'no dogs': 1})
Removing 'no dogs' from 381: Counter({'dogs': 4, 'no dogs': 1})
Removing 'no dogs' from 469: Counter({'dog friendly': 2, 'no dogs': 1})
Removing 'no dogs' from 595: Counter({'dogs': 7, 'no dogs': 3, 'dog': 2})
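
The decision rule in the loop above can be restated as a small pure function, which makes the three outcomes easier to check in isolation. This is a sketch of the same logic, not code from the project; the function name `classify_offering` and its parameter names are assumptions introduced for illustration.

```python
from collections import Counter

def classify_offering(counter, n_reviews, min_count=2, min_reviews=6,
                      stopwords=('no dogs',)):
    """Return 'Yes', 'Unknown', or 'No' for one brewery/label pair."""
    # Work on a copy so the caller's Counter is not modified
    counter = Counter(counter)
    for word in stopwords:
        counter.pop(word, None)
    # Present: one term seen min_count+ times, or min_count+ distinct terms
    if counter and (counter.most_common(1)[0][1] >= min_count
                    or len(counter) >= min_count):
        return 'Yes'
    # Too few reviews to conclude the offering is absent
    if n_reviews <= min_reviews:
        return 'Unknown'
    return 'No'

# Cases drawn from the tables in this notebook
print(classify_offering({'outside': 1, 'outdoor': 1}, 9))  # Yes
print(classify_offering({'pizza': 1}, 9))                  # No
print(classify_offering({'outside': 1}, 6))                # Unknown
print(classify_offering({'pups': 1, 'no dogs': 1}, 30))    # No
```

Copying the Counter is a deliberate difference from the loop, which edits the stored Counters in place; either choice yields the same filter values.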


In [15]:
# Inspect
inspect = brewery_features.drop(['obdb_id', 'state', 'city', 'street', 
    'longitude', 'latitude', 'website_url'], axis=1)
inspect.tail(10)

Unnamed: 0,name,review,rating,VIBE,LOCATION,BEER,FEATURE,FOOD,GAMES,OUTDOOR,MUSIC,TOUR,DOG,outdoor_,food_,dog_,games_,music_,tour_
652,Woodland Farm Brewery,30,4.4,"{'hip': 2, 'friendly': 5, 'amazing': 5, 'comfo...",{'farm': 6},"{'growler': 1, 'ipa': 3, 'stout': 2}",{'tv': 1},"{'food': 22, 'sandwiches': 5, 'vegetables': 1,...",{},"{'outside': 6, 'patio': 1, 'outdoor': 1}","{'music': 6, 'bands': 1, 'band': 3}",{'tour': 1},{},Yes,Yes,No,No,Yes,No
653,Woodland Farms Brewery,18,3.9,"{'friendly': 4, 'cool': 2, 'comfortable': 1}",{'farm': 1},"{'ipa': 3, 'growlers': 1, 'lagers': 2, 'stout'...",{},"{'sandwiched': 1, 'food': 3, 'lunch': 1}",{'games': 1},"{'backyard': 2, 'outside': 1, 'outdoor': 1}",{},{},{},Yes,Yes,No,No,No,No
654,Woodman's Brewery,9,4.7,"{'friendly': 4, 'fun': 1, 'comfortable': 1}",{},{'growler': 2},{},{'pizza': 1},{},"{'outside': 1, 'outdoor': 1}",{},{},{},Yes,No,No,No,No,No
655,Woodstock Brewing,39,4.7,"{'fun': 6, 'accommodate': 1, 'friendly': 12, '...",{},"{'porters': 1, 'ipa': 2, 'ipas': 1}",{},"{'food': 32, 'chips': 3, 'burger': 3, 'nachos'...","{'games': 4, 'cornhole': 1}","{'outdoor': 7, 'outside': 9, 'garden': 4}","{'music': 2, 'musician': 1}",{},"{'dogs': 2, 'dog friendly': 1}",Yes,Yes,Yes,Yes,Yes,No
656,Woodstock Inn Brewery,45,4.0,"{'friendly': 6, 'amazing': 6, 'fun': 4, 'cool'...",{},{'ipas': 1},{},"{'pretzel': 1, 'food': 34, 'fries': 2, 'lunch'...",{},"{'outdoor': 4, 'garden': 2, 'outdoor patio': 1...",{'band': 1},{'tour': 1},{},Yes,Yes,No,No,No,No
657,Wormtown Brewery,29,4.4,"{'friendly': 4, 'cool': 5, 'amazing': 1, 'fun'...",{'warehouse': 1},"{'ipa': 3, 'stout': 1, 'amber': 1, 'porter': 1...",{},"{'pretzels': 2, 'food': 3, 'pizza': 3, 'dinner...",{'games': 1},"{'outside': 4, 'outdoor': 2, 'patio': 1, 'outd...",{'music': 3},{},{},Yes,Yes,No,No,Yes,No
658,WT Brews LLC,15,4.2,"{'friendly': 4, 'cool': 1, 'hidden gem': 1}",{'farm': 1},"{'porter': 2, 'brown ale': 1, 'growler': 3, 'i...",{},"{'food truck': 2, 'lobster roll': 1, 'dinner': 1}",{},"{'outside': 2, 'patio': 1}",{},{},{},Yes,Yes,No,No,No,No
659,Yonkers Brewing Co,45,4.3,"{'friendly': 7, 'fun': 9, 'cool': 5, 'amazing'...",{},"{'stout': 1, 'ipa': 6, 'pale ale).food': 1}",{'tv': 1},"{'food': 39, 'brunch': 3, 'lunch': 5, 'dinner'...",{'games': 1},"{'outdoor': 3, 'patio': 1, 'outside': 1, 'back...","{'music': 2, 'band': 1}","{'tour': 2, 'tours': 1}",{},Yes,Yes,No,No,Yes,Yes
661,Young Lion Brewing Co,35,3.9,"{'friendly': 12, 'fun': 2, 'amazing': 1}",{},"{'ipa': 4, 'lager': 2, 'stout': 1, 'porter': 1}",{},"{'food': 7, 'dinner': 2, 'pizza': 2, 'pretzel'...",{'games': 2},"{'outdoor': 6, 'outdoor patio': 2, 'outside': 2}",{},{'tour': 1},{},Yes,Yes,No,Yes,No,No
663,Zero Gravity Craft Brewery,30,4.3,"{'fun': 9, 'comfortable': 1, 'friendly': 5, 'a...",{'spacious': 1},"{'lager': 6, 'ipa': 3, 'pilsner': 1, 'stout': ...",{},"{'food': 12, 'fries': 4, 'burger': 1, 'cheese'...",{'games': 1},"{'outdoor': 4, 'outside': 6, 'patio': 2, 'gard...",{'music': 1},{'tour': 1},{'dogs': 2},Yes,Yes,Yes,No,No,No


In [16]:
# Generate lighter dataframe for app deployment
breweries_app = brewery_features.drop(['obdb_id'], axis=1).drop(labels, axis=1)
print(breweries_app.shape)
breweries_app.head(5)

(565, 15)


Unnamed: 0,name,state,city,street,longitude,latitude,website_url,review,rating,outdoor_,food_,dog_,games_,music_,tour_
0,10th District Brewing Company,Massachusetts,Abington,491 Washington St,-70.945941,42.105918,http://www.10thdistrictbrewing.com,7,4.6,No,Yes,No,No,No,No
2,14th Star Brewing,Vermont,Saint Albans,133 N Main St Ste 7,-73.082451,44.814365,http://www.14thstarbrewing.com,30,4.1,Yes,Yes,No,No,Yes,Yes
3,16 Stone Brewpub,New York,Holland Patent,9542 Main St,-75.256519,43.242112,http://www.16stonebrewpub.com,15,5.0,No,Yes,No,No,No,No
4,1940's Brewing Company,New York,Holbrook,1337 Lincoln Ave Unit 1,-73.085702,40.799628,http://www.1940sbrewingcompany.com,6,4.5,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown
5,1st Republic Brewing Co,Vermont,Essex Junction,39 River Rd Ste 6,-73.084476,44.483702,http://www.1strepublic-homebrew.com,29,4.9,No,Yes,No,Yes,No,Yes


In [340]:
# Inspect examples with 'Unknown' filter values
inspect = breweries_app.query("food_ == 'Unknown'")
inspect.head(10)

Unnamed: 0,name,state,city,street,longitude,latitude,website_url,outdoor_,food_,dog_,games_,music_,tour_
4,1940's Brewing Company,New York,Holbrook,1337 Lincoln Ave Unit 1,-73.085702,40.799628,http://www.1940sbrewingcompany.com,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown
38,Bacchus Brewing,New York,Dryden,15 Ellis Dr,-76.299042,42.49967,http://www.bacchusbrewing.com,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown
51,Barrage Brewing Co,New York,Farmingdale,32 Allen Blvd Ste E,-73.424831,40.716251,http://www.barragebrewing.com,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown
64,Beer On Earth,Rhode Island,Providence,425 W Fountain St #104,-71.423851,41.81897,http://www.beeronearth.com,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown
90,"Boots Brewing Company, Inc.",New York,Watertown,89 Public Sq,-75.909046,43.974752,http://www.bootsbrew.com,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown
94,Braven Brewing Company,New York,Brooklyn,,-73.99668,40.60386,,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown
95,Braven Brewing Company,New York,Brooklyn,362 Jefferson St # 320,-73.924647,40.705599,http://www.bravenbrewing.com,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown
101,BrewSA Brewing Co,New York,Freeport,180 Woodcleft Ave,-73.582368,40.635176,http://www.brewsa.com,Yes,Unknown,Unknown,Yes,Yes,Unknown
106,Bridge And Tunnel Brewery,New York,Ridgewood,15-35 Decatur St,-73.901752,40.694335,http://www.bridgeandtunnelbrewery.com,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown
108,Brindle Haus Brewing Company,New York,Spencerport,377 S Union St,-77.804122,43.185296,http://www.brindlehausbrewing.com,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown


In [341]:
# Export dataframe for app
filepath = Path('../streamlit/breweries_app.csv') 
breweries_app.to_csv(filepath, index=False)

In [305]:
# Inspect model results

# Load spacy model
nlp = spacy.load(Path('../training/model-pattern/'))

# text = ['Great location with outdoor seating. Dog-friendly on the patio. Delicious burgers and hot dogs.',
#     'I had a sandwich. The sandwiches were good']
# text = 'What a great place!  We sat on the patio with our dogs and it was just an awesome experience.  They are very dog friendly and came out a couple of times with special treats for the pooches.  Beers are delicious!  Highly recommend for a unique, laid back experience.'
text = 'What a great place! We sat on the patio with our dogs and it was an awesome experience. They are very dog friendly with all the outdoor space. The had corn hole and giant jenga available. Beers were delicious! Burgers and hot dogs were also great. Highly recommend!'
entities = get_entities(nlp, labels, text)
# One single-item list per label, matching the column layout used earlier
columns = [[Counter(matches)] for matches in entities]
print(labels)
print(columns)

['VIBE', 'LOCATION', 'BEER', 'FEATURE', 'FOOD', 'GAMES', 'OUTDOOR', 'MUSIC', 'TOUR', 'DOG']
[[Counter()], [Counter()], [Counter()], [Counter()], [Counter({'burgers': 1, 'hot dogs': 1})], [Counter({'corn hole': 1, 'jenga': 1})], [Counter({'patio': 1, 'outdoor': 1})], [Counter()], [Counter()], [Counter({'dogs': 1, 'dog friendly': 1})]]


In [312]:
# Generate visual
doc = nlp(text)
# color palette: https://www.colourlovers.com/palette/1097823/Lenas_Love_Letter
colors = {"FOOD": "#95CFB7",
          "GAMES": "#FF823A",
          "OUTDOOR": "#F2F26F",
          "MUSIC": "#FFF7BD",
          "DOG": "#FFF7BD"
          }
options = {"colors": colors}
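# With style="ent", displacy returns the rendered markup as a string
svg = displacy.render(doc, style="ent", options=options, minify=False,
                      jupyter=False, page=True)
output_path = Path('../streamlit/example.svg')
# write_text closes the file and returns the number of characters written
output_path.write_text(svg, encoding="utf-8")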

3071