## **04 Run NLP Model on Reviews**
This notebook takes the processed reviews from an earlier notebook, runs the spaCy NLP model on the review text to identify brewery offerings, and generates a CSV file containing the data needed by the Streamlit app.

### **Notebook Objectives**
1. Load the spaCy NLP model previously trained according to the project.yml file in the main directory
2. Apply the model to all reviews to find mentions of brewery offerings
3. Group by brewery and sum all mentions of each brewery offering
4. Merge the grouped dataframe with the table that contains location information (city, address, lat/lon, etc.) for the Streamlit app
5. Clean missing lat/lon info
6. Export the final table as a CSV for use by the Streamlit app

In [2]:
from pathlib import Path
from collections import Counter
import json
import math
import random
import time

import numpy as np
import pandas as pd
import requests
import spacy
from dotenv import dotenv_values

# Seed the random module so the random inspections below are repeatable
random.seed(11)

config = dotenv_values(dotenv_path=Path('../.env'))

In [66]:
# Import brewery review data into dataframe
filepath = Path('../assets/brewery_reviews.csv') 
reviews_df = pd.read_csv(filepath)
reviews_df.shape

(5985, 10)

In [4]:
# Load labels from Doccano export
labels_filepath = Path('../configs/label_config.json')
labels = []
with open(labels_filepath) as f:
    labels_file = json.load(f)
    labels = [label['text'] for label in labels_file]
print(labels)

['VIBE', 'LOCATION', 'BEER', 'FEATURE', 'FOOD', 'GAMES', 'OUTDOOR', 'MUSIC', 'TOUR', 'DOG']
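
The cell above relies only on each entry of the export having a `text` key. As a minimal sketch, the same extraction can be tried on an in-memory stand-in; the `id` field and the three entries below are assumptions for illustration, not the actual contents of `label_config.json`.

```python
import json

# Hypothetical stand-in for ../configs/label_config.json. Only the 'text'
# key is known to be used by the notebook; 'id' is an assumed extra field.
sample_export = '[{"id": 1, "text": "BEER"}, {"id": 2, "text": "FOOD"}, {"id": 3, "text": "DOG"}]'

labels_file = json.loads(sample_export)
sample_labels = [label['text'] for label in labels_file]
print(sample_labels)
```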


In [67]:
# Load spacy model
nlp = spacy.load(Path('../training/model-pattern/'))

# Extract entity matches, grouped into one list per label
def get_entities(nlp, labels, review):
    """Return a list (in label order) of lowercased entity texts found in review."""
    doc = nlp(review)
    entities = [[] for _ in labels]
    for ent in doc.ents:
        for index, label in enumerate(labels):
            if ent.label_ == label:
                entities[index].append(ent.text.lower())
    return entities

# For each label, collect one Counter of entity mentions per review
columns = [[] for _ in labels]
for review in reviews_df['review']:
    entities = get_entities(nlp, labels, review)
    counts = [Counter(x) for x in entities]
    for index, counter in enumerate(counts):
        columns[index].append(counter)
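
The per-label counting can be illustrated without loading the model. The sketch below uses hypothetical `(label, text)` pairs in place of `(ent.label_, ent.text)` from `doc.ents`, and a dictionary lookup instead of the nested loop; `count_entities`, `demo_labels`, and `mock_ents` are names made up for this example.

```python
from collections import Counter

demo_labels = ['BEER', 'FOOD', 'DOG']

# Hypothetical (label, text) pairs standing in for (ent.label_, ent.text);
# no spaCy model is loaded here.
mock_ents = [('FOOD', 'Lunch'), ('BEER', 'porter'),
             ('FOOD', 'lunch'), ('FOOD', 'food truck')]

def count_entities(labels, ents):
    """Return one Counter per label, in label order, keyed by lowercased text."""
    by_label = {label: Counter() for label in labels}
    for label, text in ents:
        if label in by_label:  # ignore labels that are not being tracked
            by_label[label][text.lower()] += 1
    return [by_label[label] for label in labels]

counts = count_entities(demo_labels, mock_ents)
print(counts)
```

Lowercasing before counting means `'Lunch'` and `'lunch'` land in the same bucket, which matches how the notebook stores `ent.text.lower()`.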

In [68]:
# Add one column per label, then inspect results
for index, label in enumerate(labels):
    reviews_df[label] = columns[index]
reviews_df.head(2)

Unnamed: 0,obdb_id,name,state,city,street,longitude,latitude,website_url,rating,review,VIBE,LOCATION,BEER,FEATURE,FOOD,GAMES,OUTDOOR,MUSIC,TOUR,DOG
0,10th-district-brewing-company-abington,10th District Brewing Company,Massachusetts,Abington,491 Washington St,-70.945941,42.105918,http://www.10thdistrictbrewing.com,4.0,Nice local SE MASS Brewery. Definitely worth t...,{},{},{},{},{'food': 1},{},{},{},{},{}
1,10th-district-brewing-company-abington,10th District Brewing Company,Massachusetts,Abington,491 Washington St,-70.945941,42.105918,http://www.10thdistrictbrewing.com,4.0,"Tasty, fresh Brew. Went for a quick taste of t...",{'friendly': 1},{},"{'pale ale': 1, 'porter': 1}",{},"{'food truck': 1, 'lunch': 2, 'food': 1}",{},{},{},{},{}


In [69]:
# Inspect review and feature matches
index = random.randint(0, len(reviews_df) - 1)  # randint includes both endpoints
print(f'Index: {index}')
print(reviews_df['review'][index])
for label in labels:
    print(reviews_df[label][index])

Index: 3814
Favorite boston-area brewery. Quick hits:- Great beer - don't miss the barrel-aged- No need to form a line at the bar- Dog friendly- Rotating foodtrucksExcellent brewery with fantastic variety. But don't make the mistake I made, which was overlooking their barreled beers. All of their beers are very good, but a friend shared a bottle of some apple brandy barrel-aged horchata-style stout and I realized I'd been missing out. Their IPAs are among my favorite beers generally, but don't get blinded.Also a *Hot Tip* that isn't, because there's signage everywhere saying so, but you don't need to line up at the bar. Just treat it like normal bar service and step up.  Service is quick and the bartenders have been good, in my experience, at keeping track of who has been waiting. The interior is large and open, and tends to be less busy than the patio, where it can be tough to get a seat on nicer days. Patio is dog friendly, too, but if your dog isn't good at socializing, this will be
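
One detail when drawing a random row position: `random.randint(a, b)` includes both endpoints, while `random.randrange(n)` excludes `n`. A small sketch on a toy list (the list and the separate `random.Random(11)` generator are assumptions for illustration):

```python
import random

items = ['a', 'b', 'c']

# random.randint(0, len(items)) may return 3, which is not a valid position.
# random.randrange(len(items)) only returns 0, 1, or 2.
rng = random.Random(11)  # separate generator, leaves the global seed untouched
draws = [rng.randrange(len(items)) for _ in range(1000)]
print(all(0 <= d < len(items) for d in draws))
```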

In [81]:
# Group by brewery
aggregate = {}
columns = ['name', 'state', 'city', 'street', 'longitude', 'latitude', 'website_url']
for column in columns:
    aggregate[column] = 'max'
aggregate['review'] = 'count'
# add labels to agg function
for label in labels:
    aggregate[label] = 'sum'
brewery_features = reviews_df.groupby(by='obdb_id').agg(aggregate).reset_index()
print(brewery_features.shape)
brewery_features.tail(2)

(247, 19)


Unnamed: 0,obdb_id,name,state,city,street,longitude,latitude,website_url,review,VIBE,LOCATION,BEER,FEATURE,FOOD,GAMES,OUTDOOR,MUSIC,TOUR,DOG
245,zero-gravity-craft-brewery-burlington-1,Zero Gravity Craft Brewery,Vermont,Burlington,716 Pinte Street,,,,30,"{'fun': 9, 'friendly': 5, 'amazing': 1, 'comfo...",{'spacious': 1},"{'ipas': 5, 'lagers': 2, 'stouts': 1, 'porters...",{},"{'chicken': 2, 'sandwich': 2, 'fries': 4, 'foo...",{'games': 1},"{'outside': 6, 'outdoor': 4, 'patio': 2}",{'music': 1},"{'touring': 1, 'tour': 1}",{'dogs': 4}
246,zero-gravity-craft-brewery-burlington-2,Zero Gravity Craft Brewery,Vermont,Burlington,716 Pine St,-73.214036,44.459546,http://www.zerogravitybeer.com,30,"{'fun': 9, 'friendly': 5, 'amazing': 1, 'comfo...",{'spacious': 1},"{'porter': 3, 'lagers': 2, 'ipa': 3, 'lager': ...",{},"{'dinner': 1, 'food': 12, 'pizzas': 1, 'chicke...",{'games': 1},"{'outdoor': 4, 'patio': 2, 'outside': 6}",{'music': 1},"{'tour': 1, 'touring': 1}",{'dogs': 4}
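
The merged dictionaries in the label columns above are consistent with `Counter` addition, which combines keys and adds their counts. The sketch below reproduces that behaviour with the built-in `sum` on hypothetical per-review counts; how pandas performs the `'sum'` aggregation internally is not shown here.

```python
from collections import Counter

# Hypothetical per-review FOOD counts for a single brewery
review_counts = [
    Counter({'food': 1}),
    Counter(),
    Counter({'food truck': 1, 'lunch': 2, 'food': 1}),
]

# Start from an empty Counter and add pairwise; Counter addition merges
# keys and adds the counts of keys that appear on both sides.
total = sum(review_counts, Counter())
print(total)
```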


In [82]:
# Merge in manually corrected lat/lon values
filepath = Path('../assets/breweries_manual_clean.csv')
brewery_lat_lon = pd.read_csv(filepath)[['name', 'longitude', 'latitude']]
# brewery_lat_lon.head()
brewery_features = brewery_features.merge(brewery_lat_lon, on='name', how='left', suffixes=(None,'_x'))

# Where latitude is missing, take both coordinates from the manual corrections
missing = brewery_features['latitude'].isna()
for col in ['latitude', 'longitude']:
    brewery_features[col] = brewery_features[col].mask(missing, brewery_features[f'{col}_x'])
brewery_features = brewery_features.drop(columns = ['latitude_x', 'longitude_x'])
print(f'Shape after merge: {brewery_features.shape}')
brewery_features.tail(5)

# Remove any remaining missing lat/lon
brewery_features = brewery_features.dropna(subset=['latitude'])
print(f'Shape after removing missing lat/lon: {brewery_features.shape}')
brewery_features.head(5)

Shape after merge: (247, 19)
Shape after removing missing lat/lon: (224, 19)


Unnamed: 0,obdb_id,name,state,city,street,longitude,latitude,website_url,review,VIBE,LOCATION,BEER,FEATURE,FOOD,GAMES,OUTDOOR,MUSIC,TOUR,DOG
0,10th-district-brewing-company-abington,10th District Brewing Company,Massachusetts,Abington,491 Washington St,-70.945941,42.105918,http://www.10thdistrictbrewing.com,7,{'friendly': 4},{},"{'pale ale': 3, 'porter': 1, 'growler': 2, 'st...",{},"{'food': 2, 'food truck': 2, 'lunch': 2}",{},{},{},{},{}
1,14th-star-brewing-saint-albans,14th Star Brewing,Vermont,Saint Albans,133 N Main St Ste 7,-73.082451,44.814365,http://www.14thstarbrewing.com,30,"{'comfortable': 2, 'fun': 3, 'accommodating': ...",{'warehouse': 1},"{'ipa': 4, 'growlers': 1, 'growler': 2, 'stout...",{},"{'food': 33, 'fries': 9, 'lunch': 1, 'chicken'...",{},{'outside': 5},{'music': 4},{'tour': 2},{}
2,1st-republic-brewing-co-essex-junction,1st Republic Brewing Co,Vermont,Essex Junction,39 River Rd Ste 6,-73.084476,44.483702,http://www.1strepublic-homebrew.com,29,"{'friendly': 13, 'fun': 8, 'amazing': 1, 'comf...",{},"{'growler': 1, 'ipa': 2, 'ipas': 3, 'porter': 1}",{'tv': 1},"{'pizza': 3, 'chicken': 1, 'dinner': 1}",{'games': 2},{},{},"{'tour': 2, 'tournament': 1}",{}
3,3cross-fermentation-cooperative-worcester,3cross Fermentation Cooperative,Massachusetts,Worcester,4 Knowlton Ave,-71.830576,42.243649,http://www.3cross.coop,12,"{'friendly': 3, 'fun': 2, 'cool': 2, 'comforta...",{},"{'cans to go': 1, 'stouts': 2, 'porters': 2, '...",{'tv': 1},{'food': 6},{},{},"{'music': 2, 'bands': 1}",{},{}
4,603-brewery-londonderry,603 Brewery,New Hampshire,Londonderry,12 Liberty Dr Unit 7,-71.363,42.91646,http://www.603brewery.com,22,"{'friendly': 8, 'cool': 2, 'fun': 8, 'amazing'...",{},"{'ipa': 2, 'stout': 1, 'porters': 1, 'growlers...",{},"{'food': 6, 'chicken': 1, 'sandwich': 1, 'pizz...",{'games': 1},{'outdoor': 1},{},"{'tour': 2, 'tours': 1}",{}


In [83]:
# Add columns for app filters
filt_columns = ['outdoor_', 'food_', 'dog_', 'games_', 'music_', 'tour_']
for col in filt_columns:
    # brewery_features[col] = np.zeros(brewery_features.shape[0], dtype=int)
    brewery_features[col] = ['No'] * brewery_features.shape[0]
brewery_features.tail(2)

Unnamed: 0,obdb_id,name,state,city,street,longitude,latitude,website_url,review,VIBE,...,OUTDOOR,MUSIC,TOUR,DOG,outdoor_,food_,dog_,games_,music_,tour_
244,wormtown-brewery-worcester,Wormtown Brewery,Massachusetts,Worcester,72 Shrewsbury St,-71.791226,42.263499,http://www.wormtownbrewery.com,29,"{'cool': 5, 'fun': 6, 'friendly': 4, 'amazing'...",...,"{'outside': 4, 'outdoor': 3, 'patio': 2}",{'music': 3},{},{},No,No,No,No,No,No
246,zero-gravity-craft-brewery-burlington-2,Zero Gravity Craft Brewery,Vermont,Burlington,716 Pine St,-73.214036,44.459546,http://www.zerogravitybeer.com,30,"{'fun': 9, 'friendly': 5, 'amazing': 1, 'comfo...",...,"{'outdoor': 4, 'patio': 2, 'outside': 6}",{'music': 1},"{'tour': 1, 'touring': 1}",{'dogs': 4},No,No,No,No,No,No


In [84]:
# Inspect individual brewery
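# Index label of the brewery to inspect (fixed for a reproducible example).
# Rows were dropped above, so labels are not contiguous; for a random valid
# label use random.choice(list(brewery_features.index)).
index = 111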
brewery_features.loc[index,'FOOD']

Counter({'food': 42,
         'lunch': 13,
         'sandwich': 2,
         'dinner': 6,
         'chicken': 9,
         'sandwiches': 1,
         'fries': 1,
         'pizza': 3,
         'chips': 3})

In [85]:
# Add columns for app filters
filt_columns = ['outdoor_', 'food_', 'dog_', 'games_', 'music_', 'tour_']
for col in filt_columns:
    brewery_features[col] = ['No'] * brewery_features.shape[0]

# Assign filter values
label_columns = [label.strip('_').upper() for label in filt_columns]
min_reviews = 6 # at or below this many reviews, missing evidence is marked 'Unknown'
min_count = 2 # min mentions of one term, or min distinct terms, to say offering is available

# TODO: for the dog label, check for 'no dogs' mentions;
# if they reach min_count, assign 'No'?

for index, row in brewery_features.iterrows():
    for label, filt_col in zip(label_columns, filt_columns):
        counter = row[label]
        # 'Yes' if the most common term reaches min_count mentions,
        # or if at least min_count distinct terms were matched
        if counter and (counter.most_common(1)[0][1] >= min_count
                        or len(counter) >= min_count):
            brewery_features.at[index, filt_col] = 'Yes'
        # Too few reviews to conclude the offering is absent
        elif row['review'] <= min_reviews:
            brewery_features.at[index, filt_col] = 'Unknown'
        # Otherwise the default 'No' assigned above is kept
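
The decision rule in the loop above can also be read as a small pure function, which makes the three outcomes easier to check in isolation. This is a sketch of the same rule, not code from the notebook; `classify_offering` is a hypothetical name, and its defaults copy `min_reviews` and `min_count` from the cell.

```python
from collections import Counter

def classify_offering(counter, n_reviews, min_reviews=6, min_count=2):
    """Return 'Yes', 'Unknown', or 'No' for one brewery/label pair."""
    # Enough evidence: one term mentioned min_count times, or
    # at least min_count distinct terms matched
    if counter and (counter.most_common(1)[0][1] >= min_count
                    or len(counter) >= min_count):
        return 'Yes'
    # Weak or no evidence and too few reviews to call it absent
    if n_reviews <= min_reviews:
        return 'Unknown'
    return 'No'

two_terms = classify_offering(Counter({'outside': 1, 'outdoor': 1}), 9)
one_mention = classify_offering(Counter({'pizza': 1}), 9)
no_mentions = classify_offering(Counter(), 2)
print(two_terms, one_mention, no_mentions)  # Yes No Unknown
```

The three inputs mirror rows that appear in the notebook's inspection output: two distinct outdoor terms across 9 reviews, a single pizza mention across 9 reviews, and no mentions across 2 reviews.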

In [86]:
# Inspect
inspect = brewery_features.drop(['obdb_id', 'state', 'city', 'street', 
    'longitude', 'latitude', 'website_url'], axis=1)
inspect.tail(10)

Unnamed: 0,name,review,VIBE,LOCATION,BEER,FEATURE,FOOD,GAMES,OUTDOOR,MUSIC,TOUR,DOG,outdoor_,food_,dog_,games_,music_,tour_
236,Westfield River Brewing Co,30,"{'fun': 1, 'amazing': 1, 'friendly': 4, 'cool'...",{},"{'ipas': 1, 'ipa': 1, 'stout': 1}",{'tvs': 1},"{'food': 24, 'pizza': 1, 'chicken': 6, 'pizzas...",{},"{'outside': 2, 'outdoor': 1, 'patio': 1}","{'music': 3, 'bands': 1, 'band': 1}",{},{},Yes,Yes,No,No,Yes,No
237,Whalers Brewing Company,29,"{'fun': 13, 'friendly': 6, 'cool': 5}","{'old mill': 3, 'old warehouse': 1, 'warehouse...","{'stout': 3, 'porters': 1, 'stouts': 2, 'ipas'...",{},"{'food': 4, 'lunch': 1}","{'pool': 12, 'games': 10, 'corn hole': 5}",{'outside': 2},"{'music': 1, 'band': 1}",{'tour': 1},"{'dogs': 2, 'dog': 2}",Yes,Yes,Yes,Yes,Yes,No
238,Whetstone Craft Beers @ Whetstone Station,45,"{'friendly': 5, 'fun': 1, 'amazing': 1}",{},"{'lagers': 2, 'ipas': 2}",{},"{'food': 44, 'sandwich': 9, 'lunch': 7, 'fries...",{'games': 1},"{'outdoor': 1, 'patio': 1, 'outside': 4}",{},{},{},Yes,Yes,No,No,No,No
239,White Birch Brewing,2,{},{},{},{},{},{},{},{},{},{},Unknown,Unknown,Unknown,Unknown,Unknown,Unknown
240,Widowmaker Brewing,21,"{'cool': 1, 'friendly': 11, 'fun': 3, 'comfort...",{'spacious': 3},"{'stout': 4, 'ipa': 2, 'stouts': 1, 'ipas': 8}","{'tv': 3, 'tvs': 1}","{'snacks': 4, 'food': 6, 'sandwiches': 1, 'foo...","{'games': 12, 'corn hole': 1}","{'outdoor': 3, 'outside': 2}",{'music': 2},{},"{'dogs': 3, 'dog': 4}",Yes,Yes,Yes,Yes,Yes,No
241,Winter Hill Brewing Company,38,"{'cool': 2, 'amazing': 6, 'friendly': 3, 'comf...",{},"{'pilsner': 4, 'porters': 2, 'ipa': 8, 'pale a...",{},"{'food': 36, 'lunch': 7, 'sandwich': 5, 'sandw...",{},"{'outdoor': 2, 'patio': 5, 'backyard': 2}",{},{},{'dog': 2},Yes,Yes,Yes,No,No,No
242,Woodman's Brewery,9,"{'friendly': 4, 'fun': 1, 'comfortable': 1}",{},{'growler': 2},{},{'pizza': 1},{},"{'outside': 1, 'outdoor': 1}",{},{},{},Yes,No,No,No,No,No
243,Woodstock Inn Brewery,45,"{'friendly': 6, 'amazing': 6, 'fun': 4, 'cool'...",{},{'ipas': 1},{},"{'food': 34, 'dinner': 9, 'chicken': 1, 'sandw...",{},"{'outdoor': 5, 'patio': 3, 'outside': 1}",{'band': 1},{'tour': 1},{},Yes,Yes,No,No,No,No
244,Wormtown Brewery,29,"{'cool': 5, 'fun': 6, 'friendly': 4, 'amazing'...",{'warehouse': 1},"{'ipa': 3, 'stout': 1, 'growler': 1, 'pilsner'...",{},"{'pizza': 3, 'food': 3, 'dinner': 1}",{'games': 1},"{'outside': 4, 'outdoor': 3, 'patio': 2}",{'music': 3},{},{},Yes,Yes,No,No,Yes,No
246,Zero Gravity Craft Brewery,30,"{'fun': 9, 'friendly': 5, 'amazing': 1, 'comfo...",{'spacious': 1},"{'porter': 3, 'lagers': 2, 'ipa': 3, 'lager': ...",{},"{'dinner': 1, 'food': 12, 'pizzas': 1, 'chicke...",{'games': 1},"{'outdoor': 4, 'patio': 2, 'outside': 6}",{'music': 1},"{'tour': 1, 'touring': 1}",{'dogs': 4},Yes,Yes,Yes,No,No,Yes


In [88]:
# Generate lighter dataframe for app deployment
breweries_app = brewery_features.drop(['obdb_id', 'review'], axis=1).drop(labels, axis=1)
print(breweries_app.shape)
breweries_app.head()

(224, 13)


Unnamed: 0,name,state,city,street,longitude,latitude,website_url,outdoor_,food_,dog_,games_,music_,tour_
0,10th District Brewing Company,Massachusetts,Abington,491 Washington St,-70.945941,42.105918,http://www.10thdistrictbrewing.com,No,Yes,No,No,No,No
1,14th Star Brewing,Vermont,Saint Albans,133 N Main St Ste 7,-73.082451,44.814365,http://www.14thstarbrewing.com,Yes,Yes,No,No,Yes,Yes
2,1st Republic Brewing Co,Vermont,Essex Junction,39 River Rd Ste 6,-73.084476,44.483702,http://www.1strepublic-homebrew.com,No,Yes,No,Yes,No,Yes
3,3cross Fermentation Cooperative,Massachusetts,Worcester,4 Knowlton Ave,-71.830576,42.243649,http://www.3cross.coop,No,Yes,No,No,Yes,No
4,603 Brewery,New Hampshire,Londonderry,12 Liberty Dr Unit 7,-71.363,42.91646,http://www.603brewery.com,No,Yes,No,No,No,Yes


In [90]:
# Inspect examples with 'Unknown' filter values
inspect = breweries_app.query("food_ == 'Unknown'")
inspect.head(10)

Unnamed: 0,name,state,city,street,longitude,latitude,website_url,outdoor_,food_,dog_,games_,music_,tour_
13,Anawan Brewing Company,Massachusetts,Dighton,,-71.128681,41.86512,http://www.anawanbrewingco.com,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown
20,BareWolf Brewing,Massachusetts,Amesbury,12 Oakland St,-70.923365,42.856069,http://www.barewolfbrewing.com,Unknown,Unknown,Unknown,Yes,Yes,Unknown
26,Beer On Earth,Rhode Island,Providence,425 W Fountain St #104,-71.423851,41.81897,http://www.beeronearth.com,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown
44,Building 8 Brewing,Massachusetts,Florence,320 Riverside Dr Ste 8,-72.665272,42.320271,,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown
47,Buttonwoods Brewery,Rhode Island,Cranston,530 Wellington Ave Ste 22,-71.426378,41.773156,http://www.buttonwoodsbrewery.com,Yes,Unknown,Unknown,Unknown,Unknown,Unknown
85,From the Barrel Brewing Company,New Hampshire,Londonderry,15 Londonderry Rd Unit 9,-71.341337,42.876425,http://www.drinkfromthebarrel.com,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown
86,Frost Beer Works,Vermont,Hinesburg,171 Commerce St,-73.108981,44.335325,http://www.frostbeerworks.com,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown
89,Granite Roots Brewing,New Hampshire,Troy,244 N Main St,-72.189731,42.844014,http://www.graniterootsbrewing.com,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown
95,Halyard Brewing Company,Vermont,South Burlington,80 Ethan Allen Dr Ste 2,-73.154307,44.482956,http://www.halyardbrewing.us,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown
106,Honest Weight Artisan Beer,Massachusetts,Orange,131 W Main St Ste 104,-72.316371,42.591709,http://www.honestweightbeer.com,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown


In [91]:
# Export dataframe for app
filepath = Path('../streamlit/breweries_app.csv') 
breweries_app.to_csv(filepath, index=False)