In [1]:
import pandas as pd

### Explore fast food restaurants dataset
To focus only on chain establishments, we will match our CFI data against a list of popular food and coffeehouse chains.

We can use [this fast food restaurants dataset](https://www.kaggle.com/datafiniti/fast-food-restaurants) (specifically, the FastFoodRestaurants file), to which we will add some relevant chains present in Chicago.

In [323]:
fast_foods = pd.read_csv('./datasets/FastFoodRestaurants.csv')

In [324]:
# We only keep the restaurants whose name appears more than once
chain_counts = fast_foods['name'].value_counts()
fast_food_chains = chain_counts[chain_counts >= 2].index.tolist()

# To this list, we add some of the most famous chains that exist in the USA
fast_food_chains.extend(['Starbucks', 'KRISPY KRUNCHY CHICKEN', 'Caribou coffee',
                         "PEET'S COFFEE & TEA", 'PHILZ COFFEE', 'INTELLIGENTSIA', 'PROTEIN BAR'])

# We remove apostrophes for simplicity, lowercase everything and remove duplicates
fast_food_chains = {chain_name.replace("'", "").lower() for chain_name in fast_food_chains}
fast_food_chains

{'a&w',
 'a&w all american food',
 'a&w all-american food',
 'a&w restaurant',
 'amigos/kings classic',
 'arbys',
 'arctic circle',
 'au bon pain',
 'auntie annes',
 'back yard burgers',
 'baja fresh',
 'baja fresh mexican grill',
 'bakers drive thru',
 'baskin-robbins',
 'blakes lotaburger',
 'blimpie',
 'blimpie subs & sandwiches',
 'bob evans',
 'bob evans restaurant',
 'bojangles famous chicken n biscuits',
 'boston market',
 'braums',
 'burger king',
 'capriottis sandwich shop',
 'captain ds',
 'captain ds seafood',
 'caribou coffee',
 'carls jr',
 'carls jr / green burrito',
 'carls jr.',
 'carls jr. / green burrito',
 'checkers',
 'chick-fil-a',
 'chicken express',
 'chipotle mexican grill',
 'churchs chicken',
 'cook out',
 'cook-out',
 'cousins subs',
 'crown fried chicken',
 'culvers',
 'dairy cheer',
 'dairy queen',
 'dangelo grilled sandwiches',
 'del taco',
 'dominos pizza',
 'dunkin donuts',
 'el pollo loco',
 'farmer boys',
 'fazolis',
 'firehouse subs',
 'five guys',
 '

### Get all chains inspections

In [325]:
inspections = pd.read_pickle('./datasets/cleaned_inspections.pickle')

In [326]:
# Keep only the inspections whose normalized AKA Name appears in the chain list
normalized_aka = inspections['AKA Name'].str.replace("'", "", regex=False).str.lower()
chain_inspections = inspections[normalized_aka.isin(fast_food_chains)].reset_index(drop=True)
chain_inspections

Unnamed: 0,Inspection ID,DBA Name,AKA Name,License #,Facility Type,Risk,Address,Inspection Date,Inspection Type,Results,Violations,Latitude,Longitude,Location,Community Area
0,2072164,STARBUCKS COFFEE #47565,STARBUCK'S,2543156,restaurant,2,150 N RIVERSIDE PLZ,2017-08-09,License Re-Inspection,Pass,,41.885089,-87.638406,"41.88508945576888, -87.63840559417187",Lincoln Square
1,2072016,STARBUCKS COFFEE #47565,STARBUCK'S,2543156,restaurant,2,150 N RIVERSIDE PLZ,2017-08-08,License,Fail,8. SANITIZING RINSE FOR EQUIPMENT AND UTENSILS...,41.885089,-87.638406,"41.88508945576888, -87.63840559417187",Lincoln Square
2,2069847,STARBUCKS COFFEE #47565,STARBUCK'S,2543156,restaurant,2,150 N RIVERSIDE PLZ,2017-07-14,License,Not Ready,,41.885089,-87.638406,"41.88508945576888, -87.63840559417187",Lincoln Square
3,2252796,DUNKIN DONUTS,DUNKIN DONUTS,2630933,restaurant,2,11601 W TOUHY AVE,2019-01-18,Canvass,Pass w/ Conditions,16. FOOD-CONTACT SURFACES: CLEANED & SANITIZED...,42.008536,-87.914428,"42.008536400868735, -87.91442843927048",South Lawndale
4,2229879,DUNKIN DONUTS,DUNKIN DONUTS,2630933,restaurant,2,11601 W TOUHY AVE,2018-10-18,License,Pass w/ Conditions,5. PROCEDURES FOR RESPONDING TO VOMITING AND D...,42.008536,-87.914428,"42.008536400868735, -87.91442843927048",South Lawndale
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11208,545448,MCDONALDS,MCDONALDS,0,restaurant,2,7601 S MICHIGAN AVE,2011-07-26,Complaint,Business Not Located,,41.756381,-87.621352,"41.756380553593104, -87.6213519417889",Near North Side
11209,487855,DUNKIN DONUTS,DUNKIN DONUTS,0,restaurant,3,2640 W DIVERSEY AVE,2010-12-23,Complaint,Business Not Located,,41.932226,-87.694274,"41.9322255020809, -87.69427433079002",Avondale
11210,479084,QUIZNOS,QUIZNOS,1444651,restaurant,2,7222 N HARLEM AVE,2010-12-10,Canvass,Out of Business,,42.012424,-87.806777,"42.012424169692075, -87.80677654908463",Lincoln Park
11211,148211,QUIZNO'S SUB,QUIZNO'S SUB,1800194,restaurant,1,7222 N HARLEM AVE,2010-01-14,Out of Business,Pass,,42.012424,-87.806777,"42.012424169692075, -87.80677654908463",Lincoln Park


We now have all inspections of chain establishments. To analyze the chains, we can add a flag column to the dataset that indicates whether each facility is a chain or not. This will help us analyze the average riskiness of chains and the risk score we will compute for facilities. In the interactive interface, the user will be able to choose whether to include food chains among the recommended establishments.
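
The flag column described above could be sketched as follows. This is only an illustration under stated assumptions: the column name `is_chain` is hypothetical, the small DataFrame and the two-element set are stand-ins for the real `inspections` and `fast_food_chains` objects, and the `Risk` values are made up for the example.

```python
import pandas as pd

# Hypothetical stand-in for `inspections`, with only the columns used here.
inspections = pd.DataFrame({
    'AKA Name': ["STARBUCK'S", 'DUNKIN DONUTS', 'LOCAL DINER', None],
    'Risk': [2, 3, 1, 1],
})

# Small stand-in for the normalized `fast_food_chains` set built earlier.
fast_food_chains = {'starbucks', 'dunkin donuts'}

# Normalize names the same way as the chain list: drop apostrophes, lowercase.
normalized_names = (
    inspections['AKA Name']
    .str.replace("'", "", regex=False)
    .str.lower()
)

# `is_chain` is an assumed column name; rows with a missing name are flagged False.
inspections['is_chain'] = normalized_names.isin(fast_food_chains)

# Average risk level for chains versus other facilities.
mean_risk_by_flag = inspections.groupby('is_chain')['Risk'].mean()
print(mean_risk_by_flag)
```

Keeping the flag on the full dataset, rather than working only with the filtered `chain_inspections`, makes it possible to compare chains with other facilities and to filter on the flag later in the interface.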