# Join Old and New Datasets

In this notebook we show the code used to merge the old and new Yelp datasets. The old dataset comes from Kaggle. From it we kept only the restaurants, and more specifically those that were still open when the dataset was collected. The new dataset was pulled from the Yelp Business API and contains the most recent information for the same restaurants (with some minor mistakes that we will try to find by matching the old and new information).

We have already tried merging the old dataset with the information pulled from the Yelp Search API, but many restaurants were not matched because Yelp does not always return results for restaurants that closed a long time ago. In addition, some restaurants rebranded or moved.

The restaurants that did not match in name and address with our initial effort are contained in the *restaurants_new_notmerged* dataframe and are marked as "found: 0" in the *restaurants_old* dataframe.

In [1]:
import pandas as pd
import math
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Now we will read the 2 dataframes containing the new data pulled from the Yelp Business API. One of these dataframes was already merged successfully with the old data and the other one was not. Both contain more information than the Yelp Search API provides, so we will merge them again and ignore the information from the Yelp Search API.

In [2]:
# Read the 2 dataframes with the new dataset pulled from the Yelp Business API
restaurants_new_notmerged = pd.read_pickle('./yelp_dataset_processed/yelp_df_notmerged_new.pkl')
restaurants_new_merged = pd.read_pickle('./yelp_dataset_processed/yelp_df_merged_new.pkl')

# Read the dataframe with the old Yelp dataset marked as found (1 for merged, 0 for notmerged)
#restaurants_old = pd.read_pickle('./yelp_old/yelp_training_set/yelp_training_open_restaurants.pkl')
restaurants_old = pd.read_pickle('./yelp_dataset_processed/yelp_df_found.pkl')

Let's take a look at the data.

In [3]:
# Dataframe with new info from restaurants that did not match in name and address in our initial effort
restaurants_new_notmerged.head()

Unnamed: 0,categories,coordinates.latitude,coordinates.longitude,display_phone,error.code,error.description,hours,id,image_url,is_claimed,...,location.state,location.zip_code,name,phone,photos,price,rating,review_count,transactions,url
0,"[{'alias': 'mexican', 'title': 'Mexican'}]",33.480195,-112.081313,(602) 266-1423,,,"[{'open': [{'is_overnight': False, 'start': '0...",la-paloma-mexican-food-phoenix,https://s3-media2.fl.yelpcdn.com/bphoto/qUjY7x...,False,...,AZ,85013,La Paloma Mexican Food,16022661423,[https://s3-media2.fl.yelpcdn.com/bphoto/qUjY7...,$,3.0,18.0,[],https://www.yelp.com/biz/la-paloma-mexican-foo...
1,"[{'alias': 'mexican', 'title': 'Mexican'}]",32.945603,-112.725909,(928) 683-6382,,,"[{'open': [{'is_overnight': False, 'start': '0...",sofias-mexican-food-gila-bend,https://s3-media2.fl.yelpcdn.com/bphoto/WddzkO...,False,...,AZ,85337,Sofia's Mexican Food,19286836382,[https://s3-media2.fl.yelpcdn.com/bphoto/Wddzk...,$,4.0,86.0,[],https://www.yelp.com/biz/sofias-mexican-food-g...
2,"[{'alias': 'delis', 'title': 'Delis'}]",33.423805,-111.735786,(480) 641-4170,,,"[{'open': [{'is_overnight': False, 'start': '0...",yee-haws-mesa,https://s3-media2.fl.yelpcdn.com/bphoto/5HfVEs...,False,...,AZ,85205,Yee-haws,14806414170,[https://s3-media2.fl.yelpcdn.com/bphoto/5HfVE...,$,4.5,3.0,[],https://www.yelp.com/biz/yee-haws-mesa?adjust_...
3,"[{'alias': 'chinese', 'title': 'Chinese'}]",33.568398,-112.100121,(602) 678-4293,,,"[{'open': [{'is_overnight': False, 'start': '1...",panda-gourmet-phoenix,https://s3-media3.fl.yelpcdn.com/bphoto/eIsbpn...,True,...,AZ,85021,Panda Gourmet,16026784293,[https://s3-media3.fl.yelpcdn.com/bphoto/eIsbp...,,3.5,5.0,[],https://www.yelp.com/biz/panda-gourmet-phoenix...
4,"[{'alias': 'burgers', 'title': 'Burgers'}, {'a...",33.504062,-111.929431,(480) 945-1928,,,"[{'open': [{'is_overnight': False, 'start': '1...",five-guys-scottsdale-2,https://s3-media2.fl.yelpcdn.com/bphoto/xjJkzA...,True,...,AZ,85251,Five Guys,14809451928,[https://s3-media2.fl.yelpcdn.com/bphoto/xjJkz...,$$,3.5,87.0,[],https://www.yelp.com/biz/five-guys-scottsdale-...


In [4]:
# Dataframe with new info from restaurants that matched in name and address in our initial effort 
# (and therefore business id was known)
restaurants_new_merged.head()

Unnamed: 0,categories,coordinates.latitude,coordinates.longitude,display_phone,hours,id,image_url,is_claimed,is_closed,location.address1,...,location.state,location.zip_code,name,phone,photos,price,rating,review_count,transactions,url
0,"[{'alias': 'bagels', 'title': 'Bagels'}, {'ali...",33.713596,-112.200125,(623) 825-0355,"[{'open': [{'is_overnight': False, 'start': '0...",hot-bagels-and-deli-glendale-az,https://s3-media4.fl.yelpcdn.com/bphoto/efZ-8J...,True,False,6520 W Happy Valley Rd,...,AZ,85310,Hot Bagels & Deli,16238250355,[https://s3-media4.fl.yelpcdn.com/bphoto/efZ-8...,$,3.5,77,[],https://www.yelp.com/biz/hot-bagels-and-deli-g...
1,"[{'alias': 'sandwiches', 'title': 'Sandwiches'...",33.378695,-111.812692,(480) 632-6453,"[{'open': [{'is_overnight': False, 'start': '1...",jersey-mikes-subs-gilbert-2,https://s3-media1.fl.yelpcdn.com/bphoto/BYIdRj...,True,False,891 E Baseline Rd,...,AZ,85233,Jersey Mike's Subs,14806326453,[https://s3-media2.fl.yelpcdn.com/bphoto/eTDf8...,$,3.0,29,[pickup],https://www.yelp.com/biz/jersey-mikes-subs-gil...
2,"[{'alias': 'italian', 'title': 'Italian'}, {'a...",33.61746,-111.92626,(480) 321-8800,"[{'open': [{'is_overnight': False, 'start': '1...",sauce-pizza-and-wine-scottsdale-2,https://s3-media2.fl.yelpcdn.com/bphoto/1lTHQh...,True,False,14418 N Scottsdale Rd,...,AZ,85254,Sauce Pizza and Wine,14803218800,[https://s3-media2.fl.yelpcdn.com/bphoto/1lTHQ...,$$,3.5,163,[pickup],https://www.yelp.com/biz/sauce-pizza-and-wine-...
3,"[{'alias': 'burgers', 'title': 'Burgers'}, {'a...",33.56699,-112.116241,(602) 870-1111,,fuddruckers-phoenix-3,https://s3-media4.fl.yelpcdn.com/bphoto/gcmpOL...,True,False,8941 North Black Canyon Hwy,...,AZ,85021,Fuddruckers,16028701111,[https://s3-media4.fl.yelpcdn.com/bphoto/gcmpO...,$$,3.5,81,[],https://www.yelp.com/biz/fuddruckers-phoenix-3...
4,"[{'alias': 'hotdogs', 'title': 'Fast Food'}, {...",33.581867,-111.881622,(480) 451-1803,,mcdonalds-scottsdale-5,https://s3-media4.fl.yelpcdn.com/bphoto/ofHjsT...,False,False,9251 E Shea Blvd,...,AZ,85258,McDonald's,14804511803,[https://s3-media4.fl.yelpcdn.com/bphoto/ofHjs...,$,2.5,16,[],https://www.yelp.com/biz/mcdonalds-scottsdale-...


In [5]:
# Dataframe with old info (Kaggle dataset)
restaurants_old.head()

Unnamed: 0,business_id,categories,city,full_address,latitude,longitude,name,neighborhoods,open,review_count,stars,state,type,found
0,PzOqRohWw7F7YEPBz6AubA,"[Food, Bagels, Delis, Restaurants]",Glendale,"6520 W Happy Valley Rd\nSte 101\nGlendale Az, ...",33.712797,-112.200264,Hot Bagels & Deli,[],True,14,3.5,AZ,business,1
1,qarobAbxGSHI7ygf1f7a_Q,"[Sandwiches, Restaurants]",Gilbert,"891 E Baseline Rd\nSuite 102\nGilbert, AZ 85233",33.378839,-111.812007,Jersey Mike's Subs,[],True,10,3.5,AZ,business,1
2,gA5CuBxF-0CnOpGnryWJdQ,"[Mexican, Restaurants]",Phoenix,"519 W Thomas Rd\nPhoenix, AZ 85013",33.480105,-112.081361,La Paloma Mexican Food,[],True,5,4.0,AZ,business,0
3,JxVGJ9Nly2FFIs_WpJvkug,"[Pizza, Restaurants]",Scottsdale,"14418 N Scottsdale Rd\nSuite 181\nScottsdale, ...",33.617459,-111.926272,Sauce,[],True,55,4.0,AZ,business,1
4,Jj7bcQ6NDfKoz4TXwvYfMg,"[Burgers, Restaurants]",Phoenix,"8941 N Black Canyon Hwy\nPhoenix, AZ 85021",33.566989,-112.116243,Fuddruckers,[],True,23,4.0,AZ,business,1


Now we will check that the old dataset contains the same number of rows as the two new dataframes. The entries for restaurants that were not found (*found=0*) should match the entries of the *notmerged* dataframe, while the entries for the found restaurants (*found=1*) should match the entries of the *merged* dataframe.

In [6]:
# Check if length of dataframes is the same
len(restaurants_old[restaurants_old['found'] == 0]) == len(restaurants_new_notmerged)

True

In [7]:
len(restaurants_old[restaurants_old['found'] == 1]) == len(restaurants_new_merged)

True

Here, we will try to match the old and new entries that were not merged in our previous effort. Two names match if the first 4 characters of either name (old or new) appear anywhere in the other name. Two addresses match if the first 4 characters of the old address are equal to the first 4 characters of the new address.

If neither the address nor the name matches, we throw out that datapoint and mark the old entry as "not found".

If the name matches but the address doesn't, we mark the restaurant as closed if the new location is closed. Otherwise we throw it out and mark the old entry as "not found".

If the address matches but the name doesn't, we throw it out and mark the old entry as "not found".
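
The two matching criteria described above can be sketched as small helpers. The names `names_match` and `addresses_match` and the parameter `n` are hypothetical and do not appear in this notebook, which performs the same checks inline. The example values are taken from the tables shown earlier.

```python
def names_match(old_name, new_name, n=4):
    """True if the first n characters of either name occur anywhere in the other."""
    return new_name[:n] in old_name or old_name[:n] in new_name


def addresses_match(old_address, new_address, n=4):
    """True if both addresses start with the same n characters."""
    if old_address is None or new_address is None:
        # e.g. a food truck without an address
        return False
    return old_address[:n] == new_address[:n]


# Old name vs. new name of the same business
print(names_match("Five Guys Burger and Fries", "Five Guys"))
print(names_match("Sauce", "Sauce Pizza and Wine"))

# Old full address vs. new location.address1
print(addresses_match("8941 N Black Canyon Hwy\nPhoenix, AZ 85021",
                      "8941 North Black Canyon Hwy"))

# An abbreviation defeats the prefix rule, hence the dictionary of known renames
print(names_match("Kentucky Fried Chicken", "KFC"))
```

A 4-character prefix is a deliberately loose criterion: it tolerates suffix changes such as "Sauce" versus "Sauce Pizza and Wine", but it cannot recognise abbreviations or full rebrands, which is why a manual list of name pairs is still needed.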


In [39]:
# Old names (keys) mapped to the similar names under which the same restaurants
# appear in the new dataset (values).
# These pairs are used as an equivalence criterion when the tables are merged.
changed_names = {
    'O.H.S.O. Eatery + nanoBrewery': 'OHSO Brewery- Arcadia',
    'Kentucky Fried Chicken': 'KFC',
    'Kfc': 'KFC',
    'Pancho Taqueria': 'Taqueria Don Pancho',
    'International House of Pancakes': 'IHOP',
    "PF Chang's China Bistro": "P.F. Chang's",
    "Aj's Cafe": "AJ's",
    'AZ 88': 'AZ88',
    'Azteca Bakery & Mexican Fast Food': 'AZTECA Bakeries & Restaurant',
    'Losbetos Mexican Food': 'Los Betos Mexican Food',
    'U.S. Egg': 'US Egg Restaurant',
    'Mr Hunan': 'Mr. Hunan',
    'NCounter': 'Ncounter',
    'The Café at MIM': 'Musical Instrument Museum',
    'Lox Stock & Bagel': 'Lox, Stock & Bagel',
    'Kiku Sushi': 'KiKu Revolving Sushi',
    'Caffe Sarajevo': 'Old Town Sarajevo',
    "YC'S Mongolian Grill": "YC's Mongolian Grill",
    "Harvest Buffet": "The Buffet",
}

In [9]:
restaurants_all_notmerged = pd.DataFrame()

restaurants_old_notfound = restaurants_old[restaurants_old['found'] == 0].reset_index() # old index will become a new column
restaurants_old['found2'] = 0

# make sure address starts the same and names are similar
for i, restaurant in restaurants_new_notmerged.iterrows():

    if restaurants_old_notfound.loc[i]["name"] in changed_names:
        # if name has changed, take dictionary into account
        if restaurant['location.address1'][:4] == restaurants_old_notfound.loc[i].full_address[:4] and \
        ((restaurant["name"][:4] in restaurants_old_notfound.loc[i]["name"]) or (restaurants_old_notfound.loc[i]["name"][:4] in restaurant["name"])\
        or changed_names[restaurants_old_notfound.loc[i]["name"]] == restaurant["name"]):
            restaurants_old.loc[restaurants_old_notfound.loc[i,'index'],'found2'] = 1
            restaurants_all_notmerged = \
            restaurants_all_notmerged.append(restaurant.to_frame().T.reset_index(drop = True).join(restaurants_old_notfound.loc[i].to_frame().T.reset_index(drop = True),lsuffix='_new'),ignore_index=True)

    else:
        # if name not in dict, just check address and name similarity
        if restaurant['error.code'] != 'BUSINESS_UNAVAILABLE':
            if (restaurant['location.address1'] is not None):
            # If not a foodtruck
                if restaurant['location.address1'][:4] == restaurants_old_notfound.loc[i].full_address[:4] and \
                ((restaurant["name"][:4] in restaurants_old_notfound.loc[i]["name"]) or (restaurants_old_notfound.loc[i]["name"][:4] in restaurant["name"])):
                    restaurants_old.loc[restaurants_old_notfound.loc[i,'index'],'found2'] = 1
                    restaurants_all_notmerged = \
                    restaurants_all_notmerged.append(restaurant.to_frame().T.reset_index(drop = True).join(restaurants_old_notfound.loc[i].to_frame().T.reset_index(drop = True),lsuffix='_new'),ignore_index=True)
            else:
                # if a foodtruck (no address)
                if ((restaurant["name"][:4] in restaurants_old_notfound.loc[i]["name"]) or (restaurants_old_notfound.loc[i]["name"][:4] in restaurant["name"])):
                    restaurants_old.loc[restaurants_old_notfound.loc[i,'index'],'found2'] = 1
                    restaurants_all_notmerged = \
                    restaurants_all_notmerged.append(restaurant.to_frame().T.reset_index(drop = True).join(restaurants_old_notfound.loc[i].to_frame().T.reset_index(drop = True),lsuffix='_new'),ignore_index=True)
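
A note on the loop above: `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the cell only runs on older versions. A minimal sketch of the same pattern for newer versions collects the joined one-row frames in a list and concatenates them once with `pd.concat`. The dataframes `new_df` and `old_df` below are toy stand-ins with hypothetical data, and the match condition is simplified to the name check only.

```python
import pandas as pd

# Toy stand-ins for the new and old dataframes (hypothetical data)
new_df = pd.DataFrame({"name": ["A Cafe", "B Grill"], "rating": [4.0, 3.5]})
old_df = pd.DataFrame({"name": ["A Cafe", "B Grill"], "stars": [4.5, 3.0]})

matched_rows = []  # one-row dataframes, concatenated after the loop
for i, new_row in new_df.iterrows():
    old_row = old_df.loc[i]
    # Simplified name check; the notebook also compares addresses
    if new_row["name"][:4] in old_row["name"] or old_row["name"][:4] in new_row["name"]:
        # Same one-row join as in the loop above: overlapping columns of the
        # new data receive the suffix "_new"
        joined = new_row.to_frame().T.reset_index(drop=True).join(
            old_row.to_frame().T.reset_index(drop=True), lsuffix="_new")
        matched_rows.append(joined)

# pd.concat raises on an empty list, so this assumes at least one match
combined = pd.concat(matched_rows, ignore_index=True)
print(combined)
```

Concatenating once at the end also avoids copying the growing dataframe on every iteration.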


Check if merging was done correctly by looking at the data and the number of lines.

In [18]:
restaurants_all_notmerged.head()

Unnamed: 0,categories_new,coordinates.latitude,coordinates.longitude,display_phone,error.code,error.description,hours,id,image_url,is_claimed,...,latitude,longitude,name,neighborhoods,open,review_count,stars,state,type,found
0,"[{'alias': 'mexican', 'title': 'Mexican'}]",33.4802,-112.081,(602) 266-1423,,,"[{'open': [{'is_overnight': False, 'start': '0...",la-paloma-mexican-food-phoenix,https://s3-media2.fl.yelpcdn.com/bphoto/qUjY7x...,False,...,33.4801,-112.081,La Paloma Mexican Food,[],True,5,4.0,AZ,business,0
1,"[{'alias': 'delis', 'title': 'Delis'}]",33.4238,-111.736,(480) 641-4170,,,"[{'open': [{'is_overnight': False, 'start': '0...",yee-haws-mesa,https://s3-media2.fl.yelpcdn.com/bphoto/5HfVEs...,False,...,33.4238,-111.736,Yee-haws,[],True,3,3.5,AZ,business,0
2,"[{'alias': 'chinese', 'title': 'Chinese'}]",33.5684,-112.1,(602) 678-4293,,,"[{'open': [{'is_overnight': False, 'start': '1...",panda-gourmet-phoenix,https://s3-media3.fl.yelpcdn.com/bphoto/eIsbpn...,True,...,33.5683,-112.1,Panda Gourmet,[],True,3,3.5,AZ,business,0
3,"[{'alias': 'burgers', 'title': 'Burgers'}, {'a...",33.5041,-111.929,(480) 945-1928,,,"[{'open': [{'is_overnight': False, 'start': '1...",five-guys-scottsdale-2,https://s3-media2.fl.yelpcdn.com/bphoto/xjJkzA...,True,...,33.5033,-111.928,Five Guys Burger and Fries,[],True,52,3.5,AZ,business,0
4,"[{'alias': 'sandwiches', 'title': 'Sandwiches'...",33.3356,-111.963,(480) 763-1776,,,"[{'open': [{'is_overnight': False, 'start': '1...",forefathers-gourmet-cheesesteaks-and-fries-tempe,https://s3-media3.fl.yelpcdn.com/bphoto/4VKNBZ...,True,...,33.3352,-111.963,Forefathers Gourmet Cheesesteaks & Fries,[],True,106,4.0,AZ,business,0


In [17]:
len(restaurants_old[restaurants_old['found2'] == 1]) == len(restaurants_all_notmerged)

True

In [11]:
len(restaurants_all_notmerged)

930

In [19]:
# Fraction of closed restaurants among the matched restaurants that were not merged initially
len(restaurants_all_notmerged[restaurants_all_notmerged['is_closed']==1])/len(restaurants_all_notmerged)

0.6827956989247311

Repeat the same process to merge the information from the restaurants that were already merged in our previous effort (with info from the Yelp Search API). The Yelp Business API contains more info and therefore can be utilized better.

In [40]:
restaurants_all_merged = pd.DataFrame()

restaurants_old_found = restaurants_old[restaurants_old['found'] == 1].reset_index() # old index will become a new column

# make sure address starts the same and names are similar
for i, restaurant in restaurants_new_merged.iterrows():

    if restaurants_old_found.loc[i]["name"] in changed_names:
        # if name has changed, take dictionary into account
        if restaurant['location.address1'][:4] == restaurants_old_found.loc[i].full_address[:4] and \
        ((restaurant["name"][:4] in restaurants_old_found.loc[i]["name"]) or (restaurants_old_found.loc[i]["name"][:4] in restaurant["name"])\
        or (changed_names[restaurants_old_found.loc[i]["name"]] in restaurant["name"]) or (restaurant["name"] in changed_names[restaurants_old_found.loc[i]["name"]])):
            restaurants_old.loc[restaurants_old_found.loc[i,'index'],'found2'] = 1
            restaurants_all_merged = \
            restaurants_all_merged.append(restaurant.to_frame().T.reset_index(drop = True).join(restaurants_old_found.loc[i].to_frame().T.reset_index(drop = True),lsuffix='_new'),ignore_index=True)

    else:
        # if name not in dict, just check address and name similarity
        if (restaurant['location.address1'] is not None):
        # If not a foodtruck
            if restaurant['location.address1'][:4] == restaurants_old_found.loc[i].full_address[:4] and \
            ((restaurant["name"][:4] in restaurants_old_found.loc[i]["name"]) or (restaurants_old_found.loc[i]["name"][:4] in restaurant["name"])):
                restaurants_old.loc[restaurants_old_found.loc[i,'index'],'found2'] = 1
                restaurants_all_merged = \
                restaurants_all_merged.append(restaurant.to_frame().T.reset_index(drop = True).join(restaurants_old_found.loc[i].to_frame().T.reset_index(drop = True),lsuffix='_new'),ignore_index=True)
        else:
            # if a foodtruck (no address)
            if ((restaurant["name"][:4] in restaurants_old_found.loc[i]["name"]) or (restaurants_old_found.loc[i]["name"][:4] in restaurant["name"])):
                restaurants_old.loc[restaurants_old_found.loc[i,'index'],'found2'] = 1
                restaurants_all_merged = \
                restaurants_all_merged.append(restaurant.to_frame().T.reset_index(drop = True).join(restaurants_old_found.loc[i].to_frame().T.reset_index(drop = True),lsuffix='_new'),ignore_index=True)


In [41]:
restaurants_all_merged.head()

Unnamed: 0,categories_new,coordinates.latitude,coordinates.longitude,display_phone,hours,id,image_url,is_claimed,is_closed,location.address1,...,longitude,name,neighborhoods,open,review_count,stars,state,type,found,found2
0,"[{'alias': 'bagels', 'title': 'Bagels'}, {'ali...",33.7136,-112.2,(623) 825-0355,"[{'open': [{'is_overnight': False, 'start': '0...",hot-bagels-and-deli-glendale-az,https://s3-media4.fl.yelpcdn.com/bphoto/efZ-8J...,True,False,6520 W Happy Valley Rd,...,-112.2,Hot Bagels & Deli,[],True,14,3.5,AZ,business,1,1
1,"[{'alias': 'sandwiches', 'title': 'Sandwiches'...",33.3787,-111.813,(480) 632-6453,"[{'open': [{'is_overnight': False, 'start': '1...",jersey-mikes-subs-gilbert-2,https://s3-media1.fl.yelpcdn.com/bphoto/BYIdRj...,True,False,891 E Baseline Rd,...,-111.812,Jersey Mike's Subs,[],True,10,3.5,AZ,business,1,1
2,"[{'alias': 'italian', 'title': 'Italian'}, {'a...",33.6175,-111.926,(480) 321-8800,"[{'open': [{'is_overnight': False, 'start': '1...",sauce-pizza-and-wine-scottsdale-2,https://s3-media2.fl.yelpcdn.com/bphoto/1lTHQh...,True,False,14418 N Scottsdale Rd,...,-111.926,Sauce,[],True,55,4.0,AZ,business,1,1
3,"[{'alias': 'burgers', 'title': 'Burgers'}, {'a...",33.567,-112.116,(602) 870-1111,,fuddruckers-phoenix-3,https://s3-media4.fl.yelpcdn.com/bphoto/gcmpOL...,True,False,8941 North Black Canyon Hwy,...,-112.116,Fuddruckers,[],True,23,4.0,AZ,business,1,1
4,"[{'alias': 'hotdogs', 'title': 'Fast Food'}, {...",33.5819,-111.882,(480) 451-1803,,mcdonalds-scottsdale-5,https://s3-media4.fl.yelpcdn.com/bphoto/ofHjsT...,False,False,9251 E Shea Blvd,...,-111.882,McDonald's,[],True,3,2.5,AZ,business,1,1


In [42]:
len(restaurants_all_merged) == len(restaurants_old[restaurants_old['found'] == 1])

False

In [43]:
len(restaurants_all_merged)

2397

In [44]:
len(restaurants_old[restaurants_old['found'] == 1])

2398

In [45]:
# 1 restaurant that was matched before was not matched now
restaurants_old[(restaurants_old['found'] == 1) & (restaurants_old['found2'] == 0)]

Unnamed: 0,business_id,categories,city,full_address,latitude,longitude,name,neighborhoods,open,review_count,stars,state,type,found,found2
803,jfXqQTdxktGQWkELGn-7wA,"[Food, Desserts, American (Traditional), Ameri...",Mesa,Superstition Springs Center\n6613 East Souther...,33.392319,-111.68773,The Cheesecake Factory,[],True,35,3.5,AZ,business,1,0


In [46]:
restaurants_old_found[(restaurants_old_found['index'] == 803)]

Unnamed: 0,index,business_id,categories,city,full_address,latitude,longitude,name,neighborhoods,open,review_count,stars,state,type,found,found2
509,803,jfXqQTdxktGQWkELGn-7wA,"[Food, Desserts, American (Traditional), Ameri...",Mesa,Superstition Springs Center\n6613 East Souther...,33.392319,-111.68773,The Cheesecake Factory,[],True,35,3.5,AZ,business,1,0


In [47]:
restaurants_new_merged.loc[509]

categories                  [{'alias': 'desserts', 'title': 'Desserts'}, {...
coordinates.latitude                                                  33.3914
coordinates.longitude                                                -111.689
display_phone                                                  (480) 641-7300
hours                       [{'open': [{'is_overnight': False, 'start': '1...
id                                                the-cheesecake-factory-mesa
image_url                   https://s3-media3.fl.yelpcdn.com/bphoto/xg3SZX...
is_claimed                                                               True
is_closed                                                               False
location.address1                                         6613 E Southern Ave
location.address2                                                            
location.address3                                                            
location.city                                                   
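
The outputs above suggest why this entry was dropped: the old address appears to start with the name of the shopping center rather than the street number, so the 4-character prefix comparison fails. The sketch below reproduces the comparison with the values visible in the outputs. The old address is truncated in the displayed table, so only its visible prefix is used, and the variable names are hypothetical. The last line shows one possible relaxation (comparing against every line of the old address), which is an assumption and not something this notebook does.

```python
# Visible prefix of the old full_address (truncated in the table above)
old_address_prefix = "Superstition Springs Center\n6613 East Souther"
# location.address1 from the new dataset
new_address = "6613 E Southern Ave"

# The check used in the merge loop: compare the first 4 characters
print(old_address_prefix[:4])
print(new_address[:4])
print(old_address_prefix[:4] == new_address[:4])

# Hypothetical relaxation: accept a match on any line of the old address
matches_any_line = any(line[:4] == new_address[:4]
                       for line in old_address_prefix.split("\n"))
print(matches_any_line)
```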

In [48]:
# Fraction of closed restaurants among the restaurants that were already merged initially
len(restaurants_all_merged[restaurants_all_merged['is_closed']==1])/len(restaurants_all_merged)

0.05590321234876929

In [50]:
restaurants_all = restaurants_all_merged.append(restaurants_all_notmerged.drop(['error.code','error.description'],axis=1),ignore_index=True)

In [51]:
restaurants_all.head()

Unnamed: 0,business_id,categories,categories_new,city,coordinates.latitude,coordinates.longitude,display_phone,found,found2,full_address,...,photos,price,rating,review_count,review_count_new,stars,state,transactions,type,url
0,PzOqRohWw7F7YEPBz6AubA,"[Food, Bagels, Delis, Restaurants]","[{'alias': 'bagels', 'title': 'Bagels'}, {'ali...",Glendale,33.7136,-112.2,(623) 825-0355,1,1,"6520 W Happy Valley Rd\nSte 101\nGlendale Az, ...",...,[https://s3-media4.fl.yelpcdn.com/bphoto/efZ-8...,$,3.5,14,77,3.5,AZ,[],business,https://www.yelp.com/biz/hot-bagels-and-deli-g...
1,qarobAbxGSHI7ygf1f7a_Q,"[Sandwiches, Restaurants]","[{'alias': 'sandwiches', 'title': 'Sandwiches'...",Gilbert,33.3787,-111.813,(480) 632-6453,1,1,"891 E Baseline Rd\nSuite 102\nGilbert, AZ 85233",...,[https://s3-media2.fl.yelpcdn.com/bphoto/eTDf8...,$,3.0,10,29,3.5,AZ,[pickup],business,https://www.yelp.com/biz/jersey-mikes-subs-gil...
2,JxVGJ9Nly2FFIs_WpJvkug,"[Pizza, Restaurants]","[{'alias': 'italian', 'title': 'Italian'}, {'a...",Scottsdale,33.6175,-111.926,(480) 321-8800,1,1,"14418 N Scottsdale Rd\nSuite 181\nScottsdale, ...",...,[https://s3-media2.fl.yelpcdn.com/bphoto/1lTHQ...,$$,3.5,55,163,4.0,AZ,[pickup],business,https://www.yelp.com/biz/sauce-pizza-and-wine-...
3,Jj7bcQ6NDfKoz4TXwvYfMg,"[Burgers, Restaurants]","[{'alias': 'burgers', 'title': 'Burgers'}, {'a...",Phoenix,33.567,-112.116,(602) 870-1111,1,1,"8941 N Black Canyon Hwy\nPhoenix, AZ 85021",...,[https://s3-media4.fl.yelpcdn.com/bphoto/gcmpO...,$$,3.5,23,81,4.0,AZ,[],business,https://www.yelp.com/biz/fuddruckers-phoenix-3...
4,4IAzFJ159GEaIGX1-y6Bmw,"[Burgers, Fast Food, Restaurants]","[{'alias': 'hotdogs', 'title': 'Fast Food'}, {...",Scottsdale,33.5819,-111.882,(480) 451-1803,1,1,"9251 E Shea Blvd\nScottsdale, AZ 85258",...,[https://s3-media4.fl.yelpcdn.com/bphoto/ofHjs...,$,2.5,3,16,2.5,AZ,[],business,https://www.yelp.com/biz/mcdonalds-scottsdale-...


In [52]:
len(restaurants_all)

3327

In [54]:
len(restaurants_all[restaurants_all['is_closed'] == 1])/len(restaurants_all)

0.23113916441238352

In [55]:
restaurants_all.to_pickle('./yelp_dataset_processed/yelp_df_all_final.pkl')

In [56]:
restaurants_all_merged.to_pickle('./yelp_dataset_processed/restaurants_df_all_merged.pkl')
restaurants_all_notmerged.to_pickle('./yelp_dataset_processed/restaurants_df_all_notmerged.pkl')