In [1]:
import pandas as pd

In [2]:
inspections = pd.read_csv("../data/inspections.csv", index_col=0)

In [3]:
inspections.head(3)

Unnamed: 0,CAMIS,DBA,BORO,BUILDING,STREET,ZIPCODE,PHONE,CUISINE DESCRIPTION,INSPECTION DATE,ACTION,VIOLATION CODE,VIOLATION DESCRIPTION,CRITICAL FLAG,SCORE,GRADE,GRADE DATE,RECORD DATE,INSPECTION TYPE
0,41158108,NICK'S GOURMET DELI,QUEENS,7415,DITMARS BOULEVARD,11370,7182788338,American,07/21/2015,Violations were cited in the following area(s).,04L,Evidence of mice or live mice present in facil...,Critical,11.0,,,09/27/2016,Cycle Inspection / Initial Inspection
1,41187577,HANSOL NUTRITION CENTER,QUEENS,16026,NORTHERN BOULEVARD,11358,7188880200,Korean,07/13/2016,Violations were cited in the following area(s).,06A,Personal cleanliness inadequate. Outer garment...,Critical,30.0,,,09/27/2016,Cycle Inspection / Initial Inspection
2,41705988,KURA,MANHATTAN,130,ST MARKS PLACE,10009,2122281010,Japanese,05/08/2013,Violations were cited in the following area(s).,02B,Hot food item not held at or above 140º F.,Critical,27.0,,,09/27/2016,Pre-permit (Operational) / Initial Inspection


In [3]:
inspections['INSPECTION DATE'] = pd.to_datetime(inspections['INSPECTION DATE'])

In [47]:
# random_camis = inspections.sample().iloc[0]['CAMIS']
# inspections[inspections['CAMIS'] == random_camis].sort_values(by='INSPECTION DATE')['INSPECTION DATE']

Get initial inspection date.

In [4]:
inspections_f = inspections.groupby('CAMIS')\
                            .apply(lambda df: inspections[inspections['CAMIS'] == df['CAMIS'].iloc[0]]\
                                               .sort_values(by="INSPECTION DATE")\
                                               .iloc[0]\
                                               .drop('CAMIS')
                                  )\
                            .rename(columns={'INSPECTION DATE': 'INITIAL INSPECTION DATE'})

In [5]:
inspections_f.sample(1)

Unnamed: 0_level_0,DBA,BORO,BUILDING,STREET,ZIPCODE,PHONE,CUISINE DESCRIPTION,INITIAL INSPECTION DATE,ACTION,VIOLATION CODE,VIOLATION DESCRIPTION,CRITICAL FLAG,SCORE,GRADE,GRADE DATE,RECORD DATE,INSPECTION TYPE
CAMIS,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
50001937,TASTY CHICKEN,BROOKLYN,1687,86TH STREET,11214,7182591111,American,08/21/2013,Violations were cited in the following area(s).,15L,Smoke free workplace smoking policy inadequate...,Not Critical,,,,09/27/2016,Smoke-Free Air Act / Initial Inspection


Not sure why, but this operation drops the `DBA` field. No other fields are affected. Maybe it has something to do with the indexing?
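A possible way to sidestep the issue, sketched on a toy frame with made-up values: sort once by date and keep the first row per `CAMIS`. This is an alternative to the `groupby(...).apply(...)` above, not an explanation of why `DBA` went missing. The `toy` frame and its contents are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-in for the inspections table (hypothetical values).
toy = pd.DataFrame({
    "CAMIS": [1, 1, 2],
    "DBA": ["A DELI", "A DELI", "B CAFE"],
    "INSPECTION DATE": pd.to_datetime(["2015-07-21", "2013-05-08", "2016-07-13"]),
})

# Sort by date, keep the earliest row for each CAMIS, then index by CAMIS.
first_rows = (
    toy.sort_values("INSPECTION DATE")
       .drop_duplicates(subset="CAMIS", keep="first")
       .set_index("CAMIS")
       .rename(columns={"INSPECTION DATE": "INITIAL INSPECTION DATE"})
)
print(first_rows)
```

Because every column of the original row is carried along untouched, there should be nothing to reattach afterwards. It also avoids re-filtering the full table once per establishment.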

Let's take the most recent inspection date.

In [5]:
inspections_ff = inspections_f.copy()
inspections_ff['LATEST INSPECTION DATE'] = inspections_ff\
        .apply(lambda srs: inspections[inspections['CAMIS'] == srs.name]\
                                  .sort_values(by='INSPECTION DATE')\
                                  .iloc[-1]\
                                  ['INSPECTION DATE'],
              axis='columns')
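The same column can probably be had without a row-wise `apply`. A minimal sketch, again on a hypothetical toy frame: take the per-`CAMIS` maximum date and let pandas align it on the index.

```python
import pandas as pd

# Toy stand-in for the raw inspections table (hypothetical values).
toy = pd.DataFrame({
    "CAMIS": [1, 1, 2],
    "INSPECTION DATE": pd.to_datetime(["2015-07-21", "2013-05-08", "2016-07-13"]),
})

# One row per CAMIS, mimicking the flattened frame indexed by CAMIS.
flat = pd.DataFrame(index=pd.Index([1, 2], name="CAMIS"))

# The groupby result is a Series indexed by CAMIS, so assignment aligns on index.
latest = toy.groupby("CAMIS")["INSPECTION DATE"].max()
flat["LATEST INSPECTION DATE"] = latest
print(flat)
```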

Rename `INSPECTION TYPE` to `INITIAL INSPECTION TYPE` (this distinguishes new establishments from pre-existing ones, in the context of our dataset).

In [6]:
inspections_ff = inspections_ff.rename(columns={'INSPECTION TYPE': 'INITIAL INSPECTION TYPE'})

Checking the flags:

In [91]:
inspections_ff['INITIAL INSPECTION TYPE'].value_counts()

Cycle Inspection / Initial Inspection                          12537
Pre-permit (Operational) / Initial Inspection                   7630
Pre-permit (Non-operational) / Initial Inspection               2256
Cycle Inspection / Re-inspection                                 877
Administrative Miscellaneous / Initial Inspection                683
Smoke-Free Air Act / Initial Inspection                          281
Trans Fat / Initial Inspection                                   247
Pre-permit (Operational) / Re-inspection                          82
Trans Fat / Compliance Inspection                                 80
Trans Fat / Re-inspection                                         75
Inter-Agency Task Force / Initial Inspection                      53
Calorie Posting / Initial Inspection                              30
Trans Fat / Second Compliance Inspection                          27
Administrative Miscellaneous / Re-inspection                      27
Smoke-Free Air Act / Re-inspection

A lot of them are null.

In [93]:
inspections_ff['INITIAL INSPECTION TYPE'].isnull().astype(int).sum()

1094

Remember, these are new establishments that have not been inspected yet. That the numbers match up here is encouraging.

In [94]:
inspections['INSPECTION TYPE'].isnull().astype(int).sum()

1094

We reattach the lost DBA column.

In [95]:
inspections_ff.head(1)

Unnamed: 0_level_0,DBA,BORO,BUILDING,STREET,ZIPCODE,PHONE,CUISINE DESCRIPTION,INITIAL INSPECTION DATE,ACTION,VIOLATION CODE,VIOLATION DESCRIPTION,CRITICAL FLAG,SCORE,GRADE,GRADE DATE,RECORD DATE,INITIAL INSPECTION TYPE,LATEST INSPECTION DATE
CAMIS,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
30075445,,BRONX,1007.0,MORRIS PARK AVE,10462,7188925000.0,Bakery,2013-06-01,Violations were cited in the following area(s).,16B,The original nutritional fact labels and/or in...,Not Critical,,,,09/24/2016,Trans Fat / Compliance Inspection,2016-02-18


In [7]:
inspections_fff = inspections_ff.copy()
inspections_fff['DBA'] = inspections_fff.apply(lambda srs: 
                                                   inspections[inspections['CAMIS'] == srs.name]\
                                                   .iloc[0]['DBA'],
                                               axis='columns')

Prepend descriptors, to more easily distinguish things down the road.

In [8]:
inspections_fff.columns = ["DOHMH " + col for col in inspections_fff.columns]

In [105]:
inspections_fff.sample()

Unnamed: 0_level_0,DOHMH DBA,DOHMH BORO,DOHMH BUILDING,DOHMH STREET,DOHMH ZIPCODE,DOHMH PHONE,DOHMH CUISINE DESCRIPTION,DOHMH INITIAL INSPECTION DATE,DOHMH ACTION,DOHMH VIOLATION CODE,DOHMH VIOLATION DESCRIPTION,DOHMH CRITICAL FLAG,DOHMH SCORE,DOHMH GRADE,DOHMH GRADE DATE,DOHMH RECORD DATE,DOHMH INITIAL INSPECTION TYPE,DOHMH LATEST INSPECTION DATE
CAMIS,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
40374834,CASA BELLA,MANHATTAN,127.0,MULBERRY STREET,10013,2124314000.0,Italian,2013-08-15,Violations were cited in the following area(s).,06D,"Food contact surface not properly washed, rins...",Critical,6.0,,,09/24/2016,Cycle Inspection / Initial Inspection,2016-04-26


Now we get information from Yelp!

In [9]:
from yelp.client import Client
from yelp.oauth1_authenticator import Oauth1Authenticator
from yelp.errors import BusinessUnavailable
import os
import json

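def import_credentials(filename='../data/yelp_credentials.json'):
    """Load the Yelp API credentials from a JSON file."""
    try:
        # Use a context manager so the file handle is always closed.
        with open(filename) as f:
            return json.load(f)
    except (OSError, ValueError):  # Missing file or malformed JSON.
        raise IOError('This API requires Yelp credentials to work. Did you forget to define them?')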

credentials = import_credentials()

auth = Oauth1Authenticator(
    consumer_key=credentials['consumer_key'],
    consumer_secret=credentials['consumer_secret'],
    token=credentials['token'],
    token_secret=credentials['token_secret']
)

client = Client(auth)

In [10]:
from tqdm import tqdm

In [77]:
import httplib2.HttpLib2Error

ImportError: No module named 'httplib2.error'

In [82]:
from urllib.error import HTTPError
import yelp

def yelp_phone_fetch(num):
    """
    Performs the phone search described in notebook 02 to fetch information on the entity associated with a number.
    """
    if not num:
        return None
    else:
        try:
            business = client.phone_search(num).businesses[0]
            if business and business.location and business.location.coordinate:
                return {
                    'Yelp ID': business.id,
                    'Yelp Is Claimed': business.is_claimed,
                    'Yelp Is Closed': business.is_closed,
                    'Yelp Name': business.name,
                    'Yelp URL': business.url,
                    'Yelp Review Count': business.review_count,
                    'Yelp Categories': business.categories,
                    'Yelp Rating': business.rating,
                    'Yelp Address': business.location.address,
                    'Yelp Neighborhoods': business.location.neighborhoods,
                    'Yelp Latitude': business.location.coordinate.latitude,
                    'Yelp Longitude': business.location.coordinate.longitude,
                       }
            else:  # Partial information, skip.
                return None
        except IndexError:  # Phone search failed!
            return None
        except yelp.errors.InvalidParameter:  # Invalid number!
            return None

After some testing there appears to be a *very* significant miss rate. Remember, we're fuzzy matching phone numbers from one dataset against phone numbers in another, so there is no guarantee that we'll get anything back. I'm interested in what percentage of the time we're successful.

In [12]:
def random_number_dba_tuple():
    random_number, random_dba = inspections_fff.sample().iloc[0][['DOHMH PHONE', 'DOHMH DBA']]
    random_number = str(int(random_number))
    return random_number, random_dba

In [15]:
random_number_dba_tuple()

(2125878880, 'MULTI TASTES DINER')

In [127]:
yelp_phone_fetch(random_number_dba_tuple()[0])

{'Yelp Address': ['41 E 11th St'],
 'Yelp Categories': [Category(name='Japanese', alias='japanese')],
 'Yelp ID': 'ootoya-greenwich-village-new-york',
 'Yelp Is Claimed': True,
 'Yelp Is Closed': False,
 'Yelp Latitude': 40.7333107,
 'Yelp Longitude': -73.9929962,
 'Yelp Name': 'Ootoya Greenwich Village',
 'Yelp Neighborhoods': ['Greenwich Village'],
 'Yelp Rating': 4.0,
 'Yelp Review Count': 135,
 'Yelp URL': 'https://www.yelp.com/biz/ootoya-greenwich-village-new-york?adjust_creative=dkJPGu_jtTyHwsEgZIZN6g&utm_campaign=yelp_api&utm_medium=api_v2_phone_search&utm_source=dkJPGu_jtTyHwsEgZIZN6g'}

Hey I've been there! Nice. More rigorously:

In [133]:
one_hundred_randoms = [random_number_dba_tuple() for i in range(100)]

In [138]:
testset = [yelp_phone_fetch(num) for num, placename in tqdm(one_hundred_randoms)]



In [145]:
sum(entity is None for entity in testset)

19

19 misses in 100 randoms. So we're able to match about 81% of the time. Not bad as far as fuzzy matches go.
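With only 100 draws the hit rate is itself a noisy estimate. A rough sketch of the uncertainty, assuming the draws are independent and using the normal approximation to the binomial (both are my assumptions, not something checked above):

```python
import math

n_trials = 100   # size of the random test set above
n_misses = 19    # number of None results counted above

hit_rate = (n_trials - n_misses) / n_trials

# Standard error of a binomial proportion, then an approximate 95% interval.
std_err = math.sqrt(hit_rate * (1 - hit_rate) / n_trials)
ci_low = hit_rate - 1.96 * std_err
ci_high = hit_rate + 1.96 * std_err

print(f"hit rate {hit_rate:.2f}, approx. 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

Under those assumptions the interval runs from roughly 73% to 89%, so a larger sample would be needed to pin the match rate down more tightly.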

If our data is missing at random (MAR), then we are happy, because this makes for a statistically valid sample of all restaurants in New York City, and we can simply drop the other values.

But there's a high likelihood, in my professional opinion, that the data is missing not at random (MNAR). I expect that it's more likely for a restaurant or eatery in a *poor* neighborhood to be missing the phone number information in Yelp! than one in a richer, whiter community. This is a severe under-reporting issue that will invalidate any conclusions we try to draw from this data with regard to "gentrification" and whatnot.

We'll need to validate the data geospatially. We'll do that next.
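Before going fully geospatial, a cheap first look might be to compare miss rates across a coarse grouping such as borough. The sketch below uses a hypothetical toy frame; the `YELP ID` column and every value in it are assumptions standing in for the joined data, with a null `YELP ID` marking a failed phone match.

```python
import pandas as pd

# Hypothetical post-join data: a null YELP ID marks a failed phone match.
toy = pd.DataFrame({
    "DOHMH BORO": ["BRONX", "BRONX", "MANHATTAN", "MANHATTAN"],
    "YELP ID": [None, "some-place-bronx", "a-place-new-york", "b-place-new-york"],
})

# Boolean flag per row, then the share of misses within each borough.
missed = toy["YELP ID"].isnull()
miss_rate_by_boro = missed.groupby(toy["DOHMH BORO"]).mean()
print(miss_rate_by_boro)
```

Large differences between groups would be a hint (not proof) that the misses are not random. Borough is a very blunt proxy for neighborhood income, so this would only complement a finer-grained check.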

Run the full Yelp! API data through, one slice at a time.

In [150]:
len(inspections_fff)

26074

In [13]:
inspections_fff.to_csv("../data/inspections_flattened_initial.csv", encoding="utf-8")

Oy vey. This will require two days of processing, since the API limit is 25000/day.
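To keep each run well under the daily limit, the rows can be walked in fixed-size slices. A small sketch of the slice arithmetic; `batch_bounds` is a hypothetical helper, and the batch size of 5000 is just the slice size used in the cells that follow.

```python
def batch_bounds(n_rows, batch_size):
    """Yield (start, stop) index pairs that together cover range(n_rows)."""
    for start in range(0, n_rows, batch_size):
        yield start, min(start + batch_size, n_rows)

# 26074 is the row count reported above.
bounds = list(batch_bounds(26074, 5000))
for start, stop in bounds:
    print(start, stop)
```

Each pair can be used directly as `series.iloc[start:stop]`, and saving the results after every slice means a failed run only costs one batch.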

In [15]:
from tqdm import tqdm_notebook

In [14]:
del inspections
del inspections_f
del inspections_ff

In [38]:
inspections_fff['DOHMH PHONE'].iloc[2702:2705]

CAMIS
40788706    7187231080
40788884    __________
40788886    6462307208
Name: DOHMH PHONE, dtype: object

Uh, ok.

In [39]:
raw_yelp_5000 = [yelp_phone_fetch(int(num)) if pd.notnull(num) and str.isdigit(num) else None for num in tqdm_notebook(inspections_fff['DOHMH PHONE'][:5000])]




In [42]:
raw_yelp_5000_2 = [yelp_phone_fetch(int(num)) if pd.notnull(num) and str.isdigit(num) else None for num in tqdm_notebook(inspections_fff['DOHMH PHONE'][5000:10000])]

In [43]:
raw_yelp_5000_3 = [yelp_phone_fetch(int(num)) if pd.notnull(num) and str.isdigit(num) else None for num in tqdm_notebook(inspections_fff['DOHMH PHONE'][10000:15000])]

In [54]:
import pickle

with open("../data/raw_yelp_list.pkl", "wb") as f:
    pickle.dump(raw_yelp_5000 + raw_yelp_5000_2 + raw_yelp_5000_3, f)

In [18]:
import pickle

with open("../data/raw_yelp_list.pkl", "rb") as f:
    raw_yelp_1_to_3 = pickle.load(f)

In [20]:
len(raw_yelp_1_to_3)

15000

In [24]:
raw_yelp_5000_4 = [yelp_phone_fetch(int(num)) if pd.notnull(num) and str.isdigit(str(int(num))) else None for num in tqdm_notebook(inspections_fff['DOHMH PHONE'][15000:20000])]

In [35]:
inspections_fff['DOHMH PHONE'].iloc[20000 + 3270]

1646644665.0

This appears to be an invalid phone number. Passing it to Yelp reliably raises an error, which `yelp_phone_fetch` now catches.

In [84]:
yelp_phone_fetch(int(inspections_fff['DOHMH PHONE'].iloc[20000 + 3270]))

There seem to be multiple such numbers in there.

In [85]:
yelp_phone_fetch(int(inspections_fff['DOHMH PHONE'].iloc[20000 + 4065]))
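The placeholder and invalid values seen above could also be filtered out before any API call is made. A minimal sketch, assuming US numbers only: `clean_phone` is a hypothetical helper, and the leading-digit rule comes from the North American Numbering Plan, under which area codes do not start with 0 or 1. It handles both the string and the float representations that show up in `DOHMH PHONE`.

```python
import math

def clean_phone(value):
    """Return a 10-digit string, or None if value is not a usable US number."""
    if value is None:
        return None
    if isinstance(value, float):
        if math.isnan(value):
            return None
        value = int(value)  # 7187231080.0 -> 7187231080
    digits = str(value).strip()
    if not digits.isdigit() or len(digits) != 10:
        return None  # Placeholders such as '__________' end up here.
    if digits[0] in "01":
        return None  # Area codes cannot start with 0 or 1.
    return digits

for raw in ["7187231080", "__________", 6462307208.0, 1646644665.0, float("nan")]:
    print(repr(raw), "->", clean_phone(raw))
```

This would not replace the `InvalidParameter` handling, since a number can be well formed and still be rejected by the API, but it saves a request for values that cannot possibly match.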

In [87]:
raw_yelp_5000_5 = [yelp_phone_fetch(int(num)) if pd.notnull(num) and str.isdigit(str(int(num))) else None for num in tqdm_notebook(inspections_fff['DOHMH PHONE'][20000:25000])]

In [88]:
raw_yelp_5000_6 = [yelp_phone_fetch(int(num)) if pd.notnull(num) and str.isdigit(str(int(num))) else None for num in tqdm_notebook(inspections_fff['DOHMH PHONE'][25000:])]

In [90]:
all_raws = raw_yelp_1_to_3 + raw_yelp_5000_4 + raw_yelp_5000_5 + raw_yelp_5000_6

In [91]:
len(all_raws)

26064

With the data thus acquired, we now assign it to our dataframe. Before we do that, first we need to clean up what we have.

In [92]:
inspections_fff.sample(1)

Unnamed: 0_level_0,DOHMH DBA,DOHMH BORO,DOHMH BUILDING,DOHMH STREET,DOHMH ZIPCODE,DOHMH PHONE,DOHMH CUISINE DESCRIPTION,DOHMH INITIAL INSPECTION DATE,DOHMH ACTION,DOHMH VIOLATION CODE,DOHMH VIOLATION DESCRIPTION,DOHMH CRITICAL FLAG,DOHMH SCORE,DOHMH GRADE,DOHMH GRADE DATE,DOHMH RECORD DATE,DOHMH INITIAL INSPECTION TYPE,DOHMH LATEST INSPECTION DATE
CAMIS,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
41405535,TWO BOOTS,MANHATTAN,625.0,9 AVENUE,10036,2129563000.0,Pizza,2014-06-24,Violations were cited in the following area(s).,04L,Evidence of mice or live mice present in facil...,Critical,15.0,,,09/27/2016,Cycle Inspection / Initial Inspection,2016-09-13


In [127]:
all_raws[0] = {'Yelp Address': ['1007 Morris Park Avenue'],
 'Yelp Categories': ["Category(name='Bakeries', alias='bakeries')",
  "Category(name='Desserts', alias='desserts')"],
 'Yelp ID': 'morris-pk-bake-shop-bronx',
 'Yelp Is Claimed': False,
 'Yelp Is Closed': False,
 'Yelp Latitude': 40.848445892334,
 'Yelp Longitude': -73.8560791015625,
 'Yelp Name': 'Morris Pk Bake Shop',
 'Yelp Neighborhoods': ['Morris Park'],
 'Yelp Rating': 4.5,
 'Yelp Review Count': 27,
 'Yelp URL': 'https://www.yelp.com/biz/morris-pk-bake-shop-bronx?adjust_creative=dkJPGu_jtTyHwsEgZIZN6g&utm_campaign=yelp_api&utm_medium=api_v2_phone_search&utm_source=dkJPGu_jtTyHwsEgZIZN6g'}

In [131]:
all_raws[0]

{'Yelp Address': ['1007 Morris Park Avenue'],
 'Yelp Categories': ["Category(name='Bakeries', alias='bakeries')",
  "Category(name='Desserts', alias='desserts')"],
 'Yelp ID': 'morris-pk-bake-shop-bronx',
 'Yelp Is Claimed': False,
 'Yelp Is Closed': False,
 'Yelp Latitude': 40.848445892334,
 'Yelp Longitude': -73.8560791015625,
 'Yelp Name': 'Morris Pk Bake Shop',
 'Yelp Neighborhoods': ['Morris Park'],
 'Yelp Rating': 4.5,
 'Yelp Review Count': 27,
 'Yelp URL': 'https://www.yelp.com/biz/morris-pk-bake-shop-bronx?adjust_creative=dkJPGu_jtTyHwsEgZIZN6g&utm_campaign=yelp_api&utm_medium=api_v2_phone_search&utm_source=dkJPGu_jtTyHwsEgZIZN6g'}

In [232]:
import re
import copy
re_cat_word = re.compile(r"name='[-\w\s&()/,']+(?=')")

def safe_group(match_obj):
    try:
        return match_obj.group().replace("name='", "")
    except AttributeError:
        return None

def format_data(yelp_dict):
    if yelp_dict:
        ret = copy.deepcopy(yelp_dict)
        if ret['Yelp Address']:
            ret['Yelp Address'] = ret['Yelp Address'][0]
        cats = ret['Yelp Categories']
        # print(cats)
        if cats:
            try:
                parsed_cats = [re.search(re_cat_word, str(cat)).group().replace("name='", "") for cat in cats]
            except:
                print(cats)
                parsed_cats = []
            # print(parsed_cats)
            ret['Yelp Categories'] = "|".join(parsed_cats)
        neighborhoods = ret['Yelp Neighborhoods']
        if neighborhoods:
            ret['Yelp Neighborhoods'] = "|".join(ret['Yelp Neighborhoods'])
        del ret['Yelp URL']
        return ret
    else:
        return None

In [235]:
formatted_yelp_data = []

for raw in tqdm_notebook(all_raws):
    formatted_yelp_data.append(format_data(raw))

[Category(name="Women's Clothing", alias='womenscloth')]
[Category(name='Department Stores', alias='deptstores'), Category(name="Men's Clothing", alias='menscloth'), Category(name="Women's Clothing", alias='womenscloth')]
[Category(name="Men's Clothing", alias='menscloth'), Category(name="Women's Clothing", alias='womenscloth')]
[Category(name="Men's Clothing", alias='menscloth'), Category(name='Used, Vintage & Consignment', alias='vintage')]
[Category(name='Jewelry', alias='jewelry'), Category(name="Women's Clothing", alias='womenscloth')]
[Category(name='Motorcycle Gear', alias='motorcyclinggear'), Category(name="Men's Clothing", alias='menscloth'), Category(name='Coffee & Tea', alias='coffee')]
[Category(name="Children's Clothing", alias='childcloth'), Category(name='Ice Cream & Frozen Yogurt', alias='icecream'), Category(name='Desserts', alias='desserts')]
[Category(name='Shoe Stores', alias='shoes'), Category(name="Men's Clothing", alias='menscloth'), Category(name="Women's Clothi
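Every failure printed above involves a name containing an apostrophe, which Python's repr wraps in double quotes, so a pattern anchored on `name='` never matches it. A sketch of a pattern that accepts either quote style; `re_cat_name` and `category_name` are hypothetical names, and the sample strings are copied from the reprs shown above.

```python
import re

# Capture the opening quote and require the same character to close the name.
re_cat_name = re.compile(r"""name=(['"])(.*?)\1""")

def category_name(cat):
    """Return the category name from a Category repr, or None if absent."""
    match = re_cat_name.search(str(cat))
    return match.group(2) if match else None

samples = [
    "Category(name='Bakeries', alias='bakeries')",
    "Category(name=\"Women's Clothing\", alias='womenscloth')",
    "Category(name='Used, Vintage & Consignment', alias='vintage')",
]
print("|".join(category_name(s) for s in samples))
```

If the category objects expose the name as an attribute, reading that directly would be simpler than parsing the repr; I have not verified that here, so the regex route is the conservative option.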

In [236]:
formatted_yelp_data[0]

{'Yelp Address': '1007 Morris Park Avenue',
 'Yelp Categories': 'Bakeries|Desserts',
 'Yelp ID': 'morris-pk-bake-shop-bronx',
 'Yelp Is Claimed': False,
 'Yelp Is Closed': False,
 'Yelp Latitude': 40.848445892334,
 'Yelp Longitude': -73.8560791015625,
 'Yelp Name': 'Morris Pk Bake Shop',
 'Yelp Neighborhoods': 'Morris Park',
 'Yelp Rating': 4.5,
 'Yelp Review Count': 27}

Now assign.

In [241]:
inspections_ffff = inspections_fff.copy()

for key in formatted_yelp_data[0].keys():
    inspections_ffff[key] = [s[key] if s else None for s in formatted_yelp_data]
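The assignment above relies on the list being in the same order, and of the same length, as the dataframe. A slightly more defensive sketch, on toy data with hypothetical values: check the lengths explicitly, build a frame from the dicts using the existing index, and join.

```python
import pandas as pd

# Toy stand-ins (hypothetical values) for the flattened frame and the Yelp results.
base = pd.DataFrame(
    {"DOHMH DBA": ["A DELI", "B CAFE"]},
    index=pd.Index([101, 102], name="CAMIS"),
)
formatted = [{"Yelp Name": "A Deli", "Yelp Rating": 4.5}, None]

# Fail loudly if the results do not line up with the rows one-to-one.
if len(formatted) != len(base):
    raise ValueError("Yelp results and dataframe rows differ in length")

# Failed lookups (None) become empty dicts, which turn into all-null rows.
yelp_frame = pd.DataFrame([row or {} for row in formatted], index=base.index)
combined = base.join(yelp_frame)
combined.columns = [col.upper() for col in combined.columns]
print(combined)
```

Collecting the columns from all rows rather than only from the first dict also means the approach would still work if the first lookup had failed.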

In [249]:
inspections_ffff.columns = [col.upper() for col in inspections_ffff.columns]

In [250]:
inspections_ffff

Unnamed: 0_level_0,DOHMH DBA,DOHMH BORO,DOHMH BUILDING,DOHMH STREET,DOHMH ZIPCODE,DOHMH PHONE,DOHMH CUISINE DESCRIPTION,DOHMH INITIAL INSPECTION DATE,DOHMH ACTION,DOHMH VIOLATION CODE,...,YELP LATITUDE,YELP ADDRESS,YELP ID,YELP RATING,YELP REVIEW COUNT,YELP IS CLAIMED,YELP LONGITUDE,YELP NEIGHBORHOODS,YELP CATEGORIES,YELP NAME
CAMIS,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
30075445,MORRIS PARK BAKE SHOP,BRONX,1007.0,MORRIS PARK AVE,10462,7.188925e+09,Bakery,2013-06-01,Violations were cited in the following area(s).,16B,...,40.848446,1007 Morris Park Avenue,morris-pk-bake-shop-bronx,4.5,27.0,False,-73.856079,Morris Park,Bakeries|Desserts,Morris Pk Bake Shop
30112340,WENDY'S,BROOKLYN,469.0,FLATBUSH AVENUE,11225,7.182875e+09,Hamburgers,2014-06-05,Violations were cited in the following area(s).,10B,...,40.662952,469 Flatbush Ave,wendys-brooklyn-4,2.0,23.0,False,-73.961753,Prospect Heights|Prospect Lefferts Gardens,Fast Food|Burgers,Wendy's
30191841,DJ REYNOLDS PUB AND RESTAURANT,MANHATTAN,351.0,WEST 57 STREET,10019,2.122453e+09,Irish,2013-07-22,Violations were cited in the following area(s).,10B,...,40.767750,351 W 57th St,dj-reynolds-new-york-3,3.0,75.0,False,-73.984870,Midtown West|Hell's Kitchen,Irish|Pubs,DJ Reynolds
40356018,RIVIERA CATERER,BROOKLYN,2780.0,STILLWELL AVENUE,11224,7.183723e+09,American,2013-06-05,Violations were cited in the following area(s).,10F,...,40.579521,2780 Stillwell Ave,riviera-caterers-brooklyn,4.0,23.0,True,-73.982430,Coney Island,Caterers,Riviera Caterers
40356151,BRUNOS ON THE BOULEVARD,QUEENS,8825.0,ASTORIA BOULEVARD,11369,7.183351e+09,American,2014-04-11,Violations were cited in the following area(s).,04J,...,40.764240,8825 Astoria Blvd,events-by-brunos-jackson-heights,4.0,15.0,True,-73.880410,East Elmhurst,Caterers|Venues & Event Spaces,Events By Bruno's
40356483,WILKEN'S FINE FOOD,BROOKLYN,7114.0,AVENUE U,11234,7.184444e+09,Delicatessen,2013-07-09,Violations were cited in the following area(s).,02H,...,40.619900,7114 Ave U,wilkens-ii-deli-brooklyn,3.5,25.0,False,-73.906853,Bergen Beach,Delis,Wilkens II Deli
40356731,TASTE THE TROPICS ICE CREAM,BROOKLYN,1839.0,NOSTRAND AVENUE,11226,7.188561e+09,"Ice Cream, Gelato, Yogurt, Ices",2013-07-10,Violations were cited in the following area(s).,10B,...,40.640820,1839 Nostrand Ave,taste-the-tropics-brooklyn,4.5,16.0,False,-73.948151,Flatbush,Ice Cream & Frozen Yogurt,Taste the Tropics
40357217,WILD ASIA,BRONX,2300.0,SOUTHERN BOULEVARD,10460,7.182208e+09,American,2013-06-19,Violations were cited in the following area(s).,10B,...,,,,,,,,,,
40357437,C & C CATERING SERVICE,BROOKLYN,7715.0,18 AVENUE,11214,7.182323e+09,American,2014-04-16,Violations were cited in the following area(s).,06D,...,40.611713,7715 18th Ave,c-and-c-catering-service-brooklyn,3.0,2.0,True,-73.997261,Bensonhurst,Caterers,C & C Catering Service
40359480,1 EAST 66TH STREET KITCHEN,MANHATTAN,1.0,EAST 66 STREET,10065,2.128794e+09,American,2014-05-07,Violations were cited in the following area(s).,10B,...,40.768684,1 E 66th St,wyeth-james-new-york,0.0,0.0,False,-73.969337,Upper East Side,,Wyeth James


In [251]:
inspections_ffff.to_csv("../data/yelp_dohmh_agg_data.csv", encoding='utf-8')