# NY Food review project

This notebook contains testing and scratch work

### Imports

In [131]:
%load_ext autoreload
%autoreload 2

# Import ds libraries
import pandas as pd
import numpy as np
import re

# Import acquire functions
import nick_acquire as a

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


### Data dictionary

|          feature          |                            description                           |
| ------------------------- | ---------------------------------------------------------------- |
| camis                     | Unique identifier for the restaurant                             |
| dba                       | Name of the business                                             |
| boro                      | Borough in which restaurant is located                           |
| building                  | Building number for restaurant                                   |
| street                    | Street name for establishment                                    |
| zipcode                   | Zip code for the establishment                                   |
| phone                     | Phone number for the establishment                               |
| inspection_date           | Date of the inspection of the restaurant                         |
| critical_flag             | Indicator of critical violation                                  |
| record_date               | The date when the extract was run to produce this data set       |
| latitude                  | Latitude                                                         |
| longitude                 | Longitude                                                        |
| community_board           | ?                                                                |
| council_district          | ?                                                                |
| census_tract              | This is a geographic region defined for the purpose of taking a census|
| bin                       | This stands for Building Identification Number.                  |
| bbl                       | Borough, Block, and Lot. It's a unique real state id             |
| nta                       | Neighborhood Tabulation Area                                     |
| cuisine_description       | Describes type of cuisine at the restaurant                      |
| action                    | The actions that is associated with each restaurant inspection   |
| violation_code            | Violation code associated with establishment inspection          |
| violation_description     | Violation description associated with establishment inspection   |
| score                     | Total score for a particular inspection                          |
| grade                     | Grade associated with inspection                                 |
| grade_date                | Date when the current grade was issued                           |
| inspection_type           | Combination of the inspection program and the type of inspection |

This field represents the actions that is associated with each restaurant inspection. ;
• Violations were cited in the following area(s). 
• No violations were recorded at the time of this inspection. • Establishment re-opened by DOHMH 
• Establishment re-closed by DOHMH 
• Establishment Closed by DOHMH.  Violations were cited in the following area(s) and those requiring immediate action were addressed. 
• "Missing" = not yet inspected;

Grade associated with the inspection; 
• N = Not Yet Graded
• A = Grade A
• B = Grade B
• C = Grade C
• Z = Grade Pending
• P = Grade Pending issued on re-opening following an initial inspection that resulted in a closure

In [134]:
ny = a.acquire_ny()
ny.head(3)

Unnamed: 0,camis,dba,boro,building,street,zipcode,phone,inspection_date,critical_flag,record_date,...,bbl,nta,cuisine_description,action,violation_code,violation_description,score,grade,grade_date,inspection_type
0,50106756,UNGARO COAL FIRED PIZZA CAFE,Staten Island,1298,FOREST AVENUE,10302.0,6464690930,1900-01-01T00:00:00.000,Not Applicable,2023-10-26T06:00:14.000,...,5003870000.0,SI07,,,,,,,,
1,50105716,STELLA'S,Brooklyn,559,5 AVENUE,11215.0,4155703174,1900-01-01T00:00:00.000,Not Applicable,2023-10-26T06:00:14.000,...,3010480000.0,BK37,,,,,,,,
2,41168748,DUNKIN,Bronx,880,GARRISON AVENUE,10474.0,7188614171,2022-03-30T00:00:00.000,Not Critical,2023-10-26T06:00:11.000,...,2027390000.0,BX27,Donuts,Violations were cited in the following area(s).,10J,Hand wash sign not posted,13.0,A,2022-03-30T00:00:00.000,Cycle Inspection / Initial Inspection


In [135]:
ny.camis.nunique()

28232

In [181]:
ny.dba.nunique()

22114

In [177]:
ny.isna().sum()

camis                         0
dba                         508
boro                          0
building                    351
street                        6
zipcode                       0
phone                         7
inspection_date               0
critical_flag                 0
record_date                   0
latitude                    257
longitude                   257
community_board            3247
council_district           3251
census_tract               3251
bin                        4237
bbl                         573
nta                        3247
cuisine_description        2305
action                     2305
violation_code             3452
violation_description      3452
score                      9706
grade                    105753
grade_date               114506
inspection_type            2305
dtype: int64

In [136]:
ny_info = pd.DataFrame(ny.isna().sum())
ny_info['dtype'] = ny.dtypes
ny_info = ny_info.rename(columns={0:'nulls'})

In [137]:
ny_info.T

Unnamed: 0,camis,dba,boro,building,street,zipcode,phone,inspection_date,critical_flag,record_date,...,bbl,nta,cuisine_description,action,violation_code,violation_description,score,grade,grade_date,inspection_type
nulls,0,508,0,351,6,2680,7,0,0,0,...,573,3247,2305,2305,3452,3452,9706,105753,114506,2305
dtype,int64,object,object,object,object,float64,object,object,object,object,...,float64,object,object,object,object,object,float64,object,object,object


In [138]:
len(ny)

207929

In [147]:
# Clean phone numbers by removing non-digit characters and dropping nulls
ny.phone = ny.phone.str.replace(' ','')
ny.phone = ny.phone.str.replace('_','')
ny.phone = ny.phone.dropna()

In [148]:
# Clean zipcodes by filling nulls with 0 and then converting to integers
ny.zipcode = ny.zipcode.fillna(0)
ny.zipcode = ny.zipcode.astype(int)

In [165]:
ny.score[ny.grade == 'A'].value_counts()

12.0    16197
13.0    14803
10.0     8024
11.0     7012
9.0      6614
7.0      5062
8.0      2633
0.0      2540
5.0      2413
2.0      2229
4.0      1633
6.0       948
3.0       787
43.0        6
19.0        3
16.0        2
Name: score, dtype: int64

In [178]:
ny.score[ny.grade == 'B'].value_counts()

27.0    1172
23.0    1051
21.0    1049
26.0    1003
25.0     909
19.0     808
18.0     806
16.0     792
24.0     786
22.0     721
20.0     638
14.0     599
17.0     466
15.0     264
9.0       11
0.0        8
30.0       7
33.0       6
42.0       5
12.0       3
7.0        2
5.0        2
Name: score, dtype: int64

In [180]:
ny.score[ny.grade == 'C'].value_counts()

28.0    403
30.0    392
33.0    354
31.0    339
35.0    307
       ... 
20.0      3
16.0      3
14.0      3
12.0      3
6.0       2
Name: score, Length: 78, dtype: int64

In [176]:
ny[ny.score.isna()].inspection_date.value_counts()

1900-01-01T00:00:00.000    2305
2023-04-24T00:00:00.000      38
2023-03-22T00:00:00.000      36
2023-04-26T00:00:00.000      36
2023-02-02T00:00:00.000      35
                           ... 
2023-01-21T00:00:00.000       1
2021-10-19T00:00:00.000       1
2018-03-02T00:00:00.000       1
2018-04-20T00:00:00.000       1
2021-01-12T00:00:00.000       1
Name: inspection_date, Length: 998, dtype: int64

In [161]:
for uniq in ny.grade.unique():
    mini = ny.score[(ny.grade == uniq) & (ny.grade.notna())].mean()
    maxi = ny.score[ny.grade == uniq].max()
    print(f' {uniq}   {mini}   {maxi} ')

 nan   nan   nan 
 A   9.754463656108086   43.0 
 P   8.745644599303136   27.0 
 Z   28.856806959793087   92.0 
 N   28.547915476870358   168.0 
 B   21.400072020165645   42.0 
 C   40.95534897896982   102.0 


In [152]:
ny.grade.value_counts()

A    70908
B    11108
N     8770
C     6563
Z     4253
P      574
Name: grade, dtype: int64

In [151]:
ny[['score', 'grade']]

Unnamed: 0,score,grade
0,,
1,,
2,13.0,A
3,,
4,,
...,...,...
207924,7.0,A
207925,26.0,Z
207926,0.0,A
207927,48.0,C
