# NY Food review project

This notebook contains testing and scratch work for acquiring and cleaning the NYC restaurant inspection data.

### Imports

In [1]:
%load_ext autoreload
%autoreload 2

# Import ds libraries
import pandas as pd
import numpy as np
import re

# Import project acquire and prepare modules
import nick_acquire as a
import nick_prepare as prep

In [2]:
pd.set_option('display.max_columns', None)

### Data dictionary

|          feature          |                            description                           |
| ------------------------- | ---------------------------------------------------------------- |
| camis                     | Unique identifier for the restaurant                             |
| dba                       | Name of the business                                             |
| boro                      | Borough in which restaurant is located                           |
| building                  | Building number for restaurant                                   |
| street                    | Street name for establishment                                    |
| zipcode                   | Zip code for the establishment                                   |
| phone                     | Phone number for the establishment                               |
| inspection_date           | Date of the inspection of the restaurant                         |
| critical_flag             | Indicator of critical violation                                  |
| record_date               | The date when the extract was run to produce this data set       |
| latitude                  | Latitude                                                         |
| longitude                 | Longitude                                                        |
| community_board           | Local government body in the five boroughs of New York City      |
| council_district          | District of a New York City Council member                       |
| census_tract              | Geographic region defined for the purpose of a census            |
| bin                       | Building Identification Number                                   |
| bbl                       | Borough, Block, and Lot; a unique real estate identifier         |
| nta                       | Neighborhood Tabulation Area                                     |
| cuisine_description       | Describes type of cuisine at the restaurant                      |
| action                    | Action associated with each restaurant inspection                |
| violation_code            | Violation code associated with establishment inspection          |
| violation_description     | Violation description associated with establishment inspection   |
| score                     | Total score for a particular inspection                          |
| grade                     | Grade associated with inspection                                 |
| grade_date                | Date when the current grade was issued                           |
| inspection_type           | Combination of the inspection program and the type of inspection |

The action field records the action associated with each restaurant inspection. Possible values:

* Violations were cited in the following area(s).
* No violations were recorded at the time of this inspection.
* Establishment re-opened by DOHMH
* Establishment re-closed by DOHMH
* Establishment Closed by DOHMH.  Violations were cited in the following area(s) and those requiring immediate action were addressed.
* "Missing": not yet inspected

In [3]:
ny = a.acquire_ny()
ny.head(3)

Unnamed: 0,camis,dba,boro,building,street,zipcode,phone,inspection_date,critical_flag,record_date,latitude,longitude,community_board,council_district,census_tract,bin,bbl,nta,cuisine_description,action,violation_code,violation_description,score,grade,grade_date,inspection_type
0,50106756,UNGARO COAL FIRED PIZZA CAFE,Staten Island,1298,FOREST AVENUE,10302.0,6464690930,1900-01-01T00:00:00.000,Not Applicable,2023-10-26T06:00:14.000,40.626371,-74.133111,501.0,50.0,20100.0,5170408.0,5003870000.0,SI07,,,,,,,,
1,50105716,STELLA'S,Brooklyn,559,5 AVENUE,11215.0,4155703174,1900-01-01T00:00:00.000,Not Applicable,2023-10-26T06:00:14.000,40.665416,-73.989417,307.0,39.0,14100.0,3337750.0,3010480000.0,BK37,,,,,,,,
2,41168748,DUNKIN,Bronx,880,GARRISON AVENUE,10474.0,7188614171,2022-03-30T00:00:00.000,Not Critical,2023-10-26T06:00:11.000,40.816753,-73.892364,202.0,17.0,9300.0,2098685.0,2027390000.0,BX27,Donuts,Violations were cited in the following area(s).,10J,Hand wash sign not posted,13.0,A,2022-03-30T00:00:00.000,Cycle Inspection / Initial Inspection


## Unique counts of columns within dataframe

In [4]:
ny.nunique()

camis                    28232
dba                      22114
boro                         6
building                  7479
street                    2403
zipcode                    226
phone                    25633
inspection_date           1678
critical_flag                3
record_date                  3
latitude                 23115
longitude                23115
community_board             69
council_district            51
census_tract              1183
bin                      20020
bbl                      19709
nta                        193
cuisine_description         89
action                       5
violation_code             143
violation_description      221
score                      130
grade                        6
grade_date                1455
inspection_type             31
dtype: int64

In [5]:
ny.camis.nunique()

28232

In [6]:
ny.dba.nunique()

22114

In [7]:
ny.isna().sum()

camis                         0
dba                         508
boro                          0
building                    351
street                        6
zipcode                    2680
phone                         7
inspection_date               0
critical_flag                 0
record_date                   0
latitude                    257
longitude                   257
community_board            3247
council_district           3251
census_tract               3251
bin                        4237
bbl                         573
nta                        3247
cuisine_description        2305
action                     2305
violation_code             3452
violation_description      3452
score                      9706
grade                    105753
grade_date               114506
inspection_type            2305
dtype: int64

In [8]:
ny_info = pd.DataFrame(ny.isna().sum())
ny_info['dtype'] = ny.dtypes
ny_info = ny_info.rename(columns={0:'nulls'})

In [9]:
ny_info.T

Unnamed: 0,camis,dba,boro,building,street,zipcode,phone,inspection_date,critical_flag,record_date,latitude,longitude,community_board,council_district,census_tract,bin,bbl,nta,cuisine_description,action,violation_code,violation_description,score,grade,grade_date,inspection_type
nulls,0,508,0,351,6,2680,7,0,0,0,257,257,3247,3251,3251,4237,573,3247,2305,2305,3452,3452,9706,105753,114506,2305
dtype,int64,object,object,object,object,float64,object,object,object,object,float64,float64,float64,float64,float64,float64,float64,object,object,object,object,object,float64,object,object,object
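
Raw null counts are hard to judge without the row count next to them. The sketch below is one possible way to report the share of nulls per column alongside the dtype. `null_summary` and the small `demo` frame are hypothetical stand-ins and are not part of `nick_prepare`.

```python
import numpy as np
import pandas as pd

def null_summary(df):
    """Return null count, null percentage, and dtype for each column."""
    summary = pd.DataFrame({
        'nulls': df.isna().sum(),
        'pct_null': (df.isna().mean() * 100).round(2),
        'dtype': df.dtypes.astype(str),
    })
    # Put the columns with the most missing values first
    return summary.sort_values('nulls', ascending=False)

# Hypothetical stand-in for the inspection data
demo = pd.DataFrame({
    'camis': [1, 2, 3, 4],
    'grade': ['A', None, None, None],
    'score': [13.0, np.nan, 20.0, 5.0],
})
summary = null_summary(demo)
print(summary)
```

Calling `null_summary(ny)` on the real frame should give the same counts as `ny.isna().sum()`, with the percentage column added.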


In [10]:
len(ny)

207929

In [11]:
ny = a.acquire_ny()

### Drop useless columns

In [12]:
def remove_columns(ny, trash_columns=('bin', 'bbl', 'nta', 'census_tract', 'council_district', 'community_board', 'grade_date')):
    # Use an immutable tuple as the default so the default cannot be changed between calls
    ny = ny.drop(columns=list(trash_columns))
    return ny

In [13]:
trash_columns = ['bin', 'bbl', 'nta', 'census_tract', 'council_district', 'community_board', 'grade_date']
ny = ny.drop(columns=trash_columns)

### Clean phone numbers

In [14]:
def clean_phones(ny):    
    # Remove spaces and underscores from phone numbers, then drop rows with null phones
    ny.phone = ny.phone.str.replace(' ','')
    ny.phone = ny.phone.str.replace('_','')
    ny = ny[ny.phone.notna()]
    return ny

In [15]:
ny = clean_phones(ny)

### Clean zipcodes

In [16]:
def clean_zipcodes(ny):
    # Fill null zipcodes with 0 so the column can be converted to integers
    ny.zipcode = ny.zipcode.fillna(0)
    ny.zipcode = ny.zipcode.astype(int)
    ny = ny[ny.zipcode.notna()]  # No rows are dropped here: nulls were already filled with 0
    return ny

In [17]:
ny = clean_zipcodes(ny)

### Clean streets

In [18]:
def clean_streets(ny):
    # Remove nulls from street
    ny = ny[ny.street.notna()]
    return ny

In [19]:
ny = clean_streets(ny)

### Clean scores

In [20]:
def clean_scores(data):
    ny = data.copy()
    ny = ny[ny.inspection_date != '1900-01-01T00:00:00.000']  # Remove rows with the placeholder date, which marks no inspection done
    
    # Build a new list of scores, setting the score to 0 wherever the action reports no violation
    new_scores = []  # Empty list
    for score,rep in zip(ny.score, ny.action.str.contains('No violation')):  # Loop through 2 iterable values
        if rep == True:  # If no violation, append score 0
            new_scores.append(0)
        else:  # Else keep score the same
            new_scores.append(score)
    ny.score = new_scores
    
    ny = ny[ny.score.notna()]
    return ny

In [21]:
ny = clean_scores(ny)
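
The loop in `clean_scores` can also be written with a boolean mask. This is a minimal sketch of that alternative, not a replacement for the function in `nick_prepare`. The name `clean_scores_vectorized` and the `demo` rows are hypothetical.

```python
import numpy as np
import pandas as pd

def clean_scores_vectorized(data):
    """Same steps as clean_scores, using a mask instead of a Python loop."""
    # Drop rows that carry the placeholder date for "not yet inspected"
    ny = data[data.inspection_date != '1900-01-01T00:00:00.000'].copy()
    # na=False treats a missing action as "not a no-violation row"
    no_violation = ny.action.str.contains('No violation', na=False)
    ny.loc[no_violation, 'score'] = 0
    # Any score that is still null cannot be graded, so drop it
    return ny[ny.score.notna()]

# Hypothetical rows covering each branch
demo = pd.DataFrame({
    'inspection_date': ['1900-01-01T00:00:00.000', '2022-03-30T00:00:00.000',
                        '2021-08-25T00:00:00.000', '2022-04-05T00:00:00.000'],
    'action': [None,
               'Violations were cited in the following area(s).',
               'No violations were recorded at the time of this inspection.',
               'Violations were cited in the following area(s).'],
    'score': [np.nan, 13.0, np.nan, np.nan],
})
cleaned = clean_scores_vectorized(demo)
print(cleaned[['action', 'score']])
```

The first demo row is dropped for its placeholder date, the third gets a score of 0, and the fourth is dropped because its score is still null.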

### Clean actions

In [22]:
def clean_actions(ny):
    # Remove nulls from action
    ny = ny[ny.action.notna()]
    # Rename actions to something more concise
    ny.action = np.where(ny.action == 'Violations were cited in the following area(s).', 'Violations cited', ny.action)
    ny.action = np.where(ny.action == 'Establishment Closed by DOHMH. Violations were cited in the following area(s) and those requiring immediate action were addressed.', 'Closed', ny.action)
    ny.action = np.where(ny.action == 'Establishment re-opened by DOHMH.', 'Re-opened', ny.action)
    ny.action = np.where(ny.action == 'No violations were recorded at the time of this inspection.', 'No violations', ny.action)
    return ny

In [23]:
ny = clean_actions(ny)
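
The data dictionary above shows the "Closed" action with two spaces after "DOHMH.", while the comparison in `clean_actions` uses one. Whether the raw data matches either form exactly is not confirmed in this notebook. The sketch below collapses whitespace before mapping, so both forms resolve to the same label. `ACTION_LABELS` and `shorten_actions` are hypothetical names.

```python
import pandas as pd

# Short labels keyed by the long strings used in clean_actions.
# That the raw data matches these strings exactly is an assumption.
ACTION_LABELS = {
    'Violations were cited in the following area(s).': 'Violations cited',
    'Establishment Closed by DOHMH. Violations were cited in the following area(s) '
    'and those requiring immediate action were addressed.': 'Closed',
    'Establishment re-opened by DOHMH.': 'Re-opened',
    'No violations were recorded at the time of this inspection.': 'No violations',
}

def shorten_actions(actions):
    """Collapse repeated whitespace, then swap known long strings for short labels."""
    normalized = actions.str.replace(r'\s+', ' ', regex=True).str.strip()
    # Exact-match replacement; strings not in the mapping are left unchanged
    return normalized.replace(ACTION_LABELS)

raw = pd.Series([
    'Violations were cited in the following area(s).',
    'Establishment Closed by DOHMH.  Violations were cited in the following area(s) '
    'and those requiring immediate action were addressed.',
    'No violations were recorded at the time of this inspection. ',
    'Establishment re-closed by DOHMH',
])
short = shorten_actions(raw)
print(short.tolist())
```

Values without an entry in the mapping pass through untouched, so an unexpected action string stays visible in `ny.action.unique()` rather than being silently relabeled.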

### Clean grades

In [24]:
def clean_grades(data):
    ny = data.copy()  # Create copy of df
    # Create empty list to hold new values for restaurant
    new_grades = []
    # Use scores to determine grades: 13 or less is A, 14 to 27 is B, above 27 is C
    for grade, score in zip(ny.grade, ny.score):
        if score <= 13:
            new_grades.append('A')
        elif score <= 27:
            new_grades.append('B')
        elif score > 27:
            new_grades.append('C')
    ny.grade = new_grades
    return ny

In [25]:
ny = clean_grades(ny)
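
The same score cutoffs can be expressed as bins. A minimal sketch using `pd.cut`, assuming the scores contain no nulls (which holds after `clean_scores`). `grades_from_scores` is a hypothetical name.

```python
import pandas as pd

def grades_from_scores(scores):
    """Map scores to letter grades with the cutoffs used in clean_grades."""
    # Right-closed bins: (-inf, 13] is A, (13, 27] is B, (27, inf) is C
    bins = [-float('inf'), 13, 27, float('inf')]
    return pd.cut(scores, bins=bins, labels=['A', 'B', 'C']).astype(str)

scores = pd.Series([0, 13, 14, 27, 28, 29])
grades = grades_from_scores(scores)
print(grades.tolist())
```

Because the bins are closed on the right, a score of exactly 13 is an A and exactly 27 is a B, matching the `<=` comparisons in the loop.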

### Clean violation code

In [26]:
def clean_violations(data):
    ny = data.copy()
    # Create empty lists
    new_codes = []
    new_description = []
    # Loop through actions and violations; where the action is 'No violations', record 'No violation' as the code and description
    for action, code, description in zip(ny.action, ny.violation_code, ny.violation_description):
        if action == 'No violations':
            new_codes.append('No violation')
            new_description.append('No violation')
        else:
            new_codes.append(code)
            new_description.append(description)
            
    # Replace df values with new ones
    ny.violation_code = new_codes
    ny.violation_description = new_description

    return ny  # Return data

In [27]:
ny = clean_violations(ny)

### Everything else

In [28]:
ny = ny.dropna()

In [29]:
ny.isna().sum()

camis                    0
dba                      0
boro                     0
building                 0
street                   0
zipcode                  0
phone                    0
inspection_date          0
critical_flag            0
record_date              0
latitude                 0
longitude                0
cuisine_description      0
action                   0
violation_code           0
violation_description    0
score                    0
grade                    0
inspection_type          0
dtype: int64

In [30]:
len(ny)

198289

In [31]:
def clean_ny(ny):
    
    """Run every cleaning function on the NY inspection data and return the cleaned dataframe."""
    
    ny = remove_columns(ny)  # Removes useless columns from ny health inspection data
    
    ny = clean_phones(ny)  # Clean phone numbers
    
    ny = clean_zipcodes(ny)  # Cleans zip codes
    
    ny = clean_streets(ny)  # Cleans streets
    
    ny = clean_scores(ny)  # Cleans scores
    
    ny = clean_actions(ny)  # Cleans actions
    
    ny = clean_grades(ny)  # Cleans grades
    
    ny = clean_violations(ny)  # Cleans violation codes and descriptions
    
    ny = ny.dropna()  # Drops all remaining null values
    
    return ny  # Return clean dataframe

In [32]:
ny = a.acquire_ny()
ny = clean_ny(ny)

In [33]:
len(ny)

198289

In [34]:
ny.isna().sum()

camis                    0
dba                      0
boro                     0
building                 0
street                   0
zipcode                  0
phone                    0
inspection_date          0
critical_flag            0
record_date              0
latitude                 0
longitude                0
cuisine_description      0
action                   0
violation_code           0
violation_description    0
score                    0
grade                    0
inspection_type          0
dtype: int64

In [35]:
ny = a.acquire_ny()
ny = prep.clean_ny(ny)

In [36]:
ny.isna().sum()

camis                    0
dba                      0
boro                     0
building                 0
street                   0
zipcode                  0
phone                    0
inspection_date          0
record_date              0
latitude                 0
longitude                0
cuisine_description      0
action                   0
violation_code           0
violation_description    0
score                    0
grade                    0
dtype: int64

In [37]:
ny.nunique()

camis                    25820
dba                      20593
boro                         5
building                  7249
street                    2256
zipcode                    221
phone                    23872
inspection_date           1613
record_date                  2
latitude                 21802
longitude                21802
cuisine_description         89
action                       4
violation_code              73
violation_description      151
score                      130
grade                        3
dtype: int64

### Combine addresses

In [38]:
def combine_address(ny):
    """This function combines the addresses of the restaurants into one single feature."""
    full_addy = ny.building + ' ' + ny.street + ' ' + ny.zipcode.astype(str)  # Concat the address together
    ny['full_address'] = full_addy  # Create new feature
    ny = ny.drop(columns=['building', 'street', 'zipcode'])  # Drop old features
    return ny  # Return df

In [39]:
ny = combine_address(ny)

In [40]:
ny.head(3)

Unnamed: 0,camis,dba,boro,phone,inspection_date,record_date,latitude,longitude,cuisine_description,action,violation_code,violation_description,score,grade,full_address
2,41168748,DUNKIN,Bronx,7188614171,2022-03-30T00:00:00.000,2023-10-26T06:00:11.000,40.816753,-73.892364,Donuts,Violations cited,10J,Hand wash sign not posted,13,A,880 GARRISON AVENUE 10474
6,41688142,TABLE 87,Brooklyn,9176186100,2017-01-25T00:00:00.000,2023-10-26T06:00:11.000,40.683447,-73.975691,Pizza,No violations,No violation,No violation,0,A,620 ATLANTIC AVENUE 11217
18,50100336,SUBWAY,Brooklyn,7186808808,2022-04-05T00:00:00.000,2023-10-26T06:00:11.000,40.622569,-74.031412,Sandwiches,Violations cited,09B,Thawing procedures improper.,10,A,8711 3 AVENUE 11209


### Aggregate violations

In [41]:
def aggregate_violations(ny):
    """This function aggregates all rows for each inspection of each restaurant into one row by combining the violations."""
    # Create aggregated df indexed by camis and inspection_date
    agg_violations = ny.groupby(['camis','inspection_date']).agg({'violation_code': lambda x: x.tolist(),
                                                                  'violation_description':lambda x: x.tolist()})
    # Create separate df without code & description
    ny2 = ny.drop(columns=['violation_code', 'violation_description']).copy()
    ny2 = ny2.drop_duplicates()  # Drop duplicates
    
    # Create empty lists
    agg_data_code = []
    agg_data_description = []
    
    # Loop through df without duplicates and create lists of aggregated violations
    for cam, date in zip(ny2.camis, ny2.inspection_date):
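        # Select by column name; integer position on a labeled Series is deprecated in recent pandas
        agg_data_code.append(agg_violations.loc[(cam, date), 'violation_code'])
        agg_data_description.append(agg_violations.loc[(cam, date), 'violation_description'])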
        
    # Insert new, aggregated violations into df
    ny2['violation_code'] = agg_data_code
    ny2['violation_description'] = agg_data_description
    
    return ny2

In [42]:
ny_test = aggregate_violations(ny)
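
The lookup loop in `aggregate_violations` can be replaced by a merge on the two grouping keys. This is a sketch of that alternative under the same assumptions as the function above. `aggregate_violations_merge` and the `demo` rows are hypothetical. Unlike the loop version, the merge resets the index.

```python
import pandas as pd

def aggregate_violations_merge(ny):
    """Collect violations per (camis, inspection_date) and merge them back."""
    keys = ['camis', 'inspection_date']
    violation_cols = ['violation_code', 'violation_description']
    # One row per inspection, with each violation column gathered into a list
    agg = ny.groupby(keys)[violation_cols].agg(list).reset_index()
    # Everything except the violation columns, de-duplicated
    base = ny.drop(columns=violation_cols).drop_duplicates()
    return base.merge(agg, on=keys, how='left')

# Hypothetical rows: one inspection with two violations, one with none
demo = pd.DataFrame({
    'camis': [1, 1, 2],
    'inspection_date': ['2022-03-30', '2022-03-30', '2021-08-25'],
    'dba': ['DUNKIN', 'DUNKIN', 'GERTIE'],
    'violation_code': ['10J', '04N', 'No violation'],
    'violation_description': ['Hand wash sign not posted',
                              'Thawing procedures improper.',
                              'No violation'],
})
result = aggregate_violations_merge(demo)
print(result)
```

As with the loop version, an inspection whose remaining columns differ between rows (for example two different actions on the same date) still produces more than one output row, each carrying the full list.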

In [102]:
ny_test.head(3)

Unnamed: 0,camis,dba,boro,phone,inspection_date,record_date,latitude,longitude,cuisine_description,action,score,grade,full_address,violation_code,violation_description
2,41168748,DUNKIN,Bronx,7188614171,2022-03-30T00:00:00.000,2023-10-26T06:00:11.000,40.816753,-73.892364,Donuts,Violations cited,13,A,880 GARRISON AVENUE 10474,"[10J, 04N, 08A]","[Hand wash sign not posted, Filth flies or foo..."
6,41688142,TABLE 87,Brooklyn,9176186100,2017-01-25T00:00:00.000,2023-10-26T06:00:11.000,40.683447,-73.975691,Pizza,No violations,0,A,620 ATLANTIC AVENUE 11217,[No violation],[No violation]
18,50100336,SUBWAY,Brooklyn,7186808808,2022-04-05T00:00:00.000,2023-10-26T06:00:11.000,40.622569,-74.031412,Sandwiches,Violations cited,10,A,8711 3 AVENUE 11209,"[09B, 10F, 06D]","[Thawing procedures improper., Non-food contac..."


In [77]:
ny3 = a.acquire_ny()

In [52]:
def check_data(n, ny, ny2):
    # Compare the nth aggregated row in ny2 against its source rows in ny
    row = ny2.iloc[n]
    c, d = row['camis'], row['inspection_date']
    print(c, d, row['violation_description'])
    print(row['violation_code'])
    return ny[(ny.camis == c) & (ny.inspection_date == d)]

In [113]:
check_data(14, ny3, ny_test)

50116983 2021-11-23T00:00:00.000 ['Thawing procedures improper.', "Live roaches present in facility's food and/or non-food areas.", 'Facility not vermin proof. Harborage or conditions conducive to attracting vermin to the premises and/or allowing vermin to exist.', 'Personal cleanliness inadequate. Outer garment soiled with possible contaminant. Effective hair restraint not worn in an area where food is prepared.', 'No violation', 'Filth flies or food/refuse/sewage-associated (FRSA) flies present in facility’s food and/or non-food areas.  Filth flies include house flies, little house flies, blow flies, bottle flies and flesh flies.  Food/refuse/sewage-associated flies include fruit flies, drain flies and Phorid flies.', 'Plumbing not properly installed or maintained; anti-siphonage or backflow prevention device not provided where required; equipment or floor not properly drained; sewage disposal system in disrepair or not functioning properly.', 'Non-food contact surface improperly con

Unnamed: 0,camis,dba,boro,building,street,zipcode,phone,inspection_date,critical_flag,record_date,latitude,longitude,community_board,council_district,census_tract,bin,bbl,nta,cuisine_description,action,violation_code,violation_description,score,grade,grade_date,inspection_type
65,50116983,EL BASURERO BAR REST.,Queens,3217,STEINWAY ST,11103.0,7185457077,2021-11-23T00:00:00.000,Not Critical,2023-10-26T06:00:11.000,40.758499,-73.919364,401.0,22.0,15700.0,4011015.0,4006760000.0,QN70,Spanish,Violations were cited in the following area(s).,09B,Thawing procedures improper.,20.0,,,Pre-permit (Operational) / Initial Inspection
75847,50116983,EL BASURERO BAR REST.,Queens,3217,STEINWAY ST,11103.0,7185457077,2021-11-23T00:00:00.000,Critical,2023-10-26T06:00:11.000,40.758499,-73.919364,401.0,22.0,15700.0,4011015.0,4006760000.0,QN70,Spanish,Violations were cited in the following area(s).,04M,Live roaches present in facility's food and/or...,20.0,,,Pre-permit (Operational) / Initial Inspection
87141,50116983,EL BASURERO BAR REST.,Queens,3217,STEINWAY ST,11103.0,7185457077,2021-11-23T00:00:00.000,Not Critical,2023-10-26T06:00:11.000,40.758499,-73.919364,401.0,22.0,15700.0,4011015.0,4006760000.0,QN70,Spanish,Violations were cited in the following area(s).,08A,Facility not vermin proof. Harborage or condit...,20.0,,,Pre-permit (Operational) / Initial Inspection
94183,50116983,EL BASURERO BAR REST.,Queens,3217,STEINWAY ST,11103.0,7185457077,2021-11-23T00:00:00.000,Critical,2023-10-26T06:00:11.000,40.758499,-73.919364,401.0,22.0,15700.0,4011015.0,4006760000.0,QN70,Spanish,Violations were cited in the following area(s).,06A,Personal cleanliness inadequate. Outer garment...,20.0,,,Pre-permit (Operational) / Initial Inspection
129058,50116983,EL BASURERO BAR REST.,Queens,3217,STEINWAY ST,11103.0,7185457077,2021-11-23T00:00:00.000,Not Applicable,2023-10-26T06:00:13.000,40.758499,-73.919364,401.0,22.0,15700.0,4011015.0,4006760000.0,QN70,Spanish,No violations were recorded at the time of thi...,,,,,,Administrative Miscellaneous / Initial Inspection
137129,50116983,EL BASURERO BAR REST.,Queens,3217,STEINWAY ST,11103.0,7185457077,2021-11-23T00:00:00.000,Critical,2023-10-26T06:00:11.000,40.758499,-73.919364,401.0,22.0,15700.0,4011015.0,4006760000.0,QN70,Spanish,Violations were cited in the following area(s).,04N,Filth flies or food/refuse/sewage-associated (...,20.0,,,Pre-permit (Operational) / Initial Inspection
150574,50116983,EL BASURERO BAR REST.,Queens,3217,STEINWAY ST,11103.0,7185457077,2021-11-23T00:00:00.000,Not Critical,2023-10-26T06:00:11.000,40.758499,-73.919364,401.0,22.0,15700.0,4011015.0,4006760000.0,QN70,Spanish,Violations were cited in the following area(s).,10B,Plumbing not properly installed or maintained;...,20.0,,,Pre-permit (Operational) / Initial Inspection
199662,50116983,EL BASURERO BAR REST.,Queens,3217,STEINWAY ST,11103.0,7185457077,2021-11-23T00:00:00.000,Not Critical,2023-10-26T06:00:11.000,40.758499,-73.919364,401.0,22.0,15700.0,4011015.0,4006760000.0,QN70,Spanish,Violations were cited in the following area(s).,10F,Non-food contact surface improperly constructe...,20.0,,,Pre-permit (Operational) / Initial Inspection


In [194]:
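def clean_phones(ny):
    # Drop rows with no phone number; copy so the assignments below do not hit a slice
    ny = ny[ny.phone.notna()].copy()

    new_phone = []

    # Strip every non-digit character from each phone number
    for phone in ny.phone:
        new_phone.append(re.sub(r'\D', '', phone))
    ny.phone = new_phone

    # Replace phone numbers with fewer than two digits with '0'
    newer_phones = [phone if len(phone) > 1 else '0' for phone in ny.phone]

    ny.phone = newer_phones

    ny['phone'] = pd.to_numeric(ny['phone'], errors='coerce')
    # Convert it to an integer
    ny['phone'] = ny['phone'].astype(int)
    return ny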

In [134]:
import re

In [247]:
ny = a.acquire_ny()
ny = prep.clean_ny(ny)

ny = aggregate_violations(ny)

In [243]:
check_data(14, ny, ny2)

40401445 2022-01-13T00:00:00.000 ["Live roaches present in facility's food and/or non-food areas.", 'Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit.', 'Facility not vermin proof. Harborage or conditions conducive to attracting vermin to the premises and/or allowing vermin to exist.', 'Hot food item not held at or above 140º F.', 'Sanitized equipment or utensil, including in-use food dispensing utensil, improperly used or stored.', 'Pesticide use not in accordance with label or applicable laws. Prohibited chemical used/stored. Open bait station used.', 'Plumbing not properly installed or maintained; anti-siphonage or backflow prevention device not provided where required; equipment or floor not properly drained; sewage disposal system in disrepair or not functioning 

Unnamed: 0,camis,dba,boro,phone,inspection_date,latitude,longitude,cuisine_description,action,violation_code,violation_description,score,grade,full_address
14,40401445,IHOP,Brooklyn,7182512262,2022-01-13T00:00:00.000,40.625986,-73.917933,Pancakes/Waffles,Violations cited,04M,Live roaches present in facility's food and/or...,29,C,2101 RALPH AVENUE 11234
36777,40401445,IHOP,Brooklyn,7182512262,2022-01-13T00:00:00.000,40.625986,-73.917933,Pancakes/Waffles,Violations cited,10F,Non-food contact surface improperly constructe...,29,C,2101 RALPH AVENUE 11234
66904,40401445,IHOP,Brooklyn,7182512262,2022-01-13T00:00:00.000,40.625986,-73.917933,Pancakes/Waffles,Violations cited,08A,Facility not vermin proof. Harborage or condit...,29,C,2101 RALPH AVENUE 11234
78083,40401445,IHOP,Brooklyn,7182512262,2022-01-13T00:00:00.000,40.625986,-73.917933,Pancakes/Waffles,Violations cited,02B,Hot food item not held at or above 140º F.,29,C,2101 RALPH AVENUE 11234
79454,40401445,IHOP,Brooklyn,7182512262,2022-01-13T00:00:00.000,40.625986,-73.917933,Pancakes/Waffles,Violations cited,06E,"Sanitized equipment or utensil, including in-u...",29,C,2101 RALPH AVENUE 11234
110619,40401445,IHOP,Brooklyn,7182512262,2022-01-13T00:00:00.000,40.625986,-73.917933,Pancakes/Waffles,Violations cited,08C,Pesticide use not in accordance with label or ...,29,C,2101 RALPH AVENUE 11234
118637,40401445,IHOP,Brooklyn,7182512262,2022-01-13T00:00:00.000,40.625986,-73.917933,Pancakes/Waffles,Violations cited,10B,Plumbing not properly installed or maintained;...,29,C,2101 RALPH AVENUE 11234


In [218]:
def clean_code(ny):
    
    clean_codes = []
    clean_description = []
    
    for row1, row2 in zip(ny.violation_code, ny.violation_description):
        
        code_list1 = row1
        code_list2 = row2
        
        if len(code_list1) > 1 and 'No violation' in code_list1:
            code_list1.remove('No violation')
            clean_codes.append(code_list1)
        else:
            clean_codes.append(code_list1)
            
        if len(code_list2) > 1 and 'No violation' in code_list2:
            code_list2.remove('No violation')
            clean_description.append(code_list2)
        else:
            clean_description.append(code_list2)
            
    ny.violation_code = clean_codes
    ny.violation_description = clean_description
    
    return ny

In [244]:
ny = clean_code(ny2)
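
`clean_code` edits each list in place with `.remove`, which also changes the lists held by the frame passed in. A minimal sketch of a helper that returns a new list instead. `drop_placeholder` is a hypothetical name, and unlike `.remove` it drops every occurrence of the placeholder rather than only the first.

```python
def drop_placeholder(items, placeholder='No violation'):
    """Return a new list without the placeholder, unless nothing else is left."""
    if len(items) > 1 and placeholder in items:
        kept = [item for item in items if item != placeholder]
        # If every entry was the placeholder, keep a single one
        return kept or [placeholder]
    return list(items)

mixed = ['09B', 'No violation', '04M']
cleaned_mixed = drop_placeholder(mixed)
only_placeholder = drop_placeholder(['No violation'])
print(cleaned_mixed, only_placeholder)
```

It could be applied per column with something like `ny.violation_code.apply(drop_placeholder)`.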

In [220]:
check_data(14, ny, ny2)

50116983 2021-11-23T00:00:00.000 ['Thawing procedures improper.', "Live roaches present in facility's food and/or non-food areas.", 'Facility not vermin proof. Harborage or conditions conducive to attracting vermin to the premises and/or allowing vermin to exist.', 'Personal cleanliness inadequate. Outer garment soiled with possible contaminant. Effective hair restraint not worn in an area where food is prepared.', 'Filth flies or food/refuse/sewage-associated (FRSA) flies present in facility’s food and/or non-food areas.  Filth flies include house flies, little house flies, blow flies, bottle flies and flesh flies.  Food/refuse/sewage-associated flies include fruit flies, drain flies and Phorid flies.', 'Plumbing not properly installed or maintained; anti-siphonage or backflow prevention device not provided where required; equipment or floor not properly drained; sewage disposal system in disrepair or not functioning properly.', 'Non-food contact surface improperly constructed. Unacce

Unnamed: 0,camis,dba,boro,phone,inspection_date,latitude,longitude,cuisine_description,action,score,grade,full_address,violation_code,violation_description
14,50116983,EL BASURERO BAR REST.,Queens,7185457077,2021-11-23T00:00:00.000,40.758499,-73.919364,Spanish,Violations cited,20,B,3217 STEINWAY ST 11103,"[09B, 04M, 08A, 06A, 04N, 10B, 10F]","[Thawing procedures improper., Live roaches pr..."
122950,50116983,EL BASURERO BAR REST.,Queens,7185457077,2021-11-23T00:00:00.000,40.758499,-73.919364,Spanish,No violations,0,A,3217 STEINWAY ST 11103,"[09B, 04M, 08A, 06A, 04N, 10B, 10F]","[Thawing procedures improper., Live roaches pr..."


In [201]:
def join_lists(ny):    
    joined_code = []
    joined_description = []

    for row in ny.violation_code:
        joined_code.append(' '.join(row))

    for row in ny.violation_description:
        joined_description.append(' '.join(row))
    
    ny.violation_code = joined_code
    ny.violation_description = joined_description
    
    return ny

In [245]:
ny = join_lists(ny)
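
pandas can join list-valued cells directly with `Series.str.join`, which avoids the two explicit loops. A minimal sketch on a hypothetical two-row frame. `join_lists_str` and its `sep` parameter are illustrative names.

```python
import pandas as pd

def join_lists_str(ny, sep=' '):
    """Join the list-valued violation columns into single strings."""
    ny = ny.copy()
    for col in ['violation_code', 'violation_description']:
        ny[col] = ny[col].str.join(sep)
    return ny

# Hypothetical rows in the shape produced by aggregate_violations
demo = pd.DataFrame({
    'violation_code': [['10J', '04N', '08A'], ['No violation']],
    'violation_description': [['Hand wash sign not posted',
                               'Thawing procedures improper.'],
                              ['No violation']],
})
joined = join_lists_str(demo)
print(joined.violation_code.tolist())
```

With a space as the separator, the boundaries between individual descriptions are lost because the descriptions themselves contain spaces. Passing a different `sep` is one option if they need to be split apart again later.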

In [246]:
ny

Unnamed: 0,camis,dba,boro,phone,inspection_date,latitude,longitude,cuisine_description,action,score,grade,full_address,violation_code,violation_description
0,41168748,DUNKIN,Bronx,7188614171,2022-03-30T00:00:00.000,40.816753,-73.892364,Donuts,Violations cited,13,A,880 GARRISON AVENUE 10474,10J 04N 08A,Hand wash sign not posted Filth flies or food/...
1,41688142,TABLE 87,Brooklyn,9176186100,2017-01-25T00:00:00.000,40.683447,-73.975691,Pizza,No violations,0,A,620 ATLANTIC AVENUE 11217,No violation,No violation
2,50100336,SUBWAY,Brooklyn,7186808808,2022-04-05T00:00:00.000,40.622569,-74.031412,Sandwiches,Violations cited,10,A,8711 3 AVENUE 11209,09B 10F 06D,Thawing procedures improper. Non-food contact ...
3,50086686,GERTIE,Brooklyn,7186360902,2021-08-25T00:00:00.000,40.712360,-73.955419,American,No violations,0,A,58 MARCY AVENUE 11211,No violation,No violation
4,50081121,DUNKIN,Brooklyn,7182729090,2022-04-04T00:00:00.000,40.666827,-73.871606,Donuts,Violations cited,24,B,2492 LINDEN BOULEVARD 11208,10J 02G 04N 10F 08A,Hand wash sign not posted Cold food item held ...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
197600,50001313,NEW GOLDEN RESTAURANT,Brooklyn,7184346377,2021-08-11T00:00:00.000,40.634287,-73.949185,Chinese,Violations cited,5,A,1483 FLATBUSH AVENUE 11210,10F,Non-food contact surface improperly constructe...
197610,50064419,BURGER JOINT (INDUSTRY CITY FOOD HALL BUILDING 2),Brooklyn,7188018393,2023-03-24T00:00:00.000,40.656054,-74.007334,Hamburgers,Violations cited,3,A,220 36 STREET 11232,10F,Non-food contact surface or equipment made of ...
197660,50093964,DOWNSTEIN DINING HALL @ NYU,Manhattan,2129953095,2022-07-11T00:00:00.000,40.730917,-73.995364,American,Violations cited,2,A,5 UNIVERSITY PLACE 10003,10F,Non-food contact surface or equipment made of ...
197665,50103447,PLAYA BOWLS,Manhattan,9172315259,2021-08-11T00:00:00.000,40.756918,-73.972066,"Juice, Smoothies, Fruit Salads",Violations cited,2,A,570 LEXINGTON AVENUE 10022,10H,Proper sanitization not provided for utensil w...


In [254]:
ny = prep.final_ny()
ny.head()

Unnamed: 0,camis,dba,boro,phone,inspection_date,latitude,longitude,cuisine_description,action,score,grade,full_address,violation_code,violation_description
0,41168748,DUNKIN,Bronx,7188614171,2022-03-30T00:00:00.000,40.816753,-73.892364,Donuts,Violations cited,13,A,880 GARRISON AVENUE 10474,10J 04N 08A,Hand wash sign not posted Filth flies or food/...
1,41688142,TABLE 87,Brooklyn,9176186100,2017-01-25T00:00:00.000,40.683447,-73.975691,Pizza,No violations,0,A,620 ATLANTIC AVENUE 11217,No violation,No violation
2,50100336,SUBWAY,Brooklyn,7186808808,2022-04-05T00:00:00.000,40.622569,-74.031412,Sandwiches,Violations cited,10,A,8711 3 AVENUE 11209,09B 10F 06D,Thawing procedures improper. Non-food contact ...
3,50086686,GERTIE,Brooklyn,7186360902,2021-08-25T00:00:00.000,40.71236,-73.955419,American,No violations,0,A,58 MARCY AVENUE 11211,No violation,No violation
4,50081121,DUNKIN,Brooklyn,7182729090,2022-04-04T00:00:00.000,40.666827,-73.871606,Donuts,Violations cited,24,B,2492 LINDEN BOULEVARD 11208,10J 02G 04N 10F 08A,Hand wash sign not posted Cold food item held ...


In [255]:
ny

Unnamed: 0,camis,dba,boro,phone,inspection_date,latitude,longitude,cuisine_description,action,score,grade,full_address,violation_code,violation_description
0,41168748,DUNKIN,Bronx,7188614171,2022-03-30T00:00:00.000,40.816753,-73.892364,Donuts,Violations cited,13,A,880 GARRISON AVENUE 10474,10J 04N 08A,Hand wash sign not posted Filth flies or food/...
1,41688142,TABLE 87,Brooklyn,9176186100,2017-01-25T00:00:00.000,40.683447,-73.975691,Pizza,No violations,0,A,620 ATLANTIC AVENUE 11217,No violation,No violation
2,50100336,SUBWAY,Brooklyn,7186808808,2022-04-05T00:00:00.000,40.622569,-74.031412,Sandwiches,Violations cited,10,A,8711 3 AVENUE 11209,09B 10F 06D,Thawing procedures improper. Non-food contact ...
3,50086686,GERTIE,Brooklyn,7186360902,2021-08-25T00:00:00.000,40.712360,-73.955419,American,No violations,0,A,58 MARCY AVENUE 11211,No violation,No violation
4,50081121,DUNKIN,Brooklyn,7182729090,2022-04-04T00:00:00.000,40.666827,-73.871606,Donuts,Violations cited,24,B,2492 LINDEN BOULEVARD 11208,10J 02G 04N 10F 08A,Hand wash sign not posted Cold food item held ...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
197600,50001313,NEW GOLDEN RESTAURANT,Brooklyn,7184346377,2021-08-11T00:00:00.000,40.634287,-73.949185,Chinese,Violations cited,5,A,1483 FLATBUSH AVENUE 11210,10F,Non-food contact surface improperly constructe...
197610,50064419,BURGER JOINT (INDUSTRY CITY FOOD HALL BUILDING 2),Brooklyn,7188018393,2023-03-24T00:00:00.000,40.656054,-74.007334,Hamburgers,Violations cited,3,A,220 36 STREET 11232,10F,Non-food contact surface or equipment made of ...
197660,50093964,DOWNSTEIN DINING HALL @ NYU,Manhattan,2129953095,2022-07-11T00:00:00.000,40.730917,-73.995364,American,Violations cited,2,A,5 UNIVERSITY PLACE 10003,10F,Non-food contact surface or equipment made of ...
197665,50103447,PLAYA BOWLS,Manhattan,9172315259,2021-08-11T00:00:00.000,40.756918,-73.972066,"Juice, Smoothies, Fruit Salads",Violations cited,2,A,570 LEXINGTON AVENUE 10022,10H,Proper sanitization not provided for utensil w...


In [251]:
ny.to_csv('clean_ny.csv', index=False)