# **NYC Restaurant Inspections Data Processing**

Joe Lardie

Sep 2023

# **Imports**

In [1]:
#Numpy
import numpy as np

#Pandas
import pandas as pd

#Seaborn
import seaborn as sns

# **Data Dictionary**
**CAMIS**	This is a unique identifier for the entity (restaurant); 10-digit integer, static per restaurant permit

**DBA**	This field represents the name (doing business as) of the entity (restaurant); Public business name, may change at discretion of restaurant owner

**BORO**	Borough in which the entity (restaurant) is located; • 1 = MANHATTAN • 2 = BRONX • 3 = BROOKLYN • 4 = QUEENS • 5 = STATEN ISLAND • Missing; NOTE: There may be discrepancies between zip code and listed boro due to differences in an establishment's mailing address and physical location

**BUILDING**  Building number for establishment (restaurant) location

**STREET**	Street name for establishment (restaurant) location

**ZIPCODE**	Zip code of establishment (restaurant) location

**PHONE**	Phone Number; Phone number provided by restaurant owner/manager

**CUISINE DESCRIPTION**	This field describes the entity (restaurant) cuisine; Optional field provided by restaurant owner/manager

**INSPECTION DATE**	This field represents the date of inspection; NOTE: Inspection dates of 1/1/1900 mean an establishment has not yet had an inspection

**ACTION**	This field represents the action associated with each restaurant inspection; • Violations were cited in the following area(s). • No violations were recorded at the time of this inspection. • Establishment re-opened by DOHMH • Establishment re-closed by DOHMH • Establishment Closed by DOHMH. Violations were cited in the following area(s) and those requiring immediate action were addressed. • "Missing" = not yet inspected

**VIOLATION CODE**	Violation code associated with an establishment (restaurant) inspection

**VIOLATION DESCRIPTION**	Violation description associated with an establishment (restaurant) inspection

**CRITICAL FLAG**	Indicator of critical violation; • Critical • Not Critical • Not Applicable; Critical violations are those most likely to contribute to food-borne illness

**SCORE**	Total score for a particular inspection; Scores are updated based on adjudication results

**GRADE**	Grade associated with the inspection; • N = Not Yet Graded • A = Grade A • B = Grade B • C = Grade C • Z = Grade Pending • P = Grade Pending issued on re-opening following an initial inspection that resulted in a closure

**GRADE DATE**	The date when the current grade was issued to the entity (restaurant)

**RECORD DATE**	The date when the extract was run to produce this data set

**INSPECTION TYPE** A combination of the inspection program and the type of inspection performed; See Data Dictionary for full list of expected values
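
Two entries in the dictionary above encode meaning in special values: an INSPECTION DATE of 1/1/1900 marks an establishment that has not yet been inspected, and GRADE uses single-letter codes. The following is a minimal sketch of how those conventions could be handled in pandas. The sample rows, the `GRADE LABEL` column, the `GRADE_LABELS` mapping name, and the `MM/DD/YYYY` date format are assumptions made for illustration, not part of the original notebook.

```python
import pandas as pd

# Hypothetical sample rows that mimic three columns of the dataset
sample = pd.DataFrame({
    "CAMIS": [50000001, 50000002, 50000003],
    "INSPECTION DATE": ["01/01/1900", "03/15/2023", "07/04/2022"],
    "GRADE": ["N", "A", None],
})

# Parse the date strings (MM/DD/YYYY is assumed here)
sample["INSPECTION DATE"] = pd.to_datetime(
    sample["INSPECTION DATE"], format="%m/%d/%Y"
)

# Per the data dictionary, 1/1/1900 means "not yet inspected"
not_inspected = sample["INSPECTION DATE"] == pd.Timestamp("1900-01-01")
inspected = sample[~not_inspected]

# Grade codes as listed in the data dictionary
GRADE_LABELS = {
    "N": "Not Yet Graded",
    "A": "Grade A",
    "B": "Grade B",
    "C": "Grade C",
    "Z": "Grade Pending",
    "P": "Grade Pending (issued on re-opening after a closure)",
}

# Missing grades stay missing after the mapping
sample["GRADE LABEL"] = sample["GRADE"].map(GRADE_LABELS)
```

Keeping the sentinel rows in a separate mask, rather than deleting them right away, leaves the choice of whether to count uninspected establishments to the later analysis.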


# **Loading Data**

In [2]:
# Loading NYC Restaurant Data
rdf = pd.read_csv('https://data.cityofnewyork.us/api/views/43nn-pn8j/rows.csv?accessType=DOWNLOAD')

In [3]:
# Looking at the data
rdf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 242682 entries, 0 to 242681
Data columns (total 27 columns):
 #   Column                 Non-Null Count   Dtype  
---  ------                 --------------   -----  
 0   CAMIS                  242682 non-null  int64  
 1   DBA                    241853 non-null  object 
 2   BORO                   242682 non-null  object 
 3   BUILDING               242284 non-null  object 
 4   STREET                 242668 non-null  object 
 5   ZIPCODE                239855 non-null  float64
 6   PHONE                  242679 non-null  object 
 7   CUISINE DESCRIPTION    239995 non-null  object 
 8   INSPECTION DATE        242682 non-null  object 
 9   ACTION                 239995 non-null  object 
 10  VIOLATION CODE         238662 non-null  object 
 11  VIOLATION DESCRIPTION  238662 non-null  object 
 12  CRITICAL FLAG          242682 non-null  object 
 13  SCORE                  230796 non-null  float64
 14  GRADE                  116932 non-nu
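
The `info()` output lists ZIPCODE as `float64`, which is what pandas falls back to when a numeric-looking column contains missing values. As a hedged sketch, one option is to read the column as text instead. The inline CSV below is made-up sample data standing in for the real download, and the choice of the `"string"` dtype is an assumption about what later analysis would prefer.

```python
import io

import pandas as pd

# Made-up two-row CSV standing in for the real file; the second ZIPCODE is blank
csv_text = """CAMIS,ZIPCODE,BORO
50000001,10001,Manhattan
50000002,,Queens
"""

# Default parsing: the blank value forces ZIPCODE to float64
default_df = pd.read_csv(io.StringIO(csv_text))

# Reading ZIPCODE as text keeps "10001" intact and marks the blank as missing
text_df = pd.read_csv(io.StringIO(csv_text), dtype={"ZIPCODE": "string"})

print(default_df["ZIPCODE"].dtype)
print(text_df["ZIPCODE"].tolist())
```

The same `dtype` argument can be passed to the `pd.read_csv` call that loads the full dataset.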

# **Data Cleaning**

In [4]:
# Dropping irrelevant or incomplete columns
rdf.drop(['Location Point1', 'PHONE', 'BBL', 'BIN', 'NTA', 'Census Tract', 'Community Board', 'RECORD DATE'], axis=1, inplace=True)
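
One way to judge which columns count as incomplete is to compute the share of missing values per column before dropping anything. The sketch below uses a small made-up frame rather than `rdf`. It also shows `errors="ignore"`, which lets a drop succeed even when a listed column is absent; whether that leniency is wanted here is an assumption.

```python
import pandas as pd

# Made-up frame with some gaps; the column names echo the dataset
sample = pd.DataFrame({
    "CAMIS": [1, 2, 3, 4],
    "GRADE": ["A", None, None, "B"],
    "PHONE": ["2125550100", "2125550101", None, "2125550103"],
})

# Fraction of missing values per column, largest first
missing_share = sample.isna().mean().sort_values(ascending=False)
print(missing_share)

# Drop by name; "BBL" is not in this sample, so errors="ignore" skips it
cleaned = sample.drop(columns=["PHONE", "BBL"], errors="ignore")
```

Assigning the result to a new name, instead of using `inplace=True`, keeps the original frame available if the cell is re-run.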

In [5]:
# Looking at remaining columns
rdf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 242682 entries, 0 to 242681
Data columns (total 19 columns):
 #   Column                 Non-Null Count   Dtype  
---  ------                 --------------   -----  
 0   CAMIS                  242682 non-null  int64  
 1   DBA                    241853 non-null  object 
 2   BORO                   242682 non-null  object 
 3   BUILDING               242284 non-null  object 
 4   STREET                 242668 non-null  object 
 5   ZIPCODE                239855 non-null  float64
 6   CUISINE DESCRIPTION    239995 non-null  object 
 7   INSPECTION DATE        242682 non-null  object 
 8   ACTION                 239995 non-null  object 
 9   VIOLATION CODE         238662 non-null  object 
 10  VIOLATION DESCRIPTION  238662 non-null  object 
 11  CRITICAL FLAG          242682 non-null  object 
 12  SCORE                  230796 non-null  float64
 13  GRADE                  116932 non-null  object 
 14  GRADE DATE             107898 non-nu

In [6]:
# Saving dataset to upload into other notebooks
rdf.to_csv('rdf', index=False)
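
Because CSV stores no type information, a notebook that reloads the saved file has to restate any non-default dtypes. The following is a small round-trip sketch under stated assumptions: the temporary directory, the `rdf.csv` file name, and the sample row are illustrative only, and the notebook itself writes to a file named `rdf`.

```python
import os
import tempfile

import pandas as pd

# Made-up single-row frame standing in for the cleaned dataset
sample = pd.DataFrame({"CAMIS": [50000001], "ZIPCODE": ["10001"]})

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name with an explicit .csv extension
    path = os.path.join(tmp, "rdf.csv")
    sample.to_csv(path, index=False)

    # Restate the text dtype on reload; otherwise ZIPCODE is parsed as a number
    reloaded = pd.read_csv(path, dtype={"ZIPCODE": "string"})

print(reloaded.dtypes)
```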