## Identify the Most Frequent Causes of Failed Food Inspections in Chicago Using Natural Language Processing (NLP)

* **Project Goal:**

This project aims to promote food safety in Chicago by identifying the 10 most frequent causes of failed food inspections. It also shows the 10 least frequent causes, as an additional look at the patterns behind failed inspections.

* **Data Source:**

Chicago Data Portal - Food inspections of restaurants and other food establishments in Chicago from January 1, 2010 to August 4, 2022. Inspections are performed by staff from the Chicago Department of Public Health’s Food Protection Program.
https://data.cityofchicago.org/Health-Human-Services/Food-Inspections/4ijn-s7e5

In [22]:
import pandas as pd
import numpy as np
import re
import warnings
warnings.filterwarnings('ignore')

In [2]:
# Load The Data
df = pd.read_csv("/Users/azizhazeinita/Project/NLP/Food_Inspections.csv")
df.head()

Unnamed: 0,Inspection ID,DBA Name,AKA Name,License #,Facility Type,Risk,Address,City,State,Zip,Inspection Date,Inspection Type,Results,Violations,Latitude,Longitude,Location
0,2561879,SOLITA,SOLITA,2857386.0,Restaurant,Risk 3 (Low),431 N WELLS ST,CHICAGO,IL,60654.0,08/09/2022,License,Pass,,41.890025,-87.633882,"(-87.63388240330882, 41.89002468212031)"
1,2559417,CHEZ JOEL,CHEZ JOEL,32359.0,Restaurant,Risk 1 (High),1119 W TAYLOR ST,CHICAGO,IL,60607.0,06/15/2022,Non-Inspection,No Entry,,41.869332,-87.655107,"(-87.65510678669794, 41.86933227916697)"
2,2559999,ROJO GUSANO,ROJO GUSANO,1305286.0,Restaurant,Risk 1 (High),3830 W LAWRENCE AVE,CHICAGO,IL,60625.0,06/28/2022,Canvass,Out of Business,,41.96839,-87.724448,"(-87.72444785924317, 41.968390431264375)"
3,2554552,HOST INTERNATIONAL B05,LA TAPENADE (T1-B5),34203.0,Restaurant,Risk 1 (High),11601 W TOUHY AVE,CHICAGO,IL,60666.0,04/20/2022,Canvass,Pass,,42.008536,-87.914428,"(-87.91442843927047, 42.008536400868735)"
4,2553925,LITTLE HARVARD ACADEMY,LITTLE HARVARD ACADEMY,2215573.0,Daycare (2 - 6 Years),Risk 1 (High),2708 W PETERSON AVE,CHICAGO,IL,60659.0,04/05/2022,Canvass,Out of Business,,41.990564,-87.697365,"(-87.69736479750581, 41.99056361928264)"


### Selecting only the records corresponding to failed inspection ("Results" column)

In [4]:
# Filtering Dataframe Only With Value 'Fail' in Column 'Results'
df_fail = df[df.Results=='Fail']
df_fail.head()

Unnamed: 0,Inspection ID,DBA Name,AKA Name,License #,Facility Type,Risk,Address,City,State,Zip,Inspection Date,Inspection Type,Results,Violations,Latitude,Longitude,Location
20,2561623,LA CHOZA MEXICAN GRILL,LA CHOZA MEXICAN GRILL,1840862.0,Restaurant,Risk 1 (High),7022 N CLARK ST,CHICAGO,IL,60626.0,08/02/2022,Canvass,Fail,9. NO BARE HAND CONTACT WITH RTE FOOD OR A PRE...,42.009698,-87.674274,"(-87.67427432484828, 42.009697980488845)"
50,2560320,iO Theater,iO Theater,2850833.0,Restaurant,Risk 3 (Low),1501-1519 N KINGSBURY ST,CHICAGO,IL,60642.0,07/06/2022,License,Fail,,41.908306,-87.6518,"(-87.65179957092685, 41.90830578569197)"
109,2556045,REGGIE'S ON THE BEACH,REGGIE'S ON THE BEACH,2840426.0,Restaurant,Risk 3 (Low),6245 S LAKE SHORE DR,CHICAGO,IL,60637.0,05/19/2022,License,Fail,,41.780804,-87.574616,"(-87.57461558340579, 41.780803899394144)"
135,2555232,HOT SEAFOOD MARKET INC.,HOT SEAFOOD MARKET INC.,2840868.0,Restaurant,Risk 2 (Medium),9454 S COTTAGE GROVE AVE,CHICAGO,IL,60619.0,05/04/2022,License,Fail,2. CITY OF CHICAGO FOOD SERVICE SANITATION CER...,41.722519,-87.604637,"(-87.60463680899302, 41.72251869461573)"
141,2555124,PRIME FISH,PRIME FISH,2463968.0,Restaurant,Risk 1 (High),8022 S HALSTED ST,CHICAGO,IL,60620.0,05/02/2022,Canvass,Fail,"1. PERSON IN CHARGE PRESENT, DEMONSTRATES KNOW...",41.748104,-87.644115,"(-87.64411513161956, 41.74810405559482)"


### Cleaning the data to make sure that there are no NaNs in "Violations" column
* "Violations" column lists the reasons for inspection failure

In [6]:
df_fail.Violations[20]

'9. NO BARE HAND CONTACT WITH RTE FOOD OR A PRE-APPROVED ALTERNATIVE PROCEDURE PROPERLY ALLOWED - Comments: OBSERVED THE EMPLOYEE SLICING FRESH TOMATOES AND ASSEMBLING READY-TO-EAT LETTUCE, ONION, AND TOMATOES FOR A SALAD WITH BARE HANDS. REVIEWED NO BARE HAND CONTACT WITH READY-TO-EAT FOODS. ALL FOODS DISCARDED. PRIORITY VIOLATION 7-38-010 CITATION ISSUED. | 15. FOOD SEPARATED AND PROTECTED - Comments: OBSERVED RAW CHICKEN ON THE COOK LINE CUTTING BOARD DIRECTLY NEXT TO FRESH READY-TO-EAT TOMATOES AND AVOCADO THAT WERE BEING SLICED AND PREPARED. ALSO OBSERVED WERE RAW EGGS STORED DIRECTLY ON TOP OF READY-TO-EAT TORTILLAS, ROASTED PEPPERS, AND RAW ONIONS IN THE TOP REFRIGERATION AREA OF THE COOKS LINE COOLER. ALL FOODS WERE REMOVED AND DISCARDED. PRIORITY VIOLATION 7-38-005 CITATION ISSUED. | 23. PROPER DATE MARKING AND DISPOSITION - Comments: OBSERVED ALL PREPARED TCS READY TO EAT FOODS THROUGHOUT ALL COOLERS NOT LABELED. NO EXPIRATION DATES OR PREP DATES. FOOD SUCH AS COOKED RICE, FL

* The reasons in the "Violations" column are separated by "|". Each reason consists of a regulation code, a regulation description, and comments describing how the regulation was violated.
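
The structure described above can be illustrated with a minimal sketch that splits one raw string into its parts. The helper `parse_violations`, the pattern `VIOLATION_RE`, and the shortened sample text are assumptions made for illustration; they are not part of the original notebook, and the pattern is inferred only from the sample record shown above.

```python
import re

# Assumed layout, inferred from the sample record:
# "<code>. <DESCRIPTION> - Comments: <free text>"
VIOLATION_RE = re.compile(
    r'^\s*(?P<code>\d+)\.\s*(?P<description>.*?)'
    r'(?:\s+-\s+Comments:\s*(?P<comment>.*))?$',
    re.DOTALL,
)

def parse_violations(text):
    """Split one raw 'Violations' string into a list of dicts (hypothetical helper)."""
    parsed = []
    for part in text.split('|'):
        match = VIOLATION_RE.match(part.strip())
        if match:
            parsed.append({
                'code': int(match.group('code')),
                'description': match.group('description').strip(),
                'comment': (match.group('comment') or '').strip(),
            })
    return parsed

# Shortened, made-up sample in the same layout as the real data
sample = (
    '9. NO BARE HAND CONTACT WITH RTE FOOD - Comments: OBSERVED BARE HANDS. '
    '| 15. FOOD SEPARATED AND PROTECTED - Comments: RAW CHICKEN NEXT TO TOMATOES.'
)
violations = parse_violations(sample)
print([v['description'] for v in violations])
```

This sketch assumes that "|" never appears inside a comment; if it does, the split would cut that comment into pieces that fail to match and are silently skipped.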

In [7]:
# Drop rows where 'Violations' is null; .copy() avoids a SettingWithCopyWarning
# when a new column is added to df_clean later
df_clean = df_fail.dropna(subset=['Violations']).copy()
df_clean.head()

Unnamed: 0,Inspection ID,DBA Name,AKA Name,License #,Facility Type,Risk,Address,City,State,Zip,Inspection Date,Inspection Type,Results,Violations,Latitude,Longitude,Location
20,2561623,LA CHOZA MEXICAN GRILL,LA CHOZA MEXICAN GRILL,1840862.0,Restaurant,Risk 1 (High),7022 N CLARK ST,CHICAGO,IL,60626.0,08/02/2022,Canvass,Fail,9. NO BARE HAND CONTACT WITH RTE FOOD OR A PRE...,42.009698,-87.674274,"(-87.67427432484828, 42.009697980488845)"
135,2555232,HOT SEAFOOD MARKET INC.,HOT SEAFOOD MARKET INC.,2840868.0,Restaurant,Risk 2 (Medium),9454 S COTTAGE GROVE AVE,CHICAGO,IL,60619.0,05/04/2022,License,Fail,2. CITY OF CHICAGO FOOD SERVICE SANITATION CER...,41.722519,-87.604637,"(-87.60463680899302, 41.72251869461573)"
141,2555124,PRIME FISH,PRIME FISH,2463968.0,Restaurant,Risk 1 (High),8022 S HALSTED ST,CHICAGO,IL,60620.0,05/02/2022,Canvass,Fail,"1. PERSON IN CHARGE PRESENT, DEMONSTRATES KNOW...",41.748104,-87.644115,"(-87.64411513161956, 41.74810405559482)"
149,2554905,LAS BRISAS DEL MAR,LAS BRISAS DEL MAR,84625.0,Restaurant,Risk 1 (High),3207 W 51ST ST,CHICAGO,IL,60632.0,04/27/2022,Canvass,Fail,"3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E...",41.800706,-87.704099,"(-87.70409936083796, 41.80070644396386)"
167,2554291,MANA,MANA,2712278.0,Grocery Store,Risk 2 (Medium),434 E 71ST ST,CHICAGO,IL,60619.0,04/13/2022,Complaint,Fail,2. CITY OF CHICAGO FOOD SERVICE SANITATION CER...,41.765844,-87.613915,"(-87.61391496525177, 41.76584421474714)"


### Parsing "Violations" column to select <u>*only regulation descriptions*</u> using regular expression

In [8]:
# Show an example of the 'Violations' content; index 20 is used throughout as the sample row
df_clean.Violations[20]

'9. NO BARE HAND CONTACT WITH RTE FOOD OR A PRE-APPROVED ALTERNATIVE PROCEDURE PROPERLY ALLOWED - Comments: OBSERVED THE EMPLOYEE SLICING FRESH TOMATOES AND ASSEMBLING READY-TO-EAT LETTUCE, ONION, AND TOMATOES FOR A SALAD WITH BARE HANDS. REVIEWED NO BARE HAND CONTACT WITH READY-TO-EAT FOODS. ALL FOODS DISCARDED. PRIORITY VIOLATION 7-38-010 CITATION ISSUED. | 15. FOOD SEPARATED AND PROTECTED - Comments: OBSERVED RAW CHICKEN ON THE COOK LINE CUTTING BOARD DIRECTLY NEXT TO FRESH READY-TO-EAT TOMATOES AND AVOCADO THAT WERE BEING SLICED AND PREPARED. ALSO OBSERVED WERE RAW EGGS STORED DIRECTLY ON TOP OF READY-TO-EAT TORTILLAS, ROASTED PEPPERS, AND RAW ONIONS IN THE TOP REFRIGERATION AREA OF THE COOKS LINE COOLER. ALL FOODS WERE REMOVED AND DISCARDED. PRIORITY VIOLATION 7-38-005 CITATION ISSUED. | 23. PROPER DATE MARKING AND DISPOSITION - Comments: OBSERVED ALL PREPARED TCS READY TO EAT FOODS THROUGHOUT ALL COOLERS NOT LABELED. NO EXPIRATION DATES OR PREP DATES. FOOD SUCH AS COOKED RICE, FL

In [9]:
# Remove comments, leaving only the regulation code and description
a = df_clean.Violations.apply(lambda x: re.sub(r'\s?\-\s+Comments.*?\|','',x))
a[20]

'9. NO BARE HAND CONTACT WITH RTE FOOD OR A PRE-APPROVED ALTERNATIVE PROCEDURE PROPERLY ALLOWED 15. FOOD SEPARATED AND PROTECTED 23. PROPER DATE MARKING AND DISPOSITION 25. CONSUMER ADVISORY PROVIDED FOR RAW/UNDERCOOKED FOOD 36. THERMOMETERS PROVIDED & ACCURATE 37. FOOD PROPERLY LABELED; ORIGINAL CONTAINER 41. WIPING CLOTHS: PROPERLY USED & STORED 45. SINGLE-USE/SINGLE-SERVICE ARTICLES: PROPERLY STORED & USED 47. FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED 47. FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED 47. FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED 49. NON-FOOD/FOOD CONTACT SURFACES CLEAN 55. PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN 56. ADEQUATE VENTILATION & LIGHTING; DESIGNATED AREAS USED 57. ALL FOOD EMPLOYEES HAVE FOOD HANDLER TRAINING 58. ALLERGEN TRAINING AS REQUIRED 60. PREVIOUS CORE VIOLATION CORRECTED - Comments: PREVIOUS CORE VIOLATIONS FROM REPORT #251

In [10]:
# Remove the comment left on the last violation (it has no trailing '|')
b = a.apply(lambda x: re.sub(r'\s?\-\s+Comments.*','',x))
b[20]

'9. NO BARE HAND CONTACT WITH RTE FOOD OR A PRE-APPROVED ALTERNATIVE PROCEDURE PROPERLY ALLOWED 15. FOOD SEPARATED AND PROTECTED 23. PROPER DATE MARKING AND DISPOSITION 25. CONSUMER ADVISORY PROVIDED FOR RAW/UNDERCOOKED FOOD 36. THERMOMETERS PROVIDED & ACCURATE 37. FOOD PROPERLY LABELED; ORIGINAL CONTAINER 41. WIPING CLOTHS: PROPERLY USED & STORED 45. SINGLE-USE/SINGLE-SERVICE ARTICLES: PROPERLY STORED & USED 47. FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED 47. FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED 47. FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED 49. NON-FOOD/FOOD CONTACT SURFACES CLEAN 55. PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN 56. ADEQUATE VENTILATION & LIGHTING; DESIGNATED AREAS USED 57. ALL FOOD EMPLOYEES HAVE FOOD HANDLER TRAINING 58. ALLERGEN TRAINING AS REQUIRED 60. PREVIOUS CORE VIOLATION CORRECTED'

In [11]:
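# Remove regulation codes that follow whitespace; the '.' after each code is kept
# and later serves as the separator between descriptions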
c = b.apply(lambda x: re.sub(r'\s\d+','', str(x)))
c[20]

'9. NO BARE HAND CONTACT WITH RTE FOOD OR A PRE-APPROVED ALTERNATIVE PROCEDURE PROPERLY ALLOWED. FOOD SEPARATED AND PROTECTED. PROPER DATE MARKING AND DISPOSITION. CONSUMER ADVISORY PROVIDED FOR RAW/UNDERCOOKED FOOD. THERMOMETERS PROVIDED & ACCURATE. FOOD PROPERLY LABELED; ORIGINAL CONTAINER. WIPING CLOTHS: PROPERLY USED & STORED. SINGLE-USE/SINGLE-SERVICE ARTICLES: PROPERLY STORED & USED. FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED. FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED. FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED. NON-FOOD/FOOD CONTACT SURFACES CLEAN. PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN. ADEQUATE VENTILATION & LIGHTING; DESIGNATED AREAS USED. ALL FOOD EMPLOYEES HAVE FOOD HANDLER TRAINING. ALLERGEN TRAINING AS REQUIRED. PREVIOUS CORE VIOLATION CORRECTED'

In [12]:
# Remove the regulation code left at the start of each string
d = c.apply(lambda x: re.sub(r'^\d+\.\s*', '', x))
d[20]

'NO BARE HAND CONTACT WITH RTE FOOD OR A PRE-APPROVED ALTERNATIVE PROCEDURE PROPERLY ALLOWED. FOOD SEPARATED AND PROTECTED. PROPER DATE MARKING AND DISPOSITION. CONSUMER ADVISORY PROVIDED FOR RAW/UNDERCOOKED FOOD. THERMOMETERS PROVIDED & ACCURATE. FOOD PROPERLY LABELED; ORIGINAL CONTAINER. WIPING CLOTHS: PROPERLY USED & STORED. SINGLE-USE/SINGLE-SERVICE ARTICLES: PROPERLY STORED & USED. FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED. FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED. FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED. NON-FOOD/FOOD CONTACT SURFACES CLEAN. PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN. ADEQUATE VENTILATION & LIGHTING; DESIGNATED AREAS USED. ALL FOOD EMPLOYEES HAVE FOOD HANDLER TRAINING. ALLERGEN TRAINING AS REQUIRED. PREVIOUS CORE VIOLATION CORRECTED'

In [101]:
# The substitutions below make sure that rows other than row 20 are also cleaned

# Remove any remaining '|' separators
e = d.apply(lambda x: re.sub(r'\|', '', str(x)))

# Remove the space before a period at the end of a description
f = e.apply(lambda x: re.sub(r'(\w) (\.)', r'\1\2', str(x)))

# Remove leading whitespace at the start of each string
g = f.apply(lambda x: re.sub(r'^\s+', '', str(x)))

# Remove the space after each period so the text can be split on '.'
desc = g.apply(lambda x: re.sub(r'(\.) (\w)', r'\1\2', str(x)))
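
To check the cleaning steps on a small input, the main substitutions can be chained in one function. This is only a sketch: the name `extract_descriptions` and the shortened sample string are hypothetical, and the leading-code pattern is a simplified form assumed to be equivalent for strings that start with `<code>. `.

```python
import re

def extract_descriptions(text):
    """Apply the cleaning substitutions in order and return a list of descriptions."""
    text = re.sub(r'\s?-\s+Comments.*?\|', '', text)   # comments that end at a '|'
    text = re.sub(r'\s?-\s+Comments.*', '', text)      # comment on the last violation
    text = re.sub(r'\s\d+', '', text)                  # codes after whitespace, '.' kept
    text = re.sub(r'^\d+\.\s*', '', text)              # code at the start of the string
    text = re.sub(r'(\.) (\w)', r'\1\2', text)         # space after each period
    return text.split('.')

# Shortened, made-up sample in the same layout as the real data
sample = '9. NO BARE HAND CONTACT - Comments: FOO. | 15. FOOD SEPARATED - Comments: BAR.'
descriptions = extract_descriptions(sample)
print(descriptions)
```

Because the pattern `\s\d+` removes any whitespace followed by digits, and the final split is on every period, a description that itself contains a number or a period would be altered or split. Whether such descriptions exist in the dataset is not established here.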

In [23]:
# Store Description-Only Into DataFrame
df_clean['Description'] = desc
df_clean.head()

Unnamed: 0,Inspection ID,DBA Name,AKA Name,License #,Facility Type,Risk,Address,City,State,Zip,Inspection Date,Inspection Type,Results,Violations,Latitude,Longitude,Location,Description
20,2561623,LA CHOZA MEXICAN GRILL,LA CHOZA MEXICAN GRILL,1840862.0,Restaurant,Risk 1 (High),7022 N CLARK ST,CHICAGO,IL,60626.0,08/02/2022,Canvass,Fail,9. NO BARE HAND CONTACT WITH RTE FOOD OR A PRE...,42.009698,-87.674274,"(-87.67427432484828, 42.009697980488845)",NO BARE HAND CONTACT WITH RTE FOOD OR A PRE-AP...
135,2555232,HOT SEAFOOD MARKET INC.,HOT SEAFOOD MARKET INC.,2840868.0,Restaurant,Risk 2 (Medium),9454 S COTTAGE GROVE AVE,CHICAGO,IL,60619.0,05/04/2022,License,Fail,2. CITY OF CHICAGO FOOD SERVICE SANITATION CER...,41.722519,-87.604637,"(-87.60463680899302, 41.72251869461573)",CITY OF CHICAGO FOOD SERVICE SANITATION CERTIF...
141,2555124,PRIME FISH,PRIME FISH,2463968.0,Restaurant,Risk 1 (High),8022 S HALSTED ST,CHICAGO,IL,60620.0,05/02/2022,Canvass,Fail,"1. PERSON IN CHARGE PRESENT, DEMONSTRATES KNOW...",41.748104,-87.644115,"(-87.64411513161956, 41.74810405559482)","PERSON IN CHARGE PRESENT, DEMONSTRATES KNOWLED..."
149,2554905,LAS BRISAS DEL MAR,LAS BRISAS DEL MAR,84625.0,Restaurant,Risk 1 (High),3207 W 51ST ST,CHICAGO,IL,60632.0,04/27/2022,Canvass,Fail,"3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E...",41.800706,-87.704099,"(-87.70409936083796, 41.80070644396386)","MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL EMPL..."
167,2554291,MANA,MANA,2712278.0,Grocery Store,Risk 2 (Medium),434 E 71ST ST,CHICAGO,IL,60619.0,04/13/2022,Complaint,Fail,2. CITY OF CHICAGO FOOD SERVICE SANITATION CER...,41.765844,-87.613915,"(-87.61391496525177, 41.76584421474714)",CITY OF CHICAGO FOOD SERVICE SANITATION CERTIF...


### Counting how many times each regulation description occurred in the table

In [16]:
from collections import Counter

In [17]:
# Just to Check The Initial Paragraph
df_clean['Description'].iloc[0]

'NO BARE HAND CONTACT WITH RTE FOOD OR A PRE-APPROVED ALTERNATIVE PROCEDURE PROPERLY ALLOWED.FOOD SEPARATED AND PROTECTED.PROPER DATE MARKING AND DISPOSITION.CONSUMER ADVISORY PROVIDED FOR RAW/UNDERCOOKED FOOD.THERMOMETERS PROVIDED & ACCURATE.FOOD PROPERLY LABELED; ORIGINAL CONTAINER.WIPING CLOTHS: PROPERLY USED & STORED.SINGLE-USE/SINGLE-SERVICE ARTICLES: PROPERLY STORED & USED.FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED.FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED.FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED.NON-FOOD/FOOD CONTACT SURFACES CLEAN.PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN.ADEQUATE VENTILATION & LIGHTING; DESIGNATED AREAS USED.ALL FOOD EMPLOYEES HAVE FOOD HANDLER TRAINING.ALLERGEN TRAINING AS REQUIRED.PREVIOUS CORE VIOLATION CORRECTED'

In [18]:
# Split The Description in Each Row
split = df_clean['Description'].str.split(pat='.')
split[20]

['NO BARE HAND CONTACT WITH RTE FOOD OR A PRE-APPROVED ALTERNATIVE PROCEDURE PROPERLY ALLOWED',
 'FOOD SEPARATED AND PROTECTED',
 'PROPER DATE MARKING AND DISPOSITION',
 'CONSUMER ADVISORY PROVIDED FOR RAW/UNDERCOOKED FOOD',
 'THERMOMETERS PROVIDED & ACCURATE',
 'FOOD PROPERLY LABELED; ORIGINAL CONTAINER',
 'WIPING CLOTHS: PROPERLY USED & STORED',
 'SINGLE-USE/SINGLE-SERVICE ARTICLES: PROPERLY STORED & USED',
 'FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED',
 'FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED',
 'FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED',
 'NON-FOOD/FOOD CONTACT SURFACES CLEAN',
 'PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN',
 'ADEQUATE VENTILATION & LIGHTING; DESIGNATED AREAS USED',
 'ALL FOOD EMPLOYEES HAVE FOOD HANDLER TRAINING',
 'ALLERGEN TRAINING AS REQUIRED',
 'PREVIOUS CORE VIOLATION CORRECTED']

In [20]:
# Store the split descriptions in a new DataFrame
df_split = pd.DataFrame(split)

# Count Each Sentence
count = Counter(df_split['Description'].explode())
count

Counter({'NO BARE HAND CONTACT WITH RTE FOOD OR A PRE-APPROVED ALTERNATIVE PROCEDURE PROPERLY ALLOWED': 155,
         'FOOD SEPARATED AND PROTECTED': 174,
         'PROPER DATE MARKING AND DISPOSITION': 1734,
         'CONSUMER ADVISORY PROVIDED FOR RAW/UNDERCOOKED FOOD': 1824,
         'THERMOMETERS PROVIDED & ACCURATE': 2550,
         'FOOD PROPERLY LABELED; ORIGINAL CONTAINER': 2337,
         'WIPING CLOTHS: PROPERLY USED & STORED': 1283,
         'SINGLE-USE/SINGLE-SERVICE ARTICLES: PROPERLY STORED & USED': 523,
         'FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED': 5989,
         'NON-FOOD/FOOD CONTACT SURFACES CLEAN': 4904,
         'PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN': 13761,
         'ADEQUATE VENTILATION & LIGHTING; DESIGNATED AREAS USED': 4161,
         'ALL FOOD EMPLOYEES HAVE FOOD HANDLER TRAINING': 2869,
         'ALLERGEN TRAINING AS REQUIRED': 3187,
         'PREVIOUS CORE VIOLATION CORRECTED': 2193,
         'CITY OF
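
Counting the exploded lists counts every citation, so a description cited several times in one inspection (as in row 20 above) adds more than one to its total. The sketch below contrasts that with counting each description at most once per inspection. The toy lists and labels are made up for illustration and are not taken from the dataset.

```python
from collections import Counter

# Toy stand-in for the per-inspection lists in `split` (made-up labels)
inspections = [
    ['FLOORS', 'FLOORS', 'LABELING'],
    ['FLOORS'],
    ['LABELING', 'THERMOMETERS'],
]

# Every citation counts, including repeats within one inspection
citation_counts = Counter(desc for row in inspections for desc in row)

# Each description counts at most once per inspection
inspection_counts = Counter(desc for row in inspections for desc in set(row))

# Share of inspections citing each description; this cannot exceed 100%
share = {desc: 100 * n / len(inspections) for desc, n in inspection_counts.items()}

print(citation_counts['FLOORS'], inspection_counts['FLOORS'])
```

Which of the two counts is more appropriate depends on the question: the first measures how often a regulation is cited, the second how many inspections cite it.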

### Showing top 10 MOST and LEAST frequent regulation descriptions

### TOP 10 Most Frequent

In [99]:
# Show the 10 most frequent regulation descriptions
top_ten = pd.DataFrame(count.most_common(10))
top_ten = top_ten.rename(columns={0: "Top 10 # Regulation Descriptions", 1: "Count"})
# Percentage is relative to the number of failed inspections with recorded violations
top_ten['Percentage'] = top_ten['Count'] / len(df_split) * 100
pd.set_option('display.max_colwidth', 4000)
top_ten


Unnamed: 0,Top 10 # Regulation Descriptions,Count,Percentage
0,"FLOORS: CONSTRUCTED PER CODE, CLEANED, GOOD REPAIR, COVING INSTALLED, DUST-LESS CLEANING METHODS USED",19371,44.889115
1,"WALLS, CEILINGS, ATTACHED EQUIPMENT CONSTRUCTED PER CODE: GOOD REPAIR, SURFACES CLEAN AND DUST-LESS CLEANING METHODS",18257,42.307603
2,"FOOD AND NON-FOOD CONTACT EQUIPMENT UTENSILS CLEAN, FREE OF ABRASIVE DETERGENTS",16445,38.10859
3,"NO EVIDENCE OF RODENT OR INSECT OUTER OPENINGS PROTECTED/RODENT PROOFED, A WRITTEN LOG SHALL BE MAINTAINED AVAILABLE TO THE INSPECTORS",16432,38.078465
4,VENTILATION: ROOMS AND EQUIPMENT VENTED AS REQUIRED: PLUMBING: INSTALLED AND MAINTAINED,15529,35.985911
5,"FOOD AND NON-FOOD CONTACT SURFACES PROPERLY DESIGNED, CONSTRUCTED AND MAINTAINED",15019,34.804069
6,"PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN",13761,31.888861
7,"PREMISES MAINTAINED FREE OF LITTER, UNNECESSARY ARTICLES, CLEANING EQUIPMENT PROPERLY STORED",10667,24.719023
8,"LIGHTING: REQUIRED MINIMUM FOOT-CANDLES OF LIGHT PROVIDED, FIXTURES SHIELDED",7736,17.926911
9,"INSECTS, RODENTS, & ANIMALS NOT PRESENT",7346,17.02315


### TOP 10 Least Frequent

In [98]:
# Show the 10 least frequent regulation descriptions (rarest first)
last_ten = pd.DataFrame(count.most_common()[:-11:-1])
last_ten = last_ten.rename(columns={0: "Last 10 # Regulation Descriptions", 1: "Count"})
# Percentage is relative to the number of failed inspections with recorded violations
last_ten['Percentage'] = last_ten['Count'] / len(df_split) * 100
pd.set_option('display.max_colwidth', 4000)
last_ten

Unnamed: 0,Last 10 # Regulation Descriptions,Count,Percentage
0,PASTEURIZED EGGS USED WHERE REQUIRED,1,0.002317
1,UNWRAPPED AND POTENTIALLY HAZARDOUS FOOD NOT RE-SERVED,1,0.002317
2,WASHING FRUITS & VEGETABLES,1,0.002317
3,PLANT FOOD PROPERLY COOKED FOR HOT HOLDING,1,0.002317
4,PROPER USE OF RESTRICTION AND EXCLUSION,3,0.006952
5,"NO DISCHARGE FROM EYES, NOSE, AND MOUTH",3,0.006952
6,PROPER COOKING TIME & TEMPERATURES,4,0.009269
7,"DISHES AND UTENSILS FLUSHED, SCRAPED, SOAKED",5,0.011587
8,WATER & ICE FROM APPROVED SOURCE,6,0.013904
9,"PROPER DISPOSITION OF RETURNED, PREVIOUSLY SERVED, RECONDITIONED & UNSAFE FOOD",6,0.013904
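
The least-frequent entries come from slicing the full `most_common()` list backwards from its end. A small sketch on a toy `Counter` (made-up keys and counts) shows how that reversed slice behaves and compares it with an ascending sort.

```python
from collections import Counter

# Toy counter with made-up keys and counts
toy = Counter({'a': 5, 'b': 3, 'c': 3, 'd': 1, 'e': 2})

ranked = toy.most_common()           # sorted by count, highest first
n = 2
least_reversed = ranked[:-n - 1:-1]  # last n entries, rarest first

# Alternative: sort ascending by count and take the first n entries
least_sorted = sorted(toy.items(), key=lambda kv: kv[1])[:n]

print(least_reversed)
```

Both approaches return the same entries here. When several items are tied at the cut-off, the two can differ in which tied items appear and in what order, since each preserves insertion order differently.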


### Summary

* The most frequent cause of failed inspections is the **condition of the floors** in the restaurant or food establishment; it accounts for **almost 45%** when measured against the number of failed inspections with recorded violations
* **Most of the top 10** causes relate to **facilities and equipment**, while the rest (rows 3 and 9 in the table) concern rodents, insects, and other pests
* **Most of the 10 least frequent** causes concern the **process** of preparing or cooking food