# Problem Statement

[An article in the Dallas Observer](https://www.dallasobserver.com/restaurants/dallas-restaurant-inspections-suffer-from-delays-poor-record-keeping-and-overworked-staff-10697588) uncovered a serious problem in the city's ability to follow up on restaurants that require reinspection after a low score on their original inspection. Dallas scores facilities on a scale of 1-100: any facility that scores 70-79 requires reinspection within 30 days, 60-69 requires reinspection within 10 days, and below 60 requires reinspection as soon as possible.
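The score bands above can be written as a small lookup. This is only a sketch: the function name `reinspection_window_days` is hypothetical, treating scores of 80 and above as needing no reinspection is an assumption (the rule as stated only covers scores below 80), and encoding "as soon as possible" as 0 days is an arbitrary choice.

```python
from typing import Optional


def reinspection_window_days(score: int) -> Optional[int]:
    """Return the reinspection deadline in days, or None if none is required."""
    if score >= 80:
        return None  # assumption: scores of 80+ do not trigger a reinspection
    if score >= 70:
        return 30    # 70-79: reinspect within 30 days
    if score >= 60:
        return 10    # 60-69: reinspect within 10 days
    return 0         # below 60: "ASAP", encoded here as 0 days (assumption)


for s in (95, 75, 65, 50):
    print(s, reinspection_window_days(s))
```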

The article points out many flaws in the city's ability to reinspect restaurants within its own self-imposed timeframes. Until the department is better staffed, I am looking to build a classification model that predicts how a restaurant will perform upon reinspection. That way, if the city is still struggling to reinspect restaurants in a timely manner, it can use the model to prioritize which facilities to reinspect first.



In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split, GridSearchCV

%matplotlib inline



In [4]:
df = pd.read_csv('./data/Restaurant_and_Food_Establishment_Inspections__October_2016_to_Present_.csv')



In [5]:
df.head()

Unnamed: 0,Restaurant Name,Inspection Type,Inspection Date,Inspection Score,Street Number,Street Name,Street Direction,Street Type,Street Unit,Street Address,...,Violation Points - 24,Violation Detail - 24,Violation Memo - 24,Violation Description - 25,Violation Points - 25,Violation Detail - 25,Violation Memo - 25,Inspection Month,Inspection Year,Lat Long Location
0,FRESHII,Routine,10/31/2018,96,2414,VICTORY PARK,,LN,,2414 VICTORY PARK LN,...,,,,,,,,Oct 2018,FY2019,"2414 VICTORY PARK LN\n(32.787625, -96.809294)"
1,MICKLE CHICKEN,Routine,10/30/2019,100,3203,CAMP WISDOM,W,RD,,3203 W CAMP WISDOM RD,...,,,,,,,,Oct 2019,FY2020,"3203 W CAMP WISDOM RD\n(32.662584, -96.873446)"
2,WORLD TRADE CENTER MARKET,Routine,11/03/2016,100,2050,STEMMONS,N,FRWY,,2050 N STEMMONS FRWY,...,,,,,,,,Nov 2016,FY2017,"2050 N STEMMONS FRWY\n(32.801934, -96.825878)"
3,DUNKIN DONUTS,Routine,10/30/2019,99,8008,HERB KELLEHER,,WAY,C2174,8008 HERB KELLEHER WAY STE# C2174,...,,,,,,,,Oct 2019,FY2020,8008 HERB KELLEHER WAY STE# C2174
4,CANVAS HOTEL - 6TH FLOOR,Routine,06/11/2018,100,1325,LAMAR,S,ST,,1325 S LAMAR ST,...,,,,,,,,Jun 2018,FY2018,"1325 S LAMAR ST\n(39.69335, -105.067425)"


In [24]:
df.isnull().sum()

Restaurant Name             11
Inspection Type              0
Inspection Date              0
Inspection Score             0
Street Number                0
                         ...  
Violation Detail - 25    44654
Violation Memo - 25      44653
Inspection Month             0
Inspection Year              0
Lat Long Location            0
Length: 114, dtype: int64

Since this project is based on NLP, I will merge all of the violation description, detail, and memo columns into a single text field, which should handle most of the nulls. Any nulls left after that merge likely mean the restaurant had no violations to note, which is itself important data. 11 restaurant names are null; if an address is given, I will probably keep those rows. Additionally, I will merge the address columns with the names to help the model distinguish different locations of the same restaurant.
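A minimal sketch of that merge on a toy frame, not the final preprocessing. The numbered column names follow the pattern visible in the data (`Violation Description - 25` and so on); the toy values, the `violation_text` and `name_address` column names, and the `UNKNOWN` placeholder for missing names are all assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for the real data, with only two violation slots.
toy = pd.DataFrame({
    "Restaurant Name": ["DONUT TOWN", None],
    "Street Address": ["8686 FERGUSON RD #210", "6449 GREENVILLE AVE"],
    "Violation Description - 1": ["Hand washing", None],
    "Violation Detail - 1": ["No soap at sink", None],
    "Violation Memo - 1": [None, None],
    "Violation Description - 2": ["Food temperature", None],
    "Violation Detail - 2": [None, None],
    "Violation Memo - 2": ["Cooler at 45F", None],
})

# Pick up every description/detail/memo column, whatever its slot number.
text_prefixes = ("Violation Description", "Violation Detail", "Violation Memo")
text_cols = [c for c in toy.columns if c.startswith(text_prefixes)]

# Nulls become empty strings, each row is joined into one document,
# and runs of whitespace left by the empty slots are collapsed.
toy["violation_text"] = (
    toy[text_cols]
    .fillna("")
    .astype(str)
    .apply(lambda row: " ".join(row), axis=1)
    .str.split()
    .str.join(" ")
)

# Combine name and address so two locations of one chain stay distinct.
toy["name_address"] = (
    toy["Restaurant Name"].fillna("UNKNOWN") + " | " + toy["Street Address"]
)

print(toy[["name_address", "violation_text"]])
```

A row with no violations ends up with an empty `violation_text` rather than a null, which keeps the "no violations" signal available to a vectorizer.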

In [56]:
df['Inspection Date'] = pd.to_datetime(df['Inspection Date'])

In [58]:
df.shape

(44656, 114)

In [59]:
df.loc[df['Restaurant Name'].isnull()]

Unnamed: 0,Restaurant Name,Inspection Type,Inspection Date,Inspection Score,Street Number,Street Name,Street Direction,Street Type,Street Unit,Street Address,...,Violation Points - 24,Violation Detail - 24,Violation Memo - 24,Violation Description - 25,Violation Points - 25,Violation Detail - 25,Violation Memo - 25,Inspection Month,Inspection Year,Lat Long Location
20592,,Routine,2018-02-21,86,4243,WESTMORELAND,S,RD,,4243 S WESTMORELAND RD,...,,,,,,,,Feb 2018,FY2018,"4243 S WESTMORELAND RD\n(32.691613, -96.880689)"
21643,,Routine,2017-08-28,87,4243,WESTMORELAND,S,RD,,4243 S WESTMORELAND RD,...,,,,,,,,Aug 2017,FY2017,"4243 S WESTMORELAND RD\n(32.691613, -96.880689)"
24064,,Routine,2017-07-28,87,6449,GREENVILLE,,AVE,,6449 GREENVILLE AVE,...,,,,,,,,Jul 2017,FY2017,"6449 GREENVILLE AVE\n(32.863098, -96.767426)"
24612,,Routine,2018-08-06,91,4243,WESTMORELAND,S,RD,,4243 S WESTMORELAND RD,...,,,,,,,,Aug 2018,FY2018,"4243 S WESTMORELAND RD\n(32.691613, -96.880689)"
26713,,Routine,2017-02-02,88,4243,WESTMORELAND,S,RD,,4243 S WESTMORELAND RD,...,,,,,,,,Feb 2017,FY2017,"4243 S WESTMORELAND RD\n(32.691613, -96.880689)"
33050,,Routine,2017-11-27,80,8686,FERGUSON,,RD,#210,8686 FERGUSON RD #210,...,,,,,,,,Nov 2017,FY2018,"8686 FERGUSON RD #210\n(32.812751, -96.698799)"
34370,,Routine,2018-06-13,87,6449,GREENVILLE,,AVE,,6449 GREENVILLE AVE,...,,,,,,,,Jun 2018,FY2018,"6449 GREENVILLE AVE\n(32.863098, -96.767426)"
39616,,Routine,2018-05-22,92,8686,FERGUSON,,RD,#210,8686 FERGUSON RD #210,...,,,,,,,,May 2018,FY2018,"8686 FERGUSON RD #210\n(32.812751, -96.698799)"
43261,,Routine,2017-05-31,91,8686,FERGUSON,,RD,#210,8686 FERGUSON RD #210,...,,,,,,,,May 2017,FY2017,"8686 FERGUSON RD #210\n(32.812751, -96.698799)"
43934,,Routine,2018-01-03,84,6449,GREENVILLE,,AVE,,6449 GREENVILLE AVE,...,,,,,,,,Jan 2018,FY2018,"6449 GREENVILLE AVE\n(32.863098, -96.767426)"


In [60]:
df.loc[df['Street Number'] == 4243]

Unnamed: 0,Restaurant Name,Inspection Type,Inspection Date,Inspection Score,Street Number,Street Name,Street Direction,Street Type,Street Unit,Street Address,...,Violation Points - 24,Violation Detail - 24,Violation Memo - 24,Violation Description - 25,Violation Points - 25,Violation Detail - 25,Violation Memo - 25,Inspection Month,Inspection Year,Lat Long Location
4333,WILLIAMS CHICKEN,Routine,2019-08-14,94,4243,WESTMORELAND,S,RD,,4243 S WESTMORELAND RD,...,,,,,,,,Aug 2019,FY2019,"4243 S WESTMORELAND RD\n(32.691613, -96.880689)"
10552,WILLIAMS CHICKEN,Routine,2020-02-12,92,4243,WESTMORELAND,S,RD,,4243 S WESTMORELAND RD,...,,,,,,,,Feb 2020,FY2020,"4243 S WESTMORELAND RD\n(32.691613, -96.880689)"
14015,WILLIAMS CHICKEN,Routine,2019-02-11,97,4243,WESTMORELAND,S,RD,,4243 S WESTMORELAND RD,...,,,,,,,,Feb 2019,FY2019,"4243 S WESTMORELAND RD\n(32.691613, -96.880689)"
20592,,Routine,2018-02-21,86,4243,WESTMORELAND,S,RD,,4243 S WESTMORELAND RD,...,,,,,,,,Feb 2018,FY2018,"4243 S WESTMORELAND RD\n(32.691613, -96.880689)"
21643,,Routine,2017-08-28,87,4243,WESTMORELAND,S,RD,,4243 S WESTMORELAND RD,...,,,,,,,,Aug 2017,FY2017,"4243 S WESTMORELAND RD\n(32.691613, -96.880689)"
24612,,Routine,2018-08-06,91,4243,WESTMORELAND,S,RD,,4243 S WESTMORELAND RD,...,,,,,,,,Aug 2018,FY2018,"4243 S WESTMORELAND RD\n(32.691613, -96.880689)"
26713,,Routine,2017-02-02,88,4243,WESTMORELAND,S,RD,,4243 S WESTMORELAND RD,...,,,,,,,,Feb 2017,FY2017,"4243 S WESTMORELAND RD\n(32.691613, -96.880689)"


The rows with NaN names could refer to a previous restaurant at the same location, so I may want to avoid imputing the current name.
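One way to probe that idea is to check whether the unnamed inspections at an address all predate the first named one. The sketch below uses a hand-copied subset of the 4243 S WESTMORELAND RD rows shown above; the variable names and the `unnamed_precede_named` flag are hypothetical, and a `True` result is only consistent with a change of tenant, not proof of one.

```python
import pandas as pd

# Subset of the rows displayed above for one address.
toy = pd.DataFrame({
    "Restaurant Name": ["WILLIAMS CHICKEN", "WILLIAMS CHICKEN", None, None],
    "Street Address": ["4243 S WESTMORELAND RD"] * 4,
    "Inspection Date": pd.to_datetime(
        ["2019-08-14", "2019-02-11", "2018-08-06", "2017-02-02"]
    ),
})

unnamed = toy["Restaurant Name"].isnull()

# Latest unnamed inspection and earliest named inspection per address.
last_unnamed = toy[unnamed].groupby("Street Address")["Inspection Date"].max()
first_named = toy[~unnamed].groupby("Street Address")["Inspection Date"].min()

summary = pd.DataFrame({"last_unnamed": last_unnamed, "first_named": first_named})
summary["unnamed_precede_named"] = summary["last_unnamed"] < summary["first_named"]

print(summary)
```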

In [61]:
df.loc[df['Street Number'] == 8686]

Unnamed: 0,Restaurant Name,Inspection Type,Inspection Date,Inspection Score,Street Number,Street Name,Street Direction,Street Type,Street Unit,Street Address,...,Violation Points - 24,Violation Detail - 24,Violation Memo - 24,Violation Description - 25,Violation Points - 25,Violation Detail - 25,Violation Memo - 25,Inspection Month,Inspection Year,Lat Long Location
339,DONUT TOWN,Routine,2018-11-23,98,8686,FERGUSON,,RD,#210,8686 FERGUSON RD #210,...,,,,,,,,Nov 2018,FY2019,"8686 FERGUSON RD #210\n(32.812751, -96.698799)"
32431,DONUT TOWN,Routine,2018-05-22,92,8686,FERGUSON,,RD,#210,8686 FERGUSON RD #210,...,,,,,,,,May 2018,FY2018,"8686 FERGUSON RD #210\n(32.812751, -96.698799)"
33050,,Routine,2017-11-27,80,8686,FERGUSON,,RD,#210,8686 FERGUSON RD #210,...,,,,,,,,Nov 2017,FY2018,"8686 FERGUSON RD #210\n(32.812751, -96.698799)"
39616,,Routine,2018-05-22,92,8686,FERGUSON,,RD,#210,8686 FERGUSON RD #210,...,,,,,,,,May 2018,FY2018,"8686 FERGUSON RD #210\n(32.812751, -96.698799)"
43261,,Routine,2017-05-31,91,8686,FERGUSON,,RD,#210,8686 FERGUSON RD #210,...,,,,,,,,May 2017,FY2017,"8686 FERGUSON RD #210\n(32.812751, -96.698799)"


In [62]:
df.loc[df['Street Number'] == 6449]

Unnamed: 0,Restaurant Name,Inspection Type,Inspection Date,Inspection Score,Street Number,Street Name,Street Direction,Street Type,Street Unit,Street Address,...,Violation Points - 24,Violation Detail - 24,Violation Memo - 24,Violation Description - 25,Violation Points - 25,Violation Detail - 25,Violation Memo - 25,Inspection Month,Inspection Year,Lat Long Location
8299,FRANKIE'S FOOD MART,Routine,2019-12-09,87,6449,GREENVILLE,,AVE,,6449 GREENVILLE AVE,...,,,,,,,,Dec 2019,FY2020,"6449 GREENVILLE AVE\n(32.863098, -96.767426)"
17424,FRANKIE'S FOOD MART,Routine,2018-12-04,81,6449,GREENVILLE,,AVE,,6449 GREENVILLE AVE,...,,,,,,,,Dec 2018,FY2019,"6449 GREENVILLE AVE\n(32.863098, -96.767426)"
24064,,Routine,2017-07-28,87,6449,GREENVILLE,,AVE,,6449 GREENVILLE AVE,...,,,,,,,,Jul 2017,FY2017,"6449 GREENVILLE AVE\n(32.863098, -96.767426)"
34370,,Routine,2018-06-13,87,6449,GREENVILLE,,AVE,,6449 GREENVILLE AVE,...,,,,,,,,Jun 2018,FY2018,"6449 GREENVILLE AVE\n(32.863098, -96.767426)"
43934,,Routine,2018-01-03,84,6449,GREENVILLE,,AVE,,6449 GREENVILLE AVE,...,,,,,,,,Jan 2018,FY2018,"6449 GREENVILLE AVE\n(32.863098, -96.767426)"
44450,,Routine,2016-12-20,88,6449,GREENVILLE,,AVE,,6449 GREENVILLE AVE,...,,,,,,,,Dec 2016,FY2017,"6449 GREENVILLE AVE\n(32.863098, -96.767426)"


In [63]:
df.loc[df[df.columns[3:]].duplicated()].sort_values(by = "Inspection Date")

Unnamed: 0,Restaurant Name,Inspection Type,Inspection Date,Inspection Score,Street Number,Street Name,Street Direction,Street Type,Street Unit,Street Address,...,Violation Points - 24,Violation Detail - 24,Violation Memo - 24,Violation Description - 25,Violation Points - 25,Violation Detail - 25,Violation Memo - 25,Inspection Month,Inspection Year,Lat Long Location
19659,ONE MAIN PLACE LEVEL B1 KITCHEN,Routine,2016-10-06,100,1201,MAIN,,ST,,1201 MAIN ST,...,,,,,,,,Oct 2016,FY2017,"1201 MAIN ST\n(41.024999, -81.430786)"
26891,ONE MAIN PLACE 2nd FLOOR BAR,Routine,2016-10-06,100,1201,MAIN,,ST,,1201 MAIN ST,...,,,,,,,,Oct 2016,FY2017,"1201 MAIN ST\n(41.024999, -81.430786)"
7273,LIFE CHARTER SCHOOL,Routine,2016-10-25,100,4400,RL THORNTON,S,FRWY,,4400 S RL THORNTON FRWY,...,,,,,,,,Oct 2016,FY2017,"4400 S RL THORNTON FRWY\n(32.691296, -96.822673)"
9049,FIESTA MART #74 BAKERY,Routine,2016-10-26,100,11445,GARLAND,,RD,,11445 GARLAND RD,...,,,,,,,,Oct 2016,FY2017,"11445 GARLAND RD\n(32.850702, -96.681525)"
13715,BAYLOR UNIVERSITY MED CTR DISHROOM,Routine,2016-10-26,100,3500,GASTON,,AVE,,3500 GASTON AVE,...,,,,,,,,Oct 2016,FY2017,"3500 GASTON AVE\n(32.342322, -86.31818)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7041,CC YOUNG (ADULT DAY CARE LEVEL 1),Routine,2020-04-03,100,4849,LAWTHER,W,DR,,4849 W LAWTHER DR,...,,,,,,,,Apr 2020,FY2020,"4849 W LAWTHER DR\n(32.85442, -96.730528)"
8624,CC YOUNG LEVEL 7 (SKILLED NURSING CAFE),Routine,2020-04-03,100,4849,LAWTHER,W,DR,,4849 W LAWTHER DR,...,,,,,,,,Apr 2020,FY2020,"4849 W LAWTHER DR\n(32.85442, -96.730528)"
6979,CC YOUNG LEVEL 8 (SKILLED NURSING CAFE),Routine,2020-04-03,100,4849,LAWTHER,W,DR,,4849 W LAWTHER DR,...,,,,,,,,Apr 2020,FY2020,"4849 W LAWTHER DR\n(32.85442, -96.730528)"
6998,TOM THUMB-BAKERY,Routine,2020-04-08,100,2380,FIELD,N,ST,,2380 N FIELD ST,...,,,,,,,,Apr 2020,FY2020,"2380 N FIELD ST\n(32.78904, -96.806882)"


In [69]:
df = df.sort_values(by = 'Inspection Date')

In [73]:
df.reset_index(inplace = True)

In [64]:
df['Inspection Type'].value_counts()

Routine      43990
Follow-up      641
Complaint       25
Name: Inspection Type, dtype: int64

In [74]:
follow_ups_df = df.loc[df['Inspection Type'] == 'Follow-up']

In [101]:
restaurants = list(follow_ups_df['Restaurant Name'])

In [106]:
follow_ups_df

Unnamed: 0,index,Restaurant Name,Inspection Type,Inspection Date,Inspection Score,Street Number,Street Name,Street Direction,Street Type,Street Unit,...,Violation Points - 24,Violation Detail - 24,Violation Memo - 24,Violation Description - 25,Violation Points - 25,Violation Detail - 25,Violation Memo - 25,Inspection Month,Inspection Year,Lat Long Location
51,21062,THAI THAI,Follow-up,2016-10-05,82,1731,GREENVILLE,,AVE,,...,,,,,,,,Oct 2016,FY2017,"1731 GREENVILLE AVE\n(32.812007, -96.770206)"
62,18705,TOMORROW SEAFOOD & CHICKEN,Follow-up,2016-10-05,90,3200,LANCASTER,S,RD,#742A,...,,,,,,,,Oct 2016,FY2017,"3200 S LANCASTER RD #742A\n(32.708127, -96.801..."
140,44578,ZENNA,Follow-up,2016-10-07,80,3950,ROSEMEADE,,PKWY,#100,...,,,,,,,,Oct 2016,FY2017,"3950 ROSEMEADE PKWY #100\n(33.010437, -96.84581)"
141,29436,POPEYES,Follow-up,2016-10-07,94,18311,MARSH,,LN,,...,,,,,,,,Oct 2016,FY2017,"18311 MARSH LN\n(32.999855, -96.855923)"
145,18296,RAVENNA,Follow-up,2016-10-10,85,115,FIELD,S,ST,,...,,,,,,,,Oct 2016,FY2017,"115 S FIELD ST\n(33.235904, -96.795985)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
43981,4077,TAQUERIA'S ROJAS,Follow-up,2020-03-09,80,1328,JIM MILLER,N,RD,#101,...,,,,,,,,Mar 2020,FY2020,"1328 N JIM MILLER RD #101\n(32.73556, -96.700079)"
43992,3261,LITTLE KATANA,Follow-up,2020-03-09,80,4525,COLE,,AVE,#160,...,,,,,,,,Mar 2020,FY2020,"4525 COLE AVE #160\n(32.822383, -96.789342)"
44129,8631,FRUTERIA Y NIEVERIA COCO'S,Follow-up,2020-03-16,88,6212,SAMUELL,,BLVD,,...,,,,,,,,Mar 2020,FY2020,"6212 SAMUELL BLVD\n(32.792147, -96.698279)"
44230,7131,RESTAURANT LEYLITA #2,Follow-up,2020-03-19,86,2330,ROYAL,,LN,#600,...,,,,,,,,Mar 2020,FY2020,"2330 ROYAL LN #600\n(32.895564, -96.900756)"


In [67]:
# Filter on follow-up inspections only
# Match each follow-up with its routine inspection
# Put them side by side

# Add success metric and models to problem statement
# Finish EDA
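
The first three steps in the notes above could be sketched with `pd.merge_asof`, which pairs each follow-up with the most recent earlier routine inspection at the same address. This is one possible approach on made-up data, not the notebook's actual method: the toy restaurants, the `Routine Date`, `Routine Score`, and `Days Between` column names, and the choice of `Street Address` as the matching key are all assumptions.

```python
import pandas as pd

# Made-up inspections for two hypothetical restaurants.
toy = pd.DataFrame({
    "Restaurant Name": ["A CAFE", "A CAFE", "B GRILL", "B GRILL", "A CAFE"],
    "Street Address": ["1 MAIN ST", "1 MAIN ST", "2 ELM ST", "2 ELM ST", "1 MAIN ST"],
    "Inspection Type": ["Routine", "Follow-up", "Routine", "Follow-up", "Routine"],
    "Inspection Date": pd.to_datetime(
        ["2017-01-10", "2017-02-05", "2017-03-01", "2017-03-09", "2017-08-15"]
    ),
    "Inspection Score": [72, 88, 65, 81, 90],
})

cols = ["Street Address", "Inspection Date", "Inspection Score"]

# Routine inspections, renamed so they sit beside the follow-up columns.
routine = (
    toy.loc[toy["Inspection Type"] == "Routine", cols]
    .rename(columns={"Inspection Date": "Routine Date",
                     "Inspection Score": "Routine Score"})
    .sort_values("Routine Date")
)

# Follow-up inspections; merge_asof requires both sides sorted on the date key.
follow_ups = (
    toy.loc[toy["Inspection Type"] == "Follow-up", cols + ["Restaurant Name"]]
    .sort_values("Inspection Date")
)

# For each follow-up, take the latest routine inspection on or before it
# at the same address.
paired = pd.merge_asof(
    follow_ups,
    routine,
    left_on="Inspection Date",
    right_on="Routine Date",
    by="Street Address",
    direction="backward",
)
paired["Days Between"] = (paired["Inspection Date"] - paired["Routine Date"]).dt.days

print(paired)
```

Matching on address alone would mix tenants if a location changed hands between the two inspections, so the real data may need the restaurant name in the key as well.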