# Problem Statement

[An article in the Dallas Observer](https://www.dallasobserver.com/restaurants/dallas-restaurant-inspections-suffer-from-delays-poor-record-keeping-and-overworked-staff-10697588) uncovered a serious problem with the city's ability to follow up on restaurants that require reinspection after a low score on their original inspection. On Dallas's 1-100 scale, a facility that scores 70-79 must be reinspected within 30 days, one that scores 60-69 within 10 days, and one that scores below 60 as soon as possible.

The article points out many flaws in the city's ability to reinspect restaurants within its own self-imposed timeframes. Until the department is better staffed, I aim to build a classification model that predicts how a restaurant will perform upon reinspection. If the city is still struggling to reinspect restaurants on time, it can use the model to prioritize which facilities to reinspect first.
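The reinspection thresholds described above can be written as a small lookup. This is only a sketch: the function name `reinspection_window_days` is hypothetical, representing "ASAP" as 0 days is my own convention, and treating scores of 80 and above as needing no reinspection is an assumption inferred from the ranges rather than something the city states here.

```python
from typing import Optional


def reinspection_window_days(score: int) -> Optional[int]:
    """Return the number of days allowed before reinspection.

    0 stands in for "as soon as possible" (an assumed convention);
    None means no reinspection is triggered (assumed for scores >= 80).
    """
    if score < 60:
        return 0
    if score <= 69:
        return 10
    if score <= 79:
        return 30
    return None


# Quick look at one score from each band
for example_score in (55, 65, 75, 96):
    print(example_score, reinspection_window_days(example_score))
```

A column like this could later serve as a grouping variable when checking how often follow-ups actually happen within the required window.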



In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split, GridSearchCV

%matplotlib inline



In [4]:
df = pd.read_csv('./data/Restaurant_and_Food_Establishment_Inspections__October_2016_to_Present_.csv')



In [5]:
df.head()

Unnamed: 0,Restaurant Name,Inspection Type,Inspection Date,Inspection Score,Street Number,Street Name,Street Direction,Street Type,Street Unit,Street Address,...,Violation Points - 24,Violation Detail - 24,Violation Memo - 24,Violation Description - 25,Violation Points - 25,Violation Detail - 25,Violation Memo - 25,Inspection Month,Inspection Year,Lat Long Location
0,FRESHII,Routine,10/31/2018,96,2414,VICTORY PARK,,LN,,2414 VICTORY PARK LN,...,,,,,,,,Oct 2018,FY2019,"2414 VICTORY PARK LN\n(32.787625, -96.809294)"
1,MICKLE CHICKEN,Routine,10/30/2019,100,3203,CAMP WISDOM,W,RD,,3203 W CAMP WISDOM RD,...,,,,,,,,Oct 2019,FY2020,"3203 W CAMP WISDOM RD\n(32.662584, -96.873446)"
2,WORLD TRADE CENTER MARKET,Routine,11/03/2016,100,2050,STEMMONS,N,FRWY,,2050 N STEMMONS FRWY,...,,,,,,,,Nov 2016,FY2017,"2050 N STEMMONS FRWY\n(32.801934, -96.825878)"
3,DUNKIN DONUTS,Routine,10/30/2019,99,8008,HERB KELLEHER,,WAY,C2174,8008 HERB KELLEHER WAY STE# C2174,...,,,,,,,,Oct 2019,FY2020,8008 HERB KELLEHER WAY STE# C2174
4,CANVAS HOTEL - 6TH FLOOR,Routine,06/11/2018,100,1325,LAMAR,S,ST,,1325 S LAMAR ST,...,,,,,,,,Jun 2018,FY2018,"1325 S LAMAR ST\n(39.69335, -105.067425)"


In [24]:
df.isnull().sum()

Restaurant Name             11
Inspection Type              0
Inspection Date              0
Inspection Score             0
Street Number                0
                         ...  
Violation Detail - 25    44654
Violation Memo - 25      44653
Inspection Month             0
Inspection Year              0
Lat Long Location            0
Length: 114, dtype: int64

Since this project is based on NLP, I will merge all of the violation description, detail, and memo columns into a single text field, which should handle most of the nulls. Any nulls left after that merge likely mean the restaurant had no violations to note, which is itself important information. 11 restaurant names are null; where an address is given, I will probably keep those rows. I will also merge the address columns with the names to help the model distinguish different locations of the same restaurant.
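One possible way to do the merge described above is sketched below. The helper names `merge_violation_text` and `name_with_address` are hypothetical, the violation text in the toy frame is made up for illustration, and the approach (joining non-null cells with spaces) is an assumption about how the text should be combined, not a settled design.

```python
import pandas as pd

TEXT_PREFIXES = ("Violation Description", "Violation Detail", "Violation Memo")


def merge_violation_text(df: pd.DataFrame) -> pd.Series:
    """Join every non-null violation text cell in a row into one string."""
    text_cols = [c for c in df.columns if c.startswith(TEXT_PREFIXES)]
    return df[text_cols].apply(
        lambda row: " ".join(str(v) for v in row if pd.notna(v)), axis=1
    )


def name_with_address(df: pd.DataFrame) -> pd.Series:
    """Combine name and address; a missing name leaves just the address."""
    combined = df["Restaurant Name"].fillna("") + " " + df["Street Address"]
    return combined.str.strip()


# Toy frame with illustrative (made-up) violation text
demo = pd.DataFrame({
    "Restaurant Name": ["FRESHII", None],
    "Street Address": ["2414 VICTORY PARK LN", "4243 S WESTMORELAND RD"],
    "Violation Description - 1": ["Improper cooling", None],
    "Violation Detail - 1": ["Cooler at 50F", None],
    "Violation Memo - 1": [None, None],
})

violation_text = merge_violation_text(demo)
location_label = name_with_address(demo)
print(violation_text.tolist())
print(location_label.tolist())
```

Rows with no violations end up with an empty string rather than a null, which keeps them usable by a text vectorizer while still marking them as "nothing to report".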

In [25]:
df.dtypes

Restaurant Name          object
Inspection Type          object
Inspection Date          object
Inspection Score          int64
Street Number             int64
                          ...  
Violation Detail - 25    object
Violation Memo - 25      object
Inspection Month         object
Inspection Year          object
Lat Long Location        object
Length: 114, dtype: object

In [26]:
df.shape

(44656, 114)

In [27]:
df.loc[df['Restaurant Name'].isnull()]

Unnamed: 0,Restaurant Name,Inspection Type,Inspection Date,Inspection Score,Street Number,Street Name,Street Direction,Street Type,Street Unit,Street Address,...,Violation Points - 24,Violation Detail - 24,Violation Memo - 24,Violation Description - 25,Violation Points - 25,Violation Detail - 25,Violation Memo - 25,Inspection Month,Inspection Year,Lat Long Location
20592,,Routine,02/21/2018,86,4243,WESTMORELAND,S,RD,,4243 S WESTMORELAND RD,...,,,,,,,,Feb 2018,FY2018,"4243 S WESTMORELAND RD\n(32.691613, -96.880689)"
21643,,Routine,08/28/2017,87,4243,WESTMORELAND,S,RD,,4243 S WESTMORELAND RD,...,,,,,,,,Aug 2017,FY2017,"4243 S WESTMORELAND RD\n(32.691613, -96.880689)"
24064,,Routine,07/28/2017,87,6449,GREENVILLE,,AVE,,6449 GREENVILLE AVE,...,,,,,,,,Jul 2017,FY2017,"6449 GREENVILLE AVE\n(32.863098, -96.767426)"
24612,,Routine,08/06/2018,91,4243,WESTMORELAND,S,RD,,4243 S WESTMORELAND RD,...,,,,,,,,Aug 2018,FY2018,"4243 S WESTMORELAND RD\n(32.691613, -96.880689)"
26713,,Routine,02/02/2017,88,4243,WESTMORELAND,S,RD,,4243 S WESTMORELAND RD,...,,,,,,,,Feb 2017,FY2017,"4243 S WESTMORELAND RD\n(32.691613, -96.880689)"
33050,,Routine,11/27/2017,80,8686,FERGUSON,,RD,#210,8686 FERGUSON RD #210,...,,,,,,,,Nov 2017,FY2018,"8686 FERGUSON RD #210\n(32.812751, -96.698799)"
34370,,Routine,06/13/2018,87,6449,GREENVILLE,,AVE,,6449 GREENVILLE AVE,...,,,,,,,,Jun 2018,FY2018,"6449 GREENVILLE AVE\n(32.863098, -96.767426)"
39616,,Routine,05/22/2018,92,8686,FERGUSON,,RD,#210,8686 FERGUSON RD #210,...,,,,,,,,May 2018,FY2018,"8686 FERGUSON RD #210\n(32.812751, -96.698799)"
43261,,Routine,05/31/2017,91,8686,FERGUSON,,RD,#210,8686 FERGUSON RD #210,...,,,,,,,,May 2017,FY2017,"8686 FERGUSON RD #210\n(32.812751, -96.698799)"
43934,,Routine,01/03/2018,84,6449,GREENVILLE,,AVE,,6449 GREENVILLE AVE,...,,,,,,,,Jan 2018,FY2018,"6449 GREENVILLE AVE\n(32.863098, -96.767426)"


In [44]:
df.loc[df['Street Number'] == 4243]

Unnamed: 0,Restaurant Name,Inspection Type,Inspection Date,Inspection Score,Street Number,Street Name,Street Direction,Street Type,Street Unit,Street Address,...,Violation Points - 24,Violation Detail - 24,Violation Memo - 24,Violation Description - 25,Violation Points - 25,Violation Detail - 25,Violation Memo - 25,Inspection Month,Inspection Year,Lat Long Location
4333,WILLIAMS CHICKEN,Routine,08/14/2019,94,4243,WESTMORELAND,S,RD,,4243 S WESTMORELAND RD,...,,,,,,,,Aug 2019,FY2019,"4243 S WESTMORELAND RD\n(32.691613, -96.880689)"
10552,WILLIAMS CHICKEN,Routine,02/12/2020,92,4243,WESTMORELAND,S,RD,,4243 S WESTMORELAND RD,...,,,,,,,,Feb 2020,FY2020,"4243 S WESTMORELAND RD\n(32.691613, -96.880689)"
14015,WILLIAMS CHICKEN,Routine,02/11/2019,97,4243,WESTMORELAND,S,RD,,4243 S WESTMORELAND RD,...,,,,,,,,Feb 2019,FY2019,"4243 S WESTMORELAND RD\n(32.691613, -96.880689)"
20592,,Routine,02/21/2018,86,4243,WESTMORELAND,S,RD,,4243 S WESTMORELAND RD,...,,,,,,,,Feb 2018,FY2018,"4243 S WESTMORELAND RD\n(32.691613, -96.880689)"
21643,,Routine,08/28/2017,87,4243,WESTMORELAND,S,RD,,4243 S WESTMORELAND RD,...,,,,,,,,Aug 2017,FY2017,"4243 S WESTMORELAND RD\n(32.691613, -96.880689)"
24612,,Routine,08/06/2018,91,4243,WESTMORELAND,S,RD,,4243 S WESTMORELAND RD,...,,,,,,,,Aug 2018,FY2018,"4243 S WESTMORELAND RD\n(32.691613, -96.880689)"
26713,,Routine,02/02/2017,88,4243,WESTMORELAND,S,RD,,4243 S WESTMORELAND RD,...,,,,,,,,Feb 2017,FY2017,"4243 S WESTMORELAND RD\n(32.691613, -96.880689)"


The rows with a NaN name could belong to a previous restaurant at the same location, so I may want to avoid imputing the name from other inspections at that address.

In [29]:
df.loc[df['Street Number'] == 8686]

Unnamed: 0,Restaurant Name,Inspection Type,Inspection Date,Inspection Score,Street Number,Street Name,Street Direction,Street Type,Street Unit,Street Address,...,Violation Points - 24,Violation Detail - 24,Violation Memo - 24,Violation Description - 25,Violation Points - 25,Violation Detail - 25,Violation Memo - 25,Inspection Month,Inspection Year,Lat Long Location
339,DONUT TOWN,Routine,11/23/2018,98,8686,FERGUSON,,RD,#210,8686 FERGUSON RD #210,...,,,,,,,,Nov 2018,FY2019,"8686 FERGUSON RD #210\n(32.812751, -96.698799)"
32431,DONUT TOWN,Routine,05/22/2018,92,8686,FERGUSON,,RD,#210,8686 FERGUSON RD #210,...,,,,,,,,May 2018,FY2018,"8686 FERGUSON RD #210\n(32.812751, -96.698799)"
33050,,Routine,11/27/2017,80,8686,FERGUSON,,RD,#210,8686 FERGUSON RD #210,...,,,,,,,,Nov 2017,FY2018,"8686 FERGUSON RD #210\n(32.812751, -96.698799)"
39616,,Routine,05/22/2018,92,8686,FERGUSON,,RD,#210,8686 FERGUSON RD #210,...,,,,,,,,May 2018,FY2018,"8686 FERGUSON RD #210\n(32.812751, -96.698799)"
43261,,Routine,05/31/2017,91,8686,FERGUSON,,RD,#210,8686 FERGUSON RD #210,...,,,,,,,,May 2017,FY2017,"8686 FERGUSON RD #210\n(32.812751, -96.698799)"


In [45]:
df.loc[df['Street Number'] == 6449]

Unnamed: 0,Restaurant Name,Inspection Type,Inspection Date,Inspection Score,Street Number,Street Name,Street Direction,Street Type,Street Unit,Street Address,...,Violation Points - 24,Violation Detail - 24,Violation Memo - 24,Violation Description - 25,Violation Points - 25,Violation Detail - 25,Violation Memo - 25,Inspection Month,Inspection Year,Lat Long Location
8299,FRANKIE'S FOOD MART,Routine,12/09/2019,87,6449,GREENVILLE,,AVE,,6449 GREENVILLE AVE,...,,,,,,,,Dec 2019,FY2020,"6449 GREENVILLE AVE\n(32.863098, -96.767426)"
17424,FRANKIE'S FOOD MART,Routine,12/04/2018,81,6449,GREENVILLE,,AVE,,6449 GREENVILLE AVE,...,,,,,,,,Dec 2018,FY2019,"6449 GREENVILLE AVE\n(32.863098, -96.767426)"
24064,,Routine,07/28/2017,87,6449,GREENVILLE,,AVE,,6449 GREENVILLE AVE,...,,,,,,,,Jul 2017,FY2017,"6449 GREENVILLE AVE\n(32.863098, -96.767426)"
34370,,Routine,06/13/2018,87,6449,GREENVILLE,,AVE,,6449 GREENVILLE AVE,...,,,,,,,,Jun 2018,FY2018,"6449 GREENVILLE AVE\n(32.863098, -96.767426)"
43934,,Routine,01/03/2018,84,6449,GREENVILLE,,AVE,,6449 GREENVILLE AVE,...,,,,,,,,Jan 2018,FY2018,"6449 GREENVILLE AVE\n(32.863098, -96.767426)"
44450,,Routine,12/20/2016,88,6449,GREENVILLE,,AVE,,6449 GREENVILLE AVE,...,,,,,,,,Dec 2016,FY2017,"6449 GREENVILLE AVE\n(32.863098, -96.767426)"
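The address-by-address lookups above could be condensed into a single pass that lists which names appear at every address with at least one null name. This is a sketch under the assumption that `Street Address` is a good enough key for "same location"; the function name `names_at_null_addresses` is hypothetical, and the toy frame only mimics a few rows from the outputs above.

```python
import pandas as pd


def names_at_null_addresses(df: pd.DataFrame) -> pd.Series:
    """For each address with a null-name row, list the non-null names seen there."""
    null_addresses = df.loc[df["Restaurant Name"].isnull(), "Street Address"].unique()
    named = df.loc[
        df["Street Address"].isin(null_addresses) & df["Restaurant Name"].notnull()
    ]
    return named.groupby("Street Address")["Restaurant Name"].unique()


# Toy frame modeled on rows shown in the outputs above
demo = pd.DataFrame({
    "Restaurant Name": ["WILLIAMS CHICKEN", None, "FRESHII"],
    "Street Address": [
        "4243 S WESTMORELAND RD",
        "4243 S WESTMORELAND RD",
        "2414 VICTORY PARK LN",
    ],
})

candidates = names_at_null_addresses(demo)
print(candidates)
```

An address that maps to more than one name would be a sign that the business changed hands, which supports leaving the null names alone.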


In [36]:
df.loc[df[df.columns[3:]].duplicated()].sort_values(by = "Inspection Date")

Unnamed: 0,Restaurant Name,Inspection Type,Inspection Date,Inspection Score,Street Number,Street Name,Street Direction,Street Type,Street Unit,Street Address,...,Violation Points - 24,Violation Detail - 24,Violation Memo - 24,Violation Description - 25,Violation Points - 25,Violation Detail - 25,Violation Memo - 25,Inspection Month,Inspection Year,Lat Long Location
11967,LAKEWOOD COUNTY CLUB,Routine,01/04/2018,100,1912,ABRAMS,,ST,,1912 ABRAMS ST,...,,,,,,,,Jan 2018,FY2018,1912 ABRAMS ST
11886,LAKEWOOD COUNTRY CLUB POOL BAR,Routine,01/04/2018,100,1912,ABRAMS,,ST,,1912 ABRAMS ST,...,,,,,,,,Jan 2018,FY2018,1912 ABRAMS ST
10619,WAL-MART-MEAT,Routine,01/06/2020,98,4122,LBJ,,FRWY,,4122 LBJ FRWY,...,,,,,,,,Jan 2020,FY2020,4122 LBJ FRWY
23613,WOLFGANG PUCK CATERING & EVENTS-BAR EAST,Routine,01/09/2017,100,400,HOUSTON,S,ST,3F,400 S HOUSTON ST 3F,...,,,,,,,,Jan 2017,FY2017,"400 S HOUSTON ST 3F\n(32.587214, -96.312451)"
13475,WOLFGANG PUCK CATERING-BUTCHER ROOM,Routine,01/09/2017,100,400,HOUSTON,S,ST,3F,400 S HOUSTON ST 3F,...,,,,,,,,Jan 2017,FY2017,"400 S HOUSTON ST 3F\n(32.587214, -96.312451)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12655,EATZI'S MARKET PRODUCE,Routine,12/30/2016,100,3403,OAK LAWN,,AVE,,3403 OAK LAWN AVE,...,,,,,,,,Dec 2016,FY2017,"3403 OAK LAWN AVE\n(32.81138, -96.806319)"
18935,EATZI'S MARKET BAKERY,Routine,12/30/2016,100,3403,OAK LAWN,,AVE,,3403 OAK LAWN AVE,...,,,,,,,,Dec 2016,FY2017,"3403 OAK LAWN AVE\n(32.81138, -96.806319)"
10636,PRODUCE PREP - PHASE 1,Routine,12/31/2018,100,1515,BUCKNER,S,BLVD,#301,1515 S BUCKNER BLVD #301,...,,,,,,,,Dec 2018,FY2019,"1515 S BUCKNER BLVD #301\n(32.738319, -96.682973)"
14479,EL RANCHO (SEAFOOD)- PHASE1,Routine,12/31/2018,100,1515,BUCKNER,S,BLVD,#301,1515 S BUCKNER BLVD #301,...,,,,,,,,Dec 2018,FY2019,"1515 S BUCKNER BLVD #301\n(32.738319, -96.682973)"


In [48]:
df['Inspection Type'].value_counts()

Routine      43990
Follow-up      641
Complaint       25
Name: Inspection Type, dtype: int64

In [None]:
# Filter on follow-up only
# Match them with their routine
# put them side by side

# add success metric and models to problem statement
# finish EDA
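The first three to-do items could be approached along the lines below. This is a sketch, not the final method: it assumes each follow-up belongs to the most recent routine inspection at the same `Street Address` on or before the follow-up date, the function name `pair_followups_with_routine` and the new column names are hypothetical, and the toy rows are made up.

```python
import pandas as pd


def pair_followups_with_routine(df: pd.DataFrame) -> pd.DataFrame:
    """Put each follow-up next to the latest prior routine inspection at its address."""
    data = df.copy()
    data["Inspection Date"] = pd.to_datetime(data["Inspection Date"], format="%m/%d/%Y")

    followup = (
        data.loc[
            data["Inspection Type"] == "Follow-up",
            ["Restaurant Name", "Street Address", "Inspection Date", "Inspection Score"],
        ]
        .rename(columns={"Inspection Date": "Follow-up Date",
                         "Inspection Score": "Follow-up Score"})
        .sort_values("Follow-up Date")
    )
    routine = (
        data.loc[
            data["Inspection Type"] == "Routine",
            ["Street Address", "Inspection Date", "Inspection Score"],
        ]
        .rename(columns={"Inspection Date": "Routine Date",
                         "Inspection Score": "Routine Score"})
        .sort_values("Routine Date")
    )

    # merge_asof needs both frames sorted on their date keys
    pairs = pd.merge_asof(
        followup,
        routine,
        left_on="Follow-up Date",
        right_on="Routine Date",
        by="Street Address",
        direction="backward",
    )
    pairs["Days Between"] = (pairs["Follow-up Date"] - pairs["Routine Date"]).dt.days
    return pairs


# Toy data (made up) with one follow-up and three routine inspections
demo = pd.DataFrame({
    "Restaurant Name": ["A", "A", "B", "A"],
    "Inspection Type": ["Routine", "Follow-up", "Routine", "Routine"],
    "Inspection Date": ["01/05/2018", "01/12/2018", "03/01/2018", "07/01/2018"],
    "Inspection Score": [65, 88, 95, 90],
    "Street Address": ["1 MAIN ST", "1 MAIN ST", "2 ELM ST", "1 MAIN ST"],
})

pairs = pair_followups_with_routine(demo)
print(pairs)
```

The `Days Between` column makes it straightforward to compare actual follow-up timing against the city's 10-day and 30-day windows.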