_Lambda School Data Science, Unit 2_

# Classification 2 Sprint Challenge: Predict Chicago food inspections 🍔

For this Sprint Challenge, you'll use a dataset with information from inspections of restaurants and other food establishments in Chicago from January 2010 to March 2019. 

[See this PDF](https://data.cityofchicago.org/api/assets/BAD5301B-681A-4202-9D25-51B2CAE672FF) for descriptions of the data elements included in this dataset.

According to [Chicago Department of Public Health — Food Protection Services](https://www.chicago.gov/city/en/depts/cdph/provdrs/healthy_restaurants/svcs/food-protection-services.html), "Chicago is home to 16,000 food establishments like restaurants, grocery stores, bakeries, wholesalers, lunchrooms, mobile food vendors and more. Our business is food safety and sanitation with one goal, to prevent the spread of food-borne disease. We do this by inspecting food businesses, responding to complaints and food recalls." 

#### Your challenge: Predict whether inspections failed

The target is the `Fail` column.

- When the food establishment failed the inspection, the target is `1`.
- When the establishment passed, the target is `0`.

#### Run this cell to load the data:

In [1]:
from xgboost import XGBClassifier
from sklearn.metrics import roc_auc_score
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.pipeline import make_pipeline
import eli5
from eli5.sklearn import PermutationImportance
import category_encoders as ce
import pandas as pd


In [2]:
train_url = 'https://drive.google.com/uc?export=download&id=13_tP9JpLcZHSPVpWcua4t2rY44K_s4H5'
test_url  = 'https://drive.google.com/uc?export=download&id=1GkDHjsiGrzOXoF_xcYjdzBTSjOIi3g5a'

train_set = pd.read_csv(train_url)
test_set  = pd.read_csv(test_url)

assert train_set.shape == (51916, 17)
assert test_set.shape  == (17306, 17)

In [3]:
test_set.head()

| | Inspection ID | DBA Name | AKA Name | License # | Facility Type | Risk | Address | City | State | Zip | Inspection Date | Inspection Type | Violations | Latitude | Longitude | Location | Fail |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 114835 | 7 - ELEVEN | 7 - ELEVEN | 46907.0 | Grocery Store | Risk 2 (Medium) | 600 S DEARBORN | CHICAGO | IL | 60605.0 | 2011-03-22T00:00:00 | Canvass | 33. FOOD AND NON-FOOD CONTACT EQUIPMENT UTENSI... | 41.874481 | -87.629357 | {'longitude': '-87.62935653990546', 'latitude'... | 0 |
| 1 | 1575555 | TAQUERIA LOS GALLOS INC | TAQUERIA LOS GALLOS | 1044860.0 | Restaurant | Risk 1 (High) | 4209-4211 W 26TH ST | CHICAGO | IL | 60623.0 | 2015-09-15T00:00:00 | Canvass | 30. FOOD IN ORIGINAL CONTAINER, PROPERLY LABEL... | 41.84407 | -87.729807 | {'longitude': '-87.72980747367433', 'latitude'... | 0 |
| 2 | 671061 | TROTTER'S TO GO | TROTTER'S TO GO | 1092634.0 | Restaurant | Risk 1 (High) | 1337 W FULLERTON AVE | CHICAGO | IL | 60614.0 | 2012-03-02T00:00:00 | Canvass | 34. FLOORS: CONSTRUCTED PER CODE, CLEANED, GOO... | 41.925128 | -87.662041 | {'longitude': '-87.66204067083224', 'latitude'... | 0 |
| 3 | 1965844 | BIG G'S PIZZA | BIG G'S PIZZA | 2334691.0 | Restaurant | Risk 1 (High) | 1132 W TAYLOR ST | CHICAGO | IL | 60607.0 | 2016-10-04T00:00:00 | Canvass Re-Inspection | 14. PREVIOUS SERIOUS VIOLATION CORRECTED, 7-42... | 41.869546 | -87.655501 | {'longitude': '-87.65550098867566', 'latitude'... | 1 |
| 4 | 1751669 | SOUTH CENTRAL COMMUNITY SERVICES ELEMENTARY | SOUTH CENTRAL COMMUNITY SERVICES ELEMENTARY | 3491970.0 | School | Risk 2 (Medium) | 1021 E 83RD | CHICAGO | IL | 60619.0 | 2016-04-08T00:00:00 | Canvass | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.743933 | -87.599291 | {'longitude': '-87.59929083361996', 'latitude'... | 1 |


In [4]:
features= ['DBA Name',
           'License #',
           'Facility Type',
           'Risk',
           'Address',
           'City',
           'State',
           'Zip',
           'Inspection Type',
           'Latitude',
           'Longitude',
           'Inspection Date',
           'Violations']
target= 'Fail'

In [5]:
train, test = train_test_split(train_set, train_size= 0.8, test_size=0.2, random_state=42)

In [6]:
train, val = train_test_split(train, train_size= 0.8, test_size= 0.2, random_state= 42)

In [7]:
train.shape, val.shape, test.shape

((33225, 17), (8307, 17), (10384, 17))
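The shapes above come from applying an 80/20 split twice, which leaves roughly 64% of the rows for training, 16% for validation, and 20% for the local test set. The sketch below reproduces that arithmetic with the standard library only. The rounding rule (test share rounded up, train share rounded down) is my understanding of how scikit-learn sizes a shuffle split, and `split_sizes` is a hypothetical helper, not part of any library.

```python
import math


def split_sizes(n_samples, train_size, test_size):
    """Row counts for one shuffle split (hypothetical helper).

    Assumption: the test share is rounded up and the train share is
    rounded down, which is how scikit-learn appears to size its splits.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = math.floor(train_size * n_samples)
    return n_train, n_test


n_rows = 51916  # rows in train_set, taken from the shape assertion above

# First split: hold out 20% as the local test set.
n_train_val, n_test = split_sizes(n_rows, 0.8, 0.2)

# Second split: hold out 20% of what is left as the validation set.
n_train, n_val = split_sizes(n_train_val, 0.8, 0.2)

print(n_train, n_val, n_test)
for name, n in [("train", n_train), ("val", n_val), ("test", n_test)]:
    print(f"{name}: {n / n_rows:.1%}")
```

The first printed line is `33225 8307 10384`, matching the row counts shown above. Note that this local `test` split is carved out of `train_set`; the separately loaded `test_set` stays untouched until the final evaluation.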

In [8]:
#targets
train_target= train[target]
val_target= val[target]
test_target= test[target]
test_set_target= test_set[target]

#features
train_features= train[features]
val_features= val[features]
test_features= test[features]
test_set_features= test_set[features]


In [9]:
pipeline= make_pipeline(
            ce.OrdinalEncoder(),
            SimpleImputer(strategy= 'median'))

In [10]:
train_encoded= pipeline.fit_transform(train_features)
val_encoded= pipeline.transform(val_features)
test_encoded=pipeline.transform(test_features)
test_set_encoded=pipeline.transform(test_set_features)

In [11]:
train_encoded= pd.DataFrame(data= train_encoded,columns=train_features.columns)
val_encoded= pd.DataFrame(data= val_encoded,columns=val_features.columns)
test_encoded= pd.DataFrame(data= test_encoded,columns=test_features.columns)
test_set_encoded= pd.DataFrame(data= test_set_encoded,columns=test_set_features.columns)

In [12]:
train_encoded.shape, val_encoded.shape, test_encoded.shape, test_set_encoded.shape

((33225, 13), (8307, 13), (10384, 13), (17306, 13))

In [13]:
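# NOTE: early_stopping=10 has no effect here; early stopping needs early_stopping_rounds plus an eval_set.
model= XGBClassifier(random_state=42, early_stopping=10)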

In [14]:
model.fit(train_encoded, train_target)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=1, early_stopping=10, gamma=0, learning_rate=0.1,
       max_delta_step=0, max_depth=3, min_child_weight=1, missing=None,
       n_estimators=100, n_jobs=1, nthread=None,
       objective='binary:logistic', random_state=42, reg_alpha=0,
       reg_lambda=1, scale_pos_weight=1, seed=None, silent=True,
       subsample=1)

In [15]:
val_pred_proba= model.predict_proba(val_encoded)[:,-1]
test_pred_proba= model.predict_proba(test_encoded)[:,-1]

In [16]:
val_roc_auc= roc_auc_score(val_target, val_pred_proba)
test_roc_auc= roc_auc_score(test_target, test_pred_proba)

print(f'Val Roc Auc: {val_roc_auc}\nTest Roc Auc: {test_roc_auc}')

Val Roc Auc: 0.6687183436936255
Test Roc Auc: 0.6592212549153056


In [17]:
permuter= PermutationImportance(model, cv='prefit', n_iter=3, random_state= 42)
permuter.fit(val_encoded, val_target)

PermutationImportance(cv='prefit',
           estimator=XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=1, early_stopping=10, gamma=0, learning_rate=0.1,
       max_delta_step=0, max_depth=3, min_child_weight=1, missing=None,
       n_estimators=100, n_jobs=1, nthread=None,
       objective='binary:logistic', random_state=42, reg_alpha=0,
       reg_lambda=1, scale_pos_weight=1, seed=None, silent=True,
       subsample=1),
           n_iter=3, random_state=42, refit=True, scoring=None)

In [18]:
eli5.show_weights(permuter, top=None, feature_names= val_encoded.columns.tolist())

| Weight | Feature |
| --- | --- |
| 0.0012 ± 0.0003 | Facility Type |
| 0.0011 ± 0.0000 | Inspection Type |
| 0.0004 ± 0.0002 | Latitude |
| 0.0003 ± 0.0009 | License # |
| 0.0001 ± 0.0001 | Risk |
| 0.0001 ± 0.0002 | Violations |
| 0.0000 ± 0.0002 | Zip |
| 0 ± 0.0000 | Longitude |
| 0 ± 0.0000 | State |
| 0 ± 0.0000 | City |


In [19]:
print('Shapes before removing features:', train_encoded.shape, val_encoded.shape, test_encoded.shape)
 
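# Keep only the features whose permutation importance is positive.
# A new name avoids overwriting the original `features` list defined earlier.
mask = permuter.feature_importances_ > 0
selected_features = train_encoded.columns[mask]

train_final = train_encoded[selected_features]
val_final = val_encoded[selected_features]
test_final = test_encoded[selected_features]
test_set_final = test_set_encoded[selected_features]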

print('Shapes after removing features:', train_final.shape, val_final.shape, test_final.shape)

Shapes before removing features: (33225, 13) (8307, 13) (10384, 13)
Shapes after removing features: (33225, 7) (8307, 7) (10384, 7)


In [20]:
model.fit(train_final, train_target)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=1, early_stopping=10, gamma=0, learning_rate=0.1,
       max_delta_step=0, max_depth=3, min_child_weight=1, missing=None,
       n_estimators=100, n_jobs=1, nthread=None,
       objective='binary:logistic', random_state=42, reg_alpha=0,
       reg_lambda=1, scale_pos_weight=1, seed=None, silent=True,
       subsample=1)

In [21]:
val_pred_proba= model.predict_proba(val_final)[:,-1]
test_pred_proba= model.predict_proba(test_final)[:,-1]

In [22]:
val_roc_auc= roc_auc_score(val_target, val_pred_proba)
test_roc_auc= roc_auc_score(test_target, test_pred_proba)

print(f'Val Roc Auc: {val_roc_auc}\nTest Roc Auc: {test_roc_auc}')

Val Roc Auc: 0.6696428243049328
Test Roc Auc: 0.6614490226104053


In [23]:
test_set_pred_proba= model.predict_proba(test_set_final)[:,-1]

In [24]:
final_test_roc_auc= roc_auc_score(test_set_target, test_set_pred_proba)

print(f'Final Test Roc Auc: {final_test_roc_auc}')

Final Test Roc Auc: 0.6705840559852209


### Part 1: Preprocessing

You may choose which features you want to use, and whether/how you will preprocess them. If you use categorical features, you may use any tools and techniques for encoding. (Pandas, category_encoders, sklearn.preprocessing, or any other library.)

_To earn a score of 3 for this part, find and explain leakage. The dataset has a feature that will give you an ROC AUC score > 0.90 if you process and use the feature. Find the leakage and explain why the feature shouldn't be used in a real-world model to predict the results of future inspections._

I believe the `Inspection ID` column is the leakage. When I ran all of the columns through the model, `Inspection ID` had the highest permutation importance, which makes no logical sense: an ID assigned to an inspection should not be a major factor in whether that inspection failed. Once I removed the ID and the other redundant columns, the permutation importances were much more plausible.

### Part 2: Modeling

**Fit a model** with the train set. (You may use scikit-learn, xgboost, or any other library.) Use cross-validation or do a three-way split (train/validate/test) and **estimate your ROC AUC** validation score.

Use your model to **predict probabilities** for the test set. **Get an ROC AUC test score >= 0.60.**

_To earn a score of 3 for this part, get an ROC AUC test score >= 0.70 (without using the feature with leakage)._
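As a reminder of what the ROC AUC thresholds above measure, here is a minimal sketch that computes the metric directly from its ranking definition: the probability that a randomly chosen failed inspection receives a higher predicted probability than a randomly chosen passed one, with ties counted as half. The function name and the toy numbers are made up for illustration; in the notebook itself, `roc_auc_score` does this job far more efficiently.

```python
def roc_auc(y_true, y_score):
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win. This pairwise version is O(n_pos * n_neg),
    so it is only meant for small illustrative inputs.
    """
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy labels and predicted probabilities (made up for illustration).
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

auc = roc_auc(y_true, y_score)
print(f"ROC AUC: {auc:.3f}")
```

Here 8 of the 9 positive/negative pairs are ranked correctly, so the result is 8/9. A model that assigns every row the same score lands at 0.5, which is why 0.60 and 0.70 are meaningful bars to clear.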


### Part 3: Visualization

Make one visualization for model interpretation. (You may use any libraries.) Choose one of these types:

- Feature Importances
- Permutation Importances
- Partial Dependence Plot
- Shapley Values

_To earn a score of 3 for this part, make at least two of these visualization types._
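For intuition about what a permutation importance plot would show, the sketch below implements the idea from scratch: shuffle one column at a time and record how much the score drops. It uses accuracy and a hand-written `toy_predict` function as a stand-in for a fitted model; both are assumptions chosen to keep the example dependency-free, and this is not how `eli5` is implemented internally.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_iter=3, seed=42):
    """Mean drop in accuracy after shuffling each column n_iter times."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_iter):
            # Shuffle column j only, leaving every other column intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_shuffled)))
        importances.append(sum(drops) / n_iter)
    return importances


# Hypothetical "fitted model": predicts 1 when feature 0 exceeds 0.5
# and ignores feature 1 entirely.
def toy_predict(X):
    return [int(row[0] > 0.5) for row in X]


data_rng = random.Random(0)
X_val = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y_val = [int(row[0] > 0.5) for row in X_val]

importances = permutation_importance(toy_predict, X_val, y_val)
for j, weight in enumerate(importances):
    print(f"feature {j}: {weight:.4f}")
```

Feature 1 gets an importance of exactly zero because the toy model never reads it, while shuffling feature 0 destroys most of the accuracy. A bar chart of these weights is one way to satisfy the "Permutation Importances" option above.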

### Part 4: Gradient Descent

Answer both of these two questions:

- What does Gradient Descent seek to minimize?

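Gradient descent seeks to minimize the cost (loss) function, that is, the model's error on the training data, by repeatedly adjusting the model's weights.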

- What is the "Learning Rate" and what is its function?

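The learning rate is a scalar that controls how far the weights move at each step in the direction of steepest descent (the negative gradient of the cost function). It can be thought of as the fraction of the gradient applied in each update: if it is too small, convergence toward a local minimum is slow, and if it is too large, the updates can overshoot the minimum or even diverge.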
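To make the role of the learning rate concrete, here is a minimal sketch of gradient descent on a one-dimensional toy cost, J(w) = (w - 3)^2, whose minimum is at w = 3. The cost function, starting point, step count, and the three learning rates are all assumptions chosen for illustration.

```python
def gradient_descent(learning_rate, start=0.0, n_steps=25):
    """Minimize the toy cost J(w) = (w - 3)**2, whose gradient is 2 * (w - 3)."""
    w = start
    for _ in range(n_steps):
        gradient = 2.0 * (w - 3.0)
        w = w - learning_rate * gradient  # step against the gradient
    return w


for lr in (0.01, 0.1, 1.1):
    w_final = gradient_descent(lr)
    cost = (w_final - 3.0) ** 2
    print(f"learning_rate={lr}: w={w_final:.4f}, cost={cost:.4f}")
```

With the small rate, 25 steps are not enough to get close to the minimum; with the moderate rate, `w` ends up very near 3; with the rate above 1, every step overshoots by more than it corrects, so `w` moves further from the minimum each iteration.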

One sentence is sufficient for each.

_To earn a score of 3 for this part, go above and beyond. Show depth of understanding and mastery of intuition in your answers._