<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Import-libs..." data-toc-modified-id="Import-libs...-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Import libs...</a></span></li><li><span><a href="#Load-the-data-from-tsv..." data-toc-modified-id="Load-the-data-from-tsv...-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Load the data from tsv...</a></span></li><li><span><a href="#Condition-the-data-in-prep-for-modeling..." data-toc-modified-id="Condition-the-data-in-prep-for-modeling...-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Condition the data in prep for modeling...</a></span><ul class="toc-item"><li><span><a href="#Drop-records-with-null-in-the-lead_result_max_bucket" data-toc-modified-id="Drop-records-with-null-in-the-lead_result_max_bucket-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Drop records with null in the lead_result_max_bucket</a></span></li><li><span><a href="#Replace-text-values-with-numerical-representations" data-toc-modified-id="Replace-text-values-with-numerical-representations-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Replace text values with numerical representations</a></span></li><li><span><a href="#Assign-the-all-feature-name-list-minus-the-lead-predictor-fields-(757)" data-toc-modified-id="Assign-the-all-feature-name-list-minus-the-lead-predictor-fields-(757)-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Assign the all feature name list minus the lead predictor fields (757)</a></span></li><li><span><a href="#Convert-X-and-y-to-numeric-values" data-toc-modified-id="Convert-X-and-y-to-numeric-values-3.4"><span class="toc-item-num">3.4&nbsp;&nbsp;</span>Convert X and y to numeric values</a></span></li></ul></li><li><span><a href="#Split-into-train-and-test-sets..." data-toc-modified-id="Split-into-train-and-test-sets...-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Split into train and test sets...</a></span></li><li><span><a href="#Load-a-previous-model-from-local..." 
data-toc-modified-id="Load-a-previous-model-from-local...-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Load a previous model from local...</a></span></li><li><span><a href="#Model-using-a-default-XGBoost-for-purposes-of-understanding-the-features..." data-toc-modified-id="Model-using-a-default-XGBoost-for-purposes-of-understanding-the-features...-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Model using a default XGBoost for purposes of understanding the features...</a></span><ul class="toc-item"><li><span><a href="#Fit-the-model-on-all-features-in-the-training-data" data-toc-modified-id="Fit-the-model-on-all-features-in-the-training-data-6.1"><span class="toc-item-num">6.1&nbsp;&nbsp;</span>Fit the model on all features in the training data</a></span></li><li><span><a href="#Make-predictions-for-test-data-and-evaluate" data-toc-modified-id="Make-predictions-for-test-data-and-evaluate-6.2"><span class="toc-item-num">6.2&nbsp;&nbsp;</span>Make predictions for test data and evaluate</a></span></li><li><span><a href="#Fit-the-model-using-each-importance-as-a-threshold" data-toc-modified-id="Fit-the-model-using-each-importance-as-a-threshold-6.3"><span class="toc-item-num">6.3&nbsp;&nbsp;</span>Fit the model using each importance as a threshold</a></span></li></ul></li></ul></div>

## Import libs...

In [1]:
import pickle
import collections
import pandas as pd
import numpy as np
from numpy import sort
from numpy import unique
from sqlalchemy import create_engine
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectFromModel
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
from mlxtend.evaluate import lift_score
from sklearn.metrics import confusion_matrix

## Load the data from tsv...

Note: earlier versions of this code read the data directly from a table in an AWS RDS Postgres instance. That table has since been exported to a TSV file to make it more convenient and accessible for future users.

In [2]:
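# DataFrame.from_csv was deprecated and later removed from pandas; read_csv with
# index_col=0 and parse_dates=True reproduces its defaults.
df = pd.read_csv('../data/acs_chem_lead_joined.tsv', sep='\t', index_col=0, parse_dates=True)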

In [3]:
df.shape

(8057, 769)
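
As a rough illustration of the export-then-load workflow described above, the sketch below writes a small frame to tab-separated text and reads it back. The table contents, the `tract_id` index name, and the `est_a` column are made up for the example; only `lead_result_max_bucket` is a name from this notebook, and an in-memory buffer stands in for the real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for the exported table; names and values are made up.
table = pd.DataFrame(
    {'est_a': [10, 20], 'lead_result_max_bucket': ['low', None]},
    index=pd.Index([101, 102], name='tract_id'),
)

# Export as tab-separated text (an in-memory buffer replaces the file on disk).
buffer = io.StringIO()
table.to_csv(buffer, sep='\t')
buffer.seek(0)

# Load it back, using the first column as the index.
loaded = pd.read_csv(buffer, sep='\t', index_col=0)
print(loaded.shape)
```

Missing values are written as empty fields and come back as NaN, which is why the null check in the next section works on the loaded frame.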

## Condition the data in prep for modeling...

### Drop records with null in the lead_result_max_bucket

In [7]:
df_cond = df.dropna(subset=['lead_result_max_bucket'])
df_cond.shape

(1241, 769)
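
To make the effect of `dropna(subset=...)` concrete, here is a minimal sketch on a toy frame. The `feature_a` column and all values are hypothetical; only the `lead_result_max_bucket` name comes from the notebook.

```python
import numpy as np
import pandas as pd

# Toy data: `feature_a` and every value here are made up for illustration.
toy = pd.DataFrame({
    'feature_a': [1.0, np.nan, 3.0, 4.0],
    'lead_result_max_bucket': ['low', 'high', np.nan, None],
})

# Only rows whose label is missing are dropped; NaN in other columns is kept.
toy_cond = toy.dropna(subset=['lead_result_max_bucket'])
print(toy_cond.shape)
```

Restricting the check to the label column keeps records that have a usable target even if some of their features are missing, which matches the drop from 8057 to 1241 rows with the column count unchanged.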

### Replace text values with numerical representations

In [8]:
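# Replace placeholder strings with numeric-friendly values, then zero-fill
# anything missing. Top-coded entries such as '250,000+' become digit strings
# here; they are still text until the numeric conversion later on.
X = df_cond.replace(to_replace='(X)', value=1) \
      .replace('-', np.nan) \
      .replace('N', np.nan) \
      .replace('250,000+', '250000') \
      .replace('3,500+', '3500') \
      .replace('1,500+', '1500') \
      .replace('4,000+', '4000') \
      .replace('2,000,000+', '2000000') \
      .replace('9.0+', '9') \
      .replace('100-', '100') \
      .fillna(0)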

In [9]:
X.shape

(1241, 769)
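
The chain of `replace` calls above could also be written as a single mapping followed by `pd.to_numeric`. The sketch below shows that idea on made-up values. The column names `est_a` and `pct_b` are hypothetical, the mapping covers only a few of the placeholders, and this is not necessarily how the notebook performs its own numeric conversion.

```python
import numpy as np
import pandas as pd

# Toy frame with a few of the placeholder strings; columns are hypothetical.
toy = pd.DataFrame(
    {
        'est_a': ['250,000+', '-', '1200'],
        'pct_b': ['(X)', 'N', '9.0+'],
    },
    dtype=object,
)

# One mapping from placeholder to replacement (a subset, for illustration).
placeholders = {
    '(X)': 1,
    '-': np.nan,
    'N': np.nan,
    '250,000+': '250000',
    '9.0+': '9',
}

# Replace placeholders, zero-fill missing values, then parse each column as numbers.
cleaned = toy.replace(placeholders).fillna(0)
numeric = cleaned.apply(pd.to_numeric)
print(numeric)
```

Keeping the placeholders in one dictionary makes it easier to see at a glance which raw values are being rewritten and to extend the list when a new top-coded value turns up.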

### Assign the all feature name list minus the lead predictor fields (757)

In [10]:
feature_names_all = [
	'est_occ_tot_housing_units',
	'pct_occ_tot_housing_units',
	'est_occ_tot_housing_units_occ_unts',
	'pct_occ_tot_housing_units_occ_unts',
	'est_occ_tot_housing_units_vac_housing_units',
	'pct_occ_tot_housing_units_vac_housing_units',
	'est_occ_homeowner_vac_rate',
	'pct_occ_homeowner_vac_rate',
	'est_occ_rental_vac_rate',
	'pct_occ_rental_vac_rate',
	'est_units_in_struct',
	'pct_units_in_struct',
	'est_units_in_struct_1_unit_detached',
	'pct_units_in_struct_1_unit_detached',
	'est_units_in_struct_1_unit_attached',
	'pct_units_in_struct_1_unit_attached',
	'est_units_in_struct_2_units',
	'pct_units_in_struct_2_units',
	'est_units_in_struct_3_or_4_units',
	'pct_units_in_struct_3_or_4_units',
	'est_units_in_struct_5_to_9_units',
	'pct_units_in_struct_5_to_9_units',
	'est_units_in_struct_10_to_19_units',
	'pct_units_in_struct_10_to_19_units',
	'est_units_in_struct_20_or_more_units',
	'pct_units_in_struct_20_or_more_units',
	'est_units_in_struct_mobile_home',
	'pct_units_in_struct_mobile_home',
	'est_units_in_struct_boat_rv_van_etc',
	'pct_units_in_struct_boat_rv_van_etc',
	'est_yr_blt_units',
	'pct_yr_blt_units',
	'est_yr_blt_units_built_2014_or_later',
	'pct_yr_blt_units_built_2014_or_later',
	'est_yr_blt_units_built_2010_to_2013',
	'pct_yr_blt_units_built_2010_to_2013',
	'est_yr_blt_units_built_2000_to_2009',
	'pct_yr_blt_units_built_2000_to_2009',
	'est_yr_blt_units_built_1990_to_1999',
	'pct_yr_blt_units_built_1990_to_1999',
	'est_yr_blt_units_built_1980_to_1989',
	'pct_yr_blt_units_built_1980_to_1989',
	'est_yr_blt_units_built_1970_to_1979',
	'pct_yr_blt_units_built_1970_to_1979',
	'est_yr_blt_units_built_1960_to_1969',
	'pct_yr_blt_units_built_1960_to_1969',
	'est_yr_blt_units_built_1950_to_1959',
	'pct_yr_blt_units_built_1950_to_1959',
	'est_yr_blt_units_built_1940_to_1949',
	'pct_yr_blt_units_built_1940_to_1949',
	'est_yr_blt_units_built_1939_or_earlier',
	'pct_yr_blt_units_built_1939_or_earlier',
	'est_rooms_units',
	'pct_rooms_units',
	'est_rooms_units_1_room',
	'pct_rooms_units_1_room',
	'est_rooms_units_2_rooms',
	'pct_rooms_units_2_rooms',
	'est_rooms_units_3_rooms',
	'pct_rooms_units_3_rooms',
	'est_rooms_units_4_rooms',
	'pct_rooms_units_4_rooms',
	'est_rooms_units_5_rooms',
	'pct_rooms_units_5_rooms',
	'est_rooms_units_6_rooms',
	'pct_rooms_units_6_rooms',
	'est_rooms_units_7_rooms',
	'pct_rooms_units_7_rooms',
	'est_rooms_units_8_rooms',
	'pct_rooms_units_8_rooms',
	'est_rooms_units_9_rooms_or_more',
	'pct_rooms_units_9_rooms_or_more',
	'est_rooms_units_median_rooms',
	'pct_rooms_units_median_rooms',
	'est_bedrooms_units',
	'pct_bedrooms_units',
	'est_bedrooms_units_no_bedroom',
	'pct_bedrooms_units_no_bedroom',
	'est_bedrooms_units_1_bedroom',
	'pct_bedrooms_units_1_bedroom',
	'est_bedrooms_units_2_bedrooms',
	'pct_bedrooms_units_2_bedrooms',
	'est_bedrooms_units_3_bedrooms',
	'pct_bedrooms_units_3_bedrooms',
	'est_bedrooms_units_4_bedrooms',
	'pct_bedrooms_units_4_bedrooms',
	'est_bedrooms_units_5_or_more_bedrooms',
	'pct_bedrooms_units_5_or_more_bedrooms',
	'est_housing_tenure_occ_unts',
	'pct_housing_tenure_occ_unts',
	'est_housing_tenure_occ_unts_owner_occupied',
	'pct_housing_tenure_occ_unts_owner_occupied',
	'est_housing_tenure_occ_unts_renter_occupied',
	'pct_housing_tenure_occ_unts_renter_occupied',
	'est_housing_tenure_avg_houshld_of_owner_occupied_unit',
	'pct_housing_tenure_avg_houshld_of_owner_occupied_unit',
	'est_housing_tenure_avg_houshld_of_renter_occupied_unit',
	'pct_housing_tenure_avg_houshld_of_renter_occupied_unit',
	'est_year_householder_moved_into_unit_occ_unts',
	'pct_year_householder_moved_into_unit_occ_unts',
	'est_year_householder_moved_into_unit_occ_unts_2015_or_later',
	'pct_year_householder_moved_into_unit_occ_unts_2015_or_later',
	'est_year_householder_moved_into_unit_occ_unts_2010_to_2014',
	'pct_year_householder_moved_into_unit_occ_unts_2010_to_2014',
	'est_year_householder_moved_into_unit_occ_unts_2000_to_2009',
	'pct_year_householder_moved_into_unit_occ_unts_2000_to_2009',
	'est_year_householder_moved_into_unit_occ_unts_1990_to_1999',
	'pct_year_householder_moved_into_unit_occ_unts_1990_to_1999',
	'est_year_householder_moved_into_unit_occ_unts_1980_to_1989',
	'pct_year_householder_moved_into_unit_occ_unts_1980_to_1989',
	'est_year_householder_moved_into_unit_occ_unts_1979prior',
	'pct_year_householder_moved_into_unit_occ_unts_1979prior',
	'est_vehicles_occ_unts',
	'pct_vehicles_occ_unts',
	'est_vehicles_occ_unts_no_vehicles',
	'pct_vehicles_occ_unts_no_vehicles',
	'est_vehicles_occ_unts_1_vehicles',
	'pct_vehicles_occ_unts_1_vehicles',
	'est_vehicles_occ_unts_2_vehicles',
	'pct_vehicles_occ_unts_2_vehicles',
	'est_vehicles_occ_unts_3_or_more_vehicles',
	'pct_vehicles_occ_unts_3_or_more_vehicles',
	'est_house_heating_fuel_occ_unts',
	'pct_house_heating_fuel_occ_unts',
	'est_house_heating_fuel_occ_unts_utility_gas',
	'pct_house_heating_fuel_occ_unts_utility_gas',
	'est_house_heating_fuel_occ_unts_bottled_tank_or_lp_gas',
	'pct_house_heating_fuel_occ_unts_bottled_tank_or_lp_gas',
	'est_house_heating_fuel_occ_unts_electricity',
	'pct_house_heating_fuel_occ_unts_electricity',
	'est_house_heating_fuel_occ_unts_fuel_oil_kerosene_etc',
	'pct_house_heating_fuel_occ_unts_fuel_oil_kerosene_etc',
	'est_house_heating_fuel_occ_unts_coal_or_coke',
	'pct_house_heating_fuel_occ_unts_coal_or_coke',
	'est_house_heating_fuel_occ_unts_wood',
	'pct_house_heating_fuel_occ_unts_wood',
	'est_house_heating_fuel_occ_unts_solar_energy',
	'pct_house_heating_fuel_occ_unts_solar_energy',
	'est_house_heating_fuel_occ_unts_other_fuel',
	'pct_house_heating_fuel_occ_unts_other_fuel',
	'est_house_heating_fuel_occ_unts_no_fuel_used',
	'pct_house_heating_fuel_occ_unts_no_fuel_used',
	'est_selected_char_occ_unts',
	'pct_selected_char_occ_unts',
	'est_selected_char_occ_unts_lacking_plumbing_fac',
	'pct_selected_char_occ_unts_lacking_plumbing_fac',
	'est_selected_char_occ_unts_lacking_kitchen_fac',
	'pct_selected_char_occ_unts_lacking_kitchen_fac',
	'est_selected_char_occ_unts_no_teleph',
	'pct_selected_char_occ_unts_no_teleph',
	'est_occupants_per_room_occ_unts',
	'pct_occupants_per_room_occ_unts',
	'est_occupants_per_room_occ_unts_100_or_less',
	'pct_occupants_per_room_occ_unts_100_or_less',
	'est_occupants_per_room_occ_unts_101_to_150',
	'pct_occupants_per_room_occ_unts_101_to_150',
	'est_occupants_per_room_occ_unts_151_or_more',
	'pct_occupants_per_room_occ_unts_151_or_more',
	'est_value_own_occ',
	'pct_value_own_occ',
	'est_value_own_occ_less_than_50000',
	'pct_value_own_occ_less_than_50000',
	'est_value_own_occ_50000_to_99999',
	'pct_value_own_occ_50000_to_99999',
	'est_value_own_occ_100000_to_149999',
	'pct_value_own_occ_100000_to_149999',
	'est_value_own_occ_150000_to_199999',
	'pct_value_own_occ_150000_to_199999',
	'est_value_own_occ_200000_to_299999',
	'pct_value_own_occ_200000_to_299999',
	'est_value_own_occ_300000_to_499999',
	'pct_value_own_occ_300000_to_499999',
	'est_value_own_occ_500000_to_999999',
	'pct_value_own_occ_500000_to_999999',
	'est_value_own_occ_1000000_or_more',
	'pct_value_own_occ_1000000_or_more',
	'est_value_own_occ_median_dollars',
	'pct_value_own_occ_median_dollars',
	'est_mortgage_status_own_occ',
	'pct_mortgage_status_own_occ',
	'est_mortgage_status_own_occ_unit_with_mort',
	'pct_mortgage_status_own_occ_unit_with_mort',
	'est_mortgage_status_own_occ_unit_without_mort',
	'pct_mortgage_status_own_occ_unit_without_mort',
	'est_month_ownr_cst_smoc_unit_with_mort',
	'pct_month_ownr_cst_smoc_unit_with_mort',
	'est_month_ownr_cst_smoc_unit_with_mort_less_than_500',
	'pct_month_ownr_cst_smoc_unit_with_mort_less_than_500',
	'est_month_ownr_cst_smoc_unit_with_mort_500_to_999',
	'pct_month_ownr_cst_smoc_unit_with_mort_500_to_999',
	'est_month_ownr_cst_smoc_unit_with_mort_1000_to_1499',
	'pct_month_ownr_cst_smoc_unit_with_mort_1000_to_1499',
	'est_month_ownr_cst_smoc_unit_with_mort_1500_to_1999',
	'pct_month_ownr_cst_smoc_unit_with_mort_1500_to_1999',
	'est_month_ownr_cst_smoc_unit_with_mort_2000_to_2499',
	'pct_month_ownr_cst_smoc_unit_with_mort_2000_to_2499',
	'est_month_ownr_cst_smoc_unit_with_mort_2500_to_2999',
	'pct_month_ownr_cst_smoc_unit_with_mort_2500_to_2999',
	'est_month_ownr_cst_smoc_unit_with_mort_3000_or_more',
	'pct_month_ownr_cst_smoc_unit_with_mort_3000_or_more',
	'est_month_ownr_cst_smoc_unit_with_mort_median_dollars',
	'pct_month_ownr_cst_smoc_unit_with_mort_median_dollars',
	'est_month_ownr_cst_smoc_unit_without_mort',
	'pct_month_ownr_cst_smoc_unit_without_mort',
	'est_month_ownr_cst_smoc_unit_without_mort_less_than_250',
	'pct_month_ownr_cst_smoc_unit_without_mort_less_than_250',
	'est_month_ownr_cst_smoc_unit_without_mort_250_to_399',
	'pct_month_ownr_cst_smoc_unit_without_mort_250_to_399',
	'est_month_ownr_cst_smoc_unit_without_mort_400_to_599',
	'pct_month_ownr_cst_smoc_unit_without_mort_400_to_599',
	'est_month_ownr_cst_smoc_unit_without_mort_600_to_799',
	'pct_month_ownr_cst_smoc_unit_without_mort_600_to_799',
	'est_month_ownr_cst_smoc_unit_without_mort_800_to_999',
	'pct_month_ownr_cst_smoc_unit_without_mort_800_to_999',
	'est_month_ownr_cst_smoc_unit_without_mort_1000_or_more',
	'pct_month_ownr_cst_smoc_unit_without_mort_1000_or_more',
	'est_month_ownr_cst_smoc_unit_without_mort_median_dollars',
	'pct_month_ownr_cst_smoc_unit_without_mort_median_dollars',
	'est_smocapi_unit_with_mort',
	'pct_smocapi_unit_with_mort',
	'est_smocapi_unit_with_mort_less_than_200_pct',
	'pct_smocapi_unit_with_mort_less_than_200_pct',
	'est_smocapi_unit_with_mort_200_to_249_pct',
	'pct_smocapi_unit_with_mort_200_to_249_pct',
	'est_smocapi_unit_with_mort_250_to_299_pct',
	'pct_smocapi_unit_with_mort_250_to_299_pct',
	'est_smocapi_unit_with_mort_300_to_349_pct',
	'pct_smocapi_unit_with_mort_300_to_349_pct',
	'est_smocapi_unit_with_mort_350_pct_or_more',
	'pct_smocapi_unit_with_mort_350_pct_or_more',
	'est_smocapi_unit_nomortg',
	'pct_smocapi_unit_nomortg',
	'est_smocapi_unit_nomortg_less_than_100_pct',
	'pct_smocapi_unit_nomortg_less_than_100_pct',
	'est_smocapi_unit_nomortg_100_to_149_pct',
	'pct_smocapi_unit_nomortg_100_to_149_pct',
	'est_smocapi_unit_nomortg_150_to_199_pct',
	'pct_smocapi_unit_nomortg_150_to_199_pct',
	'est_smocapi_unit_nomortg_200_to_249_pct',
	'pct_smocapi_unit_nomortg_200_to_249_pct',
	'est_smocapi_unit_nomortg_250_to_299_pct',
	'pct_smocapi_unit_nomortg_250_to_299_pct',
	'est_smocapi_unit_nomortg_300_to_349_pct',
	'pct_smocapi_unit_nomortg_300_to_349_pct',
	'est_smocapi_unit_nomortg_350_pct_or_more',
	'pct_smocapi_unit_nomortg_350_pct_or_more',
	'est_gross_rent_occ_unt_paying_rnt',
	'pct_gross_rent_occ_unt_paying_rnt',
	'est_gross_rent_occ_unt_paying_rnt_less_than_500',
	'pct_gross_rent_occ_unt_paying_rnt_less_than_500',
	'est_gross_rent_occ_unt_paying_rnt_500_to_999',
	'pct_gross_rent_occ_unt_paying_rnt_500_to_999',
	'est_gross_rent_occ_unt_paying_rnt_1000_to_1499',
	'pct_gross_rent_occ_unt_paying_rnt_1000_to_1499',
	'est_gross_rent_occ_unt_paying_rnt_1500_to_1999',
	'pct_gross_rent_occ_unt_paying_rnt_1500_to_1999',
	'est_gross_rent_occ_unt_paying_rnt_2000_to_2499',
	'pct_gross_rent_occ_unt_paying_rnt_2000_to_2499',
	'est_gross_rent_occ_unt_paying_rnt_2500_to_2999',
	'pct_gross_rent_occ_unt_paying_rnt_2500_to_2999',
	'est_gross_rent_occ_unt_paying_rnt_3000_or_more',
	'pct_gross_rent_occ_unt_paying_rnt_3000_or_more',
	'est_gross_rent_occ_unt_paying_rnt_median_dollars',
	'pct_gross_rent_occ_unt_paying_rnt_median_dollars',
	'est_gross_rent_no_rent_paid',
	'pct_gross_rent_no_rent_paid',
	'est_grapi_occ_unt_paying_rnt_excl',
	'pct_grapi_occ_unt_paying_rnt_excl',
	'est_grapi_occ_unt_paying_rnt_excl_less_than_150_pct',
	'pct_grapi_occ_unt_paying_rnt_excl_less_than_150_pct',
	'est_grapi_occ_unt_paying_rnt_excl_150_to_199_pct',
	'pct_grapi_occ_unt_paying_rnt_excl_150_to_199_pct',
	'est_grapi_occ_unt_paying_rnt_excl_200_to_249_pct',
	'pct_grapi_occ_unt_paying_rnt_excl_200_to_249_pct',
	'est_grapi_occ_unt_paying_rnt_excl_250_to_299_pct',
	'pct_grapi_occ_unt_paying_rnt_excl_250_to_299_pct',
	'est_grapi_occ_unt_paying_rnt_excl_300_to_349_pct',
	'pct_grapi_occ_unt_paying_rnt_excl_300_to_349_pct',
	'est_grapi_occ_unt_paying_rnt_excl_350_pct_or_more',
	'pct_grapi_occ_unt_paying_rnt_excl_350_pct_or_more',
	'est_grapi_not_computed',
	'pct_grapi_not_computed',
	'est_hshld_tot_hshld',
	'pct_hshld_tot_hshld',
	'est_hshld_tot_fam',
	'pct_hshld_tot_fam',
	'est_hshld_tot_fam_w_child_u18_yrs',
	'pct_hshld_tot_fam_w_child_u18_yrs',
	'est_hshld_tot_fam_married_couple_family',
	'pct_hshld_tot_fam_married_couple_family',
	'est_hshld_tot_fam_married_couple_family_w_child_u18_yrs',
	'pct_hshld_tot_fam_married_couple_family_w_child_u18_yrs',
	'est_hshld_tot_fam_male_sngl',
	'pct_hshld_tot_fam_male_sngl',
	'est_hshld_tot_fam_male_sngl_w_child_u18_yrs',
	'pct_hshld_tot_fam_male_sngl_w_child_u18_yrs',
	'est_hshld_tot_fam_fem_sngl',
	'pct_hshld_tot_fam_fem_sngl',
	'est_hshld_tot_fam_fem_sngl_w_child_u18_yrs',
	'pct_hshld_tot_fam_fem_sngl_w_child_u18_yrs',
	'est_hshld_tot_hshld_nonfamily_hshld',
	'pct_hshld_tot_hshld_nonfamily_hshld',
	'est_hshld_tot_alone',
	'pct_hshld_tot_alone',
	'est_hshld_tot_alone_65_yrs_plus',
	'pct_hshld_tot_alone_65_yrs_plus',
	'est_hshld_hshld_w_u18_yrs',
	'pct_hshld_hshld_w_u18_yrs',
	'est_hshld_hshld_w_65_yrs_plus',
	'pct_hshld_hshld_w_65_yrs_plus',
	'est_hshld_average_household_size',
	'pct_hshld_average_household_size',
	'est_hshld_average_family_size',
	'pct_hshld_average_family_size',
	'est_relationship_pop_in_hshld',
	'pct_relationship_pop_in_hshld',
	'est_relationship_pop_in_hshld_householder',
	'pct_relationship_pop_in_hshld_householder',
	'est_relationship_pop_in_hshld_spouse',
	'pct_relationship_pop_in_hshld_spouse',
	'est_relationship_pop_in_hshld_child',
	'pct_relationship_pop_in_hshld_child',
	'est_relationship_pop_in_hshld_other_rel',
	'pct_relationship_pop_in_hshld_other_rel',
	'est_relationship_pop_in_hshld_nonrel',
	'pct_relationship_pop_in_hshld_nonrel',
	'est_relationship_pop_in_hshld_nonrel_unmarr_partner',
	'pct_relationship_pop_in_hshld_nonrel_unmarr_partner',
	'est_fert_15_to_50_yrs_birth',
	'pct_fert_15_to_50_yrs_birth',
	'est_fert_15_to_50_yrs_birth_unmarr',
	'pct_fert_15_to_50_yrs_birth_unmarr',
	'est_fert_15_to_50_yrs_birth_unmarr_per_1000_unmarr_women',
	'pct_fert_15_to_50_yrs_birth_unmarr_per_1000_unmarr_women',
	'est_fert_15_to_50_yrs_birth_per_1000_women_15_to_50_yrs',
	'pct_fert_15_to_50_yrs_birth_per_1000_women_15_to_50_yrs',
	'est_fert_15_to_50_yrs_birth_per_1000_women_15_to_19_yrs',
	'pct_fert_15_to_50_yrs_birth_per_1000_women_15_to_19_yrs',
	'est_fert_15_to_50_yrs_birth_per_1000_women_20_to_34_yrs',
	'pct_fert_15_to_50_yrs_birth_per_1000_women_20_to_34_yrs',
	'est_fert_15_to_50_yrs_birth_per_1000_women_35_to_50_yrs',
	'pct_fert_15_to_50_yrs_birth_per_1000_women_35_to_50_yrs',
	'est_school_enroll_pop_3_yrs_plus_',
	'pct_school_enroll_pop_3_yrs_plus_',
	'est_school_enroll_pop_3_yrs_plus_nursery_school_preschool',
	'pct_school_enroll_pop_3_yrs_plus_nursery_school_preschool',
	'est_school_enroll_pop_3_yrs_plus_kindergarten',
	'pct_school_enroll_pop_3_yrs_plus_kindergarten',
	'est_school_enroll_pop_3_yrs_plus_elem_gr_1_8',
	'pct_school_enroll_pop_3_yrs_plus_elem_gr_1_8',
	'est_school_enroll_pop_3_yrs_plus_high_school_gr_9_12',
	'pct_school_enroll_pop_3_yrs_plus_high_school_gr_9_12',
	'est_school_enroll_pop_3_yrs_plus_college_or_graduate_school',
	'pct_school_enroll_pop_3_yrs_plus_college_or_graduate_school',
	'est_educ_pop_25_yrs_plus',
	'pct_educ_pop_25_yrs_plus',
	'est_educ_pop_25_yrs_plus_less_9th_grade',
	'pct_educ_pop_25_yrs_plus_less_9th_grade',
	'est_educ_pop_25_yrs_plus_9th_to_12th_grade_no_diploma',
	'pct_educ_pop_25_yrs_plus_9th_to_12th_grade_no_diploma',
	'est_educ_pop_25_yrs_plus_hs',
	'pct_educ_pop_25_yrs_plus_hs',
	'est_educ_pop_25_yrs_plus_some_college_no_degree',
	'pct_educ_pop_25_yrs_plus_some_college_no_degree',
	'est_educ_pop_25_yrs_plus_associates_degree',
	'pct_educ_pop_25_yrs_plus_associates_degree',
	'est_educ_pop_25_yrs_plus_bachelors_degree',
	'pct_educ_pop_25_yrs_plus_bachelors_degree',
	'est_educ_pop_25_yrs_plus_graduate_or_professional_degree',
	'pct_educ_pop_25_yrs_plus_graduate_or_professional_degree',
	'est_educ_pct_high_school_graduate_or_higher',
	'pct_educ_pct_high_school_graduate_or_higher',
	'est_educ_pct_bachelors_degree_or_higher',
	'pct_educ_pct_bachelors_degree_or_higher',
	'est_disability_tot_civilian_noninstitutionalized_pop',
	'pct_disability_tot_civilian_noninstitutionalized_pop',
	'est_disability_tot_disabled',
	'pct_disability_tot_disabled',
	'est_disability_u18_yrs',
	'pct_disability_u18_yrs',
	'est_disability_u18_yrs_with_a_disability',
	'pct_disability_u18_yrs_with_a_disability',
	'est_disability_18_to_64_yrs',
	'pct_disability_18_to_64_yrs',
	'est_disability_18_to_64_yrs_with_a_disability',
	'pct_disability_18_to_64_yrs_with_a_disability',
	'est_disability_65_yrs_plus',
	'pct_disability_65_yrs_plus',
	'est_disability_65_yrs_plus_with_a_disability',
	'pct_disability_65_yrs_plus_with_a_disability',
	'est_res_pop_1_yr_plus',
	'pct_res_pop_1_yr_plus',
	'est_res_pop_1_yr_plus_same_house',
	'pct_res_pop_1_yr_plus_same_house',
	'est_res_pop_1_yr_plus_diff_house_in_the_us',
	'pct_res_pop_1_yr_plus_diff_house_in_the_us',
	'est_res_pop_1_yr_plus_diff_house_in_the_us_same_county',
	'pct_res_pop_1_yr_plus_diff_house_in_the_us_same_county',
	'est_res_pop_1_yr_plus_diff_house_diff_county',
	'pct_res_pop_1_yr_plus_diff_house_diff_county',
	'est_res_pop_1_yr_plus_diff_house_diff_county_same_state',
	'pct_res_pop_1_yr_plus_diff_house_diff_county_same_state',
	'est_res_pop_1_yr_plus_diff_house_diff_county_diff_state',
	'pct_res_pop_1_yr_plus_diff_house_diff_county_diff_state',
	'est_res_pop_1_yr_plus_abroad',
	'pct_res_pop_1_yr_plus_abroad',
	'est_birth_tot_pop',
	'pct_birth_tot_pop',
	'est_birth_tot_pop_nat',
	'pct_birth_tot_pop_nat',
	'est_birth_tot_pop_nat_us',
	'pct_birth_tot_pop_nat_us',
	'est_birth_tot_pop_nat_us_state_of_res',
	'pct_birth_tot_pop_nat_us_state_of_res',
	'est_birth_tot_pop_nat_us_diff_state',
	'pct_birth_tot_pop_nat_us_diff_state',
	'est_birth_tot_pop_nat_abrd',
	'pct_birth_tot_pop_nat_abrd',
	'est_birth_tot_pop_foreign_born',
	'pct_birth_tot_pop_foreign_born',
	'est_us_cit_status_foreign_born_pop',
	'pct_us_cit_status_foreign_born_pop',
	'est_us_cit_status_foreign_born_pop_naturalized_us_citizen',
	'pct_us_cit_status_foreign_born_pop_naturalized_us_citizen',
	'est_us_cit_status_foreign_born_pop_not_a_us_citizen',
	'pct_us_cit_status_foreign_born_pop_not_a_us_citizen',
	'est_yr_of_entry_pop_born_outside_the_united_states',
	'pct_yr_of_entry_pop_born_outside_the_united_states',
	'est_yr_of_entry_nat',
	'pct_yr_of_entry_nat',
	'est_yr_of_entry_nat_entered_2010_or_later',
	'pct_yr_of_entry_nat_entered_2010_or_later',
	'est_yr_of_entry_nat_entered_before_2010',
	'pct_yr_of_entry_nat_entered_before_2010',
	'est_yr_of_entry_foreign_born',
	'pct_yr_of_entry_foreign_born',
	'est_yr_of_entry_foreign_born_entered_2010_or_later',
	'pct_yr_of_entry_foreign_born_entered_2010_or_later',
	'est_yr_of_entry_foreign_born_entered_before_2010',
	'pct_yr_of_entry_foreign_born_entered_before_2010',
	'est_computer_tot_hshld',
	'pct_computer_tot_hshld',
	'est_computer_tot_hshld_with_a_computer',
	'pct_computer_tot_hshld_with_a_computer',
	'est_computer_tot_hshld_broadband',
	'pct_computer_tot_hshld_broadband',
	'est_emp_pop_16_yrs_plus',
	'pct_emp_pop_16_yrs_plus',
	'est_emp_pop_16_yrs_plus_in_labor_force',
	'pct_emp_pop_16_yrs_plus_in_labor_force',
	'est_emp_pop_16_yrs_plus_in_labor_force_civ',
	'pct_emp_pop_16_yrs_plus_in_labor_force_civ',
	'est_emp_pop_16_yrs_plus_in_labor_force_civ_emp',
	'pct_emp_pop_16_yrs_plus_in_labor_force_civ_emp',
	'est_emp_pop_16_yrs_plus_in_labor_force_civ_unemp',
	'pct_emp_pop_16_yrs_plus_in_labor_force_civ_unemp',
	'est_emp_pop_16_yrs_plus_in_labor_force_armed_forces',
	'pct_emp_pop_16_yrs_plus_in_labor_force_armed_forces',
	'est_emp_pop_16_yrs_plus_not_in_labor_force',
	'pct_emp_pop_16_yrs_plus_not_in_labor_force',
	'est_emp_civ',
	'pct_emp_civ',
	'est_emp_civ_unemployment_rate',
	'pct_emp_civ_unemployment_rate',
	'est_emp_fems_16_yrs_plus',
	'pct_emp_fems_16_yrs_plus',
	'est_emp_fems_16_yrs_plus_in_labor_force',
	'pct_emp_fems_16_yrs_plus_in_labor_force',
	'est_emp_fems_16_yrs_plus_in_labor_force_civ',
	'pct_emp_fems_16_yrs_plus_in_labor_force_civ',
	'est_emp_fems_16_yrs_plus_in_labor_force_civ_emp',
	'pct_emp_fems_16_yrs_plus_in_labor_force_civ_emp',
	'est_emp_child_u6_yrs',
	'pct_emp_child_u6_yrs',
	'est_emp_child_u6_yrs_all_parents_in_fam_in_labor_force',
	'pct_emp_child_u6_yrs_all_parents_in_fam_in_labor_force',
	'est_emp_child_6_to_17_yrs',
	'pct_emp_child_6_to_17_yrs',
	'est_emp_child_6_to_17_yrs_all_parents_in_fam_in_labor_force',
	'pct_emp_child_6_to_17_yrs_all_parents_in_fam_in_labor_force',
	'est_inc_tot_hshld',
	'pct_inc_tot_hshld',
	'est_inc_tot_hshld_less_than_10000',
	'pct_inc_tot_hshld_less_than_10000',
	'est_inc_tot_hshld_10000_to_14999',
	'pct_inc_tot_hshld_10000_to_14999',
	'est_inc_tot_hshld_15000_to_24999',
	'pct_inc_tot_hshld_15000_to_24999',
	'est_inc_tot_hshld_25000_to_34999',
	'pct_inc_tot_hshld_25000_to_34999',
	'est_inc_tot_hshld_35000_to_49999',
	'pct_inc_tot_hshld_35000_to_49999',
	'est_inc_tot_hshld_50000_to_74999',
	'pct_inc_tot_hshld_50000_to_74999',
	'est_inc_tot_hshld_75000_to_99999',
	'pct_inc_tot_hshld_75000_to_99999',
	'est_inc_tot_hshld_100000_to_149999',
	'pct_inc_tot_hshld_100000_to_149999',
	'est_inc_tot_hshld_150000_to_199999',
	'pct_inc_tot_hshld_150000_to_199999',
	'est_inc_tot_hshld_200000_or_more',
	'pct_inc_tot_hshld_200000_or_more',
	'est_inc_tot_hshld_median_household_inc_dol',
	'pct_inc_tot_hshld_median_household_inc_dol',
	'est_inc_tot_hshld_mean_household_inc_dol',
	'pct_inc_tot_hshld_mean_household_inc_dol',
	'est_inc_with_earnings',
	'pct_inc_with_earnings',
	'est_inc_with_earnings_mean_earnings_dol',
	'pct_inc_with_earnings_mean_earnings_dol',
	'est_inc_with_soc_sec',
	'pct_inc_with_soc_sec',
	'est_inc_with_soc_sec_mean_soc_sec_inc_dol',
	'pct_inc_with_soc_sec_mean_soc_sec_inc_dol',
	'est_inc_with_retirement_inc',
	'pct_inc_with_retirement_inc',
	'est_inc_with_retirement_inc_mean_retirement_inc_dol',
	'pct_inc_with_retirement_inc_mean_retirement_inc_dol',
	'est_inc_with_supp_inc',
	'pct_inc_with_supp_inc',
	'est_inc_with_supp_inc_mean_supp_inc_dol',
	'pct_inc_with_supp_inc_mean_supp_inc_dol',
	'est_inc_with_cash_pub_ast_inc',
	'pct_inc_with_cash_pub_ast_inc',
	'est_inc_with_cash_pub_ast_inc_mean_cash_pub_ast_inc_dol',
	'pct_inc_with_cash_pub_ast_inc_mean_cash_pub_ast_inc_dol',
	'est_inc_with_food_stamp',
	'pct_inc_with_food_stamp',
	'est_inc_fam',
	'pct_inc_fam',
	'est_inc_fam_less_than_10000',
	'pct_inc_fam_less_than_10000',
	'est_inc_fam_10000_to_14999',
	'pct_inc_fam_10000_to_14999',
	'est_inc_fam_15000_to_24999',
	'pct_inc_fam_15000_to_24999',
	'est_inc_fam_25000_to_34999',
	'pct_inc_fam_25000_to_34999',
	'est_inc_fam_35000_to_49999',
	'pct_inc_fam_35000_to_49999',
	'est_inc_fam_50000_to_74999',
	'pct_inc_fam_50000_to_74999',
	'est_inc_fam_75000_to_99999',
	'pct_inc_fam_75000_to_99999',
	'est_inc_fam_100000_to_149999',
	'pct_inc_fam_100000_to_149999',
	'est_inc_fam_150000_to_199999',
	'pct_inc_fam_150000_to_199999',
	'est_inc_fam_200000_or_more',
	'pct_inc_fam_200000_or_more',
	'est_inc_fam_median_fam_inc_dol',
	'pct_inc_fam_median_fam_inc_dol',
	'est_inc_fam_mean_fam_inc_dol',
	'pct_inc_fam_mean_fam_inc_dol',
	'est_inc_per_capita_inc_dol',
	'pct_inc_per_capita_inc_dol',
	'est_inc_nonfam_hshld',
	'pct_inc_nonfam_hshld',
	'est_inc_nonfam_hshld_median_nonfam_inc_dol',
	'pct_inc_nonfam_hshld_median_nonfam_inc_dol',
	'est_inc_nonfam_hshld_mean_nonfam_inc_dol',
	'pct_inc_nonfam_hshld_mean_nonfam_inc_dol',
	'est_inc_median_earnings_for_workers_dol',
	'pct_inc_median_earnings_for_workers_dol',
	'est_inc_median_earnings_for_male_full_time_ann_workers_dol',
	'pct_inc_median_earnings_for_male_full_time_ann_workers_dol',
	'est_inc_median_earnings_for_fem_full_time_ann_workers_dol',
	'pct_inc_median_earnings_for_fem_full_time_ann_workers_dol',
	'est_pct_of_fam_pov_all_fam',
	'pct_pct_of_fam_pov_all_fam',
	'est_pct_of_fam_pov_all_fam_child_u18_yrs',
	'pct_pct_of_fam_pov_all_fam_child_u18_yrs',
	'est_pct_of_fam_pov_all_fam_child_u5_yrs_only',
	'pct_pct_of_fam_pov_all_fam_child_u5_yrs_only',
	'est_pct_of_fam_pov_marr_fam',
	'pct_pct_of_fam_pov_marr_fam',
	'est_pct_of_fam_pov_marr_fam_child_u18_yrs',
	'pct_pct_of_fam_pov_marr_fam_child_u18_yrs',
	'est_pct_of_fam_pov_marr_fam_child_u5_yrs_only',
	'pct_pct_of_fam_pov_marr_fam_child_u5_yrs_only',
	'est_pct_of_fam_pov_fam_with_fem_sngl',
	'pct_pct_of_fam_pov_fam_with_fem_sngl',
	'est_pct_of_fam_pov_fam_with_fem_sngl_child_u18_yrs',
	'pct_pct_of_fam_pov_fam_with_fem_sngl_child_u18_yrs',
	'est_pct_of_fam_pov_fam_with_fem_sngl_child_u5_yrs_only',
	'pct_pct_of_fam_pov_fam_with_fem_sngl_child_u5_yrs_only',
	'est_pct_of_fam_pov_all_people',
	'pct_pct_of_fam_pov_all_people',
	'est_pct_of_fam_pov_all_people_u18_yrs',
	'pct_pct_of_fam_pov_all_people_u18_yrs',
	'est_pct_of_fam_pov_all_people_u18_yrs_child_u18_yrs',
	'pct_pct_of_fam_pov_all_people_u18_yrs_child_u18_yrs',
	'est_pct_of_fam_pov_all_people_u18_yrs_child_u5_yrs',
	'pct_pct_of_fam_pov_all_people_u18_yrs_child_u5_yrs',
	'est_pct_of_fam_pov_all_people_u18_yrs_child_5_to_17_yrs',
	'pct_pct_of_fam_pov_all_people_u18_yrs_child_5_to_17_yrs',
	'est_pct_of_fam_pov_all_people_18_yrs_plus',
	'pct_pct_of_fam_pov_all_people_18_yrs_plus',
	'est_pct_of_fam_pov_all_people_18_yrs_plus_18_to_64_yrs',
	'pct_pct_of_fam_pov_all_people_18_yrs_plus_18_to_64_yrs',
	'est_pct_of_fam_pov_all_people_18_yrs_plus_65_yrs_plus',
	'pct_pct_of_fam_pov_all_people_18_yrs_plus_65_yrs_plus',
	'est_pct_of_fam_pov_people_in_fam',
	'pct_pct_of_fam_pov_people_in_fam',
	'est_pct_of_fam_pov_unrelated_individuals_15_yrs_plus',
	'pct_pct_of_fam_pov_unrelated_individuals_15_yrs_plus',
	'est_sex_and_age_tot_pop',
	'pct_sex_and_age_tot_pop',
	'est_sex_and_age_tot_pop_male',
	'pct_sex_and_age_tot_pop_male',
	'est_sex_and_age_tot_pop_female',
	'pct_sex_and_age_tot_pop_female',
	'est_sex_and_age_under_5_yrs',
	'pct_sex_and_age_under_5_yrs',
	'est_sex_and_age_5_to_9_yrs',
	'pct_sex_and_age_5_to_9_yrs',
	'est_sex_and_age_10_to_14_yrs',
	'pct_sex_and_age_10_to_14_yrs',
	'est_sex_and_age_15_to_19_yrs',
	'pct_sex_and_age_15_to_19_yrs',
	'est_sex_and_age_20_to_24_yrs',
	'pct_sex_and_age_20_to_24_yrs',
	'est_sex_and_age_25_to_34_yrs',
	'pct_sex_and_age_25_to_34_yrs',
	'est_sex_and_age_35_to_44_yrs',
	'pct_sex_and_age_35_to_44_yrs',
	'est_sex_and_age_45_to_54_yrs',
	'pct_sex_and_age_45_to_54_yrs',
	'est_sex_and_age_55_to_59_yrs',
	'pct_sex_and_age_55_to_59_yrs',
	'est_sex_and_age_60_to_64_yrs',
	'pct_sex_and_age_60_to_64_yrs',
	'est_sex_and_age_65_to_74_yrs',
	'pct_sex_and_age_65_to_74_yrs',
	'est_sex_and_age_75_to_84_yrs',
	'pct_sex_and_age_75_to_84_yrs',
	'est_sex_and_age_85_yrs_plus',
	'pct_sex_and_age_85_yrs_plus',
	'est_sex_and_age_median_age_yrs',
	'pct_sex_and_age_median_age_yrs',
	'est_sex_and_age_18_yrs_plus',
	'pct_sex_and_age_18_yrs_plus',
	'est_sex_and_age_21_yrs_plus',
	'pct_sex_and_age_21_yrs_plus',
	'est_sex_and_age_62_yrs_plus',
	'pct_sex_and_age_62_yrs_plus',
	'est_sex_and_age_65_yrs_plus',
	'pct_sex_and_age_65_yrs_plus',
	'est_sex_and_age_18_yrs_plus1',
	'pct_sex_and_age_18_yrs_plus1',
	'est_sex_and_age_18_yrs_plus_male',
	'pct_sex_and_age_18_yrs_plus_male',
	'est_sex_and_age_18_yrs_plus_female',
	'pct_sex_and_age_18_yrs_plus_female',
	'est_sex_and_age_65_yrs_plus1',
	'pct_sex_and_age_65_yrs_plus1',
	'est_sex_and_age_65_yrs_plus_male',
	'pct_sex_and_age_65_yrs_plus_male',
	'est_sex_and_age_65_yrs_plus_female',
	'pct_sex_and_age_65_yrs_plus_female',
	'est_race_tot_pop',
	'pct_race_tot_pop',
	'est_race_tot_pop_one_race',
	'pct_race_tot_pop_one_race',
	'est_race_tot_pop_two_or_more_races',
	'pct_race_tot_pop_two_or_more_races',
	'est_race_one_race',
	'pct_race_one_race',
	'est_race_one_race_white',
	'pct_race_one_race_white',
	'est_race_one_race_black',
	'pct_race_one_race_black',
	'est_race_one_race_nat_am_and_alaska_native',
	'pct_race_one_race_nat_am_and_alaska_native',
	'est_race_one_race_asian',
	'pct_race_one_race_asian',
	'est_race_one_race_nat_hi_and_pi',
	'pct_race_one_race_nat_hi_and_pi',
	'est_race_one_race_some_other_race',
	'pct_race_one_race_some_other_race',
	'est_race_two_or_more_races',
	'pct_race_two_or_more_races',
	'est_race_alone_or_in_part_tot_pop',
	'pct_race_alone_or_in_part_tot_pop',
	'est_race_alone_or_in_part_tot_pop_white',
	'pct_race_alone_or_in_part_tot_pop_white',
	'est_race_alone_or_in_part_tot_pop_black',
	'pct_race_alone_or_in_part_tot_pop_black',
	'est_race_alone_or_in_part_tot_pop_nat_am_and_alaska_native',
	'pct_race_alone_or_in_part_tot_pop_nat_am_and_alaska_native',
	'est_race_alone_or_in_part_tot_pop_asian',
	'pct_race_alone_or_in_part_tot_pop_asian',
	'est_race_alone_or_in_part_tot_pop_nat_hi_and_pi',
	'pct_race_alone_or_in_part_tot_pop_nat_hi_and_pi',
	'est_race_alone_or_in_part_tot_pop_some_other_race',
	'pct_race_alone_or_in_part_tot_pop_some_other_race',
	'est_latin_and_race_tot_pop_latin_of_any_race',
	'pct_latin_and_race_tot_pop_latin_of_any_race',
	'est_cit_voting_age_pop_cit_18_plus_pop',
	'pct_cit_voting_age_pop_cit_18_plus_pop',
	'est_cit_voting_age_pop_cit_18_plus_pop_male',
	'pct_cit_voting_age_pop_cit_18_plus_pop_male',
	'est_cit_voting_age_pop_cit_18_plus_pop_female',
	'pct_cit_voting_age_pop_cit_18_plus_pop_female',
	'tot_est_age3plus_enrolled_in_school',
	'pct_est_age3plus_enrolled_in_school',
	'in_public_school_est_age3plus_enrolled_in_school',
	'pct_in_pubsch_est_age3plus_enrolled_in_school',
	'in_private_school_est_age3plus_enrolled_in_school',
	'pct_in_private_school_est_age3plus_enrolled_in_school',
	'avg_00650',
	'avg_00660',
	'avg_00940',
	'avg_01051',
	'avg_32102',
	'avg_34418',
	'avg_34423',
	'avg_39175',
	'max_00650',
	'max_00660',
	'max_00940',
	'max_01051',
	'max_32102',
	'max_34418',
	'max_34423',
	'max_39175',
	'min_00650',
	'min_00660',
	'min_00940',
	'min_01051',
	'min_32102',
	'min_34418',
	'min_34423',
	'min_39175',
	'num_00650',
	'num_00660',
	'num_00940',
	'num_01051',
	'num_32102',
	'num_34418',
	'num_34423',
	'num_39175',
	'thr_00650',
	'thr_00660',
	'thr_00940',
	'thr_01051',
	'thr_32102',
	'thr_34418',
	'thr_34423',
	'thr_39175',
	'pct_thr_00650',
	'pct_thr_00660',
	'pct_thr_01051',
	'pct_thr_32102',
	'pct_thr_34418',
	'pct_thr_34423',
	'pct_thr_39175', 
]

In [11]:
len(feature_names_all)

757

### Convert X and y to numeric values

In [12]:
X = X[feature_names_all]
X = X.apply(pd.to_numeric)

In [13]:
# Binary target: 0 = below 5, 1 = 5 or above (the 5+ and 15+ buckets are merged)
y = df_cond['lead_result_max_bucket']
y = y.replace({'max_below_5': 0, 'max_5_plus': 1, 'max_15_plus': 1})

## Split into train and test sets...

In [14]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=12)

In [15]:
print(collections.Counter(y_test))

Counter({0: 331, 1: 79})
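
The test split is imbalanced (331 negatives, 79 positives), so it may help to keep the majority-class baseline in mind when reading the accuracy figures in the following sections. A minimal sketch, using only the counts printed above; the variable names are illustrative and not part of the notebook:

```python
import collections

# Class counts copied from the Counter output above
test_counts = collections.Counter({0: 331, 1: 79})

n_test = sum(test_counts.values())
majority_class, majority_count = test_counts.most_common(1)[0]

# Accuracy of a trivial model that always predicts the majority class
baseline_accuracy = majority_count / n_test
# Share of positive records in the test split
positive_rate = test_counts[1] / n_test

print("n_test=%d, baseline accuracy=%.2f%%, positive rate=%.2f%%"
      % (n_test, baseline_accuracy * 100.0, positive_rate * 100.0))
```

An accuracy below roughly 80.7% is no better than always predicting class 0, which is why lift and true-positive counts are worth tracking alongside accuracy.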


## Load a previous model from local...

This section shows how to load a previously saved model from disk, which is useful for inspecting or comparing earlier runs.

In [16]:
filename = './models/04-03-2018_AllFeatures-TestTrainVal-SMOTE-XGBoost-ParamOptCV_model.sav'

In [18]:
# open the file in a context manager so the handle is closed after loading
with open(filename, 'rb') as f:
    loaded_model = pickle.load(f)

In [19]:
imp = loaded_model.get_score(importance_type='weight')
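
`get_score` is a method of XGBoost's `Booster`. Whether the pickled object here is a raw `Booster` or a scikit-learn wrapper such as `XGBClassifier` is not visible from the filename, so the sketch below handles both cases. The helper names (`load_pickled_model`, `importance_scores`, `top_features`) are hypothetical and not part of the notebook:

```python
import pickle


def load_pickled_model(path):
    """Load a pickled model, closing the file handle afterwards."""
    with open(path, "rb") as f:
        return pickle.load(f)


def importance_scores(model, importance_type="weight"):
    """Return per-feature importance scores for a Booster or a scikit-learn wrapper.

    Assumption: a wrapper such as XGBClassifier exposes get_booster(),
    while a raw Booster exposes get_score() directly.
    """
    booster = model.get_booster() if hasattr(model, "get_booster") else model
    return booster.get_score(importance_type=importance_type)


def top_features(scores, n=10):
    """Sort a {feature: score} dict by descending score and keep the first n."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Usage would look like `top_features(importance_scores(load_pickled_model(filename)))`. Only unpickle files from a trusted source, since `pickle.load` can execute arbitrary code.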

## Model using a default XGBoost for purposes of understanding the features...

### Fit the model on all features in the training data

In [21]:
model = XGBClassifier()
model.fit(X_train, y_train)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0,
       max_depth=3, min_child_weight=1, missing=None, n_estimators=100,
       n_jobs=1, nthread=None, objective='binary:logistic', random_state=0,
       reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
       silent=True, subsample=1)

### Make predictions for test data and evaluate

In [22]:
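# predict class labels for the held-out test set
y_pred = model.predict(X_test)
# predict() already returns 0/1 labels, so the rounding is only a safeguard
predictions = [round(value) for value in y_pred]
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))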

Accuracy: 77.07%


### Fit the model using each importance as a threshold

In [23]:
thresholds = unique(model.feature_importances_)
print(thresholds)

[ 0.          0.00151745  0.0030349   0.00455235  0.0060698   0.00758725
  0.0091047   0.01062215  0.01213961  0.01365706  0.01517451  0.01669196]
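
Each distinct importance value serves as a cutoff. scikit-learn's `SelectFromModel` keeps the features whose importance is greater than or equal to `threshold`, so the smallest threshold keeps every feature and each larger one prunes more. A minimal pure-Python sketch of that selection rule, using toy importances that are not taken from the notebook:

```python
def select_feature_indices(importances, threshold):
    """Indices of features whose importance is >= threshold."""
    return [i for i, imp in enumerate(importances) if imp >= threshold]


# Toy importances for five hypothetical features (not values from the notebook)
toy_importances = [0.0, 0.2, 0.5, 0.2, 0.1]

# One candidate threshold per distinct importance value, in ascending order
toy_thresholds = sorted(set(toy_importances))

for thresh in toy_thresholds:
    kept = select_feature_indices(toy_importances, thresh)
    print("thresh=%.1f keeps %d features: %s" % (thresh, len(kept), kept))
```

Because ties share a threshold, several features can drop out in a single step, which is why the feature count `n` in the results falls unevenly.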


In [24]:
for thresh in thresholds:
    # select features using threshold
    selection = SelectFromModel(model, threshold=thresh, prefit=True)
    select_X_train = selection.transform(X_train)
    # train model
    selection_model = XGBClassifier()
    selection_model.fit(select_X_train, y_train)
    # eval model
    select_X_test = selection.transform(X_test)
    y_pred = selection_model.predict(select_X_test)
    predictions = [round(value) for value in y_pred]
    accuracy = accuracy_score(y_test, predictions)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
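    # NOTE: "/100" is a fixed label in the format string; the test set
    # actually holds 79 positives (see the Counter output of In [15])
    print("Thresh=%.3f, n=%d, Accuracy: %.2f%%, LIFT: %.2f, tp: %d/100"
          % (thresh, select_X_train.shape[1], accuracy * 100.0,
             lift_score(y_test, y_pred), tp))
    #print("confusion matrix: \n\ttn = %d\n\tfp = %d\n\tfn = %d\n\ttp = %d" % (tn, fp, fn, tp))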

Thresh=0.000, n=757, Accuracy: 77.07%, LIFT: 1.25, tp: 7/100
Thresh=0.002, n=277, Accuracy: 77.07%, LIFT: 1.25, tp: 7/100
Thresh=0.003, n=148, Accuracy: 78.54%, LIFT: 1.66, tp: 8/100
Thresh=0.005, n=90, Accuracy: 77.56%, LIFT: 1.35, tp: 7/100
Thresh=0.006, n=57, Accuracy: 78.29%, LIFT: 1.51, tp: 7/100
Thresh=0.008, n=32, Accuracy: 75.61%, LIFT: 0.23, tp: 1/100
Thresh=0.009, n=18, Accuracy: 75.85%, LIFT: 0.86, tp: 5/100
Thresh=0.011, n=12, Accuracy: 75.85%, LIFT: 0.60, tp: 3/100
Thresh=0.012, n=9, Accuracy: 76.34%, LIFT: 0.80, tp: 4/100
Thresh=0.014, n=8, Accuracy: 77.07%, LIFT: 0.90, tp: 4/100
Thresh=0.015, n=6, Accuracy: 77.56%, LIFT: 0.99, tp: 4/100
Thresh=0.017, n=2, Accuracy: 77.80%, LIFT: 0.37, tp: 1/100
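
To make the LIFT column easier to interpret, the sketch below recomputes it for the first row. It assumes `lift_score` follows the usual definition, precision divided by the prevalence of the positive class (as in `mlxtend.evaluate.lift_score`). The confusion-matrix cells other than `tp` are not printed by the notebook, so they are inferred here from the reported accuracy and the class counts of In [15]:

```python
def lift_from_counts(tn, fp, fn, tp):
    """Lift = precision / prevalence of the positive class."""
    precision = tp / (tp + fp)
    prevalence = (tp + fn) / (tn + fp + fn + tp)
    return precision / prevalence


# Test split from In [15]: 331 negatives, 79 positives
n_neg, n_pos = 331, 79

# First row of the output above: accuracy 77.07%, tp = 7.
# The remaining cells are inferred from those figures, not printed by the notebook.
tp = 7
n_correct = round(0.7707 * (n_neg + n_pos))  # number of correct predictions
tn = n_correct - tp
fp = n_neg - tn
fn = n_pos - tp

print("tn=%d fp=%d fn=%d tp=%d lift=%.2f"
      % (tn, fp, fn, tp, lift_from_counts(tn, fp, fn, tp)))
```

A lift above 1 means the predicted positives are enriched relative to the 19.3% base rate. Rows with LIFT below 1 do worse than flagging records at random.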
