# Linear Probability Model

The purpose of this program is to regress mortgage application approval on applicant, loan, and lender variables in the HMDA data.

$P(\text{Approval} = 1 \mid \lambda_j, \chi_j) = \beta_0 + \lambda_j \cdot \text{Race/Ethnic/Gender Indicators} + \beta_j \cdot \chi_j + \mu$
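
As a minimal sketch of the specification above, a linear probability model is simply ordinary least squares on a 0/1 outcome. The data below are synthetic, and the names `minority`, `log_income`, and `approved` are illustrative assumptions rather than columns from the HMDA file.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Synthetic regressors (hypothetical stand-ins for an indicator and a control).
minority = rng.integers(0, 2, size=n)        # 0/1 group indicator
log_income = rng.normal(4.5, 0.5, size=n)    # continuous control

# True approval probability is linear in the regressors and stays inside [0, 1].
p = 0.30 - 0.10 * minority + 0.10 * log_income
approved = (rng.random(n) < p).astype(float)

# Design matrix with an intercept column; OLS on a binary outcome is the LPM.
X = np.column_stack([np.ones(n), minority, log_income])
beta_hat, *_ = np.linalg.lstsq(X, approved, rcond=None)

# Fitted values are estimated approval probabilities (they can fall outside [0, 1]).
fitted = X @ beta_hat
print(dict(zip(["const", "minority", "log_income"], beta_hat.round(3))))
```

The coefficient on the indicator reads directly as a percentage-point difference in approval probability, which is the main reason to prefer this model over a logit or probit here.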

Variables of Interest
- White
- Black
- Asian
- Other
- Multi-Race Interactions
- Hispanic
- Hispanic and Race Interactions
- Non-Hispanic
- Male 
- Female
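
One possible way to build the indicators listed above is `pd.get_dummies` on the `derived_race`, `derived_ethnicity`, and `derived_sex` columns. This is a sketch on a toy frame; the interaction column name `hispanic_x_white` is a hypothetical choice.

```python
import pandas as pd

# Toy rows using category labels that appear in the HMDA derived_* columns.
df = pd.DataFrame({
    "derived_race": ["White", "Black or African American", "Asian", "White"],
    "derived_ethnicity": ["Not Hispanic or Latino", "Not Hispanic or Latino",
                          "Not Hispanic or Latino", "Hispanic or Latino"],
    "derived_sex": ["Male", "Female", "Joint", "Female"],
})

# One 0/1 column per category, named "<column>_<category>".
dummies = pd.get_dummies(
    df, columns=["derived_race", "derived_ethnicity", "derived_sex"], dtype=int
)

# Example ethnicity-by-race interaction (hypothetical column name).
dummies["hispanic_x_white"] = (
    dummies["derived_ethnicity_Hispanic or Latino"] * dummies["derived_race_White"]
)
print(dummies.filter(like="derived_race").sum())
```

When these columns enter a regression with an intercept, one category per group has to be left out as the reference group to avoid perfect collinearity.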

Control Variables
- Income (log)
- Loan to Value ratio
- Debt to Income ratio
- Loan Amount (log)
- Credit Score Type?
- Lender 
- Co-Applicant
- Region Indicators by Community Tract or county
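
The controls above need some cleaning before they can enter a regression: the sample output further down shows `income` values of 0 and missing, `Exempt` entries in the ratio columns, and `debt_to_income_ratio` reported partly as exact values and partly as ranges. The sketch below shows one way to handle this on toy data; the new column names (`log_income`, `ltv`, `has_coapplicant`) are assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "income": [32.0, 0.0, 115.0, np.nan],
    "loan_amount": [95000.0, 215000.0, 395000.0, 15000.0],
    "loan_to_value_ratio": ["90.0", np.nan, "100.0", "Exempt"],
    "debt_to_income_ratio": ["44", np.nan, "30%-<36%", "Exempt"],
    "co-applicant_sex": [2, 2, 5, 5],
})

# Logs are undefined for zero or negative values, so those become NaN.
df["log_income"] = np.log(df["income"].where(df["income"] > 0))
df["log_loan_amount"] = np.log(df["loan_amount"].where(df["loan_amount"] > 0))

# 'Exempt' and any other non-numeric entry is coerced to NaN.
df["ltv"] = pd.to_numeric(df["loan_to_value_ratio"], errors="coerce")

# DTI mixes exact values with ranges, so it is kept as a categorical control.
dti = df["debt_to_income_ratio"].where(df["debt_to_income_ratio"] != "Exempt")
dti_dummies = pd.get_dummies(dti, prefix="dti", dtype=int)

# Code 5 in co-applicant_sex means "no co-applicant" in the HMDA data dictionary.
df["has_coapplicant"] = (df["co-applicant_sex"] != 5).astype(int)
print(df[["log_income", "log_loan_amount", "ltv", "has_coapplicant"]])
```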

Possible Other Control Variables
- Credit Score Type
- Pre-Approval indicators
- AUS (Automated Underwriting System) Decision

Possible Filters
- Loan Type
- Pre-Approval
- Purchaser Type
- Loan Purpose
- DTI and LTV outliers
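
The filters above can be expressed as boolean masks. The specific choices here (conventional home-purchase loans, LTV between 0 and 150) are hypothetical placeholders, not decisions made in this notebook. In the HMDA coding, `loan_type == 1` is conventional and `loan_purpose == 1` is home purchase.

```python
import pandas as pd

df = pd.DataFrame({
    "loan_type": [1, 1, 3, 1],
    "loan_purpose": [1, 31, 1, 1],
    "ltv": [80.0, 90.0, 100.0, 950.0],   # assumed already numeric
})

# Hypothetical filter: conventional (1) home-purchase (1) applications only.
mask = (df["loan_type"] == 1) & (df["loan_purpose"] == 1)

# Hypothetical outlier rule: keep LTV in (0, 150]; NaN values fail both comparisons.
mask &= (df["ltv"] > 0) & (df["ltv"] <= 150)

filtered = df.loc[mask].reset_index(drop=True)
print(f"kept {len(filtered)} of {len(df)} rows")
```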

Clustered Standard Errors
- by Lender
- by Region
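
To make the clustering concrete, the sketch below computes cluster-robust (sandwich) standard errors by hand with NumPy on synthetic data, with no small-sample correction. The helper `cluster_robust_se` and the variable `lender` are hypothetical names introduced for illustration.

```python
import numpy as np

def cluster_robust_se(X, y, clusters):
    """OLS coefficients with cluster-robust standard errors (no small-sample correction)."""
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        idx = clusters == g
        score = X[idx].T @ resid[idx]    # sum of x_i * u_i within cluster g
        meat += np.outer(score, score)
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))

# Synthetic data: 50 lenders with 40 applications each and a lender-level shock.
rng = np.random.default_rng(1)
n_lenders, per_lender = 50, 40
lender = np.repeat(np.arange(n_lenders), per_lender)
x = rng.normal(size=lender.size) + rng.normal(size=n_lenders)[lender]
shock = rng.normal(scale=0.2, size=n_lenders)[lender]
noise = rng.normal(scale=0.3, size=lender.size)
y = (0.1 * x + shock + noise > 0).astype(float)

X = np.column_stack([np.ones(lender.size), x])
beta, se = cluster_robust_se(X, y, lender)
print("coefficients:", beta.round(3), "clustered SE:", se.round(3))
```

In practice a library routine is preferable; for example, statsmodels accepts `cov_type="cluster"` with a `groups` argument when fitting OLS, and applies a small-sample correction by default, so its numbers will differ slightly from this sketch.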

Other regressions to run with similar controls:
- Interest Rates
- Denial Rates
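
Both the approval outcome and the denial outcome have to be derived from `action_taken`. The mapping below is one possible coding and is an assumption: originated loans (1) and approved-but-not-accepted applications (2) count as approvals, denials (3) count as zero, and all other outcomes (withdrawn, incomplete, purchased loans, preapproval requests) are dropped.

```python
import pandas as pd

df = pd.DataFrame({"action_taken": [1, 2, 3, 4, 5, 6]})

# Assumed coding of the dependent variable; codes not in the map become NaN.
outcome_map = {1: 1, 2: 1, 3: 0}
df["approved"] = df["action_taken"].map(outcome_map)

# Keep only applications with a clear approve/deny decision.
model_df = df.dropna(subset=["approved"]).astype({"approved": int})
model_df["denied"] = 1 - model_df["approved"]
print(model_df)
```

Note that every row shown in the raw-data preview below has `action_taken == 6` (purchased loan), so under this coding those rows would not enter the regression.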

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# np.set_printoptions(precision=3, suppress=True)

#This will allow all columns to be displayed when reviewing the data.
pd.options.display.max_columns = None

The next cell imports TensorFlow and checks whether a GPU is available.

In [2]:
import sys

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing

# Report the Python and TensorFlow versions, then check CUDA support and GPU availability.
print(sys.version)
print(tf.version.VERSION)
print("Built with CUDA:", tf.test.is_built_with_cuda())

gpus = tf.config.list_physical_devices('GPU')
print("Num GPUs Available: ", len(gpus))
print("GPU is", "available" if gpus else "NOT AVAILABLE")

ModuleNotFoundError: No module named 'tensorflow'

## Filter the Dataset before running the model.

The cells below load and manipulate the dataset before it is passed to the model function.
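
Because the raw file has about 17.5 million rows and several columns that mix numbers with strings such as `Exempt`, it can help to read only the needed columns with explicit dtypes. This is a sketch on an in-memory CSV; the column selection is an assumption.

```python
import io
import pandas as pd

# Tiny in-memory stand-in for the raw HMDA file.
csv_text = (
    "lei,action_taken,loan_to_value_ratio,debt_to_income_ratio,unused\n"
    "A1,1,90.0,44,x\n"
    "B2,3,Exempt,30%-<36%,y\n"
)

# Mixed-content columns are read as strings and cleaned later.
dtypes = {
    "lei": str,
    "action_taken": "int64",
    "loan_to_value_ratio": str,
    "debt_to_income_ratio": str,
}

df = pd.read_csv(io.StringIO(csv_text), usecols=list(dtypes), dtype=dtypes)
print(df.dtypes)
```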

In [3]:
# Load in HMDA Data
HMDA_raw_csv_file_location = r'2019_HMDA_raw.csv'

#Do not change this code! 
HMDA_raw_CSV = pd.read_csv(HMDA_raw_csv_file_location)
HMDA_raw_CSV


Unnamed: 0,activity_year,lei,derived_msa-md,state_code,county_code,census_tract,conforming_loan_limit,derived_loan_product_type,derived_dwelling_category,derived_ethnicity,derived_race,derived_sex,action_taken,purchaser_type,preapproval,loan_type,loan_purpose,lien_status,reverse_mortgage,open-end_line_of_credit,business_or_commercial_purpose,loan_amount,loan_to_value_ratio,interest_rate,rate_spread,hoepa_status,total_loan_costs,total_points_and_fees,origination_charges,discount_points,lender_credits,loan_term,prepayment_penalty_term,intro_rate_period,negative_amortization,interest_only_payment,balloon_payment,other_nonamortizing_features,property_value,construction_method,occupancy_type,manufactured_home_secured_property_type,manufactured_home_land_property_interest,total_units,multifamily_affordable_units,income,debt_to_income_ratio,applicant_credit_score_type,co-applicant_credit_score_type,applicant_ethnicity-1,applicant_ethnicity-2,applicant_ethnicity-3,applicant_ethnicity-4,applicant_ethnicity-5,co-applicant_ethnicity-1,co-applicant_ethnicity-2,co-applicant_ethnicity-3,co-applicant_ethnicity-4,co-applicant_ethnicity-5,applicant_ethnicity_observed,co-applicant_ethnicity_observed,applicant_race-1,applicant_race-2,applicant_race-3,applicant_race-4,applicant_race-5,co-applicant_race-1,co-applicant_race-2,co-applicant_race-3,co-applicant_race-4,co-applicant_race-5,applicant_race_observed,co-applicant_race_observed,applicant_sex,co-applicant_sex,applicant_sex_observed,co-applicant_sex_observed,applicant_age,co-applicant_age,applicant_age_above_62,co-applicant_age_above_62,submission_of_application,initially_payable_to_institution,aus-1,aus-2,aus-3,aus-4,aus-5,denial_reason-1,denial_reason-2,denial_reason-3,denial_reason-4,tract_population,tract_minority_population_percent,ffiec_msa_md_median_family_income,tract_to_msa_income_percentage,tract_owner_occupied_units,tract_one_to_four_family_homes,tract_median_age_of_housing_units
0,2019,KB1H1DSPRFMYMCUFXT09,26420,TX,48167.0,48167722001,C,Conventional:First Lien,Single Family (1-4 Units):Site-Built,Hispanic or Latino,White,Male,6,1,2,1,1,1,2,2,2,215000.0,,4.25,,2,5374.98,,2557.55,,,360,,,2,2,2,2,225000.0,1,1,3,5,1,,113.0,,9,9,1.0,11.0,,,,5.0,,,,,2,4,5.0,,,,,8.0,,,,,2,4,1,5,2,4,35-44,9999,No,,3,3,6,,,,,10,,,,5457,41.73,77100,104,1682,1895,23
1,2019,KB1H1DSPRFMYMCUFXT09,26420,TX,48071.0,48071710100,C,Conventional:First Lien,Single Family (1-4 Units):Site-Built,Not Hispanic or Latino,White,Male,6,1,2,1,1,1,2,2,2,255000.0,,4.25,,2,9823.34,,2682.5,,,360,,,2,2,2,2,275000.0,1,1,3,5,1,,62.0,,9,9,2.0,,,,,5.0,,,,,2,4,5.0,,,,,8.0,,,,,2,4,1,5,2,4,25-34,9999,No,,3,3,6,,,,,10,,,,8119,19.53,77100,130,1961,2443,18
2,2019,KB1H1DSPRFMYMCUFXT09,47894,VA,51177.0,51177020305,C,Conventional:First Lien,Single Family (1-4 Units):Site-Built,Not Hispanic or Latino,White,Joint,6,1,2,1,1,1,2,2,2,195000.0,,3.75,,2,3410.68,,1434.88,239.88,,360,,,2,2,2,2,205000.0,1,1,3,5,1,,43.0,,9,9,2.0,,,,,2.0,,,,,2,2,5.0,,,,,5.0,,,,,2,2,1,2,2,2,45-54,35-44,No,No,3,3,6,,,,,10,,,,3533,34.59,114700,61,659,1098,30
3,2019,KB1H1DSPRFMYMCUFXT09,31180,TX,48303.0,48303010302,C,Conventional:First Lien,Single Family (1-4 Units):Site-Built,Not Hispanic or Latino,White,Joint,6,1,2,1,1,1,2,2,2,385000.0,,4.375,,2,4155.35,,285.0,,1148.74,360,,,2,2,2,2,415000.0,1,1,3,5,1,,140.0,,9,9,2.0,,,,,2.0,,,,,2,2,5.0,,,,,5.0,,,,,2,2,1,2,2,2,25-34,25-34,No,No,3,3,6,,,,,10,,,,2693,29.37,61900,123,674,1009,35
4,2019,KB1H1DSPRFMYMCUFXT09,41700,TX,48325.0,48325000102,C,Conventional:First Lien,Single Family (1-4 Units):Site-Built,Hispanic or Latino,White,Male,6,3,2,1,1,1,2,2,2,245000.0,,4.25,,2,4568.38,,1535.0,,,360,,,2,2,2,2,265000.0,1,1,3,5,1,,62.0,,9,9,1.0,11.0,,,,5.0,,,,,2,4,5.0,,,,,8.0,,,,,2,4,1,5,2,4,55-64,9999,No,,3,3,6,,,,,10,,,,8455,39.34,71000,133,2453,3293,21
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17545452,2019,KB1H1DSPRFMYMCUFXT09,41700,TX,48029.0,48029140200,C,Conventional:First Lien,Single Family (1-4 Units):Site-Built,Hispanic or Latino,White,Female,6,1,2,1,1,1,2,2,2,355000.0,,4.375,,2,5537.24,,1535.0,,,360,,,2,2,2,2,395000.0,1,1,3,5,1,,200.0,,9,9,1.0,,,,,5.0,,,,,2,4,5.0,,,,,8.0,,,,,2,4,2,5,2,4,35-44,9999,No,,3,3,6,,,,,10,,,,2767,82.29,71000,64,576,1029,73
17545453,2019,KB1H1DSPRFMYMCUFXT09,41700,TX,48029.0,48029121906,C,Conventional:First Lien,Single Family (1-4 Units):Site-Built,Not Hispanic or Latino,White,Male,6,1,2,1,1,1,2,2,2,255000.0,,4.375,,2,7348.98,,4100.0,,1500.0,360,,,2,2,2,2,285000.0,1,1,3,5,1,,92.0,,9,9,2.0,,,,,5.0,,,,,2,4,5.0,,,,,8.0,,,,,2,4,1,5,2,4,35-44,9999,No,,3,3,6,,,,,10,,,,5661,42.41,71000,158,1401,1635,18
17545454,2019,KB1H1DSPRFMYMCUFXT09,23104,TX,48439.0,48439113813,C,Conventional:First Lien,Single Family (1-4 Units):Site-Built,Not Hispanic or Latino,White,Joint,6,1,2,1,1,1,2,2,2,305000.0,,4.375,,2,4752.42,,1250.0,,,360,,,2,2,2,2,345000.0,1,1,3,5,1,,90.0,,9,9,2.0,,,,,2.0,,,,,2,2,5.0,,,,,5.0,,,,,2,2,1,2,2,2,35-44,35-44,No,No,3,3,6,,,,,10,,,,4940,20.04,75300,176,1647,1689,21
17545455,2019,KB1H1DSPRFMYMCUFXT09,39580,NC,37183.0,37183053725,C,Conventional:First Lien,Single Family (1-4 Units):Site-Built,Not Hispanic or Latino,White,Joint,6,1,2,1,1,1,2,2,2,185000.0,,3.875,,2,2797.0,,899.0,,,360,,,2,2,2,2,205000.0,1,1,3,5,1,,73.0,,9,9,2.0,,,,,2.0,,,,,1,1,5.0,,,,,5.0,,,,,1,1,1,2,1,1,<25,<25,No,No,3,3,6,,,,,10,,,,5147,25.65,93100,148,1232,1348,16


In [6]:
#Smaller Sample for testing code.
HMDA_sample = HMDA_raw_CSV.sample(100)
HMDA_sample.to_csv('2019_HMDA_sample.csv',index = False)

In [8]:
#Load in Smaller Sample
HMDA_sample_file_location = r'2019_HMDA_sample.csv'

#Do not change this code! 
HMDA_sample = pd.read_csv(HMDA_sample_file_location)
HMDA_sample

Unnamed: 0,activity_year,lei,derived_msa-md,state_code,county_code,census_tract,conforming_loan_limit,derived_loan_product_type,derived_dwelling_category,derived_ethnicity,derived_race,derived_sex,action_taken,purchaser_type,preapproval,loan_type,loan_purpose,lien_status,reverse_mortgage,open-end_line_of_credit,business_or_commercial_purpose,loan_amount,loan_to_value_ratio,interest_rate,rate_spread,hoepa_status,total_loan_costs,total_points_and_fees,origination_charges,discount_points,lender_credits,loan_term,prepayment_penalty_term,intro_rate_period,negative_amortization,interest_only_payment,balloon_payment,other_nonamortizing_features,property_value,construction_method,occupancy_type,manufactured_home_secured_property_type,manufactured_home_land_property_interest,total_units,multifamily_affordable_units,income,debt_to_income_ratio,applicant_credit_score_type,co-applicant_credit_score_type,applicant_ethnicity-1,applicant_ethnicity-2,applicant_ethnicity-3,applicant_ethnicity-4,applicant_ethnicity-5,co-applicant_ethnicity-1,co-applicant_ethnicity-2,co-applicant_ethnicity-3,co-applicant_ethnicity-4,co-applicant_ethnicity-5,applicant_ethnicity_observed,co-applicant_ethnicity_observed,applicant_race-1,applicant_race-2,applicant_race-3,applicant_race-4,applicant_race-5,co-applicant_race-1,co-applicant_race-2,co-applicant_race-3,co-applicant_race-4,co-applicant_race-5,applicant_race_observed,co-applicant_race_observed,applicant_sex,co-applicant_sex,applicant_sex_observed,co-applicant_sex_observed,applicant_age,co-applicant_age,applicant_age_above_62,co-applicant_age_above_62,submission_of_application,initially_payable_to_institution,aus-1,aus-2,aus-3,aus-4,aus-5,denial_reason-1,denial_reason-2,denial_reason-3,denial_reason-4,tract_population,tract_minority_population_percent,ffiec_msa_md_median_family_income,tract_to_msa_income_percentage,tract_owner_occupied_units,tract_one_to_four_family_homes,tract_median_age_of_housing_units
0,2019,5493000YNV8IX4VD3X12,99999,KY,21187.0,2.118797e+10,C,Conventional:First Lien,Single Family (1-4 Units):Manufactured,Not Hispanic or Latino,White,Female,5,0,2,1,1,1,2,2,2,95000.0,,,,3,,,,,,276,,,2,2,2,2,,2,1,1,1,1,,32.0,,9,9,2.0,,,,,2.0,,,,,2,2,5.0,,,,,5.0,,,,,2,2,2,2,2,2,25-34,45-54,No,No,1,1,6,,,,,10,,,,3297,5.58,49800,135,1007,1471,31
1,2019,549300O6Z0I6KYMESL47,16984,IL,17031.0,1.703181e+10,C,Conventional:First Lien,Single Family (1-4 Units):Site-Built,Hispanic or Latino,White,Joint,4,0,2,1,1,1,2,2,1,215000.0,,,,3,,,,,,360.0,,,2,1,2,2,,1,3,3,5,1,,0.0,,9,9,1.0,,,,,1.0,,,,,2,2,5.0,,,,,5.0,,,,,2,2,1,2,2,2,55-64,>74,No,Yes,2,1,6,,,,,10,,,,4279,39.10,82000,112,1331,1484,57
2,2019,549300XRXBA38J60S618,99999,ID,16039.0,1.603996e+10,C,Conventional:First Lien,Single Family (1-4 Units):Site-Built,Not Hispanic or Latino,White,Male,1,0,2,1,1,1,2,2,2,345000.0,90.0,4.375,0.462,2,4633.0,,1595.0,,,360,,,2,2,2,2,385000,1,1,3,5,1,,0.0,44,3,10,2.0,,,,,3.0,,,,,2,2,5.0,,,,,6.0,,,,,2,2,1,1,2,2,55-64,65-74,Yes,Yes,1,1,1,,,,,10,,,,2570,22.49,57100,89,854,1916,39
3,2019,549300NBFJM2PBH24L77,48620,KS,20173.0,2.017301e+10,C,Conventional:First Lien,Single Family (1-4 Units):Manufactured,Not Hispanic or Latino,White,Male,1,0,2,1,31,1,1111,1111,1111,15000.0,Exempt,Exempt,Exempt,3,Exempt,Exempt,Exempt,Exempt,Exempt,Exempt,Exempt,Exempt,1111,1111,1111,1111,Exempt,2,3,1111,1111,1,Exempt,85.0,Exempt,1111,1111,2.0,,,,,5.0,,,,,1,4,5.0,,,,,8.0,,,,,1,4,1,5,1,4,25-34,9999,No,,1111,1111,1111,,,,,1111,,,,5346,20.50,69500,126,1667,1904,24
4,2019,C5654JQHZUHN0772B561,47894,VA,51683.0,5.168391e+10,C,VA:First Lien,Single Family (1-4 Units):Site-Built,Not Hispanic or Latino,White,Male,1,2,2,3,1,1,2,2,2,395000.0,100.0,3.5,-0.1828,2,11566.57,,497.98,497.98,,360,,,2,2,2,2,395000.0,1,1,3,5,1,,115.0,30%-<36%,1,10,2.0,,,,,5.0,,,,,2,4,5.0,,,,,8.0,,,,,2,4,1,5,2,4,25-34,9999,No,,1,1,1,,,,,10,,,,5854,51.81,114700,76,1025,1355,29
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,2019,254900FL0I8OUMMBOZ11,38060,AZ,4013.0,4.013072e+09,C,Conventional:First Lien,Single Family (1-4 Units):Site-Built,Ethnicity Not Available,Native Hawaiian or Other Pacific Islander,Male,1,0,2,1,31,1,2,2,1,125000.0,85.0,11.0,,3,,,,,,11.0,,,2,1,1,2,145000.0,1,3,3,5,1,,,,9,10,3.0,,,,,5.0,,,,,2,4,5.0,44.0,,,,8.0,,,,,2,4,1,5,2,4,8888,9999,,,1,1,6,,,,,10,,,,3031,8.02,72900,65,1701,2620,49
96,2019,549300NOCASXPA34X033,15180,TX,48061.0,4.806101e+10,C,Conventional:First Lien,Single Family (1-4 Units):Site-Built,Ethnicity Not Available,White,Male,3,0,2,1,32,1,2,2,2,115000.0,80.0,,,3,,,,,,360.0,,,2,2,2,2,145000.0,1,1,3,5,1,,,,9,10,3.0,,,,,5.0,,,,,2,4,5.0,,,,,8.0,,,,,2,4,1,5,2,4,55-64,9999,Yes,,1,2,6,,,,,7,,,,7046,90.41,44000,123,1686,2246,32
97,2019,B4TYDEB6GKMZO031MB27,22744,FL,12011.0,1.201104e+10,C,Conventional:Subordinate Lien,Single Family (1-4 Units):Site-Built,Ethnicity Not Available,Race Not Available,Male,4,0,2,1,2,2,2,1,2,105000.0,,,,3,,,,,,360,36,1,2,2,2,2,,1,1,3,5,1,,160.0,,9,9,3.0,,,,,5.0,,,,,2,4,6.0,,,,,8.0,,,,,2,4,1,5,2,4,35-44,9999,No,,1,1,6,,,,,10,,,,4961,25.06,68600,173,1389,1705,49
98,2019,549300CY7WNAHKHYSJ73,99999,WA,53037.0,5.303798e+10,NC,Conventional:First Lien,Single Family (1-4 Units):Site-Built,Not Hispanic or Latino,White,Joint,1,0,2,1,1,1,2,2,2,615000.0,77.22,3.875,0.437,2,25867.65,,21239.25,13126.89,,369,,,2,1,2,2,805000.0,1,1,3,5,1,,151.0,20%-<30%,2,9,2.0,,,,,2.0,,,,,2,2,5.0,,,,,5.0,,,,,2,2,2,1,2,2,25-34,25-34,No,No,1,1,1,,,,,10,,,,4562,5.09,63500,110,1412,2038,20


In [None]:
CommunityArea_and_topholder_file = r'CommunityARea and Topholder.xlsx'
Community_Area_List = pd.read_excel(CommunityArea_and_topholder_file)
# Community_Area_List[:5]

In [None]:
# Quick merge: attach Community_Area_List to the HMDA records by census tract.
table_1 = HMDA_raw_CSV
table_2 = Community_Area_List
table_1_column = 'census_tract'
table_2_column = 'FIPS'

#Do not change this code!
# pd.merge defaults to an inner join, so rows without a match in both tables are dropped.
merged_data = pd.merge(table_1, table_2, left_on = table_1_column, right_on = table_2_column)

print('Merge complete')
# merged_data[:5]
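
The preview above shows `census_tract` displayed in floating-point notation (for example `2.118797e+10`), so an inner join against `FIPS` can drop rows if the two key columns are stored differently. The sketch below normalizes both keys to 11-character strings and reports unmatched rows. The helper `tract_key`, the `community_area` column, and the assumption that `FIPS` holds 11-digit tract codes are all hypothetical.

```python
import pandas as pd

hmda = pd.DataFrame({
    "census_tract": [48167722001.0, 4013072000.0, None],
    "loan_amount": [215000.0, 125000.0, 95000.0],
})
areas = pd.DataFrame({
    "FIPS": ["48167722001", "04013072000"],
    "community_area": ["A", "B"],   # hypothetical column
})

def tract_key(s):
    """Format tract identifiers as zero-padded 11-character strings; missing stays missing."""
    num = pd.to_numeric(s, errors="coerce")
    return num.map(lambda v: f"{int(v):011d}" if pd.notna(v) else None)

hmda["tract_key"] = tract_key(hmda["census_tract"])
areas["tract_key"] = tract_key(areas["FIPS"])

# A left join keeps every HMDA row; `indicator` adds a `_merge` column showing which matched.
merged = pd.merge(hmda, areas, on="tract_key", how="left",
                  validate="many_to_one", indicator=True)
print(merged["_merge"].value_counts())
```

The `validate="many_to_one"` argument raises an error if a tract appears more than once in the lookup table, which would otherwise silently duplicate HMDA rows.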

In [None]:
regulators_and_supervisors_file_location = r'2020 Regulator and Supervisor Mod1.xlsx'
regulators_and_supervisors = pd.read_excel(regulators_and_supervisors_file_location)

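# Attach regulator and supervisor information to each record by lender identifier (LEI).
table_3 = merged_data
table_4 = regulators_and_supervisors
table_3_column = 'lei'
table_4_column = 'lei'

#Do not change this code!
# Inner join again: lenders missing from the regulator file are dropped.
merged_data_full = pd.merge(table_3, table_4, left_on = table_3_column, right_on = table_4_column)

print('Merge complete')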

In [None]:
merged_data_full.head()