# Linear Probability Model

The purpose of this program is to regress a mortgage approval variable against race, ethnicity, gender, and other control variables found in HMDA data, using the model below.

$P(Approval = 1 \mid Race/Sex, \chi_j) = \beta_0 + \lambda_j \cdot Race/Sex + \beta_j \cdot \chi_j + \mu$

Where $\lambda_j$ are the coefficients on the variables of interest, $\beta_j$ are the coefficients on the control variables, $\chi_j$ are the control variables, and $\mu$ is the error term.
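In a linear probability model with a single indicator, the intercept is the approval rate of the omitted group and the coefficient on the indicator is the difference in approval rates. The sketch below illustrates this with made-up toy data (not HMDA values); the variable names are assumptions for illustration only.

```python
import numpy as np

# Toy data (hypothetical): 8 applications, approved = 1 / denied = 0.
approved = np.array([1, 1, 1, 0, 1, 0, 1, 0], dtype=float)
# Indicator for the group of interest (0 = omitted base group).
group = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

# Design matrix with an intercept column and the group indicator.
X = np.column_stack([np.ones_like(group), group])

# Ordinary least squares on the binary outcome.
beta0, lam = np.linalg.lstsq(X, approved, rcond=None)[0]

# Group approval rates computed directly, for comparison.
base_rate = approved[group == 0].mean()
group_rate = approved[group == 1].mean()
print(beta0, lam, base_rate, group_rate - base_rate)
```

With controls added, the coefficient is instead read as the difference in approval probability holding those controls fixed.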

Variables of Interest
- White
- Black
- Asian
- Hispanic
- Other
- Male 
- Female

Control Variables
- Income (log)
- Loan to Value ratio
- Debt to Income ratio
- Loan Amount (log)
- Pre-Approval indicators

Fixed Effects (possibly to include)
- Lender
- Region indicators by census tract or county
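Fixed effects can be included either as one dummy per lender or region, or by demeaning every variable within each group (the within transformation). The sketch below, on simulated data with hypothetical names (`lender`, `x`, `y`), suggests the two approaches give the same slope; it is an illustration, not the notebook's estimation code.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "lender": rng.integers(0, 10, size=n),  # hypothetical lender IDs
    "x": rng.normal(size=n),                # hypothetical control variable
})
lender_effect = rng.normal(size=10)         # one unobserved effect per lender
df["y"] = (0.5 * df["x"].to_numpy()
           + lender_effect[df["lender"].to_numpy()]
           + rng.normal(scale=0.1, size=n))

# (a) Dummy-variable regression: x plus one indicator column per lender.
dummies = pd.get_dummies(df["lender"], dtype=float).to_numpy()
X_dummy = np.column_stack([df["x"].to_numpy(), dummies])
beta_dummy = np.linalg.lstsq(X_dummy, df["y"].to_numpy(), rcond=None)[0][0]

# (b) Within transformation: subtract lender means, then regress y on x.
x_within = (df["x"] - df.groupby("lender")["x"].transform("mean")).to_numpy()
y_within = (df["y"] - df.groupby("lender")["y"].transform("mean")).to_numpy()
beta_within = np.dot(x_within, y_within) / np.dot(x_within, x_within)

print(beta_dummy, beta_within)
```

Demeaning avoids building a very wide dummy matrix, which may matter when lender and census tract identifiers have many levels.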

Variables omitted from the model to prevent perfect collinearity.
- White
- Male
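The reason for omitting one category per group can be checked numerically: with an intercept, the full set of race dummies sums to the intercept column, so the design matrix loses rank. A minimal sketch on a made-up six-row frame (the values are hypothetical, not from the HMDA sample):

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values chosen so every dummy varies independently).
toy = pd.DataFrame({
    "Race": ["White", "Black", "Asian", "White", "Latinx", "Black"],
    "Sex":  ["Male", "Female", "Female", "Female", "Male", "Male"],
})

# Full dummy set plus an intercept: Race dummies sum to 1, and so do Sex dummies.
full = pd.get_dummies(toy, dtype=float)
full.insert(0, "Intercept", 1.0)
rank_full = np.linalg.matrix_rank(full.to_numpy())

# Dropping the reference categories restores full column rank.
reduced = full.drop(columns=["Race_White", "Sex_Male"])
rank_reduced = np.linalg.matrix_rank(reduced.to_numpy())

print(full.shape[1], rank_full)        # columns vs. rank, full dummy set
print(reduced.shape[1], rank_reduced)  # columns vs. rank, references dropped
```

The formula interface used later in the notebook drops a reference level automatically; the `0_` prefix on `0_White` and `0_Male` in the data appears to make those levels sort first so they become the reference.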

Filters
- Loan Purpose
- Occupancy Type

Clustered Standard errors
- by Lender
- by Region
- by County
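Clustering allows errors to be correlated within a lender, region, or county. The sketch below implements the basic cluster-robust sandwich estimator in NumPy on simulated data; `cluster_robust_se` and the `county` labels are hypothetical, and no small-sample correction is applied, so library output need not match it exactly.

```python
import numpy as np

def cluster_robust_se(X, y, groups):
    """OLS coefficients and cluster-robust standard errors (no small-sample correction)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        mask = groups == g
        score = X[mask].T @ resid[mask]  # sum of x_i * u_i within the cluster
        meat += np.outer(score, score)
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(1)
n = 400
county = rng.integers(0, 20, size=n)   # hypothetical cluster labels
x = rng.normal(size=n)
shock = rng.normal(size=20)[county]    # error component shared within a county
y = 1.0 + 0.3 * x + shock + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

beta, se_cluster = cluster_robust_se(X, y, county)
# Treating every observation as its own cluster reduces to White's HC0 errors.
_, se_hc0 = cluster_robust_se(X, y, np.arange(n))
print(se_cluster, se_hc0)
```

When errors share a common component within clusters, as simulated here, ignoring the clustering tends to understate the uncertainty of the estimates.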

Other regressions to run that will use similar controls.
- Simplified Model (just variables of interest)
- Restricted Model
- Interest Rates
- Denial Rates
- Fixed Effects Model
- Years other than 2019

In [24]:
import pandas as pd
import numpy as np
from linearmodels import PanelOLS
import statsmodels.api as sm
from statsmodels.formula.api import ols

# np.set_printoptions(precision=3, suppress=True)

#This will allow all columns to be displayed when reviewing the data.
pd.options.display.max_columns = None

In [25]:
'''
import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing

print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
tf.test.is_built_with_cuda()
print(tf.version.VERSION)
import sys
print(sys.version)
gpu = len(tf.config.list_physical_devices('GPU'))>0
print("GPU is", "available" if gpu else "NOT AVAILABLE")
'''


## Load in and manipulate dataset.

The cells below manipulate the dataset before it is run through the models.

In [26]:
# Load in HMDA Data
HMDA_clean_file_location = r'2019 HMDA Clean IL SAMPLE.csv'
HMDA_clean_0 = pd.read_csv(HMDA_clean_file_location)
HMDA_clean_0

Unnamed: 0,Year,Lender_LEI,State,County_Code,Census_Tract,Approved,Denied,Race,Sex,Income,Log_Income,Loan_Amount,Log_Loan_Amount,LTV,Loan_Type,DTI_Ratio,preapproval,Occupancy_Type,Index
0,2019,549300U3721PJGQZYY68,IL,17031.0,1.703183e+10,1,0,Black,Female,54.0,3.988984,155000.0,11.951180,99.690,Conventional,44,2,1,497447
1,2019,2WHM8VNJH63UN14OL754,IL,17031.0,1.703105e+10,1,0,0_White,0_Male,164.0,5.099866,375000.0,12.834681,90.000,Conventional,38,2,1,485056
2,2019,549300BRJZYHYKT4BJ84,IL,17063.0,1.706300e+10,1,0,Latinx,Female,53.0,3.970292,155000.0,11.951180,101.010,RHS or FSA,36,2,1,89546
3,2019,549300RBX56T2MW5HO19,IL,17031.0,1.703161e+10,0,1,Black,0_Male,71.0,4.262680,355000.0,12.779873,100.000,VA,37,2,1,97097
4,2019,549300HGDJQ37M5BE268,IL,17167.0,1.716700e+10,1,0,0_White,Female,44.0,3.784190,165000.0,12.013701,97.000,Conventional,41,2,1,116672
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22470,2019,549300FGXN1K3HLB1R50,IL,17031.0,1.703183e+10,1,0,0_White,0_Male,177.0,5.176150,155000.0,11.951180,95.000,Conventional,20%-<30%,2,1,188823
22471,2019,549300SELI3XCH3UZW80,IL,17089.0,1.708985e+10,1,0,0_White,Female,74.0,4.304065,255000.0,12.449019,95.000,Conventional,49,2,1,7044
22472,2019,KV8W1JTB8FZ821S5ED75,IL,17097.0,1.709786e+10,0,1,0_White,0_Male,98.0,4.584967,325000.0,12.691580,90.000,Conventional,42,2,1,326056
22473,2019,254900D8UVDBN0LNLV64,IL,17031.0,1.703126e+10,1,0,Latinx,Female,48.0,3.871201,195000.0,12.180755,96.500,FHA,50%-60%,2,1,472329


In [27]:
HMDA_clean_0.columns

Index(['Year', 'Lender_LEI', 'State', 'County_Code', 'Census_Tract',
       'Approved', 'Denied', 'Race', 'Sex', 'Income', 'Log_Income',
       'Loan_Amount', 'Log_Loan_Amount', 'LTV', 'Loan_Type', 'DTI_Ratio',
       'preapproval', 'Occupancy_Type', 'Index'],
      dtype='object')

### Check for further cleaning

In [28]:
#HMDA_clean.info()

In [29]:
#Clean df
HMDA_clean_1 = HMDA_clean_0.copy()
HMDA_clean_1 = HMDA_clean_1.dropna()
HMDA_clean_1['Census_Tract'] = HMDA_clean_1['Census_Tract'].apply(str)
#HMDA_clean.info()
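The sample rows above show `DTI_Ratio` mixing single percentages (such as `44`) with ranges (such as `20%-<30%`), so the formula interface would treat every distinct value as its own category. One possible further cleaning step, sketched below, groups the single percentages into one bucket. The helper `dti_bucket` and the `36%-<50%` label are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

def dti_bucket(value):
    """Map a raw DTI_Ratio entry to a coarse bucket label (hypothetical helper)."""
    text = str(value).strip()
    # Single percentages between 36 and 49 are grouped into one bucket.
    if text.isdigit() and 36 <= int(text) < 50:
        return "36%-<50%"
    # Range labels such as "20%-<30%" pass through unchanged.
    return text

# Values taken from the sample rows shown above, plus one integer entry.
sample = pd.Series(["44", "38", "20%-<30%", "50%-60%", 49])
buckets = sample.map(dti_bucket)
print(buckets.tolist())
```

Applied to the real column, this would be `HMDA_clean_1['DTI_Ratio'].map(dti_bucket)`, assuming the remaining labels are already range strings.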

The cell below restricts the sample to principal residences. It omits second residences and investment properties.

In [30]:
# Occupancy_Type codes: 1 = Principal Residence, 2 = Second Residence, 3 = Investment Property.
HMDA_clean = HMDA_clean_1[HMDA_clean_1["Occupancy_Type"] == 1]

### Model 0 - Race*Sex Interaction Only

In [31]:
No_Controls_Model = ols("Approved ~ Race*Sex", data = HMDA_clean).fit(cov_type = 'Cluster', cov_kwds = {'groups': HMDA_clean['County_Code']})
No_Controls_Model.summary()

0,1,2,3
Dep. Variable:,Approved,R-squared:,0.014
Model:,OLS,Adj. R-squared:,0.014
Method:,Least Squares,F-statistic:,63.57
Date:,"Fri, 17 Jun 2022",Prob (F-statistic):,8.11e-116
Time:,12:09:24,Log-Likelihood:,-3254.5
No. Observations:,21213,AIC:,6529.0
Df Residuals:,21203,BIC:,6609.0
Df Model:,9,,
Covariance Type:,Cluster,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,0.9329,0.003,300.563,0.000,0.927,0.939
Race[T.Asian],-0.0219,0.008,-2.914,0.004,-0.037,-0.007
Race[T.Black],-0.0891,0.015,-5.837,0.000,-0.119,-0.059
Race[T.Latinx],-0.0670,0.012,-5.486,0.000,-0.091,-0.043
Race[T.Other],-0.1037,0.065,-1.585,0.113,-0.232,0.025
Sex[T.Female],-0.0030,0.006,-0.522,0.602,-0.014,0.008
Race[T.Asian]:Sex[T.Female],0.0101,0.014,0.719,0.472,-0.017,0.038
Race[T.Black]:Sex[T.Female],-0.0166,0.011,-1.539,0.124,-0.038,0.005
Race[T.Latinx]:Sex[T.Female],0.0122,0.013,0.915,0.360,-0.014,0.038

0,1,2,3
Omnibus:,11794.524,Durbin-Watson:,2.0
Prob(Omnibus):,0.0,Jarque-Bera (JB):,62829.791
Skew:,-2.837,Prob(JB):,0.0
Kurtosis:,9.236,Cond. No.,45.6
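Because Model 0 contains only indicators, each group's predicted approval probability is the intercept plus the relevant main effects and interaction. The sketch below adds up the coefficients reported in the table above; `predicted_approval` is a hypothetical helper, and the `Other` group is left out because its interaction row does not appear in the output shown.

```python
# Coefficients copied from the Model 0 summary table above.
coef = {
    "Intercept": 0.9329,
    "Race[T.Asian]": -0.0219,
    "Race[T.Black]": -0.0891,
    "Race[T.Latinx]": -0.0670,
    "Sex[T.Female]": -0.0030,
    "Race[T.Asian]:Sex[T.Female]": 0.0101,
    "Race[T.Black]:Sex[T.Female]": -0.0166,
    "Race[T.Latinx]:Sex[T.Female]": 0.0122,
}

def predicted_approval(race="0_White", sex="0_Male"):
    """Sum the coefficients that apply to one race/sex group (hypothetical helper)."""
    p = coef["Intercept"]                      # reference group: White, Male
    if race != "0_White":
        p += coef[f"Race[T.{race}]"]
    if sex == "Female":
        p += coef["Sex[T.Female]"]
        if race != "0_White":
            p += coef[f"Race[T.{race}]:Sex[T.Female]"]
    return p

for race in ["0_White", "Asian", "Black", "Latinx"]:
    for sex in ["0_Male", "Female"]:
        print(race, sex, round(predicted_approval(race, sex), 4))
```

These are raw group differences with no controls, so they describe the sample rather than isolate the effect of race or sex.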


### Model 1 - Indicators Only

In [3]:
#omit ['White', 'Not Hispanic', 'Male','DTI_less_than_20']
#don't forget to add census tract, lei, and relationships
Model_1 = ols("Approved ~ Race + Sex\
          + LTV + DTI_Ratio + Lender_LEI + Census_Tract\
          + Log_Income + Log_Loan_Amount", data = HMDA_clean).fit(cov_type = 'Cluster', cov_kwds = {'groups': HMDA_clean['County_Code']})
Model_1.summary()

NameError: name 'ols' is not defined

### Model 2 - Race/Ethnicity/Sex Interactions

In [4]:
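# The dataset has no separate Ethnicity column; Latinx ethnicity is encoded as a level of Race,
# so the interaction is specified as Race*Sex.
Model_2 = ols("Approved ~ Race*Sex\
          + LTV + DTI_Ratio + Lender_LEI + Census_Tract\
          + Log_Income + Log_Loan_Amount", data = HMDA_clean).fit(cov_type = 'Cluster', cov_kwds = {'groups': HMDA_clean['County_Code']})
Model_2.summary()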

NameError: name 'ols' is not defined

### Model 3 - DTI/LTV Interactions

In [5]:
Model_3 = ols("Approved ~ Race*Sex\
          + LTV*DTI_Ratio + Lender_LEI + Census_Tract\
          + Log_Income + Log_Loan_Amount", data = HMDA_clean).fit(cov_type = 'Cluster', cov_kwds = {'groups': HMDA_clean['County_Code']})
Model_3.summary()

NameError: name 'ols' is not defined

### Model 4 - Lender and Census_Tract Interaction

In [6]:
Model_4 = ols("Approved ~ Race*Sex\
          + LTV*DTI_Ratio + Lender_LEI*Census_Tract\
          + Log_Income + Log_Loan_Amount", data = HMDA_clean).fit(cov_type = 'Cluster', cov_kwds = {'groups': HMDA_clean['County_Code']})
Model_4.summary()

NameError: name 'ols' is not defined

### Model 5 - Approval and Loan type indicators

In [None]:
Model_5 = ols("Approved ~ Race*Sex\
          + LTV*DTI_Ratio + Lender_LEI*Census_Tract\
          + Log_Income + Log_Loan_Amount\
          + C(preapproval) + C(Loan_Type)", data = HMDA_clean).fit(cov_type = 'Cluster', cov_kwds = {'groups': HMDA_clean['County_Code']})
Model_5.summary()

# Model Summaries

In [None]:
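# Indicator table: 1 = the term appears in that model's formula, 0 = it does not.
# Flags are read off the formulas for Models 0-5 above.
LPM_Model_Variables = {
    'Model' : [0,1,2,3,4,5],
    'Black' : [1,1,1,1,1,1],
    'Asian' : [1,1,1,1,1,1],
    'Latinx' : [1,1,1,1,1,1],
    'Other' : [1,1,1,1,1,1],
    'Female' : [1,1,1,1,1,1],
    'LTV' : [0,1,1,1,1,1],
    'DTI' : [0,1,1,1,1,1],
    'Lender' : [0,1,1,1,1,1],
    'Census Tract' : [0,1,1,1,1,1],
    'PreApproval + Loan Type' : [0,0,0,0,0,1],
    'Lender/Census Tract Interactions' : [0,0,0,0,1,1],
    'Race/Sex Interactions' : [1,0,1,1,1,1],
    'LTV/DTI Interactions' : [0,0,0,1,1,1],
    }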
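Once every list holds one entry per model, a dictionary like this converts directly into a comparison table. A minimal sketch using a hypothetical three-row subset (`example_variables` is an illustrative name, not the notebook's final table):

```python
import pandas as pd

# Hypothetical subset: every list has six entries, one per model.
example_variables = {
    'Model': [0, 1, 2, 3, 4, 5],
    'PreApproval + Loan Type': [0, 0, 0, 0, 0, 1],
    'Race/Sex Interactions': [1, 0, 1, 1, 1, 1],
}

# One column per model, one row per term.
summary_table = pd.DataFrame(example_variables).set_index('Model').T
print(summary_table)
```

Lists of unequal length would raise a `ValueError` in the `pd.DataFrame` call, which makes this a quick check that the table is complete.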