# Linear Probability Model

The purpose of this program is to regress a mortgage approval indicator on race, ethnicity, gender, and other control variables found in HMDA data, using the linear probability model below.

$P(\text{Approval} = 1 \mid \text{Race/Sex}, \chi_j) = \beta_0 + \lambda_j \cdot \text{Race/Sex} + \beta_j \cdot \chi_j + \mu$

Where $\lambda_j$ are the coefficients on the variables of interest (the race and sex indicators), $\beta_j$ are the coefficients on the control variables, $\chi_j$ are the control variables, and $\mu$ is the error term.
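To make the interpretation of $\lambda_j$ concrete, here is a minimal sketch on synthetic data (not HMDA records; the `toy` frame and its approval probabilities are made up for illustration). With a single binary regressor, the OLS slope in a linear probability model equals the difference in approval rates between the two groups, so it reads directly as a gap in approval probability.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic data for illustration only; these are not HMDA records.
rng = np.random.default_rng(0)
n = 2_000
toy = pd.DataFrame({"Female": rng.integers(0, 2, size=n)})

# Hypothetical approval probabilities: 0.80 when Female == 0, 0.75 when Female == 1.
p = 0.80 - 0.05 * toy["Female"].to_numpy()
toy["Approved"] = (rng.random(n) < p).astype(int)

lpm = ols("Approved ~ Female", data=toy).fit()

# Compare the OLS slope with the raw difference in approval rates.
rates = toy.groupby("Female")["Approved"].mean()
gap = rates.loc[1] - rates.loc[0]
print(lpm.params["Female"], gap)
```

Once controls are added, the coefficient is no longer a raw difference in means, but it keeps the same reading: a change in approval probability, holding the controls fixed.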

Variables of Interest
- White
- Black
- Asian
- Hispanic
- Other
- Male 
- Female

Control Variables
- Income (log)
- Loan to Value ratio
- Debt to Income ratio
- Loan Amount (log)
- Pre-Approval indicators

Fixed Effects (possibly include)
- Lender
- Region indicators by census tract or county
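If fixed effects are included, entering a high-cardinality identifier such as the lender as a categorical regressor creates one dummy per level. The sketch below, on synthetic data with a hypothetical `Lender` id, shows that the slope on a control is the same whether the fixed effect is estimated with dummies or removed by subtracting group means (the within transformation). It is an illustration under those assumptions, not the notebook's method.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic data for illustration only; "Lender" is a hypothetical id column.
rng = np.random.default_rng(1)
n = 600
toy = pd.DataFrame({
    "Lender": rng.integers(0, 20, size=n),
    "LTV": rng.uniform(50, 100, size=n),
})
lender_effect = rng.normal(0, 0.1, size=20)[toy["Lender"].to_numpy()]
p = np.clip(1.1 - 0.005 * toy["LTV"].to_numpy() + lender_effect, 0, 1)
toy["Approved"] = (rng.random(n) < p).astype(int)

# Option 1: one dummy per lender.
dummies = ols("Approved ~ LTV + C(Lender)", data=toy).fit()

# Option 2: subtract lender means from outcome and regressor, then fit without an intercept.
cols = ["Approved", "LTV"]
within = toy[cols] - toy.groupby("Lender")[cols].transform("mean")
demeaned = ols("Approved ~ LTV - 1", data=within).fit()

print(dummies.params["LTV"], demeaned.params["LTV"])
```

The point estimates agree, but the standard errors from the demeaned fit do not account for the degrees of freedom used up by the group means, so they should not be reported as-is.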

Variables omitted from the model to prevent perfect collinearity
- White
- Male
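A minimal sketch of how the omitted categories can be set explicitly in a formula, assuming `Race` and `Sex` are stored as string labels and that statsmodels is using patsy as its formula engine. By default the alphabetically first level is dropped, which would not be White or Male; `Treatment(reference=...)` names the baseline instead. The data are synthetic.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic data for illustration only; labels are assumed to be strings.
rng = np.random.default_rng(2)
n_obs = 500
toy = pd.DataFrame({
    "Race": rng.choice(["White", "Black", "Asian", "Other"], size=n_obs),
    "Sex": rng.choice(["Male", "Female"], size=n_obs),
    "Approved": rng.integers(0, 2, size=n_obs),
})

# Make White and Male the omitted baseline categories.
fit = ols(
    "Approved ~ C(Race, Treatment(reference='White'))"
    " + C(Sex, Treatment(reference='Male'))",
    data=toy,
).fit()

# The remaining coefficients are measured relative to White and Male applicants.
names = fit.params.index.tolist()
print(names)
```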

Filters
- Loan Purpose
- Occupancy Type

Clustered Standard errors
- by Lender
- by Region
- by County
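The sketch below illustrates what clustering changes, using synthetic data with a made-up county-level shock. It fits the same OLS model twice through statsmodels, once with the default covariance and once with `cov_type="cluster"`. The coefficients are identical; only the standard errors differ.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic data for illustration only; not HMDA records.
rng = np.random.default_rng(3)
n_counties, per_county = 30, 40
county = np.repeat(np.arange(n_counties), per_county)

# A shared county-level shock makes errors correlated within each county.
shock = rng.normal(0, 0.15, size=n_counties)[county]
ltv = rng.uniform(50, 100, size=county.size)
p = np.clip(1.1 - 0.005 * ltv + shock, 0, 1)
toy = pd.DataFrame({
    "County_Code": county,
    "LTV": ltv,
    "Approved": (rng.random(county.size) < p).astype(int),
})

model = ols("Approved ~ LTV", data=toy)
plain = model.fit()
clustered = model.fit(cov_type="cluster", cov_kwds={"groups": toy["County_Code"]})

# Same point estimates, different standard errors.
print(pd.DataFrame({"plain_se": plain.bse, "clustered_se": clustered.bse}))
```

To cluster by lender or region instead, the same call would take a different grouping column in `groups`.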

Other regressions to run with similar controls
- Simplified Model (just variables of interest)
- Restricted Model
- Interest Rates
- Denial Rates
- Fixed Effects Model
- Years other than 2019

In [None]:
import pandas as pd
import numpy as np
from linearmodels import PanelOLS
import statsmodels.api as sm
from statsmodels.formula.api import ols

# np.set_printoptions(precision=3, suppress=True)

#This will allow all columns to be displayed when reviewing the data.
pd.options.display.max_columns = None

In [None]:
'''
import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing

print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
tf.test.is_built_with_cuda()
print(tf.version.VERSION)
import sys
print(sys.version)
gpu = len(tf.config.list_physical_devices('GPU'))>0
print("GPU is", "available" if gpu else "NOT AVAILABLE")
'''

## Load in and manipulate dataset.

The cells below manipulate the dataset before running it through the function.

In [None]:
# Load in HMDA Data
HMDA_clean_file_location = r'2019 HMDA Clean IL SAMPLE.csv'
HMDA_clean_0 = pd.read_csv(HMDA_clean_file_location)
HMDA_clean_0

In [None]:
#HMDA_clean.columns

### Check for further cleaning

In [None]:
#HMDA_clean.info()

In [None]:
#Clean df
HMDA_clean_1 = HMDA_clean_0.copy()
HMDA_clean_1 = HMDA_clean_1.dropna()
HMDA_clean_1['Census_Tract'] = HMDA_clean_1['Census_Tract'].apply(str)
#HMDA_clean.info()

The cell below keeps only loans whose occupancy type is principal residence. It drops second residences and investment properties.

In [None]:
# Occupancy_Type codes: "Principal Residence" = 1, "Second Residence" = 2, "Investment Property" = 3.
HMDA_clean = HMDA_clean_1[HMDA_clean_1["Occupancy_Type"] == 1]

### Model 0 - Race*Sex Interaction Only

In [2]:
No_Controls_Model = ols("Approved ~ Race*Sex", data = HMDA_clean).fit(cov_type = 'Cluster', cov_kwds = {'groups': HMDA_clean['County_Code']})
No_Controls_Model.summary()

NameError: name 'ols' is not defined
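As a note on the formula syntax used in these models, `Race*Sex` is shorthand for both main effects plus their interaction, `Race + Sex + Race:Sex`. The sketch below checks this on synthetic data, assuming string-labelled `Race` and `Sex` columns.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic data for illustration only; labels are assumed to be strings.
rng = np.random.default_rng(4)
n_obs = 400
toy = pd.DataFrame({
    "Race": rng.choice(["White", "Black", "Asian"], size=n_obs),
    "Sex": rng.choice(["Male", "Female"], size=n_obs),
    "Approved": rng.integers(0, 2, size=n_obs),
})

short = ols("Approved ~ Race*Sex", data=toy).fit()
expanded = ols("Approved ~ Race + Sex + Race:Sex", data=toy).fit()

# Both formulas produce the same terms and the same estimates.
print(short.params.index.tolist())
```

With three race labels and two sex labels this gives six parameters, one per Race-Sex cell, so a model with only the interaction terms fits each cell's approval rate exactly.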

### Model 1 - Indicators Only

In [3]:
#omit ['White', 'Not Hispanic', 'Male','DTI_less_than_20']
#don't forget to add census tract, lei, and relationships
Model_1 = ols("Approved ~ Race + Sex\
          + LTV + DTI_Ratio + Lender_LEI + Census_Tract\
          + Log_Income + Log_Loan_Amount", data = HMDA_clean).fit(cov_type = 'Cluster', cov_kwds = {'groups': HMDA_clean['County_Code']})
Model_1.summary()

NameError: name 'ols' is not defined

### Model 2 - Race/Ethnicity/Sex Interactions

In [4]:
Model_2 = ols("Approved ~ Race*Ethnicity*Sex\
          + LTV + DTI_Ratio + Lender_LEI + Census_Tract\
          + Log_Income + Log_Loan_Amount", data = HMDA_clean).fit(cov_type = 'Cluster', cov_kwds = {'groups': HMDA_clean['County_Code']})
Model_2.summary()

NameError: name 'ols' is not defined

### Model 3 - DTI/LTV Interactions

In [5]:
Model_3 = ols("Approved ~ Race*Sex\
          + LTV*DTI_Ratio + Lender_LEI + Census_Tract\
          + Log_Income + Log_Loan_Amount", data = HMDA_clean).fit(cov_type = 'Cluster', cov_kwds = {'groups': HMDA_clean['County_Code']})
Model_3.summary()

NameError: name 'ols' is not defined

### Model 4 - Lender and Census_Tract Interaction

In [6]:
Model_4 = ols("Approved ~ Race*Sex\
          + LTV*DTI_Ratio + Lender_LEI*Census_Tract\
          + Log_Income + Log_Loan_Amount", data = HMDA_clean).fit(cov_type = 'Cluster', cov_kwds = {'groups': HMDA_clean['County_Code']})
Model_4.summary()

NameError: name 'ols' is not defined

### Model 5 - Approval and Loan type indicators

In [None]:
Model_5 = ols("Approved ~ Race*Sex\
          + LTV*DTI_Ratio + Lender_LEI*Census_Tract\
          + Log_Income + Log_Loan_Amount\
          + C(preapproval) + C(Loan_Type)", data = HMDA_clean).fit(cov_type = 'Cluster', cov_kwds = {'groups': HMDA_clean['County_Code']})
Model_5.summary()

# Model Summaries

In [None]:
LPM_Model_Variables = {
    'Model' : [0,1,2,3,4,5],
    'Black' : [],
    'Asian' : [],
    'Latinx' : [],
    'Other' : [],
    'Female' : [], 
    'LTV' : [],
    'DTI' : [],
    'Lender' : [],
    'Census Tract' : [],
    'PreApproval + Loan Type' : [0,0,0,0,0,1],
    'Lender/Census Tract Interactions' : [0,0,0,0,1,1],
    'Race/Sex Interactions' : [1,0,1,1,1,1],
    'LTV/DTI Interactions' : [0,0,0,1,1,1],
    }
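One possible way to fill the coefficient columns is to read them from the fitted results rather than typing them in by hand. The sketch below uses two small models on synthetic data; the `fits` dictionary and `coef_table` are hypothetical names, and in the notebook the dictionary would hold `No_Controls_Model`, `Model_1`, and so on.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic data for illustration only; not HMDA records.
rng = np.random.default_rng(5)
n_obs = 500
toy = pd.DataFrame({
    "Female": rng.integers(0, 2, size=n_obs),
    "LTV": rng.uniform(50, 100, size=n_obs),
    "Approved": rng.integers(0, 2, size=n_obs),
})

# Hypothetical stand-ins for the fitted models, keyed by model number.
fits = {
    0: ols("Approved ~ Female", data=toy).fit(),
    1: ols("Approved ~ Female + LTV", data=toy).fit(),
}

# One column per model, one row per coefficient.
# NaN marks a term that a given model does not include.
coef_table = pd.DataFrame({name: fit.params for name, fit in fits.items()})
coef_table.columns.name = "Model"
print(coef_table.round(3))
```

Because pandas aligns the coefficient series on their index, models with different regressors can share one table without padding the lists manually.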