# Linear Probability Model

The purpose of this program is to regress mortgage application approval on applicant and loan variables in the HMDA data.

$P(\text{Approval} = 1 \mid \lambda_j, \chi_j) = \beta_0 + \lambda_j \cdot \text{Race/Ethnic/Gender Indicators} + \beta_j \cdot \chi_j + \mu$
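As a rough illustration of the specification above, a linear probability model is ordinary least squares applied to a 0/1 outcome, so each coefficient reads as a change in the probability of approval. The sketch below uses simulated data only; the variable names `group` and `log_income` and the assumed coefficients are hypothetical and are not taken from the HMDA sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical regressors: one group indicator and one continuous control.
group = rng.integers(0, 2, size=n)          # 1 = member of the group of interest
log_income = rng.normal(4.5, 0.5, size=n)   # stand-in control variable

# Assumed approval probability, linear in the regressors (simulation only).
p_approve = np.clip(0.30 - 0.10 * group + 0.10 * log_income, 0, 1)
approved = rng.binomial(1, p_approve)

# Linear probability model: least squares on the binary outcome.
X = np.column_stack([np.ones(n), group, log_income])
beta, *_ = np.linalg.lstsq(X, approved, rcond=None)

# beta[1] estimates the difference in approval probability for the group,
# holding log income fixed.
print(dict(zip(["const", "group", "log_income"], beta.round(3))))
```

Because the outcome is binary, the error term is heteroskedastic by construction, which is one reason robust or clustered standard errors are usually reported with this kind of model.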

Variables of Interest
- White
- Black
- Asian
- Other
- Multi-Race Interactions
- Hispanic
- Hispanic and Race Interactions
- Non-Hispanic
- Male 
- Female

Control Variables
- Income (log)
- Loan to Value ratio
- Debt to Income ratio
- Loan Amount (log)
- Lender
- Region Indicators by Census Tract or County
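A minimal sketch of how controls like these could be built with pandas, assuming natural logs and one indicator column per debt-to-income bucket. The three rows reuse income, loan amount, and DTI bucket values from the sample output further down; the raw `DTI` column name is an assumption, since the cleaned file only contains the already-expanded `DTI_...` columns.

```python
import numpy as np
import pandas as pd

# Three illustrative rows; 'DTI' as a single categorical column is hypothetical.
raw = pd.DataFrame({
    "Income": [115.0, 71.0, 42.0],
    "Loan Amount": [395000.0, 225000.0, 185000.0],
    "DTI": ["30%-<36%", "20%-<30%", "50%-60%"],
})

# Natural-log transforms of the skewed dollar amounts.
raw["Log Income"] = np.log(raw["Income"])
raw["Log Loan Amount"] = np.log(raw["Loan Amount"])

# One 0/1 indicator per DTI bucket, named 'DTI_<bucket>'.
controls = pd.get_dummies(raw, columns=["DTI"], dtype=int)
print(controls.columns.tolist())
```

When every bucket of a categorical control is included together with a constant, one bucket usually has to be dropped as the reference category.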

Possible Other Control Variables
- ***Credit Score
- Credit Score Type
- Pre-Approval indicators

Filters
- Loan Purpose
- Loan Type

Possible other Filters
- Occupancy Type (*Primary residence, Secondary, or Investment)
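The filters listed above could be applied as boolean masks before the regression. This is only a sketch: the column names `Loan Purpose`, `Loan Type`, and `Occupancy Type`, and the label values used here, are hypothetical, and the cleaned sample encodes loan type as indicator columns instead.

```python
import pandas as pd

# Toy applications table with hypothetical column names and labels.
apps = pd.DataFrame({
    "Loan Purpose": ["Home purchase", "Refinancing", "Home purchase", "Home purchase"],
    "Loan Type": ["Conventional", "FHA", "VA", "Conventional"],
    "Occupancy Type": ["Primary residence", "Primary residence",
                       "Investment", "Primary residence"],
})

# Combine the filters into one mask so the sample restriction is explicit.
keep = (
    apps["Loan Purpose"].eq("Home purchase")
    & apps["Loan Type"].isin(["Conventional", "FHA"])
    & apps["Occupancy Type"].eq("Primary residence")
)
filtered = apps.loc[keep].reset_index(drop=True)
print(len(apps), "->", len(filtered))
```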

Clustered Standard errors
- by Lender
- by Region

Other regressions to run that will use similar controls.
- Simplified Model (just variables of interest)
- Restricted Model
- Interest Rates
- Denial Rates
- Fixed Effects Model
- Years other than 2019

In [31]:
import pandas as pd
import numpy as np
from linearmodels import PanelOLS, PooledOLS
import statsmodels.api as sm

# np.set_printoptions(precision=3, suppress=True)

#This will allow all columns to be displayed when reviewing the data.
pd.options.display.max_columns = None

In [14]:
'''
import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing

print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
tf.test.is_built_with_cuda()
print(tf.version.VERSION)
import sys
print(sys.version)
gpu = len(tf.config.list_physical_devices('GPU'))>0
print("GPU is", "available" if gpu else "NOT AVAILABLE")
'''


## Load in and manipulate dataset.

The cells below manipulate the dataset before running it through the function.

In [32]:
# Load in HMDA Data
HMDA_clean_file_location = r'2019 HMDA Clean Sample.csv'
HMDA_clean = pd.read_csv(HMDA_clean_file_location)
HMDA_clean.head()

Unnamed: 0.1,Unnamed: 0,State,County Code,Census Tract,Approved,Denied,White,Black,Asian,Other,Hispanic,Not Hispanic,Male,Female,Income,Log Income,Loan Amount,Log Loan Amount,LTV,Loan Type_Conventional,Loan Type_FHA,Loan Type_VA,Loan Type_RHS or FSA,DTI_20%-<30%,DTI_30%-<36%,DTI_36,DTI_38,DTI_39,DTI_40,DTI_41,DTI_42,DTI_43,DTI_44,DTI_45,DTI_46,DTI_47,DTI_48,DTI_50%-60%,DTI_<20%,DTI_>60%
0,4,VA,51683.0,51683910000.0,1,0,1,0,0,0,0,1,1,0,115.0,4.744932,395000.0,12.886641,100.0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,8,WI,55133.0,55133200000.0,1,0,1,0,0,0,0,1,0,1,71.0,4.26268,225000.0,12.323856,95.79,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,9,FL,12101.0,12101030000.0,0,1,0,1,0,0,0,1,0,1,42.0,3.73767,185000.0,12.128111,80.0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
3,10,TX,48157.0,48157670000.0,0,1,0,1,0,0,0,1,0,1,203.0,5.313206,85000.0,11.350407,90.0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
4,12,TX,48121.0,48121020000.0,1,0,1,0,0,0,0,1,1,0,75.0,4.317488,295000.0,12.594731,100.0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0


In [21]:
# Filter information

In [23]:
HMDA_clean.columns

Index(['State', 'County Code', 'Census Tract', 'Approved', 'Denied', 'White',
       'Black', 'Asian', 'Other', 'Hispanic', 'Not Hispanic', 'Male', 'Female',
       'Income', 'Log Income', 'Loan Amount', 'Log Loan Amount', 'LTV',
       'Loan Type_Conventional', 'Loan Type_FHA', 'Loan Type_VA',
       'Loan Type_RHS or FSA', 'DTI_20%-<30%', 'DTI_30%-<36%', 'DTI_36',
       'DTI_38', 'DTI_39', 'DTI_40', 'DTI_41', 'DTI_42', 'DTI_43', 'DTI_44',
       'DTI_45', 'DTI_46', 'DTI_47', 'DTI_48', 'DTI_50%-60%', 'DTI_<20%',
       'DTI_>60%'],
      dtype='object')

## Run Model

In [35]:
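# PanelOLS must be imported before use; importing here keeps the cell self-contained.
from linearmodels import PanelOLS

# The sample has no 'Year' column, which is what raised the KeyError on set_index.
# Assumption: every row comes from the 2019 file, so a constant year is added.
data = HMDA_clean.assign(Year=2019).set_index(['County Code', 'Year'])

# Take regressors and outcome from the indexed frame so both carry the panel index.
# Caution: if a dummy group (race, ethnicity, sex, DTI) sums to one in every row,
# drop a reference category or the design matrix may be rank deficient.
exogenous_variables = sm.add_constant(data[['White', 'Black', 'Asian', 'Other',
                'Hispanic', 'Not Hispanic',
                'Male', 'Female',
                'Log Income', 'Log Loan Amount', 'LTV',
                'DTI_20%-<30%', 'DTI_30%-<36%', 'DTI_36', 'DTI_38',
                'DTI_39', 'DTI_40', 'DTI_41', 'DTI_42', 'DTI_43', 'DTI_44', 'DTI_45',
                'DTI_46', 'DTI_47', 'DTI_48', 'DTI_50%-60%', 'DTI_<20%', 'DTI_>60%']])
dependent_variable = data['Approved']
model = PanelOLS(dependent_variable, exogenous_variables, entity_effects=True).fit()
model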
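The `entity_effects=True` option in the model cell asks for a separate intercept per entity, here per county. A small sketch on simulated data, using only numpy and pandas, shows the idea: demeaning each variable within county gives the same slope as including a full set of county dummies. The names `county`, `x`, and `y` and the true slope of 0.5 are assumptions for the simulation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 600

# Simulated data: 20 hypothetical counties, each with its own intercept.
toy = pd.DataFrame({
    "county": rng.integers(0, 20, size=n),
    "x": rng.normal(size=n),
})
county_effect = rng.normal(size=20)
toy["y"] = (0.5 * toy["x"].to_numpy()
            + county_effect[toy["county"].to_numpy()]
            + rng.normal(scale=0.1, size=n))

# Within transformation: subtract county means from y and x.
demeaned = toy[["y", "x"]] - toy.groupby("county")[["y", "x"]].transform("mean")
slope_within = (demeaned["x"] @ demeaned["y"]) / (demeaned["x"] @ demeaned["x"])

# Dummy-variable version: x plus one indicator per county, no constant.
dummies = pd.get_dummies(toy["county"], dtype=float).to_numpy()
X = np.column_stack([toy["x"].to_numpy(), dummies])
slope_lsdv = np.linalg.lstsq(X, toy["y"].to_numpy(), rcond=None)[0][0]

print(round(slope_within, 4), round(slope_lsdv, 4))
```

Any regressor that is constant within a county is wiped out by this transformation, which matters if county-level region indicators are also added as controls.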