# Fixed Effects Model

This notebook regresses a mortgage approval indicator on race, ethnicity, gender, and other control variables found in HMDA data, using the linear probability model below.

$P(Approval = 1 \mid Race/Sex, \chi_{ji}, \alpha_i) = \beta_0 + \lambda_{ji} \cdot Race/Sex + \beta_{ji} \cdot \chi_{ji} + \alpha_i + \mu$

Where $\lambda_{ji}$ are the coefficients on the variables of interest (the race and sex indicators), $\beta_{ji}$ are the coefficients on the control variables, 
$\alpha_i$ are the fixed effects, $\chi_{ji}$ are the control variables, and $\mu$ is the error term.
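
To make the role of the fixed effects $\alpha_i$ concrete, here is a minimal NumPy sketch on simulated data (the lenders, coefficients, and noise levels are all hypothetical, not taken from HMDA). It shows that demeaning each variable within an entity (the within transformation) gives the same slope as including one dummy per entity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy panel: 4 lenders (entities), 50 applications each.
n_entities, n_per = 4, 50
entity = np.repeat(np.arange(n_entities), n_per)
alpha = np.array([0.9, 0.7, 0.8, 0.6])[entity]   # lender fixed effects
x = rng.normal(size=entity.size) + alpha          # regressor correlated with alpha
y = alpha - 0.1 * x + rng.normal(scale=0.05, size=entity.size)

def demean_by_group(v, groups):
    """Subtract each group's mean from its members (the within transformation)."""
    sums = np.bincount(groups, weights=v)
    counts = np.bincount(groups)
    return v - (sums / counts)[groups]

# Within estimator: OLS on entity-demeaned data.
x_w, y_w = demean_by_group(x, entity), demean_by_group(y, entity)
beta_within = (x_w @ y_w) / (x_w @ x_w)

# LSDV: OLS with one dummy per entity, which yields the same slope.
dummies = (entity[:, None] == np.arange(n_entities)).astype(float)
X = np.column_stack([x, dummies])
beta_lsdv = np.linalg.lstsq(X, y, rcond=None)[0][0]

print(beta_within, beta_lsdv)
```

Both estimates should agree up to floating-point error and sit close to the simulated slope of -0.1.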

Variables of Interest
- White
- Black
- Asian
- Hispanic
- Other
- Male 
- Female

Control Variables
- Income (log)
- Loan to Value ratio
- Debt to Income ratio
- Loan Amount (log)
- Pre-Approval indicators

Variables omitted from the model to prevent perfect collinearity
- White
- Male
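
The collinearity problem can be seen in a small example. The sketch below uses a hypothetical six-applicant sample: with a constant plus both sex dummies, the design matrix loses rank because the two dummies sum to the constant column; dropping one category restores full column rank.

```python
import numpy as np

# Hypothetical sample of six applicants' sex categories.
sex = np.array(["Male", "Female", "Female", "Male", "Female", "Male"])
constant = np.ones(sex.size)
male = (sex == "Male").astype(float)
female = (sex == "Female").astype(float)

# Constant + both dummies: male + female equals the constant column.
X_all = np.column_stack([constant, male, female])
# Constant + Female only: Male becomes the reference category.
X_ref = np.column_stack([constant, female])

rank_all = np.linalg.matrix_rank(X_all)
rank_ref = np.linalg.matrix_rank(X_ref)
print(rank_all, X_all.shape[1])  # rank is below the column count
print(rank_ref, X_ref.shape[1])  # full column rank
```

The coefficient on Female is then read relative to the omitted Male category, and likewise for race relative to White.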

Filters
- Loan Purpose
- Occupancy Type

Clustered Standard errors
- by Lender
- by Region
- by County
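
Clustering lets residuals be correlated within a group (for example, all applications handled by one lender). As a rough illustration, the following NumPy sketch computes a cluster-robust sandwich covariance on simulated data. The function name `cluster_robust_se` and the data are hypothetical, and no small-sample correction is applied, so the numbers will not match a statistics package exactly.

```python
import numpy as np

def cluster_robust_se(X, y, clusters):
    """OLS coefficients with cluster-robust (sandwich) standard errors.

    No small-sample correction is applied, so values will differ slightly
    from packages that apply one by default.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        mask = clusters == g
        score = X[mask].T @ resid[mask]      # sum of x_i * e_i within the cluster
        meat += np.outer(score, score)
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(1)
n_clusters, n_per = 30, 20
lender = np.repeat(np.arange(n_clusters), n_per)     # hypothetical cluster ids
shock = rng.normal(size=n_clusters)[lender]           # shared within-lender shock
x = rng.normal(size=lender.size)
y = 1.0 + 0.5 * x + shock + rng.normal(size=lender.size)
X = np.column_stack([np.ones_like(x), x])

beta, se = cluster_robust_se(X, y, lender)
print(beta, se)
```

Switching the `clusters` array to a region or county identifier would cluster at that level instead.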

In [3]:
import pandas as pd
import numpy as np
from linearmodels import PanelOLS

# np.set_printoptions(precision=3, suppress=True)

#This will allow all columns to be displayed when reviewing the data.
pd.options.display.max_columns = None

In [4]:
'''
import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing

print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
tf.test.is_built_with_cuda()
print(tf.version.VERSION)
import sys
print(sys.version)
gpu = len(tf.config.list_physical_devices('GPU'))>0
print("GPU is", "available" if gpu else "NOT AVAILABLE")
'''


## Load in and manipulate dataset.

The cells below prepare the dataset before it is passed to the model function.

In [5]:
# Load in HMDA Data
HMDA_clean_file_location = r'2019 HMDA Clean IL SAMPLE.csv'
HMDA_clean_0 = pd.read_csv(HMDA_clean_file_location)
HMDA_clean_0

Unnamed: 0,Year,Lender_LEI,State,County_Code,Census_Tract,Approved,Denied,Race,Sex,Income,Log_Income,Loan_Amount,Log_Loan_Amount,LTV,Loan_Type,DTI_Ratio,preapproval,Occupancy_Type,Index
0,2019,549300VJ7C5JB4W87F48,IL,17111.0,1.711187e+10,1,0,0_White,0_Male,66.0,4.189655,145000.0,11.884489,101.000,Conventional,36,2,1,370900
1,2019,E57ODZWZ7FF32TWEFA76,IL,17031.0,1.703184e+10,0,1,Asian,Female,55.0,4.007333,35000.0,10.463103,63.640,Conventional,0%-20%,2,1,144508
2,2019,549300YOESI1GLKRL151,IL,17093.0,1.709389e+10,1,0,0_White,0_Male,51.0,3.931826,205000.0,12.230765,90.000,Conventional,48,2,1,552247
3,2019,549300J7XKT2BI5WX213,IL,17143.0,1.714300e+10,1,0,0_White,Female,31.0,3.433987,65000.0,11.082143,95.000,Conventional,20%-<30%,2,1,462945
4,2019,254900378RFGMBEKAF12,IL,17161.0,1.716102e+10,1,0,0_White,0_Male,128.0,4.852030,285000.0,12.560244,79.740,Conventional,37,2,1,352517
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22615,2019,KB1H1DSPRFMYMCUFXT09,IL,17113.0,1.711300e+10,1,0,0_White,Female,66.0,4.189655,145000.0,11.884489,56.863,Conventional,41,2,1,270518
22616,2019,549300BX448ALT10FI43,IL,17031.0,1.703182e+10,1,0,0_White,Female,98.0,4.584967,275000.0,12.524526,98.189,FHA,50%-60%,2,1,129608
22617,2019,549300U3721PJGQZYY68,IL,17031.0,1.703180e+10,1,0,Other,0_Male,236.0,5.463832,345000.0,12.751300,80.000,Conventional,0%-20%,2,1,494891
22618,2019,549300VJ7C5JB4W87F48,IL,17089.0,1.708985e+10,1,0,0_White,0_Male,286.0,5.655992,455000.0,13.028053,85.000,Conventional,0%-20%,2,1,369044


In [6]:
HMDA_clean_0.columns
pd.unique(HMDA_clean_0['Race'])

array(['0_White', 'Asian', 'Black', 'Latinx', 'Other'], dtype=object)

### Check for further cleaning

In [7]:
#HMDA_clean.info()

In [8]:
#Clean df
HMDA_clean_1 = HMDA_clean_0.copy()
HMDA_clean_1 = HMDA_clean_1.dropna()
HMDA_clean_1['Census_Tract'] = HMDA_clean_1['Census_Tract'].apply(str)
#HMDA_clean.info()

The filter below keeps only applications for a principal residence. Second residences and investment properties are dropped.

In [9]:
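# Occupancy_Type codes: 1 = Principal Residence, 2 = Second Residence, 3 = Investment Property.
# .copy() gives an independent frame, so adding columns later does not act on a slice.
HMDA_clean_2 = HMDA_clean_1[HMDA_clean_1["Occupancy_Type"] == 1].copy()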

In [10]:
#Add Constant
HMDA_clean_2.insert(0, 'Constant', 1)

In [11]:
#Create dummy variables.
HMDA_clean_3 = pd.get_dummies(HMDA_clean_2, columns = ['Race', 'Sex'])
HMDA_clean_3

Unnamed: 0,Constant,Year,Lender_LEI,State,County_Code,Census_Tract,Approved,Denied,Income,Log_Income,Loan_Amount,Log_Loan_Amount,LTV,Loan_Type,DTI_Ratio,preapproval,Occupancy_Type,Index,Race_0_White,Race_Asian,Race_Black,Race_Latinx,Race_Other,Sex_0_Male,Sex_Female
0,1,2019,549300VJ7C5JB4W87F48,IL,17111.0,17111870809.0,1,0,66.0,4.189655,145000.0,11.884489,101.000,Conventional,36,2,1,370900,1,0,0,0,0,1,0
1,1,2019,E57ODZWZ7FF32TWEFA76,IL,17031.0,17031839200.0,0,1,55.0,4.007333,35000.0,10.463103,63.640,Conventional,0%-20%,2,1,144508,0,1,0,0,0,0,1
2,1,2019,549300YOESI1GLKRL151,IL,17093.0,17093890102.0,1,0,51.0,3.931826,205000.0,12.230765,90.000,Conventional,48,2,1,552247,1,0,0,0,0,1,0
3,1,2019,549300J7XKT2BI5WX213,IL,17143.0,17143003900.0,1,0,31.0,3.433987,65000.0,11.082143,95.000,Conventional,20%-<30%,2,1,462945,1,0,0,0,0,0,1
4,1,2019,254900378RFGMBEKAF12,IL,17161.0,17161020100.0,1,0,128.0,4.852030,285000.0,12.560244,79.740,Conventional,37,2,1,352517,1,0,0,0,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22615,1,2019,KB1H1DSPRFMYMCUFXT09,IL,17113.0,17113001402.0,1,0,66.0,4.189655,145000.0,11.884489,56.863,Conventional,41,2,1,270518,1,0,0,0,0,0,1
22616,1,2019,549300BX448ALT10FI43,IL,17031.0,17031824108.0,1,0,98.0,4.584967,275000.0,12.524526,98.189,FHA,50%-60%,2,1,129608,1,0,0,0,0,0,1
22617,1,2019,549300U3721PJGQZYY68,IL,17031.0,17031804506.0,1,0,236.0,5.463832,345000.0,12.751300,80.000,Conventional,0%-20%,2,1,494891,0,0,0,0,1,1,0
22618,1,2019,549300VJ7C5JB4W87F48,IL,17089.0,17089852101.0,1,0,286.0,5.655992,455000.0,13.028053,85.000,Conventional,0%-20%,2,1,369044,1,0,0,0,0,1,0
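
The notebook keeps every dummy column and simply leaves `Race_0_White` and `Sex_0_Male` out of the formulas. An alternative, sketched below on a hypothetical mini-sample, is to let pandas drop the reference level. This assumes the `0_` prefix is there so that White and Male sort first, which is an inference from the labels rather than something the data documents.

```python
import pandas as pd

# Hypothetical mini-sample using the same category labels as the data above.
df = pd.DataFrame({
    "Race": ["0_White", "Asian", "Black", "0_White", "Latinx", "Other"],
    "Sex": ["0_Male", "Female", "Female", "0_Male", "Female", "0_Male"],
})

# drop_first=True drops the first level of each column in sorted order;
# the "0_" prefix makes White and Male sort first, so they become the reference.
dummies = pd.get_dummies(df, columns=["Race", "Sex"], drop_first=True, dtype=int)
print(list(dummies.columns))
```

Either approach leaves White and Male as the baseline against which the other coefficients are measured.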


### Set Index

In [13]:
HMDA_application_index = HMDA_clean_3.set_index(['Index','Year'])
HMDA_lender_index = HMDA_clean_3.set_index(['Lender_LEI','Year'])
HMDA_lender_index

Lender_LEI,Year,Constant,State,County_Code,Census_Tract,Approved,Denied,Income,Log_Income,Loan_Amount,Log_Loan_Amount,LTV,Loan_Type,DTI_Ratio,preapproval,Occupancy_Type,Index,Race_0_White,Race_Asian,Race_Black,Race_Latinx,Race_Other,Sex_0_Male,Sex_Female
549300VJ7C5JB4W87F48,2019,1,IL,17111.0,17111870809.0,1,0,66.0,4.189655,145000.0,11.884489,101.000,Conventional,36,2,1,370900,1,0,0,0,0,1,0
E57ODZWZ7FF32TWEFA76,2019,1,IL,17031.0,17031839200.0,0,1,55.0,4.007333,35000.0,10.463103,63.640,Conventional,0%-20%,2,1,144508,0,1,0,0,0,0,1
549300YOESI1GLKRL151,2019,1,IL,17093.0,17093890102.0,1,0,51.0,3.931826,205000.0,12.230765,90.000,Conventional,48,2,1,552247,1,0,0,0,0,1,0
549300J7XKT2BI5WX213,2019,1,IL,17143.0,17143003900.0,1,0,31.0,3.433987,65000.0,11.082143,95.000,Conventional,20%-<30%,2,1,462945,1,0,0,0,0,0,1
254900378RFGMBEKAF12,2019,1,IL,17161.0,17161020100.0,1,0,128.0,4.852030,285000.0,12.560244,79.740,Conventional,37,2,1,352517,1,0,0,0,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
KB1H1DSPRFMYMCUFXT09,2019,1,IL,17113.0,17113001402.0,1,0,66.0,4.189655,145000.0,11.884489,56.863,Conventional,41,2,1,270518,1,0,0,0,0,0,1
549300BX448ALT10FI43,2019,1,IL,17031.0,17031824108.0,1,0,98.0,4.584967,275000.0,12.524526,98.189,FHA,50%-60%,2,1,129608,1,0,0,0,0,0,1
549300U3721PJGQZYY68,2019,1,IL,17031.0,17031804506.0,1,0,236.0,5.463832,345000.0,12.751300,80.000,Conventional,0%-20%,2,1,494891,0,0,0,0,1,1,0
549300VJ7C5JB4W87F48,2019,1,IL,17089.0,17089852101.0,1,0,286.0,5.655992,455000.0,13.028053,85.000,Conventional,0%-20%,2,1,369044,1,0,0,0,0,1,0
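
`PanelOLS` reads the entity from the first index level and the time period from the second, which is why the frame is indexed by lender and year. The sketch below uses a hypothetical mini-sample with placeholder lender ids to show how the index determines the number of entities and the observations per entity.

```python
import pandas as pd

# Hypothetical mini-sample; lender ids are shortened placeholders.
df = pd.DataFrame({
    "Lender_LEI": ["A", "A", "B", "A", "C", "B"],
    "Year": [2019] * 6,
    "Approved": [1, 0, 1, 1, 1, 0],
})

panel = df.set_index(["Lender_LEI", "Year"])

# Level 0 is the entity (lender), level 1 is the time period.
obs_per_entity = panel.groupby(level="Lender_LEI").size()
print(panel.index.nlevels)          # number of index levels
print(obs_per_entity.to_dict())     # applications per lender
print(obs_per_entity.mean())        # analogous to "Avg Obs" in a model summary
```

Indexing by `Index` and `Year` instead would treat every application as its own entity, which leaves nothing for an entity effect to absorb.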


### OLS - Race*Sex Only

In [20]:
OLS_Race_Sex_Indicators = PanelOLS.from_formula('Approved ~ Constant + (Race_Asian + Race_Black + Race_Latinx + Race_Other)*Sex_Female', data = HMDA_lender_index).fit(cov_type = "clustered", cluster_entity = True)
OLS_Race_Sex_Indicators

Dep. Variable:,Approved,R-squared:,0.0177
Estimator:,PanelOLS,R-squared (Between):,-0.0154
No. Observations:,21425,R-squared (Within):,0.0000
Date:,"Fri, Jun 17 2022",R-squared (Overall):,0.0177
Time:,13:23:22,Log-likelihood,-2871.0
Cov. Estimator:,Clustered,,
,,F-statistic:,42.899
Entities:,435,P-value,0.0000
Avg Obs:,49.253,Distribution:,"F(9,21415)"
Min Obs:,1.0000,,

,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
Constant,0.9387,0.0054,174.09,0.0000,0.9281,0.9492
Race_Asian,-0.0268,0.0103,-2.6112,0.0090,-0.0469,-0.0067
Race_Black,-0.1071,0.0148,-7.2116,0.0000,-0.1362,-0.0780
Race_Latinx,-0.0590,0.0091,-6.4549,0.0000,-0.0769,-0.0411
Race_Other,-0.0012,0.0363,-0.0317,0.9747,-0.0724,0.0701
Sex_Female,-0.0054,0.0052,-1.0329,0.3017,-0.0157,0.0049
Race_Asian:Sex_Female,0.0097,0.0143,0.6773,0.4983,-0.0183,0.0377
Race_Black:Sex_Female,-0.0105,0.0176,-0.5961,0.5511,-0.0449,0.0240
Race_Latinx:Sex_Female,-0.0045,0.0131,-0.3387,0.7348,-0.0302,0.0213
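
With interaction terms, the gap for a group is the sum of its main effects and the interaction. The short sketch below plugs in the point estimates from the table above to compare a Black female applicant with the White male baseline; the helper `predicted_approval` is only an illustration.

```python
# Point estimates copied from the OLS summary table above.
coef = {
    "Constant": 0.9387,
    "Race_Black": -0.1071,
    "Sex_Female": -0.0054,
    "Race_Black:Sex_Female": -0.0105,
}

def predicted_approval(black, female):
    """Linear-probability prediction for a given pair of 0/1 indicators."""
    return (coef["Constant"]
            + coef["Race_Black"] * black
            + coef["Sex_Female"] * female
            + coef["Race_Black:Sex_Female"] * black * female)

white_male = predicted_approval(0, 0)
black_female = predicted_approval(1, 1)
gap = black_female - white_male
print(round(white_male, 4), round(black_female, 4), round(gap, 4))
```

The gap is about 12.3 percentage points (0.1071 + 0.0054 + 0.0105 = 0.1230), before any control variables are added.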


### FE - Race*Sex Only

In [21]:
FE_Race_Sex_Indicators = PanelOLS.from_formula('Approved ~ Constant + (Race_Asian + Race_Black + Race_Latinx + Race_Other)*Sex_Female + EntityEffects', data = HMDA_lender_index).fit(cov_type = "clustered", cluster_entity = True)
FE_Race_Sex_Indicators

Dep. Variable:,Approved,R-squared:,0.0172
Estimator:,PanelOLS,R-squared (Between):,-0.0153
No. Observations:,21425,R-squared (Within):,0.0000
Date:,"Fri, Jun 17 2022",R-squared (Overall):,0.0177
Time:,13:23:29,Log-likelihood,-2168.1
Cov. Estimator:,Clustered,,
,,F-statistic:,40.748
Entities:,435,P-value,0.0000
Avg Obs:,49.253,Distribution:,"F(9,20981)"
Min Obs:,1.0000,,

,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
Constant,0.9397,0.0029,329.65,0.0000,0.9341,0.9453
Race_Asian,-0.0269,0.0098,-2.7554,0.0059,-0.0461,-0.0078
Race_Black,-0.1094,0.0148,-7.3988,0.0000,-0.1383,-0.0804
Race_Latinx,-0.0575,0.0092,-6.2642,0.0000,-0.0755,-0.0395
Race_Other,0.0101,0.0323,0.3121,0.7550,-0.0532,0.0734
Sex_Female,-0.0084,0.0048,-1.7408,0.0817,-0.0178,0.0011
Race_Asian:Sex_Female,0.0110,0.0129,0.8582,0.3908,-0.0142,0.0363
Race_Black:Sex_Female,-0.0104,0.0167,-0.6204,0.5350,-0.0431,0.0224
Race_Latinx:Sex_Female,-0.0050,0.0130,-0.3863,0.6992,-0.0304,0.0204


# Compare Models

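`compare` builds a side-by-side table from fitted results. The attributes available on each results object are listed in the linearmodels documentation: https://bashtage.github.io/linearmodels/panel/panel/linearmodels.panel.results.PanelEffectsResults.html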

In [48]:
from linearmodels.panel import compare
compare({'OLS - Model 0' : OLS_Race_Sex_Indicators,
         'FE - Model 0' : FE_Race_Sex_Indicators
        },
        stars = True)

,OLS - Model 0,FE - Model 0
Dep. Variable,Approved,Approved
Estimator,PanelOLS,PanelOLS
No. Observations,21425,21425
Cov. Est.,Clustered,Clustered
R-squared,0.0177,0.0172
R-Squared (Within),0.0000,0.0000
R-Squared (Between),-0.0154,-0.0153
R-Squared (Overall),0.0177,0.0177
F-statistic,42.899,40.748


In [None]:
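#omit ['White', 'Not Hispanic', 'Male','DTI_less_than_20']
#don't forget to add census tract, lei, and relationships
# Draft: uses the lender-indexed frame and the dummy columns created above
# (HMDA_clean and the raw Race/Sex columns do not exist at this point).
# Lender effects enter through EntityEffects. DTI_Ratio and Census_Tract are
# stored as strings and still need categorical encoding before they can be added.
Model_1 = PanelOLS.from_formula("Approved ~ Constant + Race_Asian + Race_Black + Race_Latinx + Race_Other\
          + Sex_Female + LTV\
          + Log_Income + Log_Loan_Amount + EntityEffects", data = HMDA_lender_index).fit(cov_type = "clustered", cluster_entity = True)
Model_1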

# Model Summaries

In [None]:
LPM_Model_Variables = {
    'Model' : [0,1,2,3,4,5],
    'Black' : [],
    'Asian' : [],
    'Latinx' : [],
    'Other' : [],
    'Female' : [], 
    'LTV' : [],
    'DTI' : [],
    'Lender' : [],
    'Census Tract' : [],
    'PreApproval + Loan Type' : [0,0,0,0,0,1],
    'Lender/Census Tract Interactions' : [0,0],
    'Race/Sex Interactions' : [1,0,1,1,1,1],
    'LTV/DTI Interactions' : [0],
    }
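
The dictionary above is still a draft with lists of different lengths, so `pd.DataFrame` would reject it as written. As a possible way to preview such a table while it is being filled in, the sketch below uses a hypothetical, shortened dictionary and pads the short lists with NaN.

```python
import pandas as pd

# Hypothetical, shortened version of the specification dictionary:
# some indicator lists are complete, others are still empty or partial.
spec = {
    "Model": [0, 1, 2],
    "Race/Sex Interactions": [1, 0, 1],
    "LTV": [],
    "DTI": [0],
}

# pd.DataFrame(spec) raises ValueError when lists differ in length;
# wrapping each list in a Series pads the short ones with NaN instead.
table = pd.DataFrame({k: pd.Series(v, dtype="float64") for k, v in spec.items()})
table = table.set_index("Model")
print(table)
```

Missing entries show up as NaN, which makes it easy to see which indicators still need to be filled in for each model.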

In [None]:
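from linearmodels.panel import compare

# compare() expects fitted results keyed by model name, not the
# variable-inclusion dictionary above, so pass the models estimated so far.
compare({'OLS - Model 0' : OLS_Race_Sex_Indicators,
         'FE - Model 0' : FE_Race_Sex_Indicators
        },
        stars = True)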