# Fixed Effects Model

The purpose of this program is to regress a mortgage approval indicator on race, ethnicity, gender, and other control variables found in HMDA data, using the linear probability model below.

$P(Approval = 1 \mid Race/Sex, \chi_{ji}, \alpha_i) = \beta_0 + \lambda_{ji} \cdot Race/Sex + \beta_{ji} \cdot \chi_{ji} + \alpha_i + \mu$

Where $\lambda_{ji}$ are the coefficients on the variables of interest, $\beta_{ji}$ are the coefficients on the control variables, $\chi_{ji}$ are the control variables, $\alpha_i$ are the fixed effects, and $\mu$ is the error term.
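
To make the role of $\alpha_i$ concrete, here is a minimal sketch on synthetic data (not the HMDA sample) of the within transformation that fixed effects estimators are commonly built on. Subtracting each entity's mean from the outcome and the regressor removes $\alpha_i$, so the slope can be recovered by OLS on the demeaned data. Every name and number below is made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

n_entities, n_per_entity = 50, 40
entity = np.repeat(np.arange(n_entities), n_per_entity)

# Entity-specific intercepts (alpha_i) that are correlated with the regressor.
alpha = rng.normal(size=n_entities)
x = alpha[entity] + rng.normal(size=entity.size)
true_slope = 0.5
y = true_slope * x + alpha[entity] + rng.normal(scale=0.1, size=entity.size)

df = pd.DataFrame({"entity": entity, "x": x, "y": y})

# Pooled OLS slope ignores alpha_i, so it absorbs the correlation between x and alpha_i.
pooled_slope = np.cov(df["x"], df["y"])[0, 1] / df["x"].var()

# Within transformation: subtract each entity's mean from x and y.
x_within = df["x"] - df.groupby("entity")["x"].transform("mean")
y_within = df["y"] - df.groupby("entity")["y"].transform("mean")
within_slope = (x_within * y_within).sum() / (x_within ** 2).sum()

print(f"pooled slope: {pooled_slope:.3f}, within slope: {within_slope:.3f}")
```

In this toy setup the pooled slope drifts away from `true_slope`, while the within slope stays close to it.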

Variables of Interest
- White
- Black
- Asian
- Hispanic
- Other
- Male 
- Female

Control Variables
- Income (log)
- Loan to Value ratio
- Debt to Income ratio
- Loan Amount (log)
- Pre-Approval indicators

Variables omitted from the model to prevent perfect collinearity (these serve as the reference categories).
- Race - White
- Sex - Male
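
The collinearity problem can be illustrated with a small sketch on toy data. With an intercept plus one dummy per category, the dummy columns sum to the intercept column and the design matrix loses rank. Dropping one category restores full rank and makes that category the reference group. The `0_` prefix on `0_White` and `0_Male` in the sample output presumably exists so that these labels sort first and become the reference levels; that is an inference, not something the notebook states.

```python
import numpy as np
import pandas as pd

# Toy data; the labels mimic those in the sample output but the rows are made up.
toy = pd.DataFrame({"Race": ["0_White", "Black", "Asian", "0_White", "Latinx", "Other"]})

intercept = np.ones((len(toy), 1))

# Intercept plus a dummy for every category: the dummies sum to the intercept column.
all_dummies = pd.get_dummies(toy["Race"], dtype=float)
X_full = np.hstack([intercept, all_dummies.to_numpy()])

# Dropping the first (alphabetically sorted) category makes it the reference group.
reduced = pd.get_dummies(toy["Race"], drop_first=True, dtype=float)
X_reduced = np.hstack([intercept, reduced.to_numpy()])

print("full:   ", X_full.shape[1], "columns, rank", np.linalg.matrix_rank(X_full))
print("reduced:", X_reduced.shape[1], "columns, rank", np.linalg.matrix_rank(X_reduced))
print("remaining dummies:", list(reduced.columns))
```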

Filters
- Loan Purpose
- Occupancy Type

Clustered Standard errors
- by Lender
- by State
- by County
- by Census Tract
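
As a rough illustration of what clustering does, the sketch below computes a basic cluster-robust (CR0) sandwich covariance for OLS on synthetic data. The function name `cluster_robust_cov` and the data are hypothetical. `linearmodels` applies its own degrees-of-freedom adjustments, so its clustered standard errors will generally not match this bare version exactly.

```python
import numpy as np

def cluster_robust_cov(X, y, clusters):
    """OLS coefficients and a CR0 cluster-robust covariance (no small-sample correction)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    clusters = np.asarray(clusters)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich: sum over clusters of (X_g' u_g)(X_g' u_g)'.
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        mask = clusters == g
        score = X[mask].T @ resid[mask]
        meat += np.outer(score, score)
    return beta, XtX_inv @ meat @ XtX_inv

rng = np.random.default_rng(1)
n_clusters, n_per = 30, 20
lender = np.repeat(np.arange(n_clusters), n_per)
x = rng.normal(size=lender.size)
# A shock shared within each cluster makes errors correlated inside a lender.
shock = rng.normal(size=n_clusters)[lender]
y = 1.0 + 2.0 * x + shock + rng.normal(size=lender.size)
X = np.column_stack([np.ones_like(x), x])

beta, cov_clustered = cluster_robust_cov(X, y, lender)
# Treating every observation as its own cluster gives the heteroskedasticity-robust (HC0) case.
_, cov_hc0 = cluster_robust_cov(X, y, np.arange(lender.size))

print("clustered SE:", np.sqrt(np.diag(cov_clustered)))
print("HC0 SE:      ", np.sqrt(np.diag(cov_hc0)))
```

When errors share a within-cluster component, as in this toy example, ignoring the clustering tends to understate the standard error of the intercept.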

https://timeseriesreasoning.com/contents/the-fixed-effects-regression-model-for-panel-data-sets/

In [3]:
import pandas as pd
import numpy as np
from linearmodels import PanelOLS
from linearmodels.panel import compare
import statsmodels.api as sm
import statsmodels.formula.api as smf

# np.set_printoptions(precision=3, suppress=True)

#This will allow all columns to be displayed when reviewing the data.
pd.options.display.max_columns = None

## Load in and manipulate dataset.

The cells below manipulate the dataset before running it through the function.

In [4]:
# Load in HMDA Data
HMDA_clean_file_location = r'HMDA Clean IL SAMPLE.csv'
HMDA_clean_0 = pd.read_csv(HMDA_clean_file_location)
#HMDA_clean_0

### Further Cleaning

In [5]:
#Clean df
HMDA_clean_1 = HMDA_clean_0.copy()
HMDA_clean_1 = HMDA_clean_1.dropna()
HMDA_clean_1['Census_Tract'] = HMDA_clean_1['Census_Tract'].apply(str)

#Filter Occupancy_Type to principal residence. Omits second residences and investment properties.
# Occupancy_Type codes: "Principal Residence" = 1, "Second Residence" = 2, "Investment Property" = 3.
HMDA_clean_2_1 = HMDA_clean_1[HMDA_clean_1["Occupancy_Type"] == 1]

#Sets County_Code and Census_Tract as strings.
HMDA_clean_2 = HMDA_clean_2_1.copy()
HMDA_clean_2['County_Code'] = HMDA_clean_2['County_Code'].astype(str)
HMDA_clean_2['Census_Tract'] = HMDA_clean_2['Census_Tract'].astype(str)
#HMDA_clean_2
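
The sample output shows `County_Code` as `17037.0` and `Census_Tract` as `17031834500.0`, which suggests both columns are read from the CSV as floats, so `astype(str)` keeps the trailing `.0`. The labels still work as category identifiers, but if clean FIPS-style strings are wanted, a sketch like the following could be used. The helper `code_to_str` is hypothetical and assumes there are no missing values, which holds after `dropna()`.

```python
import pandas as pd

# Toy frame mimicking float-typed codes as they appear in the sample output.
toy = pd.DataFrame({
    "County_Code": [17037.0, 17031.0],
    "Census_Tract": [17037000300.0, 17031834500.0],
})

def code_to_str(series, width):
    """Convert float-typed codes to zero-padded digit strings (assumes no missing values)."""
    return series.astype("int64").astype(str).str.zfill(width)

# County FIPS codes have 5 digits; census tract identifiers have 11.
toy["County_Code"] = code_to_str(toy["County_Code"], width=5)
toy["Census_Tract"] = code_to_str(toy["Census_Tract"], width=11)
print(toy)
```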

### Set Index

In [6]:
HMDA_Lender_LEI_index = HMDA_clean_2.set_index(['Lender_LEI', 'Year'])
HMDA_State_index = HMDA_clean_2.set_index(['State','Year'])
HMDA_County_Code_index = HMDA_clean_2.set_index(['County_Code','Year'])
HMDA_Census_Tract_index = HMDA_clean_2.set_index(['Census_Tract','Year'])
HMDA_Lender_LEI_index

Lender_LEI,Year,index,State,County_Code,Census_Tract,Approved,Denied,Race,Sex,Income,Log_Income,Loan_Amount,Log_Loan_Amount,LTV,Loan_Type,DTI_Ratio,Preapproval,Occupancy_Type
549300VZVN841I2ILS84,2021,9,IL,17037.0,17037000300.0,1,0,0_White,0_Male,58.0,4.060443,155000.0,11.951180,100.945,RHS or FSA,30%-<36%,0 No Preapproval Request,1
549300BX448ALT10FI43,2019,11,IL,17031.0,17031834500.0,1,0,0_White,Female,75.0,4.317488,355000.0,12.779873,98.188,FHA,49,0 No Preapproval Request,1
549300RIPPSJXAQKZ383,2021,12,IL,17043.0,17043840103.0,1,0,0_White,0_Male,47.0,3.850148,165000.0,12.013701,95.000,Conventional,45,0 No Preapproval Request,1
549300HW662MN1WU8550,2019,19,IL,17031.0,17031805402.0,0,1,0_White,0_Male,78.0,4.356709,235000.0,12.367341,95.000,Conventional,43,0 No Preapproval Request,1
549300O7SGM8FH65GQ47,2019,21,IL,17039.0,17039971400.0,1,0,0_White,0_Male,44.0,3.784190,75000.0,11.225243,95.000,Conventional,0%-20%,0 No Preapproval Request,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
549300BBKMRS6J0CB539,2019,765857,IL,17197.0,17197883210.0,1,0,0_White,Female,132.0,4.882802,285000.0,12.560244,89.610,VA,50%-60%,0 No Preapproval Request,1
5493000F6NFDOVVZP043,2021,765862,IL,17113.0,17113000102.0,1,0,0_White,0_Male,71.0,4.262680,205000.0,12.230765,100.000,VA,39,0 No Preapproval Request,1
549300AG64NHILB7ZP05,2020,765864,IL,17031.0,17031251200.0,1,0,Latinx,0_Male,51.0,3.931826,255000.0,12.449019,96.500,FHA,50%-60%,0 No Preapproval Request,1
549300AR4BCLQFU47165,2020,765867,IL,17097.0,17097861901.0,1,0,0_White,0_Male,88.0,4.477337,185000.0,12.128111,81.182,Conventional,0%-20%,Preapproval Requested,1


## Run Models

OLS - Race*Sex Only

In [7]:
OLS_Race_Sex_Indicators = PanelOLS.from_formula('Approved ~ 1 + (Race)*Sex', data = HMDA_Lender_LEI_index).fit(cov_type = "clustered", cluster_entity = True)
#OLS_Race_Sex_Indicators
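
In the formula syntax, `Race*Sex` is shorthand for `Race + Sex + Race:Sex`, meaning main effects plus all race-by-sex interaction terms. The sketch below builds the same kind of columns by hand on toy data, so it is clear what enters the regression. The column names here are illustrative and differ from the `Race[T.Black]` style names the formula engine produces.

```python
import pandas as pd

# Toy rows; labels mimic the sample output.
toy = pd.DataFrame({
    "Race": ["0_White", "Black", "Asian", "Black"],
    "Sex": ["0_Male", "Female", "Female", "0_Male"],
})

# Main effects, with the first sorted level of each variable dropped as the reference.
race = pd.get_dummies(toy["Race"], prefix="Race", drop_first=True, dtype=float)
sex = pd.get_dummies(toy["Sex"], prefix="Sex", drop_first=True, dtype=float)

# Interaction columns: the product of each race dummy with each sex dummy.
interactions = pd.DataFrame({
    f"{r}:{s}": race[r] * sex[s] for r in race.columns for s in sex.columns
})

design = pd.concat([race, sex, interactions], axis=1)
print(design)
```

One consequence for interpretation: with the interaction included, a coefficient such as `Race[T.Black]` measures the gap for the reference sex (male) relative to white males, not an average over both sexes.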

OLS_Model_1 - Race*Sex plus Log Income, Log Loan Amount, LTV, and DTI (no fixed effects)

In [8]:
OLS_Model_1 = PanelOLS.from_formula('Approved ~ 1 + (Race)*Sex + Log_Income + Log_Loan_Amount + LTV + DTI_Ratio', data = HMDA_Lender_LEI_index).fit(cov_type = "clustered", cluster_entity = True)
#OLS_Model_1

FE_Model_1 includes Log Income, Log Loan Amount, and the LTV*DTI interaction, and adds entity (lender) and time (year) fixed effects.

In [9]:
FE_Model_1 = PanelOLS.from_formula('Approved ~ 1 + (Race)*Sex + Log_Income + Log_Loan_Amount + LTV*DTI_Ratio + EntityEffects + TimeEffects', data = HMDA_Lender_LEI_index).fit(cov_type = "clustered", cluster_entity = True)
#FE_Model_1

FE_Model_2 adds preapproval and loan type.

In [10]:
FE_Model_2 = PanelOLS.from_formula('Approved ~ 1 + (Race)*Sex + Log_Income + Log_Loan_Amount + LTV*DTI_Ratio + Preapproval + Loan_Type + EntityEffects + TimeEffects', data = HMDA_Lender_LEI_index).fit(cov_type = "clustered", cluster_entity = True)
#FE_Model_2

FE_Model_3 changes the index to State. Adds Lender Indicators.

In [11]:
#Will only work with more than one state. The current test sample is for IL only.

#FE_Model_3 = PanelOLS.from_formula('Approved ~ 1 + (Race)*Sex + Log_Income + Log_Loan_Amount + LTV*DTI_Ratio + Preapproval + Loan_Type + Lender_LEI + EntityEffects', data = HMDA_State_index).fit(cov_type = "clustered", cluster_entity = True)
#FE_Model_3

FE_Model_4 changes the index to County Code.

In [12]:
FE_Model_4 = PanelOLS.from_formula('Approved ~ 1 + (Race)*Sex + Log_Income + Log_Loan_Amount + LTV*DTI_Ratio + Preapproval + Loan_Type + Lender_LEI + EntityEffects + TimeEffects', data = HMDA_County_Code_index).fit(cov_type = "clustered", cluster_entity = True)
#FE_Model_4

FE_Model_5 changes the index to Census Tract.

In [13]:
#This model may not work if any census tract has fewer than 2 observations. Consider dropping those tracts first.

#FE_Model_5 = PanelOLS.from_formula('Approved ~ 1 + (Race)*Sex + Log_Income + Log_Loan_Amount + LTV*DTI_Ratio + Preapproval + Loan_Type + Lender_LEI + EntityEffects', data = HMDA_Census_Tract_index).fit(cov_type = "clustered", cluster_entity = True)
#FE_Model_5
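
One possible way to handle sparse census tracts, sketched on toy data, is to count observations per tract and keep only tracts that meet a minimum. The threshold `min_obs` and the tract labels are hypothetical. Whether dropping these tracts is appropriate for the analysis is a separate judgment call.

```python
import pandas as pd

# Toy data; tract labels are made up.
toy = pd.DataFrame({
    "Census_Tract": ["A", "A", "B", "C", "C", "C"],
    "Approved": [1, 0, 1, 1, 1, 0],
})

min_obs = 2  # hypothetical threshold

# Number of rows in each row's tract, aligned to the original index.
tract_size = toy.groupby("Census_Tract")["Approved"].transform("size")
filtered = toy[tract_size >= min_obs].copy()

print(filtered["Census_Tract"].value_counts())
```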

FE_Model_6 changes the covariance type to unadjusted.

In [14]:
FE_Model_6 = PanelOLS.from_formula('Approved ~ 1 + (Race)*Sex + Log_Income + Log_Loan_Amount + LTV*DTI_Ratio + Preapproval + Loan_Type + Lender_LEI + EntityEffects + TimeEffects', data = HMDA_County_Code_index).fit()
#FE_Model_6

FE_Model_7 changes the covariance type to unadjusted, adds County Indicators, and changes index to lender.

In [15]:
FE_Model_7 = PanelOLS.from_formula('Approved ~ 1 + (Race)*Sex + Log_Income + Log_Loan_Amount + LTV*DTI_Ratio + Preapproval + Loan_Type + County_Code + EntityEffects + TimeEffects', data = HMDA_Lender_LEI_index).fit()
#FE_Model_7

## Compare Models

Documentation for the results object that the comparison table is built from: https://bashtage.github.io/linearmodels/panel/panel/linearmodels.panel.results.PanelEffectsResults.html

In [16]:
comparison_2 = compare({'OLS - Model 0' : OLS_Race_Sex_Indicators,
         'OLS - Model 1' : OLS_Model_1,
         'FE - Model 1' : FE_Model_1,
         'FE - Model 2' : FE_Model_2,
         'FE - Model 4' : FE_Model_4,
         'FE - Model 6' : FE_Model_6,
         'FE - Model 7' : FE_Model_7 
        }, stars = True).params

#Pull rows from model comparisons.
Model_Comparison_1 = comparison_2.loc[['Race[T.Asian]', 'Race[T.Black]',
                                   'Race[T.Other]','Race[T.Latinx]', 'Sex[T.Female]']]
Model_Comparison_2 = Model_Comparison_1.set_axis(['Asian', 'Black', 'Other', 'LatinX', 'Female'])


#Add rows that give information on each regression.
Model_Comparison_2.loc['LTV + DTI Interactions'] = ['No', 'Yes', 'Yes', 'Yes','Yes','Yes','Yes']
Model_Comparison_2.loc['Preapproval + Loan Type'] = ['No', 'No', 'No', 'Yes','Yes','Yes','Yes']
Model_Comparison_2.loc['Lender Indicators'] = ['No', 'No', 'No', 'No','Yes','Yes','No']
Model_Comparison_2.loc['County Code Indicator'] = ['No', 'No', 'No', 'No','No','No','Yes']
Model_Comparison_2.loc['Entity/Time Effects'] = ['No', 'no', 'Yes', 'Yes','Yes','Yes','Yes']
Model_Comparison_2.loc['Index'] = ['Lender', 'Lender', 'Lender', 'Lender','County','County','Lender']
Model_Comparison_2.loc['Cov. Estimator'] = ['Clustered', 'Clustered', 'Clustered', 'Clustered','Clustered','Normal','Normal']
Model_Comparison_2.loc['Overall R Squared'] = [OLS_Race_Sex_Indicators.rsquared_overall, OLS_Model_1.rsquared_overall, 
                                                FE_Model_1.rsquared_overall, FE_Model_2.rsquared_overall, FE_Model_4.rsquared_overall,
                                                FE_Model_6.rsquared_overall, FE_Model_7.rsquared_overall]

Model_Comparison_2

,OLS - Model 0,OLS - Model 1,FE - Model 1,FE - Model 2,FE - Model 4,FE - Model 6,FE - Model 7
Asian,-0.018581,-0.021582,-0.016268,-0.015286,-0.013015,-0.013015,-0.013015
Black,-0.109482,-0.07344,-0.064555,-0.065484,-0.061282,-0.061282,-0.061282
Other,-0.055886,-0.032915,-0.022089,-0.023075,-0.022847,-0.022847,-0.022847
LatinX,-0.055295,-0.032208,-0.023448,-0.02306,-0.019571,-0.019571,-0.019571
Female,-0.0023,0.003547,0.001756,0.002322,0.002648,0.002648,0.002648
LTV + DTI Interactions,No,Yes,Yes,Yes,Yes,Yes,Yes
Preapproval + Loan Type,No,No,No,Yes,Yes,Yes,Yes
Lender Indicators,No,No,No,No,Yes,Yes,No
County Code Indicator,No,No,No,No,No,No,Yes
Entity/Time Effects,No,no,Yes,Yes,Yes,Yes,Yes


### For Distributed Computing

Idea behind distribution
https://stats.stackexchange.com/questions/263429/how-to-run-linear-regression-in-a-parallel-distributed-way-for-big-data-setting

SGD vs OLS
https://www.youtube.com/watch?v=fkS3FkVAPWU

Example of Fixed Effects
https://www.youtube.com/watch?v=FCm3_Id6RKM&t=532s

Fixed Effects Model Explanation
https://www.youtube.com/watch?v=J9UEYUXi6lY&t=28s

TensorFlow Examples (OLS and SGD)
https://github.com/jldbc/Tensorflow_ML_Algorithms

OLS From Scratch
https://jianghaochu.github.io/ordinary-least-squares-regression-in-python-from-scratch.html
https://www.youtube.com/watch?v=KYNuzfn5Fx0
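
One common way to distribute OLS, which may or may not match the approach in the first link above, is to have each worker compute $X^\top X$ and $X^\top y$ on its own chunk of rows, sum those small matrices, and solve the normal equations once. The sketch below simulates this on synthetic data with in-memory chunks. The function name `chunked_ols` is hypothetical, and a real deployment would also need to handle fixed effects and numerical conditioning.

```python
import numpy as np

def chunked_ols(chunks):
    """Accumulate X'X and X'y over (X, y) chunks, then solve the normal equations."""
    xtx, xty = None, None
    for X_chunk, y_chunk in chunks:
        if xtx is None:
            k = X_chunk.shape[1]
            xtx = np.zeros((k, k))
            xty = np.zeros(k)
        xtx += X_chunk.T @ X_chunk
        xty += X_chunk.T @ y_chunk
    return np.linalg.solve(xtx, xty)

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(1000), rng.normal(size=(1000, 2))])
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.1, size=1000)

# Split rows into four chunks, as if each lived on a different worker.
chunks = zip(np.array_split(X, 4), np.array_split(y, 4))
beta_chunked = chunked_ols(chunks)

# Reference fit on the full data for comparison.
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_chunked)
print(beta_full)
```

Because the sums of $X^\top X$ and $X^\top y$ over chunks equal the full-data quantities, the chunked estimate agrees with the single-machine fit up to floating-point error.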