![lihtc](https://camo.githubusercontent.com/af8ed0a0f65baaa8c90afabbd29e8b1fbc19b48fcf63b86028d5143a0d22acb9/68747470733a2f2f7777772e696864612e6f72672f77702d636f6e74656e742f75706c6f6164732f323031352f30382f494844412d4c6f772d496e636f6d652d5461782d4372656469742d30322d332e6a7067)

# What is the Low Income Housing Tax Credit?

#### Sources:

- [Wikipedia article](https://en.wikipedia.org/wiki/Low-Income_Housing_Tax_Credit)
- [Tax Reform Act of 1986](https://en.wikipedia.org/wiki/Tax_Reform_Act_of_1986)
- [NYT - Opinion: A Tax Credit Worth Preserving](https://www.nytimes.com/2012/12/21/opinion/a-tax-credit-worth-preserving.html?_r=0)
- [Tax Policy Center: What is the Low-Income Housing Tax Credit and how does it work?](https://www.taxpolicycenter.org/briefing-book/what-low-income-housing-tax-credit-and-how-does-it-work)
- [Office Of Policy Development And Research (PD&R): Low-Income Housing Tax Credit LIHTC](https://www.huduser.gov/PORTAL/datasets/lihtc.html)
- [Urban Institute: The Low-Income Housing Tax Credit](https://www.urban.org/sites/default/files/publication/98761/lithc_past_achievements_future_challenges_final_0.pdf)
- [Consolidated Appropriations Act, 2018](https://en.wikipedia.org/wiki/Consolidated_Appropriations_Act,_2018)
- [Senate Bill 548](https://www.congress.gov/bill/115th-congress/senate-bill/548)

Because the maximum rent that can be charged is based on the Area Median Income ("AMI"), LIHTC housing remains unaffordable to many extremely low-income (<30% AMI) renters.
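
To make the affordability gap concrete, here is a rough sketch. It assumes the common simplification that the rent cap is 30% of the income limit for the elected ceiling (60% of AMI here), and it ignores household-size adjustments and utility allowances. The AMI figure and the function names are hypothetical.

```python
def max_monthly_rent(ami, ceiling_share=0.60, rent_share=0.30):
    """Approximate monthly rent cap: rent_share of the income limit (ceiling_share * AMI)."""
    return ami * ceiling_share * rent_share / 12


def rent_burden(ami, household_share, ceiling_share=0.60):
    """Share of household income spent on rent when the unit is priced at the cap."""
    household_income = ami * household_share
    return max_monthly_rent(ami, ceiling_share) * 12 / household_income


ami = 80_000  # hypothetical area median income
print(round(max_monthly_rent(ami), 2))   # monthly rent cap at the 60% AMI ceiling
print(round(rent_burden(ami, 0.60), 2))  # burden for a household earning 60% of AMI
print(round(rent_burden(ami, 0.30), 2))  # burden for a household earning 30% of AMI
```

Under these assumptions, a household at 30% of AMI would spend about twice the share of its income on rent as a household at the ceiling.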

The tax credits are more attractive than tax deductions because a credit provides a dollar-for-dollar reduction in a taxpayer's federal income tax liability, whereas a deduction only reduces taxable income.
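
The difference between a credit and a deduction can be sketched with a little arithmetic. The marginal rate and dollar amount below are made up for illustration, and the helper names are hypothetical.

```python
def tax_saved_by_credit(credit_amount):
    """A credit reduces the tax bill dollar for dollar."""
    return credit_amount


def tax_saved_by_deduction(deduction_amount, marginal_rate):
    """A deduction reduces taxable income, so it saves only marginal_rate per dollar."""
    return deduction_amount * marginal_rate


amount = 1_000
marginal_rate = 0.21  # hypothetical marginal tax rate

print(tax_saved_by_credit(amount))
print(tax_saved_by_deduction(amount, marginal_rate))
```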

### How it works

The LIHTC provides funding for the development costs of low-income housing by allowing an investor (usually the partners of a partnership that owns the housing) to take a federal tax credit equal to a percentage (either 4% or 9%, for 10 years, depending on the credit type) of the cost incurred for development of the low-income units in a rental housing project.

To take advantage of the LIHTC, a developer will either (i) propose a project to a state agency and seek and win a competitive allocation of tax credits, or (ii) obtain approval and issuance of tax-exempt bonds to finance at least 50% of the project cost. The developer then completes the project, certifies its cost, and rents up the project to low-income tenants. Simultaneously, an investor is found who makes a capital contribution to the partnership or limited liability company that owns the project in exchange for being allocated the entity's LIHTCs over a ten-year period. The amount of the credit is based on (i) the amount of credits awarded to the project in the competition, (ii) the actual cost of the project, (iii) the tax credit rate announced by the IRS, and (iv) the percentage of the project's units that are rented to low-income tenants. Failure to comply with the applicable rules, or a sale of the project or an ownership interest before the end of at least a 15-year period, can lead to recapture of credits previously taken, as well as the inability to take future credits.
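
As a rough illustration of how those four inputs combine, the sketch below multiplies the project cost by the low-income share and the credit rate, then caps the result at the awarded amount. This is a simplification rather than the statutory calculation; the function name, parameter names, and all dollar figures are assumptions.

```python
def annual_lihtc_credit(eligible_cost, low_income_fraction, credit_rate, allocated_credit=None):
    """Simplified annual credit: cost attributable to low-income units times the rate,
    capped by the allocation when one is given."""
    qualified_cost = eligible_cost * low_income_fraction
    credit = qualified_cost * credit_rate
    if allocated_credit is not None:
        credit = min(credit, allocated_credit)
    return credit


# All figures below are hypothetical
annual = annual_lihtc_credit(
    eligible_cost=10_000_000,
    low_income_fraction=0.80,
    credit_rate=0.09,
    allocated_credit=700_000,
)
total_over_credit_period = annual * 10  # credits are taken over ten years
print(annual, total_over_credit_period)
```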

### What is the objective of this project? 

The objective of this project is to explore the LIHTC dataset, get a better understanding of the program, and build a regressor that predicts the allocated amount for a project based on various features in the dataset.

We also might build a classifier to determine if a particular project qualifies for certain binary criteria. This depends on the quality of the data.



### How should the problem be framed? 

We should use a regression algorithm because we are looking to predict a continuous value: the amount allocated to a particular project.

# Initialize Packages

In [None]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

# Load data

In [None]:
pd.set_option('display.max_rows', 20)

In [None]:
pathname = '/Users/blakenicholson/Documents/Personal/Coding/DataAnalysis-LIHTC/LIHTCPUB.CSV'

In [None]:
df = pd.read_csv(pathname, low_memory=False)

In [None]:
df.shape

In [None]:
df.columns

Let's check out the first few rows.

In [None]:
df.head()

In [None]:
df[0:4].T

In [None]:
df.describe().T

In [None]:
df.info()

# Data Exploration

- Variable Identification
- Univariate Analysis
- Bi-variate Analysis
- Missing values treatment
- Outlier treatment
- Variable transformation
- Variable creation

# Data Cleaning

The cleaning steps below use the data dictionary found here: https://github.com/bnicholson206/DataAnalysis-LIHTC/blob/main/LIHTC%20Data%20Dictionary%202019.pdf

We have a fair amount of NaN values in the dataset.

In [None]:
df.isnull().any()
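
`isnull().any()` only says whether a column has gaps, not how many. A small helper like the one below (the name `missing_summary` and the toy frame are hypothetical) could rank columns by their share of missing values.

```python
import numpy as np
import pandas as pd


def missing_summary(frame):
    """Return the count and share of missing values per column, worst first."""
    summary = pd.DataFrame({
        "n_missing": frame.isnull().sum(),
        "share_missing": frame.isnull().mean(),
    })
    return summary.sort_values("share_missing", ascending=False)


# Toy stand-in for the LIHTC frame
toy = pd.DataFrame({
    "n_units": [10, np.nan, 30, 40],
    "allocamt": [np.nan, np.nan, 5000.0, np.nan],
    "proj_st": ["WA", "OR", "CA", "ID"],
})
summary = missing_summary(toy)
print(summary)
```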

First, identify Predictor (Input) and Target (output) variables. Next, identify the data type and category of the variables.

Predictors = all columns except `allocamt`

Target = `allocamt`

Convert all floats to integers

In [None]:
df_num = df.select_dtypes(include='float')
# df_num.drop(columns=['latitude','longitude'])

In [None]:
df_num.head()

#### Fill in NaN values

In [None]:
# fill all NaN values and change data type to 'int'

df[df_num.columns] = df[df_num.columns].fillna(0)
df[df_num.columns] = df[df_num.columns].astype('int')
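
Filling with 0 makes a missing value indistinguishable from a real zero, and casting every float column to `int` also truncates genuinely fractional columns such as coordinates. One possible alternative, sketched here on a toy frame, is pandas' nullable `Int64` dtype applied only to columns that hold whole numbers. The helper name `to_nullable_int` is hypothetical.

```python
import numpy as np
import pandas as pd


def to_nullable_int(frame):
    """Cast float columns that hold only whole numbers to the nullable Int64 dtype."""
    out = frame.copy()
    for col in out.select_dtypes(include="float").columns:
        values = out[col].dropna()
        # Leave fractional columns (e.g. latitude) as floats
        if (values % 1 == 0).all():
            out[col] = out[col].astype("Int64")
    return out


toy = pd.DataFrame({
    "n_units": [10.0, np.nan, 30.0],
    "latitude": [47.61, 45.52, np.nan],
})
converted = to_nullable_int(toy)
print(converted.dtypes)
```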

#### Convert all discrete numbers into discrete categories

In [None]:
# convert all discrete numbers into discrete categories

def _make_mapper(mapping, default=None):
    """Build a lookup function; codes missing from `mapping` return `default`."""
    def mapper(code):
        # dict lookup compares by value, unlike `is`, which compares object identity
        return mapping.get(code, default)
    return mapper

YES_NO = {1: 'Yes', 2: 'No'}

# Yes/No indicators. Unlisted codes, including the 0 used to fill NaN above, map to None.
scattered_site_cd_numttocat = _make_mapper(YES_NO)  # Scattered-site property
resyndication_cd_numttocat = _make_mapper(YES_NO)   # Resyndicated property
Non_Profit_numttocat = _make_mapper(YES_NO)         # Non-profit sponsor
nonprog_numttocat = _make_mapper(YES_NO, default='No')  # nonprog flag; any code other than 1 is treated as 'No'
Basis_Profit_numttocat = _make_mapper(YES_NO)       # Increase in eligible basis
Bond_Profit_numttocat = _make_mapper(YES_NO)        # Tax-exempt bond received
Home_numttocat = _make_mapper(YES_NO)               # HOME Investment Partnership Program funds
Mrr_ra_Profit_numttocat = _make_mapper(YES_NO)      # HUD Multi-Family financial/rental assistance
fmha_514_numttocat = _make_mapper(YES_NO)           # FmHA (RHS) Section 514 loan
fmha_515_numttocat = _make_mapper(YES_NO)           # FmHA (RHS) Section 515 loan
fmha_538_numttocat = _make_mapper(YES_NO)           # FmHA (RHS) Section 538 loan
rad_numttocat = _make_mapper(YES_NO)                # Rental Assistance Demonstration (RAD)
htf_numttocat = _make_mapper(YES_NO)                # Housing Trust Fund funds
hopevi_numttocat = _make_mapper(YES_NO)             # Forms part of a HOPE VI development
tcep_numttocat = _make_mapper(YES_NO)               # Tax Credit Exchange Program (TCEP) funds
tcap_numttocat = _make_mapper(YES_NO)               # Tax Credit Assistance Program (TCAP) funds
fha_numttocat = _make_mapper(YES_NO)                # FHA-insured loan
cdbg_numttocat = _make_mapper(YES_NO)               # Community Development Block Grant (CDBG) funds
Low_ceil_numttocat = _make_mapper(YES_NO)           # Units set aside with rents below the elected ceiling
qozf_numttocat = _make_mapper(YES_NO)               # Qualified Opportunity Zone Fund
trgt_pop_numttocat = _make_mapper(YES_NO)           # Targets a specific population
trgt_fam_numttocat = _make_mapper(YES_NO)           # Targets a specific population - families
trgt_eld_numttocat = _make_mapper(YES_NO)           # Targets a specific population - elderly
trgt_dis_numttocat = _make_mapper(YES_NO)           # Targets a specific population - disabled
trgt_hml_numttocat = _make_mapper(YES_NO)           # Targets a specific population - homeless
trgt_other_numttocat = _make_mapper(YES_NO)         # Targets a specific population - other

# Elected rent/income ceiling for low-income units
inc_ceil_numttocat = _make_mapper({1: '50% AMGI', 2: '60% AMGI', 3: 'Not Reported'})

# Record status
record_stat_numttocat = _make_mapper({'N': 'New', 'U': 'Updated', 'X': 'Existing'})

# Federal or state project-based rental assistance contract
Rentassist_stat_numttocat = _make_mapper({
    1: 'Federal', 2: 'State', 3: 'Both State and Federal', 4: 'Neither', 5: 'Unknown',
})

# Type of construction
Type_numttocat = _make_mapper({
    1: 'New Construction', 2: 'Acquisition and Rehab',
    3: 'Both new construction and A/R', 4: 'Existing',
})

# Type of credit percentage
Credit_numttocat = _make_mapper({
    1: '30% present value', 2: '70% present value', 3: 'Both', 4: 'TCEP Only',
})

# Is the census tract in a difficult development area?
dda_numttocat = _make_mapper({
    0: 'Not in DDA', 1: 'In Metro DDA', 2: 'In Non-Metro DDA',
    3: 'In Metro GO Zone DDA', 4: 'In Non-Metro GO Zone DDA',
})

# Was the census tract metro or non-metro when the property was placed into service?
metro_numttocat = _make_mapper({
    1: 'Metro/Non-Central City', 2: 'Metro/Central City', 3: 'Non-Metro',
})

# Reason the property is no longer monitored for LIHTC
nlm_reason_numttocat = _make_mapper({
    1: 'Completed Extended-Use Period', 2: 'Sale under Qualified Contract', 3: 'Other',
})

# Is the census tract a qualified census tract?
qct_numttocat = _make_mapper({
    1: 'In a Qualified Census Tract', 2: 'Not In a Qualified Census Tract',
})

In [None]:
df["scattered_site_cd"] = df["scattered_site_cd"].map(scattered_site_cd_numttocat)
df["resyndication_cd"] = df["resyndication_cd"].map(resyndication_cd_numttocat)
df["inc_ceil"] = df["inc_ceil"].map(inc_ceil_numttocat)
df["low_ceil"] = df["low_ceil"].map(Low_ceil_numttocat)
df["record_stat"] = df["record_stat"].map(record_stat_numttocat)
df["non_prof"] = df["non_prof"].map(Non_Profit_numttocat)
df["basis"] = df["basis"].map(Basis_Profit_numttocat)
df["bond"] = df["bond"].map(Bond_Profit_numttocat)
df["mff_ra"] = df["mff_ra"].map(Mrr_ra_Profit_numttocat)
df["fmha_514"] = df["fmha_514"].map(fmha_514_numttocat)
df["fmha_515"] = df["fmha_515"].map(fmha_515_numttocat)
df["fmha_538"] = df["fmha_538"].map(fmha_538_numttocat)
df["home"] = df["home"].map(Home_numttocat)
df["rentassist"] = df["rentassist"].map(Rentassist_stat_numttocat)
df["type"] = df["type"].map(Type_numttocat)
df["credit"] = df["credit"].map(Credit_numttocat)
df["dda"] = df["dda"].map(dda_numttocat)
df["metro"] = df["metro"].map(metro_numttocat)
df["nlm_reason"] = df["nlm_reason"].map(nlm_reason_numttocat)
df["tcap"] = df["tcap"].map(tcap_numttocat)
df["cdbg"] = df["cdbg"].map(cdbg_numttocat)
df["htf"] = df["htf"].map(htf_numttocat)
df["hopevi"] = df["hopevi"].map(hopevi_numttocat)
df["fha"] = df["fha"].map(fha_numttocat)
df["tcep"] = df["tcep"].map(tcep_numttocat)
df["rad"] = df["rad"].map(rad_numttocat)
df["nonprog"] = df["nonprog"].map(nonprog_numttocat)
df["qct"] = df["qct"].map(qct_numttocat)
df["qozf"] = df["qozf"].map(qozf_numttocat)
df["trgt_pop"] = df["trgt_pop"].map(trgt_pop_numttocat)
df["trgt_fam"] = df["trgt_fam"].map(trgt_fam_numttocat)
df["trgt_eld"] = df["trgt_eld"].map(trgt_eld_numttocat)
df["trgt_hml"] = df["trgt_hml"].map(trgt_hml_numttocat)
df["trgt_dis"] = df["trgt_dis"].map(trgt_dis_numttocat)
df["trgt_other"] = df["trgt_other"].map(trgt_other_numttocat)
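
Because any code that a mapping does not list silently becomes missing, it may be worth checking what was left unmapped. The sketch below shows one way to do that on a toy series; `map_codes` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd


def map_codes(series, mapping):
    """Map codes to labels and also return the codes that the mapping did not cover."""
    mapped = series.map(mapping)
    unmapped = sorted(series[mapped.isna() & series.notna()].unique())
    return mapped, unmapped


toy = pd.Series([1, 2, 0, 1, 3], name="bond")
labels, leftover = map_codes(toy, {1: "Yes", 2: "No"})
print(labels.value_counts(dropna=False))
print(leftover)
```

A non-empty `leftover` list suggests the data dictionary defines codes the mapping does not handle, or that filled-in zeros are being treated as a category.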

In [None]:
df.to_csv("LIHTCPUB_cleaned.csv", index_label="ID")

In [None]:
df.head()

# Feature Engineering

In [None]:
# try:
#   df['ratio_of_li_units'] = df['li_units'] / df['n_units'] 
# except ZeroDivisionError:
#   df['ratio_of_li_units'] = 0

In [None]:

# try:
#   df['ratio_of_0br_units'] = df['n_0br'] / df['n_units'] 
# except ZeroDivisionError:
#   df['ratio_of_0br_units'] = 0

# try:
#   df['ratio_of_1br_units'] = df['n_1br'] / df['n_units'] 
# except ZeroDivisionError:
#   df['ratio_of_1br_units'] = 0

# try:
#   df['ratio_of_2br_units'] = df['n_2br'] / df['n_units'] 
# except ZeroDivisionError:
#   df['ratio_of_2br_units'] = 0

# try:
#   df['ratio_of_3br_units'] = df['n_3br'] / df['n_units'] 
# except ZeroDivisionError:
#   df['ratio_of_3br_units'] = 0

# try:
#   df['ratio_of_4br_units'] = df['n_4br'] / df['n_units'] 
# except ZeroDivisionError:
#   df['ratio_of_4br_units'] = 0

In [None]:
# df['log_of_n_units'] = np.log(df['n_units'])
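
Note that pandas division does not raise `ZeroDivisionError`; dividing by a zero in a Series yields `inf` or `NaN`, so the `try`/`except` pattern in the commented-out cells would never fire. A possible alternative is sketched below on a toy frame. The helper name is hypothetical, and `np.log1p` is swapped in for `np.log` so that zero-unit rows map to 0 rather than negative infinity.

```python
import numpy as np
import pandas as pd


def add_unit_features(frame):
    """Add unit-mix ratios and a log-scaled unit count without producing inf values."""
    out = frame.copy()
    # Treat zero-unit projects as missing so the division yields NaN instead of inf
    n_units = out["n_units"].replace(0, np.nan)
    out["ratio_of_li_units"] = (out["li_units"] / n_units).fillna(0)
    for size in ["0br", "1br", "2br", "3br", "4br"]:
        if f"n_{size}" in out.columns:
            out[f"ratio_of_{size}_units"] = (out[f"n_{size}"] / n_units).fillna(0)
    # log1p(x) = log(1 + x), which is defined at x = 0
    out["log1p_of_n_units"] = np.log1p(out["n_units"])
    return out


toy = pd.DataFrame({
    "n_units": [10, 0, 4],
    "li_units": [5, 0, 4],
    "n_1br": [2, 0, 4],
})
features = add_unit_features(toy)
print(features)
```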

# Exploratory Data Analysis

In [None]:
df.hist(bins=50, figsize=(20,15))
plt.show()

In [None]:
corr = df.corr(numeric_only=True)  # restrict to numeric columns; the mapped category columns are strings
mask = np.triu(np.ones_like(corr, dtype=bool))
f, ax = plt.subplots(figsize=(15, 10))
g = sns.heatmap(corr, mask=mask, cmap="YlGnBu", center=0, square=True, linewidths=.5,
            cbar_kws={"shrink": 0.6}).set_title('Pairwise correlation')
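
The full heatmap can be hard to read with this many columns. A narrower view, sketched here on toy data, ranks numeric features by the absolute value of their correlation with the target. The helper name `corr_with_target` is an assumption.

```python
import pandas as pd


def corr_with_target(frame, target):
    """Rank numeric columns by absolute Pearson correlation with `target`."""
    numeric = frame.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


toy = pd.DataFrame({
    "allocamt": [10, 20, 30, 40],
    "n_units": [1, 2, 3, 4],
    "noise": [2, 1, 2, 1],
    "proj_st": ["WA", "OR", "CA", "ID"],  # non-numeric, ignored
})
ranked = corr_with_target(toy, "allocamt")
print(ranked)
```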

In [None]:
df.n_units.hist(bins=50)

In [None]:
df.n_0br.hist(bins=20)

In [None]:
boxplot = df.boxplot(column=['n_units','n_0br','n_1br','n_2br','n_3br','n_4br'])
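
The points drawn beyond the whiskers of a boxplot follow, by default, the 1.5 × IQR rule. The same rule can be applied directly to flag candidate outliers, as in this sketch on toy values (the function name is hypothetical).

```python
import pandas as pd


def iqr_outlier_mask(series, k=1.5):
    """Flag values more than k * IQR below the first or above the third quartile."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


toy_units = pd.Series([10, 12, 11, 13, 12, 500], name="n_units")
mask = iqr_outlier_mask(toy_units)
print(toy_units[mask])
```

Whether flagged rows should be dropped, capped, or kept depends on whether very large projects are errors or legitimate observations.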

In [None]:
# lmplot is a figure-level function that creates its own figure, so size it via height/aspect
sns.lmplot(data=df, y='allocamt', x='n_units', hue="trgt_hml", col="record_stat", height=6, aspect=1)

plt.show()

In [None]:
# lmplot is a figure-level function that creates its own figure, so size it via height/aspect
sns.lmplot(data=df, y='allocamt', x='n_units', hue="trgt_pop", col="record_stat", height=6, aspect=1)

plt.show()

In [None]:
# lmplot is a figure-level function that creates its own figure, so size it via height/aspect
sns.lmplot(data=df, y='allocamt', x='n_units', hue="trgt_dis", col="record_stat", height=6, aspect=1)

plt.show()

In [None]:
# keep only rows with a plausible allocation year; this drops the 0 used to fill NaN
# and any placeholder codes outside the range
ts_mask = df[(df['yr_alloc'] > 1000) & (df['yr_alloc'] < 3000)]

In [None]:
plt.figure(figsize=(10,6))
sns.lineplot(data=ts_mask, x="yr_alloc", y="allocamt")

plt.show()

In [None]:
plt.figure(figsize=(10,6))
sns.lineplot(data=ts_mask, x="yr_alloc", y="n_units")

plt.show()

# Variable Selection and Identification

In [None]:
X = df.drop('allocamt', axis = 1)
y = df.loc[:,'allocamt']

In [None]:
X

In [None]:
y

In [None]:
from sklearn.preprocessing import LabelEncoder

cat_features = X.columns[X.dtypes == object]
# Label Encoding
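for f in cat_features:
    lbl = LabelEncoder()
    # cast to str so missing values (None/NaN) become their own category
    # instead of breaking the encoder's sort on mixed types
    X[f] = lbl.fit_transform(X[f].astype(str))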
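
Label encoding assigns arbitrary integers to categories, and a linear model will read those integers as an ordering. One-hot encoding is a common alternative for low-cardinality columns. The sketch below uses `pd.get_dummies` on a toy frame; it is not what the notebook does, and it would be a poor fit for high-cardinality columns such as names or addresses.

```python
import pandas as pd

toy = pd.DataFrame({
    "n_units": [10, 20, 30],
    "type": ["New Construction", "Existing", None],
})

# dummy_na=True adds an indicator column for missing values
encoded = pd.get_dummies(toy, columns=["type"], dummy_na=True)
print(encoded.columns.tolist())
```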

In [None]:
X

# Splitting Training / Test Data

In [None]:
# from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# from scipy.misc import comb

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=40)

# Select and Train a Model

In [None]:
X_train.head()

In [None]:
y_train.head()

In [None]:
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error

In [None]:
from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(X_train,y_train)

In [None]:
j = 100
# double brackets keep the row as a one-row DataFrame and preserve column dtypes
some_sample = X.iloc[[j]]
some_sample

In [None]:
lin_reg.predict(some_sample)

In [None]:
lin_reg.coef_

In [None]:
lin_reg.score(X_test,y_test)
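
For a regressor, `score` returns the coefficient of determination, R². A minimal NumPy version is sketched below to show what that number measures; the function name `r_squared` is hypothetical.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot: the share of variance explained by the predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot


print(r_squared([1, 2, 3], [1, 2, 3]))  # perfect predictions
print(r_squared([1, 2, 3], [2, 2, 2]))  # always predicting the mean
```

A value near 1 means the model explains most of the variance in the target, a value near 0 means it does no better than predicting the mean, and negative values are possible on held-out data.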

In [None]:
predLinReg = lin_reg.predict(X_test)
predLinReg

In [None]:
predLinReg.shape

# Random Forest Regressor

In [None]:
from sklearn.ensemble import RandomForestRegressor

# Instantiate a model with 700 decision trees that splits on squared error
# ('mse' was renamed to 'squared_error' in scikit-learn 1.0)
RandoForest = RandomForestRegressor(random_state=0, n_estimators=700, criterion='squared_error')

In [None]:
RandoForest.fit(X_train,y_train)

In [None]:
RandoForest.predict(some_sample)

In [None]:
RandoForest.score(X_test,y_test)

In [None]:
RandoForest.feature_importances_

In [None]:
importances = pd.DataFrame({"feature": X_train.columns, "importance": RandoForest.feature_importances_})
importances.sort_values("importance", ascending=False)[:10]

In [None]:
sns.barplot(data=importances.sort_values("importance", ascending=False).head(8), x="importance", y="feature")

In [None]:
predForest = RandoForest.predict(X_test)

In [None]:
lin_mse_forest = mean_squared_error(y_test, predForest)
lin_rmse_forest = np.sqrt(lin_mse_forest)
lin_rmse_forest

In [None]:
lin_mae_forest = mean_absolute_error(y_test, predForest)
lin_mae_forest

In [None]:
lin_mse_lin_reg = mean_squared_error(y_test, predLinReg) 
lin_rmse_lin_reg = np.sqrt(lin_mse_lin_reg)
lin_rmse_lin_reg

In [None]:
lin_mae_lin_reg = mean_absolute_error(y_test, predLinReg)
lin_mae_lin_reg
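
To compare the two models side by side, the RMSE and MAE values above could be collected into one table. The sketch below computes both metrics with NumPy on toy predictions; `regression_report` and the numbers are hypothetical.

```python
import numpy as np
import pandas as pd


def regression_report(y_true, predictions):
    """Tabulate RMSE and MAE for each named set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    rows = {}
    for name, pred in predictions.items():
        errors = y_true - np.asarray(pred, dtype=float)
        rows[name] = {
            "rmse": np.sqrt(np.mean(errors ** 2)),
            "mae": np.mean(np.abs(errors)),
        }
    return pd.DataFrame(rows).T


report = regression_report(
    [100, 200, 300],
    {
        "linear_regression": [110, 190, 300],
        "random_forest": [100, 200, 330],
    },
)
print(report)
```

RMSE penalizes large misses more heavily than MAE, so a model with a few large errors can rank worse on RMSE while ranking better on MAE.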