# <center>Bad Bank Behavior<br>Analyzing Bank Mortgages during the 2007 Housing Bubble</center>  

<center>Michael Siebel</center>
<center>August 2020</center>

<br>
    
## Table of Contents
- [Goals](#Goals)<br>
- [Load Packages](#Load-Packages)<br>
- [Set Up Functions](#Set-Up-Functions)<br>
- [Implement Data Cleanings](#Implement-Data-Cleanings)<br>
- [Analysis Functions](#Analysis-Functions)<br>
- [Imbalanced Prediction](#Imbalanced-Prediction)<br>
- [Downsampling Prediction](#Downsampling-Prediction)<br>
- [Upsampling Prediction](#Upsampling-Prediction)<br>
- [Conclusion](#Conclusion)<br>

# Goals  
<br>

 

***

# Load Functions

In [1]:
# Load functions
%run Functions.ipynb
pd.set_option("display.max_columns", 200)
pd.set_option('display.max_rows', 200)

# Load data (raw string keeps the backslashes literal; `with` closes the file)
with open(r'..\Data\Pickle\df.pkl', 'rb') as file_to_open:
    df = pickle.load(file_to_open)

# Drop mergeID column
df = df.drop(labels='Loan ID', axis=1)

# Convert Inf values to NA
df = df.replace([np.inf, -np.inf], np.nan)

Using TensorFlow backend.
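
The load step above spells out a Windows-style relative path by hand. As a minimal sketch of a more portable variant, the block below builds the path with `pathlib` and reads the pickle inside a context manager. `load_pickle` is a hypothetical helper (it is not part of `Functions.ipynb`), and the temporary directory only stands in for the `Data/Pickle` folder.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickle(path):
    """Load one pickled object; the context manager closes the file even on error."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Demo in a temporary directory standing in for ../Data/Pickle (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    pickle_dir = Path(tmp) / "Data" / "Pickle"
    pickle_dir.mkdir(parents=True)
    target = pickle_dir / "df.pkl"

    # Write a small object, then read it back through the helper.
    with open(target, "wb") as f:
        pickle.dump({"rows": 3}, f)
    restored = load_pickle(target)

print(restored)
```

`Path` joins segments with the separator of the running operating system, so the same code would work outside Windows.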


In [2]:
## Bank and Classifier Lists
banks = ['Bank of America','Wells Fargo Bank','CitiMortgage',
         'JPMorgan Chase','GMAC Mortgage','SunTrust Mortgage',
         'AmTrust Bank','PNC Bank','Flagstar Bank']

banks_plus = banks + ['All Banks']
clfs_str = ['RFC', 'RFC PCA', 'RUS Boost'] 

## Create an environment variable to avoid using the GPU. This can be changed.
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

***

# Modeling

In [3]:
# Drop the "Other" bank category
df = df[df['Bank'] != 'Other']
df['Bank'].value_counts()

Bank of America      650087
CitiMortgage         260698
Wells Fargo Bank     214039
JPMorgan Chase       202997
GMAC Mortgage        178160
SunTrust Mortgage    141398
PNC Bank             100351
AmTrust Bank          79360
Flagstar Bank         66637
Name: Bank, dtype: int64

In [4]:
# Verify Bank Counts
df['Bank'].value_counts()

Bank of America      650087
CitiMortgage         260698
Wells Fargo Bank     214039
JPMorgan Chase       202997
GMAC Mortgage        178160
SunTrust Mortgage    141398
PNC Bank             100351
AmTrust Bank          79360
Flagstar Bank         66637
Name: Bank, dtype: int64

In [5]:
# Variables to drop
dropvars = ['File Year', 'Year', 'Month', 'Region', 'FIPS',
            'Zip Code', 'Mortgage Insurance Type', 'Property State',
            'First Payment', 'Original Loan-to-Value (LTV)']
df = df.drop(labels=dropvars, axis=1)
df = df.filter(regex=r'^(?!Asset).*$')
df = df.filter(regex=r'^(?!Liab).*$')
df = df.filter(regex=r'^(?!Eqtot).*$')
df = df.filter(regex=r'^(?!Dep).*$')

# Convert Reported Period and Original Date to numeric
df['Reported Period'] = df['Reported Period'].astype(float).astype(int).astype(str)
df['Reported Period'] = df['Reported Period'].apply(lambda x: x.zfill(6))
df['Reported Period'] = df['Reported Period'].map(lambda x: x[:2] + '/' + x[2:])
df = change_date(df, 'Reported Period')
df = change_date(df, 'Original Date')

# Missingness to drop
df = df.dropna()

# All data
y_all = df['Foreclosed']
X_all = df.drop(labels=['Foreclosed', 'Zero Balance Code'], axis=1) 

# Split Train (30%): test_size=0.7 holds out 70% of the rows
X_train, X_test, y_train, y_test = train_test_split(X_all, y_all, test_size=0.7,
                                                    stratify=y_all, random_state=2019)
# Split the held-out rows evenly into Val (35%) and Test (35%)
X_val, X_test, y_val, y_test = train_test_split(X_test, y_test, test_size=0.5,
                                                stratify=y_test, random_state=2019)

# One hot encoding on remaining data
Bnk_train = X_train['Bank'].reset_index(drop=True)
X_train = onehotencoding(X_train)
Bnk_val = X_val['Bank'].reset_index(drop=True)
X_val = onehotencoding(X_val)
Bnk_test = X_test['Bank'].reset_index(drop=True)
X_test = onehotencoding(X_test)
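
With `test_size = 0.7`, the first split holds out 70% of the rows, so the training set keeps roughly 30%. This agrees with the 483,564 training rows printed further down against the 1,611,881 rows reported in the distribution summaries. The sketch below reproduces those sizes with plain arithmetic. `split_sizes` is a hypothetical helper, and its rounding rule (held-out share rounded up, remainder kept) is an assumption about how a fractional `test_size` is resolved.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_kept, n_held_out) for a fractional test_size.

    Assumption: the held-out share is rounded up and the kept share
    receives whatever rows remain.
    """
    n_held_out = math.ceil(test_size * n_samples)
    n_kept = n_samples - n_held_out
    return n_kept, n_held_out


n_total = 1_611_881  # row count taken from the describe() output in this notebook

# First split: test_size=0.7 keeps ~30% for training.
n_train, n_rest = split_sizes(n_total, 0.7)

# Second split: the held-out rows are halved into validation and test.
n_val, n_test = split_sizes(n_rest, 0.5)

for name, n in [("train", n_train), ("val", n_val), ("test", n_test)]:
    print(f"{name}: {n} rows ({100 * n / n_total:.1f} %)")
```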

In [6]:
# Update Macroeconomic variables (will not use test set)
X_train, X_val, X_test = pca_fred(X_train, X_val, X_test)

# Check columns
X_train.columns

Index(['Reported Period', 'Original Interest Rate', 'Original Mortgage Amount',
       'Original Loan Term', 'Original Date',
       'Original Combined Loan-to-Value (CLTV)', 'Single Borrower',
       'Original Debt to Income Ratio', 'Loan Purpose', 'Number of Units',
       'Mortgage Insurance %', 'Credit Score', 'Loan Change (1 Year)',
       'Loan Change (5 Years)', 'Median Household Income',
       'Number of Employees', 'Lnlsnet (5 Yr)', 'Lnlsnet (1 Yr)',
       'Origination Channel_B', 'Origination Channel_C',
       'Origination Channel_R', 'Bank_AmTrust Bank', 'Bank_Bank of America',
       'Bank_CitiMortgage', 'Bank_Flagstar Bank', 'Bank_GMAC Mortgage',
       'Bank_JPMorgan Chase', 'Bank_PNC Bank', 'Bank_SunTrust Mortgage',
       'Bank_Wells Fargo Bank', 'First Time Home Buyer_N',
       'First Time Home Buyer_Y', 'Property Type_CO', 'Property Type_CP',
       'Property Type_MH', 'Property Type_PU', 'Property Type_SF',
       'Occupancy Type_I', 'Occupancy Type_P', 'Occupanc
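
The column names above, such as `Origination Channel_B` and `Property Type_SF`, follow the `<column>_<value>` pattern that `pandas.get_dummies` produces. Because the train, validation, and test frames are encoded separately, a category missing from one split would leave that split with fewer columns. The block below is a minimal sketch of one way to guard against that; `onehot_aligned` is a hypothetical helper and is not the `onehotencoding` function from `Functions.ipynb`, whose implementation is not shown here.

```python
import pandas as pd


def onehot_aligned(train, other, columns):
    """One-hot encode both frames, then give `other` exactly the training columns."""
    train_enc = pd.get_dummies(train, columns=columns)
    other_enc = pd.get_dummies(other, columns=columns)
    # Categories absent from `other` become all-zero columns; extras are dropped.
    other_enc = other_enc.reindex(columns=train_enc.columns, fill_value=0)
    return train_enc, other_enc


# Toy data: the validation frame never sees the "CO" or "MH" property types.
train = pd.DataFrame({"Credit Score": [700, 650, 780],
                      "Property Type": ["SF", "CO", "MH"]})
val = pd.DataFrame({"Credit Score": [720, 690],
                    "Property Type": ["SF", "SF"]})

train_enc, val_enc = onehot_aligned(train, val, ["Property Type"])
print(list(val_enc.columns))
```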

In [7]:
# List of banks
banks = ['Bank of America','Wells Fargo Bank','CitiMortgage',
         'JPMorgan Chase','GMAC Mortgage','SunTrust Mortgage',
         'AmTrust Bank','PNC Bank','Flagstar Bank']

# Run Function
Banks_X, Banks_y = Bank_Subsets(banks, df_X = X_train, df_y = y_train)
Banks_X_val, Banks_y_val = Bank_Subsets(banks, df_X = X_val, df_y = y_val)
Banks_X_test, Banks_y_test = Bank_Subsets(banks, df_X = X_test, df_y = y_test)
X_train = X_train.filter(regex=r'^(?!Bank).*$')
X_val = X_val.filter(regex=r'^(?!Bank).*$')
X_test = X_test.filter(regex=r'^(?!Bank).*$')

# All Banks
Banks_y['All Banks'] = y_train
Banks_X['All Banks'] = X_train
Banks_y_val['All Banks'] = y_val
Banks_X_val['All Banks'] = X_val
Banks_y_test['All Banks'] = y_test
Banks_X_test['All Banks'] = X_test

print('Shape:', X_train.shape)

Shape: (483564, 42)
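
The cell above first slices the data per bank and then removes every column whose name starts with `Bank`, using the negative lookahead `^(?!Bank).*$`. As a rough sketch of how those two steps could fit together, the block below splits a toy frame by its one-hot `Bank_<name>` columns. `bank_subsets_sketch` is a hypothetical stand-in; the real `Bank_Subsets` lives in `Functions.ipynb` and may work differently.

```python
import pandas as pd


def bank_subsets_sketch(banks, df_X, df_y):
    """Split features and labels per bank using the one-hot `Bank_<name>` columns."""
    subsets_X, subsets_y = {}, {}
    for bank in banks:
        mask = df_X["Bank_" + bank] == 1
        # Negative lookahead: keep only columns that do NOT start with "Bank".
        subsets_X[bank] = df_X[mask].filter(regex=r"^(?!Bank).*$")
        subsets_y[bank] = df_y[mask]
    return subsets_X, subsets_y


# Toy data with two banks and one feature.
X_toy = pd.DataFrame({"Credit Score": [700, 650, 780],
                      "Bank_PNC Bank": [1, 0, 1],
                      "Bank_Flagstar Bank": [0, 1, 0]})
y_toy = pd.Series([0, 1, 0], name="Foreclosed")

sub_X, sub_y = bank_subsets_sketch(["PNC Bank", "Flagstar Bank"], X_toy, y_toy)
print(sub_X["PNC Bank"])
```

The boolean mask relies on `df_X` and `df_y` sharing the same index, which holds for the toy data and is assumed for the real frames.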


***

# Load Models

In [8]:
# Load fitted models (raw strings keep the backslashes literal)
with open(r'..\Data\Pickle\models.pkl', 'rb') as file_to_open:
    vote_models = pickle.load(file_to_open)

# Load the classification threshold stored for each model
with open(r'..\Data\Pickle\model_thresholds.pkl', 'rb') as file_to_open:
    vote_thresholds = pickle.load(file_to_open)

***

# Predictions

In [9]:
# Combine Train, Validation, and Testing Data
X = pd.concat([X_train, X_val, X_test], axis=0).reset_index(drop=True)
y = pd.concat([y_train, y_val, y_test], axis=0).reset_index(drop=True)
bank_idx = pd.concat([Bnk_train, Bnk_val, Bnk_test], axis=0).reset_index(drop=True)

# Initialize result dictionaries
better = {}
better_value = {}
best = {}
best_value = {}
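
Concatenating the three splits keeps each row's original index label, so the combined frame does not start out with a clean 0..n-1 index. A fresh positional index matters here because `X`, `y`, and `bank_idx` are later used side by side. A minimal sketch of that reindexing on toy data:

```python
import pandas as pd

# Toy splits that still carry the row labels of the frame they were sampled from.
train = pd.DataFrame({"Credit Score": [700, 650]}, index=[5, 2])
val = pd.DataFrame({"Credit Score": [780]}, index=[9])

combined = pd.concat([train, val], axis=0)
print(list(combined.index))   # the original labels survive the concat

# drop=True discards the old labels instead of adding them as a new column.
combined = combined.reset_index(drop=True)
print(list(combined.index))
```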

In [10]:
# Credit Score
print('Credit Score Distribution')
print(X['Credit Score'].describe().round(0))
print('')
better['Credit Score'], \
better_value['Credit Score'] = changing_assumptions(
    'Credit Score', 75, 
    banks, bank_idx, X,
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)

Credit Score Distribution
count    1611881.0
mean         719.0
std           59.0
min          330.0
25%          675.0
50%          724.0
75%          770.0
max          850.0
Name: Credit Score, dtype: float64

Converting Credit Score to the 75 percentile: 770.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 1.9 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 1.2 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 1.0 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 1.6 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 1.0 %

SunTrust Mortgage
Original Foreclosures 10.3 %
Predicted Foreclosures 1.6 %

AmTrust Bank
Original Foreclosures 9.3 %
Predicted Foreclosures 2.8 %

PNC Bank
Original Foreclosures 8.8 %
Predicted Foreclosures 3.1 %

Flagstar Bank
Original Foreclosures 11.7 %
Predicted Foreclosures 2.1 %

All Banks
Original Foreclosures 9.7 %
Predicted Foreclosures 1.2 %
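
The output above suggests a counterfactual exercise: one feature is set to a single percentile value for every loan, the fitted model predicts again, and the predicted foreclosure rate is compared with the original one. The block below is a minimal sketch of that idea on toy data. `counterfactual_rate` and `toy_predict` are hypothetical; the actual `changing_assumptions` function is defined in `Functions.ipynb` and also handles per-bank models and thresholds, which this sketch leaves out.

```python
import numpy as np
import pandas as pd


def counterfactual_rate(X, column, percentile, predict):
    """Set `column` to one percentile value for every row and re-predict.

    Returns (value, rate): the substituted value and the predicted
    positive rate in percent.
    """
    value = np.percentile(X[column], percentile)
    X_cf = X.copy()          # leave the caller's data untouched
    X_cf[column] = value
    rate = 100 * np.mean(predict(X_cf))
    return value, rate


# Toy stand-in for a fitted classifier: flags loans with a score below 700.
def toy_predict(frame):
    return (frame["Credit Score"] < 700).to_numpy().astype(int)


X_toy = pd.DataFrame({"Credit Score": [620, 660, 700, 740, 780]})
baseline = 100 * toy_predict(X_toy).mean()
value, rate = counterfactual_rate(X_toy, "Credit Score", 75, toy_predict)
print(f"baseline {baseline:.1f} %, counterfactual {rate:.1f} % at score {value:.0f}")
```

Copying the frame before overwriting the column keeps the observed data intact, so several scenarios can be run against the same baseline.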


In [11]:
# Debt-to-Income
print('Debt-to-Income Distribution')
print(X['Original Debt to Income Ratio'].describe().round(0))
print('')
better['Original Debt to Income Ratio'], \
better_value['Original Debt to Income Ratio'] = changing_assumptions(
    'Original Debt to Income Ratio', 25, 
    banks, bank_idx, X, 
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)

Debt-to-Income Distribution
count    1611881.0
mean          38.0
std           12.0
min            0.0
25%           29.0
50%           38.0
75%           47.0
max           64.0
Name: Original Debt to Income Ratio, dtype: float64

Converting Original Debt to Income Ratio to the 25 percentile: 29.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 8.9 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 3.5 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 5.4 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 5.5 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 5.0 %

SunTrust Mortgage
Original Foreclosures 10.3 %
Predicted Foreclosures 5.4 %

AmTrust Bank
Original Foreclosures 9.3 %
Predicted Foreclosures 5.5 %

PNC Bank
Original Foreclosures 8.8 %
Predicted Foreclosures 5.2 %

Flagstar Bank
Original Foreclosures 11.7 %
Predicted Foreclosures 5.3 %

All Banks
Original Foreclosures 9.7 %
Predic

In [12]:
# Loan to Value
print('Loan-to-Value Distribution')
print(X['Original Combined Loan-to-Value (CLTV)'].describe().round(0))
print('')
better['Original Combined Loan-to-Value (CLTV)'], \
better_value['Original Combined Loan-to-Value (CLTV)'] = changing_assumptions(
    'Original Combined Loan-to-Value (CLTV)', 25, 
    banks, bank_idx, X, 
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)

Loan-to-Value Distribution
count    1611881.0
mean          72.0
std           18.0
min            1.0
25%           63.0
50%           78.0
75%           84.0
max          154.0
Name: Original Combined Loan-to-Value (CLTV), dtype: float64

Converting Original Combined Loan-to-Value (CLTV) to the 25 percentile: 63.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 7.3 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 3.2 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 3.8 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 3.4 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 5.4 %

SunTrust Mortgage
Original Foreclosures 10.3 %
Predicted Foreclosures 3.4 %

AmTrust Bank
Original Foreclosures 9.3 %
Predicted Foreclosures 4.3 %

PNC Bank
Original Foreclosures 8.8 %
Predicted Foreclosures 5.0 %

Flagstar Bank
Original Foreclosures 11.7 %
Predicted Foreclosures 5.9 %

All Banks
Original Foreclos

In [13]:
# Median Household Income
print('Median Household Income Distribution')
print(X['Median Household Income'].describe().round(2))
print('')
better['Median Household Income'], \
better_value['Median Household Income'] = changing_assumptions(
    'Median Household Income', 75, 
    banks, bank_idx, X, 
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)

Median Household Income Distribution
count    1611881.00
mean       48740.22
std         8623.32
min        28831.94
25%        43298.12
50%        46017.38
75%        53615.10
max       101651.30
Name: Median Household Income, dtype: float64

Converting Median Household Income to the 75 percentile: 53615.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 14.4 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 8.4 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 8.9 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 8.2 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 10.3 %

SunTrust Mortgage
Original Foreclosures 10.3 %
Predicted Foreclosures 8.9 %

AmTrust Bank
Original Foreclosures 9.3 %
Predicted Foreclosures 8.7 %

PNC Bank
Original Foreclosures 8.8 %
Predicted Foreclosures 11.3 %

Flagstar Bank
Original Foreclosures 11.7 %
Predicted Foreclosures 11.5 %

All Banks
Original Foreclosures 

In [14]:
# Median Household Income (best case: 100th percentile)
print('Median Household Income Distribution')
print(X['Median Household Income'].describe().round(2))
print('')
best['Median Household Income'], \
best_value['Median Household Income'] = changing_assumptions(
    'Median Household Income', 100, 
    banks, bank_idx, X, 
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)

Median Household Income Distribution
count    1611881.00
mean       48740.22
std         8623.32
min        28831.94
25%        43298.12
50%        46017.38
75%        53615.10
max       101651.30
Name: Median Household Income, dtype: float64

Converting Median Household Income to the 100 percentile: 101651.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 6.5 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 4.7 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 3.5 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 3.4 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 3.2 %

SunTrust Mortgage
Original Foreclosures 10.3 %
Predicted Foreclosures 4.3 %

AmTrust Bank
Original Foreclosures 9.3 %
Predicted Foreclosures 3.2 %

PNC Bank
Original Foreclosures 8.8 %
Predicted Foreclosures 3.3 %

Flagstar Bank
Original Foreclosures 11.7 %
Predicted Foreclosures 5.1 %

All Banks
Original Foreclosures 9.
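
Percentile 100 is simply the column maximum, which is why the substituted value in this run equals the `max` row of the summary (101,651). The "best" scenario therefore assigns every loan the most extreme value observed anywhere in the data, a much stronger assumption than the quartile used in the "better" scenario. A small sketch with made-up incomes shows how the endpoints behave:

```python
import numpy as np

# Made-up incomes, used only to illustrate the percentile endpoints.
incomes = np.array([28_000.0, 43_000.0, 46_000.0, 54_000.0, 101_000.0])

lowest = np.percentile(incomes, 0)      # same as incomes.min()
quartile = np.percentile(incomes, 25)   # first quartile
highest = np.percentile(incomes, 100)   # same as incomes.max()

print(lowest, quartile, highest)
```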

In [15]:
# Loan Change
print('Loan Change (1 Year) Distribution')
print(X['Loan Change (1 Year)'].describe().round(2))
print('')
print('Loan Change (5 Years) Distribution')
print(X['Loan Change (5 Years)'].describe().round(2))
print('')
better['Loan Change'], \
better_value['Loan Change'] = changing_assumptions(
    ['Loan Change (1 Year)', 'Loan Change (5 Years)'], [25, 25], 
    banks, bank_idx, X, 
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)

Loan Change (1 Year) Distribution
count    1611881.00
mean       15266.28
std        24070.98
min      -450000.00
25%         2291.87
50%        14106.36
75%        27660.71
max       394000.00
Name: Loan Change (1 Year), dtype: float64

Loan Change (5 Years) Distribution
count    1611881.00
mean       62087.90
std        35183.64
min      -244333.33
25%        37980.10
50%        61000.00
75%        86357.61
max       522500.00
Name: Loan Change (5 Years), dtype: float64

Converting Loan Change (1 Year) to the 25 percentile: 2292.0

Converting Loan Change (5 Years) to the 25 percentile: 37980.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 11.1 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 3.6 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 5.4 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 4.2 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 7.0 %

SunTrust Mortgage
Original Fo

In [16]:
# Bank Loan Liabilities
print('Bank Loan Liabilities (1 Year) Distribution')
print(X['Lnlsnet (1 Yr)'].describe().round(2))
print('')
print('Bank Loan Liabilities (5 Years) Distribution')
print(X['Lnlsnet (5 Yr)'].describe().round(2))
print('')
better['Lnlsnet'], \
better_value['Lnlsnet'] = changing_assumptions(
    ['Lnlsnet (1 Yr)', 'Lnlsnet (5 Yr)'], [25, 25],
    banks, bank_idx, X, 
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)

Bank Loan Liabilities (1 Year) Distribution
count    1611881.00
mean         169.96
std         1003.32
min           -0.98
25%            0.96
50%            1.05
75%            2.19
max        10104.83
Name: Lnlsnet (1 Yr), dtype: float64

Bank Loan Liabilities (5 Years) Distribution
count    1611881.00
mean         310.14
std         1375.71
min           -0.99
25%            0.97
50%            1.03
75%            2.64
max        16357.85
Name: Lnlsnet (5 Yr), dtype: float64

Converting Lnlsnet (1 Yr) to the 25 percentile: 1.0

Converting Lnlsnet (5 Yr) to the 25 percentile: 1.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 13.2 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 6.7 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 7.6 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 7.7 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 9.0 %

SunTrust Mortgage
Original Foreclosures 10

In [17]:
# Bank Loan Liabilities (best case: 0th percentile)
print('Bank Loan Liabilities (1 Year) Distribution')
print(X['Lnlsnet (1 Yr)'].describe().round(2))
print('')
print('Bank Loan Liabilities (5 Years) Distribution')
print(X['Lnlsnet (5 Yr)'].describe().round(2))
print('')
best['Lnlsnet'], \
best_value['Lnlsnet'] = changing_assumptions(
    ['Lnlsnet (1 Yr)', 'Lnlsnet (5 Yr)'], [0, 0],
    banks, bank_idx, X, 
    vote_models, vote_thresholds, 
    Banks_X, Banks_X_val, Banks_X_test,
    Banks_y, Banks_y_val, Banks_y_test
)

Bank Loan Liabilities (1 Year) Distribution
count    1611881.00
mean         169.96
std         1003.32
min           -0.98
25%            0.96
50%            1.05
75%            2.19
max        10104.83
Name: Lnlsnet (1 Yr), dtype: float64

Bank Loan Liabilities (5 Years) Distribution
count    1611881.00
mean         310.14
std         1375.71
min           -0.99
25%            0.97
50%            1.03
75%            2.64
max        16357.85
Name: Lnlsnet (5 Yr), dtype: float64

Converting Lnlsnet (1 Yr) to the 0 percentile: -1.0

Converting Lnlsnet (5 Yr) to the 0 percentile: -1.0

Bank of America
Original Foreclosures 11.6 %
Predicted Foreclosures 10.6 %

Wells Fargo Bank
Original Foreclosures 7.6 %
Predicted Foreclosures 5.9 %

CitiMortgage
Original Foreclosures 7.7 %
Predicted Foreclosures 5.5 %

JPMorgan Chase
Original Foreclosures 7.6 %
Predicted Foreclosures 6.6 %

GMAC Mortgage
Original Foreclosures 9.7 %
Predicted Foreclosures 13.3 %

SunTrust Mortgage
Original Foreclosures 1

***

In [18]:
# Save the counterfactual predictions made on the full data
data = [better, better_value, best, best_value]
with open(r"..\Data\Pickle\pred_votes_better.pkl", "wb") as f:
    pickle.dump(data, f)

***