# <center>Bad Bank Behavior<br>Analyzing Bank Mortgages during the 2007 Housing Bubble</center>  

<center>Michael Siebel</center>
<center>August 2020</center>

<br>
    
## Table of Contents
- [Goals](#Goals)<br>
- [Load Packages](#Load-Packages)<br>
- [Set Up Functions](#Set-Up-Functions)<br>
- [Implement Data Cleanings](#Implement-Data-Cleanings)<br>
- [Analysis Functions](#Analysis-Functions)<br>
- [Imbalanced Prediction](#Imbalanced-Prediction)<br>
- [Downsampling Prediction](#Downsampling-Prediction)<br>
- [Upsampling Prediction](#Upsampling-Prediction)<br>
- [Conclusion](#Conclusion)<br>

# Goals  
<br>

 

***

# Load Functions

In [1]:
%run Functions.ipynb
pd.set_option("display.max_columns", 999)

# Raw string keeps the Windows backslashes literal; the context manager closes the file
with open(r'..\Data\df.pickle', 'rb') as file_to_open:
    df = pickle.load(file_to_open)

# Drop mergeID column
df = df.drop(labels='Loan ID', axis=1)

# Foreclosure Descriptive Statistics

In [2]:
print('Shape:\n', df.shape)
print('\nColumns:\n', df.columns)

Shape:
 (1240937, 54)

Columns:
 Index(['Origination Channel', 'Bank', 'Original Interest Rate',
       'Original Mortgage Amount', 'Original Loan Term', 'Original Date',
       'Original Combined Loan-to-Value (CLTV)', 'Single Borrower',
       'Original Debt to Income Ratio', 'First Time Home Buyer',
       'Loan Purpose', 'Property Type', 'Occupancy Type', 'Property State',
       'Zip Code', 'Mortgage Insurance %', 'Mortgage Insurance Type',
       'File Year', 'File Quarter', 'Foreclosed', 'Harmonized Credit Score',
       'Median Household Income', 'Month', 'Year', 'Region',
       'Household Financial Obligations (Qtr)',
       'Household Financial Obligations (Yr)',
       'Consumer Debt Service Payment (Qtr)',
       'Consumer Debt Service Payment (Yr)', 'National Home Price Index (Qtr)',
       'National Home Price Index (Yr)',
       'Mortgage Debt Service Payments (Qtr)',
       'Mortgage Debt Service Payments (Yr)', 'Monthly Supply of Houses (Qtr)',
       'Monthly Supply 

In [3]:
Foreclosed = Foreclosure_Data(df = df) # subset = "df['Property State']=='FL'", 
Foreclosed

| Foreclosed | Foreclosed (%) | Foreclosed (N) | Mortgage Amount ($) | Credit Score | Debt to Income Ratio | First Time Home Buyer (%) | Refinanced | Interest Rate | Loan Term | Combined Loan-to-Value (CLTV) | Single Borrower Ratio | Mortgage Insurance Ratio | Mortgage Insurance % | Estimated Household Income ($) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Not Foreclosed | 91.7 | 1138262 | 194520.86 | 725 | 37.7 | 10.4 | 61.0 | 6.36 | 333 | 72.0 | 0.45 | 0.16 | 3.64 | 48275.81 |
| Foreclosed | 8.3 | 102675 | 196661.87 | 691 | 41.5 | 8.7 | 70.8 | 6.53 | 352 | 80.5 | 0.6 | 0.32 | 7.68 | 47866.37 |


# Bank Descriptive Statistics

In [4]:
# Number of loans per Bank
df['Bank'].value_counts()

BANK OF AMERICA, N.A.                        371423
OTHER                                        213433
CITIMORTGAGE, INC.                           135536
SMALL LOAN BANKS                             108886
JPMORGAN CHASE BANK, NATIONAL ASSOCIATION     88255
GMAC MORTGAGE                                 78842
PNC BANK, N.A.                                63726
SUNTRUST MORTGAGE INC.                        53548
AMTRUST BANK                                  38024
FLAGSTAR CAPITAL MARKETS CORPORATION          34789
FIRST TENNESSEE BANK NATIONAL ASSOCIATION     28718
CHASE HOME FINANCE                            15791
FDIC, RECEIVER, INDYMAC FEDERAL BANK FSB       9966
Name: Bank, dtype: int64

In [5]:
Banks = Bank_Data(df = df)
Banks

| Bank | Bank (%) | Bank (N) | Foreclosed (%) | Mortgage Amount ($) | Credit Score | Debt to Income Ratio | First Time Home Buyer (%) | Refinance | Interest Rate | Loan Term | Combined Loan-to-Value (CLTV) | Single Borrower Ratio | Mortgage Insurance Ratio | Mortgage Insurance % | Median Household Income ($) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AMTRUST BANK | 3.1 | 38024 | 8.2 | 180999.03 | 723 | 38.2 | 11.4 | 60.4 | 6.32 | 332 | 73.8 | 0.47 | 0.18 | 4.15 | 47245.41 |
| BANK OF AMERICA, N.A. | 29.9 | 371423 | 10.0 | 191314.37 | 713 | 38.4 | 8.7 | 67.4 | 6.42 | 338 | 72.9 | 0.51 | 0.2 | 4.65 | 48611.76 |
| CHASE HOME FINANCE | 1.3 | 15791 | 8.0 | 192730.48 | 731 | 40.2 | 12.3 | 49.0 | 6.33 | 341 | 73.9 | 0.47 | 0.2 | 4.63 | 47641.78 |
| CITIMORTGAGE, INC. | 10.9 | 135536 | 7.4 | 220207.2 | 724 | 35.2 | 12.7 | 59.9 | 6.29 | 333 | 71.6 | 0.44 | 0.14 | 3.13 | 49434.01 |
| FDIC, RECEIVER, INDYMAC FEDERAL BANK FSB | 0.8 | 9966 | 16.7 | 221906.28 | 705 | 40.0 | 6.9 | 77.5 | 6.49 | 336 | 71.5 | 0.54 | 0.21 | 4.83 | 50589.8 |
| FIRST TENNESSEE BANK NATIONAL ASSOCIATION | 2.3 | 28718 | 6.7 | 190259.87 | 730 | 38.6 | 10.8 | 59.3 | 6.41 | 336 | 74.3 | 0.43 | 0.19 | 4.41 | 47820.49 |
| FLAGSTAR CAPITAL MARKETS CORPORATION | 2.8 | 34789 | 10.7 | 185156.23 | 717 | 41.1 | 11.7 | 57.7 | 6.48 | 342 | 73.9 | 0.5 | 0.22 | 5.07 | 47997.8 |
| GMAC MORTGAGE | 6.4 | 78842 | 9.0 | 200808.92 | 710 | 39.7 | 6.4 | 71.1 | 6.36 | 341 | 72.0 | 0.47 | 0.11 | 2.27 | 48854.87 |
| JPMORGAN CHASE BANK, NATIONAL ASSOCIATION | 7.1 | 88255 | 6.7 | 204813.64 | 727 | 36.9 | 7.3 | 61.8 | 6.33 | 325 | 73.5 | 0.46 | 0.13 | 2.96 | 48279.9 |
| OTHER | 17.2 | 213433 | 5.8 | 172378.26 | 734 | 37.2 | 12.6 | 54.9 | 6.36 | 330 | 71.1 | 0.42 | 0.18 | 4.11 | 47165.9 |


In [6]:
# Banks represented
Banks[['Bank (%)', 'Bank (N)', 'Foreclosed (%)']]

| Bank | Bank (%) | Bank (N) | Foreclosed (%) |
|---|---|---|---|
| AMTRUST BANK | 3.1 | 38024 | 8.2 |
| BANK OF AMERICA, N.A. | 29.9 | 371423 | 10.0 |
| CHASE HOME FINANCE | 1.3 | 15791 | 8.0 |
| CITIMORTGAGE, INC. | 10.9 | 135536 | 7.4 |
| FDIC, RECEIVER, INDYMAC FEDERAL BANK FSB | 0.8 | 9966 | 16.7 |
| FIRST TENNESSEE BANK NATIONAL ASSOCIATION | 2.3 | 28718 | 6.7 |
| FLAGSTAR CAPITAL MARKETS CORPORATION | 2.8 | 34789 | 10.7 |
| GMAC MORTGAGE | 6.4 | 78842 | 9.0 |
| JPMORGAN CHASE BANK, NATIONAL ASSOCIATION | 7.1 | 88255 | 6.7 |
| OTHER | 17.2 | 213433 | 5.8 |


In [7]:
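def search_Banks(col, df = Banks, func = max, subset = True):
    """Return the row(s) of df where col equals func(df[col]), e.g. its maximum."""
    print(col, func.__name__, "value")
    # subset=True returns only the searched column; otherwise every column
    cols = [col] if subset else df.columns
    return df.loc[df[col] == func(df[col]), cols]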

search_Banks('Foreclosed (%)', func = max)

Foreclosed (%) max value


| Bank | Foreclosed (%) |
|---|---|
| FDIC, RECEIVER, INDYMAC FEDERAL BANK FSB | 16.7 |


In [8]:
'''
month = [1,2,3]
for mnth in range(1, 13):

    yr = '2007'
    mnth = np.char.zfill(str(mnth), 2)
    print(str(mnth) + '/' + str(yr))
    Banks_mnth = Bank_Data(date1 = str(mnth) + '/' + str(yr))
    tbl = search_Banks('Foreclosed (%)', func = max, df = Banks_mnth[Banks_mnth['Bank (N)'] > 100])
    print(display(tbl[['Foreclosed (%)']]))
    print('')
'''

In [9]:
'''
for yr in range(2004, 2007):
    print('Year', yr)
    Banks_yr = Bank_Data(date1 = '01/' + str(yr), date2 = '12/' + str(yr))
    tbl = search_Banks('Foreclosed (%)', func = max, subset = False, df = Banks_yr[Banks_yr['Bank (N)'] > 100])
    print(display(tbl[['Foreclosed (%)', 'Bank (%)', 'Bank (N)']]))
    print('')
'''

In [10]:
'''
import pickle
file_to_store = open("..\Data\df.pickle", "wb")
pickle.dump(df, file_to_store)
file_to_store.close()
'''

In [11]:
df_cat = df.select_dtypes(include=['object'])
df_cat

| | Origination Channel | Bank | Original Date | First Time Home Buyer | Property Type | Occupancy Type | Property State | File Year | File Quarter | Month | Year | Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | R | OTHER | 04/2000 | N | SF | P | TX | 2007 | Q4 | 04 | 2000 | South |
| 1 | R | OTHER | 07/2000 | N | SF | P | IL | 2007 | Q3 | 07 | 2000 | Midwest |
| 2 | R | OTHER | 08/2000 | N | SF | P | TX | 2007 | Q3 | 08 | 2000 | South |
| 3 | R | OTHER | 09/2000 | N | SF | S | WV | 2007 | Q3 | 09 | 2000 | South |
| 4 | R | OTHER | 09/2000 | N | SF | P | TX | 2007 | Q3 | 09 | 2000 | South |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1240932 | B | BANK OF AMERICA, N.A. | 12/2007 | N | SF | P | ME | 2007 | Q4 | 12 | 2007 | Northeast |
| 1240933 | C | BANK OF AMERICA, N.A. | 12/2007 | Y | SF | P | NJ | 2007 | Q4 | 12 | 2007 | Northeast |
| 1240934 | R | BANK OF AMERICA, N.A. | 12/2007 | N | PU | P | FL | 2007 | Q4 | 12 | 2007 | South |
| 1240935 | C | CITIMORTGAGE, INC. | 12/2007 | N | PU | I | CA | 2007 | Q4 | 12 | 2007 | West |


In [12]:
# Variables to drop
dropvars = ['Original Date', 'File Year', 'File Quarter', 'Month', 'Region',
            'Zip Code', 'Mortgage Insurance Type']  # 'Property State', 

# All Data
All_X = df.drop(labels=dropvars, axis=1)
All_X = All_X[All_X.notnull()]  # boolean-frame mask: keeps every row (nulls stay NaN and are imputed below)
All_y = All_X['Foreclosed']
All_X = All_X.drop(labels='Foreclosed', axis=1) 

# Split dataset: keep a stratified 20% sample, then halve it into train and test (about 10% of the loans each)
X_ignore, X_keep, y_ignore, y_keep = train_test_split(All_X, All_y, test_size = 0.2, 
                                                      stratify = All_y, random_state=2019)
X_train, X_test, y_train, y_test = train_test_split(X_keep, y_keep, test_size = 0.5, 
                                                    stratify = y_keep, random_state=2019)

# One hot encoding on remaining data
X_train = onehotencoding(X_train)
X_test = onehotencoding(X_test) 

# Drop onehotencoding minority values
X_train = X_train.drop(labels=['First Time Home Buyer_U'], axis=1) 
X_test = X_test.drop(labels=['First Time Home Buyer_U'], axis=1) 

# Save columns
X_cols = X_train.columns

# Relative Importance Dictionary
rel_imp = {}

print(X_cols)

Index(['Original Interest Rate', 'Original Mortgage Amount',
       'Original Loan Term', 'Single Borrower', 'Loan Purpose',
       'Mortgage Insurance %', 'Household Financial Obligations (Qtr)',
       'Household Financial Obligations (Yr)',
       'Consumer Debt Service Payment (Qtr)',
       'Consumer Debt Service Payment (Yr)',
       ...
       'asset (Qtr)', 'asset (Yr)', 'lnlsnet (Qtr)', 'lnlsnet (Yr)',
       'liab (Qtr)', 'liab (Yr)', 'dep (Qtr)', 'dep (Yr)', 'eqtot (Qtr)',
       'eqtot (Yr)'],
      dtype='object', length=124)
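
The two-stage split in the cell above keeps a stratified 20% sample and then halves it, so train and test each hold roughly 10% of the loans at the same foreclosure rate. A minimal sketch on synthetic data illustrates this; the `toy` frame, its single feature `x`, and the 8% positive rate are made-up stand-ins, not the loan data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the loan data: 10,000 rows, 8% positives (assumed values)
rng = np.random.default_rng(2019)
toy = pd.DataFrame({"x": rng.normal(size=10_000)})
toy_y = pd.Series(np.r_[np.ones(800, dtype=int), np.zeros(9_200, dtype=int)],
                  name="Foreclosed")

# Stage 1: discard 80% of the rows, keep a stratified 20% sample
_, X_keep, _, y_keep = train_test_split(toy, toy_y, test_size=0.2,
                                        stratify=toy_y, random_state=2019)

# Stage 2: split the kept sample evenly into train and test
X_tr, X_te, y_tr, y_te = train_test_split(X_keep, y_keep, test_size=0.5,
                                          stratify=y_keep, random_state=2019)

print(len(X_tr), len(X_te))      # size of each half
print(y_tr.mean(), y_te.mean())  # positive rate in each half
```

Stratifying at both stages is what keeps the rare foreclosure class represented in the same proportion in each subset.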


In [13]:
X_train.shape

(124094, 124)
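
Because train and test are one-hot encoded separately, a category that appears in only one of them would leave the two frames with different columns. The `onehotencoding` helper lives in `Functions.ipynb` and is not shown here, so the sketch below assumes it behaves like `pd.get_dummies`; the miniature frames are hypothetical.

```python
import pandas as pd

# Hypothetical miniature train/test frames; 'MH' appears only in train
train = pd.DataFrame({"Property Type": ["SF", "CO", "MH"],
                      "Original Loan Term": [360, 180, 360]})
test = pd.DataFrame({"Property Type": ["SF", "CO", "SF"],
                     "Original Loan Term": [360, 360, 240]})

train_enc = pd.get_dummies(train, columns=["Property Type"])
test_enc = pd.get_dummies(test, columns=["Property Type"])

# Reindex test to the training columns: dummies missing from test are added
# as 0, and any dummy for a category unseen in training would be dropped
test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0)

print(list(test_enc.columns))
```

After the reindex, a model fitted on `train_enc` can score `test_enc` without a column mismatch.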

In [14]:
# Missing
(X_train.isna().sum() / X_train.shape[0] * 100).round(2)

Original Interest Rate      0.00
Original Mortgage Amount    0.00
Original Loan Term          0.00
Single Borrower             0.00
Loan Purpose                0.00
                            ... 
liab (Yr)                   1.26
dep (Qtr)                   1.26
dep (Yr)                    1.26
eqtot (Qtr)                 1.26
eqtot (Yr)                  1.26
Length: 124, dtype: float64

In [15]:
# impute on mean or mode
X_train = missing_treat(X_train)

# Missing
X_train[['Household Financial Obligations (Qtr)', 'Household Financial Obligations (Yr)', 
         'Consumer Debt Service Payment (Qtr)', 'Consumer Debt Service Payment (Yr)',
         'National Home Price Index (Qtr)', 'National Home Price Index (Yr)',
         'Mortgage Debt Service Payments (Qtr)', 'Mortgage Debt Service Payments (Yr)',
         'Monthly Supply of Houses (Qtr)', 'Monthly Supply of Houses (Yr)',
         'Vacant Housing Units for Sale (Qtr)', 'Vacant Housing Units for Sale (Yr)',
         'Homeownership Rate (Qtr)', 'Homeownership Rate (Yr)', 'Vacant Housing Units for Rent (Qtr)',
         'Vacant Housing Units for Rent (Yr)', 'Rental Vacancy Rate (Qtr)', 'Rental Vacancy Rate (Yr)',
         'numemp', 'asset (Qtr)',  'asset (Yr)', 'lnlsnet (Qtr)', 'lnlsnet (Yr)', 'liab (Qtr)', 'liab (Yr)',
         'dep (Qtr)', 'dep (Yr)', 'eqtot (Qtr)', 'eqtot (Yr)']].isna().sum()

Household Financial Obligations (Qtr)    0
Household Financial Obligations (Yr)     0
Consumer Debt Service Payment (Qtr)      0
Consumer Debt Service Payment (Yr)       0
National Home Price Index (Qtr)          0
National Home Price Index (Yr)           0
Mortgage Debt Service Payments (Qtr)     0
Mortgage Debt Service Payments (Yr)      0
Monthly Supply of Houses (Qtr)           0
Monthly Supply of Houses (Yr)            0
Vacant Housing Units for Sale (Qtr)      0
Vacant Housing Units for Sale (Yr)       0
Homeownership Rate (Qtr)                 0
Homeownership Rate (Yr)                  0
Vacant Housing Units for Rent (Qtr)      0
Vacant Housing Units for Rent (Yr)       0
Rental Vacancy Rate (Qtr)                0
Rental Vacancy Rate (Yr)                 0
numemp                                   0
asset (Qtr)                              0
asset (Yr)                               0
lnlsnet (Qtr)                            0
lnlsnet (Yr)                             0
liab (Qtr) 
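
The `missing_treat` helper is defined in `Functions.ipynb` and is not shown here. Going only by the cell comment ("impute on mean or mode"), one plausible reading is mean imputation for numeric columns and mode imputation for everything else. The function name `impute_mean_or_mode` and the toy frame below are illustrative assumptions, not the notebook's implementation.

```python
import numpy as np
import pandas as pd

def impute_mean_or_mode(frame):
    """Fill numeric columns with their mean and other columns with their mode."""
    out = frame.copy()
    for col in out.columns:
        if out[col].isna().any():
            if pd.api.types.is_numeric_dtype(out[col]):
                fill = out[col].mean()          # mean ignores missing values
            else:
                fill = out[col].mode().iloc[0]  # most frequent non-missing value
            out[col] = out[col].fillna(fill)
    return out

# Hypothetical toy frame with one missing value per column
toy = pd.DataFrame({"dep (Qtr)": [0.2, np.nan, 0.4],
                    "Region": ["South", None, "South"]})
filled = impute_mean_or_mode(toy)
print(filled)
```

When a test set is imputed as well, reusing the fill values computed on the training set avoids leaking test information into the model.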

In [16]:
# Preview
X_train.tail()

Unnamed: 0,Original Interest Rate,Original Mortgage Amount,Original Loan Term,Single Borrower,Loan Purpose,Mortgage Insurance %,Household Financial Obligations (Qtr),Household Financial Obligations (Yr),Consumer Debt Service Payment (Qtr),Consumer Debt Service Payment (Yr),National Home Price Index (Qtr),National Home Price Index (Yr),Mortgage Debt Service Payments (Qtr),Mortgage Debt Service Payments (Yr),Monthly Supply of Houses (Qtr),Monthly Supply of Houses (Yr),Vacant Housing Units for Sale (Qtr),Homeownership Rate (Qtr),Homeownership Rate (Yr),Vacant Housing Units for Rent (Qtr),Rental Vacancy Rate (Qtr),Rental Vacancy Rate (Yr),Origination Channel_B,Origination Channel_C,Origination Channel_R,Bank_AMTRUST BANK,"Bank_BANK OF AMERICA, N.A.",Bank_CHASE HOME FINANCE,"Bank_CITIMORTGAGE, INC.","Bank_FDIC, RECEIVER, INDYMAC FEDERAL BANK FSB",Bank_FIRST TENNESSEE BANK NATIONAL ASSOCIATION,Bank_FLAGSTAR CAPITAL MARKETS CORPORATION,Bank_GMAC MORTGAGE,"Bank_JPMORGAN CHASE BANK, NATIONAL ASSOCIATION",Bank_OTHER,"Bank_PNC BANK, N.A.",Bank_SMALL LOAN BANKS,Bank_SUNTRUST MORTGAGE INC.,First Time Home Buyer_N,First Time Home Buyer_Y,Property Type_CO,Property Type_CP,Property Type_MH,Property Type_PU,Property Type_SF,Occupancy Type_I,Occupancy Type_P,Occupancy Type_S,Property State_AK,Property State_AL,Property State_AR,Property State_AZ,Property State_CA,Property State_CO,Property State_CT,Property State_DC,Property State_DE,Property State_FL,Property State_GA,Property State_HI,Property State_IA,Property State_ID,Property State_IL,Property State_IN,Property State_KS,Property State_KY,Property State_LA,Property State_MA,Property State_MD,Property State_ME,Property State_MI,Property State_MN,Property State_MO,Property State_MS,Property State_MT,Property State_NC,Property State_ND,Property State_NE,Property State_NH,Property State_NJ,Property State_NM,Property State_NV,Property State_NY,Property State_OH,Property State_OK,Property State_OR,Property State_PA,Property State_RI,Property State_SC,Property State_SD,Property State_TN,Property State_TX,Property State_UT,Property State_VA,Property State_VT,Property State_WA,Property State_WI,Property State_WV,Property State_WY,Year_2000,Year_2001,Year_2002,Year_2003,Year_2004,Year_2005,Year_2006,Year_2007,Original Combined Loan-to-Value (CLTV),Original Debt to Income Ratio,Harmonized Credit Score,Median Household Income,Vacant Housing Units for Sale (Yr),Vacant Housing Units for Rent (Yr),numemp,asset (Qtr),asset (Yr),lnlsnet (Qtr),lnlsnet (Yr),liab (Qtr),liab (Yr),dep (Qtr),dep (Yr),eqtot (Qtr),eqtot (Yr)
124089,6.375,283000,360,1.0,1,0.0,0.008024,0.023268,-0.002505,-0.030116,-0.006377,0.017328,0.014117,0.071238,-0.029851,0.326531,0.080336,0.002833,-0.004219,0.045872,0.042017,0.087719,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,86.0,53.0,654.0,44085.0,0.412226,0.122047,134.709826,0.433691,0.251945,0.226842,0.254634,0.456459,0.228677,0.345833,0.205984,0.239727,0.559997
124090,6.0,256000,360,0.0,0,0.0,0.00696,0.026456,0.005217,-0.007341,-0.004564,-0.013526,0.006373,0.047366,0.083333,0.258065,-0.077206,-0.00554,-0.009655,-0.071935,-0.082645,-0.119048,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,80.0,60.0,723.5,43465.072464,0.148741,-0.11068,128.508097,0.339843,-0.012415,0.259071,0.05495,0.347836,0.002519,0.270256,-0.038111,0.289888,-0.147162
124091,6.625,170000,360,0.0,1,0.0,0.002016,0.026231,-0.002524,-0.029667,-0.008045,0.010446,0.00384,0.069516,0.074627,0.358491,0.074362,-0.002825,0.002841,0.066082,0.056452,0.201835,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,78.0,34.0,667.5,48669.166667,0.446936,0.22349,2.327719,0.165277,-0.614725,-0.04347,-0.565303,0.181836,-0.607025,-0.079467,-0.690119,0.031283,-0.698489
124092,6.625,176000,360,0.0,1,0.0,0.00696,0.026456,0.005217,-0.007341,-0.006003,-0.00826,0.006373,0.047366,0.138462,0.174603,-0.044421,-0.009915,-0.007102,-0.104224,-0.122137,0.036036,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,80.0,53.0,665.75,43466.518201,0.20915,0.075049,128.508097,0.339843,0.196385,0.259071,0.275149,0.347836,0.18111,0.270256,0.190128,0.289888,0.397132
124093,6.25,190000,360,1.0,1,0.0,0.00696,0.026456,0.005217,-0.007341,-0.004564,-0.013526,0.006373,0.047366,0.083333,0.258065,-0.044421,-0.009915,-0.007102,-0.104224,-0.122137,0.036036,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,80.0,32.0,798.0,44090.105263,0.20915,0.075049,2.082841,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [17]:
def relative_importance(X_train, y_train, bank_str, bal=True):
    """Permutation importance of each feature when only bank_str's dummy is kept."""
    # Transform X
    ## Define datasets: drop every Bank_ dummy, then re-add the one for bank_str
    y = y_train
    bank_col = 'Bank_' + bank_str
    readd = X_train.loc[:, bank_col]
    # .copy() so the assignment below writes to an independent frame, not a slice
    X = X_train.filter(regex=r'^(?!Bank_).*$').copy()
    X.loc[:, bank_col] = readd
    
    ## Add interaction terms
    X = Bank_Interactions(X, bank_str = bank_str)
    
    ## Standardize Vars
    X_cols = X.columns
    scaler = StandardScaler().fit(X)
    X = scaler.transform(X)
    
    # Permutation importance for feature evaluation
    if bal:
        clf = BalancedRandomForestClassifier(n_estimators=50, random_state=2020, max_features=1, 
                                             replacement=False, n_jobs=-1)
    else:
        clf = RandomForestClassifier(n_estimators=50, random_state=2020, max_features=1, n_jobs=-1)
    
    clf = clf.fit(X, y)
    result = permutation_importance(clf, X, y, n_repeats=10,
                                    random_state=2020)
    importances = pd.Series(result.importances_mean, index=X_cols)
    return importances
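
Permutation importance, as used in `relative_importance`, shuffles one column at a time and records how much the fitted model's score drops. The sketch below shows the same scikit-learn call on synthetic data, where it is known which features carry signal; the dataset, the feature names `f0` to `f4`, and the plain random forest are illustrative assumptions rather than the notebook's setup.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic data: with shuffle=False the 2 informative features come first,
# followed by 3 pure-noise features
X, y = make_classification(n_samples=500, n_features=5, n_informative=2,
                           n_redundant=0, shuffle=False, random_state=2020)
cols = [f"f{i}" for i in range(5)]

clf = RandomForestClassifier(n_estimators=50, random_state=2020).fit(X, y)

# Shuffle each column n_repeats times and average the drop in accuracy
result = permutation_importance(clf, X, y, n_repeats=10, random_state=2020)
importances = pd.Series(result.importances_mean, index=cols)
print(importances.sort_values(ascending=False))
```

Like the notebook's function, this scores importance on the same data the model was fitted on; passing a held-out set to `permutation_importance` instead would measure how much each feature contributes to generalization.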

In [None]:
# Relative importance for balanced data
rel_imp_bal = {}
for bank_str in np.unique(df['Bank']):
    rel_imp_bal[bank_str] = relative_importance(X_train, y_train, bank_str, bal=True)
    print(rel_imp_bal[bank_str])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[key] = _infer_fill_value(value)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[item] = s


Original Interest Rate                          0.006535
Original Mortgage Amount                        0.004167
Original Loan Term                             -0.000947
Single Borrower                                 0.008480
Loan Purpose                                    0.006124
                                                  ...   
Original Interest Rate [Int]                   -0.000553
Original Combined Loan-to-Value (CLTV) [Int]   -0.001783
Original Debt to Income Ratio [Int]            -0.000635
Mortgage Insurance % [Int]                     -0.000193
Median Household Income [Int]                  -0.000197
Length: 118, dtype: float64


Original Interest Rate                          0.004910
Original Mortgage Amount                        0.004309
Original Loan Term                             -0.001176
Single Borrower                                 0.006986
Loan Purpose                                    0.006189
                                                  ...   
Original Interest Rate [Int]                   -0.006641
Original Combined Loan-to-Value (CLTV) [Int]   -0.017834
Original Debt to Income Ratio [Int]            -0.010179
Mortgage Insurance % [Int]                     -0.001727
Median Household Income [Int]                  -0.001899
Length: 118, dtype: float64


Original Interest Rate                          0.004730
Original Mortgage Amount                        0.004535
Original Loan Term                             -0.001768
Single Borrower                                 0.007787
Loan Purpose                                    0.006187
                                                  ...   
Original Interest Rate [Int]                   -0.000264
Original Combined Loan-to-Value (CLTV) [Int]   -0.000671
Original Debt to Income Ratio [Int]            -0.000132
Mortgage Insurance % [Int]                     -0.000047
Median Household Income [Int]                  -0.000048
Length: 118, dtype: float64


Original Interest Rate                          0.004559
Original Mortgage Amount                        0.004884
Original Loan Term                             -0.001983
Single Borrower                                 0.007407
Loan Purpose                                    0.006899
                                                  ...   
Original Interest Rate [Int]                   -0.001882
Original Combined Loan-to-Value (CLTV) [Int]   -0.004256
Original Debt to Income Ratio [Int]            -0.001386
Mortgage Insurance % [Int]                     -0.000270
Median Household Income [Int]                  -0.000114
Length: 118, dtype: float64


Original Interest Rate                          0.006678
Original Mortgage Amount                        0.005007
Original Loan Term                             -0.001184
Single Borrower                                 0.008639
Loan Purpose                                    0.007233
                                                  ...   
Original Interest Rate [Int]                   -0.000494
Original Combined Loan-to-Value (CLTV) [Int]   -0.000449
Original Debt to Income Ratio [Int]            -0.000236
Mortgage Insurance % [Int]                     -0.000027
Median Household Income [Int]                  -0.000011
Length: 118, dtype: float64


Original Interest Rate                          0.007643
Original Mortgage Amount                        0.005061
Original Loan Term                             -0.000081
Single Borrower                                 0.008682
Loan Purpose                                    0.006509
                                                  ...   
Original Interest Rate [Int]                   -0.000167
Original Combined Loan-to-Value (CLTV) [Int]   -0.000336
Original Debt to Income Ratio [Int]            -0.000196
Mortgage Insurance % [Int]                     -0.000052
Median Household Income [Int]                  -0.000066
Length: 118, dtype: float64


In [None]:
# Relative importance for unbalanced data
rel_imp_unbal = {}
for bank_str in np.unique(df['Bank']):
    rel_imp_unbal[bank_str] = relative_importance(X_train, y_train, bank_str, bal=False)
    print(rel_imp_unbal[bank_str])