<img style="float: left;" src="../images/fanniemae.png">
<br><br><br><br><br><br>
______

# Mortgage Loan Default Classifier
____________
____________

## Problem Statement:
_____________
Fannie Mae, or more specifically the Federal National Mortgage Association (FNMA), is a government-sponsored enterprise (GSE) whose primary goal is to increase home ownership and the availability of affordable housing.  In essence, Fannie Mae does this by purchasing mortgage loans that meet certain parameters from mortgage lenders.  In turn, mortgage lenders are provided with the cash flow to issue additional mortgages.<br>

The Financial Crisis of 2008 can be traced in part to the purchase of mortgage loans whose actual probability of default was higher than assumed.  By building a classification model that predicts whether a mortgage loan will default based on pre-purchase characteristics, Fannie Mae may be better able to avoid high-risk mortgage loans.  The model will be evaluated on Accuracy and False Negative Rate.  Here, the "positive" class is loans that default, so a false negative is a loan that defaults but is predicted not to, which is the costlier error for a purchaser.  Therefore, we will seek to minimize the False Negative Rate while maximizing Accuracy.
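
As a minimal sketch of the two evaluation metrics, the snippet below computes Accuracy and False Negative Rate (FNR = FN / (FN + TP)) from binary labels, with 1 = default (the positive class). The helper name `accuracy_and_fnr` and the toy labels are hypothetical and only illustrate the arithmetic; a library such as scikit-learn could compute the same quantities from a confusion matrix.

```python
def accuracy_and_fnr(y_true, y_pred):
    """Return (accuracy, false negative rate) for binary labels where 1 = default."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defaults the model caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defaults the model missed
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # FNR = FN / (FN + TP): the share of actual defaults the model misses
    fnr = fn / (fn + tp) if (fn + tp) > 0 else 0.0
    return accuracy, fnr

# Hypothetical toy labels: 3 actual defaults, one of which the model misses
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 0, 1, 0, 0, 1, 1, 0]
acc, fnr = accuracy_and_fnr(y_true, y_pred)
print(f"accuracy={acc:.3f}, FNR={fnr:.3f}")  # accuracy=0.750, FNR=0.333
```

Tracking both metrics matters on imbalanced data: a model that predicts No Default for every loan can score a high Accuracy while its FNR is 1.0, because it misses every default.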

## Load Datasets and Basic Clean
_______
Fannie Mae (hyperlink) provides two options for downloading the data: the entire dataset, or subsets by quarter of loan acquisition.  The entire dataset is approximately 26 GB, compared to approximately 200-500 MB for each quarterly subset.  On average, the quarterly subsets contain more than 200,000 observations, which is sufficient for this project, since its current scope is to predict loan default using the acquisition data from a previous year.

In summary, this notebook loads the raw data files, reduces the data to essential features, and then merges them into a single dataset.  The Acquisition file contains static loan characteristics for each loan acquired in the given quarter.  In contrast, the Performance file contains monthly performance data, such as payment history, loan balance, and final disposition, across the entire life cycle of each loan acquired in that quarter.  Since the model is intended to predict loan default before Fannie Mae acquires the loan, it will concentrate on features from the Acquisition file and extract only the loan disposition from the Performance file.
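
A minimal sketch of the reduce-and-merge idea on toy data: keep only the last monthly record per loan from a performance-style table, then attach its disposition to an acquisition-style table. The column names follow the data dictionary used in this notebook; the loan IDs and values are invented, and the sketch assumes performance rows are ordered by reporting period within each loan.

```python
import pandas as pd

# Toy acquisition table: one static row per loan (values are invented)
acq = pd.DataFrame({
    "LOAN IDENTIFIER": [101, 102, 103],
    "ORIGINAL INTEREST RATE": [4.125, 4.625, 4.375],
})

# Toy performance table: one row per loan per month (values are invented)
perf = pd.DataFrame({
    "LOAN IDENTIFIER": [101, 101, 102, 102, 102, 103],
    "MONTHLY REPORTING PERIOD": ["01/01/2011", "02/01/2011",
                                 "01/01/2011", "02/01/2011", "03/01/2011",
                                 "01/01/2011"],
    "ZERO BALANCE CODE": [None, 1.0, None, None, 9.0, None],
})

# Keep the final record of each loan (assumes rows are sorted by period within each loan)
last = perf.drop_duplicates(subset="LOAN IDENTIFIER", keep="last")

# Attach only the disposition to the static acquisition features
merged = acq.merge(last[["LOAN IDENTIFIER", "ZERO BALANCE CODE"]],
                   on="LOAN IDENTIFIER", how="inner")
print(merged)
```

Loan 103 ends with a missing zero balance code, which is how a loan that is still active at the last reporting period appears after this reduction.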

### Import packages

In [1]:
# import necessary packages
import pandas as pd
import numpy as np

### Load data files

In [2]:
# define function to load data set
def load_dataset(year=2011, quarter=1):

    # load acquisition data dictionary csv for column names
    acq_data_dict_fp = '../data/acquisition_data_dict_summary.csv'
    list_acq_col_names = pd.read_csv(acq_data_dict_fp)['Field Name'].tolist()
    
    # load performance data dictionary csv for column names
    perf_data_dict_fp = '../data/performance_data_dict_summary.csv'
    list_perf_col_names = pd.read_csv(perf_data_dict_fp)['Field Name'].tolist()

    # load acquisition file (pipe-separated text with no header row)
    acq_data_fp = f'../data/{year}Q{quarter}/Acquisition_{year}Q{quarter}.txt'
    df_acq = pd.read_csv(acq_data_fp, sep='|', header=None, names=list_acq_col_names)
    
    # load performance csv
    perf_data_fp = f'../data/{year}Q{quarter}/Performance_{year}Q{quarter}.txt'
    df_perf = pd.read_csv(perf_data_fp, sep='|', header=None, names=list_perf_col_names)
    
    return df_acq, df_perf, year, quarter

In [4]:
# assign datasets and identifiers
df_acq, df_perf, year, quarter = load_dataset(2011, 1)


__Insight:__<br>
The raw data are pipe-separated text files without header rows.  The column names were taken from Fannie Mae's File Layout PDFs.  For this project, I have decided to use data from Q1 2011 to construct the loan default classifier.  The core reasoning for this decision is the assumption that loans originated prior to the Financial Crisis of 2008 were most likely affected by factors outside the scope of the data being used.  Additionally, lax regulation, which can be argued to be a key factor in the Financial Crisis, may also have affected the thoroughness of data collection.
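
For readers unfamiliar with the raw format, here is a small self-contained sketch of how `pd.read_csv` handles a pipe-separated file with no header row: `sep='|'` sets the delimiter, `header=None` stops pandas from treating the first record as column names, and `names=` supplies them. The two records are the first two acquisition loans truncated to four fields, and the shortened column list is illustrative only.

```python
import io
import pandas as pd

# Two records in the raw format, truncated to four fields: pipe-delimited, no header row
raw = io.StringIO(
    "100000841305|C|CITIMORTGAGE, INC.|4.125\n"
    "100001889356|R|OTHER|4.625\n"
)

# Shortened, illustrative column list (the notebook loads the full list from the data dictionary)
cols = ["LOAN IDENTIFIER", "ORIGINATION CHANNEL", "SELLER NAME", "ORIGINAL INTEREST RATE"]

df = pd.read_csv(raw, sep="|", header=None, names=cols)
print(df.dtypes)
```

Because the delimiter is a pipe, commas inside values such as `CITIMORTGAGE, INC.` need no quoting.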

### Explore Acquisition File

In [5]:
df_acq.head()

| | LOAN IDENTIFIER | ORIGINATION CHANNEL | SELLER NAME | ORIGINAL INTEREST RATE | ORIGINAL UPB | ORIGINAL LOAN TERM | ORIGINATION DATE | FIRST PAYMENT DATE | ORIGINAL LOAN-TO-VALUE (LTV) | ORIGINAL COMBINED LOAN-TO-VALUE (CLTV) | ... | PROPERTY TYPE | NUMBER OF UNITS | OCCUPANCY TYPE | PROPERTY STATE | ZIP CODE SHORT | PRIMARY MORTGAGE INSURANCE PERCENT | PRODUCT TYPE | CO-BORROWER CREDIT SCORE AT ORIGINATION | MORTGAGE INSURANCE TYPE | RELOCATION MORTGAGE INDICATOR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100000841305 | C | CITIMORTGAGE, INC. | 4.125 | 124000 | 360 | 12/2010 | 02/2011 | 79 | 79.0 | ... | SF | 1 | P | TX | 750 | | FRM | | | N |
| 1 | 100001889356 | R | OTHER | 4.625 | 115000 | 240 | 01/2011 | 03/2011 | 68 | 68.0 | ... | SF | 1 | P | IL | 613 | | FRM | | | N |
| 2 | 100006453372 | C | BANK OF AMERICA, N.A. | 4.375 | 175000 | 360 | 01/2011 | 03/2011 | 52 | 52.0 | ... | PU | 1 | S | AZ | 859 | | FRM | 791.0 | | N |
| 3 | 100010656545 | C | BANK OF AMERICA, N.A. | 4.375 | 365000 | 360 | 12/2010 | 02/2011 | 59 | 59.0 | ... | PU | 1 | P | IL | 600 | | FRM | 812.0 | | N |
| 4 | 100010758624 | R | CITIMORTGAGE, INC. | 3.875 | 69000 | 120 | 02/2011 | 04/2011 | 28 | 28.0 | ... | SF | 1 | P | SC | 292 | | FRM | | | N |


__Insight:__<br>
The Acquisition file contains all of the features that will be used to train the loan default classification model.  In summary, there are 25 columns in total:<br>

- 1 unique loan identifier 
- 8 continuous
- 3 discrete
- 11 categorical
- 2 binary

Complete Data Description (hyperlink)

In [6]:
df_acq.describe()

| | LOAN IDENTIFIER | ORIGINAL INTEREST RATE | ORIGINAL UPB | ORIGINAL LOAN TERM | ORIGINAL LOAN-TO-VALUE (LTV) | ORIGINAL COMBINED LOAN-TO-VALUE (CLTV) | NUMBER OF BORROWERS | ORIGINAL DEBT TO INCOME RATIO | BORROWER CREDIT SCORE AT ORIGINATION | NUMBER OF UNITS | ZIP CODE SHORT | PRIMARY MORTGAGE INSURANCE PERCENT | CO-BORROWER CREDIT SCORE AT ORIGINATION | MORTGAGE INSURANCE TYPE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 505196.0 | 505196.0 | 505196.0 | 505196.0 | 505196.0 | 505169.0 | 505155.0 | 504966.0 | 504802.0 | 505196.0 | 505196.0 | 41004.0 | 308064.0 | 41004.0 |
| mean | 550317200000.0 | 4.403725 | 218270.0 | 286.670756 | 66.144294 | 67.608428 | 1.614987 | 31.345819 | 767.303063 | 1.031938 | 535.138287 | 22.839918 | 774.804703 | 1.053922 |
| std | 259631600000.0 | 0.489727 | 134879.2 | 90.946002 | 17.321929 | 17.242392 | 0.498424 | 9.805506 | 39.266386 | 0.239936 | 313.622139 | 7.686777 | 35.233759 | 0.225866 |
| min | 100000800000.0 | 2.625 | 10000.0 | 60.0 | 1.0 | 1.0 | 1.0 | 1.0 | 467.0 | 1.0 | 0.0 | 6.0 | 500.0 | 1.0 |
| 25% | 325636500000.0 | 4.0 | 118000.0 | 180.0 | 55.0 | 57.0 | 1.0 | 24.0 | 746.0 | 1.0 | 239.0 | 12.0 | 758.0 | 1.0 |
| 50% | 549972400000.0 | 4.375 | 183000.0 | 360.0 | 71.0 | 73.0 | 2.0 | 32.0 | 778.0 | 1.0 | 546.0 | 25.0 | 785.0 | 1.0 |
| 75% | 774782000000.0 | 4.75 | 288000.0 | 360.0 | 80.0 | 80.0 | 2.0 | 39.0 | 797.0 | 1.0 | 840.0 | 30.0 | 801.0 | 1.0 |
| max | 999998200000.0 | 7.0 | 1181000.0 | 360.0 | 97.0 | 133.0 | 6.0 | 64.0 | 850.0 | 4.0 | 999.0 | 44.0 | 840.0 | 2.0 |


__Insight:__
A few of the numerical features, such as interest rate and loan term, fall within industry-standard ranges.  However, many features show some interesting minimums and maximums.

*LTV and CLTV (Loan to Value Ratio and Combined Loan to Value Ratio):*
While not impossible, an LTV (and CLTV) of 1% is highly unlikely.  For instance, a 1% LTV on a home valued at \$500,000 would equate to a loan of \$5,000, which is not a normal financing practice.

*DTI (Debt to Income Ratio):*
The standard maximum DTI within Fannie Mae guidelines is 43%.  That threshold can be exceeded in some scenarios; however, doing so requires compensating factors that tend to be difficult to meet.  Practically speaking, a borrower's ability to repay a loan becomes highly questionable past this threshold.  Yet the maximum DTI in this dataset is 64%.

*Borrower and Co-borrower Credit Score:*
Similar to DTI, Fannie Mae guidelines set a standard limit for credit scores: the standard minimum is 620\*.  Again, falling below this threshold requires compensating factors that tend to be difficult to meet.  Yet the minimum borrower and co-borrower credit scores in this dataset are 467 and 500, respectively.

\*FHA loans are a common exception: they accept credit scores as low as 580 for borrowers who make the minimum 3.5% down payment, and they are popular with first-time home buyers.
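
The thresholds discussed above can be turned into simple screening flags. This is a sketch only: the cutoffs (DTI above 43%, credit score below 620) are the guideline values quoted in this section, the toy loans and their column names are invented, and `flag_guideline_exceptions` is a hypothetical helper, not part of the modeling pipeline. LTV is computed as loan amount divided by property value, which reproduces the 1% example.

```python
import pandas as pd

def flag_guideline_exceptions(df, max_dti=43, min_score=620):
    """Add an LTV column and boolean flags for loans outside the quoted guideline thresholds."""
    out = df.copy()
    # LTV (%) = loan amount / property value * 100
    out["LTV"] = out["LOAN AMOUNT"] / out["PROPERTY VALUE"] * 100
    out["DTI OVER LIMIT"] = out["DTI"] > max_dti
    out["SCORE UNDER MIN"] = out["CREDIT SCORE"] < min_score
    return out

# Invented loans; the first mirrors the $5,000 loan on a $500,000 home
loans = pd.DataFrame({
    "LOAN AMOUNT": [5_000, 180_000, 250_000],
    "PROPERTY VALUE": [500_000, 225_000, 260_000],
    "DTI": [20, 45, 38],
    "CREDIT SCORE": [780, 700, 600],
})
print(flag_guideline_exceptions(loans))
```

The Acquisition file already reports LTV directly, so the LTV line here only illustrates the ratio behind the 1% example.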


In [7]:
df_acq.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 505196 entries, 0 to 505195
Data columns (total 25 columns):
LOAN IDENTIFIER                            505196 non-null int64
ORIGINATION CHANNEL                        505196 non-null object
SELLER NAME                                505196 non-null object
ORIGINAL INTEREST RATE                     505196 non-null float64
ORIGINAL UPB                               505196 non-null int64
ORIGINAL LOAN TERM                         505196 non-null int64
ORIGINATION DATE                           505196 non-null object
FIRST PAYMENT DATE                         505196 non-null object
ORIGINAL LOAN-TO-VALUE (LTV)               505196 non-null int64
ORIGINAL COMBINED LOAN-TO-VALUE (CLTV)     505169 non-null float64
NUMBER OF BORROWERS                        505155 non-null float64
ORIGINAL DEBT TO INCOME RATIO              504966 non-null float64
BORROWER CREDIT SCORE AT ORIGINATION       504802 non-null float64
FIRST TIME HOME BUYER INDICATO

__Insight:__<br>
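All feature data types were inferred as expected; the two date columns load as strings, which is acceptable here because they are dropped during cleaning.  Additionally, there are 505,196 observations to start with, which provides flexibility in handling null values.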

In [11]:
df_acq.isna().sum()

LOAN IDENTIFIER                                 0
ORIGINATION CHANNEL                             0
SELLER NAME                                     0
ORIGINAL INTEREST RATE                          0
ORIGINAL UPB                                    0
ORIGINAL LOAN TERM                              0
ORIGINATION DATE                                0
FIRST PAYMENT DATE                              0
ORIGINAL LOAN-TO-VALUE (LTV)                    0
ORIGINAL COMBINED LOAN-TO-VALUE (CLTV)         27
NUMBER OF BORROWERS                            41
ORIGINAL DEBT TO INCOME RATIO                 230
BORROWER CREDIT SCORE AT ORIGINATION          394
FIRST TIME HOME BUYER INDICATOR                 0
LOAN PURPOSE                                    0
PROPERTY TYPE                                   0
NUMBER OF UNITS                                 0
OCCUPANCY TYPE                                  0
PROPERTY STATE                                  0
ZIP CODE SHORT                                  0


__Insight:__<br>
There are a few features with nulls that need to be addressed.  The nulls in the CLTV, number of borrowers, DTI, and borrower credit score features represent missing data.  I have chosen to drop these observations.  The loss of data has been considered; however, these observations account for less than 0.15% of the dataset.
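
A minimal sketch of that decision: drop rows with a null in any of the four columns and report what share of rows was lost. The column names match the Acquisition file; the ten toy loans are invented.

```python
import numpy as np
import pandas as pd

null_cols = ["ORIGINAL COMBINED LOAN-TO-VALUE (CLTV)", "NUMBER OF BORROWERS",
             "ORIGINAL DEBT TO INCOME RATIO", "BORROWER CREDIT SCORE AT ORIGINATION"]

# Invented toy data: 10 loans, two of which have a missing value
toy = pd.DataFrame({
    null_cols[0]: [80.0, 75.0, np.nan, 60.0, 90.0, 70.0, 65.0, 85.0, 55.0, 78.0],
    null_cols[1]: [1.0, 2.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0, 2.0],
    null_cols[2]: [30.0, 25.0, 35.0, np.nan, 40.0, 22.0, 33.0, 28.0, 19.0, 36.0],
    null_cols[3]: [760.0, 790.0, 720.0, 700.0, 745.0, 805.0, 770.0, 730.0, 810.0, 765.0],
})

# Drop any row with a null in one of the listed columns
cleaned = toy.dropna(subset=null_cols)
share_lost = 1 - len(cleaned) / len(toy)
print(f"rows kept: {len(cleaned)}, share dropped: {share_lost:.1%}")
```

On the full Acquisition frame, `df_acq[null_cols].isna().any(axis=1).mean()` returns the same share directly, which is one way to check the "less than 0.15%" figure.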

The nulls in PRIMARY MORTGAGE INSURANCE PERCENT and MORTGAGE INSURANCE TYPE represent the absence of mortgage insurance (MI).  Since the mortgage insurance percentage tends to follow LTV when LTV is over 80%, and the mortgage insurance type influences the interest rate, these features' variance is partly captured by other characteristics.  I have therefore chosen to replace these two features with a single new binary feature indicating whether MI exists.
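
A sketch of the MI indicator on invented values, assuming (as stated above) that a null PRIMARY MORTGAGE INSURANCE PERCENT means the loan has no mortgage insurance. Comparing against zero after `fillna(0)` yields a clean 0/1 integer column.

```python
import numpy as np
import pandas as pd

# Invented MI percentages; NaN means the loan carries no mortgage insurance
mi_pct = pd.Series([np.nan, 25.0, np.nan, 12.0, 30.0],
                   name="PRIMARY MORTGAGE INSURANCE PERCENT")

# 1 if any mortgage insurance coverage is recorded, else 0
mi_flag = (mi_pct.fillna(0) > 0).astype(int)
print(mi_flag.tolist())  # [0, 1, 0, 1, 1]
```

`mi_pct.notna().astype(int)` is an equivalent form that would differ only if a recorded percentage could be exactly 0.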

Lastly, the nulls in CO-BORROWER CREDIT SCORE AT ORIGINATION represent the absence of a co-borrower.  When there are multiple borrowers, it is standard practice to use the minimum score among them.  Accordingly, the borrower and co-borrower credit score features will be replaced by a single feature that takes the minimum of the two, which also avoids the need for null imputation.
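
A sketch of the combined score on invented values. `DataFrame.min(axis=1)` skips NaN by default (`skipna=True`), so a loan without a co-borrower simply takes the borrower's score and no placeholder value is needed.

```python
import numpy as np
import pandas as pd

# Invented scores; NaN in the co-borrower column means there is no co-borrower
scores = pd.DataFrame({
    "BORROWER CREDIT SCORE AT ORIGINATION": [791.0, 700.0, 812.0],
    "CO-BORROWER CREDIT SCORE AT ORIGINATION": [760.0, np.nan, 830.0],
})

# Row-wise minimum; the NaN is ignored because skipna defaults to True
scores["MIN CREDIT SCORE"] = scores.min(axis=1)
print(scores["MIN CREDIT SCORE"].tolist())  # [760.0, 700.0, 812.0]
```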

### Explore Performance File

In [12]:
df_perf.head()

| | LOAN IDENTIFIER | MONTHLY REPORTING PERIOD | SERVICER NAME | CURRENT INTEREST RATE | CURRENT ACTUAL UPB | LOAN AGE | REMAINING MONTHS TO LEGAL MATURITY | ADJUSTED MONTHS TO MATURITY | MATURITY DATE | METROPOLITAN STATISTICAL AREA (MSA) | ... | ASSOCIATED TAXES FOR HOLDING PROPERTY | NET SALE PROCEEDS | CREDIT ENHANCEMENT PROCEEDS | REPURCHASE MAKE WHOLE PROCEEDS | OTHER FORECLOSURE PROCEEDS | NON INTEREST BEARING UPB | PRINCIPAL FORGIVENESS AMOUNT | REPURCHASE MAKE WHOLE PROCEEDS FLAG | FORECLOSURE PRINCIPAL WRITE-OFF AMOUNT | SERVICING ACTIVITY INDICATOR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100000841305 | 01/01/2011 | CITIMORTGAGE, INC. | 4.125 | | 0 | 360 | 360.0 | 01/2041 | 19100 | ... | | | | | | | | | | N |
| 1 | 100000841305 | 02/01/2011 | | 4.125 | | 1 | 359 | 359.0 | 01/2041 | 19100 | ... | | | | | | | | | | N |
| 2 | 100000841305 | 03/01/2011 | | 4.125 | | 2 | 358 | 358.0 | 01/2041 | 19100 | ... | | | | | | | | | | N |
| 3 | 100000841305 | 04/01/2011 | | 4.125 | | 3 | 357 | 357.0 | 01/2041 | 19100 | ... | | | | | | | | | | N |
| 4 | 100000841305 | 05/01/2011 | | 4.125 | | 4 | 356 | 356.0 | 01/2041 | 19100 | ... | | | | | | | | | | N |


__Insight:__  
The Performance file contains a wealth of interesting data.  It comprises the entire monthly history, from origination to disposition, of each loan acquired in the associated quarter.  It also includes any costs associated with a default-type credit event.  In future work, exploring this post-acquisition data and the costs within the default subset may prove very useful.  However, for the scope of this project, only each loan's final disposition will be extracted from this file; all other features will be dropped.

At first glance, the target may be narrowed down to two candidates: FORECLOSURE DATE and ZERO BALANCE CODE.  Upon further exploration, the foreclosure date does not appear to cover all default-type credit events.  The zero balance code, on the other hand, appears to capture every final disposition of a loan, broken down as follows:<br>

- 01 = Prepaid or Matured
- 02 = Third Party Sale
- 03 = Short Sale
- 06 = Repurchased
- 09 = Deed-in-Lieu, REO
- 15 = Note Sale
- 16 = Reperforming Loan Sale

Third Party Sale, Short Sale, Repurchased, Deed-in-Lieu (REO), and Note Sale are default-type credit events and will serve as the Default class.  Prepaid or Matured and Reperforming Loan Sale will serve as the No Default class.  Loans with no zero balance code (still active at the last reporting period) are also treated as No Default.
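
A minimal sketch of that mapping on invented codes. Codes 02, 03, 06, 09, and 15 map to 1; everything else maps to 0, including a missing code. The codes are read as floats because the column contains missing values, and `isin` compares them correctly against the integer list while keeping the logic vectorized.

```python
import numpy as np
import pandas as pd

# Third Party Sale, Short Sale, Repurchased, Deed-in-Lieu/REO, Note Sale
DEFAULT_CODES = [2, 3, 6, 9, 15]

# Invented final-record zero balance codes; NaN = loan still active
zb_code = pd.Series([1.0, 9.0, np.nan, 3.0, 16.0, 15.0])

# 1 for default-type dispositions, 0 for everything else (NaN is never "in" the list)
default = zb_code.isin(DEFAULT_CODES).astype(int)
print(default.tolist())  # [0, 1, 0, 1, 0, 1]
```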

In [13]:
df_perf.describe()

| | LOAN IDENTIFIER | CURRENT INTEREST RATE | CURRENT ACTUAL UPB | LOAN AGE | REMAINING MONTHS TO LEGAL MATURITY | ADJUSTED MONTHS TO MATURITY | METROPOLITAN STATISTICAL AREA (MSA) | ZERO BALANCE CODE | FORECLOSURE COSTS | PROPERTY PRESERVATION AND REPAIR COSTS | ASSET RECOVERY COSTS | MISCELLANEOUS HOLDING EXPENSES AND CREDITS | ASSOCIATED TAXES FOR HOLDING PROPERTY | NET SALE PROCEEDS | CREDIT ENHANCEMENT PROCEEDS | REPURCHASE MAKE WHOLE PROCEEDS | OTHER FORECLOSURE PROCEEDS | NON INTEREST BEARING UPB | PRINCIPAL FORGIVENESS AMOUNT | FORECLOSURE PRINCIPAL WRITE-OFF AMOUNT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 28616970.0 | 28616970.0 | 25597510.0 | 28616970.0 | 28616970.0 | 28558330.0 | 28616970.0 | 340820.0 | 1196.0 | 877.0 | 526.0 | 1054.0 | 1017.0 | 1186.0 | 219.0 | 36.0 | 610.0 | 58648.0 | 40.0 | 0.0 |
| mean | 550276100000.0 | 4.374983 | 169687.3 | 36.95963 | 246.897 | 236.2029 | 27074.93 | 1.030189 | 5527.375109 | 7684.51886 | 1947.560057 | 1218.061385 | 5634.415093 | 128948.360329 | 43184.571826 | 86660.562222 | 4173.040443 | 1317.07593 | 0.0 | |
| std | 259693300000.0 | 0.4921834 | 115632.4 | 25.80382 | 96.20157 | 101.4002 | 14252.44 | 0.501271 | 4057.398021 | 8964.692747 | 1786.940241 | 3139.625271 | 7615.898317 | 95239.80597 | 41967.398269 | 85705.119438 | 21149.751427 | 8328.251338 | 0.0 | |
| min | 100000800000.0 | 2.0 | 0.0 | -1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.8 | 5.0 | 1.0 | -19248.59 | 15.93 | 720.13 | 1358.12 | 505.01 | 0.0 | 0.0 | 0.0 | |
| 25% | 325512200000.0 | 4.0 | 87758.4 | 14.0 | 156.0 | 147.0 | 16860.0 | 1.0 | 2973.66 | 2280.0 | 750.0 | -25.9025 | 1476.26 | 57725.26 | 19438.63 | 27513.415 | 200.22 | 0.0 | 0.0 | |
| 50% | 549800900000.0 | 4.375 | 138589.6 | 33.0 | 289.0 | 276.0 | 31080.0 | 1.0 | 4602.16 | 4037.47 | 1500.0 | 744.79 | 3040.32 | 105902.435 | 35487.27 | 51710.2 | 896.915 | 0.0 | 0.0 | |
| 75% | 774927100000.0 | 4.75 | 221145.0 | 57.0 | 333.0 | 329.0 | 38860.0 | 1.0 | 7297.71 | 9851.88 | 2500.0 | 1961.9475 | 6584.0 | 171426.6775 | 50336.08 | 124112.925 | 2627.6925 | 0.0 | 0.0 | |
| max | 999998200000.0 | 7.0 | 1172244.0 | 115.0 | 482.0 | 474.0 | 49740.0 | 16.0 | 38196.7 | 69948.97 | 12712.3 | 30034.24 | 99546.79 | 712851.97 | 349594.7 | 312701.16 | 421458.29 | 134092.49 | 0.0 | |


In [14]:
df_perf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 28616974 entries, 0 to 28616973
Data columns (total 31 columns):
LOAN IDENTIFIER                               int64
MONTHLY REPORTING PERIOD                      object
SERVICER NAME                                 object
CURRENT INTEREST RATE                         float64
CURRENT ACTUAL UPB                            float64
LOAN AGE                                      int64
REMAINING MONTHS TO LEGAL MATURITY            int64
ADJUSTED MONTHS TO MATURITY                   float64
MATURITY DATE                                 object
METROPOLITAN STATISTICAL AREA (MSA)           int64
CURRENT LOAN DELINQUENCY STATUS               object
MODIFICATION FLAG                             object
ZERO BALANCE CODE                             float64
ZERO BALANCE EFFECTIVE DATE                   object
LAST PAID INSTALLMENT DATE                    object
FORECLOSURE DATE                              object
DISPOSITION DATE                     

In [15]:
df_perf.isna().sum()

LOAN IDENTIFIER                                      0
MONTHLY REPORTING PERIOD                             0
SERVICER NAME                                 27929894
CURRENT INTEREST RATE                                0
CURRENT ACTUAL UPB                             3019464
LOAN AGE                                             0
REMAINING MONTHS TO LEGAL MATURITY                   0
ADJUSTED MONTHS TO MATURITY                      58648
MATURITY DATE                                        0
METROPOLITAN STATISTICAL AREA (MSA)                  0
CURRENT LOAN DELINQUENCY STATUS                   1290
MODIFICATION FLAG                                    0
ZERO BALANCE CODE                             28276154
ZERO BALANCE EFFECTIVE DATE                   28276154
LAST PAID INSTALLMENT DATE                    28615684
FORECLOSURE DATE                              28615808
DISPOSITION DATE                              28615769
FORECLOSURE COSTS                             28615778
PROPERTY P

In [17]:
# define function to clean and merge datasets
def clean_merge_datasets(df_acq, df_perf, year, quarter):

    # condense df_perf down to the last status of each loan
    # (assumes each loan's rows are contiguous and ordered by reporting period, as in the raw file)
    df_perf = df_perf.drop_duplicates(subset='LOAN IDENTIFIER', keep='last')

    # condense df_perf down to loan id and zero balance code, renaming the target column to DEFAULT
    df_perf = df_perf[['LOAN IDENTIFIER', 'ZERO BALANCE CODE']].rename(columns={'ZERO BALANCE CODE': 'DEFAULT'})

    # map zero balance codes to binary: 1 for default-type dispositions, else 0
    # (a missing code, i.e. a loan still active at the last reporting period, maps to 0)
    df_perf['DEFAULT'] = df_perf['DEFAULT'].isin([2, 3, 6, 9, 15]).astype(int)

    # merge the disposition onto the acquisition features
    df_cmp = pd.merge(df_acq, df_perf, on='LOAN IDENTIFIER')

    # create binary MI column: 1 if a mortgage insurance percent is recorded, else 0
    df_cmp['MI'] = (df_cmp['PRIMARY MORTGAGE INSURANCE PERCENT'].fillna(0) > 0).astype(int)

    # create MIN CREDIT SCORE column; min(axis=1) skips NaN by default,
    # so loans without a co-borrower simply keep the borrower's score
    df_cmp['MIN CREDIT SCORE'] = df_cmp[['BORROWER CREDIT SCORE AT ORIGINATION',
                                         'CO-BORROWER CREDIT SCORE AT ORIGINATION']].min(axis=1)

    # drop observations with nulls in the features that contain missing data
    df_cmp = df_cmp.dropna(subset=['ORIGINAL COMBINED LOAN-TO-VALUE (CLTV)', 'NUMBER OF BORROWERS',
                                   'ORIGINAL DEBT TO INCOME RATIO', 'BORROWER CREDIT SCORE AT ORIGINATION'])

    # drop replaced and out-of-scope columns
    df_cmp = df_cmp.drop(columns=['PRIMARY MORTGAGE INSURANCE PERCENT', 'MORTGAGE INSURANCE TYPE',
                                  'BORROWER CREDIT SCORE AT ORIGINATION', 'CO-BORROWER CREDIT SCORE AT ORIGINATION',
                                  'ORIGINATION DATE', 'FIRST PAYMENT DATE', 'ZIP CODE SHORT'])

    # reset index
    df_cmp = df_cmp.reset_index(drop=True)

    # save to csv
    df_cmp.to_csv(f'../data/complete{year}q{quarter}.csv', index=False)

    return df_cmp

In [18]:
df_cmp = clean_merge_datasets(df_acq, df_perf, year, quarter)

__Insight:__  
*Cleaning the Acquisition file:*
As noted before, new features named MI and MIN CREDIT SCORE were created.  They replace PRIMARY MORTGAGE INSURANCE PERCENT, MORTGAGE INSURANCE TYPE, BORROWER CREDIT SCORE AT ORIGINATION, and CO-BORROWER CREDIT SCORE AT ORIGINATION.  This aggregation serves a few purposes and has the added benefit of eliminating null values.  Also as previously noted, observations with null values in ORIGINAL COMBINED LOAN-TO-VALUE (CLTV), NUMBER OF BORROWERS, ORIGINAL DEBT TO INCOME RATIO, and BORROWER CREDIT SCORE AT ORIGINATION have been removed to simplify data handling.

A few other features were also removed: ORIGINATION DATE, FIRST PAYMENT DATE, and ZIP CODE SHORT.  The date features are of little consequence because they offer little variance; they mostly describe the acquisition period of the training dataset as a whole rather than distinguishing individual loans.  Removing them also simplifies data handling for the scope of this project.  The ZIP code feature does offer useful categorical subsets; however, it contains approximately 900 unique categories, which would require a larger training set and more computing power than the scope of this project allows.  Future work could broaden the scope to incorporate these features and explore the influence of time and specific regional areas.
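
To make the dimensionality concern concrete: one-hot encoding a categorical column adds one indicator column per unique category. The sketch below uses invented three-digit ZIP prefixes; with roughly 900 unique prefixes, the same call would add roughly 900 columns to the feature matrix.

```python
import pandas as pd

# Invented three-digit ZIP prefixes for six loans (750 appears twice)
zips = pd.DataFrame({"ZIP CODE SHORT": [750, 613, 859, 600, 292, 750]})

n_unique = zips["ZIP CODE SHORT"].nunique()
# One indicator column per unique prefix
encoded = pd.get_dummies(zips["ZIP CODE SHORT"], prefix="ZIP")
print(n_unique, encoded.shape)  # 5 (6, 5)
```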

In [19]:
df_cmp.isna().sum()

LOAN IDENTIFIER                           0
ORIGINATION CHANNEL                       0
SELLER NAME                               0
ORIGINAL INTEREST RATE                    0
ORIGINAL UPB                              0
ORIGINAL LOAN TERM                        0
ORIGINAL LOAN-TO-VALUE (LTV)              0
ORIGINAL COMBINED LOAN-TO-VALUE (CLTV)    0
NUMBER OF BORROWERS                       0
ORIGINAL DEBT TO INCOME RATIO             0
FIRST TIME HOME BUYER INDICATOR           0
LOAN PURPOSE                              0
PROPERTY TYPE                             0
NUMBER OF UNITS                           0
OCCUPANCY TYPE                            0
PROPERTY STATE                            0
PRODUCT TYPE                              0
RELOCATION MORTGAGE INDICATOR             0
DEFAULT                                   0
MI                                        0
MIN CREDIT SCORE                          0
dtype: int64

In [20]:
df_cmp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 504559 entries, 0 to 504558
Data columns (total 21 columns):
LOAN IDENTIFIER                           504559 non-null int64
ORIGINATION CHANNEL                       504559 non-null object
SELLER NAME                               504559 non-null object
ORIGINAL INTEREST RATE                    504559 non-null float64
ORIGINAL UPB                              504559 non-null int64
ORIGINAL LOAN TERM                        504559 non-null int64
ORIGINAL LOAN-TO-VALUE (LTV)              504559 non-null int64
ORIGINAL COMBINED LOAN-TO-VALUE (CLTV)    504559 non-null float64
NUMBER OF BORROWERS                       504559 non-null float64
ORIGINAL DEBT TO INCOME RATIO             504559 non-null float64
FIRST TIME HOME BUYER INDICATOR           504559 non-null object
LOAN PURPOSE                              504559 non-null object
PROPERTY TYPE                             504559 non-null object
NUMBER OF UNITS                           50