# Analyzing Freddie Mac Single Family Loans

## Abstract
The Federal Home Loan Mortgage Corporation (FHLMC), known as Freddie Mac, is a government-sponsored enterprise created to expand the secondary market for mortgages in the United States. Freddie Mac buys mortgages on the secondary market, pools them, and sells them to investors on the open market as mortgage-backed securities. This secondary mortgage market increases the supply of money available for mortgage lending and, in turn, for new home purchases.  
<br>
At the direction of its regulator, the Federal Housing Finance Agency (FHFA), Freddie Mac has made the Single Family Loan-Level Dataset (the "Dataset") available as part of a larger effort to increase transparency and help investors build more accurate credit performance models in support of ongoing and future credit risk-sharing transactions. The Dataset includes: <br>
1. Loan-level origination, monthly loan performance, and actual loss data on a portion of the fully amortizing 30-year fixed-rate Single Family mortgages that Freddie Mac acquired with origination dates from January 1, 1999, to the Origination Cutoff Date. 
2. Loan-level origination, monthly loan performance, and actual loss data on a portion of the fully amortizing 15- and 20-year fixed-rate Single Family mortgages that Freddie Mac acquired with origination dates from January 1, 2005, to the Origination Cutoff Date. 

Loan performance information in the Dataset includes the monthly loan balance, delinquency status, and certain information up to and including the earliest of the following termination events: <br>
    a. Prepaid or Matured (Voluntary Payoff) <br>
    b. Foreclosure Alternative Group (Short Sale, Third Party Sale, Charge Off or Note Sale) <br>
    c. Repurchase prior to Property Disposition <br>
    d. REO Disposition <br>

## Single Family Loan-Level Dataset Sample
Freddie Mac also provides a smaller sample dataset: a simple random sample of 50,000 loans selected from each full vintage year, plus a proportionate number of loans from each partial vintage year, of the full Single Family Loan-Level Dataset. Each vintage year has one origination data file and one corresponding monthly performance data file, containing the same loan-level data fields as those included in the full Dataset.

## Hypothesis
Housing loan defaults can be statistically modeled and predicted as a function of financial factors, specifically interest rate and consumer credit history.  

## Dataset Characteristics

#### Time period: 2012 to 2016
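#### Number of loans: 250,000 (50,000 sampled loans per vintage year)
#### Features: 26 origination fields and 25 monthly performance fields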

# Initial Data Import & Treatment
Origination and performance data must be downloaded individually by year from the Freddie Mac website. <br>
<br>
File names are formatted as follows: <br>
sample_orig_YYYY.txt --> origination data <br>
sample_svcg_YYYY.txt --> monthly performance data <br>
<br>
Both the origination and performance files share the "loan sequence number", which serves as the unique loan identifier and encodes the year and quarter (Q1, Q2, etc.) of loan origination. The performance file reports monthly performance for each loan, so each "loan sequence number" has multiple rows (month 1, month 2, etc.). The datasets contain null values and mixed data types, and the data is considered "living", meaning it is subject to change as Freddie Mac receives updates to its housing loan records.
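
The per-year file naming above lends itself to a loop. The following is a minimal sketch, assuming the files sit in one directory and follow the `prefix_YYYY.txt` pattern; the helper name `load_vintages` and its parameters are hypothetical and not part of the original notebook.

```python
from pathlib import Path

import pandas as pd


def load_vintages(data_dir, prefix, years, columns):
    """Read one pipe-delimited file per year and stack them into one frame.

    Hypothetical helper: `prefix` would be 'sample_orig' or 'sample_svcg'.
    """
    base = Path(data_dir).expanduser()  # allows paths that start with '~'
    frames = []
    for year in years:
        path = base / f"{prefix}_{year}.txt"
        # The raw files have no header row, so names are assigned at read time.
        frame = pd.read_csv(path, sep="|", header=None, names=columns,
                            low_memory=False)
        frames.append(frame)
    # ignore_index=True gives the combined frame a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)
```

Note that `ignore_index=True` is a design choice of this sketch: it avoids the repeated per-file row labels that a plain `pd.concat` keeps.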

In [1]:
# Import Modules:
import time
import math
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Aesthetics.
%matplotlib inline
sns.set_style('white')
#fig, ax = plt.subplots(1, 2, figsize=(18, 4))

In [2]:
# Load raw files.
orig2012 = pd.read_csv('~/src/data/u3CapstoneData/origination-data/sample_orig_2012.txt', sep='|', header=None)
orig2013 = pd.read_csv('~/src/data/u3CapstoneData/origination-data/sample_orig_2013.txt', sep='|', header=None)
orig2014 = pd.read_csv('~/src/data/u3CapstoneData/origination-data/sample_orig_2014.txt', sep='|', header=None)
orig2015 = pd.read_csv('~/src/data/u3CapstoneData/origination-data/sample_orig_2015.txt', sep='|', header=None)
orig2016 = pd.read_csv('~/src/data/u3CapstoneData/origination-data/sample_orig_2016.txt', sep='|', header=None)

perf2012 = pd.read_csv('~/src/data/u3CapstoneData/perf-data/sample_svcg_2012.txt', sep='|', header=None, low_memory=False)
perf2013 = pd.read_csv('~/src/data/u3CapstoneData/perf-data/sample_svcg_2013.txt', sep='|', header=None, low_memory=False)
perf2014 = pd.read_csv('~/src/data/u3CapstoneData/perf-data/sample_svcg_2014.txt', sep='|', header=None, low_memory=False)
perf2015 = pd.read_csv('~/src/data/u3CapstoneData/perf-data/sample_svcg_2015.txt', sep='|', header=None, low_memory=False)
perf2016 = pd.read_csv('~/src/data/u3CapstoneData/perf-data/sample_svcg_2016.txt', sep='|', header=None, low_memory=False)

In [3]:
# Setting header names & apply.
orig_cols = ['creditScore', 'firstPaymentDate', 'firstTimeHomebuyerFlag', 'maturityDate',
                   'metroArea', 'miPercentage', 'numberOfUnits',
                   'occupancyStatus', 'cltvRatio', 'dtiRatio', 'upb',
                   'ltvRatio', 'interestRate', 'channel', 'ppmFlag', 'productType',
                   'propertyState', 'propertyType', 'postalCode', 'lsn',
                   'loanPurpose', 'originalLoanTerm', 'numberOfBorrowers', 'sellerName',
                   'servicerName', 'superConformingFlag'] # 'pre-HarpLoanSequenceNumber'

perf_cols = ['lsn', 'monthlyReportingPeriod', 'currentActualUpb',
                 'dlq', 'loanAge', 'remainMthsToMaturity', 'repurchaseFlag',
                 'modificationFlag', 'zeroBalCode', 'zeroBalEffDate', 'currentIntRate',
                 'curDeferredUpb', 'ddlpi', 'miRecov', 'netSalesProceeds', 'nonMiRecov',
                 'expenses', 'legalCosts', 'maintPreservationCosts', 'taxesInsurance',
                 'miscExpenses', 'actualLossCalc', 'modificationCost',
                 'stepModificationFlag', 'deferredPaymentModification']

orig2012.columns = orig_cols
orig2013.columns = orig_cols
orig2014.columns = orig_cols
orig2015.columns = orig_cols
orig2016.columns = orig_cols

perf2012.columns = perf_cols
perf2013.columns = perf_cols
perf2014.columns = perf_cols
perf2015.columns = perf_cols
perf2016.columns = perf_cols

In [4]:
# Merge origination files.
frames_orig = [orig2012, orig2013, orig2014, orig2015, orig2016]
orig_combined = pd.concat(frames_orig)
print(orig_combined.shape)
orig_combined.head()

(250000, 26)


Unnamed: 0,creditScore,firstPaymentDate,firstTimeHomebuyerFlag,maturityDate,metroArea,miPercentage,numberOfUnits,occupancyStatus,cltvRatio,dtiRatio,...,propertyState,propertyType,postalCode,lsn,loanPurpose,originalLoanTerm,numberOfBorrowers,sellerName,servicerName,superConformingFlag
0,814,201203,9,204202,49420.0,0,1,P,57,36,...,WA,SF,98900,F112Q1000057,C,360,2,Other sellers,Other servicers,
1,745,201204,9,204203,,0,1,S,69,31,...,NH,SF,3200,F112Q1000089,P,360,2,Other sellers,"PNCBANK,NATL",
2,707,201203,9,204202,47260.0,0,1,P,80,36,...,VA,SF,23000,F112Q1000137,C,360,2,Other sellers,Other servicers,
3,712,201203,9,204202,31540.0,0,1,P,59,21,...,WI,SF,53500,F112Q1000154,N,360,2,Other sellers,Other servicers,
4,783,201203,9,204202,21060.0,0,1,I,75,25,...,KY,SF,42700,F112Q1000162,N,360,2,Other sellers,Other servicers,


In [5]:
# Create 'year' column within the merged origination file.
# The loan sequence number 'lsn' is formatted F1YYQnXXXXXX,
# where F1 refers to the product 'Fixed Rate Mortgage',
# YYQn refers to origination year and quarter.

orig_combined['year'] = ['19' + x if x == '99' else '20' + x
                         for x in (orig_combined['lsn'].apply(lambda x: x[2:4]))]
orig_combined.tail()

Unnamed: 0,creditScore,firstPaymentDate,firstTimeHomebuyerFlag,maturityDate,metroArea,miPercentage,numberOfUnits,occupancyStatus,cltvRatio,dtiRatio,...,propertyType,postalCode,lsn,loanPurpose,originalLoanTerm,numberOfBorrowers,sellerName,servicerName,superConformingFlag,year
49995,808,201701,N,204612,21780.0,0,1,P,80,21,...,SF,47700,F116Q4434422,P,360,1,Other sellers,Other servicers,,2016
49996,721,201701,9,204612,35614.0,0,2,P,55,49,...,SF,10400,F116Q4434529,C,360,1,"JPMORGANCHASEBANK,NA","JPMORGANCHASEBANK,NA",,2016
49997,716,201710,9,204709,38060.0,0,1,P,66,35,...,PU,85100,F116Q4434568,N,360,2,Other sellers,Other servicers,,2016
49998,791,201701,9,204612,37964.0,0,1,I,72,39,...,SF,19000,F116Q4434582,P,360,2,Other sellers,Other servicers,,2016
49999,721,201710,9,204709,39340.0,25,1,P,88,40,...,SF,84000,F116Q4434621,N,360,2,Other sellers,Other servicers,,2016
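
As an alternative to the list comprehension above, the year and quarter can be pulled out of the loan sequence number in one vectorized step. This is a sketch on made-up sample values, assuming the `F1YYQn...` layout described in the cell comments; unlike the notebook, it yields integers rather than strings, and the third sample value is invented only to exercise the 1999 branch.

```python
import pandas as pd

# Sample loan sequence numbers (the last one is hypothetical).
lsn = pd.Series(["F112Q1000057", "F116Q4434621", "F199Q1000001"])

# Named groups become the column names of the extracted frame.
parts = lsn.str.extract(r"^F1(?P<yy>\d{2})Q(?P<quarter>[1-4])")
yy = parts["yy"].astype(int)

# Assumption: '99' means 1999; every other two-digit year is in the 2000s.
year = (2000 + yy).where(yy != 99, 1999)
quarter = parts["quarter"].astype(int)

print(pd.DataFrame({"lsn": lsn, "year": year, "quarter": quarter}))
```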


In [6]:
# Merge performance files.
frames_perf = [perf2012, perf2013, perf2014, perf2015, perf2016]
perf_combined = pd.concat(frames_perf)
print(perf_combined.shape)
perf_combined.head()

(8727988, 25)


Unnamed: 0,lsn,monthlyReportingPeriod,currentActualUpb,dlq,loanAge,remainMthsToMaturity,repurchaseFlag,modificationFlag,zeroBalCode,zeroBalEffDate,...,nonMiRecov,expenses,legalCosts,maintPreservationCosts,taxesInsurance,miscExpenses,actualLossCalc,modificationCost,stepModificationFlag,deferredPaymentModification
0,F112Q1000057,201202,103000.0,0,0,360,,,,,...,,,,,,,,,,
1,F112Q1000057,201203,103000.0,0,1,359,,,,,...,,,,,,,,,,
2,F112Q1000057,201204,103000.0,0,2,358,,,,,...,,,,,,,,,,
3,F112Q1000057,201205,102000.0,0,3,357,,,,,...,,,,,,,,,,
4,F112Q1000057,201206,102000.0,0,4,356,,,,,...,,,,,,,,,,


In [7]:
# Check delinquency values. Some labels (e.g. '0') appear twice, which suggests
# mixed representations (str vs. int, or stray whitespace); we will normalize them later.
perf_combined['dlq'].value_counts()

0     7857555
0      830116
1       25190
2        4436
1        2774
3        1793
4        1093
5         873
6         602
7         465
2         427
8         372
9         325
10        263
11        211
12        190
3         153
13        152
14        117
R         101
15         85
16         78
17         67
4          65
18         51
5          43
19         39
20         32
21         29
25         26
       ...   
8          15
27         13
10         12
9          11
28         10
30         10
29          9
31          6
32          6
33          5
36          4
35          4
11          4
34          4
12          3
37          3
13          2
38          2
40          2
41          2
XX          2
39          2
14          1
43          1
42          1
47          1
46          1
44          1
45          1
48          1
Name: dlq, Length: 66, dtype: int64

In [8]:
# Delinquency status values where XX = Unknown, R = REO Acquisition.
# Recode 'R' as 999 and treat 'XX' (unknown) as 0 (not delinquent).
perf_combined['dlq'] = perf_combined['dlq'].replace({'R': 999, 'XX': 0})

In [9]:
# The not-delinquent count ('0') now includes the 2 former 'XX' rows (830116 + 2 = 830118).
perf_combined['dlq'].value_counts()

0      7857555
0       830118
1        25190
2         4436
1         2774
3         1793
4         1093
5          873
6          602
7          465
2          427
8          372
9          325
10         263
11         211
12         190
3          153
13         152
14         117
999        101
15          85
16          78
17          67
4           65
18          51
5           43
19          39
20          32
21          29
25          26
        ...   
7           15
8           15
27          13
10          12
9           11
28          10
30          10
29           9
32           6
31           6
33           5
36           4
35           4
34           4
11           4
37           3
12           3
39           2
41           2
40           2
13           2
38           2
45           1
42           1
14           1
43           1
44           1
47           1
48           1
46           1
Name: dlq, Length: 65, dtype: int64
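
The duplicated labels in the counts above can be collapsed by normalizing every value to an integer in one pass. This is a sketch on a small made-up series, assuming the duplicates come from a mix of integers, numeric strings, and padded strings; it reuses the same recoding as the notebook (`R` to 999, `XX` to 0).

```python
import pandas as pd

# Made-up sample mixing ints, strings, padded strings and the two special codes.
raw = pd.Series([0, "0", " 1", "2", "R", "XX", 3])

cleaned = (
    raw.astype(str)                        # one common type for every value
       .str.strip()                        # drop any surrounding whitespace
       .replace({"R": "999", "XX": "0"})   # same recoding as in the notebook
)
dlq = pd.to_numeric(cleaned).astype("int64")

print(dlq.value_counts().sort_index())
```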

In [10]:
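# Combining based on loan sequence number.
# Note: 'lsn' repeats once per month in perf_combined, so to_dict() keeps only
# the last row listed for each loan, not every monthly status.
orig_combined['is_dlq'] = orig_combined['lsn'].map(
    perf_combined.set_index('lsn')['dlq'].to_dict())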
orig_combined.head()

Unnamed: 0,creditScore,firstPaymentDate,firstTimeHomebuyerFlag,maturityDate,metroArea,miPercentage,numberOfUnits,occupancyStatus,cltvRatio,dtiRatio,...,postalCode,lsn,loanPurpose,originalLoanTerm,numberOfBorrowers,sellerName,servicerName,superConformingFlag,year,is_dlq
0,814,201203,9,204202,49420.0,0,1,P,57,36,...,98900,F112Q1000057,C,360,2,Other sellers,Other servicers,,2012,0
1,745,201204,9,204203,,0,1,S,69,31,...,3200,F112Q1000089,P,360,2,Other sellers,"PNCBANK,NATL",,2012,0
2,707,201203,9,204202,47260.0,0,1,P,80,36,...,23000,F112Q1000137,C,360,2,Other sellers,Other servicers,,2012,0
3,712,201203,9,204202,31540.0,0,1,P,59,21,...,53500,F112Q1000154,N,360,2,Other sellers,Other servicers,,2012,0
4,783,201203,9,204202,21060.0,0,1,I,75,25,...,42700,F112Q1000162,N,360,2,Other sellers,Other servicers,,2012,0
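
Because a dictionary built from a repeated index keeps one value per key, the mapping above reflects a single monthly row per loan. If the goal were instead to flag loans that were delinquent in any month, a per-loan aggregate could be mapped in. The following is a sketch on toy data, not the author's method; the column names `last_dlq` and `ever_dlq` are illustrative.

```python
import pandas as pd

# Toy performance data: loan A is delinquent in one month, then cures.
perf = pd.DataFrame({
    "lsn": ["A", "A", "A", "B", "B"],
    "dlq": [0, 2, 0, 0, 0],
})
orig = pd.DataFrame({"lsn": ["A", "B", "C"]})

# Last reported status per loan (equivalent to the dict-based mapping).
last_status = perf.groupby("lsn")["dlq"].last()
# Worst status reported in any month per loan.
worst_status = perf.groupby("lsn")["dlq"].max()

orig["last_dlq"] = orig["lsn"].map(last_status)
# Loans without performance rows (here 'C') are assumed not delinquent.
orig["ever_dlq"] = orig["lsn"].map(worst_status).fillna(0).gt(0).astype(int)

print(orig)
```

In this toy example loan A has `last_dlq` of 0 but `ever_dlq` of 1, which shows how the two definitions can disagree.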


In [11]:
# Renaming dataframe.
merged_df = orig_combined

In [12]:
len(merged_df)

250000

In [13]:
merged_df.columns

Index(['creditScore', 'firstPaymentDate', 'firstTimeHomebuyerFlag',
       'maturityDate', 'metroArea', 'miPercentage', 'numberOfUnits',
       'occupancyStatus', 'cltvRatio', 'dtiRatio', 'upb', 'ltvRatio',
       'interestRate', 'channel', 'ppmFlag', 'productType', 'propertyState',
       'propertyType', 'postalCode', 'lsn', 'loanPurpose', 'originalLoanTerm',
       'numberOfBorrowers', 'sellerName', 'servicerName',
       'superConformingFlag', 'year', 'is_dlq'],
      dtype='object')

In [16]:
# Second check for nulls.
print('nulls before:', merged_df['is_dlq'].isnull().sum())

# Fill NaNs.
merged_df['is_dlq'] = merged_df['is_dlq'].fillna(0)
print('nulls after:', merged_df['is_dlq'].isnull().sum())

nulls before: 4
nulls after: 0


In [17]:
# Let's come back to the whitespace.
# Per the Freddie Mac manual on delinquency status:
#    0 = Current, or less than 30 days past due.
#    Space (3) = Unavailable
#    XX = Unknown

merged_df.is_dlq.value_counts()

0      197881
0       49480
1        1136
2         386
1         290
3         188
2         142
4          65
999        60
3          55
5          52
6          40
7          31
8          20
10         19
11         17
9          17
12         13
5          11
13         11
4          11
16          9
14          9
6           6
19          6
20          5
25          5
17          5
18          4
15          4
7           3
10          2
8           2
29          2
21          2
27          2
12          1
14          1
37          1
41          1
30          1
32          1
48          1
33          1
26          1
Name: is_dlq, dtype: int64

In [18]:
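# We will assume 'unavailable' loan delinquency status as not-delinquent.

# Change dtype for 'is_dlq'. Casting to int64 also merges the duplicate
# labels above (e.g. 197881 + 49480 = 247361 for '0').
merged_df['is_dlq'] = merged_df['is_dlq'].astype('int64')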
merged_df.is_dlq.value_counts()

0      247361
1        1426
2         528
3         243
4          76
5          63
999        60
6          46
7          34
8          22
10         21
9          17
11         17
12         14
13         11
14         10
16          9
19          6
17          5
20          5
25          5
15          4
18          4
27          2
21          2
29          2
48          1
26          1
30          1
32          1
33          1
37          1
41          1
Name: is_dlq, dtype: int64

In [19]:
# Creating the binary feature for delinquent loan. 
merged_df['is_dlq'] = merged_df['is_dlq'].apply(lambda x: 1 if x >= 1 else 0)
merged_df.is_dlq.value_counts()

0    247361
1      2639
Name: is_dlq, dtype: int64
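
The counts above imply a heavily imbalanced target. As a quick worked check, using only the two counts printed in the output:

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series({0: 247361, 1: 2639}, name="is_dlq")

# Share of loans flagged delinquent.
rate = counts.loc[1] / counts.sum()
print(f"{rate:.2%}")  # prints 1.06%
```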

In [20]:
len(merged_df)

250000

In [21]:
# Renaming dataframe.
df1 = merged_df
print(df1.columns)
df1.tail()

Index(['creditScore', 'firstPaymentDate', 'firstTimeHomebuyerFlag',
       'maturityDate', 'metroArea', 'miPercentage', 'numberOfUnits',
       'occupancyStatus', 'cltvRatio', 'dtiRatio', 'upb', 'ltvRatio',
       'interestRate', 'channel', 'ppmFlag', 'productType', 'propertyState',
       'propertyType', 'postalCode', 'lsn', 'loanPurpose', 'originalLoanTerm',
       'numberOfBorrowers', 'sellerName', 'servicerName',
       'superConformingFlag', 'year', 'is_dlq'],
      dtype='object')


Unnamed: 0,creditScore,firstPaymentDate,firstTimeHomebuyerFlag,maturityDate,metroArea,miPercentage,numberOfUnits,occupancyStatus,cltvRatio,dtiRatio,...,postalCode,lsn,loanPurpose,originalLoanTerm,numberOfBorrowers,sellerName,servicerName,superConformingFlag,year,is_dlq
49995,808,201701,N,204612,21780.0,0,1,P,80,21,...,47700,F116Q4434422,P,360,1,Other sellers,Other servicers,,2016,0
49996,721,201701,9,204612,35614.0,0,2,P,55,49,...,10400,F116Q4434529,C,360,1,"JPMORGANCHASEBANK,NA","JPMORGANCHASEBANK,NA",,2016,0
49997,716,201710,9,204709,38060.0,0,1,P,66,35,...,85100,F116Q4434568,N,360,2,Other sellers,Other servicers,,2016,0
49998,791,201701,9,204612,37964.0,0,1,I,72,39,...,19000,F116Q4434582,P,360,2,Other sellers,Other servicers,,2016,0
49999,721,201710,9,204709,39340.0,25,1,P,88,40,...,84000,F116Q4434621,N,360,2,Other sellers,Other servicers,,2016,0


In [22]:
# Check for nulls.
df1.isnull().sum()

creditScore                    0
firstPaymentDate               0
firstTimeHomebuyerFlag         0
maturityDate                   0
metroArea                  26492
miPercentage                   0
numberOfUnits                  0
occupancyStatus                0
cltvRatio                      0
dtiRatio                       0
upb                            0
ltvRatio                       0
interestRate                   0
channel                        0
ppmFlag                       95
productType                    0
propertyState                  0
propertyType                   0
postalCode                     0
lsn                            0
loanPurpose                    0
originalLoanTerm               0
numberOfBorrowers              0
sellerName                     0
servicerName                   0
superConformingFlag       239659
year                           0
is_dlq                         0
dtype: int64

In [26]:
df1.ppmFlag.value_counts()

N    249905
Name: ppmFlag, dtype: int64

In [27]:
df1.superConformingFlag.value_counts()

Y    10341
Name: superConformingFlag, dtype: int64

In [28]:
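# Fill remaining nulls.
df1['metroArea'] = df1['metroArea'].fillna(0)
# Note: every non-null ppmFlag above is 'N'; filling the 95 nulls with 'Y' is an assumption.
df1['ppmFlag'] = df1['ppmFlag'].fillna('Y')
# superConformingFlag is only ever 'Y' when present, so nulls are treated as 'N'.
df1['superConformingFlag'] = df1['superConformingFlag'].fillna('N')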

df1.isnull().sum()

creditScore               0
firstPaymentDate          0
firstTimeHomebuyerFlag    0
maturityDate              0
metroArea                 0
miPercentage              0
numberOfUnits             0
occupancyStatus           0
cltvRatio                 0
dtiRatio                  0
upb                       0
ltvRatio                  0
interestRate              0
channel                   0
ppmFlag                   0
productType               0
propertyState             0
propertyType              0
postalCode                0
lsn                       0
loanPurpose               0
originalLoanTerm          0
numberOfBorrowers         0
sellerName                0
servicerName              0
superConformingFlag       0
year                      0
is_dlq                    0
dtype: int64

In [43]:
df1.productType.value_counts()

FRM    250000
Name: productType, dtype: int64

In [44]:
# Removing the constant 'productType' column and redundant geographical columns.
df1 = df1.drop(['productType', 'metroArea', 'postalCode'], axis=1)
df1.columns

Index(['creditScore', 'firstPaymentDate', 'firstTimeHomebuyerFlag',
       'maturityDate', 'miPercentage', 'numberOfUnits', 'occupancyStatus',
       'cltvRatio', 'dtiRatio', 'upb', 'ltvRatio', 'interestRate', 'channel',
       'ppmFlag', 'propertyState', 'propertyType', 'lsn', 'loanPurpose',
       'originalLoanTerm', 'numberOfBorrowers', 'sellerName', 'servicerName',
       'superConformingFlag', 'year', 'is_dlq'],
      dtype='object')

In [48]:
# Renaming dataframe.
df2 = df1

In [49]:
# Reordering columns, assigning variables for each column.
lsn_var = df2['lsn']
is_dlq_var = df2['is_dlq']
creditScore_var = df2['creditScore']
interestRate_var = df2['interestRate']
ltvRatio_var = df2['ltvRatio']
dtiRatio_var = df2['dtiRatio']
cltvRatio_var = df2['cltvRatio']
upb_var = df2['upb']
miPercentage_var = df2['miPercentage']
loanPurpose_var = df2['loanPurpose']
numberOfUnits_var = df2['numberOfUnits']
occupancyStatus_var = df2['occupancyStatus']
numberOfBorrowers_var = df2['numberOfBorrowers']
firstTimeHomebuyerFlag_var = df2['firstTimeHomebuyerFlag']
superConformingFlag_var = df2['superConformingFlag']
ppmFlag_var = df2['ppmFlag']
propertyState_var = df2['propertyState']
propertyType_var = df2['propertyType']
channel_var = df2['channel']
sellerName_var = df2['sellerName']
servicerName_var = df2['servicerName']
originalLoanTerm_var = df2['originalLoanTerm']
maturityDate_var = df2['maturityDate']
firstPaymentDate_var = df2['firstPaymentDate']
year_var = df2['year']

In [51]:
# Remove columns then add back in specified order.
# Note: 'propertyType' is dropped here but not re-inserted below,
# so the result has 24 columns.
df2.drop(['lsn', 'is_dlq', 'creditScore', 'interestRate', 'ltvRatio',
          'dtiRatio', 'cltvRatio', 'upb', 'miPercentage', 'loanPurpose',
          'numberOfUnits', 'occupancyStatus', 'numberOfBorrowers',
          'firstTimeHomebuyerFlag', 'superConformingFlag', 'ppmFlag',
          'propertyState', 'propertyType', 'channel', 'sellerName',
          'servicerName', 'originalLoanTerm', 'maturityDate',
          'firstPaymentDate', 'year'], axis=1, inplace=True)

# Reorder & rename.
df2.insert(0, 'lsn', lsn_var)
df2.insert(1, 'delinquent', is_dlq_var)
df2.insert(2, 'credit_score', creditScore_var)
df2.insert(3, 'int_rate', interestRate_var)
df2.insert(4, 'ltv_ratio', ltvRatio_var)
df2.insert(5, 'dti_ratio', dtiRatio_var)
df2.insert(6, 'cltv_ratio', cltvRatio_var)
df2.insert(7, 'unpaid_princ_bal', upb_var)
df2.insert(8, 'mortgage_insurance_pctg', miPercentage_var)
df2.insert(9, 'loan_purpose', loanPurpose_var)
df2.insert(10, 'no_of_units', numberOfUnits_var)
df2.insert(11, 'occupancy_status', occupancyStatus_var)
df2.insert(12, 'no_of_borrowers', numberOfBorrowers_var)
df2.insert(13, 'first_home_flag', firstTimeHomebuyerFlag_var)
df2.insert(14, 'super_conform_flag', superConformingFlag_var)
df2.insert(15, 'ppm_flag', ppmFlag_var)
df2.insert(16, 'state', propertyState_var)
df2.insert(17, 'channel', channel_var)
df2.insert(18, 'seller', sellerName_var)
df2.insert(19, 'servicer', servicerName_var)
df2.insert(20, 'loan_term', originalLoanTerm_var)
df2.insert(21, 'maturity_date', maturityDate_var)
df2.insert(22, 'first_pmt_date', firstPaymentDate_var)
df2.insert(23, 'year', year_var)

# View.
print(df2.columns)
print(df2.shape)

Index(['lsn', 'delinquent', 'credit_score', 'int_rate', 'ltv_ratio',
       'dti_ratio', 'cltv_ratio', 'unpaid_princ_bal',
       'mortgage_insurance_pctg', 'loan_purpose', 'no_of_units',
       'occupancy_status', 'no_of_borrowers', 'first_home_flag',
       'super_conform_flag', 'ppm_flag', 'state', 'channel', 'seller',
       'servicer', 'loan_term', 'maturity_date', 'first_pmt_date', 'year'],
      dtype='object')
(250000, 24)


Unnamed: 0,lsn,delinquent,credit_score,int_rate,ltv_ratio,dti_ratio,cltv_ratio,unpaid_princ_bal,mortgage_insurance_pctg,loan_purpose,...,super_conform_flag,ppm_flag,state,channel,seller,servicer,loan_term,maturity_date,first_pmt_date,year
0,F112Q1000057,0,814,4.0,57,36,57,103000,0,C,...,N,N,WA,R,Other sellers,Other servicers,360,204202,201203,2012
1,F112Q1000089,0,745,4.0,69,31,69,417000,0,P,...,N,N,NH,R,Other sellers,"PNCBANK,NATL",360,204203,201204,2012
2,F112Q1000137,0,707,4.5,80,36,80,146000,0,C,...,N,N,VA,R,Other sellers,Other servicers,360,204202,201203,2012
3,F112Q1000154,0,712,4.0,59,21,59,381000,0,N,...,N,N,WI,R,Other sellers,Other servicers,360,204202,201203,2012
4,F112Q1000162,0,783,4.25,75,25,75,83000,0,N,...,N,N,KY,R,Other sellers,Other servicers,360,204202,201203,2012
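
The drop-and-insert sequence above can also be expressed as a single rename followed by a column selection. This is a sketch on a four-column toy frame, assuming a dictionary whose key order defines the target column order; the name `new_names` is illustrative and only a subset of the notebook's columns is shown.

```python
import pandas as pd

# Toy frame with a few of the original column names.
df_demo = pd.DataFrame({
    "creditScore": [814, 745],
    "lsn": ["F112Q1000057", "F112Q1000089"],
    "interestRate": [4.0, 4.0],
    "is_dlq": [0, 0],
})

# Old name -> new name; the dict order is the desired column order.
new_names = {
    "lsn": "lsn",
    "is_dlq": "delinquent",
    "creditScore": "credit_score",
    "interestRate": "int_rate",
}

# Rename first, then select the new names in order (unlisted columns are dropped).
tidy = df_demo.rename(columns=new_names)[list(new_names.values())]
print(tidy.columns.tolist())
```

Any column left out of the mapping is dropped explicitly by the selection, which makes omissions easier to spot than in a long series of `insert` calls.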


# Exploratory Data Analysis

In [None]:
# Rename for EDA.
df = df2
df2.head()