mortgage-risk-classification

Contains the notebooks for the first of two projects I undertook during my summer internship at Andrew Davidson & Co. Working under the supervision of AD-Co's Behavior Modeling team, I analyzed Freddie Mac's single-family loan-level datasets, focusing on the years following the subprime mortgage crisis. I implemented a random decision forest in scikit-learn to classify month-to-month delinquency and turnover risk in mortgages.
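
A minimal sketch of the month-to-month framing, assuming a loan-level monthly performance table. The column names (`loan_id`, `month`, `credit_score`, `rate_spread`, `status`) and the toy rows are hypothetical, not Freddie Mac's actual field names or data. Each loan-month's features are paired with the same loan's status in the following month, and a scikit-learn `RandomForestClassifier` is fit on those pairs.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical monthly performance panel: one row per loan per month.
# Column names and values are illustrative, not Freddie Mac's real fields.
perf = pd.DataFrame({
    "loan_id":      [1, 1, 1, 2, 2, 2, 3, 3],
    "month":        [1, 2, 3, 1, 2, 3, 1, 2],
    "credit_score": [760, 760, 760, 620, 620, 620, 700, 700],
    "rate_spread":  [0.1, 0.3, 0.9, -0.2, -0.2, -0.1, 0.0, 1.2],
    "status":       ["current", "current", "prepaid",
                     "current", "delinquent", "delinquent",
                     "current", "prepaid"],
})


def make_next_month_targets(panel: pd.DataFrame) -> pd.DataFrame:
    """Pair each loan-month's features with that loan's status one month later."""
    panel = panel.sort_values(["loan_id", "month"]).copy()
    # shift(-1) within each loan pulls the following month's status up one row
    panel["next_status"] = panel.groupby("loan_id")["status"].shift(-1)
    # The last observed month of each loan has no successor, so drop it
    return panel.dropna(subset=["next_status"])


train = make_next_month_targets(perf)
features = ["credit_score", "rate_spread"]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(train[features], train["next_status"])

# One row of class probabilities per loan-month, one column per outcome
print(model.classes_)
print(model.predict_proba(train[features]))
```

Framing the target as "status next month" keeps every row a single, time-ordered observation, so the same loan contributes one training example per month it is observed.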

Some Financial Background:

Companies that buy and sell mortgage-backed securities have a considerable interest (get it?) in evaluating the risk of these assets, each of which may pool a myriad of individual home loans. The two main categories of mortgage risk are (i) delinquency, when the borrower stops paying (credit risk), and (ii) early termination, when the borrower pays off the mortgage earlier than expected (prepayment risk), thus cutting off the lender's source of fixed income, viz. interest payments. The latter typically takes one of two forms: refinance (where the borrower takes out a new mortgage on more favorable terms to pay off the first) and turnover (where the borrower moves and sells the house, paying off the mortgage with the proceeds).
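
The taxonomy above can be written as a small labeling function. This is an illustrative sketch: the inputs `days_past_due` and `payoff_reason` are hypothetical, the 30-day delinquency cutoff is an assumption, and the raw loan-level data may not record directly whether a payoff came from a refinance or a home sale.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    CURRENT = "current"
    DELINQUENT = "delinquent"  # (i) the borrower has stopped paying
    REFINANCE = "refinance"    # (ii) early termination via a new loan
    TURNOVER = "turnover"      # (ii) early termination via a home sale


def classify_month(days_past_due: int, payoff_reason: Optional[str]) -> Outcome:
    """Map one hypothetical loan-month record to a risk outcome."""
    # Early termination takes precedence: a paid-off loan is no longer delinquent
    if payoff_reason == "refinance":
        return Outcome.REFINANCE
    if payoff_reason == "sale":
        return Outcome.TURNOVER
    if payoff_reason is not None:
        raise ValueError(f"unknown payoff reason: {payoff_reason!r}")
    # Assumed cutoff: 30 or more days past due counts as delinquent
    if days_past_due >= 30:
        return Outcome.DELINQUENT
    return Outcome.CURRENT


print(classify_month(45, None))    # Outcome.DELINQUENT
print(classify_month(0, "sale"))   # Outcome.TURNOVER
```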

In general, refinance is the easiest of these risks to predict: assuming people behave at least semi-rationally, they'll refinance when interest rates drop far enough below the rate on their existing loan. [This is also the source of some seemingly backwards terminology in mortgage pricing: a 'premium' loan carries a rate above the current market rate, so it is worth more than par to the holder but is also the most likely to be refinanced away, while a 'discount' loan carries a below-market rate and is unlikely to prepay.] Delinquency is somewhat harder to predict, but credit score, which borrowers must disclose for loans sold to the government-sponsored agencies, serves as a rough estimator of the risk of borrower default. Turnover, however, is rather difficult: the decision to move to a new house is a complicated one that can arise from any number of factors, few of which are immediately apparent in the data.
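
To make the refinance logic concrete, the sketch below computes the standard level payment on a fixed-rate, fully amortizing loan, M = P·r / (1 − (1 + r)^(−n)) with r the monthly rate and n the number of payments, and flags a loan as likely to refinance when its rate exceeds the current market rate by some margin. The function names and the 0.5-point threshold are hypothetical, and the calculation ignores closing costs and the fact that a real refinance would be on the remaining balance rather than the original principal.

```python
def monthly_payment(principal: float, annual_rate_pct: float, n_months: int) -> float:
    """Level payment on a fully amortizing fixed-rate loan."""
    r = annual_rate_pct / 100 / 12  # monthly interest rate as a decimal
    if r == 0:
        return principal / n_months  # zero-rate loan: principal only
    return principal * r / (1 - (1 + r) ** -n_months)


def refi_incentive(note_rate_pct: float, market_rate_pct: float) -> float:
    """Percentage points by which the loan's rate exceeds today's market rate."""
    return note_rate_pct - market_rate_pct


def likely_to_refinance(note_rate_pct: float, market_rate_pct: float,
                        threshold_pct: float = 0.5) -> bool:
    # threshold_pct is a hypothetical cutoff, not a calibrated value
    return refi_incentive(note_rate_pct, market_rate_pct) >= threshold_pct


old = monthly_payment(200_000, 6.5, 360)
new = monthly_payment(200_000, 5.75, 360)
print(f"monthly savings from refinancing: ${old - new:,.2f}")
print(likely_to_refinance(6.5, 5.75))  # True: 0.75 points in the money
print(likely_to_refinance(4.0, 5.75))  # False: market rates are higher
```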

The Approach:

I chose to work with a random decision forest classifier (RF) for several reasons. First, AD-Co's proprietary loan risk model is built on logistic regression, and an RF algorithm is different enough to offer an alternative perspective on the problem. Second, I wanted to experiment with adding new, unconventional predictors to the model, and RFs are generally resilient to extra variables (e.g., multicollinearity is not a major hazard for prediction). An RF makes few, if any, statistical assumptions about the dataset and can handle 'raw' variables. Logistic regression, on the other hand, assumes that the log-odds of the outcome are linear in the predictors; for classification, this means the decision boundary is a hyperplane in feature space. Getting the most out of a logistic regression therefore requires careful feature engineering (transformations, interaction terms, and so on). For a novice in the domain of mortgages like me, the RF model makes for a gentler introduction. The RF as implemented in scikit-learn also comes with a feature importance metric, useful for picking up insight. Lastly, on the level of intuition, I suspect that the behavior that goes into default, turnover, and refinancing resembles a decision tree on some cognitive level. Perhaps this means that a decision forest is well-suited to this type of problem.
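
The difference in decision boundaries can be illustrated on synthetic data. In this sketch (not mortgage data; the features `x1`, `x2`, and `noise` are made up), the label depends on an interaction between two features that no single hyperplane can separate, so a logistic regression on the raw features does little better than chance while a random forest learns the pattern without any feature engineering. The forest's `feature_importances_` attribute is impurity-based (scikit-learn's documentation notes it can be biased toward high-cardinality features; `sklearn.inspection.permutation_importance` is an alternative), and here it should rank the pure-noise column last.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
x1 = rng.uniform(-1, 1, n)
x2 = rng.uniform(-1, 1, n)
noise = rng.uniform(-1, 1, n)  # carries no information about the label
X = np.column_stack([x1, x2, noise])

# XOR-style label: positive when x1 and x2 share a sign.
# No straight line in (x1, x2) space separates the two classes.
y = (x1 * x2 > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

logit = LogisticRegression().fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

lr_acc = logit.score(X_test, y_test)
rf_acc = forest.score(X_test, y_test)
print(f"logistic regression accuracy: {lr_acc:.2f}")
print(f"random forest accuracy:       {rf_acc:.2f}")

# Impurity-based importances, normalized to sum to 1
for name, imp in zip(["x1", "x2", "noise"], forest.feature_importances_):
    print(f"{name}: {imp:.3f}")
```

Adding the product `x1 * x2` as an extra column would let the logistic regression separate the classes too, which is the kind of feature engineering a linear model depends on.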

chathasphere/mortgage-risk-classification

Folders and files

Latest commit

History

Repository files navigation

mortgage-risk-classification

About

Resources

Stars

Watchers

Forks

Languages