# Uncertainty in Credit Scoring (Bayesian Deep Learning)

## Introduction 

Credit scoring is a standard problem in the machine learning literature and in practice. Predicting whether a loan applicant will default is a textbook classification problem and a common testbed for comparing machine learning techniques, which over the years have become very accurate at making this binary prediction.

However, credit scoring can also be framed as a regression problem. Instead of the probability of default, as in the classification case, the target is the profit rate: the lender's earnings from a loan, expressed as a percentage of the amount loaned. The motivation is that even if a borrower fails to repay the entire loan, the lender can still make money, and may still prefer this opportunity to another one if its profit rate is higher than that of an alternative investment with the same level of risk.

The Lending Club dataset of loans from 2007-2015 (introduced properly in a separate section) offers a good example of such a situation. Because Lending Club offers peer-to-peer lending, we can think of each loan application as an individual investment opportunity, unlike bank loans, where decisions are driven by more general risk management strategies. Hence, we can treat credit scoring, and the subsequent decision to grant or reject a loan, as a regression problem in which the information provided by applicants is used to predict the profit rate. The goal is to avoid "bad" loans, that is, loans on which lenders lose money.

If we treat a loan like other securities, such as bonds, commodities, or currencies, we might want to assess its risk in a similar way. For tradable assets, we can follow daily price changes and estimate the expected return, volatility, skewness, and kurtosis of the return distribution, which then inform portfolio decisions. For loans, we don't have such high-frequency data. There is usually some credit history, but a handful of data points does not support reasonable assumptions about future movements. Hence, one solution would be to just use the predicted expected return, or to fall back on a classification approach.

One way around this limitation is to estimate the model with Bayesian methods. Unlike in the more traditional, deterministic approach, the prediction is not a point estimate. Instead, by using approximation techniques in the estimation (Monte Carlo simulation, among others), the prediction is (an approximation of) a probability distribution. These methods have been known for a long time, but were usually overlooked because of their very high computational cost. However, with recent advances in statistical theory and increases in computing power, methods have been developed that overcome these difficulties. As usual, there is no free lunch: these newer methods have their own weaknesses, which are discussed later in this article. We made use of these methods to predict the distribution of the profit rate for each loan application. From the predicted distribution, it is straightforward to calculate its summary measures (mean, median, mode, variance, skewness, kurtosis).
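To make the idea of shape measures concrete, here is a minimal sketch of how they could be computed once a model returns a set of Monte Carlo draws for a single loan. The function name `shape_measures` and the simulated draws are hypothetical and only stand in for real model output; the mode is left out because estimating it from continuous samples requires a density estimate.

```python
import random
import statistics


def shape_measures(samples):
    """Summarise a sample from a predictive distribution by its moments."""
    n = len(samples)
    mean = statistics.fmean(samples)
    # Central moments in population form (dividing by n).
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return {
        "mean": mean,
        "median": statistics.median(samples),
        "variance": m2,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # equals 3 for a normal distribution
    }


# Hypothetical stand-in for Monte Carlo draws of one loan's profit rate:
# mostly a modest positive return, occasionally a large loss.
random.seed(0)
draws = [
    random.gauss(-0.40, 0.15) if random.random() < 0.1 else random.gauss(0.08, 0.02)
    for _ in range(10_000)
]

measures = shape_measures(draws)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

With draws like these, the occasional large loss pulls the mean below the median and produces negative skewness and a kurtosis above 3, which is the kind of asymmetry a point estimate cannot show.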

With these measures, we can finally compare loans with other investment opportunities. One of the most traditional ways to evaluate an investment opportunity is the Sharpe Ratio. Introduced by William F. Sharpe in 1966 under the name "reward-to-variability" ratio (Sharpe, W.F. 1966), it scales the expected return in excess of the risk-free rate by the standard deviation of returns. We could therefore take the predicted mean profit rate of a loan application and divide it by the standard deviation of the predicted distribution. Of two loans with the same profit rate but different risk (standard deviation), we would prefer the one with the lower standard deviation, in other words the one with the higher Sharpe Ratio.

It should be noted, however, that the Sharpe Ratio assumes normally distributed returns. This is a standard assumption in finance: it means that different stocks, for example, have normally distributed returns that differ only in mean and variance, while skewness and kurtosis are the same (a normal distribution is symmetrical, with a skewness of zero, and has a kurtosis, which measures the fatness of the tails, of 3). Indeed, the Jarque-Bera test, a statistical test for normality of a random variable, is constructed from skewness and kurtosis estimates.

In our case, we find the assumption of normally distributed returns unnecessarily strong. We would expect some loan applicants to have a higher probability of default (hence a fatter tail of the distribution) and the distributions not to be symmetrical. Mean and variance therefore do not sufficiently describe the distribution, and we need a measure that includes skewness and kurtosis. Pezi (1996) proposes an adjustment that penalizes the Sharpe Ratio for negative skewness and excess kurtosis. This suits our case, as we expect individual distributions to have a longer left tail than right tail. The Adjusted Sharpe Ratio is explained in more detail later in this article.
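As a rough illustration of the comparison described above, the sketch below computes a Sharpe Ratio from a predicted mean and standard deviation, and then applies a skewness and kurtosis adjustment. The adjustment uses a commonly quoted form, ASR = SR · [1 + (S/6) · SR − ((K − 3)/24) · SR²], which is an assumption here; the definition given later in this article takes precedence. The function names and the example numbers are hypothetical.

```python
def sharpe_ratio(mean_return, std_dev, risk_free_rate=0.0):
    """Excess expected return per unit of standard deviation."""
    return (mean_return - risk_free_rate) / std_dev


def adjusted_sharpe_ratio(sr, skewness, kurtosis):
    """Penalise a Sharpe Ratio for negative skewness and excess kurtosis.

    Assumed form: SR * (1 + (S / 6) * SR - ((K - 3) / 24) * SR**2),
    where K is the raw kurtosis (3 for a normal distribution).
    """
    return sr * (1 + (skewness / 6) * sr - ((kurtosis - 3) / 24) * sr ** 2)


# Two hypothetical loans with the same predicted mean profit rate
# but different predicted standard deviations.
sr_a = sharpe_ratio(mean_return=0.08, std_dev=0.04)
sr_b = sharpe_ratio(mean_return=0.08, std_dev=0.10)
print(sr_a > sr_b)  # the less risky loan has the higher Sharpe Ratio

# Under normality (skewness 0, kurtosis 3) the adjustment changes nothing;
# a longer left tail and fatter tails reduce the ratio.
asr_normal = adjusted_sharpe_ratio(sr_a, skewness=0.0, kurtosis=3.0)
asr_skewed = adjusted_sharpe_ratio(sr_a, skewness=-1.0, kurtosis=5.0)
print(asr_normal, asr_skewed)
```

The point of the adjustment is visible in the last two lines: two loans with identical mean and standard deviation can rank differently once the shape of the predicted distribution is taken into account.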

The structure of this article is as follows:
First, we explain the theory behind Bayesian neural networks, including different approaches to them. Then we move to the application by introducing the Lending Club dataset. Finally, we present a novel method to evaluate the results and compare the models. Throughout the blog post, relevant chunks of code are included and briefly explained.