# $\underline{\textbf{Credit Risk}}$

Credit risk is the possibility that a borrower will fail to repay a loan or otherwise meet their contractual debt obligations.

**Key Points**


*   Borrower Defaults: the borrower may stop making the required payments.
*   Lenders / Investors: the risk mainly concerns banks, bondholders, and anyone else who extends credit.
*   Impact: if the borrower defaults, the lender can lose part or all of the amount owed.


Mathematically, credit risk is typically evaluated in terms of expected loss (EL), which incorporates:

1.   Probability of Default (PD)
2.   Loss Given Default (LGD)
3.   Exposure at Default (EAD)


| Term    | Description |
|----------|-----|
| PD (Probability of Default)    | The likelihood that the borrower will default over a given time horizon (e.g., 1 year). It ranges from 0 to 1.  | 
| LGD (Loss Given Default)     | The proportion of the exposure the lender expects to lose, after recoveries, if the borrower defaults. Typically expressed as a percentage of EAD.  |
| EAD (Exposure at Default)  | The total amount the lender is exposed to at the time of default, typically the outstanding principal plus accrued interest and fees.  |


Expected loss is then computed as:

$$
EL = PD \times LGD \times EAD
$$

where $EL$ is the expected loss.
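The formula maps directly onto code. The sketch below is a minimal illustration using hypothetical loan figures: it builds EAD from the components listed in the table (principal, accrued interest, fees), checks that PD and LGD lie between 0 and 1, and returns EL. The function names are illustrative and not from any library.

```python
def exposure_at_default(principal: float, accrued_interest: float, fees: float) -> float:
    """Simplified EAD: the sum of the components listed in the table."""
    return principal + accrued_interest + fees


def expected_loss(prob_default: float, lgd: float, ead: float) -> float:
    """EL = PD x LGD x EAD, with basic sanity checks on the inputs."""
    if not 0.0 <= prob_default <= 1.0:
        raise ValueError("PD must be between 0 and 1")
    if not 0.0 <= lgd <= 1.0:
        raise ValueError("LGD must be between 0 and 1")
    if ead < 0:
        raise ValueError("EAD must be non-negative")
    return prob_default * lgd * ead


# Hypothetical loan: R50,000 principal, R1,500 accrued interest, R500 fees
ead = exposure_at_default(50_000, 1_500, 500)
el = expected_loss(prob_default=0.02, lgd=0.45, ead=ead)
print(f"EAD = R{ead:,.2f}, EL = R{el:,.2f}")
```

The range checks are a design choice: PD and LGD are proportions, so values outside [0, 1] almost certainly signal a units mistake (for example, passing 5 instead of 0.05).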

-----------------

**Example 1:**

----------

Suppose a bank lends R100,000 to a company, and the bank estimates:

PD = 5% (0.05)

LGD = 60% (0.60)

EAD = R100,000
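Plugging these values into the expected-loss formula:

$$
EL = 0.05 \times 0.60 \times 100{,}000 = 3{,}000
$$

So, on average, the bank expects to lose R3,000 on this loan over the PD time horizon.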


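In [1]:
def expected_loss(PD, LGD, EAD):
    """Return the expected loss EL = PD x LGD x EAD."""
    return PD * LGD * EAD


PD = 0.05        # probability of default (5%)
LGD = 0.60       # loss given default (60%)
EAD = 100000.0   # exposure at default (R100,000)

EL = expected_loss(PD, LGD, EAD)
print(f"Bank expects an average loss of R{EL} from this loan due to credit risk")

Bank expects an average loss of R3000.0 from this loan due to credit risk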
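Because EL is defined per loan, the same formula extends to a portfolio: apply it to each loan and sum the results (expected values add). A minimal sketch with pandas, assuming three hypothetical loans; the column names and figures are illustrative only.

```python
import pandas as pd

# Hypothetical three-loan portfolio (illustrative values)
portfolio = pd.DataFrame({
    "loan_id": ["A", "B", "C"],
    "PD": [0.05, 0.02, 0.10],
    "LGD": [0.60, 0.45, 0.75],
    "EAD": [100_000.0, 250_000.0, 40_000.0],
})

# Vectorised EL = PD x LGD x EAD, computed for every loan at once
portfolio["EL"] = portfolio["PD"] * portfolio["LGD"] * portfolio["EAD"]

# Portfolio EL is the sum of the loan-level ELs
total_el = portfolio["EL"].sum()

print(portfolio)
print(f"Total expected loss: R{total_el:,.2f}")
```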


--------

**Example 2: Credit Risk with Machine Learning Application**

-----

In [3]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

In [4]:
# 1. Simulate dataset
np.random.seed(42)
n_samples = 10

data = pd.DataFrame({
    'credit_score': np.random.randint(300, 850, size=n_samples),
    'loan_amount': np.random.randint(1000, 50000, size=n_samples),
    'income': np.random.randint(20000, 150000, size=n_samples),
    'loan_term_months': np.random.choice([12, 24, 36, 48, 60], size=n_samples),
})

data

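|   | credit_score | loan_amount | income | loan_term_months |
|---|---|---|---|---|
| 0 | 402 | 22962 | 124724 | 24 |
| 1 | 735 | 48191 | 103104 | 48 |
| 2 | 570 | 45131 | 73707 | 48 |
| 3 | 406 | 17023 | 105305 | 36 |
| 4 | 371 | 42090 | 48693 | 48 |
| 5 | 320 | 2685 | 91932 | 48 |
| 6 | 421 | 1769 | 147723 | 12 |
| 7 | 766 | 3433 | 113016 | 36 |
| 8 | 514 | 6311 | 126970 | 60 |
| 9 | 630 | 38819 | 45658 | 36 |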


In [5]:
# 2. Create the target variable: 1 = default, 0 = no default
# Labelling rule: flag a default if ANY of these holds:
#   credit score below 600, income below 40,000, or loan amount above 40,000
# (lower credit score or income, or a larger loan, means higher risk)
data['default'] = (
    (data['credit_score'] < 600)
    | (data['income'] < 40000)
    | (data['loan_amount'] > 40000)
).astype(int)

data

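|   | credit_score | loan_amount | income | loan_term_months | default |
|---|---|---|---|---|---|
| 0 | 402 | 22962 | 124724 | 24 | 1 |
| 1 | 735 | 48191 | 103104 | 48 | 1 |
| 2 | 570 | 45131 | 73707 | 48 | 1 |
| 3 | 406 | 17023 | 105305 | 36 | 1 |
| 4 | 371 | 42090 | 48693 | 48 | 1 |
| 5 | 320 | 2685 | 91932 | 48 | 1 |
| 6 | 421 | 1769 | 147723 | 12 | 1 |
| 7 | 766 | 3433 | 113016 | 36 | 0 |
| 8 | 514 | 6311 | 126970 | 60 | 1 |
| 9 | 630 | 38819 | 45658 | 36 | 0 |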


In [6]:
# 3. Split into training/testing
X = data[['credit_score', 'loan_amount', 'income', 'loan_term_months']]
y = data['default']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
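With only 10 rows and an 8:2 split between defaults and non-defaults, an unstratified 30% test split can leave one class nearly or entirely absent from the training or test side. Passing `stratify=y` to `train_test_split` asks scikit-learn to keep the class proportions similar in both splits. A minimal sketch on hypothetical labels with the same balance as the simulated data; the single feature column is a placeholder.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels mirroring the simulated data: 8 defaults, 2 non-defaults
y = np.array([1, 1, 1, 1, 1, 1, 1, 0, 1, 0])
X = np.arange(len(y)).reshape(-1, 1)  # placeholder feature (one column)

# stratify=y preserves the 0/1 ratio as closely as possible in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

print("train labels:", sorted(y_train.tolist()))
print("test labels: ", sorted(y_test.tolist()))
```

Stratification needs at least two examples of every class, which this tiny dataset just meets.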

In [7]:
# 4. Train a logistic regression model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

In [8]:
# 5. Predict and evaluate
y_pred = model.predict(X_test)
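# zero_division=0 reports undefined precision/recall as 0 without emitting a warning
print("Classification Report:\n", classification_report(y_test, y_pred, zero_division=0))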

Classification Report:
               precision    recall  f1-score   support

           0       0.00      0.00      0.00         1
           1       0.67      1.00      0.80         2

    accuracy                           0.67         3
   macro avg       0.33      0.50      0.40         3
weighted avg       0.44      0.67      0.53         3
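The report is easier to interpret as a confusion matrix. The `support` column shows one non-default and two defaults in the test set, and the class-1 figures (recall 1.00, precision 0.67) imply that all three test borrowers were predicted to default. Class 0 therefore has no predicted members, so its precision is undefined and is reported as 0.00. The sketch below rebuilds that situation from labels consistent with the report; the order of the labels is an assumption.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Labels consistent with the report above (order assumed): one non-default, two defaults
y_true = [0, 1, 1]
y_hat = [1, 1, 1]  # every test borrower predicted to default

# Rows = true class (0, 1); columns = predicted class (0, 1)
cm = confusion_matrix(y_true, y_hat, labels=[0, 1])
accuracy = accuracy_score(y_true, y_hat)

print(cm)
print("accuracy:", accuracy)
```

With three test rows, a single misclassification moves accuracy by a third, so these metrics say little about the model.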





In [9]:
# 6. Predict probability of default for a new borrower
new_borrower = pd.DataFrame([{
    'credit_score': 580,
    'loan_amount': 30000,
    'income': 35000,
    'loan_term_months': 36
}])

prob_default = model.predict_proba(new_borrower)[0][1]
print(f"\nPredicted Probability of Default for new borrower: {prob_default:.2%}")


Predicted Probability of Default for new borrower: 100.00%
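A predicted PD of exactly 100% should be read with caution. The model was fitted on seven rows, and its features sit on very different scales (income in the hundreds of thousands, loan term in months). The new borrower does meet two of the three conditions in the labelling rule (credit score below 600, income below 40,000), so a high PD is expected. The sketch below shows one common adjustment and feeds the predicted PD into the EL formula. It assumes a larger synthetic sample generated with the same ranges and rule, standardised features via `StandardScaler`, scikit-learn's default L2-regularised logistic regression, an LGD of 60%, and an EAD equal to the requested loan amount. These are illustrative assumptions, not part of the original analysis.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: a larger synthetic sample with the same ranges and labelling rule
rng = np.random.default_rng(42)
n = 2_000
data = pd.DataFrame({
    "credit_score": rng.integers(300, 850, size=n),
    "loan_amount": rng.integers(1_000, 50_000, size=n),
    "income": rng.integers(20_000, 150_000, size=n),
    "loan_term_months": rng.choice([12, 24, 36, 48, 60], size=n),
})
data["default"] = (
    (data["credit_score"] < 600)
    | (data["income"] < 40_000)
    | (data["loan_amount"] > 40_000)
).astype(int)

features = ["credit_score", "loan_amount", "income", "loan_term_months"]

# Standardise the features, then fit the (L2-regularised) logistic regression
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(data[features], data["default"])

new_borrower = pd.DataFrame([{
    "credit_score": 580, "loan_amount": 30_000,
    "income": 35_000, "loan_term_months": 36,
}])
pd_hat = model.predict_proba(new_borrower[features])[0, 1]  # P(default = 1)

# Assumed LGD of 60% and EAD equal to the requested loan amount
lgd, ead = 0.60, 30_000.0
el = pd_hat * lgd * ead
print(f"PD = {pd_hat:.2%}, EL = R{el:,.2f}")
```

The last three lines show how a model-estimated PD can plug into $EL = PD \times LGD \times EAD$; in practice LGD and EAD would need their own estimates.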



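In [10]:
# Alternative, loop-based version of the step 2 labelling rule
# (same result; easier to read, but slower than the vectorized code on large data)

# Create an empty list to store default flags
default_flags = []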

# Loop through each row in the DataFrame
for index, row in data.iterrows():
    if row['credit_score'] < 600 or row['income'] < 40000 or row['loan_amount'] > 40000:
        default_flags.append(1)  # High risk (default)
    else:
        default_flags.append(0)  # Low risk (no default)

# Assign the result to a new column
data['default'] = default_flags
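The loop reproduces the vectorized rule from step 2 one row at a time. Because `iterrows` is slow on large frames, the column-wise version is usually preferred. A quick sketch on a few hypothetical borrowers confirming that both approaches (the vectorized one written here with `np.where`) give the same flags:

```python
import numpy as np
import pandas as pd

# Hypothetical borrowers (illustrative values only)
data = pd.DataFrame({
    "credit_score": [720, 580, 650, 700],
    "loan_amount": [10_000, 5_000, 45_000, 20_000],
    "income": [80_000, 90_000, 60_000, 30_000],
})

# Loop version, as in the cell above
loop_flags = []
for _, row in data.iterrows():
    if row["credit_score"] < 600 or row["income"] < 40_000 or row["loan_amount"] > 40_000:
        loop_flags.append(1)
    else:
        loop_flags.append(0)

# Vectorised version: evaluate each condition on whole columns, combine with |
high_risk = (
    (data["credit_score"] < 600)
    | (data["income"] < 40_000)
    | (data["loan_amount"] > 40_000)
)
vector_flags = np.where(high_risk, 1, 0)

print(loop_flags, vector_flags.tolist())
```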
