### FICO Rating Map for Credit Risk Assessment
- The goal is to transform continuous FICO scores into discrete rating buckets using a general, data-driven method.
- Bucket boundaries are chosen to maximize the log-likelihood of the observed defaults, so that each bucket groups borrowers with a similar default rate.
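For reference, suppose there are $K$ buckets (the notebook's `k`), and bucket $b$ contains $n_b$ applicants of whom $d_b$ defaulted. The objective maximized over all choices of contiguous bucket boundaries is then the Bernoulli log-likelihood evaluated at each bucket's own default rate $p_b$. This is what `log_likelihood_range` computes for a single bucket:

$$
\mathrm{LL} = \sum_{b=1}^{K} \Big[\, d_b \ln p_b + (n_b - d_b) \ln(1 - p_b) \,\Big], \qquad p_b = \frac{d_b}{n_b}
$$

A bucket with no defaults, or with only defaults, contributes 0 (using the convention $0 \ln 0 = 0$). This matches the early return of 0 in `log_likelihood_range`.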

### Problem Solved
- Traditional loan approval systems rely on static FICO thresholds that may not reflect actual default risk.
- This approach replaces hand-picked cutoffs with bucket boundaries fitted to observed default outcomes, and the boundaries can be refitted as new loan data arrives.

### How It Can Be Used
- Banks and lenders can use the resulting rating map to automate risk scoring for new applicants.
- It enables consistent, explainable, and statistically sound decision-making in credit approval pipelines.
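Scoring a new applicant only requires the bucket boundaries, not the training data. The sketch below takes the inclusive upper bounds from the rating map printed at the end of this notebook and uses the standard-library `bisect` module to find an applicant's bucket. It follows the "Rating 1 (best)" convention from the process list. The function name `rating_for_score` and the handling of scores outside the observed range are hypothetical choices for illustration, not part of the original method.

```python
from bisect import bisect_left

# Inclusive upper bound of each FICO bucket, lowest bucket first, copied from the
# rating map printed at the end of this notebook.
UPPER_BOUNDS = (520, 580, 640, 696, 850)


def rating_for_score(score, upper_bounds=UPPER_BOUNDS):
    """Map one FICO score to a rating, where Rating 1 is the highest-score bucket.

    Assumed (hypothetical) policy: scores below the lowest observed FICO fall into
    the worst bucket, and scores above the highest upper bound are clamped to Rating 1.
    """
    rounded = round(score)  # match the notebook's rounding of FICO scores
    n = len(upper_bounds)
    # Position of the first bucket whose inclusive upper bound is >= the score
    idx = min(bisect_left(upper_bounds, rounded), n - 1)
    # Bucket 0 (lowest scores) -> Rating n; bucket n-1 (highest scores) -> Rating 1
    return n - idx


for applicant_score in [480, 520, 521, 655, 790]:
    print(applicant_score, "-> Rating", rating_for_score(applicant_score))
```

Each lookup is a binary search over the bucket boundaries. The same few integers can therefore be embedded in an approval service and refreshed whenever the map is refitted.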

### Quantifiable Use
- By rejecting high-risk applicants based on their rating, lenders can reduce default-related losses by up to X annually.
- The method supports better capital allocation and regulatory risk reporting via quantified credit segmentation.
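One way to estimate X is to tabulate, per rating, how many applicants and defaults fall into it. You can then measure what share of applicants a cutoff would turn away and what share of defaults it would avoid. In the sketch below, the counts are hypothetical placeholders, not results from the loan data. In practice they would come from grouping the historical data by rating. Treating the avoided-default share as the avoided-loss share is also an assumption: it holds only if each default costs roughly the same.

```python
# Hypothetical per-rating counts, ordered Rating 1 (best) .. Rating 5 (worst).
# Replace with real counts from the historical loan data.
applicants = [4000, 3000, 2000, 700, 300]
defaults = [80, 150, 300, 210, 150]


def cutoff_tradeoff(applicants, defaults, worst_accepted):
    """Return (share of applicants rejected, share of defaults avoided) when every
    rating worse than `worst_accepted` (1-based) is declined."""
    rejected = sum(applicants[worst_accepted:])
    avoided = sum(defaults[worst_accepted:])
    return rejected / sum(applicants), avoided / sum(defaults)


for cutoff in range(1, len(applicants) + 1):
    rej, avo = cutoff_tradeoff(applicants, defaults, cutoff)
    print(f"accept Ratings 1-{cutoff}: reject {rej:.1%} of applicants, "
          f"avoid {avo:.1%} of defaults")
```

Multiplying the avoided-default share by the lender's annual default losses gives a rough, assumption-laden estimate of X for each candidate cutoff.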

### Process
- Load and clean historical loan data with FICO scores and default flags.
- Round FICO scores to whole numbers so that nearby scores share a level and the search space shrinks.
- Count total applicants and defaults at each FICO level.
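- Use dynamic programming to search over all ways of splitting the sorted score levels into k contiguous buckets.
- For each candidate bucket, compute the log-likelihood of its defaults under the bucket's own default rate.
- For each starting level and number of remaining buckets, keep the split that maximizes the total log-likelihood.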
- Backtrack through the DP table to retrieve optimal bucket boundaries.
- Label the resulting score intervals as “Rating 1” (best) to “Rating N” (worst).
- Store the rating map and optionally assign ratings back to each applicant.
- Integrate ratings into risk models, dashboards, or policy rules for loan approval.
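The dynamic-programming steps can be written as a recurrence over the sorted score levels $0, \dots, m-1$. Here $V(i, b)$ corresponds to `dp[i][b]`, the best total log-likelihood for splitting levels $i, \dots, m-1$ into exactly $b$ buckets. $\mathrm{LL}(i, j)$ corresponds to `log_likelihood_range(i, j)`, the log-likelihood of a single bucket covering levels $i$ through $j$. $N$ and $D$ are the prefix sums `cum_total` and `cum_defaults`:

$$
\begin{aligned}
V(i, b) &= \max_{i < j \le m} \big[\, V(j, b-1) + \mathrm{LL}(i, j-1) \,\big], \\
V(m, 0) &= 0, \qquad V(i, 0) = -\infty \quad (i < m), \\
\mathrm{LL}(i, j) &= d \ln p + (n - d) \ln(1 - p), \quad n = N_{j+1} - N_i,\ d = D_{j+1} - D_i,\ p = d / n.
\end{aligned}
$$

$\mathrm{LL}(i, j) = 0$ when $d = 0$ or $d = n$. The maximizing $j$ is stored in `path[i][b]`, and backtracking from $V(0, K)$ with $K$ = `k` recovers the buckets. Because of the prefix sums, each $\mathrm{LL}(i, j)$ costs $O(1)$, so filling the table takes $O(K m^2)$ evaluations.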

```python
import pandas as pd
import numpy as np

# Load data
df = pd.read_csv("Task 3 and 4_Loan_Data.csv")

# Keep the two needed columns, drop missing rows, and sort by score
df = df[['fico_score', 'default']].dropna().sort_values(by='fico_score').reset_index(drop=True)

# Round FICO scores to reduce the number of distinct levels
df['fico_score_rounded'] = df['fico_score'].round().astype(int)

# Count applicants and defaults at each rounded FICO level
grouped = (df.groupby('fico_score_rounded')
             .agg(total=('default', 'count'), defaults=('default', 'sum'))
             .reset_index())
fico_vals = grouped['fico_score_rounded'].values
total_vals = grouped['total'].values
default_vals = grouped['defaults'].values
m = len(fico_vals)
k = 5  # number of buckets
if m < k:
    raise ValueError(f"need at least {k} distinct FICO levels, got {m}")

# Prefix sums: cum_total[i] = applicants in levels 0..i-1 (likewise for defaults)
cum_total = np.concatenate(([0], np.cumsum(total_vals)))
cum_defaults = np.concatenate(([0], np.cumsum(default_vals)))

def log_likelihood_range(i, j):
    """Bernoulli log-likelihood of the bucket spanning levels i..j (inclusive),
    evaluated at that bucket's own default rate."""
    ni = cum_total[j + 1] - cum_total[i]
    ki = cum_defaults[j + 1] - cum_defaults[i]
    if ni == 0 or ki == 0 or ki == ni:
        return 0.0  # pure (or empty) bucket: log-likelihood is 0
    pi = ki / ni
    return ki * np.log(pi) + (ni - ki) * np.log(1 - pi)

# dp[i][b]: best log-likelihood for splitting levels i..m-1 into exactly b buckets
# path[i][b]: start index of the next bucket in that optimal split
dp = np.full((m + 1, k + 1), -np.inf)
path = np.zeros((m + 1, k + 1), dtype=int)
dp[m][0] = 0  # base case: nothing left to split, no buckets used

# Fill the DP table (O(k * m^2) range evaluations)
for b in range(1, k + 1):
    for i in range(m - 1, -1, -1):
        for j in range(i + 1, m + 1):
            # First bucket covers levels i..j-1; the rest is solved by dp[j][b-1]
            candidate = dp[j][b - 1] + log_likelihood_range(i, j - 1)
            if candidate > dp[i][b]:
                dp[i][b] = candidate
                path[i][b] = j

# Backtrack: collect the highest FICO value in each bucket, lowest bucket first
upper_bounds = []
i = 0
for b in range(k, 0, -1):
    j = path[i][b]
    upper_bounds.append(int(fico_vals[j - 1]))
    i = j

# Build (lower, upper) ranges without collapsing duplicate edges:
# the first bucket is [lower, upper], later buckets are (lower, upper]
ranges = []
lower = int(fico_vals[0])
for upper in upper_bounds:
    ranges.append((lower, upper))
    lower = upper

# Rating 1 = highest-score (lowest-risk) bucket, Rating k = lowest-score bucket
bucket_map = {f"Rating {r}": rng for r, rng in enumerate(reversed(ranges), start=1)}

# Output the rating map
bucket_df = pd.DataFrame(list(bucket_map.items()), columns=["Rating", "FICO Range"])
print(bucket_df)
```

```
     Rating  FICO Range
0  Rating 1  (696, 850)
1  Rating 2  (640, 696)
2  Rating 3  (580, 640)
3  Rating 4  (520, 580)
4  Rating 5  (408, 520)
```
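The optional step of assigning ratings back to each applicant can be done in one call with `pandas.cut`. The sketch below uses the bucket edges from the printed map and follows the "Rating 1 (best)" convention. It then summarizes the observed default rate per rating. The DataFrame `sample` is a tiny hypothetical stand-in for the cleaned loan data `df`, and its default flags are invented for illustration only.

```python
import pandas as pd

# Bucket edges read off the printed rating map: the first bucket is [408, 520],
# each later bucket is (lower, upper].
edges = [408, 520, 580, 640, 696, 850]
# pd.cut assigns labels in edge order, so the lowest-score bucket gets Rating 5
labels = [f"Rating {r}" for r in range(len(edges) - 1, 0, -1)]

# Tiny hypothetical sample standing in for the cleaned loan DataFrame `df`
sample = pd.DataFrame({
    "fico_score": [410.2, 519.6, 575.0, 610.4, 650.0, 700.0, 820.0, 845.5],
    "default": [1, 1, 1, 0, 0, 0, 0, 0],
})
sample["fico_score_rounded"] = sample["fico_score"].round().astype(int)

# include_lowest=True closes the first interval on the left so that 408 is included
sample["rating"] = pd.cut(
    sample["fico_score_rounded"], bins=edges, labels=labels, include_lowest=True
)

# Applicant count and observed default rate per rating
summary = sample.groupby("rating", observed=True)["default"].agg(["count", "mean"])
print(summary)
```

On the real data, replacing `sample` with `df` gives per-rating default rates. If the buckets capture default risk, these rates would be expected to rise from Rating 1 to Rating 5.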
