### Optimized AI Robo-Credit Underwriter with Multi-Agent RL & Risk-Aware Learning

#### Introduction
Traditional credit underwriting relies on rule-based scoring models and statistical risk assessments, which often fail to adapt to dynamic financial environments. With advances in Artificial Intelligence (AI), particularly Reinforcement Learning (RL), credit decision-making can be enhanced by autonomous agents that optimize risk-return trade-offs in real time. This project develops an AI-driven Robo-Credit Underwriter that integrates Multi-Agent Reinforcement Learning (PPO and DQN agents) to balance credit approvals against risk management. Risk-aware policy learning, based on Conditional Value at Risk (CVaR) and Bayesian estimation, supports responsible lending. A FastAPI backend serves real-time predictions, and a Streamlit UI enables interactive loan applications. The aim is to modernize credit risk evaluation with AI-driven decision-making that is adaptive and explainable.

#### Literature Review
AI and machine learning have transformed credit risk assessment, with deep learning and reinforcement learning emerging as key methodologies. Traditional models, such as logistic regression and decision trees, struggle with non-linear credit risk patterns (Lessmann et al., 2015). Deep learning improves default prediction accuracy but lacks interpretability (Leong et al., 2022). Reinforcement learning has been explored for financial decision-making (Dixon et al., 2020), with Deep Q-Learning (Mnih et al., 2015) and PPO (Schulman et al., 2017) demonstrating robust adaptability. Multi-Agent RL (Li et al., 2021) further enhances risk-aware credit scoring. SHAP (Lundberg & Lee, 2017) provides interpretability, bridging AI and regulatory transparency. Risk-aware policy learning via CVaR (Rockafellar & Uryasev, 2000) ensures downside risk mitigation, making AI credit underwriting safer and more robust.

#### Import Libraries

In [1]:
import pandas as pd
import numpy as np
import gymnasium as gym
from fastapi import FastAPI
from gymnasium import spaces
from stable_baselines3 import PPO, DQN
import pickle
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from pydantic import BaseModel
from risk_policy import risk_adjusted_approval
import streamlit as st
import requests

#### I. Generate Synthetic Credit Data

In [2]:
np.random.seed(42)

N = 5000  # Number of samples

data = {
    "credit_score": np.random.randint(300, 850, N),
    "income": np.random.randint(20000, 150000, N),
    "debt_to_income": np.random.uniform(0.1, 0.9, N),
    "age": np.random.randint(21, 70, N),
    "employment_years": np.random.randint(0, 40, N),
    "loan_amount": np.random.randint(5000, 50000, N),
    "interest_rate": np.random.uniform(1, 15, N),
    "approved": np.random.choice([0, 1], N, p=[0.3, 0.7]),
}

df = pd.DataFrame(data)
df["default_risk"] = np.where(df["credit_score"] < 600, np.random.uniform(0.4, 0.9, N), np.random.uniform(0.01, 0.3, N))
df.to_csv("synthetic_credit_data.csv", index=False)
print("Synthetic dataset generated: synthetic_credit_data.csv")


Synthetic dataset generated: synthetic_credit_data.csv


- #### Train Credit Approval Model

In [3]:
# Load dataset
df = pd.read_csv("synthetic_credit_data.csv")  

# Features & Target
X = df.drop(columns=["approved"])  
y = df["approved"]  

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train Random Forest Model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train_scaled, y_train)

# Save the model
with open("credit_model.pkl", "wb") as f:
    pickle.dump(model, f)

print("credit_model.pkl saved successfully!")


credit_model.pkl saved successfully!


#### II. Train Multi-Agent RL (PPO & DQN)

In [4]:
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def read_root():
    return {"message": "API is running!"}

- #### Reinforcement Learning Environment
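This custom Gymnasium environment is what the RL agents train on: each step presents one applicant from the synthetic dataset, and the agent chooses to approve or reject.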

In [5]:
# Load synthetic data
df = pd.read_csv("synthetic_credit_data.csv")

class CreditEnv(gym.Env):
    """One episode is a single pass over the shuffled dataset, one applicant per step."""

    def __init__(self):
        super().__init__()
        # Define state & action space
        self.observation_space = spaces.Box(low=0, high=1, shape=(8,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)  # Approve (1) or Reject (0)
        self.data = df.sample(frac=1).reset_index(drop=True)
        self.index = 0

        # Min-max scale the feature columns (all but "approved" and "default_risk")
        # so that observations actually lie within the declared [0, 1] bounds
        features = self.data.iloc[:, :-2].to_numpy(dtype=np.float32)
        lo, hi = features.min(axis=0), features.max(axis=0)
        self.features = (features - lo) / np.where(hi > lo, hi - lo, 1)

    def _get_obs(self):
        obs = self.features[self.index]
        # Zero-pad to the declared shape (8,)
        if obs.shape[0] != 8:
            obs = np.pad(obs, (0, 8 - obs.shape[0]), mode='constant')
        return obs.astype(np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)  # seeds self.np_random, as the Gymnasium API expects
        self.index = 0
        return self._get_obs(), {}

    def step(self, action):
        row = self.data.iloc[self.index]
        # +1 for matching the recorded decision, -1 otherwise, minus 3 for a high-risk applicant
        reward = (1 if action == row["approved"] else -1) - (3 if row["default_risk"] > 0.5 else 0)
        self.index += 1
        terminated = self.index >= len(self.data)

        if not terminated:
            obs = self._get_obs()
        else:
            obs = np.zeros(8, dtype=np.float32)  # placeholder after the final applicant

        return obs, reward, terminated, False, {}

# Test the environment
if __name__ == "__main__":
    env = CreditEnv()
    print("CreditEnv initialized!")


CreditEnv initialized!
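
In the `step` reward above, the 3-point penalty for `default_risk > 0.5` is subtracted whichever action is taken, so it lowers the return on risky applicants without changing which action is preferred. The sketch below shows one possible action-dependent variant, in which the penalty applies only when a high-risk applicant is approved. The function name `credit_reward` and its `risk_threshold` and `risk_penalty` parameters are illustrative assumptions, not part of the notebook.

```python
def credit_reward(action, approved, default_risk, risk_threshold=0.5, risk_penalty=3):
    """Reward for one lending decision (hypothetical variant of CreditEnv's reward)."""
    # +1 for agreeing with the recorded decision, -1 otherwise (as in the notebook)
    agreement = 1 if action == approved else -1
    # Assumption: only an approval exposes the lender to default risk
    penalty = risk_penalty if (action == 1 and default_risk > risk_threshold) else 0
    return agreement - penalty


print(credit_reward(action=1, approved=1, default_risk=0.7))  # 1 - 3 = -2
print(credit_reward(action=0, approved=1, default_risk=0.7))  # -1 - 0 = -1
print(credit_reward(action=1, approved=1, default_risk=0.2))  # 1 - 0 = 1
```

Under this variant, rejecting a high-risk applicant (-1) scores higher than approving one (-2), so the risk term can influence the learned policy.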


- #### Train RL Agents 
This cell trains a PPO agent and a DQN agent independently on the same `CreditEnv`, then saves both models.

In [6]:
# Initialize environment
env = CreditEnv()

# Use PPO and DQN (since DDPG requires continuous actions)
ppo_agent = PPO("MlpPolicy", env, verbose=1)
dqn_agent = DQN("MlpPolicy", env, verbose=1)

# Train the agents
ppo_agent.learn(total_timesteps=50000)
dqn_agent.learn(total_timesteps=50000)

# Save models
ppo_agent.save("ppo_credit_agent")
dqn_agent.save("dqn_risk_control_agent")

print("Multi-Agent RL Training Complete!")

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
-----------------------------
| time/              |      |
|    fps             | 625  |
|    iterations      | 1    |
|    time_elapsed    | 3    |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 438        |
|    iterations           | 2          |
|    time_elapsed         | 9          |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.01858196 |
|    clip_fraction        | 0.187      |
|    clip_range           | 0.2        |
|    entropy_loss         | -0.676     |
|    explained_variance   | -0.00324   |
|    learning_rate        | 0.0003     |
|    loss                 | 45.9       |
|    n_updates            | 

#### III. Risk-Aware Policy Learning (CVaR & Bayesian Estimation)

In [7]:
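def compute_cvar(returns, alpha=0.05):
    """Conditional Value at Risk: mean of the worst alpha-fraction of returns."""
    # Keep at least one observation so small samples do not produce an empty slice (NaN)
    n_tail = max(1, int(len(returns) * alpha))
    return np.mean(np.sort(returns)[:n_tail])

def risk_adjusted_approval(default_prob, loan_amount, interest_rate):
    """Approve (1) if expected repayment minus expected loss exceeds 5000, else reject (0).

    interest_rate must be a decimal fraction (0.05 for 5%) because the formula
    multiplies by (1 + interest_rate); the synthetic dataset stores percentages
    (1-15), so divide those by 100 before calling.
    """
    return int((1 - default_prob) * loan_amount * (1 + interest_rate) - default_prob * loan_amount > 5000)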
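
The section heading mentions Bayesian estimation, but the cell above does not show how `default_prob` is obtained. As a minimal sketch, assuming a Beta-Binomial model with a uniform Beta(1, 1) prior, the default probability for a segment of borrowers could be estimated from past outcomes and the tail risk of a set of returns summarised with CVaR. The function names, the prior, and all numbers below are hypothetical and use only the standard library.

```python
from statistics import mean


def beta_posterior_default_prob(defaults, loans, prior_a=1.0, prior_b=1.0):
    """Posterior mean of the default rate under a Beta(prior_a, prior_b) prior."""
    return (prior_a + defaults) / (prior_a + prior_b + loans)


def cvar(returns, alpha=0.05):
    """Mean of the worst alpha-fraction of returns (at least one observation)."""
    n_tail = max(1, int(len(returns) * alpha))
    return mean(sorted(returns)[:n_tail])


# Hypothetical segment history: 12 defaults observed in 200 past loans
p_default = beta_posterior_default_prob(defaults=12, loans=200)
print(p_default)  # (1 + 12) / (2 + 200) = 13/202, about 0.064

# Hypothetical per-loan returns; alpha=0.2 keeps the two worst of ten
returns = [-0.40, -0.25, -0.10, 0.02, 0.03, 0.04, 0.05, 0.05, 0.06, 0.08]
tail_loss = cvar(returns, alpha=0.2)
print(tail_loss)  # mean of -0.40 and -0.25, about -0.325
```

The posterior mean shrinks small-sample default rates toward the prior, which avoids a 0% estimate for segments with few observed loans.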


#### IV. FastAPI Backend

In [8]:
# !run: uvicorn api:app --reload

#### V. Streamlit UI

In [9]:
# !run:  streamlit run app.py

#### Further Advancement:

- Extend the model with real-time financial risk indicators (e.g., market trends) while keeping it robust and simple enough for practical application.

### References
**Dixon, M., Halperin, I., & Bilokon, P. (2020).** Machine learning in finance: From theory to practice. Springer. <br>
**Leong, C., Tan, B., Xiao, X., & Tan, F. (2022).** Explainable AI in credit scoring: Challenges and opportunities. Journal of Financial Innovation, 8(1), 22–37.<br>
**Lessmann, S., Baesens, B., Seow, H. V., & Thomas, L. C. (2015).** Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European Journal of Operational Research, 247(1), 124–136.<br>
**Li, T., Yang, Y., & Wang, J. (2021).** Multi-agent reinforcement learning in finance: Applications and challenges. Finance and AI Review, 10(3), 55–78.<br>
**Lundberg, S. M., & Lee, S. I. (2017).** A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems (NeurIPS), 30.<br>
**Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., ... & Hassabis, D. (2015).** Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533.<br>
**Rockafellar, R. T., & Uryasev, S. (2000).** Optimization of conditional value-at-risk. Journal of Risk, 2(3), 21–42.<br>
**Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017).** Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.<br>
**Van Gestel, T., & Baesens, B. (2009).** Credit risk management: Basic concepts: Financial risk components, rating analysis, models, economic and regulatory capital. Oxford University Press.<br>
**Zhou, Y., & Wang, Z. (2021).** Deep reinforcement learning for credit risk assessment: A comparative study. Expert Systems with Applications, 174, 114670.<br>