# LLM-Powered Credit Quality Assurance (QA) System

## Introduction

This notebook implements an **LLM-powered Credit Quality Assurance (QA) system** that combines machine-learning credit risk models with bank credit policy documents to support explainable and policy-aligned lending decisions.

In modern banks, credit models such as LightGBM or XGBoost are used to estimate the **Probability of Default (PD)** for each loan. However, PD alone is not sufficient for operational decision-making. Credit Quality Assurance teams must also ensure that decisions are **consistent with internal credit policies**, **risk appetite**, and **regulatory standards**.

This notebook represents the **AI decision-support layer** of the credit process. It takes as input:

- A table of **QA loan cases** generated by a PD model and rule-based policy engine  
- **Credit policy documents** (PDFs) such as underwriting guidelines and risk appetite statements  

Using **Retrieval-Augmented Generation (RAG)**, the system retrieves the most relevant policy text for each loan and uses a large language model (LLM) to produce:

- An **AI credit decision**  
- A **policy compliance status**  
- A **human-readable explanation**  
- A **recommended next action**  

## System Architecture

Loan Data

   ↓

PD Model + SHAP

   ↓

Policy Rules → QA Cases

   ↓

RAG (Policy PDFs)

   ↓

LLM → AI QA Decision
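
The "Policy Rules → QA Cases" step can be pictured as a threshold rule on PD. The sketch below is an assumption rather than the actual rule engine: the 5% and 30% cut-offs are inferred from the LLM explanations in the output table later in this notebook, and `assign_segment` and the "Approve" label are hypothetical names.

```python
# Hypothetical thresholds, inferred from the LLM explanations in this notebook's output
LOW_PD = 0.05   # below this, assumed to be approved without QA
HIGH_PD = 0.30  # above this, assumed to fall into "Review/Reject"

def assign_segment(pd_value: float) -> str:
    """Map a PD estimate to a decision segment (illustrative only)."""
    if not 0.0 <= pd_value <= 1.0:
        raise ValueError("PD must be a probability between 0 and 1")
    if pd_value > HIGH_PD:
        return "Review/Reject"
    if pd_value >= LOW_PD:
        return "QA"
    return "Approve"

# PD values taken from the first rows of qa_cases
for pd_value in (0.5319, 0.268195, 0.099637):
    print(f"{pd_value:.2%} -> {assign_segment(pd_value)}")
```

With these assumed cut-offs, the segments match the `Action` column for the sample rows shown in this notebook.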

In [None]:
import json
import re

import pandas as pd
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_openai import ChatOpenAI, OpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter


In [37]:
pd.set_option("display.max_colwidth", None)
pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)

In [2]:
# Load the QA cases dataset
qa_cases = pd.read_csv(r'C:\Users\USER\Desktop\data set\lending club data set\data\qa_cases.csv')
qa_cases.head()

Unnamed: 0,case_id,PD,Action,dti,loan_to_income,revol_util,emp_length,verification_status,top_risk_factors,segment_default_rate,pd_vs_segment
0,1452671,0.5319,Review/Reject,0.2749,0.200994,44.4,10.0,Source Verified,"int_rate ,dti ,acc_open_past_24mths ,term ,hom...",0.3433,0.1886
1,464451,0.268195,QA,0.2075,0.128205,45.9,10.0,Source Verified,"int_rate ,acc_open_past_24mths ,loan_to_income...",0.1645,0.103695
2,809706,0.288979,QA,0.0603,0.25,45.9,3.0,Source Verified,"int_rate ,avg_cur_bal ,acc_open_past_24mths ,t...",0.1645,0.124479
3,1538286,0.099637,QA,0.2784,0.105448,9.4,10.0,Source Verified,"int_rate ,revol_util ,dti ,acc_open_past_24mth...",0.1645,-0.064863
4,822739,0.366241,Review/Reject,0.388,0.416667,86.3,10.0,Not Verified,"dti ,avg_cur_bal ,acc_open_past_24mths ,loan_t...",0.3433,0.022941


In [3]:
# Load PDF documents
loader = DirectoryLoader(
    r"C:\Users\USER\Desktop\books\policy",
    glob='*.pdf',
    loader_cls=PyPDFLoader
)

documents = loader.load()
print(f"loaded {len(documents)} pages")

loaded 11 pages


In [4]:
# Split the documents into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=50)

chunks = text_splitter.split_documents(documents)
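
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a simplified sketch using fixed-size character windows. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraph and line breaks), and `sliding_chunks` is a hypothetical helper used only for illustration.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # distance between consecutive window starts
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]

sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each piece repeats the last three characters of the previous one, which is the role the 50-character overlap plays for the 300-character policy chunks: a sentence cut at a chunk boundary still appears whole in at least one chunk.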

In [5]:
# Embed the documents and create a vector store
embedding = HuggingFaceEmbeddings(model_name = "sentence-transformers/all-MiniLM-L6-v2")
vector_store = FAISS.from_documents(chunks, embedding)

# Create a retriever from the vector store
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k":4})
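
Conceptually, a similarity retriever with `k=4` ranks all stored chunk vectors by their distance to the query vector and returns the closest four. The toy example below sketches that idea with made-up 2-D vectors; `top_k` is a hypothetical helper, and the use of Euclidean distance is an assumption, since the metric actually used depends on how the FAISS index was built.

```python
import math

def top_k(query_vec, doc_vecs, k=4):
    """Return indices of the k vectors closest to query_vec by Euclidean distance."""
    distances = [(math.dist(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    return [idx for _, idx in sorted(distances)[:k]]

# Made-up 2-D "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions
doc_vecs = [(0.0, 1.0), (0.9, 0.1), (1.0, 0.0), (0.5, 0.5), (-1.0, 0.0)]
query_vec = (1.0, 0.05)

nearest = top_k(query_vec, doc_vecs, k=2)
print(nearest)  # [2, 1]
```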

In [28]:
# Set the LLM model
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

In [None]:
# Strip Markdown code fences so the LLM response can be parsed as JSON
def extract_json(text):
    """Return `text` without a leading ```json or ``` fence and a trailing ``` fence."""
    text = text.strip()
    text = re.sub(r"^```(?:json)?", "", text)
    text = re.sub(r"```$", "", text)
    return text.strip()
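
Fence stripping only works when the reply consists of nothing but the fenced JSON. If the model adds text before or after it, one possible alternative is to search for the first `{...}` span instead. The helper below, `parse_llm_json`, is a hypothetical sketch and not part of the notebook's pipeline.

```python
import json
import re

def parse_llm_json(text: str) -> dict:
    """Parse the span from the first '{' to the last '}' in an LLM reply."""
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in response")
    # json.JSONDecodeError is a subclass of ValueError, so callers can catch ValueError
    return json.loads(match.group(0))

reply = (
    "Here is my answer:\n"
    "```json\n"
    '{"ai_decision": "Reject", "next_step": "Notify applicant"}\n'
    "```\n"
    "Let me know if you need more detail."
)
parsed = parse_llm_json(reply)
print(parsed["ai_decision"])  # Reject
```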


In [46]:
# Run the QA loop over the first 10 cases
results = []

for _, case in qa_cases.head(10).iterrows():
    query = f"""
    Loan ID: {case.case_id}
    PD: {case.PD:.2%}
    Decision Segment: {case.Action}
    Segment default rate: {case.segment_default_rate:.2%}
    Top risk drivers: {case.top_risk_factors}

    Explain the credit decision using the bank's policy.
    """

    # Retrieve relevant policy chunks
    docs = retriever.invoke(query)
    context = "\n\n".join([d.page_content for d in docs])

    # Build structured prompt
    prompt = f"""
    You are a Credit Quality Assurance (QA) analyst.

    Bank Policy:
    {context}

    Loan Case:
    {query}

    Return your answer strictly in this JSON format:
{{
  "ai_decision": "...",
  "policy_status": "...",
  "ai_reason": "...",
  "next_step": "..."
}}
"""

    response = llm.invoke(prompt).content

    try:
        clean = extract_json(response)
        decision = json.loads(clean)
    except json.JSONDecodeError:  # fall back to manual review if the reply is not valid JSON
        decision = {
            "ai_decision": "Parsing Error",
            "policy_status": "Unknown",
            "ai_reason": response,
            "next_step": "Manual review"
        }

    results.append({
        "loan_id": case.case_id,
        "pd": case.PD,
        "segment": case.Action,
        "segment_default_rate": case.segment_default_rate,
        "Top Risk Drivers": case.top_risk_factors,
        "ai_decision": decision["ai_decision"],
        "policy_status": decision["policy_status"],
        "next_step": decision["next_step"],
        "ai_reason": decision["ai_reason"]
    })
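
The loop above assumes that every parsed reply contains all four keys; a reply that is valid JSON but omits one would raise a `KeyError`. A minimal sketch of a guard is shown below. `normalize_decision` and `REQUIRED_KEYS` are hypothetical names, and the fallback values simply mirror the ones used in the `except` branch.

```python
REQUIRED_KEYS = ("ai_decision", "policy_status", "ai_reason", "next_step")

def normalize_decision(decision, raw_response=""):
    """Return a dict with all four expected keys, filling gaps with fallback values."""
    if not isinstance(decision, dict):
        decision = {}
    fallback = {
        "ai_decision": "Parsing Error",
        "policy_status": "Unknown",
        "ai_reason": raw_response,
        "next_step": "Manual review",
    }
    # Missing or empty values are replaced by the fallback for that key
    return {key: str(decision.get(key) or fallback[key]) for key in REQUIRED_KEYS}

partial = {"ai_decision": "Reject", "policy_status": "Unacceptable credit risk"}
fixed = normalize_decision(partial, raw_response="(raw model reply)")
print(fixed["next_step"])  # Manual review
```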

In [47]:
ai_decision = pd.DataFrame(results)
ai_decision.index = ai_decision.index + 1
ai_decision

Unnamed: 0,loan_id,pd,segment,segment_default_rate,Top Risk Drivers,ai_decision,policy_status,next_step,ai_reason
1,1452671,0.5319,Review/Reject,0.3433,"int_rate ,dti ,acc_open_past_24mths ,term ,home_ownership_RENT",Reject,Unacceptable credit risk due to high PD,Reject the loan application and notify the applicant.,"The Probability of Default (PD) is 53.19%, which is greater than the threshold of 30%. This indicates a very high risk of default according to the bank's policy. Additionally, the segment default rate of 34.33% further supports the decision to reject the loan."
2,464451,0.268195,QA,0.1645,"int_rate ,acc_open_past_24mths ,loan_to_income ,avg_cur_bal ,application_type_Joint App",Decline,Unacceptable credit risk due to PD below threshold,Mandatory QA Review required due to the decision segment being QA.,"The probability of default (PD) is 26.82%, which is below the unacceptable threshold of 30%. Although the segment default rate is 16.45%, the PD does not meet the minimum requirement for approval. Additionally, the identified top risk drivers may contribute to the overall risk assessment."
3,809706,0.288979,QA,0.1645,"int_rate ,avg_cur_bal ,acc_open_past_24mths ,term ,dti",Review Required,Mandatory QA Review,Route to QA for further assessment.,"The PD is below 30% but above 5%, and the loan falls into the QA segment due to a higher default rate. Therefore, it requires a review."
4,1538286,0.099637,QA,0.1645,"int_rate ,revol_util ,dti ,acc_open_past_24mths ,loan_to_income",Review or route to QA,Fails mandatory QA review conditions,Route the loan case to QA for further review.,"The PD of 9.96% is above the acceptable threshold of 5%, and the segment default rate of 16.45% indicates a higher risk. Additionally, the top risk drivers suggest potential issues with interest rate, revolving utilization, and debt-to-income ratio."
5,822739,0.366241,Review/Reject,0.3433,"dti ,avg_cur_bal ,acc_open_past_24mths ,loan_to_income ,bc_util",Reject,Unacceptable credit risk due to PD exceeding 30%,Reject the loan application based on the unacceptable credit risk.,"The Probability of Default (PD) is 36.62%, which is greater than the acceptable threshold of 30%. This indicates a higher risk of default, aligning with the bank's policy that stipulates such cases represent unacceptable credit risk."
6,549665,0.229531,QA,0.1645,"int_rate ,loan_to_income ,total_rev_hi_lim ,acc_open_past_24mths ,purpose_home_improvement",Decline,Unacceptable credit risk due to PD below threshold,Review other factors and consider alternative options for the applicant.,"The Probability of Default (PD) is 22.95%, which is below the acceptable threshold of 30%. Although the segment default rate is higher at 16.45%, the PD model indicates a lower risk than the policy threshold, leading to a decline in the application."
7,521926,0.083505,QA,0.1645,"int_rate ,avg_cur_bal ,loan_to_income ,mths_since_last_delinq ,acc_open_past_24mths",Decline,Unacceptable credit risk due to PD < 30%,Review the application for potential reconsideration or provide alternative options to the applicant.,"The probability of default (PD) is 8.35%, which is below the acceptable threshold of 30%."
8,383370,0.230352,QA,0.1645,"term ,loan_to_income ,avg_cur_bal ,loan_amnt ,int_rate",Decline,Mandatory QA Review Required,Conduct a thorough QA review to assess the risk factors and determine if the application can be approved with additional conditions or if it should be declined.,"The PD of 23.04% is below the unacceptable threshold of 30%, but the decision segment is flagged for QA due to a higher segment default rate of 16.45% and the presence of top risk drivers."
9,707053,0.1442,QA,0.1645,"avg_cur_bal ,int_rate ,application_type_Joint App ,loan_to_income ,loan_amnt",Decline,High Risk,Recommend further review by a credit analyst for potential exceptions or alternative solutions.,"The probability of default (PD) for this loan application is 14.42%, which is significantly higher than the segment default rate of 16.45%. Additionally, the identified top risk drivers, including average current balance, interest rate, application type (Joint App), loan-to-income ratio, and loan amount, indicate a higher likelihood of default."
10,1629238,0.397524,Review/Reject,0.3433,"int_rate ,dti ,revol_util ,total_rev_hi_lim ,earliest_cr_line_Oct-1999",Reject,Unacceptable Credit Risk,Notify the applicant of the rejection and provide reasons based on the bank's credit policy.,"The probability of default (PD) is 39.75%, which is greater than the 30% threshold defined in the bank's policy, indicating a very high risk. Additionally, the segment default rate of 34.33% further supports the decision to reject the loan application."


## AI Credit QA Decision Output

The table above shows the **LLM-generated Credit Quality Assurance (QA) decisions** for a sample of high-risk and borderline loan applications.

Each row represents a loan that was flagged by the machine-learning risk model and credit policy rules.  
Using **Retrieval-Augmented Generation (RAG)**, the LLM reviewed the relevant credit policy documents and produced:

- An **AI decision** (e.g. Reject, Approve, or Route to QA)  
- A **policy compliance status**  
- A **recommended next action**  
- A **human-readable explanation** linking the borrower’s risk profile to the bank’s credit policy  

This output demonstrates how large language models can be used as a **governed decision-support layer** on top of traditional credit-risk models, enabling faster, more consistent, and more explainable QA reviews.
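
In the sample output, several loans in the QA segment receive a "Decline" decision even though the accompanying explanation states that the PD is below the 30% threshold (loan 464451, for example). One way to surface such rows for human review is a rule-based cross-check against the original segment. The sketch below is illustrative only: `flag_inconsistent` is a hypothetical helper, and treating "Reject" or "Decline" in the QA segment as suspicious is an assumption about the policy.

```python
def flag_inconsistent(rows):
    """Return loan_ids whose AI decision is terminal although the rule segment is 'QA'."""
    terminal = {"reject", "decline"}
    return [
        row["loan_id"]
        for row in rows
        if row["segment"] == "QA" and row["ai_decision"].strip().lower() in terminal
    ]

# Three rows copied from the output table in this notebook
rows = [
    {"loan_id": 1452671, "segment": "Review/Reject", "ai_decision": "Reject"},
    {"loan_id": 464451, "segment": "QA", "ai_decision": "Decline"},
    {"loan_id": 809706, "segment": "QA", "ai_decision": "Review Required"},
]
print(flag_inconsistent(rows))  # [464451]
```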

In [None]:
# Save the AI decisions to a CSV file
ai_decision.to_csv("ai_decisions.csv", index=False)