# LLM-Powered Credit Quality Assurance (QA) System

# Introduction

This notebook presents an LLM-powered Credit Quality Assurance (QA) system that integrates machine learning–based credit risk models with bank credit policy documents to support explainable, consistent, and policy-aligned lending decisions.

In modern banking, machine learning models such as LightGBM or XGBoost are commonly used to estimate the Probability of Default (PD) for individual loan applications. While PD is a critical input to risk assessment, it is not sufficient on its own for operational decision-making. Credit Quality Assurance (QA) teams must also ensure that lending decisions align with internal credit policies, risk appetite frameworks, and regulatory requirements.

This notebook implements an AI decision-support layer within the credit approval process. It takes as inputs:

- A dataset of QA loan cases generated from a PD model and rule-based policy engine, and

- Credit policy documents (e.g., underwriting guidelines and risk appetite statements) in PDF format.

**Using Retrieval-Augmented Generation (RAG)**, the system retrieves the most relevant policy information for each loan case and leverages a large language model (LLM) to generate:

- An AI credit decision,

- A policy compliance assessment,

- A clear, human-readable justification, and

- A recommended next action for the case.

The objective of this system is not to replace human judgment, but to augment credit QA processes by improving consistency, transparency, and efficiency in policy interpretation and decision justification.

# System Architecture

Loan Data

   ↓

PD Model + SHAP

   ↓

Policy Rules → QA Cases

   ↓

RAG (Policy PDFs)

   ↓

LLM Reasoning Node
   ↓
Langraph Decision Orchestrator
- Approve (Auto)
-  Reject (Auto)
- Ask for More Info
- Escalate to Human Review
- Log / Audit / Feedback Loop

### Architecture Description

The system follows a two-layer decision framework that combines deterministic rules with LLM-based reasoning:

**1. Input and Feature Layer**

- Loan applications are first evaluated by a PD model (e.g., LightGBM), which produces a probability of default along with key risk drivers identified using SHAP.

- A rule-based policy engine applies initial screening and generates structured QA cases for further review.

**2. Retrieval-Augmented Generation (RAG) Layer**

- Relevant credit policy documents are retrieved from a vector database based on the loan case details.

- This ensures that the LLM bases its reasoning on the most pertinent and up-to-date policy information.

**3. LLM Reasoning Layer**

- A large language model analyzes the retrieved policies alongside the loan case features.

- It applies structured decision logic to determine whether a loan should be Approved, Approved with Caution, Rejected, or Escalated for Manual Review.

**4. Decision Orchestration (LangGraph)**

- LangGraph coordinates the workflow, ensuring that cases pass through the risk gate, policy retrieval, LLM reasoning, and parsing stages in a controlled manner.

- It also supports conditional routing based on rule-based or AI-based decisions.

**5. Audit and Governance Layer**

- Every decision is logged in an audit table, capturing key variables such as PD, segment, risk drivers, AI decision, justification, timestamp, model version, and decision source.

- This enables transparency, traceability, and post-hoc evaluation of system performance

In [None]:
import pandas as pd
import re
from datetime import datetime
import os

import chromadb
import langgraph
from langgraph.graph import StateGraph
from langchain_community.document_loaders import PyPDFLoader,DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAI,OpenAIEmbeddings,ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_huggingface import HuggingFaceEmbeddings
from typing import TypedDict

import json

In [2]:
pd.set_option("display.max_colwidth", None)
pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)

In [3]:
# Load the QA cases dataset
qa_cases = pd.read_csv(r'C:\Users\USER\Desktop\data set\lending club data set\data\qa_cases.csv')
qa_cases.head()

Unnamed: 0,case_id,PD,loan_amnt,Action,dti,loan_to_income,revol_util,emp_length,verification_status,LGD,expected_loss,top_risk_factors,segment_default_rate,pd_vs_segment
0,1452671,0.5319,18000,Review/Reject,0.2749,0.200994,44.4,10.0,Source Verified,0.75,7180.650779,"int_rate ,dti ,acc_open_past_24mths ,term ,home_ownership_RENT",0.3418,0.1901
1,464451,0.268195,10000,QA,0.2075,0.128205,45.9,10.0,Source Verified,0.45,1206.878739,"int_rate ,avg_cur_bal ,application_type_Joint App ,acc_open_past_24mths ,loan_to_income",0.1617,0.106495
2,809706,0.288979,30000,QA,0.0603,0.25,45.9,3.0,Source Verified,0.45,3901.221581,"int_rate ,avg_cur_bal ,term ,dti ,acc_open_past_24mths",0.1617,0.127279
3,1538286,0.099637,6000,QA,0.2784,0.105448,9.4,10.0,Source Verified,0.45,269.020804,"int_rate ,revol_util ,dti ,acc_open_past_24mths ,inq_last_6mths",0.1617,-0.062063
4,822739,0.366241,20000,Review/Reject,0.388,0.416667,86.3,10.0,Not Verified,0.65,4761.137546,"dti ,acc_open_past_24mths ,avg_cur_bal ,bc_util ,loan_to_income",0.3418,0.024441


In [4]:
# Loda PDF documtents
loader = DirectoryLoader(
    r"C:\Users\USER\Desktop\books\policy",
    glob='*.pdf',
    loader_cls=PyPDFLoader
)

documensts= loader.load()
print(f"loaded {len(documensts)} pages")

loaded 11 pages


In [5]:
# Split the documents into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=50)

chunks = text_splitter.split_documents(documensts)

In [6]:
# Embed the documents and create a vector store
embedding = HuggingFaceEmbeddings(model_name = "sentence-transformers/all-MiniLM-L6-v2")
vector_store = Chroma.from_documents(chunks, embedding,persist_directory="./chroma_policy_db")

# Create a retriever from the vector store
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k":4})

In [7]:
# set the llm model
llm = ChatOpenAI(model="gpt-4o-mini",temperature=0)

In [38]:
# Create the workflow for langgraph
class QAState(TypedDict):
    case:dict
    case_text: str
    policy: str
    llm_output: str
    decision: dict
    meta:dict

In [39]:
# function to get the current time stamp
def now():
    return datetime.utcnow().isoformat()

In [137]:
# Function to laod a case es
def load_case_node(state):
    # read file
    df = pd.read_csv(r"C:\Users\USER\Desktop\data set\lending club data set\data\qa_cases.csv")

    row = df.iloc[0]

    # put it into the state
    state["case"] = {
        "case_id": row["case_id"],
        "PD": row["PD"],
        "segment": row["Action"],
        "expected_loss": row['expected_loss'],
        "segment_default_rate": row["segment_default_rate"],
        "top_risk_drivers": [
            x.strip() for x in str(row["top_risk_factors"]).split(",") if x.strip()
        ],
        'pd_vs_segment': row['pd_vs_segment']
    }

    return state

In [173]:
# Create a risk gate for auto rejects

def risk_gate(state):

    pd = state["case"]["PD"]
    segment = state["case"]["segment"]
    el = state["case"]['expected_loss']
    pd_vs_segment = state["case"]["pd_vs_segment"] 


    if pd > 0.4:
        return {"decision": {
                "ai_decision": "Reject",
                "policy_status": "High PD",
                "ai_reason": f"PD of {pd:.2%} exceeds threshold.",
                "next_step": "Auto-reject"
            },
            "decision_source": "risk_gate"
        }

    elif segment == "Review/Reject":
        return {"decision": {
                "ai_decision": "Reject",
                "policy_status": "Hard Risk Band",
                "ai_reason": "Loan is in Review/Reject segment based on PD and portfolio risk.",
                "next_step": "Auto-reject"
            },
            "decision_source": "risk_gate"
        }

    elif el > 5000:
        return {
            "decision": {
                "ai_decision": "Reject",
                "policy_status": "High Expected Loss",
                "ai_reason": f"Expected loss of {el} exceeds risk tolerance.",
                "next_step": "Decline or request collateral"
            },
            "decision_source": "risk_gate"
        }
    
    elif pd_vs_segment >= 0.1:
        return {
            "decision": {
                "ai_decision": "Reject",
                "policy_status": "PD vs Segment Rate",
                "ai_reason": f"PD is {pd_vs_segment:.2f} times the segment default rate.",
                "next_step": "Auto-reject"
            },
            "decision_source": "risk_gate"
        }

    
    
    return {}  

## Risk Gate (Rule-Based Auto-Rejection Layer)

- Purpose: Acts as a fast, explainable safety filter before any LLM reasoning.

- Type: Deterministic and rule-based.

- Decision Source: When triggered, sets decision_source = "risk_gate".

### Auto-Reject Conditions

A case is automatically rejected if any of the following holds:

- High PD Rule

    - If PD > 40%, the case is rejected as “High PD.”

- Hard Risk Band Rule

    - If the loan segment is “Review/Reject,” it is automatically rejected due to portfolio risk policy.

- High Expected Loss Rule

    - If Expected Loss > 5,000, the case is rejected or flagged for collateral consideration.

- PD vs Segment Rule

    - If PD ≥ 0.1 × segment default rate, the case is rejected due to relative underperformance.

- If No Rule is Triggered

    - The case passes through to the RAG + LLM layer for policy-based reasoning.

In [42]:
# function to retrieve policy
def retrieve_policy(state):
    docs = retriever.invoke(state['case_text'])
    policy = "\n\n".join([doc.page_content for doc in docs])
    return {"policy":policy}

In [139]:
# function to run the prompt
def run_qa(state):
    prompt = f"""
You are credit QA analyst.

Use ONLY the retrieved Bank Policy below to justify your decision.
If the policy does not clearly support your decision, choose "Manual Review".

Bank Policy:
{state['policy']}

Loan_case:
{state['case_text']}

Important decision logic:
- if pd is **below 30% and below the segment default rate**, and no other risk factors are present, the loan can be **Approved**.
- if pd is **below 30% and above the segment default rate**, but other risk factors are present, the loan should be **Approved with Caution**.
- Use **Reject** only when absolutely necessary, providing clear justifications based on policy.
- Use **Manual Review** when the case is borderline or requires human judgment.

Your "ai_decision" must be one of the following:
- Approved
- Approved with Caution
- Manual Review
- Reject

Return JSON:
    {{
      "ai_decision": "...",
      "policy_status": "...",
      "ai_reason": "...",
      "next_step": "..."
    }}
    """

    output = llm.invoke(prompt)
    return {"llm_output": output.content if hasattr(output, "content") else output}

In [140]:
# function to parse the JSON output
def parse_json(state):
    raw = state["llm_output"]

    try:
        # Extract the JSON object from the text
        match = re.search(r"```json\s*(\{.*?\})\s*```", raw, re.DOTALL)

        if match:
            json_str = match.group(1)
        else:
            match = re.search(r"(\{.*\})", raw, re.DOTALL)
            if not match:
                raise ValueError("No JSON object found")
            json_str = match.group(1)


        # Try to load JSON
        parsed = json.loads(json_str)

    except Exception:
        parsed = {
            "ai_decision": "Parsing Error",
            "policy_status": "Unknown",
            "ai_reason": raw,
            "next_step": "Manual review"
        }

    decision_map = {
        "reject": "Reject",
        "decline": "Reject",
        'deny': "Reject",
        "auto-reject": "Reject",
        "denied": "Reject"
    }

    if isinstance(parsed.get("ai_decision"), str):
        key = parsed["ai_decision"].lower()
        if key in decision_map:
            parsed["ai_decision"] = decision_map[key]

    return {"decision": parsed, "decision_source": "llm"}

In [None]:
# function to save audit record
def save_audit(record):
    
    df = pd.DataFrame([record])

    if os.path.exists("qa_decision_audit.csv"):
        df.to_csv("qa_decision_audit.csv", mode="a", header=False, index=False)
    else:
        df.to_csv("qa_decision_audit.csv", index=False)

In [176]:
# audit node for saving audit records
def audit_node(state):

    decision = state.get("decision", {
        "ai_decision": "Missing",
        "policy_status": "Unknown",
        "ai_reason": "No decision generated",
        "next_step": "Manual review",
    })


    audit_record = {
        "case_id": state['case']['case_id'],
        "pd":state['case']['PD'],
        "segment":state['case']['segment'],
        "segment_default_rate":state['case']['segment_default_rate'],
        "top_risk_drivers":state['case']['top_risk_drivers'],

        "ai_decision":decision['ai_decision'],
        "policy_status":decision['policy_status'],
        "ai_reason":decision['ai_reason'],
        "next_step":decision['next_step'],

        "decision_source": state.get(
            "decision_source",
            "llm" if "decision" in state else "unknown"
        ),

        "timestamp": now(),
        "model_version": "gpt-4o-mini",
        "graph_version": " 1.0.1"
              }
    save_audit(audit_record)

    return state

In [177]:
# Build the langgraph work flow
graph = StateGraph(QAState)

graph.add_node("load_case",load_case_node)
graph.add_node("risk_gate",risk_gate)
graph.add_node("retrieve",retrieve_policy)
graph.add_node("qa",run_qa)
graph.add_node("parse",parse_json)
graph.add_node("audit",audit_node)

graph.set_entry_point("load_case")
graph.add_edge("load_case","risk_gate")
graph.add_edge("risk_gate","retrieve")
graph.add_edge("retrieve","qa")
graph.add_edge("qa","parse")
graph.add_edge("parse","audit")

app = graph.compile()

qa

In [178]:
# Creat a QA loop with the first 10 cases
results = []

for _, case in qa_cases.head(10).iterrows():

    clean_risk_factors = [
        x.strip() for x in str(case.top_risk_factors).split(",") if x.strip()
    ]

    case_text = f"""
    case_id: {case.case_id}
    PD: {case.PD:.2%}
    LGD : {case.LGD:.2%}
    Expected Loss : {case.expected_loss:.2}$
    Decision Segment: {case.Action}
    Segment default rate: {case.segment_default_rate:.2%}
    Top risk drivers: {clean_risk_factors}
    """

    # Retrieve relevant policy chunks
    out = app.invoke({"case_text": case_text})

    decision = out['decision']

    results.append({
        "case_id": case.case_id,
        "pd": case.PD,
        "segment": case.Action,
        "segment_default_rate": case.segment_default_rate,
        "Top Risk Drivers": clean_risk_factors,
        "ai_decision": decision["ai_decision"],
        "policy_status": decision["policy_status"],
        "next_step": decision["next_step"],
        "ai_reason": decision["ai_reason"]
    })

# AI Decision Table

The table below shows the **LLM-generated Credit Quality Assurance (QA) decisions** for a sample of high-risk and borderline loan applications.

- **Purpose**: Records the final AI-assisted credit decision for each case.

- **Content**: Combines loan risk metrics with the system’s recommendation.

- **Key fields include**:

    - `case_id` – Unique loan identifier

    - `pd` – Model-estimated Probability of Default

    - `segment` – Risk segment from policy rules

    - `segment_default_rate` – Benchmark default rate for the segment

    - `Top Risk Drivers` – Main features influencing the PD model

    - `ai_decision` – Final AI recommendation (Approve / Approve with Caution / Manual Review / Reject)

    - `policy_status` – High-level compliance assessment

    - `ai_reason` – Human-readable justification based on policy and risk

    - `next_step` – Recommended action for operations or QA team

In [167]:
# Create a dataframe to view the results
ai_decision = pd.DataFrame(results)
ai_decision.index = ai_decision.index +1
ai_decision

Unnamed: 0,case_id,pd,segment,segment_default_rate,Top Risk Drivers,ai_decision,policy_status,next_step,ai_reason
1,1452671,0.5319,Review/Reject,0.3418,"[int_rate, dti, acc_open_past_24mths, term, home_ownership_RENT]",Reject,Very High Risk,Reject the loan application based on the very high risk associated with the PD.,"The PD of 53.19% falls into the 'Very High Risk' category as it is greater than 30%. According to the bank policy, this level of PD indicates a significant risk, and the decision hierarchy suggests that such cases should be rejected unless there are compelling reasons to approve them, which are not present in this case."
2,464451,0.268195,QA,0.1617,"[int_rate, avg_cur_bal, application_type_Joint App, acc_open_past_24mths, loan_to_income]",Manual Review,PD is below 30% and below the segment default rate,Conduct a manual review to assess the impact of the top risk drivers on the overall credit risk.,"The PD is 26.82%, which is below the 30% threshold, and it is also below the segment default rate of 16.17%. However, the presence of other risk factors (top risk drivers) necessitates a more thorough evaluation."
3,809706,0.288979,QA,0.1617,"[int_rate, avg_cur_bal, term, dti, acc_open_past_24mths]",Approved with Caution,Manual Review,Proceed with caution and monitor the loan closely.,"The PD is below 30% but above the segment default rate, and there are other risk factors present."
4,1538286,0.099637,QA,0.1617,"[int_rate, revol_util, dti, acc_open_past_24mths, inq_last_6mths]",Approved with Caution,Manual Review,Proceed with caution and consider additional risk assessments.,"The PD of 9.96% is below 30% but above the segment default rate of 16.17%. Additionally, there are other risk factors present, which necessitates a cautious approach."
5,822739,0.366241,Review/Reject,0.3418,"[dti, acc_open_past_24mths, avg_cur_bal, bc_util, loan_to_income]",Manual Review,PD ≥ 30%,Conduct a manual review to assess the overall credit risk and determine the appropriate action.,"The PD is 36.62%, which is above the 30% threshold, indicating a higher credit risk. However, the decision requires further evaluation due to the presence of other risk factors and the segment default rate being close to the PD."
6,549665,0.229531,QA,0.1617,"[int_rate, total_rev_hi_lim, loan_to_income, acc_open_past_24mths, purpose_home_improvement]",Manual Review,PD is below 30% and below the segment default rate,Conduct a manual review to assess the impact of the top risk drivers on the overall credit risk.,"The PD is 22.95%, which is below the 30% threshold, and it is also below the segment default rate of 16.17%. However, the presence of other risk factors (top risk drivers) necessitates a more thorough evaluation."
7,521926,0.083505,QA,0.1617,"[int_rate, avg_cur_bal, loan_to_income, acc_open_past_24mths, mths_since_last_delinq]",Manual Review,"PD is below 30% and below the segment default rate, but other risk factors are present.",Submit for human judgment to assess the overall risk.,"The PD is 8.35%, which is below the 30% threshold, and it is also below the segment default rate of 16.17%. However, the presence of other risk drivers indicates that the case may require further evaluation."
8,383370,0.230352,QA,0.1617,"[loan_to_income, term, int_rate, avg_cur_bal, mths_since_last_delinq]",Manual Review,"PD is below 30% and below the segment default rate, but other risk factors are present.",Conduct a manual review to assess the overall credit risk considering the additional risk factors.,"The PD is 23.04%, which is below the 30% threshold, and it is also below the segment default rate of 16.17%. However, there are other risk factors indicated, such as loan_to_income and mths_since_last_delinq, which necessitate a more thorough evaluation."
9,707053,0.1442,QA,0.1617,"[int_rate, application_type_Joint App, avg_cur_bal, loan_to_income, home_ownership_MORTGAGE]",Manual Review,"PD is below 30% and below the segment default rate, but other risk factors are present.",Conduct a manual review to assess the overall risk and make a final decision.,"The PD is 14.42%, which is below the 30% threshold, and it is also below the segment default rate of 16.17%. However, there are other risk factors indicated, such as high interest rates and joint application type, which necessitate a more thorough evaluation."
10,1629238,0.397524,Review/Reject,0.3418,"[int_rate, dti, revol_util, home_ownership_RENT, term]",Reject,Very High Risk,Inform the applicant of the rejection and provide justification based on the PD.,"The PD of 39.75% falls into the 'Very High Risk' category as it is greater than or equal to 30%. According to the bank policy, this risk band determines the base decision path and does not override policy limits. Therefore, the loan must be rejected."


Each row represents a loan that was flagged by the machine-learning risk model and credit policy rules.  
Using **Retrieval-Augmented Generation (RAG)**, the LLM reviewed the relevant credit policy documents and produced:

- An **AI decision** (e.g. Reject, Approve, or Route to QA)  
- A **policy compliance status**  
- A **recommended next action**  
- A **human-readable explanation** linking the borrower’s risk profile to the bank’s credit policy  

This output demonstrates how large language models can be used as a **governed decision-support layer** on top of traditional credit-risk models, enabling faster, more consistent, and more explainable QA reviews.

In [169]:
# Save the AI decisions to a CSV file
ai_decision.to_csv("ai_decisions.csv", index=False)

In [153]:
pip freeze > requirements-rag.txt

Note: you may need to restart the kernel to use updated packages.


# Audit Table (Governance & Traceability Layer)
Purpose of the audit table:

- Provides a governance record of every decision made by the system.

- Ensures traceability, transparency, and reproducibility of AI-assisted decisions.

- Enables later review, validation, and model monitoring.

Key information logged:

- `case_id` – Unique loan identifier

- `pd` – Probability of Default used in the decision

- `segment` – Risk segment from policy rules

- `segment_default_rate` – Benchmark default rate

- `top_risk_factors` – Key drivers of the PD model

- `ai_decision` – Final system recommendation

- `policy_status` – High-level compliance assessment

- `ai_reason` – Explanation of the decision

- `next_step` – Recommended operational action

- `decision_source` – Whether the decision came from:

    - "risk_gate" (rule-based) or

    - "llm" (RAG + LLM reasoning)

- `timestamp` – When the decision was made

- `model_version` – LLM version used

- `graph_version` – Workflow version

In [179]:
# audit table
audit_df = pd.read_csv(r"C:\Users\USER\Desktop\data set\lending club data set\data\qa_decision_audit.csv")
audit_df.head()

Unnamed: 0,case_id,pd,segment,segment_default_rate,top_risk_drivers,ai_decision,policy_status,ai_reason,next_step,decision_source,timestamp,model_version,graph_version
0,1452671,0.5319,Review/Reject,0.3418,"['int_rate', 'dti', 'acc_open_past_24mths', 'term', 'home_ownership_RENT']",Reject,Very High Risk,"The PD of 53.19% falls into the 'Very High Risk' category as it is greater than 30%. According to the bank policy, this level of PD indicates a significant risk, and the decision hierarchy suggests that such cases should be rejected unless there are compelling reasons to approve them, which are not present in this case.",Reject the loan application based on the very high risk associated with the PD.,llm,2026-01-16T05:23:46.042506,gpt-4o-mini,1.0.1
1,1452671,0.5319,Review/Reject,0.3418,"['int_rate', 'dti', 'acc_open_past_24mths', 'term', 'home_ownership_RENT']",Manual Review,PD is below 30% and below the segment default rate,"The PD is 26.82%, which is below the 30% threshold, and it is also below the segment default rate of 16.17%. However, the presence of other risk factors (top risk drivers) necessitates a more thorough evaluation.",Conduct a manual review to assess the impact of the top risk drivers on the overall credit risk.,llm,2026-01-16T05:23:49.603085,gpt-4o-mini,1.0.1
2,1452671,0.5319,Review/Reject,0.3418,"['int_rate', 'dti', 'acc_open_past_24mths', 'term', 'home_ownership_RENT']",Approved with Caution,Manual Review,"The PD is below 30% but above the segment default rate, and there are other risk factors present. This aligns with the decision logic for 'Approved with Caution'.",Proceed with caution and monitor the loan closely.,llm,2026-01-16T05:23:51.731782,gpt-4o-mini,1.0.1
3,1452671,0.5319,Review/Reject,0.3418,"['int_rate', 'dti', 'acc_open_past_24mths', 'term', 'home_ownership_RENT']",Approved with Caution,Validated,"The PD of 9.96% is below 30% but above the segment default rate of 16.17%. Additionally, there are other risk factors present, which necessitates caution in the approval.",Proceed with caution and monitor the loan closely.,llm,2026-01-16T05:23:53.542929,gpt-4o-mini,1.0.1
4,1452671,0.5319,Review/Reject,0.3418,"['int_rate', 'dti', 'acc_open_past_24mths', 'term', 'home_ownership_RENT']",Manual Review,PD ≥ 30%,"The PD is 36.62%, which is above the 30% threshold, indicating a higher credit risk. However, the decision requires further evaluation due to the presence of other risk factors and the segment default rate being close to the PD.",Conduct a manual review to assess the overall credit risk and determine the appropriate action.,llm,2026-01-16T05:23:57.191657,gpt-4o-mini,1.0.1


# Conclusion

This notebook demonstrates a practical implementation of an LLM-powered Credit Quality Assurance (QA) system that integrates machine learning–based credit risk assessment with bank policy documents through Retrieval-Augmented Generation (RAG). The system combines a transparent, rule-based risk gate with an LLM reasoning layer, ensuring that clearly unacceptable cases are filtered out while more nuanced cases benefit from policy-aware AI support.

Future improvements could include more rigorous evaluation of RAG retrieval quality, additional post-LLM sanity checks, and performance monitoring to further strengthen reliability and compliance.