#  Applied Task – Loan Approval Decision Agent

## Hybrid Rule-Based + LLM Explanation System

⚠️ Important
This is NOT a prompting exercise.
This is a system design and rule-engine exercise.

The LLM must NEVER decide the loan.
Your Python logic must decide first.

---

#  LEVEL 1 – Basic Hybrid Decision Engine

##  Objective

Build a simple Loan Approval Decision System that:

* Extracts loan amount from user input
* Matches it against predefined rules
* Selects one rule
* Uses LLM only to explain the decision

---

# 🏗 Required Architecture (Level 1)

User Input

↓

Extract amount

↓

Match rule condition

↓

Select first matching rule

↓

LLM explanation

↓

Structured final output


---

#  Step 1 – Create Loan Rules CSV (Level 1)

Your CSV must contain:

* section
* rule_description
* condition
* decision
* risk_level

### Example Rules

| section | rule_description         | condition       | decision      | risk_level |
| ------- | ------------------------ | --------------- | ------------- | ---------- |
| Loan    | Small loan auto approved | amount <= 5000  | APPROVED      | Low        |
| Loan    | Medium loan needs review | amount <= 20000 | MANUAL_REVIEW | Medium     |
| Loan    | High loan rejected       | amount > 20000  | REJECTED      | High       |

⚠️ Rules must support only:

* amount
* single comparison operator
* one numeric value

No AND.
No credit score.
Keep it simple.
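
As a rough sketch, the example rules above could be written to a CSV with the standard library and checked against the Level 1 restrictions before they are saved. The constant names and the helpers `is_level1_condition` and `write_rules_csv` are illustrative assumptions, not part of the assignment.

```python
import csv
import re

# Column order required by the assignment.
FIELDS = ["section", "rule_description", "condition", "decision", "risk_level"]

# The three example rules from the table above.
EXAMPLE_RULES = [
    ("Loan", "Small loan auto approved", "amount <= 5000", "APPROVED", "Low"),
    ("Loan", "Medium loan needs review", "amount <= 20000", "MANUAL_REVIEW", "Medium"),
    ("Loan", "High loan rejected", "amount > 20000", "REJECTED", "High"),
]

# Level 1 shape: "amount", one comparison operator, one numeric value.
LEVEL1_CONDITION = re.compile(r"amount\s*(<=|>=|<|>)\s*\d+(\.\d+)?")


def is_level1_condition(condition: str) -> bool:
    """Return True if the condition is a single comparison on `amount`."""
    return LEVEL1_CONDITION.fullmatch(condition.strip()) is not None


def write_rules_csv(path: str) -> None:
    """Write the example rules to `path`, rejecting any non-Level-1 condition."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(FIELDS)
        for row in EXAMPLE_RULES:
            if not is_level1_condition(row[2]):
                raise ValueError(f"Not a Level 1 condition: {row[2]!r}")
            writer.writerow(row)
```

Validating at write time means a rule containing AND or a second variable fails loudly instead of being half-parsed later.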

---

#  Step 2 – Extract Loan Amount

Example input:

"I need a loan of 15000 dollars."

Your system must:

* Extract 15000
* Assign to variable: amount

Hint:
Use regex to extract first numeric value.
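
One possible reading of the hint, as a minimal sketch: search for the first run of digits, allowing an optional `$` and thousands separators. The name `first_number` and the exact pattern are assumptions made for illustration.

```python
import re
from typing import Optional

# Optional "$", then digits with optional thousands separators and decimals.
AMOUNT_PATTERN = re.compile(r"\$?\s*(\d[\d,]*(?:\.\d+)?)")


def first_number(text: str) -> Optional[float]:
    """Return the first numeric value in `text`, or None if there is none."""
    match = AMOUNT_PATTERN.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group(1).replace(",", ""))
```

Returning `None` rather than raising lets the caller decide how to report an input with no amount.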

---

#  Step 3 – Implement Safe Condition Matching

You must support:

* <=
* <
* >
* >=

Example:

Condition: amount <= 5000

If amount = 4000 → True
If amount = 8000 → False

⚠️ Do NOT use eval().

Implement manual comparison logic.
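
A minimal sketch of manual comparison, assuming conditions always have the Level 1 shape `amount <operator> <number>`. It maps each operator symbol to a function from the standard `operator` module; the name `condition_holds` is hypothetical.

```python
import operator
import re

# Map each supported operator symbol to a plain Python comparison function.
OPERATORS = {
    "<=": operator.le,
    "<": operator.lt,
    ">=": operator.ge,
    ">": operator.gt,
}

# Two-character operators come first so "<=" is not read as "<".
CONDITION_PATTERN = re.compile(r"amount\s*(<=|>=|<|>)\s*(\d+(?:\.\d+)?)")


def condition_holds(condition: str, amount: float) -> bool:
    """Evaluate a condition such as "amount <= 5000" without eval()."""
    match = CONDITION_PATTERN.fullmatch(condition.strip())
    if match is None:
        raise ValueError(f"Unsupported condition: {condition!r}")
    symbol, threshold = match.group(1), float(match.group(2))
    return OPERATORS[symbol](amount, threshold)
```

Because only whitelisted symbols reach the lookup table, arbitrary text in the CSV can never be executed as code.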

---

#  Step 4 – Rule Selection Strategy (Level 1)

Loop through rules in order.

The first rule that returns True wins.

Stop checking after match.
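
The loop can be sketched as follows, with rules represented as hypothetical `(description, predicate, decision)` tuples whose thresholds mirror the Level 1 example table. This is an illustration of the strategy, not the required data structure.

```python
from typing import Callable, Optional

# Each rule pairs a predicate on the amount with the decision it yields.
Rule = tuple[str, Callable[[float], bool], str]

RULES: list[Rule] = [
    ("Small loan auto approved", lambda a: a <= 5000, "APPROVED"),
    ("Medium loan needs review", lambda a: a <= 20000, "MANUAL_REVIEW"),
    ("High loan rejected", lambda a: a > 20000, "REJECTED"),
]


def first_match(amount: float) -> Optional[Rule]:
    """Return the first rule whose predicate is True, or None."""
    for rule in RULES:
        if rule[1](amount):
            return rule  # stop at the first match
    return None
```

An amount of 4000 satisfies both of the first two predicates, so the order of `RULES` is what makes it APPROVED rather than MANUAL_REVIEW.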

---

#  Step 5 – LLM Explanation Layer

After selecting the rule:

Send to LLM:

* User input
* Selected rule
* Decision
* Risk level

Required Output Format:


Decision: <APPROVED / REJECTED / MANUAL_REVIEW>

Risk Level: <Low / Medium / High>

Reasoning: <Short explanation>


⚠️ The LLM must not change the decision.
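
One way to guarantee this, sketched under the assumption that the model is wrapped in a function taking a prompt and returning text: copy the decision and risk level from the selected rule into the output and use the model's text only for the Reasoning line. `format_output` and `stub_llm` are hypothetical names, and the stub stands in for a real model call.

```python
from typing import Callable


def format_output(decision: str, risk_level: str, explain: Callable[[str], str]) -> str:
    """Build the required output; only the Reasoning line comes from the LLM."""
    prompt = (
        f"The rule engine decided {decision} with {risk_level} risk. "
        "Explain this briefly. Do not change the decision."
    )
    reasoning = explain(prompt).strip()
    # Decision and risk level are copied from the rule, never parsed from LLM text.
    return (
        f"Decision: {decision}\n"
        f"Risk Level: {risk_level}\n"
        f"Reasoning: {reasoning}"
    )


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call, so the sketch runs offline."""
    return "Decision: APPROVED. The requested amount falls in the review band."
```

Even though the stub tries to announce a different decision, the first two lines of the output still reflect what the rule engine chose.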

---

# Step 6 – Testing (Level 1)

You must test at least:

* 3 approved cases
* 3 manual review cases
* 3 rejected cases

Example:

"I need 2000"
"I want 15000 loan"
"Give me 50000"

---

# Level 1 Deliverables

1. Loan rules CSV
2. Python decision engine
3. 9 test cases
4. Half-page explanation of:

   * How rule matching works
   * Why LLM does not decide

---


In [4]:
# Mount Google Drive (if using Colab)
try:
    from google.colab import drive
    drive.mount('/content/drive')
    IN_COLAB = True
except ImportError:
    IN_COLAB = False
    print("Not running in Colab")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
!pip install bitsandbytes accelerate

Collecting bitsandbytes
  Downloading bitsandbytes-0.49.2-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)
Downloading bitsandbytes-0.49.2-py3-none-manylinux_2_24_x86_64.whl (60.7 MB)
Installing collected packages: bitsandbytes
Successfully installed bitsandbytes-0.49.2


In [93]:
# -------------------------
# 0) Setup & Imports
# -------------------------
import json
import re
import torch
import pandas as pd
from typing import Dict, Any, Tuple, Optional, List

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline


# -------------------------
# 1) Load Model (same approach)
# -------------------------
# Change these paths to match your environment
model_path = "/content/drive/MyDrive/Phi_3_5_mini_instruct"

device = "cuda" if torch.cuda.is_available() else "cpu"
print("Device:", device)

tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    local_files_only=True
)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4"
)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
    local_files_only=True
)

print("✅ Model & tokenizer loaded")

Device: cuda


Loading weights:   0%|          | 0/195 [00:00<?, ?it/s]

✅ Model & tokenizer loaded


In [78]:
df=pd.read_csv("loan_rules.csv")
print(df.head())
print("Columns:", df.columns)

          section            rule_description        condition  decision  \
0     Micro_Loans  Nano loan instant approval   amount <= 1000  APPROVED   
1     Micro_Loans          Small starter loan   amount <= 2500  APPROVED   
2  Consumer_Loans        Standard retail loan   amount <= 5000  APPROVED   
3  Consumer_Loans        Elevated retail loan   amount <= 7500  APPROVED   
4  Consumer_Loans         Premium retail loan  amount <= 10000  APPROVED   

  risk_level  
0        Low  
1        Low  
2        Low  
3        Low  
4     Medium  
Columns: Index(['section', 'rule_description', 'condition', 'decision', 'risk_level'], dtype='object')


In [79]:
# ─────────────────────────────────────────────
# STEP 1: PARSE RULES FROM YOUR DATAFRAME
# ─────────────────────────────────────────────
def parse_condition(condition_str: str) -> tuple:
    """
    Parse "amount <= 1000" → ("amount", "<=", 1000.0)
    Supports: <=, >=, <, >, =, !=
    """
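    # fullmatch rejects trailing text (e.g. an unsupported "AND ...") instead of silently ignoring it
    pattern = r"(\w+)\s*(<=|>=|!=|<|>|=)\s*(\d+(?:\.\d+)?)"
    match = re.fullmatch(pattern, condition_str.strip())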
    if not match:
        raise ValueError(f"Cannot parse condition: '{condition_str}'")
    return (match.group(1), match.group(2), float(match.group(3)))


def load_rules_from_df(rules_df: pd.DataFrame) -> list[dict]:
    """
    Convert each DataFrame row into a rule dict.
    Row order = rule priority (first match wins).

    Required columns: section, rule_description, condition, decision, risk_level
    """
    rules = []
    for idx, row in rules_df.iterrows():
        rules.append({
            "id":            idx,
            "section":       row["section"],
            "label":         row["rule_description"],
            "condition_str": row["condition"],
            "condition":     parse_condition(row["condition"]),
            "decision":      row["decision"],
            "risk_level":    row["risk_level"],
        })
    return rules



In [80]:
# ─────────────────────────────────────────────
# STEP 2: EXTRACT LOAN AMOUNT FROM USER INPUT
# ─────────────────────────────────────────────
def extract_amount(user_input: str) -> float | None:
    """
    Extract first numeric value from free-text input.
    Handles: $15,000  |  15000  |  15.5k  |  "$500"
    """
    cleaned = user_input.replace(",", "")

    # One pass: take the first number, with an optional "k" suffix ("15k" → 15000).
    # A single pattern keeps the "first numeric value" promise even when a
    # later number in the text carries the "k" shorthand.
    match = re.search(r"\$?(\d+(?:\.\d+)?)(?:\s*(k)\b)?", cleaned, re.IGNORECASE)
    if not match:
        return None
    return float(match.group(1)) * (1000 if match.group(2) else 1)



In [96]:
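# ─────────────────────────────────────────────
# STEP 3: SAFE CONDITION EVALUATOR (no eval())
# ─────────────────────────────────────────────
def evaluate_condition(amount: float, condition: tuple) -> bool: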
    """
    Manually evaluate (field, operator, threshold).
    ⚠️ Never uses eval().
    """
    _, operator, threshold = condition
    operator = operator.strip()

    if operator == "<=":
        return amount <= threshold
    elif operator == "<":
        return amount < threshold
    elif operator == ">=":
        return amount >= threshold
    elif operator == ">":
        return amount > threshold
    elif operator == "=":
        return amount == threshold
    elif operator == "!=":
        return amount != threshold
    else:
        raise ValueError(f"Unsupported operator: {operator}")

In [82]:
# ─────────────────────────────────────────────
# STEP 4: RULE SELECTION (first-match wins)
# ─────────────────────────────────────────────
def select_rule(amount: float, rules: list[dict]) -> dict | None:
    """
    Loop through rules in DataFrame order.
    Return FIRST rule whose condition evaluates to True.
    ⚠️ Python decides — NOT the LLM.
    """
    for rule in rules:
        if evaluate_condition(amount, rule["condition"]):
            return rule
    return None


In [99]:
# ─────────────────────────────────────────────
# STEP 5: LLM EXPLANATION LAYER
# ─────────────────────────────────────────────
def build_prompt(user_input: str, amount: float, rule: dict) -> str:
    return (
        "You are a professional loan officer assistant. "
        "A loan decision has already been made by our rule engine. "
        "Your ONLY job is to write a clear, professional one sentence explanation "
        "for the applicant. Do NOT change, question, or override the decision.\n\n"
        f"Applicant Request : \"{user_input}\"\n"
        f"Loan Amount       : ${amount:,.0f}\n"
        f"Loan Category     : {rule['section']}\n"
        f"Applied Rule      : {rule['label']}\n"
        f"Condition Met     : {rule['condition_str']}\n"
        f"Decision          : {rule['decision']}\n"
        f"Risk Level        : {rule['risk_level']}\n\n"
        "Explanation:"
    )


def get_llm_explanation(pipe, user_input: str, amount: float, rule: dict) -> str:
    """Call Phi-3.5-mini-instruct to generate an explanation (never the decision)."""
    messages = [{"role": "user", "content": build_prompt(user_input, amount, rule)}]
    output = pipe(messages, max_new_tokens=150, do_sample=False)

    response = output[0]["generated_text"]
    if isinstance(response, list):
        for msg in reversed(response):
            if isinstance(msg, dict) and msg.get("role") == "assistant":
                return msg["content"].strip()
    return str(response).strip()

In [84]:
# ─────────────────────────────────────────────
# STEP 6: MAIN DECISION PIPELINE
# ─────────────────────────────────────────────
def process_loan_application(pipe, user_input: str, rules: list[dict]) -> dict:
    """
    Single application pipeline:
      Extract amount → Select rule (Python) → LLM explains → Structured output
    """
    amount = extract_amount(user_input)

    if amount is None:
        return {
            "user_input":    user_input,
            "amount":        None,
            "section":       "N/A",
            "matched_rule":  "N/A",
            "condition_met": "N/A",
            "decision":      "ERROR",
            "risk_level":    "N/A",
            "reasoning":     "Could not extract a loan amount from the input.",
        }

    rule = select_rule(amount, rules)

    if rule is None:
        return {
            "user_input":    user_input,
            "amount":        amount,
            "section":       "N/A",
            "matched_rule":  "No rule matched",
            "condition_met": "N/A",
            "decision":      "REJECTED",
            "risk_level":    "High",
            "reasoning":     "No applicable rule found for this loan amount.",
        }

    return {
        "user_input":    user_input,
        "amount":        amount,
        "section":       rule["section"],
        "matched_rule":  rule["label"],
        "condition_met": rule["condition_str"],
        "decision":      rule["decision"],      # ← Set by Python rule engine
        "risk_level":    rule["risk_level"],    # ← Set by Python rule engine
        "reasoning":     get_llm_explanation(pipe, user_input, amount, rule),  # ← LLM only
    }


def print_result(result: dict):
    icons = {"APPROVED": "✅", "MANUAL_REVIEW": "🔍", "REJECTED": "❌", "ERROR": "⚠️"}
    icon  = icons.get(result["decision"], "")
    amt   = f"${result['amount']:,.0f}" if result["amount"] is not None else "N/A"
    print("=" * 65)
    print(f"  Input      : {result['user_input']}")
    print(f"  Amount     : {amt}")
    print(f"  Section    : {result['section']}")
    print(f"  Rule       : {result['matched_rule']}")
    print(f"  Condition  : {result['condition_met']}")
    print(f"  Decision   : {icon} {result['decision']}")
    print(f"  Risk Level : {result['risk_level']}")
    print(f"  Reasoning  :")
    print(f"    {result['reasoning']}")
    print("=" * 65 + "\n")



In [98]:
# ─────────────────────────────────────────────
# STEP 7: USER QUERY FUNCTION
# ─────────────────────────────────────────────

# Load rules and model once at module level
_rules = load_rules_from_df(df)

# Initialize the text generation pipeline
text_generation_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=150  # NOTE: counts prompt + generated tokens; the max_new_tokens=150 passed per call in get_llm_explanation takes precedence
)

def user_query(query: str):
    """
    Process a single loan request query.
    """
    result = process_loan_application(text_generation_pipeline, query, _rules)
    print_result(result)

Passing `generation_config` together with generation-related arguments=({'max_length'}) is deprecated and will be removed in future versions. Please pass either a `generation_config` object OR all generation parameters explicitly, but not both.


In [102]:
user_query("i need 85,000 dollars") #1

Both `max_new_tokens` (=150) and `max_length`(=150) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


  Input      : i need 85,000 dollars
  Amount     : $85,000
  Section    : Large_Loans
  Rule       : Major business loan
  Condition  : amount <= 100000
  Decision   : 🔍 MANUAL_REVIEW
  Risk Level : High
  Reasoning  :
    The loan request for $85,000 in a Major Business Loan category has been flagged for MANUAL_REVIEW due to a High risk level, despite meeting the condition of amount <= $100,000.



In [103]:
user_query("I need $800 to fix my phone.") #2



  Input      : I need $800 to fix my phone.
  Amount     : $800
  Section    : Micro_Loans
  Rule       : Nano loan instant approval
  Condition  : amount <= 1000
  Decision   : ✅ APPROVED
  Risk Level : Low
  Reasoning  :
    Your request for a $800 micro-loan for phone repairs has been approved under our Nano loan instant approval rule, as the amount requested is within our limit and deemed to carry a low risk level.



In [104]:
user_query("I'd like to borrow $7500 for home appliances.") #3



  Input      : I'd like to borrow $7500 for home appliances.
  Amount     : $7,500
  Section    : Consumer_Loans
  Rule       : Elevated retail loan
  Condition  : amount <= 7500
  Decision   : ✅ APPROVED
  Risk Level : Low
  Reasoning  :
    Your application for a $7,500 elevated retail loan for home appliances has been approved with a low risk assessment.



In [105]:
user_query("Please approve a loan of $45,000 for renovation.") #4



  Input      : Please approve a loan of $45,000 for renovation.
  Amount     : $45,000
  Section    : Large_Loans
  Rule       : Entry business loan
  Condition  : amount <= 50000
  Decision   : 🔍 MANUAL_REVIEW
  Risk Level : High
  Reasoning  :
    The loan request for $45,000 for renovation has been flagged for a MANUAL_REVIEW due to a high risk level, despite meeting the condition of being less than or equal to $50,000, as per the Entry Business Loan rule.



In [106]:
user_query("Apply for 500k loan for my company.") #5



  Input      : Apply for 500k loan for my company.
  Amount     : $500,000
  Section    : Hard_Limits
  Rule       : Excessive individual risk
  Condition  : amount > 100000
  Decision   : ❌ REJECTED
  Risk Level : High
  Reasoning  :
    The loan application for $500,000 for your company has been rejected due to an identified excessive individual risk, as the amount exceeds the 100,000 threshold and is categorized under 'Hard_Limits,' resulting in a high-risk assessment.



### How Rule Matching Works
The rule engine reads every rule directly from your DataFrame, preserving the exact row order. When a user submits a query, the system first extracts the numeric loan amount from the free-text input using regex. It then walks through the rules one by one, from row 0 downward, and evaluates each condition (such as amount <= 1000 or amount <= 25000) using a manual comparison function that checks the operator explicitly and never calls Python's eval(). As soon as a condition returns True, that rule is selected and the loop stops. Because the first match wins, row order is the priority: the rules must be sorted from the lowest threshold to the highest so that the narrowest applicable rule is reached before the broader ones. For example, $800 satisfies both amount <= 1000 and amount <= 2500, and it is assigned to the first only because that row comes first. If no rule matches at all, the system defaults to REJECTED with High risk. The result is a deterministic, fully traceable decision where every outcome can be explained by pointing to a single row in your DataFrame.

### Why the LLM Does Not Decide
The LLM's role is strictly limited to writing the explanation after the rule engine has already made the decision. By the time the model receives any input, the decision, risk level, matched rule, and condition are all finalized and passed in as fixed context. The prompt explicitly instructs the model not to change, question, or override the decision; it is only asked to translate the structured output into a professional, human-readable sentence or two. The pipeline also copies the decision and risk level into the final output directly from the matched rule, so even an explanation that contradicts the decision cannot alter it. This separation exists for two reasons. First, LLM output is not guaranteed to be stable: the same application can yield different text under different sampling settings, prompt wording, or model versions, which makes an LLM unsuitable for consistent, auditable financial decisions. Second, lending decisions generally have to be explainable and traceable to a documented rule for audit and compliance purposes, which a neural network cannot provide on its own. Keeping the LLM in an explanation-only role gives you both: reliable, rule-based decisions that can be audited, and natural-language output that communicates those decisions clearly to the applicant.


---

# LEVEL 2 – Extended Hybrid Decision Engine

⚠️ Complete Level 1 first.

---

## Objective

Upgrade your system to support:

* Multiple variables
* AND conditions
* Rule priority strategy
* Fallback handling

---

#  Required Architecture (Level 2)

User Input

↓

Extract multiple variables

↓

Parse AND conditions

↓

Evaluate rule logic

↓

Apply priority strategy

↓

Select final rule

↓

LLM explanation

↓

Structured output


---

#  Step 1 – Update Loan Rules CSV

Now support:

* amount
* credit_score
* AND conditions
* chained numeric conditions (for example, 5000 < amount <= 20000)

Example:

| section | rule_description                   | condition                             | decision      | risk_level |
| ------- | ---------------------------------- | ------------------------------------- | ------------- | ---------- |
| Loan    | Small loan auto approved           | amount <= 5000                        | APPROVED      | Low        |
| Loan    | Medium loan review                 | 5000 < amount <= 20000                | MANUAL_REVIEW | Medium     |
| Loan    | High loan with low credit rejected | amount > 20000 AND credit_score < 700 | REJECTED      | High       |
| Loan    | Very low credit score rejection    | credit_score < 600                    | REJECTED      | High       |

---

#  Step 2 – Extract Multiple Variables

Example input:

"I want a loan of 25000. My credit score is 680."

Expected:

amount = 25000
credit_score = 680

Hint:
Use keyword detection:

* If sentence contains "loan" → assign number to amount
* If contains "credit" → assign number to credit_score

If missing variable → handle safely.
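
The keyword hint could be sketched as below, assuming each variable is stated in its own sentence. `extract_by_sentence` is a hypothetical name, and a sentence that mentions both "loan" and "credit" is assigned to `credit_score` here, which is a limitation of this simple approach.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def extract_by_sentence(text: str) -> dict[str, float]:
    """Assign the first number in each sentence by the keyword that sentence contains."""
    found: dict[str, float] = {}
    # Split after ".", "!" or "?" followed by whitespace, so decimals like "2.5" stay intact.
    for sentence in re.split(r"(?<=[.!?])\s+", text.replace(",", "")):
        match = NUMBER.search(sentence)
        if match is None:
            continue  # no number in this sentence
        lowered = sentence.lower()
        if "credit" in lowered:
            found.setdefault("credit_score", float(match.group()))
        elif "loan" in lowered:
            found.setdefault("amount", float(match.group()))
    return found
```

A missing variable simply does not appear in the returned dict, so the caller can test `"credit_score" in result` before evaluating rules that need it.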

---

#  Step 3 – Implement AND Logic

Example condition:

amount > 20000 AND credit_score < 700

You must:

1. Split condition by "AND"
2. Evaluate each part separately
3. Return True only if ALL parts are True

No eval().
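
The three steps above can be sketched as a single function. This version assumes every part is a simple `field <operator> number` comparison and that all referenced variables are present (a missing one raises `KeyError`); `all_parts_hold` is a hypothetical name.

```python
import operator
import re

OPERATORS = {"<=": operator.le, "<": operator.lt, ">=": operator.ge, ">": operator.gt}
SIMPLE = re.compile(r"(\w+)\s*(<=|>=|<|>)\s*(\d+(?:\.\d+)?)")


def all_parts_hold(condition: str, variables: dict[str, float]) -> bool:
    """Split on AND and require every simple comparison to be True."""
    for part in re.split(r"\s+AND\s+", condition.strip()):
        match = SIMPLE.fullmatch(part.strip())
        if match is None:
            raise ValueError(f"Unsupported condition part: {part!r}")
        field, symbol, threshold = match.groups()
        if not OPERATORS[symbol](variables[field], float(threshold)):
            return False  # one failing part is enough to reject the rule
    return True
```

Returning on the first failing part gives the same short-circuit behaviour as Python's own `and`.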

---

#  Step 4 – Implement Rule Priority Strategy

Multiple rules may match.

You must define a strategy.

Recommended strategy:

1️⃣ Higher risk_level wins
High > Medium > Low

2️⃣ If same risk level → first match wins

Document your strategy in README.
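
The recommended strategy can be sketched as a single pass over the matching rules. The `RISK_ORDER` mapping and the `pick_rule` name are assumptions; each rule is assumed to be a dict with a `risk_level` key, and `matched` is assumed to be in rule-file order.

```python
from typing import Optional

RISK_ORDER = {"High": 3, "Medium": 2, "Low": 1}


def pick_rule(matched: list[dict]) -> Optional[dict]:
    """Pick the highest-risk rule; on equal risk the earliest match wins."""
    best = None
    for rule in matched:
        if best is None or RISK_ORDER[rule["risk_level"]] > RISK_ORDER[best["risk_level"]]:
            best = rule  # strictly greater, so ties keep the earlier rule
    return best
```

Using a strict `>` comparison is what implements the tie-break: a later rule with the same risk level never replaces an earlier one.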

---

#  Step 5 – Add Fallback Handling

If:

* No rule matches
* Missing required variable
* Input unclear

Return:

Decision: NEED_MORE_INFORMATION

Risk Level: Unknown

Reasoning: Missing required information.
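
A minimal sketch of the missing-variable check, assuming both `amount` and `credit_score` are treated as required. That is a stricter policy than the assignment demands, and naming the missing fields in the reasoning text is an addition for illustration; `fallback_if_incomplete` is a hypothetical helper.

```python
from typing import Optional

# Assumption: both variables are required before any rule is evaluated.
REQUIRED = ("amount", "credit_score")


def fallback_if_incomplete(variables: dict) -> Optional[dict]:
    """Return the fallback result if a required variable is missing, else None."""
    missing = [name for name in REQUIRED if name not in variables]
    if not missing:
        return None
    return {
        "decision": "NEED_MORE_INFORMATION",
        "risk_level": "Unknown",
        "reasoning": "Missing required information: " + ", ".join(missing) + ".",
    }
```

The fallback is produced without calling the LLM, so an unclear input can never lead to a generated decision.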


---

# Step 6 – Testing (Level 2)

You must test:

* 5 low-risk cases
* 5 medium-risk cases
* 5 high-risk cases
* 3 missing-variable cases


Example edge case:

"I need a big loan but my credit is bad."

Describe how your system behaves.

---

# Level 2 Deliverables

1. Updated Loan rules CSV
2. Updated Python decision engine
3. 15+ test cases
4. 1-page explanation covering:

   * AND logic implementation
   * Rule priority strategy
   * Fallback behavior
   * System limitations

---

# Learning Outcome

After Level 2, you should understand:

* Difference between rule engine and LLM
* Hybrid architecture design
* Safe condition parsing
* Deterministic decision systems
* Basic conflict resolution


In [107]:
df = pd.DataFrame({
    "section": ["Loan"] * 7,
    "rule_description": [
        "Small loan auto approved",
        "Medium loan with good credit approved",
        "Medium loan review",
        "High loan with good credit manual review",
        "High loan with low credit rejected",
        "Very low credit score rejection",
        "Very large loan rejection",
    ],
    "condition": [
        "amount <= 5000",
        "5000 < amount <= 20000 AND credit_score >= 700",
        "5000 < amount <= 20000",
        "amount > 20000 AND credit_score >= 700",
        "amount > 20000 AND credit_score < 700",
        "credit_score < 600",
        "amount > 100000",
    ],
    "decision": [
        "APPROVED",
        "APPROVED",
        "MANUAL_REVIEW",
        "MANUAL_REVIEW",
        "REJECTED",
        "REJECTED",
        "REJECTED",
    ],
    "risk_level": [
        "Low",
        "Low",
        "Medium",
        "Medium",
        "High",
        "High",
        "High",
    ],
})
df.head()

|   | section | rule_description                         | condition                                      | decision      | risk_level |
| - | ------- | ---------------------------------------- | ---------------------------------------------- | ------------- | ---------- |
| 0 | Loan    | Small loan auto approved                 | amount <= 5000                                 | APPROVED      | Low        |
| 1 | Loan    | Medium loan with good credit approved    | 5000 < amount <= 20000 AND credit_score >= 700 | APPROVED      | Low        |
| 2 | Loan    | Medium loan review                       | 5000 < amount <= 20000                         | MANUAL_REVIEW | Medium     |
| 3 | Loan    | High loan with good credit manual review | amount > 20000 AND credit_score >= 700         | MANUAL_REVIEW | Medium     |
| 4 | Loan    | High loan with low credit rejected       | amount > 20000 AND credit_score < 700          | REJECTED      | High       |


In [108]:
# ─────────────────────────────────────────────
# STEP 1: PARSE RULES FROM YOUR DATAFRAME
# ─────────────────────────────────────────────
CHAINED_PATTERN = re.compile(
    r"([\d.]+)\s*(<=|<|>=|>)\s*(\w+)\s*(<=|<|>=|>)\s*([\d.]+)"
)
SIMPLE_PATTERN = re.compile(
    r"(\w+)\s*(<=|>=|!=|<|>|=)\s*([\d.]+)"
)


def parse_single_condition(cond_str: str) -> dict:
    """
    Parse one condition string (no AND) into a structured dict.

    Handles two forms:
      Simple  : "amount <= 5000"          → {type: simple,  field, op, threshold}
      Chained : "5000 < amount <= 20000"  → {type: chained, field, low, low_op, high, high_op}
    """
    cond_str = cond_str.strip()

    # Try chained first: e.g. "5000 < amount <= 20000"
    m = CHAINED_PATTERN.match(cond_str)
    if m:
        return {
            "type":    "chained",
            "field":   m.group(3),
            "low":     float(m.group(1)),
            "low_op":  m.group(2),
            "high":    float(m.group(5)),
            "high_op": m.group(4),
        }

    # Try simple: e.g. "amount <= 5000"
    m = SIMPLE_PATTERN.match(cond_str)
    if m:
        return {
            "type":      "simple",
            "field":     m.group(1),
            "op":        m.group(2),
            "threshold": float(m.group(3)),
        }

    raise ValueError(f"Cannot parse condition part: '{cond_str}'")


def parse_condition(condition_str: str) -> list[dict]:
    """
    Split condition on AND, parse each part.
    Returns a list of condition dicts — ALL must be True for the rule to match.
    """
    parts = [p.strip() for p in re.split(r"\bAND\b", condition_str, flags=re.IGNORECASE)]
    return [parse_single_condition(p) for p in parts]


def load_rules_from_df(rules_df: pd.DataFrame) -> list[dict]:
    """
    Convert each DataFrame row into a rule dict.
    Row order is preserved; priority is applied later in select_rule().

    Required columns: section, rule_description, condition, decision, risk_level
    """
    rules = []
    for idx, row in rules_df.iterrows():
        rules.append({
            "id":            idx,
            "section":       row["section"],
            "label":         row["rule_description"],
            "condition_str": row["condition"],
            "conditions":    parse_condition(row["condition"]),
            "decision":      row["decision"],
            "risk_level":    row["risk_level"],
        })
    return rules


In [109]:
# ─────────────────────────────────────────────
# STEP 2: EXTRACT MULTIPLE VARIABLES
# ─────────────────────────────────────────────
def extract_variables(user_input: str) -> dict:
    """
    Extract amount and credit_score from free-text input.

    Strategy:
      - Scan all numbers and classify by the NEAREST keyword in the 40 chars before
      - "credit / score / fico" closest to the number → credit_score
      - "loan / borrow / need / want / $" closest to the number → amount
      - No keyword at all: 300 to 850 → credit_score, otherwise → amount
      - The first value found for each field is kept
    """
    cleaned   = user_input.replace(",", "")
    variables = {}

    # IGNORECASE so that "15K" is read the same way as "15k"
    num_pattern     = re.compile(r"\$?(\d+(?:\.\d+)?)\s*(k?)\b", re.IGNORECASE)
    credit_keywords = re.compile(r"credit|score|fico", re.IGNORECASE)
    loan_keywords   = re.compile(r"loan|borrow|need|want|get|apply|request|\$", re.IGNORECASE)

    def last_hit(pattern, text):
        """Index of the last keyword match in text, or -1 if there is none."""
        hits = [k.start() for k in pattern.finditer(text)]
        return hits[-1] if hits else -1

    for m in num_pattern.finditer(cleaned):
        pos = m.start(1)  # start of the digits, so a leading "$" stays inside the context
        val = float(m.group(1)) * (1000 if m.group(2).lower() == "k" else 1)
        context_before = cleaned[max(0, pos - 40): pos]

        # Compare positions so an earlier "credit score" does not capture a later loan amount
        credit_at = last_hit(credit_keywords, context_before)
        loan_at   = last_hit(loan_keywords, context_before)

        if credit_at > loan_at:
            field = "credit_score"
        elif loan_at > credit_at:
            field = "amount"
        elif 300 <= val <= 850:
            field = "credit_score"
        else:
            field = "amount"

        variables.setdefault(field, val)  # first value per field wins

    return variables



In [110]:
# ─────────────────────────────────────────────
# STEP 3: SAFE CONDITION EVALUATOR (no eval())
# ─────────────────────────────────────────────
def _apply_op(left: float, op: str, right: float) -> bool:
    """Apply a single comparison operator. No eval()."""
    ops = {
        "<=": left <= right,
        "<":  left <  right,
        ">=": left >= right,
        ">":  left >  right,
        "=":  left == right,
        "!=": left != right,
    }
    if op not in ops:
        raise ValueError(f"Unsupported operator: {op}")
    return ops[op]


def evaluate_single_condition(cond: dict, variables: dict) -> bool | None:
    """
    Evaluate one parsed condition dict against extracted variables.
    Returns:
      True / False → evaluated successfully
      None         → required variable is missing (triggers fallback)
    """
    field = cond["field"]
    if field not in variables:
        return None  # missing variable → can't evaluate

    value = variables[field]

    if cond["type"] == "simple":
        return _apply_op(value, cond["op"], cond["threshold"])

    if cond["type"] == "chained":
        # e.g. 5000 < amount <= 20000
        left_ok  = _apply_op(cond["low"],  cond["low_op"],  value)
        right_ok = _apply_op(value, cond["high_op"], cond["high"])
        return left_ok and right_ok

    raise ValueError(f"Unknown condition type: {cond['type']}")


def evaluate_rule(rule: dict, variables: dict) -> bool | None:
    """
    Evaluate ALL conditions for a rule (AND logic).
    Returns:
      True  → all conditions pass
      False → at least one condition fails
      None  → a required variable is missing
    """
    for cond in rule["conditions"]:
        result = evaluate_single_condition(cond, variables)
        if result is None:
            return None   # missing variable
        if not result:
            return False  # short-circuit AND
    return True



In [111]:
# ─────────────────────────────────────────────
# STEP 4: PRIORITY STRATEGY
# ─────────────────────────────────────────────
# Strategy:
#   1. Collect ALL matching rules
#   2. Pick the rule with the HIGHEST risk level (High > Medium > Low)
#   3. Tie-break: first matching rule in DataFrame order wins

RISK_PRIORITY = {"High": 3, "Medium": 2, "Low": 1}


def select_rule(variables: dict, rules: list[dict]) -> tuple[dict | None, bool]:
    """
    Returns (best_matching_rule, needs_more_info).
    needs_more_info is only reported when no rule matched: True means at least
    one rule could not be evaluated because a variable was missing.
    """
    matched    = []
    needs_info = False

    for rule in rules:
        result = evaluate_rule(rule, variables)
        if result is True:
            matched.append(rule)
        elif result is None:
            needs_info = True

    if not matched:
        return None, needs_info

    # Stable sort: highest risk first; ties keep original DataFrame order
    matched.sort(key=lambda r: RISK_PRIORITY.get(r["risk_level"], 0), reverse=True)
    return matched[0], False



In [112]:
# ─────────────────────────────────────────────
# STEP 5: FALLBACK HANDLING
# ─────────────────────────────────────────────
FALLBACK_REASONING = (
    "We were unable to reach a decision because required information is missing or unclear. "
    "Please provide both your requested loan amount and your credit score."
)


def build_fallback(user_input: str, variables: dict) -> dict:
    return {
        "user_input":    user_input,
        "amount":        variables.get("amount"),
        "credit_score":  variables.get("credit_score"),
        "section":       "N/A",
        "matched_rule":  "N/A",
        "condition_met": "N/A",
        "decision":      "NEED_MORE_INFORMATION",
        "risk_level":    "Unknown",
        "reasoning":     FALLBACK_REASONING,
    }



In [113]:
# ─────────────────────────────────────────────
# STEP 6: LLM EXPLANATION LAYER
# ─────────────────────────────────────────────
def build_prompt(user_input: str, variables: dict, rule: dict) -> str:
    amt    = f"${variables['amount']:,.0f}" if "amount"       in variables else "N/A"
    credit = int(variables["credit_score"]) if "credit_score" in variables else "N/A"
    return (
        "You are a professional loan officer assistant. "
        "A loan decision has already been made by our rule engine. "
        "Your ONLY job is to write a clear, professional one sentence explanation "
        "for the applicant. Do NOT change, question, or override the decision.\n\n"
        f"Applicant Request : \"{user_input}\"\n"
        f"Loan Amount       : {amt}\n"
        f"Credit Score      : {credit}\n"
        f"Loan Category     : {rule['section']}\n"
        f"Applied Rule      : {rule['label']}\n"
        f"Condition Met     : {rule['condition_str']}\n"
        f"Decision          : {rule['decision']}\n"
        f"Risk Level        : {rule['risk_level']}\n\n"
        "Explanation:"
    )


def get_llm_explanation(pipe, user_input: str, variables: dict, rule: dict) -> str:
    """Call Phi-3.5-mini-instruct for explanation only — never the decision."""
    messages = [{"role": "user", "content": build_prompt(user_input, variables, rule)}]
    output   = pipe(messages, max_new_tokens=150, do_sample=False)

    response = output[0]["generated_text"]
    if isinstance(response, list):
        for msg in reversed(response):
            if isinstance(msg, dict) and msg.get("role") == "assistant":
                return msg["content"].strip()
    return str(response).strip()



In [114]:
# ─────────────────────────────────────────────
# STEP 7: MAIN DECISION PIPELINE
# ─────────────────────────────────────────────
def process_loan_application(pipe, user_input: str, rules: list[dict]) -> dict:
    """
    Full Level-2 pipeline:
      Extract variables → Evaluate all rules → Apply priority →
      Fallback if needed → LLM explains → Structured output
    """
    # 1. Extract variables from free text
    variables = extract_variables(user_input)

    # 2. Fallback: nothing could be extracted
    if not variables:
        return build_fallback(user_input, variables)

    # 3. Select best matching rule with priority strategy
    rule, needs_info = select_rule(variables, rules)

    # 4. Fallback: no rule matched or a required variable was missing
    if rule is None:
        return build_fallback(user_input, variables)

    # 5. LLM writes the explanation (never changes the decision)
    reasoning = get_llm_explanation(pipe, user_input, variables, rule)

    return {
        "user_input":    user_input,
        "amount":        variables.get("amount"),
        "credit_score":  variables.get("credit_score"),
        "section":       rule["section"],
        "matched_rule":  rule["label"],
        "condition_met": rule["condition_str"],
        "decision":      rule["decision"],      # ← Set by Python rule engine
        "risk_level":    rule["risk_level"],    # ← Set by Python rule engine
        "reasoning":     reasoning,             # ← Generated by LLM
    }


def print_result(result: dict):
    icons = {
        "APPROVED":              "✅",
        "MANUAL_REVIEW":         "🔍",
        "REJECTED":              "❌",
        "NEED_MORE_INFORMATION": "❓",
    }
    icon  = icons.get(result["decision"], "")
    # Compare against None so that a legitimate value of 0 is still printed
    amt   = f"${result['amount']:,.0f}"      if result.get("amount") is not None       else "N/A"
    score = str(int(result["credit_score"])) if result.get("credit_score") is not None else "N/A"

    print("=" * 65)
    print(f"  Input        : {result['user_input']}")
    print(f"  Amount       : {amt}")
    print(f"  Credit Score : {score}")
    print(f"  Section      : {result['section']}")
    print(f"  Rule         : {result['matched_rule']}")
    print(f"  Condition    : {result['condition_met']}")
    print(f"  Decision     : {icon} {result['decision']}")
    print(f"  Risk Level   : {result['risk_level']}")
    print("  Reasoning    :")
    print(f"    {result['reasoning']}")
    print("=" * 65 + "\n")  # same width as the opening separator



In [117]:
# ─────────────────────────────────────────────
# STEP 8: USER QUERY FUNCTION
# ─────────────────────────────────────────────

# Load rules from DataFrame
_rules = load_rules_from_df(df)

# Build pipeline from your already-loaded model and tokenizer
_pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="auto",
    max_new_tokens=150,
    do_sample=False,
)


def user_query(query: str):
    """
    Process a single loan request query end-to-end.

    Args:
        query: Natural language loan request including amount and optionally credit score.
               e.g. "I need a loan of 25000. My credit score is 680."

    Usage:
        user_query("I need a loan of 3000.")
        user_query("I want 25000 loan, my credit score is 720.")
        user_query("Apply for 50000, score is 580.")
    """
    result = process_loan_application(_pipe, query, _rules)
    print_result(result)



In [None]:
# ─────────────────────────────────────────────
# TEST CASES — run each line individually
# ─────────────────────────────────────────────

# ✅ APPROVED (3 cases)
# user_query("I need a loan of 3000 for furniture.")
# user_query("Can I borrow $1500? My credit score is 750.")
# user_query("I want a loan of 15000 and my credit score is 720.")

# 🔍 MANUAL_REVIEW (3 cases)
# user_query("I need 10000 for medical bills. My score is 650.")
# user_query("Requesting a loan of 25000, credit score is 710.")
# user_query("I want 18000 loan, score is 580.")

# ❌ REJECTED (3 cases)
# user_query("I need 30000 but my credit score is 620.")
# user_query("Apply for 50000 loan. My credit score is 550.")
# user_query("I want 200000 for real estate.")

# ❓ NEED_MORE_INFORMATION (fallback cases)
# user_query("I need a loan please.")
# user_query("My credit score is 700.")

In [119]:
user_query("I need a loan of 3000 for furniture.")

Both `max_new_tokens` (=150) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


  Input        : I need a loan of 3000 for furniture.
  Amount       : $3,000
  Credit Score : N/A
  Section      : Loan
  Rule         : Small loan auto approved
  Condition    : amount <= 5000
  Decision     : ✅ APPROVED
  Risk Level   : Low
  Reasoning    :
    Your application for a $3,000 loan for furniture has been approved based on our rule engine's criteria for small loans, with a low risk level and an approved status.



In [120]:
user_query("Can I borrow $1500? My credit score is 750.")

  Input        : Can I borrow $1500? My credit score is 750.
  Amount       : $1,500
  Credit Score : 750
  Section      : Loan
  Rule         : Small loan auto approved
  Condition    : amount <= 5000
  Decision     : ✅ APPROVED
  Risk Level   : Low
  Reasoning    :
    Your application for a $1,500 small loan has been approved based on your credit score of 750, aligning with our rule for small loans under $5,000, and is categorized as low risk.



In [121]:
user_query("I need 30000 but my credit score is 620.")

  Input        : I need 30000 but my credit score is 620.
  Amount       : $30,000
  Credit Score : 620
  Section      : Loan
  Rule         : High loan with low credit rejected
  Condition    : amount > 20000 AND credit_score < 700
  Decision     : ❌ REJECTED
  Risk Level   : High
  Reasoning    :
    The loan application for $30,000 was rejected due to a credit score below the threshold of 700, aligning with the rule for high loan amounts with low credit scores, indicating a high risk level.



In [122]:
user_query("Apply for 50000 loan. My credit score is 550.")

  Input        : Apply for 50000 loan. My credit score is 550.
  Amount       : $50,000
  Credit Score : 550
  Section      : Loan
  Rule         : High loan with low credit rejected
  Condition    : amount > 20000 AND credit_score < 700
  Decision     : ❌ REJECTED
  Risk Level   : High
  Reasoning    :
    The loan application for $50,000 was rejected due to a high risk level, as the applicant's credit score of 550 did not meet the criteria of being above 700 for such a substantial loan amount.



In [123]:
user_query("I want 200000 for real estate.")

  Input        : I want 200000 for real estate.
  Amount       : $200,000
  Credit Score : N/A
  Section      : Loan
  Rule         : Very large loan rejection
  Condition    : amount > 100000
  Decision     : ❌ REJECTED
  Risk Level   : High
  Reasoning    :
    The loan application for $200,000 in real estate has been rejected due to the high risk level associated with the very large loan amount, as per the rule engine's assessment.



In [124]:
user_query("I need a loan please.")

  Input        : I need a loan please.
  Amount       : N/A
  Credit Score : N/A
  Section      : N/A
  Rule         : N/A
  Condition    : N/A
  Decision     : ❓ NEED_MORE_INFORMATION
  Risk Level   : Unknown
  Reasoning    :
    We were unable to reach a decision because required information is missing or unclear. Please provide both your requested loan amount and your credit score.



In [125]:
user_query("My credit score is 700.")

  Input        : My credit score is 700.
  Amount       : N/A
  Credit Score : 700
  Section      : N/A
  Rule         : N/A
  Condition    : N/A
  Decision     : ❓ NEED_MORE_INFORMATION
  Risk Level   : Unknown
  Reasoning    :
    We were unable to reach a decision because required information is missing or unclear. Please provide both your requested loan amount and your credit score.



In [126]:
user_query("I want to buy new car and its price is $16500 and my score is 921")

  Input        : I want to buy new car and its price is $16500 and my score is 921
  Amount       : $16,500
  Credit Score : 921
  Section      : Loan
  Rule         : Medium loan review
  Condition    : 5000 < amount <= 20000
  Decision     : 🔍 MANUAL_REVIEW
  Risk Level   : Medium
  Reasoning    :
    Your application for a $16,500 car loan with a credit score of 921 has been flagged for a manual review due to the loan amount falling within the medium risk category as per our rule engine's assessment.



In [127]:
user_query("I need a big loan but my credit is bad.")

  Input        : I need a big loan but my credit is bad.
  Amount       : N/A
  Credit Score : N/A
  Section      : N/A
  Rule         : N/A
  Condition    : N/A
  Decision     : ❓ NEED_MORE_INFORMATION
  Risk Level   : Unknown
  Reasoning    :
    We were unable to reach a decision because required information is missing or unclear. Please provide both your requested loan amount and your credit score.



### AND Logic

Each condition string is split on the `AND` keyword into individual comparison fragments, which are evaluated one at a time. A rule matches only if every fragment returns True; a fragment that returns False dismisses the rule. A rule that references a variable missing from the input is skipped and recorded as needing more information. `eval()` is never used.
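
The splitting and evaluation described above can be sketched as follows. This is a minimal illustration, not the notebook's actual `evaluate_rule`: the names `OPS`, `FRAGMENT_RE`, `eval_fragment`, and `eval_condition` are hypothetical, and the sketch assumes every fragment has the simple form `variable operator number` (it does not handle range forms such as `5000 < amount <= 20000`).

```python
import operator
import re

# Hypothetical operator table; no eval() is involved.
OPS = {"<=": operator.le, ">=": operator.ge, "<": operator.lt, ">": operator.gt}

# Two-character operators come first so "<=" is not read as "<".
FRAGMENT_RE = re.compile(r"^\s*(\w+)\s*(<=|>=|<|>)\s*(\d+(?:\.\d+)?)\s*$")


def eval_fragment(fragment, variables):
    """Return True/False for one comparison, or None if the variable is missing."""
    m = FRAGMENT_RE.match(fragment)
    if m is None:
        raise ValueError(f"Unsupported condition fragment: {fragment!r}")
    name, op, value = m.groups()
    if name not in variables:
        return None
    return OPS[op](variables[name], float(value))


def eval_condition(condition, variables):
    """AND semantics: False if any fragment fails, None if a variable is missing."""
    missing = False
    for fragment in re.split(r"\s+AND\s+", condition.strip()):
        result = eval_fragment(fragment, variables)
        if result is False:
            return False          # one failing fragment dismisses the rule
        if result is None:
            missing = True        # keep checking; a later fragment may still fail
    return None if missing else True


cond = "amount > 20000 AND credit_score < 700"
print(eval_condition(cond, {"amount": 30000, "credit_score": 620}))  # True
print(eval_condition(cond, {"amount": 30000}))                       # None
```

In this sketch a failing fragment takes precedence over a missing variable, so a rule is only reported as "needs more information" when nothing already rules it out. That ordering is a design choice of the sketch and may differ from the notebook's implementation.
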

### Rule Priority Strategy

All rules are evaluated and every match is collected. The match with the highest risk level wins: High beats Medium, and Medium beats Low. Because the sort is stable, ties are broken by row order in the DataFrame. This ensures that the most conservative of the matching decisions prevails.
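
A small, self-contained illustration of this tie-breaking behaviour is shown below. The numeric values in `RISK_PRIORITY` and the three rule dictionaries are assumptions made for the example; only the sorting call mirrors the notebook's `select_rule`.

```python
# Assumed priority values; the notebook defines its own RISK_PRIORITY elsewhere.
RISK_PRIORITY = {"Low": 1, "Medium": 2, "High": 3}


def pick_by_priority(matched):
    """Return the highest-risk rule; equal-risk rules keep their original order."""
    if not matched:
        return None
    # sorted() is stable, and reverse=True preserves that stability,
    # so the first High rule in the input stays ahead of later High rules.
    ranked = sorted(
        matched,
        key=lambda r: RISK_PRIORITY.get(r["risk_level"], 0),
        reverse=True,
    )
    return ranked[0]


# Hypothetical matches, listed in DataFrame row order.
matched = [
    {"label": "A", "risk_level": "Medium", "decision": "MANUAL_REVIEW"},
    {"label": "B", "risk_level": "High",   "decision": "REJECTED"},
    {"label": "C", "risk_level": "High",   "decision": "REJECTED"},
]
print(pick_by_priority(matched)["label"])  # B
```

Rules with an unrecognised `risk_level` fall back to priority 0 here, so they lose to any known level.
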

### Fallback Behavior

NEED_MORE_INFORMATION is returned when no variables can be extracted from the input, or when no rule matches (including the case where rules are skipped because a variable they need is missing). In either case the LLM is not called, and the applicant receives a fixed message asking for the loan amount and credit score.
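
The control flow can be checked in isolation with a stand-in for the LLM call. The sketch below is illustrative only: `decide`, `fake_explain`, and `FALLBACK_MESSAGE` are hypothetical names, and the stub simply records whether it was invoked.

```python
FALLBACK_MESSAGE = "Please provide both your requested loan amount and your credit score."


def decide(variables, rule, explain):
    """Return a result dict; `explain` is called only when a rule matched."""
    if not variables or rule is None:
        # Fallback path: fixed message, no explanation call.
        return {"decision": "NEED_MORE_INFORMATION", "reasoning": FALLBACK_MESSAGE}
    return {"decision": rule["decision"], "reasoning": explain(rule)}


calls = []  # records every invocation of the stand-in explainer


def fake_explain(rule):
    """Stand-in for the LLM call, used only to observe whether it runs."""
    calls.append(rule["label"])
    return f"Explanation for {rule['label']}"


# No rule matched (for example, only a credit score was supplied).
result = decide({"credit_score": 700}, None, fake_explain)
print(result["decision"], len(calls))  # NEED_MORE_INFORMATION 0

# A rule matched, so the explainer runs exactly once.
rule = {"label": "Small loan auto approved", "decision": "APPROVED"}
result = decide({"amount": 3000}, rule, fake_explain)
print(result["decision"], len(calls))  # APPROVED 1
```
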

### System Limitations

Variable extraction depends on keyword proximity and can misclassify ambiguous phrasing. OR conditions and nested logic are not supported. The priority strategy always escalates to the highest-risk matching rule, which may be overly conservative for some use cases. LLM explanations are free-form text: the wording is not controlled and can misstate the reason for a decision (the $1,500 example above cites the credit score, although the matched rule tests only the amount), which may be a concern in regulated environments.
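
Where fixed wording is required, one possible mitigation is a template-based explanation built from the matched rule alone. This is a sketch of an alternative, not part of the notebook: `TEMPLATES`, `DEFAULT_TEMPLATE`, and `template_explanation` are hypothetical, and the rule dictionary is assumed to carry the same `label`, `condition_str`, `decision`, and `risk_level` keys used by the pipeline above.

```python
# Hypothetical fixed templates, one per decision value.
TEMPLATES = {
    "APPROVED": (
        "Your request was approved under the rule '{label}' "
        "({condition_str}). Risk level: {risk_level}."
    ),
    "MANUAL_REVIEW": (
        "Your request needs manual review under the rule '{label}' "
        "({condition_str}). Risk level: {risk_level}."
    ),
    "REJECTED": (
        "Your request was rejected under the rule '{label}' "
        "({condition_str}). Risk level: {risk_level}."
    ),
}
DEFAULT_TEMPLATE = "Decision {decision} was reached under the rule '{label}' ({condition_str})."


def template_explanation(rule):
    """Build an explanation from the rule's own fields; same input, same text."""
    template = TEMPLATES.get(rule["decision"], DEFAULT_TEMPLATE)
    return template.format(**rule)  # extra keys such as 'section' are ignored


rule = {
    "section": "Loan",
    "label": "Small loan auto approved",
    "condition_str": "amount <= 5000",
    "decision": "APPROVED",
    "risk_level": "Low",
}
print(template_explanation(rule))
# Your request was approved under the rule 'Small loan auto approved' (amount <= 5000). Risk level: Low.
```

The trade-off is less natural wording in exchange for text that states only what the rule engine actually checked.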