## 🧠 AI Agent for Loan Data Insights using OpenAI API

This notebook demonstrates how to build an AI-powered data assistant that answers natural-language questions about loan applicants using OpenAI's GPT-4o model and pandas. The agent loads a structured dataset and answers a few common questions (average loan amount, top income earner, number of self-employed applicants) directly with pandas; any other question is passed to the model together with a summary of the dataset's columns.


In [6]:
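# Import necessary libraries
import os

from openai import OpenAI

# Initialize the OpenAI client; the key is read from the OPENAI_API_KEY
# environment variable instead of being hardcoded in the notebook
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])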


In [2]:
# Load the dataset
import pandas as pd

df = pd.read_csv('loan_prediction.csv')

# Display basic info
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 614 entries, 0 to 613
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Loan_ID            614 non-null    object 
 1   Gender             601 non-null    object 
 2   Married            611 non-null    object 
 3   Dependents         599 non-null    object 
 4   Education          614 non-null    object 
 5   Self_Employed      582 non-null    object 
 6   ApplicantIncome    614 non-null    int64  
 7   CoapplicantIncome  614 non-null    float64
 8   LoanAmount         592 non-null    float64
 9   Loan_Amount_Term   600 non-null    float64
 10  Credit_History     564 non-null    float64
 11  Property_Area      614 non-null    object 
 12  Loan_Status        614 non-null    object 
dtypes: float64(4), int64(1), object(8)
memory usage: 62.5+ KB
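
The non-null counts in this output fall short of 614 for several columns (for example `LoanAmount` and `Credit_History`), so the dataset contains missing values. As a minimal sketch, the gaps can be counted per column with `isna().sum()`. The small frame below is made-up data standing in for `loan_prediction.csv`, not rows from the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for loan_prediction.csv (illustrative values only)
sample = pd.DataFrame({
    "Loan_ID": ["A1", "A2", "A3", "A4"],
    "Self_Employed": ["No", "Yes", None, "No"],
    "LoanAmount": [100.0, np.nan, 200.0, 150.0],
})

# Count missing values per column
missing = sample.isna().sum()
print(missing)

# Series.mean() skips NaN by default, so the average uses only known amounts
mean_amount = sample["LoanAmount"].mean()
print(mean_amount)
```

Because `mean()` ignores missing entries, an average computed on `LoanAmount` reflects only the applicants whose amount is recorded.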


In [3]:
# Function to generate a summary of the dataset structure
def create_data_summary(df):
    summary = f"The dataset has {df.shape[0]} rows and {df.shape[1]} columns.\n"
    summary += "Columns:\n"
    for col in df.columns:
        summary += f"- {col} (type: {df[col].dtype})\n"
    return summary
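
The summary produced by `create_data_summary` lists only column names and dtypes. One possible extension, sketched here under assumptions, also reports missing-value counts and basic statistics for numeric columns. The name `create_detailed_summary` and the sample rows are hypothetical and not part of the original notebook.

```python
import pandas as pd

def create_detailed_summary(df):
    """Hypothetical variant of create_data_summary that also reports
    missing-value counts and basic statistics for numeric columns."""
    lines = [
        f"The dataset has {df.shape[0]} rows and {df.shape[1]} columns.",
        "Columns:",
    ]
    for col in df.columns:
        missing = int(df[col].isna().sum())
        line = f"- {col} (type: {df[col].dtype}, missing: {missing})"
        if pd.api.types.is_numeric_dtype(df[col]):
            line += (
                f", mean: {df[col].mean():.2f}"
                f", min: {df[col].min()}, max: {df[col].max()}"
            )
        lines.append(line)
    return "\n".join(lines)

# Made-up rows for illustration, not taken from loan_prediction.csv
sample = pd.DataFrame({
    "Loan_ID": ["A1", "A2", "A3"],
    "ApplicantIncome": [3000, 5000, 4000],
})
detailed = create_detailed_summary(sample)
print(detailed)
```

A summary like this gives a language model actual figures to reason about, at the cost of a longer prompt.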

In [4]:
# Define the AI Agent logic with conditional handling for specific queries
def ai_agent(user_query, df):
    # Match keywords case-insensitively, but keep the original text for the model
    query = user_query.lower()

    if "average loan amount" in query:
        # mean() skips missing LoanAmount values
        avg_loan = df['LoanAmount'].mean()
        return f"The average loan amount applied for by all applicants is ₹{avg_loan:.2f} thousand."

    elif "highest applicant income" in query:
        max_income = df['ApplicantIncome'].max()
        # If several applicants share the maximum, report the first one
        top_applicant = df[df['ApplicantIncome'] == max_income]['Loan_ID'].iloc[0]
        return f"The applicant with the highest income has ₹{max_income} and their Loan ID is {top_applicant}."

    elif "self-employed" in query or "self employed" in query:
        # Safely count 'Yes' values in 'Self_Employed' column, even with missing data
        self_employed_count = df['Self_Employed'].value_counts().get('Yes', 0)
        return f"The number of applicants who are self-employed is {self_employed_count}."

    else:
        # Fall back to GPT-4o for any other query; the model receives only the
        # schema summary (column names and dtypes), not the rows themselves
        data_context = create_data_summary(df)
        prompt = f"""
        You are a data expert AI agent.
        Here is a dataset summary:
        {data_context}
        User's question: '{user_query}'
        Think step-by-step and give a clear final answer.
        """
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.2,
            max_tokens=500
        )
        return response.choices[0].message.content


In [5]:
# Launch the interactive loop for querying the AI agent
print("Welcome to Loan Review AI Agent!")
print("You can ask anything about the loan applicants data.")
print("Type 'exit' to quit.")

while True:
    user_input = input("\nYour question: ")
    if user_input.strip().lower() == "exit":
        break
    response = ai_agent(user_input, df)
    print("\nAI Agent Response:")
    print(response)

Welcome to Loan Review AI Agent!
You can ask anything about the loan applicants data.
Type 'exit' to quit.

Your question: What is the average loan amount applied for by all applicants?

AI Agent Response:
The average loan amount applied for by all applicants is ₹146.41 thousand.

Your question: Who has the highest applicant income?

AI Agent Response:
The applicant with the highest income has ₹81000 and their Loan ID is LP002317.

Your question: How many applicants are self-employed?

AI Agent Response:
The number of applicants who are self-employed is 82.

Your question: exit


## 📌 Notes

- This is a demo project built with OpenAI GPT-4o and pandas.
- Queries are routed by basic keyword matching; more advanced intent recognition can be added.
- It can be extended to other datasets and deployed as a Streamlit app.
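
One small step beyond plain keyword matching is a table of regular expressions mapped to handler functions. The sketch below is a possible structure, not the notebook's implementation: `INTENTS`, `route`, the handler names, and the sample rows are all hypothetical.

```python
import re
import pandas as pd

def average_loan(df):
    return f"Average loan amount: {df['LoanAmount'].mean():.2f}"

def self_employed_count(df):
    count = int((df["Self_Employed"] == "Yes").sum())
    return f"Self-employed applicants: {count}"

# Hypothetical intent table: each regex pattern maps to a handler
INTENTS = [
    (re.compile(r"\b(average|mean)\b.*\bloan\b"), average_loan),
    (re.compile(r"\bself[- ]?employed\b"), self_employed_count),
]

def route(query, df):
    """Return the first matching handler's answer, or None if no intent matches."""
    normalized = query.lower()
    for pattern, handler in INTENTS:
        if pattern.search(normalized):
            return handler(df)
    return None  # the caller can fall back to the language model here

# Made-up rows for illustration, not taken from loan_prediction.csv
sample = pd.DataFrame({
    "LoanAmount": [100.0, 200.0],
    "Self_Employed": ["Yes", "No"],
})
print(route("What is the mean loan size?", sample))
print(route("How many are self employed?", sample))
print(route("Tell me a joke", sample))
```

Adding a new question type then means adding one pattern and one handler, rather than another `elif` branch.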

## 🤝 Contributions
Feel free to fork this project or raise issues if you'd like to improve or extend it!
