# Building an AI Agent using OpenAI API

## Overview

This comprehensive guide will teach you how to build intelligent AI agents using the OpenAI API. You'll learn to create agents that can understand data context, process natural language queries, and provide intelligent responses using GPT models.

### What You'll Learn

- How to set up and configure the OpenAI API
- How to build an AI agent that processes structured data
- How to create interactive query systems
- Best practices for prompt engineering
- Handling edge cases and errors
- Troubleshooting common issues

### What is an AI Agent?

In this guide, an AI agent is a program built around a language model that can:
- **Understand context**: Process and comprehend data structures
- **Reason**: Analyze information step-by-step
- **Respond**: Generate natural language answers to user queries
- **Interact**: Engage in conversational exchanges

### Use Cases

- Data analysis assistants
- Customer support chatbots
- Business intelligence tools
- Educational tutoring systems
- Research assistants

---

## Table of Contents

1. [Prerequisites](#prerequisites)
2. [OpenAI API Setup](#openai-api-setup)
3. [Project Setup](#project-setup)
4. [Loading and Understanding Data](#loading-and-understanding-data)
5. [Building the AI Agent](#building-the-ai-agent)
6. [Creating Interactive Interface](#creating-interactive-interface)
7. [Testing and Examples](#testing-and-examples)
8. [Advanced Features](#advanced-features)
9. [Edge Cases and Error Handling](#edge-cases-and-error-handling)
10. [Troubleshooting Guide](#troubleshooting-guide)
11. [Exercises](#exercises)
12. [Summary](#summary)


## Prerequisites

Before we begin, make sure you have:

1. **Python 3.8+** installed on your system
2. **OpenAI API Account** with access to GPT models
3. **Basic knowledge** of Python and pandas
4. **Required packages** (installed in the cell below)

### Required Python Packages

- `openai` - OpenAI Python SDK
- `pandas` - Data manipulation and analysis
- `python-dotenv` - Environment variable management (recommended)


In [None]:
# Install required packages (uncomment if needed)
# !pip install openai pandas python-dotenv


## OpenAI API Setup

### Step 1: Create an OpenAI Account

1. Visit [OpenAI's website](https://platform.openai.com/)
2. Click "Sign Up" or "Log In" if you already have an account
3. Complete the registration process

### Step 2: Get Your API Key

1. Once logged in, navigate to the **API Keys** section:
   - Click on your profile icon (top right)
   - Select "API Keys" from the dropdown menu
   - Or go directly to: https://platform.openai.com/api-keys

2. Create a new API key:
   - Click "Create new secret key"
   - Give it a name (e.g., "AI Agent Project")
   - **Important**: Copy the key immediately - you won't be able to see it again!

3. Save your API key securely:
   - Never commit API keys to version control (Git)
   - Store it in environment variables or a `.env` file
   - Keep it private and don't share it publicly

### Step 3: Understanding API Pricing

- OpenAI charges based on **tokens** used (input + output)
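- Different models have different pricing:
  - GPT-4o: More expensive but more capable
  - GPT-4o-mini: Much cheaper than GPT-4o; used by default in `ai_agent` in this guide
  - GPT-3.5-turbo: An older, affordable model, good for many simple tasks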
- Check current pricing at: https://openai.com/pricing
- Monitor your usage in the OpenAI dashboard: https://platform.openai.com/usage
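Because billing is per token, you can estimate what a request costs from the token counts the API reports on each response (`response.usage.prompt_tokens` and `response.usage.completion_tokens`). The sketch below is a minimal illustration; the per-million-token prices in the example are placeholders, not real prices, since they differ by model and change over time. Take the actual values from the pricing page.

```python
def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      input_price_per_1m: float, output_price_per_1m: float) -> float:
    """Estimate the cost of one request in US dollars.

    Prices are per 1 million tokens. Input (prompt) and output (completion)
    tokens are usually priced differently, so they are passed separately.
    """
    input_cost = prompt_tokens / 1_000_000 * input_price_per_1m
    output_cost = completion_tokens / 1_000_000 * output_price_per_1m
    return input_cost + output_cost


# Hypothetical prices for illustration only; check the pricing page for real values
cost = estimate_cost_usd(prompt_tokens=1_200, completion_tokens=400,
                         input_price_per_1m=1.00, output_price_per_1m=4.00)
print(f"Estimated cost: ${cost:.6f}")
```

With a real response you would pass `response.usage.prompt_tokens` and `response.usage.completion_tokens` as the first two arguments.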

### Step 4: Setting Up API Key in Your Code

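**Method 1: `.env` File or Environment Variable (Recommended)**

Create a `.env` file in your project directory (and add `.env` to your `.gitignore` so it is never committed), or run `export OPENAI_API_KEY='your-key-here'` in your terminal. A `.env` file contains one line: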
```
OPENAI_API_KEY=your-api-key-here
```

**Method 2: Direct Assignment (For Testing Only)**

⚠️ **Warning**: Only use this for local testing. Never commit API keys to version control!
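If you only need a key for a quick local test, a safer alternative to pasting it into the notebook is to prompt for it at runtime with Python's standard `getpass` module, which does not echo what you type. This is a small sketch; `resolve_api_key` is a hypothetical helper name, not part of the OpenAI SDK. The environment and the prompt function are parameters only so the helper can be tested without a terminal.

```python
import getpass
import os


def resolve_api_key(env=None, prompt=getpass.getpass):
    """Return OPENAI_API_KEY from the environment, or ask for it interactively.

    `env` defaults to os.environ. `prompt` defaults to getpass.getpass, which
    reads the key without displaying it.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY")
    if key:
        return key
    key = prompt("Enter your OpenAI API key: ").strip()
    if not key:
        raise ValueError("No API key provided")
    return key


# Usage in the notebook (uncomment to run):
# client = OpenAI(api_key=resolve_api_key())
```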


In [1]:
# Import required libraries
import os
import pandas as pd
from openai import OpenAI
from dotenv import load_dotenv  # Optional: for .env file support

# ============================================================================
# API KEY LOADING METHOD
# ============================================================================
# Choose one of the following methods:
#   - 'DOTENV': Load from .env file (recommended for development)
#   - 'ENV': Load from environment variables (recommended for production)
#   - 'DIRECT': Direct assignment (for testing only - NOT for production!)
# ============================================================================
LOAD_METHOD = 'DOTENV'  # Change this to 'ENV', 'DIRECT', or 'DOTENV'

# Initialize OpenAI client based on LOAD_METHOD
client = None
api_key = None

try:
    if LOAD_METHOD == 'DOTENV':
        # Method 1: Load from .env file
        # Create a .env file in your project root with: OPENAI_API_KEY=your-key-here
        load_dotenv()
        api_key = os.getenv('OPENAI_API_KEY')
        if not api_key:
            raise ValueError("API key not found in .env file. Please create a .env file with OPENAI_API_KEY=your-key")
        client = OpenAI(api_key=api_key)
        print("✅ OpenAI client initialized using .env file method")
        
    elif LOAD_METHOD == 'ENV':
        # Method 2: Load from environment variables
        # Set it in your terminal: export OPENAI_API_KEY='your-key-here'
        api_key = os.getenv('OPENAI_API_KEY')
        if not api_key:
            raise ValueError("API key not found in environment variables. Please set OPENAI_API_KEY")
        client = OpenAI(api_key=api_key)
        print("✅ OpenAI client initialized using environment variable method")
        
    elif LOAD_METHOD == 'DIRECT':
        # Method 3: Direct assignment (FOR TESTING ONLY!)
        # ⚠️ WARNING: Never commit this to version control!
        # Replace 'YOUR_OPENAI_API_KEY_HERE' with your actual API key
        api_key = 'YOUR_OPENAI_API_KEY_HERE'
        if api_key == 'YOUR_OPENAI_API_KEY_HERE':
            raise ValueError("Please replace 'YOUR_OPENAI_API_KEY_HERE' with your actual API key")
        client = OpenAI(api_key=api_key)
        print("✅ OpenAI client initialized using direct assignment method")
        print("⚠️  WARNING: Direct assignment is for testing only. Use ENV or DOTENV for production!")
        
    else:
        raise ValueError(f"Invalid LOAD_METHOD: {LOAD_METHOD}. Use 'ENV', 'DIRECT', or 'DOTENV'")
    
    print(f"Last 5 characters of API Key: {api_key[-5:]}")
        
except ValueError as e:
    print(f"❌ Configuration Error: {e}")
    print("\n📝 Setup Instructions:")
    print("  - For DOTENV: Create a .env file with OPENAI_API_KEY=your-key")
    print("  - For ENV: Run 'export OPENAI_API_KEY=your-key' in terminal")
    print("  - For DIRECT: Replace 'YOUR_OPENAI_API_KEY_HERE' with your actual key")
    raise
    
except Exception as e:
    print(f"❌ Error initializing OpenAI client: {e}")
    print("\nPlease check:")
    print("  1. Your API key is valid")
    print("  2. You have internet connection")
    print("  3. Your OpenAI account has sufficient credits")
    raise


✅ OpenAI client initialized using .env file method
Last 5 characters of API Key: OYloA


## Project Setup

### Understanding the Project Structure

We'll build an AI agent that:
1. **Loads structured data** (loan prediction dataset)
2. **Summarizes the data** to understand its structure
3. **Accepts natural language queries** from users
4. **Uses GPT models** to analyze and respond intelligently
5. **Provides conversational answers** based on the data

### Why Summarize Data Instead of Sending All Data?

- **Token Limits**: Each model has a maximum context window (e.g., 128k tokens for GPT-4o), and a dataset with thousands of rows sent in full can easily exceed it
- **Cost Efficiency**: Sending entire datasets is expensive
- **Performance**: Smaller prompts process faster
- **Focus**: Summaries help the model understand structure without overwhelming detail
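To get a feel for the difference in scale, you can roughly estimate token counts. This quick sketch uses the common rule of thumb of about four characters per token for English text. That is only an approximation, and the `tiktoken` library gives exact counts for a specific model. The small DataFrame is made-up data used purely for illustration.

```python
import pandas as pd


def rough_token_count(text: str) -> int:
    """Very rough token estimate: about 4 characters per token for English text."""
    return max(1, len(text) // 4)


# Hypothetical example data: 1,000 rows, three numeric columns
df_demo = pd.DataFrame({
    "age": range(1000),
    "annual_income": [50000.0 + i for i in range(1000)],
    "loan_amount": [10000.0 + i for i in range(1000)],
})

# Option A: send every row as CSV text
full_csv = df_demo.to_csv(index=False)

# Option B: send only a short structural summary
short_summary = (
    f"The dataset has {df_demo.shape[0]} rows and {df_demo.shape[1]} columns.\n"
    + "\n".join(f"- {c} (type: {df_demo[c].dtype})" for c in df_demo.columns)
)

print("Full CSV tokens (approx.):", rough_token_count(full_csv))
print("Summary tokens (approx.):", rough_token_count(short_summary))
```

The summary stays the same size no matter how many rows the dataset has, while the CSV grows linearly with the row count.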

### Key Concepts

**Prompt Engineering**: The art of crafting effective prompts to get desired responses from LLMs.

**Context Window**: The maximum number of tokens a model can handle in one request, counting both the prompt and the response.

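**Temperature**: Controls randomness in responses. The API accepts values from 0.0 to 2.0: values near 0.0 give focused, mostly repeatable answers, while higher values give more varied, creative ones.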

**Max Tokens**: Caps the length of the model's response, in tokens. A response that reaches the cap is cut off, possibly mid-sentence.


## Loading and Understanding Data

### About the Dataset

We'll use a loan prediction dataset that contains information about loan applicants. This dataset typically includes:
- Applicant demographics
- Financial information
- Loan details
- Loan outcome (an approval status in some versions; the CSV used here records repayment in `loan_paid_back`)

### Download the Dataset

You can download the dataset from:
- Kaggle: Search for "loan prediction dataset"
  - https://www.kaggle.com/datasets?search=load+prediction+dataset
  - https://www.kaggle.com/datasets/deeplumiere/load-pred-dataset
- Or use any CSV file with structured data for practice

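**Note**: If the CSV file isn't found, the next cell generates a synthetic sample dataset instead. The sample uses different column names from the Kaggle CSV (for example `LoanAmount` instead of `loan_amount`, and `Loan_Status` where the CSV has `loan_paid_back`), so check `df.columns` before writing column-specific code.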


In [2]:
# Load the loan prediction dataset from the data folder
import numpy as np
import os

# Path to the dataset (notebook is in src/, data is in ../data/)
data_path = '../data/Load-Prediction-Data/loan_prediction.csv'

# Check if file exists, otherwise create sample data
if os.path.exists(data_path):
    # Option 1: Load from CSV file
    df = pd.read_csv(data_path)
    print("✅ Dataset loaded from CSV file successfully!")
    print(f"   File path: {data_path}")
else:
    # Option 2: Create a sample dataset for demonstration (if file not found)
    print("⚠️  Dataset file not found. Creating sample data for demonstration...")
    print(f"   Expected path: {data_path}")
    print("   If you have the dataset, please ensure it's in the correct location.")
    
    np.random.seed(42)
    n_samples = 500

    data = {
        'Loan_ID': [f'LP{i:04d}' for i in range(1, n_samples + 1)],
        'Gender': np.random.choice(['Male', 'Female'], n_samples),
        'Married': np.random.choice(['Yes', 'No'], n_samples),
        'Dependents': np.random.choice(['0', '1', '2', '3+'], n_samples),
        'Education': np.random.choice(['Graduate', 'Not Graduate'], n_samples),
        'Self_Employed': np.random.choice(['Yes', 'No'], n_samples),
        'ApplicantIncome': np.random.randint(1500, 81000, n_samples),
        'CoapplicantIncome': np.random.randint(0, 50000, n_samples),
        'LoanAmount': np.random.randint(9, 700, n_samples),
        'Loan_Amount_Term': np.random.choice([12, 36, 60, 84, 120, 180, 240, 300, 360], n_samples),
        'Credit_History': np.random.choice([0, 1], n_samples, p=[0.2, 0.8]),
        'Property_Area': np.random.choice(['Urban', 'Rural', 'Semiurban'], n_samples),
        'Loan_Status': np.random.choice(['Y', 'N'], n_samples, p=[0.7, 0.3])
    }

    df = pd.DataFrame(data)
    print("✅ Sample dataset created successfully!")

print(f"\nDataset shape: {df.shape}")
print(f"Rows: {df.shape[0]}, Columns: {df.shape[1]}")
print("\nFirst few rows:")
df.head()


✅ Dataset loaded from CSV file successfully!
   File path: ../data/Load-Prediction-Data/loan_prediction.csv

Dataset shape: (20000, 22)
Rows: 20000, Columns: 22

First few rows:


Unnamed: 0,age,gender,marital_status,education_level,annual_income,monthly_income,employment_status,debt_to_income_ratio,credit_score,loan_amount,...,loan_term,installment,grade_subgrade,num_of_open_accounts,total_credit_limit,current_balance,delinquency_history,public_records,num_of_delinquencies,loan_paid_back
0,59,Male,Married,Master's,24240.19,2020.02,Employed,0.074,743,17173.72,...,36,581.88,B5,7,40833.47,24302.07,1,0,1,1
1,72,Female,Married,Bachelor's,20172.98,1681.08,Employed,0.219,531,22663.89,...,60,573.17,F1,5,27968.01,10803.01,1,0,3,1
2,49,Female,Single,High School,26181.8,2181.82,Employed,0.234,779,3631.36,...,60,76.32,B4,2,15502.25,4505.44,0,0,0,1
3,35,Female,Single,High School,11873.84,989.49,Employed,0.264,809,14939.23,...,36,468.07,A5,7,18157.79,5525.63,4,0,5,1
4,63,Other,Single,Other,25326.44,2110.54,Employed,0.26,663,16551.71,...,60,395.5,D5,1,17467.56,3593.91,2,0,2,1


In [3]:
# Explore the dataset
print("Dataset Information:")
print("=" * 50)
print(f"Total rows: {len(df)}")
print(f"Total columns: {len(df.columns)}")
print("\nColumn names and data types:")
print(df.dtypes)
print("\n" + "=" * 50)
print("\nBasic Statistics:")
print(df.describe())
print("\n" + "=" * 50)
print("\nMissing values:")
print(df.isnull().sum())


Dataset Information:
Total rows: 20000
Total columns: 22

Column names and data types:
age                       int64
gender                   object
marital_status           object
education_level          object
annual_income           float64
monthly_income          float64
employment_status        object
debt_to_income_ratio    float64
credit_score              int64
loan_amount             float64
loan_purpose             object
interest_rate           float64
loan_term                 int64
installment             float64
grade_subgrade           object
num_of_open_accounts      int64
total_credit_limit      float64
current_balance         float64
delinquency_history       int64
public_records            int64
num_of_delinquencies      int64
loan_paid_back            int64
dtype: object


Basic Statistics:
                age  annual_income  monthly_income  debt_to_income_ratio  \
count  20000.000000   20000.000000    20000.000000          20000.000000   
mean      48.027000   4

## Creating Data Summary Function

### Why Create a Summary?

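Instead of sending the entire dataset to the AI model (which would be expensive and could exceed the context window), we create a summary that includes:
- Dataset dimensions
- Column names and data types
- Min, max, and mean for the first five numeric columns
- Sample values for the first three categorical columns

This gives the AI agent enough context to understand the data structure. There is a trade-off: the model only sees statistics for the columns included in the summary. With this dataset it gets no figures for later numeric columns such as `loan_amount`, so for those it can only describe how to compute an answer.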
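`create_data_summary` caps the statistics at the first five numeric columns to keep the prompt short. If you want the model to see min, max, and mean for every numeric column while still keeping the prompt compact, one alternative is a single transposed `describe()` table. This is a sketch; `compact_numeric_summary` is a hypothetical helper that is not used elsewhere in this guide.

```python
import pandas as pd


def compact_numeric_summary(df: pd.DataFrame, decimals: int = 2) -> str:
    """Return min/max/mean for every numeric column as a compact text table."""
    numeric = df.select_dtypes(include="number")
    if numeric.empty:
        return "No numeric columns."
    # describe() puts statistics in rows; transpose so each column becomes a row
    stats = numeric.describe().T[["min", "max", "mean"]].round(decimals)
    return stats.to_string()


demo = pd.DataFrame({
    "age": [25, 40, 60],
    "loan_amount": [1000.0, 2500.0, 4000.0],
    "gender": ["Male", "Female", "Other"],
})
print(compact_numeric_summary(demo))
```

For the loan CSV this adds roughly one short line per numeric column, which is still tiny compared with sending the rows.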


In [4]:
def create_data_summary(df):
    """
    Create a comprehensive summary of the dataset for the AI agent.
    
    Parameters:
    -----------
    df : pandas.DataFrame
        The dataset to summarize
    
    Returns:
    --------
    str : A formatted string containing dataset summary
    """
    summary = f"The dataset has {df.shape[0]} rows and {df.shape[1]} columns.\n\n"
    
    summary += "Columns and their data types:\n"
    for col in df.columns:
        dtype = str(df[col].dtype)
        summary += f"- {col} (type: {dtype})\n"
    
    # Add some basic statistics for numeric columns
    numeric_cols = df.select_dtypes(include=[np.number]).columns
    if len(numeric_cols) > 0:
        summary += "\nNumeric columns statistics:\n"
        for col in numeric_cols[:5]:  # Limit to first 5 numeric columns
            summary += f"- {col}: min={df[col].min()}, max={df[col].max()}, mean={df[col].mean():.2f}\n"
    
    # Add value counts for categorical columns
    categorical_cols = df.select_dtypes(include=['object']).columns
    if len(categorical_cols) > 0:
        summary += "\nCategorical columns sample values:\n"
        for col in categorical_cols[:3]:  # Limit to first 3 categorical columns
            unique_vals = df[col].unique()[:5]  # First 5 unique values
            summary += f"- {col}: {', '.join(map(str, unique_vals))}\n"
    
    return summary

# Test the summary function
summary = create_data_summary(df)
print("Data Summary:")
print("=" * 60)
print(summary)


Data Summary:
The dataset has 20000 rows and 22 columns.

Columns and their data types:
- age (type: int64)
- gender (type: object)
- marital_status (type: object)
- education_level (type: object)
- annual_income (type: float64)
- monthly_income (type: float64)
- employment_status (type: object)
- debt_to_income_ratio (type: float64)
- credit_score (type: int64)
- loan_amount (type: float64)
- loan_purpose (type: object)
- interest_rate (type: float64)
- loan_term (type: int64)
- installment (type: float64)
- grade_subgrade (type: object)
- num_of_open_accounts (type: int64)
- total_credit_limit (type: float64)
- current_balance (type: float64)
- delinquency_history (type: int64)
- public_records (type: int64)
- num_of_delinquencies (type: int64)
- loan_paid_back (type: int64)

Numeric columns statistics:
- age: min=21, max=75, mean=48.03
- annual_income: min=6000.0, max=400000.0, mean=43549.64
- monthly_income: min=500.0, max=33333.33, mean=3629.14
- debt_to_income_ratio: min=0.01, ma

## Building the AI Agent

### Understanding the AI Agent Function

The AI agent function:
1. Takes a user query and the dataset
2. Creates a data summary
3. Constructs a prompt with context and query
4. Sends the prompt to OpenAI's GPT model
5. Returns the model's response
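These steps can be separated so that you can test prompt construction without calling the API. The sketch below makes two assumptions. First, `complete` is any function that takes a list of chat messages and returns the reply text, for example a thin wrapper around `client.chat.completions.create`. Second, the summary function is passed in. `build_messages`, `run_agent`, and `complete` are hypothetical names.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def build_messages(user_query: str, data_context: str) -> List[Message]:
    """Step 3: the role goes in the system message; data and question go in the user message."""
    system = "You are a data expert AI agent specialized in analyzing structured datasets."
    user = (
        f"Dataset summary:\n{data_context}\n\n"
        f"User question: {user_query}\n\n"
        "Think step-by-step and give a clear final answer. "
        "If the data cannot answer the question, explain why."
    )
    return [{"role": "system", "content": system},
            {"role": "user", "content": user}]


def run_agent(user_query: str, df, summarize: Callable,
              complete: Callable[[List[Message]], str]) -> str:
    data_context = summarize(df)                          # Step 2: summarize the data
    messages = build_messages(user_query, data_context)   # Step 3: build the prompt
    return complete(messages)                             # Steps 4-5: call the model, return text


# A real backend could look like this (requires the `client` created earlier):
# def openai_complete(messages):
#     resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
#     return resp.choices[0].message.content
#
# run_agent("What is the average loan amount?", df, create_data_summary, openai_complete)
```

Because the model call is injected, a test can pass a fake `complete` function and inspect the exact messages that would have been sent.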

### Prompt Engineering Best Practices

1. **Be Clear and Specific**: Clearly define the agent's role
2. **Provide Context**: Give enough information for the model to understand
3. **Set Expectations**: Tell the model how to structure its response
4. **Use Examples**: Show the model what kind of output you want
5. **Iterate**: Refine prompts based on results
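Using examples (practice 4) usually means placing a worked question-and-answer pair in the conversation before the real question, a technique known as few-shot prompting. The system message also sets expectations for the answer format (practice 3). This is a sketch; the example exchange is invented to show the format, and you would write examples that match your own data.

```python
def few_shot_messages(user_query: str, data_context: str) -> list:
    """Build chat messages with one illustrative example exchange before the real query."""
    return [
        {"role": "system",
         "content": "You are a data analysis assistant. Answer in two parts: "
                    "'Approach:' (the pandas operation) and 'Answer:' (one sentence)."},
        # Illustrative example pair (invented; replace with examples for your data)
        {"role": "user",
         "content": "Dataset has a column `age` (int64). Question: What is the youngest age?"},
        {"role": "assistant",
         "content": "Approach: df['age'].min()\n"
                    "Answer: The youngest applicant's age is the minimum of the `age` column."},
        # The real question, with the actual dataset summary
        {"role": "user", "content": f"{data_context}\nQuestion: {user_query}"},
    ]


msgs = few_shot_messages("What is the average loan amount?",
                         "Dataset has a column `loan_amount` (float64).")
print([m["role"] for m in msgs])
```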

### Model Parameters Explained

- **model**: Which GPT model to use (gpt-4o-mini, gpt-4o, gpt-3.5-turbo, etc.)
- **temperature**: Controls randomness (0.0-2.0)
  - 0.0-0.3: More deterministic, factual
  - 0.4-0.7: Balanced
  - 0.8-2.0: More creative, varied
- **max_tokens**: Maximum number of tokens the model may generate in its response
- **messages**: Conversation history (role: system/user/assistant)
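When a reply reaches `max_tokens`, the API stops generating and sets the choice's `finish_reason` to `"length"` instead of `"stop"`. The small sketch below flags truncated replies so they are not mistaken for complete answers. `get_reply` is a hypothetical helper, and `SimpleNamespace` stands in for a real response object (with the same attribute names) so the example runs offline.

```python
from types import SimpleNamespace


def get_reply(response) -> str:
    """Return the reply text, adding a note if max_tokens cut it off."""
    choice = response.choices[0]
    text = choice.message.content or ""
    if choice.finish_reason == "length":
        text += "\n[Response truncated: increase max_tokens or ask for a shorter answer.]"
    return text


# Offline stand-in for a chat completion response (same attribute names)
fake = SimpleNamespace(choices=[SimpleNamespace(
    finish_reason="length",
    message=SimpleNamespace(content="The average loan amount is"),
)])
print(get_reply(fake))
```

With the real client, pass the object returned by `client.chat.completions.create(...)` directly.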


In [10]:
def ai_agent(user_query, df, model="gpt-4o-mini", temperature=0.2, max_tokens=500):
    """
    AI Agent function that processes user queries about the dataset.
    
    Parameters:
    -----------
    user_query : str
        The user's question about the data
    df : pandas.DataFrame
        The dataset to analyze
    model : str, optional
        OpenAI model to use (default: "gpt-4o-mini")
    temperature : float, optional
        Model temperature (0.0-2.0), default 0.2 for more deterministic responses
    max_tokens : int, optional
        Maximum response length, default 500
    
    Returns:
    --------
    str : The AI agent's response
    """
    # Create data summary
    data_context = create_data_summary(df)
    
    # Construct the prompt
    prompt = f"""You are a data expert AI agent specialized in analyzing structured datasets.

You have been provided with this dataset summary:
{data_context}

Now, based on the user's question:
'{user_query}'

Instructions:
1. Think step-by-step about how to answer this question
2. Assume you can access and analyze the dataset like a Data Scientist would using Pandas
3. Consider what columns and operations would be needed
4. Provide a clear, concise, and accurate answer
5. If the question cannot be answered with the available data, explain why

Give a clear, final answer with your reasoning."""
    
    try:
        # Make API call to OpenAI
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": "You are a helpful data analysis assistant."},
                {"role": "user", "content": prompt}
            ],
            temperature=temperature,
            max_tokens=max_tokens
        )
        
        # Extract the answer
        answer = response.choices[0].message.content
        return answer
    
    except Exception as e:
        return f"Error: {str(e)}. Please check your API key and connection."


In [None]:
# Test the AI agent with a sample query
print("Testing AI Agent...")
print("=" * 60)
test_query = "What is the average loan amount?"
# Uncomment the lines below to run the agent
# response = ai_agent(test_query, df, model="gpt-4o-mini")
# print(f"Query: {test_query}")
# print(f"\nResponse:\n{response}")

Testing AI Agent...
Query: What is the average loan amount?

Response:
To answer the question "What is the average loan amount?", we can follow these steps:

1. **Identify the Relevant Column**: The question specifically asks about the loan amount, which corresponds to the `loan_amount` column in the dataset.

2. **Understand the Data Type**: The `loan_amount` column is of type `float64`, which means it contains numerical values that can be used for calculations.

3. **Calculate the Average**: To find the average loan amount, we will sum all the values in the `loan_amount` column and then divide by the number of entries (rows) in that column.

4. **Use Pandas for Calculation**: Assuming we have access to the dataset as a Pandas DataFrame, we can use the following code snippet to calculate the average loan amount:

   ```python
   average_loan_amount = df['loan_amount'].mean()
   ```

5. **Final Answer**: Since the dataset summary does not provide the average loan amount directly, we wo

In [11]:
def interactive_agent(df):
    """
    Interactive loop for querying the AI agent.
    
    Parameters:
    -----------
    df : pandas.DataFrame
        The dataset to query
    """
    print("=" * 60)
    print("Welcome to Loan Review AI Agent!")
    print("=" * 60)
    print("You can ask anything about the loan applicants data.")
    print("Type 'exit' or 'quit' to end the session.")
    print("Type 'help' for example questions.")
    print("=" * 60)
    
    while True:
        user_input = input("\nüí¨ Your question: ").strip()
        
        if not user_input:
            print("Please enter a question.")
            continue
        
        if user_input.lower() in ['exit', 'quit', 'q']:
            print("\nüëã Thank you for using the AI Agent. Goodbye!")
            break
        
        if user_input.lower() == 'help':
            print("\nüìù Example questions you can ask:")
            print("  - What is the average loan amount?")
            print("  - How many applicants are self-employed?")
            print("  - What is the highest applicant income?")
            print("  - How many loans were approved?")
            print("  - What is the distribution of property areas?")
            continue
        
        print("\nü§î Processing your question...")
        try:
            response = ai_agent(user_input, df)
            print("\nü§ñ AI Agent Response:")
            print("-" * 60)
            print(response)
            print("-" * 60)
        except Exception as e:
            print(f"\n❌ Error: {e}")
            print("Please try again or check your API configuration.")
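`interactive_agent` sends every question independently, so the model cannot resolve follow-ups such as "and for women?". One way to add memory is to keep the chat history and send it with each request, trimming old turns so the prompt stays within the context window. This is a sketch; `ChatHistory` and `max_turns` are hypothetical names, and in practice the assistant text would come from the API.

```python
class ChatHistory:
    """Keep a system message plus the most recent user/assistant turns."""

    def __init__(self, system_prompt: str, max_turns: int = 5):
        self.system = {"role": "system", "content": system_prompt}
        self.max_turns = max_turns
        self.turns = []  # list of (user_message, assistant_message) pairs

    def add_turn(self, user_text: str, assistant_text: str) -> None:
        self.turns.append(({"role": "user", "content": user_text},
                           {"role": "assistant", "content": assistant_text}))
        # Keep only the newest max_turns exchanges
        self.turns = self.turns[-self.max_turns:]

    def messages_for(self, new_question: str) -> list:
        """Messages to send for the next request, oldest turn first."""
        msgs = [self.system]
        for user_msg, assistant_msg in self.turns:
            msgs.extend([user_msg, assistant_msg])
        msgs.append({"role": "user", "content": new_question})
        return msgs


history = ChatHistory("You are a data analysis assistant.", max_turns=2)
for i in range(3):
    history.add_turn(f"question {i}", f"answer {i}")
msgs = history.messages_for("follow-up question")
print([m["content"] for m in msgs])
```

Inside the loop you would pass `history.messages_for(user_input)` as `messages=` to `client.chat.completions.create`, then call `history.add_turn(user_input, reply)`.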


In [None]:
# Uncomment to run interactive session
# interactive_agent(df)

Welcome to Loan Review AI Agent!
You can ask anything about the loan applicants data.
Type 'exit' or 'quit' to end the session.
Type 'help' for example questions.

🤔 Processing your question...

🤖 AI Agent Response:
------------------------------------------------------------
To answer the question "how many loans are approved?", we need to identify the relevant column in the dataset that indicates whether a loan has been approved or not. 

### Step-by-Step Analysis:

1. **Identify Relevant Column**: 
   - The dataset summary includes a column named `loan_paid_back`, which likely indicates whether a loan was successfully paid back. However, it does not explicitly state whether this column directly correlates with loan approval. Typically, a loan that is paid back would imply that it was approved, but we need to confirm this assumption.

2. **Understanding the `loan_paid_back` Column**:
   - We need to check the values in the `loan_paid_back` column. If it is a binary indicator (e

## Testing and Examples

Let's test the AI agent with various types of questions to see how it handles different query patterns.
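Each example prints the model's reply next to a value computed directly with pandas. To check the replies automatically, you can extract the numbers from the reply text and compare them with the pandas result within a tolerance. This is a rough sketch with two limitations: the regular expression only handles plain numbers such as `1,234.56`, and a reply that mentions several numbers passes if any one of them matches. `numbers_in` and `reply_matches` are hypothetical helper names.

```python
import math
import re


def numbers_in(text: str) -> list:
    """Extract numbers like 42, -3.5 or 1,234.56 from free text."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    return [float(m.replace(",", "")) for m in matches]


def reply_matches(text: str, expected: float, rel_tol: float = 0.01) -> bool:
    """True if any number in the reply is within rel_tol of the expected value."""
    return any(math.isclose(n, expected, rel_tol=rel_tol) for n in numbers_in(text))


reply = "The average loan amount is approximately 15,017.34 across all applicants."
print(numbers_in(reply))
print(reply_matches(reply, 15017.3412))
```

In the examples below you could call, for instance, `reply_matches(response1, actual_avg)` after computing `actual_avg`.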


In [None]:
# Example 1: Statistical Question
print("Example 1: Statistical Analysis")
print("=" * 60)
query1 = "What is the average loan amount applied for by all applicants?"
print(f"Query: {query1}\n")
response1 = ai_agent(query1, df)
print(f"Response: {response1}\n")

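# Verify with actual calculation ('loan_amount' in the CSV, 'LoanAmount' in the sample data)
amount_col = 'loan_amount' if 'loan_amount' in df.columns else 'LoanAmount'
actual_avg = df[amount_col].mean()
print(f"Actual average (for verification): {actual_avg:.2f}")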


In [None]:
# Example 2: Maximum Value Question
print("Example 2: Finding Maximum Value")
print("=" * 60)
query2 = "Who has the highest applicant income?"
print(f"Query: {query2}\n")
response2 = ai_agent(query2, df)
print(f"Response: {response2}\n")

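# Verify with actual calculation ('annual_income' in the CSV, 'ApplicantIncome' in the sample data)
income_col = 'annual_income' if 'annual_income' in df.columns else 'ApplicantIncome'
max_income = df[income_col].max()
max_income_row = df[df[income_col] == max_income]
print(f"Actual maximum income (for verification): {max_income}")
# Only the sample data has a Loan_ID column; otherwise report the row index
row_id = max_income_row['Loan_ID'].values[0] if 'Loan_ID' in df.columns else max_income_row.index[0]
print(f"Row with max income: {row_id}")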


In [None]:
# Example 3: Counting/Filtering Question
print("Example 3: Counting Records")
print("=" * 60)
query3 = "How many applicants are self-employed?"
print(f"Query: {query3}\n")
response3 = ai_agent(query3, df)
print(f"Response: {response3}\n")

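# Verify with actual calculation (only the sample data has a Self_Employed column)
if 'Self_Employed' in df.columns:
    self_employed_count = (df['Self_Employed'] == 'Yes').sum()
    print(f"Actual count (for verification): {self_employed_count}")
else:
    # The CSV records employment in employment_status; inspect its values instead
    print(df['employment_status'].value_counts())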


In [None]:
# Example 4: Conditional Analysis
print("Example 4: Conditional Analysis")
print("=" * 60)
query4 = "How many loans were approved?"
print(f"Query: {query4}\n")
response4 = ai_agent(query4, df)
print(f"Response: {response4}\n")

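# Verify with actual calculation (only the sample data has a Loan_Status column)
if 'Loan_Status' in df.columns:
    approved_count = (df['Loan_Status'] == 'Y').sum()
    print(f"Actual approved loans (for verification): {approved_count}")
else:
    # The CSV has no approval column; loan_paid_back records repayment instead
    print(df['loan_paid_back'].value_counts())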


In [None]:
# Example 5: Distribution Question
print("Example 5: Distribution Analysis")
print("=" * 60)
query5 = "What is the distribution of property areas in the dataset?"
print(f"Query: {query5}\n")
response5 = ai_agent(query5, df)
print(f"Response: {response5}\n")

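# Verify with actual calculation (only the sample data has a Property_Area column)
if 'Property_Area' in df.columns:
    print(f"Actual distribution (for verification):\n{df['Property_Area'].value_counts()}")
else:
    print("Property_Area is not in this dataset, so the agent should say it cannot answer.")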


## Advanced Features

### Enhanced AI Agent with Better Error Handling

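Let's create an improved version of the AI agent that adds:
- Error capture: failures are returned in the result instead of raised
- Structured results: the response, error, and model name in one dictionary
- Token-usage tracking, the input you need for cost tracking

Response validation and caching are not implemented in this function; they are natural next steps.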
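Caching avoids paying twice for the same question, and the enhanced agent below does not do it. Here is a minimal in-memory sketch. It assumes that identical inputs (model, data summary, and query) can reuse an earlier answer, which is only reasonable at low temperature, and the cache is lost when the kernel restarts. `cache_key`, `cached_call`, and `fetch` are hypothetical names; in practice `fetch` would wrap the real API call.

```python
import hashlib

_response_cache = {}


def cache_key(model: str, data_context: str, user_query: str) -> str:
    """Stable key for a request; normalizes case and whitespace in the query."""
    normalized = " ".join(user_query.lower().split())
    raw = f"{model}\n{data_context}\n{normalized}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def cached_call(model: str, data_context: str, user_query: str, fetch) -> str:
    """Return a cached answer if one exists; otherwise call fetch() and store its result."""
    key = cache_key(model, data_context, user_query)
    if key not in _response_cache:
        _response_cache[key] = fetch()
    return _response_cache[key]


calls = []

def fake_fetch():
    calls.append(1)  # count how many times the "API" is really called
    return "answer"

cached_call("gpt-4o-mini", "summary", "What is the average loan amount?", fake_fetch)
cached_call("gpt-4o-mini", "summary", "what is the  average loan amount?", fake_fetch)
print("API calls made:", len(calls))
```

Because the data summary is part of the key, reloading a different dataset automatically bypasses old cached answers.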


In [None]:
def enhanced_ai_agent(user_query, df, model="gpt-4o", temperature=0.2, max_tokens=500, verbose=True):
    """
    Enhanced AI Agent with better error handling and response tracking.
    
    Parameters:
    -----------
    user_query : str
        The user's question about the data
    df : pandas.DataFrame
        The dataset to analyze
    model : str, optional
        OpenAI model to use
    temperature : float, optional
        Model temperature
    max_tokens : int, optional
        Maximum response length
    verbose : bool, optional
        Whether to print additional information
    
    Returns:
    --------
    dict : Dictionary containing response, metadata, and error info
    """
    data_context = create_data_summary(df)
    
    prompt = f"""You are a data expert AI agent specialized in analyzing structured datasets.

You have been provided with this dataset summary:
{data_context}

Now, based on the user's question:
'{user_query}'

Instructions:
1. Think step-by-step about how to answer this question
2. Assume you can access and analyze the dataset like a Data Scientist would using Pandas
3. Consider what columns and operations would be needed
4. Provide a clear, concise, and accurate answer
5. If the question cannot be answered with the available data, explain why

Give a clear, final answer with your reasoning."""
    
    result = {
        'query': user_query,
        'response': None,
        'error': None,
        'tokens_used': None,
        'model_used': model,
        'success': False
    }
    
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": "You are a helpful data analysis assistant."},
                {"role": "user", "content": prompt}
            ],
            temperature=temperature,
            max_tokens=max_tokens
        )
        
        result['response'] = response.choices[0].message.content
        result['tokens_used'] = response.usage.total_tokens
        result['success'] = True
        
        if verbose:
            print("✅ Query processed successfully")
            print(f"📊 Tokens used: {result['tokens_used']}")
            print(f"🤖 Model: {result['model_used']}")
        
    except Exception as e:
        result['error'] = str(e)
        result['response'] = f"Error processing query: {str(e)}"
        
        if verbose:
            print(f"❌ Error: {e}")
    
    return result

# Test the enhanced agent
print("Testing Enhanced AI Agent")
print("=" * 60)
test_result = enhanced_ai_agent("What is the average applicant income?", df)
print(f"\nQuery: {test_result['query']}")
print(f"\nResponse:\n{test_result['response']}")
print(f"\nMetadata: {test_result}")


In [None]:
def pick_col(df, *candidates):
    """Return the first candidate column present in df (CSV vs. sample names), else None."""
    return next((c for c in candidates if c in df.columns), None)

def analyze_data(query, df):
    """
    Perform basic data analysis based on common query patterns.
    This function attempts to extract the answer from the data directly.
    Returns None if no pattern matches or the needed column is missing.
    """
    query_lower = query.lower()
    amount_col = pick_col(df, 'loan_amount', 'LoanAmount')
    income_col = pick_col(df, 'annual_income', 'ApplicantIncome')
    
    # Pattern matching for common queries
    if 'average' in query_lower and 'loan' in query_lower and 'amount' in query_lower:
        if amount_col:
            return f"The average loan amount is {df[amount_col].mean():.2f}"
    
    elif 'highest' in query_lower or 'maximum' in query_lower:
        if 'income' in query_lower and income_col:
            return f"The highest applicant income is {df[income_col].max()}"
        elif 'loan' in query_lower and amount_col:
            return f"The highest loan amount is {df[amount_col].max()}"
    
    elif 'how many' in query_lower or 'count' in query_lower:
        # These columns exist only in the sample data
        if ('self-employed' in query_lower or 'self employed' in query_lower) and 'Self_Employed' in df.columns:
            count = (df['Self_Employed'] == 'Yes').sum()
            return f"There are {count} self-employed applicants"
        elif 'approved' in query_lower and 'Loan_Status' in df.columns:
            count = (df['Loan_Status'] == 'Y').sum()
            return f"There are {count} approved loans"
    
    return None  # Could not extract answer automatically

def hybrid_ai_agent(user_query, df):
    """
    Hybrid approach: Try to get answer from data first, then use AI for explanation.
    """
    # Try to get direct answer
    direct_answer = analyze_data(user_query, df)
    
    data_context = create_data_summary(df)
    
    if direct_answer:
        prompt = f"""You are a data expert AI agent.

Dataset summary:
{data_context}

The user asked: '{user_query}'

Based on data analysis, the answer is: {direct_answer}

Please provide a clear, natural language explanation of this result."""
    else:
        prompt = f"""You are a data expert AI agent.

Dataset summary:
{data_context}

User question: '{user_query}'

Think step-by-step and provide a clear answer."""
    
    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.2,
            max_tokens=500
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error: {str(e)}"

# Test hybrid approach
print("Testing Hybrid AI Agent")
print("=" * 60)
hybrid_response = hybrid_ai_agent("What is the average loan amount?", df)
print(f"Response: {hybrid_response}")
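The hybrid agent relies on keyword matching, which breaks as soon as a question is phrased differently. A common alternative is to ask the model for a small structured plan (for example, JSON naming an operation and a column) and execute that plan yourself against a whitelist of pandas operations. That way the numbers always come from the data, and the model never runs arbitrary code. This is a sketch; the plan format and the `ALLOWED_OPS` table are assumptions, and the step of prompting the model to produce the JSON is not shown.

```python
import json

import pandas as pd

# Whitelisted operations the model may request; each maps to a pandas call
ALLOWED_OPS = {
    "mean": lambda s: s.mean(),
    "max": lambda s: s.max(),
    "min": lambda s: s.min(),
    "count": lambda s: s.count(),
    "value_counts": lambda s: s.value_counts().to_dict(),
}


def execute_plan(df: pd.DataFrame, plan_json: str):
    """Run a plan such as {"op": "mean", "column": "loan_amount"} against df."""
    plan = json.loads(plan_json)
    op, column = plan.get("op"), plan.get("column")
    if op not in ALLOWED_OPS:
        raise ValueError(f"Operation not allowed: {op!r}")
    if column not in df.columns:
        raise KeyError(f"Unknown column: {column!r}")
    return ALLOWED_OPS[op](df[column])


demo = pd.DataFrame({"loan_amount": [1000.0, 3000.0], "gender": ["Male", "Female"]})
print(execute_plan(demo, '{"op": "mean", "column": "loan_amount"}'))
print(execute_plan(demo, '{"op": "value_counts", "column": "gender"}'))
```

The computed result can then be passed back to the model for a natural-language explanation, just as `hybrid_ai_agent` does with `direct_answer`.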


In [None]:
import time
from openai import APIError, RateLimitError, APIConnectionError

def robust_ai_agent(user_query, df, model="gpt-4o", temperature=0.2, max_tokens=500, max_retries=3):
    """
    Robust AI Agent with comprehensive error handling.
    
    Parameters:
    -----------
    user_query : str
        The user's question
    df : pandas.DataFrame
        The dataset
    model : str
        Model to use
    temperature : float
        Temperature setting
    max_tokens : int
        Maximum tokens
    max_retries : int
        Maximum retry attempts for transient errors
    
    Returns:
    --------
    dict : Response with status and message
    """
    # Validate inputs
    if not user_query or not user_query.strip():
        return {
            'success': False,
            'response': 'Please provide a valid question.',
            'error_type': 'invalid_input'
        }
    
    if df.empty:
        return {
            'success': False,
            'response': 'The dataset is empty. Cannot answer questions.',
            'error_type': 'empty_dataset'
        }
    
    data_context = create_data_summary(df)
    
    prompt = f"""You are a data expert AI agent.

Dataset summary:
{data_context}

User question: '{user_query}'

Provide a clear, step-by-step answer. If the question cannot be answered with the available data, explain why."""
    
    # Retry logic for transient errors
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": "You are a helpful data analysis assistant."},
                    {"role": "user", "content": prompt}
                ],
                temperature=temperature,
                max_tokens=max_tokens
            )
            
            return {
                'success': True,
                'response': response.choices[0].message.content,
                'tokens_used': response.usage.total_tokens,
                'model': model,
                'attempt': attempt + 1
            }
        
        except RateLimitError as e:
            # 'insufficient_quota' is also raised as a 429 RateLimitError, but it is a
            # billing problem rather than a transient one, so retrying won't help.
            if getattr(e, 'code', None) == 'insufficient_quota':
                return {
                    'success': False,
                    'response': 'Insufficient API quota. Please check your OpenAI account.',
                    'error_type': 'quota_error',
                    'error': str(e)
                }
            wait_time = 2 ** attempt  # Exponential backoff
            if attempt < max_retries - 1:
                print(f"⚠️ Rate limit hit. Waiting {wait_time} seconds before retry...")
                time.sleep(wait_time)
                continue
            else:
                return {
                    'success': False,
                    'response': 'Rate limit exceeded. Please try again later.',
                    'error_type': 'rate_limit',
                    'error': str(e)
                }
        
        except APIConnectionError as e:
            wait_time = 2 ** attempt
            if attempt < max_retries - 1:
                print(f"⚠️ Connection error. Retrying in {wait_time} seconds...")
                time.sleep(wait_time)
                continue
            else:
                return {
                    'success': False,
                    'response': 'Connection error. Please check your internet connection.',
                    'error_type': 'connection_error',
                    'error': str(e)
                }
        
        except APIError as e:
            # Authentication failures (HTTP 401) carry the error code 'invalid_api_key'
            error_code = getattr(e, 'code', None)
            if error_code == 'invalid_api_key':
                return {
                    'success': False,
                    'response': 'Invalid API key. Please check your OpenAI API key.',
                    'error_type': 'authentication_error',
                    'error': str(e)
                }
            else:
                return {
                    'success': False,
                    'response': f'API error: {str(e)}',
                    'error_type': 'api_error',
                    'error': str(e)
                }
        
        except Exception as e:
            return {
                'success': False,
                'response': f'Unexpected error: {str(e)}',
                'error_type': 'unknown_error',
                'error': str(e)
            }
    
    return {
        'success': False,
        'response': 'Failed after multiple retry attempts.',
        'error_type': 'max_retries_exceeded'
    }

# Test error handling
print("Testing Robust AI Agent")
print("=" * 60)

# Test with valid query
result1 = robust_ai_agent("What is the average loan amount?", df)
print(f"Valid query result: {result1['success']}")
if result1['success']:
    print(f"Response: {result1['response'][:100]}...")

# Test with empty query
result2 = robust_ai_agent("", df)
print(f"\nEmpty query result: {result2}")
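The retry loop in `robust_ai_agent` can be factored into a reusable helper. The sketch below is one way to do it; `retry_with_backoff` is a hypothetical name, not part of the `openai` package. It adds random "full jitter" to the exponential wait so that many clients failing at the same moment do not all retry at the same moment. Keep in mind that the v1 `openai` client already retries some failures on its own (connection errors, 408, 409, 429 and 5xx responses, 2 times by default, configurable with `OpenAI(max_retries=...)`), so your own loop adds to those attempts.

```python
import random
import time


def retry_with_backoff(func, retry_on=(Exception,), max_attempts=3,
                       base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call func() and retry it when it raises one of the `retry_on` exceptions.

    Before retry number n (starting at 0) it sleeps a random time between 0 and
    min(max_delay, base_delay * 2**n). Exceptions not listed in `retry_on`
    propagate immediately.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller handle the error
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise ValueError("max_attempts must be at least 1")


# Example (assumes `client` and `prompt` from the cells above). Note that
# RateLimitError also covers 'insufficient_quota', which retrying won't fix.
# from openai import APIConnectionError, RateLimitError
# response = retry_with_backoff(
#     lambda: client.chat.completions.create(
#         model="gpt-4o-mini",
#         messages=[{"role": "user", "content": prompt}],
#     ),
#     retry_on=(RateLimitError, APIConnectionError),
# )
```

Passing `sleep` as a parameter keeps the helper easy to test without actually waiting.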


## Understanding 429 Quota Errors

### The Confusion: Budget vs. Quota vs. Rate Limits

If you're seeing a **429 error** saying "You exceeded your current quota" but your dashboard shows **$0 used out of $5 budgeted**, you're encountering a common confusion between different types of limits:

#### 1. **Account Budget** (What you see in Dashboard)
- This is your **spending limit** (e.g., $5 for January)
- Shows how much money you've spent
- This is **NOT** the same as quota: a budget caps spending, but it does not add funds to your account

#### 2. **API Quota** (What triggers the "You exceeded your current quota" 429)
- This is a **usage limit** separate from the budget
- Can be related to:
  - **Account tier** (free tier has lower quotas)
  - **Payment method and billing** (accounts without a valid payment method, or with no remaining credit, may have restricted quotas)

#### 3. **Rate Limits** (Requests and tokens per time period)
- How many requests (and tokens) you can use per minute or per day
- Different for each model and usage tier; newer models like gpt-4o may have stricter limits
- Separate from quota and budget, although exceeding a rate limit also returns a 429 error
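Rate limits are tracked per model, and each API response also reports your current rate-limit status in `x-ratelimit-*` HTTP headers, which is a quick way to see which limit you are close to. A minimal sketch, assuming the header names listed in OpenAI's rate-limit guide (verify them against the current docs); `rate_limit_snapshot` is a hypothetical helper. The usage comment relies on the v1 SDK's `with_raw_response` wrapper, which exposes the response headers:

```python
# Header names as listed in OpenAI's rate-limit guide (verify against current docs)
RATE_LIMIT_HEADERS = [
    "x-ratelimit-limit-requests",
    "x-ratelimit-remaining-requests",
    "x-ratelimit-reset-requests",
    "x-ratelimit-limit-tokens",
    "x-ratelimit-remaining-tokens",
    "x-ratelimit-reset-tokens",
]


def rate_limit_snapshot(headers):
    """Return the x-ratelimit-* values from a headers mapping (None if absent)."""
    return {name: headers.get(name) for name in RATE_LIMIT_HEADERS}


# Example (assumes `client` from earlier in this guide):
# raw = client.chat.completions.with_raw_response.create(
#     model="gpt-4o-mini",
#     messages=[{"role": "user", "content": "Say 'test'"}],
#     max_tokens=5,
# )
# print(rate_limit_snapshot(raw.headers))
# completion = raw.parse()  # the usual ChatCompletion object
```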

### Common Causes of 429 "Quota Exceeded" Errors

1. **No Payment Method Added**
   - Even with a budget set, if no payment method is on file, quotas are very limited
   - **Solution**: Add a payment method in OpenAI dashboard → Billing → Payment methods

2. **Free Tier Limitations**
   - Free tier accounts have very low quotas
   - **Solution**: Upgrade to a paid plan

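3. **Model-Specific Rate Limits**
   - Each model has its own limits, and some (especially newer ones like gpt-4o) may be stricter on your tier
   - **Solution**: Try a model that may have higher limits on your tier, such as `gpt-3.5-turbo`. This helps with per-model rate limits, but not with an `insufficient_quota` error, which applies to the whole account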

4. **Rate Limiting**
   - Too many requests too quickly
   - **Solution**: Add delays between requests, use exponential backoff

5. **Account Verification Issues**
   - Unverified accounts have lower quotas
   - **Solution**: Complete account verification

### How to Fix 429 Quota Errors

#### Step 1: Check Your Account Status
1. Go to: https://platform.openai.com/account/usage
2. Check if you have a payment method added
3. Verify your account is fully set up

#### Step 2: Add Payment Method (If Missing)
1. Go to: https://platform.openai.com/account/billing/payment-methods
2. Add a credit card or other payment method
3. Even if you have a budget, a payment method is often required for higher quotas

#### Step 3: Check Rate Limits
1. Go to: https://platform.openai.com/account/rate-limits
2. See your current rate limits for each model
3. Adjust your code to respect these limits

#### Step 4: Use a Different Model (Temporary Fix)
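If you need to keep working while a rate limit resets, switch to a model that may have higher rate limits on your tier. This does not help with an `insufficient_quota` error, which applies to every model:

```python
# Instead of gpt-4o, try gpt-3.5-turbo, which may have higher rate limits on your tier
response = ai_agent(query, df, model="gpt-3.5-turbo")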
```

#### Step 5: Implement Rate Limiting in Your Code
Add delays between requests to avoid hitting rate limits:

```python
import time

def rate_limited_agent(query, df, delay=1):
    """Agent with built-in rate limiting."""
    time.sleep(delay)  # Wait between requests
    return ai_agent(query, df)
```
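The fixed `time.sleep(delay)` above always waits the full delay, even when the previous request already took longer than that. A sketch of a throttle that only waits for the remaining time, assuming you derive the gap from your per-minute request limit: spacing requests `60 / RPM` seconds apart keeps you at or under that limit, so 500 RPM gives 60 / 500 = 0.12 s. `Throttle` is a hypothetical helper (not part of the `openai` package) and, as written, is meant for single-threaded use:

```python
import time


class Throttle:
    """Enforce a minimum gap between calls, sleeping only as long as needed."""

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / requests_per_minute  # e.g. 60 RPM -> 1.0 s
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        now = self.clock()
        if self.last_call is not None:
            # Time already spent since the last call counts toward the gap
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now


# Example, using the ai_agent function from earlier in this guide:
# throttle = Throttle(requests_per_minute=60)  # choose a value below your limit
#
# def rate_limited_agent(query, df):
#     throttle.wait()
#     return ai_agent(query, df)
```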

### Quick Diagnostic Function

Use this function to narrow down whether you are hitting a quota problem or a rate limit, and on which model:


import os
from openai import OpenAI, RateLimitError

def diagnose_quota_issue():
    """
    Diagnostic function to help tell quota problems apart from rate limits.
    """
    print("🔍 Diagnosing OpenAI API Quota Issues...")
    print("=" * 60)
    
    try:
        # Test 1: Check if API key is set
        api_key = os.getenv('OPENAI_API_KEY')
        if not api_key:
            print("❌ API key not found")
            return
        
        print("✅ API key found")
        # Create the client once, outside the per-model tests
        test_client = OpenAI(api_key=api_key)
        
        # Tests 2 and 3: send a tiny request to each model
        working_models = []
        for model in ["gpt-3.5-turbo", "gpt-4o"]:
            print(f"\n📊 Testing with {model}...")
            try:
                test_client.chat.completions.create(
                    model=model,
                    messages=[{"role": "user", "content": "Say 'test'"}],
                    max_tokens=5
                )
                print(f"✅ {model} works! Your account has quota for this model.")
                working_models.append(model)
            except RateLimitError as e:
                # Quota and rate-limit problems are both HTTP 429; the error code tells them apart
                if getattr(e, 'code', None) == 'insufficient_quota':
                    print("❌ Quota error (insufficient_quota): usually a billing issue that affects every model")
                    print("   Add a payment method: https://platform.openai.com/account/billing/payment-methods")
                else:
                    print(f"⚠️ Rate limit hit for {model}: wait a few minutes or slow down your requests")
                    print("   Check rate limits: https://platform.openai.com/account/rate-limits")
                print(f"   Error: {str(e)[:200]}")
            except Exception as e:
                print(f"❌ Other error with {model}: {e}")
        
        if len(working_models) == 1:
            print(f"\n💡 Tip: use {working_models[0]} for now while you check your limits")
        
        print("\n" + "=" * 60)
        print("📝 Next Steps:")
        print("   1. Visit: https://platform.openai.com/account/usage")
        print("   2. Check: https://platform.openai.com/account/rate-limits")
        print("   3. Ensure payment method is added")
        print("   4. Consider using gpt-3.5-turbo for development")
        
    except Exception as e:
        print(f"❌ Diagnostic failed: {e}")

# Uncomment to run diagnostic
# diagnose_quota_issue()


# Utility function to test API connection
def test_openai_connection():
    """Test if OpenAI API is properly configured."""
    print("Testing OpenAI API Connection...")
    print("=" * 60)
    
    try:
        # Test 1: Check API key
        api_key = os.getenv('OPENAI_API_KEY')
        if not api_key:
            print("❌ API key not found in environment variables")
            return False
        print(f"✅ API key found (length: {len(api_key)})")
        
        # Test 2: Initialize client
        client = OpenAI(api_key=api_key)
        print("✅ OpenAI client initialized")
        
        # Test 3: List available models
        try:
            models = client.models.list()
            print(f"✅ Can access models API ({len(list(models))} models available)")
        except Exception as e:
            print(f"⚠️ Could not list models: {e}")
        
        # Test 4: Simple completion
        try:
            response = client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": "Say 'test successful' if you can read this."}],
                max_tokens=10
            )
            print(f"✅ Test completion successful: {response.choices[0].message.content}")
            return True
        except Exception as e:
            print(f"❌ Test completion failed: {e}")
            return False
            
    except Exception as e:
        print(f"❌ Connection test failed: {e}")
        return False

# Uncomment to run connection test
# test_openai_connection()


## Exercises

### Exercise 1: Basic Agent Setup

**Task**: Set up your OpenAI API key and test a simple query.

**Steps**:
1. Get your OpenAI API key
2. Set it up using environment variables or .env file
3. Test the connection using the `test_openai_connection()` function
4. Run a simple query: "What columns are in the dataset?"

**Expected Output**: A response describing the dataset columns.

---

### Exercise 2: Custom Query Function

**Task**: Create a function that asks the AI agent a question and prints the response nicely formatted.

**Requirements**:
- Function should take a query string and the DataFrame as input
- Format the output with clear separators
- Include the query and response
- Handle errors gracefully

**Template**:
```python
def ask_question(query, df):
    # Your code here
    pass

# Test it
ask_question("How many rows are in the dataset?", df)
```

---

### Exercise 3: Batch Questions

**Task**: Create a function that processes multiple questions at once.

**Requirements**:
- Accept a list of questions
- Process each question
- Return a dictionary mapping questions to answers
- Include error handling

**Template**:
```python
def batch_questions(questions_list, df):
    # Your code here
    pass

# Test it
questions = [
    "What is the average loan amount?",
    "How many applicants are graduates?",
    "What percentage of loans were approved?"
]
results = batch_questions(questions, df)
```

---

### Exercise 4: Enhanced Data Summary

**Task**: Improve the `create_data_summary()` function to include more detailed statistics.

**Requirements**:
- Add missing value counts
- Include unique value counts for categorical columns
- Add correlation information for numeric columns
- Make it more informative for the AI agent

---

### Exercise 5: Query Validation

**Task**: Create a function that validates user queries before sending to the AI.

**Requirements**:
- Check if query is empty
- Check if query is too long (e.g., > 500 characters)
- Detect potentially harmful queries
- Provide helpful error messages

**Template**:
```python
def validate_query(query):
    # Your validation logic
    # Return (is_valid, error_message)
    pass
```

---

### Exercise 6: Response Caching

**Task**: Implement a simple caching mechanism to avoid redundant API calls.

**Requirements**:
- Cache responses for identical queries
- Use a dictionary to store query-response pairs
- Add a function to clear cache
- Optional: Add cache expiration

**Template**:
```python
query_cache = {}

def cached_ai_agent(query, df, use_cache=True):
    # Check cache first
    # If not in cache, call API and store result
    # Return cached or new response
    pass
```

---

### Exercise 7: Cost Tracking

**Task**: Create a cost tracker that monitors API usage.

**Requirements**:
- Track number of requests
- Track total tokens used
- Calculate estimated cost (you'll need to look up current pricing)
- Display usage statistics

**Template**:
```python
class CostTracker:
    def __init__(self):
        self.requests = 0
        self.total_tokens = 0
        # Add more tracking variables
    
    def add_usage(self, tokens):
        # Update tracking
        pass
    
    def get_stats(self):
        # Return usage statistics
        pass
```

---

### Exercise 8: Multi-Model Comparison

**Task**: Compare responses from different GPT models.

**Requirements**:
- Test the same query with gpt-3.5-turbo and gpt-4o
- Compare response quality
- Compare response time
- Compare token usage

**Template**:
```python
def compare_models(query, df):
    models = ["gpt-3.5-turbo", "gpt-4o"]
    results = {}
    for model in models:
        # Test with each model
        # Store results
        pass
    return results
```

---

### Exercise 9: Interactive Improvements

**Task**: Enhance the interactive loop with more features.

**Requirements**:
- Add command history (use arrow keys)
- Add ability to save conversation
- Add ability to export responses
- Add query suggestions

---

### Exercise 10: Real Data Analysis Integration

**Task**: Create a version that performs actual data analysis and includes results in the prompt.

**Requirements**:
- Detect query type (statistical, filtering, aggregation, etc.)
- Perform the actual pandas operation
- Include results in the prompt
- Let AI provide natural language explanation

**Example**:
```python
def smart_ai_agent(query, df):
    # Analyze query to determine what operation is needed
    # Perform the operation
    # Include result in prompt
    # Get AI explanation
    pass
```


### Exercise Solutions (Try First, Then Check!)

<details>
<summary>Click to view Exercise 2 Solution</summary>

```python
def ask_question(query, df):
    """Ask a question and print formatted response."""
    print("=" * 60)
    print(f"❓ Question: {query}")
    print("=" * 60)
    
    try:
        response = ai_agent(query, df)
        print(f"🤖 Answer:\n{response}")
        print("=" * 60)
    except Exception as e:
        print(f"❌ Error: {e}")
        print("=" * 60)

# Test
ask_question("How many rows are in the dataset?", df)
```

</details>

<details>
<summary>Click to view Exercise 3 Solution</summary>

```python
def batch_questions(questions_list, df):
    """Process multiple questions at once."""
    results = {}
    
    for i, question in enumerate(questions_list, 1):
        print(f"Processing question {i}/{len(questions_list)}: {question}")
        try:
            answer = ai_agent(question, df)
            results[question] = {
                'answer': answer,
                'status': 'success'
            }
        except Exception as e:
            results[question] = {
                'answer': None,
                'status': 'error',
                'error': str(e)
            }
    
    return results

# Test
questions = [
    "What is the average loan amount?",
    "How many applicants are graduates?",
    "What percentage of loans were approved?"
]
results = batch_questions(questions, df)

# Display results
for q, r in results.items():
    print(f"\nQ: {q}")
    print(f"A: {r['answer']}")
```

</details>


## Additional Examples and Use Cases

### Example 1: Financial Analysis Agent

Let's create a specialized agent for financial analysis:


def financial_analysis_agent(query, df):
    """Specialized agent for financial analysis queries."""
    # Calculate key financial metrics
    total_income = df['ApplicantIncome'].sum()
    avg_income = df['ApplicantIncome'].mean()
    total_loans = df['LoanAmount'].sum()
    avg_loan = df['LoanAmount'].mean()
    approval_rate = (df['Loan_Status'] == 'Y').mean() * 100
    
    financial_context = f"""
Financial Summary:
- Total Applicant Income: {total_income:,.2f}
- Average Applicant Income: {avg_income:,.2f}
- Total Loan Amount: {total_loans:,.2f}
- Average Loan Amount: {avg_loan:,.2f}
- Loan Approval Rate: {approval_rate:.2f}%
"""
    
    data_context = create_data_summary(df)
    
    prompt = f"""You are a financial analysis expert AI agent.

Dataset Summary:
{data_context}

Financial Metrics:
{financial_context}

User Question: '{query}'

Provide a detailed financial analysis with insights and recommendations."""
    
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3,
            max_tokens=600
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error: {str(e)}"

# Test financial analysis agent
print("Financial Analysis Agent Test")
print("=" * 60)
financial_query = "What insights can you provide about loan approval patterns?"
financial_response = financial_analysis_agent(financial_query, df)
print(f"Query: {financial_query}\n")
print(f"Response:\n{financial_response}")


### Example 2: Comparison Queries

Handle queries that require comparing different groups:


# Example: Compare different groups
comparison_query = "Compare loan approval rates between urban and rural property areas"

# Calculate actual statistics
urban_approved = (df[df['Property_Area'] == 'Urban']['Loan_Status'] == 'Y').mean() * 100
rural_approved = (df[df['Property_Area'] == 'Rural']['Loan_Status'] == 'Y').mean() * 100
semiurban_approved = (df[df['Property_Area'] == 'Semiurban']['Loan_Status'] == 'Y').mean() * 100

comparison_stats = f"""
Approval Rates by Property Area:
- Urban: {urban_approved:.2f}%
- Rural: {rural_approved:.2f}%
- Semiurban: {semiurban_approved:.2f}%
"""

data_context = create_data_summary(df)

prompt = f"""You are a data analysis expert.

Dataset Summary:
{data_context}

Actual Statistics:
{comparison_stats}

User Question: '{comparison_query}'

Provide a detailed comparison with insights."""

try:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,
        max_tokens=500
    )
    print("Comparison Analysis:")
    print("=" * 60)
    print(response.choices[0].message.content)
except Exception as e:
    print(f"Error: {e}")
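The three hard-coded per-area lines above can be generalized with a pandas `groupby`, so every category present in the data is covered automatically. A hard-coded label with a typo would silently produce `nan` (the mean of an empty selection); here the labels come from the data itself. `approval_rate_by` and `format_rates` are hypothetical helper names, and the sketch assumes the same `Loan_Status` / `'Y'` encoding used above:

```python
import pandas as pd


def approval_rate_by(df, group_col, status_col="Loan_Status", approved="Y"):
    """Percentage of rows whose status equals `approved`, for each value of group_col."""
    is_approved = df[status_col] == approved
    # Rows with a missing group value are dropped by groupby's default dropna=True
    return is_approved.groupby(df[group_col]).mean().mul(100)


def format_rates(rates, title):
    """Render a Series of percentages as bullet lines for the prompt."""
    lines = [f"{title}:"] + [f"- {label}: {rate:.2f}%" for label, rate in rates.items()]
    return "\n".join(lines)


# Example with the loan DataFrame used in this guide:
# comparison_stats = format_rates(
#     approval_rate_by(df, "Property_Area"), "Approval Rates by Property Area"
# )
```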


## Best Practices

### 1. Prompt Engineering

- **Be Specific**: Clearly define the agent's role and capabilities
- **Provide Context**: Give enough information without overwhelming
- **Set Expectations**: Tell the model what format you want
- **Use Examples**: Show the model what good responses look like
- **Iterate**: Refine prompts based on results
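Showing the model what good responses look like is usually done with few-shot prompting: place one or more example question/answer pairs in the conversation, as alternating user and assistant turns, before the real question. A minimal sketch; `build_few_shot_messages` is a hypothetical helper, and the example pairs are placeholders for ones you write yourself:

```python
def build_few_shot_messages(system_prompt, examples, user_query):
    """Build a chat `messages` list: system prompt, worked examples, then the real query.

    Each example is a (question, ideal_answer) pair, sent as a user turn
    followed by an assistant turn so the model sees the expected format.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for question, answer in examples:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": user_query})
    return messages


# Example (assumes `client` and `user_query` from earlier in this guide):
# messages = build_few_shot_messages(
#     "You are a data expert AI agent. Reply with 'Answer:' then 'Basis:'.",
#     [(example_question, example_answer)],  # your own worked example
#     user_query,
# )
# response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
```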

### 2. Cost Optimization

- **Use Appropriate Models**: Use cheaper models for simple tasks
- **Limit Token Usage**: Set reasonable max_tokens
- **Cache Responses**: Avoid redundant API calls
- **Batch Processing**: Group similar queries when possible
- **Monitor Usage**: Track your API usage regularly

### 3. Error Handling

- **Always Use Try-Except**: Wrap API calls in error handling
- **Implement Retries**: Use exponential backoff for transient errors
- **Validate Inputs**: Check queries before sending to API
- **Provide Fallbacks**: Have backup responses for errors
- **Log Errors**: Keep track of what went wrong

### 4. Security

- **Never Commit API Keys**: Use environment variables
- **Use .gitignore**: Exclude .env files from version control
- **Rotate Keys**: Change API keys periodically
- **Set Usage Limits**: Configure limits in OpenAI dashboard
- **Monitor Access**: Check for unauthorized usage

### 5. Performance

- **Optimize Prompts**: Shorter prompts = faster + cheaper
- **Use Streaming**: For long responses, stream the output (`stream=True`) so users see text as it is generated instead of waiting for the full reply
- **Parallel Requests**: Process multiple queries concurrently (with rate limit awareness)
- **Response Caching**: Cache identical queries
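Parallel requests can be sketched with a small thread pool. Threads fit here because each call spends most of its time waiting on the network, and `max_workers` caps how many requests are in flight at once, which is the knob to turn down if you start seeing rate-limit errors. `run_queries_in_parallel` is a hypothetical helper; `ask` stands for any function that answers one query, such as a wrapper around the `ai_agent(query, df)` function used earlier:

```python
from concurrent.futures import ThreadPoolExecutor


def run_queries_in_parallel(ask, queries, max_workers=4):
    """Run ask(query) for each query on a bounded thread pool.

    Results come back in the same order as `queries`. If one query raises,
    its exception object is stored in place of the answer so the rest of
    the batch still completes.
    """
    def safe_ask(query):
        try:
            return ask(query)
        except Exception as exc:
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe_ask, queries))


# Example (assumes ai_agent and df from earlier in this guide):
# answers = run_queries_in_parallel(lambda q: ai_agent(q, df), questions, max_workers=3)
```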


## Summary

### What We've Learned

1. **OpenAI API Setup**: How to get and configure your API key
2. **Basic AI Agent**: Creating a simple agent that answers questions about data
3. **Data Summarization**: Efficiently summarizing datasets for AI consumption
4. **Prompt Engineering**: Crafting effective prompts for better responses
5. **Error Handling**: Building robust agents that handle edge cases
6. **Advanced Features**: Enhanced agents with better capabilities
7. **Troubleshooting**: Common issues and their solutions

### Key Takeaways

- AI agents can understand context and provide natural language responses
- Proper prompt engineering is crucial for good results
- Error handling and validation are essential for production use
- Cost management is important when using paid APIs
- Security best practices protect your API keys and data

### Next Steps

1. **Experiment**: Try different prompts and models
2. **Extend**: Add more features like caching, logging, etc.
3. **Deploy**: Create a web interface or API for your agent
4. **Optimize**: Improve performance and reduce costs
5. **Learn More**: Explore other OpenAI features (embeddings, fine-tuning, etc.)

### Resources

- [OpenAI API Documentation](https://platform.openai.com/docs)
- [OpenAI Python SDK](https://github.com/openai/openai-python)
- [OpenAI Pricing](https://openai.com/pricing)
- [Prompt Engineering Guide](https://platform.openai.com/docs/guides/prompt-engineering)

---

## Conclusion

Congratulations! You've learned how to build an AI agent using the OpenAI API. You now have the knowledge to:
- Set up and configure the OpenAI API
- Build intelligent agents that understand data
- Handle errors and edge cases
- Optimize for cost and performance
- Troubleshoot common issues

Keep experimenting and building! The field of AI agents is rapidly evolving, and there's always more to learn.

**Happy Coding! 🚀**
