# Simple agent system for text and image generation

This notebook demonstrates how to create a simple chatbot that can either generate images or answer questions.

1. Load the necessary libraries and environment variables.
2. Create the textual chatbot.
3. Create the image generation chatbot.
4. Ask the user for input and generate a response.


Avoid re-running the image-generation cells unnecessarily (for example, with the same prompt): each generated image consumes OpenAI API credits.
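One way to avoid paying twice for the same image is to reuse a file already generated for an identical prompt. The sketch below is a hypothetical helper (`cached_image_path` and `generate_fn` are illustrative names, not part of the OpenAI or LlamaIndex APIs). It hashes the prompt and calls the generator only when no matching file exists in the `image` directory.

```python
import hashlib
import os
from typing import Callable


def cached_image_path(
    prompt: str,
    generate_fn: Callable[[str], bytes],
    image_dir: str = "image",
) -> str:
    """Return the path of a PNG for `prompt`, generating it only if not cached.

    `generate_fn` is a hypothetical callable that takes a prompt and returns
    raw PNG bytes (for example, a wrapper around client.images.generate).
    """
    os.makedirs(image_dir, exist_ok=True)
    # Hash the exact prompt text so identical prompts map to the same file.
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:16]
    path = os.path.join(image_dir, f"cached-{digest}.png")
    if not os.path.exists(path):
        # Cache miss: call the (paid) generator once and store the result.
        with open(path, "wb") as f:
            f.write(generate_fn(prompt))
    return path
```

Calling it twice with the same prompt invokes `generate_fn` only once. Prompts that differ by even one character hash to different files, so this only helps with exact repeats.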

## Prerequisites

Before running this notebook, you need:
- An OpenAI API key

In [None]:
# Install required packages
!pip install llama-index llama-index-llms-openai python-dotenv openai nest-asyncio nbconvert requests

# Verify installations
import importlib

def check_package(package_name):
    try:
        importlib.import_module(package_name)
        return True
    except ImportError:
        return False

packages = {
    "llama_index": "llama-index core",
    "llama_index.llms.openai": "llama-index-llms-openai",
    "dotenv": "python-dotenv",
    "openai": "OpenAI API",
    "nest_asyncio": "nest-asyncio", 
    "nbconvert": "nbconvert",
    "requests": "requests",
}

all_installed = True
for package, display_name in packages.items():
    installed = check_package(package)
    print(f"{display_name}: {'✅ Installed' if installed else '❌ Not installed'}")
    all_installed = all_installed and installed

if all_installed:
    print("\n✅ All required packages are installed!")
else:
    print("\n⚠️ Some packages are missing. Run the installation command again.")

In [None]:
import os
from dotenv import load_dotenv
import nest_asyncio

# Apply nest_asyncio to allow nested event loops (needed for some async operations)
nest_asyncio.apply()

# Load environment variables from .env file
load_dotenv()

# Get API keys from environment variables or set them directly
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
# If environment variables are not loaded, you can set them here
# OPENAI_API_KEY = "your-openai-api-key"

# Set environment variables
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY or ""

# Verify API key is set
if not OPENAI_API_KEY:
    print("⚠️ Warning: OPENAI_API_KEY is not set")
else:
    print("✅ API key is set")

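# Configure the two models

Set up the OpenAI models used by the two chatbots: a text model (`gpt-4.1-nano`) accessed through LlamaIndex, and an image model (`gpt-image-1`) called directly with the OpenAI SDK.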

## Textual LLM

In [None]:
from llama_index.llms.openai import OpenAI

textualChatbot = OpenAI(model="gpt-4.1-nano", api_key=OPENAI_API_KEY)

response = textualChatbot.complete("Hello, I am a language model. ")
print("LLM Test Response:", response.text)

## Image chatbot

This section creates a chatbot that generates an image from a text prompt, saves it to the `image/` directory, and displays it in the notebook.

Unlike the textual chatbot, this section does not use LlamaIndex: it calls the OpenAI Images API directly through the `openai` Python SDK, which returns the image as a base64-encoded string.
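The decode-and-save steps in the cell below are repeated in two later image cells, so they can be factored into one reusable function. A sketch follows; `save_b64_png` is a hypothetical name, and the string it accepts is assumed to be `image_response.data[0].b64_json`, as used in the notebook's cells.

```python
import base64
import datetime
import os


def save_b64_png(image_base64: str, image_dir: str = "image", prefix: str = "prompt") -> str:
    """Decode a base64 PNG and write it to <image_dir>/<prefix>-<timestamp>.png."""
    image_bytes = base64.b64decode(image_base64)
    # One-second timestamp resolution, matching the notebook's filename scheme.
    timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    os.makedirs(image_dir, exist_ok=True)
    image_path = os.path.join(image_dir, f"{prefix}-{timestamp}.png")
    with open(image_path, "wb") as f:
        f.write(image_bytes)
    return image_path
```

Usage: `path = save_b64_png(image_response.data[0].b64_json)`. Because the timestamp has one-second resolution, two images saved within the same second would overwrite each other; appending a counter or a UUID to the filename avoids that.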

In [None]:
from openai import OpenAI
import os
import datetime
import base64
from IPython.display import Image, display

# Create an OpenAI client
client = OpenAI(api_key=OPENAI_API_KEY)

# Step 1: Create the prompt as a variable
prompt = (
    "Create an artistic illustration of OpenAI and LlamaIndex working together. "
    "The OpenAI logo (a hexagonal knot) is connected with digital circuits to a "
    "stylized llama representing LlamaIndex, all in a futuristic AI research lab."
)

# Step 2: Generate image using OpenAI GPT-Image-1
print("Generating image with GPT-Image-1...")
image_response = client.images.generate(
    model="gpt-image-1",
    prompt=prompt,
    n=1,
    quality="medium",
    size="1024x1024",
)

# Step 3: Get the base64 encoded image and decode it
image_base64 = image_response.data[0].b64_json
image_bytes = base64.b64decode(image_base64)

# Step 4: Save the image to a file with timestamp
# Create timestamp for filename
timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
image_dir = "image"
os.makedirs(image_dir, exist_ok=True)  # Create the image directory if it doesn't exist
image_path = os.path.join(image_dir, f"prompt-{timestamp}.png")

# Save the image directly from the bytes
with open(image_path, "wb") as f:
    f.write(image_bytes)
print(f"Image saved to {image_path}")

# Display the image in the notebook if in an IPython environment
try:
    display(Image(image_path))
except ImportError:
    print("IPython not available for displaying the image")


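# Combine the two chatbots behind a simple user prompt

The cell below asks whether you want a text or an image answer, then sends your prompt to the matching chatbot.
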
In [None]:
import os
import datetime
import base64
from openai import OpenAI

# Step 1: Ask the user for the mode of the chatbot (repeat until valid)
def get_chatbot_mode():
    while True:
        mode = input("Enter the mode of the chatbot (textual/visual): ").strip().lower()
        if mode in ("textual", "visual"):
            return mode
        print("Invalid mode. Please enter 'textual' or 'visual'.")

chatbot_mode = get_chatbot_mode()

# Step 2: Launch the desired chatbot mode based on user input
if chatbot_mode == "textual":
    # Get the prompt from the user
    prompt = input("Enter the prompt for the chatbot: ")
    # perform the query
    response = textualChatbot.complete(prompt)
    # Print the response
    print("Chatbot Response:", response.text)
    
elif chatbot_mode == "visual":
    # Create an OpenAI client
    client = OpenAI(api_key=OPENAI_API_KEY)
    
    # Get the prompt from the user
    prompt = input("Enter the prompt for the visual chatbot: ")
    
    # Generate image using OpenAI GPT-Image-1
    print("Generating image with GPT-Image-1...")
    image_response = client.images.generate(
        model="gpt-image-1",
        prompt=prompt,
        n=1,
        quality="medium",
        size="1024x1024",
    )

    # Get the base64 encoded image and decode it
    image_base64 = image_response.data[0].b64_json
    image_bytes = base64.b64decode(image_base64)

    
    # Create timestamp for filename
    timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    image_dir = "image"
    os.makedirs(image_dir, exist_ok=True)
    
    # Create the image path
    image_path = os.path.join(image_dir, f"prompt-{timestamp}.png")
    
    # Save the image directly from the bytes
    with open(image_path, "wb") as f:
        f.write(image_bytes)
    print(f"Image saved to {image_path}")
    
    # Display the image in the notebook if in an IPython environment
    try:
        from IPython.display import Image, display
        display(Image(image_path))
    except ImportError:
        print("IPython not available for displaying the image")
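The `if`/`elif` above grows one branch per mode. A dictionary mapping each mode name to a handler function is a common alternative. The sketch below uses hypothetical stand-in handlers (`handle_textual`, `handle_visual`) in place of the real LLM and image calls, and accepts an injectable `ask` function instead of `input()` so the routing can be exercised without a keyboard.

```python
from typing import Callable, Dict


def handle_textual(prompt: str) -> str:
    # Stand-in for: textualChatbot.complete(prompt).text
    return f"[text] {prompt}"


def handle_visual(prompt: str) -> str:
    # Stand-in for the image branch; the real one would return the saved file path.
    return f"[image] {prompt}"


HANDLERS: Dict[str, Callable[[str], str]] = {
    "textual": handle_textual,
    "visual": handle_visual,
}


def run_once(ask: Callable[[str], str] = input) -> str:
    """Ask for a mode until a valid one is given, then dispatch the prompt."""
    mode = ask(f"Enter the mode of the chatbot ({'/'.join(HANDLERS)}): ").strip().lower()
    while mode not in HANDLERS:
        print(f"Invalid mode. Please enter one of: {', '.join(HANDLERS)}.")
        mode = ask("Mode: ").strip().lower()
    prompt = ask("Enter the prompt: ")
    return HANDLERS[mode](prompt)
```

With this layout, adding a third mode means adding one dictionary entry; the validation message and prompt update automatically.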



# Bonus: Agent-Based Implementation with Tools

This bonus section demonstrates how to build a more sophisticated agent that can automatically decide whether to generate text or images based on the user's query. The agent uses LlamaIndex's function calling capabilities to select the appropriate tool.

Features:
- **Intelligent tool selection**: The agent analyzes the user query and automatically chooses between text generation and image creation
- **Function calling**: Each Python function is exposed to the model as a tool through OpenAI function calling, orchestrated by LlamaIndex's `OpenAIAgent`
- **Consistent tool outputs**: Each tool returns a plain string (the text answer, or the path of the saved image) that the agent relays to the user
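Roughly, function calling works like this: the model does not run code itself. It returns the name of a tool plus a JSON object of arguments, and the host program looks up the matching Python function and calls it. The stdlib-only sketch below illustrates that dispatch step. The `tool_call` dictionary shape is a simplified assumption, not the exact OpenAI or LlamaIndex message format, and the two tool functions are stand-ins.

```python
import json
from typing import Any, Callable, Dict


def text_generator(query: str) -> str:
    return f"Text Response: {query}"  # stand-in for the real LLM call


def image_generator(prompt: str) -> str:
    return f"Image generated for: {prompt}"  # stand-in for the real image call


TOOLS: Dict[str, Callable[..., str]] = {
    "text_generator": text_generator,
    "image_generator": image_generator,
}


def execute_tool_call(tool_call: Dict[str, Any]) -> str:
    """Run one tool call of the assumed form {"name": ..., "arguments": "<json>"}."""
    fn = TOOLS.get(tool_call["name"])
    if fn is None:
        return f"Error: unknown tool {tool_call['name']!r}"
    # The model sends arguments as a JSON string; decode them to keyword arguments.
    kwargs = json.loads(tool_call["arguments"])
    return fn(**kwargs)
```

The returned string is then sent back to the model, which writes the final reply. An agent framework automates this loop of tool calls and follow-up messages.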

In [None]:
# Install additional packages for agent functionality
!pip install llama-index-core llama-index-agent-openai

# Verify agent-related packages
import importlib

def check_agent_package(package_name):
    try:
        importlib.import_module(package_name)
        return True
    except ImportError:
        return False

agent_packages = {
    "llama_index.core.tools": "llama-index core tools",
    "llama_index.agent.openai": "llama-index-agent-openai",
}

print("Checking agent packages:")
for package, display_name in agent_packages.items():
    installed = check_agent_package(package)
    print(f"{display_name}: {'✅ Installed' if installed else '❌ Not installed'}")

print("\n✅ Agent packages check complete!")

## Creating the Tools

We'll create two tools:
1. **Text Generation Tool**: Generates text responses using the OpenAI language model
2. **Image Generation Tool**: Creates images based on text prompts using OpenAI's image generation API
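`FunctionTool.from_defaults` derives the argument schema the model sees from the function's name, signature, and type hints, while the `description` (or the docstring, if no description is given) tells the model when to use the tool. The stdlib-only sketch below approximates that idea for simple functions. `signature_to_schema` and `example_tool` are hypothetical names, and the schema LlamaIndex actually builds (with Pydantic) contains more detail.

```python
import inspect
from typing import Any, Callable, Dict, List

# Simplified mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def signature_to_schema(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Build a minimal JSON-Schema-style description of fn's parameters."""
    properties: Dict[str, Any] = {}
    required: List[str] = []
    for name, param in inspect.signature(fn).parameters.items():
        # Unknown or missing annotations fall back to "string" in this sketch.
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def example_tool(query: str, max_words: int = 100) -> str:
    """Answer a question in at most max_words words."""
    return query


print(signature_to_schema(example_tool)["parameters"]["required"])  # ['query']
```

This is why the type hints and docstrings on `generate_text_response` and `generate_image` matter: they are part of what the model reads when deciding how to call each tool.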

In [None]:
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI

# Create the text generation function
def generate_text_response(query: str) -> str:
    """
    Generate a text response to the user's query using OpenAI's language model.
    
    Args:
        query (str): The user's question or prompt
        
    Returns:
        str: Generated text response
    """
    try:
        # Create the OpenAI LLM instance for text generation
        text_llm = OpenAI(model="gpt-4.1-nano", api_key=OPENAI_API_KEY)
        response = text_llm.complete(query)
        return f"Text Response: {response.text}"
    except Exception as e:
        return f"Error generating text response: {str(e)}"

# Create the text generation tool
text_tool = FunctionTool.from_defaults(
    fn=generate_text_response,
    name="text_generator",
    description="Generate detailed text responses, explanations, stories, or answer questions. Use this for queries that require textual information, explanations, or conversational responses."
)

print("✅ Text generation tool created successfully!")

In [None]:
import os
import datetime
import base64
import openai  # Import the module, not the class, so LlamaIndex's OpenAI imported above is not shadowed

# Create the image generation function
def generate_image(prompt: str) -> str:
    """
    Generate an image based on the provided text prompt using OpenAI's image generation model.
    
    Args:
        prompt (str): Description of the image to generate
        
    Returns:
        str: Path to the generated image file and confirmation message
    """
    try:
        # Create OpenAI client using the module
        client = openai.OpenAI(api_key=OPENAI_API_KEY)
        
        # Generate image using OpenAI
        print(f"Generating image for prompt: '{prompt}'...")
        image_response = client.images.generate(
            model="gpt-image-1",
            prompt=prompt,
            n=1,
            quality="medium",
            size="1024x1024",
        )
        print("Image generation completed.")
        
        # Get the base64 encoded image and decode it
        image_base64 = image_response.data[0].b64_json
        image_bytes = base64.b64decode(image_base64)
        
        # Create timestamp for filename
        timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
        image_dir = "image"
        os.makedirs(image_dir, exist_ok=True)
        
        # Create the image path
        image_path = os.path.join(image_dir, f"agent-generated-{timestamp}.png")
        
        # Save the image
        with open(image_path, "wb") as f:
            f.write(image_bytes)
        
        # Try to display the image in the notebook
        try:
            from IPython.display import Image, display
            display(Image(image_path))
        except ImportError:
            pass
        
        return f"Image generated successfully! Saved to: {image_path}"
        
    except Exception as e:
        return f"Error generating image: {str(e)}"

# Create the image generation tool
image_tool = FunctionTool.from_defaults(
    fn=generate_image,
    name="image_generator", 
    description="Generate images, artwork, illustrations, or visual content based on text descriptions. Use this for requests that ask for visual content, pictures, images, drawings, or artwork."
)

print("✅ Image generation tool created successfully!")

## Creating the Intelligent Agent

Now we'll create an agent that can automatically choose between text and image generation based on the user's query. The agent will analyze the intent and select the appropriate tool.
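For comparison, a deterministic keyword router can make the same text-versus-image decision without an extra LLM call. The sketch below reuses the trigger words listed in the agent's system prompt. `route_query` is a hypothetical helper, and unlike the agent it cannot interpret phrasing that avoids those words, such as "show me what a quokka looks like".

```python
import re

# Trigger words taken from the agent's system prompt (IMAGE GENERATOR section).
IMAGE_KEYWORDS = ("image", "picture", "draw", "create art", "visualize", "illustration")


def route_query(query: str) -> str:
    """Return "image_generator" if the query contains an image keyword, else "text_generator"."""
    text = query.lower()
    for keyword in IMAGE_KEYWORDS:
        # Word boundaries plus optional "s"/"ing" suffixes: "images" and "drawing"
        # match, but "drawback" does not count as "draw".
        if re.search(rf"\b{re.escape(keyword)}(?:s|ing)?\b", text):
            return "image_generator"
    return "text_generator"
```

A router like this is cheap and predictable, which makes it a useful baseline when checking whether the LLM agent's tool choices are sensible.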

In [None]:
from llama_index.agent.openai import OpenAIAgent
from llama_index.llms.openai import OpenAI

# Create the LLM for the agent (with function calling capabilities)
agent_llm = OpenAI(
    model="gpt-4.1-nano", 
    api_key=OPENAI_API_KEY,
    temperature=0.3  
)

# Create the agent with both tools
agent = OpenAIAgent.from_tools(
    tools=[text_tool, image_tool],
    llm=agent_llm,
    verbose=True,
    system_prompt="""You are an intelligent assistant that can either generate text responses or create images based on user queries.

Analyze the user's request carefully and choose the appropriate tool:

1. Use the TEXT GENERATOR tool for:
   - Questions requiring explanations, information, or analysis
   - Requests for stories, essays, or written content
   - Conversational responses
   - Technical or factual questions
   - Any query that needs a textual response

2. Use the IMAGE GENERATOR tool for:
   - Requests to create, draw, generate, or make images/pictures
   - Descriptions that ask for visual content or artwork
   - Requests containing words like "image", "picture", "draw", "create art", "visualize", "illustration"
   - Any query that explicitly asks for visual output

Always choose the most appropriate tool based on the user's intent. If the user asks for both text and images, prioritize based on the primary intent of their message."""
)

print("✅ Intelligent agent created successfully!")
print("Agent can now automatically choose between text and image generation!")

## Testing the Agent

Let's test the agent with some example queries to see how it automatically selects the appropriate tool:
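To check which tool the agent actually picked without reading the verbose log, the response object can be inspected. In the LlamaIndex versions this notebook appears to target, `agent.chat()` returns an `AgentChatResponse` whose `sources` list holds one `ToolOutput` per tool call, each with a `tool_name` attribute; verify this against your installed version. The hypothetical helper `tools_used` relies only on those two attributes, so it is demonstrated here with stand-in objects.

```python
from types import SimpleNamespace
from typing import Any, List


def tools_used(response: Any) -> List[str]:
    """Return the names of the tools called while producing `response`, in call order."""
    # getattr with a default keeps this safe for responses that carry no sources.
    return [source.tool_name for source in getattr(response, "sources", [])]


# Stand-in object shaped like an agent response (no API call needed).
fake_response = SimpleNamespace(
    response="AI can personalise learning...",
    sources=[SimpleNamespace(tool_name="text_generator")],
)
print(tools_used(fake_response))  # ['text_generator']
```

After Test 1, `print(tools_used(text_response))` should list `text_generator` if the agent routed the query as intended; an empty list means the model answered directly without calling a tool.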

In [None]:
# Test 1: Text-based query
print("Text Query")
print("=" * 50)
text_query = "Explain the benefits of artificial intelligence in education"
text_response = agent.chat(text_query)
print(f"Query: {text_query}")
print(f"Response: {text_response}")
print("\n")

In [None]:
# Test 2: Image generation query
print("Image Query")
print("=" * 50)
image_query = "Create an image of a futuristic classroom with AI teaching assistants"
print(f"Query: {image_query}")

# Set to True to run this test; image generation costs API credits
RUN_IMAGE_TEST = False
if RUN_IMAGE_TEST:
    image_response = agent.chat(image_query)
    print(f"Response: {image_response}")
else:
    print("Skipped (set RUN_IMAGE_TEST = True to generate the image).")
print("\n")

## Interactive Agent Usage

Now you can interact with the agent! It will automatically determine whether your query needs text or image generation:
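Because `chat_with_agent` reads from `input()`, it cannot run unattended. A variant that takes the queries as a list makes the same loop scriptable, for example to replay a fixed set of prompts. `run_scripted_session` is a hypothetical name, and `chat_fn` stands in for `agent.chat`.

```python
from typing import Callable, Iterable, List, Tuple

EXIT_WORDS = {"quit", "exit", "bye"}


def run_scripted_session(
    queries: Iterable[str], chat_fn: Callable[[str], object]
) -> List[Tuple[str, str]]:
    """Send each query to chat_fn until an exit word; return (query, reply) pairs."""
    transcript: List[Tuple[str, str]] = []
    for raw in queries:
        query = raw.strip()
        if query.lower() in EXIT_WORDS:
            break
        if not query:
            continue  # skip blank lines, as the interactive loop does
        try:
            reply = str(chat_fn(query))
        except Exception as e:  # keep going after a failed query
            reply = f"Error processing query: {e}"
        transcript.append((query, reply))
    return transcript
```

For example, `run_scripted_session(["Explain transformers", "quit"], agent.chat)` returns one (query, reply) pair. Queries that the agent routes to image generation still consume credits.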

In [None]:
# Interactive agent session
def chat_with_agent():
    """
    Interactive function to chat with the intelligent agent.
    The agent will automatically choose between text and image generation.
    """
    print("Welcome to the Intelligent Agent!")
    print("I can generate text responses or create images based on your queries.")
    print("Type 'quit' to exit.\n")
    
    while True:
        # Get user input
        user_query = input("Your query: ").strip()
        
        # Check for exit condition
        if user_query.lower() in ['quit', 'exit', 'bye']:
            print("Goodbye! Thanks for using the intelligent agent!")
            break
            
        if not user_query:
            print("Please enter a valid query.")
            continue
            
        try:
            # Let the agent process the query and choose the appropriate tool
            print(f"\nProcessing: '{user_query}'")
            print("-" * 60)
            
            response = agent.chat(user_query)
            print(f"\nAgent Response: {response}")
            print("=" * 60)
            print()
            
        except Exception as e:
            print(f"❌ Error processing query: {str(e)}")
            print()

# Run the interactive session
# Uncomment the line below to start the interactive chat
#chat_with_agent()

## Summary of Bonus Agent Features

### What We Built
This bonus section demonstrates an advanced **intelligent agent system** that automatically chooses between text and image generation tools based on user queries.
