# ✈️ FlightAI Multimodal Assistant

## Why I Built This

Customer service in the airline industry has always been a challenge. Travelers need quick answers, but traditional systems require navigating complex menus, waiting on hold, or filling out lengthy forms. 

What if customers could simply **talk** to an AI assistant that:
- Understands their needs naturally
- Searches real-time flight data from PostgreSQL
- Shows them destinations visually with AI-generated images
- Speaks responses back for hands-free access

That's exactly what **FlightAI Multimodal Assistant** does. Built during the **Andela LLM Engineering program**, this project demonstrates how AI can revolutionize customer interactions in the travel industry.

Instead of:
- ❌ Navigating complex booking systems
- ❌ Waiting for human agents
- ❌ Searching through multiple pages
- ❌ Reading long text responses

Customers can now:
- ✅ Ask naturally: *"What flights go to Paris?"*
- ✅ Book instantly: *"Book a flight to Tokyo for John Doe"*
- ✅ See destinations: *"Show me what London looks like"*
- ✅ Hear responses: *"Read me the flight information with audio"*

This isn't just a demo; it's a blueprint for the future of customer service in travel.

---

## What This Does

FlightAI helps travelers find and book flights through natural conversation. You tell it what you need, and it:
- 🔍 Searches the PostgreSQL database intelligently
- 💳 Books flights directly through conversation
- 🖼️ Generates beautiful destination images
- 🔊 Speaks flight information back to you

**Tech:** OpenAI GPT-4o-mini (via OpenRouter) • DALL-E • TTS • PostgreSQL • Gradio UI • Function Calling

---

**Note:** This is a demonstration. Image and audio generation require OpenAI API access (may work via OpenRouter depending on your plan).

## Step 1: Dependencies

We need `psycopg2-binary` to connect to PostgreSQL. This cell installs it if missing.

**Note:** If installation succeeds, restart the kernel before continuing.

In [None]:
# Install psycopg2-binary if needed
import sys
import subprocess

# Check if already installed
try:
    import psycopg2
    print("✓ psycopg2-binary is already installed")
except ImportError:
    print("Installing psycopg2-binary...")
    result = subprocess.run(
        [sys.executable, "-m", "pip", "install", "psycopg2-binary", "--break-system-packages", "--quiet"],
        capture_output=True,
        text=True
    )
    if result.returncode == 0:
        print("✓ Installation successful!")
        print("\n⚠️  IMPORTANT: Restart the kernel now (Kernel → Restart Kernel)")
        print("   Then run the import cell again.")
    else:
        print(f"✗ Installation failed: {result.stderr}")
        raise RuntimeError("Failed to install psycopg2-binary")
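
A lighter way to check for the package, without actually importing it, is `importlib.util.find_spec` from the standard library. The sketch below is only one possible way to factor out the check; the helper name `is_installed` is hypothetical and not part of this notebook.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current import path."""
    # find_spec returns None when the module cannot be located.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name in ("json", "psycopg2"):
        status = "available" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

Because nothing is imported, the check has no side effects, which can make it easier to decide whether an install (and kernel restart) is needed at all.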

## Step 2: Setup

Loading API keys securely, setting up the OpenAI client via OpenRouter, and connecting to PostgreSQL.

**The foundation:** Everything starts here: API access, database connection, and the AI client.

In [None]:
# Imports and initialization

import os
import json
import sys
import re
import tempfile
from dotenv import load_dotenv
from openai import OpenAI
import gradio as gr

# Import psycopg2 - ensure kernel is using the correct Python environment
try:
    import psycopg2
    print("✓ psycopg2 imported successfully")
except ImportError:
    print(f"⚠️  psycopg2 not found in current Python: {sys.executable}")
    print("\n📋 To fix this:")
    print("1. Run the installation cell above")
    print("2. ⚠️  RESTART THE KERNEL (Kernel → Restart Kernel) - This is required!")
    print("3. Then run this cell again")
    raise ImportError(
        "psycopg2-binary is not available. "
        "Please run the installation cell, then RESTART THE KERNEL, then run this cell again."
    )

load_dotenv(override=True)

# OpenRouter configuration
api_key = os.getenv('OPENROUTER_API_KEY')
base_url = os.getenv('OPENROUTER_BASE_URL', 'https://openrouter.ai/api/v1')
MODEL = os.getenv('OPENROUTER_MODEL', 'openai/gpt-4o-mini')

if api_key:
    print(f"✓ OpenRouter API Key loaded (begins with {api_key[:8]}...)")
else:
    print("⚠️  OpenRouter API Key not set")

client = OpenAI(base_url=base_url, api_key=api_key)

# Database connection helper
def get_db_connection():
    """Create a connection to PostgreSQL database"""
    return psycopg2.connect(
        host=os.getenv('DB_HOST', 'localhost'),
        port=os.getenv('DB_PORT', '5432'),
        database=os.getenv('DB_NAME', 'andela_ai_engineering_bootcamp'),
        user=os.getenv('DB_USER', 'postgres'),
        password=os.getenv('DB_PASSWORD')
    )

print("✓ Setup complete")
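
The connection helper above reads each setting straight from the environment, so a missing `DB_PASSWORD` only surfaces when the first query runs. A minimal sketch of failing earlier, assuming password authentication is in use, might look like this. The function name `build_db_config` is hypothetical; the environment variable names and defaults are the ones used above.

```python
import os
from typing import Mapping


def build_db_config(env: Mapping[str, str] = os.environ) -> dict:
    """Collect PostgreSQL connection settings, failing early if the password is unset."""
    password = env.get("DB_PASSWORD")
    if not password:
        raise ValueError("DB_PASSWORD is not set; add it to your .env file")
    return {
        "host": env.get("DB_HOST", "localhost"),
        "port": int(env.get("DB_PORT", "5432")),
        "database": env.get("DB_NAME", "andela_ai_engineering_bootcamp"),
        "user": env.get("DB_USER", "postgres"),
        "password": password,
    }
```

With this in place, the connection could be opened as `psycopg2.connect(**build_db_config())`, and passing a plain dict instead of `os.environ` makes the helper easy to test.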

## Step 3: Database Functions

These are the core functions that interact with PostgreSQL. When someone asks *"What flights go to Paris?"*, the AI translates that into:
```python
get_ticket_price(destination_city="Paris")
```

**The magic:** The AI figures out which function to call and what parameters to use - no rigid commands needed.

In [None]:
# Database functions

def get_ticket_price(destination_city):
    """Search for available flights to a destination city"""
    conn = get_db_connection()
    cur = conn.cursor()
    cur.execute("""
        SELECT flight_number, origin, destination, price, departure_time, arrival_time
        FROM flights
        WHERE LOWER(destination) = LOWER(%s)
        ORDER BY departure_time
        LIMIT 5
    """, (destination_city,))
    flights = cur.fetchall()
    cur.close()
    conn.close()
    
    if flights:
        result = f"Flights to {destination_city}:\n"
        for flight in flights:
            result += f"- {flight[0]}: {flight[1]} → {flight[2]}, ${float(flight[3])}, Departs: {flight[4]}\n"
        return result
    return f"Sorry, we don't have flights to {destination_city} available."

def book_flight(destination_city, passenger_name):
    """Book a flight for a passenger"""
    conn = get_db_connection()
    cur = conn.cursor()
    
    cur.execute("""
        SELECT flight_id, flight_number, price
        FROM flights
        WHERE LOWER(destination) = LOWER(%s)
        ORDER BY departure_time
        LIMIT 1
    """, (destination_city,))
    flight = cur.fetchone()
    
    if flight:
        flight_id, flight_number, price = flight
        cur.execute("""
            INSERT INTO bookings (flight_id, passenger_name, status)
            VALUES (%s, %s, 'confirmed')
            RETURNING booking_id
        """, (flight_id, passenger_name))
        booking_id = cur.fetchone()[0]
        conn.commit()
        cur.close()
        conn.close()
        return f"Booking confirmed! {passenger_name}, your flight {flight_number} to {destination_city} is reserved. Booking ID: {booking_id}, Price: ${float(price)}"
    
    cur.close()
    conn.close()
    return f"Sorry, we cannot book flights to {destination_city} at this time."

print("✓ Database functions defined")
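
To try the search logic without a running PostgreSQL server, the same query shape can be exercised against an in-memory SQLite database. This is a stand-in sketch, not the notebook's implementation: SQLite uses `?` placeholders where psycopg2 uses `%s`, the function name `search_flights` is hypothetical, and the sample rows are invented for illustration.

```python
import sqlite3
from contextlib import closing


def search_flights(conn, destination_city, limit=5):
    """Return up to `limit` flights to a city, matched case-insensitively."""
    # closing() releases the cursor even if the query raises.
    with closing(conn.cursor()) as cur:
        cur.execute(
            """
            SELECT flight_number, origin, destination, price, departure_time
            FROM flights
            WHERE LOWER(destination) = LOWER(?)
            ORDER BY departure_time
            LIMIT ?
            """,
            (destination_city, limit),
        )
        return cur.fetchall()


# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights (flight_number TEXT, origin TEXT, destination TEXT, "
    "price REAL, departure_time TEXT)"
)
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?, ?, ?)",
    [
        ("FA102", "London", "Paris", 150.0, "2025-01-02 14:30"),
        ("FA101", "New York", "Paris", 450.0, "2025-01-01 09:00"),
        ("FA201", "New York", "Tokyo", 1200.0, "2025-01-01 11:00"),
    ],
)

rows = search_flights(conn, "paris")
for flight_number, origin, destination, price, departs in rows:
    print(f"{flight_number}: {origin} -> {destination}, ${price:.2f}, departs {departs}")
conn.close()
```

Passing the connection in as an argument, rather than opening it inside the function, is a design choice that keeps the query testable and leaves cleanup to the caller.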

## Step 4: Multimodal Functions

**Images:** When customers want to see destinations, we generate beautiful travel images using DALL-E.

**Audio:** The AI can speak its responses using OpenAI's text-to-speech API.

**The experience:** Multimodal responses make interactions richer - text, images, and audio all working together.

In [None]:
# Multimodal functions

def generate_destination_image(destination_city, description=""):
    """Generate an image of a destination city using DALL-E"""
    try:
        prompt = f"Beautiful travel destination image of {destination_city}, professional photography, vibrant colors, travel brochure style"
        if description:
            prompt += f", {description}"
        
        # Use OpenAI's image generation API
        try:
            response = client.images.generate(
                prompt=prompt,
                size="1024x1024",
                quality="standard",
                n=1
            )
            image_url = response.data[0].url
            return f"IMAGE_URL:{image_url}"
        except Exception as e:
            # Fallback: return a placeholder message that includes the underlying error
            return f"Image generation requested for {destination_city}. (Note: Image generation may require direct OpenAI API access. Details: {e})"
    except Exception as e:
        return f"Error generating image: {str(e)}"

def generate_audio_response(text):
    """Generate audio from text using TTS"""
    try:
        # Try using OpenAI's TTS API directly (may need separate OpenAI client)
        # First, try with the current client (OpenRouter)
        try:
            response = client.audio.speech.create(
                model="tts-1",
                voice="alloy",
                input=text[:500]  # Limit text length
            )
        except Exception as api_error:
            # If OpenRouter doesn't support TTS, try direct OpenAI API
            openai_api_key = os.getenv('OPENAI_API_KEY')
            if openai_api_key:
                from openai import OpenAI as OpenAIClient
                openai_client = OpenAIClient(api_key=openai_api_key)
                response = openai_client.audio.speech.create(
                    model="tts-1",
                    voice="alloy",
                    input=text[:500]
                )
            else:
                raise api_error
        
        # Save audio to a temporary file
        temp_dir = tempfile.gettempdir()
        audio_path = os.path.join(temp_dir, f"audio_{abs(hash(text)) % 100000}.mp3")
        
        response.stream_to_file(audio_path)
        
        # Verify file was created
        if os.path.exists(audio_path) and os.path.getsize(audio_path) > 0:
            return f"AUDIO_PATH:{audio_path}"
        else:
            return "Error: Audio file was not created properly"
            
    except Exception as e:
        error_msg = str(e)
        # Provide helpful error message
        if "audio" in error_msg.lower() or "speech" in error_msg.lower():
            return f"Error: TTS API not available. Please ensure you have OpenAI API access for audio generation. Error: {error_msg}"
        return f"Error generating audio: {error_msg}"

print("✓ Multimodal functions defined")
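
One detail worth noting in `generate_audio_response`: the file name comes from `hash(text)`, which Python randomizes per process for strings, so the same text maps to a different file after a kernel restart, and `% 100000` leaves room for collisions. A possible alternative, sketched here with a hypothetical helper `audio_path_for`, derives the name from a SHA-256 digest instead.

```python
import hashlib
import os
import tempfile
from typing import Optional


def audio_path_for(text: str, directory: Optional[str] = None) -> str:
    """Build a stable .mp3 path derived from the text content."""
    # The digest depends only on the text, so it is the same in every process.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
    return os.path.join(directory or tempfile.gettempdir(), f"audio_{digest}.mp3")


if __name__ == "__main__":
    print(audio_path_for("Your flight to Paris departs at 9:00."))
```

A stable name also means a repeated request for the same text could reuse the existing file instead of calling the TTS API again, though that caching step is not shown here.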

## Step 5: Teaching the AI to Use Tools

Instead of the AI just talking, we teach it to actually *search* the database, *book* flights, *generate* images, and *speak* responses.

This schema defines all the tools the AI can use. When someone says *"Show me flights to Paris"*, the AI translates that into:
```python
get_ticket_price(destination_city="Paris")
```

**This was the breakthrough.** No more writing custom queries - the AI figures it out.

In [None]:
# Tool definitions

price_tool = {
    "type": "function",
    "function": {
        "name": "get_ticket_price",
        "description": "Get available flights and prices to a destination city from the database.",
        "parameters": {
            "type": "object",
            "properties": {
                "destination_city": {
                    "type": "string",
                    "description": "The destination city name"
                }
            },
            "required": ["destination_city"]
        }
    }
}

booking_tool = {
    "type": "function",
    "function": {
        "name": "book_flight",
        "description": "Book a flight to a destination for a passenger. Creates a booking record in the database.",
        "parameters": {
            "type": "object",
            "properties": {
                "destination_city": {
                    "type": "string",
                    "description": "The destination city name"
                },
                "passenger_name": {
                    "type": "string",
                    "description": "Full name of the passenger"
                }
            },
            "required": ["destination_city", "passenger_name"]
        }
    }
}

image_tool = {
    "type": "function",
    "function": {
        "name": "generate_destination_image",
        "description": "Generate a beautiful travel image of a destination city. Use this when customers ask to see images of destinations or want visual inspiration.",
        "parameters": {
            "type": "object",
            "properties": {
                "destination_city": {
                    "type": "string",
                    "description": "The destination city name to generate an image for"
                },
                "description": {
                    "type": "string",
                    "description": "Optional additional description for the image"
                }
            },
            "required": ["destination_city"]
        }
    }
}

audio_tool = {
    "type": "function",
    "function": {
        "name": "generate_audio_response",
        "description": "Generate audio narration from text. ALWAYS use this tool when customers request audio, say 'read', 'speak', 'tell me with audio', 'audio version', or any variation asking for spoken/heard information. Pass the complete response text that should be spoken.",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {
                    "type": "string",
                    "description": "The complete text response to convert to speech. Include all relevant flight information, prices, and details."
                }
            },
            "required": ["text"]
        }
    }
}

tools = [price_tool, booking_tool, image_tool, audio_tool]

print(f"✓ {len(tools)} tools registered")
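
The four definitions above share the same outer structure, so a small builder could cut the repetition. The sketch below assumes every parameter is a string, which holds for the tools in this notebook; `make_tool` and `example_tool` are hypothetical names introduced only for illustration.

```python
def make_tool(name, description, properties, required):
    """Build a function-calling tool definition in the shape used above."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                # Assumption: all parameters are strings, as in this notebook.
                "properties": {
                    arg: {"type": "string", "description": text}
                    for arg, text in properties.items()
                },
                "required": list(required),
            },
        },
    }


example_tool = make_tool(
    name="get_ticket_price",
    description="Get available flights and prices to a destination city from the database.",
    properties={"destination_city": "The destination city name"},
    required=["destination_city"],
)
```

The resulting dictionary has the same shape as `price_tool`, so it could be placed in the `tools` list unchanged.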

## Step 6: The AI's Instructions

Here's where prompt engineering matters *a lot*.

The AI needs to know:
- When to use which tool (search, book, generate image, generate audio)
- How to interpret natural language requests
- To be proactive with multimodal features when helpful
- To always generate audio when explicitly requested

**The lesson:** Clear instructions = better tool usage. This prompt guides the AI to use tools effectively.

In [None]:
# System prompt

system_message = """You are a helpful assistant for FlightAI airline.
Give short, courteous answers. Always be accurate. If you don't know something, say so.

IMPORTANT: When customers request audio, "read", "speak", "tell me with audio", or similar phrases, you MUST use the generate_audio_response tool to create an audio version of your response.

Use the available tools to:
- Check prices and book flights when customers ask
- Generate images of destinations when customers want to see visual inspiration
- ALWAYS generate audio when customers explicitly request audio, "read", "speak", or "tell me with audio"
Be proactive in offering images and audio when they would enhance the customer experience."""

# Tool call handler

def handle_tool_calls(message):
    """Execute tool calls and return formatted responses"""
    responses = []
    for tool_call in message.tool_calls:
        func_name = tool_call.function.name
        args = json.loads(tool_call.function.arguments)
        
        if func_name == "get_ticket_price":
            result = get_ticket_price(args["destination_city"])
        elif func_name == "book_flight":
            result = book_flight(args["destination_city"], args["passenger_name"])
        elif func_name == "generate_destination_image":
            result = generate_destination_image(
                args["destination_city"],
                args.get("description", "")
            )
        elif func_name == "generate_audio_response":
            result = generate_audio_response(args["text"])
        else:
            result = "Unknown tool"
        
        responses.append({
            "role": "tool",
            "content": result,
            "tool_call_id": tool_call.id
        })
    return responses

print("✓ System prompt and tool handler defined")
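
As the number of tools grows, the `if`/`elif` chain in `handle_tool_calls` could be replaced by a lookup table. The sketch below is one way to do that, with basic handling for malformed arguments; `dispatch_tool_call`, `registry`, and the fake tool call built with `SimpleNamespace` are all hypothetical stand-ins, not part of the OpenAI SDK.

```python
import json
from types import SimpleNamespace


def dispatch_tool_call(tool_call, registry):
    """Run one tool call via a name -> function registry and return a tool message."""
    name = tool_call.function.name
    func = registry.get(name)
    if func is None:
        content = f"Error: unknown tool {name}"
    else:
        try:
            args = json.loads(tool_call.function.arguments or "{}")
            content = str(func(**args))
        except (json.JSONDecodeError, TypeError) as exc:
            # Malformed JSON, or arguments that do not match the function signature.
            content = f"Error: invalid arguments for {name}: {exc}"
    return {"role": "tool", "content": content, "tool_call_id": tool_call.id}


# Stand-in tool and a fake tool call, for illustration only.
registry = {
    "get_ticket_price": lambda destination_city: f"Flights to {destination_city}: none",
}
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(
        name="get_ticket_price",
        arguments='{"destination_city": "Paris"}',
    ),
)
print(dispatch_tool_call(fake_call, registry))
```

Returning an error string as the tool result, rather than raising, lets the model see what went wrong and reply to the customer accordingly.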

## Step 7: Making it Work (and Not Crash)

This is the engine room. When the AI wants to search, book, or generate content, this code:
1. Validates the request
2. Calls the right function
3. Handles errors gracefully (no crashes!)
4. Extracts multimodal content (images, audio)
5. Returns a structured response

**Defensive programming:** Things break. This code expects problems and handles them elegantly.
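
Step 4 of the list, extracting multimodal content, relies on the `IMAGE_URL:` and `AUDIO_PATH:` prefixes returned by the multimodal functions. A small helper could make that convention explicit; the name `split_tool_result` is hypothetical.

```python
def split_tool_result(content):
    """Split 'IMAGE_URL:...' / 'AUDIO_PATH:...' results into (kind, payload)."""
    for prefix, kind in (("IMAGE_URL:", "image"), ("AUDIO_PATH:", "audio")):
        if content.startswith(prefix):
            # Slice off only the leading marker; str.replace would also
            # remove any later occurrence inside the payload.
            return kind, content[len(prefix):]
    return "text", content


if __name__ == "__main__":
    print(split_tool_result("IMAGE_URL:https://example.com/paris.png"))
    print(split_tool_result("Sorry, we don't have flights to Atlantis available."))
```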

In [None]:
# Main chat function

def chat(message, history):
    """Process user message and return multimodal response"""
    history = [{"role": h["role"], "content": h["content"]} for h in history]
    messages = [{"role": "system", "content": system_message}] + history + [{"role": "user", "content": message}]
    
    # Check if user explicitly requests audio
    audio_keywords = ["audio", "read", "speak", "tell me with audio", "audio version", "hear"]
    user_wants_audio = any(keyword in message.lower() for keyword in audio_keywords)
    
    response = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
    
    image_urls = []
    audio_paths = []
    
    # Handle tool calls iteratively
    while response.choices[0].finish_reason == "tool_calls":
        tool_message = response.choices[0].message
        tool_responses = handle_tool_calls(tool_message)
        
        # Extract image URLs and audio paths from tool responses
        for tool_resp in tool_responses:
            content = tool_resp.get("content", "")
            if content.startswith("IMAGE_URL:"):
                image_urls.append(content.replace("IMAGE_URL:", ""))
            elif content.startswith("AUDIO_PATH:"):
                audio_paths.append(content.replace("AUDIO_PATH:", ""))
        
        messages.append(tool_message)
        messages.extend(tool_responses)
        response = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
    
    # Extract text response
    text_response = response.choices[0].message.content or ""
    
    # If user requested audio but no audio was generated, generate it now
    audio_error = None
    if user_wants_audio and not audio_paths and text_response:
        try:
            audio_result = generate_audio_response(text_response)
            if audio_result.startswith("AUDIO_PATH:"):
                audio_paths.append(audio_result.replace("AUDIO_PATH:", ""))
            elif audio_result.startswith("Error"):
                audio_error = audio_result
        except Exception as e:
            audio_error = f"Error generating audio: {str(e)}"
    
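    # Check tool responses for audio errors.
    # messages also holds SDK message objects (appended in the tool loop),
    # which have no .get(), so only inspect the plain-dict tool results.
    for msg in messages:
        if isinstance(msg, dict) and msg.get("role") == "tool":
            content = msg.get("content", "")
            if "Error" in content and ("audio" in content.lower() or "tts" in content.lower()):
                audio_error = content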
    
    # Return multimodal response
    result = {"text": text_response}
    if image_urls:
        result["images"] = image_urls
    if audio_paths:
        result["audio"] = audio_paths[0]  # Return first audio file
    if audio_error:
        result["audio_error"] = audio_error
    
    return result

print("✓ Chat function defined")
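
The audio keyword check in `chat` uses substring matching, so a message like "I already booked" contains "read" and would trigger the audio fallback. A possible refinement is to match whole words with a regular expression. The sketch below uses a reduced keyword list, and the names `AUDIO_PATTERN` and `wants_audio` are hypothetical.

```python
import re

# Whole-word match, case-insensitive. "audio" also covers phrases such as
# "tell me with audio" and "audio version".
AUDIO_PATTERN = re.compile(r"\b(audio|read|speak|hear)\b", re.IGNORECASE)


def wants_audio(message: str) -> bool:
    """Return True when the message contains an audio keyword as a whole word."""
    return AUDIO_PATTERN.search(message) is not None


if __name__ == "__main__":
    for text in ("Read me the flight information", "I already booked a flight"):
        print(f"{text!r}: {wants_audio(text)}")
```

This still produces false positives for sentences such as "I read that Paris is cheap in winter", so it narrows the problem rather than solving it; the tool call chosen by the model remains the more reliable signal.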

## Step 8: The User Interface

Finally, we build the Gradio interface. This is where customers interact with the AI.

**The experience:** Clean, modern UI that handles text, images, and audio seamlessly. Everything comes together here.

In [None]:
# Gradio interface

with gr.Blocks(title="FlightAI Assistant") as demo:
    gr.Markdown("# FlightAI Customer Support Assistant")
    gr.Markdown("**Multimodal Features:** Generate images of destinations and audio responses!")
    
    chatbot = gr.Chatbot(
        type="messages",
        height=600,
        show_label=False
    )
    
    msg = gr.Textbox(
        label="Message",
        placeholder="Try: 'Show me an image of Paris' or 'Tell me about flights to Tokyo with audio'...",
        scale=4
    )
    
    submit_btn = gr.Button("Send", variant="primary", scale=1)
    clear_btn = gr.Button("Clear", scale=1)
    
    def respond(message, chat_history):
        """Handle user message and format multimodal response"""
        if not message:
            return chat_history, ""
        
        # Convert Gradio history format
        history = []
        for msg_item in chat_history:
            if msg_item["role"] == "user":
                history.append({"role": "user", "content": msg_item["content"]})
            elif msg_item["role"] == "assistant":
                history.append({"role": "assistant", "content": msg_item["content"]})
        
        # Get response
        response = chat(message, history)
        
        # Format response with multimodal content
        if isinstance(response, dict):
            text = response.get("text", "")
            images = response.get("images", [])
            audio_path = response.get("audio")
            audio_error = response.get("audio_error")
            
            # Build response content
            content_parts = [text] if text else []
            
            # Add images
            if images:
                content_parts.append("\n\n**🖼️ Generated Images:**")
                for img_url in images:
                    content_parts.append(f"![Destination Image]({img_url})")
            
            # Add audio with HTML audio player
            if audio_path and os.path.exists(audio_path):
                content_parts.append("\n\n**🔊 Audio Response:**")
                content_parts.append(f'<audio controls><source src="file://{audio_path}" type="audio/mpeg">Your browser does not support the audio element.</audio>')
                content_parts.append(f"\n*Audio file: {audio_path}*")
            elif audio_error:
                # Show error message if audio generation failed
                content_parts.append("\n\n**⚠️ Audio Generation:**")
                content_parts.append(f"*{audio_error}*")
                content_parts.append("\n*Note: Audio generation requires OpenAI API access. You can add OPENAI_API_KEY to your .env file for direct TTS access.*")
            
            final_content = "\n".join(content_parts)
            
            chat_history.append({"role": "user", "content": message})
            chat_history.append({"role": "assistant", "content": final_content})
        else:
            chat_history.append({"role": "user", "content": message})
            chat_history.append({"role": "assistant", "content": str(response)})
        
        return chat_history, ""
    
    msg.submit(respond, [msg, chatbot], [chatbot, msg])
    submit_btn.click(respond, [msg, chatbot], [chatbot, msg])
    clear_btn.click(lambda: ([], ""), None, [chatbot, msg])

print("✓ Gradio interface ready")
print("\n🚀 Launching FlightAI Assistant...")
demo.launch()
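
The formatting logic inside `respond` could also live in a small pure function, which makes it testable without launching Gradio. This is a simplified sketch, not a drop-in replacement: `format_reply` is a hypothetical name, and it omits the HTML audio player and section headings used above.

```python
def format_reply(response: dict) -> str:
    """Turn the dict returned by chat() into Markdown for the chatbot."""
    parts = []
    if response.get("text"):
        parts.append(response["text"])
    for url in response.get("images", []):
        parts.append(f"![Destination Image]({url})")
    if response.get("audio"):
        parts.append(f"*Audio file: {response['audio']}*")
    elif response.get("audio_error"):
        parts.append(f"*{response['audio_error']}*")
    # Blank lines between parts keep each one a separate Markdown block.
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(format_reply({"text": "Here is Paris.", "images": ["https://example.com/paris.png"]}))
```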

---

## Conclusion

### The Journey

Building this multimodal airline assistant taught me something important: **AI doesn't just automate tasks; it reimagines how people interact with complex systems.**

I started this project thinking about the frustration of booking flights: multiple tabs, confusing forms, waiting on hold. What if customers could simply *ask* for what they need and get it instantly?

Through the Andela LLM Engineering program, I learned that combining:
- **Natural language understanding** (GPT-4o-mini via OpenRouter)
- **Real-time database access** (PostgreSQL)
- **Multimodal responses** (text, images, audio)
- **Intelligent tool calling**

...creates something that feels less like software and more like a helpful travel agent.

### What Surprised Me

1. **How well tool calling works** - Given only the tool schema and a clear system prompt, the AI consistently chooses the right tools without any hard-coded routing logic
2. **The power of multimodal responses** - Adding images and audio transformed a simple chatbot into an engaging experience
3. **Database integration is seamless** - PostgreSQL + AI = natural language database interface
4. **Error handling makes all the difference** - Graceful failures keep the experience smooth

### Technical Highlights

- **Function calling** eliminated rigid command structures
- **PostgreSQL integration** enabled real-time flight data access
- **Multimodal capabilities** (DALL-E images, TTS audio) created richer interactions
- **Defensive programming** ensured robustness even when APIs fail
- **Gradio interface** made it accessible and user-friendly

### Future Directions

If I were to take this further:
- **Voice input** - Let customers call and speak their requests
- **Real airline APIs** - Connect to live flight data and booking systems
- **Payment integration** - Enable actual flight purchases
- **Multi-language support** - Serve international travelers in their native languages
- **Personalization** - Remember preferences and booking history
- **Mobile optimization** - Make it work seamlessly on phones

### The Broader Impact

This pattern extends far beyond airlines. Imagine:
- **Hotels** - "Find me a beachfront room in Miami under $200"
- **Restaurants** - "Book a table for 4 at an Italian place tonight"
- **Events** - "Get me 2 tickets to the concert this weekend"
- **Services** - "Schedule a haircut for tomorrow afternoon"

**The common thread:** Natural conversation beats forms every time.

### Final Thoughts

This project showed me that AI + good engineering = solutions that feel magical but are built on solid foundations. The database does the heavy lifting, AI makes it accessible, and multimodal features make it delightful.

**For the travel industry and beyond:** This is a blueprint for how AI can transform customer service, making complex systems feel simple, natural, and human.

---

*Built during Week 2 of the Andela LLM Engineering Program*

**FlightAI Assistant** - Where conversation meets intelligent booking.