# AI Sommelier - Wine Recommendation Agent

This notebook implements an AI Sommelier using OpenAI's Assistants API v2 with file_search capability.

## Features
- **Personalized wine recommendations** based on food, taste preferences, budget, and occasion
- **PDF-grounded responses** - All recommendations cite a wine catalog PDF
- **Stateful conversations** - Multi-turn dialogue with context preservation
- **Classic pairing principles** - Matches intensity, balances acidity, fat, spice, sweetness, and tannin

## Setup Requirements
1. **OpenAI API Key**: Store in Google Colab Secrets as `OPENAI_API_KEY`
   - Click the key icon 🔑 in the left sidebar
   - Add new secret: Name = `OPENAI_API_KEY`, Value = your API key
2. **Wine PDF**: Upload `Vinhos baba d_urso.pdf` to this Colab session or Google Drive
3. **First-time setup**: Run all cells in order to create the vector store
4. **Subsequent uses**: You can reuse the vector store ID to avoid re-indexing

## Cost Estimates
- Vector store storage: ~$0.10/GB/day
- File search queries: ~$0.03/GB per query
- Model usage: Standard GPT-4o rates apply

---

## 1️⃣ Installation & Imports

Install required packages and import dependencies.

In [None]:
# Install required packages
!pip install -q openai python-dotenv

# Import dependencies
from openai import OpenAI
import time
import json
from google.colab import userdata
import os

# Display versions for reproducibility
import openai
print(f"OpenAI SDK version: {openai.__version__}")
print("✅ Packages installed successfully")

## 2️⃣ Configuration & API Key Setup

Initialize the OpenAI client with your API key from Colab Secrets.
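
Outside Colab, `google.colab.userdata` cannot be imported. The sketch below shows one possible fallback, assuming the key may also be exported as an environment variable with the same name. `load_api_key` is a hypothetical helper for illustration and is not used by the cells in this notebook.

```python
import os


def load_api_key(name="OPENAI_API_KEY"):
    """Return the key from Colab Secrets if available, else from the environment."""
    try:
        # Only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        userdata = None

    key = None
    if userdata is not None:
        try:
            key = userdata.get(name)
        except Exception:
            # Secret missing or notebook access not granted; fall back to the environment.
            key = None

    key = key or os.environ.get(name)
    if not key:
        raise ValueError(f"{name} not found in Colab Secrets or environment")
    return key
```

With this helper, the client could be created as `OpenAI(api_key=load_api_key())` in either environment.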

In [None]:
# Retrieve API key from Colab Secrets
try:
    api_key = userdata.get('OPENAI_API_KEY')
    if not api_key:
        raise ValueError("API key is empty")
    print("✅ API key retrieved from Colab Secrets")
except Exception as e:
    print("❌ Error: Could not retrieve OPENAI_API_KEY from Colab Secrets")
    print("Please add your OpenAI API key to Colab Secrets:")
    print("  1. Click the key icon 🔑 in the left sidebar")
    print("  2. Add new secret: Name = 'OPENAI_API_KEY', Value = your API key")
    raise

# Initialize OpenAI client
client = OpenAI(api_key=api_key)

# Configuration constants
MODEL = "gpt-4o"  # Model used by the assistant; it must support file_search
TEMPERATURE = 0.7  # Balanced creativity and consistency
MAX_TOKENS = 2000  # Intended response-length cap (defined here but not passed to any call below)

print(f"✅ OpenAI client initialized with model: {MODEL}")

## 3️⃣ Vector Store Setup (One-Time)

Upload the wine catalog PDF and create a vector store for file_search.

**IMPORTANT**: 
- First run: Leave `VECTOR_STORE_ID = None` to create a new vector store
- After creation, copy the printed ID and paste it here to reuse in future sessions
- This avoids re-indexing costs and setup time
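
Instead of copying the ID by hand, it could be written to a small file and read back in later sessions. This is a minimal sketch, not part of the notebook: `ID_FILE`, `save_vector_store_id`, and `load_vector_store_id` are hypothetical names, and on Colab the file would need to live on a mounted Google Drive path to survive a runtime restart.

```python
import json
from pathlib import Path

# Hypothetical location; a Google Drive path would persist across Colab sessions.
ID_FILE = Path("vector_store_id.json")


def save_vector_store_id(vector_store_id, path=ID_FILE):
    """Write the vector store ID to a small JSON file."""
    path.write_text(json.dumps({"vector_store_id": vector_store_id}))


def load_vector_store_id(path=ID_FILE):
    """Return the saved ID, or None if nothing has been saved yet."""
    if not path.exists():
        return None
    return json.loads(path.read_text()).get("vector_store_id")
```

Setting `VECTOR_STORE_ID = load_vector_store_id()` would then give `None` on the first run and the saved ID afterwards.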

In [None]:
# Configure vector store ID (set to None for first-time setup)
VECTOR_STORE_ID = None  # Replace with your vector store ID after first run, e.g., "vs_abc123..."

# PDF file path - Update this to match your PDF location
WINE_PDF_PATH = "/content/Vinhos baba d_urso.pdf"  # Default Colab upload location
# Alternative: Google Drive path after mounting
# WINE_PDF_PATH = "/content/drive/MyDrive/Vinhos baba d_urso.pdf"

if VECTOR_STORE_ID is None:
    print("🔄 Creating new vector store...")

    # Check if PDF exists
    if not os.path.exists(WINE_PDF_PATH):
        print(f"❌ Error: PDF not found at {WINE_PDF_PATH}")
        print("\nPlease upload 'Vinhos baba d_urso.pdf' using one of these methods:")
        print("  1. Drag and drop the PDF into the Files panel (left sidebar)")
        print("  2. Mount Google Drive and update WINE_PDF_PATH variable")
        print("\nTo mount Google Drive, run: from google.colab import drive; drive.mount('/content/drive')")
        raise FileNotFoundError(f"Wine catalog PDF not found at {WINE_PDF_PATH}")
    
    # Create vector store
    vector_store = client.beta.vector_stores.create(
        name="Wine Catalog - Vinhos Baba d'Urso"
    )
    
    # Upload PDF to vector store
    with open(WINE_PDF_PATH, "rb") as pdf_file:
        file_batch = client.beta.vector_stores.file_batches.upload_and_poll(
            vector_store_id=vector_store.id,
            files=[pdf_file]
        )
    
    print(f"✅ Vector store created successfully!")
    print(f"\n📋 SAVE THIS ID FOR FUTURE USE:")
    print(f"   VECTOR_STORE_ID = \"{vector_store.id}\"")
    print(f"\nFile batch status: {file_batch.status}")
    print(f"Files processed: {file_batch.file_counts.completed}/{file_batch.file_counts.total}")
    
    VECTOR_STORE_ID = vector_store.id
else:
    print(f"♻️  Reusing existing vector store: {VECTOR_STORE_ID}")
    try:
        vector_store = client.beta.vector_stores.retrieve(VECTOR_STORE_ID)
        print(f"✅ Vector store retrieved: {vector_store.name}")
        print(f"   Files: {vector_store.file_counts.completed}")
    except Exception as e:
        print(f"❌ Error retrieving vector store: {e}")
        print("   Set VECTOR_STORE_ID = None to create a new one")
        raise

## 4️⃣ AI Sommelier Agent Definition

Create the sommelier assistant with instructions and file_search capability.

In [None]:
# Agent instructions from prompt.txt
SOMMELIER_INSTRUCTIONS = """You are an expert AI Sommelier with formal wine education and restaurant-level tasting experience.
Your role is to guide users through personalized wine recommendations, pairings, education, and virtual tastings.
You have access to a knowledge base built from a PDF containing wine catalog entries, tasting notes, regions, grape varieties, pricing, and pairing guidance.
You MUST ground all wine facts and recommendations in passages retrieved from the PDF via file_search.

PRIMARY OBJECTIVE
Provide accurate, practical wine recommendations tailored to the user's food, taste preferences, budget, and occasion.

RETRIEVAL RULES
- Always use file_search before recommending wines.
- Base recommendations ONLY on information present in retrieved PDF content.
- Do NOT invent wines, vintages, regions, prices, or tasting notes.
- If the PDF lacks required info, say what's missing and ask exactly ONE concise follow-up question.

RECOMMENDATION FORMAT
Recommend 2–4 options. For EACH wine include:
- Wine name (exactly as in the PDF)
- Country / region
- Grape variety or blend
- Style profile: body, acidity, tannin, sweetness
- Why it matches the user's food/preferences
- Serving temperature
- Optional: decanting/glassware note ONLY if supported by the PDF

PAIRING LOGIC
Apply classic pairing principles:
- Match intensity (light with light, bold with bold)
- Balance acidity, fat, spice, sweetness, and tannin
- Spicy food: prioritize acidity, aromatics, and lower alcohol (if supported)
- Fatty/grilled dishes: structure/tannin or acidity as appropriate

CONSTRAINTS
- Respect budget strictly.
- Respect exclusions (e.g., "no sweet", "no heavy oak", allergies).
- Keep the tone polished and concise.
- Do not mention embeddings, vectors, or internal tooling.

FALLBACK
If no exact match exists:
- Recommend the closest stylistic alternatives found in the PDF
- Explain the limitation of the source material
- Ask ONE targeted question

OUTPUT STRUCTURE
- 1–2 sentence summary
- Bullet list of recommendations
- Optional single follow-up question (only if needed)
"""

# Create the assistant
assistant = client.beta.assistants.create(
    name="AI Sommelier",
    instructions=SOMMELIER_INSTRUCTIONS,
    model=MODEL,
    temperature=TEMPERATURE,
    tools=[{"type": "file_search"}],
    tool_resources={
        "file_search": {
            "vector_store_ids": [VECTOR_STORE_ID]
        }
    }
)

print(f"✅ AI Sommelier assistant created")
print(f"   Assistant ID: {assistant.id}")
print(f"   Model: {assistant.model}")
print(f"   Tools: {[tool.type for tool in assistant.tools]}")

## 5️⃣ Session Management Functions

Functions to manage conversation threads and retrieve responses.
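
The citation handling in `format_response` can be tried without calling the API. The sketch below uses a hypothetical `FakeAnnotation` dataclass as a stand-in for the SDK's annotation objects, and the marker text and file ID are made up for illustration. It lists the file ID as the source, which is an assumption about what a citation carries.

```python
from dataclasses import dataclass


@dataclass
class FakeAnnotation:
    """Hypothetical stand-in for a file_citation annotation."""
    text: str     # marker exactly as it appears in the message text
    file_id: str  # ID of the cited file


def number_citations(text, annotations):
    """Replace each marker with ' [n]' and return (text, list of source lines)."""
    sources = []
    for n, ann in enumerate(annotations, start=1):
        text = text.replace(ann.text, f" [{n}]")
        sources.append(f"[{n}] {ann.file_id}")
    return text, sources


# Made-up message body and marker, for illustration only.
body = "Wine A【4:0†source】 pairs with grilled fish."
clean, sources = number_citations(body, [FakeAnnotation("【4:0†source】", "file-abc123")])
print(clean)    # Wine A [1] pairs with grilled fish.
print(sources)  # ['[1] file-abc123']
```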

In [None]:
# Global variable to store thread ID
thread_id = None

def create_conversation():
    """Create a new conversation thread."""
    global thread_id
    thread = client.beta.threads.create()
    thread_id = thread.id
    print(f"🆕 New conversation started (Thread ID: {thread_id})")
    return thread_id

def send_message(message):
    """Send a message and get the sommelier's response."""
    global thread_id
    
    if thread_id is None:
        create_conversation()
    
    # Add user message to thread
    client.beta.threads.messages.create(
        thread_id=thread_id,
        role="user",
        content=message
    )
    
    # Create and poll the run
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread_id,
        assistant_id=assistant.id,
        timeout=60  # Per-request HTTP timeout in seconds; does not cap total run time
    )
    
    # Check run status
    if run.status == 'completed':
        return get_response()
    elif run.status == 'failed':
        return f"❌ Error: Run failed - {run.last_error}"
    elif run.status == 'expired':
        return "❌ Error: Request timed out. Please try again."
    else:
        return f"❌ Unexpected status: {run.status}"

def get_response():
    """Retrieve the latest assistant response from the thread."""
    messages = client.beta.threads.messages.list(
        thread_id=thread_id,
        order="desc",
        limit=1
    )
    
    if messages.data:
        message = messages.data[0]
        if message.role == "assistant":
            return format_response(message)
    
    return "No response received."

def format_response(message):
    """Format the assistant's response, including file_search citations."""
    response_text = ""
    citations = []
    
    # Extract text and annotations
    for content in message.content:
        if content.type == "text":
            text_value = content.text.value
            annotations = content.text.annotations
            
            # Collect citations from file_search
            for annotation in annotations:
                if annotation.type == "file_citation":
                    citation_num = len(citations) + 1
                    # Citations may not carry a `quote` field; fall back to the file ID
                    citation = annotation.file_citation
                    source = getattr(citation, "quote", None) or citation.file_id
                    citations.append(f"[{citation_num}] {source}")
                    # Replace annotation with citation number
                    text_value = text_value.replace(annotation.text, f" [{citation_num}]")
            
            response_text += text_value
    
    # Append citations if present
    if citations:
        response_text += "\n\n📚 **Sources from wine catalog:**\n"
        response_text += "\n".join(citations)
    
    return response_text

def reset_conversation():
    """Reset the conversation to start fresh."""
    global thread_id
    thread_id = None
    print("🔄 Conversation reset. Next message will start a new thread.")

def show_conversation_history():
    """Display the full conversation history."""
    if thread_id is None:
        print("No active conversation.")
        return
    
    messages = client.beta.threads.messages.list(
        thread_id=thread_id,
        order="asc"
    )
    
    print("\n" + "="*60)
    print("CONVERSATION HISTORY")
    print("="*60)
    
    for msg in messages.data:
        role = "🧑 You" if msg.role == "user" else "🍷 Sommelier"
        content = msg.content[0].text.value if msg.content else "[No content]"
        print(f"\n{role}:")
        print(content[:500] + ("..." if len(content) > 500 else ""))
    
    print("\n" + "="*60)

print("✅ Session management functions loaded")

## 6️⃣ Interactive Chat Loop

Start chatting with the AI Sommelier!

### Commands:
- Type your wine question or pairing request
- `quit`, `exit`, or `bye` - End conversation
- `history` - Show full conversation
- `reset` - Start a new conversation thread
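
The command handling above can be expressed as a small pure function, which makes it easy to check without running the chat loop. This is a sketch only; `parse_command` and its return labels are hypothetical and do not appear in the loop below.

```python
def parse_command(user_input):
    """Map raw input to one of: 'skip', 'quit', 'history', 'reset', 'message'."""
    text = user_input.strip().lower()
    if not text:
        return "skip"  # empty input: prompt again
    if text in {"quit", "exit", "bye"}:
        return "quit"
    if text in {"history", "reset"}:
        return text
    return "message"  # anything else goes to the sommelier


print(parse_command("  EXIT "))              # quit
print(parse_command("Pair a wine with curry"))  # message
```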

### Example Queries:
- "Suggest a wine for grilled salmon with lemon, budget under $30"
- "I want a bold red for ribeye steak, no sweet wines"
- "Pair a wine with spicy Thai curry"
- "What's a good Portuguese white wine for summer?"

In [None]:
print("🍷 AI Sommelier Chat - Ready to recommend wines!\n")
print("Type 'quit' to exit, 'history' to view conversation, 'reset' to start fresh\n")
print("="*60)

# Initialize conversation
if thread_id is None:
    create_conversation()

# Main chat loop
while True:
    try:
        user_input = input("\n🧑 You: ").strip()
        
        if not user_input:
            continue
        
        # Handle commands
        if user_input.lower() in ['quit', 'exit', 'bye']:
            print("\n🍷 Thank you for using AI Sommelier. Cheers! 🥂")
            break
        
        if user_input.lower() == 'history':
            show_conversation_history()
            continue
        
        if user_input.lower() == 'reset':
            reset_conversation()
            create_conversation()
            continue
        
        # Send message and get response
        print("\n🍷 Sommelier: ", end="")
        print("(searching wine catalog...)")
        
        response = send_message(user_input)
        print(f"\n{response}")
        
    except KeyboardInterrupt:
        print("\n\n🍷 Chat interrupted. Type 'quit' to exit properly.")
    except Exception as e:
        print(f"\n❌ Error: {e}")
        print("Please try again or type 'reset' to start a new conversation.")

## 7️⃣ Automated Testing & Examples

Run automated tests to verify the sommelier's capabilities.

In [None]:
# Test queries demonstrating agent capabilities
test_queries = [
    "Suggest a wine for grilled salmon with lemon, budget under $30",
    "I want a bold red for ribeye steak, no sweet wines",
    "Pair a wine with spicy Thai curry",
    "What Portuguese wines do you have for seafood?",
    "Recommend a wine for a romantic dinner, around $40-50"
]

# Reset conversation for clean testing
reset_conversation()
create_conversation()

print("🧪 Running automated tests...\n")
print("="*60)

for i, query in enumerate(test_queries, 1):
    print(f"\n\n📝 TEST {i}/{len(test_queries)}")
    print(f"Query: {query}")
    print("-" * 60)
    
    try:
        response = send_message(query)
        print(f"\n🍷 Response:\n{response}")
    except Exception as e:
        print(f"❌ Error: {e}")
    
    # Small delay between requests
    if i < len(test_queries):
        time.sleep(2)

print("\n" + "="*60)
print("✅ Testing complete!")

## 🛠️ Troubleshooting & Utilities

### Common Issues

**API Key Error**
- Ensure `OPENAI_API_KEY` is added to Colab Secrets (🔑 icon in sidebar)
- Verify the key is valid and has credits

**PDF Not Found**
- Upload `Vinhos baba d_urso.pdf` to Colab session via Files panel
- Or mount Google Drive: `from google.colab import drive; drive.mount('/content/drive')`
- Update `WINE_PDF_PATH` variable to match PDF location

**Vector Store Error**
- If retrieval fails, set `VECTOR_STORE_ID = None` and re-run cell 3
- Check OpenAI dashboard for vector store status

**No Wine Recommendations**
- Verify PDF contains wine information (not empty/corrupted)
- Check if query matches content in PDF (agent can't invent wines)
- Try broader queries if specific wines aren't found

**Timeout Errors**
- Large PDFs may take longer to search (increase timeout in `send_message`)
- Network issues - retry the request
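
For transient network errors, retrying can be automated with a small wrapper. This is a minimal sketch using exponential backoff; `with_retries` is a hypothetical helper and its defaults are arbitrary. It only retries calls that raise an exception, so it would not retry the error strings that `send_message` returns for failed or expired runs.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on an exception wait base_delay * 2**i seconds, then retry."""
    for i in range(attempts):
        try:
            return func()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** i)
```

In this notebook it could wrap a call such as `with_retries(lambda: send_message("Pair a wine with spicy Thai curry"))`.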

### Utility Functions

In [None]:
# Check assistant details
def check_assistant_status():
    """Display current assistant configuration."""
    try:
        assistant_info = client.beta.assistants.retrieve(assistant.id)
        print("📊 Assistant Status:")
        print(f"   ID: {assistant_info.id}")
        print(f"   Name: {assistant_info.name}")
        print(f"   Model: {assistant_info.model}")
        print(f"   Tools: {[t.type for t in assistant_info.tools]}")
        print(f"   Vector Stores: {assistant_info.tool_resources.file_search.vector_store_ids}")
    except Exception as e:
        print(f"❌ Error: {e}")

# Check vector store details
def check_vector_store_status():
    """Display vector store information."""
    try:
        vs = client.beta.vector_stores.retrieve(VECTOR_STORE_ID)
        print("📚 Vector Store Status:")
        print(f"   ID: {vs.id}")
        print(f"   Name: {vs.name}")
        print(f"   Status: {vs.status}")
        print(f"   Files: {vs.file_counts.completed}/{vs.file_counts.total}")
        print(f"   Created: {vs.created_at}")
    except Exception as e:
        print(f"❌ Error: {e}")

# Quick test function
def quick_test(query="What wines do you have?"):
    """Send a quick test query."""
    print(f"Testing with query: {query}\n")
    response = send_message(query)
    print(response)
    return response

print("✅ Utility functions loaded")
print("\nAvailable utilities:")
print("  - check_assistant_status()")
print("  - check_vector_store_status()")
print("  - quick_test('your query here')")
print("  - reset_conversation()")
print("  - show_conversation_history()")

## 📖 Usage Guide

### Getting Started
1. **First Time**: Run cells 1-5 in order to set up the system (cell 5 defines the functions the chat loop needs)
2. **Save Vector Store ID**: After cell 3, copy the ID and paste it back for future use
3. **Start Chatting**: Run cell 6 to begin interactive conversation

### How to Ask for Recommendations

**Include Key Details:**
- **Food/Occasion**: "Grilled salmon", "romantic dinner", "spicy curry"
- **Budget**: "Under $30", "around $50", "no budget limit"
- **Preferences**: "Bold reds", "no sweet wines", "crisp whites"
- **Constraints**: "No oak", "vegetarian pairing", "low alcohol"

**Example Good Queries:**
```
Suggest a wine for grilled salmon with lemon, budget $25-35, prefer dry whites
I'm cooking ribeye steak tonight, what bold red would pair well?
Need a sparkling wine for celebration, around $40
Pair a wine with spicy Thai cuisine, I don't like sweet wines
What Portuguese wines work well with seafood?
```
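
Queries with this shape can also be assembled from the key details programmatically. The sketch below is illustrative only; `build_query` and its parameters are hypothetical and are not used elsewhere in this notebook.

```python
def build_query(food, budget=None, preferences=None, constraints=None):
    """Join the key details into a single natural-language request."""
    parts = [f"Suggest a wine for {food}"]
    if budget:
        parts.append(f"budget {budget}")
    if preferences:
        parts.append(f"prefer {preferences}")
    if constraints:
        parts.append(constraints)
    return ", ".join(parts)


print(build_query("grilled salmon with lemon", budget="$25-35", preferences="dry whites"))
# Suggest a wine for grilled salmon with lemon, budget $25-35, prefer dry whites
```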

### Understanding Responses

The sommelier will provide:
- **2-4 wine recommendations** from the catalog
- **Wine details**: Name, region, grape variety, style profile
- **Pairing rationale**: Why it matches your food/preferences
- **Serving notes**: Temperature, decanting if needed
- **Citations**: References to the wine catalog PDF

### Advanced Features

**Multi-turn Conversations:**
```
You: Suggest a wine for pasta with tomato sauce
Sommelier: [recommendations]
You: What about something more full-bodied?
Sommelier: [adjusted recommendations based on context]
```

**Wine Education:**
```
Tell me about wines from the Douro region
What's the difference between these grape varieties?
Explain the tasting notes for [specific wine]
```

### Cost Management

- **Vector Store**: ~$0.10/GB/day (minimal for small PDF)
- **Queries**: ~$0.03/GB per search (pennies per query)
- **Model**: Standard GPT-4o rates (~$0.01-0.03 per query)
- **Tip**: Reuse `VECTOR_STORE_ID` to avoid re-indexing
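
To see why storage is minimal for a small PDF, the arithmetic can be sketched as below. The rate is the approximate figure quoted above and is an assumption to check against current OpenAI pricing. The billed size may also differ from the raw file size, since it reflects the stored, indexed data. `daily_storage_cost` is a hypothetical helper.

```python
def daily_storage_cost(size_bytes, rate_per_gb_day=0.10):
    """Estimated daily storage cost in USD for a given stored size."""
    return size_bytes / 1024**3 * rate_per_gb_day


# A 5 MB catalog at the quoted rate:
print(f"${daily_storage_cost(5 * 1024**2):.6f} per day")  # $0.000488 per day
```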

### Updating the Wine Catalog

To add a new/updated PDF:
1. Set `VECTOR_STORE_ID = None` in cell 3
2. Update `WINE_PDF_PATH` with new PDF location
3. Re-run cell 3 to create new vector store
4. Save the new vector store ID

---

**Happy wine pairing! 🍷🥂**