# Pipeline 5: Interactive User Clarification

## Overview
This notebook implements an interactive Haystack pipeline that asks clarifying questions to identify a user's LOCATION and KEY WORDS for business searches. An OpenAI chat model (`gpt-4o-mini`) generates the questions and extracts the search parameters from the user's replies over one or more turns.

## What This Pipeline Does
1. Engages in conversation with the user
2. Asks targeted clarifying questions about location and preferences
3. Extracts location and keywords from user responses
4. Validates and confirms extracted information
5. Returns structured search parameters ready for Pipeline 1
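
Step 4 is handled by the language model in this notebook. As a complement, here is a minimal sketch of a deterministic validation check that could run before handing parameters to Pipeline 1. The function name `validate_parameters` and its messages are hypothetical and not part of the notebook.

```python
from typing import List


def validate_parameters(location: str, keywords: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the parameters look usable."""
    problems = []
    # A location made only of whitespace counts as missing.
    if not location.strip():
        problems.append("location is missing")
    # Ignore blank keyword entries when deciding whether any keywords exist.
    if not [k for k in keywords if k.strip()]:
        problems.append("keywords are missing")
    return problems
```

An empty result could be treated as a second confirmation alongside the model's READY signal.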

## Use Cases
- Interactive business search refinement
- Ambiguous query clarification
- User preference extraction
- Conversational search interface

## Pipeline Architecture
```
Initial Query → Question Generator → User Response → Information Extractor → Validation → Final Parameters
```

## Integration Points
- **Input**: Vague or incomplete user queries
- **Output**: Location and keywords for Pipeline 1

## Setup and Environment Variables

Ensure your `.env` file contains:
```
OPENAI_API_KEY=your_openai_key_here
```
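
A missing key otherwise only surfaces when the first API call fails. The sketch below shows one way to fail fast using only the standard library. The helper name `require_env` is an assumption for illustration and is not defined elsewhere in this notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Intended use, after load_dotenv(".env") has run:
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
```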

In [None]:
# Import required libraries
from dotenv import load_dotenv
import os
from haystack import Pipeline, component, Document
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from typing import List, Dict, Any, Optional
import json

# Load environment variables
load_dotenv(".env")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

print("✓ Environment variables loaded successfully")

## Prompt Templates for Interactive Dialogue

Define prompt templates for asking clarifying questions and extracting information.

In [None]:
# Prompt template for generating clarifying questions
CLARIFICATION_QUESTIONS_TEMPLATE = """
You are a helpful assistant helping users find businesses on Yelp.

User's Initial Query: {{ user_query }}

Current Information Extracted:
- Location: {{ location if location else "NOT SPECIFIED" }}
- Keywords: {{ keywords if keywords else "NOT SPECIFIED" }}

Your task: Generate 1-2 specific, friendly clarifying questions to help identify:
1. A specific LOCATION (city, neighborhood, or area) if not already clear
2. KEY WORDS describing what type of business or specific preferences if vague

Rules:
- If location is missing or vague, prioritize asking for location first
- If keywords are vague (e.g., "food", "restaurant"), ask for more specifics (cuisine type, atmosphere, etc.)
- Keep questions natural and conversational
- Don't ask questions if information is already clear
- If both location and keywords are clear, respond with "READY" and summarize what you understand

Format: Just provide the questions naturally, one per line. Or respond with "READY: [summary]" if no more questions needed.
"""

# Prompt template for extracting information from user responses
INFORMATION_EXTRACTION_TEMPLATE = """
You are an information extraction assistant.

Conversation History:
{{ conversation_history }}

Latest User Response: {{ user_response }}

Task: Extract structured information from the user's response:
1. LOCATION: Any city, neighborhood, or geographic area mentioned
2. KEYWORDS: Business types, cuisine, features, or preferences mentioned

Previous Extracted Info:
- Location: {{ current_location if current_location else "None" }}
- Keywords: {{ current_keywords if current_keywords else "None" }}

Rules:
- Update location ONLY if user mentions a new or more specific location
- Add new keywords without removing previous ones unless contradicted
- Return empty if nothing new is mentioned
- Be conservative - only extract clear, explicit mentions

Return a JSON object with this exact format:
{
  "location": "extracted location or empty string",
  "keywords": ["keyword1", "keyword2"],
  "confidence": "high/medium/low"
}
"""

print("✓ Prompt templates defined")
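
The question template asks the model to answer either with questions or with `READY: [summary]`. The following is a small sketch of how such a reply could be split into a flag and its text. The helper `parse_ready_signal` is hypothetical, and it assumes the model emits `READY` in upper case as the template instructs.

```python
from typing import Tuple


def parse_ready_signal(reply: str) -> Tuple[bool, str]:
    """Split a model reply into (is_ready, text).

    For a "READY: summary" reply, text is the summary. Otherwise text is the
    reply itself, which should contain the clarifying questions.
    """
    text = reply.strip()
    # Case-sensitive on purpose: "Ready to help!" should not count as the signal.
    if text.startswith("READY"):
        summary = text[len("READY"):].lstrip(" :-\n")
        return True, summary
    return False, text
```

Stripping whitespace first matters because models sometimes prepend a newline, which would defeat a plain `startswith` check.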

## Custom Component 1: Clarifying Question Generator

This component generates targeted questions to extract missing information.

In [None]:
@component
class ClarifyingQuestionGenerator:
    """
    Generates clarifying questions based on current information state.
    
    This component:
    1. Analyzes what information is missing (location, keywords)
    2. Generates natural, targeted questions
    3. Determines if enough information has been collected
    
    Input:
        - user_query (str): Initial or current user query
        - location (Optional[str]): Currently extracted location
        - keywords (Optional[List[str]]): Currently extracted keywords
    
    Output:
        - questions (str): Clarifying questions or READY signal
        - is_ready (bool): Whether we have enough information
    """
    
    def __init__(self, api_key: str):
        """
        Initialize the question generator.
        
        Args:
            api_key: OpenAI API key
        """
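        # Haystack 2.x generators expect api_key as a Secret, so wrap the raw string
        from haystack.utils import Secret

        self.prompt_builder = PromptBuilder(template=CLARIFICATION_QUESTIONS_TEMPLATE)
        self.generator = OpenAIGenerator(
            api_key=Secret.from_token(api_key),
            model="gpt-4o-mini",
            generation_kwargs={"temperature": 0.7}
        )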
    
    @component.output_types(questions=str, is_ready=bool)
    def run(
        self,
        user_query: str,
        location: Optional[str] = "",
        keywords: Optional[List[str]] = None
    ) -> Dict[str, Any]:
        """
        Generate clarifying questions.
        
        Args:
            user_query: User's query or latest response
            location: Currently extracted location
            keywords: Currently extracted keywords
            
        Returns:
            Dictionary with questions and ready status
        """
        keywords = keywords or []
        
        # Build prompt
        prompt_result = self.prompt_builder.run(
            user_query=user_query,
            location=location,
            keywords=", ".join(keywords) if keywords else ""
        )
        
        # Generate questions
        llm_result = self.generator.run(prompt=prompt_result['prompt'])
        response = llm_result['replies'][0] if llm_result['replies'] else ""
        
        # Check if ready (strip first: replies sometimes begin with whitespace)
        is_ready = response.strip().startswith("READY")
        
        return {
            "questions": response,
            "is_ready": is_ready
        }

print("✓ ClarifyingQuestionGenerator component defined")
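
The component above leaves the decision about what is missing entirely to the model. As an optional guard, a rule-based pre-check could be sketched as follows. The function `missing_fields` and the `VAGUE_KEYWORDS` set are assumptions for illustration; the two vague terms are taken from the examples in the prompt's rules.

```python
from typing import List, Optional

# Terms the prompt itself gives as examples of vague keywords; extend as needed.
VAGUE_KEYWORDS = {"food", "restaurant"}


def missing_fields(location: Optional[str], keywords: Optional[List[str]]) -> List[str]:
    """Return which of "location" and "keywords" still need clarification."""
    missing = []
    if not (location or "").strip():
        missing.append("location")
    # Keep only keywords that are non-blank and not on the vague list.
    specific = [
        k for k in (keywords or [])
        if k.strip() and k.strip().lower() not in VAGUE_KEYWORDS
    ]
    if not specific:
        missing.append("keywords")
    return missing
```

A check like this could skip the model call when nothing is missing, or cross-check a READY reply that arrives while a field is still empty.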

## Custom Component 2: Information Extractor

This component extracts location and keywords from user responses.

In [None]:
@component
class InformationExtractor:
    """
    Extracts location and keywords from user responses.
    
    This component:
    1. Analyzes user responses for location mentions
    2. Identifies business-related keywords and preferences
    3. Updates extracted information incrementally
    4. Returns structured data with a confidence level
    
    Input:
        - user_response (str): User's response to questions
        - conversation_history (str): Previous conversation context
        - current_location (Optional[str]): Previously extracted location
        - current_keywords (Optional[List[str]]): Previously extracted keywords
    
    Output:
        - location (str): Updated location
        - keywords (List[str]): Updated keywords
        - confidence (str): Confidence level (high/medium/low)
    """
    
    def __init__(self, api_key: str):
        """
        Initialize the information extractor.
        
        Args:
            api_key: OpenAI API key
        """
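        # Haystack 2.x generators expect api_key as a Secret, so wrap the raw string
        from haystack.utils import Secret

        self.prompt_builder = PromptBuilder(template=INFORMATION_EXTRACTION_TEMPLATE)
        self.generator = OpenAIGenerator(
            api_key=Secret.from_token(api_key),
            model="gpt-4o-mini",
            generation_kwargs={"temperature": 0.3}  # Lower temp for more consistent extraction
        )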
    
    @component.output_types(location=str, keywords=List[str], confidence=str)
    def run(
        self,
        user_response: str,
        conversation_history: str = "",
        current_location: str = "",
        current_keywords: Optional[List[str]] = None
    ) -> Dict[str, Any]:
        """
        Extract information from user response.
        
        Args:
            user_response: User's latest response
            conversation_history: Previous conversation
            current_location: Previously extracted location
            current_keywords: Previously extracted keywords
            
        Returns:
            Dictionary with updated location, keywords, and confidence
        """
        current_keywords = current_keywords or []
        
        # Build prompt
        prompt_result = self.prompt_builder.run(
            user_response=user_response,
            conversation_history=conversation_history,
            current_location=current_location,
            current_keywords=", ".join(current_keywords) if current_keywords else ""
        )
        
        # Extract information
        llm_result = self.generator.run(prompt=prompt_result['prompt'])
        response_text = llm_result['replies'][0] if llm_result['replies'] else "{}"
        
        # Parse JSON response
        try:
            # Extract JSON from response (handle markdown code blocks)
            if "```json" in response_text:
                json_start = response_text.find("```json") + 7
                json_end = response_text.find("```", json_start)
                json_text = response_text[json_start:json_end].strip()
            elif "```" in response_text:
                json_start = response_text.find("```") + 3
                json_end = response_text.find("```", json_start)
                json_text = response_text[json_start:json_end].strip()
            else:
                json_text = response_text
            
            extracted = json.loads(json_text)
            
            # Update location if new one provided
            new_location = extracted.get("location", "")
            final_location = new_location if new_location else current_location
            
            # Merge keywords, preserving first-seen order (set() would scramble it)
            new_keywords = extracted.get("keywords") or []
            final_keywords = list(dict.fromkeys(current_keywords + new_keywords))
            
            confidence = extracted.get("confidence", "low")
            
        except (json.JSONDecodeError, AttributeError, TypeError):
            # If parsing fails or the reply is not the expected JSON object, keep current values
            final_location = current_location
            final_keywords = current_keywords
            confidence = "low"
        
        return {
            "location": final_location,
            "keywords": final_keywords,
            "confidence": confidence
        }

print("✓ InformationExtractor component defined")
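
The JSON handling inside `run` can also be isolated into a helper that is easy to test without calling the model. The sketch below is one possible version, not the notebook's implementation. The name `parse_json_reply` is hypothetical, and the fence marker is built programmatically only so that the example nests cleanly inside Markdown.

```python
import json
import re
from typing import Any, Dict

FENCE = "`" * 3  # the Markdown code-fence marker
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_json_reply(reply: str) -> Dict[str, Any]:
    """Best-effort extraction of a JSON object from a model reply.

    Returns an empty dict when no valid JSON object is found.
    """
    # Prefer the contents of a fenced block if the model used one.
    fenced = FENCED_BLOCK.search(reply)
    candidate = fenced.group(1) if fenced else reply
    # Narrow to the outermost braces to skip any surrounding prose.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end < start:
        return {}
    try:
        parsed = json.loads(candidate[start:end + 1])
    except json.JSONDecodeError:
        return {}
    # A JSON array or scalar is not the object the prompt asked for.
    return parsed if isinstance(parsed, dict) else {}
```

Returning an empty dict on every failure path lets the caller fall back to the current location and keywords without a separate error branch.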

## Interactive Conversation Manager

This class manages the conversation flow and coordinates the pipeline components.

In [None]:
class InteractiveConversationManager:
    """
    Manages the interactive conversation flow for extracting search parameters.
    
    This class orchestrates the conversation, tracks state, and determines
    when enough information has been collected.
    """
    
    def __init__(self, api_key: str):
        """
        Initialize the conversation manager.
        
        Args:
            api_key: OpenAI API key
        """
        self.question_generator = ClarifyingQuestionGenerator(api_key=api_key)
        self.info_extractor = InformationExtractor(api_key=api_key)
        
        self.location = ""
        self.keywords = []
        self.conversation_history = []
        self.is_ready = False
    
    def start_conversation(self, initial_query: str) -> str:
        """
        Start the conversation with an initial query.
        
        Args:
            initial_query: User's initial search query
            
        Returns:
            First clarifying question or confirmation
        """
        self.conversation_history.append(f"User: {initial_query}")
        
        # Generate initial questions
        result = self.question_generator.run(
            user_query=initial_query,
            location=self.location,
            keywords=self.keywords
        )
        
        self.is_ready = result['is_ready']
        response = result['questions']
        
        self.conversation_history.append(f"Assistant: {response}")
        
        return response
    
    def process_response(self, user_response: str) -> str:
        """
        Process user's response and continue conversation.
        
        Args:
            user_response: User's response to the question
            
        Returns:
            Next question or confirmation message
        """
        self.conversation_history.append(f"User: {user_response}")
        
        # Extract information from response
        extraction_result = self.info_extractor.run(
            user_response=user_response,
            conversation_history="\n".join(self.conversation_history[-4:]),  # Last 4 messages
            current_location=self.location,
            current_keywords=self.keywords
        )
        
        # Update state
        self.location = extraction_result['location']
        self.keywords = extraction_result['keywords']
        
        # Generate next question
        question_result = self.question_generator.run(
            user_query=user_response,
            location=self.location,
            keywords=self.keywords
        )
        
        self.is_ready = question_result['is_ready']
        response = question_result['questions']
        
        self.conversation_history.append(f"Assistant: {response}")
        
        return response
    
    def get_search_parameters(self) -> Dict[str, Any]:
        """
        Get the final search parameters.
        
        Returns:
            Dictionary with location and keywords
        """
        return {
            "location": self.location,
            "keywords": self.keywords,
            "is_ready": self.is_ready
        }
    
    def reset(self):
        """Reset the conversation state."""
        self.location = ""
        self.keywords = []
        self.conversation_history = []
        self.is_ready = False

print("✓ InteractiveConversationManager class defined")
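
`process_response` passes only the last four messages to the extractor by slicing `conversation_history[-4:]`. If the window size were made configurable, one edge case is worth handling: a slice of `[-0:]` returns the whole list rather than nothing. A minimal sketch, with the hypothetical helper name `recent_history`:

```python
from typing import List


def recent_history(history: List[str], max_messages: int = 4) -> str:
    """Join the most recent messages into a single prompt-ready string."""
    # history[-0:] would return every message, so handle zero explicitly.
    if max_messages <= 0:
        return ""
    return "\n".join(history[-max_messages:])
```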

## Test the Interactive System

Let's simulate a conversation where the user provides a vague query and we clarify it.

In [None]:
# Initialize the conversation manager
manager = InteractiveConversationManager(api_key=OPENAI_API_KEY)

# Scenario 1: Vague query - user wants food but no location or specifics
print("="*60)
print("SCENARIO 1: Vague Query")
print("="*60)

initial_query = "I'm looking for a good place to eat"
print(f"\nUser: {initial_query}")

response = manager.start_conversation(initial_query)
print(f"\nAssistant: {response}")
print(f"\nExtracted so far - Location: '{manager.location}', Keywords: {manager.keywords}")
print(f"Ready: {manager.is_ready}")

In [None]:
# Simulate user response with location but still vague on type
user_response = "I'm in Madison, Wisconsin"
print(f"\nUser: {user_response}")

response = manager.process_response(user_response)
print(f"\nAssistant: {response}")
print(f"\nExtracted so far - Location: '{manager.location}', Keywords: {manager.keywords}")
print(f"Ready: {manager.is_ready}")

In [None]:
# Simulate user providing more specific preferences
user_response = "I want Mexican food, preferably something casual with good margaritas"
print(f"\nUser: {user_response}")

response = manager.process_response(user_response)
print(f"\nAssistant: {response}")
print(f"\nExtracted - Location: '{manager.location}', Keywords: {manager.keywords}")
print(f"Ready: {manager.is_ready}")

# Get final search parameters
if manager.is_ready:
    params = manager.get_search_parameters()
    print(f"\n{'='*60}")
    print("FINAL SEARCH PARAMETERS:")
    print(f"{'='*60}")
    print(f"Location: {params['location']}")
    print(f"Keywords: {', '.join(params['keywords'])}")
    print("\nThese parameters are ready for Pipeline 1!")

## Test Scenario 2: Specific Query That Needs No Clarification

Test with a query that already has clear location and keywords.

In [None]:
# Reset for new scenario
manager.reset()

print("\n" + "="*60)
print("SCENARIO 2: Specific Query")
print("="*60)

initial_query = "Best Italian pizza restaurants in San Francisco with outdoor seating"
print(f"\nUser: {initial_query}")

response = manager.start_conversation(initial_query)
print(f"\nAssistant: {response}")
print(f"\nExtracted - Location: '{manager.location}', Keywords: {manager.keywords}")
print(f"Ready: {manager.is_ready}")

if manager.is_ready:
    params = manager.get_search_parameters()
    print(f"\n{'='*60}")
    print("FINAL SEARCH PARAMETERS:")
    print(f"{'='*60}")
    print(f"Location: {params['location']}")
    print(f"Keywords: {', '.join(params['keywords'])}")

## Test Scenario 3: Query with Location but Vague Preferences

Test with a query that has location but needs keyword clarification.

In [None]:
# Reset for new scenario
manager.reset()

print("\n" + "="*60)
print("SCENARIO 3: Has Location, Needs Keywords")
print("="*60)

initial_query = "Looking for something good in downtown Seattle"
print(f"\nUser: {initial_query}")

response = manager.start_conversation(initial_query)
print(f"\nAssistant: {response}")
print(f"\nExtracted - Location: '{manager.location}', Keywords: {manager.keywords}")
print(f"Ready: {manager.is_ready}")

In [None]:
# User clarifies preferences
user_response = "I want a coffee shop with good WiFi, somewhere I can work for a few hours"
print(f"\nUser: {user_response}")

response = manager.process_response(user_response)
print(f"\nAssistant: {response}")
print(f"\nExtracted - Location: '{manager.location}', Keywords: {manager.keywords}")
print(f"Ready: {manager.is_ready}")

if manager.is_ready:
    params = manager.get_search_parameters()
    print(f"\n{'='*60}")
    print("FINAL SEARCH PARAMETERS:")
    print(f"{'='*60}")
    print(f"Location: {params['location']}")
    print(f"Keywords: {', '.join(params['keywords'])}")

## Integration with Pipeline 1

Demonstrate how to use the extracted parameters with Pipeline 1.

In [None]:
def prepare_pipeline1_query(location: str, keywords: List[str]) -> str:
    """
    Convert extracted parameters into a query for Pipeline 1.
    
    Args:
        location: Extracted location
        keywords: Extracted keywords
        
    Returns:
        Formatted query string for Pipeline 1
    """
    keyword_string = " ".join(keywords)
    query = f"{keyword_string} in {location}"
    return query

# Example usage
manager.reset()

# Simulate a complete conversation
print("="*60)
print("COMPLETE WORKFLOW: Pipeline 5 → Pipeline 1")
print("="*60)

initial_query = "I need somewhere to eat"
print(f"\nUser: {initial_query}")
response = manager.start_conversation(initial_query)
print(f"Assistant: {response}")

user_response = "I'm in Chicago"
print(f"\nUser: {user_response}")
response = manager.process_response(user_response)
print(f"Assistant: {response}")

user_response = "Thai food, something upscale for a date night"
print(f"\nUser: {user_response}")
response = manager.process_response(user_response)
print(f"Assistant: {response}")

# Get final parameters
params = manager.get_search_parameters()
print(f"\n{'='*60}")
print("EXTRACTED PARAMETERS:")
print(f"{'='*60}")
print(f"Location: {params['location']}")
print(f"Keywords: {', '.join(params['keywords'])}")

# Prepare for Pipeline 1
pipeline1_query = prepare_pipeline1_query(params['location'], params['keywords'])
print(f"\n{'='*60}")
print("PIPELINE 1 INPUT:")
print(f"{'='*60}")
print(f"Query: {pipeline1_query}")
print("\nThis query can now be sent to Pipeline 1 for business search!")
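
`prepare_pipeline1_query` always emits the word "in", so an empty location yields a query ending in "in " and empty keywords yield one starting with " in". The variant below is a sketch that omits whichever part is empty. The name `build_query` is hypothetical, and whether Pipeline 1 accepts a query with only keywords or only a location is an assumption.

```python
from typing import List


def build_query(location: str, keywords: List[str]) -> str:
    """Join keywords and location, leaving out whichever part is empty."""
    keyword_string = " ".join(k.strip() for k in keywords if k.strip())
    location = location.strip()
    if keyword_string and location:
        return f"{keyword_string} in {location}"
    # At most one part is non-empty here; return it, or "" if both are empty.
    return keyword_string or location
```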

## Complete Integration Example

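The cell below prints a walkthrough of how all five pipelines fit together in the full workflow. It only describes each step; it does not call the other pipelines.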

In [None]:
def complete_workflow_example():
    """
    Demonstrates the complete workflow from interactive clarification to recommendations.
    """
    print("="*80)
    print("COMPLETE YELP NAVIGATOR WORKFLOW")
    print("="*80)
    
    print("\n📍 STEP 1: Interactive Clarification (Pipeline 5)")
    print("-" * 80)
    print("Purpose: Extract clear location and keywords from user conversation")
    print("\nExample conversation:")
    print("  User: 'I want food'")
    print("  Assistant: 'Where are you located?'")
    print("  User: 'Madison, WI'")
    print("  Assistant: 'What type of food are you looking for?'")
    print("  User: 'Mexican, casual'")
    print("\n  Extracted: location='Madison, WI', keywords=['Mexican', 'casual']")
    
    print("\n\n🔍 STEP 2: Business Search with NER (Pipeline 1)")
    print("-" * 80)
    print("Purpose: Search Yelp for businesses matching criteria")
    print("\nInput: 'Mexican casual in Madison, WI'")
    print("Output: List of businesses with IDs and aliases")
    print("  - The Old Fashioned (ID: RJNAeNA-209sctUO0dmwuA)")
    print("  - Taqueria Guadalajara (ID: xyz123...)")
    print("  - etc.")
    
    print("\n\n📋 STEP 3: Business Details (Pipeline 2)")
    print("-" * 80)
    print("Purpose: Get detailed information and website content")
    print("\nInput: Business IDs and aliases from Pipeline 1")
    print("Output: Documents with:")
    print("  - Price range, rating, location coordinates")
    print("  - Website content")
    print("  - Contact information")
    
    print("\n\n⭐ STEP 4: Reviews & Sentiment Analysis (Pipeline 3)")
    print("-" * 80)
    print("Purpose: Analyze customer reviews and sentiment")
    print("\nInput: Business IDs from Pipeline 1")
    print("Output: Aggregated documents with:")
    print("  - Highest-rated reviews (positive sentiment)")
    print("  - Lowest-rated reviews (negative sentiment)")
    print("  - Sentiment distribution")
    
    print("\n\n💡 STEP 5: Recommendations (Pipeline 4)")
    print("-" * 80)
    print("Purpose: Generate personalized recommendations")
    print("\nInput: Review documents + user preferences")
    print("Output: Personalized recommendations with:")
    print("  - Theme analysis")
    print("  - Pros and cons")
    print("  - Best suited for...")
    print("  - Final recommendation (Yes/No)")
    
    print("\n" + "="*80)
    print("END-TO-END RESULT: Intelligent business recommendations based on")
    print("user conversation, business data, and sentiment-analyzed reviews")
    print("="*80)

complete_workflow_example()

## Summary

### What We Built
- **Pipeline 5** provides an interactive conversational interface for clarifying user intent
- Extracts location and keywords through natural dialogue
- Produces a `location` string and a `keywords` list that `prepare_pipeline1_query` combines into the input query for Pipeline 1

### Key Features
- **Adaptive Questions**: Asks only what's needed based on current information
- **Incremental Extraction**: Builds understanding across multiple conversation turns
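- **Confidence Tracking**: The extractor labels each extraction `high`, `medium`, or `low`; the conversation manager does not yet store or act on that label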
- **Ready Detection**: Knows when enough information has been collected

### Integration with Complete System

```
User Query (vague)
    ↓
Pipeline 5: Interactive Clarification
    ├─→ Extract Location
    └─→ Extract Keywords
    ↓
Pipeline 1: Business Search with NER
    └─→ Returns business IDs and aliases
    ↓
Pipeline 2: Business Details (optional)
    └─→ Returns detailed business info
    ↓
Pipeline 3: Reviews & Sentiment
    └─→ Returns analyzed reviews
    ↓
Pipeline 4: Recommendations
    └─→ Returns personalized recommendations
```

### Usage Example
```python
# Initialize conversation manager
manager = InteractiveConversationManager(api_key=OPENAI_API_KEY)

# Start conversation
response = manager.start_conversation("I want food")
print(response)  # Asks clarifying questions

# Continue until ready
while not manager.is_ready:
    user_input = input("You: ")
    response = manager.process_response(user_input)
    print(f"Assistant: {response}")

# Get parameters for Pipeline 1
params = manager.get_search_parameters()
query = f"{' '.join(params['keywords'])} in {params['location']}"

# Now run Pipeline 1-4 with the extracted parameters...
```
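
The `while not manager.is_ready` loop above runs for as long as the model keeps asking questions. A capped variant could look like the sketch below. The function `run_bounded_dialogue` and its default of five turns are assumptions for illustration. It takes plain callables so it can be exercised without an API key.

```python
from typing import Callable, Iterable, List


def run_bounded_dialogue(
    is_ready: Callable[[], bool],
    process_response: Callable[[str], str],
    user_inputs: Iterable[str],
    max_turns: int = 5,
) -> List[str]:
    """Feed user inputs to the manager until it is ready or max_turns is reached."""
    replies = []
    inputs = iter(user_inputs)
    for _ in range(max_turns):
        # Check readiness before asking for more input, so the user is
        # never prompted for a turn that will not be processed.
        if is_ready():
            break
        try:
            user_input = next(inputs)
        except StopIteration:
            break
        replies.append(process_response(user_input))
    return replies
```

With the manager from this notebook, the call might be `run_bounded_dialogue(lambda: manager.is_ready, manager.process_response, iter(lambda: input("You: "), ""))`, which also stops when the user submits an empty line.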

### Benefits
- Handles ambiguous user queries gracefully
- Reduces failed searches due to missing information
- Improves user experience with natural conversation
- Ensures downstream pipelines have quality input data