# Pipeline 4: Summary, Themes, and Recommendations

## Overview
This notebook implements a Haystack pipeline that summarizes business reviews from Pipeline 3, identifies key themes, and recommends whether each business fits a user's request. For every business it makes two OpenAI (`gpt-4o-mini`) calls: one for theme analysis and one for the personalized recommendation.

## What This Pipeline Does
1. Accepts aggregated review documents from Pipeline 3
2. Analyzes review content to identify themes and patterns
3. Generates comprehensive summaries
4. Provides personalized recommendations based on user requests
5. Returns structured insights and recommendations

## Use Cases
- Business recommendation generation
- Review theme extraction
- Customer experience analysis
- Decision support for users choosing businesses

## Pipeline Architecture
```
Review Documents → Theme Extraction → Summary Generation → Recommendation Engine → Final Insights
```
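
The dictionaries passed between the stages can be written down as `TypedDict`s. This is a documentation sketch that mirrors the keys the three components in this notebook produce. The type names (`ThemeData`, `Summary`, `Recommendation`) are hypothetical and are not used by the pipeline code itself.

```python
from typing import Dict, List, TypedDict


class ThemeData(TypedDict):
    """One entry of ReviewThemeExtractor's `theme_data` output."""
    business_id: str
    total_reviews: int
    positive_count: int
    negative_count: int
    neutral_count: int
    positive_themes: List[str]   # review excerpts, not yet grouped into themes
    negative_themes: List[str]
    sentiment_ratio: float       # positive_count / total_reviews


class Summary(TypedDict):
    """One entry of ThemeSummaryGenerator's `summaries` output."""
    business_id: str
    total_reviews: int
    sentiment_distribution: Dict[str, int]  # keys: positive, negative, neutral
    llm_analysis: str
    sentiment_ratio: float


class Recommendation(TypedDict):
    """One entry of RecommendationEngine's `recommendations` output."""
    business_id: str
    user_request: str
    sentiment_score: float       # copied from Summary.sentiment_ratio
    total_reviews: int
    sentiment_distribution: Dict[str, int]
    theme_analysis: str          # copied from Summary.llm_analysis
    recommendation: str


# Example: a ThemeData record for a business with 7 of 10 reviews positive
example: ThemeData = {
    "business_id": "demo",
    "total_reviews": 10,
    "positive_count": 7,
    "negative_count": 1,
    "neutral_count": 2,
    "positive_themes": ["Amazing Wisconsin cheese curds! ..."],
    "negative_themes": ["Long wait times even with a reservation. ..."],
    "sentiment_ratio": 7 / 10,
}
```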

## Required Components
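- Haystack 2.x with an OpenAI generator (`OpenAIGenerator`, using `gpt-4o-mini` in this notebook)
- `PromptBuilder` for the Jinja-style prompt templates
- Three custom components: `ReviewThemeExtractor`, `ThemeSummaryGenerator`, and `RecommendationEngine`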

## Setup and Environment Variables

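Ensure your `.env` file contains the following. Only `OPENAI_API_KEY` is required by this notebook; `RAPID_API_KEY` is loaded but not used here.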
```
RAPID_API_KEY=your_key_here
OPENAI_API_KEY=your_openai_key_here
```
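
A missing key otherwise surfaces only when the first LLM call fails deep inside the pipeline. A small hypothetical helper, `missing_env_vars`, can check for it up front. Call it after `load_dotenv` with the default `env=None` to inspect the real process environment; the explicit mapping below is only for illustration.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env_vars(required: Iterable[str],
                     env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Illustration with an explicit mapping instead of the real environment
print(missing_env_vars(["OPENAI_API_KEY", "RAPID_API_KEY"], env={"OPENAI_API_KEY": "sk-..."}))
# → ['RAPID_API_KEY']
```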

In [None]:
# Import required libraries
import os
from typing import List, Dict, Any

from dotenv import load_dotenv
from haystack import Pipeline, component, Document
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

# Load environment variables
load_dotenv(".env")
RAPID_API_KEY = os.getenv("RAPID_API_KEY")  # not used in this notebook
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# Fail fast: every LLM call in this notebook needs the OpenAI key
if not OPENAI_API_KEY:
    raise EnvironmentError("OPENAI_API_KEY is not set; add it to your .env file")

print("✓ Environment variables loaded successfully")

## Custom Component 1: Review Theme Extractor

This component analyzes review documents to identify key themes mentioned across multiple reviews.

In [None]:
@component
class ReviewThemeExtractor:
    """
    Extracts common themes from review documents.
    
    This component:
    1. Analyzes review content from aggregated documents
    2. Identifies recurring themes in positive and negative reviews
    3. Prepares structured theme data for LLM processing
    
    Input:
        - documents (List[Document]): Aggregated review documents from Pipeline 3
    
    Output:
        - theme_data (List[Dict]): Structured theme information per business
    """
    
    @component.output_types(theme_data=List[Dict])
    def run(self, documents: List[Document]) -> Dict[str, List[Dict]]:
        """
        Extract themes from review documents.
        
        Args:
            documents: Aggregated review documents with metadata
            
        Returns:
            Dictionary with theme_data containing structured theme information
        """
        theme_data = []
        
        for doc in documents:
            biz_id = doc.meta.get('business_id', 'unknown')
            
            # Extract positive themes from highest-rated reviews
            positive_themes = []
            for review in doc.meta.get('highest_rated_reviews', []):
                text = review.get('text', '')
                # Keep a 200-character excerpt of each substantive (>50 chars) review;
                # simplified: a production version could use keyphrase extraction instead
                if len(text) > 50:
                    positive_themes.append(text[:200])
            
            # Extract negative themes from lowest-rated reviews
            negative_themes = []
            for review in doc.meta.get('lowest_rated_reviews', []):
                text = review.get('text', '')
                if len(text) > 50:
                    negative_themes.append(text[:200])
            
            # Prepare structured data; sentiment_ratio is the share of positive reviews
            total = doc.meta.get('total_reviews', 0)
            positive = doc.meta.get('positive_count', 0)
            theme_info = {
                "business_id": biz_id,
                "total_reviews": total,
                "positive_count": positive,
                "negative_count": doc.meta.get('negative_count', 0),
                "neutral_count": doc.meta.get('neutral_count', 0),
                "positive_themes": positive_themes,
                "negative_themes": negative_themes,
                "sentiment_ratio": positive / total if total > 0 else 0.0,
            }
            theme_data.append(theme_info)
        
        return {"theme_data": theme_data}

print("✓ ReviewThemeExtractor component defined")

## Prompt Templates for LLM Analysis

Define prompt templates for theme analysis and recommendation generation.

In [None]:
# Prompt template for theme identification and summary
THEME_SUMMARY_TEMPLATE = """
You are an expert at analyzing customer reviews and identifying key themes.

Business Review Data:
- Total Reviews: {{ total_reviews }}
- Positive Reviews: {{ positive_count }} ({{ positive_ratio }}%)
- Negative Reviews: {{ negative_count }} ({{ negative_ratio }}%)
- Neutral Reviews: {{ neutral_count }}

Positive Review Excerpts:
{% for theme in positive_themes %}
- {{ theme }}
{% endfor %}

Negative Review Excerpts:
{% for theme in negative_themes %}
- {{ theme }}
{% endfor %}

Task: Analyze the reviews and provide:
1. **Key Positive Themes** (3-5 bullet points): What customers love most
2. **Key Negative Themes** (3-5 bullet points): Common complaints or concerns
3. **Overall Summary** (2-3 sentences): General customer sentiment and experience

Format your response clearly with the three sections above.
"""

# Prompt template for recommendation generation
RECOMMENDATION_TEMPLATE = """
You are a helpful assistant providing restaurant/business recommendations based on customer reviews.

User Request: {{ user_request }}

Business Analysis Summary:
{{ business_summary }}

Task: Based on the user's request and the business analysis, provide:
1. **Recommendation** (Yes/No with confidence level): Should the user visit this business?
2. **Reasoning** (3-4 sentences): Why or why not, citing specific themes from reviews
3. **Best For**: Type of customer who would enjoy this business most
4. **Watch Out For**: Any concerns or caveats the user should know

Be honest and balanced in your recommendation. Consider both positive and negative feedback.
"""

print("✓ Prompt templates defined")

## Custom Component 2: Theme Summary Generator

This component uses an LLM to generate summaries and identify themes from review data.

In [None]:
from haystack.utils import Secret  # OpenAIGenerator expects a Secret, not a plain string

@component
class ThemeSummaryGenerator:
    """
    Generates theme summaries using an LLM.
    
    This component:
    1. Takes theme data from ReviewThemeExtractor
    2. Constructs prompts for LLM analysis
    3. Generates structured summaries with identified themes
    4. Returns enriched data with LLM-generated insights
    
    Input:
        - theme_data (List[Dict]): Theme information per business
    
    Output:
        - summaries (List[Dict]): Theme summaries and analysis per business
    """
    
    def __init__(self, api_key: str):
        """
        Initialize the theme summary generator.
        
        Args:
            api_key: OpenAI API key as a plain string; wrapped with Secret.from_token
        """
        self.prompt_builder = PromptBuilder(template=THEME_SUMMARY_TEMPLATE)
        self.generator = OpenAIGenerator(
            api_key=Secret.from_token(api_key),
            model="gpt-4o-mini",
            generation_kwargs={"temperature": 0.7}
        )
    
    @component.output_types(summaries=List[Dict])
    def run(self, theme_data: List[Dict]) -> Dict[str, List[Dict]]:
        """
        Generate theme summaries for each business.
        
        Args:
            theme_data: Theme information from extractor
            
        Returns:
            Dictionary with summaries containing LLM-generated analysis
        """
        summaries = []
        
        for data in theme_data:
            # Calculate percentages
            total = data['total_reviews']
            positive_ratio = round(data['positive_count'] / total * 100, 1) if total > 0 else 0
            negative_ratio = round(data['negative_count'] / total * 100, 1) if total > 0 else 0
            
            # Build prompt
            prompt_result = self.prompt_builder.run(
                total_reviews=data['total_reviews'],
                positive_count=data['positive_count'],
                negative_count=data['negative_count'],
                neutral_count=data['neutral_count'],
                positive_ratio=positive_ratio,
                negative_ratio=negative_ratio,
                positive_themes=data['positive_themes'],
                negative_themes=data['negative_themes']
            )
            
            # Generate summary
            llm_result = self.generator.run(prompt=prompt_result['prompt'])
            
            # Prepare summary data
            summary = {
                "business_id": data['business_id'],
                "total_reviews": data['total_reviews'],
                "sentiment_distribution": {
                    "positive": data['positive_count'],
                    "negative": data['negative_count'],
                    "neutral": data['neutral_count']
                },
                "llm_analysis": llm_result['replies'][0] if llm_result['replies'] else "No analysis available",
                "sentiment_ratio": data['sentiment_ratio']
            }
            summaries.append(summary)
        
        return {"summaries": summaries}

print("✓ ThemeSummaryGenerator component defined")

## Custom Component 3: Recommendation Engine

This component generates personalized recommendations based on user requests and business summaries.

In [None]:
from haystack.utils import Secret  # OpenAIGenerator expects a Secret, not a plain string

@component
class RecommendationEngine:
    """
    Generates personalized business recommendations.
    
    This component:
    1. Takes the user request and business summaries
    2. Uses an LLM to generate tailored recommendations
    3. Provides reasoning and caveats
    4. Returns structured recommendation data
    
    Input:
        - summaries (List[Dict]): Business summaries with LLM analysis
        - user_request (str): User's specific request or preferences
    
    Output:
        - recommendations (List[Dict]): Personalized recommendations per business
    """
    
    def __init__(self, api_key: str):
        """
        Initialize the recommendation engine.
        
        Args:
            api_key: OpenAI API key as a plain string; wrapped with Secret.from_token
        """
        self.prompt_builder = PromptBuilder(template=RECOMMENDATION_TEMPLATE)
        self.generator = OpenAIGenerator(
            api_key=Secret.from_token(api_key),
            model="gpt-4o-mini",
            generation_kwargs={"temperature": 0.7}
        )
    
    @component.output_types(recommendations=List[Dict])
    def run(self, summaries: List[Dict], user_request: str) -> Dict[str, List[Dict]]:
        """
        Generate recommendations for each business.
        
        Args:
            summaries: Business summaries from ThemeSummaryGenerator
            user_request: User's specific request or preferences
            
        Returns:
            Dictionary with personalized recommendations
        """
        recommendations = []
        
        for summary in summaries:
            # Build prompt
            prompt_result = self.prompt_builder.run(
                user_request=user_request,
                business_summary=summary['llm_analysis']
            )
            
            # Generate recommendation
            llm_result = self.generator.run(prompt=prompt_result['prompt'])
            
            # Prepare recommendation data
            recommendation = {
                "business_id": summary['business_id'],
                "user_request": user_request,
                "sentiment_score": summary['sentiment_ratio'],
                "total_reviews": summary['total_reviews'],
                "sentiment_distribution": summary['sentiment_distribution'],
                "theme_analysis": summary['llm_analysis'],
                "recommendation": llm_result['replies'][0] if llm_result['replies'] else "No recommendation available"
            }
            recommendations.append(recommendation)
        
        return {"recommendations": recommendations}

print("✓ RecommendationEngine component defined")

## Build the Pipeline

The pipeline chains three components in order (`user_request` is passed directly to `recommendation_engine` at run time):
1. ReviewThemeExtractor - Extract themes from reviews
2. ThemeSummaryGenerator - Generate summaries with LLM
3. RecommendationEngine - Create personalized recommendations

In [None]:
# Initialize pipeline
pipeline = Pipeline()

# Initialize components
theme_extractor = ReviewThemeExtractor()
summary_generator = ThemeSummaryGenerator(api_key=OPENAI_API_KEY)
recommendation_engine = RecommendationEngine(api_key=OPENAI_API_KEY)

# Add components to pipeline
pipeline.add_component("theme_extractor", theme_extractor)
pipeline.add_component("summary_generator", summary_generator)
pipeline.add_component("recommendation_engine", recommendation_engine)

# Connect components
pipeline.connect("theme_extractor.theme_data", "summary_generator.theme_data")
pipeline.connect("summary_generator.summaries", "recommendation_engine.summaries")

print("✓ Pipeline built successfully")
print("\nPipeline structure:")
print("Review Documents → ThemeExtractor → SummaryGenerator → RecommendationEngine → Final Recommendations")

## Integration Test: Pipeline 3 Output → Pipeline 4

This end-to-end test feeds Pipeline 4 a document in the aggregated format that Pipeline 3 produces. To keep this notebook self-contained, the document is written by hand rather than generated by running Pipeline 3.

In [None]:
# Sample aggregated document in the format Pipeline 3 produces
# (hand-written here, so Pipeline 3 does not need to run first)
sample_review_doc = Document(
    content="Business Review Summary (ID: RJNAeNA-209sctUO0dmwuA)",
    meta={
        "business_id": "RJNAeNA-209sctUO0dmwuA",
        "total_reviews": 10,
        "positive_count": 7,
        "neutral_count": 2,
        "negative_count": 1,
        "highest_rated_reviews": [
            {
                "rating": 5,
                "sentiment": "positive",
                "text": "Amazing Wisconsin cheese curds! The service was excellent and the atmosphere was perfect for a casual dinner. Highly recommend the Old Fashioned cocktail.",
                "user": "John D.",
                "url": "https://yelp.com/review1"
            },
            {
                "rating": 5,
                "sentiment": "positive",
                "text": "Great local food and fantastic beer selection. The staff was knowledgeable about all the Wisconsin beers on tap. Perfect spot for tourists and locals alike.",
                "user": "Sarah M.",
                "url": "https://yelp.com/review2"
            },
            {
                "rating": 4,
                "sentiment": "positive",
                "text": "Delicious comfort food with a Wisconsin twist. Portions are generous. Can get crowded during peak hours but worth the wait.",
                "user": "Mike R.",
                "url": "https://yelp.com/review3"
            }
        ],
        "lowest_rated_reviews": [
            {
                "rating": 2,
                "sentiment": "negative",
                "text": "Long wait times even with a reservation. Food was good but overpriced for what you get. Service seemed rushed.",
                "user": "Emily K.",
                "url": "https://yelp.com/review4"
            }
        ]
    }
)

print("✓ Sample review document created")
print(f"Business ID: {sample_review_doc.meta['business_id']}")
print(f"Total Reviews: {sample_review_doc.meta['total_reviews']}")
print(f"Sentiment: {sample_review_doc.meta['positive_count']} positive, {sample_review_doc.meta['negative_count']} negative")

In [None]:
# Test the pipeline with sample data
user_request = "I'm looking for a casual restaurant with authentic Wisconsin food and a good beer selection. I don't mind waiting a bit if the food is worth it."

print(f"User Request: {user_request}")
print("="*60)
print("\nRunning Pipeline 4...")

result = pipeline.run(data={
    "theme_extractor": {
        "documents": [sample_review_doc]
    },
    "recommendation_engine": {
        "user_request": user_request
    }
})

# Display results
recommendations = result['recommendation_engine']['recommendations']
print(f"\n✓ Generated {len(recommendations)} recommendation(s)")

for i, rec in enumerate(recommendations, 1):
    print(f"\n{'='*60}")
    print(f"BUSINESS {i}: {rec['business_id']}")
    print(f"{'='*60}")
    print(f"\nSentiment Score: {rec['sentiment_score']*100:.1f}% positive")
    print(f"Total Reviews Analyzed: {rec['total_reviews']}")
    print(f"Distribution: {rec['sentiment_distribution']['positive']} positive, "
          f"{rec['sentiment_distribution']['neutral']} neutral, "
          f"{rec['sentiment_distribution']['negative']} negative")
    
    print(f"\n--- THEME ANALYSIS ---")
    print(rec['theme_analysis'])
    
    print(f"\n--- PERSONALIZED RECOMMENDATION ---")
    print(rec['recommendation'])

## Test with Different User Request

Let's test with a different type of user request to see how recommendations change.

In [None]:
# Test with a different user request
user_request = "I'm looking for a quick lunch spot with great service. I don't have much time and need fast service."

print(f"User Request: {user_request}")
print("="*60)

result = pipeline.run(data={
    "theme_extractor": {
        "documents": [sample_review_doc]
    },
    "recommendation_engine": {
        "user_request": user_request
    }
})

recommendations = result['recommendation_engine']['recommendations']

for rec in recommendations:
    print(f"\n--- PERSONALIZED RECOMMENDATION ---")
    print(rec['recommendation'])
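
Each `pipeline.run` call above regenerates the theme summary even though only `user_request` changed, so every new request costs two LLM calls per business. One way to avoid the repeated summary call is to memoize summaries by business ID plus a hash of the review metadata. This is a hypothetical sketch (`cache_key`, `SummaryCache`) and is not wired into the Haystack pipeline.

```python
import hashlib
import json
from typing import Callable, Dict


def cache_key(business_id: str, meta: Dict) -> str:
    """Stable key: business id plus a hash of the JSON-serialized metadata."""
    payload = json.dumps(meta, sort_keys=True, default=str).encode("utf-8")
    return f"{business_id}:{hashlib.sha256(payload).hexdigest()[:16]}"


class SummaryCache:
    """Memoizes an expensive summary function per (business, review data) pair."""

    def __init__(self, summarize_fn: Callable[[Dict], str]):
        self._summarize_fn = summarize_fn
        self._store: Dict[str, str] = {}
        self.misses = 0

    def get(self, business_id: str, meta: Dict) -> str:
        key = cache_key(business_id, meta)
        if key not in self._store:
            self.misses += 1  # only cache misses trigger a (potentially costly) call
            self._store[key] = self._summarize_fn(meta)
        return self._store[key]
```

A lighter alternative within this notebook is to call the custom components directly: compute `summaries = summary_generator.run(theme_data=theme_extractor.run(documents=[sample_review_doc])['theme_data'])['summaries']` once, then call `recommendation_engine.run(summaries=summaries, user_request=...)` for each new request.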

## Test with Multiple Businesses

Test with multiple business review documents to compare recommendations.

In [None]:
# Create second sample business
sample_review_doc2 = Document(
    content="Business Review Summary (ID: EgtyW19V-64c6PmRuvzSEA)",
    meta={
        "business_id": "EgtyW19V-64c6PmRuvzSEA",
        "total_reviews": 8,
        "positive_count": 5,
        "neutral_count": 2,
        "negative_count": 1,
        "highest_rated_reviews": [
            {
                "rating": 5,
                "sentiment": "positive",
                "text": "Excellent craft beer brewed on-site! The pub atmosphere is lively and fun. Great for groups. Food is solid pub fare with good variety.",
                "user": "Tom H.",
                "url": "https://yelp.com/review5"
            },
            {
                "rating": 4,
                "sentiment": "positive",
                "text": "Love the brewery tours! Staff is friendly and beer selection rotates seasonally. Nice outdoor seating area in summer.",
                "user": "Lisa P.",
                "url": "https://yelp.com/review6"
            }
        ],
        "lowest_rated_reviews": [
            {
                "rating": 3,
                "sentiment": "negative",
                "text": "Can be quite loud, especially on weekends. Service was a bit slow during busy times. Beer is great though.",
                "user": "David L.",
                "url": "https://yelp.com/review7"
            }
        ]
    }
)

# Test with both businesses
user_request = "I want a fun place with great craft beer and a lively atmosphere. I'm going with friends and we like trying local brews."

print(f"User Request: {user_request}")
print("="*60)

result = pipeline.run(data={
    "theme_extractor": {
        "documents": [sample_review_doc, sample_review_doc2]
    },
    "recommendation_engine": {
        "user_request": user_request
    }
})

recommendations = result['recommendation_engine']['recommendations']
print(f"\n✓ Comparing {len(recommendations)} businesses")

for i, rec in enumerate(recommendations, 1):
    print(f"\n{'='*60}")
    print(f"BUSINESS {i}: {rec['business_id']}")
    print(f"Sentiment Score: {rec['sentiment_score']*100:.1f}%")
    print(f"\n{rec['recommendation']}")

## Helper Function: Format Recommendations

Utility function to format recommendations in a user-friendly way.

In [None]:
from typing import Optional

def format_recommendations(recommendations: List[Dict], business_names: Optional[Dict[str, str]] = None) -> str:
    """
    Format recommendations into a readable string.
    
    Args:
        recommendations: List of recommendation dictionaries
        business_names: Optional mapping of business_id to business name
    
    Returns:
        Formatted recommendation string
    """
    if not recommendations:
        return "No recommendations available."
    
    output = []
    output.append("=" * 80)
    output.append("BUSINESS RECOMMENDATIONS BASED ON CUSTOMER REVIEWS")
    output.append("=" * 80)
    
    for i, rec in enumerate(recommendations, 1):
        biz_id = rec['business_id']
        biz_name = business_names.get(biz_id, biz_id) if business_names else biz_id
        
        output.append(f"\n{i}. {biz_name}")
        output.append("-" * 80)
        output.append(f"Sentiment Score: {rec['sentiment_score']*100:.1f}% positive")
        output.append(f"Based on {rec['total_reviews']} reviews")
        output.append(f"Distribution: {rec['sentiment_distribution']}")
        output.append("")
        output.append(rec['recommendation'])
        output.append("")
    
    return "\n".join(output)

# Test the formatter
business_names = {
    "RJNAeNA-209sctUO0dmwuA": "The Old Fashioned",
    "EgtyW19V-64c6PmRuvzSEA": "The Great Dane Pub & Brewing Company"
}

formatted = format_recommendations(recommendations, business_names)
print(formatted)

## Summary

### What We Built
- **Pipeline 4** turns Pipeline 3's aggregated reviews into theme summaries and personalized recommendations
- Makes two LLM calls per business: one for theme analysis and one for the recommendation
- Prompts the model to weigh both positive and negative feedback in each recommendation

### Key Outputs
Each recommendation contains:
- **Theme Analysis** (`theme_analysis`): LLM-generated positive themes, negative themes, and an overall summary
- **Personalized Recommendation** (`recommendation`): LLM text with a Yes/No verdict and confidence level, reasoning that cites review themes, "Best For", and "Watch Out For"
- **Sentiment Metrics** (`sentiment_score`, `sentiment_distribution`, `total_reviews`): Positive-review share and raw sentiment counts

### Integration with Other Pipelines
This pipeline integrates with:
- **Pipeline 3**: Supplies the aggregated review documents consumed by `theme_extractor`
- **Pipeline 2**: Business details could be added as extra prompt context (not wired into this notebook)
- **Pipeline 1**: Provides the business search that starts the search-to-recommendation flow

### Complete Workflow Example
```python
# Sketch: pipeline1, pipeline2, pipeline3, and extract_business_ids come from the
# earlier notebooks (or are placeholders); they are not defined in this notebook.
pipeline4 = pipeline  # the Pipeline 4 object built above

# Step 1: Search for businesses (Pipeline 1)
search_result = pipeline1.run(data={"query": "Mexican restaurants Madison"})
business_ids = extract_business_ids(search_result)

# Step 2: Get business details (Pipeline 2) - optional; Pipeline 4 does not consume them yet
details_result = pipeline2.run(data={"business_ids": business_ids})

# Step 3: Fetch and analyze reviews (Pipeline 3)
reviews_result = pipeline3.run(data={"business_ids": business_ids})
review_documents = reviews_result['reviews_aggregator']['documents']

# Step 4: Generate recommendations (Pipeline 4)
recommendations_result = pipeline4.run(data={
    "theme_extractor": {"documents": review_documents},
    "recommendation_engine": {"user_request": "I want authentic Mexican food"}
})

# Display recommendations
recommendations = recommendations_result['recommendation_engine']['recommendations']
print(format_recommendations(recommendations))
```

### Next Steps
- Combine all pipelines into a unified multi-agent system
- Add Pipeline 5 for interactive user clarification
- Store results in a document store for faster retrieval
- Add caching to reduce API calls