# Yelp Navigator - Pipeline Chaining Guide

This notebook demonstrates how to:
1. Form queries for the `business_search/run` endpoint
2. Pass outputs from Pipeline 1 to other pipelines
3. Chain all pipelines together for a complete workflow

## Prerequisites

- Change into the project directory: `cd ch8/yelp-navigator/`
- **Environment variables configured in `../.env`** (including `RAPID_API_KEY`)
- Hayhooks server running with env vars loaded: `sh start_hayhooks.sh`
- Server should be accessible at `http://localhost:1416`

## ⚠️ Troubleshooting: No Businesses Found?

If you get 0 results, the issue is usually that the Hayhooks server isn't loading the API key:
1. Stop the server (Ctrl+C)
2. Run `sh start_hayhooks.sh` (it now loads `.env` automatically)
3. Rerun the notebook cells
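
The check described above can also be automated inside the notebook. The following is a minimal sketch, not part of the project: `no_results_hint` is a hypothetical helper, and it assumes the Pipeline 1 response has a `result` object with a `businesses` list, as shown later in this guide.

```python
from typing import Any, Dict, Optional


def no_results_hint(pipeline1_output: Dict[str, Any]) -> Optional[str]:
    """Return a troubleshooting hint when a search came back empty, else None.

    Hypothetical helper: assumes the response shape {"result": {"businesses": [...]}}.
    """
    result = pipeline1_output.get("result", {})
    if result.get("businesses"):
        return None  # At least one business was returned; nothing to report.
    return (
        "No businesses returned. Check that RAPID_API_KEY is set in ../.env, "
        "then restart the server with: sh start_hayhooks.sh"
    )
```

Calling `no_results_hint(response.json())` after a search and printing any non-`None` value gives a quick reminder of the restart steps.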

## Setup and Imports

In [1]:
import requests
import json
from pprint import pprint
from typing import Dict, Any

# Base URL for Hayhooks server
BASE_URL = "http://localhost:1416"

# Helper function to print JSON nicely
def print_json(data, max_chars=2000):
    """Print JSON data in a readable format, truncated to max_chars characters."""
    print(json.dumps(data, indent=2)[:max_chars])  # Limit output length

## Test Server Connection

In [2]:
# Check if Hayhooks server is running
try:
    response = requests.get(f"{BASE_URL}/status")
    print("✅ Hayhooks server is running!")
    print(f"Status: {response.status_code}")
except requests.exceptions.ConnectionError:
    print("❌ Cannot connect to Hayhooks server")
    print("Please start the server with: sh start_hayhooks.sh")

✅ Hayhooks server is running!
Status: 200


---

# Pipeline 1: Business Search

## Understanding the Entry Point

The `business_search/run` endpoint accepts a natural language query and returns business results.

**Request Structure**:
```json
{
  "query": "your natural language search here"
}
```

The payload is a flat object: `query` sits at the top level and is not nested under a component name such as `query_converter` (the pipeline's first component, a QueryToDocument component).

## Example 1: Simple Search Query

In [3]:
# Form a query for business_search/run
query = {
  "query": "Mexican food in texas"
}

print("Sending query to Pipeline 1 (Business Search)...")
print(f"Query: {query['query']}")
print("\nRequest payload:")
print_json(query)

Sending query to Pipeline 1 (Business Search)...
Query: Mexican food in texas

Request payload:
{
  "query": "Mexican food in texas"
}


In [4]:

# Execute the search
# Form a query for business_search/run
query = {
  "query": "Mexican food in LA"
}


response1 = requests.post(
    f"{BASE_URL}/business_search/run",
    json=query
)

if response1.status_code == 200:
    pipeline1_output = response1.json()
    print("✅ Pipeline 1 succeeded!\n")
    
    # Extract key information
    results = pipeline1_output['result']
    businesses = results.get('businesses', [])
    
    print(f"Query: {results.get('query', 'N/A')}")
    print(f"Extracted Location: {results.get('extracted_location', 'None')}")
    print(f"Extracted Keywords: {results.get('extracted_keywords', [])}")
    print(f"Search Parameters: {results.get('search_params', {})}")
    print(f"\nFound {results.get('result_count', 0)} total results")
    print(f"Returned {len(businesses)} businesses on this page\n")
    
    # Show first 3 businesses
    for i, business in enumerate(businesses[:3], 1):
        print(f"{i}. {business['name']}")
        print(f"   ID: {business['business_id']}")
        print(f"   Alias: {business['alias']}")
        print(f"   Rating: {business['rating']} ({business['review_count']} reviews)")
        print(f"   Price: {business.get('price_range', 'N/A')}")
        print(f"   Categories: {', '.join(business['categories'])}")
        print(f"   Website: {business.get('website', 'N/A')}")
        print(f"   Location: ({business['location']['lat']}, {business['location']['lon']})")
        print()
else:
    print(f"❌ Pipeline 1 failed with status {response1.status_code}")
    print(response1.text)

✅ Pipeline 1 succeeded!

Query: Mexican food in LA
Extracted Location: LA
Extracted Keywords: ['food', 'Mexican', 'LA']
Search Parameters: {'location': 'LA', 'query': 'food Mexican LA', 'original_query': 'Mexican food in LA'}

Found 240 total results
Returned 10 businesses on this page

1. Sonoratown
   ID: Ti2Ksp2oPj6rpdp2tQcaVA
   Alias: sonoratown-los-angeles
   Rating: 4.4 (2151 reviews)
   Price: $$
   Categories: Mexican
   Website: http://sonoratown.com
   Location: (34.041648, -118.252245)

2. Tlayuda L.A. Restaurant
   ID: Lop79P2KM9zFUCCaBYz6zA
   Alias: tlayuda-l-a-restaurant-los-angeles-6
   Rating: 4.6 (907 reviews)
   Price: $$
   Categories: Mexican
   Website: http://tlayudala.com
   Location: (34.0906411, -118.3079777)

3. Lenny's Casita
   ID: Zbl6doI-1YkRkJ2Bg8aYBg
   Alias: lennys-casita-los-angeles
   Rating: 4.6 (366 reviews)
   Price: $$
   Categories: Kosher, Mexican
   Website: http://lennyscasita.com
   Location: (34.05506466, -118.38432343)



---

# Understanding Pipeline 1 Output Structure

Pipeline 1 returns a nested structure that will be passed to downstream pipelines.

In [5]:
# Inspect the complete output structure
print("Complete Pipeline 1 Output Structure:\n")
print("Top-level keys:", list(pipeline1_output.keys()))
print("\nresult keys:", list(pipeline1_output['result'].keys()))
print("\nSample business keys:", list(businesses[0].keys()) if businesses else "No businesses")

print("\n" + "="*60)
print("IMPORTANT: This entire structure will be passed to Pipelines 2 & 3")
print("="*60)

Complete Pipeline 1 Output Structure:

Top-level keys: ['result']

result keys: ['query', 'extracted_location', 'extracted_keywords', 'search_params', 'result_count', 'businesses']

Sample business keys: ['business_id', 'name', 'alias', 'rating', 'review_count', 'categories', 'price_range', 'phone', 'website', 'location', 'images']

IMPORTANT: This entire structure will be passed to Pipelines 2 & 3
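
Because the downstream pipelines depend on this structure, it can help to check it before chaining. The sketch below is a hypothetical helper (`validate_pipeline1_output` is not part of the project). Treating the listed keys as required is an assumption based on the keys printed above, not a documented contract.

```python
from typing import Any, Dict, List

# Keys observed in the Pipeline 1 output above. Treating them as required
# is an assumption made for this sketch.
REQUIRED_RESULT_KEYS = ("query", "search_params", "result_count", "businesses")
REQUIRED_BUSINESS_KEYS = ("business_id", "name")


def validate_pipeline1_output(output: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means the shape looks fine."""
    result = output.get("result")
    if not isinstance(result, dict):
        return ["missing top-level 'result' object"]

    problems: List[str] = []
    for key in REQUIRED_RESULT_KEYS:
        if key not in result:
            problems.append(f"result is missing '{key}'")

    # Check each business entry for the identifiers used later in the notebook.
    for index, business in enumerate(result.get("businesses", [])):
        for key in REQUIRED_BUSINESS_KEYS:
            if key not in business:
                problems.append(f"businesses[{index}] is missing '{key}'")
    return problems
```

Running `validate_pipeline1_output(pipeline1_output)` before calling Pipelines 2 or 3 surfaces a malformed response early instead of as a downstream failure.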


---

# Pipeline 2: Business Details

## How to Pass Pipeline 1 Output

Pipeline 2 accepts the **complete Pipeline 1 output** directly as `pipeline1_output`.

**Request Structure**:
```json
{
  "pipeline1_output": {
    "result": { ... Pipeline 1 result fields ... }
  }
}
```

In [6]:
# Chain Pipeline 1 output to Pipeline 2
print("Sending Pipeline 1 output to Pipeline 2 (Business Details)...\n")

pipeline2_request = {
    "pipeline1_output": pipeline1_output
}

response2 = requests.post(
    f"{BASE_URL}/business_details/run",
    json=pipeline2_request
)

if response2.status_code == 200:
    pipeline2_output = response2.json()
    print("✅ Pipeline 2 succeeded!\n")
else:
    print(f"❌ Pipeline 2 failed with status {response2.status_code}")
    print(response2.text)

Sending Pipeline 1 output to Pipeline 2 (Business Details)...

✅ Pipeline 2 succeeded!



In [7]:
# Extract enriched documents
if 'result' in pipeline2_output:
    result = pipeline2_output['result']
    documents = result.get('documents', [])
    print(f"Created {len(documents)} enriched documents")
    print(f"Document count: {result.get('document_count', 'N/A')}")
    print(f"Business count: {result.get('business_count', 'N/A')}")
    print(f"URLs fetched: {len(result.get('urls_fetched', []))}\n")
    
    # Show details for first document
    if documents:
        doc = documents[0]
        metadata = doc.get('metadata', {})
        print("Sample Document:")
        print(f"  Business Name: {metadata.get('business_name', 'N/A')}")
        print(f"  Business ID: {metadata.get('business_id', 'N/A')}")
        print(f"  Price Range: {metadata.get('price_range', 'N/A')}")
        print(f"  Rating: {metadata.get('rating', 'N/A')} ({metadata.get('review_count', 'N/A')} reviews)")
        print(f"  Categories: {', '.join(metadata.get('categories', []))}")
        print(f"  Phone: {metadata.get('phone', 'N/A')}")
        print(f"  Website: {metadata.get('website', 'N/A')}")
        location = metadata.get('location', {})
        print(f"  Coordinates: ({location.get('lat', 'N/A')}, {location.get('lon', 'N/A')})")
        print(f"  Content Length: {doc.get('content_length', 0)} characters")
        if doc.get('content_preview'):
            print(f"  Content Preview: {doc.get('content_preview', '')[:100]}...")
else:
    print("No documents found in output")
    print("Available keys:", list(pipeline2_output.keys()))


Created 8 enriched documents
Document count: 8
Business count: 10
URLs fetched: 8

Sample Document:
  Business Name: Sonoratown
  Business ID: Ti2Ksp2oPj6rpdp2tQcaVA
  Price Range: $$
  Rating: 4.4 (2151 reviews)
  Categories: Mexican
  Phone: (213) 222-5071
  Website: http://sonoratown.com
  Coordinates: (34.041648, -118.252245)
  Content Length: 0 characters


---

# Pipeline 3: Reviews & Sentiment Analysis

## How to Pass Pipeline 1 Output

Pipeline 3 also accepts the **complete Pipeline 1 output** under the `pipeline1_output` key (same format as Pipeline 2).

In [8]:
# Step 1: Search for coffee shops that will return results
print("=" * 70)
print("STEP 1: Searching for businesses...")
print("=" * 70)

test_query = {"query": "coffee shops in San Francisco"}

response1 = requests.post(
    f"{BASE_URL}/business_search/run",
    json=test_query
)

if response1.status_code == 200:
    pipeline1_output = response1.json()
    result = pipeline1_output['result']
    businesses = result.get('businesses', [])
    
    print(f"✅ Found {result.get('result_count', 0)} total results")
    print(f"   Returned {len(businesses)} businesses")
    print(f"   Query: {result.get('query', 'N/A')}")
    print(f"   Location: {result.get('extracted_location', 'N/A')}")
    
    if businesses:
        print(f"\nFirst 3 businesses:")
        for i, biz in enumerate(businesses[:3], 1):
            print(f"{i}. {biz['name']} - {biz['rating']}⭐ ({biz['review_count']} reviews)")
    
    # Step 2: Pass to Pipeline 3 for sentiment analysis
    if len(businesses) > 0:
        print("\n" + "=" * 70)
        print("STEP 2: Analyzing reviews & sentiment...")
        print("=" * 70)
        print("⏳ This may take 30-60 seconds...\n")
        
        response3 = requests.post(
            f"{BASE_URL}/business_sentiment/run",
            json={"pipeline1_output": pipeline1_output},
            timeout=120
        )
        
        if response3.status_code == 200:
            pipeline3_output = response3.json()
            sentiment_result = pipeline3_output['result']
            
            print(f"✅ Sentiment analysis complete!")
            print(f"   Businesses analyzed: {sentiment_result.get('business_count', 0)}")
            print(f"   Total reviews analyzed: {sentiment_result.get('total_reviews_analyzed', 0)}")
            
            # Show sentiment for first business
            if sentiment_result.get('businesses'):
                first_biz = sentiment_result['businesses'][0]
                print(f"\n📊 Sample Business Sentiment:")
                print(f"   Business ID: {first_biz['business_id']}")
                print(f"   Total Reviews: {first_biz.get('total_reviews', 0)}")
                print(f"   Overall Sentiment: {first_biz.get('overall_sentiment', 'N/A').upper()}")
                
                sentiment_pct = first_biz.get('sentiment_percentages', {})
                print(f"   Breakdown: {sentiment_pct.get('positive', 0):.1f}% positive, "
                      f"{sentiment_pct.get('neutral', 0):.1f}% neutral, "
                      f"{sentiment_pct.get('negative', 0):.1f}% negative")
        else:
            print(f"❌ Pipeline 3 failed: {response3.status_code}")
            print(response3.text)
    else:
        print("\n⚠️  No businesses to analyze")
else:
    print(f"❌ Pipeline 1 failed: {response1.status_code}")
    print(response1.text)

STEP 1: Searching for businesses...
✅ Found 240 total results
   Returned 10 businesses
   Query: coffee shops in San Francisco
   Location: San Francisco

First 3 businesses:
1. Q Specialty Coffee - 4.5⭐ (44 reviews)
2. The Coffee Movement - 4.6⭐ (249 reviews)
3. Saint Frank Coffee - 4.4⭐ (112 reviews)

STEP 2: Analyzing reviews & sentiment...
⏳ This may take 30-60 seconds...

✅ Sentiment analysis complete!
   Businesses analyzed: 0
   Total reviews analyzed: 0


---

# Complete Workflow: All Pipelines Together

This section demonstrates a complete end-to-end workflow.

In [9]:
def run_complete_workflow(query: str, include_details: bool = True, include_sentiment: bool = True):
    """
    Run the complete pipeline workflow.
    
    Args:
        query: Natural language search query
        include_details: Whether to fetch business details (Pipeline 2)
        include_sentiment: Whether to analyze reviews (Pipeline 3)
    
    Returns:
        Dictionary with results from all pipelines
    """
    print("="*70)
    print(f"RUNNING COMPLETE WORKFLOW")
    print(f"Query: {query}")
    print("="*70 + "\n")
    
    results = {}
    
    # Step 1: Business Search
    print("[1/3] Pipeline 1: Business Search...")
    response1 = requests.post(
        f"{BASE_URL}/business_search/run",
        json={"query": query}
    )
    
    if response1.status_code != 200:
        print(f"❌ Pipeline 1 failed: {response1.status_code}")
        return results
    
    pipeline1_output = response1.json()
    results['search'] = pipeline1_output
    
    businesses = pipeline1_output['result'].get('businesses', [])
    print(f"✅ Found {len(businesses)} businesses\n")
    
    # Step 2: Business Details (Optional)
    if include_details:
        print("[2/3] Pipeline 2: Fetching Business Details...")
        response2 = requests.post(
            f"{BASE_URL}/business_details/run",
            json={"pipeline1_output": pipeline1_output}
        )
        
        if response2.status_code == 200:
            results['details'] = response2.json()
            print("✅ Business details fetched\n")
        else:
            print(f"❌ Pipeline 2 failed: {response2.status_code}\n")
    
    # Step 3: Review Sentiment Analysis (Optional)
    if include_sentiment:
        print("[3/3] Pipeline 3: Analyzing Reviews & Sentiment...")
        print("⏳ This may take 30-60 seconds...")
        response3 = requests.post(
            f"{BASE_URL}/business_sentiment/run",
            json={"pipeline1_output": pipeline1_output},
            timeout=120
        )
        
        if response3.status_code == 200:
            results['sentiment'] = response3.json()
            print("✅ Sentiment analysis completed\n")
        else:
            print(f"❌ Pipeline 3 failed: {response3.status_code}\n")
            print(f"   Error: {response3.text}")
    
    print("\n" + "="*70)
    print("WORKFLOW COMPLETE")
    print("="*70)
    
    return results
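
Note that `run_complete_workflow` only adds a key (`search`, `details`, `sentiment`) for a stage that succeeded, so a caller may want to check which stages are present before reading from the result. The following sketch shows one way to do that; `summarize_workflow` is a hypothetical helper, not part of the notebook.

```python
from typing import Any, Dict


def summarize_workflow(results: Dict[str, Any]) -> Dict[str, Any]:
    """Report which stages produced output and how many businesses were found."""
    search = results.get("search", {})
    businesses = search.get("result", {}).get("businesses", [])
    return {
        "search_ok": "search" in results,        # Pipeline 1 returned HTTP 200
        "details_ok": "details" in results,      # Pipeline 2 ran and succeeded
        "sentiment_ok": "sentiment" in results,  # Pipeline 3 ran and succeeded
        "business_count": len(businesses),
    }
```

A stage that was skipped (for example `include_details=False`) and a stage that failed both show up as `False` here, since the workflow function does not distinguish them in its return value.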

## Example: Run Complete Workflow

In [10]:
# Run the complete workflow
workflow_results = run_complete_workflow(
    query="Italian restaurants in San Francisco",
    include_details=True,
    include_sentiment=True
)

RUNNING COMPLETE WORKFLOW
Query: Italian restaurants in San Francisco

[1/3] Pipeline 1: Business Search...
✅ Found 10 businesses

[2/3] Pipeline 2: Fetching Business Details...
✅ Business details fetched

[3/3] Pipeline 3: Analyzing Reviews & Sentiment...
⏳ This may take 30-60 seconds...
✅ Sentiment analysis completed


WORKFLOW COMPLETE


## Display Comprehensive Results

In [11]:
# Display results from all pipelines
if workflow_results:
    print("\n" + "#"*70)
    print("COMPREHENSIVE RESULTS SUMMARY")
    print("#"*70 + "\n")
    
    # Search Results
    if 'search' in workflow_results:
        result = workflow_results['search']['result']
        businesses = result.get('businesses', [])
        print(f"📍 SEARCH RESULTS: {result.get('result_count', 0)} total results")
        print(f"   Location: {result.get('extracted_location', 'N/A')}")
        print(f"   Keywords: {result.get('extracted_keywords', [])}")
        print(f"   Showing {len(businesses)} businesses\n")
        
        for i, business in enumerate(businesses[:5], 1):
            print(f"{i}. {business['name']}")
            print(f"   ⭐ Rating: {business['rating']} ({business['review_count']} reviews)")
            print(f"   💰 Price: {business.get('price_range', 'N/A')}")
            print(f"   📞 Phone: {business.get('phone', 'N/A')}")
            print(f"   🔗 Website: {business.get('website', 'N/A')}")
            print()
    
    # Sentiment Analysis
    if 'sentiment' in workflow_results:
        print("\n" + "-"*70)
        print("😊 SENTIMENT ANALYSIS\n")
        
        sentiment_result = workflow_results['sentiment'].get('result', {})
        businesses_sentiment = sentiment_result.get('businesses', [])
        
        print(f"📊 Summary:")
        print(f"   Total businesses analyzed: {sentiment_result.get('business_count', 0)}")
        print(f"   Total reviews analyzed: {sentiment_result.get('total_reviews_analyzed', 0)}")
        print(f"   Business IDs processed: {len(sentiment_result.get('business_ids_processed', []))}\n")
        
        for i, business in enumerate(businesses_sentiment[:3], 1):
            print(f"{i}. Business ID: {business['business_id']}")
            print(f"   📝 Total Reviews: {business.get('total_reviews', 0)}")
            
            # Sentiment distribution
            sentiment_dist = business.get('sentiment_distribution', {})
            print(f"   Sentiment Distribution: {sentiment_dist.get('positive', 0)} positive, "
                  f"{sentiment_dist.get('neutral', 0)} neutral, "
                  f"{sentiment_dist.get('negative', 0)} negative")
            
            # Sentiment percentages
            sentiment_pct = business.get('sentiment_percentages', {})
            overall = business.get('overall_sentiment', 'unknown')
            print(f"   Overall Sentiment: {overall.upper()}")
            print(f"   Breakdown: {sentiment_pct.get('positive', 0):.1f}% positive, "
                  f"{sentiment_pct.get('neutral', 0):.1f}% neutral, "
                  f"{sentiment_pct.get('negative', 0):.1f}% negative")
            
            # Show top reviews
            top_reviews = business.get('highest_rated_reviews', [])
            if top_reviews:
                print(f"   \n   ⭐ Top Review ({top_reviews[0].get('rating', 'N/A')} stars):")
                print(f"      User: {top_reviews[0].get('user', 'N/A')}")
                print(f"      Sentiment: {top_reviews[0].get('sentiment', 'N/A')}")
                print(f"      Text: {top_reviews[0].get('text', '')[:120]}...")
            
            # Show lowest rated reviews if any
            low_reviews = business.get('lowest_rated_reviews', [])
            if low_reviews:
                print(f"   \n   ⚠️  Lowest Review ({low_reviews[0].get('rating', 'N/A')} stars):")
                print(f"      User: {low_reviews[0].get('user', 'N/A')}")
                print(f"      Sentiment: {low_reviews[0].get('sentiment', 'N/A')}")
                print(f"      Text: {low_reviews[0].get('text', '')[:120]}...")
            print()
    
    print("\n" + "#"*70)
else:
    print("No results available")


######################################################################
COMPREHENSIVE RESULTS SUMMARY
######################################################################

📍 SEARCH RESULTS: 240 total results
   Location: San Francisco
   Keywords: ['Italian', 'San', 'Francisco', 'restaurants']
   Showing 10 businesses

1. San Francisco Pizza
   ⭐ Rating: 2.7 (180 reviews)
   💰 Price: $$
   📞 Phone: (510) 412-4400
   🔗 Website: http://www.sanfranciscopizzaria.com

2. MINA Family Kitchen San Francisco
   ⭐ Rating: 3.6 (41 reviews)
   💰 Price: None
   📞 Phone: (415) 660-2656
   🔗 Website: https://www.michaelmina.net/delivery/

3. Italian-American Social Club of San Francisco
   ⭐ Rating: 4.4 (49 reviews)
   💰 Price: $$
   📞 Phone: (415) 585-8059
   🔗 Website: http://www.iascsf.net

4. Trattoria da Vittorio - San Francisco
   ⭐ Rating: 4.3 (1100 reviews)
   💰 Price: $$
   📞 Phone: (415) 742-0300
   🔗 Website: http://trattoriadavittorio.com



---

# Key Takeaways

## 1. Pipeline Entry Points

- **Pipeline 1**: `{"query": "your search"}`
- **Pipeline 2**: `{"pipeline1_output": {...}}`
- **Pipeline 3**: `{"pipeline1_output": {...}}`

## 2. Pipeline 1 Response Structure

```json
{
  "result": {
    "query": "original query",
    "extracted_location": "location or empty string",
    "extracted_keywords": ["keyword1", "keyword2"],
    "search_params": {
      "location": "search location",
      "query": "formatted query",
      "original_query": "original query"
    },
    "result_count": 170,
    "businesses": [
      {
        "business_id": "...",
        "name": "...",
        "alias": "...",
        "rating": 4.5,
        "review_count": 100,
        "categories": ["Mexican"],
        "price_range": "$$",
        "phone": "...",
        "website": "...",
        "location": {"lat": 37.77, "lon": -122.41},
        "images": ["..."]
      }
    ]
  }
}
```

## 3. Data Flow

```
Natural Language Query
         ↓
   Pipeline 1 (Business Search)
   Input: {"query": "..."}
         ↓
   Complete JSON Output with 'result' key
         ↓
    ┌────┴────┐
    ↓         ↓
Pipeline 2  Pipeline 3
{"pipeline1_output": {...}}
```

## 4. Best Practices

- Always pass the **entire** Pipeline 1 output to downstream pipelines
- Pipelines 2 and 3 expect `pipeline1_output` directly at the top level (not nested under `parser`)
- Pipelines 2 & 3 can run in parallel for efficiency
- Use appropriate timeouts for sentiment analysis (can take 30-60 seconds)
- The entry point for Pipeline 1 is just `{"query": "..."}` - simple and direct

## 5. Common Patterns

```python
# run_pipeline1/2/3 and run_parallel are placeholder helpers, not defined in this notebook.
# Pattern 1: Sequential execution
p1_output = run_pipeline1({"query": query})
p2_output = run_pipeline2({"pipeline1_output": p1_output})
p3_output = run_pipeline3({"pipeline1_output": p1_output})

# Pattern 2: Parallel execution (faster)
p1_output = run_pipeline1({"query": query})
p2_output, p3_output = run_parallel({"pipeline1_output": p1_output})

# Pattern 3: Conditional execution
p1_output = run_pipeline1({"query": query})
if need_details:
    p2_output = run_pipeline2({"pipeline1_output": p1_output})
if need_sentiment:
    p3_output = run_pipeline3({"pipeline1_output": p1_output})
```
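
The parallel pattern relies on a `run_parallel` helper that this notebook does not define. One possible sketch uses a thread pool from the standard library, which suits this case because both downstream calls are I/O-bound HTTP requests. Unlike the one-argument call in the pattern above, this version also takes the two pipeline callables as arguments. That signature is an assumption made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Tuple

# A pipeline call takes a request payload and returns the parsed JSON response.
PipelineCall = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_parallel(
    payload: Dict[str, Any],
    run_pipeline2: PipelineCall,
    run_pipeline3: PipelineCall,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Run both downstream calls concurrently and return (p2_output, p3_output)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future2 = pool.submit(run_pipeline2, payload)
        future3 = pool.submit(run_pipeline3, payload)
        # .result() blocks until the call finishes and re-raises any exception it hit.
        return future2.result(), future3.result()
```

In this notebook, each callable could be a small function that wraps `requests.post(...)` against `/business_details/run` or `/business_sentiment/run` and returns `response.json()`, with the 120-second timeout kept on the sentiment call.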

---

# Try Your Own Queries

Use the cells below to experiment with your own queries.

In [None]:
# Customize this cell with your own query
my_query = "vegan restaurants in Los Angeles"

# Run the complete workflow
my_results = run_complete_workflow(
    query=my_query,
    include_details=True,
    include_sentiment=True
)

In [None]:
# Or run just specific pipelines
my_query = "bookstores in Boston"

# Just search
search_only = run_complete_workflow(
    query=my_query,
    include_details=False,
    include_sentiment=False
)