# Yelp Navigator - Pipeline Chaining Guide

This notebook demonstrates how to:
1. Form queries for the `business_search/run` endpoint
2. Pass outputs from Pipeline 1 to other pipelines
3. Chain all pipelines together for a complete workflow

## Prerequisites

- Working directory: `cd ch8/yelp-navigator/`
- Hayhooks server running: `uv run hayhooks run --pipelines-dir pipelines`
- Server should be accessible at `http://localhost:1416`

## Setup and Imports

In [35]:
import requests
import json
from pprint import pprint
from typing import Dict, Any

# Base URL for Hayhooks server
BASE_URL = "http://localhost:1416"

# Helper function to print JSON nicely
def print_json(data, max_chars=2000):
    """Print JSON data in a readable format, truncated to max_chars characters."""
    print(json.dumps(data, indent=2)[:max_chars])  # Limit output length

## Test Server Connection

In [36]:
# Check if Hayhooks server is running
try:
    response = requests.get(f"{BASE_URL}/status")
    print("✅ Hayhooks server is running!")
    print(f"Status: {response.status_code}")
except requests.exceptions.ConnectionError:
    print("❌ Cannot connect to Hayhooks server")
    print("Please start the server with: uv run hayhooks run --pipelines-dir pipelines")

✅ Hayhooks server is running!
Status: 200


---

# Pipeline 1: Business Search

## Understanding the Entry Point

The `business_search/run` endpoint accepts a natural language query and returns business results.

**Request Structure**:
```json
{
  "query": "your natural language search here"
}
```

The payload is a flat object with a single `query` key; it does not need to be nested under a component name such as `query_converter`.
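
The cells later in this notebook send the search text as `{"query": "..."}`. As a minimal sketch, a small builder can guard against empty input before the request is made. `build_search_payload` is a hypothetical helper (not part of Hayhooks), and collapsing whitespace is an assumption about what a sensible query looks like.

```python
def build_search_payload(query: str) -> dict:
    """Return the request body for business_search/run (hypothetical helper)."""
    if not isinstance(query, str):
        raise TypeError("query must be a string")
    cleaned = " ".join(query.split())  # collapse runs of whitespace
    if not cleaned:
        raise ValueError("query must not be empty")
    return {"query": cleaned}


payload = build_search_payload("  Mexican food   in texas ")
print(payload)  # {'query': 'Mexican food in texas'}
```

The returned dictionary can be passed as the `json=` argument of `requests.post`.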

## Example 1: Simple Search Query

In [37]:
# Form a query for business_search/run
query = {
  "query": "Mexican food in texas"
}

print("Sending query to Pipeline 1 (Business Search)...")
print(f"Query: {query['query']}")
print("\nRequest payload:")
print_json(query)

Sending query to Pipeline 1 (Business Search)...
Query: Mexican food in texas

Request payload:
{
  "query": "Mexican food in texas"
}


In [38]:
# Execute the search (the payload is rebuilt here so this cell runs on its own)
query = {
  "query": "Mexican food in texas"
}


response1 = requests.post(
    f"{BASE_URL}/business_search/run",
    json=query
)

if response1.status_code == 200:
    pipeline1_output = response1.json()
    print("✅ Pipeline 1 succeeded!\n")
    
    # Extract key information
    results = pipeline1_output['result']
    businesses = results.get('businesses', [])
    
    print(f"Query: {results.get('query', 'N/A')}")
    print(f"Extracted Location: {results.get('extracted_location', 'None')}")
    print(f"Extracted Keywords: {results.get('extracted_keywords', [])}")
    print(f"Search Parameters: {results.get('search_params', {})}")
    print(f"\nFound {results.get('result_count', 0)} total results")
    print(f"Returned {len(businesses)} businesses on this page\n")
    
    # Show first 3 businesses
    for i, business in enumerate(businesses[:3], 1):
        print(f"{i}. {business['name']}")
        print(f"   ID: {business['business_id']}")
        print(f"   Alias: {business['alias']}")
        print(f"   Rating: {business['rating']} ({business['review_count']} reviews)")
        print(f"   Price: {business.get('price_range', 'N/A')}")
        print(f"   Categories: {', '.join(business['categories'])}")
        print(f"   Website: {business.get('website', 'N/A')}")
        print(f"   Location: ({business['location']['lat']}, {business['location']['lon']})")
        print()
else:
    print(f"❌ Pipeline 1 failed with status {response1.status_code}")
    print(response1.text)

✅ Pipeline 1 succeeded!

Query: Mexican food in texas
Extracted Location: 
Extracted Keywords: ['food', 'texas', 'Mexican']
Search Parameters: {'location': 'United States', 'query': 'food texas Mexican', 'original_query': 'Mexican food in texas'}

Found 170 total results
Returned 10 businesses on this page

1. Cielito Lindo
   ID: MNhRp6GhBiwCgGycyI6Mdw
   Alias: cielito-lindo-san-francisco
   Rating: 4.4 (417 reviews)
   Price: $$
   Categories: Mexican
   Website: None
   Location: (37.77592068, -122.4959264)

2. Zona Rosa Mexican Grill
   ID: Qho1it7RMDXgDVOoaUt8kA
   Alias: zona-rosa-mexican-grill-san-francisco
   Rating: 4.4 (68 reviews)
   Price: None
   Categories: Mexican
   Website: None
   Location: (37.7633777, -122.43389504)

3. SanJalisco Mexican Restaurant
   ID: _mWCUrUZf_ytJj2N87On2Q
   Alias: sanjalisco-mexican-restaurant-san-francisco
   Rating: 4.2 (1081 reviews)
   Price: $$
   Categories: Mexican, Seafood, Breakfast & Brunch
   Website: https://sanjaliscomexicanr

## Example 2: Different Query Styles

In [5]:
# Try different query styles - the NER component will extract entities
test_queries = [
    "best Mexican restaurants in Austin, Texas",
    "sushi places near Seattle",
    "pizza in Chicago",
    "coffee shops in Portland, Oregon"
]

print("Testing different query formats:\n")
for query_text in test_queries:
    response = requests.post(
        f"{BASE_URL}/business_search/run",
        json={"query": query_text}
    )

    if response.status_code == 200:
        data = response.json()
        result = data['result']
        count = result.get('result_count', 0)
        location = result.get('extracted_location', 'N/A')
        keywords = result.get('extracted_keywords', [])
        print(f"✅ '{query_text}'")
        print(f"   Found: {count} results")
        print(f"   Location: {location}, Keywords: {keywords}\n")
    else:
        print(f"❌ '{query_text}' failed\n")

Testing different query formats:

✅ 'best Mexican restaurants in Austin, Texas'
   Found: 240 results
   Location: Austin, Keywords: ['Austin,', 'restaurants', 'Texas', 'Mexican']

✅ 'sushi places near Seattle'
   Found: 240 results
   Location: Seattle, Keywords: ['sushi', 'places', 'near', 'Seattle']

✅ 'pizza in Chicago'
   Found: 240 results
   Location: Chicago, Keywords: ['Chicago', 'pizza']

✅ 'coffee shops in Portland, Oregon'
   Found: 240 results
   Location: Portland, Keywords: ['Oregon', 'Portland,', 'shops', 'coffee']

---

# Understanding Pipeline 1 Output Structure

Pipeline 1 returns a nested structure that will be passed to downstream pipelines.

In [39]:
# Inspect the complete output structure
print("Complete Pipeline 1 Output Structure:\n")
print("Top-level keys:", list(pipeline1_output.keys()))
print("\nresult keys:", list(pipeline1_output['result'].keys()))
print("\nSample business keys:", list(businesses[0].keys()) if businesses else "No businesses")

print("\n" + "="*60)
print("IMPORTANT: This entire structure will be passed to Pipelines 2 & 3")
print("="*60)

Complete Pipeline 1 Output Structure:

Top-level keys: ['result']

result keys: ['query', 'extracted_location', 'extracted_keywords', 'search_params', 'result_count', 'businesses']

Sample business keys: ['business_id', 'name', 'alias', 'rating', 'review_count', 'categories', 'price_range', 'phone', 'website', 'location', 'images']

IMPORTANT: This entire structure will be passed to Pipelines 2 & 3
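
One detail worth noting in the business records: several fields (`price_range`, `website`) are present but set to `None`, and `phone` can be an empty string. In that case `business.get('website', 'N/A')` returns `None`, because the default only applies when the key is missing. The sketch below shows one way to handle this; `field` is a hypothetical helper, and the sample record is trimmed from the Pipeline 1 output shown in this notebook.

```python
def field(business: dict, key: str, default: str = "N/A"):
    """Return business[key], or `default` when the key is missing, None, or ''."""
    value = business.get(key)
    return default if value in (None, "") else value


# Trimmed from the second business in the Pipeline 1 output
sample = {
    "name": "Zona Rosa Mexican Grill",
    "rating": 4.4,
    "price_range": None,
    "phone": "",
    "website": None,
}

print(sample.get("website", "N/A"))  # None (key exists, so the default is unused)
print(field(sample, "website"))      # N/A
print(field(sample, "rating"))       # 4.4
```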


---

# Pipeline 2: Business Details

## How to Pass Pipeline 1 Output

Pipeline 2 accepts the **complete Pipeline 1 output** directly as `pipeline1_output`.

**Request Structure**:
```json
{
  "pipeline1_output": {
    "result": { ... entire Pipeline 1 output ... }
  }
}
```
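
Before sending the request, it can help to confirm that the object really is the full Pipeline 1 response. The following is a sketch under stated assumptions: `build_downstream_request` is a hypothetical helper, and refusing to continue when `businesses` is empty is a design choice made here, not a requirement documented by the pipelines.

```python
def build_downstream_request(pipeline1_output: dict) -> dict:
    """Wrap the full Pipeline 1 response for Pipelines 2 and 3 (hypothetical helper)."""
    if not isinstance(pipeline1_output, dict) or "result" not in pipeline1_output:
        raise ValueError("expected the full Pipeline 1 response with a top-level 'result' key")
    result = pipeline1_output["result"]
    if not isinstance(result, dict) or not result.get("businesses"):
        # Assumption: nothing useful can happen downstream without businesses
        raise ValueError("Pipeline 1 returned no businesses to pass downstream")
    return {"pipeline1_output": pipeline1_output}


example_output = {"result": {"businesses": [{"business_id": "MNhRp6GhBiwCgGycyI6Mdw"}]}}
request_body = build_downstream_request(example_output)
print(list(request_body.keys()))  # ['pipeline1_output']
```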

## ⚠️ Important: Restart Hayhooks Server

If you've just updated the pipeline components to match the new Pipeline 1 structure, you need to restart the Hayhooks server:

```bash
# Stop the current server (Ctrl+C in the terminal)
# Then restart with:
uv run hayhooks run --pipelines-dir pipelines
```

This ensures the updated components are loaded with the correct field mappings.

In [40]:
pipeline1_output

{'result': {'query': 'Mexican food in texas',
  'extracted_location': '',
  'extracted_keywords': ['food', 'texas', 'Mexican'],
  'search_params': {'location': 'United States',
   'query': 'food texas Mexican',
   'original_query': 'Mexican food in texas'},
  'result_count': 170,
  'businesses': [{'business_id': 'MNhRp6GhBiwCgGycyI6Mdw',
    'name': 'Cielito Lindo',
    'alias': 'cielito-lindo-san-francisco',
    'rating': 4.4,
    'review_count': 417,
    'categories': ['Mexican'],
    'price_range': '$$',
    'phone': '(415) 742-0959',
    'website': None,
    'location': {'lat': 37.77592068, 'lon': -122.4959264},
    'images': ['https://s3-media0.fl.yelpcdn.com/bphoto/yQ3Yr0nysUlyhQ_8ANBAJw/348s.jpg']},
   {'business_id': 'Qho1it7RMDXgDVOoaUt8kA',
    'name': 'Zona Rosa Mexican Grill',
    'alias': 'zona-rosa-mexican-grill-san-francisco',
    'rating': 4.4,
    'review_count': 68,
    'categories': ['Mexican'],
    'price_range': None,
    'phone': '',
    'website': None,
    'loca

In [41]:
# Chain Pipeline 1 output to Pipeline 2
print("Sending Pipeline 1 output to Pipeline 2 (Business Details)...\n")

pipeline2_request = {
    "pipeline1_output": pipeline1_output
}

response2 = requests.post(
    f"{BASE_URL}/business_details/run",
    json=pipeline2_request
)

if response2.status_code == 200:
    pipeline2_output = response2.json()
    print("✅ Pipeline 2 succeeded!\n")
else:
    print(f"❌ Pipeline 2 failed with status {response2.status_code}")
    print(response2.text)

Sending Pipeline 1 output to Pipeline 2 (Business Details)...

✅ Pipeline 2 succeeded!



In [32]:
response2.json()

{'result': {'error': "The following component failed to run:\nComponent name: 'metadata_enricher'\nComponent type: 'DocumentMetadataEnricher'\nError: object of type 'NoneType' has no len()",
  'document_count': 0,
  'documents': []}}
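
The response above came back with HTTP 200 even though a component failed, so the status code alone does not indicate success. A minimal sketch of a stricter check follows. `PipelineError` and `unwrap_result` are hypothetical names, and the only assumption about the response is the `result` / `error` layout visible in the output above.

```python
class PipelineError(RuntimeError):
    """Raised when a pipeline reports a failure inside a 200 response."""


def unwrap_result(response_json: dict) -> dict:
    """Return response_json['result'], raising if it carries an 'error' field."""
    result = response_json.get("result")
    if not isinstance(result, dict):
        raise PipelineError("response has no 'result' object")
    if result.get("error"):
        raise PipelineError(result["error"])
    return result


# Shape copied from the failing response above (error text shortened)
failed = {"result": {"error": "Component name: 'metadata_enricher' failed",
                     "document_count": 0,
                     "documents": []}}
try:
    unwrap_result(failed)
except PipelineError as exc:
    print(f"Pipeline reported an error: {exc}")
```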

In [14]:
# Extract enriched documents
if 'metadata_enricher' in pipeline2_output:
    documents = pipeline2_output['metadata_enricher']['documents']
    print(f"Created {len(documents)} enriched documents\n")
    
    # Show details for first document
    if documents:
        doc = documents[0]
        print("Sample Document:")
        print(f"  Business Name: {doc.get('meta', {}).get('business_name', 'N/A')}")
        print(f"  Price Range: {doc.get('meta', {}).get('price_range', 'N/A')}")
        print(f"  Rating: {doc.get('meta', {}).get('rating', 'N/A')}")
        print(f"  Coordinates: ({doc.get('meta', {}).get('latitude', 'N/A')}, {doc.get('meta', {}).get('longitude', 'N/A')})")
        print(f"  Website Content Length: {len(doc.get('content', ''))} characters")
else:
    print("No documents found in output")
    print("Available keys:", list(pipeline2_output.keys()))


No documents found in output
Available keys: ['result']


In [15]:
pipeline2_output

{'result': {'document_count': 0,
  'business_count': 0,
  'urls_fetched': [],
  'documents': [],
  'raw_documents': []}}
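
In this output the documents sit under `result`, not under a top-level `metadata_enricher` key, which is why the extraction cell above reported "No documents found". The sketch below reads the fields that are visible in this output; `summarize_details` is a hypothetical helper, and nothing is assumed about the contents of individual documents.

```python
def summarize_details(pipeline2_output: dict) -> str:
    """Summarize a Pipeline 2 response using the keys seen under 'result'."""
    result = pipeline2_output.get("result") or {}
    documents = result.get("documents") or []
    urls = result.get("urls_fetched") or []
    businesses = result.get("business_count", 0)
    return f"{len(documents)} documents from {businesses} businesses, {len(urls)} URLs fetched"


# The response shown above
empty_output = {"result": {"document_count": 0,
                           "business_count": 0,
                           "urls_fetched": [],
                           "documents": [],
                           "raw_documents": []}}
print(summarize_details(empty_output))  # 0 documents from 0 businesses, 0 URLs fetched
```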

---

# Pipeline 3: Reviews & Sentiment Analysis

## How to Pass Pipeline 1 Output

Pipeline 3 also accepts the **complete Pipeline 1 output** under the `pipeline1_output` key (same format as Pipeline 2).

In [42]:
# Chain Pipeline 1 output to Pipeline 3
print("Sending Pipeline 1 output to Pipeline 3 (Reviews & Sentiment)...\n")
print("⏳ This may take a while as it fetches and analyzes reviews...\n")

pipeline3_request = {
    "pipeline1_output": pipeline1_output
}

response3 = requests.post(
    f"{BASE_URL}/business_sentiment/run",
    json=pipeline3_request,
    timeout=120  # Longer timeout for sentiment analysis
)

if response3.status_code == 200:
    pipeline3_output = response3.json()
    print("✅ Pipeline 3 succeeded!\n")
    
    # Extract review documents
    if 'reviews_aggregator' in pipeline3_output:
        documents = pipeline3_output['reviews_aggregator']['documents']
        print(f"Analyzed reviews for {len(documents)} businesses\n")
        
        # Show sentiment analysis for first business
        if documents:
            doc = documents[0]
            meta = doc.get('meta', {})
            print(f"Business: {meta.get('business_name', 'N/A')}")
            print(f"\nSentiment Distribution:")
            print(f"  Positive: {meta.get('positive_count', 0)}")
            print(f"  Neutral: {meta.get('neutral_count', 0)}")
            print(f"  Negative: {meta.get('negative_count', 0)}")
            
            print(f"\nHighest-Rated Reviews (with positive sentiment):")
            for i, review in enumerate(meta.get('top_positive_reviews', [])[:2], 1):
                print(f"  {i}. Rating: {review.get('rating', 'N/A')} - {review.get('text', '')[:100]}...")
            
            print(f"\nLowest-Rated Reviews (with negative sentiment):")
            for i, review in enumerate(meta.get('bottom_negative_reviews', [])[:2], 1):
                print(f"  {i}. Rating: {review.get('rating', 'N/A')} - {review.get('text', '')[:100]}...")
    else:
        print("No documents found in output")
        print("Available keys:", list(pipeline3_output.keys()))
else:
    print(f"❌ Pipeline 3 failed with status {response3.status_code}")
    print(response3.text)

Sending Pipeline 1 output to Pipeline 3 (Reviews & Sentiment)...

⏳ This may take a while as it fetches and analyzes reviews...

✅ Pipeline 3 succeeded!

No documents found in output
Available keys: ['result']
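
The only top-level key is `result`, so the `reviews_aggregator` lookup never matches. When an expected key is missing, listing the nested key paths is a quick way to find where the data actually lives. The sketch below is one way to do that; `describe_keys` is a hypothetical helper, and the sample dictionary is a stand-in, since the contents of Pipeline 3's `result` are not shown in this notebook.

```python
def describe_keys(obj, prefix: str = "", max_depth: int = 2) -> list:
    """Return dotted key paths for nested dicts, down to max_depth levels."""
    paths = []
    if isinstance(obj, dict) and max_depth > 0:
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            paths.append(path)
            paths.extend(describe_keys(value, path, max_depth - 1))
    return paths


# Stand-in response; replace with pipeline3_output when running the notebook
stand_in = {"result": {"document_count": 0, "documents": []}}
print(describe_keys(stand_in))
# ['result', 'result.document_count', 'result.documents']
```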


---

# Complete Workflow: All Pipelines Together

This section demonstrates a complete end-to-end workflow.

In [None]:
def run_complete_workflow(query: str, include_details: bool = True, include_sentiment: bool = True):
    """
    Run the complete pipeline workflow.
    
    Args:
        query: Natural language search query
        include_details: Whether to fetch business details (Pipeline 2)
        include_sentiment: Whether to analyze reviews (Pipeline 3)
    
    Returns:
        Dictionary with results from all pipelines
    """
    print("="*70)
    print(f"RUNNING COMPLETE WORKFLOW")
    print(f"Query: {query}")
    print("="*70 + "\n")
    
    results = {}
    
    # Step 1: Business Search
    print("[1/3] Pipeline 1: Business Search...")
    response1 = requests.post(
        f"{BASE_URL}/business_search/run",
        json={"query": query}
    )
    
    if response1.status_code != 200:
        print(f"❌ Pipeline 1 failed: {response1.status_code}")
        return results
    
    pipeline1_output = response1.json()
    results['search'] = pipeline1_output
    
    businesses = pipeline1_output['result'].get('businesses', [])
    print(f"✅ Found {len(businesses)} businesses\n")
    
    # Step 2: Business Details (Optional)
    if include_details:
        print("[2/3] Pipeline 2: Fetching Business Details...")
        response2 = requests.post(
            f"{BASE_URL}/business_details/run",
            json={"pipeline1_output": pipeline1_output}
        )
        
        if response2.status_code == 200:
            results['details'] = response2.json()
            print("✅ Business details fetched\n")
        else:
            print(f"❌ Pipeline 2 failed: {response2.status_code}\n")
    
    # Step 3: Review Sentiment Analysis (Optional)
    if include_sentiment:
        print("[3/3] Pipeline 3: Analyzing Reviews & Sentiment...")
        print("⏳ This may take 30-60 seconds...")
        response3 = requests.post(
            f"{BASE_URL}/business_sentiment/run",
            json={"pipeline1_output": pipeline1_output},
            timeout=120
        )
        
        if response3.status_code == 200:
            results['sentiment'] = response3.json()
            print("✅ Sentiment analysis completed\n")
        else:
            print(f"❌ Pipeline 3 failed: {response3.status_code}\n")
    
    print("\n" + "="*70)
    print("WORKFLOW COMPLETE")
    print("="*70)
    
    return results

## Example: Run Complete Workflow

In [None]:
# Run the complete workflow
workflow_results = run_complete_workflow(
    query="Italian restaurants in San Francisco",
    include_details=True,
    include_sentiment=True
)

## Display Comprehensive Results

In [None]:
# Display results from all pipelines
if workflow_results:
    print("\n" + "#"*70)
    print("COMPREHENSIVE RESULTS SUMMARY")
    print("#"*70 + "\n")
    
    # Search Results
    if 'search' in workflow_results:
        result = workflow_results['search']['result']
        businesses = result.get('businesses', [])
        print(f"📍 SEARCH RESULTS: {result.get('result_count', 0)} total results")
        print(f"   Location: {result.get('extracted_location', 'N/A')}")
        print(f"   Keywords: {result.get('extracted_keywords', [])}")
        print(f"   Showing {len(businesses)} businesses\n")
        
        for i, business in enumerate(businesses[:5], 1):
            print(f"{i}. {business['name']}")
            print(f"   ⭐ Rating: {business['rating']} ({business['review_count']} reviews)")
            print(f"   💰 Price: {business.get('price_range', 'N/A')}")
            print(f"   📞 Phone: {business.get('phone', 'N/A')}")
            print(f"   🔗 Website: {business.get('website', 'N/A')}")
            print()
    
    # Sentiment Analysis
    if 'sentiment' in workflow_results:
        print("\n" + "-"*70)
        print("😊 SENTIMENT ANALYSIS\n")
        
        if 'reviews_aggregator' in workflow_results['sentiment']:
            docs = workflow_results['sentiment']['reviews_aggregator']['documents']
            
            for i, doc in enumerate(docs[:3], 1):
                meta = doc.get('meta', {})
                print(f"{i}. {meta.get('business_name', 'Unknown')}")
                
                # Sentiment counts
                pos = meta.get('positive_count', 0)
                neu = meta.get('neutral_count', 0)
                neg = meta.get('negative_count', 0)
                total = pos + neu + neg
                
                if total > 0:
                    print(f"   Sentiment: {pos} positive, {neu} neutral, {neg} negative")
                    print(f"   Positive ratio: {pos/total*100:.1f}%")
                
                # Show one top review
                top_reviews = meta.get('top_positive_reviews', [])
                if top_reviews:
                    print(f"   💬 Top review: {top_reviews[0].get('text', '')[:150]}...")
                print()
    
    print("\n" + "#"*70)
else:
    print("No results available")

---

# Parallel Execution

Pipelines 2 and 3 can run in parallel since they both only depend on Pipeline 1's output.

In [None]:
import concurrent.futures
import time

def run_pipelines_parallel(query: str):
    """
    Run Pipeline 2 and 3 in parallel to save time.
    """
    print(f"Running pipelines in parallel for: '{query}'\n")
    
    # Step 1: Business Search (must run first)
    start_time = time.time()
    
    print("[1] Pipeline 1: Business Search...")
    response1 = requests.post(
        f"{BASE_URL}/business_search/run",
        json={"query": query}
    )
    
    if response1.status_code != 200:
        print(f"❌ Pipeline 1 failed: {response1.status_code}")
        return None
    
    pipeline1_output = response1.json()
    businesses = pipeline1_output['result'].get('businesses', [])
    print(f"✅ Found {len(businesses)} businesses\n")
    
    # Step 2 & 3: Run Pipelines 2 and 3 in parallel
    print("[2 & 3] Running Pipelines 2 and 3 in parallel...")
    
    def run_pipeline2():
        response = requests.post(
            f"{BASE_URL}/business_details/run",
            json={"pipeline1_output": pipeline1_output}
        )
        return ("Pipeline 2", response)
    
    def run_pipeline3():
        response = requests.post(
            f"{BASE_URL}/business_sentiment/run",
            json={"pipeline1_output": pipeline1_output},
            timeout=120
        )
        return ("Pipeline 3", response)
    
    # Execute both pipelines concurrently
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        futures = [executor.submit(run_pipeline2), executor.submit(run_pipeline3)]
        
        for future in concurrent.futures.as_completed(futures):
            name, response = future.result()
            if response.status_code == 200:
                print(f"✅ {name} completed")
            else:
                print(f"❌ {name} failed: {response.status_code}")
    
    elapsed = time.time() - start_time
    print(f"\n⏱️  Total time: {elapsed:.2f} seconds")
    print("(Pipelines 2 and 3 ran concurrently, so the total is usually lower than running them one after another)")

# Test parallel execution
run_pipelines_parallel("coffee shops in Portland, Oregon")
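
The function above prints status lines but does not hand the responses back to the caller. A minimal sketch of a variant that collects each outcome by name, including exceptions such as timeouts, is shown below. `run_in_parallel` is a hypothetical helper; the two tasks are stand-ins for the real `requests.post` calls so the example runs without a server.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_in_parallel(tasks: dict) -> dict:
    """Run zero-argument callables concurrently.

    Returns {name: (ok, value)} where value is the return value on success
    or the raised exception on failure.
    """
    outcomes = {}
    if not tasks:
        return outcomes  # ThreadPoolExecutor requires max_workers > 0
    with ThreadPoolExecutor(max_workers=len(tasks)) as executor:
        future_to_name = {executor.submit(fn): name for name, fn in tasks.items()}
        for future in as_completed(future_to_name):
            name = future_to_name[future]
            try:
                outcomes[name] = (True, future.result())
            except Exception as exc:  # keep one failure from hiding the other result
                outcomes[name] = (False, exc)
    return outcomes


def fake_details():
    # Stand-in for the business_details/run request
    return {"result": {"documents": []}}


def fake_sentiment():
    # Stand-in for a business_sentiment/run request that times out
    raise TimeoutError("simulated timeout")


outcomes = run_in_parallel({"details": fake_details, "sentiment": fake_sentiment})
for name in sorted(outcomes):
    ok, value = outcomes[name]
    print(name, "ok" if ok else f"failed: {value}")
```

In the notebook, each task would be a small function or `lambda` that wraps the corresponding `requests.post` call.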

---

# Key Takeaways

## 1. Pipeline Entry Points

- **Pipeline 1**: `{"query": "your search"}`
- **Pipeline 2**: `{"pipeline1_output": {...}}`
- **Pipeline 3**: `{"pipeline1_output": {...}}`

## 2. Pipeline 1 Response Structure

```json
{
  "result": {
    "query": "original query",
    "extracted_location": "location or empty string",
    "extracted_keywords": ["keyword1", "keyword2"],
    "search_params": {
      "location": "search location",
      "query": "formatted query",
      "original_query": "original query"
    },
    "result_count": 170,
    "businesses": [
      {
        "business_id": "...",
        "name": "...",
        "alias": "...",
        "rating": 4.5,
        "review_count": 100,
        "categories": ["Mexican"],
        "price_range": "$$",
        "phone": "...",
        "website": "...",
        "location": {"lat": 37.77, "lon": -122.41},
        "images": ["..."]
      }
    ]
  }
}
```

## 3. Data Flow

```
Natural Language Query
         ↓
   Pipeline 1 (Business Search)
   Input: {"query": "..."}
         ↓
   Complete JSON Output with 'result' key
         ↓
    ┌────┴────┐
    ↓         ↓
Pipeline 2  Pipeline 3
{"pipeline1_output": {...}}
```

## 4. Best Practices

- Always pass the **entire** Pipeline 1 output to downstream pipelines
- Pipelines 2 & 3 expect `pipeline1_output` directly at the top level (not nested under `parser`)
- Pipelines 2 & 3 can run in parallel for efficiency
- Use appropriate timeouts for sentiment analysis (can take 30-60 seconds)
- The entry point for Pipeline 1 is just `{"query": "..."}` - simple and direct
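
The timeout advice above can be folded into a single call helper. The following is a sketch under stated assumptions: `call_pipeline` and `PIPELINE_TIMEOUTS` are hypothetical names, the 120-second value comes from the sentiment calls in this notebook, and the other two timeouts are illustrative guesses. The `post` argument is expected to behave like `requests.post`; a fake is used here so the example runs offline.

```python
# Assumed per-pipeline timeouts in seconds; only the 120 s value is from this notebook
PIPELINE_TIMEOUTS = {
    "business_search": 30,
    "business_details": 60,
    "business_sentiment": 120,
}


def call_pipeline(post, base_url: str, name: str, payload: dict) -> dict:
    """POST payload to {base_url}/{name}/run and return the decoded JSON body."""
    timeout = PIPELINE_TIMEOUTS.get(name, 30)
    response = post(f"{base_url}/{name}/run", json=payload, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return response.json()


class FakeResponse:
    """Offline stand-in for requests.Response."""

    def __init__(self, data):
        self._data = data

    def raise_for_status(self):
        pass

    def json(self):
        return self._data


recorded_calls = []


def fake_post(url, json=None, timeout=None):
    recorded_calls.append((url, json, timeout))
    return FakeResponse({"result": {}})


output = call_pipeline(fake_post, "http://localhost:1416",
                       "business_sentiment", {"pipeline1_output": {}})
print(recorded_calls[0][0], recorded_calls[0][2])
```

With the real client, the call would be `call_pipeline(requests.post, BASE_URL, "business_search", {"query": "..."})`.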

## 5. Common Patterns

```python
# Pattern 1: Sequential execution
p1_output = run_pipeline1({"query": query})
p2_output = run_pipeline2({"pipeline1_output": p1_output})
p3_output = run_pipeline3({"pipeline1_output": p1_output})

# Pattern 2: Parallel execution (faster)
p1_output = run_pipeline1({"query": query})
p2_output, p3_output = run_parallel({"pipeline1_output": p1_output})

# Pattern 3: Conditional execution
p1_output = run_pipeline1({"query": query})
if need_details:
    p2_output = run_pipeline2({"pipeline1_output": p1_output})
if need_sentiment:
    p3_output = run_pipeline3({"pipeline1_output": p1_output})
```

---

# Try Your Own Queries

Use the cells below to experiment with your own queries.

In [None]:
# Customize this cell with your own query
my_query = "vegan restaurants in Los Angeles"

# Run the complete workflow
my_results = run_complete_workflow(
    query=my_query,
    include_details=True,
    include_sentiment=True
)

In [None]:
# Or run just specific pipelines
my_query = "bookstores in Boston"

# Just search
search_only = run_complete_workflow(
    query=my_query,
    include_details=False,
    include_sentiment=False
)