# Yelp Navigator - Pipeline Chaining Guide

This notebook demonstrates how to:
1. Form queries for the `business_search/run` endpoint
2. Pass outputs from Pipeline 1 to other pipelines
3. Chain all pipelines together for a complete workflow

## Prerequisites

- Change into the project directory: `cd ch8/yelp-navigator/`
- Start the Hayhooks server: `uv run hayhooks run --pipelines-dir pipelines`
- The server should be reachable at `http://localhost:1416`

## Setup and Imports

In [1]:
import requests
import json
from pprint import pprint
from typing import Dict, Any

# Base URL for Hayhooks server
BASE_URL = "http://localhost:1416"

# Helper function to print JSON nicely
def print_json(data, max_chars=2000):
    """Print JSON data in a readable format, truncated to max_chars characters"""
    print(json.dumps(data, indent=2)[:max_chars])  # Limit output length

## Test Server Connection

In [2]:
# Check if Hayhooks server is running
try:
    response = requests.get(f"{BASE_URL}/status", timeout=5)
    if response.ok:
        print("✅ Hayhooks server is running!")
    else:
        print("⚠️ Hayhooks server responded with an error")
    print(f"Status: {response.status_code}")
except requests.exceptions.ConnectionError:
    print("❌ Cannot connect to Hayhooks server")
    print("Please start the server with: uv run hayhooks run --pipelines-dir pipelines")

✅ Hayhooks server is running!
Status: 200


---

# Pipeline 1: Business Search

## Understanding the Entry Point

The `business_search/run` endpoint accepts a natural language query and returns business results.

**Request Structure**:
```json
{
  "query": "your natural language search here"
}
```

The request body is flat: `query` is the only field, and it is not nested under a component name. Inside the pipeline, the query is handled first by the `query_converter` component (a QueryToDocument component).
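
The code cells in this notebook send a flat `{"query": ...}` body, so it is easy to validate on the client before any request is made. The helper below is a minimal sketch: `build_search_payload` is a hypothetical name that is not part of the notebook or of Hayhooks, and the decision to collapse whitespace and reject empty queries is an assumption about what is worth catching early.

```python
def build_search_payload(query: str) -> dict:
    """Build the flat request body for business_search/run.

    Hypothetical helper: collapses whitespace and rejects empty
    queries before any HTTP request is made.
    """
    if not isinstance(query, str):
        raise TypeError("query must be a string")
    cleaned = " ".join(query.split())  # collapse runs of whitespace
    if not cleaned:
        raise ValueError("query must not be empty")
    return {"query": cleaned}


payload = build_search_payload("  Mexican food   in Austin, Texas ")
print(payload)
```

The returned dictionary can be passed as the `json=` argument of `requests.post`.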

## Example 1: Simple Search Query

In [28]:
# Form a query for business_search/run
query = {
  "query": "Mexican food in texas"
}

print("Sending query to Pipeline 1 (Business Search)...")
print(f"Query: {query['query']}")
print("\nRequest payload:")
print_json(query)

Sending query to Pipeline 1 (Business Search)...
Query: Mexican food in texas

Request payload:
{
  "query": "Mexican food in texas"
}


In [29]:
# Execute the search with the `query` payload defined in the previous cell
response1 = requests.post(
    f"{BASE_URL}/business_search/run",
    json=query
)

if response1.status_code == 200:
    pipeline1_output = response1.json()
    print("✅ Pipeline 1 succeeded!\n")
    
    # Extract key information
    results = pipeline1_output['result']
    businesses = results.get('businesses', [])
    
    print(f"Query: {results.get('query', 'N/A')}")
    print(f"Extracted Location: {results.get('extracted_location', 'None')}")
    print(f"Extracted Keywords: {results.get('extracted_keywords', [])}")
    print(f"Search Parameters: {results.get('search_params', {})}")
    print(f"\nFound {results.get('result_count', 0)} total results")
    print(f"Returned {len(businesses)} businesses on this page\n")
    
    # Show first 3 businesses
    for i, business in enumerate(businesses[:3], 1):
        print(f"{i}. {business['name']}")
        print(f"   ID: {business['business_id']}")
        print(f"   Alias: {business['alias']}")
        print(f"   Rating: {business['rating']} ({business['review_count']} reviews)")
        print(f"   Price: {business.get('price_range', 'N/A')}")
        print(f"   Categories: {', '.join(business['categories'])}")
        print(f"   Website: {business.get('website', 'N/A')}")
        print(f"   Location: ({business['location']['lat']}, {business['location']['lon']})")
        print()
else:
    print(f"❌ Pipeline 1 failed with status {response1.status_code}")
    print(response1.text)

✅ Pipeline 1 succeeded!

Query: Mexican food in texas
Extracted Location: 
Extracted Keywords: ['food', 'Mexican', 'texas']
Search Parameters: {'location': 'United States', 'query': 'food Mexican texas', 'original_query': 'Mexican food in texas'}

Found 0 total results
Returned 0 businesses on this page



## Example 2: Different Query Styles

In [6]:
# Try different query styles - the NER component will extract entities
test_queries = [
    "best Mexican restaurants in Austin, Texas",
    "sushi places near Seattle",
    "pizza in Chicago",
    "coffee shops in Portland, Oregon"
]

print("Testing different query formats:\n")
for query_text in test_queries:
    response = requests.post(
        f"{BASE_URL}/business_search/run",
        json={"query": query_text}
    )
    
    if response.status_code == 200:
        data = response.json()
        result = data['result']
        count = result.get('result_count', 0)
        location = result.get('extracted_location', 'N/A')
        keywords = result.get('extracted_keywords', [])
        print(f"✅ '{query_text}'")
        print(f"   Found: {count} results")
        print(f"   Location: {location}, Keywords: {keywords}\n")
    else:
        print(f"❌ '{query_text}' failed\n")

Testing different query formats:

✅ 'best Mexican restaurants in Austin, Texas'
   Found: 240 results
   Location: Austin, Keywords: ['restaurants', 'Mexican', 'Texas', 'Austin,']

✅ 'sushi places near Seattle'
   Found: 240 results
   Location: Seattle, Keywords: ['sushi', 'places', 'near', 'Seattle']

✅ 'pizza in Chicago'
   Found: 240 results
   Location: Chicago, Keywords: ['Chicago', 'pizza']

✅ 'coffee shops in Portland, Oregon'
   Found: 53 results
   Location: Portland, Keywords: ['Portland,', 'coffee', 'Oregon', 'shops']



---

# Understanding Pipeline 1 Output Structure

Pipeline 1 returns a nested structure that will be passed to downstream pipelines.

In [7]:
# Inspect the complete output structure
print("Complete Pipeline 1 Output Structure:\n")
print("Top-level keys:", list(pipeline1_output.keys()))
print("\nresult keys:", list(pipeline1_output['result'].keys()))
print("\nSample business keys:", list(businesses[0].keys()) if businesses else "No businesses")

print("\n" + "="*60)
print("IMPORTANT: This entire structure will be passed to Pipelines 2 & 3")
print("="*60)

Complete Pipeline 1 Output Structure:

Top-level keys: ['result']

result keys: ['query', 'extracted_location', 'extracted_keywords', 'search_params', 'result_count', 'businesses']

Sample business keys: ['business_id', 'name', 'alias', 'rating', 'review_count', 'categories', 'price_range', 'phone', 'website', 'location', 'images']

IMPORTANT: This entire structure will be passed to Pipelines 2 & 3
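
Since every business carries `name`, `rating`, and `review_count`, the nested structure can be reduced to a short ranked list without another request. The following is a sketch under stated assumptions: `top_rated` is a hypothetical helper, and the `min_reviews` threshold of 50 is an arbitrary illustrative default. The sample values are copied from the La Calle Tacos and Guadalajara Del Centro entries in this notebook's output.

```python
def top_rated(pipeline1_output: dict, min_reviews: int = 50, limit: int = 3) -> list:
    """Return (name, rating, review_count) tuples for the best-rated businesses."""
    businesses = pipeline1_output.get("result", {}).get("businesses", [])
    # Ignore businesses with too few reviews for the rating to be meaningful
    eligible = [b for b in businesses if b.get("review_count", 0) >= min_reviews]
    # Sort by rating first, then by review count as a tie-breaker
    eligible.sort(
        key=lambda b: (b.get("rating", 0), b.get("review_count", 0)),
        reverse=True,
    )
    return [
        (b.get("name", "unknown"), b.get("rating", 0), b.get("review_count", 0))
        for b in eligible[:limit]
    ]


sample = {"result": {"businesses": [
    {"name": "Guadalajara Del Centro", "rating": 3.7, "review_count": 515},
    {"name": "La Calle Tacos", "rating": 4.1, "review_count": 1236},
]}}
for name, rating, reviews in top_rated(sample):
    print(f"{name}: {rating} ({reviews} reviews)")
```

Using `.get` with defaults throughout keeps the helper from raising when a search returns no businesses.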


---

# Pipeline 2: Business Details

## How to Pass Pipeline 1 Output

Pipeline 2 accepts the **complete Pipeline 1 output** directly as `pipeline1_output`.

**Request Structure**:
```json
{
  "pipeline1_output": {
    "result": { ... entire Pipeline 1 output ... }
  }
}
```
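
A search can succeed and still contain no businesses (the first run of "Mexican food in texas" above reported `Found 0 total results`), so it may be worth checking the structure before wrapping it. This is a minimal sketch: `build_downstream_request` is a hypothetical helper, and failing early on an empty result is an assumption about the desired behavior, not something the pipelines require.

```python
def build_downstream_request(pipeline1_output: dict) -> dict:
    """Wrap Pipeline 1 output for Pipelines 2 and 3, failing early on bad input."""
    result = pipeline1_output.get("result")
    if not isinstance(result, dict):
        raise ValueError("Pipeline 1 output has no 'result' object")
    if not result.get("businesses"):
        raise ValueError("Pipeline 1 returned no businesses; nothing to pass on")
    # Pass the whole structure through unchanged under the expected key
    return {"pipeline1_output": pipeline1_output}


example_output = {"result": {"businesses": [{"business_id": "xdx57Qj1FJi0MK8kwzbeuA"}]}}
request_body = build_downstream_request(example_output)
print(list(request_body.keys()))
```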

In [8]:
pipeline1_output

{'result': {'query': 'Mexican food in texas',
  'extracted_location': '',
  'extracted_keywords': ['food', 'Mexican', 'texas'],
  'search_params': {'location': 'United States',
   'query': 'food Mexican texas',
   'original_query': 'Mexican food in texas'},
  'result_count': 240,
  'businesses': [{'business_id': 'xdx57Qj1FJi0MK8kwzbeuA',
    'name': 'La Calle Tacos',
    'alias': 'la-calle-tacos-houston-16',
    'rating': 4.1,
    'review_count': 1236,
    'categories': ['Tacos', 'Latin American'],
    'price_range': '$$',
    'phone': '(832) 735-8226',
    'website': 'http://lacalletacos.com',
    'location': {'lat': 29.76355831, 'lon': -95.360686},
    'images': ['https://s3-media0.fl.yelpcdn.com/bphoto/1B16bDmixkNYMRoVhTVI2A/348s.jpg']},
   {'business_id': 'oYTWM_5ziYkSUtEUpYzITQ',
    'name': 'Guadalajara Del Centro',
    'alias': 'guadalajara-del-centro-houston',
    'rating': 3.7,
    'review_count': 515,
    'categories': ['Mexican', 'Tex-Mex'],
    'price_range': '$$',
    'pho

In [9]:
# Chain Pipeline 1 output to Pipeline 2
print("Sending Pipeline 1 output to Pipeline 2 (Business Details)...\n")

pipeline2_request = {
    "pipeline1_output": pipeline1_output
}

response2 = requests.post(
    f"{BASE_URL}/business_details/run",
    json=pipeline2_request
)

if response2.status_code == 200:
    pipeline2_output = response2.json()
    print("✅ Pipeline 2 succeeded!\n")
    
else:
    print(f"❌ Pipeline 2 failed with status {response2.status_code}")
    print(response2.text)  

Sending Pipeline 1 output to Pipeline 2 (Business Details)...

✅ Pipeline 2 succeeded!



In [10]:
response2.json()

{'result': {'document_count': 10,
  'business_count': 10,
  'urls_fetched': ['http://lacalletacos.com',
   'https://guad.com/locations/del-centro/',
   'http://pueblasmexicankitchen.com',
   'http://www.fondasantarosa.com',
   'https://www.mexicanrestaurant-houston.com',
   'http://www.xochihouston.com',
   'https://www.eltiempocantina.com',
   'http://www.teomexicancafe.com/',
   'http://www.polanquitohtx.com',
   'https://cielitocafehouston.wixsite.com/cielitocafe'],
  'documents': [{'content_length': 994,
    'content_preview': 'La Calle TacosCome fall in love with the authentic flavors of the streets of Mexico City and transport your taste buds with the recipes brought from our favorite taco stands.All The Flavor, All The TimeBreakfast¡Buenas Dias! Kick-start your day, any day, with our wide selection of breakfast items featuring tacos, chilaquiles & tortas.All Day MenuFeed me tortas and tell me I’m pretty. Join us today for a little mid-day fiesta.CantinaLet’s taco ’bout dinner an

In [11]:
# Extract enriched documents (the response nests everything under 'result')
result2 = pipeline2_output.get('result', {})
documents = result2.get('documents', [])

if documents:
    print(f"Created {len(documents)} enriched documents\n")

    # Show details for first document
    doc = documents[0]
    print("Sample Document:")
    print(f"  Available keys: {list(doc.keys())}")
    print(f"  Website Content Length: {doc.get('content_length', 'N/A')} characters")
    print(f"  Content Preview: {doc.get('content_preview', '')[:200]}...")
else:
    print("No documents found in output")
    print("Available keys:", list(pipeline2_output.keys()))




---

# Pipeline 3: Reviews & Sentiment Analysis

## How to Pass Pipeline 1 Output

Pipeline 3 also accepts the **complete Pipeline 1 output** under the `pipeline1_output` key (same format as Pipeline 2).

In [22]:
# Chain Pipeline 1 output to Pipeline 3
print("Sending Pipeline 1 output to Pipeline 3 (Reviews & Sentiment)...\n")
print("⏳ This may take a while as it fetches and analyzes reviews...\n")

pipeline3_request = {
    "pipeline1_output": pipeline1_output
}

response3 = requests.post(
    f"{BASE_URL}/business_sentiment/run",
    json=pipeline3_request,
    timeout=120  # Longer timeout for sentiment analysis
)

if response3.status_code == 200:
    pipeline3_output = response3.json()
    print("✅ Pipeline 3 succeeded!\n")
else:
    print(f"❌ Pipeline 3 failed with status {response3.status_code}")
    print(response3.text)

Sending Pipeline 1 output to Pipeline 3 (Reviews & Sentiment)...

⏳ This may take a while as it fetches and analyzes reviews...

✅ Pipeline 3 succeeded!



In [21]:
response3.json()

{'result': {'business_count': 10,
  'business_ids_processed': ['xdx57Qj1FJi0MK8kwzbeuA',
   'oYTWM_5ziYkSUtEUpYzITQ',
   'H94geLCllp0NTbwu5bW1zA',
   'qJ5N_qoTbSfxa60OE5RIBQ',
   'ASfaHySo2Ns2qrbOK9NGVA',
   'qwwRCIbAaaxIbCwWovXV6A',
   'ks4ViWK1KBcUUh8j7cKkqw',
   'EFGk8gTkaH-UjC-8wYXaew',
   'NvvtXXHaURTxlQDLueNmdw',
   'vck7wPLGT79YRiFbLJyRnQ'],
  'total_reviews_analyzed': 100,
  'businesses': [{'business_id': 'xdx57Qj1FJi0MK8kwzbeuA',
    'total_reviews': 10,
    'sentiment_distribution': {'positive': 10, 'neutral': 0, 'negative': 0},
    'sentiment_percentages': {'positive': 100.0,
     'neutral': 0.0,
     'negative': 0.0},
    'overall_sentiment': 'positive',
    'highest_rated_reviews': [{'rating': 5,
      'sentiment': 'positive',
      'text': 'Extremely kind and such good tacos. We walked in and were immediately greeted by the employees. They seemed to be decorating and still managed to make us feel welcome and cared for',
      'user': 'Sharon P.',
      'url': 'https://www
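
The sentiment response identifies businesses only by `business_id`, so reading it alongside the search results takes a join on that field. The sketch below shows one way to do it: `rank_by_sentiment` is a hypothetical helper, and the percentages for the second business are made-up illustrative values, not results from this notebook.

```python
def rank_by_sentiment(search_output: dict, sentiment_output: dict) -> list:
    """Join sentiment results with business names, sorted by share of positive reviews."""
    # Map business_id -> name from the Pipeline 1 output
    names = {
        b["business_id"]: b.get("name", "unknown")
        for b in search_output.get("result", {}).get("businesses", [])
    }
    rows = []
    for entry in sentiment_output.get("result", {}).get("businesses", []):
        business_id = entry["business_id"]
        positive = entry.get("sentiment_percentages", {}).get("positive", 0.0)
        # Fall back to the raw ID if the business is missing from the search output
        rows.append((names.get(business_id, business_id), positive, entry.get("total_reviews", 0)))
    # Highest positive share first
    return sorted(rows, key=lambda row: row[1], reverse=True)


search = {"result": {"businesses": [
    {"business_id": "xdx57Qj1FJi0MK8kwzbeuA", "name": "La Calle Tacos"},
    {"business_id": "oYTWM_5ziYkSUtEUpYzITQ", "name": "Guadalajara Del Centro"},
]}}
sentiment = {"result": {"businesses": [
    # Illustrative numbers only (not taken from a real run)
    {"business_id": "oYTWM_5ziYkSUtEUpYzITQ", "total_reviews": 10,
     "sentiment_percentages": {"positive": 60.0, "neutral": 20.0, "negative": 20.0}},
    {"business_id": "xdx57Qj1FJi0MK8kwzbeuA", "total_reviews": 10,
     "sentiment_percentages": {"positive": 100.0, "neutral": 0.0, "negative": 0.0}},
]}}
for name, positive, total in rank_by_sentiment(search, sentiment):
    print(f"{name}: {positive:.1f}% positive over {total} reviews")
```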

---

# Complete Workflow: All Pipelines Together

This section demonstrates a complete end-to-end workflow.

In [23]:
def run_complete_workflow(query: str, include_details: bool = True, include_sentiment: bool = True):
    """
    Run the complete pipeline workflow.
    
    Args:
        query: Natural language search query
        include_details: Whether to fetch business details (Pipeline 2)
        include_sentiment: Whether to analyze reviews (Pipeline 3)
    
    Returns:
        Dictionary with results from all pipelines
    """
    print("="*70)
    print("RUNNING COMPLETE WORKFLOW")
    print(f"Query: {query}")
    print("="*70 + "\n")
    
    results = {}
    
    # Step 1: Business Search
    print("[1/3] Pipeline 1: Business Search...")
    response1 = requests.post(
        f"{BASE_URL}/business_search/run",
        json={"query": query}
    )
    
    if response1.status_code != 200:
        print(f"❌ Pipeline 1 failed: {response1.status_code}")
        return results
    
    pipeline1_output = response1.json()
    results['search'] = pipeline1_output
    
    businesses = pipeline1_output['result'].get('businesses', [])
    print(f"✅ Found {len(businesses)} businesses\n")
    
    # Step 2: Business Details (Optional)
    if include_details:
        print("[2/3] Pipeline 2: Fetching Business Details...")
        response2 = requests.post(
            f"{BASE_URL}/business_details/run",
            json={"pipeline1_output": pipeline1_output}
        )
        
        if response2.status_code == 200:
            results['details'] = response2.json()
            print("✅ Business details fetched\n")
        else:
            print(f"❌ Pipeline 2 failed: {response2.status_code}\n")
    
    # Step 3: Review Sentiment Analysis (Optional)
    if include_sentiment:
        print("[3/3] Pipeline 3: Analyzing Reviews & Sentiment...")
        print("⏳ This may take 30-60 seconds...")
        response3 = requests.post(
            f"{BASE_URL}/business_sentiment/run",
            json={"pipeline1_output": pipeline1_output},
            timeout=120
        )
        
        if response3.status_code == 200:
            results['sentiment'] = response3.json()
            print("✅ Sentiment analysis completed\n")
        else:
            print(f"❌ Pipeline 3 failed: {response3.status_code}\n")
    
    print("\n" + "="*70)
    print("WORKFLOW COMPLETE")
    print("="*70)
    
    return results

## Example: Run Complete Workflow

In [24]:
# Run the complete workflow
workflow_results = run_complete_workflow(
    query="Italian restaurants in San Francisco",
    include_details=True,
    include_sentiment=True
)

RUNNING COMPLETE WORKFLOW
Query: Italian restaurants in San Francisco

[1/3] Pipeline 1: Business Search...
✅ Found 10 businesses

[2/3] Pipeline 2: Fetching Business Details...
✅ Business details fetched

[3/3] Pipeline 3: Analyzing Reviews & Sentiment...
⏳ This may take 30-60 seconds...
✅ Sentiment analysis completed


WORKFLOW COMPLETE


## Display Comprehensive Results

In [25]:
# Display results from all pipelines
if workflow_results:
    print("\n" + "#"*70)
    print("COMPREHENSIVE RESULTS SUMMARY")
    print("#"*70 + "\n")
    
    # Search Results
    if 'search' in workflow_results:
        result = workflow_results['search']['result']
        businesses = result.get('businesses', [])
        print(f"📍 SEARCH RESULTS: {result.get('result_count', 0)} total results")
        print(f"   Location: {result.get('extracted_location', 'N/A')}")
        print(f"   Keywords: {result.get('extracted_keywords', [])}")
        print(f"   Showing {len(businesses)} businesses\n")
        
        for i, business in enumerate(businesses[:5], 1):
            print(f"{i}. {business['name']}")
            print(f"   ⭐ Rating: {business['rating']} ({business['review_count']} reviews)")
            print(f"   💰 Price: {business.get('price_range', 'N/A')}")
            print(f"   📞 Phone: {business.get('phone', 'N/A')}")
            print(f"   🔗 Website: {business.get('website', 'N/A')}")
            print()
    
    # Sentiment Analysis
    if 'sentiment' in workflow_results:
        print("\n" + "-"*70)
        print("😊 SENTIMENT ANALYSIS\n")
        
        # The pipeline response nests its payload under 'result'
        sentiment_data = workflow_results['sentiment'].get('result', {})
        businesses_sentiment = sentiment_data.get('businesses', [])
        
        print(f"📊 Total: {sentiment_data.get('business_count', 0)} businesses, {sentiment_data.get('total_reviews_analyzed', 0)} reviews analyzed\n")
        
        for i, business in enumerate(businesses_sentiment[:3], 1):
            print(f"{i}. Business ID: {business['business_id']}")
            
            # Sentiment percentages
            sentiment_pct = business.get('sentiment_percentages', {})
            overall = business.get('overall_sentiment', 'unknown')
            print(f"   Overall Sentiment: {overall.upper()}")
            print(f"   Sentiment: {sentiment_pct.get('positive', 0):.1f}% positive, "
                  f"{sentiment_pct.get('neutral', 0):.1f}% neutral, "
                  f"{sentiment_pct.get('negative', 0):.1f}% negative")
            
            # Show one top review
            top_reviews = business.get('highest_rated_reviews', [])
            if top_reviews:
                print(f"   💬 Top review: {top_reviews[0].get('text', '')[:150]}...")
            print()
    
    print("\n" + "#"*70)
else:
    print("No results available")


######################################################################
COMPREHENSIVE RESULTS SUMMARY
######################################################################

📍 SEARCH RESULTS: 240 total results
   Location: San Francisco
   Keywords: ['restaurants', 'Italian', 'Francisco', 'San']
   Showing 10 businesses

1. Doppio Zero San Francisco
   ⭐ Rating: 4.0 (750 reviews)
   💰 Price: $$
   📞 Phone: (415) 624-3634
   🔗 Website: https://dzpizzeria.com/doppiozerosanfrancisco

2. Italian-American Social Club of San Francisco
   ⭐ Rating: 4.4 (49 reviews)
   💰 Price: $$
   📞 Phone: (415) 585-8059
   🔗 Website: http://www.iascsf.net

3. Trattoria da Vittorio - San Francisco
   ⭐ Rating: 4.3 (1099 reviews)
   💰 Price: $$
   📞 Phone: (415) 742-0300
   🔗 Website: http://trattoriadavittorio.com

4. San Francisco Pizza
   ⭐ Rating: 2.7 (180 reviews)
   💰 Price: $$
   📞 Phone: (510) 412-4400
   🔗 Website: http://www.sanfranciscopizzaria.com

5. Bottega
   ⭐ Rating: 4.3 (1568 reviews)
   💰 

---

# Key Takeaways

## 1. Pipeline Entry Points

- **Pipeline 1**: `{"query": "your search"}`
- **Pipeline 2**: `{"pipeline1_output": {...}}`
- **Pipeline 3**: `{"pipeline1_output": {...}}`

## 2. Pipeline 1 Response Structure

```json
{
  "result": {
    "query": "original query",
    "extracted_location": "location or empty string",
    "extracted_keywords": ["keyword1", "keyword2"],
    "search_params": {
      "location": "search location",
      "query": "formatted query",
      "original_query": "original query"
    },
    "result_count": 170,
    "businesses": [
      {
        "business_id": "...",
        "name": "...",
        "alias": "...",
        "rating": 4.5,
        "review_count": 100,
        "categories": ["Mexican"],
        "price_range": "$$",
        "phone": "...",
        "website": "...",
        "location": {"lat": 37.77, "lon": -122.41},
        "images": ["..."]
      }
    ]
  }
}
```

## 3. Data Flow

```
Natural Language Query
         ↓
   Pipeline 1 (Business Search)
   Input: {"query": "..."}
         ↓
   Complete JSON Output with 'result' key
         ↓
    ┌────┴────┐
    ↓         ↓
Pipeline 2  Pipeline 3
{"pipeline1_output": {...}}
```

## 4. Best Practices

- Always pass the **entire** Pipeline 1 output to downstream pipelines
- Pipelines 2 and 3 expect `pipeline1_output` directly at the top level (not nested under `parser`)
- Pipelines 2 and 3 do not depend on each other, so they can run in parallel
- Set a generous timeout for sentiment analysis, which can take 30-60 seconds (the examples here use `timeout=120`)
- The entry point for Pipeline 1 is just `{"query": "..."}`, with no component-name wrapper

## 5. Common Patterns

```python
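# Note: run_pipeline1/2/3 and run_parallel are placeholder helpers,
# not functions defined in this notebook.

# Pattern 1: Sequential execution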
p1_output = run_pipeline1({"query": query})
p2_output = run_pipeline2({"pipeline1_output": p1_output})
p3_output = run_pipeline3({"pipeline1_output": p1_output})

# Pattern 2: Parallel execution (faster)
p1_output = run_pipeline1({"query": query})
p2_output, p3_output = run_parallel({"pipeline1_output": p1_output})

# Pattern 3: Conditional execution
p1_output = run_pipeline1({"query": query})
if need_details:
    p2_output = run_pipeline2({"pipeline1_output": p1_output})
if need_sentiment:
    p3_output = run_pipeline3({"pipeline1_output": p1_output})
```
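
Pattern 2 can be sketched with the standard library's thread pool, which suits this case because both downstream calls spend most of their time waiting on the network. This is one possible shape, not the notebook's implementation: `run_parallel` here takes a dictionary of callables (a different signature from the placeholder above), and `fake_details` and `fake_sentiment` are stand-ins so the example runs without a server. In the notebook, each runner would wrap a `requests.post` call to the corresponding `/run` endpoint.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(request_body: dict, runners: dict) -> dict:
    """Run each callable in `runners` with the same request body, concurrently.

    `runners` maps a label (e.g. "details") to a function that takes the
    request body and returns that pipeline's parsed response.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(runners))) as pool:
        futures = {label: pool.submit(fn, request_body) for label, fn in runners.items()}
        # .result() blocks until that runner finishes and re-raises its exception, if any
        return {label: future.result() for label, future in futures.items()}


# Stand-in runners (hypothetical) that mimic the shape of the real responses
def fake_details(body: dict) -> dict:
    businesses = body["pipeline1_output"]["result"]["businesses"]
    return {"result": {"business_count": len(businesses)}}


def fake_sentiment(body: dict) -> dict:
    return {"result": {"total_reviews_analyzed": 0}}


body = {"pipeline1_output": {"result": {"businesses": [
    {"business_id": "a"}, {"business_id": "b"},
]}}}
outputs = run_parallel(body, {"details": fake_details, "sentiment": fake_sentiment})
print(outputs["details"])
```

Because `future.result()` re-raises exceptions, a failure in either pipeline surfaces to the caller instead of being silently dropped.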

---

# Try Your Own Queries

Use the cells below to experiment with your own queries.

In [None]:
# Customize this cell with your own query
my_query = "vegan restaurants in Los Angeles"

# Run the complete workflow
my_results = run_complete_workflow(
    query=my_query,
    include_details=True,
    include_sentiment=True
)

In [None]:
# Or run just specific pipelines
my_query = "bookstores in Boston"

# Just search
search_only = run_complete_workflow(
    query=my_query,
    include_details=False,
    include_sentiment=False
)