🔧 **Setup Required**: Before running this notebook, please follow the [setup instructions](../README.md#setup-instructions) to configure your environment and API keys.

# Yelp Navigator - Pipeline Chaining to Summary Reports

This notebook demonstrates **two workflow patterns** for generating business reports by chaining pipelines:

1. **Pipeline 1 + 2** (Basic info + website details)
2. **Pipeline 1 + 3** (Basic info + reviews/sentiment)

## Prerequisites

- **Environment variables** in `../.env` (including `RAPID_API_KEY` and `OPENAI_API_KEY`)
- **Hayhooks server running**: `sh start_hayhooks.sh`
- Server at `http://localhost:1416`
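
Because the pipelines depend on those keys, it can help to confirm they are present before starting the server. The following is a minimal sketch, assuming the `.env` file uses plain `KEY=VALUE` lines; the helper names `parse_env_text` and `missing_keys` are illustrative and not part of the project.

```python
from pathlib import Path

# Key names taken from the prerequisites list above
REQUIRED_KEYS = ("RAPID_API_KEY", "OPENAI_API_KEY")


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(env_values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env_values.get(key)]


env_path = Path("../.env")  # location stated in the prerequisites
if env_path.exists():
    print("Missing keys:", missing_keys(parse_env_text(env_path.read_text())))
else:
    print(f"{env_path} not found")
```

An empty list means both keys are defined. This only inspects the file; it does not verify that the keys are valid.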

## Setup and Imports

In [11]:
import requests
import json
from pprint import pprint
from typing import Dict, Any

# Base URL for Hayhooks server
BASE_URL = "http://localhost:1416"

# Helper function to print JSON nicely
def print_json(data, max_chars=2000):
    """Pretty-print JSON data, truncated to max_chars characters."""
    print(json.dumps(data, indent=2)[:max_chars])  # Limit output length

## Test Server Connection

In [12]:
# Check if Hayhooks server is running
try:
    # A short timeout keeps the cell from hanging if the server is unresponsive
    response = requests.get(f"{BASE_URL}/status", timeout=5)
    print("Hayhooks server is running!")
    print(f"Status: {response.status_code}")
except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
    print("Cannot connect to Hayhooks server")
    print("Please start the server with: hayhooks run --pipelines-dir pipelines")

Hayhooks server is running!
Status: 200


In [13]:
# List available models/pipelines
print("Fetching available models/pipelines...")
try:
    models_response = requests.get(f"{BASE_URL}/v1/models")
    
    if models_response.status_code == 200:
        result = models_response.json()
        print("\nAvailable Pipelines (as 'models'):")
        print(f"Total: {len(result['data'])}\n")
        
        for model in result['data']:
            print(f"  - ID: {model['id']}")
            print(f"    Name: {model['name']}")
            print(f"    Owner: {model['owned_by']}")
            print(f"    Created: {model['created']}")
            print()
    else:
        print(f"Error: {models_response.status_code}")
        print(models_response.text)
        
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")

Fetching available models/pipelines...

Available Pipelines (as 'models'):
Total: 3

  - ID: business_details
    Name: business_details
    Owner: hayhooks
    Created: 1763794752

  - ID: business_sentiment
    Name: business_sentiment
    Owner: hayhooks
    Created: 1763794752

  - ID: business_search
    Name: business_search
    Owner: hayhooks
    Created: 1763794752
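
Both workflows below need all three pipelines to be deployed, so it may be worth checking the listing programmatically. This is a minimal sketch: the helper `missing_pipelines` is hypothetical, and it assumes only the `data` and `id` fields shown in the output above.

```python
# Pipeline IDs as they appear in the listing above
EXPECTED_PIPELINES = {"business_search", "business_details", "business_sentiment"}


def missing_pipelines(models_payload, expected=EXPECTED_PIPELINES):
    """Return expected pipeline IDs absent from a /v1/models response body."""
    deployed = {model["id"] for model in models_payload.get("data", [])}
    return sorted(set(expected) - deployed)


# Example payload trimmed to the fields used here
sample_payload = {"data": [{"id": "business_details"}, {"id": "business_search"}]}
print(missing_pipelines(sample_payload))  # prints ['business_sentiment']
```

In the notebook, the same function can be called on `models_response.json()`; an empty list means every expected pipeline is available.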



---

# Step 1: Run Pipeline 1 (Business Search)

**Purpose**: Get basic business information from Yelp search

This pipeline extracts location and query details, searches Yelp, and returns business data.

In [3]:
# Step 1: Search for businesses (this output feeds both workflows below)
query = "Italian restaurants in San Francisco and Vancouver"

print("Step 1: Running Pipeline 1 (Business Search)...")
response1 = requests.post(f"{BASE_URL}/business_search/run", json={"query": query})
pipeline1_output = response1.json()

Step 1: Running Pipeline 1 (Business Search)...


In [17]:
print("Locations:", pipeline1_output['result']['extracted_location'])
print("Keywords:", pipeline1_output['result']['extracted_keywords'])
print("Businesses Found:", pipeline1_output['result']['result_count'])
print("Sample Business:", pipeline1_output['result']['businesses'][0])

Locations: ['San Francisco', 'Vancouver']
Keywords: [['restaurants', 'Italian'], ['restaurants', 'Italian']]
Businesses Found: 20
Sample Business: {'business_id': 'msT3LrLB4fhN04HYHuFsew', 'name': 'Bella Trattoria', 'alias': 'bella-trattoria-san-francisco', 'rating': 4.3, 'review_count': 2064, 'categories': ['Italian', 'Bars', 'Pasta Shops'], 'price_range': '$$', 'phone': '(415) 221-0305', 'website': 'http://Www.bellatrattoriasf.com', 'location': {'lat': 37.78136, 'lon': -122.46092108}, 'images': ['https://s3-media0.fl.yelpcdn.com/bphoto/lDIjLb1APIFuSSltL1L4vw/348s.jpg']}
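
The raw dictionary above is hard to scan, so a report would typically condense each business into one line. The sketch below shows one way to do that, assuming every record carries the fields visible in the sample (`name`, `rating`, `review_count`, `price_range`, `categories`); the helper names `format_business_line` and `top_rated` are illustrative.

```python
def format_business_line(business):
    """Build a one-line summary from a single business record."""
    categories = ", ".join(business.get("categories", []))
    return (
        f"{business.get('name', 'Unknown')} "
        f"({business.get('price_range') or 'n/a'}): "
        f"{business.get('rating', 'n/a')} stars from "
        f"{business.get('review_count', 0)} reviews [{categories}]"
    )


def top_rated(businesses, limit=5):
    """Return up to `limit` businesses sorted by rating, then review count."""
    ranked = sorted(
        businesses,
        key=lambda b: (b.get("rating") or 0, b.get("review_count") or 0),
        reverse=True,
    )
    return ranked[:limit]


# Sample record copied from the output above, trimmed to the fields used here
sample_business = {
    "name": "Bella Trattoria",
    "rating": 4.3,
    "review_count": 2064,
    "categories": ["Italian", "Bars", "Pasta Shops"],
    "price_range": "$$",
}
print(format_business_line(sample_business))
# prints: Bella Trattoria ($$): 4.3 stars from 2064 reviews [Italian, Bars, Pasta Shops]
```

In the notebook, `top_rated(pipeline1_output['result']['businesses'])` would give the five highest-rated results to format.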


---

# Workflow 1: Pipeline 1 + 2 

**Use case**: Business overview + website details & offerings

**What it does**: Takes Pipeline 1 output and enriches it with website information

**Speed**: Medium (~10-15 seconds)
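
The chaining step follows one pattern: post the search output to a downstream pipeline under the `pipeline1_output` key. A minimal sketch of that pattern is given here. The helper names `run_pipeline` and `chain_from_search` are hypothetical, the HTTP function is passed in as `post` (in this notebook, `requests.post`) so the sketch can be exercised without a live server, and the 120-second default timeout is an assumption.

```python
BASE_URL = "http://localhost:1416"  # same server address as in the setup cell


def run_pipeline(post, name, payload, timeout=120):
    """POST a payload to /<name>/run and return the parsed JSON body.

    `post` is expected to behave like requests.post. An HTTP error status
    raises here instead of passing an error body to the next pipeline.
    """
    response = post(f"{BASE_URL}/{name}/run", json=payload, timeout=timeout)
    response.raise_for_status()
    return response.json()


def chain_from_search(post, query, downstream, timeout=120):
    """Run business_search, then feed its output to one downstream pipeline."""
    search_output = run_pipeline(post, "business_search", {"query": query}, timeout)
    return run_pipeline(
        post, downstream, {"pipeline1_output": search_output}, timeout
    )


# Usage in the notebook (requires the Hayhooks server to be running):
# import requests
# details = chain_from_search(requests.post, "Italian restaurants in San Francisco", "business_details")
```

Passing `"business_sentiment"` as `downstream` gives the second workflow with the same code path.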

In [5]:

print("Step 2: Running Pipeline 2 (Website Details)...")
response2 = requests.post(
    f"{BASE_URL}/business_details/run",
    json={"pipeline1_output": pipeline1_output},
    timeout=60,  # well above the expected 10-15 seconds
)
pipeline2_output = response2.json()


Step 2: Running Pipeline 2 (Website Details)...


In [26]:
print("Websites found for", pipeline2_output['result']['document_count'], "businesses.")
print("Sample Business Details:", pipeline2_output['result']['documents'][1]['content'])

Websites found for 17 businesses.
Sample Business Details: top of page1/12"The Best Damn Cioppino in San Francisco!"Located in the heart of North Beach, San Francisco, Sotto Mare Restaurant provides a delicious and authentic Italian North Beach experience. We are proud to serve the freshest fish and shellfish in town; Oysters and clams on the half shell, Boston style Clam Chowder, Baccala, Crab Cioppino, Louis salads, Seafood Pastas and Seafood Risotto are just a few of the items we offer. For those who like to cook at home, we also offer a selection of our fresh fish daily. If you are looking for the best Italian seafood in San Francisco, you found us!* * *bottom of page


---

# Workflow 2: Pipeline 1 + 3 

**Use case**: Business overview + customer reviews & sentiment

**What it does**: Takes Pipeline 1 output and enriches it with reviews and sentiment analysis

**Speed**: Slower (~30-60 seconds)

In [7]:
# Workflow 2: Search + Reviews/Sentiment + Report
print("Step 3: Running Pipeline 3 (Business Sentiment)...")
response3 = requests.post(
    f"{BASE_URL}/business_sentiment/run",
    json={"pipeline1_output": pipeline1_output},
    timeout=120,
)
pipeline3_output = response3.json()


Step 3: Running Pipeline 3 (Business Sentiment)...


In [49]:
print("Sentiment analysis completed for", pipeline3_output['result']['business_count'], "businesses.")
print("Business IDs found:", pipeline3_output['result']['business_ids_processed'])
print("Sample business ID:", pipeline3_output['result']['businesses'][0]['business_id'])
print("Total reviews:", pipeline3_output['result']['businesses'][0]['total_reviews'])
print("Sentiment distribution:", pipeline3_output['result']['businesses'][0]['sentiment_distribution'])
print("Overall sentiment:", pipeline3_output['result']['businesses'][0]['overall_sentiment'])
print("Sample positive review:", pipeline3_output['result']['businesses'][0]['highest_rated_reviews'][0]['text'])


Sentiment analysis completed for 20 businesses.
Business IDs found: ['msT3LrLB4fhN04HYHuFsew', '8dUaybEPHsZMgr1iKgqgMQ', 'B09WOy0W83Od-Xw4xEXxog', 'QueFVMcMlT-6aZFv2M47mg', 'oK6DQM2ztdNwMAyJqGWZpg', 'FvPRM23d7NAOdyS_OX0MpQ', 'K9XVDlPNhrrSVEJN7uWqJQ', 'o43B4DnnQbvkdDK6AVafQg', 'zREcSPXvizDj-79lSw1MSQ', '6e7gMgVJRQ5rtkIbTF6TLQ', 'htvl5L_V-zKN0UvUgP60PQ', 'V-kQkPZzlOXIqk3rSqkikg', 'DkKH_q0_vdWSWZwFhPFLZw', 'g-WSOodoG-L81DFYWM0FMQ', 'wHTXCWVsx2PYPT21JV-HeQ', 'k7-oClu97qColM6BcCaQMw', 'xWooPFPiXv8PZ2F6IBKjnA', 'GepW_CkeEoPbg-0fLb5FGw', 'bgQRJLO8QFYDpCfkeOhjVA', 'gNCSbTm3unsdztncbgSv5A']
Sample business ID: msT3LrLB4fhN04HYHuFsew
Total reviews: 10
Sentiment distribution: {'positive': 8, 'neutral': 1, 'negative': 1}
Overall sentiment: positive
Sample positive review: Around 2:30pm, on a Thursday near the end of lunch, we stopped in at Bella Trattoria, an intimate restaurant and bar with darkwood subdued decor. We had strong espresso and coffee, and $8 panna cotta, a dense cus
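
For a summary report, the raw sentiment counts are often easier to read as percentages. The sketch below assumes each business entry has the `business_id`, `sentiment_distribution`, and `overall_sentiment` fields printed above; `sentiment_shares` and `format_sentiment` are hypothetical helper names.

```python
def sentiment_shares(distribution):
    """Convert sentiment counts into fractions of the total (0.0 if empty)."""
    total = sum(distribution.values())
    if total == 0:
        return {label: 0.0 for label in distribution}
    return {label: count / total for label, count in distribution.items()}


def format_sentiment(business):
    """Build a one-line sentiment summary for a single business entry."""
    shares = sentiment_shares(business["sentiment_distribution"])
    parts = ", ".join(f"{label} {share:.0%}" for label, share in shares.items())
    return f"{business['business_id']}: {business['overall_sentiment']} ({parts})"


# Values copied from the sample output above
sample_entry = {
    "business_id": "msT3LrLB4fhN04HYHuFsew",
    "sentiment_distribution": {"positive": 8, "neutral": 1, "negative": 1},
    "overall_sentiment": "positive",
}
print(format_sentiment(sample_entry))
# prints: msT3LrLB4fhN04HYHuFsew: positive (positive 80%, neutral 10%, negative 10%)
```

Looping this over `pipeline3_output['result']['businesses']` would give one line per business. With only 10 reviews analyzed per business, the percentages are coarse and best read as a rough signal.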