# LangChain-ZenRows Integration Examples

This notebook demonstrates how to use the `langchain-zenrows` package for web scraping with LangChain.

## Prerequisites

1. Install the package: `pip install langchain-zenrows`
2. Get your ZenRows API key from [ZenRows](https://app.zenrows.com/register?prod=universal_scraper)
3. For LangChain agents, you'll also need an LLM API key (OpenAI, Anthropic, etc.)

## Setup and Configuration

First, let's set up our API keys and import the necessary libraries.

In [None]:
import os
import json
import base64
from langchain_zenrows import ZenRowsUniversalScraper

# Set your ZenRows API key
os.environ["ZENROWS_API_KEY"] = "<YOUR_ZENROWS_API_KEY>"

# For LangChain agents, also set your LLM API key
os.environ["OPENAI_API_KEY"] = "<YOUR_OPENAI_API_KEY>"

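def key_status(name):
    # Treat an empty value or an unreplaced "<YOUR_...>" placeholder as not set
    value = os.environ.get(name, "")
    return "Set" if value and not value.startswith("<YOUR_") else "Not set"

print(f"ZenRows API Key: {key_status('ZENROWS_API_KEY')}")
print(f"OpenAI API Key: {key_status('OPENAI_API_KEY')}")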

## Basic Web Scraping

Let's start with simple HTML extraction and explore different output formats.

In [None]:
# Initialize the ZenRows scraper
scraper = ZenRowsUniversalScraper()

# Basic HTML scraping
result = scraper.invoke({"url": "https://httpbin.io/html"})

print(f"Content length: {len(result)} characters")
print("First 300 characters:")
print(result[:300] + "...")
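
Several cells in this notebook preview a long response by slicing it and appending `...`. As a minimal sketch, the hypothetical `preview` helper below (not part of `langchain-zenrows`) does the same but only adds the ellipsis when the text was actually truncated. The sample string is made up for illustration.

```python
def preview(text: str, limit: int = 300) -> str:
    """Return at most `limit` characters, marking truncation with '...'."""
    if len(text) <= limit:
        return text
    return text[:limit] + "..."


# Made-up HTML standing in for a scraper result
sample = "<html><body><h1>Herman Melville - Moby-Dick</h1></body></html>"
print(preview(sample, limit=20))
```

With a real response you could call `preview(result)` in place of `result[:300] + "..."`.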

In [None]:
# Get content in clean markdown format
result = scraper.invoke({
    "url": "https://www.example.com", 
    "response_type": "markdown"
})

print("Content in Markdown format:")
print(result)

## JavaScript Rendering and Single Page Applications

Modern websites often require JavaScript rendering to display dynamic content.

In [None]:
# Scrape JavaScript-rendered content with advanced parameters
result = scraper.invoke({
    "url": "https://www.scrapingcourse.com/javascript-rendering",
    "js_render": True,
    "wait": 3000,  # Wait 3 seconds for content to load
    "wait_for": ".product-name",  # Wait for specific element
    "response_type": "markdown",
    "premium_proxy": True,
    "proxy_country": "us"
})

print("SPA content (rendered with JavaScript):")
print(result[:800] + "...")

## CSS Data Extraction

Extract specific data using CSS selectors to get structured information.

In [None]:
# Extract specific data using CSS selectors
css_selector = json.dumps({
    "title": ".site-title",
    "product_names": ".product-name",
    "prices": ".product-price"
})

result = scraper.invoke({
    "url": "https://www.scrapingcourse.com/ecommerce/",
    "js_render": True,
    "css_extractor": css_selector
})

print("Extracted product data:")
print(result)
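
The `css_extractor` value is a JSON string that maps output names to CSS selectors. The sketch below shows one way to turn the response back into Python objects. It assumes the result is a JSON object whose keys match the selector map, with a list of strings for multiple matches and possibly a bare string for a single match. `pair_products` is a hypothetical helper and `sample_result` is invented data, so check both against a real response.

```python
import json

# Same selector map as in the cell above, serialized for the css_extractor parameter
selectors = {
    "title": ".site-title",
    "product_names": ".product-name",
    "prices": ".product-price",
}
css_extractor = json.dumps(selectors)


def as_list(value):
    """Normalize a missing value, a bare string, or a list to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def pair_products(raw: str) -> list:
    """Pair each product name with the price at the same position."""
    data = json.loads(raw)
    names = as_list(data.get("product_names"))
    prices = as_list(data.get("prices"))
    return list(zip(names, prices))


# Invented response shape, for illustration only
sample_result = json.dumps({
    "title": "Example Shop",
    "product_names": ["Item A", "Item B"],
    "prices": ["$10.00", "$12.50"],
})

for name, price in pair_products(sample_result):
    print(f"{name}: {price}")
```

Pairing by position only works when every product has both a name and a price; if the lists differ in length, `zip` silently drops the extras.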

## Structured Data Extraction

Automatically extract structured data like links, headings, tables, and more.

In [None]:
# Extract multiple data types automatically
result = scraper.invoke({
    "url": "https://www.scrapingcourse.com/ecommerce/",
    "outputs": "links,headings"
})

print("Extracted links and headings:")
print(result[:500] + "...")

In [None]:
# Extract all tables from a webpage
result = scraper.invoke({
    "url": "https://www.scrapingcourse.com/table-parsing",
    "outputs": "tables"
})

print("Extracted tables:")
print(result)
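
The `outputs` parameter takes a comma-separated list of data types. As a rough sketch, the hypothetical helpers below build that string from a Python list and count the items in a response. The code assumes the response is a JSON object keyed by output type; the sample response is invented and may not match the real shape.

```python
import json


def build_outputs(types: list) -> str:
    """Join output type names into the comma-separated form used above."""
    return ",".join(t.strip() for t in types if t.strip())


def count_items(raw: str) -> dict:
    """Count entries per output type in a JSON response (assumed shape)."""
    data = json.loads(raw)
    return {
        key: len(value) if isinstance(value, list) else 1
        for key, value in data.items()
    }


# Invented response, for illustration only
sample = json.dumps({
    "links": ["https://example.com/a", "https://example.com/b"],
    "headings": ["Shop"],
})

print(build_outputs(["links", "headings"]))
print(count_items(sample))
```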

## Screenshots and Visual Capture

Capture screenshots of entire pages or specific elements.

In [None]:
# Capture a full-page screenshot and save it to a file
try:
    result = scraper.invoke({
        "url": "https://www.scrapingcourse.com/ecommerce/",
        "js_render": True,
        "screenshot": "true",
        "screenshot_fullpage": "true"
    })

    with open("full_page_screenshot.png", "wb") as f:
        # Handle both raw bytes and a base64-encoded string
        if isinstance(result, bytes):
            f.write(result)
        else:
            f.write(base64.b64decode(result))
    print("✅ Full-page screenshot saved as 'full_page_screenshot.png'")
except Exception as e:
    print(f"❌ Screenshot failed (a valid ZenRows API key is required). Error: {e}")

In [None]:
# Screenshot a specific element and save it to a file
try:
    result = scraper.invoke({
        "url": "https://www.scrapingcourse.com/ecommerce/",
        "screenshot_selector": "#product-list",
        "screenshot_format": "jpeg",
        "screenshot_quality": 85
    })

    with open("products_grid_screenshot.jpg", "wb") as f:
        # Handle both raw bytes and a base64-encoded string
        if isinstance(result, bytes):
            f.write(result)
        else:
            f.write(base64.b64decode(result))
    print("✅ Products grid screenshot saved as 'products_grid_screenshot.jpg'")
except Exception as e:
    print(f"❌ Screenshot failed (a valid ZenRows API key is required). Error: {e}")
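
Both screenshot cells repeat the same save logic. One way to factor it out is sketched below. `save_image` is a hypothetical helper, not part of `langchain-zenrows`, and it assumes, as the cells above do, that the result is either raw bytes or a base64-encoded string. The demo writes only the 8-byte PNG signature to a temporary directory, so it runs without an API key.

```python
import base64
import tempfile
from pathlib import Path
from typing import Union


def save_image(result: Union[bytes, str], path: str) -> int:
    """Write screenshot data to `path` and return the number of bytes written."""
    data = result if isinstance(result, bytes) else base64.b64decode(result)
    return Path(path).write_bytes(data)


# PNG file signature only, not a real image; stands in for a scraper result
fake_png = b"\x89PNG\r\n\x1a\n"

with tempfile.TemporaryDirectory() as tmp:
    target = str(Path(tmp) / "demo.png")
    written = save_image(base64.b64encode(fake_png).decode("ascii"), target)
    print(f"Wrote {written} bytes to {target}")
```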

## Premium Proxies and Geo-targeting

Access geo-restricted content using premium proxies from different countries.

In [None]:
# Check IP location with premium proxy from US
result_us = scraper.invoke({
    "url": "https://httpbin.io/ip",
    "premium_proxy": True,
    "proxy_country": "us"
})

print("Request from US IP:")
print(result_us)

In [None]:
# Compare with a different country (UK)
result_uk = scraper.invoke({
    "url": "https://httpbin.io/ip",
    "premium_proxy": True,
    "proxy_country": "gb"  # Great Britain
})

print("Request from UK IP:")
print(result_uk)
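
To confirm that the two requests really left from different addresses, you can compare the reported IPs. The sketch below assumes the response body is JSON with an `origin` field, possibly followed by a port. `origin_ip` is a hypothetical helper, and the sample bodies use reserved documentation addresses, not real results.

```python
import json


def origin_ip(raw: str) -> str:
    """Extract the IP address from an httpbin-style /ip response body."""
    origin = json.loads(raw)["origin"]
    # Drop a trailing ":port" from IPv4 values; leave IPv6 addresses untouched
    host, sep, port = origin.rpartition(":")
    if sep and port.isdigit() and "." in host:
        return host
    return origin


# Invented response bodies using documentation-only IP ranges
sample_us = '{"origin": "203.0.113.7:44321"}'
sample_uk = '{"origin": "198.51.100.23"}'

print("Different exit IPs:", origin_ip(sample_us) != origin_ip(sample_uk))
```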

## Custom JavaScript Execution

Pass a JSON array of instructions in `js_instructions` to interact with the page before its content is returned, for example to fill forms, click buttons, or wait.

In [None]:
# Execute custom JavaScript to interact with elements
result = scraper.invoke({
    "url": "https://www.scrapingcourse.com/login",
    "js_instructions": """[{"fill":["#email","admin@example.com"]},
                        {"fill":["#password","password"]},
                        {"click":"#submit-button"},
                        {"wait":500}]"""
})

print("Data extracted after JavaScript interactions:")
print(result[:300] + "...")
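
Hand-written JSON strings are easy to break with a missing quote or bracket. A less error-prone option, sketched below, is to describe the instructions as Python data and serialize them with `json.dumps`. The snippet also checks that the hand-written string from the cell above parses to the same structure.

```python
import json

# The same login steps as above, expressed as Python data
instructions = [
    {"fill": ["#email", "admin@example.com"]},
    {"fill": ["#password", "password"]},
    {"click": "#submit-button"},
    {"wait": 500},
]
js_instructions = json.dumps(instructions)

# The hand-written form used in the cell above
handwritten = """[{"fill":["#email","admin@example.com"]},
                        {"fill":["#password","password"]},
                        {"click":"#submit-button"},
                        {"wait":500}]"""

print("Same instructions:", json.loads(handwritten) == instructions)
```

The resulting `js_instructions` string can be passed to `scraper.invoke` exactly like the hand-written one.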

## Session Management

Pass the same `session_id` on related requests so that ZenRows routes them through the same IP address, which keeps multi-step processes consistent.

In [None]:
session_id = 12345  # Use same session for related requests

# Step 1: Login or initial page
result1 = scraper.invoke({
    "url": "https://www.scrapingcourse.com/login",
    "premium_proxy": True,
    "session_id": session_id,
    "js_instructions": """[{"fill":["#email","admin@example.com"]},
                        {"fill":["#password","password"]},
                        {"click":"#submit-button"},
                        {"wait":500}]"""
})

print("First request (login page):")
print(result1[:200] + "...")

In [None]:
# Step 2: Access protected content with same session
result2 = scraper.invoke({
    "url": "https://www.scrapingcourse.com/dashboard",
    "premium_proxy": True,
    "session_id": session_id
})

print("Second request (dashboard with same session):")
print(result2[:200] + "...")
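
Rather than hard-coding `12345`, you may want a fresh session ID for each workflow. The sketch below assumes that ZenRows accepts integer session IDs from 1 to 99999; confirm that range in the API reference before relying on it. `new_session_id` and `with_session` are hypothetical helpers, not part of the package.

```python
import random
from typing import Optional

SESSION_ID_MIN = 1
SESSION_ID_MAX = 99999  # assumed upper bound; confirm against the ZenRows docs


def new_session_id(rng: Optional[random.Random] = None) -> int:
    """Pick a random session ID in the assumed valid range."""
    if rng is None:
        rng = random.Random()
    return rng.randint(SESSION_ID_MIN, SESSION_ID_MAX)


def with_session(params: dict, session_id: int) -> dict:
    """Return a copy of `params` with the shared session settings applied."""
    return {**params, "premium_proxy": True, "session_id": session_id}


session_id = new_session_id()
step1 = with_session({"url": "https://www.scrapingcourse.com/login"}, session_id)
step2 = with_session({"url": "https://www.scrapingcourse.com/dashboard"}, session_id)
print(step1["session_id"] == step2["session_id"])
```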

## JSON API Capture

Capture JSON API calls made by web pages to access underlying data.

In [None]:
# Capture network requests and API calls
result = scraper.invoke({
    "url": "https://www.scrapingcourse.com/javascript-rendering",
    "json_response": True,
    "wait": 3000  # Wait for API calls to complete
})

print("Captured JSON API responses:")
print(result[:500] + "...")
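
Once the response is parsed, the captured calls can be listed individually. The sketch below assumes the JSON response contains an `html` field and an `xhr` list whose entries have `url` and `status_code` fields. That shape is an assumption to verify against the ZenRows API reference, and the sample data is invented.

```python
import json


def captured_calls(raw: str) -> list:
    """List the URL and status of each captured request (assumed shape)."""
    data = json.loads(raw)
    return [
        {"url": entry.get("url"), "status": entry.get("status_code")}
        for entry in data.get("xhr", [])
    ]


# Invented response, for illustration only
sample = json.dumps({
    "html": "<html></html>",
    "xhr": [
        {"url": "https://example.com/api/products", "status_code": 200, "body": "[]"},
    ],
})

for call in captured_calls(sample):
    print(call["status"], call["url"])
```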

## Custom Headers

Use custom headers to mimic specific browser behavior or bypass certain restrictions.

In [None]:
# Scrape with custom headers
result = scraper.invoke({
    "url": "https://httpbin.io/headers",
    "js_render": True,
    "custom_headers": {"Referer": "https://google.com"}
})

print("Response with custom headers:")
print(result)
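
To check that a custom header actually reached the target, you can look it up in the echoed response. The sketch below assumes the body is JSON with a `headers` object, where each value is either a string or a list of strings depending on the httpbin deployment. `echoed_header` is a hypothetical helper and the sample bodies are invented.

```python
import json


def echoed_header(raw: str, name: str):
    """Return the first echoed value of header `name`, or None if absent."""
    headers = json.loads(raw).get("headers", {})
    for key, value in headers.items():
        # Header names are case-insensitive
        if key.lower() == name.lower():
            if isinstance(value, list):
                return value[0] if value else None
            return value
    return None


# Invented response bodies covering both value styles
sample_list = json.dumps({"headers": {"Referer": ["https://google.com"]}})
sample_str = json.dumps({"headers": {"Referer": "https://google.com"}})

print(echoed_header(sample_list, "referer"))
```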

## Using with LangChain Agents

ZenRows is most useful when a LangChain agent uses it as a tool, deciding when to scrape a page and how to interpret the result.

**Note:** Make sure you have the required dependencies installed:
```bash
pip install langchain-openai langgraph
```

In [None]:
try:
    from langchain_openai import ChatOpenAI
    from langgraph.prebuilt import create_react_agent
    
    # Initialize components
    llm = ChatOpenAI(model="gpt-4o-mini")
    zenrows_tool = ZenRowsUniversalScraper()
    
    # Create agent
    agent = create_react_agent(llm, [zenrows_tool])
    
    print("✅ Agent created successfully!")
    
except ImportError as e:
    print(f"❌ Missing dependencies: {e}")
    print("Please install: pip install langchain-openai langgraph")
except Exception as e:
    print(f"❌ Error creating agent: {e}")

In [None]:
# Use the agent to scrape and analyze Hacker News
try:
    result = agent.invoke({
        "messages": "Scrape https://news.ycombinator.com/ and list the top 3 stories with title, points, comments, username, and time."
    })
    
    print("Agent Response:")
    for message in result["messages"]:
        print(message.content)
        print("-" * 50)
        
except NameError:
    print("⚠️  Agent not available - please run the previous cell successfully first")
except Exception as e:
    print(f"❌ Error running agent: {e}")

In [None]:
# Advanced agent example: News summarizer
try:
    result = agent.invoke({
        "messages": "Go to TechCrunch.com, scrape the homepage in markdown format, and provide a summary of the top 5 technology stories with their headlines and brief descriptions."
    })
    
    print("Tech News Summary:")
    for message in result["messages"]:
        print(message.content)
        print("-" * 50)
        
except NameError:
    print("⚠️  Agent not available - please run the agent creation cell first")
except Exception as e:
    print(f"❌ Error running agent: {e}")
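
The loops above print every message in the conversation, including the prompt and the tool output. If you only want the agent's answer, one option is to take the last message, as sketched below. This assumes the final entry in `result["messages"]` is the model's reply and that its `content` is a plain string. `FakeMessage` is a stand-in so the sketch runs without an LLM, and `final_answer` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class FakeMessage:
    """Stand-in for a LangChain message; only `.content` is used here."""
    content: str


def final_answer(result: dict) -> str:
    """Return the content of the last message, or '' if there are none."""
    messages = result.get("messages", [])
    return messages[-1].content if messages else ""


# Invented conversation, for illustration only
fake_result = {
    "messages": [
        FakeMessage("Scrape https://news.ycombinator.com/ and list the top 3 stories."),
        FakeMessage("<scraped page content>"),
        FakeMessage("Here are the top 3 stories: ..."),
    ]
}

print(final_answer(fake_result))
```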

## Error Handling and Best Practices

The tool raises a `ValueError` with a descriptive message for common failures such as an invalid API key, an exceeded rate limit, or an oversized response.

In [None]:
# Test error handling with invalid API key
try:
    invalid_scraper = ZenRowsUniversalScraper(zenrows_api_key="invalid-key")
    result = invalid_scraper.invoke({"url": "https://httpbin.io/html"})
except ValueError as e:
    if "Invalid ZenRows API key" in str(e):
        print("✅ Correctly caught invalid API key error")
        print(f"Error message: {e}")
    elif "Rate limit exceeded" in str(e):
        print("⚠️  Rate limit exceeded - please upgrade your ZenRows plan")
    elif "Response size too large" in str(e):
        print("⚠️  Response too large - use CSS selectors to reduce content")
    else:
        print(f"❌ Unexpected error: {e}")

In [None]:
# Test error handling with invalid URL
try:
    result = scraper.invoke({"url": "not-a-valid-url"})
except ValueError as e:
    print("✅ Correctly caught invalid URL error")
    print(f"Error message: {e}")
except Exception as e:
    print(f"Unexpected error type: {type(e).__name__}: {e}")
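
The message checks in the first cell can be collected into a lookup table so that every call site reports errors the same way. This is a minimal sketch: `explain_error` is a hypothetical helper, and the message fragments are copied from the cell above rather than taken from the package source.

```python
# Message fragments as matched in the cell above, mapped to suggested actions
ERROR_HINTS = {
    "Invalid ZenRows API key": "Check the ZENROWS_API_KEY value.",
    "Rate limit exceeded": "Slow down or upgrade your ZenRows plan.",
    "Response size too large": "Use CSS selectors to reduce the content.",
}


def explain_error(error: Exception) -> str:
    """Map a known error message to a hint, or echo the message if unknown."""
    message = str(error)
    for fragment, hint in ERROR_HINTS.items():
        if fragment in message:
            return hint
    return f"Unexpected error: {message}"


# Simulated error so the sketch runs without calling the API
print(explain_error(ValueError("Rate limit exceeded. Try again later.")))
```

In a real cell you would call `explain_error(e)` inside the `except ValueError as e:` branch.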

## Performance Tips and Advanced Configuration

Blocking resources the page does not need, such as images, fonts, and media, can speed up JavaScript-rendered requests.

In [None]:
# Block unnecessary resources to speed up scraping
result = scraper.invoke({
    "url": "https://www.scrapingcourse.com/ecommerce/",
    "js_render": True,
    "block_resources": "image,font,media",  # Block images, fonts, and media
    "wait_for": ".product-name",
    "response_type": "markdown"
})

print("Fast scraping with blocked resources:")
print(result[:400] + "...")
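
To see whether blocking resources helps for a given page, measure the request time with and without the option. The sketch below shows a small timing wrapper. `timed` is a hypothetical helper, and `fake_scrape` stands in for `scraper.invoke` so the example runs without network access.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Call `fn` and return its result with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def fake_scrape() -> str:
    """Stand-in for a scraper call; sleeps briefly and returns dummy HTML."""
    time.sleep(0.01)
    return "<html></html>"


result, seconds = timed(fake_scrape)
print(f"Fetched {len(result)} characters in {seconds:.3f}s")
```

With a real scraper, wrap each call in a lambda, for example `timed(lambda: scraper.invoke(params))`, and repeat each measurement a few times because network latency varies between runs.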

## Conclusion

This notebook has demonstrated the comprehensive features of the `langchain-zenrows` package:

### Core Features Covered:
1. **Basic web scraping** with multiple output formats (HTML, Markdown, Plaintext)
2. **JavaScript rendering** for modern SPAs and dynamic content
3. **CSS extraction** for targeted data retrieval
4. **Structured data extraction** (links, headings, tables, emails, etc.)
5. **Screenshots** - full page and element-specific
6. **Premium proxies** with geo-targeting (190+ countries)
7. **Custom JavaScript execution** for complex interactions
8. **Session management** for multi-step processes
9. **JSON API capture** for intercepting network requests
10. **Custom headers** for advanced request customization
11. **LangChain agent integration** for intelligent scraping workflows
12. **Error handling** and performance optimization

### Key Benefits:
- **55M+ residential IPs** for bypassing anti-bot systems
- **JavaScript rendering** with headless browsers
- **Multiple output formats** for different use cases
- **Structured data extraction** without writing custom parsers
- **Agent integration** for AI-powered scraping workflows

### Next Steps

- Explore the [official ZenRows API documentation](https://docs.zenrows.com/universal-scraper-api/api-reference#parameter-overview) for all available parameters
- Check out the [LangChain documentation](https://python.langchain.com/) for more agent patterns
- Build custom scraping workflows combining multiple features
- Integrate with your existing LangChain applications

### Resources

- [ZenRows Documentation](https://docs.zenrows.com/)
- [LangChain Documentation](https://python.langchain.com/)
- [Package Repository](https://github.com/ZenRows-Hub/langchain-zenrows)
- [ZenRows Universal Scraper](https://app.zenrows.com/register?prod=universal_scraper)