🕷️ ClawPod — General Purpose Web Scraper Agent

An AI-powered web scraping agent built with Pydantic AI and the Massive Unblocker API. Point it at any URL, give it instructions, and get back structured data — no manual parsing required.

DEMO

smart-web-scraper-demo.mov

What it does

ClawPod takes a URL and optional instructions, fetches the page through the Massive Unblocker, strips all HTML noise, and passes clean text to an LLM. The agent returns a summary, key facts, and optionally a fully custom structured output you define yourself via a JSON schema.
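The "strip HTML noise" step can be sketched as follows. This is a minimal, dependency-free illustration that uses Python's standard-library `html.parser` as a stand-in for the BeautifulSoup4 cleaning listed under Stack. The names `TextExtractor` and `html_to_text` and the set of skipped tags are assumptions made for this example, not the code in `scraping_agent.py`.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and ignores content inside non-readable tags."""

    # Assumed list of "noise" tags; the real agent may skip a different set.
    SKIPPED_TAGS = {"script", "style", "noscript", "template"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return "\n".join(self._chunks)


def html_to_text(html: str) -> str:
    """Return the readable text of an HTML document, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The resulting plain text is what would be handed to the LLM in place of the raw markup.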

Powered by Massive Unblocker

ClawPod fetches pages through the Massive Unblocker Browser API, which is what lets the agent work on sites that block ordinary scrapers.

Most scrapers fail on sites that use:

Cloudflare and other WAF/bot protection layers
Anti-bot fingerprinting (TLS, headers, browser behavior checks)
JavaScript rendering (Next.js, React, Vue SPAs)

Massive Unblocker handles all of that automatically. It runs a real browser in the background, bypasses bot detection, solves challenges, and returns the fully rendered HTML — so the agent sees exactly what a human visitor would see, without any manual configuration.

Features

Scrape any URL with a single click
AI-generated summary and key facts in English
Additional instructions — guide the agent on what to focus on
Custom output schema — define your own JSON structure and the agent fills it in, returning a list of objects
Configurable character limits — tune how much content is sent to the AI vs. shown in the raw snippet viewer
Raw snippet viewer — inspect the actual HTML returned by the scraper
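The two character limits mentioned above could be applied roughly like this. It is a sketch only: the function name `apply_char_limits` and the default values are hypothetical and are not taken from `app.py`.

```python
def apply_char_limits(
    clean_text: str,
    raw_html: str,
    ai_limit: int = 8000,       # hypothetical default: characters sent to the AI
    snippet_limit: int = 2000,  # hypothetical default: characters shown in the viewer
) -> tuple[str, str]:
    """Truncate the cleaned text and the raw HTML to their respective limits."""
    if ai_limit < 0 or snippet_limit < 0:
        raise ValueError("Character limits must be non-negative")
    return clean_text[:ai_limit], raw_html[:snippet_limit]
```

Keeping the limits separate lets you send a large amount of cleaned text to the model while showing only a short raw snippet in the UI, or the reverse.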

Setup

# Install dependencies
uv sync

# Copy the example env file, then fill in your credentials
cp .env.example .env

.env:

MASSIVE_UNBLOCKER_TOKEN=your_token_here
OPENAI_API_KEY=your_openai_key_here

# Run the app
uv run streamlit run app.py
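Before launching, it can help to confirm that both credentials are set. The check below is a small sketch that assumes the values from `.env` have already been loaded into the process environment (how `app.py` loads them is not shown here). The helper name `missing_env_vars` is hypothetical.

```python
import os

# The two variables named in .env above.
REQUIRED_VARS = ("MASSIVE_UNBLOCKER_TOKEN", "OPENAI_API_KEY")


def missing_env_vars(env=None) -> list[str]:
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```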

Custom Output Schema

Enable the toggle in the UI and provide a JSON object where keys are field names and values describe what to extract:

{
  "title": "job title",
  "city": "city of the salary listing",
  "salary": "monthly salary amount"
}

The agent will return a list of objects matching that schema — one item if only one match is found, multiple if there are many.
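One way to check a schema like the one above, and to align the returned items with it, is sketched here. The helpers `parse_schema` and `normalize_items` are hypothetical names for illustration, and filling missing fields with `None` is an assumption about how gaps might be handled, not documented ClawPod behavior.

```python
import json


def parse_schema(schema_json: str) -> dict[str, str]:
    """Parse the user's schema: a JSON object mapping field names to descriptions."""
    schema = json.loads(schema_json)
    if not isinstance(schema, dict) or not schema:
        raise ValueError("Schema must be a non-empty JSON object")
    for key, description in schema.items():
        if not isinstance(description, str):
            raise ValueError(f"Description for {key!r} must be a string")
    return schema


def normalize_items(items: list[dict], schema: dict[str, str]) -> list[dict]:
    """Keep only the schema's fields in each item; missing fields become None."""
    return [{field: item.get(field) for field in schema} for item in items]
```

Normalizing this way gives every object in the list the same keys, in the order the schema declares them, which makes the output easier to display as a table.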

Stack

Pydantic AI — agent framework
Massive Unblocker — browser-grade scraping API
Streamlit — UI
BeautifulSoup4 — HTML cleaning
