AeroCrawl

A lightweight, high-performance web scraping and crawling engine — a modern alternative to Firecrawl. Converts websites to LLM-ready markdown, extracts structured data with AI, and uses Chrome DevTools Protocol (CDP) for zero-dependency browser automation.


Features

Core Capabilities

  • Lightweight: No Playwright/Puppeteer — uses Chrome DevTools Protocol (CDP) directly
  • Smart Fetching: Auto-fallback to CDP browser rendering for SPAs and bot-protected sites
  • Clean Markdown: Mozilla Readability for semantic content extraction
  • AI Extraction: Structured data extraction via LLMs (OpenAI, Anthropic, Ollama, NVIDIA Nemotron via OpenRouter)
  • Site Mapping: Discover all URLs via sitemaps and link crawling
  • Async Crawling: SQLite-backed job queue for scalable crawling
  • Zero External Search: Automate search engines (Google, DuckDuckGo, Bing) via CDP — no third-party APIs
  • MCP Server: Integrate with AI tools like Cursor and KiloCode
  • Webhooks: Real-time crawl completion notifications
  • PDF Support: Automatic PDF text extraction to markdown
  • Agent Mode: AI-driven browser automation for complex tasks

Advanced CDP Features

  • Browser Actions: Click, type, scroll, press keys via CDP (no Playwright needed)
  • Screenshots: Full-page PNG/JPEG captures
  • Multi-Format Output: Markdown, HTML, or screenshots from any scrape
  • Batch Scraping: Async multi-URL scraping via /v1/batch/scrape

Quick Start

For Agents - Quick Start

As an AI agent, you can use AeroCrawl in two ways:

PATH A: Use Directly via CLI
If you need to scrape something immediately in the terminal:

# 1. Install
# Windows (PowerShell):
iwr https://raw.githubusercontent.com/deviprasadshetty-dev/aerocrawl/main/install.ps1 -OutFile install.ps1; .\install.ps1
# Mac/Linux:
curl -fsSL https://raw.githubusercontent.com/deviprasadshetty-dev/aerocrawl/main/install.sh -o install.sh && bash install.sh

# 2. Scrape
aerocrawl https://example.com

PATH B: Install as MCP Server for Yourself
To add AeroCrawl's tools natively to your own context window (like Cursor or KiloCode):

# 1. Install (same as above)
# 2. Generate configuration
aerocrawl --generate-mcp-config

(Your AI editor will launch the server automatically; there is no need to keep it running in the terminal.)

For Humans - Quick Install

Option 1: Use install scripts (Recommended)

# Windows (PowerShell)
iwr https://raw.githubusercontent.com/deviprasadshetty-dev/aerocrawl/main/install.ps1 -OutFile install.ps1; .\install.ps1

# Mac/Linux
curl -fsSL https://raw.githubusercontent.com/deviprasadshetty-dev/aerocrawl/main/install.sh -o install.sh && bash install.sh

Option 2: Manual install

# Windows (PowerShell)
git clone https://github.com/deviprasadshetty-dev/aerocrawl.git; cd aerocrawl; npm install; npm run build; npm link
# Mac/Linux
git clone https://github.com/deviprasadshetty-dev/aerocrawl.git && cd aerocrawl && npm install && npm run build && npm link

Verify:

aerocrawl --help

CLI (Zero-Server Mode)

The easiest way to use AeroCrawl — no server setup required:

# Scrape a single URL to markdown
aerocrawl https://example.com

# Extract structured data with AI (uses OpenRouter free models by default)
aerocrawl https://example.com -m extract

# Map all URLs on a site
aerocrawl https://example.com -m map

# Crawl an entire website
aerocrawl https://example.com -m crawl

# AI Agent Mode: automate tasks with natural language
aerocrawl https://example.com --goal "Extract all product prices and features"

# Search the web (zero external API)
aerocrawl "best web scraping tools" -m search

# Capture screenshots
aerocrawl https://example.com --screenshot

# Batch scrape from file
aerocrawl --batch urls.txt --output results.json
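
For --batch, the input file is assumed to be a plain list of target URLs, one per line (hypothetical example; the exact accepted format is not documented here):

# urls.txt (assumed format: one URL per line)
https://example.com/docs
https://example.com/blog
https://example.com/pricing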

MCP Server (AI Tool Integration)

Integrate with Cursor, KiloCode, and other MCP-compatible tools. The AI assistant will run the server automatically when needed—you do not need to run it manually in your terminal.

Auto-Generate Configuration:

aerocrawl --generate-mcp-config

Manual MCP Configuration JSON:

Add this to your MCP settings file:

{
  "mcpServers": {
    "aerocrawl": {
      "command": "aerocrawl",
      "args": ["-m", "mcp"]
    }
  }
}

Cursor: Add to ~/.cursor/mcp.json or .cursor/mcp.json in your project

KiloCode: Add to KiloCode's MCP settings file

Available MCP Tools:

  • scrape / extract / crawl / search / agent

Usage in Cursor:

@aerocrawl Can you scrape https://example.com and extract the main heading?

API Server (Optional)

Start the HTTP API server for programmatic access:

# Start server (defaults to port 3000)
aerocrawl -m serve

# Or via Node.js
node dist/cli.js serve

Installation

AeroCrawl is not published to the npm registry. Install directly from GitHub:

Prerequisites

  • Node.js ≥ 18
  • Git
  • Chrome or Edge (CDP uses your system browser)

Step-by-Step

  1. Clone the repo:
    git clone https://github.com/deviprasadshetty-dev/aerocrawl.git
  2. Install dependencies:
    cd aerocrawl && npm install
  3. Build:
    npm run build
  4. Link globally (optional):
    npm link

Local Project Use

npm install github:deviprasadshetty-dev/aerocrawl

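After a project-local install, you can also run the CLI through npx (assuming the package exposes the aerocrawl bin, as the npm link step above suggests):

# Run the locally installed CLI without a global link
npx aerocrawl https://example.com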

Configuration

Create a .env file in your working directory:

# LLM Provider (openai, anthropic, ollama, openrouter)
LLM_PROVIDER=openrouter

# OpenRouter (recommended - free models available)
OPENROUTER_API_KEY=sk-or-v1-...
OPENROUTER_MODEL=nvidia/nemotron-3-super-120b-a12b:free

# OpenAI
# OPENAI_API_KEY=sk-...
# OPENAI_MODEL=gpt-4o-mini

# Anthropic
# ANTHROPIC_API_KEY=sk-ant-...
# ANTHROPIC_MODEL=claude-3-5-haiku-20241022

# Ollama (local)
# OLLAMA_BASE_URL=http://localhost:11434
# OLLAMA_MODEL=llama3.2

# Server
PORT=3000

API Endpoints

Base URL: http://localhost:3000

POST /v1/scrape

Scrape a single URL with optional CDP actions and multi-format output:

curl -X POST http://localhost:3000/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "formats": ["markdown", "html", "screenshot"]}'

POST /v1/agent

AI-driven browser automation:

curl -X POST http://localhost:3000/v1/agent \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/pricing", "goal": "Extract all product prices"}'

POST /v1/batch/scrape

Batch scrape multiple URLs asynchronously:

curl -X POST http://localhost:3000/v1/batch/scrape \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.com/1", "https://example.com/2"]}'

Comparison with Firecrawl

Feature                               Firecrawl   AeroCrawl
Lightweight (No Playwright)           No          Yes
CDP Browser Automation                No          Yes
LLM-Ready Markdown                    Yes         Yes
AI Structured Extraction              Yes         Yes
Site Mapping & Crawling               Yes         Yes
Browser Actions (Click, Type, etc.)   Yes         Yes
Screenshots & Multi-Format Output     Yes         Yes
Batch Scraping                        Yes         Yes
AI Agent Mode                         Yes         Yes
Zero-External-API Search              No          Yes
MCP Server Integration                Yes         Yes
Webhooks                              Yes         Yes
PDF Support                           No          Yes
Free AI Models (OpenRouter)           No          Yes

Key Advantages

  • 70% lighter than Firecrawl (no 300MB+ Playwright dependency)
  • Local-first: SQLite instead of Redis/Postgres
  • Zero-config: Works with your system Chrome/Edge out of the box
  • Free AI: OpenRouter integration with free tier models

Architecture

AeroCrawl uses a modular three-layer architecture:

  1. Engine Layer: SmartFetcher with CDP browser fallback
  2. Transformation Layer: MarkdownPipeline (JSDOM + Readability + Turndown)
  3. Routing Layer: Express API + SQLite job queue

See ARCHITECTURE.md for full technical details.
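
For intuition, here is a minimal sketch of the transformation-layer idea using the libraries named above (JSDOM, Readability, Turndown). It is an illustration only, not AeroCrawl's actual MarkdownPipeline code:

// Illustrative only: HTML -> readable article -> markdown, using jsdom,
// @mozilla/readability, and turndown. Not AeroCrawl's actual implementation.
import { JSDOM } from 'jsdom';
import { Readability } from '@mozilla/readability';
import TurndownService from 'turndown';

export function htmlToMarkdown(html: string, url: string): string {
  const dom = new JSDOM(html, { url });                          // parse raw HTML into a DOM
  const article = new Readability(dom.window.document).parse();  // extract the main readable content
  const turndown = new TurndownService({ headingStyle: 'atx' });
  return turndown.turndown(article?.content ?? html);            // convert cleaned HTML to markdown
}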

Supported LLM Providers

  • OpenRouter (recommended - free models available):
    • nvidia/nemotron-3-super-120b-a12b:free
    • meta-llama/llama-3-8b-instruct:free
    • google/gemma-7b-it:free
  • OpenAI: GPT-4o, GPT-4o-mini, etc.
  • Anthropic: Claude 3.5 Haiku, Sonnet, Opus, etc.
  • Ollama: Local LLMs like Llama 3.2, Mistral, etc.

Development

# Install dependencies
npm install

# Run in development mode (hot reload)
npm run dev

# Build production bundle
npm run build

# Type check
npx tsc --noEmit

License

ISC
