EVVM Documentation Scraper

░▒▓████████▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓██████████████▓▒░
░▒▓█▓▒░      ░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░
░▒▓█▓▒░       ░▒▓█▓▒▒▓█▓▒░ ░▒▓█▓▒▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░
░▒▓██████▓▒░  ░▒▓█▓▒▒▓█▓▒░ ░▒▓█▓▒▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░
░▒▓█▓▒░        ░▒▓█▓▓█▓▒░   ░▒▓█▓▓█▓▒░ ░▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░
░▒▓█▓▒░        ░▒▓█▓▓█▓▒░   ░▒▓█▓▓█▓▒░ ░▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░
░▒▓████████▓▒░  ░▒▓██▓▒░     ░▒▓██▓▒░  ░▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░

An automated documentation scraper that converts the EVVM docs into the LLM-friendly llms.txt format.

🚀 Quick Start

# Clone repository
git clone git@github.com:0xOucan/evvmdocscrapper.git
cd evvmdocscrapper

# Install dependencies
npm install

# Run scraper (smart mode - checks for changes first)
npm run scrape

# Or use interactive menu
./scrape.sh

✨ Features

🎨 Beautiful terminal interface with ASCII art banners
⚡ Smart change detection - Only scrapes when the website changes (about 95% less time when docs are unchanged: ~5-10s instead of ~2min)
🔗 Auto-includes EIP-191 - Ethereum Signed Data Standard strategically placed for LLM context
📝 Clean extraction - Removes navigation, breadcrumbs, and metadata noise
🎯 Logical ordering - Documentation ordered by menu structure, not alphabetically
📊 100% llms.txt compliant - Follows llmstxt.org specification exactly

📦 Output Files

The scraper generates two files in ./dist/:

`llms.txt` (1.1KB)

Minimal index file with:

Key documentation links (6 core sections)
EIP-191 reference
Link to full documentation

`llms-full.txt` (753KB, 19,114 lines, 150 pages)

Complete documentation with:

149 EVVM documentation pages
EIP-191 standard (the 150th page, placed right after the transaction docs)
All content in clean Markdown format
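
The placement of EIP-191 after the transaction docs can be pictured as a simple list insertion. The sketch below is only an illustration: the `Section` type and the `insertAfterSection` helper are hypothetical names, not taken from src/add-eip191.ts, and the section titles are the ones shown in the llms-full.txt structure later in this README.

```typescript
// Hypothetical shape of one section in llms-full.txt (assumption, not the project's type).
interface Section {
  title: string;
  body: string;
}

// Return a new list with `extra` placed directly after the section titled `anchorTitle`.
// If the anchor is missing, fall back to appending at the end so nothing is lost.
function insertAfterSection(sections: Section[], anchorTitle: string, extra: Section): Section[] {
  const index = sections.findIndex((s) => s.title === anchorTitle);
  if (index === -1) return [...sections, extra];
  return [...sections.slice(0, index + 1), extra, ...sections.slice(index + 1)];
}

const docs: Section[] = [
  { title: "Introduction", body: "..." },
  { title: "QuickStart", body: "..." },
  { title: "Process of a Transaction in EVVM", body: "..." },
  { title: "EVVM Core Contract", body: "..." },
];
const eip191: Section = { title: "ERC-191: Signed Data Standard", body: "..." };

const withEip = insertAfterSection(docs, "Process of a Transaction in EVVM", eip191);
console.log(withEip.map((s) => s.title));
```

Falling back to an append when the anchor title is not found is a design choice of this sketch; the real script may handle that case differently.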

🎯 Usage

Interactive Menu (Recommended)

./scrape.sh

Options:

1. Smart scrape - Checks for changes first (~5-10s if no changes, ~2min if changed)
2. Force scrape - Always scrapes regardless of changes
3. Re-add EIP-191 - Update EIP-191 content only
4. Exit

Direct Commands

Smart scrape (recommended for automation):

npm run scrape

Force scrape (bypass change detection):

npm run scrape -- --force
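
The `--` separator tells npm to pass `--force` through to the script itself. How the script reads the flag is not shown in this README; a minimal sketch, assuming a plain argument check and a hypothetical `parseScrapeMode` helper, could look like this:

```typescript
// Hypothetical helper: decide the scrape mode from the script's own arguments.
// In a real entry point the input would typically be process.argv.slice(2).
type ScrapeMode = "smart" | "force";

function parseScrapeMode(args: string[]): ScrapeMode {
  return args.includes("--force") ? "force" : "smart";
}

console.log(parseScrapeMode([]));          // "smart"
console.log(parseScrapeMode(["--force"])); // "force"
```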

Update EIP-191 only:

npm run add-eip191

🔍 Change Detection

The scraper automatically detects website changes before scraping:

How it works:

1. Reads metadata from the previous scrape (timestamp, page count, URL hash)
2. Quickly crawls the site to get the current page URLs (~5-10 seconds)
3. Compares the current state with the previous metadata
4. Only performs a full scrape if changes are detected
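
The comparison step can be sketched as follows. This is a minimal illustration, not the code in src/change-detector.ts: the `ScrapeMetadata` shape, the helper names, and the choice of SHA-256 over a sorted URL list are all assumptions.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of the metadata saved by a previous scrape (assumption).
interface ScrapeMetadata {
  timestamp: string;
  pageCount: number;
  pagesHash: string;
}

// Hash the de-duplicated, sorted URL list so the result does not depend on crawl order.
function hashPageUrls(urls: string[]): string {
  const canonical = [...new Set(urls)].sort().join("\n");
  return createHash("sha256").update(canonical).digest("hex");
}

// True when the current URL set differs from the one recorded previously.
function hasSiteChanged(previous: ScrapeMetadata | null, currentUrls: string[]): boolean {
  if (previous === null) return true; // no earlier scrape to compare against
  const uniqueCount = new Set(currentUrls).size;
  return previous.pageCount !== uniqueCount || previous.pagesHash !== hashPageUrls(currentUrls);
}

const urls = ["https://www.evvm.info/docs/intro", "https://www.evvm.info/docs/QuickStart"];
const previous: ScrapeMetadata = {
  timestamp: "2025-11-10T17:32:53.335Z",
  pageCount: urls.length,
  pagesHash: hashPageUrls(urls),
};

console.log(hasSiteChanged(previous, urls));                                            // false
console.log(hasSiteChanged(previous, [...urls, "https://www.evvm.info/docs/new-page"])); // true
```

Under these assumptions the check only sees the set of URLs, so it notices new, removed, or moved pages but would not notice an edit inside a page whose URL stays the same.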

Benefits:

About 95% time savings when docs are unchanged (~5-10s instead of ~2min)
Bandwidth efficient - No unnecessary full scrapes
Perfect for CI/CD - Run frequently without waste
Always accurate - Detects new/removed/moved pages

Example output when no changes:

🔍 Checking for website changes...
📊 Previous scrape: 2025-11-10T17:32:53.335Z
📄 Previous page count: 149
✅ No changes detected!
💡 Use npm run scrape -- --force to scrape anyway

See CHANGE_DETECTION.md for full documentation.

🛠️ How It Works

1. Crawls - Uses Crawlee to crawl all EVVM docs pages
2. Extracts - Parses HTML with Cheerio and removes noise
3. Converts - HTML to Markdown using Turndown
4. Orders - Pages sorted by documentation menu structure
5. Includes EIP-191 - Automatically scrapes and inserts EIP-191 standard
6. Generates - Creates llms.txt compliant output files
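
The ordering step is the one that differs most from a naive scraper, so here is a small sketch of sorting by menu position instead of alphabetically. The `Page` type and `orderByMenu` helper are hypothetical, and it assumes the sidebar menu has already been read into a list of URLs.

```typescript
// Hypothetical page record (assumption, not the project's type).
interface Page {
  url: string;
  title: string;
}

// Sort pages by their position in the sidebar menu.
// Pages that do not appear in the menu go last, ordered by URL so the output is stable.
function orderByMenu(pages: Page[], menuUrls: string[]): Page[] {
  const rank = new Map<string, number>();
  menuUrls.forEach((url, i) => {
    if (!rank.has(url)) rank.set(url, i); // keep the first occurrence of a repeated link
  });
  return [...pages].sort((a, b) => {
    const ra = rank.get(a.url) ?? Number.MAX_SAFE_INTEGER;
    const rb = rank.get(b.url) ?? Number.MAX_SAFE_INTEGER;
    return ra !== rb ? ra - rb : a.url.localeCompare(b.url);
  });
}

const menu = ["https://www.evvm.info/docs/intro", "https://www.evvm.info/docs/QuickStart"];
const crawled: Page[] = [
  { url: "https://www.evvm.info/docs/QuickStart", title: "QuickStart" },
  { url: "https://www.evvm.info/docs/intro", title: "Introduction" },
];

console.log(orderByMenu(crawled, menu).map((p) => p.title)); // [ 'Introduction', 'QuickStart' ]
```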

📁 Project Structure

evvmdocscrapper/
├── src/
│   ├── build-llms-full.ts      # Main scraper
│   ├── add-eip191.ts            # EIP-191 scraper
│   └── change-detector.ts       # Change detection module
├── dist/
│   ├── llms.txt                 # Index file (generated)
│   └── llms-full.txt            # Full docs (generated)
├── scrape.sh                    # Interactive menu script
├── package.json
├── tsconfig.json
├── README.md
├── CHANGE_DETECTION.md          # Change detection docs
├── CHANGELOG.md                 # Version history
└── COMPLIANCE_CHECK.md          # llms.txt compliance verification

🔧 Configuration

Edit constants in src/build-llms-full.ts:

const DOMAIN = 'https://www.evvm.info';
const START_URL = `${DOMAIN}/docs/intro`;
const DOCS_PREFIX = `${DOMAIN}/docs/`;
const EIP_191_URL = 'https://eips.ethereum.org/EIPS/eip-191';
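
As an illustration of how `DOCS_PREFIX` can restrict the crawl to documentation pages, the sketch below filters candidate links by prefix. The `isDocsUrl` helper is hypothetical and is not taken from src/build-llms-full.ts; it also strips fragments and query strings, which is an assumption about what counts as the same page.

```typescript
const DOMAIN = "https://www.evvm.info";
const DOCS_PREFIX = `${DOMAIN}/docs/`;

// Hypothetical helper: true when a link (absolute or relative) points inside the docs tree.
function isDocsUrl(rawUrl: string): boolean {
  try {
    const url = new URL(rawUrl, DOMAIN); // resolves relative links against the site root
    url.hash = "";
    url.search = "";
    return url.href.startsWith(DOCS_PREFIX);
  } catch {
    return false; // unparseable links are skipped
  }
}

console.log(isDocsUrl("/docs/intro#overview"));                   // true
console.log(isDocsUrl("https://eips.ethereum.org/EIPS/eip-191")); // false
```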

📝 Output Format

llms.txt Structure

# EVVM
> Brief description

## Docs
- [Key pages with descriptions]

## Reference
- [EIP-191 link]

## Context files
- [Link to llms-full.txt]
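
A minimal sketch of rendering this structure from data is shown below. The `IndexLink` type and both helpers are hypothetical; the link format `- [name](url): notes` follows the file-list convention described at llmstxt.org.

```typescript
// Hypothetical link record for the index file (assumption).
interface IndexLink {
  title: string;
  url: string;
  description?: string;
}

// Render one list item; the ": notes" suffix is optional.
function renderLink(link: IndexLink): string {
  const suffix = link.description ? `: ${link.description}` : "";
  return `- [${link.title}](${link.url})${suffix}`;
}

// Build the whole index: H1 title, blockquote summary, then one H2 section per link group.
function buildLlmsTxt(summary: string, docs: IndexLink[], reference: IndexLink[], context: IndexLink[]): string {
  const section = (heading: string, links: IndexLink[]): string[] => [
    `## ${heading}`,
    "",
    ...links.map(renderLink),
    "",
  ];
  return [
    "# EVVM",
    "",
    `> ${summary}`,
    "",
    ...section("Docs", docs),
    ...section("Reference", reference),
    ...section("Context files", context),
  ].join("\n");
}

console.log(
  buildLlmsTxt(
    "Brief description",
    [{ title: "Introduction", url: "https://www.evvm.info/docs/intro", description: "Overview" }],
    [{ title: "EIP-191", url: "https://eips.ethereum.org/EIPS/eip-191" }],
    [{ title: "Full documentation", url: "./llms-full.txt" }],
  ),
);
```

The relative `./llms-full.txt` link in the usage example is a placeholder; the README does not say where the full file is hosted.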

llms-full.txt Structure

<!-- Scraper Metadata: {timestamp, pageCount, pagesHash, eip191Included, version} -->
# EVVM Documentation
## Introduction
[content...]
## QuickStart
[content...]
## Process of a Transaction in EVVM
[content...]
## ERC-191: Signed Data Standard  ← Strategically placed here
[content...]
## EVVM Core Contract
[... 130 more sections ...]
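
The metadata comment on the first line is what the change detector reads back on the next run. The sketch below shows one way to parse it, assuming the payload between the braces is single-line JSON with the five fields listed above; the exact serialization is not documented here, so treat the `ScraperMetadata` type and `readScraperMetadata` helper as hypothetical.

```typescript
// Assumed field types for the metadata header (the README only lists the field names).
interface ScraperMetadata {
  timestamp: string;
  pageCount: number;
  pagesHash: string;
  eip191Included: boolean;
  version: string;
}

// Read the metadata comment from the start of llms-full.txt; returns null if absent or malformed.
function readScraperMetadata(fileText: string): ScraperMetadata | null {
  const match = /^<!-- Scraper Metadata: (\{.*\}) -->/.exec(fileText);
  const payload = match?.[1];
  if (payload === undefined) return null;
  try {
    return JSON.parse(payload) as ScraperMetadata;
  } catch {
    return null; // payload was not valid JSON
  }
}

const sampleFile =
  '<!-- Scraper Metadata: {"timestamp":"2025-11-10T17:32:53.335Z","pageCount":149,' +
  '"pagesHash":"abc123","eip191Included":true,"version":"1.0.0"} -->\n# EVVM Documentation\n';

console.log(readScraperMetadata(sampleFile)?.pageCount); // 149
```

The hash and version values in the sample are made up for the example.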

🤖 Automation Examples

GitHub Actions

name: Update Docs
on:
  schedule:
    - cron: '0 */6 * * *'  # Every 6 hours
jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm install
      - run: npm run scrape  # Only scrapes if changed

Cron Job

# Run daily at midnight - only scrapes if docs changed
0 0 * * * cd /path/to/evvmdocscrapper && npm run scrape

✅ Compliance

✅ 100% llmstxt.org specification compliant
✅ Respects rate limits (maxConcurrency: 5)
✅ Includes source attribution (permalinks for each section)
✅ Checks robots.txt (handled by Crawlee)
✅ Only crawls /docs/* pages

📚 Documentation

CHANGE_DETECTION.md - Full change detection documentation
CHANGELOG.md - Version history and changes
COMPLIANCE_CHECK.md - llms.txt specification compliance
TERMINAL_INTERFACE.md - Terminal UI documentation

🤝 Contributing

Created by @0xOucan with assistance from Claude (Anthropic).

Contributions welcome! Feel free to:

Report bugs
Suggest features
Submit pull requests

📄 License

This project is open source. Always verify the target site's terms of service before scraping.

🔗 Links

EVVM Documentation: https://www.evvm.info/docs/
llms.txt Specification: https://llmstxt.org/
EIP-191 Standard: https://eips.ethereum.org/EIPS/eip-191

Made with ❤️ for the EVVM ecosystem
