░▒▓████████▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓██████████████▓▒░
░▒▓█▓▒░ ░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░
░▒▓█▓▒░ ░▒▓█▓▒▒▓█▓▒░ ░▒▓█▓▒▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░
░▒▓██████▓▒░ ░▒▓█▓▒▒▓█▓▒░ ░▒▓█▓▒▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░
░▒▓█▓▒░ ░▒▓█▓▓█▓▒░ ░▒▓█▓▓█▓▒░ ░▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░
░▒▓█▓▒░ ░▒▓█▓▓█▓▒░ ░▒▓█▓▓█▓▒░ ░▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░
░▒▓████████▓▒░ ░▒▓██▓▒░ ░▒▓██▓▒░ ░▒▓█▓▒░░▒▓█▓▒░░▒▓█▓▒░
Automated documentation scraper that converts EVVM docs into LLM-friendly llms.txt format
# Clone repository
git clone git@github.com:0xOucan/evvmdocscrapper.git
cd evvmdocscrapper
# Install dependencies
npm install
# Run scraper (smart mode - checks for changes first)
npm run scrape
# Or use interactive menu
./scrape.sh
- 🎨 Beautiful terminal interface with ASCII art banners
- ⚡ Smart change detection - Only scrapes when website changes (95% faster for unchanged docs)
- 🔗 Auto-includes EIP-191 - Ethereum Signed Data Standard strategically placed for LLM context
- 📝 Clean extraction - Removes navigation, breadcrumbs, and metadata noise
- 🎯 Logical ordering - Documentation ordered by menu structure, not alphabetically
- 📊 100% llms.txt compliant - Follows llmstxt.org specification exactly
The scraper generates two files in ./dist/:
Minimal index file with:
- Key documentation links (6 core sections)
- EIP-191 reference
- Link to full documentation
Complete documentation with:
- 149 EVVM documentation pages
- EIP-191 standard (strategically placed after transaction docs)
- All content in clean Markdown format
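As a rough illustration of the minimal index file, here is a sketch of rendering an llms.txt-style index from a list of sections. The `LinkEntry` and `IndexSection` types and the `renderLlmsIndex` function are hypothetical names for this example and are not taken from `src/build-llms-full.ts`; only the overall layout (H1 title, blockquote summary, H2 sections with link lists) follows the llmstxt.org format.

```typescript
// Hypothetical types for this sketch; the real scraper may model this differently.
interface LinkEntry {
  title: string;
  url: string;
  description?: string;
}

interface IndexSection {
  heading: string;
  links: LinkEntry[];
}

// Render an llms.txt-style index: H1 title, blockquote summary,
// then one H2 per section with a Markdown link list.
function renderLlmsIndex(
  title: string,
  summary: string,
  sections: IndexSection[],
): string {
  const lines: string[] = [`# ${title}`, "", `> ${summary}`, ""];
  for (const section of sections) {
    lines.push(`## ${section.heading}`, "");
    for (const link of section.links) {
      const suffix = link.description ? `: ${link.description}` : "";
      lines.push(`- [${link.title}](${link.url})${suffix}`);
    }
    lines.push("");
  }
  // Collapse trailing blank lines into a single final newline.
  return lines.join("\n").trimEnd() + "\n";
}
```

Calling it with sections named `Docs`, `Reference`, and `Context files` would produce a file with the same overall shape as the generated index.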
./scrape.sh
Options:
- Smart scrape - Checks for changes first (~5-10s if no changes, ~2min if changed)
- Force scrape - Always scrapes regardless of changes
- Re-add EIP-191 - Update EIP-191 content only
- Exit
Smart scrape (recommended for automation):
npm run scrape
Force scrape (bypass change detection):
npm run scrape -- --force
Update EIP-191 only:
npm run add-eip191
The scraper automatically detects website changes before scraping:
How it works:
- Reads metadata from previous scrape (timestamp, page count, URL hash)
- Quickly crawls site to get current page URLs (~5-10 seconds)
- Compares current state with previous metadata
- Only performs full scrape if changes detected
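The comparison step above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/change-detector.ts`: the `ScrapeMetadata` shape reuses the `timestamp`, `pageCount`, and `pagesHash` field names from the scraper metadata, while `hashUrls`, `hasChanged`, and the choice of SHA-256 are assumptions made for this example.

```typescript
import { createHash } from "node:crypto";

// Assumed shape of the metadata saved by the previous scrape.
interface ScrapeMetadata {
  timestamp: string;
  pageCount: number;
  pagesHash: string;
}

// Hash the de-duplicated, sorted URL list so crawl order does not matter.
// SHA-256 is an assumption; any stable digest would work here.
function hashUrls(urls: string[]): string {
  const normalized = Array.from(new Set(urls)).sort().join("\n");
  return createHash("sha256").update(normalized).digest("hex");
}

// Returns true when a full scrape is needed.
function hasChanged(
  previous: ScrapeMetadata | null,
  currentUrls: string[],
): boolean {
  // No previous metadata: treat as a first run and scrape.
  if (previous === null) return true;
  // A different page count means pages were added or removed.
  if (new Set(currentUrls).size !== previous.pageCount) return true;
  // Same count but a different hash means at least one page moved.
  return hashUrls(currentUrls) !== previous.pagesHash;
}
```

Note that a URL-based check like this only notices added, removed, or moved pages. Edits to the content of an existing page would need a separate signal.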
Benefits:
- 95% time savings when docs unchanged
- Bandwidth efficient - No unnecessary full scrapes
- Perfect for CI/CD - Run frequently without waste
- Always accurate - Detects new/removed/moved pages
Example output when no changes:
🔍 Checking for website changes...
📊 Previous scrape: 2025-11-10T17:32:53.335Z
📄 Previous page count: 149
✅ No changes detected!
💡 Use npm run scrape -- --force to scrape anyway
See CHANGE_DETECTION.md for full documentation.
- Crawls - Uses Crawlee to crawl all EVVM docs pages
- Extracts - Parses HTML with Cheerio and removes noise
- Converts - HTML to Markdown using Turndown
- Orders - Pages sorted by documentation menu structure
- Includes EIP-191 - Automatically scrapes and inserts EIP-191 standard
- Generates - Creates llms.txt compliant output files
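The ordering step could look something like the sketch below, which sorts scraped pages by their position in the sidebar menu instead of alphabetically. The `Page` type, the `orderByMenu` function, and the idea of passing the menu as a plain list of URLs are assumptions for illustration; the actual scraper may derive the order differently.

```typescript
// Hypothetical page record for this sketch.
interface Page {
  url: string;
  title: string;
  markdown: string;
}

// Sort pages by where their URL appears in the menu.
// Pages missing from the menu go last, ordered by URL for stable output.
function orderByMenu(pages: Page[], menuUrls: string[]): Page[] {
  const rank = new Map<string, number>();
  menuUrls.forEach((url, index) => {
    // Keep the first position if a URL appears twice in the menu.
    if (!rank.has(url)) rank.set(url, index);
  });

  return [...pages].sort((a, b) => {
    const rankA = rank.get(a.url) ?? Number.MAX_SAFE_INTEGER;
    const rankB = rank.get(b.url) ?? Number.MAX_SAFE_INTEGER;
    if (rankA !== rankB) return rankA - rankB;
    return a.url.localeCompare(b.url);
  });
}
```

The input array is copied before sorting, so the crawl results themselves are left untouched.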
evvmdocscrapper/
├── src/
│   ├── build-llms-full.ts     # Main scraper
│   ├── add-eip191.ts          # EIP-191 scraper
│   └── change-detector.ts     # Change detection module
├── dist/
│   ├── llms.txt               # Index file (generated)
│   └── llms-full.txt          # Full docs (generated)
├── scrape.sh                  # Interactive menu script
├── package.json
├── tsconfig.json
├── README.md
├── CHANGE_DETECTION.md        # Change detection docs
├── CHANGELOG.md               # Version history
└── COMPLIANCE_CHECK.md        # llms.txt compliance verification
Edit constants in src/build-llms-full.ts:
const DOMAIN = 'https://www.evvm.info';
const START_URL = `${DOMAIN}/docs/intro`;
const DOCS_PREFIX = `${DOMAIN}/docs/`;
const EIP_191_URL = 'https://eips.ethereum.org/EIPS/eip-191';
# EVVM
> Brief description
## Docs
- [Key pages with descriptions]
## Reference
- [EIP-191 link]
## Context files
- [Link to llms-full.txt]
<!-- Scraper Metadata: {timestamp, pageCount, pagesHash, eip191Included, version} -->
# EVVM Documentation
## Introduction
[content...]
## QuickStart
[content...]
## Process of a Transaction in EVVM
[content...]
## ERC-191: Signed Data Standard ← Strategically placed here
[content...]
## EVVM Core Contract
[... 130 more sections ...]
name: Update Docs
on:
  schedule:
    - cron: '0 */6 * * *' # Every 6 hours
jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-node@v2
      - run: npm install
      - run: npm run scrape # Only scrapes if changed
# Run daily at midnight - only scrapes if docs changed
0 0 * * * cd /path/to/evvmdocscrapper && npm run scrape
- ✅ 100% llmstxt.org specification compliant
- ✅ Respects rate limits (maxConcurrency: 5)
- ✅ Includes source attribution (permalinks for each section)
- ✅ Checks robots.txt (handled by Crawlee)
- ✅ Only crawls /docs/* pages
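Restricting the crawl to docs pages might be done with a filter along these lines. `DOMAIN` and `DOCS_PREFIX` mirror the constants in `src/build-llms-full.ts`; the `isDocsUrl` helper is a hypothetical name for this sketch and does not show how Crawlee is actually configured in the project.

```typescript
const DOMAIN = "https://www.evvm.info";
const DOCS_PREFIX = `${DOMAIN}/docs/`;

// Returns true only for URLs under the docs prefix on the docs domain.
// Relative links are resolved against DOMAIN first.
function isDocsUrl(candidate: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(candidate, DOMAIN);
  } catch {
    // Unparseable links are skipped instead of crawled.
    return false;
  }
  // Ignore fragments and query strings so anchors map to the same page.
  parsed.hash = "";
  parsed.search = "";
  return parsed.href.startsWith(DOCS_PREFIX);
}
```

Comparing the full resolved URL, not just the path, also rejects links to other hosts that happen to contain `/docs/`.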
- CHANGE_DETECTION.md - Full change detection documentation
- CHANGELOG.md - Version history and changes
- COMPLIANCE_CHECK.md - llms.txt specification compliance
- TERMINAL_INTERFACE.md - Terminal UI documentation
Created by @0xOucan with assistance from Claude (Anthropic).
Contributions welcome! Feel free to:
- Report bugs
- Suggest features
- Submit pull requests
This project is open source. Always verify the target site's terms of service before scraping.
- EVVM Documentation: https://www.evvm.info/docs/
- llms.txt Specification: https://llmstxt.org/
- EIP-191 Standard: https://eips.ethereum.org/EIPS/eip-191
Made with ❤️ for the EVVM ecosystem