A robust web content extraction skill for OpenClaw that fetches and converts web pages to readable markdown/text using multiple fallback services.
- Multi-Service Fallback: Automatically tries up to four extraction methods in order (three hosted services, then the Scrapling library)
- No API Keys Required: Uses free, publicly available services
- Markdown Output: Clean, readable markdown format
- Easy Integration: Works seamlessly with OpenClaw
| Service | URL Pattern | Best For |
|---|---|---|
| markdown.new | https://markdown.new/{url} | General use, Cloudflare sites |
| defuddle.md | https://defuddle.md/{url} | Alternative parsing |
| r.jina.ai | https://r.jina.ai/{url} | Article extraction |
| Scrapling | Python library | Complex pages, JavaScript |
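
The URL patterns in the table can be turned into request URLs with a small helper. This is a minimal sketch, assuming each hosted service accepts the target URL appended directly after its base address, as the `{url}` placeholders suggest. `SERVICE_PATTERNS` and `build_service_urls` are hypothetical names used for illustration and are not part of the skill.

```python
from typing import Dict

# Hypothetical mapping, copied from the table above (illustrative only).
SERVICE_PATTERNS: Dict[str, str] = {
    "markdown.new": "https://markdown.new/{url}",
    "defuddle.md": "https://defuddle.md/{url}",
    "r.jina.ai": "https://r.jina.ai/{url}",
}


def build_service_urls(target_url: str) -> Dict[str, str]:
    """Fill each pattern's {url} placeholder with the target URL.

    Assumption: the services take the raw URL as a path suffix.
    """
    return {
        name: pattern.format(url=target_url)
        for name, pattern in SERVICE_PATTERNS.items()
    }
```

The dictionary keeps the services in table order, so iterating over the result visits them in the same sequence as the table.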
clawhub install web-extract
- Clone this repository:
git clone https://github.com/yourusername/web-extract.git
- Copy to your OpenClaw skills directory:
cp -r web-extract ~/.openclaw/workspace/skills/
- Package the skill:
cd ~/.openclaw/workspace/skills/web-extract
clawhub package .
Once installed, OpenClaw will automatically use this skill when you provide a URL.
- Try markdown.new first (fastest, best formatting)
- Fall back to defuddle.md (alternative parser)
- Try r.jina.ai (good for articles)
- Use Scrapling (when services fail)
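
The fallback order above can be sketched as a loop that stops at the first extractor returning non-empty content. This is only an illustration of the pattern, not the skill's actual implementation: `extract_with_fallback` and the `Extractor` type are hypothetical names, and each extractor is assumed to be a callable that takes a URL and returns text (or raises on failure).

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical type: a function that takes a URL and returns extracted text.
Extractor = Callable[[str], Optional[str]]


def extract_with_fallback(
    url: str, extractors: Iterable[Tuple[str, Extractor]]
) -> Tuple[str, str]:
    """Try each (name, extractor) pair in order.

    Returns (name, content) from the first extractor that yields
    non-empty text; raises RuntimeError if none succeeds.
    """
    errors: List[str] = []
    for name, extractor in extractors:
        try:
            content = extractor(url)
        except Exception as exc:
            # A failing service should not stop the chain.
            errors.append(f"{name}: {exc}")
            continue
        if content and content.strip():
            return name, content
        errors.append(f"{name}: empty result")
    raise RuntimeError("all extractors failed: " + "; ".join(errors))
```

Passing the extractors in as a list keeps the order explicit and makes it straightforward to test the chain with stub functions instead of live network calls.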
# Using the included script
python3 scripts/extract.py "https://example.com/article"
# With specific format
python3 scripts/extract.py "https://example.com/article" --format markdown
# Save to file
python3 scripts/extract.py "https://example.com/article" -o output.md
web-extract/
├── SKILL.md # Main skill documentation
├── README.md # This file
├── LICENSE # MIT License
├── scripts/
│ └── extract.py # Scrapling extraction script
└── references/
    └── services.md # Service documentation
- OpenClaw >= 1.0.0
- Python 3.8+ (for Scrapling fallback)
- Scrapling library (optional, for fallback):
pip install scrapling
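
Because Scrapling is optional, a script may want to check for it before attempting the library fallback. The sketch below uses the standard library's `importlib.util.find_spec`; `is_installed` is a hypothetical helper name, and the import name `scrapling` is assumed to match the pip package name.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the library is imported under the name "scrapling".
if not is_installed("scrapling"):
    print("Scrapling not found; install it with: pip install scrapling")
```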
# Extract a blog post
python3 scripts/extract.py "https://example.com/blog/post"
# Output:
# Title: Example Blog Post
# URL: https://example.com/blog/post
# Content: ...
# Test with a simple URL
python3 scripts/extract.py "https://example.com"
# Test markdown output
python3 scripts/extract.py "https://example.com" --format markdown
To add a new extraction service:
- Update SKILL.md with the new service
- Add service details to references/services.md
- Update the fallback chain in documentation
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- markdown.new - Cloudflare-based markdown conversion
- defuddle.md - Alternative markdown service
- r.jina.ai - Article extraction service
- Scrapling - Python web scraping library
If you encounter any issues or have questions:
- Check references/services.md for troubleshooting
- Open an issue on GitHub
- Contact the OpenClaw community
Made with ❤️ for OpenClaw