Web Extract Skill for OpenClaw

A web content extraction skill for OpenClaw that fetches web pages and converts them to readable markdown or plain text, falling back through multiple services when one fails.

🌟 Features

Multi-Service Fallback: Automatically tries up to four extraction services in order until one succeeds
No API Keys Required: Uses free, publicly available services
Markdown Output: Clean, readable markdown format
Easy Integration: Works seamlessly with OpenClaw

🚀 Services (in order of preference)

| Service | URL Pattern | Best For |
|---|---|---|
| markdown.new | `https://markdown.new/{url}` | General use, Cloudflare sites |
| defuddle.md | `https://defuddle.md/{url}` | Alternative parsing |
| r.jina.ai | `https://r.jina.ai/{url}` | Article extraction |
| Scrapling | Python library | Complex pages, JavaScript |
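
As a rough illustration of how the URL patterns above can be applied, the sketch below builds one request URL per hosted service in order of preference. The names `SERVICE_PATTERNS` and `build_service_urls` are hypothetical and not part of this skill, and the sketch assumes the target URL is appended verbatim, without percent-encoding.

```python
from urllib.parse import urlsplit

# URL patterns copied from the services table. Scrapling is a local
# library rather than a hosted endpoint, so it has no pattern here.
SERVICE_PATTERNS = {
    "markdown.new": "https://markdown.new/{url}",
    "defuddle.md": "https://defuddle.md/{url}",
    "r.jina.ai": "https://r.jina.ai/{url}",
}


def build_service_urls(target_url):
    """Return (service, request_url) pairs in order of preference."""
    parts = urlsplit(target_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {target_url!r}")
    # Assumption: the target URL is substituted as-is into the pattern.
    return [(name, pattern.format(url=target_url))
            for name, pattern in SERVICE_PATTERNS.items()]


for name, request_url in build_service_urls("https://example.com/article"):
    print(f"{name}: {request_url}")
```

Rejecting relative or scheme-less input early keeps a malformed URL from being sent to every service in turn.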

📦 Installation

Method 1: Install from ClawHub (recommended)

clawhub install web-extract

Method 2: Manual Installation

Clone this repository:

git clone https://github.com/yourusername/web-extract.git

Copy to your OpenClaw skills directory:

cp -r web-extract ~/.openclaw/workspace/skills/

Package the skill:

cd ~/.openclaw/workspace/skills/web-extract
clawhub package .

🎯 Usage

Once the skill is installed, OpenClaw uses it automatically when you provide a URL.

Example Workflow

Try markdown.new first (fastest, best formatting)
Fall back to defuddle.md (alternative parser)
Try r.jina.ai (good for articles)
Use Scrapling (when the hosted services fail)
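
The workflow above amounts to a simple fallback chain. The sketch below shows one way such a chain could be written; it is not the skill's actual implementation. The function `extract_with_fallback` and the stand-in extractors are hypothetical, and the rule that an empty response counts as a failure is an assumption.

```python
def extract_with_fallback(url, extractors, min_length=1):
    """Try each (name, extractor) pair in order; return the first usable result.

    Each extractor is a callable that takes the URL and returns text.
    An exception, or text shorter than min_length after stripping
    whitespace, counts as a failure (an assumption of this sketch).
    """
    errors = []
    for name, extractor in extractors:
        try:
            text = extractor(url)
        except Exception as exc:  # any failure moves on to the next service
            errors.append(f"{name}: {exc}")
            continue
        if text and len(text.strip()) >= min_length:
            return name, text
        errors.append(f"{name}: empty response")
    raise RuntimeError("all extractors failed: " + "; ".join(errors))


# Stand-in extractors so the sketch runs without network access.
def failing(url):
    raise ConnectionError("service unavailable")


def empty(url):
    return "   "


def working(url):
    return f"# Extracted from {url}"


chain = [("markdown.new", failing), ("defuddle.md", empty), ("r.jina.ai", working)]
service, content = extract_with_fallback("https://example.com/article", chain)
print(service)
print(content)
```

Passing the extractors in as callables keeps the ordering in one place and lets the Scrapling fallback be appended as the last entry without changing the loop.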

Manual Usage

# Using the included script
python3 scripts/extract.py "https://example.com/article"

# With specific format
python3 scripts/extract.py "https://example.com/article" --format markdown

# Save to file
python3 scripts/extract.py "https://example.com/article" -o output.md
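
For readers who want to see how the flags above might be parsed, here is a minimal `argparse` sketch. It does not reproduce `scripts/extract.py`: the function `build_parser`, the `text` format choice, the `markdown` default, and the long option `--output` are all assumptions. Only `--format markdown` and `-o` appear in the commands above.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Extract readable content from a web page."
    )
    parser.add_argument("url", help="page to extract")
    # "markdown" appears in the examples; "text" and the default are assumptions.
    parser.add_argument("--format", choices=["markdown", "text"], default="markdown")
    # "-o" appears in the examples; the long form "--output" is an assumption.
    parser.add_argument("-o", "--output", default=None,
                        help="write to this file instead of stdout")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["https://example.com/article", "-o", "output.md"])
print(args.url, args.format, args.output)
```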

📁 Project Structure

web-extract/
├── SKILL.md                 # Main skill documentation
├── README.md                # This file
├── LICENSE                  # MIT License
├── scripts/
│   └── extract.py          # Scrapling extraction script
└── references/
    └── services.md         # Service documentation

🔧 Requirements

OpenClaw >= 1.0.0
Python 3.8+ (for Scrapling fallback)
Scrapling library (optional, for fallback):
```
pip install scrapling
```

📝 Example

# Extract a blog post
python3 scripts/extract.py "https://example.com/blog/post"

# Output:
# Title: Example Blog Post
# URL: https://example.com/blog/post
# Content: ...

🛠️ Development

Testing

# Test with a simple URL
python3 scripts/extract.py "https://example.com"

# Test markdown output
python3 scripts/extract.py "https://example.com" --format markdown

Adding New Services

To add a new extraction service:

Update SKILL.md with the new service
Add service details to references/services.md
Update the fallback chain in the documentation

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

markdown.new - Cloudflare-based markdown conversion
defuddle.md - Alternative markdown service
r.jina.ai - Article extraction service
Scrapling - Python web scraping library

📞 Support

If you encounter any issues or have questions:

Check references/services.md for troubleshooting
Open an issue on GitHub
Contact the OpenClaw community

Made with ❤️ for OpenClaw
