A Singer tap for extracting data from the WordPress REST API, built with the Meltano Singer SDK.
- Install the tap:
  pip install git+https://github.com/Automattic/tap-wordpress.git
- Create a config file (config.json):
  { "base_url": "https://your-wordpress-site.com", "per_page": 100 }
- Run discovery and save the catalog of available streams:
  tap-wordpress --config config.json --discover > catalog.json
- Extract data:
  tap-wordpress --config config.json --catalog catalog.json

- ✅ Broad WordPress REST API coverage - Extracts posts, pages, comments, media, users, categories, and tags
- ✅ Incremental sync - Efficient updates for posts, pages, comments, and media
- ✅ No authentication required - Works with public WordPress REST API endpoints
- ✅ Production ready - Comprehensive error handling, logging, and retry logic
- ✅ Singer compliant - Full Singer specification compliance with state management
- ✅ Meltano native - Built with Meltano SDK for seamless integration
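
As a rough illustration of the incremental sync mentioned above, the sketch below keeps only records newer than a stored bookmark and then advances the bookmark. It assumes the replication key is the `modified_gmt` field that the WordPress REST API returns on posts; the key the tap actually uses is not stated here, and the helper names `parse_ts` and `incremental_filter` are hypothetical.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse an ISO 8601 timestamp; naive values are treated as UTC."""
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def incremental_filter(records, bookmark, key="modified_gmt"):
    """Return (records newer than bookmark, updated bookmark).

    `key` is an assumed replication key, not confirmed from the tap's source.
    """
    cutoff = parse_ts(bookmark) if bookmark else None
    latest = cutoff
    fresh = []
    for rec in records:
        ts = parse_ts(rec[key])
        if cutoff is None or ts > cutoff:
            fresh.append(rec)
        if latest is None or ts > latest:
            latest = ts
    new_bookmark = latest.isoformat() if latest else bookmark
    return fresh, new_bookmark
```

On the next run, passing the returned bookmark back in means records that have not changed are skipped.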
Stream | Replication Method | Description |
---|---|---|
posts | Incremental | Blog posts with content, metadata, and relationships |
pages | Incremental | WordPress pages with hierarchy and content |
comments | Incremental | Comments on posts and pages with threading |
media | Incremental | Media library items (images, files, etc.) |
users | Full Table | User profiles, roles, and capabilities |
categories | Full Table | Post categories with hierarchical structure |
tags | Full Table | Post tags and taxonomies |
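
For orientation, each stream in the table corresponds to a collection endpoint under `/wp-json/wp/v2/` in the WordPress REST API, which paginates with the `page` and `per_page` query parameters and reports the page count in the `X-WP-TotalPages` response header. The sketch below builds page URLs and decides whether another page exists. The `STREAM_PATHS` mapping and both function names are assumptions for illustration, not the tap's internals.

```python
from urllib.parse import urlencode

# Assumed mapping from stream name to REST API collection path.
STREAM_PATHS = {
    "posts": "posts",
    "pages": "pages",
    "comments": "comments",
    "media": "media",
    "users": "users",
    "categories": "categories",
    "tags": "tags",
}


def build_url(base_url, stream, page=1, per_page=100):
    """Build the URL for one page of a stream's collection endpoint."""
    query = urlencode({"per_page": per_page, "page": page})
    return f"{base_url.rstrip('/')}/wp-json/wp/v2/{STREAM_PATHS[stream]}?{query}"


def next_page(current_page, headers):
    """Return the next page number, or None when the last page was reached.

    `headers` is a plain dict here, so the lookup is case-sensitive;
    real HTTP clients usually expose case-insensitive headers.
    """
    total = int(headers.get("X-WP-TotalPages", "1"))
    return current_page + 1 if current_page < total else None
```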
git clone https://github.com/Automattic/tap-wordpress.git
cd tap-wordpress
pip install -e .
Setting | Description |
---|---|
base_url | WordPress site base URL (e.g., https://example.com) |
Setting | Default | Description |
---|---|---|
start_date | null | Start date for incremental sync (ISO 8601) |
per_page | 100 | Number of records to fetch per page |
timeout | 30 | Request timeout in seconds |
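
To make the settings concrete, here is a minimal sketch that merges a config file's contents with the defaults from the table and runs a few sanity checks. The upper bound of 100 for `per_page` reflects the WordPress REST API's own limit; the `load_config` helper and its exact checks are hypothetical and may differ from the validation the tap performs.

```python
import json
from urllib.parse import urlparse

# Defaults taken from the settings table above.
DEFAULTS = {"start_date": None, "per_page": 100, "timeout": 30}


def load_config(raw):
    """Parse a JSON config string, apply defaults, and validate the values."""
    config = {**DEFAULTS, **json.loads(raw)}
    parsed = urlparse(config.get("base_url") or "")
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("base_url must be an absolute http(s) URL")
    if not 1 <= int(config["per_page"]) <= 100:
        raise ValueError("per_page must be between 1 and 100")
    if config["timeout"] <= 0:
        raise ValueError("timeout must be positive")
    # Normalize so paths can be appended without doubling slashes.
    config["base_url"] = config["base_url"].rstrip("/")
    return config
```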
Basic configuration (WordPress.org)
{
  "base_url": "https://wordpress.org"
}
With custom settings
{
  "base_url": "https://your-wordpress-site.com",
  "per_page": 50,
  "start_date": "2023-01-01T00:00:00Z"
}
# Discover available streams
tap-wordpress --config config.json --discover > catalog.json
# Extract data to stdout
tap-wordpress --config config.json --catalog catalog.json
# Extract with state management
tap-wordpress --config config.json --catalog catalog.json --state state.json
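
The commands above write Singer messages to stdout as one JSON object per line, with `RECORD` messages carrying rows and `STATE` messages carrying bookmarks. As a small sketch of how that output can be consumed, the hypothetical `summarize` helper below counts records per stream and keeps the last state value, which is what would typically be saved to state.json for the next run.

```python
import json
from collections import Counter


def summarize(lines):
    """Count RECORD messages per stream and return the last STATE value."""
    counts = Counter()
    last_state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        if msg["type"] == "RECORD":
            counts[msg["stream"]] += 1
        elif msg["type"] == "STATE":
            last_state = msg["value"]
        # SCHEMA and other message types are ignored in this sketch.
    return dict(counts), last_state
```

Passing `sys.stdin` as `lines` lets the helper sit at the end of a pipe from the tap.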
- Add to your Meltano project:
  cd your-meltano-project
  meltano add extractor tap-wordpress --from-ref=https://github.com/Automattic/tap-wordpress.git
- Configure the tap:
  meltano config tap-wordpress set base_url "https://your-wordpress-site.com"
  meltano config tap-wordpress set per_page 50
- Test the connection:
  meltano invoke tap-wordpress --discover
- Run data extraction:
  meltano run tap-wordpress target-jsonl
# meltano.yml
plugins:
  extractors:
    - name: tap-wordpress
      variant: meltanolabs
      pip_url: tap-wordpress
      config:
        base_url: https://wordpress.org
        per_page: 50
      select:
        - posts.*
        - pages.*
        - categories.*
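
The `select` entries above are `stream.property` patterns in which `*` acts as a wildcard. The sketch below approximates that matching with Python's `fnmatch` module to show which properties the example configuration would include. It is only an approximation: Meltano's own selection rules are richer (for example, exclusion patterns), and the `is_selected` helper is hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns copied from the meltano.yml example above.
SELECT = ["posts.*", "pages.*", "categories.*"]


def is_selected(stream, prop, patterns=SELECT):
    """Return True if `stream.prop` matches any select pattern."""
    return any(fnmatchcase(f"{stream}.{prop}", pattern) for pattern in patterns)
```

With these patterns, every property of posts, pages, and categories is selected, while comments, media, users, and tags are left out.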
This tap works with:
- ✅ WordPress 4.7+ (when REST API was added to core)
- ✅ WordPress.com hosted sites
- ✅ Self-hosted WordPress installations
- ✅ WordPress Multisite networks
- ✅ Headless WordPress setups
- Python 3.8+
- Poetry for dependency management
# Clone the repository
git clone https://github.com/Automattic/tap-wordpress.git
cd tap-wordpress
# Install Poetry if you haven't already
curl -sSL https://install.python-poetry.org | python3 -
# Install dependencies
poetry install
# Activate virtual environment
poetry shell
# Install pre-commit hooks
pre-commit install
# Run all tests
poetry run pytest
# Run with coverage
poetry run pytest --cov=tap_wordpress --cov-report=term-missing
# Run specific test file
poetry run pytest tests/test_streams.py -v
# Run tests against live WordPress.org API
poetry run pytest tests/test_integration.py -v
# Format code with Black
poetry run black tap_wordpress tests
# Lint with flake8
poetry run flake8 tap_wordpress tests
# Type checking with mypy
poetry run mypy tap_wordpress
# Run all quality checks
poetry run pre-commit run --all-files
The test suite includes integration tests that run against live WordPress APIs:
# Test against WordPress.org (public API)
poetry run python -m tap_wordpress.tap --config config.json.example --discover
# Test data extraction
poetry run python -m tap_wordpress.tap --config config.json.example --catalog catalog.json
- 403 Forbidden Error
  - Check if the WordPress site has REST API enabled
  - Verify the base_url is correct
  - Some WordPress sites may restrict public API access
- Rate Limiting
  - Reduce the per_page setting
  - Increase the timeout setting
  - The tap includes automatic retry logic
- SSL Certificate Issues
  - Ensure the WordPress site has a valid SSL certificate
  - For development, you may need to handle self-signed certificates
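
To give a sense of what retry logic for rate limiting can look like, here is a minimal sketch of exponential backoff. It is not the tap's actual implementation: the set of retryable status codes, the delay schedule, and the `with_retries` helper are all assumptions. The `fetch` argument is any callable returning a `(status_code, body)` pair.

```python
import time

# Assumed set of status codes worth retrying (rate limits and server errors).
RETRYABLE = {429, 500, 502, 503, 504}


def with_retries(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-retryable status or attempts run out.

    The wait doubles after each failed attempt: base_delay, 2x, 4x, ...
    """
    for attempt in range(max_attempts):
        status, body = fetch()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    # All attempts were retryable failures; return the last response.
    return status, body
```

A 403 is deliberately not retried here, since it usually signals a site-level restriction rather than a transient failure.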
- 📖 Documentation: Singer.io
- 🛠️ Meltano SDK: SDK Documentation
- 🐛 Issues: GitHub Issues
- 💬 Community: Meltano Slack
We welcome contributions! Please:
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Make your changes and add tests
- Run the test suite: poetry run pytest
- Commit your changes: git commit -m 'Add amazing feature'
- Push to the branch: git push origin feature/amazing-feature
- Open a Pull Request
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Built with the Meltano Singer SDK
- Inspired by the WordPress REST API and the Singer ecosystem
- Thanks to all contributors and the Meltano community