tap-wordpress

A Singer tap for extracting data from the WordPress REST API, built with the Meltano Singer SDK.

Quick Start

  1. Install the tap

    pip install git+https://github.com/Automattic/tap-wordpress.git

  2. Create a config file named config.json

    {
      "base_url": "https://your-wordpress-site.com",
      "per_page": 100
    }

  3. Run discovery and save the catalog of available streams

    tap-wordpress --config config.json --discover > catalog.json

  4. Extract data

    tap-wordpress --config config.json --catalog catalog.json

Features

  • Broad WordPress REST API coverage - Extracts posts, pages, comments, media, users, categories, and tags
  • Incremental sync - Efficient updates for posts, pages, comments, and media
  • No authentication required - Works with public WordPress REST API endpoints
  • Production ready - Comprehensive error handling, logging, and retry logic
  • Singer compliant - Full Singer specification compliance with state management
  • Meltano native - Built with the Meltano Singer SDK for seamless integration

Supported Streams

| Stream | Replication Method | Description |
| --- | --- | --- |
| posts | Incremental | Blog posts with content, metadata, and relationships |
| pages | Incremental | WordPress pages with hierarchy and content |
| comments | Incremental | Comments on posts and pages with threading |
| media | Incremental | Media library items (images, files, etc.) |
| users | Full Table | User profiles, roles, and capabilities |
| categories | Full Table | Post categories with hierarchical structure |
| tags | Full Table | Post tags and taxonomies |
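
To illustrate the difference between the two replication methods: a Full Table stream is re-extracted on every run, while an Incremental stream only emits records newer than a saved bookmark. The following is a minimal sketch of bookmark-based filtering, not the tap's own code. It assumes the replication key is the record's `modified` timestamp, which this README does not confirm; `sync_incremental` is a hypothetical helper.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2023-01-01T00:00:00'."""
    return datetime.fromisoformat(value)


def sync_incremental(
    records: Iterable[dict], bookmark: Optional[str]
) -> Tuple[List[dict], Optional[str]]:
    """Return (records newer than the bookmark, updated bookmark).

    The replication key name `modified` is an assumption for this sketch.
    """
    cutoff = parse_ts(bookmark) if bookmark else None
    emitted = []
    latest = bookmark
    for record in records:
        ts = parse_ts(record["modified"])
        # Skip anything that is not strictly newer than the saved bookmark.
        if cutoff is not None and ts <= cutoff:
            continue
        emitted.append(record)
        # Track the newest timestamp seen so far as the next bookmark.
        if latest is None or ts > parse_ts(latest):
            latest = record["modified"]
    return emitted, latest
```

Passing the returned bookmark into the next call yields no records until something is modified again, which is the behavior the state file enables between runs.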

Installation

From source

git clone https://github.com/Automattic/tap-wordpress.git
cd tap-wordpress
pip install -e .

Configuration

Required Settings

| Setting | Description |
| --- | --- |
| base_url | WordPress site base URL (e.g., https://example.com) |

Optional Settings

| Setting | Default | Description |
| --- | --- | --- |
| start_date | null | Start date for incremental sync (ISO 8601) |
| per_page | 100 | Number of records to fetch per page |
| timeout | 30 | Request timeout in seconds |

Configuration Examples

Basic configuration (WordPress.org)

{
  "base_url": "https://wordpress.org"
}

With custom settings

{
  "base_url": "https://your-wordpress-site.com",
  "per_page": 50,
  "start_date": "2023-01-01T00:00:00Z"
}
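
The settings above can be sanity-checked before a run. The following is a minimal sketch, not the tap's own validation: `load_config` is a hypothetical helper, and the defaults are copied from the Optional Settings table. The upper bound of 100 on `per_page` reflects the maximum that WordPress REST API collection endpoints accept.

```python
import json
from datetime import datetime
from urllib.parse import urlparse

# Defaults taken from the Optional Settings table in this README.
DEFAULTS = {"start_date": None, "per_page": 100, "timeout": 30}


def load_config(text: str) -> dict:
    """Parse a JSON config string, apply defaults, and validate values."""
    config = {**DEFAULTS, **json.loads(text)}

    base_url = config.get("base_url") or ""
    parsed = urlparse(base_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("base_url must be an absolute http(s) URL")

    per_page = config["per_page"]
    if not isinstance(per_page, int) or not 1 <= per_page <= 100:
        raise ValueError("per_page must be an integer between 1 and 100")

    if config["start_date"] is not None:
        # Older Python versions reject a trailing 'Z', so normalize it first.
        datetime.fromisoformat(config["start_date"].replace("Z", "+00:00"))

    # Drop a trailing slash so paths can be appended consistently.
    config["base_url"] = base_url.rstrip("/")
    return config
```

Calling `load_config(open("config.json").read())` raises `ValueError` on a bad value instead of letting the problem surface midway through an extraction.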

Usage

Standalone CLI

# Discover available streams
tap-wordpress --config config.json --discover > catalog.json

# Extract data to stdout
tap-wordpress --config config.json --catalog catalog.json

# Extract with state management
tap-wordpress --config config.json --catalog catalog.json --state state.json
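
A Singer tap writes its output to stdout as one JSON message per line. As a rough illustration of what a consumer does with that output, the sketch below counts `RECORD` messages per stream and keeps the value of the last `STATE` message, which is what would be saved as state.json for the next run. `summarize` is a hypothetical helper, and the sample lines are invented for illustration rather than captured from the tap.

```python
import json
from collections import Counter
from typing import Iterable, Optional, Tuple


def summarize(lines: Iterable[str]) -> Tuple[Counter, Optional[dict]]:
    """Count RECORD messages per stream and return the last STATE value."""
    counts: Counter = Counter()
    last_state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        if message["type"] == "RECORD":
            counts[message["stream"]] += 1
        elif message["type"] == "STATE":
            # Later STATE messages supersede earlier ones.
            last_state = message["value"]
    return counts, last_state


# Invented sample output, shaped like Singer RECORD and STATE messages.
SAMPLE = [
    '{"type": "RECORD", "stream": "posts", "record": {"id": 1}}',
    '{"type": "RECORD", "stream": "posts", "record": {"id": 2}}',
    '{"type": "STATE", "value": {"bookmarks": {"posts": {"modified": "2023-02-01T00:00:00"}}}}',
    '{"type": "RECORD", "stream": "tags", "record": {"id": 7}}',
]

counts, state = summarize(SAMPLE)
print(dict(counts))
print(json.dumps(state))
```

The same function accepts any iterable of lines, so it could be fed `sys.stdin` when the tap's output is piped into a script.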

With Meltano

  1. Add to your Meltano project

    cd your-meltano-project
    meltano add extractor tap-wordpress --from-ref=https://github.com/Automattic/tap-wordpress.git

  2. Configure the tap

    meltano config tap-wordpress set base_url "https://your-wordpress-site.com"
    meltano config tap-wordpress set per_page 50

  3. Test the connection

    meltano invoke tap-wordpress --discover

  4. Run data extraction

    meltano run tap-wordpress target-jsonl

Example Meltano Configuration

# meltano.yml
plugins:
  extractors:
  - name: tap-wordpress
    variant: meltanolabs
    pip_url: git+https://github.com/Automattic/tap-wordpress.git
    config:
      base_url: https://wordpress.org
      per_page: 50
    select:
    - posts.*
    - pages.*
    - categories.*

WordPress Compatibility

This tap works with:

  • WordPress 4.7+ (the release that added the REST API to core)
  • WordPress.com hosted sites
  • Self-hosted WordPress installations
  • WordPress Multisite networks
  • Headless WordPress setups
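
On a standard installation, the core REST API routes live under `/wp-json/wp/v2/`. The sketch below shows how a paginated request URL for one of the supported streams could be built from `base_url` and `per_page`. It is an illustration under that assumption, not the tap's implementation: `build_url` is hypothetical, and sites that customize the REST prefix may need a different path.

```python
from urllib.parse import urlencode

# Stream names from the Supported Streams table; each matches a wp/v2 route.
STREAMS = ("posts", "pages", "comments", "media", "users", "categories", "tags")


def build_url(base_url: str, stream: str, page: int = 1, per_page: int = 100) -> str:
    """Build a collection URL for one page of a stream."""
    if stream not in STREAMS:
        raise ValueError(f"unknown stream: {stream}")
    query = urlencode({"per_page": per_page, "page": page})
    # Strip a trailing slash so the path is not doubled.
    return f"{base_url.rstrip('/')}/wp-json/wp/v2/{stream}?{query}"


print(build_url("https://example.com/", "posts", page=2, per_page=50))
```

Opening such a URL in a browser is a quick way to check whether a given site exposes its REST API publicly.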

Development

Prerequisites

  • Python 3.8+
  • Poetry for dependency management

Setup Development Environment

# Clone the repository
git clone https://github.com/Automattic/tap-wordpress.git
cd tap-wordpress

# Install Poetry if you haven't already
curl -sSL https://install.python-poetry.org | python3 -

# Install dependencies
poetry install

# Activate virtual environment
poetry shell

# Install pre-commit hooks
pre-commit install

Running Tests

# Run all tests
poetry run pytest

# Run with coverage
poetry run pytest --cov=tap_wordpress --cov-report=term-missing

# Run specific test file
poetry run pytest tests/test_streams.py -v

# Run tests against live WordPress.org API
poetry run pytest tests/test_integration.py -v

Code Quality

# Format code with Black
poetry run black tap_wordpress tests

# Lint with flake8
poetry run flake8 tap_wordpress tests

# Type checking with mypy
poetry run mypy tap_wordpress

# Run all quality checks
poetry run pre-commit run --all-files

Testing Against Live APIs

In addition to the integration tests, you can run the tap directly against a live WordPress API:

# Test against WordPress.org (public API)
poetry run python -m tap_wordpress.tap --config config.json.example --discover

# Test data extraction
poetry run python -m tap_wordpress.tap --config config.json.example --catalog catalog.json

Troubleshooting

Common Issues

  1. 403 Forbidden Error

    • Check that the WordPress site has the REST API enabled
    • Verify that the base_url is correct
    • Some WordPress sites restrict public API access

  2. Rate Limiting

    • Reduce the per_page setting
    • Increase the timeout setting
    • The tap includes automatic retry logic

  3. SSL Certificate Issues

    • Ensure the WordPress site has a valid SSL certificate
    • For development, you may need to handle self-signed certificates
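
This README does not describe how the tap's retry logic works. As a generic illustration of the idea behind the rate limiting advice above, retrying with exponential backoff might be sketched as follows. This is not the tap's actual implementation; `with_retries` and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    func: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            # Give up and re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("attempts must be at least 1")
```

The `sleep` parameter is injectable so the delay schedule can be checked in a test without actually waiting.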

Getting Help

Contributing

We welcome contributions! Please:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes and add tests
  4. Run the test suite: poetry run pytest
  5. Commit your changes: git commit -m 'Add amazing feature'
  6. Push to the branch: git push origin feature/amazing-feature
  7. Open a Pull Request

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Acknowledgments

  • Built with the Meltano Singer SDK
  • Inspired by the WordPress REST API and the Singer ecosystem
  • Thanks to all contributors and the Meltano community
