Sync Confluence spaces and pages to local Markdown files — with full and incremental modes, rate-limit-aware crawling, and per-space README generation.
- Full sync: downloads every page across all (or selected) spaces
- Incremental sync: skips pages that haven't changed since the last run
- Single-space sync: target one space with `--space`
- README generation: writes a `README.md` table of contents inside each space directory
- Rate-limited API requests with exponential-backoff retry
- Pages saved as clean Markdown with a metadata header
- Python 3.11+
- A Confluence Cloud account with an API token
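
The feature list mentions rate-limited requests with exponential-backoff retry. The following is only a minimal sketch of that general technique, not the project's actual code: `RateLimited` and `with_backoff` are hypothetical names, and the delay values are illustrative assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error a client could raise when the API answers HTTP 429."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on RateLimited with exponentially growing waits."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # Waits of 0.5 s, 1 s, 2 s, ... plus up to 10% jitter so that
            # parallel workers do not retry in lockstep.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the retry schedule can be checked in a test without actually waiting.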
git clone https://github.com/Childcity/confluence-sync
cd confluence-sync
python -m venv venv
venv\Scripts\activate # Windows
# source venv/bin/activate # macOS / Linux
pip install -r requirements.txt

Copy `.env.example` to `.env` and fill in your credentials:
CONFLUENCE_EMAIL=your-email@example.com
CONFLUENCE_API_TOKEN=your-api-token
CONFLUENCE_URL=https://your-domain.atlassian.net
CONFLUENCE_SPACES= # comma-separated keys, empty = all spaces
CONFLUENCE_INCLUDE_ATTACHMENTS=false
SYNC_MODE=incremental # 'full' or 'incremental'
SYNC_RATE_LIMIT_DELAY=0.1 # seconds between API requests
STORAGE_BASE_PATH=./data
Generate a Confluence API token at Account Settings → Security → API tokens.
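
To illustrate how the settings above might be read and validated, here is a rough sketch. It assumes the variables are already present in the process environment (for example via a dotenv-style loader); `SyncConfig` and `load_config` are hypothetical names, and the fallback values simply mirror the sample file rather than any documented defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SyncConfig:
    url: str
    email: str
    api_token: str
    spaces: list[str]            # empty list is treated as "all spaces"
    include_attachments: bool
    mode: str                    # 'full' or 'incremental'
    rate_limit_delay: float      # seconds between API requests
    storage_base_path: str


def load_config(env: Mapping[str, str] = os.environ) -> SyncConfig:
    """Build a SyncConfig from environment-style key/value pairs."""
    mode = env.get("SYNC_MODE", "incremental").strip().lower()
    if mode not in ("full", "incremental"):
        raise ValueError(f"SYNC_MODE must be 'full' or 'incremental', got {mode!r}")

    # "ENG, HR," -> ["ENG", "HR"]; an empty value yields an empty list
    raw_spaces = env.get("CONFLUENCE_SPACES", "")
    spaces = [key.strip() for key in raw_spaces.split(",") if key.strip()]

    attachments = env.get("CONFLUENCE_INCLUDE_ATTACHMENTS", "false")
    return SyncConfig(
        url=env["CONFLUENCE_URL"].rstrip("/"),   # required, KeyError if missing
        email=env["CONFLUENCE_EMAIL"],
        api_token=env["CONFLUENCE_API_TOKEN"],
        spaces=spaces,
        include_attachments=attachments.strip().lower() == "true",
        mode=mode,
        rate_limit_delay=float(env.get("SYNC_RATE_LIMIT_DELAY", "0.1")),
        storage_base_path=env.get("STORAGE_BASE_PATH", "./data"),
    )
```

Failing early on a missing credential or an unknown mode keeps configuration mistakes from surfacing halfway through a crawl.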
# Show help
python main.py --help
# Incremental sync of all configured spaces (default mode)
python main.py
# Full sync of all spaces
python main.py --mode full
# Limit to 2 spaces (useful for testing)
python main.py --mode full --space_limit 2
# Sync a single space
python main.py --space ENG
# Sync and regenerate README indexes
python main.py --mode incremental --generate_readme

data/
├── pages/
│   ├── ENG/
│   │   ├── README.md          # generated index
│   │   ├── Getting_Started.md
│   │   └── Architecture.md
│   └── HR/
│       └── Onboarding.md
├── metadata/
│   └── <page_id>.json
└── sync_state.json            # incremental sync timestamps
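
As a rough illustration of how a state file like `sync_state.json` can drive incremental sync, the sketch below assumes the file is a flat JSON object mapping page IDs to the last-modified timestamp seen on the previous run. That layout and the function names are assumptions; the real file may be organised differently.

```python
import json
from pathlib import Path


def load_state(path: Path) -> dict[str, str]:
    """Return the saved {page_id: last_modified} map, or {} on the first run."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def needs_sync(state: dict[str, str], page_id: str, last_modified: str) -> bool:
    """A page is fetched if it is new or its timestamp differs from the saved one."""
    return state.get(page_id) != last_modified


def record_synced(
    state: dict[str, str], page_id: str, last_modified: str, path: Path
) -> None:
    """Remember the timestamp and persist the whole map after each page."""
    state[page_id] = last_modified
    path.write_text(json.dumps(state, indent=2, sort_keys=True), encoding="utf-8")
```

Writing the file after every page is slower than a single write at the end, but it means an interrupted run can resume without re-downloading pages that were already saved.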
Each Markdown file starts with a metadata header:
# Page Title
- **Space**: ENG
- **Page ID**: 123456
- **Last Modified**: 2026-01-15T10:30:00.000Z
- **URL**: https://your-domain.atlassian.net/wiki/...
---
Page content …

confluence-sync/
├── main.py                        # CLI entry point
├── src/
│   ├── confluence_crawler.py      # Atlassian API client & pagination
│   ├── sync_manager.py            # Orchestrates crawl + storage
│   ├── storage.py                 # Saves Markdown, metadata, sync state
│   ├── readme_generator.py        # Per-space README.md generation
│   └── utils/
│       ├── logger.py              # Console + file logging
│       └── markdown_converter.py  # HTML → Markdown conversion
├── tests/                         # Manual integration test scripts
├── .env.example
├── requirements.txt
└── LICENSE
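
Since `storage.py` is described as saving Markdown, the step of turning a page into a file with the metadata header shown earlier could look roughly like this. It is a sketch, not the project's code: `page_filename` and `render_page` are hypothetical helpers, and both the underscore-based filename rule and the blank line after the `---` separator are assumptions inferred from the examples (`Getting_Started.md`).

```python
import re


def page_filename(title: str) -> str:
    """Turn a page title into a filesystem-friendly Markdown filename.

    Assumption: characters that are invalid on common filesystems are
    dropped and runs of whitespace become single underscores.
    """
    cleaned = re.sub(r'[\\/:*?"<>|]', "", title).strip()
    return re.sub(r"\s+", "_", cleaned) + ".md"


def render_page(
    title: str,
    space: str,
    page_id: str,
    last_modified: str,
    url: str,
    body_markdown: str,
) -> str:
    """Prefix converted page content with the metadata header."""
    header = [
        f"# {title}",
        f"- **Space**: {space}",
        f"- **Page ID**: {page_id}",
        f"- **Last Modified**: {last_modified}",
        f"- **URL**: {url}",
        "---",
    ]
    # Normalise trailing whitespace so every file ends with one newline.
    return "\n".join(header) + "\n\n" + body_markdown.rstrip() + "\n"
```

For example, `page_filename("Getting Started")` returns `Getting_Started.md`, matching the naming seen in the output tree.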