confluence-sync

Sync Confluence spaces and pages to local Markdown files — with full and incremental modes, rate-limit-aware crawling, and per-space README generation.

Features

Full sync — downloads every page across all (or selected) spaces
Incremental sync — skips pages that haven't changed since the last run
Single-space sync — target one space with --space
README generation — writes a README.md table of contents inside each space directory
Rate-limited API requests with exponential-backoff retry
Pages saved as clean Markdown with a metadata header
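
The retry behaviour listed above can be pictured with a small sketch. This is not the project's actual code: `TransientError`, `retry_with_backoff`, and all parameter names and defaults are hypothetical, chosen only to illustrate exponential backoff (the wait doubles after each failed attempt, up to a cap).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (e.g. HTTP 429 or 503)."""


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,       # assumed default, not taken from the project
    base_delay: float = 0.1,     # seconds before the first retry
    max_delay: float = 30.0,     # upper bound on any single wait
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on TransientError with exponentially growing waits."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay * 2^attempt seconds, capped at max_delay.
            sleep(min(max_delay, base_delay * (2 ** attempt)))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting. A production client would typically also add random jitter and honour a `Retry-After` response header when the server sends one.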

Requirements

Python 3.11+
A Confluence Cloud account with an API token

Installation

git clone https://github.com/Childcity/confluence-sync
cd confluence-sync
python -m venv venv
venv\Scripts\activate   # Windows
# source venv/bin/activate  # macOS / Linux
pip install -r requirements.txt

Configuration

Copy .env.example to .env and fill in your credentials:

CONFLUENCE_EMAIL=your-email@example.com
CONFLUENCE_API_TOKEN=your-api-token
CONFLUENCE_URL=https://your-domain.atlassian.net

CONFLUENCE_SPACES=                  # comma-separated keys, empty = all spaces
CONFLUENCE_INCLUDE_ATTACHMENTS=false

SYNC_MODE=incremental               # 'full' or 'incremental'
SYNC_RATE_LIMIT_DELAY=0.1          # seconds between API requests
STORAGE_BASE_PATH=./data

Generate an API token in your Atlassian account under Account Settings → Security → API tokens.
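
As a rough illustration of how the settings above might be read and validated, here is a minimal sketch. It assumes the `.env` file has already been loaded into the process environment; `SyncConfig` and `load_config` are hypothetical names, and the fallback defaults simply mirror the example values shown above rather than anything confirmed from the project's source.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SyncConfig:
    url: str
    email: str
    api_token: str
    spaces: tuple[str, ...]      # empty tuple means "all spaces"
    include_attachments: bool
    mode: str                    # 'full' or 'incremental'
    rate_limit_delay: float      # seconds between API requests
    base_path: str


def load_config(env: Mapping[str, str] = os.environ) -> SyncConfig:
    """Build a SyncConfig from environment-style key/value pairs."""
    mode = env.get("SYNC_MODE", "incremental").strip().lower()
    if mode not in ("full", "incremental"):
        raise ValueError(f"SYNC_MODE must be 'full' or 'incremental', got {mode!r}")

    # Split the comma-separated space keys and drop empty entries.
    raw_spaces = env.get("CONFLUENCE_SPACES", "")
    spaces = tuple(key.strip() for key in raw_spaces.split(",") if key.strip())

    attachments = env.get("CONFLUENCE_INCLUDE_ATTACHMENTS", "false")

    return SyncConfig(
        url=env["CONFLUENCE_URL"].rstrip("/"),   # required, KeyError if missing
        email=env["CONFLUENCE_EMAIL"],           # required
        api_token=env["CONFLUENCE_API_TOKEN"],   # required
        spaces=spaces,
        include_attachments=attachments.strip().lower() in ("1", "true", "yes"),
        mode=mode,
        rate_limit_delay=float(env.get("SYNC_RATE_LIMIT_DELAY", "0.1")),
        base_path=env.get("STORAGE_BASE_PATH", "./data"),
    )
```

Failing early on a missing credential or an unknown `SYNC_MODE` tends to give a clearer error than a failed API request later in the run.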

Usage

# Show help
python main.py --help

# Incremental sync of all configured spaces (default mode)
python main.py

# Full sync of all spaces
python main.py --mode full

# Limit to 2 spaces (useful for testing)
python main.py --mode full --space_limit 2

# Sync a single space
python main.py --space ENG

# Sync and regenerate README indexes
python main.py --mode incremental --generate_readme

Output layout

data/
├── pages/
│   ├── ENG/
│   │   ├── README.md          # generated index
│   │   ├── Getting_Started.md
│   │   └── Architecture.md
│   └── HR/
│       └── Onboarding.md
├── metadata/
│   └── <page_id>.json
└── sync_state.json            # incremental sync timestamps
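
The role of sync_state.json in incremental mode can be sketched as follows. The real file format is not documented here, so this example assumes a flat JSON object mapping page IDs to the last-modified timestamp seen on the previous run; `load_state`, `save_state`, and `needs_sync` are hypothetical helpers, not functions from the project.

```python
import json
from pathlib import Path


def load_state(path: Path) -> dict[str, str]:
    """Return the saved page_id -> last_modified map, or {} on a first run."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(path: Path, state: dict[str, str]) -> None:
    """Persist the map so the next run can skip unchanged pages."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state, indent=2, sort_keys=True), encoding="utf-8")


def needs_sync(page_id: str, last_modified: str, state: dict[str, str]) -> bool:
    """True when the page is new or its timestamp differs from the recorded one."""
    return state.get(page_id) != last_modified
```

Comparing the timestamp strings for equality avoids date parsing altogether; it treats any change in the reported value as a modification, which is the conservative choice for a sync tool.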

Each Markdown file starts with a metadata header:

# Page Title

- **Space**: ENG
- **Page ID**: 123456
- **Last Modified**: 2026-01-15T10:30:00.000Z
- **URL**: https://your-domain.atlassian.net/wiki/...

---

Page content …
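
For readers who want to produce or parse files in this layout, the header above could be generated by something like the following sketch. `render_page` and its parameter names are hypothetical; only the output format is taken from the example above.

```python
def render_page(
    title: str,
    space_key: str,
    page_id: str,
    last_modified: str,
    url: str,
    body: str,
) -> str:
    """Return the Markdown document: title, metadata bullets, rule, then content."""
    lines = [
        f"# {title}",
        "",
        f"- **Space**: {space_key}",
        f"- **Page ID**: {page_id}",
        f"- **Last Modified**: {last_modified}",
        f"- **URL**: {url}",
        "",
        "---",
        "",
        body,
    ]
    # Join with newlines and end the file with a single trailing newline.
    return "\n".join(lines) + "\n"
```

Because the header is a fixed sequence of lines ending at the first `---`, a consumer can split on that rule to separate metadata from the page content.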

Project structure

confluence-sync/
├── main.py                    # CLI entry point
├── src/
│   ├── confluence_crawler.py  # Atlassian API client & pagination
│   ├── sync_manager.py        # Orchestrates crawl + storage
│   ├── storage.py             # Saves Markdown, metadata, sync state
│   ├── readme_generator.py    # Per-space README.md generation
│   └── utils/
│       ├── logger.py          # Console + file logging
│       └── markdown_converter.py  # HTML → Markdown conversion
├── tests/                     # Manual integration test scripts
├── .env.example
├── requirements.txt
└── LICENSE

License

MIT
