Childcity/confluence-sync

confluence-sync

Sync Confluence spaces and pages to local Markdown files — with full and incremental modes, rate-limit-aware crawling, and per-space README generation.

Features

  • Full sync — downloads every page across all (or selected) spaces
  • Incremental sync — skips pages that haven't changed since the last run
  • Single-space sync — target one space with --space
  • README generation — writes a README.md table of contents inside each space directory
  • Rate-limited API requests with exponential-backoff retry
  • Pages saved as clean Markdown with a metadata header
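The rate limiting and retry behaviour listed above can be pictured roughly as follows. This is a minimal sketch, not the project's `confluence_crawler.py`. The names `call_with_backoff` and `RetryableError` are hypothetical, and the sketch assumes the caller raises `RetryableError` for responses worth retrying (for example HTTP 429 or 5xx). The fixed `rate_limit_delay` pause plays the role of `SYNC_RATE_LIMIT_DELAY`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. HTTP 429 or 5xx)."""


def call_with_backoff(
    request: Callable[[], T],
    *,
    max_retries: int = 5,
    base_delay: float = 1.0,
    rate_limit_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying RetryableError with exponentially growing waits."""
    attempt = 0
    while True:
        # Fixed pause before every request, analogous to SYNC_RATE_LIMIT_DELAY.
        sleep(rate_limit_delay)
        try:
            return request()
        except RetryableError:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            # Back off base_delay * 2**attempt seconds: 1, 2, 4, ... with the defaults.
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

Injecting `sleep` keeps the helper testable without real waiting. A production version would likely also add random jitter and honour a `Retry-After` header when the server sends one.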

Requirements

  • Python 3.11+
  • A Confluence Cloud account with an API token

Installation

git clone https://github.com/Childcity/confluence-sync
cd confluence-sync
python -m venv venv
venv\Scripts\activate   # Windows
# source venv/bin/activate  # macOS / Linux
pip install -r requirements.txt

Configuration

Copy .env.example to .env and fill in your credentials:

CONFLUENCE_EMAIL=your-email@example.com
CONFLUENCE_API_TOKEN=your-api-token
CONFLUENCE_URL=https://your-domain.atlassian.net

CONFLUENCE_SPACES=                  # comma-separated keys, empty = all spaces
CONFLUENCE_INCLUDE_ATTACHMENTS=false

SYNC_MODE=incremental               # 'full' or 'incremental'
SYNC_RATE_LIMIT_DELAY=0.1           # seconds between API requests
STORAGE_BASE_PATH=./data

Generate an API token in your Atlassian account settings under Security → API tokens (https://id.atlassian.com/manage-profile/security/api-tokens).
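The README does not show how these variables are read. Assuming they are already in the process environment (for instance loaded from `.env` by a library such as python-dotenv, which is an assumption), a hypothetical `load_config` could parse them like this, including the rule that an empty `CONFLUENCE_SPACES` means all spaces:

```python
import os
from dataclasses import dataclass, field
from typing import Mapping


@dataclass
class SyncConfig:
    email: str
    api_token: str
    base_url: str
    spaces: list[str] = field(default_factory=list)  # empty list means "all spaces"
    include_attachments: bool = False
    mode: str = "incremental"
    rate_limit_delay: float = 0.1  # seconds between API requests
    storage_base_path: str = "./data"


def _as_bool(value: str) -> bool:
    return value.strip().lower() in ("1", "true", "yes")


def load_config(env: Mapping[str, str] = os.environ) -> SyncConfig:
    """Build a SyncConfig from environment variables (hypothetical loader)."""
    mode = env.get("SYNC_MODE", "incremental").strip().lower()
    if mode not in ("full", "incremental"):
        raise ValueError(f"SYNC_MODE must be 'full' or 'incremental', got {mode!r}")
    raw_spaces = env.get("CONFLUENCE_SPACES", "")
    return SyncConfig(
        email=env["CONFLUENCE_EMAIL"],
        api_token=env["CONFLUENCE_API_TOKEN"],
        base_url=env["CONFLUENCE_URL"].rstrip("/"),
        # "ENG, HR" -> ["ENG", "HR"]; blank entries are dropped.
        spaces=[key.strip() for key in raw_spaces.split(",") if key.strip()],
        include_attachments=_as_bool(env.get("CONFLUENCE_INCLUDE_ATTACHMENTS", "false")),
        mode=mode,
        rate_limit_delay=float(env.get("SYNC_RATE_LIMIT_DELAY", "0.1")),
        storage_base_path=env.get("STORAGE_BASE_PATH", "./data"),
    )
```

Validating `SYNC_MODE` up front means a typo fails immediately instead of silently falling back to one of the two modes.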

Usage

# Show help
python main.py --help

# Incremental sync of all configured spaces (default mode)
python main.py

# Full sync of all spaces
python main.py --mode full

# Limit to 2 spaces (useful for testing)
python main.py --mode full --space_limit 2

# Sync a single space
python main.py --space ENG

# Sync and regenerate README indexes
python main.py --mode incremental --generate_readme
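The flags above could be declared with the standard-library `argparse` module roughly as below. Only the flag names come from this README; `build_parser`, `resolve_mode`, and the fallback from `--mode` to `SYNC_MODE` are assumptions about how `main.py` behaves, not its actual code.

```python
import argparse
from typing import Mapping, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Sync Confluence spaces and pages to local Markdown files."
    )
    parser.add_argument("--mode", choices=["full", "incremental"], default=None,
                        help="sync mode (assumed to fall back to SYNC_MODE)")
    parser.add_argument("--space", metavar="KEY", help="sync only this space key")
    parser.add_argument("--space_limit", type=int, metavar="N",
                        help="process at most N spaces")
    parser.add_argument("--generate_readme", action="store_true",
                        help="regenerate per-space README.md indexes")
    return parser


def resolve_mode(cli_mode: Optional[str], env: Mapping[str, str]) -> str:
    # Assumed precedence: explicit --mode, then SYNC_MODE, then "incremental".
    return cli_mode or env.get("SYNC_MODE", "incremental")
```

Leaving `--mode` defaulting to `None` is what lets the environment value act as a fallback; a hard-coded default would always override `SYNC_MODE`.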

Output layout

data/
├── pages/
│   ├── ENG/
│   │   ├── README.md          # generated index
│   │   ├── Getting_Started.md
│   │   └── Architecture.md
│   └── HR/
│       └── Onboarding.md
├── metadata/
│   └── <page_id>.json
└── sync_state.json            # incremental sync timestamps
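One plausible shape for the incremental check is sketched below. It assumes, hypothetically, that `sync_state.json` maps each page ID to the last-modified timestamp seen on the previous run, so a page is re-downloaded only when Confluence reports a newer timestamp. The real file may store per-space timestamps instead, and the helper names are illustrative.

```python
import json
from datetime import datetime
from pathlib import Path


def _parse(ts: str) -> datetime:
    # Normalise the trailing "Z" so fromisoformat also accepts it before Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def load_state(path: Path) -> dict[str, str]:
    """Read {page_id: last_modified}; a missing file means "nothing synced yet"."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(path: Path, state: dict[str, str]) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2, sort_keys=True), encoding="utf-8")
    # Write to a temp file, then rename over the target, so an interrupted
    # run does not leave a half-written state file behind.
    tmp.replace(path)


def needs_sync(state: dict[str, str], page_id: str, last_modified: str) -> bool:
    """True if the page is new or changed since the recorded timestamp."""
    previous = state.get(page_id)
    return previous is None or _parse(last_modified) > _parse(previous)
```

Comparing parsed datetimes rather than raw strings keeps the check correct even if two timestamps use different offsets or fractional-second precision.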

Each Markdown file starts with a metadata header:

# Page Title

- **Space**: ENG
- **Page ID**: 123456
- **Last Modified**: 2026-01-15T10:30:00.000Z
- **URL**: https://your-domain.atlassian.net/wiki/...

---

Page content …
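A header in this format, plus a file name like `Getting_Started.md`, could be produced as below. `render_page` and `safe_filename` are hypothetical helpers; the file-name rule (strip characters Windows forbids, turn whitespace into underscores) is inferred from the example layout rather than taken from `storage.py`.

```python
import re


def safe_filename(title: str) -> str:
    # Hypothetical rule: drop characters Windows forbids in file names,
    # then turn runs of whitespace into underscores.
    name = re.sub(r'[<>:"/\\|?*]', "", title)
    name = re.sub(r"\s+", "_", name.strip())
    return (name or "untitled") + ".md"


def render_page(title: str, space_key: str, page_id: str,
                last_modified: str, url: str, body_markdown: str) -> str:
    """Prefix converted page content with a metadata header in the format above."""
    header = "\n".join([
        f"# {title}",
        "",
        f"- **Space**: {space_key}",
        f"- **Page ID**: {page_id}",
        f"- **Last Modified**: {last_modified}",
        f"- **URL**: {url}",
        "",
        "---",
        "",
    ])
    return header + "\n" + body_markdown.strip() + "\n"
```

For example, `safe_filename("Getting Started")` yields `Getting_Started.md`, matching the layout shown earlier.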

Project structure

confluence-sync/
├── main.py                    # CLI entry point
├── src/
│   ├── confluence_crawler.py  # Atlassian API client & pagination
│   ├── sync_manager.py        # Orchestrates crawl + storage
│   ├── storage.py             # Saves Markdown, metadata, sync state
│   ├── readme_generator.py    # Per-space README.md generation
│   └── utils/
│       ├── logger.py          # Console + file logging
│       └── markdown_converter.py  # HTML → Markdown conversion
├── tests/                     # Manual integration test scripts
├── .env.example
├── requirements.txt
└── LICENSE
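`readme_generator.py` writes the per-space `README.md` index, but its exact output format is not documented here. The sketch below assumes a simple bulleted list of links, taking each title from the page's leading `# Title` line (as in the metadata header) and falling back to the file name. `build_space_readme` and `write_space_readme` are hypothetical names.

```python
from pathlib import Path
from urllib.parse import quote


def _page_title(page: Path) -> str:
    # Prefer the "# Title" line at the top of each page; fall back to the file name.
    first_line = page.read_text(encoding="utf-8").split("\n", 1)[0]
    if first_line.startswith("# "):
        return first_line[2:].strip()
    return page.stem.replace("_", " ")


def build_space_readme(space_dir: Path, space_key: str) -> str:
    """Return a Markdown list linking to every page in space_dir."""
    pages = sorted(p for p in space_dir.glob("*.md") if p.name != "README.md")
    lines = [f"# {space_key}", ""]
    for page in pages:
        # quote() keeps link targets valid even if a file name contains spaces.
        lines.append(f"- [{_page_title(page)}]({quote(page.name)})")
    return "\n".join(lines) + "\n"


def write_space_readme(space_dir: Path, space_key: str) -> Path:
    target = space_dir / "README.md"
    target.write_text(build_space_readme(space_dir, space_key), encoding="utf-8")
    return target
```

Excluding `README.md` from the glob keeps the index stable when it is regenerated on every run with `--generate_readme`.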

License

MIT