Centralized publishing flow: upload a PDF, extract text (with OCR fallback), and generate Markdown with titles and file names derived from page content. The pipeline syncs the result into the single Docusaurus project at `demo_docs/` (replacing all docs there), runs `npm run build`, then copies the resulting `demo_docs/build/` tree to `backend/templates/doc_builds/<PublishedItem UUID>/` (one folder per upload, named after the row's id) and serves it at `/viewer/<uuid>/` (or your `DOCUSAURUS_BASE_URL` prefix). Read-only metadata is available over HTTP. Storage is treated as budgeted and renewable; this repo does not claim permanence.
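The UUID-to-path mapping above can be sketched as follows. This is illustrative only: the function names and constants are not the repo's actual helpers, and the defaults mirror the env vars documented in this README.

```python
from pathlib import Path
from uuid import UUID

# Defaults from this README; both are overridable via env vars.
BUILDS_ROOT = Path("backend/templates/doc_builds")  # ACCESSDOC_DOCUSAURUS_BUILDS_ROOT
BASE_URL = "/viewer/"                               # DOCUSAURUS_BASE_URL

def snapshot_dir(item_uuid: str) -> Path:
    """Folder that receives a copy of demo_docs/build/ for one upload."""
    return BUILDS_ROOT / str(UUID(item_uuid))  # UUID() validates the id

def viewer_prefix(item_uuid: str) -> str:
    """Public URL prefix the snapshot is served under (also its build-time baseUrl)."""
    return f"{BASE_URL}{UUID(item_uuid)}/"
```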
| Path | Purpose |
|---|---|
| `backend/` | Django app, REST API, staff upload UI, Celery tasks; viewer routes under `/viewer/<uuid>/` |
| `demo_docs/` | Only Docusaurus builder: pipeline replaces `docs/`, `sidebars.ts`, and `static/img/publish/` on each publish, then builds here |
| `backend/utils/` | PDF processing (`main.py`), publish pipeline (`pipeline.py`), plus `input_files/` fixtures |
| `backend/templates/doc_builds/<uuid>/` | Snapshot of `demo_docs/build/` per upload (`<uuid>` = `PublishedItem.pk`; contents gitignored) |
Doc structure inside `demo_docs/docs/`: `intro.md` (overview) and a `document/` category ("Pages from source") containing one page per PDF page in order, with optional category index.
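For a three-page PDF, the staged tree might look like this (file names are illustrative; the pipeline derives them from page content):

```text
demo_docs/docs/
├── intro.md                  # overview page
└── document/                 # "Pages from source" category
    ├── _category_.json       # optional category index/metadata
    ├── 01-first-page-title.md
    ├── 02-second-page-title.md
    └── 03-third-page-title.md
```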
- Python 3.11+ (3.13 works if your stack matches)
- PostgreSQL 14+ (required; SQLite is not supported)
- Node.js 20+ and npm (the worker runs `npm run build` in `demo_docs/`; run `npm install` there once after clone)
- Redis (Celery broker and result backend by default)
- Tesseract (optional but recommended if PDFs need OCR; same as `backend/utils/` usage)
- PyMuPDF / Fitz and Pillow (installed via `backend/requirements.txt`)
Start Postgres (example — adjust user, password, and database name to match DATABASE_URL):
```shell
docker run --name accessdoc-pg \
  -e POSTGRES_USER=accessdoc \
  -e POSTGRES_PASSWORD=accessdoc \
  -e POSTGRES_DB=accessdoc \
  -p 5432:5432 -d postgres:16-alpine
```

From the repository root:

```shell
cd backend
python3 -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env        # edit DATABASE_URL if your Postgres differs
python manage.py migrate
python manage.py collectstatic --noinput
python manage.py createsuperuser
# non-interactive (e.g. scripts):
#   DJANGO_SUPERUSER_USERNAME=admin DJANGO_SUPERUSER_EMAIL=admin@localhost \
#   DJANGO_SUPERUSER_PASSWORD=admin python manage.py createsuperuser --noinput
```

See `backend/.env.example`. Common values:
- `DATABASE_URL` — required, e.g. `postgresql://accessdoc:accessdoc@127.0.0.1:5432/accessdoc` (Railway/Heroku provide this automatically for managed Postgres)
- `CELERY_BROKER_URL` / `CELERY_RESULT_BACKEND` — default `redis://127.0.0.1:6379/0`
- `DOCUSAURUS_BASE_URL` — default `/viewer/`; each snapshot is served at `{DOCUSAURUS_BASE_URL}<uuid>/` with matching `baseUrl` at build time (`/viewer/<uuid>/`)
- `ACCESSDOC_DOCUSAURUS_ROOT` — Docusaurus project path (default: repo `demo_docs/`)
- `ACCESSDOC_DOCUSAURUS_BUILDS_ROOT` — where `build/` snapshots are stored (default: `backend/templates/doc_builds/`)
- `ACCESSDOC_REPO_ROOT` — override if the repo is not the parent of `backend/`
- `ACCESSDOC_MAX_UPLOAD_MB` — default `25`
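A minimal `.env` for local development might look like this (values are examples matching the defaults above, not the shipped `.env.example`; adjust to your setup):

```ini
DATABASE_URL=postgresql://accessdoc:accessdoc@127.0.0.1:5432/accessdoc
CELERY_BROKER_URL=redis://127.0.0.1:6379/0
CELERY_RESULT_BACKEND=redis://127.0.0.1:6379/0
DOCUSAURUS_BASE_URL=/viewer/
ACCESSDOC_MAX_UPLOAD_MB=25
```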
Install dependencies once:
```shell
cd demo_docs
npm install
```

Smoke-build (optional):

```shell
DOCUSAURUS_BASE_URL=/viewer/00000000-0000-4000-8000-000000000000/ \
DOCUSAURUS_SITE_TITLE="Smoke test" npm run build
```

The upload pipeline runs `npm run build` in `demo_docs/` after writing docs. Do not run two publishes concurrently against the same `demo_docs/` tree without an external lock; they share the same `docs/` working directory.
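One way to add that external lock is an advisory file lock around the stage-and-build step. This is a POSIX-only sketch with illustrative names, not code from the repo:

```python
import fcntl
from contextlib import contextmanager

@contextmanager
def publish_lock(lock_path="/tmp/accessdoc-publish.lock"):
    """Block until no other publish holds the lock, then hold it for the body."""
    with open(lock_path, "w") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)   # a second publisher waits here
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)

# Usage inside the task, roughly (the three helpers are hypothetical):
# with publish_lock():
#     stage_docs_into_demo_docs()
#     run_npm_build()
#     copy_build_snapshot()
```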
1. PostgreSQL — must match `DATABASE_URL` in `.env` (see Backend setup).
2. Redis

```shell
redis-server
# or: docker run --rm -p 6379:6379 redis:7
```

3. Celery worker (must see `node` and `npm` on PATH when processing uploads). Run this from the `backend/` directory so Python can import the `accessdoc` package.

```shell
cd backend
source .venv/bin/activate
celery -A accessdoc worker -l info
```

4. Django

```shell
cd backend
source .venv/bin/activate
python manage.py runserver
```

5. Use the app
- Public home: http://127.0.0.1:8000/
- Staff sign-in (web app): http://127.0.0.1:8000/login/
- Django admin: http://127.0.0.1:8000/admin/login/
- Upload (staff): http://127.0.0.1:8000/upload/
- Dashboard (staff): http://127.0.0.1:8000/dashboard/ — "Open docs" uses the item's UUID (same as the API)
- Read-only API (published items only): http://127.0.0.1:8000/api/items/
- Viewer: http://127.0.0.1:8000/viewer/<item-uuid>/docs/intro/
The Dockerfile only prepares the codebase: Node 20, `npm ci` in `demo_docs/`, Python deps, `collectstatic`. Nothing long-running is started during the image build except what Django needs for static collection.

`docker compose up --build` starts three services from that image:
- postgres — PostgreSQL 16; user/password/db `accessdoc` (override by changing compose `DATABASE_URL` and `POSTGRES_*` together).
- redis — broker and result backend for Celery.
- web — migrates, creates an optional default superuser (admin / admin unless `SKIP_DEFAULT_SUPERUSER=1`), starts Celery in the background (`ACCESSDOC_EMBEDDED_CELERY_WORKER=1`), then Gunicorn on port 8000 with `ACCESSDOC_USE_CELERY=true` so uploads enqueue tasks and the same container consumes them.
Named volumes persist `postgres_data` (database), `media/`, and `templates/doc_builds/` so uploads and builds stay on disk across restarts.
For a dedicated worker container only (no HTTP), run the image with `ACCESSDOC_CONTAINER_ROLE=worker` (see `docker/entrypoint.sh`). Do not run that and `ACCESSDOC_EMBEDDED_CELERY_WORKER=1` on separate replicas of the same app, or tasks may be processed twice.
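To split the roles, a compose override along these lines could run a dedicated worker next to a web container with the embedded worker disabled. Service names and the `build` key are assumptions for illustration, not taken from this repo's compose file:

```yaml
services:
  web:
    environment:
      ACCESSDOC_EMBEDDED_CELERY_WORKER: "0"   # web serves HTTP only
  worker:
    build: .                                  # same image as web
    environment:
      ACCESSDOC_CONTAINER_ROLE: worker        # entrypoint starts Celery instead of Gunicorn
    depends_on:
      - postgres
      - redis
```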
```shell
# From repo root — set DJANGO_SECRET_KEY in the environment or a .env file next to compose
export DJANGO_SECRET_KEY=$(python3 -c "import secrets; print(secrets.token_urlsafe(50))")
docker compose up --build
# http://127.0.0.1:8000/
```

Tests require a running PostgreSQL instance (Django creates a `test_*` database on the same server). Example:
```shell
# If DATABASE_URL in .env points at localhost:5432
cd backend
source .venv/bin/activate
python manage.py test items
```

Troubleshooting:

- Upload stuck on "Pending" (production / Celery): Typical causes: (1) No worker — Celery is not running; Docker Compose in this repo runs an embedded worker with Gunicorn (`ACCESSDOC_EMBEDDED_CELERY_WORKER=1`). On Railway or a single container, set the same. (2) Wrong Redis URL — platforms often set `REDIS_URL`; this project falls back to it when `CELERY_BROKER_URL` is unset. If both are missing, Django defaults to `127.0.0.1` and task enqueue fails (check server logs for `Failed to queue process_published_item`). (3) Missing or different `DATABASE_URL` — every process must use the same Postgres. (4) Split media disk — if web and worker run on different machines without shared storage, the worker may not see uploaded PDFs under `MEDIA_ROOT`; use shared volumes or a single container with embedded Celery.
- Upload stuck on "Pending" (local): With `DEBUG=true`, PDF jobs often run in a background thread. Use `ACCESSDOC_USE_CELERY=true` and `celery -A accessdoc worker` with Redis.
- Build fails inside the task: From `demo_docs/`, run `npm install` and `npm run build` manually to capture errors; use Node 20+.
- Viewer 404 or broken styling: The snapshot must be built with `DOCUSAURUS_BASE_URL=/viewer/<that-item's-uuid>/`. The task sets this automatically.
- Concurrent uploads: Serialized processing for `demo_docs/` is recommended; overlapping jobs can overwrite each other's staged docs before build.