
Destroyer1543/PaperPal-Hackathon



PaperPal - TSLAMP Hackathon

PaperPal converts unstructured manuscripts (DOCX/PDF) into a citation-safe LaTeX project and compiles a preview PDF.


What it generates

For each run, PaperPal materializes:

  • build/main.tex
  • build/references.bib
  • build/figures/...
  • build/output.pdf
  • run artifacts such as extracted.ir.json, citation_map.json, compile_result.json, and explainability.report.json
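To sanity-check a finished run, a small script can look for the files listed above. This is a minimal sketch, assuming the build/ files and the JSON artifacts sit side by side in one run directory (the exact layout under storage/projects/ is not documented here). `missing_artifacts` is a hypothetical helper, not part of PaperPal, and build/figures/ is skipped because its contents depend on the manuscript.

```python
from pathlib import Path

# Expected outputs from the list above. The layout (build/ files and JSON
# artifacts together in one run directory) is an assumption.
BUILD_FILES = ["build/main.tex", "build/references.bib", "build/output.pdf"]
RUN_ARTIFACTS = [
    "extracted.ir.json",
    "citation_map.json",
    "compile_result.json",
    "explainability.report.json",
]


def missing_artifacts(run_dir: Path) -> list[str]:
    """Return expected artifact paths (relative to run_dir) that are not files.

    build/figures/ is not checked because its contents vary per manuscript.
    """
    expected = BUILD_FILES + RUN_ARTIFACTS
    return [rel for rel in expected if not (run_dir / rel).is_file()]
```

An empty list means every expected file is present.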

Recommended Execution (Docker Compose)

Docker Compose is the primary way to run PaperPal in this repo.

Prerequisites

  • Docker Engine / Docker Desktop
  • Docker Compose plugin

1) Configure env

cp .env.example .env
cp apps/web/.env.example apps/web/.env.local

Set at least:

  • OPENAI_API_KEY
  • OPENAI_BASE_URL=https://api.openai.com/v1
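Before starting the stack, it can help to confirm that the required keys are actually filled in. The sketch below is a hypothetical helper, assuming .env uses plain KEY=VALUE lines; Docker Compose's own env-file parser supports more syntax than this.

```python
def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments.

    Deliberately small: strips one layer of matching quotes, but does not
    handle multi-line values or ${VAR} expansion.
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove surrounding "..." or '...' if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_required(values: dict[str, str], required: list[str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

For example, `missing_required(parse_env_file(open(".env").read()), ["OPENAI_API_KEY", "OPENAI_BASE_URL"])` returns the keys that still need values.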

2) Start the stack

docker compose -f infra/docker-compose.yml up -d --build

3) Open

  • Web: http://localhost:5173
  • API: http://localhost:8000
  • API docs: http://localhost:8000/docs
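The containers can take a while to build and start, so scripts that drive the API may want to wait until it responds. This is a minimal polling sketch using only the Python standard library. `wait_for_http` is a hypothetical helper, and using /docs as a readiness probe is an assumption, since no dedicated health endpoint is listed in this README.

```python
import time
import urllib.request


def wait_for_http(url: str, timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Poll url until it answers with a 2xx status or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if 200 <= resp.status < 300:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections and timeouts are all
            # OSError subclasses: the service is not ready yet.
            pass
        time.sleep(interval_s)
    return False
```

Usage: `wait_for_http("http://localhost:8000/docs")` returns True once the API serves its docs page.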

Local Installation (without Docker, optional)

Prerequisites

  • Python >=3.10 (3.11 recommended)
  • Node.js >=18 (20 recommended)
  • PostgreSQL (local instance)
  • Redis (local instance)
  • Pandoc (required for DOCX extraction)
  • For the compile stage:
    • either Docker Engine/Desktop (PAPERPAL_COMPILE_USE_DOCKER=1), or
    • local latexmk + TeX Live (PAPERPAL_COMPILE_USE_DOCKER=0)
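A quick preflight for the command-line prerequisites can be written with `shutil.which`. This is a sketch, assuming the executables are named node, npm, pandoc, latexmk and docker on PATH; PostgreSQL and Redis are services and are not probed. `missing_tools` is a hypothetical helper.

```python
import shutil
from typing import Callable, Optional


def missing_tools(
    use_docker_compile: bool,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return prerequisite executables that cannot be found on PATH.

    use_docker_compile mirrors PAPERPAL_COMPILE_USE_DOCKER: the compile stage
    needs docker when it is 1, and a local latexmk when it is 0.
    """
    required = ["node", "npm", "pandoc"]
    required.append("docker" if use_docker_compile else "latexmk")
    return [tool for tool in required if which(tool) is None]
```

The `which` parameter is injectable only so the check can be tested without touching the real PATH.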

1) Clone and install backend deps

python -m venv .venv
# PowerShell
.\.venv\Scripts\Activate.ps1
# bash/zsh (Linux/macOS)
source .venv/bin/activate
pip install -e ".[dev]"

2) Install frontend deps

cd apps/web
npm install
cd ../..

3) Environment setup

Copy env templates:

cp .env.example .env
cp apps/web/.env.example apps/web/.env.local

Minimum required variables in .env for the full pipeline:

  • DATABASE_URL (Postgres DSN)
  • REDIS_URL
  • OPENAI_API_KEY (for live model calls)

Commonly adjusted variables:

  • ENABLE_PDF_PHASE2 (set to true to enable PDF ingestion)
  • PAPERPAL_COMPILE_USE_DOCKER (1 uses sandbox container; 0 uses local latexmk)
  • OPENAI_BASE_URL (keep https://api.openai.com/v1 unless you use an OpenAI-compatible proxy)
  • CORS_ORIGINS (JSON list, e.g. ["http://localhost:5173"])
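The sketch below shows one way these values could be interpreted: boolean flags accepting 1/true, and CORS_ORIGINS decoded as a JSON list. It is a hypothetical illustration, not PaperPal's settings code; the accepted truthy spellings and the fallback defaults are assumptions, so check .env.example for the real defaults.

```python
import json
from typing import Mapping, Optional

# Assumed truthy spellings; PaperPal's own parser may accept a different set.
TRUTHY = {"1", "true", "yes", "on"}


def parse_flag(value: Optional[str], default: bool = False) -> bool:
    """Interpret 1/true/yes/on (any case) as True; unset or blank uses default."""
    if value is None or value.strip() == "":
        return default
    return value.strip().lower() in TRUTHY


def parse_cors_origins(value: Optional[str]) -> list[str]:
    """Decode CORS_ORIGINS, documented as a JSON list of origin strings."""
    if not value:
        return []
    origins = json.loads(value)  # raises json.JSONDecodeError (a ValueError)
    if not isinstance(origins, list) or not all(isinstance(o, str) for o in origins):
        raise ValueError("CORS_ORIGINS must be a JSON list of strings")
    return origins


def summarize(env: Mapping[str, str]) -> dict:
    """Collect the commonly adjusted settings (defaults here are placeholders)."""
    return {
        "pdf_ingestion": parse_flag(env.get("ENABLE_PDF_PHASE2")),
        "compile_in_docker": parse_flag(env.get("PAPERPAL_COMPILE_USE_DOCKER")),
        "openai_base_url": env.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        "cors_origins": parse_cors_origins(env.get("CORS_ORIGINS")),
    }
```

Parsing CORS_ORIGINS as JSON means a bare URL such as http://localhost:5173 is rejected with an error rather than silently misread.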

4) Run DB migrations

alembic upgrade head

5) Start services locally (separate terminals)

API:

uvicorn paperpal_api.main:app --reload --host 0.0.0.0 --port 8000

Pipeline worker:

rq worker pipeline --url redis://localhost:6379/0

Compile worker:

rq worker compile --url redis://localhost:6379/0

Frontend:

cd apps/web
npm run dev

Open:

  • Web: http://localhost:5173
  • API docs: http://localhost:8000/docs

Environment Variables

Authoritative template: .env.example.

The template is grouped by:

  • API/service settings
  • DB/Redis
  • storage/journals paths
  • queue names
  • pipeline + compile behavior
  • LLM/provider config
  • model-step overrides
  • compile sandbox limits
  • PDF ingestion adapter settings

Frontend env template: apps/web/.env.example

Workflow (How it works)

High-level run graph:

  1. extract_document (DOCX via the Pandoc AST; PDF via the PDF extractor when ENABLE_PDF_PHASE2 is enabled)
  2. extract_references
  3. parse_references
  4. emit_bibtex
  5. extract_citation_mentions
  6. link_citations
  7. validate_citation_coverage
  8. load_guidelines
  9. analyze_scope
  10. build_block_plan
  11. generate_blocks
  12. postprocess_citations
  13. generate_figures_tables
  14. assemble_project
  15. optional polish stages (full_manuscript_polish, secondary_manuscript_polish)
  16. precompile_validate
  17. compile (latexmk + deterministic/LLM repair loop)
  18. explainability
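The run graph above is a linear sequence, so it can be modeled as an ordered list of stage names applied to shared run state. This is a minimal sketch, assuming both polish stages are toggled by a single flag and that a failing stage stops the run; `plan_stages`, `run_pipeline` and the handler mapping are hypothetical, not PaperPal's worker code.

```python
from typing import Callable, Dict, List, Optional, Tuple

CORE_STAGES = [
    "extract_document", "extract_references", "parse_references", "emit_bibtex",
    "extract_citation_mentions", "link_citations", "validate_citation_coverage",
    "load_guidelines", "analyze_scope", "build_block_plan", "generate_blocks",
    "postprocess_citations", "generate_figures_tables", "assemble_project",
]
POLISH_STAGES = ["full_manuscript_polish", "secondary_manuscript_polish"]
FINAL_STAGES = ["precompile_validate", "compile", "explainability"]


def plan_stages(enable_polish: bool) -> List[str]:
    """Ordered stage names; the optional polish stages sit before validation."""
    return CORE_STAGES + (POLISH_STAGES if enable_polish else []) + FINAL_STAGES


def run_pipeline(
    stages: List[str],
    handlers: Dict[str, Callable[[dict], None]],
    state: dict,
) -> Tuple[List[str], Optional[str]]:
    """Run stages in order over one shared state dict.

    Returns (completed stage names, failed stage or None). A stage that raises
    stops the run, so later stages never see partial state.
    """
    completed: List[str] = []
    for name in stages:
        try:
            handlers[name](state)
        except Exception:
            return completed, name
        completed.append(name)
    return completed, None
```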

The compile loop is bounded and compile-safe:

  • deterministic fixes first
  • then a limited number of LLM patch retries
  • citation invariant enforced before compile (\cite{key} must exist in references.bib)
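These rules can be sketched as follows: check that every cited key exists in references.bib, compile, then apply deterministic fixes before a capped number of LLM patches, discarding any patch that would break the citation invariant. This is an illustrative sketch under stated assumptions: `compile_fn`, the fix functions and `llm_patch` are hypothetical callables standing in for latexmk and the model, the default of 2 LLM retries is arbitrary, and the citation regex also accepts natbib-style variants such as \citep, which goes beyond the plain \cite{key} form mentioned above.

```python
import re
from typing import Callable, List, Optional, Set

# \cite, \citep, \citet*, ... with optional [..] arguments; captures the key list.
CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")
# Key of a BibTeX entry such as "@article{smith2020,".
BIB_KEY_RE = re.compile(r"@[a-zA-Z]+\s*\{\s*([^,\s]+)\s*,")


def undefined_cite_keys(tex: str, bib: str) -> Set[str]:
    """Keys cited in the .tex source that have no entry in the .bib source."""
    cited = {
        key.strip()
        for group in CITE_RE.findall(tex)
        for key in group.split(",")
        if key.strip()
    }
    return cited - set(BIB_KEY_RE.findall(bib))


def compile_with_repairs(
    tex: str,
    bib: str,
    compile_fn: Callable[[str], Optional[str]],  # None on success, else error log
    deterministic_fixes: List[Callable[[str, str], str]],
    llm_patch: Callable[[str, str], str],
    max_llm_retries: int = 2,
) -> tuple[str, bool]:
    """Bounded repair loop: deterministic fixes first, then a few LLM patches."""
    missing = undefined_cite_keys(tex, bib)
    if missing:
        raise ValueError(f"undefined citation keys: {sorted(missing)}")
    error = compile_fn(tex)
    repairs = list(deterministic_fixes) + [llm_patch] * max_llm_retries
    for repair in repairs:
        if error is None:
            break
        candidate = repair(tex, error)
        if undefined_cite_keys(candidate, bib):
            continue  # never compile a patch that breaks the citation invariant
        tex = candidate
        error = compile_fn(tex)
    return tex, error is None
```

Because the repair list is built up front, the total number of compile attempts is capped at one plus the number of deterministic fixes plus `max_llm_retries`.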

Core API

  • POST /projects
  • POST /projects/{id}/generate
  • POST /projects/{id}/compile
  • GET /projects/{id}/status?run_id=...
  • GET /projects/{id}/files
  • GET /projects/{id}/file?path=...
  • PUT /projects/{id}/file
  • GET /projects/{id}/pdf
  • GET /projects/{id}/explainability?run_id=...
  • GET /journals
  • GET /journals/{id}/guidelines-bundle
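Routes with {id} placeholders and query parameters can be assembled with the standard library. A minimal sketch, assuming the API listens on http://localhost:8000 as in the Docker setup; request and response bodies are not documented in this README (nor whether ids are integers or strings), so `api_url`, a hypothetical helper, only builds URLs.

```python
from urllib.parse import quote, urlencode

API_BASE = "http://localhost:8000"  # local API address from the setup steps


def api_url(path_template: str, base: str = API_BASE, **params) -> str:
    """Fill {id} in a route from the list above and append query parameters.

    The id is percent-encoded as a single path segment; every other keyword
    argument (e.g. run_id, path) becomes part of the query string.
    """
    item_id = params.pop("id", None)
    path = path_template
    if "{id}" in path:
        if item_id is None:
            raise ValueError(f"{path_template} needs an id")
        path = path.replace("{id}", quote(str(item_id), safe=""))
    query = urlencode(params)
    return f"{base}{path}" + (f"?{query}" if query else "")
```

The resulting URL can be passed to `urllib.request.urlopen` for GET routes such as /projects/{id}/pdf.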

Notes

  • If OPENAI_API_KEY is empty, model steps fall back to deterministic defaults (quality may degrade, but the pipeline stays compile-safe).
  • For PDF support, keep ENABLE_PDF_PHASE2=true and ensure extraction dependencies are available.
  • Runtime artifacts are written under storage/projects/ (ignored by git, except for .gitkeep).
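The fallback in the first note amounts to a guard around each model call. This sketch assumes each step supplies its own deterministic default; `run_model_step` is a hypothetical name, and the real pipeline may make this decision per step or also fall back on API errors.

```python
import os
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T")


def run_model_step(
    call_model: Callable[[], T],
    deterministic_default: Callable[[], T],
    env: Optional[Mapping[str, str]] = None,
) -> T:
    """Call the model only when OPENAI_API_KEY is non-empty; otherwise fall back."""
    env = os.environ if env is None else env
    if not env.get("OPENAI_API_KEY", "").strip():
        return deterministic_default()
    return call_model()
```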
