
PDF Autofillr

AI-powered PDF form filling system. Extracts fields from PDF forms, maps them to your data schema using an LLM, and fills them automatically.


How it works

PDF template + schema keys
         │
         ▼
   make_embed_file          ← run once per PDF template
   (extract → map → embed)
         │
         ▼
   Embedded PDF             ← reusable template with metadata baked in
         │
         ▼
   fill(input_json)         ← run once per user
         │
         ▼
   Filled PDF
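The split between the two phases is the point: the expensive LLM mapping runs once per template, and every subsequent fill just reuses the field-to-key mapping baked into the PDF. A toy sketch of that idea (hypothetical names, not the project's actual API):

```python
# Toy illustration: map once per template, fill many times.
# expensive_llm_map is a stand-in for the real extract -> map step.
def expensive_llm_map(template_fields, schema_keys):
    # Pretend an LLM matched each PDF field to a schema key.
    return {f: k for f, k in zip(template_fields, schema_keys)}

class EmbeddedTemplate:
    """Holds the field->key mapping, like metadata embedded in the PDF."""
    def __init__(self, mapping):
        self.mapping = mapping

    def fill(self, user_data):
        # Cheap per-user step: plain lookups, no LLM involved.
        return {field: user_data[key] for field, key in self.mapping.items()}

mapping = expensive_llm_map(["name_1", "name_2"], ["firstName", "lastName"])
template = EmbeddedTemplate(mapping)
print(template.fill({"firstName": "Jane", "lastName": "Doe"}))
# → {'name_1': 'Jane', 'name_2': 'Doe'}
```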

Project structure

pdf-autofillr/
├── modules/
│   ├── mapper/             # Core engine (production-ready)
│   │   ├── src/            # Server-side business logic
│   │   ├── sdk/            # Python SDK (pip install pdf-autofiller-mapper)
│   │   ├── entrypoints/    # local, HTTP server, Lambda, Azure, GCP
│   │   ├── deployment/     # Docker
│   │   ├── docs/           # Module-level docs
│   │   ├── tests/          # 169 tests
│   │   ├── api_server.py   # FastAPI entry point
│   │   └── README.md       # Module guide
│   │
│   └── chatbot/            # Conversational data collection (separate service)
│
├── sdks/
│   ├── openapi-mapper.yaml         # OpenAPI spec for mapper
│   ├── openapi-chatbot.yaml
│   ├── openapi-rag.yaml
│   ├── openapi-upload.yaml
│   └── typescript/                 # TypeScript HTTP client
│
├── docs/
│   ├── architecture/               # System design docs
│   ├── guides/                     # Per-module guides
│   └── MIGRATION_SDK_INTO_MODULES.md
│
├── benchmarks/                     # Model evaluation — datasets, tasks, metrics, leaderboard
│   ├── datasets/                   # PDF categories (financial, medical, legal, …)
│   ├── tasks/                      # field_extraction, field_mapping, form_filling
│   ├── metrics/                    # Scoring functions
│   ├── models/                     # Model config cards (gpt-4o, claude, llama, …)
│   ├── results/                    # Benchmark run outputs + leaderboard
│   └── run_benchmark.py            # Entry point
│
├── data/                           # Shared sample PDFs and JSON fixtures
├── examples/                       # Usage examples (HTTP API, direct SDK)
├── Makefile                        # Common commands
├── setup.sh / setup.ps1            # One-time project setup
└── start.sh / stop.sh              # Server lifecycle

Quick start

Option 1 — Automated setup

./setup.sh                        # Mac / Linux
# or
pwsh -File setup.ps1              # Windows

Option 2 — Manual setup

cd modules/mapper
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
cp config.ini.example config.ini
# set llm_model and API keys in config.ini
python api_server.py
# → http://localhost:8000

Option 3 — Docker

cd modules/mapper/deployment/docker
./docker-build.sh
./docker-run-local.sh

Set your LLM key before starting:

export OPENAI_API_KEY=sk-...          # OpenAI
export ANTHROPIC_API_KEY=sk-ant-...   # Anthropic
# or use Ollama (free, local): set llm_model = ollama/llama3.1 in config.ini

Python SDK

pip install pdf-autofiller-mapper           # HTTP client only
pip install pdf-autofiller-mapper[embedded] # + in-process pipeline

# Embedded (in-process, no server needed)
from pdf_autofiller_mapper import PDFMapper

mapper = PDFMapper(config_path="config.ini")
result = mapper.make_embed_file("form.pdf", "schema_keys.json")
result.save("form_embedded.pdf")

filled = mapper.fill("form_embedded.pdf", {"firstName": "Jane", "lastName": "Doe"})
filled.save("filled.pdf")

# HTTP client (talks to a running server / Docker container)
from pdf_autofiller_mapper import PDFMapperClient

with PDFMapperClient("http://localhost:8000") as client:
    result = client.mapper.make_embed_file(pdf_path="s3://bucket/form.pdf")

Full SDK guide: modules/mapper/sdk/README.md


API endpoints

Method  Path               Description
------  ----               -----------
GET     /health            Health check
POST    /extract           Extract form fields from PDF
POST    /map               LLM semantic mapping
POST    /embed             Embed field metadata into PDF
POST    /fill              Fill embedded PDF with user data
POST    /make-embed-file   extract + map + embed in one call
POST    /fill-pdf          Alias for fill
POST    /run-all           Full pipeline (make-embed + fill)
POST    /check-embed-file  Check if PDF has embedded metadata

Full API reference: modules/mapper/docs/api_server.md
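For quick experiments without the SDK, the endpoints can be hit directly over HTTP. A minimal sketch using Python's standard library, assuming /fill accepts a JSON body (the field names "pdf_path" and "data" are illustrative assumptions; check the API reference above for the real request shape):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"

def build_fill_request(pdf_path, data, base=BASE_URL):
    """Build a POST /fill request. The JSON field names used here are
    assumptions; see modules/mapper/docs/api_server.md for the real shape."""
    body = json.dumps({"pdf_path": pdf_path, "data": data}).encode()
    return urllib.request.Request(
        f"{base}/fill",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def fill_pdf(pdf_path, data):
    # Sends the request to a running server (see Quick start).
    with urllib.request.urlopen(build_fill_request(pdf_path, data)) as resp:
        return json.loads(resp.read())
```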


Makefile commands

make setup          # Full automated setup
make start          # Start the API server
make dev            # Start with auto-reload
make stop           # Stop the server
make health         # curl /health
make test           # Run all tests (169 tests)
make install        # Install Python dependencies
make install-sdk    # Install mapper Python SDK
make docker-build   # Build Docker image
make docker-run     # Run Docker container

Configuration

Edit modules/mapper/config.ini (copied from config.ini.example):

[general]
llm_model = gpt-4o-mini       # or anthropic/claude-3-5-haiku, ollama/llama3.1
source_type = local            # local | aws | azure | gcp

[local]
workspace = /path/to/data
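As a sanity check, a file in this format parses cleanly with Python's standard configparser (illustrative only; the mapper uses its own loader, covered in the configuration reference):

```python
import configparser

# Parse a config.ini shaped like the one above (sample values, no
# inline comments, since configparser keeps those by default).
sample = """
[general]
llm_model = gpt-4o-mini
source_type = local

[local]
workspace = /path/to/data
"""

cfg = configparser.ConfigParser()
cfg.read_string(sample)
print(cfg["general"]["llm_model"])   # → gpt-4o-mini
print(cfg["local"]["workspace"])     # → /path/to/data
```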

Full configuration reference: modules/mapper/docs/setup_guide.md


Tests

# Mapper module tests
cd modules/mapper
venv/bin/python -m pytest tests/ --override-ini="addopts=" -q
# 169 passed

# SDK tests
cd modules/mapper/sdk
venv/bin/python -m pytest tests/ -q
# 101 passed

Documentation

Topic           Link
-----           ----
Mapper module   modules/mapper/README.md
SDK guide       modules/mapper/sdk/README.md
API server      modules/mapper/docs/api_server.md
Setup guide     modules/mapper/docs/setup_guide.md
Docker          modules/mapper/docs/docker.md
Architecture    docs/architecture/system-overview.md
Module guides   docs/guides/
OpenAPI specs   sdks/
Benchmarks      benchmarks/README.md

License

MIT — see LICENSE
