🧠 AI Document Analysis API

An intelligent document processing API that extracts, analyses, and summarises content from PDF, DOCX, and image files using Claude AI.

Features

| Feature | Detail |
|---|---|
| Multi-format support | PDF, DOCX, JPEG, PNG, TIFF, BMP, WEBP, GIF |
| Text extraction | PyMuPDF (PDF), python-docx (DOCX), Tesseract OCR (images + scanned PDFs) |
| AI Summarisation | Concise 2–5 sentence summaries via Claude claude-opus-4-5 |
| Entity extraction | PERSON, ORGANIZATION, LOCATION, DATE, MONEY, PRODUCT, EVENT |
| Sentiment analysis | positive / negative / neutral with confidence score |
| Authentication | API key via `X-API-Key` header |

API Reference

`POST /analyse`

Upload a document for processing.

Headers:

X-API-Key: <your-api-key>
Content-Type: multipart/form-data

Body:

file: <file upload>

Response:

{
  "filename": "report.pdf",
  "file_type": "PDF",
  "num_pages": 5,
  "word_count": 1842,
  "extracted_text": "...",
  "summary": "This report analyses...",
  "entities": [
    {"text": "John Smith", "type": "PERSON"},
    {"text": "Acme Corp", "type": "ORGANIZATION"},
    {"text": "$2.5 million", "type": "MONEY"}
  ],
  "sentiment": {
    "label": "positive",
    "score": 0.87
  },
  "metadata": {"num_pages": 5}
}
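On the client side, the response above can be loaded into typed objects before use. The following is a minimal sketch, not part of this repository: the names `Entity`, `AnalysisResult`, and `parse_analysis` are hypothetical, and it assumes every field shown in the sample response is always present. The allowed entity types and sentiment labels are taken from the Features table.

```python
from dataclasses import dataclass
from typing import Any

# Allowed values as listed in the Features table.
ENTITY_TYPES = {"PERSON", "ORGANIZATION", "LOCATION", "DATE", "MONEY", "PRODUCT", "EVENT"}
SENTIMENT_LABELS = {"positive", "negative", "neutral"}


@dataclass(frozen=True)
class Entity:
    text: str
    type: str


@dataclass(frozen=True)
class AnalysisResult:
    filename: str
    file_type: str
    num_pages: int
    word_count: int
    summary: str
    entities: list[Entity]
    sentiment_label: str
    sentiment_score: float


def parse_analysis(payload: dict[str, Any]) -> AnalysisResult:
    """Convert a decoded /analyse JSON response into an AnalysisResult.

    Hypothetical helper: raises ValueError on an unexpected entity type
    or sentiment label, and KeyError if a field is missing.
    """
    entities = [Entity(e["text"], e["type"]) for e in payload["entities"]]
    unknown = {e.type for e in entities} - ENTITY_TYPES
    if unknown:
        raise ValueError(f"Unexpected entity types: {sorted(unknown)}")

    label = payload["sentiment"]["label"]
    if label not in SENTIMENT_LABELS:
        raise ValueError(f"Unexpected sentiment label: {label!r}")

    return AnalysisResult(
        filename=payload["filename"],
        file_type=payload["file_type"],
        num_pages=payload["num_pages"],
        word_count=payload["word_count"],
        summary=payload["summary"],
        entities=entities,
        sentiment_label=label,
        sentiment_score=float(payload["sentiment"]["score"]),
    )
```

Validating the label and entity types up front means a malformed model reply fails loudly at the boundary rather than deep inside downstream code.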

`GET /health`

Returns {"status": "ok"}.
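A client can probe this endpoint before uploading documents. The sketch below uses only the standard library; `parse_health` and `is_healthy` are hypothetical names, and the 5-second timeout is an arbitrary assumption.

```python
import json
from urllib.request import urlopen


def parse_health(status: int, body: bytes) -> bool:
    """Return True only for HTTP 200 with the documented {"status": "ok"} body."""
    try:
        return status == 200 and json.loads(body) == {"status": "ok"}
    except ValueError:
        # Body was not valid JSON (for example an HTML error page).
        return False


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Call GET /health and report whether the service answered as documented."""
    try:
        with urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as resp:
            return parse_health(resp.status, resp.read())
    except OSError:
        # Covers connection errors, timeouts, and non-2xx HTTP responses.
        return False
```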

`GET /`

Returns API info and endpoint list.

Quick Start (Local)

# 1. Clone
git clone https://github.com/YOUR_USERNAME/docai.git
cd docai

# 2. Install dependencies
pip install -r requirements.txt

# 3. Set environment variables
export GROQ_API_KEY=sk-ant-...
export DOC_API_KEY=your-secret-key

# 4. Run
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

Docker

docker build -t docai .
docker run -p 8000:8000 \
  -e GROQ_API_KEY=sk-ant-... \
  -e DOC_API_KEY=your-secret-key \
  docai

Deploy to Railway (Free Tier)

1. Push this repo to GitHub
2. Go to railway.app → New Project → Deploy from GitHub
3. Select your repo
4. Add environment variables:
   - GROQ_API_KEY = your GROQ key
   - DOC_API_KEY = your chosen API key
5. Deploy; Railway will auto-detect the Dockerfile

Deploy to Render (Free Tier)

1. Push repo to GitHub
2. Go to render.com → New → Web Service
3. Connect your GitHub repo
4. Set:
   - Runtime: Docker
   - Environment Variables: GROQ_API_KEY, DOC_API_KEY
5. Deploy

Test with curl

# PDF
curl -X POST https://YOUR-DOMAIN/analyse \
  -H "X-API-Key: your-secret-key" \
  -F "file=@document.pdf"

# DOCX
curl -X POST https://YOUR-DOMAIN/analyse \
  -H "X-API-Key: your-secret-key" \
  -F "file=@report.docx"

# Image
curl -X POST https://YOUR-DOMAIN/analyse \
  -H "X-API-Key: your-secret-key" \
  -F "file=@scan.jpg"
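The same upload can be made from Python without third-party packages. This is a rough sketch rather than an official client: `build_multipart` and `analyse` are hypothetical helper names, and the form field name `file` is taken from the curl commands above.

```python
import json
import mimetypes
import uuid
from pathlib import Path
from urllib.request import Request, urlopen


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body.

    Returns (body, content_type). Assumes the filename contains no double
    quotes or line breaks; a production client should escape or reject those.
    """
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def analyse(base_url: str, api_key: str, path: str) -> dict:
    """POST a local file to /analyse and return the decoded JSON response."""
    file = Path(path)
    body, content_type = build_multipart("file", file.name, file.read_bytes())
    request = Request(
        f"{base_url.rstrip('/')}/analyse",
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": content_type},
        method="POST",
    )
    with urlopen(request) as resp:
        return json.load(resp)
```

A call such as `analyse("https://YOUR-DOMAIN", "your-secret-key", "document.pdf")` mirrors the first curl command.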

Environment Variables

| Variable | Description | Default |
|---|---|---|
| `GROQ_API_KEY` | GROQ API key (required) | (none) |
| `DOC_API_KEY` | Your chosen API key for auth | `docai-secret-key-2024` |
| `PORT` | Server port (auto-set by Railway/Render) | `8000` |
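One way to read these variables and check the `X-API-Key` header is sketched below. It is an illustration under assumptions, not the code in `app/`: `Settings`, `load_settings`, and `is_authorised` are hypothetical names, and the constant-time comparison is a suggested practice rather than something this README confirms.

```python
import hmac
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    doc_api_key: str
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration using the defaults from the table above."""
    groq_key = env.get("GROQ_API_KEY")
    if not groq_key:
        raise RuntimeError("GROQ_API_KEY is required")
    return Settings(
        groq_api_key=groq_key,
        doc_api_key=env.get("DOC_API_KEY", "docai-secret-key-2024"),
        port=int(env.get("PORT", "8000")),
    )


def is_authorised(header_value: Optional[str], settings: Settings) -> bool:
    """Compare the X-API-Key header against DOC_API_KEY in constant time."""
    if header_value is None:
        return False
    return hmac.compare_digest(header_value.encode(), settings.doc_api_key.encode())
```

Because the default `DOC_API_KEY` is published in this README, any deployment reachable from the internet should set its own value.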

Architecture

Client ──► POST /analyse (multipart file)
              │
              ▼
         FastAPI Router
              │
              ▼
         DocumentProcessor
         ├── PDF  → PyMuPDF → text extraction
         │          └── scanned? → Claude Vision OCR
         ├── DOCX → python-docx → paragraph/table text
         └── Image → Tesseract OCR
              │
              ▼
         Claude claude-opus-4-5 (single prompt)
         ├── Summary
         ├── Entities (NER)
         └── Sentiment
              │
              ▼
         JSON Response
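The first branch in the diagram, choosing an extractor by file type, could look roughly like this. The sketch makes several assumptions: routing is by file extension, the `.jpg` and `.tif` aliases are accepted, the label `"IMAGE"` is used for images (the README only shows `"PDF"`), and `detect_file_type` and `extract_text` are hypothetical names. The real extractors (PyMuPDF, python-docx, Tesseract) are passed in as callables so the sketch runs without them.

```python
from pathlib import Path
from typing import Callable

# Image formats from the Features table, plus common extension aliases (assumed).
IMAGE_EXTENSIONS = {".jpeg", ".jpg", ".png", ".tiff", ".tif", ".bmp", ".webp", ".gif"}


def detect_file_type(filename: str) -> str:
    """Map a filename to 'PDF', 'DOCX', or 'IMAGE' based on its extension."""
    ext = Path(filename).suffix.lower()
    if ext == ".pdf":
        return "PDF"
    if ext == ".docx":
        return "DOCX"
    if ext in IMAGE_EXTENSIONS:
        return "IMAGE"
    raise ValueError(f"Unsupported file type: {ext or filename}")


def extract_text(
    filename: str,
    data: bytes,
    extractors: dict[str, Callable[[bytes], str]],
) -> str:
    """Dispatch the raw bytes to the extractor registered for the file type."""
    return extractors[detect_file_type(filename)](data)
```

Keeping the extractors in a dictionary makes each branch of the diagram replaceable in tests with a stub function.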

Tech Stack

FastAPI — async Python web framework
Claude claude-opus-4-5 — AI summarisation, NER, sentiment, OCR
PyMuPDF — PDF text extraction
Tesseract + pytesseract — Image OCR
python-docx — DOCX text extraction
Docker — containerised deployment

AI used

Claude and Perplexity
