AI Document Analyzer

An intelligent document processing service that extracts text from PDF and DOCX files, analyzes content using Large Language Models (LLMs) via OpenRouter, and provides structured metadata extraction.

Features

Document Upload: Support for PDF and DOCX files.
Secure Storage: Files are stored securely in Minio/S3.
Text Extraction: Automatic extraction of text content from uploaded documents.
AI Analysis:
- Concise Summaries.
- Document Type Detection (Invoice, CV, Report, etc.).
- Metadata Extraction (Date, Sender, Amount, etc.).
REST API: Built with FastAPI for high performance and easy integration.

Tech Stack

Backend: FastAPI
Database: PostgreSQL
Storage: Minio (S3 Compatible)
AI/LLM: OpenRouter (GPT-4o-mini or compatible)
ORM: SQLAlchemy
Dependencies: boto3, pdfplumber, python-docx, python-multipart

Prerequisites

Python 3.9+
PostgreSQL
Minio Server (or S3 access)
OpenRouter API Key

Installation

Clone the repository

git clone https://github.com/danielzfega/document-analyzer
cd document-analyzer

Create a virtual environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies
```
pip install -r requirements.txt
```

Configuration Create a .env file in the root directory (ensure it is UTF-8 encoded):

DATABASE_URL=postgresql://user:password@localhost:5432/document_analyzer

MINIO_ENDPOINT=http://localhost:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin123
MINIO_BUCKET_NAME=document-analyzer

OPENROUTER_API_KEY=your_openrouter_api_key_here

Run the Application
```
uvicorn main:app --reload
```

API Usage

1. Upload Document

POST /documents/upload

Body: multipart/form-data with file field.

Response:

{
    "id": "1",
    "file_name": "resume.pdf"
}

2. Analyze Document

POST /documents/{id}/analyze

Response:
```
{
    "message": "Analysis complete"
}
```

3. Get Document Details

GET /documents/{id}

Response:

{
    "id": "1",
    "file_name": "resume.pdf",
    "text": "Extracted text content...",
    "summary": "This is a resume for...",
    "detected_type": "CV",
    "attributes": {
        "name": "John Doe",
        "email": "john@example.com"
    }
}

Testing

Run the test suite using pytest:

pytest

Note: Ensure your .env is configured for testing (e.g., using a test database or SQLite).

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
core		core
db		db
models		models
routers		routers
schemas		schemas
services		services
tests		tests
.gitignore		.gitignore
README.md		README.md
config.py		config.py
important.txt		important.txt
main.py		main.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AI Document Analyzer

Features

Tech Stack

Prerequisites

Installation

API Usage

1. Upload Document

2. Analyze Document

3. Get Document Details

Testing

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AI Document Analyzer

Features

Tech Stack

Prerequisites

Installation

API Usage

1. Upload Document

2. Analyze Document

3. Get Document Details

Testing

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages