Sortex.AI

Intelligent document processing platform for transport and logistics. Upload PDF or image documents, automatically classify them, extract structured data with OCR + LLM, validate against business rules, and review results in a human-in-the-loop UI.

Features

Document classification — automatically identifies 11 transport document types (CMR, Invoice, Bill of Lading, Air Waybill, Customs Declaration, and more) using weighted keyword scoring with fuzzy matching and LLM fallback
OCR extraction — PaddleOCR (default) or Tesseract for text extraction from PDFs and images
LLM field extraction — structured field extraction via Ollama (qwen2.5:3b) or OpenAI, with per-document-type schemas
Validation engine — config-driven business rules with confidence scoring
Review UI — human-in-the-loop correction interface with side-by-side document viewer
Authentication — JWT access/refresh tokens with role-based access control (RBAC)
File security — magic bytes validation, filename sanitization, rate limiting

Tech Stack

Layer	Technology
Frontend	React 18, TypeScript, Tailwind CSS, Lucide icons
Backend	Python 3.11, FastAPI, SQLAlchemy, Pydantic v2
OCR	PaddleOCR / Tesseract (pluggable)
LLM	Ollama (qwen2.5:3b) / OpenAI (pluggable)
Database	PostgreSQL 15
Storage	MinIO (S3-compatible)
Queue	Redis 7
Monitoring	OpenTelemetry, Prometheus

Architecture

Hexagonal Architecture (Ports & Adapters):

backend/src/
├── domain/           # Entities, value objects, domain services
├── application/      # Use cases, DTOs, extraction schemas
├── infrastructure/   # Repositories, OCR, LLM, storage, auth, messaging
└── api/              # FastAPI routes, middleware

Quick Start

Prerequisites

Docker and Docker Compose
Ollama installed locally (for LLM extraction)

1. Clone and configure

git clone https://github.com/HopeyCodeDS/sortex-ai.git
cd sortex-ai

Create a .env file in the project root:

POSTGRES_USER=sortex
POSTGRES_PASSWORD=sortex
POSTGRES_DB=sortex

MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin

JWT_SECRET_KEY=change-this-in-production

DEFAULT_LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://host.docker.internal:11434
OLLAMA_MODEL=qwen2.5:3b

CORS_ORIGINS=http://localhost:3000

2. Pull the Ollama model

ollama pull qwen2.5:3b

3. Start services

docker-compose up -d

This starts PostgreSQL, MinIO, Redis, the FastAPI backend, and the React frontend.

4. Access the app

Service	URL
Frontend	http://localhost:3000
Backend API	http://localhost:8000
API docs (Swagger)	http://localhost:8000/docs
MinIO console	http://localhost:9001

Default login: admin@docflow.ai / admin123

Project Structure

sortex-ai/
├── frontend/                     # React + TypeScript + Tailwind
│   ├── src/
│   │   ├── components/
│   │   │   ├── layout/           # AppLayout, Sidebar, TopBar
│   │   │   └── ui/               # Reusable components (Card, Modal, DataTable, etc.)
│   │   ├── pages/                # Dashboard, DocumentUpload, DocumentReview, Login
│   │   ├── contexts/             # Auth context
│   │   └── services/             # API client
│   └── package.json
├── backend/                      # FastAPI + Hexagonal Architecture
│   ├── src/
│   │   ├── domain/
│   │   │   ├── entities/         # Document, Extraction, User
│   │   │   └── services/         # Classification, validation
│   │   ├── application/
│   │   │   ├── use_cases/        # Extract fields, process document
│   │   │   └── extraction_schemas.py
│   │   ├── infrastructure/
│   │   │   ├── persistence/      # PostgreSQL repositories
│   │   │   ├── ocr/              # PaddleOCR, Tesseract adapters
│   │   │   ├── llm/              # Ollama, OpenAI adapters
│   │   │   ├── storage/          # MinIO adapter
│   │   │   └── auth/             # JWT, RBAC
│   │   └── api/
│   │       └── routes/           # REST endpoints
│   ├── migrations/
│   └── requirements.txt
├── docker-compose.yml
└── .env

Supported Document Types

Type	Description
CMR	Convention Marchandise Routiers (road consignment)
Invoice	Commercial / freight invoices
Delivery Note	Proof of delivery documents
Bill of Lading	Ocean shipping contracts
Air Waybill	Air cargo consignment notes
Sea Waybill	Non-negotiable sea transport
Packing List	Shipment contents listing
Customs Declaration	Import/export customs forms
Certificate of Origin	Country of origin certification
Dangerous Goods Declaration	Hazmat shipping declarations
Freight Bill	Carrier billing documents

Local Development (without Docker)

Backend:

cd backend
python -m venv venv
venv\Scripts\activate        # Linux/Mac: source venv/bin/activate
pip install -r requirements.txt
uvicorn src.api.main:app --reload

Frontend:

cd frontend
npm install
npm start

Requires PostgreSQL, Redis, and MinIO running locally or via Docker.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 80 Commits
backend		backend
frontend		frontend
.gitignore		.gitignore
README.md		README.md
docker-compose.yml		docker-compose.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sortex.AI

Features

Tech Stack

Architecture

Quick Start

Prerequisites

1. Clone and configure

2. Pull the Ollama model

3. Start services

4. Access the app

Project Structure

Supported Document Types

Local Development (without Docker)

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

HopeyCodeDS/sortex-ai

Folders and files

Latest commit

History

Repository files navigation

Sortex.AI

Features

Tech Stack

Architecture

Quick Start

Prerequisites

1. Clone and configure

2. Pull the Ollama model

3. Start services

4. Access the app

Project Structure

Supported Document Types

Local Development (without Docker)

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages