GENIE - Generic Extractor of Information Engine

A Python framework for intelligent data extraction using LLMs.

Quick Start

Prerequisites

Python 3.11+
Poetry (or pip)

Installation

Using Poetry (recommended):

poetry install
poetry shell

Using pip:

pip install -r requirements.txt

Configuration

Copy .env.example to .env:

cp .env.example .env

Add your API keys to .env:

ANTHROPIC_API_KEY=sk-ant-your-key-here
OPENAI_API_KEY=sk-your-key-here
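Before starting the server, it can help to confirm that .env actually contains the keys. The following is a minimal sketch using only the standard library; the helper names parse_env_file and missing_keys are hypothetical and not part of GENIE, and treating both keys as required is an assumption (GENIE may need only the key for the provider you use).

```python
import os

# Key names taken from the .env example above.
# Assumption: both are treated as required here.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def parse_env_file(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    path = ".env"
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            print("Missing keys:", missing_keys(parse_env_file(handle.read())))
    else:
        print(".env not found; copy .env.example first")
```

This parser handles plain KEY=VALUE lines only; it does not implement the full dotenv syntax (multi-line values, variable expansion, export prefixes).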

Running the Server

uvicorn spec.main:app --reload --port 8000

The API will be available at http://localhost:8000

Docs: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc
Health: http://localhost:8000/api/v1/health

Project Structure

spec/
├── api/                    # REST API endpoints
│   └── v1/
│       ├── endpoints/      # Endpoint implementations
│       ├── router.py       # Route aggregator
│       └── dependencies.py # Dependency injection
├── core/                   # Core infrastructure
│   ├── config.py          # Settings management
│   ├── exceptions.py      # Custom exceptions
│   ├── logging_config.py  # Logging setup
│   └── security.py        # Security utilities
├── models/                 # Pydantic data models
├── extraction/             # Extraction engine
│   ├── engine.py          # Main orchestrator
│   ├── llm/               # LLM providers
│   ├── parsers/           # Content parsers
│   └── layout/            # Layout fingerprinting
├── search_library/        # Pattern storage
├── output/                # Output management
└── main.py                # FastAPI entry point

API Endpoints

Health Check

GET /api/v1/health
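A small script can call this endpoint to confirm the server is up. This is a sketch under stated assumptions: it uses the default http://localhost:8000 address from "Running the Server", the helper names health_url and check_health are hypothetical, and the response body is printed as-is because its schema is not documented here.

```python
import urllib.error
import urllib.request

# Default address from "Running the Server"; adjust if you changed --port.
BASE_URL = "http://localhost:8000"


def health_url(base_url=BASE_URL):
    """Build the health endpoint URL from a base URL."""
    return base_url.rstrip("/") + "/api/v1/health"


def check_health(base_url=BASE_URL, timeout=5.0):
    """Return (ok, body); ok is True when the server answers with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200, resp.read().decode("utf-8")
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, timeout, or a non-2xx status all land here.
        return False, str(exc)


if __name__ == "__main__":
    ok, body = check_health()
    print("healthy" if ok else "unreachable", body)
```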

Extract Data

POST /api/v1/extract
Content-Type: application/json

{
    "config_id": "config_001",
    "source": {
        "type": "text",
        "content": "Document content here..."
    },
    "force_llm": false,
    "options": {
        "auto_create_patterns": true
    }
}
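The same request can be sent from Python. The sketch below uses only the standard library and the fields shown in the example body; the helper names build_extract_payload and build_extract_request are hypothetical, and because the response schema is not documented here, the reply is printed as raw JSON.

```python
import json
import urllib.error
import urllib.request

# Default address from "Running the Server"; adjust if you changed --port.
EXTRACT_URL = "http://localhost:8000/api/v1/extract"


def build_extract_payload(config_id, content, force_llm=False,
                          auto_create_patterns=True):
    """Assemble the request body using the fields shown in the example above."""
    return {
        "config_id": config_id,
        "source": {"type": "text", "content": content},
        "force_llm": force_llm,
        "options": {"auto_create_patterns": auto_create_patterns},
    }


def build_extract_request(payload, url=EXTRACT_URL):
    """Create a POST request with a JSON body; nothing is sent yet."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    payload = build_extract_payload("config_001", "Document content here...")
    try:
        with urllib.request.urlopen(build_extract_request(payload), timeout=30) as resp:
            print(json.dumps(json.load(resp), indent=2))
    except (urllib.error.URLError, OSError) as exc:
        print("Request failed:", exc)
```

Keeping payload construction separate from sending makes the body easy to inspect or unit-test without a running server.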

Testing

Run all tests:

pytest

Run a specific test file:

pytest tests/unit/test_models.py -v

Run with coverage:

pytest --cov=spec --cov-report=html

Development

Code Style

Formatter: Black (88-character line length)
Linter: Ruff
Type Checker: Mypy

Format and lint code:

black spec/ tests/
ruff check . --fix

Type checking:

mypy spec/

Documentation

License

MIT

Project Status

Phase 1: MVP Core - In Development

✓ Project setup and tooling
✓ Core infrastructure
✓ Pydantic models
✓ LLM provider interface (Anthropic)
✓ Text and PDF parsers
✓ Layout fingerprinting
✓ Search library (JSON storage)
✓ Extraction engine
✓ REST API endpoints
⏳ Comprehensive testing
⏳ End-to-end validation

See PHASE-1-PLAN.md for detailed roadmap.
