v1.1.0

ayoub-ibm released this 24 Jan 18:02

· 122 commits to main since this release

b70a416

Features

Extraction Layer

Multi-phase audit and optimization: adaptive prompting, Chain of Density consolidation, schema-aware chunking, provider-specific batching, and cached protocol checks (1507dec)
Performance improvements: faster GPU inference, better memory management, and optimized context usage
Zero data loss architecture with robust error handling
Batching & inference optimization: provider-specific thresholds (OpenAI: 90%, Google: 88%, Anthropic: 85%, Ollama: 75%), real tokenizer with 20% safety margin, enhanced GPU memory cleanup, model capability tiers (SIMPLE/STANDARD/ADVANCED), and optimized chunked processing (1507dec)

LLM Client

vLLM hanging issue fixed with max_tokens limits and request timeouts (605bc99)
Graceful truncation detection and partial JSON recovery with actionable warnings (f0b2f53)
Consistent configuration across all clients (max_tokens, timeout, finish_reason metadata)

Bug Fixes

Prevented infinite generation in vLLM and other clients via token limits and timeouts (605bc99)
Fixed cryptic JSON parse errors when responses are truncated; partial results now recovered with warnings (f0b2f53)

Documentation

Reorganized docs into a 5-section minimalist structure (d705039)
Updated fundamentals and usage docs to reflect extraction layer changes, batching, performance tuning, and error handling (8994f72)
Enabled dynamic Mermaid flowchart loading with custom styling (2c16d25)

Assets 4