Skip to content

v1.1.0

Choose a tag to compare

@ayoub-ibm ayoub-ibm released this 24 Jan 18:02
· 122 commits to main since this release

Features

Extraction Layer
  • Multi-phase audit and optimization: adaptive prompting, Chain of Density consolidation, schema-aware chunking, provider-specific batching, and cached protocol checks (1507dec)
  • Performance improvements: faster GPU inference, better memory management, and optimized context usage
  • Zero data loss architecture with robust error handling
  • Batching & inference optimization: provider-specific thresholds (OpenAI: 90%, Google: 88%, Anthropic: 85%, Ollama: 75%), real tokenizer with 20% safety margin, enhanced GPU memory cleanup, model capability tiers (SIMPLE/STANDARD/ADVANCED), and optimized chunked processing (1507dec)
LLM Client
  • vLLM hanging issue fixed with max_tokens limits and request timeouts (605bc99)
  • Graceful truncation detection and partial JSON recovery with actionable warnings (f0b2f53)
  • Consistent configuration across all clients (max_tokens, timeout, finish_reason metadata)

Bug Fixes

  • Prevented infinite generation in vLLM and other clients via token limits and timeouts (605bc99)
  • Fixed cryptic JSON parse errors when responses are truncated; partial results now recovered with warnings (f0b2f53)

Documentation

  • Reorganized docs into a 5-section minimalist structure (d705039)
  • Updated fundamentals and usage docs to reflect extraction layer changes, batching, performance tuning, and error handling (8994f72)
  • Enabled dynamic Mermaid flowchart loading with custom styling (2c16d25)