Release 1.0.16 - Multi-Provider AI Support

@ivbeg ivbeg released this 12 Dec 12:22

🎉 Major Features

Multi-Provider AI Support

undatum now supports multiple AI providers for automatic field and dataset documentation:

  • OpenAI - GPT-4o-mini, GPT-4o, GPT-3.5-turbo, and more
  • OpenRouter - Unified API for accessing models from OpenAI, Anthropic, Google, and others
  • Ollama - Run local models without API keys
  • LM Studio - Local models via OpenAI-compatible API
  • Perplexity - Backward compatible with existing Perplexity integration

Structured AI Output

  • Replaced fragile text parsing with JSON Schema-based structured output
  • More reliable AI response parsing
  • Better error handling and fallback mechanisms
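
The difference between text extraction and structured output can be illustrated with a minimal sketch. This is not undatum's actual code: the schema, the key names (`fields`, `name`, `description`), and the helper `parse_field_docs` are all hypothetical and chosen only for illustration. The idea is that the provider is asked for JSON matching a schema, and the client falls back to a safe default when the response does not have the expected shape.

```python
import json

# Hypothetical schema for illustration; undatum's real schema may differ.
# In a real request it would be sent to the provider alongside the prompt.
FIELD_DOCS_SCHEMA = {
    "type": "object",
    "properties": {
        "fields": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "description": {"type": "string"},
                },
                "required": ["name", "description"],
            },
        }
    },
    "required": ["fields"],
}


def parse_field_docs(raw_response, fallback=None):
    """Parse a JSON response into a {field name: description} dict.

    Returns `fallback` (an empty dict by default) when the response is not
    valid JSON or does not have the expected structure.
    """
    fallback = {} if fallback is None else fallback
    try:
        payload = json.loads(raw_response)
        return {item["name"]: item["description"] for item in payload["fields"]}
    except (json.JSONDecodeError, KeyError, TypeError):
        return fallback
```

A malformed response then degrades to the fallback value instead of raising deep inside the caller.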

Flexible Configuration

Configure AI providers through:

  1. Environment variables (lowest precedence)
  2. Config files (undatum.yaml or ~/.undatum/config.yaml)
  3. CLI arguments (highest precedence)
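
The precedence order above can be sketched as a layered lookup. This is an assumption-laden illustration, not the code in undatum: the function `resolve_ai_config`, the setting keys, and the environment variable names (`UNDATUM_AI_PROVIDER` and so on) are hypothetical. Check the project documentation for the real names.

```python
import os
from collections import ChainMap


def _drop_unset(settings):
    """Remove keys whose value is None so they do not mask lower layers."""
    return {key: value for key, value in settings.items() if value is not None}


def resolve_ai_config(cli_args, file_config, environ=None):
    """Merge settings so that CLI > config file > environment variables."""
    environ = os.environ if environ is None else environ
    # Hypothetical variable names, used only for this sketch.
    env_config = {
        "provider": environ.get("UNDATUM_AI_PROVIDER"),
        "model": environ.get("UNDATUM_AI_MODEL"),
        "base_url": environ.get("UNDATUM_AI_BASE_URL"),
    }
    # ChainMap searches its mappings left to right, so the first one wins.
    layers = ChainMap(
        _drop_unset(cli_args),
        _drop_unset(file_config),
        _drop_unset(env_config),
    )
    return dict(layers)
```

With this layering, a value passed on the command line overrides the same key in the config file, which in turn overrides the environment.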

✨ What's New

Added

  • Multi-provider AI support: Added support for OpenAI, OpenRouter, Ollama, LM Studio, and Perplexity APIs
  • Structured AI output: Replaced fragile text parsing with JSON Schema-based structured output for reliable AI responses
  • Flexible AI configuration: Support for environment variables, config files (undatum.yaml or ~/.undatum/config.yaml), and CLI arguments with proper precedence
  • AI provider factory: New get_ai_service() function for easy provider instantiation
  • Enhanced error handling: Proper exception classes (AIServiceError, AIConfigurationError, AIAPIError) with clear error messages
  • CLI arguments for AI: Added --ai-provider, --ai-model, and --ai-base-url options to analyze command
  • Configuration management: New undatum/ai/config.py module for unified configuration handling
  • Backward compatibility: Old get_fields_info() and get_description() functions maintained for compatibility
  • Code quality: Improved overall code quality and Pylint score
  • Better error handling and resource management
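
A provider factory with a dedicated exception hierarchy might look roughly like the sketch below. The names `get_ai_service`, `AIService`, `AIServiceError`, `AIConfigurationError`, and `AIAPIError` come from these release notes, but their signatures and inheritance are assumptions here. The method `describe_fields`, the `EchoService` stand-in, and the `_PROVIDERS` registry are hypothetical.

```python
from abc import ABC, abstractmethod


class AIServiceError(Exception):
    """Base class for AI-related errors (hierarchy is an assumption)."""


class AIConfigurationError(AIServiceError):
    """Raised for invalid or missing configuration."""


class AIAPIError(AIServiceError):
    """Raised when a provider API call fails."""


class AIService(ABC):
    """Abstract provider interface; the method name below is hypothetical."""

    def __init__(self, model=None, base_url=None):
        self.model = model
        self.base_url = base_url

    @abstractmethod
    def describe_fields(self, field_names):
        """Return a {field name: description} dict."""


class EchoService(AIService):
    """Stand-in provider for this sketch; it makes no network calls."""

    def describe_fields(self, field_names):
        return {name: f"Description of {name}" for name in field_names}


# Hypothetical registry mapping provider names to classes.
_PROVIDERS = {"echo": EchoService}


def get_ai_service(provider, **options):
    """Instantiate the provider registered under `provider`."""
    try:
        service_class = _PROVIDERS[provider]
    except KeyError:
        raise AIConfigurationError(f"Unknown AI provider: {provider!r}") from None
    return service_class(**options)
```

Callers can catch `AIServiceError` to handle any AI failure, or one of the subclasses to distinguish configuration mistakes from API errors.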

Changed

  • AI system refactoring: Completely refactored AI documentation system from Perplexity-only to multi-provider architecture
  • Structured responses: All AI providers now use JSON Schema (response_format: json_object) instead of parsing CSV from markdown code blocks
  • Provider architecture: Implemented abstract base class AIService with concrete provider implementations
  • Improved code quality: fixed indentation, trailing whitespace, and formatting issues
  • Refactored file operations to use with statements for better resource management
  • Updated string formatting to use f-strings and lazy logging
  • Fixed dangerous default arguments in function signatures
  • Improved type hints and code documentation
  • Updated analyze command to accept AI provider configuration
  • Updated schemer command to use new AI service interface
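
Several of the code-quality changes above follow common Python idioms. The sketch below shows three of them in generic form: avoiding a mutable default argument, using a with statement for file handling, and lazy logging. The function names are hypothetical and are not taken from the undatum code base.

```python
import logging

logger = logging.getLogger(__name__)


def collect_rows(row, rows=None):
    """Append `row` to `rows`, creating a fresh list when none is given."""
    # A default of `rows=[]` would be shared across calls; use None instead.
    if rows is None:
        rows = []
    rows.append(row)
    return rows


def count_lines(path):
    """Count the lines in a text file."""
    # The with statement closes the file even if iteration raises.
    with open(path, encoding="utf-8") as handle:
        total = sum(1 for _ in handle)
    # Lazy logging: the message is only formatted if INFO is enabled.
    logger.info("Read %d lines from %s", total, path)
    return total
```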

Fixed

  • Fixed critical bug: added missing _process_json_data function in analyzer module
  • Fixed indentation in the duckdb_decompose function
  • Fixed redefined builtin id parameter (renamed to table_id)
  • Fixed unused imports and arguments
  • Fixed dictionary iteration patterns (removed unnecessary .keys() calls)
  • Fixed isinstance() calls to use tuple syntax for better performance
  • Improved file handling with proper context managers
  • Fixed fragile AI response parsing: Replaced error-prone text extraction with proper JSON parsing
  • Fixed AI service initialization: Added proper error handling and fallback when AI service fails to initialize
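
Three of the fixes above (the renamed `id` parameter, dictionary iteration, and tuple-style `isinstance()` checks) can be shown together in a short generic sketch. The function `summarize` is hypothetical and does not appear in undatum.

```python
def summarize(record, table_id):
    """Return the keys of `record` whose values are numeric."""
    # Naming the parameter `table_id` avoids shadowing the builtin id().
    numeric_fields = []
    # Iterating over a dict yields its keys; calling .keys() is unnecessary.
    for key in record:
        # One isinstance() call with a tuple replaces chained `or` checks.
        if isinstance(record[key], (int, float)):
            numeric_fields.append(key)
    return {"table_id": table_id, "numeric_fields": numeric_fields}
```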

📦 Installation

pip install --upgrade undatum
