Fix/llm handler improvements #6

Merged
fbs617 merged 4 commits into main from fix/llm-handler-improvements
Mar 12, 2026
fbs617 commented Mar 12, 2026

Fix: LLM Handler Improvements & Automated Testing Suite

Contributors


Summary

Improves the robustness of the LLM handler and adds a two-tier automated testing framework (Pytest and Promptfoo) for quality assurance.


Changes

LLM Handler Improvements (@emirakdag0)

Prompt Engineering:

  • Improved system prompts for better DREF field extraction
  • Enhanced multi-document context handling
  • Clearer instructions to reduce hallucination

Response Handling:

  • Stricter JSON parsing with error recovery
  • Field validation and type coercion (e.g., "5,000" → 5000)
  • Retry logic with exponential backoff for failures
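The response-handling changes above could look roughly like the following. This is a minimal sketch, not the code in this PR: the function names (`parse_json_response`, `coerce_int`, `call_with_retry`), the retry defaults, and the `people_affected` field are all assumptions made for illustration.

```python
import json
import re
import time
from typing import Any, Callable


def parse_json_response(raw: str) -> dict[str, Any]:
    """Parse a model reply as JSON, recovering from code fences or surrounding prose."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Recovery step: fall back to the outermost {...} span in the reply.
        match = re.search(r"\{.*\}", raw, re.DOTALL)
        if match is None:
            raise ValueError("no JSON object found in model reply")
        parsed = json.loads(match.group(0))
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object at the top level")
    return parsed


def coerce_int(value: Any) -> int:
    """Coerce values such as "5,000" to 5000 and reject anything ambiguous."""
    if isinstance(value, bool):
        raise ValueError("booleans are not counts")
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        cleaned = value.replace(",", "").strip()
        if re.fullmatch(r"-?\d+", cleaned):
            return int(cleaned)
    raise ValueError(f"cannot coerce {value!r} to int")


def call_with_retry(
    call: Callable[[], str],
    attempts: int = 3,          # assumed default, not taken from the PR
    base_delay: float = 0.5,    # assumed default, not taken from the PR
    sleep: Callable[[float], Any] = time.sleep,
) -> dict[str, Any]:
    """Call the model and parse its reply, retrying unparseable replies with backoff."""
    for attempt in range(attempts):
        try:
            return parse_json_response(call())
        except ValueError:
            if attempt == attempts - 1:
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

The sketch retries only on unparseable replies; a real handler would presumably also retry on transport errors such as timeouts or rate limits. Passing `sleep` as a parameter keeps the backoff testable without waiting.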

Integration:

  • Integration with conflict resolution
  • Source tracking for all extracted fields
  • Confidence scoring for uncertain extractions
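One way to picture source tracking, confidence scoring, and the hand-off to conflict resolution is a small record per extracted field. The sketch below is illustrative only: `ExtractedField`, `needs_review`, `find_conflicts`, the 0.0 to 1.0 confidence scale, and the 0.7 threshold are assumptions, not names or values from this PR.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ExtractedField:
    name: str
    value: Any
    source: str        # hypothetical: identifier of the document the value came from
    confidence: float  # hypothetical scale: 0.0 (guess) to 1.0 (stated verbatim)


def needs_review(field: ExtractedField, threshold: float = 0.7) -> bool:
    """Flag extractions whose confidence falls below an assumed threshold."""
    return field.confidence < threshold


def find_conflicts(fields: list[ExtractedField]) -> dict[str, list[ExtractedField]]:
    """Group fields by name and keep the names whose sources disagree on the value."""
    by_name: dict[str, list[ExtractedField]] = {}
    for field in fields:
        by_name.setdefault(field.name, []).append(field)
    return {
        name: group
        for name, group in by_name.items()
        if len({repr(f.value) for f in group}) > 1
    }
```

Keeping the source on every field means a detected conflict can be shown to the user with both candidate values and the document each came from.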

Testing Suite (@fbs617)

Tier 1: Pytest (19 tests)

  • 5 blocker tests (conflict detection, security)
  • Runs automatically in CI on every push/PR
  • Fast (<30s), deterministic checks

Tier 2: Promptfoo (34 tests)

  • 12 categories with LLM-as-judge evaluation
  • Ground truth scoring across all edge cases
  • Comprehensive quality metrics

Infrastructure:

  • Test helpers: InputBuilder, FormStateFactory, assertions
  • Results viewer: python review.py
  • GitHub Actions CI workflow
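For readers unfamiliar with the helper pattern, here is a rough sketch of what builder, factory, and assertion helpers of this kind typically look like. Only the names `InputBuilder` and `FormStateFactory` come from this PR; every method, the payload shape, and `assert_field_unchanged` are hypothetical and may differ from the actual helpers.

```python
from copy import deepcopy


class InputBuilder:
    """Hypothetical fluent builder for a handler input payload."""

    def __init__(self) -> None:
        self._message = ""
        self._documents: list[dict] = []

    def with_message(self, text: str) -> "InputBuilder":
        self._message = text
        return self

    def with_document(self, name: str, content: str) -> "InputBuilder":
        self._documents.append({"name": name, "content": content})
        return self

    def build(self) -> dict:
        # Copy the documents so later builder calls cannot mutate a built payload.
        return {"message": self._message, "documents": deepcopy(self._documents)}


class FormStateFactory:
    """Hypothetical factory for form states used as test fixtures."""

    @staticmethod
    def empty() -> dict:
        return {}

    @staticmethod
    def with_fields(**fields: object) -> dict:
        return dict(fields)


def assert_field_unchanged(before: dict, after: dict, name: str) -> None:
    """Reusable assertion: the handler must not have overwritten this field."""
    assert after.get(name) == before.get(name), (
        f"{name} changed from {before.get(name)!r} to {after.get(name)!r}"
    )
```

A Tier 1 test would then build its input in one line, for example `InputBuilder().with_message("Update the form").with_document("report.txt", "5,000 people affected").build()`, and check the resulting form state with the shared assertions.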

emirakdag0 and others added 4 commits March 8, 2026 14:57
…romptfoo evaluation

- 19 Tier 1 hard-coded assertion tests (5 blockers for conflict detection and security)
- 34 Tier 2 LLM-as-judge test cases across 12 categories with ground truth scoring
- Test helpers: input builders, form state factories, reusable assertions
- Promptfoo provider wrapping handle_message with Azure OpenAI
- Combined results viewer (review.py) with inspection log support
- GitHub Actions CI workflow for automated Tier 1 runs on push/PR
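The Promptfoo provider mentioned in the list above would follow Promptfoo's Python provider convention: a module exposing `call_api(prompt, options, context)` that returns a dict with an `output` key (or an `error` key on failure). The sketch below assumes that convention. The `handle_message` stub is a stand-in, since its real signature and the Azure OpenAI wiring are not shown in this PR.

```python
import json


def handle_message(message: str) -> dict:
    """Stand-in for the project's handle_message; the real signature is an assumption."""
    return {"reply": f"echo: {message}", "fields": {}}


def call_api(prompt: str, options: dict, context: dict) -> dict:
    """Entry point that Promptfoo calls for a Python provider."""
    try:
        result = handle_message(prompt)
    except Exception as exc:
        # Report the failure to Promptfoo instead of crashing the evaluation run.
        return {"error": str(exc)}
    # Serialize the result so graders receive a plain string.
    return {"output": json.dumps(result)}
```

Returning the handler result as a JSON string lets both deterministic assertions and LLM-as-judge graders inspect the same payload.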
fbs617 merged commit 7e55dc7 into main Mar 12, 2026
1 check failed