
feat: Reorganize eval framework with clean mock data structure #44

Merged
boringdata merged 1 commit into main from eval-answer (Nov 25, 2025)


@boringdata (Owner)

Summary

Reorganize the evaluation framework with a cleaner, more logical directory structure and improved tooling for faster test setup.

Key Changes

📁 Mock Data Reorganization

  • New structure: Organized into data/documents/, data/db/, data/integrations/
  • Moved websites: All website content consolidated in data/documents/ (acme-corp, acme-docs, competitor-co)
  • Moved integrations: research, analytics, cms → data/integrations/
  • Removed fixtures/: Eliminated redundant fixtures directory structure
  • Consolidated tools: All generators and loaders in generators/ directory

🗂️ Scenario Organization

Split monolithic scenarios file into logical, focused files:

  • scenarios_kurt_setup.yaml - Initialization and configuration tests
  • scenarios_content_ingestion.yaml - Fetching and indexing tests
  • scenarios_retrieval.yaml - Answer command tests with correctness metrics
  • scenarios_answer.yaml - Legacy answer scenarios
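A minimal sketch of how a runner could pick up the split files by naming convention, assuming they all sit in one directory and follow the `scenarios_<category>.yaml` pattern. `discover_scenario_files` is a hypothetical helper, not part of kurt-eval; it only globs file names and leaves YAML parsing to the caller.

```python
import tempfile
from pathlib import Path


def discover_scenario_files(scenarios_dir: Path) -> dict:
    """Map each category (e.g. "retrieval") to its scenarios_<category>.yaml file.

    Hypothetical helper: it globs file names only and does not parse YAML.
    """
    found = {}
    for path in sorted(scenarios_dir.glob("scenarios_*.yaml")):
        # "scenarios_content_ingestion.yaml" -> "content_ingestion"
        category = path.stem[len("scenarios_"):]
        found[category] = path
    return found


if __name__ == "__main__":
    # Recreate the four file names from this PR in a scratch directory.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("kurt_setup", "content_ingestion", "retrieval", "answer"):
            (root / f"scenarios_{name}.yaml").write_text("scenarios: []\n")
        for category, path in discover_scenario_files(root).items():
            print(category, "->", path.name)
```

Keying on the file name keeps the split cheap: adding a new category means adding a new `scenarios_*.yaml` file, with no central registry to update.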

📊 Database Fixtures (Major Performance Improvement)

  • ACME Docs dump: Pre-built knowledge graph with 4 documents, 15 entities, 19 links, 11 relationships
  • JSONL format: Human-readable, git-friendly (line-based diffs), streamable
  • Loading time: < 1 second vs 30-60s for fetch+index (50-60x faster)
  • New tools:
    • load_dump.py - Import JSONL dumps into Kurt projects
    • create_dump.py - Export Kurt projects to JSONL dumps
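For reviewers unfamiliar with JSONL, here is a minimal sketch of the round trip that makes the dumps streamable: one JSON object per line on write, lazy line-by-line parsing on read. The file name and the `type`/`id` fields are assumptions for illustration; the actual layout of `data/db/acme-docs/` and the internals of `load_dump.py`/`create_dump.py` may differ.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator


def write_jsonl(path: Path, records: Iterable[dict]) -> int:
    """Write one JSON object per line; return the number of records written."""
    count = 0
    with path.open("w", encoding="utf-8") as fh:
        for record in records:
            # sort_keys gives a stable line for the same record across exports.
            fh.write(json.dumps(record, sort_keys=True) + "\n")
            count += 1
    return count


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield records lazily, one line at a time, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dump = Path(tmp) / "example.jsonl"  # hypothetical file name
        write_jsonl(dump, [
            {"type": "document", "id": "doc-1"},
            {"type": "entity", "id": "ent-1"},
            {"type": "entity", "id": "ent-2"},
        ])
        # Counting by type streams the file; it is never loaded all at once.
        print(Counter(r["type"] for r in read_jsonl(dump)))
```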

📝 Documentation

  • Consolidated README: Single comprehensive eval/README.md (370 lines) with all information
  • Removed redundant markdown files: TESTING_SUMMARY, DSPY_OPTIMIZER, CONFIG_AUDIT, scenarios_idea, mock/README, mock/STRUCTURE
  • Updated all paths and usage examples throughout

🧹 Cleanup

  • Removed root-level test files (test_conversation_completion.py, test_workflow_follow.py, test_workflow_queuing.py)
  • Removed duplicate and outdated documentation files

Impact

Before:

eval/mock/
├── websites/acme-corp/, acme-docs/, competitor-co/
├── research/*, analytics/*, cms/*
├── README.md, STRUCTURE.md
└── generate_mock_data.py

After:

eval/mock/
├── data/
│   ├── documents/       # All website content
│   ├── db/              # Fast JSONL dumps
│   └── integrations/    # Research, analytics, CMS data
└── generators/          # All data tools
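Since every path reference had to be updated, the relocation rules from the summary can be written down as a small lookup. This is a hypothetical helper (`migrate_mock_path` is not in the repo) covering only the moves stated in this PR: `websites/` into `data/documents/`, and `research`, `analytics`, `cms` into `data/integrations/`. The PR does not spell out where `generate_mock_data.py` ended up, so paths without a rule are returned unchanged.

```python
from pathlib import PurePosixPath
from typing import Optional

# Top-level integration directories that moved under data/integrations/.
INTEGRATIONS = {"research", "analytics", "cms"}

# Files removed by this PR (their content now lives in eval/README.md).
REMOVED = {"README.md", "STRUCTURE.md"}


def migrate_mock_path(old: str) -> Optional[str]:
    """Translate a path relative to eval/mock/ from the old layout to the new one.

    Returns None for removed files; paths matching no rule come back unchanged.
    """
    if old in REMOVED:
        return None
    parts = PurePosixPath(old).parts
    if not parts:
        return old
    head, rest = parts[0], parts[1:]
    if head == "websites":
        # websites/<site>/... -> data/documents/<site>/...
        return str(PurePosixPath("data", "documents", *rest))
    if head in INTEGRATIONS:
        # research/... -> data/integrations/research/...
        return str(PurePosixPath("data", "integrations", head, *rest))
    return old
```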

Benefits

  1. Much faster test setup: 50-60x speedup with JSONL database dumps
  2. Cleaner structure: Logical organization by data type
  3. Better maintainability: Single source of truth for documentation
  4. Git-friendly: JSONL format produces readable diffs
  5. Portable: Easy to create/share database fixtures
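To illustrate the git-friendly point: with one record per line and a stable key order, editing a single record changes exactly one line of the export. The sketch below uses Python's `difflib` as a stand-in for `git diff`; serializing with `sort_keys=True` is an assumption about how a dump would be written, not a documented property of `create_dump.py`.

```python
import difflib
import json


def to_jsonl(records: list) -> list:
    """Serialize records one per line with a stable key order."""
    return [json.dumps(r, sort_keys=True) for r in records]


def changed_lines(before: list, after: list) -> list:
    """Return only the +/- content lines of a unified diff of two JSONL exports."""
    diff = difflib.unified_diff(to_jsonl(before), to_jsonl(after), lineterm="")
    return [
        line
        for line in diff
        # Keep edits, drop the "---"/"+++" file headers.
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


if __name__ == "__main__":
    before = [{"id": "e1", "name": "ACME"}, {"id": "e2", "name": "Widget"}]
    after = [{"id": "e1", "name": "ACME"}, {"id": "e2", "name": "Gadget"}]
    for line in changed_lines(before, after):
        print(line)
```

A single pretty-printed JSON file would instead spread each record over several lines, so the same edit could show up as a multi-line hunk.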

Testing

All existing eval scenarios continue to work with updated paths:

uv run kurt-eval list
uv run kurt-eval run 1

Quick database loading:

KURT_TELEMETRY_DISABLED=1 uv run kurt init
python3 eval/mock/generators/load_dump.py acme-docs
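If a test harness needs the same setup, the two commands above can be wrapped with `subprocess`. This is a minimal sketch with hypothetical helper names; it assumes it runs from the repository root (so the relative path to `load_dump.py` resolves) and sets `KURT_TELEMETRY_DISABLED=1` for both commands, which is slightly broader than the example, where it only prefixes `kurt init`.

```python
import os
import subprocess
from typing import List, Optional


def setup_commands(dump_name: str = "acme-docs") -> List[List[str]]:
    """The two quick-loading commands from this PR, as argv lists."""
    return [
        ["uv", "run", "kurt", "init"],
        ["python3", "eval/mock/generators/load_dump.py", dump_name],
    ]


def setup_env(base: Optional[dict] = None) -> dict:
    """Copy the base environment (default: os.environ) and disable telemetry."""
    env = dict(os.environ if base is None else base)
    env["KURT_TELEMETRY_DISABLED"] = "1"
    return env


def load_fixture(repo_root: str, dump_name: str = "acme-docs") -> None:
    """Run init, then the dump load; check=True stops on the first failure."""
    env = setup_env()
    for argv in setup_commands(dump_name):
        subprocess.run(argv, cwd=repo_root, env=env, check=True)
```

Usage, from the repository root: `load_fixture(".")`. Copying the environment rather than mutating `os.environ` keeps the telemetry flag scoped to the child processes.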

🤖 Generated with Claude Code

Reorganize the evaluation framework with a cleaner, more logical structure:

## Mock Data Reorganization
- **New structure**: `data/documents/`, `data/db/`, `data/integrations/`
- **Moved websites**: All website content now in `data/documents/` (acme-corp, acme-docs, competitor-co)
- **Moved integrations**: research, analytics, cms → `data/integrations/`
- **Added database dumps**: JSONL format in `data/db/acme-docs/` (loads in <1s vs 30-60s for fetch+index)
- **Consolidated tools**: All generators and loaders in `generators/` directory

## Scenario Organization
- Split scenarios into logical files:
  - `scenarios_kurt_setup.yaml` - Initialization tests
  - `scenarios_content_ingestion.yaml` - Fetch and indexing tests
  - `scenarios_retrieval.yaml` - Answer command tests with correctness metrics
  - `scenarios_answer.yaml` - Legacy answer scenarios

## Documentation
- **Consolidated README**: Single comprehensive `eval/README.md` (370 lines)
- Removed redundant markdown files (TESTING_SUMMARY, DSPY_OPTIMIZER, CONFIG_AUDIT, etc.)
- Updated all paths and examples to reflect new structure

## Cleanup
- Removed root-level test files (test_conversation_completion.py, test_workflow_follow.py, etc.)
- Removed duplicate markdown documentation
- Removed scenarios_idea.md

## Database Fixtures
- **ACME Docs dump**: 4 documents, 15 entities, 19 links, 11 relationships
- **JSONL format**: Human-readable, git-friendly, streamable
- **Fast loading**: < 1 second vs 30-60s for full fetch+index (50-60x faster!)
- **Tools**: `load_dump.py` and `create_dump.py` for managing dumps

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
boringdata merged commit 99e7eb8 into main on Nov 25, 2025
2 checks passed