feat: Reorganize eval framework with clean mock data structure by boringdata · Pull Request #44 · boringdata/kurt-core

boringdata · 2025-11-25T11:08:15Z

Summary

Reorganize the evaluation framework with a cleaner, more logical directory structure and improved tooling for faster test setup.

Key Changes

📁 Mock Data Reorganization

New structure: Organized into data/documents/, data/db/, data/integrations/
Moved websites: All website content consolidated in data/documents/ (acme-corp, acme-docs, competitor-co)
Moved integrations: research, analytics, cms → data/integrations/
Removed fixtures/: Eliminated redundant fixtures directory structure
Consolidated tools: All generators and loaders in generators/ directory

🗂️ Scenario Organization

Split monolithic scenarios file into logical, focused files:

scenarios_kurt_setup.yaml - Initialization and configuration tests
scenarios_content_ingestion.yaml - Fetching and indexing tests
scenarios_retrieval.yaml - Answer command tests with correctness metrics
scenarios_answer.yaml - Legacy answer scenarios
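
As a rough illustration of the naming convention above, the suite name can be recovered from each file name. This is only a sketch: `suite_names` is a hypothetical helper, not part of `kurt-eval`, and it assumes every scenario file follows the `scenarios_<suite>.yaml` pattern listed here.

```python
def suite_names(filenames):
    """Map suite name (e.g. 'retrieval') to its scenarios_<suite>.yaml file name.

    Hypothetical helper: relies only on the naming pattern in this PR,
    not on how kurt-eval actually discovers scenario files.
    """
    prefix, suffix = "scenarios_", ".yaml"
    suites = {}
    for name in filenames:
        # Skip anything that does not match scenarios_<suite>.yaml
        if name.startswith(prefix) and name.endswith(suffix):
            suites[name[len(prefix):-len(suffix)]] = name
    return suites


files = [
    "scenarios_kurt_setup.yaml",
    "scenarios_content_ingestion.yaml",
    "scenarios_retrieval.yaml",
    "scenarios_answer.yaml",
]
print(sorted(suite_names(files)))
```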

📊 Database Fixtures (Major Performance Improvement)

ACME Docs dump: Pre-built knowledge graph with 4 documents, 15 entities, 19 links, 11 relationships
JSONL format: Human-readable, git-friendly (line-based diffs), streamable
Loading time: < 1 second vs 30-60s for fetch+index - 50-60x faster!
New tools:
- load_dump.py - Import JSONL dumps into Kurt projects
- create_dump.py - Export Kurt projects to JSONL dumps
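
For readers unfamiliar with JSONL fixtures, the general export/import technique looks roughly like the sketch below. It is not the code in `load_dump.py` or `create_dump.py`: the SQLite backing store, the `entities` table, its columns, and the function names are all assumptions made for illustration.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def export_table(conn, table, out_path):
    """Write every row of `table` as one JSON object per line; return the row count."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [col[0] for col in cursor.description]
    count = 0
    with open(out_path, "w", encoding="utf-8") as fh:
        for row in cursor:
            # One record per line keeps git diffs line-based and lets readers stream.
            fh.write(json.dumps(dict(zip(columns, row)), sort_keys=True) + "\n")
            count += 1
    return count


def import_table(conn, table, in_path):
    """Insert each JSONL record into an existing `table`; return the row count."""
    count = 0
    with open(in_path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            columns = ", ".join(f'"{name}"' for name in record)
            placeholders = ", ".join("?" for _ in record)
            conn.execute(
                f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})',
                list(record.values()),
            )
            count += 1
    conn.commit()
    return count


def round_trip_demo():
    """Export a hypothetical `entities` table and load it into a fresh database."""
    schema = "CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT)"
    source = sqlite3.connect(":memory:")
    source.execute(schema)
    source.executemany(
        "INSERT INTO entities (id, name) VALUES (?, ?)",
        [(1, "ACME"), (2, "Widget")],  # made-up rows, not the real fixture data
    )
    target = sqlite3.connect(":memory:")
    target.execute(schema)
    with tempfile.TemporaryDirectory() as tmp:
        dump = Path(tmp) / "entities.jsonl"
        exported = export_table(source, "entities", dump)
        imported = import_table(target, "entities", dump)
    rows = target.execute("SELECT id, name FROM entities ORDER BY id").fetchall()
    return exported, imported, rows
```

Loading a dump this way skips the network fetch and indexing steps entirely, which is where the setup-time saving described above comes from.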

📝 Documentation

Consolidated README: Single comprehensive eval/README.md (370 lines) with all information
Removed redundant markdown files: TESTING_SUMMARY, DSPY_OPTIMIZER, CONFIG_AUDIT, scenarios_idea, mock/README, mock/STRUCTURE
Updated all paths and usage examples throughout

🧹 Cleanup

Removed root-level test files (test_conversation_completion.py, test_workflow_follow.py, test_workflow_queuing.py)
Removed duplicate and outdated documentation files

Impact

Before:

eval/mock/
├── websites/acme-corp/, acme-docs/, competitor-co/
├── research/*, analytics/*, cms/*
├── README.md, STRUCTURE.md
└── generate_mock_data.py

After:

eval/mock/
├── data/
│   ├── documents/       # All website content
│   ├── db/              # Fast JSONL dumps
│   └── integrations/    # Research, analytics, CMS data
└── generators/          # All data tools
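
The before/after layouts can be summarized as a path mapping, sketched below for anyone updating local references. `new_location` is a hypothetical helper, the example file names are invented, and the sketch assumes each integration keeps its own subdirectory under `data/integrations/`; paths with no clear destination in this PR return `None`.

```python
from pathlib import PurePosixPath

# Top-level directories under the old eval/mock/ layout that were moved.
INTEGRATIONS = {"research", "analytics", "cms"}


def new_location(old_relative_path):
    """Translate a path relative to the old eval/mock/ into the new layout.

    Returns None when this PR does not state where the path ended up.
    """
    parts = PurePosixPath(old_relative_path).parts
    if not parts:
        return None
    if parts[0] == "websites":
        # websites/<site>/... -> data/documents/<site>/...
        return str(PurePosixPath("data/documents", *parts[1:]))
    if parts[0] in INTEGRATIONS:
        # Assumption: research/... -> data/integrations/research/...
        return str(PurePosixPath("data/integrations", *parts))
    return None


# File names below are invented for illustration.
print(new_location("websites/acme-docs/index.md"))
print(new_location("cms/posts.json"))
```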

Benefits

Much faster test setup: 50-60x speedup with JSONL database dumps
Cleaner structure: Logical organization by data type
Better maintainability: Single source of truth for documentation
Git-friendly: JSONL format produces readable diffs
Portable: Easy to create/share database fixtures

Testing

All existing eval scenarios continue to work with updated paths:

uv run kurt-eval list
uv run kurt-eval run 1

Quick database loading:

KURT_TELEMETRY_DISABLED=1 uv run kurt init
python3 eval/mock/generators/load_dump.py acme-docs

🤖 Generated with Claude Code

Reorganize the evaluation framework with a cleaner, more logical structure:

## Mock Data Reorganization
- **New structure**: `data/documents/`, `data/db/`, `data/integrations/`
- **Moved websites**: All website content now in `data/documents/` (acme-corp, acme-docs, competitor-co)
- **Moved integrations**: research, analytics, cms → `data/integrations/`
- **Added database dumps**: JSONL format in `data/db/acme-docs/` (loads in <1s vs 30-60s for fetch+index)
- **Consolidated tools**: All generators and loaders in `generators/` directory

## Scenario Organization
- Split scenarios into logical files:
  - `scenarios_kurt_setup.yaml` - Initialization tests
  - `scenarios_content_ingestion.yaml` - Fetch and indexing tests
  - `scenarios_retrieval.yaml` - Answer command tests with correctness metrics
  - `scenarios_answer.yaml` - Legacy answer scenarios

## Documentation
- **Consolidated README**: Single comprehensive `eval/README.md` (370 lines)
- Removed redundant markdown files (TESTING_SUMMARY, DSPY_OPTIMIZER, CONFIG_AUDIT, etc.)
- Updated all paths and examples to reflect new structure

## Cleanup
- Removed root-level test files (test_conversation_completion.py, test_workflow_follow.py, etc.)
- Removed duplicate markdown documentation
- Cleaned up scenarios_idea.md

## Database Fixtures
- **ACME Docs dump**: 4 documents, 15 entities, 19 links, 11 relationships
- **JSONL format**: Human-readable, git-friendly, streamable
- **Fast loading**: < 1 second vs 30-60s for full fetch+index (50-60x faster!)
- **Tools**: `load_dump.py` and `create_dump.py` for managing dumps

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

boringdata merged commit 99e7eb8 into main Nov 25, 2025
2 checks passed

boringdata merged 1 commit into main from eval-answer
