feat: add pipeline/orchestrator metadata to session adapter #175

@decko

Summary

Session transcripts contain pipeline metadata that RAKI currently ignores. This metadata would help users understand what produced each session and filter/group evaluations by orchestrator type.

Current State

SessionMeta (src/raki/model/dataset.py:11-20) has 10 fields. Two are dead: tenant_id and knowledge_version are defined but never populated by any adapter. No pipeline metadata fields exist.

There are two adapters (not four):

  • SessionSchemaAdapter (src/raki/adapters/session_schema.py) — directory-based sessions (one dir per session with meta.json + events)
  • AlcoveAdapter (src/raki/adapters/alcove.py) — single-file JSON; handles both Alcove and Bridge formats via is_bridge = "task_id" in raw

Available Metadata

Session-schema sessions (meta.json):

  • Phase names and counts already extracted (total_phases, rework_cycles)
  • No pipeline/orchestrator name field in source data

Alcove/Bridge sessions (single JSON):

  • Bridge format has: provider ("Vertex AI"), repos list, submitter ("workflow"), task_name
  • Alcove format has: modelUsage, conversation entries
  • Neither has an explicit orchestrator/pipeline name

Proposed

Add optional fields to SessionMeta:

class SessionMeta(BaseModel):
    # ... existing fields ...
    orchestrator: str | None = None       # "soda", "alcove", "bridge", "manual"
    provider: str | None = None           # "Vertex AI", "Anthropic"
    pipeline_phases: list[str] | None = None  # ["triage", "plan", "implement", ...]

How to populate

  • SessionSchemaAdapter: infer orchestrator from directory structure or meta.json content. Extract phase names into pipeline_phases (currently only counted, not named).
  • AlcoveAdapter: set orchestrator = "bridge" when is_bridge, "alcove" otherwise. Extract provider from bridge format's provider field.
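The AlcoveAdapter rule above could be sketched roughly as follows. This is a standalone illustration, not the adapter's actual code: the function name infer_pipeline_metadata and its dict return shape are hypothetical, and the only facts taken from the issue are the "task_id" in raw check and the Bridge format's provider field.

```python
from __future__ import annotations

from typing import Any


def infer_pipeline_metadata(raw: dict[str, Any]) -> dict[str, str | None]:
    """Derive orchestrator and provider from one Alcove/Bridge JSON payload.

    Hypothetical helper: the real adapter may structure this differently.
    """
    # Same discriminator the issue quotes from AlcoveAdapter.
    is_bridge = "task_id" in raw
    orchestrator = "bridge" if is_bridge else "alcove"

    # Only the Bridge format is described as carrying a provider field.
    provider = raw.get("provider") if is_bridge else None

    # Assumption: treat missing, empty, or non-string values as "unknown".
    if not isinstance(provider, str) or not provider.strip():
        provider = None

    return {"orchestrator": orchestrator, "provider": provider}
```

Returning None rather than a placeholder string keeps the new fields consistent with their proposed `None` defaults, so sessions without a provider look the same whether or not the adapter tried to populate it.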

Cleanup

Consider either removing the dead tenant_id and knowledge_version fields or populating them if a source for their values exists.

Use Cases

  • Filter evaluations by orchestrator: "show me only soda sessions"
  • Compare orchestrator performance: "does soda produce better results than manual runs?"
  • Track pipeline evolution: "did adding the review phase improve quality?"
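The first two use cases amount to a filter and a group-by on the new orchestrator field. A minimal sketch, assuming each evaluated session exposes an orchestrator and a single numeric score (EvaluatedSession, its fields, and both helper functions are hypothetical stand-ins, not RAKI's actual types):

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class EvaluatedSession:
    """Hypothetical stand-in for a session plus one evaluation score."""

    session_id: str
    orchestrator: str | None
    score: float


def filter_by_orchestrator(
    sessions: list[EvaluatedSession], name: str
) -> list[EvaluatedSession]:
    """Keep only sessions produced by the named orchestrator."""
    return [s for s in sessions if s.orchestrator == name]


def mean_score_by_orchestrator(
    sessions: list[EvaluatedSession],
) -> dict[str, float]:
    """Average score per orchestrator; unset values group as 'unknown'."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for s in sessions:
        grouped[s.orchestrator or "unknown"].append(s.score)
    return {name: mean(scores) for name, scores in grouped.items()}
```

Grouping unset values under "unknown" is one possible choice; it keeps sessions from adapters that cannot infer an orchestrator visible in comparisons instead of silently dropping them.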

Labels: enhancement (New feature or request)
