Summary
Session transcripts contain pipeline metadata that RAKI currently ignores. This metadata would help users understand what produced each session and filter/group evaluations by orchestrator type.
Current State
SessionMeta (src/raki/model/dataset.py:11-20) has 10 fields. Two are dead: tenant_id and knowledge_version are defined but never populated by any adapter. No pipeline metadata fields exist.
There are two adapters (not four):
- SessionSchemaAdapter (src/raki/adapters/session_schema.py): directory-based sessions (one dir per session with meta.json + events)
- AlcoveAdapter (src/raki/adapters/alcove.py): single-file JSON; handles both Alcove and Bridge formats via is_bridge = "task_id" in raw
Available Metadata
Session-schema sessions (meta.json):
- Phase names and counts already extracted (total_phases, rework_cycles)
- No pipeline/orchestrator name field in source data
Alcove/Bridge sessions (single JSON):
- Bridge format has: provider ("Vertex AI"), repos list, submitter ("workflow"), task_name
- Alcove format has: modelUsage, conversation entries
- Neither has an explicit orchestrator/pipeline name
Proposed
Add optional fields to SessionMeta:
class SessionMeta(BaseModel):
    # ... existing fields ...
    orchestrator: str | None = None            # "soda", "alcove", "bridge", "manual"
    provider: str | None = None                # "Vertex AI", "Anthropic"
    pipeline_phases: list[str] | None = None   # ["triage", "plan", "implement", ...]
How to populate
- SessionSchemaAdapter: infer orchestrator from directory structure or meta.json content. Extract phase names into pipeline_phases (currently only counted, not named).
- AlcoveAdapter: set orchestrator = "bridge" when is_bridge, "alcove" otherwise. Extract provider from bridge format's provider field.
Cleanup
Consider removing the dead tenant_id and knowledge_version fields, or populating them if a source exists.
Use Cases
- Filter evaluations by orchestrator: "show me only soda sessions"
- Compare orchestrator performance: "does soda produce better results than manual runs?"
- Track pipeline evolution: "did adding the review phase improve quality?"