backend/models/conversation.py is the highest-coupling file in the backend. Knowledge graph analysis (graphify) of the production codebase (excluding tests) shows 312 edges, a total degree of 1,927, reach into 8 separate communities, and 29 production files importing from it. The file packs 20+ models into 570 lines: not just Conversation, but also CategoryEnum, Geolocation, ActionItem, Structured, CalendarMeetingContext, AudioFile, ConversationPhoto, and 10+ request/response models. Any change to this file has an unpredictable blast radius across chat, calendar, notifications, developer API, vector DB, streaming pipeline, phone calls, and fair use.
Current Behavior
- models/conversation.py contains 20+ Pydantic models serving different domains (audio, calendar, LLM output, geolocation, photos, action items, categories)
- 29 production files import from this single file; many only need CategoryEnum or Geolocation, not Conversation itself
- A change to any model in this file triggers review concern across 8 architectural communities
- from models.conversation import * appears in utils/conversations/process_conversation.py (wildcard import of the entire file)
- Consumers that only read title/overview/transcript (LLM utils, vector_db, notifications) take the full 20-field Conversation object
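To confirm which names each importer actually pulls from the file, an audit along these lines could be run before splitting anything. This is a minimal sketch using only the standard library; the function names (`names_imported_from`, `audit`) and the sample file contents are hypothetical stand-ins, not code from the repository.

```python
import ast
from collections import Counter


def names_imported_from(source: str, module: str = "models.conversation") -> list:
    """Return the names a source file imports via `from <module> import ...`."""
    names = []
    for node in ast.walk(ast.parse(source)):
        # level == 0 restricts the match to absolute imports
        if isinstance(node, ast.ImportFrom) and node.module == module and node.level == 0:
            names.extend(alias.name for alias in node.names)
    return names


def audit(files: dict) -> Counter:
    """Count, per imported name, how many files import it ('*' marks a wildcard)."""
    usage = Counter()
    for source in files.values():
        usage.update(set(names_imported_from(source)))
    return usage


# Hypothetical file contents, for illustration only.
sample_files = {
    "routers/mcp.py": "from models.conversation import CategoryEnum\n",
    "utils/conversations/location.py": "from models.conversation import Geolocation\n",
    "utils/conversations/process_conversation.py": "from models.conversation import *\n",
    "database/vector_db.py": "import os\nfrom models.conversation import Conversation\n",
}

usage = audit(sample_files)
for name, count in sorted(usage.items()):
    print(f"{name}: imported by {count} file(s)")
```

In a real run, `sample_files` would be replaced by the contents of the production files read from disk. Names imported by many files that never touch Conversation are the natural first candidates to move.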
Expected Behavior
Each domain model lives in its own file. Consumers depend on the narrowest interface they need. The Conversation class itself stays intact (maps 1:1 to Firestore document — no schema migration needed).
Affected Areas
| File | Line | Description |
|------|------|-------------|
| backend/models/conversation.py | 1-570 | Kitchen-sink file with 20+ models |
| backend/models/conversation.py | 298-343 | Conversation class (the god object itself) |
| backend/models/conversation.py | 169-204 | Structured: LLM output shape, used by utils/llm/ |
| backend/models/conversation.py | 24-58 | CategoryEnum: 31 categories, imported standalone by routers/mcp.py |
| backend/models/conversation.py | 207-213 | Geolocation: used by routers/users.py, utils/conversations/location.py |
| backend/models/conversation.py | 102-137 | ActionItem: used by utils/llm/chat.py |
| backend/models/conversation.py | 12-21 | AudioFile: audio storage metadata |
| backend/models/conversation.py | 66-88 | ConversationPhoto: media metadata |
| backend/models/conversation.py | 222-235 | CalendarMeetingContext: calendar-specific |
| backend/utils/conversations/process_conversation.py | 1 | `from models.conversation import *` wildcard |
| backend/database/vector_db.py | 1 | Imports full Conversation but only uses ID + text |
| backend/utils/llm/chat.py | 1 | Imports Conversation but only reads structured fields |
Solution
Phase 1 — Split the file (low risk, high impact)
Move domain-specific models out of conversation.py into their own files. Add re-exports for backward compatibility.
```python
# backend/models/structured.py (NEW)
from datetime import datetime
from typing import List

from pydantic import BaseModel

from models.conversation_enums import CategoryEnum


class ActionItem(BaseModel):
    description: str
    completed: bool = False
    # ... existing fields


class Event(BaseModel):
    title: str
    start: datetime
    duration: int = 30
    # ... existing fields


class Structured(BaseModel):
    title: str = ''
    overview: str = ''
    emoji: str = '🧠'
    category: CategoryEnum = CategoryEnum.other
    action_items: List[ActionItem] = []
    events: List[Event] = []
```
```python
# backend/models/conversation_enums.py (NEW)
from enum import Enum


class CategoryEnum(str, Enum):
    personal = 'personal'
    # ... all 31 categories


class ConversationSource(str, Enum):
    omi = 'omi'
    # ... all 15 sources


class ConversationStatus(str, Enum): ...
class ConversationVisibility(str, Enum): ...
class PostProcessingStatus(str, Enum): ...
```
```python
# backend/models/geolocation.py (NEW)
from typing import Optional
from pydantic import BaseModel

class Geolocation(BaseModel):
    google_place_id: Optional[str] = None
    latitude: float
    longitude: float
    address: Optional[str] = None
    location_type: Optional[str] = None

# backend/models/audio_file.py (NEW)
from pydantic import BaseModel

class AudioFile(BaseModel):
    id: str
    uid: str
    conversation_id: str
    # ... existing fields

# backend/models/calendar_context.py (NEW)
from pydantic import BaseModel

class MeetingParticipant(BaseModel): ...
class CalendarMeetingContext(BaseModel): ...

# backend/models/conversation_photo.py (NEW)
from pydantic import BaseModel

class ConversationPhoto(BaseModel): ...
```
```python
# backend/models/conversation.py (SIMPLIFIED: keep only Conversation + CRUD models)
from pydantic import BaseModel

from models.structured import Structured, ActionItem, Event
from models.conversation_enums import (
    CategoryEnum, ConversationSource, ConversationStatus,
    ConversationVisibility, PostProcessingStatus,
)
from models.geolocation import Geolocation
from models.audio_file import AudioFile
from models.conversation_photo import ConversationPhoto
from models.calendar_context import CalendarMeetingContext, MeetingParticipant

# Re-export for backward compatibility (remove after Phase 3).
# Note: __all__ also limits what `from models.conversation import *` exposes,
# so every name the wildcard caller relies on must be listed here.
__all__ = [
    'Conversation', 'CreateConversation', 'CreateConversationResponse',
    'Structured', 'ActionItem', 'Event', 'CategoryEnum',
    'Geolocation', 'AudioFile', 'ConversationPhoto',
    'CalendarMeetingContext', 'MeetingParticipant',
    'ConversationSource', 'ConversationStatus', 'ConversationVisibility',
    'PostProcessingStatus',
]


class Conversation(BaseModel):
    # ... unchanged, same fields, same Firestore mapping
    ...

# CreateConversation, CreateConversationResponse, and the other CRUD models stay in this file.
```
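One detail of the re-export approach is worth checking before merging Phase 1: once a module defines `__all__`, a wildcard import exposes only the listed names. The sketch below demonstrates this with throwaway in-memory modules; the `demo_*` module names and the helpers `make_module` and `star_import` are hypothetical and exist only for this illustration.

```python
import sys
import types


def make_module(name: str, source: str) -> types.ModuleType:
    """Create an in-memory module from source and register it for import."""
    mod = types.ModuleType(name)
    exec(source, mod.__dict__)
    sys.modules[name] = mod
    return mod


def star_import(module_name: str) -> set:
    """Return the public names that `from <module_name> import *` would bind."""
    namespace = {}
    exec(f"from {module_name} import *", namespace)
    return {name for name in namespace if not name.startswith("__")}


# Stand-in for the new enums module.
make_module("demo_enums", "class CategoryEnum: pass\nclass PostProcessingStatus: pass\n")

# Re-exporting module WITHOUT __all__: the wildcard sees every public name.
make_module(
    "demo_conversation_plain",
    "from demo_enums import CategoryEnum, PostProcessingStatus\n"
    "class Conversation: pass\n",
)

# Re-exporting module WITH __all__: the wildcard sees only the listed names.
make_module(
    "demo_conversation_all",
    "from demo_enums import CategoryEnum, PostProcessingStatus\n"
    "__all__ = ['Conversation', 'CategoryEnum']\n"
    "class Conversation: pass\n",
)

plain_names = star_import("demo_conversation_plain")
restricted_names = star_import("demo_conversation_all")
print(sorted(plain_names - restricted_names))

# Re-exported names are the same objects, so isinstance checks keep working.
same_object = sys.modules["demo_conversation_all"].CategoryEnum is sys.modules["demo_enums"].CategoryEnum
```

Running the same kind of comparison against the real module before and after the split would show whether process_conversation.py loses any name it currently receives through the wildcard.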
Phase 2 — Introduce ConversationSummary view (medium risk)
```python
# backend/models/conversation_summary.py (NEW)
from datetime import datetime
from typing import TYPE_CHECKING, List

from pydantic import BaseModel

if TYPE_CHECKING:
    # Type-hint-only import: this module does not import conversation.py at runtime
    from models.conversation import Conversation


class ConversationSummary(BaseModel):
    """Lightweight read-only view for consumers that don't need the full Conversation."""
    id: str
    title: str
    overview: str
    category: str
    transcript_text: str
    created_at: datetime
    person_ids: List[str] = []

    @classmethod
    def from_conversation(cls, c: 'Conversation', **kwargs) -> 'ConversationSummary':
        # **kwargs is accepted but not used yet
        return cls(
            id=c.id,
            title=c.structured.title,
            overview=c.structured.overview,
            category=c.structured.category.value,
            transcript_text=c.get_transcript(include_timestamps=False),
            created_at=c.created_at,
            person_ids=c.get_person_ids(),
        )
```
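A consumer written against the summary view might look roughly like the sketch below. To keep it runnable without dependencies, it uses a standard-library dataclass as a stand-in for the proposed Pydantic model (same field names); `build_embedding_payload` is a hypothetical consumer invented for illustration, not an existing function in vector_db.py.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class ConversationSummary:
    """Dataclass stand-in for the proposed Pydantic ConversationSummary."""
    id: str
    title: str
    overview: str
    category: str
    transcript_text: str
    created_at: datetime
    person_ids: List[str] = field(default_factory=list)


def build_embedding_payload(summary: ConversationSummary) -> dict:
    """Hypothetical consumer that needs only the ID, text, and light metadata."""
    text = "\n".join([summary.title, summary.overview, summary.transcript_text]).strip()
    return {
        "id": summary.id,
        "text": text,
        "metadata": {
            "category": summary.category,
            "created_at": summary.created_at.isoformat(),
        },
    }


summary = ConversationSummary(
    id="conv-1",
    title="Weekly sync",
    overview="Planning the model split.",
    category="other",
    transcript_text="Speaker 0: let's start.",
    created_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
payload = build_embedding_payload(summary)
print(payload["text"])
```

Because the function signature names only ConversationSummary, the consumer no longer needs to import Conversation at all, which is the coupling reduction Phase 2 is after.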
Phase 3 — Migrate callers to narrow interfaces
Gradually update imports across the codebase:
```python
# BEFORE (29 files do this)
from models.conversation import Conversation, CategoryEnum, Geolocation

# AFTER: each file imports only what it needs from the right place
from models.conversation_enums import CategoryEnum  # routers/mcp.py
from models.geolocation import Geolocation  # utils/conversations/location.py
from models.conversation_summary import ConversationSummary  # utils/llm/chat.py
```
Once all callers are migrated, remove the re-exports from conversation.py and replace the from models.conversation import * wildcard in process_conversation.py with explicit imports.
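The per-file import rewrite is mechanical enough that a small helper could do most of it. The following is a minimal sketch, assuming the Phase 1 module layout above; `NEW_HOME` and `rewrite_import` are hypothetical names, and the helper only handles single-line, top-level `from models.conversation import ...` statements. Wildcard imports are left untouched for manual review.

```python
import ast
from collections import defaultdict

# Where each model lives after Phase 1 (taken from the proposed layout above).
NEW_HOME = {
    "Structured": "models.structured",
    "ActionItem": "models.structured",
    "Event": "models.structured",
    "CategoryEnum": "models.conversation_enums",
    "ConversationSource": "models.conversation_enums",
    "ConversationStatus": "models.conversation_enums",
    "ConversationVisibility": "models.conversation_enums",
    "PostProcessingStatus": "models.conversation_enums",
    "Geolocation": "models.geolocation",
    "AudioFile": "models.audio_file",
    "ConversationPhoto": "models.conversation_photo",
    "CalendarMeetingContext": "models.calendar_context",
    "MeetingParticipant": "models.calendar_context",
}


def rewrite_import(line: str) -> list:
    """Split one `from models.conversation import ...` line by each name's new module."""
    node = ast.parse(line).body[0]
    if not (isinstance(node, ast.ImportFrom) and node.module == "models.conversation"):
        return [line]  # not an import this helper cares about
    grouped = defaultdict(list)
    for alias in node.names:
        if alias.name == "*":
            return [line]  # wildcard: needs a manual decision
        # Names without a new home (e.g. Conversation) stay in models.conversation.
        target = NEW_HOME.get(alias.name, "models.conversation")
        text = alias.name if alias.asname is None else f"{alias.name} as {alias.asname}"
        grouped[target].append(text)
    return [f"from {module} import {', '.join(names)}" for module, names in sorted(grouped.items())]


rewritten = rewrite_import("from models.conversation import Conversation, CategoryEnum, Geolocation")
for new_line in rewritten:
    print(new_line)
```

The helper does not decide whether a caller should switch from Conversation to ConversationSummary; that still requires reading how the caller uses the object.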
Files to Modify
Phase 1 (new files + refactor):
backend/models/structured.py — NEW
backend/models/conversation_enums.py — NEW
backend/models/geolocation.py — NEW
backend/models/audio_file.py — NEW
backend/models/conversation_photo.py — NEW
backend/models/calendar_context.py — NEW
backend/models/conversation.py — simplify, add re-exports
Phase 2 (new view model):
backend/models/conversation_summary.py — NEW
Phase 3 (caller migration, 29 files):
backend/database/conversations.py
backend/database/vector_db.py
backend/routers/conversations.py
backend/routers/developer.py
backend/routers/mcp.py
backend/routers/pusher.py
backend/routers/users.py
backend/routers/chat.py
backend/routers/calendar_meetings.py
backend/routers/sync.py
backend/routers/transcribe.py
backend/routers/speech_profile.py
backend/routers/folders.py
backend/routers/integration.py
backend/utils/llm/chat.py
backend/utils/llm/conversation_processing.py
backend/utils/llm/external_integrations.py
backend/utils/llm/clients.py
backend/utils/llm/openglass.py
backend/utils/conversations/process_conversation.py
backend/utils/conversations/postprocess_conversation.py
backend/utils/conversations/merge_conversations.py
backend/utils/conversations/location.py
backend/utils/apps.py
backend/utils/app_integrations.py
backend/utils/chat.py
backend/utils/imports/limitless.py
backend/database/trends.py
backend/models/message_event.py
Impact
No Firestore schema changes. No API changes. Phase 1 is designed to be backward-compatible via re-exports, so the risk of runtime breakage is low. One caveat: adding __all__ changes what the from models.conversation import * wildcard in process_conversation.py exposes, so that caller should be verified. Phases 2 and 3 are incremental caller migrations that can be done file by file across multiple PRs. After completion, the conversation.py coupling should drop from 312 edges / 8 communities to ~50 edges / 2 communities (core CRUD + streaming pipeline only).
This issue was drafted by AI on behalf of @beastoin