Decouple Conversation god object in backend/models/conversation.py #6423

@beastoin

Description

backend/models/conversation.py is the highest-coupling file in the backend. Knowledge graph analysis (graphify) on the production codebase (excluding tests) shows: 312 edges, 1,927 total degree, reaches 8 separate communities, and 29 production files import from it. The file contains 20+ models stuffed into 570 lines — not just Conversation, but CategoryEnum, Geolocation, ActionItem, Structured, CalendarMeetingContext, AudioFile, ConversationPhoto, and 10+ request/response models. Any change to this file has an unpredictable blast radius across chat, calendar, notifications, developer API, vector DB, streaming pipeline, phone calls, and fair use.

Current Behavior

  • models/conversation.py contains 20+ Pydantic models serving different domains (audio, calendar, LLM output, geolocation, photos, action items, categories)
  • 29 production files import from this single file — many only need CategoryEnum or Geolocation, not Conversation itself
  • A change to any model in this file raises review concerns across all 8 architectural communities that depend on it
  • from models.conversation import * appears in utils/conversations/process_conversation.py (wildcard import of the entire file)
  • Consumers that only read title/overview/transcript (LLM utils, vector_db, notifications) take the full 20-field Conversation object
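The import counts above can be reproduced with a small audit script. The following is a minimal sketch using only the standard library `ast` module; the file contents in `SAMPLE_FILES` are hypothetical stand-ins for illustration, not the actual sources, and in practice the script would read the real files from disk.

```python
import ast
from collections import Counter

TARGET_MODULE = "models.conversation"  # module being audited


def imported_names(source: str, module: str = TARGET_MODULE) -> list[str]:
    """Return the names a source file imports from `module` ('*' for a wildcard)."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.level == 0 and node.module == module:
            names.extend(alias.name for alias in node.names)
    return names


# Hypothetical file contents, for illustration only.
SAMPLE_FILES = {
    "routers/mcp.py": "from models.conversation import CategoryEnum\n",
    "utils/conversations/location.py": "from models.conversation import Geolocation\n",
    "utils/conversations/process_conversation.py": "from models.conversation import *\n",
    "database/vector_db.py": "from models.conversation import Conversation\nimport os\n",
}

usage = Counter()
narrow_candidates = []
for path, source in SAMPLE_FILES.items():
    names = imported_names(source)
    usage.update(names)
    print(f"{path}: {', '.join(names) or '(none)'}")
    # Files that import from the module but never need Conversation itself
    # (and do not use a wildcard) can switch to a narrower import.
    if names and "Conversation" not in names and "*" not in names:
        narrow_candidates.append(path)

print("narrow candidates:", narrow_candidates)
```

Files that show up as narrow candidates are the cheapest ones to migrate, because they can switch to a smaller module without any change in behavior.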

Expected Behavior

Each domain model lives in its own file. Consumers depend on the narrowest interface they need. The Conversation class itself stays intact (maps 1:1 to Firestore document — no schema migration needed).

Affected Areas

| File | Lines | Description |
|------|-------|-------------|
| backend/models/conversation.py | 1-570 | Kitchen-sink file with 20+ models |
| backend/models/conversation.py | 298-343 | Conversation class (the god object itself) |
| backend/models/conversation.py | 169-204 | Structured: LLM output shape, used by utils/llm/ |
| backend/models/conversation.py | 24-58 | CategoryEnum: 31 categories, imported standalone by routers/mcp.py |
| backend/models/conversation.py | 207-213 | Geolocation: used by routers/users.py, utils/conversations/location.py |
| backend/models/conversation.py | 102-137 | ActionItem: used by utils/llm/chat.py |
| backend/models/conversation.py | 12-21 | AudioFile: audio storage metadata |
| backend/models/conversation.py | 66-88 | ConversationPhoto: media metadata |
| backend/models/conversation.py | 222-235 | CalendarMeetingContext: calendar-specific |
| backend/utils/conversations/process_conversation.py | 1 | from models.conversation import * wildcard |
| backend/database/vector_db.py | 1 | Imports full Conversation but only uses ID + text |
| backend/utils/llm/chat.py | 1 | Imports Conversation but only reads structured fields |

Solution

Phase 1 — Split the file (low risk, high impact)

Move domain-specific models out of conversation.py into their own files. Add re-exports for backward compatibility.

# backend/models/structured.py (NEW)
from datetime import datetime
from typing import List

from pydantic import BaseModel

from models.conversation_enums import CategoryEnum

class ActionItem(BaseModel):
    description: str
    completed: bool = False
    # ... existing fields

class Event(BaseModel):
    title: str
    start: datetime
    duration: int = 30
    # ... existing fields

class Structured(BaseModel):
    title: str = ''
    overview: str = ''
    emoji: str = '🧠'
    category: CategoryEnum = CategoryEnum.other
    action_items: List[ActionItem] = []
    events: List[Event] = []

# backend/models/conversation_enums.py (NEW)
from enum import Enum

class CategoryEnum(str, Enum):
    personal = 'personal'
    # ... all 31 categories

class ConversationSource(str, Enum):
    omi = 'omi'
    # ... all 15 sources

class ConversationStatus(str, Enum): ...
class ConversationVisibility(str, Enum): ...
class PostProcessingStatus(str, Enum): ...

# backend/models/geolocation.py (NEW)
from typing import Optional

from pydantic import BaseModel

class Geolocation(BaseModel):
    google_place_id: Optional[str] = None
    latitude: float
    longitude: float
    address: Optional[str] = None
    location_type: Optional[str] = None

# backend/models/audio_file.py (NEW)
from pydantic import BaseModel

class AudioFile(BaseModel):
    id: str
    uid: str
    conversation_id: str
    # ... existing fields

# backend/models/calendar_context.py (NEW)
from pydantic import BaseModel

class MeetingParticipant(BaseModel): ...
class CalendarMeetingContext(BaseModel): ...

# backend/models/conversation_photo.py (NEW)
from pydantic import BaseModel

class ConversationPhoto(BaseModel): ...

# backend/models/conversation.py (SIMPLIFIED: keep only Conversation + CRUD models)
from pydantic import BaseModel

from models.structured import Structured, ActionItem, Event
from models.conversation_enums import (
    CategoryEnum, ConversationSource, ConversationStatus,
    ConversationVisibility, PostProcessingStatus,
)
from models.geolocation import Geolocation
from models.audio_file import AudioFile
from models.conversation_photo import ConversationPhoto
from models.calendar_context import CalendarMeetingContext, MeetingParticipant

# Re-export for backward compatibility (remove after Phase 3).
# Once __all__ is defined, `from models.conversation import *` exposes only these names.
__all__ = [
    'Conversation', 'CreateConversation', 'CreateConversationResponse',
    'Structured', 'ActionItem', 'Event', 'CategoryEnum',
    'Geolocation', 'AudioFile', 'ConversationPhoto',
    'CalendarMeetingContext', 'MeetingParticipant',
    'ConversationSource', 'ConversationStatus', 'ConversationVisibility',
    'PostProcessingStatus',
]

class Conversation(BaseModel):
    # ... unchanged, same fields, same Firestore mapping
    ...
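
The re-export approach relies on two Python behaviors worth verifying before Phase 1 ships: a re-exported name is the same object under both import paths, and a wildcard import honors `__all__`. The sketch below demonstrates both with in-memory modules. The module names `demo_geolocation` and `demo_conversation` are hypothetical stand-ins, and plain classes replace the Pydantic models so the example runs without dependencies.

```python
import sys
import types


def make_module(name: str, source: str) -> types.ModuleType:
    """Create an in-memory module from source and register it in sys.modules."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


def wildcard_names(module_name: str) -> list[str]:
    """Return the public names that `from <module_name> import *` would bind."""
    namespace = {}
    exec(f"from {module_name} import *", namespace)
    return sorted(name for name in namespace if not name.startswith("__"))


# Hypothetical stand-ins for the split files.
make_module("demo_geolocation", "class Geolocation: ...\n")
make_module(
    "demo_conversation",
    "from datetime import datetime\n"
    "from demo_geolocation import Geolocation  # re-export\n"
    "__all__ = ['Conversation', 'Geolocation']\n"
    "class Conversation: ...\n",
)

# Old and new import paths resolve to the same class object.
from demo_conversation import Geolocation as old_path
from demo_geolocation import Geolocation as new_path
print(old_path is new_path)

# With __all__ defined, the wildcard import sees only the listed names.
exported = wildcard_names("demo_conversation")
print(exported)

# Without __all__, every public name leaks through, including incidental imports.
del sys.modules["demo_conversation"].__all__
exported_without_all = wildcard_names("demo_conversation")
print(exported_without_all)
```

The second behavior cuts both ways. `__all__` stops incidental names such as `datetime` from leaking, but any name that a wildcard importer currently relies on and that is missing from the list will stop resolving.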

Phase 2 — Introduce ConversationSummary view (medium risk)

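# backend/models/conversation_summary.py (NEW)
from datetime import datetime
from typing import TYPE_CHECKING, List

from pydantic import BaseModel

if TYPE_CHECKING:
    # Needed for type checking only, so this module never loads conversation.py at runtime.
    from models.conversation import Conversation

class ConversationSummary(BaseModel):
    """Lightweight read-only view for consumers that don't need the full Conversation."""
    id: str
    title: str
    overview: str
    category: str
    transcript_text: str
    created_at: datetime
    person_ids: List[str] = []

    @classmethod
    def from_conversation(cls, c: 'Conversation') -> 'ConversationSummary':
        # Flatten the nested Structured fields into a plain, self-contained view.
        return cls(
            id=c.id,
            title=c.structured.title,
            overview=c.structured.overview,
            category=c.structured.category.value,
            transcript_text=c.get_transcript(include_timestamps=False),
            created_at=c.created_at,
            person_ids=c.get_person_ids(),
        )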
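
To illustrate what a consumer gains from the narrow view, here is a minimal sketch of a function that depends only on the summary fields. It makes several assumptions: a stdlib dataclass stands in for the Pydantic model so the example runs without dependencies, `build_embedding_text` is a hypothetical consumer rather than an existing function in the codebase, and the field values are invented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class ConversationSummary:
    """Stdlib stand-in for the proposed Pydantic view model (same fields)."""
    id: str
    title: str
    overview: str
    category: str
    transcript_text: str
    created_at: datetime
    person_ids: List[str] = field(default_factory=list)


def build_embedding_text(summary: ConversationSummary, max_chars: int = 200) -> str:
    """Hypothetical consumer: builds the text a vector index might embed."""
    header = f"{summary.title} [{summary.category}]"
    body = summary.transcript_text[:max_chars]  # truncate long transcripts
    return f"{header}\n{summary.overview}\n{body}"


# The consumer can be exercised without constructing a full Conversation.
summary = ConversationSummary(
    id="conv-1",
    title="Weekly sync",
    overview="Planned the release.",
    category="work",
    transcript_text="Alice: ship on Friday. Bob: agreed.",
    created_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
text = build_embedding_text(summary)
print(text)
```

Because the function signature names only the summary type, a test for it needs seven simple values instead of a fully populated Conversation with its nested models.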

Phase 3 — Migrate callers to narrow interfaces

Gradually update imports across the codebase:

# BEFORE (29 files do this)
from models.conversation import Conversation, CategoryEnum, Geolocation

# AFTER — each file imports only what it needs from the right place
from models.conversation_enums import CategoryEnum    # routers/mcp.py
from models.geolocation import Geolocation            # utils/conversations/location.py
from models.conversation_summary import ConversationSummary  # utils/llm/chat.py

Remove the re-exports from conversation.py once all callers are migrated. Replace the from models.conversation import * wildcard in process_conversation.py with explicit imports.
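
Much of the caller migration is mechanical and could be scripted. The following is a rough sketch of a helper that splits one old-style import line into per-module imports. `NEW_HOME` and `split_import` are hypothetical names, and the mapping simply mirrors the Phase 1 file layout. Deciding where a caller should switch from Conversation to ConversationSummary still needs a human, so names not in the mapping are left on the old module.

```python
import ast
from collections import defaultdict

# Where each name lives after Phase 1, mirroring the proposed file layout.
NEW_HOME = {
    "CategoryEnum": "models.conversation_enums",
    "ConversationSource": "models.conversation_enums",
    "ConversationStatus": "models.conversation_enums",
    "ConversationVisibility": "models.conversation_enums",
    "PostProcessingStatus": "models.conversation_enums",
    "Geolocation": "models.geolocation",
    "AudioFile": "models.audio_file",
    "ConversationPhoto": "models.conversation_photo",
    "CalendarMeetingContext": "models.calendar_context",
    "MeetingParticipant": "models.calendar_context",
    "Structured": "models.structured",
    "ActionItem": "models.structured",
    "Event": "models.structured",
}


def split_import(line: str, old_module: str = "models.conversation") -> list[str]:
    """Rewrite a single `from models.conversation import ...` statement into narrow imports."""
    node = ast.parse(line).body[0]
    if not isinstance(node, ast.ImportFrom) or node.module != old_module:
        return [line]  # not an import this helper manages
    if any(alias.name == "*" for alias in node.names):
        raise ValueError("wildcard imports must be expanded by hand first")
    grouped = defaultdict(list)
    for alias in node.names:
        target = NEW_HOME.get(alias.name, old_module)  # unknown names stay put
        name = alias.name if alias.asname is None else f"{alias.name} as {alias.asname}"
        grouped[target].append(name)
    return [
        f"from {module} import {', '.join(names)}"
        for module, names in sorted(grouped.items())
    ]


for new_line in split_import(
    "from models.conversation import Conversation, CategoryEnum, Geolocation"
):
    print(new_line)
```

Rejecting wildcard imports outright is deliberate: the set of names a wildcard actually uses cannot be read from the import line alone, so that file has to be expanded manually.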

Files to Modify

Phase 1 (new files + refactor):

  • backend/models/structured.py — NEW
  • backend/models/conversation_enums.py — NEW
  • backend/models/geolocation.py — NEW
  • backend/models/audio_file.py — NEW
  • backend/models/conversation_photo.py — NEW
  • backend/models/calendar_context.py — NEW
  • backend/models/conversation.py — simplify, add re-exports

Phase 2 (new view model):

  • backend/models/conversation_summary.py — NEW

Phase 3 (caller migration, 29 files):

  • backend/database/conversations.py
  • backend/database/vector_db.py
  • backend/routers/conversations.py
  • backend/routers/developer.py
  • backend/routers/mcp.py
  • backend/routers/pusher.py
  • backend/routers/users.py
  • backend/routers/chat.py
  • backend/routers/calendar_meetings.py
  • backend/routers/sync.py
  • backend/routers/transcribe.py
  • backend/routers/speech_profile.py
  • backend/routers/folders.py
  • backend/routers/integration.py
  • backend/utils/llm/chat.py
  • backend/utils/llm/conversation_processing.py
  • backend/utils/llm/external_integrations.py
  • backend/utils/llm/clients.py
  • backend/utils/llm/openglass.py
  • backend/utils/conversations/process_conversation.py
  • backend/utils/conversations/postprocess_conversation.py
  • backend/utils/conversations/merge_conversations.py
  • backend/utils/conversations/location.py
  • backend/utils/apps.py
  • backend/utils/app_integrations.py
  • backend/utils/chat.py
  • backend/utils/imports/limitless.py
  • backend/database/trends.py
  • backend/models/message_event.py

Impact

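No Firestore schema changes. No API changes. Phase 1 is designed to be backward-compatible via re-exports, so every existing named import keeps working. One caveat: once __all__ is defined, from models.conversation import * exposes only the listed names, so the wildcard import in process_conversation.py must be checked against that list before Phase 1 ships. Phases 2 and 3 are incremental caller migrations that can be done file-by-file across multiple PRs. After completion, the conversation.py coupling should drop from 312 edges / 8 communities to ~50 edges / 2 communities (core CRUD + streaming pipeline only).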


This issue was drafted by AI on behalf of @beastoin

Metadata

Labels: p2 (Priority: Important, score 14-21)
