Gracefully handle files with missing or empty title field during sync #381

@phernandez

Problem

Files with missing or empty title fields in frontmatter cause infinite retry loops during sync, leading to:

  • Database constraint violations every 30 seconds
  • Continuous directory rescanning (e.g., 737 files every 30 seconds)
  • Elevated memory usage from retry state accumulation
  • Poor user experience with no clear error reporting

Evidence

Tenant production-basic-memory-tenant-cb6aad6f shows the same error on every sync cycle:

Failed to upsert entity for RESEARCH/_Templates/Research Library Card.md: 
(sqlite3.IntegrityError) NOT NULL constraint failed: entity.title
[SQL: INSERT INTO entity (title, entity_type, ...) VALUES (?, ?, ...)]
[parameters: (None, 'note', ...)]

This file is detected as "changed" on every 30-second sync cycle because it never successfully syncs.

Root Cause

The file has one of the following:

  1. No title: field in frontmatter
  2. Empty title: field (title: with no value)
  3. title: null or similar
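
A minimal sketch of a check that covers all three cases in the list above. The helper name `has_usable_title` is hypothetical, and the sketch assumes the frontmatter parser turns an empty `title:` and `title: null` into `None`; the project's actual parsing may differ.

```python
from typing import Any, Mapping


def has_usable_title(frontmatter: Mapping[str, Any]) -> bool:
    """Return True only when the frontmatter carries a non-empty title.

    Hypothetical helper, not part of the existing codebase.
    """
    title = frontmatter.get("title")
    # A missing key and an explicit null both surface as None here.
    if title is None:
        return False
    # A title may parse as a non-string (e.g. a number), so normalise first.
    return bool(str(title).strip())


# The three cases from the list, as they might look after parsing:
print(has_usable_title({}))                                    # no title field
print(has_usable_title({"title": None}))                       # empty or null title
print(has_usable_title({"title": "Research Library Card"}))    # valid title
```

Checking for whitespace-only values as well keeps a quoted title such as `title: "  "` from slipping past the check and being stored as a blank title.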

Current sync behavior:

  • Tries to insert with title=None
  • Hits NOT NULL constraint
  • Logs error but doesn't skip file permanently
  • Circuit breaker doesn't trigger because each 30-second sync is treated as a fresh attempt

Expected Behavior

When encountering a file with missing/empty title:

  1. Validation: Detect missing title during entity creation
  2. Generate default: Create a title from filename (e.g., "Research Library Card" from "Research Library Card.md")
  3. Log warning: Inform user that title was auto-generated
  4. Succeed sync: The file syncs with the generated title, so it is no longer detected as "changed" and retried on later cycles
  5. Optional: Add to sync report as "files with auto-generated titles"
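
The steps above could be combined roughly as follows. This is only a sketch: `SyncReport`, `auto_titled`, and `resolve_title` are hypothetical names for illustration, not existing parts of the sync service.

```python
import re
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, Mapping


@dataclass
class SyncReport:
    """Hypothetical per-sync summary; only the field relevant here is shown."""

    # file path -> title that was generated for it
    auto_titled: Dict[str, str] = field(default_factory=dict)


def resolve_title(path: Path, frontmatter: Mapping[str, Any], report: SyncReport) -> str:
    """Return the frontmatter title, or a filename-derived one if it is missing."""
    raw = frontmatter.get("title")
    if raw is not None and str(raw).strip():
        return str(raw).strip()
    # Steps 1-2: title is missing or empty, so derive one from the filename.
    title = re.sub(r"[_-]+", " ", path.stem).strip() or path.stem
    # Step 5: remember the file so the report can list auto-generated titles.
    report.auto_titled[path.as_posix()] = title
    return title


report = SyncReport()
title = resolve_title(Path("RESEARCH/_Templates/Research Library Card.md"), {}, report)
print(title)               # Research Library Card
print(report.auto_titled)
```

Recording the path alongside the generated title gives the user enough information to find the file and add a proper `title:` later.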

Alternative Approach

If we want to keep strict validation:

  1. Skip file permanently: Add to failure cache with permanent skip flag
  2. Report to user: Include in sync status endpoint with actionable error
  3. Don't retry: Circuit breaker should prevent infinite retries
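
One way the permanent-skip idea might look, as a sketch under stated assumptions: `FailureCache` and its methods are hypothetical, the cache is in-memory only, and a content checksum is assumed to be available for each file so that an edited file gets another attempt.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FailureCache:
    """Hypothetical record of files that sync should stop retrying."""

    # path -> (checksum at time of failure, error message)
    skipped: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    def record_permanent_failure(self, path: str, checksum: str, error: str) -> None:
        """Mark a file as permanently failed for its current content."""
        self.skipped[path] = (checksum, error)

    def should_skip(self, path: str, checksum: str) -> bool:
        """Skip only while the file content is unchanged since the failure."""
        entry = self.skipped.get(path)
        if entry is None:
            return False
        if entry[0] != checksum:
            # The user edited the file, so drop the entry and retry it.
            del self.skipped[path]
            return False
        return True

    def status(self) -> List[Dict[str, str]]:
        """Entries suitable for a sync status response."""
        return [{"path": p, "error": e} for p, (_, e) in sorted(self.skipped.items())]


cache = FailureCache()
cache.record_permanent_failure(
    "RESEARCH/_Templates/Research Library Card.md",
    "abc123",  # placeholder checksum
    "NOT NULL constraint failed: entity.title",
)
print(cache.should_skip("RESEARCH/_Templates/Research Library Card.md", "abc123"))
print(cache.status())
```

Keying the skip on the checksum means a constraint violation stops the 30-second retries, while a fix to the file's frontmatter is still picked up on the next cycle.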

Impact

High - Affects production tenants with template files, research notes, or any files with missing frontmatter. Causes:

  • Memory pressure from continuous scanning
  • Database I/O waste
  • Poor system resource utilization

Files to Modify

  • src/basic_memory/services/entity_service.py - Add title generation/validation
  • src/basic_memory/sync/sync_service.py - Enhance circuit breaker for constraint violations
  • src/basic_memory/file_utils.py - Add generate_title_from_filename() helper

Example Fix

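import logging
import re
from pathlib import Path

# Assumption: stdlib logging stands in for whatever logger the project uses
logger = logging.getLogger(__name__)

def generate_title_from_filename(file_path: Path) -> str:
    """Generate a title from the filename when frontmatter has no usable title."""
    # Drop the extension, then collapse runs of underscores/hyphens into one space
    title = re.sub(r'[_-]+', ' ', file_path.stem).strip()
    # Fall back to the raw stem if nothing is left (e.g. "___.md")
    return title or file_path.stem

# In entity creation: treat missing, null, and whitespace-only titles alike
raw_title = frontmatter_data.get('title')
if raw_title is None or not str(raw_title).strip():
    title = generate_title_from_filename(file_path)
    logger.warning(
        f"Missing title in {file_path}, auto-generated: {title}"
    )
    frontmatter_data['title'] = title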

Priority

P1 - Affects production stability and resource usage

Related Issues

Labels

bug (Something isn't working)