Observation parsing too broad: HTML color codes interpreted as hashtags + permalink too long #446

@phernandez

Problem

Two related issues causing indexing failures in cloud:

1. HTML color codes interpreted as hashtags

In src/basic_memory/markdown/plugins.py:33-34:

has_tags = "#" in content
return bool(match) or has_tags

This is too broad: any "#" anywhere in the content counts as a tag. Consider content like:

- **<font color="#4285F4">Jane:</font>** Welcome to the deep dive...

The #4285F4 is interpreted as a hashtag, so basic-memory treats this as an observation when it's just a regular list item.

2. Observation permalinks can exceed btree index limit

In src/basic_memory/models/knowledge.py:166-167:

return generate_permalink(
    f"{self.entity.permalink}/observations/{self.category}/{self.content}"
)

The full observation content is passed to generate_permalink with no truncation. When observations are long paragraphs (like transcript dialogue), permalinks can be 5000+ bytes, exceeding PostgreSQL's btree index limit of 2704 bytes.
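A rough sketch of the size problem. `fake_permalink` is a hypothetical stand-in for `generate_permalink` (its real normalization is not shown in this issue) that just lowercases and hyphenates; the entity permalink, category, and content are made up for illustration.

```python
import re

BTREE_MAX_BYTES = 2704  # limit reported in the PostgreSQL error message


def fake_permalink(path: str) -> str:
    """Hypothetical stand-in: lowercase, turn runs of other chars into '-'."""
    return re.sub(r"[^a-z0-9/]+", "-", path.lower()).strip("-")


# A long, transcript-style observation (several thousand characters).
content = "Welcome to the deep dive on indexing. " * 150

permalink = fake_permalink(f"transcripts/episode-1/observations/note/{content}")
size = len(permalink.encode("utf-8"))

# The untruncated permalink alone is already larger than the btree limit.
print(size, size > BTREE_MAX_BYTES)
```

The index row also carries the other indexed columns, so the real row is somewhat larger than the permalink itself.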

Error

asyncpg.exceptions.ProgramLimitExceededError: index row size 5528 exceeds btree version 4 maximum 2704 for index "uix_search_index_permalink_project"

Suggested Fixes

Fix 1: More specific hashtag detection

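# Instead of: has_tags = "#" in content
# Match a hashtag only at the start of the content or after whitespace,
# so the "#" inside color="#4285F4" (preceded by a quote) is not a tag
import re
has_tags = bool(re.search(r'(?:^|\s)#\w+', content))
# Note: a lookbehind on hex chars such as (?<![0-9a-fA-F])# is not enough,
# because the character before "#4285F4" is a double quote, not a hex digit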
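A quick check of two candidate patterns against the sample line from this issue, as a sketch. The whitespace-boundary pattern and the helper name `has_tags` are assumptions, not basic-memory code, and the tagged line is made up for illustration.

```python
import re

# Pattern using a lookbehind on hex characters.
LOOKBEHIND = re.compile(r"(?<![0-9a-fA-F])#\w+")
# Assumed stricter alternative: "#" must start the string or follow whitespace.
BOUNDARY = re.compile(r"(?:^|\s)#\w+")

html_line = '**<font color="#4285F4">Jane:</font>** Welcome to the deep dive...'
tagged_line = "[idea] Use a stricter tag regex #parser #bug"


def has_tags(pattern: re.Pattern, content: str) -> bool:
    """Return True if the pattern finds at least one hashtag-like token."""
    return bool(pattern.search(content))


for name, pattern in (("lookbehind", LOOKBEHIND), ("boundary", BOUNDARY)):
    print(name, has_tags(pattern, html_line), has_tags(pattern, tagged_line))
```

Here the lookbehind pattern still reports a tag for the HTML line, while the boundary pattern does not; both accept the genuine tags.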

Fix 2: Truncate observation content in permalinks

# Truncate content portion to ~200 chars before generating permalink
content_for_permalink = self.content[:200]  # slicing is a no-op for shorter content
return generate_permalink(
    f"{self.entity.permalink}/observations/{self.category}/{content_for_permalink}"
)
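One caveat with plain truncation: two observations in the same category that share their first 200 characters would get the same permalink, and the index named in the error looks unique (judging only by its `uix_` prefix, which is an inference). A hedged sketch that appends a short content hash so truncated permalinks stay distinct; `truncate_for_permalink` and its parameters are hypothetical, not part of basic-memory.

```python
import hashlib


def truncate_for_permalink(content: str, max_chars: int = 200, digest_chars: int = 8) -> str:
    """Hypothetical helper: cap content length, adding a hash suffix when truncated."""
    if len(content) <= max_chars:
        return content  # short content is passed through unchanged
    # Hash the full content so different long observations get different suffixes.
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:digest_chars]
    return f"{content[:max_chars]}-{digest}"


shared_prefix = "x" * 250
first = truncate_for_permalink(shared_prefix + " first ending")
second = truncate_for_permalink(shared_prefix + " second ending")
print(len(first), first != second)
```

At up to 4 bytes per character in UTF-8, 200 characters is at most 800 bytes, which leaves headroom under 2704 for the entity permalink and category, assuming those stay short.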

Context

Discovered during a cloud tenant migration for a user whose transcript files contain HTML-formatted dialogue. Each dialogue line was parsed as an observation, and the long observation content produced permalinks that exceeded the index limit.


Labels: bug (Something isn't working)
