Problem
Two related issues causing indexing failures in cloud:
1. HTML color codes interpreted as hashtags
In src/basic_memory/markdown/plugins.py:33-34:
has_tags = "#" in content
return bool(match) or has_tags
This is too broad. Content like:
- **<font color="#4285F4">Jane:</font>** Welcome to the deep dive...
The #4285F4 is interpreted as a hashtag, so basic-memory treats this as an observation when it's just a regular list item.
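A minimal sketch of why the substring check fires on this input. It mirrors only the `"#" in content` part of the check; the `match` branch is left out, and `naive_has_tags` is a hypothetical name used for illustration, not a function in basic-memory.

```python
def naive_has_tags(content: str) -> bool:
    """Mirror only the substring check; the regex `match` branch is omitted."""
    return "#" in content


# The list item from the transcript: the only "#" is inside an HTML attribute.
html_item = '**<font color="#4285F4">Jane:</font>** Welcome to the deep dive...'
# A list item that really does carry a hashtag.
tagged_item = "Remember to follow up #todo"
# A list item with no "#" at all.
plain_item = "Welcome to the deep dive..."

for item in (html_item, tagged_item, plain_item):
    print(naive_has_tags(item), "|", item)
```

The HTML item and the genuinely tagged item are indistinguishable to this check, which is the false positive described above.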
2. Observation permalinks can exceed btree index limit
In src/basic_memory/models/knowledge.py:166-167:
return generate_permalink(
f"{self.entity.permalink}/observations/{self.category}/{self.content}"
)
The full observation content is passed to generate_permalink with no truncation. When observations are long paragraphs (like transcript dialogue), permalinks can be 5000+ bytes, exceeding PostgreSQL's btree index limit of 2704 bytes.
Error
asyncpg.exceptions.ProgramLimitExceededError: index row size 5528 exceeds btree version 4 maximum 2704 for index "uix_search_index_permalink_project"
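As a rough illustration of the size problem, the sketch below measures the UTF-8 byte length of the raw string handed to `generate_permalink`. The entity permalink, category, and content are made-up values, and `generate_permalink` itself is not called, so any normalization it applies is not reflected here.

```python
BTREE_MAX_BYTES = 2704  # limit reported in the error message


def raw_permalink_bytes(entity_permalink: str, category: str, content: str) -> int:
    """Byte length of the untruncated string passed to generate_permalink."""
    raw = f"{entity_permalink}/observations/{category}/{content}"
    return len(raw.encode("utf-8"))


# Hypothetical long paragraph standing in for one line of transcript dialogue.
long_content = "Welcome to the deep dive. " * 220

size = raw_permalink_bytes("transcripts/episode-1", "note", long_content)
print(size, "bytes; exceeds limit:", size > BTREE_MAX_BYTES)
```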
Suggested Fixes
Fix 1: More specific hashtag detection
# Instead of: has_tags = "#" in content
# Only count "#word" at the start of the text or after whitespace.
# A lookbehind on hex characters alone is not enough: in color="#4285F4"
# the "#" is preceded by a double quote, so it would still match.
import re
has_tags = bool(re.search(r'(?:^|\s)#\w+', content))
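A quick way to sanity-check a candidate pattern against the failing input. Both patterns below are illustrations rather than the project's actual tag grammar: `HEX_LOOKBEHIND` only excludes a preceding hex character, while `WHITESPACE_BOUNDARY` assumes a hashtag must sit at the start of the text or follow whitespace.

```python
import re

# Excludes "#" only when the previous character is a hex digit.
HEX_LOOKBEHIND = re.compile(r"(?<![0-9a-fA-F])#\w+")
# Assumed rule: "#word" counts only at the start or after whitespace.
WHITESPACE_BOUNDARY = re.compile(r"(?:^|\s)#\w+")

html_item = '- **<font color="#4285F4">Jane:</font>** Welcome to the deep dive...'
tagged_item = "- Remember to follow up #todo"
leading_tag = "#idea worth revisiting"

for pattern in (HEX_LOOKBEHIND, WHITESPACE_BOUNDARY):
    results = [bool(pattern.search(s)) for s in (html_item, tagged_item, leading_tag)]
    print(pattern.pattern, results)
```

In the HTML item the `#` follows a double quote, so the hex-only lookbehind still matches it; requiring a whitespace boundary rejects the color code while keeping both real hashtags.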
Fix 2: Truncate observation content in permalinks
# Truncate the content portion to 200 characters before generating the permalink
# (slicing is a no-op when the content is already shorter)
content_for_permalink = self.content[:200]
return generate_permalink(
f"{self.entity.permalink}/observations/{self.category}/{content_for_permalink}"
)
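A small check that a 200-character cap keeps the raw permalink input under the limit even when every character takes 4 bytes in UTF-8. The entity permalink and category are hypothetical, `truncated_permalink_input` is an illustrative helper, and the sketch assumes the entity permalink itself is short.

```python
MAX_CONTENT_CHARS = 200  # cap suggested in Fix 2
BTREE_MAX_BYTES = 2704   # limit reported in the error message


def truncated_permalink_input(entity_permalink: str, category: str, content: str) -> str:
    """Build the string that would be passed to generate_permalink after truncation."""
    return f"{entity_permalink}/observations/{category}/{content[:MAX_CONTENT_CHARS]}"


# Worst case for UTF-8: an emoji encodes to 4 bytes per character.
worst_case = "\U0001F600" * 5000

raw = truncated_permalink_input("transcripts/episode-1", "note", worst_case)
size = len(raw.encode("utf-8"))
print(size, "bytes; within limit:", size < BTREE_MAX_BYTES)
```

One caveat to weigh: if the index is unique, as the `uix_` prefix suggests, two observations on the same entity and category that share their first 200 characters would produce the same permalink after truncation.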
Context
Discovered during a cloud tenant migration for a user whose transcript files contain HTML-formatted dialogue. Each dialogue line was parsed as an observation, and the long observation content produced permalinks that exceeded the index limit.