fix(sync): preserve canonical markdown in single-file sync by phernandez · Pull Request #746 · basicmachines-co/basic-memory

phernandez · 2026-04-15T23:59:35Z

Summary

add a one-file markdown sync path that returns the canonical persisted markdown, checksum, and entity state
reconcile permalink conflicts after markdown upserts so frontmatter is updated to the accepted permalink
cover the single-file sync and batch indexer behavior with focused regression tests

Testing

user ran tests locally

Signed-off-by: phernandez <paul@basicmachines.co>

github-actions · 2026-04-16T00:00:13Z

Claude finished @phernandez's task in 2m 39s —— View job

PR Review: fix(sync): preserve canonical markdown in single-file sync

Gather context and understand the request
Read changed files
Review against team checklist
Post full review findings

Summary

This PR adds a dedicated single-file markdown sync primitive (sync_one_markdown_file) that returns the canonical, post-rewrite content/checksum/entity state, and adds a permalink reconciliation step after upsert to handle late DB-level conflicts. The design is clean and the test coverage targets the right regression scenarios.

Code Quality & Standards

Item	Status	Notes
Coding conventions	✅	async patterns, dataclasses, type annotations throughout
Python 3.12+ types	✅
SQLAlchemy 2.0	✅
100-char line length	✅
No commented-out code	✅
Literate comments	✅	Trigger/Why/Outcome pattern used consistently

Testing & Documentation

Item	Status	Notes
Unit tests for new methods	✅	`test_sync_one_markdown_file.py`, `test_batch_indexer.py` additions
Edge cases covered	✅	No-rewrite path, permalink conflict, checksum source
100% coverage intent	⚠️	See notes below
Integration tests	✅	Tests use real services

Architecture

Item	Status	Notes
Protocol-based `IndexFileWriter`	✅	Clean abstraction boundary
Adapter `_FileServiceIndexWriter`	✅	Prevents indexing layer from importing FileService
`SyncedMarkdownFile` placement	✅	Lives in `indexing/models.py` with `TYPE_CHECKING` guard for Entity
Knowledge graph consistency	✅	Relations resolved before returning
Markdown file integrity	✅	Frontmatter reconciled post-upsert

Issues Found

1. Redundant `entity_repository.update()` in `sync_one_markdown_file` — missing comment

src/basic_memory/sync/sync_service.py L1099–1114

batch_indexer.index_markdown_file() calls _persist_markdown_file() which already calls entity_repository.update() with metadata from the initial IndexInputFile. Then sync_one_markdown_file() immediately calls entity_repository.update() again with freshly re-read file metadata. This second write is intentional — it captures the post-rewrite mtime/size — but there's no comment explaining why. Without it, the pattern looks like an accidental double-write.

# Sync one markdown file, apply metadata from rewritten file on disk
updated_entity = await self.entity_repository.update(
    refreshed_entities[0].id,
    {
        "checksum": indexed.checksum,
        "created_at": file_metadata.created_at,
        ...
    },
)

Suggest adding a Trigger/Why/Outcome comment: the initial metadata came from the pre-rewrite file; this second update captures mtime/size from the file as it exists after any frontmatter rewrite.

Fix this →

2. `_reconcile_persisted_permalink` mixes mutation and value semantics

src/basic_memory/indexing/batch_indexer.py L560–591

The method mutates prepared.markdown.frontmatter.metadata["permalink"] in-place (L578) while also returning a new _PreparedMarkdownFile wrapper. Since the markdown field is shared by reference, callers holding a reference to the original prepared will also see the mutation. The same pattern appears in _normalize_markdown_file (L319). This is harmless in the current call graph but is a subtle API footgun — either mutate in-place exclusively, or always return a fully new object. A comment noting the intentional mutation would help.

3. `_build_prepared_entity` can pass frontmatter YAML as `search_content`

src/basic_memory/indexing/batch_indexer.py L604–609

search_content=(
    prepared.markdown.content
    if prepared.markdown.content is not None
    else prepared.content  # full file including frontmatter
),

When prepared.markdown.content is None (body-less file), prepared.content (the full markdown with frontmatter YAML) is sent to the search index. This is pre-existing, but the new _build_prepared_entity helper makes it more prominent. Consider using remove_frontmatter(prepared.content) as the fallback to avoid indexing YAML metadata. This would match the explicit remove_frontmatter(final_markdown_content) used in sync_one_markdown_file (L1119).

Fix this →

4. Minor: `pragma: no cover` for the outer exception in `update_frontmatter_with_result`

src/basic_memory/services/file_service.py L507–515

The outer except Exception as e: # pragma: no cover is correct (hard to trigger outside of platform I/O failures). No action needed, just noting it is appropriately guarded.

5. Test gap: `index_search=True` path in `test_sync_one_markdown_file_writes_missing_frontmatter_and_returns_canonical_content`

The test patches index_entity_data and asserts it is called with content=remove_frontmatter(final_content). This is good. The only missing assertion is that result.content_type equals "text/markdown". Minor, not blocking.

Checklist Results

Code Quality & Standards

Follows Basic Memory coding conventions
Python 3.12+ type annotations and async patterns
SQLAlchemy 2.0 best practices
FastAPI/Typer conventions followed (N/A here)
100-character line length maintained
No commented-out code

Testing & Documentation

Unit tests for new functions/methods
Integration tests (real services, no mocks)
Edge cases covered
[⚠️] 100% coverage — update_frontmatter_with_result outer except is pragma: no cover; defensible
Documentation updated — no README/docstring changes needed for this fix

Basic Memory Architecture

Atomic, composable MCP design maintained
No database schema changes (no migration needed)
Local-first architecture preserved
Knowledge graph consistency maintained
Markdown file integrity preserved

Security & Performance

No hardcoded secrets
Input validation present
Proper error handling
[⚠️] Double DB write per markdown sync — intentional but undocumented
No sensitive data in logs

Compatibility

Windows: Path objects used for file ops; as_posix() used in scan paths
No emoji/unicode in log output

Verdict

The core fix is sound and the test coverage is focused and well-targeted. Issues #1 and #3 above are the most worth addressing: the missing comment on the double-update makes the code harder to reason about, and the frontmatter fallback in search_content is a latent correctness issue. Issue #2 is low risk given the current call graph but worth a comment.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ecaf247168

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-16T00:05:24Z

-            content=file_content,
-            mtime=file_metadata.modified_at.timestamp(),
-            ctime=file_metadata.created_at.timestamp(),
+        initial_checksum = await compute_checksum(initial_markdown_content)


Compute single-file checksum from file bytes

Using compute_checksum(initial_markdown_content) hashes the decoded text, not the actual file bytes. For markdown files with CRLF (or any encoding/newline normalization), this produces a different checksum than FileService.compute_checksum(path), so sync_one_markdown_file can persist a checksum that does not match disk when no frontmatter rewrite occurs. That breaks checksum-based move/modify detection and can cause repeated unnecessary sync work.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-16T00:05:24Z

+        if (
+            self.app_config.disable_permalinks
+            or entity.permalink is None
+            or entity.permalink == prepared.markdown.frontmatter.permalink
+        ):


Skip permalink rewrite when frontmatter enforcement is off

This reconciliation branch rewrites frontmatter whenever the persisted entity permalink differs, but it does not check whether the source file had frontmatter. If ensure_frontmatter_on_sync is False, syncing an existing frontmatterless note still triggers write_frontmatter because the DB entity has a permalink and parsed frontmatter permalink is None, which unexpectedly mutates files and violates the no-frontmatter-enforcement behavior.

Useful? React with 👍 / 👎.

Signed-off-by: phernandez <paul@basicmachines.co>

phernandez · 2026-04-16T00:07:43Z

Addressed the two actionable bits from this review:

Added the Trigger/Why/Outcome comment around the post-rewrite metadata refresh in sync_one_markdown_file.
Switched the _build_prepared_entity() fallback to strip frontmatter before indexing search content, and added a regression for a frontmatter-only note.

I left the mutation/value-semantics note alone since it stays inside the prepared-file normalization path, and skipped the extra content_type assertion suggestion.

github-actions · 2026-04-16T00:07:56Z

Claude finished @phernandez's task in 3m 17s —— View job

PR Review: fix(sync): preserve canonical markdown in single-file sync

Gather context and understand the request
Read changed files
Evaluate previous review items and new Codex findings
Post full review findings

Follow-up Review

Good follow-through on both actionable items from the first pass:

Previous Issue	Status
#1 — Missing Trigger/Why/Outcome on double `entity_repository.update()`	✅ Added in `ee92812`
#3 — Frontmatter YAML in `search_content` fallback	✅ Fixed + regression test added in `ee92812`
#2 — Mutation/value semantics note	Intentionally deferred — acceptable

Codex Automated Review Findings

P1 — Checksum computed from decoded text, not raw bytes

src/basic_memory/sync/sync_service.py L1080

initial_checksum = await compute_checksum(initial_markdown_content)

compute_checksum(str) calls .encode() which normalizes CRLF → LF before hashing. FileService.compute_checksum(path) hashes raw binary chunks. On Windows with CRLF files, these produce different digests.

sync_one_markdown_file stores the text-based checksum into the DB; the watcher/scanner uses file_service.compute_checksum(path) (binary). If the two diverge, every scan iteration would flag the file as "modified" and re-sync it unnecessarily.

That said, this is pre-existing behavior inherited from the old sync_markdown_file and the project primarily targets macOS/Linux. Not blocking, but worth a note in the sync_one_markdown_file docstring or a follow-up issue.

P2 — `_reconcile_persisted_permalink` ignores `ensure_frontmatter_on_sync`

src/basic_memory/indexing/batch_indexer.py L565–591

if (
    self.app_config.disable_permalinks
    or entity.permalink is None
    or entity.permalink == prepared.markdown.frontmatter.permalink
):
    return prepared  # skip rewrite

_normalize_markdown_file correctly guards both write paths:

# batch-normalize path
if not prepared.file_contains_frontmatter and self.app_config.ensure_frontmatter_on_sync:
    ...
elif prepared.file_contains_frontmatter and not self.app_config.disable_permalinks:
    ...

But _reconcile_persisted_permalink only checks disable_permalinks. When ensure_frontmatter_on_sync=False and a file has no frontmatter:

_normalize_markdown_file correctly leaves the file unchanged.
The DB entity receives a generated permalink.
_reconcile_persisted_permalink sees entity.permalink != None and prepared.markdown.frontmatter.permalink == None → condition is False → proceeds to call write_frontmatter.
Frontmatter is injected into a file the user configured to be frontmatter-free.

This is a bug introduced by this PR (new code path). The fix is one extra guard:

if (
    self.app_config.disable_permalinks
    or (not prepared.file_contains_frontmatter and not self.app_config.ensure_frontmatter_on_sync)
    or entity.permalink is None
    or entity.permalink == prepared.markdown.frontmatter.permalink
):
    return prepared

Fix this →

Checklist Results

Code Quality & Standards

Coding conventions (async, dataclasses, type annotations)
Python 3.12+ types
SQLAlchemy 2.0
100-char line length
No commented-out code
Literate Trigger/Why/Outcome comments

Testing & Documentation

Unit tests for new functions
Integration tests use real services
Edge cases covered
Regression test for frontmatter-only search content

Architecture

IndexFileWriter protocol boundary clean
SyncedMarkdownFile in indexing/models.py
Knowledge graph consistency maintained

Security & Performance

No hardcoded secrets
Proper error handling
Intentional double DB write is now documented

Compatibility

Path objects used for file ops
[⚠️] Text-based checksum diverges from binary on Windows/CRLF (P1 — pre-existing, non-blocking)
[⚠️] _reconcile_persisted_permalink ignores ensure_frontmatter_on_sync (P2 — blocking)

Verdict

The two requested fixes from the first review are clean. The P2 Codex finding is a real correctness bug in new code: _reconcile_persisted_permalink can inject frontmatter into files that the user configured to be frontmatter-free when ensure_frontmatter_on_sync=False. That one-line guard is worth adding before merge. P1 is pre-existing and can be tracked as a follow-up.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ee9281209f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-16T00:14:07Z

+            await self.search_service.index_entity_data(
+                updated_entity,
+                content=remove_frontmatter(final_markdown_content),
+            )


Avoid strict frontmatter stripping in one-file search indexing

sync_one_markdown_file(..., index_search=True) now builds search content with remove_frontmatter(final_markdown_content), but remove_frontmatter raises ParseError when text starts with --- and has no closing delimiter. That can happen for valid markdown that begins with a thematic break (or malformed frontmatter when enforcement is disabled), so this path now fails hard after the entity upsert instead of completing sync. Using parser-derived body content (or handling this parse error) would keep one-file sync robust for these inputs.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-16T00:14:07Z

+        updated = await self.entity_repository.update(
+            entity.id,
+            self._entity_metadata_updates(prepared.file, prepared.final_checksum),
+        )


Refresh metadata after permalink reconciliation writes

When _reconcile_persisted_permalink rewrites frontmatter, the on-disk file mtime/size can change, but _persist_markdown_file still calls _entity_metadata_updates(prepared.file, ...) using the original IndexInputFile metadata from before that write. In the index_files batch path this leaves stale updated_at/mtime/size in the entity row, which causes future scans to repeatedly treat the file as metadata-changed and re-hash it unnecessarily (and can skew recency ordering).

Useful? React with 👍 / 👎.

Signed-off-by: phernandez <paul@basicmachines.co>

github-actions · 2026-04-16T00:34:27Z

Claude finished @phernandez's task in 2m 9s —— View job

PR Review: fix(sync): preserve canonical markdown in single-file sync

Gather context and understand the request
Read changed files and evaluate new commits
Assess open issues from previous reviews
Post full review findings

Follow-up Review (commit `8ef292d`)

Good progress on the Codex P1 finding — 8ef292d now passes raw bytes to compute_checksum so the checksum in the DB matches what the watcher/scanner sees on disk.

Previous Issue	Status
P1 — Checksum from decoded text, not raw bytes	✅ Fixed in `8ef292d` — `compute_checksum(initial_markdown_bytes)`
P2 — `_reconcile_persisted_permalink` ignores `ensure_frontmatter_on_sync`	❌ Still open
Codex P2 — `remove_frontmatter` raises `ParseError` for thematic-break files	❌ Still open
Codex P2 — Stale mtime/size after `_reconcile_persisted_permalink` rewrites	❌ Still open

Remaining Open Issues

1. `_reconcile_persisted_permalink` ignores `ensure_frontmatter_on_sync` — blocking

src/basic_memory/indexing/batch_indexer.py L565–570

if (
    self.app_config.disable_permalinks
    or entity.permalink is None
    or entity.permalink == prepared.markdown.frontmatter.permalink
):
    return prepared   # skip rewrite

When ensure_frontmatter_on_sync=False and a file has no frontmatter block, _normalize_markdown_file correctly leaves the file untouched — but the upsert still generates a permalink for the DB entity. _reconcile_persisted_permalink then sees entity.permalink != None and prepared.markdown.frontmatter.permalink == None, evaluates the condition as False, and calls write_frontmatter, silently injecting frontmatter that the user opted out of.

The one-line fix (same as flagged last review):

if (
    self.app_config.disable_permalinks
    or (not prepared.file_contains_frontmatter and not self.app_config.ensure_frontmatter_on_sync)
    or entity.permalink is None
    or entity.permalink == prepared.markdown.frontmatter.permalink
):
    return prepared

Fix this →

2. `remove_frontmatter` raises `ParseError` for thematic-break files — blocking

src/basic_memory/sync/sync_service.py L1131

await self.search_service.index_entity_data(
    updated_entity,
    content=remove_frontmatter(final_markdown_content),
)

remove_frontmatter raises ParseError when content starts with --- but has no closing --- delimiter (file_utils.py L392–393). Valid markdown files can start with a thematic break (---), and malformed frontmatter that slipped through with ensure_frontmatter_on_sync=False would also trigger this. The result is a hard failure after the entity upsert and DB write — leaving the entity in the DB without a search index entry.

The fix is to catch ParseError from remove_frontmatter and fall back to the raw content, or use the parsed body from the markdown model instead:

try:
    search_body = remove_frontmatter(final_markdown_content)
except ParseError:
    search_body = final_markdown_content
await self.search_service.index_entity_data(updated_entity, content=search_body)

Alternatively, use indexed.search_content if it's accessible on the _PreparedMarkdownFile result, which already went through the body extraction path.

Fix this →

3. Stale mtime/size after `_reconcile_persisted_permalink` rewrites — low priority

src/basic_memory/indexing/batch_indexer.py L552–555

prepared = await self._reconcile_persisted_permalink(prepared, entity)
updated = await self.entity_repository.update(
    entity.id,
    self._entity_metadata_updates(prepared.file, prepared.final_checksum),
)

When _reconcile_persisted_permalink rewrites frontmatter, the file's mtime/size on disk change. But _entity_metadata_updates(prepared.file, ...) uses prepared.file.size and prepared.file.last_modified from the original IndexInputFile — the pre-rewrite values. This means follow-up scans see a DB mtime that doesn't match the filesystem and re-hash the file on every subsequent scan until the next full sync overwrites it.

This is lower priority than #1 and #2 (it doesn't corrupt data, just causes extra work), but worth tracking. The sync_one_markdown_file path already solves this correctly by re-reading metadata after indexing (L1108). The batch path would need a similar refresh, or the _reconcile_persisted_permalink return value could carry the new mtime/size from the write result.

Checklist

Code Quality & Standards

Coding conventions (async, dataclasses, type annotations)
Python 3.12+ types
SQLAlchemy 2.0
100-char line length
No commented-out code
Literate Trigger/Why/Outcome comments

Testing & Documentation

Unit tests for new functions
Integration tests use real services
Edge cases covered
Regression test for frontmatter-only search content (ee92812)
[⚠️] No test for ensure_frontmatter_on_sync=False + permalink reconciliation path (issue Pre release #1)
[⚠️] No test for thematic-break --- file through index_search=True (issue Memory json import #2)

Architecture

IndexFileWriter protocol boundary clean
SyncedMarkdownFile in indexing/models.py
Knowledge graph consistency maintained
Intentional double DB write documented

Security & Performance

No hardcoded secrets
Proper error handling
Checksum now computed from raw bytes (P1 fixed)

Compatibility

Path objects used for file ops
Text-based vs binary checksum divergence resolved

Verdict

Issues #1 and #2 are both correctness bugs in new code and worth fixing before merge. #1 (frontmatter injection when enforcement is off) is a one-line guard. #2 (ParseError on thematic-break files in the search indexing path) needs a try/except. #3 is real but low priority and can ship as a known follow-up. Overall the PR is in good shape — these are targeted, fixable gaps.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8ef292de29

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-16T00:36:17Z

+        if existing_permalink_by_path is None:
+            existing_permalink_by_path = {
+                path: permalink
+                for path, permalink in (
+                    await self.entity_repository.get_file_path_to_permalink_map()


Avoid full permalink scans in one-file sync

index_markdown_file rebuilds existing_permalink_by_path from get_file_path_to_permalink_map() whenever the caller does not pass a map, and sync_one_markdown_file invokes this path for each markdown file sync. In incremental/watch workloads (which call sync_file per changed file), this adds a full-entity-table scan to every single-file update, creating an O(N) DB read per event and causing avoidable slowdowns on larger projects. A single-file fast path (or a shared cached map across calls) would prevent this regression.

Useful? React with 👍 / 👎.

Signed-off-by: phernandez <paul@basicmachines.co>

phernandez · 2026-04-16T00:45:00Z

Addressed the two remaining correctness issues from the latest follow-up review:

Added the _reconcile_persisted_permalink() guard so frontmatterless files stay untouched when ensure_frontmatter_on_sync=False, including the existing-entity-with-permalink case.
Made sync_one_markdown_file() fall back to raw markdown content when strict frontmatter stripping rejects leading --- text, and added a regression for thematic-break content.

Also tightened the no-rewrite assertions to compare against the exact on-disk text so the Windows newline cases stop flapping. I left the stale batch mtime/size note alone for now since it looks like extra work rather than bad data.

github-actions · 2026-04-16T00:45:19Z

Claude finished @phernandez's task in 1m 45s —— View job

PR Review: fix(sync): preserve canonical markdown in single-file sync

Gather context and understand the request
Read changed files (batch_indexer.py, sync_service.py, tests)
Evaluate fixes for previously identified issues
Check new Codex finding (permalink scan performance)
Post full review findings

Follow-up Review (commit `1bb42df`)

Both blocking issues from the previous review have been addressed cleanly.

Previous Issue	Status
P2 — `_reconcile_persisted_permalink` ignores `ensure_frontmatter_on_sync`	✅ Fixed — `batch_indexer.py` L569–577
P2 — `remove_frontmatter` raises `ParseError` for thematic-break files	✅ Fixed — `sync_service.py` L1128–1140
Stale mtime/size after batch `_reconcile_persisted_permalink` rewrite	↗️ Deferred — acknowledged as non-critical
No-rewrite assertions flapping on Windows CRLF	✅ Fixed — assertions now compare against `read_bytes().decode()`

Fix Quality

_reconcile_persisted_permalink guard (batch_indexer.py L565–577)

if (
    self.app_config.disable_permalinks
    or (
        not prepared.file_contains_frontmatter
        and not self.app_config.ensure_frontmatter_on_sync
    )
    or entity.permalink is None
    or entity.permalink == prepared.markdown.frontmatter.permalink
):
    return prepared

The guard is correct and the Trigger/Why/Outcome comment explains the intent. Test test_batch_indexer_does_not_inject_frontmatter_when_sync_enforcement_is_disabled covers the key path: an existing entity with a DB permalink whose file has no frontmatter and enforcement is off — no write occurs.

ParseError fallback in sync_one_markdown_file (sync_service.py L1128–1140)

try:
    search_content = remove_frontmatter(final_markdown_content)
except ParseError:
    search_content = final_markdown_content

Clean and well-commented. Test test_sync_one_markdown_file_indexes_thematic_break_content_without_frontmatter covers the ---\nBody content... case end-to-end, asserting both that the file is not mutated and that the raw content is passed to the search index.

New Codex Finding: O(N) DB Scan Per One-File Sync

src/basic_memory/indexing/batch_indexer.py L193–200

if existing_permalink_by_path is None:
    existing_permalink_by_path = {
        path: permalink
        for path, permalink in (
            await self.entity_repository.get_file_path_to_permalink_map()
        ).items()
    }

sync_one_markdown_file → index_markdown_file always hits this branch (no map is passed), triggering a full entity table scan on every single-file watch event. For a project with thousands of files, each watcher event incurs an O(N) query.

This is a real performance concern for watch workloads. _index_changed_files already caches this map once per sync pass (L539–544), but the one-file path has no equivalent optimization.

Severity: Performance regression, not a correctness bug. The watch/single-file path was also present before this PR — this PR makes it more prominent by funneling through index_markdown_file. Worth a follow-up issue but not blocking.

Track this as follow-up →

Checklist

Code Quality & Standards

Coding conventions (async, dataclasses, type annotations)
Python 3.12+ types
SQLAlchemy 2.0
100-char line length
No commented-out code
Literate Trigger/Why/Outcome comments

Testing & Documentation

Unit tests for new functions
Integration tests use real services
Edge cases covered (ensure_frontmatter_on_sync=False, thematic-break content)
No-rewrite assertions hardened against Windows CRLF
Regression test for frontmatter-only search content

Architecture

IndexFileWriter protocol boundary clean
SyncedMarkdownFile in indexing/models.py
Knowledge graph consistency maintained
Intentional double DB write documented

Security & Performance

No hardcoded secrets
Proper error handling
Checksum computed from raw bytes (P1 fixed in prior pass)
[⚠️] O(N) scan per one-file sync event — performance, not correctness; follow-up issue recommended

Compatibility

Path objects used for file ops
Binary read_bytes().decode() for Windows CRLF robustness

Verdict

Both blocking correctness bugs are fixed and tested. The PR is clean — ready to merge. The O(N) permalink scan in the one-file path is worth a follow-up issue but does not block this fix landing.

fix(sync): preserve canonical markdown in single-file sync

ecaf247

Signed-off-by: phernandez <paul@basicmachines.co>

chatgpt-codex-connector bot reviewed Apr 16, 2026

View reviewed changes

fix(sync): strip frontmatter-only search content

ee92812

Signed-off-by: phernandez <paul@basicmachines.co>

chatgpt-codex-connector bot reviewed Apr 16, 2026

View reviewed changes

fix(sync): hash one-file markdown from raw bytes

8ef292d

Signed-off-by: phernandez <paul@basicmachines.co>

chatgpt-codex-connector bot reviewed Apr 16, 2026

View reviewed changes

fix(sync): preserve frontmatterless markdown sync

1bb42df

Signed-off-by: phernandez <paul@basicmachines.co>

phernandez merged commit b3d5448 into main Apr 16, 2026
39 of 41 checks passed

phernandez deleted the more-write-fixes branch April 16, 2026 01:04

Conversation

phernandez commented Apr 15, 2026

Summary

Testing

Uh oh!

github-actions bot commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: fix(sync): preserve canonical markdown in single-file sync

Summary

Code Quality & Standards

Testing & Documentation

Architecture

Issues Found

1. Redundant entity_repository.update() in sync_one_markdown_file — missing comment

2. _reconcile_persisted_permalink mixes mutation and value semantics

3. _build_prepared_entity can pass frontmatter YAML as search_content

4. Minor: pragma: no cover for the outer exception in update_frontmatter_with_result

5. Test gap: index_search=True path in test_sync_one_markdown_file_writes_missing_frontmatter_and_returns_canonical_content

Checklist Results

Verdict

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

phernandez commented Apr 16, 2026

Uh oh!

github-actions bot commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: fix(sync): preserve canonical markdown in single-file sync

Follow-up Review

Codex Automated Review Findings

P1 — Checksum computed from decoded text, not raw bytes

P2 — _reconcile_persisted_permalink ignores ensure_frontmatter_on_sync

Checklist Results

Verdict

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: fix(sync): preserve canonical markdown in single-file sync

Follow-up Review (commit 8ef292d)

Remaining Open Issues

1. _reconcile_persisted_permalink ignores ensure_frontmatter_on_sync — blocking

2. remove_frontmatter raises ParseError for thematic-break files — blocking

3. Stale mtime/size after _reconcile_persisted_permalink rewrites — low priority

Checklist

Verdict

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

phernandez commented Apr 16, 2026

Uh oh!

github-actions bot commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: fix(sync): preserve canonical markdown in single-file sync

Follow-up Review (commit 1bb42df)

Fix Quality

New Codex Finding: O(N) DB Scan Per One-File Sync

Checklist

github-actions bot commented Apr 16, 2026 •

edited

Loading

1. Redundant `entity_repository.update()` in `sync_one_markdown_file` — missing comment

2. `_reconcile_persisted_permalink` mixes mutation and value semantics

3. `_build_prepared_entity` can pass frontmatter YAML as `search_content`

4. Minor: `pragma: no cover` for the outer exception in `update_frontmatter_with_result`

5. Test gap: `index_search=True` path in `test_sync_one_markdown_file_writes_missing_frontmatter_and_returns_canonical_content`

github-actions bot commented Apr 16, 2026 •

edited

Loading

P2 — `_reconcile_persisted_permalink` ignores `ensure_frontmatter_on_sync`

github-actions bot commented Apr 16, 2026 •

edited

Loading

Follow-up Review (commit `8ef292d`)

1. `_reconcile_persisted_permalink` ignores `ensure_frontmatter_on_sync` — blocking

2. `remove_frontmatter` raises `ParseError` for thematic-break files — blocking

3. Stale mtime/size after `_reconcile_persisted_permalink` rewrites — low priority

github-actions bot commented Apr 16, 2026 •

edited

Loading

Follow-up Review (commit `1bb42df`)