Post-cutover: migrate body-heavy entities to gitsheets v1.2 content-typed (markdown) records #44

@themightychris

gitsheets v1.2.0 added content-typed records: a sheet opts in with format.type = 'markdown' to store its records as .md files with TOML frontmatter and a designated body field. The release also added lazy body loading via query({ withBody: false }).

This is the biggest one-time upgrade we'd take from the gitsheets 1.x line. Not urgent — defer to after cutover-prep ships so we're not refactoring entities mid-migration.

Why migrate

  • Snapshot is actually-readable. Contributors cloning codeforphilly-data-snapshot see real .md files in any markdown viewer — instead of parsing TOML records to find the prose.
  • Authoring via PR. Staff / maintainers can edit a project overview in any markdown editor and PR it; currently they roundtrip through the API.
  • Listing performance. Hot paths (projects-index, activity feed, FTS seeding, snapshot scrub) can call queryAll({ withBody: false }) and skip reading bodies they don't need.
  • Indexes stay fast. Index builds use body-less reads natively in v1.2.

What changes

Entities with substantial body content:

  • Project: overview (markdown body), summary (short markdown) → migrate overview as the body field, keep summary in frontmatter
  • ProjectUpdate: body (markdown) → migrate body as the body field
  • ProjectBuzz: summary (markdown) → migrate as body
  • Person: bio (markdown) → migrate as body
  • HelpWantedRole: description (markdown) → migrate as body
  • Tag: description (markdown, short) → optional; cheaper to leave as a TOML field

The migration is bounded; entities without long bodies (ProjectMembership, SlugHistory, Revocation, TagAssignment, HelpWantedInterestExpression) stay as TOML records.
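
The mapping above could live in one place so that the schema reshape, the importer, and the migration script all agree on which field becomes the body. A minimal sketch, assuming a hypothetical BODY_FIELDS table and splitRecord helper (neither exists in this repo or in gitsheets; the names are for illustration only):

```typescript
// Entity-to-body-field mapping, taken from the list above.
// Tag is left out because its migration is marked optional.
const BODY_FIELDS = {
  Project: "overview",
  ProjectUpdate: "body",
  ProjectBuzz: "summary",
  Person: "bio",
  HelpWantedRole: "description",
} as const;

type ContentTypedEntity = keyof typeof BODY_FIELDS;

interface SplitRecord {
  frontmatter: Record<string, unknown>; // everything except the body field
  body: string; // markdown body, empty string when the field is missing
}

// Hypothetical helper: separates the designated body field from the
// remaining fields, which would end up in the TOML frontmatter.
function splitRecord(
  entity: ContentTypedEntity,
  record: Record<string, unknown>,
): SplitRecord {
  const bodyField: string = BODY_FIELDS[entity];
  const frontmatter: Record<string, unknown> = {};
  for (const key of Object.keys(record)) {
    if (key !== bodyField) frontmatter[key] = record[key];
  }
  const rawBody = record[bodyField];
  return { frontmatter, body: typeof rawBody === "string" ? rawBody : "" };
}
```

For a Project record this keeps summary in the frontmatter and moves overview into the body, matching the first bullet above.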

Tasks

  1. Schema reshape in packages/shared/src/schemas/ — one designated body field per content-typed entity (rename or restructure the existing overview / body / bio / description / summary fields).
  2. Update .gitsheets/<sheet>.toml configs with a [gitsheet.format] table that sets type = 'markdown' and body = '<fieldName>'.
  3. In-memory loader in apps/api/src/store/memory/loader.ts — use { withBody: false } for index-building reads; lazy-load via Sheet.loadBody(record) when serving record detail responses.
  4. Serializers in apps/api/src/services/serializers/: derive *Html / *Excerpt from the body field instead of the legacy string field.
  5. FTS pipeline in apps/api/src/store/fts.ts — body included in the indexed text via lazy-load batch.
  6. apps/api/scripts/import-laddr.ts — write the new markdown format for migrated entities.
  7. apps/api/scripts/scrub-data.ts — the snapshot now contains real .md files; verify the scrub still strips PII correctly across the new file shape.
  8. The data repo's existing TOML records need a one-time migration: write a one-shot apps/api/scripts/migrate-to-content-typed.ts that reads the existing records and rewrites them as .md files in the new format.
  9. Update specs/behaviors/markdown-rendering.md and specs/data-model.md to reflect content-typed entities.
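
The loader change in task 3 amounts to building the in-memory index from body-less records and fetching each body only on first use. A rough sketch of that pattern, assuming a hypothetical LazyBodyStore class and a BodyLoader callback that stands in for whatever Sheet.loadBody(record) actually looks like (the real gitsheets signatures should be checked before relying on this shape):

```typescript
// Metadata as it might come back from a body-less read.
// The field names here are illustrative, not the real schema.
interface RecordMeta {
  id: string;
  title: string;
}

// Stand-in for the library's body loader; assumed to be async.
type BodyLoader = (meta: RecordMeta) => Promise<string>;

class LazyBodyStore {
  private readonly index = new Map<string, RecordMeta>();
  private readonly bodies = new Map<string, Promise<string>>();
  private readonly loadBody: BodyLoader;

  constructor(metas: Iterable<RecordMeta>, loadBody: BodyLoader) {
    this.loadBody = loadBody;
    for (const meta of metas) this.index.set(meta.id, meta);
  }

  // Listing endpoints read only the index and never touch a body.
  list(): RecordMeta[] {
    return Array.from(this.index.values());
  }

  // Detail endpoints load the body on first request and reuse the
  // in-flight or resolved promise afterwards. Note that a rejected
  // load stays cached in this sketch.
  body(id: string): Promise<string> {
    const meta = this.index.get(id);
    if (!meta) return Promise.reject(new Error(`unknown record: ${id}`));
    let pending = this.bodies.get(id);
    if (!pending) {
      pending = this.loadBody(meta);
      this.bodies.set(id, pending);
    }
    return pending;
  }
}
```

Caching the promise rather than the resolved string means concurrent detail requests for the same record share a single load.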

Why defer

  • cutover-prep is next and depends on every other plan; this would invalidate frozen plans (storage-foundation, read-api, write-api, laddr-import, public-snapshot-scrub).
  • The benefit is real, but landing it is post-cutover work, not a pre-cutover refactor.

Out of scope

  • gitsheets check pre-commit hooks belong in the data repo, not this code repo.
  • The bundled Claude Code skill at node_modules/gitsheets/skills/gitsheets/ is available once we bump the dep range; future plans touching gitsheets can load it.
