
fix(importer): cleanup trio — http urls, tag defaults, gitsheets staleness docs#74

Merged
themightychris merged 5 commits into main from chore/importer-cleanup-trio on May 19, 2026


@themightychris
Member

Summary

Three small importer/schema fixes, bundled because each is tiny on its own and they share the spec-to-importer surface. The counts below were validated by a --dry-run importer pass against the live laddr snapshot.

  • importer: 81 of 113 project-buzz records skip on http:// URLs #56 — relax ProjectBuzz.url schema to allow any valid URL. The legacy importer was dropping 81 of 113 buzz records on http:// press links that codeforphilly.org itself still serves as plain HTTP. Fidelity wins over the marginal security value of refusing them. 32 → 112 imported (1 still legitimately skipped due to unresolved FK, not URL).
  • importer: ~120 laddr tags have no resolvable namespace #58 — default tags with no resolvable namespace to topic. ~120 laddr tags (autocomplete-create artifacts with bare-word handles) were being skipped; they now import with an audit warning. 885 → 1017 imported, matches the "~120" estimate.
  • queryAll on slug-history sheet returns empty after staff-approve merge #47 — document the gitsheets Sheet#dataTree caching limitation that bit the account-claim test. Investigation found no live production exposure (route handlers read from the typed in-memory Store, kept in lockstep by StateApply; only direct sheet.query*() reads after a write are stale). New section in specs/behaviors/storage.md; tightened JSDoc on Store.swapPublic; failing-test fallback comment now links the discussion.

The live legacy-import re-run was intentionally not run from this branch: the relaxed schema must ship to the sandbox pod first, otherwise a subsequent merge to published would fail validation at boot/reload. The dry run proved the importer-side fix; the actual write happens at the next deploy cadence.

Stacked on PR #73 — once #73 merges, this branch rebases cleanly.

Test plan

  • npm run type-check && npm run lint clean (pre-confirmed locally)
  • npm run -w packages/shared test and npm run -w apps/api test pass
  • After merge + deploy: re-run npm run -w apps/api script:import-laddr -- --branch=legacy-import against the live laddr snapshot, push, merge to published, watch the hot-reload short-circuit log
  • Project-detail pages with previously skipped buzz items render those links (e.g. anywhere a 2016-era PhillyMag piece exists)
  • Skim the defaulted-to-topic tags via /tags/topic for any obvious mis-classifications

🤖 Generated with Claude Code

themightychris and others added 5 commits May 19, 2026 16:04
Bundles #47, #56, #58 — three small fixes on the importer/schema surface
that all surfaced from the legacy-import dry run. Plan body covers each
sub-deliverable's approach, expected count deltas, and the risks for
both the http:// fidelity and topic-default decisions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The legacy importer was dropping 81 of 113 ProjectBuzz records because
their URLs are http:// — mid-2010s press links that codeforphilly.org
itself still serves as plain HTTP. Fidelity wins over the marginal
security value of refusing them; future moderation tooling will need to
flag bad-actor URLs irrespective of scheme.

Schema drops `.startsWith('https://')`; spec row updated to call out the
legacy-import policy. Schemas test now asserts http:// passes (and a
malformed URL still fails so the .url() floor is intact).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ough (closes #58)

Two importer adjustments paired with the spec changes:

splitTagHandle no longer returns null. Bare-word laddr tags (~120 of
them: org names, single-event keywords) now default to namespace=topic
with an audit warning. Operators can re-namespace them later via tooling.
The Tag spec in data-model.md already documents this policy; that edit
landed alongside the http:// schema-relaxation spec change in the prior
commit.
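A minimal sketch of the new fallback, assuming laddr handles take the form `namespace.slug` and that the importer passes in an audit-warning callback. The known-namespace set, the `SplitTag` shape, and the `warn` parameter are hypothetical, not the importer's actual signature; the point is only that a bare word now yields `topic` plus a warning instead of `null`.

```typescript
// Hypothetical sketch; the real splitTagHandle may differ in signature and rules.
// Assumption: handles look like "namespace.slug"; this namespace set is illustrative.
const KNOWN_NAMESPACES = new Set(["topic", "tech", "event"]);

interface SplitTag {
  namespace: string;
  slug: string;
}

function splitTagHandle(handle: string, warn: (msg: string) => void): SplitTag {
  const dot = handle.indexOf(".");
  // Only split when the prefix is a known namespace and a non-empty slug follows.
  if (dot > 0 && dot < handle.length - 1) {
    const prefix = handle.slice(0, dot);
    if (KNOWN_NAMESPACES.has(prefix)) {
      return { namespace: prefix, slug: handle.slice(dot + 1) };
    }
  }
  // Bare word or unrecognized prefix: default to topic (previously: return null)
  // and leave an audit trail so operators can re-namespace later.
  warn(`tag "${handle}" has no resolvable namespace; defaulting to topic`);
  return { namespace: "topic", slug: handle };
}

// Usage: collect warnings the way an importer audit log might.
const auditWarnings: string[] = [];
const logWarning = (msg: string): void => {
  auditWarnings.push(msg);
};
const namespaced = splitTagHandle("tech.python", logWarning); // { namespace: "tech", slug: "python" }
const bare = splitTagHandle("hackathon", logWarning); // { namespace: "topic", slug: "hackathon" }
```

Defaulting rather than skipping trades some classification accuracy for fidelity, which is why the warning matters: it is the hook for the later re-namespacing pass.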

For ProjectBuzz urls, swap validHttps() for a sibling validUrl() helper
that accepts http: or https: — validHttps stays in place for Project's
usersUrl / developersUrl which still require HTTPS.
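The two helpers might look roughly like this, assuming they take a raw string and return a boolean (the real signatures are not shown in this PR). The sketch leans on the WHATWG `URL` parser built into Node and browsers: malformed input throws, so the well-formedness floor is shared, and the scheme check is the only thing that differs between the helpers.

```typescript
// Hypothetical sketch; the real validUrl()/validHttps() signatures may differ.
// Uses the global WHATWG URL class (Node >= 10, all modern browsers).

function parseUrl(raw: string): URL | null {
  try {
    return new URL(raw);
  } catch {
    // new URL() throws a TypeError on malformed input.
    return null;
  }
}

// ProjectBuzz.url: any well-formed http: or https: URL (legacy press links).
function validUrl(raw: string): boolean {
  const parsed = parseUrl(raw);
  return parsed !== null && (parsed.protocol === "http:" || parsed.protocol === "https:");
}

// Project usersUrl / developersUrl: still HTTPS only.
function validHttps(raw: string): boolean {
  const parsed = parseUrl(raw);
  return parsed !== null && parsed.protocol === "https:";
}
```

Under this sketch, `validUrl("http://example.org/2016/story")` is `true` while `validHttps` rejects the same string, and `validUrl("not a url")` is `false` because parsing fails. Checking the scheme explicitly, rather than accepting anything `new URL()` parses, also keeps `javascript:` and `mailto:` values out.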

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…47)

queryAll() on slug-history returned [] after a transact even though
git ls-tree showed the file had been written. Cause: every gitsheets
Sheet caches the dataTree snapshot it was opened against and never
refreshes it. The transact path itself is fine, because repo.transact
builds a fresh workspace from HEAD on every call, and route handlers
read from the typed in-memory Store (mutated in lockstep by StateApply),
so production exposure today is zero. The problem only bites direct
sheet.query*() reads after a write, which currently affects only tests
that need to verify writes to sheets we don't load into the typed Store
(slug-history, revocations).

New storage.md section explains the limitation and the in-memory-state
fix path; swapPublic JSDoc points at it; failing test's git-show
fallback now has a comment that links the discussion instead of just
describing the symptom. No runtime change — when a future redirect
handler needs slug-history post-write, the right move is to load it
into the typed Store like the other sheets.
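The failure mode can be reproduced with a small in-memory model. The `ToyRepo`, `ToySheet`, `ToyStore`, and `write` names below are hypothetical stand-ins for gitsheets and the app's typed Store, not their real APIs. The sketch assumes only what the commit describes: each transact builds a fresh workspace from HEAD, while a sheet keeps the snapshot it was opened against.

```typescript
// Toy model of the staleness described above; NOT the gitsheets API.
type Tree = ReadonlyMap<string, string>; // record path -> record body

class ToyRepo {
  head: Tree = new Map<string, string>();

  // Mirrors repo.transact: every call starts from a fresh copy of HEAD.
  transact(mutate: (workspace: Map<string, string>) => void): void {
    const workspace = new Map(this.head);
    mutate(workspace);
    this.head = workspace;
  }

  openSheet(): ToySheet {
    return new ToySheet(this.head);
  }
}

class ToySheet {
  // Captured once at open time and never refreshed: the source of stale reads.
  private readonly dataTree: Tree;

  constructor(dataTree: Tree) {
    this.dataTree = dataTree;
  }

  queryAll(): string[] {
    return Array.from(this.dataTree.values());
  }
}

// Stand-in for the typed in-memory Store, updated in lockstep with each write
// (the role the PR attributes to StateApply).
class ToyStore {
  private readonly records = new Map<string, string>();

  apply(path: string, body: string): void {
    this.records.set(path, body);
  }

  all(): string[] {
    return Array.from(this.records.values());
  }
}

function write(repo: ToyRepo, store: ToyStore, path: string, body: string): void {
  repo.transact((workspace) => workspace.set(path, body));
  store.apply(path, body);
}

const repo = new ToyRepo();
const store = new ToyStore();
const sheetOpenedEarly = repo.openSheet(); // snapshot of an empty HEAD
write(repo, store, "slug-history/old-slug", "new-slug");

const staleRead = sheetOpenedEarly.queryAll(); // [] : the pre-write snapshot
const freshRead = repo.openSheet().queryAll(); // ["new-slug"]
const storeRead = store.all(); // ["new-slug"]
```

Reopening the sheet after each write would also work in tests, but the commit's stated fix path is the `ToyStore` pattern: load sheets like slug-history into the typed Store so reads never depend on a sheet's snapshot.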

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closeout: tick the validation criteria, fill Notes (incl. the
data-model.md spec-edit straddling two commits — minor but worth
flagging) and Follow-ups (upstream gitsheets enhancement, deferred
live re-run, future tag re-namespacing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@themightychris force-pushed the chore/importer-cleanup-trio branch from 46141f4 to b1bb223 on May 19, 2026 20:04
@themightychris merged commit 97d6bd7 into main on May 19, 2026
1 check passed
@themightychris deleted the chore/importer-cleanup-trio branch on May 19, 2026 20:05