
feat: sync --gaps, quoted tweets, bookmarkedAt, update checker #35

Merged
afar1 merged 15 commits into main from sync-gaps-quoted-tweets-update-checker
Apr 7, 2026


afar1 commented Apr 6, 2026

Summary

Sync modes

  • --full renamed to --rebuild for clarity: full re-crawl with a confirmation prompt, backup instructions, and a --yes flag to skip the prompt.
  • --gaps: Scans existing bookmarks and backfills missing data via X's syndication API (no auth needed):
    • Missing quoted tweet content
    • Truncated article/note tweet text (detects text >= 275 chars, expands if longer text exists)
    • Incremental apply with checkpointing every 100 fetches
    • Failure log written to gaps-failures.json with per-tweet diagnostics grouped by reason
  • Stale page fix: Removed redundant incremental && guards so --rebuild stops after 3 empty pages instead of paging to 500.
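As a rough illustration of the --gaps selection described above, the sketch below picks which tweet IDs to fetch and decides when to expand text. The record shape, field names, and function names are assumptions made for this example and are not taken from the PR's code; only the 275-character threshold and the "expand if longer" rule come from the description.

```typescript
// Hypothetical record shape; the real schema may differ.
interface BookmarkRecord {
  tweetId: string;
  text: string;
  quotedTweetId?: string;          // set when the bookmark quotes another tweet
  quotedTweet?: { text: string };  // set once quoted content has been captured
}

// Threshold from the PR description: text this long may have been truncated.
const TRUNCATION_THRESHOLD = 275;

// Collect a deduplicated set of tweet IDs worth fetching.
function planGapFetches(records: BookmarkRecord[]): Set<string> {
  const ids = new Set<string>();
  for (const r of records) {
    // Quoted tweet is referenced but its content was never stored.
    if (r.quotedTweetId && !r.quotedTweet) ids.add(r.quotedTweetId);
    // Text is long enough that it may have been cut off.
    if (r.text.length >= TRUNCATION_THRESHOLD) ids.add(r.tweetId);
  }
  return ids;
}

// Replace stored text only when the fetched text is strictly longer.
function expandText(current: string, fetched: string | undefined): string {
  return fetched !== undefined && fetched.length > current.length ? fetched : current;
}
```

Using a Set keeps the fetch pass deduplicated when several bookmarks quote the same tweet.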

Data extraction (all sync modes)

  • Quoted tweets: convertTweetToRecord() now reads quoted_status_result.result. Captures text, author, media, URL.
  • Full article text: Prefers note_tweet.note_tweet_results.result.text over truncated legacy.full_text (capped at ~304 chars).
  • bookmarkedAt timestamp: Decoded from timeline entry sortIndex snowflake. Previously always null.
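For readers unfamiliar with snowflake decoding, here is a minimal sketch of how a timestamp can be recovered from a snowflake-style ID. It assumes sortIndex follows the standard X/Twitter snowflake layout (millisecond timestamp in the bits above the low 22, offset by the Twitter epoch); the function name is hypothetical and the PR's actual implementation may differ.

```typescript
// Twitter snowflake epoch in milliseconds (2010-11-04 UTC).
const TWITTER_EPOCH_MS = BigInt("1288834974657");

// Decode the millisecond timestamp from a snowflake-style ID string.
// Returns null for input that is not a plain decimal integer.
function snowflakeToDate(sortIndex: string): Date | null {
  if (!/^\d+$/.test(sortIndex)) return null;
  // The low 22 bits hold worker and sequence data; the rest is the time offset.
  const ms = (BigInt(sortIndex) >> BigInt(22)) + TWITTER_EPOCH_MS;
  return new Date(Number(ms));
}
```

BigInt is used because snowflake IDs exceed the range in which JavaScript numbers represent integers exactly.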

UX

  • Update checker: After sync, checks npm registry (cached daily, 5s timeout). One-liner if newer version available.
  • What's new: Changelog highlights shown once on first run after updating.
  • Version in logo: Version number visible in logo box on every command.
  • Smooth spinner: Animates every 80ms independent of fetch callbacks.
  • Graceful Ctrl+C: Stops cleanly with "Your data is safe — run again to pick up where you left off."
  • Empty sync warning: Troubleshooting hints when sync finds 0 bookmarks.
  • Rebuild confirmation: Warning, backup command, and y/N prompt before full re-crawl.

Security

  • Atomic file writes: JSONL, JSON, and SQLite write to .tmp then rename, preventing corruption on crash.
  • Directory permissions: ~/.ft-bookmarks/ created with mode 0o700 (owner-only).
  • Transaction safety: buildIndex wrapped with ROLLBACK on error.
  • FTS5 error handling: Query parse errors caught with user-friendly message.
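The write-then-rename pattern and the owner-only directory can be sketched with Node's fs module as follows. The helper names are hypothetical; only the .tmp suffix, the 0o700 mode, and the ~/.ft-bookmarks/ path come from the description. Rename is atomic only when the temp file and target are on the same filesystem, which holds here because they share a directory.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Data directory named in the PR description.
function defaultDataDir(): string {
  return path.join(os.homedir(), ".ft-bookmarks");
}

// Create the directory with owner-only permissions (still subject to the process umask).
function ensurePrivateDir(dir: string): void {
  fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
}

// Write to a sibling .tmp file, then rename over the target.
// A crash mid-write leaves the old target untouched.
function writeFileAtomic(target: string, data: string): void {
  const tmp = `${target}.tmp`;
  fs.writeFileSync(tmp, data, { encoding: "utf8" });
  fs.renameSync(tmp, target);
}
```

A stricter variant would also fsync the temp file before renaming; that step is omitted here for brevity.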

Housekeeping

Test plan

  • ft sync captures quoted tweets, full article text, and bookmarkedAt
  • ft sync --gaps backfills quoted tweets and expands truncated articles
  • ft sync --gaps writes gaps-failures.json with diagnostics for failed tweets
  • ft sync --rebuild shows confirmation prompt with backup command
  • ft sync --rebuild --yes skips confirmation
  • ft sync --rebuild --gaps errors cleanly
  • Ctrl+C during sync/gaps shows friendly resume message
  • Spinner animates smoothly between fetches
  • Update checker prints when outdated, silent when current
  • What's-new shows once after version change, not on subsequent runs
  • Crash during write doesn't corrupt JSONL or SQLite (atomic writes)
  • npm run build passes
  • npm test — 73 pass (5 pre-existing fixture failures unchanged)

🤖 Generated with Claude Code

afar1 and others added 15 commits April 6, 2026 10:12
Remove redundant `incremental &&` guards from sync stop conditions.
`reachedLatestStored` is already self-gating (only true when
`newestKnownId` is set, which requires incremental mode). The stale
page limit should apply universally to prevent full syncs from
fetching hundreds of empty pages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
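A minimal sketch of the stop conditions this commit describes, assuming a simple paging state object. The names reachedLatestStored and newestKnownId come from the commit message; the state shape, constant names, and helper functions are hypothetical.

```typescript
// Hypothetical paging state tracked while crawling.
interface PageState {
  stalePages: number;           // consecutive pages with no new bookmarks
  pagesFetched: number;
  reachedLatestStored: boolean; // only ever true when newestKnownId is set
}

const MAX_STALE_PAGES = 3; // limit quoted in the PR description
const MAX_PAGES = 500;     // hard page cap quoted in the PR description

// Self-gating: without a known newest ID (full sync) this is always false.
function reachedLatest(pageIds: string[], newestKnownId: string | undefined): boolean {
  return newestKnownId !== undefined && pageIds.indexOf(newestKnownId) !== -1;
}

// No `incremental &&` guard: the stale-page limit applies to every sync mode.
function shouldStop(state: PageState): boolean {
  return (
    state.reachedLatestStored ||
    state.stalePages >= MAX_STALE_PAGES ||
    state.pagesFetched >= MAX_PAGES
  );
}
```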
- Rename --full to --rebuild for full re-crawl
- Add --gaps mode: backfills missing quoted tweets via syndication API
- Extract quoted tweet content from GraphQL response (all sync modes)
- Extract bookmarkedAt timestamp from entry sortIndex snowflake
- Add QuotedTweetSnapshot type, schema v4 migration
- Add daily npm update checker (non-blocking, cached)
- Error if --rebuild and --gaps used together
- Extract CHROME_UA constant to deduplicate user-agent string

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove dead updateBookmarkedAt() — bookmarkedAt comes from sortIndex
  during sync, not from gap-fill
- Export compareVersions for testability
- Add 6 tests for version comparison edge cases (double-digit segments, etc.)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
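The double-digit edge case mentioned above is the usual reason to compare versions numerically rather than as strings. The sketch below is one way to do that; the signature and return convention are assumptions, since the commit only names compareVersions, and pre-release suffixes are ignored here.

```typescript
// Compare dotted version strings segment by segment as integers.
// Returns -1 if a < b, 1 if a > b, 0 if equal. A leading "v" is tolerated.
function compareVersions(a: string, b: string): number {
  const parse = (v: string): number[] =>
    v.replace(/^v/, "").split(".").map((s) => parseInt(s, 10) || 0);
  const pa = parse(a);
  const pb = parse(b);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    // Missing segments count as 0, so "1.2" equals "1.2.0".
    const x = pa[i] ?? 0;
    const y = pb[i] ?? 0;
    if (x !== y) return x < y ? -1 : 1;
  }
  return 0;
}
```

A plain string comparison would rank "1.9.0" above "1.10.0", which is the kind of bug the added tests guard against.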
- Version number (e.g. v1.2.1) now displayed in the logo box
- WHATS_NEW map shows changelog highlights once on first run after
  updating to a new version (tracked via .last-version file)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prefer tweet.note_tweet.note_tweet_results.result.text over
legacy.full_text which truncates at 280 chars. Applies to all
sync modes. Existing truncated articles fixed on next --rebuild.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
legacy.full_text caps at ~304 chars, so X Articles and long-form
note tweets were silently truncated. --gaps now fetches all bookmarks
with text >= 275 chars from the syndication API and expands any
where the full text is longer. Combined with quoted tweet backfill
in a single deduplicated fetch pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Apply quoted tweet and text expansion results as each fetch
  completes instead of batching fetch-then-apply. Progress bar
  now shows accurate counts during the run, not just at the end.
- Checkpoint JSONL every 100 fetches so a crash doesn't lose
  all progress on large gap-fill runs.
- Build lookup indexes (recordsByQuotedId, recordsByTweetId) to
  avoid re-scanning all records for each fetch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
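The two ideas in this commit, lookup indexes and periodic checkpoints, can be sketched as below. This is a synchronous simplification with hypothetical names and record shape; the real gap-fill fetches are network calls and would be asynchronous.

```typescript
// Hypothetical minimal record shape.
interface Rec {
  tweetId: string;
  quotedTweetId?: string;
}

// Build a key -> records index once, so each fetch result is applied by lookup
// instead of re-scanning every record.
function indexBy(records: Rec[], key: (r: Rec) => string | undefined): Map<string, Rec[]> {
  const index = new Map<string, Rec[]>();
  for (const r of records) {
    const k = key(r);
    if (k === undefined) continue;
    const bucket = index.get(k);
    if (bucket) bucket.push(r);
    else index.set(k, [r]);
  }
  return index;
}

// Handle items one at a time and checkpoint every `every` items,
// plus once at the end if a partial batch remains. Returns the checkpoint count.
function runWithCheckpoints<T>(
  items: T[],
  handle: (item: T) => void,
  checkpoint: () => void,
  every = 100,
): number {
  let done = 0;
  let checkpoints = 0;
  for (const item of items) {
    handle(item);
    done++;
    if (done % every === 0) {
      checkpoint();
      checkpoints++;
    }
  }
  if (done % every !== 0) {
    checkpoint();
    checkpoints++;
  }
  return checkpoints;
}
```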
Running ft sync --rebuild now shows a warning explaining it will
re-crawl all bookmarks, prints the exact cp command to back up
first, and asks for confirmation. Use --yes to skip the prompt.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…TS5 errors

- Atomic file writes: JSONL, JSON, and SQLite now write to .tmp
  then rename, preventing corruption on crash/interrupt
- Data directory created with mode 0o700 (owner-only access)
- buildIndex transaction wrapped in try/catch with ROLLBACK on error
- FTS5 query parse errors caught and re-thrown with a user-friendly
  message instead of raw SQLite errors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Spinner now runs on an 80ms interval independent of network fetches.
Data callbacks update the state, the timer handles rendering. Applies
to both sync and gaps progress bars.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ctrl+C during sync or gaps now stops the spinner cleanly and
prints "Your data is safe — progress has been saved. Run the
same command again to pick up where you left off." instead of
a raw stack trace. Handler is scoped to spinner lifetime and
cleaned up on normal completion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Failed tweets now tracked with reason (deleted, private, rate
limited, empty) and written to gaps-failures.json. Summary
grouped by reason shown in CLI output with path to full log
for inspection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
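One possible shape for the per-reason grouping is sketched below. The four reasons are the ones listed in the commit message; the record type, function names, and summary format are assumptions, and the actual layout of gaps-failures.json is not shown in this PR text.

```typescript
// Reasons listed in the commit message.
type FailureReason = "deleted" | "private" | "rate limited" | "empty";

// Hypothetical failure entry.
interface GapFailure {
  tweetId: string;
  reason: FailureReason;
}

// Group failed tweet IDs by reason.
function groupFailures(failures: GapFailure[]): Record<string, string[]> {
  const grouped: Record<string, string[]> = {};
  for (const f of failures) {
    const bucket = grouped[f.reason];
    if (bucket) bucket.push(f.tweetId);
    else grouped[f.reason] = [f.tweetId];
  }
  return grouped;
}

// One summary line per reason, e.g. for CLI output.
function summaryLines(grouped: Record<string, string[]>): string[] {
  const lines: string[] = [];
  for (const reason of Object.keys(grouped)) {
    const ids = grouped[reason] ?? [];
    lines.push(`${reason}: ${ids.length}`);
  }
  return lines;
}
```

The grouped object can be serialized with JSON.stringify(grouped, null, 2) to produce a readable log file.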