feat: extract quoted tweet content for quote-tweet bookmarks by mindswim · Pull Request #30 · afar1/fieldtheory-cli

mindswim · 2026-04-06T15:31:33Z

Summary

Bookmarks that quote another tweet only store the quoted tweet's ID (quotedStatusId). The quoted tweet's text, author, and media are missing, making many bookmarks hard to understand in isolation.

The GraphQL bookmarks API already returns this data nested inside tweet.quoted_status_result.result -- it just wasn't being extracted. This PR reads it during sync and adds a backfill command for existing bookmarks.

Closes #15

What changed

src/types.ts -- QuotedTweetSnapshot interface, quotedTweet field on BookmarkRecord
src/graphql-bookmarks.ts -- extract quoted_status_result.result in convertTweetToRecord(). Future syncs include quoted tweets automatically.
src/bookmarks-db.ts -- schema v4: quoted_tweet_json column, migration, updateQuotedTweets() for the DB layer
src/bookmark-enrich.ts -- new ft enrich command that backfills existing bookmarks via X's syndication API. Retry with exponential backoff (matching fetchPageWithRetry pattern), rate limiting, idempotent.
src/cli.ts -- register ft enrich command
tests/graphql-bookmarks.test.ts -- 2 tests: extraction with quoted tweet present, graceful handling when absent

How it works

New syncs: convertTweetToRecord() now reads the nested quoted tweet from the GraphQL response. No extra API calls, no config. It just works.
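
A minimal sketch of that extraction, for readers who want to see the shape of the logic. The raw payload fields (`rest_id`, `legacy.full_text`) and the snapshot fields shown here are illustrative assumptions, not the actual definitions in src/types.ts or the code in convertTweetToRecord().

```typescript
// Hypothetical, simplified shapes; the real GraphQL payload has many more fields.
interface RawUser { legacy?: { screen_name?: string; name?: string } }
interface RawTweet {
  rest_id?: string;
  legacy?: { full_text?: string };
  core?: { user_results?: { result?: RawUser } };
  tweet?: RawTweet; // some results wrap the tweet one level deeper
  quoted_status_result?: { result?: RawTweet };
}

// Assumed snapshot fields, for illustration only.
interface QuotedTweetSnapshot {
  id: string;
  text: string;
  authorHandle: string | null;
  authorName: string | null;
}

function extractQuotedTweet(tweet: RawTweet): QuotedTweetSnapshot | undefined {
  const raw = tweet.quoted_status_result?.result;
  if (!raw) return undefined; // not a quote tweet, or nothing was returned
  const qt = raw.tweet ?? raw; // unwrap the optional `tweet` wrapper
  if (!qt.rest_id) return undefined;
  const user = qt.core?.user_results?.result?.legacy;
  return {
    id: qt.rest_id,
    text: qt.legacy?.full_text ?? "",
    authorHandle: user?.screen_name ?? null,
    authorName: user?.name ?? null,
  };
}
```

Returning undefined rather than throwing keeps the bookmark usable when the quoted tweet is absent; the caller can still store quotedStatusId.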

Existing bookmarks: ft enrich fetches missing quoted tweets via cdn.syndication.twimg.com (no auth required). Deduplicates by quoted tweet ID, retries on 429/5xx, skips deleted/private tweets. Run once to backfill, then never needed again.

ft enrich              # backfill missing quoted tweets
ft enrich --delay-ms 500  # slower rate limit
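
The retry behaviour described above can be sketched roughly as follows. This is not the code in src/bookmark-enrich.ts: `fetchWithBackoff`, the injected `fetcher` and `sleep` parameters, and the default attempt count and delay are all hypothetical names and values chosen so the sketch runs without a network.

```typescript
// Minimal response surface so the sketch does not depend on a specific HTTP client.
interface HttpResult { status: number; body?: string }
type Fetcher = (id: string) => Promise<HttpResult>;
type Sleeper = (ms: number) => Promise<void>;

const realSleep: Sleeper = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Returns the body on success, null when the tweet is unavailable,
// and throws once retries are exhausted or the status is unexpected.
async function fetchWithBackoff(
  id: string,
  fetcher: Fetcher,
  maxAttempts = 4,
  baseDelayMs = 500,
  sleep: Sleeper = realSleep,
): Promise<string | null> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetcher(id);
    if (res.status === 200) return res.body ?? "";
    if (res.status === 403 || res.status === 404) return null; // deleted or private
    const retryable = res.status === 429 || res.status >= 500;
    if (!retryable) throw new Error(`unexpected status ${res.status} for ${id}`);
    // Double the wait after each failed attempt: base, 2x base, 4x base, ...
    if (attempt < maxAttempts - 1) await sleep(baseDelayMs * 2 ** attempt);
  }
  throw new Error(`giving up on ${id} after ${maxAttempts} attempts`);
}
```

Injecting `sleep` keeps tests fast, since a test can record the requested delays instead of actually waiting.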

Verification

npm run build          # clean compile
npm test               # 2 new tests pass (5 pre-existing db test failures on main unchanged)
ft enrich              # tested against 7,641 bookmarks -- 1,164/1,205 quoted tweets fetched

Note

Medium Risk
Adds new data extraction and persistence paths (GraphQL parsing + SQLite schema migration) and a networked backfill command, which could affect sync/index correctness and rate-limit behavior but is contained to optional enrichment and a new column.

Overview
Adds first-class support for storing quoted-tweet context on quote-tweet bookmarks.

New GraphQL sync behavior extracts a QuotedTweetSnapshot from quoted_status_result and stores it on BookmarkRecord.quotedTweet, with tests covering presence/absence. The SQLite index schema is bumped to v4 with a new quoted_tweet_json column, insert/migration updates, and a new updateQuotedTweets() helper.

Introduces ft enrich, which backfills missing quotedTweet snapshots for existing bookmarks by fetching them from X’s public syndication endpoint with retry/backoff, deduplicating quoted IDs, and updating both the JSONL cache and the SQLite index.


…ommand

The GraphQL bookmarks API already returns full quoted tweet data nested inside tweet.quoted_status_result.result, but it was not being extracted. This adds extraction in convertTweetToRecord() so future syncs automatically include quoted tweet text, author, media, and URL.

For existing bookmarks synced without this data, adds `ft enrich` which fetches missing quoted tweets via X's syndication API with retry and rate limiting. Idempotent -- safe to run multiple times.

Schema bumped to v4 with a quoted_tweet_json column. DB update logic lives in bookmarks-db.ts following the existing layer separation.

Closes afar1#15

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


cursor · 2026-04-06T15:39:32Z

      githubUrls.length ? JSON.stringify(githubUrls) : null,
      null, // domains — populated by classify-domains pass
      null, // primary_domain
+      r.quotedTweet ? JSON.stringify(r.quotedTweet) : null,


Migration skipped due to premature schema version bump

High Severity

In buildIndex, initSchema(db) is called before ensureMigrations(db). initSchema unconditionally sets schema_version to SCHEMA_VERSION (4) in the meta table, even when CREATE TABLE IF NOT EXISTS is a no-op for an existing table. When ensureMigrations runs next, it reads the version as 4, so the version < 4 check is false and the ALTER TABLE ADD COLUMN quoted_tweet_json migration never executes. For any existing database at schema v3, the column is missing, causing SQL errors when insertRecord passes 31 values to a 30-column table or when updateQuotedTweets references the non-existent column.

Additional Locations (2)

src/bookmarks-db.ts#L215-L216

src/bookmarks-db.ts#L232-L238
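
One possible way to avoid the skipped migration, sketched against an in-memory stand-in rather than the real SQLite layer. The function names mirror those in the report, but their bodies, the `FakeDb` type, and `openIndex` are assumptions made for illustration: initSchema stamps the version only on a fresh database, and ensureMigrations is the only place that advances it for an existing one.

```typescript
const SCHEMA_VERSION = 4;

// In-memory stand-in for the database: the table's columns plus the stored version.
// `version === null` models a brand-new database with no meta row yet.
interface FakeDb { columns: Set<string>; version: number | null }

// Creates the table only if missing and stamps the version only for a fresh database.
function initSchema(db: FakeDb): void {
  if (db.version === null) {
    db.columns = new Set(["id", "text", "quoted_tweet_json"]);
    db.version = SCHEMA_VERSION;
  }
}

// Applies pending migrations based on the version actually stored, then records the new one.
function ensureMigrations(db: FakeDb): void {
  const current = db.version ?? 0;
  if (current < 4) {
    db.columns.add("quoted_tweet_json"); // stands in for ALTER TABLE ... ADD COLUMN
  }
  db.version = SCHEMA_VERSION;
}

function openIndex(db: FakeDb): FakeDb {
  initSchema(db);
  ensureMigrations(db);
  return db;
}
```

Calling ensureMigrations before initSchema would be another option; either way, the stored version must not be overwritten before it has been read.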


BenevolentFutures · 2026-04-06T21:14:56Z

First-Pass Review

Clean, well-structured PR. The GraphQL quoted tweet extraction is correct, the syndication API fallback for backfill is a good design choice, and the schema migration follows existing patterns. One real issue to address.

Must Fix

mediaObjects field mismatch with BookmarkMediaObject type

QuotedTweetSnapshot.mediaObjects is typed as BookmarkMediaObject[], but the objects use url instead of mediaUrl (the field defined on the interface), and expandedUrl which doesn't exist on the interface at all:

// graphql-bookmarks.ts (quoted tweet extraction)
mediaObjects: qtMediaEntities.map((m: any) => ({
  url: m.media_url_https ?? m.media_url,       // should be mediaUrl per interface
  expandedUrl: m.expanded_url,                  // not on BookmarkMediaObject
})),

Same pattern in bookmark-enrich.ts. Note: This is a pre-existing inconsistency — the main tweet's mediaObjects extraction has the same mismatch. The (m: any) cast hides the type error.

Pragmatic fix: update BookmarkMediaObject to use url instead of mediaUrl (matching what's actually serialized in the JSONL). Lower risk than changing all creation sites.
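
A sketch of what the typed mapping could look like once the interface and the creation sites agree. The `BookmarkMediaObject` fields shown are the assumed shape after the proposed rename, and `RawMediaEntity` and `toMediaObjects` are hypothetical names; the point is that dropping the (m: any) cast lets the compiler check the field names.

```typescript
// Assumed shape after the proposed rename; the real interface may carry more fields.
interface BookmarkMediaObject {
  url: string;
  expandedUrl?: string;
}

// Hypothetical subset of a raw media entity from the API response.
interface RawMediaEntity {
  media_url_https?: string;
  media_url?: string;
  expanded_url?: string;
}

// With a typed parameter and return type, a misspelled or missing field
// becomes a compile error instead of slipping through an `any` cast.
function toMediaObjects(entities: RawMediaEntity[]): BookmarkMediaObject[] {
  const out: BookmarkMediaObject[] = [];
  for (const m of entities) {
    const url = m.media_url_https ?? m.media_url;
    if (!url) continue; // skip entities without a usable URL
    out.push({ url, expandedUrl: m.expanded_url });
  }
  return out;
}
```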

Should Fix

1. Schema version conflict with folder support

This PR uses SCHEMA_VERSION = 4 for quoted_tweet_json. PR #34 (folder support, now ready for review) also uses v4 for folder_ids/folder_names. Whichever merges second will need a rebase to v5, with both migration steps. Not a blocker today but coordinate merge order.

2. EnrichResult double-counts unavailable tweets

When a tweet is unavailable (404/403), fetchTweetWithRetry returns null, which increments failed in the fetch loop. Later, when applying snapshots, the same null resolution increments skipped. So unavailable tweets are counted in both failed AND skipped. Suggestion: reserve failed for actual exceptions and skipped for intentionally unavailable tweets.
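
To illustrate the suggestion, a small sketch in which each fetch yields exactly one outcome, so every quoted tweet lands in exactly one counter. `FetchOutcome`, `tally`, and the `enriched` field are hypothetical; only `failed` and `skipped` are named in the PR.

```typescript
// One outcome per quoted tweet ID; the three cases are mutually exclusive.
type FetchOutcome =
  | { kind: "ok"; snapshot: string }
  | { kind: "unavailable" } // 404/403: deleted or private
  | { kind: "error"; message: string }; // retries exhausted or unexpected failure

interface EnrichResult { enriched: number; skipped: number; failed: number }

function tally(outcomes: FetchOutcome[]): EnrichResult {
  const result: EnrichResult = { enriched: 0, skipped: 0, failed: 0 };
  for (const o of outcomes) {
    switch (o.kind) {
      case "ok": result.enriched++; break;
      case "unavailable": result.skipped++; break;
      case "error": result.failed++; break;
    }
  }
  return result;
}
```

Because each outcome increments one counter, the three totals always sum to the number of IDs processed, which makes the double count easy to detect in a test.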

3. Consider a --dry-run option

The enrich command mutates both JSONL and SQLite. A dry-run flag that reports how many bookmarks need enrichment without fetching would be useful for scripting and safety.
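
A dry run could be as small as a pure planning step that reports the work without touching the network or disk. The sketch below assumes a minimal record shape based on the field names in this PR; `planEnrichment` and `EnrichPlan` are hypothetical.

```typescript
// Assumed minimal record shape; field names follow the PR description.
interface BookmarkRecord {
  id: string;
  quotedStatusId?: string;
  quotedTweet?: object;
}

interface EnrichPlan { bookmarksToEnrich: number; uniqueQuotedIds: string[] }

// Pure function: inspects records only, performs no fetches and no writes.
function planEnrichment(records: BookmarkRecord[]): EnrichPlan {
  const ids = new Set<string>();
  let count = 0;
  for (const r of records) {
    // A bookmark needs enrichment if it quotes a tweet but has no snapshot yet.
    if (r.quotedStatusId && !r.quotedTweet) {
      count++;
      ids.add(r.quotedStatusId);
    }
  }
  return { bookmarksToEnrich: count, uniqueQuotedIds: Array.from(ids) };
}
```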

Nitpick

spinnerIdx referenced in the enrich command handler — compiles fine since it's module-scoped, but a local index would be slightly cleaner.
The token=x parameter in the syndication URL deserves a comment explaining it's a required-but-any-value parameter, not a real auth token.

Verified Correct

GraphQL extraction correctly navigates quoted_status_result.result, handles the tweet wrapper, extracts user from core.user_results.result with proper fallbacks
Null/missing quoted tweet handling is solid — quotedTweet is undefined, quotedStatusId still preserved
Schema migration follows the exact ALTER TABLE + try/catch pattern
Deduplication: quoted tweet IDs are deduped before fetching, then applied to all matching bookmarks
Retry logic mirrors existing fetchPageWithRetry pattern
JSONL + SQLite dual write maintains consistency
Test coverage: two tests covering successful extraction and graceful null handling
CLI registration follows existing patterns
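
For reference, the "fetch once, apply to all matching bookmarks" step can be sketched like this. The types and `applySnapshots` are simplified assumptions, not the code under review.

```typescript
interface Snapshot { id: string; text: string }
interface Bookmark { id: string; quotedStatusId?: string; quotedTweet?: Snapshot }

// `fetched` maps quoted tweet ID -> snapshot, or null when the tweet was unavailable.
// Returns how many bookmarks were updated in place.
function applySnapshots(bookmarks: Bookmark[], fetched: Map<string, Snapshot | null>): number {
  let updated = 0;
  for (const b of bookmarks) {
    if (!b.quotedStatusId || b.quotedTweet) continue; // nothing to do, or already enriched
    const snap = fetched.get(b.quotedStatusId);
    if (!snap) continue; // not fetched, or unavailable
    b.quotedTweet = snap;
    updated++;
  }
  return updated;
}
```

Skipping bookmarks that already carry a snapshot is what makes a second run a no-op.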

Verdict: Solid work. Fix the mediaObjects type mismatch, then it's ready. Coordinate schema version with #34.

afar1 · 2026-04-07T01:17:43Z

Closing in favor of #35 which landed this. @mindswim — your implementation directly shaped how we built this. The syndication API backfill approach, the snapshot type, the test cases — all solid. We ended up folding it into ft sync --gaps instead of a separate command but the core was yours. Thanks for the great work.

mindswim · 2026-04-07T14:28:44Z

cool, thanks, and thanks for the tool!


cursor Bot reviewed Apr 6, 2026


afar1 closed this Apr 7, 2026

mindswim wants to merge 1 commit into afar1:main from mindswim:feat/enrich-quoted-tweets

3 participants
