feat: extract quoted tweet content for quote-tweet bookmarks #30
mindswim wants to merge 1 commit
…ommand
The GraphQL bookmarks API already returns full quoted tweet data nested inside tweet.quoted_status_result.result, but it was not being extracted. This adds extraction in convertTweetToRecord() so future syncs automatically include quoted tweet text, author, media, and URL.
For existing bookmarks synced without this data, adds `ft enrich` which fetches missing quoted tweets via X's syndication API with retry and rate limiting. Idempotent -- safe to run multiple times.
Schema bumped to v4 with a quoted_tweet_json column. DB update logic lives in bookmarks-db.ts following the existing layer separation.
Closes afar1#15
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 5f2d022.
    githubUrls.length ? JSON.stringify(githubUrls) : null,
    null, // domains — populated by classify-domains pass
    null, // primary_domain
    r.quotedTweet ? JSON.stringify(r.quotedTweet) : null,
Migration skipped due to premature schema version bump
High Severity
In buildIndex, initSchema(db) is called before ensureMigrations(db). initSchema unconditionally sets schema_version to SCHEMA_VERSION (4) in the meta table, even when CREATE TABLE IF NOT EXISTS is a no-op for an existing table. When ensureMigrations runs next, it reads the version as 4, so the version < 4 check is false and the ALTER TABLE ADD COLUMN quoted_tweet_json migration never executes. For any existing database at schema v3, the column is missing, causing SQL errors when insertRecord passes 31 values to a 30-column table or when updateQuotedTweets references the non-existent column.
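The ordering problem described above can be reproduced without SQLite. The sketch below is a minimal in-memory model, not the project's code: `FakeDb`, its `columns` set, and the function bodies are assumptions made for illustration. Only the names `initSchema` and `ensureMigrations`, `SCHEMA_VERSION = 4`, and the `quoted_tweet_json` column come from the report. It shows one possible fix, in which the version is stamped only when the table is actually created.

```typescript
// In-memory stand-in for the SQLite index (hypothetical; not the real bookmarks-db.ts).
interface FakeDb {
  columns: Set<string> | null; // null means the bookmarks table does not exist yet
  meta: Map<string, number>;   // stands in for the meta table
}

const SCHEMA_VERSION = 4;

// Buggy variant: stamps the current version even when the table already exists.
function initSchemaBuggy(db: FakeDb): void {
  if (db.columns === null) {
    db.columns = new Set(["id", "text", "quoted_tweet_json"]);
  }
  db.meta.set("schema_version", SCHEMA_VERSION);
}

// One possible fix: only a freshly created table is stamped with the current version.
function initSchemaFixed(db: FakeDb): void {
  if (db.columns === null) {
    db.columns = new Set(["id", "text", "quoted_tweet_json"]);
    db.meta.set("schema_version", SCHEMA_VERSION);
  }
}

// Applies the v4 migration when the stored version is older than 4.
function ensureMigrations(db: FakeDb): void {
  const version = db.meta.get("schema_version") ?? 0;
  if (version < 4) {
    db.columns?.add("quoted_tweet_json"); // models ALTER TABLE ... ADD COLUMN
    db.meta.set("schema_version", 4);
  }
}

// An existing database at schema v3, without the new column.
function makeV3Db(): FakeDb {
  return {
    columns: new Set(["id", "text"]),
    meta: new Map([["schema_version", 3]]),
  };
}

const buggy = makeV3Db();
initSchemaBuggy(buggy);
ensureMigrations(buggy);

const fixed = makeV3Db();
initSchemaFixed(fixed);
ensureMigrations(fixed);

console.log(buggy.columns?.has("quoted_tweet_json")); // false: migration was skipped
console.log(fixed.columns?.has("quoted_tweet_json")); // true: migration ran
```

Running the migrations before stamping the version would likely work as well; which option fits better depends on how the existing code treats a brand-new database.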
First-Pass Review
Clean, well-structured PR. The GraphQL quoted tweet extraction is correct, the syndication API fallback for backfill is a good design choice, and the schema migration follows existing patterns. One real issue to address.
Must Fix
    // graphql-bookmarks.ts (quoted tweet extraction)
    mediaObjects: qtMediaEntities.map((m: any) => ({
      url: m.media_url_https ?? m.media_url, // should be mediaUrl per interface
      expandedUrl: m.expanded_url,           // not on BookmarkMediaObject
    })),
Same pattern in … Pragmatic fix: update …
Should Fix
1. Schema version conflict with folder support. This PR uses SCHEMA_VERSION = 4 for …
2. When a tweet is unavailable (404/403), …
3. Consider a … The enrich command mutates both JSONL and SQLite. A dry-run flag that reports how many bookmarks need enrichment without fetching would be useful for scripting and safety.
Nitpick
Verified Correct
Verdict: Solid work. Fix the mediaObjects type mismatch, then it's ready. Coordinate schema version with #34.
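The mediaObjects type mismatch named in the verdict could be addressed along these lines. This is a sketch under assumptions: the review only says that the field should be `mediaUrl` and that `expandedUrl` is not on `BookmarkMediaObject`, so the interface shape below, the `RawMediaEntity` type, and the helper name `toMediaObjects` are hypothetical.

```typescript
// Hypothetical shape; the review only confirms that the field is named mediaUrl
// and that expandedUrl is not part of the interface.
interface BookmarkMediaObject {
  mediaUrl: string;
}

// Assumed subset of a raw media entity, based on the fields in the reviewed snippet.
interface RawMediaEntity {
  media_url_https?: string;
  media_url?: string;
  expanded_url?: string;
}

// Maps raw entities to typed objects, dropping entries that have no usable URL.
function toMediaObjects(entities: RawMediaEntity[]): BookmarkMediaObject[] {
  const result: BookmarkMediaObject[] = [];
  for (const m of entities) {
    const mediaUrl = m.media_url_https ?? m.media_url;
    if (mediaUrl !== undefined) {
      result.push({ mediaUrl });
    }
  }
  return result;
}

const mapped = toMediaObjects([
  { media_url_https: "https://example.com/a.jpg", expanded_url: "https://example.com/a" },
  { media_url: "http://example.com/b.jpg" },
  {},
]);
console.log(mapped.length); // 2
```

Typing the callback parameter instead of using `any` lets the compiler flag a wrong or extra field name at build time.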
Closing in favor of #35 which landed this. @mindswim, your implementation directly shaped how we built this. The syndication API backfill approach, the snapshot type, the test cases: all solid. We ended up folding it into …
cool, thanks, and thanks for the tool!
On Mon, Apr 6, 2026 at 9:18 PM Andrew Farah wrote:
afar1 left a comment (afar1/fieldtheory-cli#30)
Closing this in favor of #35, which landed the same feature, but want to say thank you @mindswim. Your implementation was really well done and directly informed how we built this. The convertTweetToRecord extraction logic, the syndication API approach for backfill, the QuotedTweetSnapshot type, and the test cases all matched what we ended up shipping. We folded it into ft sync --gaps instead of a separate ft enrich command, but the core idea and approach were yours. Appreciate you taking the time to build this out so thoroughly. 🙏
Summary
Bookmarks that quote another tweet only store the quoted tweet's ID (`quotedStatusId`). The quoted tweet's text, author, and media are missing, making many bookmarks hard to understand in isolation.
The GraphQL bookmarks API already returns this data nested inside `tweet.quoted_status_result.result` -- it just wasn't being extracted. This PR reads it during sync and adds a backfill command for existing bookmarks.
Closes #15
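As a rough illustration of the extraction described in this summary, the sketch below pulls a snapshot out of `quoted_status_result.result`. Only that nesting path and the `QuotedTweetSnapshot` name come from the PR; the snapshot's fields, the `RawTweet` shape (including `rest_id`, `legacy.full_text`, and the `core` path to the author), the URL format, and the helper name `extractQuotedTweet` are assumptions for illustration and may not match the real payload or implementation.

```typescript
// Hypothetical snapshot shape; the PR's real QuotedTweetSnapshot may differ.
interface QuotedTweetSnapshot {
  id: string;
  text: string;
  authorHandle?: string;
  url?: string;
}

// Assumed subset of the raw tweet payload. Only the
// quoted_status_result.result nesting is taken from the PR description.
interface RawTweet {
  rest_id?: string;
  legacy?: { full_text?: string };
  core?: { user_results?: { result?: { legacy?: { screen_name?: string } } } };
  quoted_status_result?: { result?: RawTweet };
}

// Returns a snapshot of the quoted tweet, or undefined when the tweet quotes
// nothing or the nested result lacks an ID or text.
function extractQuotedTweet(tweet: RawTweet): QuotedTweetSnapshot | undefined {
  const quoted = tweet.quoted_status_result?.result;
  const id = quoted?.rest_id;
  const text = quoted?.legacy?.full_text;
  if (id === undefined || text === undefined) {
    return undefined;
  }
  const snapshot: QuotedTweetSnapshot = { id, text };
  const authorHandle = quoted?.core?.user_results?.result?.legacy?.screen_name;
  if (authorHandle !== undefined) {
    snapshot.authorHandle = authorHandle;
    snapshot.url = `https://x.com/${authorHandle}/status/${id}`;
  }
  return snapshot;
}

const withQuote = extractQuotedTweet({
  rest_id: "1",
  legacy: { full_text: "outer" },
  quoted_status_result: {
    result: {
      rest_id: "2",
      legacy: { full_text: "inner" },
      core: { user_results: { result: { legacy: { screen_name: "alice" } } } },
    },
  },
});
const withoutQuote = extractQuotedTweet({ rest_id: "3", legacy: { full_text: "plain" } });
```

The two calls mirror the two test cases the PR lists: a quoted tweet that is present, and a tweet with no quote, which yields `undefined` rather than throwing.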
What changed
- `src/types.ts` -- `QuotedTweetSnapshot` interface, `quotedTweet` field on `BookmarkRecord`
- `src/graphql-bookmarks.ts` -- extract `quoted_status_result.result` in `convertTweetToRecord()`. Future syncs include quoted tweets automatically.
- `src/bookmarks-db.ts` -- schema v4: `quoted_tweet_json` column, migration, `updateQuotedTweets()` for the DB layer
- `src/bookmark-enrich.ts` -- new `ft enrich` command that backfills existing bookmarks via X's syndication API. Retry with exponential backoff (matching `fetchPageWithRetry` pattern), rate limiting, idempotent.
- `src/cli.ts` -- register `ft enrich` command
- `tests/graphql-bookmarks.test.ts` -- 2 tests: extraction with quoted tweet present, graceful handling when absent
How it works
New syncs: `convertTweetToRecord()` now reads the nested quoted tweet from the GraphQL response. No extra API calls, no config. It just works.
Existing bookmarks: `ft enrich` fetches missing quoted tweets via `cdn.syndication.twimg.com` (no auth required). Deduplicates by quoted tweet ID, retries on 429/5xx, skips deleted/private tweets. Run once to backfill, then never needed again.
Verification
Note
Medium Risk
Adds new data extraction and persistence paths (GraphQL parsing + SQLite schema migration) and a networked backfill command, which could affect sync/index correctness and rate-limit behavior but is contained to optional enrichment and a new column.
Overview
Adds first-class support for storing quoted-tweet context on quote-tweet bookmarks.
New GraphQL sync behavior extracts a `QuotedTweetSnapshot` from `quoted_status_result` and stores it on `BookmarkRecord.quotedTweet`, with tests covering presence/absence. The SQLite index schema is bumped to v4 with a new `quoted_tweet_json` column, insert/migration updates, and a new `updateQuotedTweets()` helper.
Introduces `ft enrich`, which backfills missing `quotedTweet` snapshots for existing bookmarks by fetching via X's public syndication endpoint with retry/backoff, deduping quoted IDs, and updating both the JSONL cache and the SQLite index.
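The backfill behavior described here (dedupe quoted IDs, retry on 429/5xx with backoff, skip unavailable tweets) can be sketched as follows. This is not the PR's code: `uniqueQuotedIds`, `fetchWithRetry`, the `Fetcher` type, and the default attempt count and delay are hypothetical, and the fetcher is injected so that no real endpoint is called.

```typescript
// Minimal response shape so the sketch does not depend on a real HTTP client.
interface FetchResult {
  status: number;
  body?: string;
}

type Fetcher = (id: string) => Promise<FetchResult>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Collects each quoted tweet ID once, skipping bookmarks that already have a snapshot.
function uniqueQuotedIds(
  bookmarks: { quotedStatusId?: string; quotedTweet?: unknown }[],
): string[] {
  const ids = new Set<string>();
  for (const b of bookmarks) {
    if (b.quotedStatusId !== undefined && b.quotedTweet === undefined) {
      ids.add(b.quotedStatusId);
    }
  }
  return Array.from(ids);
}

// Retries on 429 and 5xx with exponential backoff. Returns null for other
// non-200 statuses (for example a deleted or private tweet) or when retries run out.
async function fetchWithRetry(
  fetcher: Fetcher,
  id: string,
  maxAttempts = 4,
  baseDelayMs = 500,
): Promise<string | null> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetcher(id);
    if (res.status === 200) {
      return res.body ?? null;
    }
    const retryable = res.status === 429 || res.status >= 500;
    if (!retryable) {
      return null;
    }
    if (attempt < maxAttempts - 1) {
      await sleep(baseDelayMs * 2 ** attempt); // 500 ms, 1 s, 2 s with the defaults
    }
  }
  return null;
}

const pendingIds = uniqueQuotedIds([
  { quotedStatusId: "10" },
  { quotedStatusId: "10" },
  { quotedStatusId: "11", quotedTweet: { text: "already stored" } },
  {},
]);
console.log(pendingIds); // ["10"]
```

Skipping bookmarks that already carry a snapshot is what would make a second run a no-op, which is one way to get the idempotence the PR describes.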