Conversation
There was a problem hiding this comment.
Pull request overview
Adds a one-off runner script intended to backfill missing Search::Record entries for cards/comments that became unsearchable after the search table rebuild on 2025-11-13.
Tip
If you aren't ready for review, convert to a draft PR.
Click "Convert to draft" or run gh pr ready --undo.
Click "Ready for review" or run gh pr ready to reengage.
Changes:
- Add
scripts/reindex_stale_search_records.rbto reindex cards and comments before a cutoff date. - Provide CLI usage for default vs custom cutoff dates.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| cutoff = Date.parse(ARGV[0] || "2025-11-13") | ||
|
|
||
| puts "Reindexing cards and comments not updated since #{cutoff}..." | ||
|
|
||
| cards = Card.published.where("updated_at < ?", cutoff).includes(:rich_text_description) |
There was a problem hiding this comment.
The cutoff logic is exclusive ("updated_at < ?" with a Date parsed to midnight). For a default cutoff of 2025-11-13 this will skip records updated on 2025-11-13, even though they were created/updated before the search table rebuild on that date and can still be missing from the index. Consider treating the cutoff as an inclusive day (e.g., compare against cutoff + 1.day) or accept/parse a full timestamp and compare against that timestamp.
| cards = Card.published.where("updated_at < ?", cutoff).includes(:rich_text_description) | ||
| card_count = cards.count | ||
| puts "Found #{card_count} cards to reindex" | ||
|
|
||
| reindexed_cards = 0 | ||
| cards.find_each do |card| | ||
| card.reindex | ||
| reindexed_cards += 1 |
There was a problem hiding this comment.
Selecting cards by updated_at risks missing pre-rebuild cards whose updated_at has been bumped by comment activity (Comment belongs_to :card, touch: true), since a touch triggers Card's after_touch but not Searchable's after_update_commit reindexing. Those cards can still be missing search records while having updated_at >= cutoff. Using created_at as the primary filter (or explicitly detecting missing Search::Record entries) would avoid leaving some stale cards un-reindexed.
| # Reindex cards and comments missing from the search index. | ||
| # | ||
| # On 2025-11-13 the search tables were rebuilt (search_index_* dropped, empty | ||
| # search_records_* created) without a data migration. Cards not created or | ||
| # updated after that date have no search records and are invisible to search | ||
| # and card-table filtering. | ||
| # | ||
| # Usage: | ||
| # bin/rails runner scripts/reindex_stale_search_records.rb # default cutoff: 2025-11-13 | ||
| # bin/rails runner scripts/reindex_stale_search_records.rb 2026-01-01 # custom cutoff |
There was a problem hiding this comment.
This repository already has a script/ directory (including script/maintenance/ and script/migrations/) for operational/one-off scripts. Introducing a separate top-level scripts/ directory makes it easy to miss scripts and splits conventions (shebang + require_relative "../config/environment" vs rails runner). Consider moving this into the existing script/maintenance/ area (or aligning with that structure) to keep script discovery consistent.
| cutoff = Date.parse(ARGV[0] || "2025-11-13") | ||
|
|
There was a problem hiding this comment.
Date.parse will raise on invalid input (and also loses the time-of-day/time-zone information that matters for a migration timestamp). It would be safer to validate the argument and print a short usage message (or default cleanly) when parsing fails, and consider using Time.zone.parse if you want the cutoff to be time-zone aware.
| cutoff = Date.parse(ARGV[0] || "2025-11-13") | |
| default_cutoff = Time.zone.parse("2025-11-13") | |
| cutoff_arg = ARGV[0] | |
| cutoff = | |
| if cutoff_arg.nil? | |
| default_cutoff | |
| else | |
| begin | |
| parsed_cutoff = Time.zone.parse(cutoff_arg) | |
| rescue ArgumentError, TypeError | |
| parsed_cutoff = nil | |
| end | |
| if parsed_cutoff.nil? | |
| warn "Invalid cutoff: #{cutoff_arg.inspect}" | |
| warn "Usage: bin/rails runner scripts/reindex_stale_search_records.rb [YYYY-MM-DD or timestamp]" | |
| exit 1 | |
| end | |
| parsed_cutoff | |
| end |
| comments = Comment.joins(:card).merge(Card.published).where("comments.updated_at < ?", cutoff).includes(:rich_text_body, :card) | ||
| comment_count = comments.count |
There was a problem hiding this comment.
Same as cards: using an exclusive < cutoff comparison with a Date parsed to midnight will skip comments updated on the cutoff date. If the intent is to backfill everything created/updated on or before 2025-11-13, the query should treat the cutoff as inclusive (or accept a full timestamp).
b8e36b0 to
9a94ab6
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| cards = Card.published.where("created_at < ?", cutoff).includes(:rich_text_description) | ||
| card_count = cards.count | ||
| puts "Found #{card_count} cards to reindex" | ||
|
|
There was a problem hiding this comment.
indexed_card_ids is not scoped to the current account. Since search_records_* tables store multiple accounts per shard, this will pluck IDs for all accounts on the shard, which is both extremely expensive and can cause incorrect “missing” detection. Filter by account_id: account.id when collecting existing records (and similarly for comments below).
|
|
||
| reindexed_comments = 0 | ||
| comments.find_each do |comment| |
There was a problem hiding this comment.
indexed_comment_ids is not scoped to the current account, so it will pluck comment IDs for every account on the same search shard. This can blow up memory/SQL size and distort the missing set. Filter by account_id: account.id here as well.
| puts "Found #{card_count} cards to reindex" | ||
|
|
||
| reindexed_cards = 0 |
There was a problem hiding this comment.
Plucking all indexed IDs into Ruby arrays (pluck(:searchable_id)) can generate huge arrays and very large WHERE NOT IN (...) clauses for big accounts. A more scalable approach is to use a subquery (e.g., where.not(id: shard.where(...).select(:searchable_id))) so the DB can evaluate it without materializing all IDs in memory.
| # bin/rails runner script/maintenance/reindex_stale_search_records.rb # default cutoff: 2026-04-01 | ||
| # bin/rails runner script/maintenance/reindex_stale_search_records.rb 2026-01-01 # custom cutoff | ||
|
|
There was a problem hiding this comment.
The usage comment points to scripts/reindex_stale_search_records.rb, but this file lives under script/maintenance/. This is likely to cause the wrong script to be run (and there are currently two similarly named scripts). Update the usage line to the correct path, and consider consolidating to a single script location to avoid operator confusion.
|
|
||
| reindexed_comments = 0 | ||
| comments.find_each do |comment| |
There was a problem hiding this comment.
Same scalability concern for comments: pluck(:searchable_id) materializes all indexed comment IDs into Ruby and can lead to huge memory use / SQL NOT IN lists. Prefer a DB subquery for the exclusion set to keep the work in SQL.
The data import path (Account::DataTransfer) uses insert_all! which bypasses ActiveRecord callbacks, so imported cards and comments never get search records created and are invisible to search and card-table filtering. This script reindexes all published cards and comments created before a cutoff date (default 2026-04-01). The reindex method is an upsert so already-indexed records get a harmless update. Usage: bin/rails runner script/maintenance/reindex_stale_search_records.rb bin/rails runner script/maintenance/reindex_stale_search_records.rb 2026-01-01
9a94ab6 to
daf5e20
Compare
Summary
20251113190256dropped the oldsearch_index_*tables and created emptysearch_records_*tables without data migration, so all cards not created/updated since then are missing from the search indexsearch:reindex, this script only upserts missing records — it doesn't clear and rebuild everythingBug: https://3.basecamp.com/2914079/buckets/27/card_tables/cards/9782824728