Skip to content

Reindex stale search records#2843

Merged
jorgemanrubia merged 1 commit intomainfrom
jorge/reindex-stale-search-records
Apr 14, 2026
Merged

Reindex stale search records#2843
jorgemanrubia merged 1 commit intomainfrom
jorge/reindex-stale-search-records

Conversation

@jorgemanrubia
Copy link
Copy Markdown
Member

Summary

  • Adds a runner script to backfill search records for cards/comments that became invisible after the search table rebuild on 2025-11-13
  • Migration 20251113190256 dropped the old search_index_* tables and created empty search_records_* tables without data migration, so all cards not created/updated since then are missing from the search index
  • Unlike search:reindex, this script only upserts missing records — it doesn't clear and rebuild everything
bin/rails runner scripts/reindex_stale_search_records.rb              # default cutoff: 2025-11-13
bin/rails runner scripts/reindex_stale_search_records.rb 2026-01-01   # custom cutoff

Bug: https://3.basecamp.com/2914079/buckets/27/card_tables/cards/9782824728

Copilot AI review requested due to automatic review settings April 14, 2026 07:53
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a one-off runner script intended to backfill missing Search::Record entries for cards/comments that became unsearchable after the search table rebuild on 2025-11-13.

Tip

If you aren't ready for review, convert to a draft PR.
Click "Convert to draft" or run gh pr ready --undo.
Click "Ready for review" or run gh pr ready to reengage.

Changes:

  • Add scripts/reindex_stale_search_records.rb to reindex cards and comments before a cutoff date.
  • Provide CLI usage for default vs custom cutoff dates.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +12 to +16
cutoff = Date.parse(ARGV[0] || "2025-11-13")

puts "Reindexing cards and comments not updated since #{cutoff}..."

cards = Card.published.where("updated_at < ?", cutoff).includes(:rich_text_description)
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cutoff logic is exclusive ("updated_at < ?" with a Date parsed to midnight). For a default cutoff of 2025-11-13 this will skip records updated on 2025-11-13, even though they were created/updated before the search table rebuild on that date and can still be missing from the index. Consider treating the cutoff as an inclusive day (e.g., compare against cutoff + 1.day) or accept/parse a full timestamp and compare against that timestamp.

Copilot uses AI. Check for mistakes.
Comment on lines +16 to +23
cards = Card.published.where("updated_at < ?", cutoff).includes(:rich_text_description)
card_count = cards.count
puts "Found #{card_count} cards to reindex"

reindexed_cards = 0
cards.find_each do |card|
card.reindex
reindexed_cards += 1
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Selecting cards by updated_at risks missing pre-rebuild cards whose updated_at has been bumped by comment activity (Comment belongs_to :card, touch: true), since a touch triggers Card's after_touch but not Searchable's after_update_commit reindexing. Those cards can still be missing search records while having updated_at >= cutoff. Using created_at as the primary filter (or explicitly detecting missing Search::Record entries) would avoid leaving some stale cards un-reindexed.

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +10
# Reindex cards and comments missing from the search index.
#
# On 2025-11-13 the search tables were rebuilt (search_index_* dropped, empty
# search_records_* created) without a data migration. Cards not created or
# updated after that date have no search records and are invisible to search
# and card-table filtering.
#
# Usage:
# bin/rails runner scripts/reindex_stale_search_records.rb # default cutoff: 2025-11-13
# bin/rails runner scripts/reindex_stale_search_records.rb 2026-01-01 # custom cutoff
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This repository already has a script/ directory (including script/maintenance/ and script/migrations/) for operational/one-off scripts. Introducing a separate top-level scripts/ directory makes it easy to miss scripts and splits conventions (shebang + require_relative "../config/environment" vs rails runner). Consider moving this into the existing script/maintenance/ area (or aligning with that structure) to keep script discovery consistent.

Copilot uses AI. Check for mistakes.
Comment on lines +12 to +13
cutoff = Date.parse(ARGV[0] || "2025-11-13")

Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Date.parse will raise on invalid input (and also loses the time-of-day/time-zone information that matters for a migration timestamp). It would be safer to validate the argument and print a short usage message (or default cleanly) when parsing fails, and consider using Time.zone.parse if you want the cutoff to be time-zone aware.

Suggested change
cutoff = Date.parse(ARGV[0] || "2025-11-13")
default_cutoff = Time.zone.parse("2025-11-13")
cutoff_arg = ARGV[0]
cutoff =
if cutoff_arg.nil?
default_cutoff
else
begin
parsed_cutoff = Time.zone.parse(cutoff_arg)
rescue ArgumentError, TypeError
parsed_cutoff = nil
end
if parsed_cutoff.nil?
warn "Invalid cutoff: #{cutoff_arg.inspect}"
warn "Usage: bin/rails runner scripts/reindex_stale_search_records.rb [YYYY-MM-DD or timestamp]"
exit 1
end
parsed_cutoff
end

Copilot uses AI. Check for mistakes.
Comment on lines +28 to +29
comments = Comment.joins(:card).merge(Card.published).where("comments.updated_at < ?", cutoff).includes(:rich_text_body, :card)
comment_count = comments.count
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as cards: using an exclusive < cutoff comparison with a Date parsed to midnight will skip comments updated on the cutoff date. If the intent is to backfill everything created/updated on or before 2025-11-13, the query should treat the cutoff as inclusive (or accept a full timestamp).

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings April 14, 2026 08:09
@jorgemanrubia jorgemanrubia force-pushed the jorge/reindex-stale-search-records branch from b8e36b0 to 9a94ab6 Compare April 14, 2026 08:12
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +15 to +18
cards = Card.published.where("created_at < ?", cutoff).includes(:rich_text_description)
card_count = cards.count
puts "Found #{card_count} cards to reindex"

Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indexed_card_ids is not scoped to the current account. Since search_records_* tables store multiple accounts per shard, this will pluck IDs for all accounts on the shard, which is both extremely expensive and can cause incorrect “missing” detection. Filter by account_id: account.id when collecting existing records (and similarly for comments below).

Copilot uses AI. Check for mistakes.
Comment on lines +30 to +32

reindexed_comments = 0
comments.find_each do |comment|
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indexed_comment_ids is not scoped to the current account, so it will pluck comment IDs for every account on the same search shard. This can blow up memory/SQL size and distort the missing set. Filter by account_id: account.id here as well.

Copilot uses AI. Check for mistakes.
Comment on lines +17 to +19
puts "Found #{card_count} cards to reindex"

reindexed_cards = 0
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plucking all indexed IDs into Ruby arrays (pluck(:searchable_id)) can generate huge arrays and very large WHERE NOT IN (...) clauses for big accounts. A more scalable approach is to use a subquery (e.g., where.not(id: shard.where(...).select(:searchable_id))) so the DB can evaluate it without materializing all IDs in memory.

Copilot uses AI. Check for mistakes.
Comment on lines +8 to +10
# bin/rails runner script/maintenance/reindex_stale_search_records.rb # default cutoff: 2026-04-01
# bin/rails runner script/maintenance/reindex_stale_search_records.rb 2026-01-01 # custom cutoff

Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usage comment points to scripts/reindex_stale_search_records.rb, but this file lives under script/maintenance/. This is likely to cause the wrong script to be run (and there are currently two similarly named scripts). Update the usage line to the correct path, and consider consolidating to a single script location to avoid operator confusion.

Copilot uses AI. Check for mistakes.
Comment on lines +30 to +32

reindexed_comments = 0
comments.find_each do |comment|
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same scalability concern for comments: pluck(:searchable_id) materializes all indexed comment IDs into Ruby and can lead to huge memory use / SQL NOT IN lists. Prefer a DB subquery for the exclusion set to keep the work in SQL.

Copilot uses AI. Check for mistakes.
The data import path (Account::DataTransfer) uses insert_all! which
bypasses ActiveRecord callbacks, so imported cards and comments never
get search records created and are invisible to search and card-table
filtering.

This script reindexes all published cards and comments created before
a cutoff date (default 2026-04-01). The reindex method is an upsert
so already-indexed records get a harmless update.

Usage:
  bin/rails runner script/maintenance/reindex_stale_search_records.rb
  bin/rails runner script/maintenance/reindex_stale_search_records.rb 2026-01-01
@jorgemanrubia jorgemanrubia force-pushed the jorge/reindex-stale-search-records branch from 9a94ab6 to daf5e20 Compare April 14, 2026 08:14
@jorgemanrubia jorgemanrubia merged commit dd42f0a into main Apr 14, 2026
13 checks passed
@jorgemanrubia jorgemanrubia deleted the jorge/reindex-stale-search-records branch April 14, 2026 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants