script/maintenance/reindex_stale_search_records.rb (new file, 39 additions, 0 deletions)
@@ -0,0 +1,39 @@
# Reindex cards and comments that may be missing from the search index.
#
# The data import path (Account::DataTransfer) uses insert_all!, which bypasses
# ActiveRecord callbacks, so imported cards and comments never get search records.
# This script reindexes published cards, and comments on them, created before a cutoff date.
#
# Usage:
# bin/rails runner script/maintenance/reindex_stale_search_records.rb # default cutoff: 2025-11-13
# bin/rails runner script/maintenance/reindex_stale_search_records.rb 2026-01-01 # custom cutoff

cutoff = Date.parse(ARGV[0] || "2025-11-13")

puts "Reindexing cards and comments created before #{cutoff}..."

cards = Card.published.where("created_at < ?", cutoff).includes(:rich_text_description)
card_count = cards.count
puts "Found #{card_count} cards to reindex"

Comment on lines +15 to +18
Copilot AI Apr 14, 2026

indexed_card_ids is not scoped to the current account. Since search_records_* tables store multiple accounts per shard, this will pluck IDs for all accounts on the shard, which is both extremely expensive and can cause incorrect “missing” detection. Filter by account_id: account.id when collecting existing records (and similarly for comments below).
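
The failure mode this comment describes can be modeled in plain Ruby. This is a toy sketch, not the script's code: SearchRecord and CardRow are hypothetical Struct stand-ins whose field names only mirror the comment (account_id, searchable_id), and it assumes searchable IDs can repeat across accounts. With globally unique IDs the unscoped version returns the same answer and only costs memory and SQL size.

```ruby
require "set"

# Hypothetical stand-ins for rows on one search shard; the real schema may differ.
SearchRecord = Struct.new(:account_id, :searchable_id)
CardRow = Struct.new(:account_id, :id)

# Unscoped: collects searchable IDs for every account on the shard.
def indexed_ids_unscoped(shard)
  shard.map(&:searchable_id)
end

# Scoped: keeps only the given account's search records, as the comment suggests.
def indexed_ids_scoped(shard, account_id)
  shard.select { |r| r.account_id == account_id }.map(&:searchable_id)
end

# Cards whose IDs do not appear in the indexed set.
def missing_card_ids(cards, indexed_ids)
  indexed = indexed_ids.to_set
  cards.reject { |c| indexed.include?(c.id) }.map(&:id)
end

shard = [
  SearchRecord.new(1, 10),
  SearchRecord.new(2, 11),
  SearchRecord.new(2, 12) # another account's record that reuses ID 12
]
cards = [CardRow.new(1, 10), CardRow.new(1, 12)]

p missing_card_ids(cards, indexed_ids_unscoped(shard)) # => []
p missing_card_ids(cards, indexed_ids_scoped(shard, 1)) # => [12]
```

In the unscoped case, card 12 is wrongly treated as indexed because a different account owns a search record with that ID.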

reindexed_cards = 0
Comment on lines +17 to +19
Copilot AI Apr 14, 2026

Plucking all indexed IDs into Ruby arrays (pluck(:searchable_id)) can generate huge arrays and very large WHERE NOT IN (...) clauses for big accounts. A more scalable approach is to use a subquery (e.g., where.not(id: shard.where(...).select(:searchable_id))) so the DB can evaluate it without materializing all IDs in memory.
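
The size difference between the two strategies shows up directly in the SQL text each one sends. This plain-Ruby sketch only builds strings to compare their length; the table name search_records_0 and the cards query are hypothetical, and real code should let ActiveRecord build the statement (for example the where.not(id: ... .select(:searchable_id)) form the comment proposes) rather than interpolating values by hand.

```ruby
# Hypothetical shard table name; the review only says "search_records_*".
SHARD_TABLE = "search_records_0"

# pluck + NOT IN: every indexed ID is inlined, so the statement grows with N.
# (An empty ID list would also produce invalid SQL: "NOT IN ()".)
def not_in_list_sql(indexed_ids)
  "SELECT id FROM cards WHERE id NOT IN (#{indexed_ids.join(', ')})"
end

# Subquery: the exclusion set stays in the database, so the text has a fixed shape.
def not_in_subquery_sql(account_id)
  "SELECT id FROM cards WHERE id NOT IN " \
    "(SELECT searchable_id FROM #{SHARD_TABLE} WHERE account_id = #{Integer(account_id)})"
end

small = not_in_list_sql((1..10).to_a)
large = not_in_list_sql((1..100_000).to_a)
puts "NOT IN list, 10 IDs:      #{small.bytesize} bytes"
puts "NOT IN list, 100,000 IDs: #{large.bytesize} bytes"
puts "Subquery:                 #{not_in_subquery_sql(42).bytesize} bytes"
```

The list form also forces Ruby to hold every ID in memory before the query is even built; the subquery form leaves both steps to the database.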

cards.find_each do |card|
card.reindex
reindexed_cards += 1
print "\rCards: #{reindexed_cards}/#{card_count}" if reindexed_cards % 100 == 0
end
puts "\rCards: #{reindexed_cards}/#{card_count}"

comments = Comment.joins(:card).merge(Card.published).where("comments.created_at < ?", cutoff).includes(:rich_text_body, :card)
comment_count = comments.count
puts "Found #{comment_count} comments to reindex"

reindexed_comments = 0
comments.find_each do |comment|
Comment on lines +30 to +32
Copilot AI Apr 14, 2026

indexed_comment_ids is not scoped to the current account, so it will pluck comment IDs for every account on the same search shard. This can inflate memory use and SQL size and distort the set of comments detected as missing. Filter by account_id: account.id here as well.

Comment on lines +30 to +32
Copilot AI Apr 14, 2026

Same scalability concern for comments: pluck(:searchable_id) materializes every indexed comment ID in Ruby, which can mean heavy memory use and an enormous SQL NOT IN list. Prefer a database subquery for the exclusion set so the work stays in SQL.

comment.reindex
reindexed_comments += 1
print "\rComments: #{reindexed_comments}/#{comment_count}" if reindexed_comments % 100 == 0
end
puts "\rComments: #{reindexed_comments}/#{comment_count}"

puts "Done! Reindexed #{reindexed_cards} cards and #{reindexed_comments} comments."
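
The header comment attributes the missing search records to insert_all! skipping ActiveRecord callbacks. That mechanism can be illustrated with a toy model; ToyCard and its create, bulk_insert, and reindex_all methods are hypothetical and only mimic the shape of the real behavior: create runs an after-create hook that writes to a search index, while bulk_insert, like insert_all!, stores rows without running it.

```ruby
# Toy model (not ActiveRecord) of why bulk-inserted rows never reach the index.
class ToyCard
  @rows = []
  @search_index = []

  class << self
    attr_reader :rows, :search_index

    # Analogous to a normal create: persists the row, then runs the callback.
    def create(id)
      rows << id
      after_create(id)
    end

    # Analogous to insert_all!: persists rows and skips callbacks entirely.
    def bulk_insert(ids)
      rows.concat(ids)
    end

    # Rows with no search record, i.e. what a reindex has to fix.
    def unindexed
      rows - search_index
    end

    # Analogous to the maintenance script: index every row, indexed or not.
    def reindex_all
      @search_index = search_index | rows
    end

    private

    # Stand-in for the callback that creates a search record.
    def after_create(id)
      search_index << id
    end
  end
end

ToyCard.create(1)           # goes through the hook
ToyCard.bulk_insert([2, 3]) # bypasses it
p ToyCard.unindexed         # => [2, 3]
ToyCard.reindex_all
p ToyCard.unindexed         # => []
```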
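
The card and comment loops in the script repeat the same counter and progress logic. One way to factor it out is a small helper, sketched below. The name reindex_with_progress and its keyword arguments are hypothetical, and FakeRecord and FakeRelation exist only so the sketch runs without Rails; they mimic the find_each and reindex calls the script already makes.

```ruby
require "stringio"

# Hypothetical helper: reindex every record, printing progress every `every` records.
# Returns the number of records reindexed.
def reindex_with_progress(records, label, total:, every: 100, io: $stdout)
  done = 0
  records.find_each do |record|
    record.reindex
    done += 1
    io.print "\r#{label}: #{done}/#{total}" if (done % every).zero?
  end
  io.puts "\r#{label}: #{done}/#{total}"
  done
end

# Stand-in for a model instance that responds to #reindex.
FakeRecord = Struct.new(:id, :reindexed) do
  def reindex
    self.reindexed = true
  end
end

# Stand-in for an ActiveRecord relation; the real find_each loads in batches.
class FakeRelation
  def initialize(records)
    @records = records
  end

  def find_each(&block)
    @records.each(&block)
  end
end

# Demo: 250 fake records, with progress captured in a StringIO instead of the terminal.
out = StringIO.new
records = Array.new(250) { |i| FakeRecord.new(i, false) }
count = reindex_with_progress(FakeRelation.new(records), "Cards", total: records.size, io: out)
puts count
```

In the script itself, each loop would then reduce to a single call such as reindexed_cards = reindex_with_progress(cards, "Cards", total: card_count), with the equivalent call for comments.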