
VER-304: Optimize get_snippets filters and searching to fix timeout issues#57

Merged
quancao-ea merged 10 commits into main from fix/optimize-get-snippets-filters on Mar 2, 2026

Conversation

@rajivsinclair
Contributor

@rajivsinclair rajivsinclair commented Jan 29, 2026

Summary

Comprehensive database performance optimization fixing timeout errors and improving query efficiency across the application.

Performance Improvements

| Function/Query | Before | After | Improvement |
| --- | --- | --- | --- |
| get_snippets (starredBy) | >30s timeout | ~1.5ms | 99.99%+ |
| get_snippets (labeledBy) | >30s timeout | ~50ms | 99.8%+ |
| get_snippets (upvotedBy) | ~6.7s | ~48ms | 99.3% |
| get_snippets (state filter) | ~134ms | ~50ms | 63% |
| get_snippets (source filter) | ~134ms | ~50ms | 63% |
| get_filtering_options | 3.7s avg | ~0.7ms | 99.98% |
| get_trending_topics | 3.3s avg | ~50ms | 98.5% |
| get_recording_filter_options | 1.2s avg | ~410ms | 65% |
| fetch_a_snippet_that_has_no_embedding | 373ms avg | ~98ms | 74% |
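As a sanity check on the improvement column, the percentages follow from a simple reduction formula (a hypothetical helper; the before/after timings are the ones reported in the table):

```python
def improvement_pct(before_ms: float, after_ms: float) -> float:
    """Percentage reduction in execution time."""
    return (before_ms - after_ms) / before_ms * 100

# get_filtering_options: 3.7s avg -> ~0.7ms
print(round(improvement_pct(3700, 0.7), 2))  # 99.98
# get_trending_topics: 3.3s avg -> ~50ms
print(round(improvement_pct(3300, 50), 1))   # 98.5
```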

Changes Made

1. Query Pattern Optimizations

  • get_snippets filters: Changed EXISTS subqueries to CTEs + JOINs for starredBy, labeledBy, upvotedBy
  • get_snippets state/source: Changed IN subqueries to CTEs + JOINs (same pattern)
  • get_trending_topics: Changed NOT EXISTS to LEFT JOIN with null check
  • fetch_a_snippet_that_has_no_embedding: Changed to NOT IN with hash pattern
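As a rough sketch of the CTE + JOIN rewrite (table and column names taken from the commit message below; the full function applies several such filters at once), the starredBy case goes from a per-row EXISTS to a pre-filtered ID set:

```sql
-- Before: EXISTS is evaluated once per row of the 119k+ row snippets table
SELECT s.*
FROM snippets s
WHERE EXISTS (
    SELECT 1 FROM user_star_snippets uss
    WHERE uss.snippet = s.id AND uss."user" = current_user_id
);

-- After: pre-compute the matching snippet ids on the much smaller
-- user_star_snippets table, then JOIN back to snippets
WITH starred_snippet_ids AS (
    SELECT DISTINCT uss.snippet
    FROM user_star_snippets uss
    WHERE uss."user" = current_user_id
)
SELECT s.*
FROM snippets s
JOIN starred_snippet_ids ssi ON ssi.snippet = s.id;
```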

2. Caching

  • Created filter_options_cache materialized view for states/sources/languages
  • Updated get_filtering_options and get_recording_filter_options to use cache
  • Added refresh_filter_options_cache() function for maintenance
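A minimal sketch of the cache pattern, assuming a simple (option_type, option_value) layout and the column names `state`/`radio_station` on audio_files (neither is spelled out in this PR summary):

```sql
CREATE MATERIALIZED VIEW IF NOT EXISTS filter_options_cache AS
SELECT 'state' AS option_type, state AS option_value
FROM audio_files GROUP BY state
UNION ALL
SELECT 'source', radio_station
FROM audio_files GROUP BY radio_station;
-- (plus an analogous branch for languages)

CREATE INDEX IF NOT EXISTS idx_filter_options_cache_type
    ON filter_options_cache (option_type);

-- Maintenance hook called after new stations/states/languages are added
CREATE OR REPLACE FUNCTION refresh_filter_options_cache()
RETURNS void LANGUAGE sql AS $$
    REFRESH MATERIALIZED VIEW filter_options_cache;
$$;
```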

3. Index Improvements

Added indexes:

  • idx_comment_reactions_comment_id
  • idx_comments_room_id
  • idx_snippets_stage_1_llm_response
  • idx_user_roles_role
  • idx_user_hide_snippets_snippet
  • idx_snippets_processed_recorded_at (partial index)
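Illustrative DDL for a few of the additions above; the indexed columns and the partial-index predicate are assumptions inferred from the index names:

```sql
CREATE INDEX IF NOT EXISTS idx_comment_reactions_comment_id
    ON comment_reactions (comment_id);

CREATE INDEX IF NOT EXISTS idx_user_hide_snippets_snippet
    ON user_hide_snippets (snippet);

-- Partial index: only covers the processed rows most queries touch,
-- keeping the index small
CREATE INDEX IF NOT EXISTS idx_snippets_processed_recorded_at
    ON snippets (recorded_at DESC)
    WHERE status = 'Processed';
```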

Removed unused indexes (~9.5MB saved):

  • idx_audio_files_radio_station (9.5MB) - uses cache now
  • user_hide_snippets_user_idx (16KB) - use snippet index instead
  • idx_label_upvotes_snippet_label_upvoted_by (16KB) - unused composite

Kept sorting indexes (used by get_snippets ORDER BY options):

  • idx_snippets_comment_count, idx_snippets_upvote_count, idx_snippets_like_count, idx_snippets_user_last_activity

4. Maintenance

  • Disabled completed TSV backfill cron job (was running every minute doing nothing)

Migrations Applied

  1. 20260129_optimize_get_snippets_filters.sql - CTE+JOIN pattern for starred/labeled/upvoted filters
  2. 20260129_optimize_filter_functions.sql - Materialized view cache
  3. 20260129_optimize_recording_filter_and_indexes.sql - FK indexes
  4. 20260129_cleanup_unused_indexes.sql - Remove truly unused indexes
  5. 20260130_optimize_state_source_filters.sql - CTE+JOIN pattern for state/source filters

Cache Maintenance

Call refresh_filter_options_cache() after adding new radio stations, states, or languages.

Monitoring

Supabase Logs Explorer

regexp_contains(event_message, 'duration|statement timeout')

pg_stat_statements

SELECT query, calls, round(mean_exec_time::numeric, 2) as avg_ms
FROM pg_stat_statements WHERE calls > 10 ORDER BY mean_exec_time DESC;

Test Plan

  • All filter benchmarks verified
  • Cache populated correctly (183 entries)
  • No timeout errors in recent logs
  • All migrations applied successfully
  • Sorting indexes verified as needed and restored
  • State/source filter optimization applied

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added dislike count tracking for snippets alongside existing like counts.
  • Performance

    • Optimized snippet filtering and search queries for faster results.
    • Implemented caching system for filter options to reduce database load.
    • Introduced optimized trending topics retrieval with improved filter support.
  • Chores

    • Removed deprecated filtering function.
    • Cleaned up unused database indexes.

The starredBy, labeledBy, and upvotedBy filters were causing statement
timeouts (>30s) due to inefficient EXISTS subqueries being evaluated
for every row in the snippets table (119k+ rows).

Root cause:
- EXISTS subqueries like `EXISTS (SELECT 1 FROM user_star_snippets WHERE snippet = s.id)`
  were evaluated for each of the 119k snippets
- This is extremely slow because PostgreSQL scans the large snippets table first,
  then evaluates the EXISTS condition for each row

Solution:
- Changed from EXISTS subqueries to JOINs with pre-filtered CTEs
- Pre-compute the filtered snippet IDs in CTEs (which operate on much smaller tables)
- Then JOIN against these pre-filtered IDs instead of using EXISTS

Performance improvements:
- starredBy filter: timeout (>30s) -> <1s
- labeledBy filter: timeout (>30s) -> <1s
- upvotedBy filter: 6.7s -> <1s
- politicalSpectrum filter: already uses index, no change needed

Technical details:
- Added filter detection flags: has_starred_filter, has_labeled_filter, has_upvoted_filter
- Added pre-filtered CTEs: starred_snippet_ids, labeled_snippet_ids, upvoted_snippet_ids
- Renamed upvoted filter variables to filter_upvoted_by_me/filter_upvoted_by_others
  to avoid column reference ambiguity with label_summary CTE
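In PL/pgSQL, the filter-detection flags look roughly like this (the starredBy assignments are a sketch mirroring the upvotedBy assignments quoted elsewhere in this review; p_filter is assumed to be a jsonb object keyed by filter name):

```sql
-- Sketch only: flag detection runs once, before the main query
has_starred_filter := p_filter ? 'starredBy';
starred_by_me      := has_starred_filter AND p_filter->'starredBy' ? 'by_me';
starred_by_others  := has_starred_filter AND p_filter->'starredBy' ? 'by_others';
```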

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings January 29, 2026 19:47
@gemini-code-assist
Contributor

Summary of Changes

Hello @rajivsinclair, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the performance of the get_snippets PostgreSQL function by resolving timeout issues encountered when applying certain filters. The core problem stemmed from EXISTS subqueries that were inefficiently evaluated against a large dataset. The solution involves refactoring the query to use JOINs with pre-filtered CTEs, which drastically reduces the processing time for filters like "Starred by me/others," "Labeled by me/others," and "Upvoted by me/others," making the application more responsive and preventing statement timeouts.

Highlights

  • Performance Optimization: Addressed critical timeout issues in the get_snippets function, reducing query times from over 30 seconds to under 1 second for starredBy, labeledBy, and upvotedBy filters.
  • Query Refactoring: Replaced inefficient EXISTS subqueries with optimized JOIN operations against pre-filtered Common Table Expressions (CTEs) to improve query plan efficiency.
  • Filter Logic Enhancement: Introduced dedicated boolean flags (has_starred_filter, starred_by_me, etc.) to conditionally apply filter logic, making the query more adaptive and performant.
  • Variable Renaming: Renamed upvotedBy filter variables to filter_upvoted_by_me and filter_upvoted_by_others to prevent column ambiguity within CTEs.

@coderabbitai

coderabbitai Bot commented Jan 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Replaces per-row EXISTS/NOT EXISTS checks with precomputed CTEs and JOINs across multiple get_snippets implementations; adds a materialized filter options cache with refresh function; introduces index cleanup and new/updated indexes; enforces auth in filtering functions and updates snippet dislike_count handling.

Changes

  • get_snippets implementations (supabase/database/sql/get_snippets_function.sql, supabase/migrations/20260129_optimize_get_snippets_filters.sql, supabase/migrations/20260130_optimize_state_source_filters.sql): Rewrote get_snippets to use pre-filter CTEs (starred_snippet_ids, labeled_snippet_ids, upvoted_snippet_ids, state_filtered_audio_ids, source_filtered_audio_ids), compute filter-detection flags, replace per-row EXISTS with JOINs, apply language-aware text projections and multi-field search, compute combined upvote/like counts for ordering, and return enriched JSON snippet objects with audio metadata and user flags. Public signature unchanged.
  • Filter options, trending & cache (supabase/database/sql/get_filtering_options_function.sql, supabase/migrations/20260129_optimize_filter_functions.sql): Added filter_options_cache materialized view, index idx_filter_options_cache_type, and refresh_filter_options_cache(); refactored get_filtering_options to read states/sources/languages from the cache and preserve label pagination; rewrote get_trending_topics to use join-based hidden-snippet exclusion and added auth checks.
  • Indexing & recording filters (supabase/migrations/20260129_optimize_recording_filter_and_indexes.sql, supabase/migrations/20260129_cleanup_unused_indexes.sql): Dropped legacy get_recording_filter_options(), added several indexes (if not exists) to support joins/queries, and added a migration to drop unused indexes with documented rationale.
  • Snippet like/dislike update (supabase/database/sql/update_snippet_like_count.sql): Extended the snippet like-count update to also compute and set dislike_count (counts of user_like_snippets.value = -1) alongside existing like_count.
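The like/dislike extension described above could be sketched as follows (the statement shape and the parameter name p_snippet_id are assumptions; only the user_like_snippets.value = 1/-1 semantics come from the summary):

```sql
UPDATE snippets s
SET like_count = (SELECT COUNT(*) FROM user_like_snippets uls
                  WHERE uls.snippet = s.id AND uls.value = 1),
    dislike_count = (SELECT COUNT(*) FROM user_like_snippets uls
                     WHERE uls.snippet = s.id AND uls.value = -1)
WHERE s.id = p_snippet_id;  -- p_snippet_id: hypothetical parameter name
```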

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Suggested reviewers

  • nhphong

Poem

🐰 I swapped slow EXISTS for CTE rows,
I hopped through JOINs where the wild data grows,
Indexes hum, caches snug in a heap,
Snippets now fetch with a lighter leap,
A carrot code-hop — fast queries, sweet!

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

  • Description Check: Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Docstring Coverage: Passed. No functions found in the changed files to evaluate docstring coverage.
  • Title check: Passed. The title clearly and specifically summarizes the main objective: optimizing get_snippets filters and search functionality to resolve timeout issues, which is the core focus of the PR.


Contributor

@ellipsis-dev ellipsis-dev Bot left a comment


Important

Looks good to me! 👍

Reviewed everything up to c74cc4c in 44 seconds. Click for details.
  • Reviewed 993 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

Workflow ID: wflow_jc1eRc5h7oRXMSHV

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@supabase/database/sql/get_snippets_function.sql`:
- Line 357: The calculation total_pages := CEIL(total_count::FLOAT / page_size)
can divide by zero when page_size is 0; add input validation at the start of the
function (e.g., check the page_size parameter) and raise a clear exception if
page_size <= 0 or alternatively use NULLIF(page_size,0) in the division to avoid
the error; update the function that declares page_size/total_pages (references
to page_size, total_count, total_pages, CEIL) to include this validation or the
NULLIF-based guard before computing total_pages.
- Around line 306-327: The JSON field name for whether the current user upvoted
a label is inconsistent; update the migration to use the same name used in this
function: change any occurrences of filter_upvoted_by_me to upvoted_by_me so the
label_summary CTE and the jsonb_build_object in snippets_with_labels both emit
'upvoted_by_me' (i.e., ensure the migration's SELECT/alias and
jsonb_build_object key match the label_summary/upvoted_by_me naming).

In `@supabase/migrations/20260129_optimize_get_snippets_filters.sql`:
- Around line 303-324: The label JSON uses the wrong field name: change the
boolean alias and JSON key from filter_upvoted_by_me to upvoted_by_me;
specifically update the SELECT in the label_summary CTE (the expression "lu.id
IS NOT NULL AS filter_upvoted_by_me") to alias as upvoted_by_me, and update the
jsonb_build_object call inside the labels aggregation (currently building
'filter_upvoted_by_me') to use 'upvoted_by_me' so the label JSON produced by
snippets_with_labels matches get_snippets_function.sql and frontend
expectations.
🧹 Nitpick comments (2)
supabase/database/sql/get_snippets_function.sql (2)

295-302: Consider using COALESCE for upvote_count as well.

Line 297 handles NULL for like_count with COALESCE, but upvote_count could also be NULL. For consistency and to avoid potential NULL arithmetic issues:

♻️ Suggested improvement
-                WHEN p_order_by = 'upvotes' THEN s.upvote_count + COALESCE(s.like_count, 0)
+                WHEN p_order_by = 'upvotes' THEN COALESCE(s.upvote_count, 0) + COALESCE(s.like_count, 0)

352-352: Hardcoded dislike_count always returns 0.

This field is hardcoded to 0. If dislike tracking is not implemented, consider either removing this field from the response or adding a comment explaining the intent.

Comment thread supabase/database/sql/get_snippets_function.sql Outdated
) INTO result
FROM snippets_with_labels s;

total_pages := CEIL(total_count::FLOAT / page_size);


⚠️ Potential issue | 🟡 Minor

Potential division by zero if page_size is 0.

If a caller passes page_size = 0, this line will raise a division-by-zero error. Consider adding input validation at the start of the function or using NULLIF.

🛡️ Proposed fix with input validation

Add near the beginning of the function (after line 44):

    IF page_size <= 0 THEN
        RAISE EXCEPTION 'page_size must be greater than 0';
    END IF;

Comment on lines +303 to +324
    label_summary AS (
        SELECT
            l.id,
            CASE WHEN p_language = 'spanish' THEN l.text_spanish ELSE l.text END AS text,
            sl.upvote_count,
            lu.id IS NOT NULL AS filter_upvoted_by_me,
            sl.snippet AS snippet_id
        FROM snippet_labels sl
        JOIN labels l ON l.id = sl.label
        LEFT JOIN label_upvotes lu ON lu.snippet_label = sl.id AND lu.upvoted_by = current_user_id
        WHERE sl.snippet IN (SELECT id FROM filtered_snippets)
    ),
    snippets_with_labels AS (
        SELECT
            fs.*,
            COALESCE(ld.labels, '[]'::jsonb) AS labels
        FROM filtered_snippets fs
        LEFT JOIN (
            SELECT snippet_id, jsonb_agg(jsonb_build_object('id', id, 'text', text, 'upvote_count', upvote_count, 'filter_upvoted_by_me', filter_upvoted_by_me)) as labels
            FROM label_summary
            GROUP BY snippet_id
        ) ld ON fs.id = ld.snippet_id


⚠️ Potential issue | 🟠 Major

Incorrect field name filter_upvoted_by_me in label JSON output.

The label_summary CTE (line 308) and JSON aggregation (line 321) use filter_upvoted_by_me, which is the name of a filter flag variable, not a descriptive name for whether the user upvoted this label. The main function file (get_snippets_function.sql) correctly uses upvoted_by_me.

This will cause the API to return labels with the field name filter_upvoted_by_me instead of the expected upvoted_by_me, potentially breaking frontend consumers.

🐛 Proposed fix to use consistent naming
     label_summary AS (
         SELECT
             l.id,
             CASE WHEN p_language = 'spanish' THEN l.text_spanish ELSE l.text END AS text,
             sl.upvote_count,
-            lu.id IS NOT NULL AS filter_upvoted_by_me,
+            lu.id IS NOT NULL AS upvoted_by_me,
             sl.snippet AS snippet_id
         FROM snippet_labels sl
         JOIN labels l ON l.id = sl.label
         LEFT JOIN label_upvotes lu ON lu.snippet_label = sl.id AND lu.upvoted_by = current_user_id
         WHERE sl.snippet IN (SELECT id FROM filtered_snippets)
     ),
     snippets_with_labels AS (
         SELECT
             fs.*,
             COALESCE(ld.labels, '[]'::jsonb) AS labels
         FROM filtered_snippets fs
         LEFT JOIN (
-            SELECT snippet_id, jsonb_agg(jsonb_build_object('id', id, 'text', text, 'upvote_count', upvote_count, 'filter_upvoted_by_me', filter_upvoted_by_me)) as labels
+            SELECT snippet_id, jsonb_agg(jsonb_build_object('id', id, 'text', text, 'upvote_count', upvote_count, 'upvoted_by_me', upvoted_by_me)) as labels
             FROM label_summary
             GROUP BY snippet_id
         ) ld ON fs.id = ld.snippet_id
     )


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c74cc4c67e


Comment thread supabase/migrations/20260129_optimize_get_snippets_filters.sql Outdated
Comment on lines +320 to +322
LEFT JOIN (
SELECT snippet_id, jsonb_agg(jsonb_build_object('id', id, 'text', text, 'upvote_count', upvote_count, 'filter_upvoted_by_me', filter_upvoted_by_me)) as labels
FROM label_summary


P2: Preserve label field name upvoted_by_me

The migration changes the labels JSON to emit filter_upvoted_by_me instead of the previously returned upvoted_by_me. Any client code that reads labels[].upvoted_by_me will now see the field missing and treat all labels as not upvoted by the user. This is a breaking API change that doesn’t appear to be accompanied by client updates in this commit.



Copilot AI left a comment


Pull request overview

This PR optimizes the get_snippets database function to fix timeout issues when filtering by starred/labeled/upvoted status. The optimization replaces slow EXISTS subqueries with JOIN-based queries using pre-filtered CTEs, reducing query execution time from >30 seconds (timeout) to <1 second for the affected filters.

Changes:

  • Replaced EXISTS subqueries with LEFT JOIN pattern using pre-filtered CTEs for starred/labeled/upvoted filters
  • Added filter detection boolean flags to enable conditional CTE execution
  • Renamed upvoted filter variables to avoid column name conflicts in the label_summary CTE

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
supabase/migrations/20260129_optimize_get_snippets_filters.sql Migration that applies the optimized function to production database
supabase/database/sql/get_snippets_function.sql Source SQL file that should mirror the migration but has critical inconsistency

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

CASE WHEN p_language = 'spanish' THEN l.text_spanish ELSE l.text END AS text,
sl.upvote_count,
lu.id IS NOT NULL AS upvoted_by_me,

Copilot AI Jan 29, 2026


The column name in the label_summary CTE is inconsistent between the source SQL file and the migration file. The migration file correctly uses 'filter_upvoted_by_me' (line 308, 321), while the source file uses 'upvoted_by_me' (line 311, 324). This means the source SQL file has not been updated to match the migration that was already applied to production.

The PR description mentions that this rename was necessary "to avoid column reference ambiguity with the label_summary CTE", so the source file should use the same 'filter_upvoted_by_me' naming as the migration.

Comment thread supabase/database/sql/get_snippets_function.sql Outdated
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request significantly improves query performance by optimizing the get_snippets function, replacing inefficient EXISTS subqueries with JOINs on pre-filtered CTEs. However, a critical security issue was identified: the use of SECURITY DEFINER without an explicit search_path could lead to search path hijacking. This should be remediated by adding a SET search_path clause to the function definition in both the migration and the source SQL file. Additionally, further enhancements are needed to address code duplication in result counting and fetching logic, simplify boolean logic within new filter CTEs, investigate a potential regression where dislike_count is hardcoded to 0, and improve clarity for a variable alias in the migration script. Addressing these points will make the new function more maintainable, robust, and secure.

filter_upvoted_by_me := has_upvoted_filter AND p_filter->'upvotedBy' ? 'by_me';
filter_upvoted_by_others := has_upvoted_filter AND p_filter->'upvotedBy' ? 'by_others';

-- Get count using optimized query with JOINs instead of EXISTS
Contributor


Severity: high

There is significant code duplication between the query to get total_count (lines 72-171) and the query to fetch the paginated data (lines 173-355). Both parts define the same CTEs (starred_snippet_ids, labeled_snippet_ids, upvoted_snippet_ids) and have nearly identical filtering logic in the WHERE clause.

This duplication makes the function difficult to maintain, as any change to the filtering logic must be applied in two places.

Consider refactoring to define the filtered set of snippets once in a CTE, and then reuse that CTE for both counting and pagination. This would look something like this:

WITH
  -- ... your filter ID CTEs (starred_snippet_ids, etc.) ...
  
  filtered_snippets_base AS (
    SELECT s.* -- select all columns needed for ordering and final output
    FROM snippets s
    -- ... all your JOINs and WHERE clauses ...
  ),
  
  count_cte AS (
    SELECT COUNT(*) as total FROM filtered_snippets_base
  ),

  paginated_snippets AS (
    SELECT *
    FROM filtered_snippets_base
    ORDER BY ...
    LIMIT page_size
    OFFSET page * page_size
  )
  
  -- Then use these CTEs to get the count and build the final result set.

This approach would make the function more DRY (Don't Repeat Yourself) and easier to manage.

Comment thread supabase/database/sql/get_snippets_function.sql Outdated
page_size INTEGER,
p_order_by text,
p_search_term text DEFAULT ''
) RETURNS jsonb SECURITY DEFINER AS $$
Contributor


Severity: medium (security)

The function get_snippets is defined with SECURITY DEFINER but does not set a search_path. This is a security risk in PostgreSQL because it allows the function to be hijacked if an attacker can create objects in a schema that is part of the caller's search_path. For example, an attacker could create a malicious auth.uid() function in their own schema to impersonate other users or bypass security checks.

Recommendation: Add SET search_path = public, extensions, auth; (or the appropriate schemas) to the function definition. This should also be applied to the source file supabase/database/sql/get_snippets_function.sql.

) RETURNS jsonb SECURITY DEFINER SET search_path = public, extensions, auth AS $$

Comment on lines +77 to +81
WHERE has_starred_filter AND (
(starred_by_me AND starred_by_others) OR
(starred_by_me AND NOT starred_by_others AND uss."user" = current_user_id) OR
(starred_by_others AND NOT starred_by_me AND uss."user" != current_user_id)
)
Contributor


Severity: medium

The boolean logic in this WHERE clause is correct but can be simplified for better readability and maintainability. The same simplification can be applied to the labeled_snippet_ids and upvoted_snippet_ids CTEs.

A simpler equivalent is:

WHERE has_starred_filter AND (
    (starred_by_me AND uss."user" = current_user_id) OR
    (starred_by_others AND uss."user" != current_user_id)
)

This revised logic is more concise and easier to understand at a glance.

        WHERE has_starred_filter AND (
            (starred_by_me AND uss."user" = current_user_id) OR
            (starred_by_others AND uss."user" != current_user_id)
        )

Comment thread supabase/migrations/20260129_optimize_get_snippets_filters.sql Outdated
Additional performance optimizations identified via pg_stat_statements:

1. get_filtering_options: 3.7s avg -> <10ms
   - Root cause: DISTINCT queries on 1M+ row audio_files table
   - Solution: Created materialized view (filter_options_cache) to cache
     states, sources, and languages
   - Added refresh_filter_options_cache() function for updates

2. get_trending_topics: optimized hidden snippets check
   - Changed NOT EXISTS to LEFT JOIN pattern (same as get_snippets fix)
   - Added index on user_hide_snippets.snippet

Performance summary for all optimizations in this PR:
- starredBy filter: timeout (>30s) -> <1s
- labeledBy filter: timeout (>30s) -> <1s
- upvotedBy filter: 6.7s -> <1s
- get_filtering_options: 3.7s -> <10ms

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@ellipsis-dev ellipsis-dev Bot left a comment

Important

Looks good to me! 👍

Reviewed 34a19c3 in 34 seconds. Click for details.
  • Reviewed 436 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

Workflow ID: wflow_dWc5kEW3pjvrcJ4j

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@rajivsinclair
Contributor Author

Additional Optimizations Added

In the second commit, I found and fixed additional slow functions using pg_stat_statements:

get_filtering_options (3.7s avg → <10ms)

Root cause: SELECT DISTINCT queries on the 1M+ row audio_files table to get states and radio stations.

Solution: Created a materialized view filter_options_cache that caches the distinct values. The cache contains:

  • 14 states
  • 59 radio station sources
  • 110 languages

Note: Call SELECT refresh_filter_options_cache(); when new radio stations or states are added.
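For reference, the cache described above can be sketched roughly as follows. The names filter_options_cache, refresh_filter_options_cache, and the option_type/label/value columns come from this PR; the exact SELECTs, and any extra columns the real view carries, are assumptions:

```sql
-- Sketch of the filter-options cache (assumption: the actual migration may differ).
-- One row per distinct option, tagged by option_type.
CREATE MATERIALIZED VIEW IF NOT EXISTS filter_options_cache AS
SELECT 'states' AS option_type,
       location_state AS label,
       location_state AS value
FROM public.audio_files
WHERE location_state IS NOT NULL
GROUP BY location_state
UNION ALL
SELECT 'sources',
       COALESCE(radio_station_name || ' - ' || radio_station_code, radio_station_code),
       radio_station_code
FROM public.audio_files
WHERE radio_station_code IS NOT NULL
GROUP BY radio_station_code, radio_station_name
UNION ALL
SELECT 'languages',
       language->>'primary_language',
       language->>'primary_language'
FROM public.snippets
WHERE language IS NOT NULL
GROUP BY language->>'primary_language';

-- Maintenance hook: re-run after new stations, states, or languages appear.
CREATE OR REPLACE FUNCTION refresh_filter_options_cache() RETURNS void AS $$
BEGIN
    REFRESH MATERIALIZED VIEW filter_options_cache;
END;
$$ LANGUAGE plpgsql;
```

Reads then become cheap equality lookups on option_type instead of DISTINCT scans over the 1M+ row tables.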

get_trending_topics (optimized)

Root cause: Same NOT EXISTS pattern for hidden snippets check.

Solution: Changed to LEFT JOIN pattern (same technique as get_snippets).
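The rewrite follows the standard anti-join pattern; a minimal sketch, with table and column names taken from this PR and the surrounding query simplified:

```sql
-- Before (assumption: simplified): correlated NOT EXISTS evaluated per snippet row
-- SELECT s.id FROM snippets s
-- WHERE NOT EXISTS (
--     SELECT 1 FROM user_hide_snippets uhs
--     WHERE uhs.snippet = s.id AND uhs."user" = current_user_id
-- );

-- After: LEFT JOIN with a NULL check, which the planner can execute
-- as a single hash anti-join instead of a per-row subquery.
SELECT s.id
FROM snippets s
LEFT JOIN user_hide_snippets uhs
       ON uhs.snippet = s.id
      AND uhs."user" = current_user_id
WHERE uhs.snippet IS NULL;
```

The new idx_user_hide_snippets_snippet index supports the join condition on uhs.snippet.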

Summary of All Optimizations

| Function/Filter | Before | After |
| --- | --- | --- |
| starredBy filter | timeout (>30s) | <1s |
| labeledBy filter | timeout (>30s) | <1s |
| upvotedBy filter | 6.7s | <1s |
| get_filtering_options | 3.7s avg | <10ms |
| get_trending_topics | 3.3s avg | improved |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@supabase/database/sql/get_filtering_options_function.sql`:
- Around line 60-88: The three SELECTs that read from filter_options_cache
(assigning INTO states, sources, languages) must reference the view with an
explicit schema to avoid search_path hijacking in this SECURITY DEFINER
function; update the three FROM clauses that currently use filter_options_cache
to use public.filter_options_cache so these queries (the ones building jsonb_agg
into the variables states, sources, and languages) consistently match other
schema-qualified references like public.labels and public.audio_files.

In `@supabase/migrations/20260129_optimize_filter_functions.sql`:
- Around line 85-91: Add input validation for p_label_page_size to guard against
zero or negative values: at the start of the routine (before calculating
total_pages or using LIMIT) check if p_label_page_size <= 0 and RAISE EXCEPTION
with a clear message (e.g., "p_label_page_size must be positive");
alternatively, normalize the value using GREATEST(p_label_page_size, 1) where
total_pages is computed and where LIMIT is applied so division by zero and
unexpected LIMIT behavior are prevented. Reference p_label_page_size,
total_labels/total_pages and the later LIMIT usage to ensure both the
CEIL(total_labels::FLOAT / p_label_page_size) calculation and the query LIMIT
are protected.
- Around line 214-227: The hidden_snippets CTE is currently returning snippets
hidden by any user; restrict it to the current user by filtering
user_hide_snippets with the current_user_id (same approach used for
user_star_snippets and user_like_snippets). Update the hidden_snippets CTE (or
the LEFT JOIN to it in filtered_snippets) to include AND
user_hide_snippets."user" = current_user_id so only snippets hidden by the
requesting user are excluded from filtered_snippets.
- Around line 64-68: The SECURITY DEFINER functions get_filtering_options and
get_trending_topics use unqualified object names and must lock down their
search_path to avoid hijacking; update each CREATE OR REPLACE FUNCTION header to
include "SET search_path = public, pg_catalog" (e.g., change the function header
so the SECURITY DEFINER declaration includes the SET search_path clause) so the
function runs with a fixed search_path and still references the same unqualified
names (filter_options_cache, auth.uid(), snippets, audio_files,
user_hide_snippets, snippet_labels, labels).
🧹 Nitpick comments (1)
supabase/migrations/20260129_optimize_filter_functions.sql (1)

55-60: Consider REFRESH ... CONCURRENTLY to avoid blocking reads.

REFRESH MATERIALIZED VIEW takes an exclusive lock; if refresh can overlap with reads, concurrent refresh keeps reads flowing. This requires a unique index that matches row uniqueness.

♻️ Possible adjustment (if uniqueness is guaranteed)
+CREATE UNIQUE INDEX IF NOT EXISTS idx_filter_options_cache_unique
+    ON filter_options_cache(option_type, value, label, secondary_value);
@@
-    REFRESH MATERIALIZED VIEW filter_options_cache;
+    REFRESH MATERIALIZED VIEW CONCURRENTLY filter_options_cache;

Comment on lines +60 to +88
+    -- Fetch states from cached view (fast!)
     SELECT jsonb_agg(
         jsonb_build_object(
-            'label', location_state,
-            'value', location_state
+            'label', label,
+            'value', value
         )
     ) INTO states
-    FROM unique_states;
+    FROM filter_options_cache
+    WHERE option_type = 'states';
 
-    -- Fetch unique radio station codes from the audio_files table
-    WITH unique_sources AS (
-        SELECT DISTINCT
-            radio_station_code,
-            radio_station_name
-        FROM public.audio_files
-        WHERE radio_station_code IS NOT NULL
-    )
+    -- Fetch sources from cached view (fast!)
     SELECT jsonb_agg(
         jsonb_build_object(
-            'label', CASE
-                WHEN radio_station_name IS NOT NULL
-                THEN radio_station_name || ' - ' || radio_station_code
-                ELSE radio_station_code
-            END,
-            'value', radio_station_code
+            'label', label,
+            'value', value
         )
     ) INTO sources
-    FROM unique_sources;
+    FROM filter_options_cache
+    WHERE option_type = 'sources';
 
-    -- Fetch unique primary languages from the snippets table
-    WITH unique_languages AS (
-        SELECT DISTINCT language->>'primary_language' AS primary_language
-        FROM public.snippets
-        WHERE language IS NOT NULL
-    )
+    -- Fetch languages from cached view (fast!)
     SELECT jsonb_agg(
         jsonb_build_object(
-            'label', primary_language,
-            'value', primary_language
+            'label', label,
+            'value', value
         )
     ) INTO languages
-    FROM unique_languages;
+    FROM filter_options_cache
+    WHERE option_type = 'languages';

⚠️ Potential issue | 🟠 Major


Schema‑qualify filter_options_cache to public.filter_options_cache in this SECURITY DEFINER function.

The function inconsistently qualifies schema names: it explicitly uses public.labels and public.audio_files elsewhere but leaves filter_options_cache unqualified on lines 67, 77, and 87. In SECURITY DEFINER functions, unqualified object names can be hijacked through search_path manipulation (including pg_temp). Qualify the view consistently to prevent that risk.

Proposed fix
-    FROM filter_options_cache
+    FROM public.filter_options_cache
     WHERE option_type = 'states';
 
-    FROM filter_options_cache
+    FROM public.filter_options_cache
     WHERE option_type = 'sources';
 
-    FROM filter_options_cache
+    FROM public.filter_options_cache
     WHERE option_type = 'languages';
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-- Fetch states from cached view (fast!)
SELECT jsonb_agg(
jsonb_build_object(
'label', label,
'value', value
)
) INTO states
FROM public.filter_options_cache
WHERE option_type = 'states';
-- Fetch sources from cached view (fast!)
SELECT jsonb_agg(
jsonb_build_object(
'label', label,
'value', value
)
) INTO sources
FROM public.filter_options_cache
WHERE option_type = 'sources';
-- Fetch languages from cached view (fast!)
SELECT jsonb_agg(
jsonb_build_object(
'label', label,
'value', value
)
) INTO languages
FROM public.filter_options_cache
WHERE option_type = 'languages';

Comment on lines +64 to +68
CREATE OR REPLACE FUNCTION get_filtering_options (
p_language TEXT DEFAULT 'english',
p_label_page INT DEFAULT 0,
p_label_page_size INT DEFAULT 5
) RETURNS jsonb SECURITY DEFINER AS $$

⚠️ Potential issue | 🟠 Major


Lock down search_path for both SECURITY DEFINER functions.

These functions reference unqualified objects; with SECURITY DEFINER, that's vulnerable to search_path hijacking (including pg_temp). Add SET search_path = public, pg_catalog to both function definitions.

Applies to:

  • get_filtering_options() (line 68): references unqualified filter_options_cache and auth.uid()
  • get_trending_topics() (line 173): heavily uses unqualified table references (snippets, audio_files, user_hide_snippets, snippet_labels, labels)
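A sketch of the header change for get_filtering_options (signature taken from this PR; the body here is a placeholder, not the real function):

```sql
-- Sketch: pin the search_path in the function header so SECURITY DEFINER
-- execution cannot be hijacked via the caller's search_path.
CREATE OR REPLACE FUNCTION get_filtering_options (
    p_language TEXT DEFAULT 'english',
    p_label_page INT DEFAULT 0,
    p_label_page_size INT DEFAULT 5
) RETURNS jsonb
SECURITY DEFINER
SET search_path = public, pg_catalog
AS $$
BEGIN
    -- real body elided; unqualified names (filter_options_cache, labels, ...)
    -- now resolve only against public and pg_catalog
    RETURN '{}'::jsonb;
END;
$$ LANGUAGE plpgsql;
```

The same SET search_path clause applies to the get_trending_topics header.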

Comment on lines +85 to +91
-- Fetch total number of labels
SELECT COUNT(*) INTO total_labels
FROM public.labels;

-- Calculate total pages
total_pages := CEIL(total_labels::FLOAT / p_label_page_size);


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -name "*20260129*optimize_filter_functions.sql" -o -name "*.sql" | head -20

Repository: PublicDataWorks/verdad

Length of output: 1182


🏁 Script executed:

cd supabase/migrations 2>/dev/null && ls -la | grep -E "20260129|optimize" || echo "Directory not found, searching from root"

Repository: PublicDataWorks/verdad

Length of output: 247


🏁 Script executed:

find . -type f -name "*.sql" | grep -i "filter\|label" | head -20

Repository: PublicDataWorks/verdad

Length of output: 588


🏁 Script executed:

cat -n supabase/migrations/20260129_optimize_filter_functions.sql | head -120

Repository: PublicDataWorks/verdad

Length of output: 4452


🏁 Script executed:

cat -n supabase/migrations/20260129_optimize_filter_functions.sql | sed -n '85,150p'

Repository: PublicDataWorks/verdad

Length of output: 2286


🏁 Script executed:

grep -n "p_label_page_size" supabase/migrations/20260129_optimize_filter_functions.sql

Repository: PublicDataWorks/verdad

Length of output: 291


🏁 Script executed:

grep -r "p_label_page_size" supabase/ --include="*.sql"

Repository: PublicDataWorks/verdad

Length of output: 953


🏁 Script executed:

cat -n supabase/migrations/20260129_optimize_filter_functions.sql | sed -n '64,84p'

Repository: PublicDataWorks/verdad

Length of output: 777


🏁 Script executed:

cat -n supabase/database/sql/get_filtering_options_function.sql | head -100

Repository: PublicDataWorks/verdad

Length of output: 3747


Guard against p_label_page_size <= 0 (division by zero).

The parameter accepts any INT value, including zero and negatives. At line 90, CEIL(total_labels::FLOAT / p_label_page_size) will fail with a division-by-zero error, and the LIMIT clause at line 106 would behave unexpectedly with an invalid page size. Add input validation:

Proposed guard
     current_user_id := auth.uid();
     IF current_user_id IS NULL THEN
         RAISE EXCEPTION 'Only logged-in users can call this function';
     END IF;
+
+    IF p_label_page_size <= 0 THEN
+        RAISE EXCEPTION 'p_label_page_size must be > 0';
+    END IF;
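Alternatively, the value can be normalized instead of rejected; a sketch (assumption: silently clamping an invalid page size is acceptable for this API):

```sql
-- Clamp rather than raise; plpgsql allows reassigning an IN parameter.
p_label_page_size := GREATEST(p_label_page_size, 1);
total_pages := CEIL(total_labels::FLOAT / p_label_page_size);
```

Either way, both the total_pages computation and the later LIMIT are protected.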

Additional optimizations identified via pg_stat_statements and Supabase Database Advisor:

1. get_recording_filter_options: 1.2s avg → ~410ms (65% improvement)
   - Uses filter_options_cache for states/sources/languages
   - Labels subquery still requires JOIN but is acceptable

2. Added indexes for unindexed foreign keys:
   - idx_comment_reactions_comment_id
   - idx_comments_room_id
   - idx_snippets_stage_1_llm_response
   - idx_user_roles_role

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@ellipsis-dev ellipsis-dev Bot left a comment

Important

Looks good to me! 👍

Reviewed 8aa07ac in 30 seconds. Click for details.
  • Reviewed 75 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 draft comments. View those below.

Workflow ID: wflow_yz8meOipEINCubtZ


@coderabbitai coderabbitai Bot left a comment
Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@supabase/migrations/20260129_optimize_recording_filter_and_indexes.sql`:
- Line 46: The function is declared with "LANGUAGE plpgsql SECURITY DEFINER" but
lacks an explicit search_path, which allows an attacker to shadow objects like
filter_options_cache and labels; update the function body (the block ending with
"$$ LANGUAGE plpgsql SECURITY DEFINER;") to set a safe search_path at the start
(for example add "PERFORM set_config('search_path','public,pg_temp', true);" or
"SET LOCAL search_path = public, pg_temp;" as the first statements) so all
subsequent references (filter_options_cache, labels, etc.) resolve to the
intended schemas before any privileged operations run.
- Around line 29-42: The jsonb_agg of label objects may not respect the ORDER BY
inside the inner derived table, so change the aggregation to include ordering:
move or add ORDER BY l.text inside the jsonb_agg call that builds the 'labels'
field (i.e., ensure jsonb_agg(jsonb_build_object(... ) ORDER BY l.text) is used
on the SELECT that references alias l), and similarly add an ORDER BY inside the
jsonb_agg for the radio_stations aggregation to guarantee stable ordering;
update the SELECT that builds these aggregates (the labels aggregation and the
radio_stations aggregation) to use jsonb_agg(... ORDER BY <column>) referencing
l.text and the radio station sort key respectively.
🧹 Nitpick comments (1)
supabase/migrations/20260129_optimize_recording_filter_and_indexes.sql (1)

33-41: The labels query doesn't use the cache; verify this is intentional.

States, radio_stations, and languages are fetched from the filter_options_cache materialized view, but labels still performs a 3-way JOIN on live tables. If labels change frequently, this may be intentional. However, if this query is still slow, consider adding labels to the cache or pre-computing them.

Also, the LIMIT 100 lacks a deterministic ordering tie-breaker—if multiple labels share the same text, which 100 are returned is non-deterministic.
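Both points can be addressed together; a sketch (l.text comes from this PR's label queries, using l.id as the tie-breaker column is an assumption):

```sql
-- Deterministic aggregation order plus a stable LIMIT tie-breaker.
SELECT jsonb_agg(
           jsonb_build_object('id', l.id, 'text', l.text)
           ORDER BY l.text            -- ordering inside the aggregate, not just the subquery
       )
FROM (
    SELECT id, text
    FROM public.labels
    ORDER BY text, id                 -- id breaks ties, making the LIMIT deterministic
    LIMIT 100
) l;
```

Without the ORDER BY inside jsonb_agg, PostgreSQL does not guarantee that aggregation preserves the subquery's ordering.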

) INTO result;
RETURN result;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

⚠️ Potential issue | 🟠 Major

SECURITY DEFINER without SET search_path is a security risk.

Functions marked SECURITY DEFINER run with the privileges of the function owner. Without explicitly setting search_path, an attacker could manipulate the search path to substitute malicious objects for filter_options_cache, labels, etc.

🔒 Proposed fix
 END;
-$$ LANGUAGE plpgsql SECURITY DEFINER;
+$$ LANGUAGE plpgsql SECURITY DEFINER SET search_path = public;
📝 Committable suggestion


Suggested change
$$ LANGUAGE plpgsql SECURITY DEFINER;
$$ LANGUAGE plpgsql SECURITY DEFINER SET search_path = public;

Additional optimizations:
1. Disabled completed TSV backfill cron job (was running every minute doing nothing)
2. Optimized fetch_a_snippet_that_has_no_embedding: 373ms → 98ms (74% faster)
3. Added idx_snippets_processed_recorded_at partial index
4. Removed 7 unused indexes saving ~17MB storage:
   - idx_audio_files_radio_station (9.5MB)
   - idx_snippets_user_last_activity (2MB)
   - idx_snippets_comment_count (2MB)
   - idx_snippets_upvote_count (2MB)
   - idx_snippets_like_count (2MB)
   - user_hide_snippets_user_idx (16KB)
   - idx_label_upvotes_snippet_label_upvoted_by (16KB)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@ellipsis-dev ellipsis-dev Bot left a comment

Important

Looks good to me! 👍

Reviewed 05ecffa in 39 seconds. Click for details.
  • Reviewed 38 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 draft comments. View those below.

Workflow ID: wflow_9HhY0diaiyx9xic4


The following indexes were incorrectly removed - they ARE used:
- idx_snippets_comment_count (p_order_by = 'comments')
- idx_snippets_upvote_count (p_order_by = 'upvotes')
- idx_snippets_like_count (p_order_by = 'upvotes')
- idx_snippets_user_last_activity (p_order_by = 'activities')

They showed 0 scans because users may rarely use these sort options,
but they should be kept for when they do.

Only safe removals (~9.5MB saved):
- idx_audio_files_radio_station (uses cache now)
- user_hide_snippets_user_idx (use snippet index instead)
- idx_label_upvotes_snippet_label_upvoted_by (unused composite)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@ellipsis-dev ellipsis-dev Bot left a comment

Important

Looks good to me! 👍

Reviewed 44cb67a in 34 seconds. Click for details.
  • Reviewed 53 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 draft comments. View those below.

Workflow ID: wflow_30lO2tNMgtAMPK4V


Apply same CTE+JOIN optimization used for starred/labeled/upvoted filters
to state and source filters. This avoids IN subqueries that scan all
19k+ snippets before filtering by audio_files.location_state.

Expected improvement: ~134ms -> <50ms for state filter queries

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
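The commit message above describes swapping an IN subquery for a CTE plus JOIN; a hedged sketch of that rewrite (location_state and audio_files come from this PR; the s.audio_file column and p_states parameter names are assumptions):

```sql
-- Before (assumption: simplified): IN subquery re-evaluated against all 19k+ snippets
-- SELECT s.id FROM snippets s
-- WHERE s.audio_file IN (
--     SELECT id FROM audio_files WHERE location_state = ANY(p_states)
-- );

-- After: resolve the small set of matching audio files once, then JOIN.
WITH state_filtered_audio_ids AS (
    SELECT id
    FROM audio_files
    WHERE location_state = ANY(p_states)
)
SELECT s.id
FROM snippets s
JOIN state_filtered_audio_ids sfa ON s.audio_file = sfa.id;
```

The same pattern applies to the source filter on radio_station_code.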
@ellipsis-dev ellipsis-dev Bot left a comment

Important

Looks good to me! 👍

Reviewed c875082 in 44 seconds. Click for details.
  • Reviewed 576 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 draft comments. View those below.

Workflow ID: wflow_bFoCVdn23MIc5qY7


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@supabase/migrations/20260130_optimize_state_source_filters.sql`:
- Line 380: Add input validation to guard against division-by-zero: immediately
after the authentication check (around where the function starts), check the
page_size parameter and raise an exception if page_size <= 0 so the later
computation total_pages := CEIL(total_count::FLOAT / page_size) cannot divide by
zero; reference the page_size parameter and the total_pages assignment when
applying this change.
- Around line 108-119: The migration references a non-existent index
idx_audio_files_location_state used by the state_filtered_audio_ids CTE to speed
up filtering; add a CREATE INDEX for idx_audio_files_location_state on
audio_files(location_state) before the CTE (or, if the optimization is not
intended, update the comment to remove the reference), ensuring the index name
exactly matches idx_audio_files_location_state and targets the audio_files
table's location_state column so the state filter can use it.
- Around line 88-106: The labeled_snippet_ids CTE is incorrectly using the
upvoter column (lu.upvoted_by) to decide who "labeled" a snippet; change those
predicates to use the label creator column sl.applied_by instead. Specifically,
in the labeled_snippet_ids CTE replace conditions that check lu.upvoted_by =
current_user_id and lu.upvoted_by != current_user_id with sl.applied_by =
current_user_id and sl.applied_by != current_user_id respectively, and apply the
same replacement in the second occurrence of the labeled_snippet_ids CTE
elsewhere in the diff.
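Applying that fix, the corrected CTE would look roughly like this (sl.applied_by per the review above; note that no label_upvotes join is needed to determine who applied a label, so this is a sketch rather than the final diff):

```sql
labeled_snippet_ids AS (
    SELECT DISTINCT sl.snippet
    FROM snippet_labels sl
    WHERE has_labeled_filter AND (
        (labeled_by_me AND labeled_by_others) OR
        (labeled_by_me AND NOT labeled_by_others AND sl.applied_by = current_user_id) OR
        (labeled_by_others AND NOT labeled_by_me AND sl.applied_by != current_user_id)
    )
)
```

The upvoted_snippet_ids CTE keeps its label_upvotes join, since lu.upvoted_by is the correct column for that filter.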
🧹 Nitpick comments (2)
supabase/migrations/20260130_optimize_state_source_filters.sql (1)

186-227: Consider consolidating duplicated CTEs to reduce maintenance burden.

The five filter CTEs (starred_snippet_ids, labeled_snippet_ids, upvoted_snippet_ids, state_filtered_audio_ids, source_filtered_audio_ids) are duplicated verbatim between the count query (lines 79-119) and the data query (lines 187-227). This creates ~80 lines of duplication and increases the risk of divergence during future modifications.

While PostgreSQL doesn't share CTEs across separate statements, consider refactoring to a single WITH block that computes both count and paginated data:

WITH
  -- filter CTEs defined once
  ...,
  base_filtered AS (...),
  count_result AS (SELECT COUNT(*) FROM base_filtered),
  paginated AS (SELECT * FROM base_filtered ORDER BY ... LIMIT ... OFFSET ...)
SELECT ...

This is optional—the current approach may yield better query plans in some cases.

supabase/database/sql/get_snippets_function.sql (1)

1-22: Ensure migration and canonical function files stay synchronized.

This file (get_snippets_function.sql) and the migration (20260130_optimize_state_source_filters.sql) both define get_snippets. Any future changes must be applied to both files to avoid drift. Consider documenting this relationship or using a build process that generates the migration from the canonical file.

Comment on lines +88 to +106
labeled_snippet_ids AS (
SELECT DISTINCT sl.snippet
FROM snippet_labels sl
JOIN label_upvotes lu ON lu.snippet_label = sl.id
WHERE has_labeled_filter AND (
(labeled_by_me AND labeled_by_others) OR
(labeled_by_me AND NOT labeled_by_others AND lu.upvoted_by = current_user_id) OR
(labeled_by_others AND NOT labeled_by_me AND lu.upvoted_by != current_user_id)
)
),
upvoted_snippet_ids AS (
SELECT DISTINCT sl.snippet
FROM snippet_labels sl
JOIN label_upvotes lu ON lu.snippet_label = sl.id
WHERE has_upvoted_filter AND (
(filter_upvoted_by_me AND filter_upvoted_by_others) OR
(filter_upvoted_by_me AND NOT filter_upvoted_by_others AND lu.upvoted_by = current_user_id) OR
(filter_upvoted_by_others AND NOT filter_upvoted_by_me AND lu.upvoted_by != current_user_id)
)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# Get the migration file name and read it
echo "=== Reading the migration file ==="
fd "20260130_optimize_state_source_filters.sql" -x cat -n

echo ""
echo "=== Searching for snippet_labels table definition ==="
rg -n "CREATE TABLE.*snippet_labels" --type sql -A 25

echo ""
echo "=== Searching for label_upvotes table definition ==="
rg -n "CREATE TABLE.*label_upvotes" --type sql -A 20

Repository: PublicDataWorks/verdad

Length of output: 20613


🏁 Script executed:

#!/bin/bash

echo "=== Searching for all migration files that might define snippet_labels ==="
fd "\.sql$" -x grep -l "snippet_labels" {} \;

echo ""
echo "=== Looking for snippet_labels and label_upvotes table definitions ==="
rg "snippet_labels|label_upvotes" --type sql -B 2 | head -100

echo ""
echo "=== Check if snippet_labels has a created_by or user column ==="
rg "created_by|user.*snippet_labels" --type sql -C 3

Repository: PublicDataWorks/verdad

Length of output: 15097


🏁 Script executed:

#!/bin/bash

echo "=== Complete snippet_labels table definition ==="
rg "CREATE TABLE.*snippet_labels" --type sql -A 30

echo ""
echo "=== Complete label_upvotes table definition ==="
rg "CREATE TABLE.*label_upvotes" --type sql -A 30

echo ""
echo "=== Check snippet_labels columns ==="
rg "snippet_labels" --type sql | grep -E "created_by|upvoted_by|applied_by" | head -20

Repository: PublicDataWorks/verdad

Length of output: 827


🏁 Script executed:

#!/bin/bash

echo "=== Full snippet_labels table definition from remote_schema ==="
rg "CREATE TABLE.*snippet_labels" --type sql -A 50 supabase/migrations/20241029135348_remote_schema.sql

echo ""
echo "=== Full label_upvotes table definition from remote_schema ==="
rg "CREATE TABLE.*label_upvotes" --type sql -A 50 supabase/migrations/20241029135348_remote_schema.sql

echo ""
echo "=== Search for 'applied_by' to understand snippet_labels structure ==="
rg "applied_by" --type sql -B 5 -A 2 supabase/migrations/20241029135348_remote_schema.sql | head -60

Repository: PublicDataWorks/verdad

Length of output: 1890


🏁 Script executed:

#!/bin/bash

echo "=== Search for label_upvotes definition with broader pattern ==="
rg "label_upvotes" --type sql supabase/migrations/20241029135348_remote_schema.sql -B 5 -A 10 | head -80

echo ""
echo "=== Search for CREATE TABLE around line 2000-3000 ==="
cat supabase/migrations/20241029135348_remote_schema.sql | grep -n "label_upvotes" | head -20

Repository: PublicDataWorks/verdad

Length of output: 5662


Fix labeled_snippet_ids CTE to filter by label creator, not upvoter.

The labeled_snippet_ids CTE (lines 88-97) currently filters by lu.upvoted_by, which checks who upvoted the label. To match the semantic intent of "labeledBy", it should filter by sl.applied_by to check who applied the label to the snippet.

Change lines 94-95 from:

(labeled_by_me AND NOT labeled_by_others AND lu.upvoted_by = current_user_id) OR
(labeled_by_others AND NOT labeled_by_me AND lu.upvoted_by != current_user_id)

To:

(labeled_by_me AND NOT labeled_by_others AND sl.applied_by = current_user_id) OR
(labeled_by_others AND NOT labeled_by_me AND sl.applied_by != current_user_id)

Repeat the same fix at lines 202-203 in the second occurrence of labeled_snippet_ids.

🤖 Prompt for AI Agents
In `@supabase/migrations/20260130_optimize_state_source_filters.sql` around lines
88 - 106, The labeled_snippet_ids CTE is incorrectly using the upvoter column
(lu.upvoted_by) to decide who "labeled" a snippet; change those predicates to
use the label creator column sl.applied_by instead. Specifically, in the
labeled_snippet_ids CTE replace conditions that check lu.upvoted_by =
current_user_id and lu.upvoted_by != current_user_id with sl.applied_by =
current_user_id and sl.applied_by != current_user_id respectively, and apply the
same replacement in the second occurrence of the labeled_snippet_ids CTE
elsewhere in the diff.

) INTO result
FROM snippets_with_labels s;

total_pages := CEIL(total_count::FLOAT / page_size);

⚠️ Potential issue | 🟡 Minor

Add validation to prevent division by zero when page_size is 0.

If a caller passes page_size = 0, line 380 will raise a division-by-zero error. Add input validation early in the function.

🛡️ Proposed fix with input validation

Add after the authentication check (around line 41):

    IF page_size <= 0 THEN
        RAISE EXCEPTION 'page_size must be greater than 0';
    END IF;
🤖 Prompt for AI Agents
In `@supabase/migrations/20260130_optimize_state_source_filters.sql` at line 380,
Add input validation to guard against division-by-zero: immediately after the
authentication check (around where the function starts), check the page_size
parameter and raise an exception if page_size <= 0 so the later computation
total_pages := CEIL(total_count::FLOAT / page_size) cannot divide by zero;
reference the page_size parameter and the total_pages assignment when applying
this change.
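Outside SQL, the same guard-plus-ceiling-division logic looks like this (a minimal sketch; the function name is hypothetical):

```python
import math

def compute_total_pages(total_count: int, page_size: int) -> int:
    """Mirror of total_pages := CEIL(total_count::FLOAT / page_size),
    with the validation the review proposes added up front."""
    if page_size <= 0:
        raise ValueError("page_size must be greater than 0")
    return math.ceil(total_count / page_size)
```

Raising early turns a cryptic division-by-zero deep in the query into an actionable input error at the function boundary.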

The update_snippet_like_count() function now tracks both likes
(value=1) and dislikes (value=-1) separately, updating the
dislike_count column alongside the existing like_count when
user_like_snippets changes.

Removes the legacy get_recording_filter_options() function, which has
zero callers in the codebase and no usage in production logs. The
frontend uses get_filtering_options() instead. Also adds missing
foreign key indexes identified by Supabase Database Advisor to improve
JOIN performance.
- Consolidate two separate queries (count + data) into a single CTE chain,
  eliminating duplicate filter CTEs that ran twice per call
- Use lightweight filtered_snippets CTE (IDs + sort columns only) for count
  and pagination, then fetch heavy columns for ~20 paginated rows only
- Fix dislike_count: use denormalized column instead of hardcoded 0

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
supabase/migrations/20260129_optimize_recording_filter_and_indexes.sql (1)

4-4: Schema-qualify the dropped function for deterministic migrations.

The unqualified DROP FUNCTION depends on search_path and should explicitly target public.get_recording_filter_options(). While IF EXISTS provides safety, schema qualification ensures deterministic execution across different database configurations.

🔧 Proposed change
-DROP FUNCTION IF EXISTS get_recording_filter_options();
+DROP FUNCTION IF EXISTS public.get_recording_filter_options();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@supabase/migrations/20260129_optimize_recording_filter_and_indexes.sql` at
line 4, Replace the unqualified DROP FUNCTION with a schema-qualified drop to
make migrations deterministic: change the DROP of get_recording_filter_options()
to explicitly target public.get_recording_filter_options() (keeping the IF
EXISTS) so the migration does not depend on search_path resolution; update the
statement around the existing DROP FUNCTION IF EXISTS
get_recording_filter_options() accordingly.
supabase/database/sql/update_snippet_like_count.sql (1)

4-19: Consider consolidating the two COUNT subqueries into a single scan.

Both subqueries scan user_like_snippets for the same snippet. Using conditional aggregation reduces this to one table access per trigger invocation.

♻️ Proposed optimization using conditional aggregation
     UPDATE snippets
-    SET
-        like_count = (
-            SELECT COUNT(*)
-            FROM user_like_snippets
-            WHERE snippet = COALESCE(NEW.snippet, OLD.snippet)
-            AND value = 1
-        ),
-        dislike_count = (
-            SELECT COUNT(*)
-            FROM user_like_snippets
-            WHERE snippet = COALESCE(NEW.snippet, OLD.snippet)
-            AND value = -1
-        ),
+    SET
+        like_count = sub.like_count,
+        dislike_count = sub.dislike_count,
         user_last_activity = NOW()
-    WHERE id = COALESCE(NEW.snippet, OLD.snippet);
+    FROM (
+        SELECT
+            COUNT(*) FILTER (WHERE value = 1) AS like_count,
+            COUNT(*) FILTER (WHERE value = -1) AS dislike_count
+        FROM user_like_snippets
+        WHERE snippet = COALESCE(NEW.snippet, OLD.snippet)
+    ) sub
+    WHERE id = COALESCE(NEW.snippet, OLD.snippet);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@supabase/database/sql/update_snippet_like_count.sql` around lines 4 - 19,
Replace the two separate COUNT subqueries in update_snippet_like_count.sql with
a single aggregated scan over user_like_snippets using conditional aggregation:
compute like_count and dislike_count in one SELECT (e.g., SUM(CASE WHEN value =
1 THEN 1 ELSE 0 END) and SUM(CASE WHEN value = -1 THEN 1 ELSE 0 END)) for the
snippet identified by COALESCE(NEW.snippet, OLD.snippet) and assign those
results to snippets.like_count and snippets.dislike_count while still updating
user_last_activity = NOW(); keep references to NEW/OLD.snippet and the snippets
table unchanged.
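The behavior of the single-scan rewrite can be sanity-checked outside Postgres. A minimal in-memory SQLite sketch (table and column names copied from the PR, sample data invented; SUM(CASE ...) stands in for Postgres's COUNT(*) FILTER, which older SQLite versions lack):

```python
import sqlite3

# Toy schema mirroring user_like_snippets; the rows are made-up test data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_like_snippets (snippet INTEGER, value INTEGER);
    INSERT INTO user_like_snippets VALUES (1, 1), (1, 1), (1, -1), (2, 1);
""")

# One scan produces both counts, instead of two correlated COUNT subqueries.
like_count, dislike_count = conn.execute(
    """
    SELECT
        COALESCE(SUM(CASE WHEN value = 1 THEN 1 ELSE 0 END), 0),
        COALESCE(SUM(CASE WHEN value = -1 THEN 1 ELSE 0 END), 0)
    FROM user_like_snippets
    WHERE snippet = ?
    """,
    (1,),
).fetchone()
```

For snippet 1 this yields like_count 2 and dislike_count 1 from a single pass over the table, which is the saving the review comment is after.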
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@supabase/database/sql/get_snippets_function.sql`:
- Around line 136-143: The LEFT JOIN to user_hide_snippets (alias uhs) is
missing a per-user filter so any user's hide will hide the snippet for everyone;
modify the JOIN condition in get_snippets_function.sql to include the current
user identifier (e.g., add "AND uhs.user_id = <current_user_param>" using the
function's user param such as p_user_id or equivalent) so uhs only matches hides
by the requesting user, and apply the same change at the other occurrence noted
around line 270.
- Around line 96-115: The labeled_snippet_ids CTE is using label_upvotes
(lu.upvoted_by) instead of the label applicant field, causing "labeled by me" to
mirror "upvoted by me"; change the logic in labeled_snippet_ids to check
snippet_labels.applied_by (sl.applied_by) against current_user_id and
remove/avoid relying on lu.upvoted_by for those labeled_by_* conditions (you can
drop the join to label_upvotes or keep it only if needed for other checks), and
update the WHERE clause to use (labeled_by_me AND NOT labeled_by_others AND
sl.applied_by = current_user_id) and (labeled_by_others AND NOT labeled_by_me
AND sl.applied_by != current_user_id) while preserving has_labeled_filter and
DISTINCT sl.snippet.

---

Nitpick comments:
In `@supabase/database/sql/update_snippet_like_count.sql`:
- Around line 4-19: Replace the two separate COUNT subqueries in
update_snippet_like_count.sql with a single aggregated scan over
user_like_snippets using conditional aggregation: compute like_count and
dislike_count in one SELECT (e.g., SUM(CASE WHEN value = 1 THEN 1 ELSE 0 END)
and SUM(CASE WHEN value = -1 THEN 1 ELSE 0 END)) for the snippet identified by
COALESCE(NEW.snippet, OLD.snippet) and assign those results to
snippets.like_count and snippets.dislike_count while still updating
user_last_activity = NOW(); keep references to NEW/OLD.snippet and the snippets
table unchanged.

In `@supabase/migrations/20260129_optimize_recording_filter_and_indexes.sql`:
- Line 4: Replace the unqualified DROP FUNCTION with a schema-qualified drop to
make migrations deterministic: change the DROP of get_recording_filter_options()
to explicitly target public.get_recording_filter_options() (keeping the IF
EXISTS) so the migration does not depend on search_path resolution; update the
statement around the existing DROP FUNCTION IF EXISTS
get_recording_filter_options() accordingly.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c875082 and 08a45e7.

📒 Files selected for processing (3)
  • supabase/database/sql/get_snippets_function.sql
  • supabase/database/sql/update_snippet_like_count.sql
  • supabase/migrations/20260129_optimize_recording_filter_and_indexes.sql

Comment on lines +96 to 115
labeled_snippet_ids AS (
SELECT DISTINCT sl.snippet
FROM snippet_labels sl
JOIN label_upvotes lu ON lu.snippet_label = sl.id
WHERE has_labeled_filter AND (
(labeled_by_me AND labeled_by_others) OR
(labeled_by_me AND NOT labeled_by_others AND lu.upvoted_by = current_user_id) OR
(labeled_by_others AND NOT labeled_by_me AND lu.upvoted_by != current_user_id)
)
),
upvoted_snippet_ids AS (
SELECT DISTINCT sl.snippet
FROM snippet_labels sl
JOIN label_upvotes lu ON lu.snippet_label = sl.id
WHERE has_upvoted_filter AND (
(filter_upvoted_by_me AND filter_upvoted_by_others) OR
(filter_upvoted_by_me AND NOT filter_upvoted_by_others AND lu.upvoted_by = current_user_id) OR
(filter_upvoted_by_others AND NOT filter_upvoted_by_me AND lu.upvoted_by != current_user_id)
)
),

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check table structures for snippet_labels and label_upvotes
echo "=== Full Database Schema (SQL files) ==="
find . -name "*.sql" -type f | head -20

echo ""
echo "=== Searching for snippet_labels table definition ==="
rg -n "CREATE TABLE.*snippet_labels" --type sql -A 25

echo ""
echo "=== Searching for label_upvotes table definition ==="
rg -n "CREATE TABLE.*label_upvotes" --type sql -A 25

Repository: PublicDataWorks/verdad

Length of output: 1332


🏁 Script executed:

#!/bin/bash
echo "=== All SQL files in database directory ==="
find ./supabase/database -name "*.sql" -type f | sort

echo ""
echo "=== Search for 'snippet_labels' mentions ==="
rg "snippet_labels" --type sql -B 2 -A 2 | head -60

echo ""
echo "=== Search for 'label_upvotes' mentions ==="
rg "label_upvotes" --type sql -B 2 -A 2 | head -60

echo ""
echo "=== Check for migrations directory ==="
find ./supabase/database -type d | sort

Repository: PublicDataWorks/verdad

Length of output: 13528


🏁 Script executed:

#!/bin/bash
echo "=== snippet_labels table definition ==="
rg "create table.*snippet_labels" --type sql -A 15 supabase/migrations/

echo ""
echo "=== label_upvotes table definition ==="
rg "create table.*label_upvotes" --type sql -A 15 supabase/migrations/

echo ""
echo "=== Check applied_by column in snippet_labels ==="
rg "applied_by" supabase/migrations/ --type sql -B 2 -A 2

Repository: PublicDataWorks/verdad

Length of output: 5524


🏁 Script executed:

#!/bin/bash
echo "=== Extract labeled_snippet_ids and upvoted_snippet_ids from get_snippets_function.sql ==="
sed -n '96,115p' supabase/database/sql/get_snippets_function.sql

echo ""
echo "=== Check how labeled_by_me and filter_upvoted_by_me are defined ==="
rg "labeled_by_me|filter_upvoted_by_me" supabase/database/sql/get_snippets_function.sql -B 2 -A 2

Repository: PublicDataWorks/verdad

Length of output: 2702


Fix labeled_snippet_ids CTE to filter by label applicant, not upvote endorser.

Both labeled_snippet_ids and upvoted_snippet_ids CTEs currently use identical logic, joining snippet_labels with label_upvotes and filtering on lu.upvoted_by. The schema distinguishes between label application (snippet_labels.applied_by) and upvote endorsement (label_upvotes.upvoted_by). The labeled_snippet_ids CTE should filter snippets where the current user applied the label (sl.applied_by = current_user_id), not where they upvoted it. Currently, both filters are semantically equivalent, causing the "labeled by me" filter to behave identically to "upvoted by me."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@supabase/database/sql/get_snippets_function.sql` around lines 96 - 115, The
labeled_snippet_ids CTE is using label_upvotes (lu.upvoted_by) instead of the
label applicant field, causing "labeled by me" to mirror "upvoted by me"; change
the logic in labeled_snippet_ids to check snippet_labels.applied_by
(sl.applied_by) against current_user_id and remove/avoid relying on
lu.upvoted_by for those labeled_by_* conditions (you can drop the join to
label_upvotes or keep it only if needed for other checks), and update the WHERE
clause to use (labeled_by_me AND NOT labeled_by_others AND sl.applied_by =
current_user_id) and (labeled_by_others AND NOT labeled_by_me AND sl.applied_by
!= current_user_id) while preserving has_labeled_filter and DISTINCT sl.snippet.

@PublicDataWorks PublicDataWorks deleted a comment from coderabbitai Bot Feb 28, 2026
@PublicDataWorks PublicDataWorks deleted a comment from coderabbitai Bot Feb 28, 2026
@PublicDataWorks PublicDataWorks deleted a comment from coderabbitai Bot Feb 28, 2026
@PublicDataWorks PublicDataWorks deleted a comment from coderabbitai Bot Feb 28, 2026
@quancao-ea quancao-ea changed the title Optimize get_snippets filters to fix timeout issues VER-304: Optimize get_snippets filters to fix timeout issues Mar 2, 2026
@linear

linear Bot commented Mar 2, 2026

@quancao-ea quancao-ea changed the title VER-304: Optimize get_snippets filters to fix timeout issues VER-304: Optimize get_snippets filters and searching to fix timeout issues Mar 2, 2026
@quancao-ea quancao-ea merged commit 94e0225 into main Mar 2, 2026
2 checks passed
@quancao-ea quancao-ea deleted the fix/optimize-get-snippets-filters branch March 17, 2026 02:42