VER-304: Optimize get_snippets filters and searching to fix timeout issues #57
quancao-ea merged 10 commits into main
Conversation
The starredBy, labeledBy, and upvotedBy filters were causing statement timeouts (>30s) due to inefficient EXISTS subqueries being evaluated for every row in the snippets table (119k+ rows).

Root cause:
- EXISTS subqueries like `EXISTS (SELECT 1 FROM user_star_snippets WHERE snippet = s.id)` were evaluated for each of the 119k snippets
- This is extremely slow because PostgreSQL scans the large snippets table first, then evaluates the EXISTS condition for each row

Solution:
- Changed from EXISTS subqueries to JOINs with pre-filtered CTEs
- Pre-compute the filtered snippet IDs in CTEs (which operate on much smaller tables)
- Then JOIN against these pre-filtered IDs instead of using EXISTS (see the sketch below)

Performance improvements:
- starredBy filter: timeout (>30s) -> <1s
- labeledBy filter: timeout (>30s) -> <1s
- upvotedBy filter: 6.7s -> <1s
- politicalSpectrum filter: already uses index, no change needed

Technical details:
- Added filter detection flags: has_starred_filter, has_labeled_filter, has_upvoted_filter
- Added pre-filtered CTEs: starred_snippet_ids, labeled_snippet_ids, upvoted_snippet_ids
- Renamed upvoted filter variables to filter_upvoted_by_me/filter_upvoted_by_others to avoid column reference ambiguity with the label_summary CTE

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
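To make the before/after concrete, here is a minimal sketch of the pattern, using the table and CTE names from the description above; the production function adds more filters, pagination, and auth checks, so treat this as an illustration rather than the shipped query:

-- Before: correlated EXISTS evaluated per snippet row (slow on 119k+ rows)
SELECT s.id
FROM snippets s
WHERE EXISTS (
    SELECT 1 FROM user_star_snippets uss
    WHERE uss.snippet = s.id AND uss."user" = current_user_id
);

-- After: pre-filter the much smaller side table in a CTE, then JOIN
WITH starred_snippet_ids AS (
    SELECT DISTINCT uss.snippet
    FROM user_star_snippets uss
    WHERE uss."user" = current_user_id
)
SELECT s.id
FROM snippets s
JOIN starred_snippet_ids ssi ON ssi.snippet = s.id;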
Walkthrough
Replaces per-row EXISTS/NOT EXISTS checks with precomputed CTEs and JOINs across multiple get_snippets implementations; adds a materialized filter-options cache with a refresh function; introduces index cleanup and new/updated indexes; enforces auth in filtering functions and updates snippet dislike_count handling.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Important
Looks good to me! 👍
Reviewed everything up to c74cc4c in 44 seconds.
- Reviewed 993 lines of code in 2 files
- Skipped 0 files when reviewing
- Skipped posting 0 draft comments
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@supabase/database/sql/get_snippets_function.sql`:
- Line 357: The calculation total_pages := CEIL(total_count::FLOAT / page_size)
can divide by zero when page_size is 0; add input validation at the start of the
function (e.g., check the page_size parameter) and raise a clear exception if
page_size <= 0 or alternatively use NULLIF(page_size,0) in the division to avoid
the error; update the function that declares page_size/total_pages (references
to page_size, total_count, total_pages, CEIL) to include this validation or the
NULLIF-based guard before computing total_pages.
- Around line 306-327: The JSON field name for whether the current user upvoted
a label is inconsistent; update the migration to use the same name used in this
function: change any occurrences of filter_upvoted_by_me to upvoted_by_me so the
label_summary CTE and the jsonb_build_object in snippets_with_labels both emit
'upvoted_by_me' (i.e., ensure the migration's SELECT/alias and
jsonb_build_object key match the label_summary/upvoted_by_me naming).
In `@supabase/migrations/20260129_optimize_get_snippets_filters.sql`:
- Around line 303-324: The label JSON uses the wrong field name: change the
boolean alias and JSON key from filter_upvoted_by_me to upvoted_by_me;
specifically update the SELECT in the label_summary CTE (the expression "lu.id
IS NOT NULL AS filter_upvoted_by_me") to alias as upvoted_by_me, and update the
jsonb_build_object call inside the labels aggregation (currently building
'filter_upvoted_by_me') to use 'upvoted_by_me' so the label JSON produced by
snippets_with_labels matches get_snippets_function.sql and frontend
expectations.
🧹 Nitpick comments (2)
supabase/database/sql/get_snippets_function.sql (2)
295-302: Consider using COALESCE for upvote_count as well. Line 297 handles NULL for like_count with COALESCE, but upvote_count could also be NULL. For consistency and to avoid potential NULL arithmetic issues:
♻️ Suggested improvement
- WHEN p_order_by = 'upvotes' THEN s.upvote_count + COALESCE(s.like_count, 0)
+ WHEN p_order_by = 'upvotes' THEN COALESCE(s.upvote_count, 0) + COALESCE(s.like_count, 0)
352-352: Hardcoded dislike_count always returns 0. This field is hardcoded to 0. If dislike tracking is not implemented, consider either removing this field from the response or adding a comment explaining the intent.
) INTO result
FROM snippets_with_labels s;

total_pages := CEIL(total_count::FLOAT / page_size);
Potential division by zero if page_size is 0.
If a caller passes page_size = 0, this line will raise a division-by-zero error. Consider adding input validation at the start of the function or using NULLIF.
🛡️ Proposed fix with input validation
Add near the beginning of the function (after line 44):
IF page_size <= 0 THEN
RAISE EXCEPTION 'page_size must be greater than 0';
END IF;
label_summary AS (
    SELECT
        l.id,
        CASE WHEN p_language = 'spanish' THEN l.text_spanish ELSE l.text END AS text,
        sl.upvote_count,
        lu.id IS NOT NULL AS filter_upvoted_by_me,
        sl.snippet AS snippet_id
    FROM snippet_labels sl
    JOIN labels l ON l.id = sl.label
    LEFT JOIN label_upvotes lu ON lu.snippet_label = sl.id AND lu.upvoted_by = current_user_id
    WHERE sl.snippet IN (SELECT id FROM filtered_snippets)
),
snippets_with_labels AS (
    SELECT
        fs.*,
        COALESCE(ld.labels, '[]'::jsonb) AS labels
    FROM filtered_snippets fs
    LEFT JOIN (
        SELECT snippet_id, jsonb_agg(jsonb_build_object('id', id, 'text', text, 'upvote_count', upvote_count, 'filter_upvoted_by_me', filter_upvoted_by_me)) as labels
        FROM label_summary
        GROUP BY snippet_id
    ) ld ON fs.id = ld.snippet_id
Incorrect field name filter_upvoted_by_me in label JSON output.
The label_summary CTE (line 308) and JSON aggregation (line 321) use filter_upvoted_by_me, which is the name of a filter flag variable, not a descriptive name for whether the user upvoted this label. The main function file (get_snippets_function.sql) correctly uses upvoted_by_me.
This will cause the API to return labels with the field name filter_upvoted_by_me instead of the expected upvoted_by_me, potentially breaking frontend consumers.
🐛 Proposed fix to use consistent naming
label_summary AS (
SELECT
l.id,
CASE WHEN p_language = 'spanish' THEN l.text_spanish ELSE l.text END AS text,
sl.upvote_count,
- lu.id IS NOT NULL AS filter_upvoted_by_me,
+ lu.id IS NOT NULL AS upvoted_by_me,
sl.snippet AS snippet_id
FROM snippet_labels sl
JOIN labels l ON l.id = sl.label
LEFT JOIN label_upvotes lu ON lu.snippet_label = sl.id AND lu.upvoted_by = current_user_id
WHERE sl.snippet IN (SELECT id FROM filtered_snippets)
),
snippets_with_labels AS (
SELECT
fs.*,
COALESCE(ld.labels, '[]'::jsonb) AS labels
FROM filtered_snippets fs
LEFT JOIN (
- SELECT snippet_id, jsonb_agg(jsonb_build_object('id', id, 'text', text, 'upvote_count', upvote_count, 'filter_upvoted_by_me', filter_upvoted_by_me)) as labels
+ SELECT snippet_id, jsonb_agg(jsonb_build_object('id', id, 'text', text, 'upvote_count', upvote_count, 'upvoted_by_me', upvoted_by_me)) as labels
FROM label_summary
GROUP BY snippet_id
) ld ON fs.id = ld.snippet_id
)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c74cc4c67e
LEFT JOIN (
    SELECT snippet_id, jsonb_agg(jsonb_build_object('id', id, 'text', text, 'upvote_count', upvote_count, 'filter_upvoted_by_me', filter_upvoted_by_me)) as labels
    FROM label_summary
Preserve label field name upvoted_by_me
The migration changes the labels JSON to emit filter_upvoted_by_me instead of the previously returned upvoted_by_me. Any client code that reads labels[].upvoted_by_me will now see the field missing and treat all labels as not upvoted by the user. This is a breaking API change that doesn’t appear to be accompanied by client updates in this commit.
Pull request overview
This PR optimizes the get_snippets database function to fix timeout issues when filtering by starred/labeled/upvoted status. The optimization replaces slow EXISTS subqueries with JOIN-based queries using pre-filtered CTEs, reducing query execution time from >30 seconds (timeout) to <1 second for the affected filters.
Changes:
- Replaced EXISTS subqueries with LEFT JOIN pattern using pre-filtered CTEs for starred/labeled/upvoted filters
- Added filter detection boolean flags to enable conditional CTE execution
- Renamed upvoted filter variables to avoid column name conflicts in the label_summary CTE
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| supabase/migrations/20260129_optimize_get_snippets_filters.sql | Migration that applies the optimized function to production database |
| supabase/database/sql/get_snippets_function.sql | Source SQL file that should mirror the migration but has critical inconsistency |
END AS text,
CASE WHEN p_language = 'spanish' THEN l.text_spanish ELSE l.text END AS text,
sl.upvote_count,
lu.id IS NOT NULL AS upvoted_by_me,
The column name in the label_summary CTE is inconsistent between the source SQL file and the migration file. The migration file correctly uses 'filter_upvoted_by_me' (line 308, 321), while the source file uses 'upvoted_by_me' (line 311, 324). This means the source SQL file has not been updated to match the migration that was already applied to production.
The PR description mentions that this rename was necessary "to avoid column reference ambiguity with the label_summary CTE", so the source file should use the same 'filter_upvoted_by_me' naming as the migration.
Code Review
This pull request significantly improves query performance by optimizing the get_snippets function, replacing inefficient EXISTS subqueries with JOINs on pre-filtered CTEs. However, a critical security issue was identified: the use of SECURITY DEFINER without an explicit search_path could lead to search path hijacking. This should be remediated by adding a SET search_path clause to the function definition in both the migration and the source SQL file. Additionally, further enhancements are needed to address code duplication in result counting and fetching logic, simplify boolean logic within new filter CTEs, investigate a potential regression where dislike_count is hardcoded to 0, and improve clarity for a variable alias in the migration script. Addressing these points will make the new function more maintainable, robust, and secure.
filter_upvoted_by_me := has_upvoted_filter AND p_filter->'upvotedBy' ? 'by_me';
filter_upvoted_by_others := has_upvoted_filter AND p_filter->'upvotedBy' ? 'by_others';

-- Get count using optimized query with JOINs instead of EXISTS
There is significant code duplication between the query to get total_count (lines 72-171) and the query to fetch the paginated data (lines 173-355). Both parts define the same CTEs (starred_snippet_ids, labeled_snippet_ids, upvoted_snippet_ids) and have nearly identical filtering logic in the WHERE clause.
This duplication makes the function difficult to maintain, as any change to the filtering logic must be applied in two places.
Consider refactoring to define the filtered set of snippets once in a CTE, and then reuse that CTE for both counting and pagination. This would look something like this:
WITH
-- ... your filter ID CTEs (starred_snippet_ids, etc.) ...
filtered_snippets_base AS (
SELECT s.* -- select all columns needed for ordering and final output
FROM snippets s
-- ... all your JOINs and WHERE clauses ...
),
count_cte AS (
SELECT COUNT(*) as total FROM filtered_snippets_base
),
paginated_snippets AS (
SELECT *
FROM filtered_snippets_base
ORDER BY ...
LIMIT page_size
OFFSET page * page_size
)
-- Then use these CTEs to get the count and build the final result set.
This approach would make the function more DRY (Don't Repeat Yourself) and easier to manage.
page_size INTEGER,
p_order_by text,
p_search_term text DEFAULT ''
) RETURNS jsonb SECURITY DEFINER AS $$
The function get_snippets is defined with SECURITY DEFINER but does not set a search_path. This is a security risk in PostgreSQL because it allows the function to be hijacked if an attacker can create objects in a schema that is part of the caller's search_path. For example, an attacker could create a malicious auth.uid() function in their own schema to impersonate other users or bypass security checks.
Recommendation: Add SET search_path = public, extensions, auth; (or the appropriate schemas) to the function definition. This should also be applied to the source file supabase/database/sql/get_snippets_function.sql.
) RETURNS jsonb SECURITY DEFINER SET search_path = public, extensions, auth AS $$

WHERE has_starred_filter AND (
    (starred_by_me AND starred_by_others) OR
    (starred_by_me AND NOT starred_by_others AND uss."user" = current_user_id) OR
    (starred_by_others AND NOT starred_by_me AND uss."user" != current_user_id)
)
The boolean logic in this WHERE clause is correct but can be simplified for better readability and maintainability. The same simplification can be applied to the labeled_snippet_ids and upvoted_snippet_ids CTEs.
A simpler equivalent is:
WHERE has_starred_filter AND (
(starred_by_me AND uss."user" = current_user_id) OR
(starred_by_others AND uss."user" != current_user_id)
)
This revised logic is more concise and easier to understand at a glance.

Additional performance optimizations identified via pg_stat_statements:
1. get_filtering_options: 3.7s avg -> <10ms
- Root cause: DISTINCT queries on 1M+ row audio_files table
- Solution: Created materialized view (filter_options_cache) to cache
states, sources, and languages
- Added refresh_filter_options_cache() function for updates
2. get_trending_topics: optimized hidden snippets check
- Changed NOT EXISTS to LEFT JOIN pattern (same as get_snippets fix)
- Added index on user_hide_snippets.snippet
Performance summary for all optimizations in this PR:
- starredBy filter: timeout (>30s) -> <1s
- labeledBy filter: timeout (>30s) -> <1s
- upvotedBy filter: 6.7s -> <1s
- get_filtering_options: 3.7s -> <10ms
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
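For readers unfamiliar with the caching approach, here is a minimal sketch of what the cache and refresh function plausibly look like. The column names (option_type, value, label) follow the review comments further down; only the states branch is shown, and the exact SELECT is an assumption, since the real definition lives in the migration:

-- Hypothetical, simplified view definition; the real one also caches
-- sources and languages and includes a secondary_value column.
CREATE MATERIALIZED VIEW IF NOT EXISTS filter_options_cache AS
SELECT
    'states'::text     AS option_type,
    af.location_state  AS value,
    af.location_state  AS label
FROM public.audio_files af
WHERE af.location_state IS NOT NULL
GROUP BY af.location_state;

CREATE OR REPLACE FUNCTION refresh_filter_options_cache() RETURNS void AS $$
BEGIN
    -- Re-runs the expensive DISTINCT-style scans off the hot path.
    REFRESH MATERIALIZED VIEW filter_options_cache;
END;
$$ LANGUAGE plpgsql;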
Important
Looks good to me! 👍
Reviewed 34a19c3 in 34 seconds.
- Reviewed 436 lines of code in 2 files
- Skipped 0 files when reviewing
- Skipped posting 0 draft comments
Additional Optimizations Added
In the second commit, I found and fixed additional slow functions using pg_stat_statements:
| Function/Filter | Before | After |
|---|---|---|
| starredBy filter | timeout (>30s) | <1s |
| labeledBy filter | timeout (>30s) | <1s |
| upvotedBy filter | 6.7s | <1s |
| get_filtering_options | 3.7s avg | <10ms |
| get_trending_topics | 3.3s avg | improved |
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@supabase/database/sql/get_filtering_options_function.sql`:
- Around line 60-88: The three SELECTs that read from filter_options_cache
(assigning INTO states, sources, languages) must reference the view with an
explicit schema to avoid search_path hijacking in this SECURITY DEFINER
function; update the three FROM clauses that currently use filter_options_cache
to use public.filter_options_cache so these queries (the ones building jsonb_agg
into the variables states, sources, and languages) consistently match other
schema-qualified references like public.labels and public.audio_files.
In `@supabase/migrations/20260129_optimize_filter_functions.sql`:
- Around line 85-91: Add input validation for p_label_page_size to guard against
zero or negative values: at the start of the routine (before calculating
total_pages or using LIMIT) check if p_label_page_size <= 0 and RAISE EXCEPTION
with a clear message (e.g., "p_label_page_size must be positive");
alternatively, normalize the value using GREATEST(p_label_page_size, 1) where
total_pages is computed and where LIMIT is applied so division by zero and
unexpected LIMIT behavior are prevented. Reference p_label_page_size,
total_labels/total_pages and the later LIMIT usage to ensure both the
CEIL(total_labels::FLOAT / p_label_page_size) calculation and the query LIMIT
are protected.
- Around line 214-227: The hidden_snippets CTE is currently returning snippets
hidden by any user; restrict it to the current user by filtering
user_hide_snippets with the current_user_id (same approach used for
user_star_snippets and user_like_snippets). Update the hidden_snippets CTE (or
the LEFT JOIN to it in filtered_snippets) to include AND
user_hide_snippets."user" = current_user_id so only snippets hidden by the
requesting user are excluded from filtered_snippets.
- Around line 64-68: The SECURITY DEFINER functions get_filtering_options and
get_trending_topics use unqualified object names and must lock down their
search_path to avoid hijacking; update each CREATE OR REPLACE FUNCTION header to
include "SET search_path = public, pg_catalog" (e.g., change the function header
so the SECURITY DEFINER declaration includes the SET search_path clause) so the
function runs with a fixed search_path and still references the same unqualified
names (filter_options_cache, auth.uid(), snippets, audio_files,
user_hide_snippets, snippet_labels, labels).
🧹 Nitpick comments (1)
supabase/migrations/20260129_optimize_filter_functions.sql (1)
55-60: Consider REFRESH ... CONCURRENTLY to avoid blocking reads. REFRESH MATERIALIZED VIEW takes an exclusive lock; if refresh can overlap with reads, concurrent refresh keeps reads flowing. This requires a unique index that matches row uniqueness.
♻️ Possible adjustment (if uniqueness is guaranteed)
+CREATE UNIQUE INDEX IF NOT EXISTS idx_filter_options_cache_unique
+    ON filter_options_cache(option_type, value, label, secondary_value);
@@
-    REFRESH MATERIALIZED VIEW filter_options_cache;
+    REFRESH MATERIALIZED VIEW CONCURRENTLY filter_options_cache;
 -- Fetch states from cached view (fast!)
 SELECT jsonb_agg(
     jsonb_build_object(
-        'label', location_state,
-        'value', location_state
+        'label', label,
+        'value', value
     )
 ) INTO states
-FROM unique_states;
+FROM filter_options_cache
+WHERE option_type = 'states';

--- Fetch unique radio station codes from the audio_files table
-WITH unique_sources AS (
-    SELECT DISTINCT
-        radio_station_code,
-        radio_station_name
-    FROM public.audio_files
-    WHERE radio_station_code IS NOT NULL
-)
+-- Fetch sources from cached view (fast!)
 SELECT jsonb_agg(
     jsonb_build_object(
-        'label', CASE
-            WHEN radio_station_name IS NOT NULL
-            THEN radio_station_name || ' - ' || radio_station_code
-            ELSE radio_station_code
-        END,
-        'value', radio_station_code
+        'label', label,
+        'value', value
     )
 ) INTO sources
-FROM unique_sources;
+FROM filter_options_cache
+WHERE option_type = 'sources';

--- Fetch unique primary languages from the snippets table
-WITH unique_languages AS (
-    SELECT DISTINCT language->>'primary_language' AS primary_language
-    FROM public.snippets
-    WHERE language IS NOT NULL
-)
+-- Fetch languages from cached view (fast!)
 SELECT jsonb_agg(
     jsonb_build_object(
-        'label', primary_language,
-        'value', primary_language
+        'label', label,
+        'value', value
     )
 ) INTO languages
-FROM unique_languages;
+FROM filter_options_cache
+WHERE option_type = 'languages';
Schema‑qualify filter_options_cache to public.filter_options_cache in this SECURITY DEFINER function.
The function inconsistently qualifies schema names: it explicitly uses public.labels and public.audio_files elsewhere but leaves filter_options_cache unqualified on lines 67, 77, and 87. In SECURITY DEFINER functions, unqualified object names can be hijacked through search_path manipulation (including pg_temp). Qualify the view consistently to prevent that risk.
Proposed fix
- FROM filter_options_cache
+ FROM public.filter_options_cache
WHERE option_type = 'states';
- FROM filter_options_cache
+ FROM public.filter_options_cache
WHERE option_type = 'sources';
- FROM filter_options_cache
+ FROM public.filter_options_cache
WHERE option_type = 'languages';
CREATE OR REPLACE FUNCTION get_filtering_options (
    p_language TEXT DEFAULT 'english',
    p_label_page INT DEFAULT 0,
    p_label_page_size INT DEFAULT 5
) RETURNS jsonb SECURITY DEFINER AS $$
Lock down search_path for both SECURITY DEFINER functions.
These functions reference unqualified objects; with SECURITY DEFINER, that's vulnerable to search_path hijacking (including pg_temp). Add SET search_path = public, pg_catalog to both function definitions.
Applies to:
- get_filtering_options() (line 68): references unqualified filter_options_cache and auth.uid()
- get_trending_topics() (line 173): heavily uses unqualified table references (snippets, audio_files, user_hide_snippets, snippet_labels, labels)
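A runnable toy example of the recommended pattern, assuming nothing about the real function bodies (the demo name and trivial body are hypothetical):

-- Pinning search_path on a SECURITY DEFINER function prevents callers
-- from injecting look-alike objects via their own search_path.
CREATE OR REPLACE FUNCTION get_filtering_options_demo()
RETURNS jsonb
SECURITY DEFINER
SET search_path = public, pg_catalog
AS $$
BEGIN
    -- Unqualified names inside the body now resolve only against
    -- public and pg_catalog, never the caller's schemas or pg_temp.
    RETURN '{}'::jsonb;
END;
$$ LANGUAGE plpgsql;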
-- Fetch total number of labels
SELECT COUNT(*) INTO total_labels
FROM public.labels;

-- Calculate total pages
total_pages := CEIL(total_labels::FLOAT / p_label_page_size);
Guard against p_label_page_size <= 0 (division by zero).
The parameter accepts any INT value including zero or negative. At line 90, CEIL(total_labels::FLOAT / p_label_page_size) will fail with division by zero error. Additionally, the LIMIT clause at line 106 would also behave unexpectedly with invalid page sizes. Add input validation:
Proposed guard
current_user_id := auth.uid();
IF current_user_id IS NULL THEN
RAISE EXCEPTION 'Only logged-in users can call this function';
END IF;
+
+ IF p_label_page_size <= 0 THEN
+ RAISE EXCEPTION 'p_label_page_size must be > 0';
+ END IF;
Additional optimizations identified via pg_stat_statements and Supabase Database Advisor:
1. get_recording_filter_options: 1.2s avg → ~410ms (65% improvement)
   - Uses filter_options_cache for states/sources/languages
   - Labels subquery still requires JOIN but is acceptable
2. Added indexes for unindexed foreign keys identified by Supabase Database Advisor (see the sketch below):
   - idx_comment_reactions_comment_id
   - idx_comments_room_id
   - idx_snippets_stage_1_llm_response
   - idx_user_roles_role
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
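As background on item 2, an unindexed foreign key forces a scan of the child table on every JOIN or cascading delete against the referenced row; the standard fix is a plain B-tree index per FK column. A sketch using two of the index names above (the table/column pairings are inferred from those names, so treat them as assumptions):

CREATE INDEX IF NOT EXISTS idx_comment_reactions_comment_id
    ON public.comment_reactions (comment_id);
CREATE INDEX IF NOT EXISTS idx_comments_room_id
    ON public.comments (room_id);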
Important
Looks good to me! 👍
Reviewed 8aa07ac in 30 seconds.
- Reviewed 75 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 0 draft comments
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@supabase/migrations/20260129_optimize_recording_filter_and_indexes.sql`:
- Line 46: The function is declared with "LANGUAGE plpgsql SECURITY DEFINER" but
lacks an explicit search_path, which allows an attacker to shadow objects like
filter_options_cache and labels; update the function body (the block ending with
"$$ LANGUAGE plpgsql SECURITY DEFINER;") to set a safe search_path at the start
(for example add "PERFORM set_config('search_path','public,pg_temp', true);" or
"SET LOCAL search_path = public, pg_temp;" as the first statements) so all
subsequent references (filter_options_cache, labels, etc.) resolve to the
intended schemas before any privileged operations run.
- Around line 29-42: The jsonb_agg of label objects may not respect the ORDER BY
inside the inner derived table, so change the aggregation to include ordering:
move or add ORDER BY l.text inside the jsonb_agg call that builds the 'labels'
field (i.e., ensure jsonb_agg(jsonb_build_object(... ) ORDER BY l.text) is used
on the SELECT that references alias l), and similarly add an ORDER BY inside the
jsonb_agg for the radio_stations aggregation to guarantee stable ordering;
update the SELECT that builds these aggregates (the labels aggregation and the
radio_stations aggregation) to use jsonb_agg(... ORDER BY <column>) referencing
l.text and the radio station sort key respectively.
🧹 Nitpick comments (1)
supabase/migrations/20260129_optimize_recording_filter_and_indexes.sql (1)
33-41: Labels query doesn't use cache; verify this is intentional. States, radio_stations, and languages are fetched from the filter_options_cache materialized view, but labels still performs a 3-way JOIN on live tables. If labels change frequently, this may be intentional. However, if this query is still slow, consider adding labels to the cache or pre-computing them. Also, the LIMIT 100 lacks a deterministic ordering tie-breaker: if multiple labels share the same text, which 100 are returned is non-deterministic.
) INTO result;
RETURN result;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
SECURITY DEFINER without SET search_path is a security risk.
Functions marked SECURITY DEFINER run with the privileges of the function owner. Without explicitly setting search_path, an attacker could manipulate the search path to substitute malicious objects for filter_options_cache, labels, etc.
🔒 Proposed fix
END;
-$$ LANGUAGE plpgsql SECURITY DEFINER;
+$$ LANGUAGE plpgsql SECURITY DEFINER SET search_path = public;
Additional optimizations:
1. Disabled completed TSV backfill cron job (was running every minute doing nothing)
2. Optimized fetch_a_snippet_that_has_no_embedding: 373ms → 98ms (74% faster)
3. Added idx_snippets_processed_recorded_at partial index (see the sketch below)
4. Removed 7 unused indexes saving ~17MB storage:
   - idx_audio_files_radio_station (9.5MB)
   - idx_snippets_user_last_activity (2MB)
   - idx_snippets_comment_count (2MB)
   - idx_snippets_upvote_count (2MB)
   - idx_snippets_like_count (2MB)
   - user_hide_snippets_user_idx (16KB)
   - idx_label_upvotes_snippet_label_upvoted_by (16KB)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
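On item 3, a partial index only covers rows matching its WHERE predicate, so it stays small and serves queries that repeat the same predicate. A hedged sketch of what the new index might look like (the column and predicate are inferred from the index name and are hypothetical; the real definition is in the migration):

CREATE INDEX IF NOT EXISTS idx_snippets_processed_recorded_at
    ON public.snippets (recorded_at)
    WHERE status = 'Processed';  -- hypothetical predicate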
Important
Looks good to me! 👍
Reviewed 05ecffa in 39 seconds.
- Reviewed 38 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 0 draft comments
The following indexes were incorrectly removed - they ARE used:
- idx_snippets_comment_count (p_order_by = 'comments')
- idx_snippets_upvote_count (p_order_by = 'upvotes')
- idx_snippets_like_count (p_order_by = 'upvotes')
- idx_snippets_user_last_activity (p_order_by = 'activities')
They showed 0 scans because users may rarely use these sort options, but they should be kept for when they do.
Only safe removals (~9.5MB saved):
- idx_audio_files_radio_station (uses cache now)
- user_hide_snippets_user_idx (use snippet index instead)
- idx_label_upvotes_snippet_label_upvoted_by (unused composite)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Important
Looks good to me! 👍
Reviewed 44cb67a in 34 seconds.
- Reviewed 53 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 0 draft comments
Apply same CTE+JOIN optimization used for starred/labeled/upvoted filters to state and source filters. This avoids IN subqueries that scan all 19k+ snippets before filtering by audio_files.location_state. Expected improvement: ~134ms -> <50ms for state filter queries.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
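A condensed sketch of the pattern this commit describes, reusing the CTE name from the migration; the p_states parameter and the snippets.audio_file join column are assumptions for illustration:

WITH state_filtered_audio_ids AS (
    -- Filter the small audio_files table first instead of scanning snippets
    SELECT af.id
    FROM public.audio_files af
    WHERE af.location_state = ANY(p_states)  -- p_states: assumed parameter name
)
SELECT s.id
FROM public.snippets s
JOIN state_filtered_audio_ids sfa ON sfa.id = s.audio_file;  -- assumed FK column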
Important
Looks good to me! 👍
Reviewed c875082 in 44 seconds.
- Reviewed 576 lines of code in 2 files
- Skipped 0 files when reviewing
- Skipped posting 0 draft comments
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@supabase/migrations/20260130_optimize_state_source_filters.sql`:
- Line 380: Add input validation to guard against division-by-zero: immediately
after the authentication check (around where the function starts), check the
page_size parameter and raise an exception if page_size <= 0 so the later
computation total_pages := CEIL(total_count::FLOAT / page_size) cannot divide by
zero; reference the page_size parameter and the total_pages assignment when
applying this change.
- Around line 108-119: The migration references a non-existent index
idx_audio_files_location_state used by the state_filtered_audio_ids CTE to speed
up filtering; add a CREATE INDEX for idx_audio_files_location_state on
audio_files(location_state) before the CTE (or, if the optimization is not
intended, update the comment to remove the reference), ensuring the index name
exactly matches idx_audio_files_location_state and targets the audio_files
table's location_state column so the state filter can use it.
- Around line 88-106: The labeled_snippet_ids CTE is incorrectly using the
upvoter column (lu.upvoted_by) to decide who "labeled" a snippet; change those
predicates to use the label creator column sl.applied_by instead. Specifically,
in the labeled_snippet_ids CTE replace conditions that check lu.upvoted_by =
current_user_id and lu.upvoted_by != current_user_id with sl.applied_by =
current_user_id and sl.applied_by != current_user_id respectively, and apply the
same replacement in the second occurrence of the labeled_snippet_ids CTE
elsewhere in the diff.
🧹 Nitpick comments (2)
supabase/migrations/20260130_optimize_state_source_filters.sql (1)
186-227: Consider consolidating duplicated CTEs to reduce maintenance burden. The five filter CTEs (starred_snippet_ids, labeled_snippet_ids, upvoted_snippet_ids, state_filtered_audio_ids, source_filtered_audio_ids) are duplicated verbatim between the count query (lines 79-119) and the data query (lines 187-227). This creates ~80 lines of duplication and increases the risk of divergence during future modifications. While PostgreSQL doesn't share CTEs across separate statements, consider refactoring to a single WITH block that computes both count and paginated data:
WITH
  -- filter CTEs defined once
  ...,
  base_filtered AS (...),
  count_result AS (SELECT COUNT(*) FROM base_filtered),
  paginated AS (SELECT * FROM base_filtered ORDER BY ... LIMIT ... OFFSET ...)
SELECT ...
This is optional; the current approach may yield better query plans in some cases.
supabase/database/sql/get_snippets_function.sql (1)
1-22: Ensure migration and canonical function files stay synchronized. This file (get_snippets_function.sql) and the migration (20260130_optimize_state_source_filters.sql) both define get_snippets. Any future changes must be applied to both files to avoid drift. Consider documenting this relationship or using a build process that generates the migration from the canonical file.
labeled_snippet_ids AS (
    SELECT DISTINCT sl.snippet
    FROM snippet_labels sl
    JOIN label_upvotes lu ON lu.snippet_label = sl.id
    WHERE has_labeled_filter AND (
        (labeled_by_me AND labeled_by_others) OR
        (labeled_by_me AND NOT labeled_by_others AND lu.upvoted_by = current_user_id) OR
        (labeled_by_others AND NOT labeled_by_me AND lu.upvoted_by != current_user_id)
    )
),
upvoted_snippet_ids AS (
    SELECT DISTINCT sl.snippet
    FROM snippet_labels sl
    JOIN label_upvotes lu ON lu.snippet_label = sl.id
    WHERE has_upvoted_filter AND (
        (filter_upvoted_by_me AND filter_upvoted_by_others) OR
        (filter_upvoted_by_me AND NOT filter_upvoted_by_others AND lu.upvoted_by = current_user_id) OR
        (filter_upvoted_by_others AND NOT filter_upvoted_by_me AND lu.upvoted_by != current_user_id)
    )
Fix labeled_snippet_ids CTE to filter by label creator, not upvoter.
The labeled_snippet_ids CTE (lines 88-97) currently filters by lu.upvoted_by, which checks who upvoted the label. To match the semantic intent of "labeledBy", it should filter by sl.applied_by to check who applied the label to the snippet.
Change line 94 from:
(labeled_by_me AND NOT labeled_by_others AND lu.upvoted_by = current_user_id) OR
(labeled_by_others AND NOT labeled_by_me AND lu.upvoted_by != current_user_id)
To:
(labeled_by_me AND NOT labeled_by_others AND sl.applied_by = current_user_id) OR
(labeled_by_others AND NOT labeled_by_me AND sl.applied_by != current_user_id)
Repeat the same fix at lines 202-203 in the second occurrence of labeled_snippet_ids.
) INTO result
FROM snippets_with_labels s;

total_pages := CEIL(total_count::FLOAT / page_size);
Add validation to prevent division by zero when page_size is 0.
If a caller passes page_size = 0, line 380 will raise a division-by-zero error. Add input validation early in the function.
🛡️ Proposed fix with input validation
Add after the authentication check (around line 41):
IF page_size <= 0 THEN
RAISE EXCEPTION 'page_size must be greater than 0';
END IF;
The update_snippet_like_count() function now tracks both likes (value=1) and dislikes (value=-1) separately, updating the dislike_count column alongside the existing like_count when user_like_snippets changes.
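A condensed sketch of the described behavior, reconstructed from the function body quoted in the review further down (the trigger-return convention shown here is an assumption; the real source is supabase/database/sql/update_snippet_like_count.sql):

CREATE OR REPLACE FUNCTION update_snippet_like_count() RETURNS trigger AS $$
BEGIN
    UPDATE snippets
    SET like_count = (
            SELECT COUNT(*) FROM user_like_snippets
            WHERE snippet = COALESCE(NEW.snippet, OLD.snippet) AND value = 1),
        dislike_count = (
            SELECT COUNT(*) FROM user_like_snippets
            WHERE snippet = COALESCE(NEW.snippet, OLD.snippet) AND value = -1),
        user_last_activity = NOW()
    WHERE id = COALESCE(NEW.snippet, OLD.snippet);
    RETURN COALESCE(NEW, OLD);  -- assumed convention for INSERT/UPDATE/DELETE triggers
END;
$$ LANGUAGE plpgsql;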
Removes the legacy get_recording_filter_options() function, which has zero callers in the codebase and no usage in production logs; the frontend uses get_filtering_options() instead. Also adds missing foreign-key indexes identified by Supabase Database Advisor to improve JOIN performance.
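Illustratively, the cleanup half of that migration might read as follows; the index names come from this PR, but the target tables and columns are inferred from those names and should be treated as assumptions:

```sql
-- Drop the unused legacy function (schema-qualified; see the nitpick below).
DROP FUNCTION IF EXISTS public.get_recording_filter_options();

-- Foreign-key indexes flagged by Supabase Database Advisor.
-- Table/column names are inferred from the index names.
CREATE INDEX IF NOT EXISTS idx_comment_reactions_comment_id
    ON comment_reactions (comment_id);
CREATE INDEX IF NOT EXISTS idx_comments_room_id
    ON comments (room_id);
```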
- Consolidate two separate queries (count + data) into a single CTE chain, eliminating duplicate filter CTEs that ran twice per call (see the sketch after this list)
- Use lightweight filtered_snippets CTE (IDs + sort columns only) for count and pagination, then fetch heavy columns for ~20 paginated rows only
- Fix dislike_count: use denormalized column instead of hardcoded 0
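A minimal sketch of that single-pass shape, with hypothetical filter and sort columns standing in for the function's full CTE chain:

```sql
WITH filtered_snippets AS (
    -- lightweight pass: only IDs and sort columns survive the filters
    SELECT s.id, s.recorded_at
    FROM snippets s
    WHERE s.status = 'Processed'            -- hypothetical filter
),
page AS (
    -- count and paginate in the same scan via a window function
    SELECT id, recorded_at, COUNT(*) OVER () AS total_count
    FROM filtered_snippets
    ORDER BY recorded_at DESC
    LIMIT 20 OFFSET 0
)
-- fetch heavy columns only for the ~20 paginated rows
SELECT p.total_count, s.*
FROM page p
JOIN snippets s ON s.id = p.id
ORDER BY s.recorded_at DESC;
```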
Actionable comments posted: 2
🧹 Nitpick comments (2)
supabase/migrations/20260129_optimize_recording_filter_and_indexes.sql (1)
4-4: Schema-qualify the dropped function for deterministic migrations.
The unqualified DROP FUNCTION depends on search_path and should explicitly target public.get_recording_filter_options(). While IF EXISTS provides safety, schema qualification ensures deterministic execution across different database configurations.
🔧 Proposed change
-DROP FUNCTION IF EXISTS get_recording_filter_options();
+DROP FUNCTION IF EXISTS public.get_recording_filter_options();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@supabase/migrations/20260129_optimize_recording_filter_and_indexes.sql` at line 4, Replace the unqualified DROP FUNCTION with a schema-qualified drop to make migrations deterministic: change the DROP of get_recording_filter_options() to explicitly target public.get_recording_filter_options() (keeping the IF EXISTS) so the migration does not depend on search_path resolution; update the statement around the existing DROP FUNCTION IF EXISTS get_recording_filter_options() accordingly.
supabase/database/sql/update_snippet_like_count.sql (1)
4-19: Consider consolidating the two COUNT subqueries into a single scan.
Both subqueries scan user_like_snippets for the same snippet. Using conditional aggregation reduces this to one table access per trigger invocation.
♻️ Proposed optimization using conditional aggregation
 UPDATE snippets
-  SET
-    like_count = (
-      SELECT COUNT(*)
-      FROM user_like_snippets
-      WHERE snippet = COALESCE(NEW.snippet, OLD.snippet)
-        AND value = 1
-    ),
-    dislike_count = (
-      SELECT COUNT(*)
-      FROM user_like_snippets
-      WHERE snippet = COALESCE(NEW.snippet, OLD.snippet)
-        AND value = -1
-    ),
+  SET
+    like_count = sub.like_count,
+    dislike_count = sub.dislike_count,
     user_last_activity = NOW()
-  WHERE id = COALESCE(NEW.snippet, OLD.snippet);
+  FROM (
+    SELECT
+      COUNT(*) FILTER (WHERE value = 1) AS like_count,
+      COUNT(*) FILTER (WHERE value = -1) AS dislike_count
+    FROM user_like_snippets
+    WHERE snippet = COALESCE(NEW.snippet, OLD.snippet)
+  ) sub
+  WHERE id = COALESCE(NEW.snippet, OLD.snippet);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@supabase/database/sql/update_snippet_like_count.sql` around lines 4 - 19, Replace the two separate COUNT subqueries in update_snippet_like_count.sql with a single aggregated scan over user_like_snippets using conditional aggregation: compute like_count and dislike_count in one SELECT (e.g., SUM(CASE WHEN value = 1 THEN 1 ELSE 0 END) and SUM(CASE WHEN value = -1 THEN 1 ELSE 0 END)) for the snippet identified by COALESCE(NEW.snippet, OLD.snippet) and assign those results to snippets.like_count and snippets.dislike_count while still updating user_last_activity = NOW(); keep references to NEW/OLD.snippet and the snippets table unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@supabase/database/sql/get_snippets_function.sql`:
- Around line 136-143: The LEFT JOIN to user_hide_snippets (alias uhs) is
missing a per-user filter so any user's hide will hide the snippet for everyone;
modify the JOIN condition in get_snippets_function.sql to include the current
user identifier (e.g., add "AND uhs.user_id = <current_user_param>" using the
function's user param such as p_user_id or equivalent) so uhs only matches hides
by the requesting user, and apply the same change at the other occurrence noted
around line 270.
- Around line 96-115: The labeled_snippet_ids CTE is using label_upvotes
(lu.upvoted_by) instead of the label applicant field, causing "labeled by me" to
mirror "upvoted by me"; change the logic in labeled_snippet_ids to check
snippet_labels.applied_by (sl.applied_by) against current_user_id and
remove/avoid relying on lu.upvoted_by for those labeled_by_* conditions (you can
drop the join to label_upvotes or keep it only if needed for other checks), and
update the WHERE clause to use (labeled_by_me AND NOT labeled_by_others AND
sl.applied_by = current_user_id) and (labeled_by_others AND NOT labeled_by_me
AND sl.applied_by != current_user_id) while preserving has_labeled_filter and
DISTINCT sl.snippet.
---
Nitpick comments:
In `@supabase/database/sql/update_snippet_like_count.sql`:
- Around line 4-19: Replace the two separate COUNT subqueries in
update_snippet_like_count.sql with a single aggregated scan over
user_like_snippets using conditional aggregation: compute like_count and
dislike_count in one SELECT (e.g., SUM(CASE WHEN value = 1 THEN 1 ELSE 0 END)
and SUM(CASE WHEN value = -1 THEN 1 ELSE 0 END)) for the snippet identified by
COALESCE(NEW.snippet, OLD.snippet) and assign those results to
snippets.like_count and snippets.dislike_count while still updating
user_last_activity = NOW(); keep references to NEW/OLD.snippet and the snippets
table unchanged.
In `@supabase/migrations/20260129_optimize_recording_filter_and_indexes.sql`:
- Line 4: Replace the unqualified DROP FUNCTION with a schema-qualified drop to
make migrations deterministic: change the DROP of get_recording_filter_options()
to explicitly target public.get_recording_filter_options() (keeping the IF
EXISTS) so the migration does not depend on search_path resolution; update the
statement around the existing DROP FUNCTION IF EXISTS
get_recording_filter_options() accordingly.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
supabase/database/sql/get_snippets_function.sql
supabase/database/sql/update_snippet_like_count.sql
supabase/migrations/20260129_optimize_recording_filter_and_indexes.sql
labeled_snippet_ids AS (
    SELECT DISTINCT sl.snippet
    FROM snippet_labels sl
    JOIN label_upvotes lu ON lu.snippet_label = sl.id
    WHERE has_labeled_filter AND (
        (labeled_by_me AND labeled_by_others) OR
        (labeled_by_me AND NOT labeled_by_others AND lu.upvoted_by = current_user_id) OR
        (labeled_by_others AND NOT labeled_by_me AND lu.upvoted_by != current_user_id)
    )
),
upvoted_snippet_ids AS (
    SELECT DISTINCT sl.snippet
    FROM snippet_labels sl
    JOIN label_upvotes lu ON lu.snippet_label = sl.id
    WHERE has_upvoted_filter AND (
        (filter_upvoted_by_me AND filter_upvoted_by_others) OR
        (filter_upvoted_by_me AND NOT filter_upvoted_by_others AND lu.upvoted_by = current_user_id) OR
        (filter_upvoted_by_others AND NOT filter_upvoted_by_me AND lu.upvoted_by != current_user_id)
    )
),
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check table structures for snippet_labels and label_upvotes
echo "=== Full Database Schema (SQL files) ==="
find . -name "*.sql" -type f | head -20
echo ""
echo "=== Searching for snippet_labels table definition ==="
rg -n "CREATE TABLE.*snippet_labels" --type sql -A 25
echo ""
echo "=== Searching for label_upvotes table definition ==="
rg -n "CREATE TABLE.*label_upvotes" --type sql -A 25Repository: PublicDataWorks/verdad
Length of output: 1332
🏁 Script executed:
#!/bin/bash
echo "=== All SQL files in database directory ==="
find ./supabase/database -name "*.sql" -type f | sort
echo ""
echo "=== Search for 'snippet_labels' mentions ==="
rg "snippet_labels" --type sql -B 2 -A 2 | head -60
echo ""
echo "=== Search for 'label_upvotes' mentions ==="
rg "label_upvotes" --type sql -B 2 -A 2 | head -60
echo ""
echo "=== Check for migrations directory ==="
find ./supabase/database -type d | sort
Repository: PublicDataWorks/verdad
Length of output: 13528
🏁 Script executed:
#!/bin/bash
echo "=== snippet_labels table definition ==="
rg "create table.*snippet_labels" --type sql -A 15 supabase/migrations/
echo ""
echo "=== label_upvotes table definition ==="
rg "create table.*label_upvotes" --type sql -A 15 supabase/migrations/
echo ""
echo "=== Check applied_by column in snippet_labels ==="
rg "applied_by" supabase/migrations/ --type sql -B 2 -A 2Repository: PublicDataWorks/verdad
Length of output: 5524
🏁 Script executed:
#!/bin/bash
echo "=== Extract labeled_snippet_ids and upvoted_snippet_ids from get_snippets_function.sql ==="
sed -n '96,115p' supabase/database/sql/get_snippets_function.sql
echo ""
echo "=== Check how labeled_by_me and filter_upvoted_by_me are defined ==="
rg "labeled_by_me|filter_upvoted_by_me" supabase/database/sql/get_snippets_function.sql -B 2 -A 2Repository: PublicDataWorks/verdad
Length of output: 2702
Fix labeled_snippet_ids CTE to filter by label applicant, not upvote endorser.
Both labeled_snippet_ids and upvoted_snippet_ids CTEs currently use identical logic, joining snippet_labels with label_upvotes and filtering on lu.upvoted_by. The schema distinguishes between label application (snippet_labels.applied_by) and upvote endorsement (label_upvotes.upvoted_by). The labeled_snippet_ids CTE should filter snippets where the current user applied the label (sl.applied_by = current_user_id), not where they upvoted it. Currently, both filters are semantically equivalent, causing the "labeled by me" filter to behave identically to "upvoted by me."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@supabase/database/sql/get_snippets_function.sql` around lines 96 - 115, The
labeled_snippet_ids CTE is using label_upvotes (lu.upvoted_by) instead of the
label applicant field, causing "labeled by me" to mirror "upvoted by me"; change
the logic in labeled_snippet_ids to check snippet_labels.applied_by
(sl.applied_by) against current_user_id and remove/avoid relying on
lu.upvoted_by for those labeled_by_* conditions (you can drop the join to
label_upvotes or keep it only if needed for other checks), and update the WHERE
clause to use (labeled_by_me AND NOT labeled_by_others AND sl.applied_by =
current_user_id) and (labeled_by_others AND NOT labeled_by_me AND sl.applied_by
!= current_user_id) while preserving has_labeled_filter and DISTINCT sl.snippet.
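Under that reading of the schema, the corrected CTE would look roughly like this; a sketch assuming the flag variables from the diff above, with the label_upvotes join dropped since it is no longer needed here:

```sql
labeled_snippet_ids AS (
    SELECT DISTINCT sl.snippet
    FROM snippet_labels sl
    WHERE has_labeled_filter AND (
        (labeled_by_me AND labeled_by_others) OR
        (labeled_by_me AND NOT labeled_by_others AND sl.applied_by = current_user_id) OR
        (labeled_by_others AND NOT labeled_by_me AND sl.applied_by != current_user_id)
    )
),
```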
Summary
Comprehensive database performance optimization fixing timeout errors and improving query efficiency across the application.
Performance Improvements
Changes Made
1. Query Pattern Optimizations
- EXISTS subqueries to CTEs + JOINs for starredBy, labeledBy, upvotedBy
- IN subqueries to CTEs + JOINs (same pattern)
- NOT EXISTS to LEFT JOIN with null check (see the sketch after this list)
- NOT IN with hash pattern
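A sketch of the NOT EXISTS rewrite from the list above, using the user_hide_snippets table referenced elsewhere in this review (current_user_id is the function's user variable; the exact predicates are assumptions):

```sql
-- Before: NOT EXISTS evaluated per snippet row
--   WHERE NOT EXISTS (
--       SELECT 1 FROM user_hide_snippets uhs
--       WHERE uhs.snippet = s.id AND uhs.user_id = current_user_id)

-- After: anti-join via LEFT JOIN plus null check
SELECT s.id
FROM snippets s
LEFT JOIN user_hide_snippets uhs
    ON uhs.snippet = s.id
   AND uhs.user_id = current_user_id
WHERE uhs.snippet IS NULL;
```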
2. Caching
- filter_options_cache materialized view for states/sources/languages
- get_filtering_options and get_recording_filter_options to use cache
- refresh_filter_options_cache() function for maintenance (sketch below)
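A minimal sketch of that cache, assuming a hypothetical radio_stations source table and column names; the real view definition lives in 20260129_optimize_filter_functions.sql:

```sql
-- Precompute the distinct filter options once instead of per request.
CREATE MATERIALIZED VIEW IF NOT EXISTS filter_options_cache AS
SELECT DISTINCT state, source, language    -- columns assumed
FROM radio_stations;                       -- hypothetical source table

-- Convenience wrapper so app code can trigger a refresh.
CREATE OR REPLACE FUNCTION refresh_filter_options_cache()
RETURNS void AS $$
BEGIN
    REFRESH MATERIALIZED VIEW filter_options_cache;
END;
$$ LANGUAGE plpgsql;
```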
3. Index Improvements
Added indexes:
- idx_comment_reactions_comment_id
- idx_comments_room_id
- idx_snippets_stage_1_llm_response
- idx_user_roles_role
- idx_user_hide_snippets_snippet
- idx_snippets_processed_recorded_at (partial index)

Removed unused indexes (~9.5MB saved):
- idx_audio_files_radio_station (9.5MB) - uses cache now
- user_hide_snippets_user_idx (16KB) - use snippet index instead
- idx_label_upvotes_snippet_label_upvoted_by (16KB) - unused composite
get_snippetsORDER BY options):idx_snippets_comment_count,idx_snippets_upvote_count,idx_snippets_like_count,idx_snippets_user_last_activity4. Maintenance
Migrations Applied
- 20260129_optimize_get_snippets_filters.sql - CTE+JOIN pattern for starred/labeled/upvoted filters
- 20260129_optimize_filter_functions.sql - Materialized view cache
- 20260129_optimize_recording_filter_and_indexes.sql - FK indexes
- 20260129_cleanup_unused_indexes.sql - Remove truly unused indexes
- 20260130_optimize_state_source_filters.sql - CTE+JOIN pattern for state/source filters

Cache Maintenance
Call refresh_filter_options_cache() after adding new radio stations, states, or languages.
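For instance, from a SQL console or a scheduled maintenance job:

```sql
-- Rebuild the cached states/sources/languages after reference data changes.
SELECT refresh_filter_options_cache();
```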
Monitoring
Supabase Logs Explorer
pg_stat_statements
Test Plan
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Performance
Chores