[f] VER-263 - Replace temp table creation with CTE query#19
Conversation
WalkthroughRefactors the get_snippets SQL function to replace a temp-table-based approach with nested CTEs for likes/dislikes, filtering, pagination, and label aggregation. Introduces like_summary, paginated_snippets, label_summary, and paginated_snippets_with_labels, updates ordering and field wiring, and outputs aggregated JSON with labels and total_count in a single pass. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Client
participant DB as Postgres
participant FN as get_snippets()
participant LS as like_summary CTE
participant FS as filtered_snippets / paginated_snippets
participant LB as label_summary
participant PL as paginated_snippets_with_labels
Client->>DB: CALL get_snippets(params)
DB->>FN: Execute function
FN->>LS: Aggregate likes/dislikes per snippet
FN->>FS: Filter + sort + paginate snippets (with total_count)
FN->>LB: Aggregate labels for paginated snippet IDs
FN->>PL: Join paginated snippets with label aggregation
FN-->>Client: JSON result {items with labels, total_count, num_of_snippets}
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (5 passed)
✨ Finishing touches🧪 Generate unit tests
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @quancao-ea, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request is a solid improvement, refactoring the get_snippets function to use Common Table Expressions (CTEs) instead of a temporary table. This enhances the query's structure and readability. My review focuses on further optimizing performance and improving maintainability. I've suggested using a LATERAL join for more efficient label data aggregation, and refactoring parts of the query to reduce code duplication and make the final JSON construction more robust. Overall, these are great changes.
| LEFT JOIN ( | ||
| SELECT | ||
| snippet_label, | ||
| COUNT(*) AS upvote_count, | ||
| BOOL_OR(upvoted_by = current_user_id) AS upvoted_by_me | ||
| FROM label_upvotes lu | ||
| GROUP BY snippet_label | ||
| ) lu ON lu.snippet_label = sl.id |
There was a problem hiding this comment.
The current implementation of the lu subquery calculates aggregates over the entire label_upvotes table for every function call, which can be inefficient if the table grows large. A more performant approach is to use a LATERAL join. This will calculate upvote counts only for the labels belonging to the snippets on the current page, significantly reducing the amount of data processed.
LEFT JOIN LATERAL (
SELECT
COUNT(*) AS upvote_count,
BOOL_OR(upvoted_by = current_user_id) AS upvoted_by_me
FROM public.label_upvotes lu
WHERE lu.snippet_label = sl.id
) lu ON TRUE| ORDER BY | ||
| CASE | ||
| WHEN p_order_by = 'upvotes' THEN ps.upvote_count + ps.like_count | ||
| WHEN p_order_by = 'comments' THEN ps.comment_count | ||
| WHEN p_order_by = 'activities' THEN | ||
| CASE | ||
| WHEN ps.user_last_activity IS NULL THEN 0 | ||
| ELSE EXTRACT(EPOCH FROM ps.user_last_activity) | ||
| END | ||
| END DESC, | ||
| ps.recorded_at DESC -- Default for all other cases, including p_order_by = 'latest' |
There was a problem hiding this comment.
This ORDER BY clause is a duplicate of the one in the paginated_snippets CTE (lines 272-282). To improve maintainability and avoid having to update sorting logic in two places, you can remove this ORDER BY clause and apply the ordering directly within the jsonb_agg function in the final SELECT statement.
For example:
SELECT
jsonb_agg(
jsonb_build_object(...)
ORDER BY
CASE ... END DESC,
ps.recorded_at DESC
),
MAX(ps.num_of_snippets)
INTO result, total_count
FROM paginated_snippets_with_labels ps;| jsonb_build_object( | ||
| 'id', ps.id, | ||
| 'recorded_at', ps.recorded_at, | ||
| 'user_last_activity', ps.user_last_activity, | ||
| 'duration', ps.duration, | ||
| 'start_time', ps.start_time, | ||
| 'end_time', ps.end_time, | ||
| 'file_path', ps.file_path, | ||
| 'file_size', ps.file_size, | ||
| 'political_leaning', ps.political_leaning, | ||
| 'title', ps.title, | ||
| 'summary', ps.summary, | ||
| 'explanation', ps.explanation, | ||
| 'confidence_scores', ps.confidence_scores, | ||
| 'language', ps.language, | ||
| 'context', ps.context, | ||
| 'labels', ps.labels, | ||
| 'audio_file', ps.audio_file, | ||
| 'starred_by_user', ps.starred_by_user, | ||
| 'user_like_status', ps.user_like_status, | ||
| 'hidden', ps.hidden, | ||
| 'like_count', ps.like_count, | ||
| 'dislike_count', ps.dislike_count | ||
| ) |
There was a problem hiding this comment.
Instead of manually constructing the JSON object by listing every column, you can use to_jsonb(ps) - 'num_of_snippets' to achieve the same result more concisely and robustly. This approach automatically includes all columns from the paginated_snippets_with_labels CTE (except for the internal num_of_snippets column), making the code easier to maintain. If you add or remove columns from the filtered_snippets CTE in the future, they will be automatically reflected in the output without needing to modify this part of the query.
to_jsonb(ps) - 'num_of_snippets'There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
supabase/database/sql/get_snippets_function.sql(4 hunks)
| MAX(ps.num_of_snippets) | ||
| INTO result, total_count |
There was a problem hiding this comment.
Coalesce total_count so empty pages don’t return NULL pagination metadata
When the filtered set is empty, MAX(ps.num_of_snippets) yields NULL, so total_count stays NULL, CEIL(total_count::FLOAT / page_size) propagates NULL, and the response surfaces num_of_snippets/total_pages as null. That breaks downstream consumers expecting integers (e.g., page counts on the first empty page). Please coalesce the aggregate to 0 before assigning.
- MAX(ps.num_of_snippets)
+ COALESCE(MAX(ps.num_of_snippets), 0)🤖 Prompt for AI Agents
In supabase/database/sql/get_snippets_function.sql around lines 366 to 367, the
MAX(ps.num_of_snippets) aggregate can return NULL for empty result sets so
total_count becomes NULL and breaks pagination; change the assignment to
coalesce the aggregate to 0 (e.g., use COALESCE(MAX(ps.num_of_snippets), 0))
before INTO result, total_count so empty pages yield 0 for total_count and
subsequent CEIL/num_of_snippets calculations return integers.
There was a problem hiding this comment.
Caution
Changes requested ❌
Reviewed everything up to 9ed3b63 in 2 minutes and 32 seconds. Click for details.
- Reviewed
209lines of code in1files - Skipped
0files when reviewing. - Skipped posting
2draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. supabase/database/sql/get_snippets_function.sql:32
- Draft comment:
Good job replacing the temporary table with CTEs. This refactor improves clarity and avoids side‐effects associated with temp tables. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
2. supabase/database/sql/get_snippets_function.sql:306
- Draft comment:
Consider refactoring the IN (SELECT …) clause in the label_summary CTE to a JOIN if possible. This may improve clarity and performance when filtering snippet labels. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50% The suggestion appears technically correct - using a JOIN instead of IN (SELECT...) is often more performant in SQL. The change would be straightforward and maintains the same logic. However, both approaches are valid and the performance difference may be negligible since paginated_snippets is already filtered down by LIMIT/OFFSET. The current code is also quite readable and clear in its intent. The performance benefit may be minimal since paginated_snippets is already limited in size. The current code might actually be more readable and explicit about the filtering intent. While the suggestion is technically valid, the current code is clear and the optimization may not provide meaningful benefits given the limited dataset size. This is a valid but minor optimization suggestion. The current code is clear and likely performs well enough given the context.
Workflow ID: wflow_17fiSkMHetPAeyrF
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
| ), | ||
| paginated_snippets AS ( | ||
| SELECT fs.*, COUNT(*) OVER() AS num_of_snippets | ||
| FROM filtered_snippets fs |
There was a problem hiding this comment.
The ORDER BY logic is duplicated in both paginated_snippets and paginated_snippets_with_labels CTEs. Consolidating the ordering logic into one place would reduce maintenance overhead and potential inconsistencies.
| LEFT JOIN ( | ||
| SELECT | ||
| snippet_id, | ||
| jsonb_agg( |
There was a problem hiding this comment.
If the order of labels within each snippet is important, consider adding an explicit ORDER BY inside the jsonb_agg call. Without it, the aggregated JSON array order may be unpredictable.
Important
Replaces temporary table with CTEs in
get_snippets_function.sqlto improve query structure and performance.filtered_snippetswith CTEs inget_snippets_function.sql.like_summary,filtered_snippets,paginated_snippets,label_summary, andpaginated_snippets_with_labelsto organize query logic.CREATE TEMP TABLEandDROP TABLEstatements.This description was created by
for 9ed3b63. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit