[f] VER-263 - Replace temp table creation with CTE query by quancao-ea · Pull Request #19 · PublicDataWorks/verdad

quancao-ea · 2025-09-25T10:18:16Z

Important

Replaces temporary table with CTEs in get_snippets_function.sql to improve query structure and performance.

Behavior:
- Replaces temporary table filtered_snippets with CTEs in get_snippets_function.sql.
- Uses CTEs like_summary, filtered_snippets, paginated_snippets, label_summary, and paginated_snippets_with_labels to organize query logic.
- Removes explicit CREATE TEMP TABLE and DROP TABLE statements.
Performance:
- Potentially improves query performance by using CTEs instead of temporary tables.
Misc:
- Adjusts ordering logic to use CTEs for sorting and pagination.
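
For readers who want to see the shape of the change in isolation, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Postgres, and the three-column snippets table is a hypothetical simplification, not the real schema. It only illustrates that a filter materialised in a temp table and the same filter written as a CTE return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified table; the real schema has many more columns.
    CREATE TABLE snippets (id INTEGER PRIMARY KEY, title TEXT, hidden INTEGER);
    INSERT INTO snippets VALUES (1, 'a', 0), (2, 'b', 1), (3, 'c', 0);
""")

def with_temp_table(conn):
    # Old shape: materialise the filtered rows, query them, then clean up.
    conn.execute(
        "CREATE TEMP TABLE filtered_snippets AS "
        "SELECT * FROM snippets WHERE hidden = 0"
    )
    rows = conn.execute(
        "SELECT id, title FROM filtered_snippets ORDER BY id"
    ).fetchall()
    conn.execute("DROP TABLE filtered_snippets")
    return rows

def with_cte(conn):
    # New shape: the same filter expressed as a CTE inside one statement.
    return conn.execute("""
        WITH filtered_snippets AS (
            SELECT * FROM snippets WHERE hidden = 0
        )
        SELECT id, title FROM filtered_snippets ORDER BY id
    """).fetchall()

temp_rows = with_temp_table(conn)
cte_rows = with_cte(conn)
```

Both calls return the two non-hidden rows. Whether the CTE form is actually faster on the real data is a separate question that only a query plan on Postgres can answer.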

This description was auto-generated for commit 9ed3b63. You can customize this summary. It will automatically update as commits are pushed.

Summary by CodeRabbit

New Features
- Results now include accurate total_count alongside per-page snippets.
Bug Fixes
- Corrected like/dislike counts and label aggregation in snippet results.
- Stabilized sorting by upvotes, comments, activity, and default date ordering.
Refactor
- Streamlined snippet retrieval to compute counts, labels, and pagination in one pass, improving load times and scalability.
- Simplified data flow for more reliable field wiring (upvotes, comments, labels) across result pages.

linear · 2025-09-25T10:18:19Z

VER-263 Replace temp table creation with CTE query

coderabbitai · 2025-09-25T10:18:23Z

Walkthrough

Refactors the get_snippets SQL function to replace a temp-table-based approach with nested CTEs for likes/dislikes, filtering, pagination, and label aggregation. Introduces like_summary, paginated_snippets, label_summary, and paginated_snippets_with_labels, updates ordering and field wiring, and outputs aggregated JSON with labels and total_count in a single pass.

Changes

Cohort / File(s)	Summary
CTE-based refactor and aggregation pipeline `supabase/database/sql/get_snippets_function.sql`	Replaces temp table with CTEs: like_summary, filtered_snippets, paginated_snippets, label_summary, paginated_snippets_with_labels. Rewires like/dislike counts via COALESCE, aggregates labels post-pagination, updates ORDER BY to new fields, and returns final JSON with labels, total_count, and num_of_snippets.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor Client
    participant DB as Postgres
    participant FN as get_snippets()
    participant LS as like_summary CTE
    participant FS as filtered_snippets / paginated_snippets
    participant LB as label_summary
    participant PL as paginated_snippets_with_labels

    Client->>DB: CALL get_snippets(params)
    DB->>FN: Execute function
    FN->>LS: Aggregate likes/dislikes per snippet
    FN->>FS: Filter + sort + paginate snippets (with total_count)
    FN->>LB: Aggregate labels for paginated snippet IDs
    FN->>PL: Join paginated snippets with label aggregation
    FN-->>Client: JSON result {items with labels, total_count, num_of_snippets}

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

[f] VER-261 - Optimize get snippets with labels #18 — Similar refactor of get_snippets label and upvote aggregation using CTEs and revised current-user/upvote wiring.

Suggested reviewers

nhphong

Poem

I hop through CTE fields so wide,
No temp-tables where I hide.
Likes and labels neatly spun,
Paginated paths—now one run.
JSON carrots, crisp and bright,
Snippets sorted just right. 🥕

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title clearly and concisely describes the main change of the pull request—replacing temporary table creation with a CTE query—and references the relevant issue, making it immediately obvious to reviewers what the primary update is.
Linked Issues Check	✅ Passed	The changes remove the temporary table workflow and implement the equivalent logic using nested CTEs, directly fulfilling the linked issue’s requirement to replace temp table creation with a CTE query.
Out of Scope Changes Check	✅ Passed	All modifications in the pull request are focused on restructuring the query to use CTEs and updating associated aggregations and ordering logic, with no unrelated or extraneous changes introduced outside the scope of replacing the temp table approach.
Docstring Coverage	✅ Passed	No functions found in the changes. Docstring coverage check skipped.

gemini-code-assist · 2025-09-25T10:18:32Z

Summary of Changes

Hello @quancao-ea, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the get_snippets function (defined in get_snippets_function.sql) to improve performance and maintainability. The core change replaces the temporary table with a chain of Common Table Expressions (CTEs). A key optimization is that label-related data is now computed only for the snippets actually returned in the paginated results, rather than for the whole filtered set, which should reduce query execution time.

Highlights

- Refactoring to CTEs: The get_snippets function in get_snippets_function.sql has been refactored to replace the temporary table (filtered_snippets) with a series of Common Table Expressions (CTEs).
- Performance Optimization for Labels: Label data, including upvote counts and user upvote status, is now calculated only for the snippets in the currently requested page of results, reducing unnecessary computation.
- Streamlined Pagination Logic: The pagination logic (LIMIT, OFFSET) and the total count calculation now live inside the CTE structure, improving query readability and potentially performance.
- Removal of Temp Table Overhead: The explicit creation and dropping of a temporary table have been eliminated, which can reduce overhead and simplify query execution.
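
The pagination point can be sketched as follows. This is an illustrative example only: it runs on SQLite (3.25 or newer, for window functions) through Python's sqlite3 module rather than on Postgres, and the table contents, the zero-based page argument, and the fetch_page helper are assumptions made for the demo. The CTE names mirror the ones in this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snippets (id INTEGER PRIMARY KEY, recorded_at TEXT)")
conn.executemany(
    "INSERT INTO snippets VALUES (?, ?)",
    [(i, f"2025-09-{i:02d}") for i in range(1, 8)],  # 7 demo rows
)

def fetch_page(conn, page, page_size):
    # COUNT(*) OVER () is evaluated before LIMIT/OFFSET, so every returned
    # row carries the size of the whole filtered set, not just of the page.
    return conn.execute(
        """
        WITH filtered_snippets AS (
            SELECT id, recorded_at FROM snippets
        ),
        paginated_snippets AS (
            SELECT fs.*, COUNT(*) OVER () AS num_of_snippets
            FROM filtered_snippets fs
            ORDER BY fs.recorded_at DESC
            LIMIT ? OFFSET ?
        )
        SELECT id, num_of_snippets FROM paginated_snippets
        """,
        (page_size, page * page_size),
    ).fetchall()

first_page = fetch_page(conn, page=0, page_size=3)
last_page = fetch_page(conn, page=2, page_size=3)
```

Each row on either page reports 7 as num_of_snippets, which is what lets the function return the page and the total from a single statement.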

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request is a solid improvement, refactoring the get_snippets function to use Common Table Expressions (CTEs) instead of a temporary table. This enhances the query's structure and readability. My review focuses on further optimizing performance and improving maintainability. I've suggested using a LATERAL join for more efficient label data aggregation, and refactoring parts of the query to reduce code duplication and make the final JSON construction more robust. Overall, these are great changes.

gemini-code-assist · 2025-09-25T10:20:23Z

+        LEFT JOIN (
+            SELECT
+                snippet_label,
+                COUNT(*) AS upvote_count,
+                BOOL_OR(upvoted_by = current_user_id) AS upvoted_by_me
+            FROM label_upvotes lu
+            GROUP BY snippet_label
+        ) lu ON lu.snippet_label = sl.id


The current implementation of the lu subquery calculates aggregates over the entire label_upvotes table for every function call, which can be inefficient if the table grows large. A more performant approach is to use a LATERAL join. This will calculate upvote counts only for the labels belonging to the snippets on the current page, significantly reducing the amount of data processed.

LEFT JOIN LATERAL (
    SELECT
        COUNT(*) AS upvote_count,
        BOOL_OR(upvoted_by = current_user_id) AS upvoted_by_me
    FROM public.label_upvotes lu
    WHERE lu.snippet_label = sl.id
) lu ON TRUE
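
To make the difference concrete, here is a rough sketch of the two shapes side by side. SQLite has no LATERAL, so correlated subqueries stand in for it here, and MAX over a 0/1 comparison stands in for BOOL_OR. The snippet_labels.snippet column and the sample rows are assumptions for the demo. It shows only that the per-label form returns the same values as the global aggregate; it says nothing about timing on Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified tables.
    CREATE TABLE snippet_labels (id INTEGER PRIMARY KEY, snippet INTEGER);
    CREATE TABLE label_upvotes (snippet_label INTEGER, upvoted_by TEXT);
    INSERT INTO snippet_labels VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO label_upvotes VALUES (10, 'alice'), (10, 'bob'), (20, 'alice');
""")

# Shape in the PR: aggregate the whole label_upvotes table, then join.
GLOBAL_AGGREGATE = """
    SELECT sl.id, COALESCE(lu.upvote_count, 0), COALESCE(lu.upvoted_by_me, 0)
    FROM snippet_labels sl
    LEFT JOIN (
        SELECT snippet_label,
               COUNT(*) AS upvote_count,
               MAX(upvoted_by = :me) AS upvoted_by_me
        FROM label_upvotes
        GROUP BY snippet_label
    ) lu ON lu.snippet_label = sl.id
    WHERE sl.snippet = :snippet
    ORDER BY sl.id
"""

# Suggested shape: aggregate per label row, only for labels that are selected.
PER_LABEL = """
    SELECT sl.id,
           (SELECT COUNT(*) FROM label_upvotes lu
             WHERE lu.snippet_label = sl.id),
           (SELECT COALESCE(MAX(lu.upvoted_by = :me), 0) FROM label_upvotes lu
             WHERE lu.snippet_label = sl.id)
    FROM snippet_labels sl
    WHERE sl.snippet = :snippet
    ORDER BY sl.id
"""

params = {"me": "bob", "snippet": 1}
global_rows = conn.execute(GLOBAL_AGGREGATE, params).fetchall()
per_label_rows = conn.execute(PER_LABEL, params).fetchall()
```

Label 11 has no upvotes, so both forms report a count of 0 for it. In the Postgres LATERAL version, BOOL_OR over zero rows would be NULL rather than false, which may need a COALESCE if callers expect a boolean.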

gemini-code-assist · 2025-09-25T10:20:23Z

+        ORDER BY
+            CASE
+                WHEN p_order_by = 'upvotes' THEN ps.upvote_count + ps.like_count
+                WHEN p_order_by = 'comments' THEN ps.comment_count
+                WHEN p_order_by = 'activities' THEN
+                    CASE
+                        WHEN ps.user_last_activity IS NULL THEN 0
+                        ELSE EXTRACT(EPOCH FROM ps.user_last_activity)
+                    END
+            END DESC,
+            ps.recorded_at DESC -- Default for all other cases, including p_order_by = 'latest'


This ORDER BY clause is a duplicate of the one in the paginated_snippets CTE (lines 272-282). To improve maintainability and avoid having to update sorting logic in two places, you can remove this ORDER BY clause and apply the ordering directly within the jsonb_agg function in the final SELECT statement.

For example:

SELECT
    jsonb_agg(
        jsonb_build_object(...)
        ORDER BY
            CASE ... END DESC,
            ps.recorded_at DESC
    ),
    MAX(ps.num_of_snippets)
INTO result, total_count
FROM paginated_snippets_with_labels ps;
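
As a side note on why one copy of the sort expression is enough, the sketch below runs the same CASE-based ORDER BY pattern on SQLite through Python's sqlite3 module. The table, its rows, and the ordered_ids helper are made up for illustration, and the 'activities' branch is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified table with only the sort-related columns.
    CREATE TABLE snippets (
        id INTEGER PRIMARY KEY, upvote_count INTEGER, like_count INTEGER,
        comment_count INTEGER, recorded_at TEXT
    );
    INSERT INTO snippets VALUES
        (1, 5, 0, 1, '2025-09-01'),
        (2, 1, 1, 9, '2025-09-02'),
        (3, 0, 0, 0, '2025-09-03');
""")

def ordered_ids(conn, order_by):
    # A CASE with no matching branch yields NULL for every row, so the
    # secondary key (recorded_at DESC) decides the order on its own.
    rows = conn.execute(
        """
        SELECT id FROM snippets
        ORDER BY
            CASE
                WHEN :order_by = 'upvotes' THEN upvote_count + like_count
                WHEN :order_by = 'comments' THEN comment_count
            END DESC,
            recorded_at DESC
        """,
        {"order_by": order_by},
    ).fetchall()
    return [r[0] for r in rows]

by_upvotes = ordered_ids(conn, "upvotes")
by_comments = ordered_ids(conn, "comments")
by_latest = ordered_ids(conn, "latest")
```

A single expression covers every sort mode, so it only needs to appear at the point where the final row order is fixed.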

gemini-code-assist · 2025-09-25T10:20:23Z

+            jsonb_build_object(
+                'id', ps.id,
+                'recorded_at', ps.recorded_at,
+                'user_last_activity', ps.user_last_activity,
+                'duration', ps.duration,
+                'start_time', ps.start_time,
+                'end_time', ps.end_time,
+                'file_path', ps.file_path,
+                'file_size', ps.file_size,
+                'political_leaning', ps.political_leaning,
+                'title', ps.title,
+                'summary', ps.summary,
+                'explanation', ps.explanation,
+                'confidence_scores', ps.confidence_scores,
+                'language', ps.language,
+                'context', ps.context,
+                'labels', ps.labels,
+                'audio_file', ps.audio_file,
+                'starred_by_user', ps.starred_by_user,
+                'user_like_status', ps.user_like_status,
+                'hidden', ps.hidden,
+                'like_count', ps.like_count,
+                'dislike_count', ps.dislike_count
+            )


Instead of manually constructing the JSON object by listing every column, you can use to_jsonb(ps) - 'num_of_snippets' to build it more concisely. This approach automatically includes all columns from the paginated_snippets_with_labels CTE (except for the internal num_of_snippets column), making the code easier to maintain. If you add or remove columns from the filtered_snippets CTE in the future, they will be automatically reflected in the output without needing to modify this part of the query. Note that the output is only identical to the explicit list if the CTE exposes no other columns: sort-only columns such as upvote_count and comment_count, which the ORDER BY uses but the explicit list omits, would also appear in the JSON unless they are subtracted as well.

to_jsonb(ps) - 'num_of_snippets'
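
The trade-off can be illustrated outside Postgres. The sketch below is only an analogy: it uses Python's sqlite3.Row and a dict comprehension in place of to_jsonb(ps) - 'num_of_snippets', and the four-column row is invented for the demo.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
# Hypothetical row: two payload columns, one sort-only column, one internal one.
row = conn.execute(
    "SELECT 1 AS id, 'Title' AS title, 4 AS upvote_count, 12 AS num_of_snippets"
).fetchone()

# Explicit listing: only the named keys reach the payload.
explicit = {"id": row["id"], "title": row["title"]}

# Whole-row serialisation minus one key: every other column comes along.
whole_row = {k: row[k] for k in row.keys() if k != "num_of_snippets"}

extra_keys = sorted(set(whole_row) - set(explicit))
payload = json.dumps(whole_row, sort_keys=True)
```

Here extra_keys ends up as ['upvote_count']: the whole-row form is shorter to write, but the response shape now follows whatever columns the CTE happens to expose.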

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9b41936 and 9ed3b63.

📒 Files selected for processing (1)

supabase/database/sql/get_snippets_function.sql (4 hunks)

coderabbitai · 2025-09-25T10:20:50Z

+        MAX(ps.num_of_snippets)
+    INTO result, total_count


⚠️ Potential issue | 🟠 Major

Coalesce total_count so empty pages don’t return NULL pagination metadata

When the filtered set is empty, MAX(ps.num_of_snippets) yields NULL, so total_count stays NULL, CEIL(total_count::FLOAT / page_size) propagates NULL, and the response surfaces num_of_snippets/total_pages as null. That breaks downstream consumers expecting integers (e.g., page counts on the first empty page). Please coalesce the aggregate to 0 before assigning.

-        MAX(ps.num_of_snippets)
+        COALESCE(MAX(ps.num_of_snippets), 0)

🤖 Prompt for AI Agents

In supabase/database/sql/get_snippets_function.sql around lines 366 to 367, the MAX(ps.num_of_snippets) aggregate can return NULL for empty result sets so total_count becomes NULL and breaks pagination; change the assignment to coalesce the aggregate to 0 (e.g., use COALESCE(MAX(ps.num_of_snippets), 0)) before INTO result, total_count so empty pages yield 0 for total_count and subsequent CEIL/num_of_snippets calculations return integers.
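
A quick way to see the NULL behaviour is the sketch below, which runs on SQLite through Python's sqlite3 module rather than on Postgres. The paginated_snippets table here is a plain table standing in for the CTE, and total_count is a helper invented for the demo. MAX over zero rows behaves the same way in both databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the paginated_snippets_with_labels CTE, initially empty.
conn.execute("CREATE TABLE paginated_snippets (id INTEGER, num_of_snippets INTEGER)")

def total_count(conn, coalesce):
    # Choose between the original aggregate and the coalesced variant.
    expr = "COALESCE(MAX(num_of_snippets), 0)" if coalesce else "MAX(num_of_snippets)"
    return conn.execute(f"SELECT {expr} FROM paginated_snippets").fetchone()[0]

empty_raw = total_count(conn, coalesce=False)   # aggregate over zero rows
empty_safe = total_count(conn, coalesce=True)

conn.execute("INSERT INTO paginated_snippets VALUES (1, 42), (2, 42)")
filled_safe = total_count(conn, coalesce=True)
```

With no rows the raw aggregate comes back as NULL (None in Python), while the coalesced form returns 0. One caveat: a page requested beyond the last row is also empty, so in that case the coalesced 0 would not be the true total of the filtered set.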

ellipsis-dev

Caution

Changes requested ❌

Reviewed everything up to 9ed3b63 in 2 minutes and 32 seconds.

Reviewed 209 lines of code in 1 file.
Skipped 0 files when reviewing.
Skipped posting 2 draft comments, listed below.

1. supabase/database/sql/get_snippets_function.sql:32

Draft comment:
Good job replacing the temporary table with CTEs. This refactor improves clarity and avoids side‐effects associated with temp tables.
Reason this comment was not posted:
Confidence that changes are required: 0% (threshold: 50%).

2. supabase/database/sql/get_snippets_function.sql:306

Draft comment:
Consider refactoring the IN (SELECT …) clause in the label_summary CTE to a JOIN if possible. This may improve clarity and performance when filtering snippet labels.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50%.

The suggestion appears technically correct: using a JOIN instead of IN (SELECT...) is often more performant in SQL. The change would be straightforward and maintains the same logic. However, both approaches are valid and the performance difference may be negligible since paginated_snippets is already filtered down by LIMIT/OFFSET. The current code is also quite readable and clear in its intent.

The performance benefit may be minimal since paginated_snippets is already limited in size. The current code might actually be more readable and explicit about the filtering intent.

While the suggestion is technically valid, the current code is clear and the optimization may not provide meaningful benefits given the limited dataset size. This is a valid but minor optimization suggestion. The current code is clear and likely performs well enough given the context.

Workflow ID: wflow_17fiSkMHetPAeyrF

ellipsis-dev · 2025-09-25T10:20:52Z

+    ),
+    paginated_snippets AS (
+        SELECT fs.*, COUNT(*) OVER() AS num_of_snippets
+        FROM filtered_snippets fs


The ORDER BY logic is duplicated in both paginated_snippets and paginated_snippets_with_labels CTEs. Consolidating the ordering logic into one place would reduce maintenance overhead and potential inconsistencies.

ellipsis-dev · 2025-09-25T10:20:52Z

+        LEFT JOIN (
+            SELECT
+                snippet_id,
+                jsonb_agg(


If the order of labels within each snippet is important, consider adding an explicit ORDER BY inside the jsonb_agg call. Without it, the aggregated JSON array order may be unpredictable.

quancao-ea added 2 commits September 25, 2025 11:43

[f] Move like_summary to CTE

08ecb2c

[f] Convert temp table creation to CTEs

9ed3b63

quancao-ea self-assigned this Sep 25, 2025

gemini-code-assist Bot reviewed Sep 25, 2025

coderabbitai Bot reviewed Sep 25, 2025

ellipsis-dev Bot reviewed Sep 25, 2025

quancao-ea requested a review from nhphong September 25, 2025 10:36

nhphong approved these changes Sep 26, 2025

quancao-ea merged commit ad7aa47 into main Sep 26, 2025
2 checks passed

quancao-ea deleted the features/replace-temp-table-creation-with-CTE-query branch September 30, 2025 04:11

coderabbitai Bot mentioned this pull request Jan 15, 2026

fix: resolve search timeout by using HNSW index #54

Open

3 tasks

coderabbitai Bot mentioned this pull request Jan 29, 2026

VER-304: Optimize get_snippets filters and searching to fix timeout issues #57

Merged

6 tasks
