
[f] VER-263 - Replace temp table creation with CTE query #19

Merged: quancao-ea merged 2 commits into main from features/replace-temp-table-creation-with-CTE-query on Sep 26, 2025.

@quancao-ea (Collaborator) commented Sep 25, 2025

Important

Replaces the temporary table with CTEs in get_snippets_function.sql to improve query structure and performance.

  • Behavior:
    • Replaces the temporary table filtered_snippets with CTEs in get_snippets_function.sql.
    • Uses the CTEs like_summary, filtered_snippets, paginated_snippets, label_summary, and paginated_snippets_with_labels to organize the query logic.
    • Removes the explicit CREATE TEMP TABLE and DROP TABLE statements.
  • Performance:
    • Potentially improves performance by avoiding the overhead of creating, populating, and dropping a temporary table on every call.
  • Misc:
    • Adjusts the ordering logic to use the CTEs for sorting and pagination.

This description was created by Ellipsis for 9ed3b63.
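A minimal sketch of the pattern, for readers comparing it with the old temp-table flow: instead of materializing intermediate rows with CREATE TEMP TABLE, each step is a named CTE that the next step reads from, all inside one statement. The tables, columns, and the vote encoding (1 = like, -1 = dislike) below are hypothetical stand-ins defined inline with VALUES so the query runs on its own; the real schema used by get_snippets_function.sql is not shown on this page.

```sql
-- Hypothetical inline data; in the real function these would be tables.
WITH snippets(id, title) AS (
    VALUES (1, 'first'), (2, 'second'), (3, 'third')
),
snippet_likes(snippet, vote) AS (
    -- Assumed encoding for this sketch: 1 = like, -1 = dislike.
    VALUES (1, 1), (1, 1), (1, -1), (2, -1)
),
-- Step 1: aggregate likes and dislikes once per snippet.
like_summary AS (
    SELECT
        snippet,
        COUNT(*) FILTER (WHERE vote = 1)  AS like_count,
        COUNT(*) FILTER (WHERE vote = -1) AS dislike_count
    FROM snippet_likes
    GROUP BY snippet
),
-- Step 2: join the summary back. Snippets without any votes have no
-- like_summary row, so COALESCE turns the resulting NULLs into 0.
filtered_snippets AS (
    SELECT
        s.id,
        s.title,
        COALESCE(ls.like_count, 0)    AS like_count,
        COALESCE(ls.dislike_count, 0) AS dislike_count
    FROM snippets s
    LEFT JOIN like_summary ls ON ls.snippet = s.id
)
-- Step 3: consume the last CTE; no DROP TABLE is needed afterwards.
SELECT * FROM filtered_snippets ORDER BY id;
```

Snippet 3 has no votes at all, so it has no like_summary row; the COALESCE calls keep its counts at 0 rather than NULL.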

Summary by CodeRabbit

  • New Features
    • Results now include accurate total_count alongside per-page snippets.
  • Bug Fixes
    • Corrected like/dislike counts and label aggregation in snippet results.
    • Stabilized sorting by upvotes, comments, activity, and default date ordering.
  • Refactor
    • Streamlined snippet retrieval to compute counts, labels, and pagination in one pass, improving load times and scalability.
    • Simplified data flow for more reliable field wiring (upvotes, comments, labels) across result pages.
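Returning an accurate total_count alongside a single page relies on a property of window functions: they are evaluated before LIMIT and OFFSET are applied, so COUNT(*) OVER () in the paginated_snippets CTE sees the whole filtered set. A small self-contained check with made-up rows follows; the page size of 2 and the page-2 offset are hypothetical values hard-coded in place of the function's parameters.

```sql
WITH filtered_snippets(id, recorded_at) AS (
    -- Hypothetical filtered result: five snippets.
    VALUES (1, DATE '2025-09-01'),
           (2, DATE '2025-09-02'),
           (3, DATE '2025-09-03'),
           (4, DATE '2025-09-04'),
           (5, DATE '2025-09-05')
),
paginated_snippets AS (
    SELECT
        fs.*,
        -- Window functions are evaluated before LIMIT/OFFSET, so every row
        -- on the page carries the size of the whole filtered set.
        COUNT(*) OVER () AS num_of_snippets
    FROM filtered_snippets fs
    ORDER BY fs.recorded_at DESC
    LIMIT 2   -- hypothetical page_size
    OFFSET 2  -- hypothetical offset for page 2: (2 - 1) * 2
)
SELECT id, num_of_snippets
FROM paginated_snippets
ORDER BY recorded_at DESC;
```

Only two rows (ids 3 and 2) are returned, yet each reports num_of_snippets = 5. One consequence to keep in mind: a page past the end of the results returns no rows at all, so there is then no row from which to read the count.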

@quancao-ea self-assigned this Sep 25, 2025

@linear (bot) commented Sep 25, 2025

@coderabbitai (bot) commented Sep 25, 2025

Walkthrough

Refactors the get_snippets SQL function to replace a temp-table-based approach with nested CTEs for likes/dislikes, filtering, pagination, and label aggregation. Introduces like_summary, paginated_snippets, label_summary, and paginated_snippets_with_labels, updates ordering and field wiring, and outputs aggregated JSON with labels and total_count in a single pass.

Changes

| Cohort / File(s) | Summary |
|---|---|
| CTE-based refactor and aggregation pipeline (supabase/database/sql/get_snippets_function.sql) | Replaces temp table with CTEs: like_summary, filtered_snippets, paginated_snippets, label_summary, paginated_snippets_with_labels. Rewires like/dislike counts via COALESCE, aggregates labels post-pagination, updates ORDER BY to new fields, and returns final JSON with labels, total_count, and num_of_snippets. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Client
    participant DB as Postgres
    participant FN as get_snippets()
    participant LS as like_summary CTE
    participant FS as filtered_snippets / paginated_snippets
    participant LB as label_summary
    participant PL as paginated_snippets_with_labels

    Client->>DB: SELECT get_snippets(params)
    DB->>FN: Execute function
    FN->>LS: Aggregate likes/dislikes per snippet
    FN->>FS: Filter + sort + paginate snippets (with total_count)
    FN->>LB: Aggregate labels for paginated snippet IDs
    FN->>PL: Join paginated snippets with label aggregation
    FN-->>Client: JSON result {items with labels, total_count, num_of_snippets}
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Suggested reviewers

  • nhphong

Poem

I hop through CTE fields so wide,
No temp-tables where I hide.
Likes and labels neatly spun,
Paginated paths—now one run.
JSON carrots, crisp and bright,
Snippets sorted just right. 🥕

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title clearly and concisely describes the main change of the pull request (replacing temporary table creation with a CTE query) and references the relevant issue, making it immediately obvious to reviewers what the primary update is. |
| Linked Issues Check | ✅ Passed | The changes remove the temporary table workflow and implement the equivalent logic using nested CTEs, directly fulfilling the linked issue’s requirement to replace temp table creation with a CTE query. |
| Out of Scope Changes Check | ✅ Passed | All modifications in the pull request are focused on restructuring the query to use CTEs and updating associated aggregations and ordering logic, with no unrelated or extraneous changes introduced outside the scope of replacing the temp table approach. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @quancao-ea, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the get_snippets_function.sql function to improve both performance and maintainability. The core change replaces the temporary-table approach with a chain of Common Table Expressions (CTEs) in a single query. A key optimization is that label data is now aggregated only for the snippets actually returned on the requested page, rather than for the whole filtered set, which should speed up query execution.

Highlights

  • Refactoring to CTEs: The get_snippets_function.sql function has been refactored to replace the temporary table (filtered_snippets) with a series of Common Table Expressions (CTEs).
  • Performance Optimization for Labels: Label data, including upvote counts and user upvote status, is now joined only for the snippets on the currently requested page, reducing unnecessary computation.
  • Streamlined Pagination Logic: The pagination logic (LIMIT, OFFSET) and the total count calculation are now integrated more cleanly into the CTE structure, improving readability and possibly performance.
  • Removal of Temp Table Overhead: The explicit creation and dropping of a temporary table have been eliminated, which can reduce overhead and simplify query execution.

@gemini-code-assist (bot) left a comment

Code Review

This pull request is a solid improvement, refactoring the get_snippets function to use Common Table Expressions (CTEs) instead of a temporary table. This enhances the query's structure and readability. My review focuses on further optimizing performance and improving maintainability. I've suggested using a LATERAL join for more efficient label data aggregation, and refactoring parts of the query to reduce code duplication and make the final JSON construction more robust. Overall, these are great changes.

Comment on lines +298 to +305
```sql
LEFT JOIN (
    SELECT
        snippet_label,
        COUNT(*) AS upvote_count,
        BOOL_OR(upvoted_by = current_user_id) AS upvoted_by_me
    FROM label_upvotes lu
    GROUP BY snippet_label
) lu ON lu.snippet_label = sl.id
```

Priority: high

The lu subquery currently aggregates over the entire label_upvotes table on every function call, which becomes expensive as the table grows. A LATERAL join is likely more efficient: it computes upvote counts only for the labels that belong to snippets on the current page, which significantly reduces the amount of data processed, provided label_upvotes has an index on snippet_label.

```sql
LEFT JOIN LATERAL (
    SELECT
        COUNT(*) AS upvote_count,
        BOOL_OR(upvoted_by = current_user_id) AS upvoted_by_me
    FROM public.label_upvotes lu
    WHERE lu.snippet_label = sl.id
) lu ON TRUE
```
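A self-contained version of the LATERAL form, with hypothetical inline rows standing in for public.snippet_labels and public.label_upvotes ('u1' plays the role of current_user_id). It also surfaces a behavioural difference worth checking before switching: an aggregate without GROUP BY always returns one row, so a label with no upvotes gets upvote_count = 0 from the LATERAL subquery, whereas the existing LEFT JOIN onto a grouped subquery yields NULL for that label (unless surrounding code not shown in this excerpt already applies COALESCE).

```sql
-- Hypothetical stand-ins for public.snippet_labels and public.label_upvotes.
WITH snippet_labels(id, snippet_id) AS (
    VALUES (1, 100), (2, 100), (3, 200)
),
label_upvotes(snippet_label, upvoted_by) AS (
    VALUES (1, 'u1'), (1, 'u2'), (3, 'u2')
)
SELECT
    sl.id,
    votes.upvote_count,
    votes.upvoted_by_me
FROM snippet_labels sl
-- For each label row, the LATERAL subquery aggregates only that label's upvotes.
LEFT JOIN LATERAL (
    SELECT
        COUNT(*) AS upvote_count,
        -- 'u1' stands in for the function's current_user_id parameter.
        BOOL_OR(up.upvoted_by = 'u1') AS upvoted_by_me
    FROM label_upvotes up
    WHERE up.snippet_label = sl.id
) votes ON TRUE
ORDER BY sl.id;
```

Label 2 has no upvotes: here it gets upvote_count = 0, while upvoted_by_me is NULL in both forms because BOOL_OR over zero rows returns NULL.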

Comment on lines +327 to +337
```sql
ORDER BY
    CASE
        WHEN p_order_by = 'upvotes' THEN ps.upvote_count + ps.like_count
        WHEN p_order_by = 'comments' THEN ps.comment_count
        WHEN p_order_by = 'activities' THEN
            CASE
                WHEN ps.user_last_activity IS NULL THEN 0
                ELSE EXTRACT(EPOCH FROM ps.user_last_activity)
            END
    END DESC,
    ps.recorded_at DESC -- Default for all other cases, including p_order_by = 'latest'
```

Priority: medium

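This ORDER BY clause duplicates the one in the paginated_snippets CTE (lines 272-282). To avoid maintaining the sorting logic in two places, you can remove this ORDER BY clause and apply the ordering directly inside the jsonb_agg call in the final SELECT statement. This is also more reliable: feeding an aggregate from a sorted CTE or subquery usually preserves the order in PostgreSQL but can break if the outer query adds a join or other processing, whereas an ORDER BY inside the aggregate call explicitly defines the element order.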

For example:

```sql
SELECT
    jsonb_agg(
        jsonb_build_object(...)
        ORDER BY
            CASE ... END DESC,
            ps.recorded_at DESC
    ),
    MAX(ps.num_of_snippets)
INTO result, total_count
FROM paginated_snippets_with_labels ps;
```

Comment on lines +341 to +364
```sql
jsonb_build_object(
    'id', ps.id,
    'recorded_at', ps.recorded_at,
    'user_last_activity', ps.user_last_activity,
    'duration', ps.duration,
    'start_time', ps.start_time,
    'end_time', ps.end_time,
    'file_path', ps.file_path,
    'file_size', ps.file_size,
    'political_leaning', ps.political_leaning,
    'title', ps.title,
    'summary', ps.summary,
    'explanation', ps.explanation,
    'confidence_scores', ps.confidence_scores,
    'language', ps.language,
    'context', ps.context,
    'labels', ps.labels,
    'audio_file', ps.audio_file,
    'starred_by_user', ps.starred_by_user,
    'user_like_status', ps.user_like_status,
    'hidden', ps.hidden,
    'like_count', ps.like_count,
    'dislike_count', ps.dislike_count
)
```

Priority: medium

Instead of manually constructing the JSON object by listing every column, you could use to_jsonb(ps) - 'num_of_snippets' to get a similar result more concisely. This automatically includes every column of the paginated_snippets_with_labels CTE except the internal num_of_snippets column, so columns added to or removed from the filtered_snippets CTE later are reflected in the output without changing this part of the query. Note that the result is not identical to the current object: ps also carries columns such as upvote_count and comment_count (read by the ORDER BY above but not listed in jsonb_build_object), and those would appear in the output unless they are removed as well.

```sql
to_jsonb(ps) - 'num_of_snippets'
```
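A quick illustration of what to_jsonb on a row alias produces, using a hypothetical one-row CTE. Every column becomes a key, so any helper column (upvote_count here) would also need removing; the jsonb - operator also accepts a text[] to remove several keys at once.

```sql
-- Hypothetical one-row stand-in for paginated_snippets_with_labels.
WITH paginated_snippets_with_labels(id, title, upvote_count, num_of_snippets) AS (
    VALUES (1, 'first', 3, 10)
)
SELECT
    -- Every column of ps becomes a key:
    -- {"id": 1, "title": "first", "upvote_count": 3, "num_of_snippets": 10}
    to_jsonb(ps) AS all_columns,
    -- Removing several keys at once with jsonb - text[]:
    -- {"id": 1, "title": "first"}
    to_jsonb(ps) - ARRAY['num_of_snippets', 'upvote_count'] AS trimmed
FROM paginated_snippets_with_labels ps;
```

Whether automatic passthrough of new columns is desirable is a judgment call: it removes duplication, but a column added for internal use then reaches API consumers unless it is explicitly subtracted.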

@coderabbitai (bot) left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories


📥 Commits

Reviewing files that changed from the base of the PR and between 9b41936 and 9ed3b63.

📒 Files selected for processing (1)
  • supabase/database/sql/get_snippets_function.sql (4 hunks)

Comment on lines +366 to +367
```sql
MAX(ps.num_of_snippets)
INTO result, total_count
```

⚠️ Potential issue | 🟠 Major

Coalesce total_count so empty pages don’t return NULL pagination metadata

When the filtered set is empty, MAX(ps.num_of_snippets) yields NULL, so total_count stays NULL, CEIL(total_count::FLOAT / page_size) propagates NULL, and the response surfaces num_of_snippets/total_pages as null. That breaks downstream consumers expecting integers (e.g., page counts on the first empty page). Please coalesce the aggregate to 0 before assigning.

```diff
-        MAX(ps.num_of_snippets)
+        COALESCE(MAX(ps.num_of_snippets), 0)
```
🤖 Prompt for AI Agents

In supabase/database/sql/get_snippets_function.sql around lines 366 to 367, the MAX(ps.num_of_snippets) aggregate can return NULL for empty result sets, so total_count becomes NULL and breaks pagination. Change the assignment to coalesce the aggregate to 0 (e.g., use COALESCE(MAX(ps.num_of_snippets), 0)) before INTO result, total_count, so empty pages yield 0 for total_count and subsequent CEIL/num_of_snippets calculations return integers.
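The NULL behaviour described above is easy to reproduce: an aggregate query without GROUP BY returns exactly one row even when no input rows survive the WHERE clause, and most aggregates (COUNT being the notable exception) yield NULL for it. A hypothetical one-row CTE filtered down to nothing stands in for an empty page; the page size of 10 is an assumption.

```sql
WITH paginated_snippets_with_labels(id, num_of_snippets) AS (
    VALUES (1, 1)
)
SELECT
    jsonb_agg(ps.id)                      AS raw_result,   -- NULL
    MAX(ps.num_of_snippets)               AS raw_total,    -- NULL
    COALESCE(MAX(ps.num_of_snippets), 0)  AS total_count,  -- 0
    -- 10 is a hypothetical page_size.
    CEIL(COALESCE(MAX(ps.num_of_snippets), 0)::FLOAT / 10) AS total_pages  -- 0
FROM paginated_snippets_with_labels ps
WHERE false;  -- simulate an empty filtered set / empty page
```

jsonb_agg behaves the same way as MAX here, so the result variable filled by the same SELECT ... INTO is also NULL for an empty page.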

@ellipsis-dev (bot) left a comment

Caution

Changes requested ❌

Reviewed everything up to 9ed3b63 in 2 minutes and 32 seconds.
  • Reviewed 209 lines of code in 1 file.
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments (listed below).
1. supabase/database/sql/get_snippets_function.sql:32
  • Draft comment:
    Good job replacing the temporary table with CTEs. This refactor improves clarity and avoids side‐effects associated with temp tables.
  • Reason this comment was not posted:
    Confidence that changes are required: 0% (threshold: 50%).
2. supabase/database/sql/get_snippets_function.sql:306
  • Draft comment:
    Consider refactoring the IN (SELECT …) clause in the label_summary CTE to a JOIN if possible. This may improve clarity and performance when filtering snippet labels.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence 30% vs. threshold 50%). The suggestion is technically valid: using a JOIN instead of IN (SELECT ...) is often more performant, the change would be straightforward, and it preserves the logic. However, both approaches are valid, and any performance difference is likely negligible because paginated_snippets is already limited by LIMIT/OFFSET. The current code is also readable and explicit about its filtering intent, so this is at most a minor optimization.
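To compare the two forms discussed in this draft, here is a self-contained query over hypothetical inline rows (snippet_labels.snippet_id is an assumed column name). For a unique key such as paginated_snippets.id both forms return the same rows; an inner JOIN would duplicate label rows if the joined side could contain the same id twice, which IN cannot. PostgreSQL's planner typically converts an uncorrelated IN (SELECT ...) into a semi-join anyway, so running EXPLAIN on the real query is the reliable way to see whether the rewrite changes anything.

```sql
WITH paginated_snippets(id) AS (
    VALUES (1), (3)              -- hypothetical ids on the current page
),
snippet_labels(id, snippet_id) AS (
    VALUES (10, 1), (11, 2), (12, 3)
)
SELECT
    -- Form in the current label_summary CTE: filter with IN (SELECT ...).
    ARRAY(
        SELECT sl.id
        FROM snippet_labels sl
        WHERE sl.snippet_id IN (SELECT ps.id FROM paginated_snippets ps)
        ORDER BY sl.id
    ) AS via_in,
    -- Suggested alternative: an inner JOIN on the same key.
    ARRAY(
        SELECT sl.id
        FROM snippet_labels sl
        JOIN paginated_snippets ps ON ps.id = sl.snippet_id
        ORDER BY sl.id
    ) AS via_join;
```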

Workflow ID: wflow_17fiSkMHetPAeyrF


```sql
),
paginated_snippets AS (
    SELECT fs.*, COUNT(*) OVER() AS num_of_snippets
    FROM filtered_snippets fs
```

The ORDER BY logic is duplicated in both paginated_snippets and paginated_snippets_with_labels CTEs. Consolidating the ordering logic into one place would reduce maintenance overhead and potential inconsistencies.
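One possible way to consolidate the ordering, sketched with hypothetical inline data and a hard-coded stand-in for p_order_by: compute the sort expression once as a ROW_NUMBER() rank in paginated_snippets, then let later CTEs and the final jsonb_agg order by that rank instead of repeating the CASE. This is an alternative technique, not what the PR does.

```sql
WITH filtered_snippets(id, comment_count, recorded_at) AS (
    VALUES (1, 5, DATE '2025-09-01'),
           (2, 9, DATE '2025-09-02'),
           (3, 5, DATE '2025-09-03'),
           (4, 1, DATE '2025-09-04')
),
params(p_order_by) AS (VALUES ('comments')),  -- hypothetical parameter value
paginated_snippets AS (
    SELECT
        fs.*,
        COUNT(*) OVER () AS num_of_snippets,
        -- The sort expression is written once and materialized as a rank.
        ROW_NUMBER() OVER (
            ORDER BY
                CASE WHEN p.p_order_by = 'comments' THEN fs.comment_count END DESC,
                fs.recorded_at DESC
        ) AS sort_rank
    FROM filtered_snippets fs
    CROSS JOIN params p
    ORDER BY sort_rank
    LIMIT 2 OFFSET 0   -- hypothetical page_size = 2, first page
)
-- Later stages only need ORDER BY sort_rank instead of repeating the CASE.
SELECT id, sort_rank, num_of_snippets
FROM paginated_snippets
ORDER BY sort_rank;
```

Only ids 2 and 3 come back (ranks 1 and 2), each still carrying num_of_snippets = 4. Adding a unique tie-breaker such as id to the window's ORDER BY would keep the rank stable when two snippets share the same sort values.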

```sql
LEFT JOIN (
    SELECT
        snippet_id,
        jsonb_agg(
```

If the order of labels within each snippet is important, consider adding an explicit ORDER BY inside the jsonb_agg call. Without it, the aggregated JSON array order may be unpredictable.
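An explicit ordering inside jsonb_agg, shown over hypothetical label rows grouped per snippet. Sorting by upvote count and then by label text is an assumed choice for illustration; the text tie-breaker keeps the order stable when counts are equal.

```sql
-- Hypothetical label rows; real column names in the function may differ.
WITH label_rows(snippet_id, label_text, upvote_count) AS (
    VALUES (1, 'b', 2), (1, 'a', 2), (1, 'c', 5), (2, 'x', 0)
)
SELECT
    snippet_id,
    -- ORDER BY inside the aggregate fixes the element order per group.
    jsonb_agg(
        jsonb_build_object('text', label_text, 'upvote_count', upvote_count)
        ORDER BY upvote_count DESC, label_text
    ) AS labels
FROM label_rows
GROUP BY snippet_id
ORDER BY snippet_id;
```

For snippet 1 the labels come out as c, a, b regardless of the order in which the rows were read.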

@quancao-ea requested a review from nhphong on September 25, 2025, 10:36
@quancao-ea merged commit ad7aa47 into main on Sep 26, 2025
2 checks passed
@quancao-ea deleted the features/replace-temp-table-creation-with-CTE-query branch on September 30, 2025, 04:11