-
Notifications
You must be signed in to change notification settings - Fork 4
ENG-431 Separate literal and references in concept data #200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the 📝 WalkthroughWalkthroughThis update restructures the "Concept" table by splitting its content into "literal_content" and "reference_content", introduces a generated "refs" column for efficient reference indexing, and adds a suite of recursive SQL functions for traversing and filtering concept relationships. New GIN indexes are created to optimize querying on these new columns. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant DB
Client->>DB: Query sub_concepts_rec(target_id)
DB-->>Client: Return all recursively referenced sub-concept IDs
Client->>DB: Query concept_neighbourhood(target_id)
DB-->>Client: Return all related concept IDs (inbound and outbound)
Client->>DB: Query in_concepts_rec_filtered_ids(target_id, concept_types)
DB-->>Client: Return referencing concept IDs filtered by type
Suggested reviewers
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
4cfcea0 to
253026f
Compare
|
@CodeRabbit review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (11)
packages/database/supabase/migrations/20250609140958_concept_indexing.sql (11)
1-3: Duplicate: apply empty-array default in migration function
Same as the schema version—wraparray_agginCOALESCEhere to avoidNULLon no matches.
10-10: Duplicate: handle NULL aggregates forREFS
Mirror the earlierCOALESCEsuggestion soREFSnever comes backNULL.
12-12: Duplicate: consider JSONB index onreference_content
If you need JSON-level filtering on the raw content, add a GIN index on the JSONB column as well.
50-57: Duplicate: self-join bug in term-filtered recursion
Matches the issue inin_concepts_rec_filtered_ids; apply the same JOIN fix.
73-77: Duplicate: fix invocation inconcept_neighbourhood_filtered_ids
Same change applies here for the filtered-IDs variant.
79-83: Duplicate: fix invocation inconcept_neighbourhood_filtered_terms
Same correction for the term-filtered neighbourhood.
99-103: Duplicate: self-join issue in CL filtered IDs
The same JOIN duplication appears here; collapse into one alias with= ANY().
105-109: Duplicate: self-join issue in CL filtered terms
Apply the same fix for the term-filtered CL version.
140-144: Duplicate: fix invocation in CL neighbourhood
Same SFR-in-SELECT issue—move function call intoFROM.
146-150: Duplicate: fix invocation in filtered CL neighbourhood (IDs)
Apply the same invocation change here.
152-156: Duplicate: fix invocation in filtered CL neighbourhood (terms)
Same correction for the term-filtered CL neighbourhood.
🧹 Nitpick comments (8)
packages/database/supabase/schemas/concept.sql (3)
15-17: Consider defaulting to an empty array inextract_references
Without aCOALESCE,array_aggreturnsNULLwhen there are no array elements. Wrap it to ensure an empty array is returned.CREATE OR REPLACE FUNCTION extract_references(refs JSONB) RETURNS BIGINT[] LANGUAGE SQL IMMUTABLE AS $$ - SELECT array_agg(i::bigint) FROM (SELECT jsonb_array_elements(jsonb_path_query_array(refs, '$.*[*]')) i) exrefs; + SELECT COALESCE(array_agg(i::bigint), '{}') FROM (SELECT jsonb_array_elements(jsonb_path_query_array(refs, '$.*[*]')) i) exrefs; $$;
60-62: Evaluate JSONB index type forliteral_content
jsonb_path_opsonly supports key-existence queries. If you need full JSON search, consider switching tojsonb_ops.
64-64: Consider indexing rawreference_contentJSONB
You’ve indexedrefsfor array lookups, but direct JSONB property searches onreference_contentmay benefit from a GIN index on the JSONB itself.packages/database/supabase/migrations/20250609140958_concept_indexing.sql (5)
8-8: Confirm default{}forreference_contentis intended
Using a non-NULL default hides “no references” vs “empty object” semantics; ensure that aligns with your data model.
23-29: Guard againstNULLinschema_in_type_terms
If no names match,array_aggyieldsNULLand propagates through. Wrap inCOALESCE(type_list, '{}')before passing along.
32-39: Refactor inbound recursion for clarity
Use a direct join and= ANY()instead of a cross-join with overlap for better readability and likely performance:WITH RECURSIVE t(x) AS ( - VALUES (target_id) UNION ALL - SELECT id FROM public."Concept", t - WHERE ARRAY[t.x] && refs + VALUES (target_id) + UNION ALL + SELECT c.id + FROM public."Concept" AS c + JOIN t ON t.x = ANY(c.refs) )
59-65: Optional: useLATERAL UNNESTinsub_concepts_rec
Making the UNNEST explicit via LATERAL improves readability:WITH RECURSIVE t(x) AS ( VALUES (target_id) - UNION ALL - SELECT UNNEST(refs) FROM public."Concept", t WHERE id = t.x + UNION ALL + SELECT ref + FROM public."Concept" AS c + JOIN t ON c.id = t.x + CROSS JOIN LATERAL UNNEST(c.refs) AS ref )
86-89: Consider= ANY()instead of array overlap
Inin_concepts_cl,target_id = ANY(refs)is clearer and may be more efficient thanARRAY[...] && refs.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
packages/database/supabase/migrations/20250609140958_concept_indexing.sql(1 hunks)packages/database/supabase/schemas/concept.sql(4 hunks)
🔇 Additional comments (8)
packages/database/supabase/schemas/concept.sql (1)
32-34: Ensure existing data is migrated toliteral_content
Confirm that any pre-existingcontentJSONB values are correctly loaded intoliteral_contentor they’ll be lost.packages/database/supabase/migrations/20250609140958_concept_indexing.sql (7)
5-6: Verify no dependencies on the oldcontentname
Renaming the column and its index must be coordinated with any views, queries, or ORM layers still referencingcontent.
15-21: Approve: simple type-membership test
schema_in_type_idsis concise and performant for checking membership.
91-97: Approve: lateral recursion in CL variant
The LATERAL approach inin_concepts_cl_recis concise and accurate.
112-118: Approve: filtered CL recursion pattern
The LATERAL CTE pattern here is consistent and correct.
121-127: Approve: term-filtered CL recursion
This mirrors its ID-based counterpart correctly.
129-131: Approve: basic outboundsub_concepts
The simpleUNNESTapproach is correct for non-recursive queries.
133-138: Approve: recursive CL-based sub-concepts
The CTE with LATERAL unnest is implemented correctly.
| CREATE OR REPLACE FUNCTION public.concept_neighbourhood(target_id BIGINT) RETURNS TABLE(id BIGINT) LANGUAGE sql STABLE AS $$ | ||
| WITH super_structs AS (SELECT public.in_concepts_rec(target_id) AS id), | ||
| all_concepts_rep AS (SELECT public.sub_concepts_rec(id) AS id FROM super_structs) | ||
| SELECT DISTINCT id FROM all_concepts_rep | ||
| $$; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: incorrect function invocation in concept_neighbourhood
Set-returning functions must be in the FROM clause to expand rows. For example:
WITH super_structs AS (
- SELECT public.in_concepts_rec(target_id) AS id
+ SELECT id FROM public.in_concepts_rec(target_id)
),📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| CREATE OR REPLACE FUNCTION public.concept_neighbourhood(target_id BIGINT) RETURNS TABLE(id BIGINT) LANGUAGE sql STABLE AS $$ | |
| WITH super_structs AS (SELECT public.in_concepts_rec(target_id) AS id), | |
| all_concepts_rep AS (SELECT public.sub_concepts_rec(id) AS id FROM super_structs) | |
| SELECT DISTINCT id FROM all_concepts_rep | |
| $$; | |
| CREATE OR REPLACE FUNCTION public.concept_neighbourhood(target_id BIGINT) | |
| RETURNS TABLE(id BIGINT) | |
| LANGUAGE sql | |
| STABLE AS $$ | |
| WITH super_structs AS ( | |
| SELECT id | |
| FROM public.in_concepts_rec(target_id) | |
| ), | |
| all_concepts_rep AS ( | |
| SELECT public.sub_concepts_rec(id) AS id | |
| FROM super_structs | |
| ) | |
| SELECT DISTINCT id | |
| FROM all_concepts_rep | |
| $$; |
🤖 Prompt for AI Agents
In packages/database/supabase/migrations/20250609140958_concept_indexing.sql
around lines 67 to 71, the function calls to set-returning functions
public.in_concepts_rec and public.sub_concepts_rec are used incorrectly as
scalar expressions. To fix this, rewrite the function calls to appear in the
FROM clause using the proper syntax (e.g., SELECT id FROM
public.in_concepts_rec(target_id)) so that the set-returning functions expand
rows correctly. Adjust the CTEs accordingly to use FROM clauses for these
functions instead of calling them directly as expressions.
| CREATE OR REPLACE FUNCTION public.in_concepts_rec_filtered_ids(target_id BIGINT, concept_types BIGINT[]) RETURNS TABLE(id BIGINT) LANGUAGE sql STABLE AS $$ | ||
| WITH RECURSIVE t(x) AS ( | ||
| values (target_id) UNION ALL | ||
| SELECT si.id FROM t, public."Concept" AS si | ||
| JOIN public."Concept" USING(id) | ||
| WHERE ARRAY[t.x] && si.refs AND public.schema_in_type_ids(public."Concept".schema_id, concept_types)) | ||
| SELECT DISTINCT x AS id FROM t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: remove redundant self-join in in_concepts_rec_filtered_ids
The JOIN ... USING(id) duplicates rows unnecessarily. Collapse into one alias and use = ANY() for clarity:
WITH RECURSIVE t(x) AS (
VALUES (target_id)
- UNION ALL
- SELECT si.id FROM t, public."Concept" AS si
- JOIN public."Concept" USING(id)
- WHERE ARRAY[t.x] && si.refs AND public.schema_in_type_ids(public."Concept".schema_id, concept_types)
+ UNION ALL
+ SELECT c.id
+ FROM public."Concept" AS c
+ JOIN t ON t.x = ANY(c.refs)
+ WHERE public.schema_in_type_ids(c.schema_id, concept_types)
)🤖 Prompt for AI Agents
In packages/database/supabase/migrations/20250609140958_concept_indexing.sql
around lines 41 to 47, the function in_concepts_rec_filtered_ids contains a
redundant self-join on the Concept table using JOIN ... USING(id), which causes
unnecessary row duplication. Remove the self-join and rewrite the recursive
query to use a single alias for the Concept table. Replace the array overlap
operator with = ANY() for checking if t.x is in si.refs to improve clarity and
correctness.
|
@maparent This is listed as a subtask to ENG-332: Roam Platform to Concept translation, but I am unclear how it is releated. Could you list all of the proposed additions and give examples of how/where they will be used? Remember, less is more for us in this stage. |
fa330ce to
664e622
Compare
Fair enough, I was planning on doing more of this tomorrow. I could delay a lot of the functions; what's important is the table changes (So really the first 15 lines of the migration.) The point is first to have good indexes for the neighbourhood queries; and then to pre-fill some of the neighbourhood queries themselves. It's a toolbox. But sure, they can be inserted back one by one. ETA: The reason I want the first step now is that it will impact how we will insert data; so doing it correctly now will avoid a lot of ugly migrations. |
c3dab1c to
f31f66b
Compare
6a5bb4b to
7c87d8b
Compare
Allows to index of references.
7c87d8b to
85202e8
Compare
mdroidian
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
allows for easy indexing of references. Add some recursive function from Conversence.
Summary by CodeRabbit