
@maparent
Collaborator

@maparent maparent commented Jun 9, 2025

Allows for easy indexing of references. Adds some recursive functions from Conversence.

Summary by CodeRabbit

  • New Features
    • Introduced advanced querying capabilities for exploring concept relationships, including functions to find related concepts and neighborhoods based on references and types.
    • Added support for recursive queries to traverse concept hierarchies and connections.
  • Database Changes
    • Split the existing content field into separate literal and reference content fields for improved data organization.
    • Added a new field to store extracted references and optimized querying with new indexes.

@vercel

vercel bot commented Jun 9, 2025

The latest updates on your projects.

Name | Status | Updated (UTC)
discourse-graph | ✅ Ready | Jun 12, 2025 11:54pm

@coderabbitai
Contributor

coderabbitai bot commented Jun 9, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


Walkthrough

This update restructures the "Concept" table by splitting its content into "literal_content" and "reference_content", introduces a generated "refs" column for efficient reference indexing, and adds a suite of recursive SQL functions for traversing and filtering concept relationships. New GIN indexes are created to optimize querying on these new columns.

Changes

File(s) | Change Summary
.../migrations/20250609140958_concept_indexing.sql | Migration: Splits "content" into "literal_content" and "reference_content", adds "refs" column, renames indexes, and introduces multiple recursive SQL functions for concept relationship traversal and filtering.
.../schemas/concept.sql | Schema: Mirrors migration by defining new columns, indexes, and all recursive/non-recursive SQL functions for concept reference graph traversal and filtering.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant DB

    Client->>DB: Query sub_concepts_rec(target_id)
    DB-->>Client: Return all recursively referenced sub-concept IDs

    Client->>DB: Query concept_neighbourhood(target_id)
    DB-->>Client: Return all related concept IDs (inbound and outbound)

    Client->>DB: Query in_concepts_rec_filtered_ids(target_id, concept_types)
    DB-->>Client: Return referencing concept IDs filtered by type

Suggested reviewers

  • mdroidian

Poem

In the garden of concepts, roots intertwine,
Now split and indexed, their branches align.
Recursive bunnies hop, through references deep,
With GIN-powered leaps, no secrets they keep.
The warren grows richer, with functions anew—
Oh, what a fine burrow for data to pursue! 🐇



@linear

linear bot commented Jun 9, 2025

@maparent
Collaborator Author

maparent commented Jun 9, 2025

@CodeRabbit review

@coderabbitai
Contributor

coderabbitai bot commented Jun 9, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (11)
packages/database/supabase/migrations/20250609140958_concept_indexing.sql (11)

1-3: Duplicate: apply empty-array default in migration function
Same as the schema version—wrap array_agg in COALESCE here to avoid NULL on no matches.


10-10: Duplicate: handle NULL aggregates for REFS
Mirror the earlier COALESCE suggestion so REFS never comes back NULL.


12-12: Duplicate: consider JSONB index on reference_content
If you need JSON-level filtering on the raw content, add a GIN index on the JSONB column as well.


50-57: Duplicate: self-join bug in term-filtered recursion
Matches the issue in in_concepts_rec_filtered_ids; apply the same JOIN fix.


73-77: Duplicate: fix invocation in concept_neighbourhood_filtered_ids
Same change applies here for the filtered-IDs variant.


79-83: Duplicate: fix invocation in concept_neighbourhood_filtered_terms
Same correction for the term-filtered neighbourhood.


99-103: Duplicate: self-join issue in CL filtered IDs
The same JOIN duplication appears here; collapse into one alias with = ANY().


105-109: Duplicate: self-join issue in CL filtered terms
Apply the same fix for the term-filtered CL version.


140-144: Duplicate: fix invocation in CL neighbourhood
Same SRF-in-SELECT issue: move the function call into FROM.


146-150: Duplicate: fix invocation in filtered CL neighbourhood (IDs)
Apply the same invocation change here.


152-156: Duplicate: fix invocation in filtered CL neighbourhood (terms)
Same correction for the term-filtered CL neighbourhood.

🧹 Nitpick comments (8)
packages/database/supabase/schemas/concept.sql (3)

15-17: Consider defaulting to an empty array in extract_references
Without a COALESCE, array_agg returns NULL when there are no array elements. Wrap it to ensure an empty array is returned.

 CREATE OR REPLACE FUNCTION extract_references(refs JSONB) RETURNS BIGINT[] LANGUAGE SQL IMMUTABLE AS $$
-  SELECT array_agg(i::bigint) FROM (SELECT jsonb_array_elements(jsonb_path_query_array(refs, '$.*[*]')) i) exrefs;
+  SELECT COALESCE(array_agg(i::bigint), '{}') FROM (SELECT jsonb_array_elements(jsonb_path_query_array(refs, '$.*[*]')) i) exrefs;
 $$;
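To make the COALESCE point concrete, here is a minimal Python model of what extract_references computes. It assumes reference_content is a JSON object whose values are arrays of concept ids (the keys in the example, such as "cites", are hypothetical); the jsonpath '$.*[*]' visits every element of every top-level member value. This is a sketch of the intended semantics, not the SQL function itself.

```python
from __future__ import annotations

import json


def extract_references(reference_content: str) -> list[int]:
    """Model of extract_references: collect every id reached by '$.*[*]'.

    Assumption: reference_content is a JSON object mapping keys to arrays
    of concept ids. Returns [] (not None) when nothing is found, mirroring
    COALESCE(array_agg(i::bigint), '{}').
    """
    data = json.loads(reference_content)
    refs: list[int] = []
    for value in data.values():  # '$.*': each top-level member value
        # In jsonpath lax mode, '[*]' applied to a non-array yields the value itself.
        items = value if isinstance(value, list) else [value]
        for item in items:  # '[*]': each element of that array
            refs.append(int(item))
    return refs


print(extract_references('{"author": [12], "cites": [34, 56]}'))  # [12, 34, 56]
print(extract_references("{}"))  # []
```

Without the COALESCE, the second call would correspond to a NULL refs value in SQL rather than an empty array.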

60-62: Evaluate JSONB index type for literal_content
jsonb_path_ops supports only the containment and jsonpath operators (@>, @?, @@), not the key-existence operators (?, ?|, ?&). If you need key-existence queries, consider switching to the default jsonb_ops.


64-64: Consider indexing raw reference_content JSONB
You’ve indexed refs for array lookups, but direct JSONB property searches on reference_content may benefit from a GIN index on the JSONB itself.

packages/database/supabase/migrations/20250609140958_concept_indexing.sql (5)

8-8: Confirm default {} for reference_content is intended
Using a non-NULL default means a concept with no references can no longer be distinguished from one whose reference_content was never set; ensure that aligns with your data model.


23-29: Guard against NULL in schema_in_type_terms
If no names match, array_agg yields NULL and propagates through. Wrap in COALESCE(type_list, '{}') before passing along.


32-39: Refactor inbound recursion for clarity
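Use a direct join and = ANY() instead of a cross-join with overlap for better readability. Be aware that a GIN index on refs can serve &&, @> and <@ but not = ANY(), so the overlap form may perform better on large tables: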

 WITH RECURSIVE t(x) AS (
-  VALUES (target_id) UNION ALL
-  SELECT id FROM public."Concept", t
-  WHERE ARRAY[t.x] && refs
+  VALUES (target_id)
+ UNION ALL
+  SELECT c.id
+  FROM public."Concept" AS c
+  JOIN t ON t.x = ANY(c.refs)
 )
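The inbound recursion can be modelled in a few lines of Python, which also makes one design point visible. The sketch below uses a plain dict (concept id to its refs list) as a hypothetical stand-in for public."Concept". It keeps a visited set, which corresponds to writing UNION rather than UNION ALL in the recursive CTE: with UNION ALL, a reference cycle (A refs B, B refs A) keeps feeding rows back into the recursion, whereas UNION discards rows already produced and lets it terminate.

```python
from __future__ import annotations

from collections import deque


def in_concepts_rec(concepts: dict[int, list[int]], target_id: int) -> set[int]:
    """target_id plus every concept that references it, directly or transitively.

    concepts is a hypothetical in-memory stand-in for public."Concept",
    mapping each concept id to its refs array.
    """
    found = {target_id}  # VALUES (target_id)
    frontier = deque([target_id])
    while frontier:
        x = frontier.popleft()
        for cid, refs in concepts.items():
            # Same test as t.x = ANY(c.refs) or ARRAY[t.x] && c.refs
            if x in refs and cid not in found:
                found.add(cid)  # the visited set plays the role of UNION
                frontier.append(cid)
    return found


graph = {1: [], 2: [1], 3: [2], 4: [3, 1], 5: []}
print(sorted(in_concepts_rec(graph, 1)))  # [1, 2, 3, 4]
print(sorted(in_concepts_rec({1: [2], 2: [1]}, 1)))  # [1, 2]: the cycle terminates
```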

59-65: Optional: use LATERAL UNNEST in sub_concepts_rec
Making the UNNEST explicit via LATERAL improves readability:

 WITH RECURSIVE t(x) AS (
   VALUES (target_id)
-  UNION ALL
-  SELECT UNNEST(refs) FROM public."Concept", t WHERE id = t.x
+  UNION ALL
+  SELECT ref
+  FROM public."Concept" AS c
+  JOIN t ON c.id = t.x
+  CROSS JOIN LATERAL UNNEST(c.refs) AS ref
 )
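For comparison, the outbound direction follows refs forward from the target. A minimal Python sketch under the same assumptions (a hypothetical dict from concept id to refs standing in for the table, and a visited set in place of UNION ALL followed by DISTINCT). As with UNNEST, an id that is referenced but has no row of its own is still returned, it just is not expanded further.

```python
from __future__ import annotations

from collections import deque


def sub_concepts_rec(concepts: dict[int, list[int]], target_id: int) -> set[int]:
    """target_id plus every concept it references, directly or transitively."""
    found = {target_id}  # VALUES (target_id)
    frontier = deque([target_id])
    while frontier:
        x = frontier.popleft()
        # JOIN t ON c.id = t.x, then UNNEST(c.refs); an id with no row yields nothing
        for ref in concepts.get(x, []):
            if ref not in found:
                found.add(ref)
                frontier.append(ref)
    return found


graph = {1: [2, 3], 2: [4], 3: [], 4: [], 5: [1]}
print(sorted(sub_concepts_rec(graph, 1)))  # [1, 2, 3, 4]
```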

86-89: Consider = ANY() instead of array overlap
In in_concepts_cl, target_id = ANY(refs) is clearer than ARRAY[...] && refs, but only the && form can use the GIN index on refs, so benchmark before switching.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a8ab3ef and 253026f.

📒 Files selected for processing (2)
  • packages/database/supabase/migrations/20250609140958_concept_indexing.sql (1 hunks)
  • packages/database/supabase/schemas/concept.sql (4 hunks)
🔇 Additional comments (8)
packages/database/supabase/schemas/concept.sql (1)

32-34: Ensure existing data is migrated to literal_content
Confirm that any pre-existing content JSONB values are correctly loaded into literal_content or they’ll be lost.

packages/database/supabase/migrations/20250609140958_concept_indexing.sql (7)

5-6: Verify no dependencies on the old content name
Renaming the column and its index must be coordinated with any views, queries, or ORM layers still referencing content.


15-21: Approve: simple type-membership test
schema_in_type_ids is concise and performant for checking membership.


91-97: Approve: lateral recursion in CL variant
The LATERAL approach in in_concepts_cl_rec is concise and accurate.


112-118: Approve: filtered CL recursion pattern
The LATERAL CTE pattern here is consistent and correct.


121-127: Approve: term-filtered CL recursion
This mirrors its ID-based counterpart correctly.


129-131: Approve: basic outbound sub_concepts
The simple UNNEST approach is correct for non-recursive queries.


133-138: Approve: recursive CL-based sub-concepts
The CTE with LATERAL unnest is implemented correctly.

Comment on lines 67 to 74
CREATE OR REPLACE FUNCTION public.concept_neighbourhood(target_id BIGINT) RETURNS TABLE(id BIGINT) LANGUAGE sql STABLE AS $$
WITH super_structs AS (SELECT public.in_concepts_rec(target_id) AS id),
all_concepts_rep AS (SELECT public.sub_concepts_rec(id) AS id FROM super_structs)
SELECT DISTINCT id FROM all_concepts_rep
$$;
Contributor


⚠️ Potential issue

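Set-returning function invocation in concept_neighbourhood
PostgreSQL does expand a set-returning function called in the SELECT list, so the current query returns rows, but calling it in the FROM clause is clearer and easier to reason about. For example: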

 WITH super_structs AS (
-  SELECT public.in_concepts_rec(target_id) AS id
+  SELECT id FROM public.in_concepts_rec(target_id)
 ),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-CREATE OR REPLACE FUNCTION public.concept_neighbourhood(target_id BIGINT) RETURNS TABLE(id BIGINT) LANGUAGE sql STABLE AS $$
-WITH super_structs AS (SELECT public.in_concepts_rec(target_id) AS id),
-all_concepts_rep AS (SELECT public.sub_concepts_rec(id) AS id FROM super_structs)
-SELECT DISTINCT id FROM all_concepts_rep
-$$;
+CREATE OR REPLACE FUNCTION public.concept_neighbourhood(target_id BIGINT)
+RETURNS TABLE(id BIGINT)
+LANGUAGE sql
+STABLE AS $$
+WITH super_structs AS (
+  SELECT id
+  FROM public.in_concepts_rec(target_id)
+),
+all_concepts_rep AS (
+  SELECT public.sub_concepts_rec(id) AS id
+  FROM super_structs
+)
+SELECT DISTINCT id
+FROM all_concepts_rep
+$$;
🤖 Prompt for AI Agents
In packages/database/supabase/migrations/20250609140958_concept_indexing.sql
around lines 67 to 71, the set-returning functions public.in_concepts_rec and
public.sub_concepts_rec are called in the SELECT list. PostgreSQL expands them
there, but for clarity rewrite the calls to appear in the FROM clause (e.g.,
SELECT id FROM public.in_concepts_rec(target_id)) and adjust the CTEs
accordingly to use FROM clauses for these functions instead of calling them
directly as expressions.
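Whichever invocation style is used, the function composes the two recursions: walk inbound from the target, then walk outbound from each concept found, and de-duplicate. A self-contained Python sketch of that composition, again with a hypothetical dict (concept id to refs) standing in for public."Concept"; the helper name _walk is illustrative only.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterable


def _walk(start: int, step: Callable[[int], Iterable[int]]) -> set[int]:
    """Breadth-first closure from start; the visited set acts like UNION."""
    found = {start}
    frontier = deque([start])
    while frontier:
        for nxt in step(frontier.popleft()):
            if nxt not in found:
                found.add(nxt)
                frontier.append(nxt)
    return found


def in_concepts_rec(concepts: dict[int, list[int]], target_id: int) -> set[int]:
    # Inbound: concepts whose refs contain the current id.
    return _walk(target_id, lambda x: [cid for cid, refs in concepts.items() if x in refs])


def sub_concepts_rec(concepts: dict[int, list[int]], target_id: int) -> set[int]:
    # Outbound: the current concept's own refs.
    return _walk(target_id, lambda x: concepts.get(x, []))


def concept_neighbourhood(concepts: dict[int, list[int]], target_id: int) -> set[int]:
    result: set[int] = set()  # SELECT DISTINCT id
    for super_id in in_concepts_rec(concepts, target_id):  # super_structs
        result |= sub_concepts_rec(concepts, super_id)  # all_concepts_rep
    return result


graph = {1: [2], 2: [], 3: [1, 4], 4: [5], 5: [], 6: [7], 7: []}
print(sorted(concept_neighbourhood(graph, 1)))  # [1, 2, 3, 4, 5]
```

Concepts 6 and 7 are excluded because neither references, nor is reachable from, anything that references concept 1.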

Comment on lines 41 to 50
CREATE OR REPLACE FUNCTION public.in_concepts_rec_filtered_ids(target_id BIGINT, concept_types BIGINT[]) RETURNS TABLE(id BIGINT) LANGUAGE sql STABLE AS $$
WITH RECURSIVE t(x) AS (
values (target_id) UNION ALL
SELECT si.id FROM t, public."Concept" AS si
JOIN public."Concept" USING(id)
WHERE ARRAY[t.x] && si.refs AND public.schema_in_type_ids(public."Concept".schema_id, concept_types))
SELECT DISTINCT x AS id FROM t;
Contributor


⚠️ Potential issue

Critical: remove redundant self-join in in_concepts_rec_filtered_ids
The JOIN ... USING(id) joins Concept to itself on id; assuming id is the primary key, this does not change the result but adds an unnecessary join. Collapse into one alias and use = ANY() for clarity:

 WITH RECURSIVE t(x) AS (
   VALUES (target_id)
-  UNION ALL
-  SELECT si.id FROM t, public."Concept" AS si
-  JOIN public."Concept" USING(id)
-  WHERE ARRAY[t.x] && si.refs AND public.schema_in_type_ids(public."Concept".schema_id, concept_types)
+  UNION ALL
+  SELECT c.id
+  FROM public."Concept" AS c
+  JOIN t ON t.x = ANY(c.refs)
+  WHERE public.schema_in_type_ids(c.schema_id, concept_types)
 )
🤖 Prompt for AI Agents
In packages/database/supabase/migrations/20250609140958_concept_indexing.sql
around lines 41 to 47, the function in_concepts_rec_filtered_ids contains a
redundant self-join on the Concept table using JOIN ... USING(id), which adds
an unnecessary join. Remove the self-join and rewrite the recursive query to
use a single alias for the Concept table. Optionally replace the array overlap
operator with = ANY() for checking whether t.x is in si.refs, for clarity
(note that only the overlap form can use the GIN index on refs).
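The intended behaviour of in_concepts_rec_filtered_ids (with the self-join removed) can be sketched the same way. Two points are assumptions: schema_in_type_ids is treated as a plain membership test of schema_id in concept_types, and the Concept dataclass is a hypothetical stand-in for a table row. As in the CTE, the target itself is always returned (it comes from VALUES, not from the filtered step), and the recursion only continues through concepts that pass the filter.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical stand-in for a public."Concept" row."""

    id: int
    schema_id: int
    refs: list[int] = field(default_factory=list)


def schema_in_type_ids(schema_id: int, concept_types: list[int]) -> bool:
    # Assumed semantics: a simple membership test.
    return schema_id in concept_types


def in_concepts_rec_filtered_ids(
    concepts: list[Concept], target_id: int, concept_types: list[int]
) -> set[int]:
    found = {target_id}  # VALUES (target_id): never filtered
    frontier = deque([target_id])
    while frontier:
        x = frontier.popleft()
        for c in concepts:
            # JOIN t ON t.x = ANY(c.refs) WHERE schema_in_type_ids(c.schema_id, ...)
            if x in c.refs and schema_in_type_ids(c.schema_id, concept_types) and c.id not in found:
                found.add(c.id)
                frontier.append(c.id)
    return found


rows = [Concept(1, 10), Concept(2, 20, [1]), Concept(3, 10, [2]), Concept(4, 10, [1])]
print(sorted(in_concepts_rec_filtered_ids(rows, 1, [10])))  # [1, 4]
print(sorted(in_concepts_rec_filtered_ids(rows, 1, [10, 20])))  # [1, 2, 3, 4]
```

In the first call, concept 3 is not reached even though its type matches, because the only path to it runs through concept 2, which the filter excludes.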

@mdroidian
Contributor

@maparent This is listed as a subtask to ENG-332: Roam Platform to Concept translation, but I am unclear how it is related.

Could you list all of the proposed additions and give examples of how/where they will be used? Remember, less is more for us in this stage.

@maparent
Collaborator Author

maparent commented Jun 10, 2025

@maparent This is listed as a subtask to ENG-332: Roam Platform to Concept translation, but I am unclear how it is related.

Could you list all of the proposed additions and give examples of how/where they will be used? Remember, less is more for us in this stage.

Fair enough; I was planning on doing more of this tomorrow. I could delay a lot of the functions: what's important is the table changes (so really the first 15 lines of the migration). The point is, first, to have good indexes for the neighbourhood queries, and then to pre-fill some of the neighbourhood queries themselves. It's a toolbox. But sure, they can be added back one by one.

ETA: The reason I want the first step now is that it affects how we will insert data, so doing it correctly now will avoid a lot of ugly migrations.

Contributor

@mdroidian mdroidian left a comment


🚀

@maparent maparent merged commit f304def into main Jun 13, 2025
3 checks passed
@github-project-automation github-project-automation bot moved this to Done in General Jun 13, 2025
@maparent maparent deleted the ENG-431-concept-indexing branch June 13, 2025 14:00

Labels

None yet

Projects

No open projects
Status: Done


3 participants