Add BFS fallback for getting node downstreams with high fanout#1479

Merged
shangyian merged 6 commits into DataJunction:main from shangyian:downstreams-bfs-fallback
Aug 27, 2025
@shangyian (Collaborator) commented Aug 27, 2025

Summary

When retrieving downstreams for a node with a large downstream graph, the recursive CTE query hangs and eventually fails after a significant period of database processing.

This PR introduces a BFS-based fallback for retrieving downstream nodes when the node's initial set of children exceeds a configurable fanout threshold. This initial count is used as a proxy for the final size of the node's downstream graph.
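To illustrate the threshold check, here is a minimal sketch of the dispatch decision. The function name `choose_strategy`, the constant `FANOUT_THRESHOLD`, and its value of 50 are hypothetical and are not taken from the PR diff; the description only says that the threshold is configurable.

```python
# Hypothetical default; the PR states the threshold is configurable
# but does not give a value.
FANOUT_THRESHOLD = 50


def choose_strategy(child_count: int, threshold: int = FANOUT_THRESHOLD) -> str:
    """Pick a traversal strategy from the node's direct-child count.

    The direct-child count stands in for the size of the full downstream
    graph, which is not known before the traversal runs.
    """
    # "Exceeds" is read here as strictly greater than the threshold.
    return "bfs" if child_count > threshold else "recursive_cte"
```

A node with 3 direct children would keep using the recursive CTE, while a node with 500 would take the BFS path.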

A BFS approach allows processing level by level, reducing load on the database in cases of excessive recursion. The nodes on each level are processed concurrently, with max concurrency configurable at the server level.
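The level-by-level traversal with bounded concurrency can be sketched as follows. This is an illustration under assumptions, not the code from this PR: `bfs_downstreams`, `get_children`, `max_concurrency`, and the in-memory `GRAPH` are hypothetical stand-ins for the real database lookups and server settings.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Set

# Hypothetical in-memory graph standing in for the node relationship tables.
GRAPH: Dict[str, List[str]] = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}


async def demo_get_children(node: str) -> List[str]:
    """Stand-in for a single-level 'direct children' database query."""
    return GRAPH.get(node, [])


async def bfs_downstreams(
    root: str,
    get_children: Callable[[str], Awaitable[List[str]]],
    max_concurrency: int = 4,
) -> List[str]:
    """Return all downstream nodes of `root`, visiting one level at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch(node: str) -> List[str]:
        # At most `max_concurrency` lookups are in flight at once.
        async with semaphore:
            return await get_children(node)

    visited: Set[str] = {root}
    frontier: List[str] = [root]
    downstreams: List[str] = []
    while frontier:
        # Query every node on the current level concurrently.
        children_per_node = await asyncio.gather(*(fetch(n) for n in frontier))
        next_frontier: List[str] = []
        for children in children_per_node:
            for child in children:
                if child not in visited:  # skip nodes reached via another path
                    visited.add(child)
                    downstreams.append(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return downstreams
```

Calling `asyncio.run(bfs_downstreams("a", demo_get_children))` on the sample graph visits `b` and `c` on the first level and `d` on the second. Each level issues only shallow queries, so no single query has to expand the whole graph.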

Test Plan

Added a test to compare results from the BFS vs recursive CTE approaches
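The shape of such a comparison test might look like the sketch below. It uses a plain recursive traversal as a stand-in for the recursive CTE and an in-memory graph instead of the database; all names here are hypothetical and do not come from the PR's test suite.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def downstreams_recursive(
    graph: Dict[str, List[str]], node: str, seen: Optional[Set[str]] = None
) -> Set[str]:
    """Depth-first recursion, standing in for the recursive CTE."""
    seen = set() if seen is None else seen
    for child in graph.get(node, []):
        if child not in seen:
            seen.add(child)
            downstreams_recursive(graph, child, seen)
    return seen


def downstreams_bfs(graph: Dict[str, List[str]], node: str) -> Set[str]:
    """Iterative breadth-first traversal, standing in for the fallback."""
    seen: Set[str] = set()
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for child in graph.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def test_bfs_matches_recursive() -> None:
    # Small DAG with a shared descendant ("d") reachable via two paths.
    graph = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": [], "e": []}
    for node in graph:
        assert downstreams_recursive(graph, node) == downstreams_bfs(graph, node)
```

Comparing sets rather than lists keeps the check independent of visit order, which differs between the two approaches.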

  • PR has an associated issue: #
  • make check passes
  • make test shows 100% unit test coverage

Deployment Plan

netlify bot commented Aug 27, 2025

Deploy Preview for thriving-cassata-78ae72 canceled.

🔨 Latest commit: dff2d1e
🔍 Latest deploy log: https://app.netlify.com/projects/thriving-cassata-78ae72/deploys/68af2c4d74e1a00007fab81b

@shangyian shangyian marked this pull request as ready for review August 27, 2025 17:17
@shangyian shangyian merged commit e354562 into DataJunction:main Aug 27, 2025
17 checks passed