docs: add dataset relationship engine design document and architectur… by hjadm · Pull Request #40116 · apache/superset

hjadm · 2026-05-14T09:52:25Z

…e analysis

superset_relationship_design.md: comprehensive 3-phase technical design for dataset relationships engine (backend models, dual-mode JOIN engine for same-DB and cross-DB, REST API, React Flow visual canvas, cross-filtering, drill-down hierarchies, and filter propagation)
analise_superset.md: detailed architecture analysis of the Apache Superset repository covering monorepo structure, tech stack, code patterns, and improvement suggestions

SUMMARY

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

Has associated issue:
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

…e analysis - superset_relationship_design.md: comprehensive 3-phase technical design for dataset relationships engine (backend models, dual-mode JOIN engine for same-DB and cross-DB, REST API, React Flow visual canvas, cross-filtering, drill-down hierarchies, and filter propagation) - analise_superset.md: detailed architecture analysis of the Apache Superset repository covering monorepo structure, tech stack, code patterns, and improvement suggestions

bito-code-review · 2026-05-14T09:52:36Z

Bito Automatic Review Skipped - Files Excluded

Bito didn't auto-review this change because all changed files are in the exclusion list for automatic reviews. No action is needed if you didn't intend for the agent to review it. Otherwise, to manually trigger a review, type /review in a comment and save.
You can change the excluded files settings here, or contact your Bito workspace admin at evan@preset.io.

netlify · 2026-05-14T09:54:03Z

✅ Deploy Preview for superset-docs-preview ready!

Name	Link
🔨 Latest commit	`bd8a63f`
🔍 Latest deploy log	https://app.netlify.com/projects/superset-docs-preview/deploys/6a059b5d8192be00077cc3fa
😎 Deploy Preview	https://deploy-preview-40116--superset-docs-preview.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.
🤖 Make changes	Run an agent on this branch

To edit notification comments on pull requests, go to your Netlify project configuration.

codeant-ai-for-open-source · 2026-05-14T09:57:13Z

+                qry = qry.join(
+                    target_table,
+                    join_condition,
+                    isouter=(rel.join_type == "LEFT"),
+                )
+    # === END NEW ===
+
+    # ... resto do pipeline de filtros, groupby, orderby ...
+    return qry
+```
+
+#### Modo 2 — Cross-Database (Pandas Merge)
+
+Quando os datasets estão em bancos diferentes, o engine executa queries separadas e faz merge dos DataFrames resultantes no nível da aplicação.
+
+**Ponto de injeção:** `superset/models/helpers.py` → `ExploreMixin.get_query_result()`
+
+```python
+# Pseudocódigo da modificação em get_query_result()
+def get_query_result(self, query_obj, ...):
+    # Executa query principal
+    primary_result = self.query(query_obj)
+    primary_df = primary_result.df
+
+    # === NEW: Cross-database merges ===
+    if query_obj.relationships:
+        for rel in query_obj.relationships:
+            if rel.is_cross_database:
+                # Proteção de memória
+                if len(primary_df) > RELATIONSHIP_MAX_MERGE_ROWS:
+                    raise RelationshipMergeError(
+                        f"Primary dataset exceeds {RELATIONSHIP_MAX_MERGE_ROWS} rows limit"
+                    )
+
+                # Executa query no dataset target
+                target_result = rel.target_dataset.query(
+                    _build_target_query(rel, primary_df)
+                )
+                target_df = target_result.df
+
+                if len(target_df) > RELATIONSHIP_MAX_MERGE_ROWS:
+                    raise RelationshipMergeError(
+                        f"Target dataset exceeds {RELATIONSHIP_MAX_MERGE_ROWS} rows limit"
+                    )
+
+                # Merge dos DataFrames
+                primary_df = primary_df.merge(
+                    target_df,
+                    left_on=[cm.source_column for cm in rel.column_mappings],
+                    right_on=[cm.target_column for cm in rel.column_mappings],
+                    how=rel.join_type.lower(),


🟠 Architect Review — HIGH

JOIN semantics in the design are inconsistent with the declared join_type options: in same-database mode only LEFT vs non-LEFT is honored (so RIGHT and FULL behave as INNER), and in cross-database mode join_type is passed directly to pandas.DataFrame.merge(how=...), so FULL becomes "full", which is not a valid how value.

Suggestion: Constrain join_type to the subset both engines can correctly implement or define explicit per-engine mappings (e.g., map FULL to a proper SQL full outer join and to how="outer" in pandas) so that the same join_type yields consistent, valid behavior in both same-DB and cross-DB paths.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is an **Architect / Logical Review** comment left during a code review. These reviews are first-class, important findings — not optional suggestions. Do NOT dismiss this as a 'big architectural change' just because the title says architect review; most of these can be resolved with a small, localized fix once the intent is understood. **Path:** docs/superset_relationship_design.md **Line:** 179:229 **Comment:** *HIGH: JOIN semantics in the design are inconsistent with the declared `join_type` options: in same-database mode only `LEFT` vs non-`LEFT` is honored (so `RIGHT` and `FULL` behave as `INNER`), and in cross-database mode `join_type` is passed directly to `pandas.DataFrame.merge(how=...)`, so `FULL` becomes `"full"`, which is not a valid `how` value. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. If a suggested approach is provided above, use it as the authoritative instruction. If no explicit code suggestion is given, you MUST still draft and apply your own minimal, localized fix — do not punt back with 'no suggestion provided, review manually'. Keep the change as small as possible: add a guard clause, gate on a loading state, reorder an await, wrap in a conditional, etc. Do not refactor surrounding code or expand scope beyond the finding. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

codeant-ai-for-open-source · 2026-05-14T09:57:13Z

+    source_dataset_id   INTEGER NOT NULL REFERENCES ab_datasets(id) ON DELETE CASCADE,
+
+    -- Target dataset
+    target_dataset_id   INTEGER NOT NULL REFERENCES ab_datasets(id) ON DELETE CASCADE,
+


🟠 Architect Review — HIGH

The design uses inconsistent dataset table names: the initial DDL references ab_datasets(id) while the ORM and migration snippets use tables.id, conflicting with the actual Superset dataset model (SqlaTable.__tablename__ = "tables").

Suggestion: Normalize the design to reference the real dataset table (tables.id for SqlaTable) consistently across the DDL, ORM model, and migration examples to avoid implementing against a non-existent ab_datasets table.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖

This is an **Architect / Logical Review** comment left during a code review. These reviews are first-class, important findings — not optional suggestions. Do NOT dismiss this as a 'big architectural change' just because the title says architect review; most of these can be resolved with a small, localized fix once the intent is understood. **Path:** docs/superset_relationship_design.md **Line:** 46:50 **Comment:** *HIGH: The design uses inconsistent dataset table names: the initial DDL references `ab_datasets(id)` while the ORM and migration snippets use `tables.id`, conflicting with the actual Superset dataset model (`SqlaTable.__tablename__ = "tables"`). Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. If a suggested approach is provided above, use it as the authoritative instruction. If no explicit code suggestion is given, you MUST still draft and apply your own minimal, localized fix — do not punt back with 'no suggestion provided, review manually'. Keep the change as small as possible: add a guard clause, gate on a loading state, reorder an await, wrap in a conditional, etc. Do not refactor surrounding code or expand scope beyond the finding. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

pull-request-size Bot added the size/XL label May 14, 2026

dosubot Bot added design:proposal Design proposals doc:developer Developer documentation labels May 14, 2026

github-actions Bot added the doc Namespace | Anything related to documentation label May 14, 2026

hjadm closed this May 14, 2026

codeant-ai-for-open-source Bot reviewed May 14, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: add dataset relationship engine design document and architectur…#40116

docs: add dataset relationship engine design document and architectur…#40116
hjadm wants to merge 1 commit into
apache:masterfrom
hjadm:feature/dataset-relationships-design

hjadm commented May 14, 2026

Uh oh!

bito-code-review Bot commented May 14, 2026 •

edited

Loading

Uh oh!

netlify Bot commented May 14, 2026

Uh oh!

codeant-ai-for-open-source Bot May 14, 2026

Uh oh!

codeant-ai-for-open-source Bot May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hjadm commented May 14, 2026

SUMMARY

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

Uh oh!

bito-code-review Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

netlify Bot commented May 14, 2026

✅ Deploy Preview for superset-docs-preview ready!

Uh oh!

codeant-ai-for-open-source Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

codeant-ai-for-open-source Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

bito-code-review Bot commented May 14, 2026 •

edited

Loading