fix: Lineage works with unions #5550

Merged
max-sixty merged 2 commits into PRQL:main from nightscape:fix-union-lineage
Nov 17, 2025

Conversation

@nightscape (Contributor) commented Nov 8, 2025

Fixes lineage tracking for append (UNION) operations to correctly track inputs from both source tables.

Previously, when using append to union two tables, the lineage only tracked the top table's inputs. The append function now merges bottom.inputs into top.inputs (similar to how join handles inputs), ensuring both source tables are tracked.

Added a test to verify that both inputs are tracked and that column-level lineage works correctly.
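
The input merging described above can be illustrated with a minimal sketch. The `Lineage` and `LineageInput` types below are simplified, hypothetical stand-ins rather than prqlc's actual definitions, and `table` and `input_names` are helpers invented for the example. Only the idea of merging `bottom.inputs` into `top.inputs` comes from the PR description.

```rust
// Hypothetical, simplified model of lineage inputs (not prqlc's real types).
#[derive(Debug, Clone, PartialEq)]
struct LineageInput {
    id: usize,
    name: String,
}

#[derive(Debug, Clone, Default)]
struct Lineage {
    inputs: Vec<LineageInput>,
}

// Illustrative helper: lineage of a relation read from a single table.
fn table(id: usize, name: &str) -> Lineage {
    Lineage {
        inputs: vec![LineageInput { id, name: name.to_string() }],
    }
}

// Append `bottom` to `top`, keeping the inputs of both sides so that
// neither source table is lost from the lineage.
fn append(mut top: Lineage, bottom: Lineage) -> Lineage {
    top.inputs.extend(bottom.inputs);
    top
}

// Illustrative helper: the table names recorded in a lineage, in order.
fn input_names(lineage: &Lineage) -> Vec<&str> {
    lineage.inputs.iter().map(|input| input.name.as_str()).collect()
}
```

Under these assumptions, `append(table(0, "employees"), table(1, "managers"))` yields a lineage whose inputs name both `employees` and `managers`, whereas keeping only `top.inputs` would drop `managers`.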

@nightscape force-pushed the fix-union-lineage branch 3 times, most recently from a09b76b to 350c33b, on November 8, 2025 17:27
@nightscape (Contributor, Author) commented:

The second commit is worth discussing:
If a table is unioned with itself, should lineage track both occurrences?
I lean slightly towards "yes", because these are two different ways of reaching the same table, and each may carry different filters etc.
But if you consider the existing behavior correct, I'll happily adapt the implementation!
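
To make the question concrete, here is a rough sketch of the two options for a self-union. The `Input` type and both functions are hypothetical illustrations, not code from prqlc, and the sketch does not say which behavior the PR ends up with.

```rust
// Hypothetical model: one lineage input per occurrence of a table in a query.
#[derive(Debug, Clone, PartialEq)]
struct Input {
    id: usize,
    table: String,
}

// Option A: track every occurrence, even if both sides read the same table.
fn keep_both(mut top: Vec<Input>, bottom: Vec<Input>) -> Vec<Input> {
    top.extend(bottom);
    top
}

// Option B: collapse occurrences that refer to a table already tracked.
fn dedup_by_table(mut top: Vec<Input>, bottom: Vec<Input>) -> Vec<Input> {
    for input in bottom {
        if !top.iter().any(|existing| existing.table == input.table) {
            top.push(input);
        }
    }
    top
}
```

With `employees` on both sides of the union, option A records two inputs (one per path to the table, each of which could carry its own filters), while option B records one.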

@nightscape force-pushed the fix-union-lineage branch 3 times, most recently from 8ab4cd3 to 755ba66, on November 8, 2025 17:38
@nightscape marked this pull request as ready for review on November 8, 2025 17:42
@max-sixty (Member) commented:

Thanks for the PR, @nightscape!

Overall it looks good!

I'm a bit worried that identical tables would behave very differently from almost-identical tables, though. I haven't looked at this in enough detail, but how much of a step back would this be:

  diff --git a/prqlc/prqlc/src/semantic/resolver/transforms.rs b/prqlc/prqlc/src/semantic/resolver/transforms.rs
  index 5f07a7499c01..xxxxx 100644
  --- a/prqlc/prqlc/src/semantic/resolver/transforms.rs
  +++ b/prqlc/prqlc/src/semantic/resolver/transforms.rs
  @@ -757,14 +757,12 @@ fn append(mut top: Lineage, bottom: Lineage) -> Result<Lineage, Error> {
                       except: except_b,
                   },
               ) => {
  -                // If both are All columns from the same input, merge the except sets
  -                // Otherwise, keep the top's input_id
  -                // Note: In a union, both inputs should be available, so we keep top's
  +                // Merge except sets from both tables
  +                // This preserves exclusion information even when input_ids differ
  +                // (e.g., "from employees select !{name}" append "from managers select !{salary}")
                   let mut except = except_t;
  -                if input_id_t == input_id_b {
  -                    // Same input, merge except sets
  -                    except.extend(except_b);
  -                }
  +                except.extend(except_b);
  +
                   LineageColumn::All {
                       input_id: input_id_t,
                       except,

(For transparency: Claude helped me with the diff.)
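
As a rough illustration of what the proposed diff does, the sketch below merges the `except` sets unconditionally while keeping the top side's `input_id`. `AllColumns` is a hypothetical stand-in for the `LineageColumn::All` variant shown in the diff, and the use of `HashSet<String>` for `except` is an assumption made for the example.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the `LineageColumn::All { input_id, except }` variant.
#[derive(Debug, Clone, PartialEq)]
struct AllColumns {
    input_id: usize,
    except: HashSet<String>,
}

// Merge two "all columns" entries as in the proposed diff: always union the
// except sets, and keep the top side's input_id even when the ids differ.
fn merge_all_columns(top: AllColumns, bottom: AllColumns) -> AllColumns {
    let mut except = top.except;
    except.extend(bottom.except);
    AllColumns {
        input_id: top.input_id,
        except,
    }
}
```

For the case named in the diff's comment, a top side excluding `name` and a bottom side excluding `salary` would produce a single entry that keeps the top `input_id` and excludes both `name` and `salary`, regardless of whether the two sides share an input.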

@nightscape (Contributor, Author) commented:

@max-sixty I applied your proposed change and ran the tests; everything is still green 👍
I rebased on the latest main and force-pushed.
All good to go from my side!

@max-sixty (Member) commented:

thank you!

@max-sixty merged commit 27058fd into PRQL:main on Nov 17, 2025
36 checks passed
@nightscape deleted the fix-union-lineage branch on November 19, 2025 15:00