Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No Mark to Semi join conversion in statistics propagation #11596

Merged

Conversation

Tmonster
Copy link
Contributor

@Tmonster Tmonster commented Apr 10, 2024

fixes #10950

Statistics propagation happens after unused columns are removed. We should not be changing projection maps at this time, and converting mark joins to semi joins effectively does this.

This logic is/was in #11573, but I am pulling it out because there are actually errors now in 0.10.2 that will be fixed with this. @lnkuiper, to address your question there about projection maps, we cannot just look at the projection map of the parent join. Suppose the mark -> semi join conversion happens on a child that is the right child of another join. Originally the right child was projecting N columns, and now it is projecting N-1 columns.

If some join further up in the plan has a right projection map and is expected something like N - X columns, now it will receive N - X - 1 columns, hence leading to these errors.

I'm open to different ways of implementing this logic. Was also think about some pre-optimizer step to inspect the projection maps beforehand to see what conversions are possible. This is difficult, however, as this knowledge will need to be communicated from the statistics propagator to the filter pushdown optimizer. May stew on this for a bit, but this is still a fix that should probably go in to v0.10.2.

@Tmonster Tmonster requested a review from lnkuiper April 10, 2024 08:28
@Tmonster Tmonster changed the title No Mark to Semi in statistics propagation No Mark to Semi join conversion in statistics propagation Apr 10, 2024
@Tmonster Tmonster marked this pull request as draft April 11, 2024 12:20
@Tmonster Tmonster marked this pull request as ready for review April 11, 2024 12:20
@duckdb-draftbot duckdb-draftbot marked this pull request as draft April 11, 2024 12:21
@Tmonster Tmonster marked this pull request as ready for review April 18, 2024 08:50
Copy link
Contributor

@lnkuiper lnkuiper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix looks good to me!

Since projection maps are a common source of pain, we might, at some point, want to look into changing them such that they use ColumnBindings during logical planning instead. I think this would solve a lot of logic errors. Maybe only when going to the physical plan do we use indices, just like when we go from BoundColumnRefExpression to BoundReferenceExpression.

Maybe the struct could look something like this:

struct ProjectionMapEntry {
    ColumnBinding binding; // Used during logical planning
    optional_idx index; // Used in the physical plan
};

@Mytherin Mytherin merged commit 2eb4c3f into duckdb:main Apr 22, 2024
93 checks passed
github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request May 5, 2024
Merge pull request duckdb/duckdb#11596 from Tmonster/no_mark_to_semi_in_statistics_propagation
Merge pull request duckdb/duckdb#11758 from carlopi/pyodide_no_source
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

INTERNAL Error: Attempted to access index N within vector of size N
3 participants