No Mark to Semi join conversion in statistics propagation #11596
fixes #10950
Statistics propagation happens after unused columns are removed. We should not be changing projection maps at this time, and converting mark joins to semi joins effectively does this.
This logic is/was in #11573, but I am pulling it out because there are actually errors now in 0.10.2 that will be fixed with this. @lnkuiper, to address your question there about projection maps, we cannot just look at the projection map of the parent join. Suppose the mark -> semi join conversion happens on a child that is the right child of another join. Originally the right child was projecting N columns, and now it is projecting N-1 columns.
If some join further up in the plan has a right projection map and is expecting something like N - X columns, it will now receive N - X - 1 columns, which leads to these errors.
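To make the column-count argument concrete, here is a toy Python model of the situation. It is only a sketch and not DuckDB code: `ProjectionMap`, `filtering_join`, and `run_plan` are hypothetical names, and the explicit width check is an assumption about how the mismatch surfaces. It assumes a mark join emits its left columns plus one boolean mark column, while a semi join emits only the left columns.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProjectionMap:
    """Hypothetical stand-in for a join's right projection map."""
    indices: List[int]   # positions kept from the right child's output
    input_width: int     # width of that output when the map was built

    def apply(self, columns: List[str]) -> List[str]:
        # Assumption: a width mismatch is detected and reported here.
        if len(columns) != self.input_width:
            raise ValueError(
                f"projection map built for {self.input_width} columns, "
                f"got {len(columns)}"
            )
        return [columns[i] for i in self.indices]


def filtering_join(join_type: str, left: List[str]) -> List[str]:
    """A mark join emits its left columns plus a boolean mark column;
    a semi join emits only the left columns."""
    return left + ["mark"] if join_type == "mark" else list(left)


def run_plan(child_join_type: str) -> List[str]:
    # Right child of the middle join: N = 3 columns as a mark join.
    child = filtering_join(child_join_type, ["a", "b"])
    # The middle join has no projection map, so it passes everything through.
    middle = ["x"] + child
    # The upper join's map was fixed during unused-column removal,
    # when `middle` still produced 4 columns.
    upper_map = ProjectionMap(indices=[0, 1], input_width=4)
    return ["y"] + upper_map.apply(middle)
```

Calling `run_plan("mark")` succeeds, while `run_plan("semi")` raises because the upper join's map was built for one more column than it now receives. Note that the parent of the converted join (the middle join here) has no projection map at all, which is why inspecting only the parent is not enough.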
I'm open to different ways of implementing this logic. I was also thinking about a pre-optimizer step that inspects the projection maps beforehand to see which conversions are possible. This is difficult, however, as that knowledge would need to be communicated from the statistics propagator to the filter pushdown optimizer. I may stew on this for a bit, but this is still a fix that should probably go into v0.10.2.