Skip to content

Comments

Run ProjectionPushdown after the last RelationCSE#31791

Merged
ggevay merged 3 commits intoMaterializeInc:mainfrom
ggevay:proj-pushdown-after-cse
Mar 13, 2025
Merged

Run ProjectionPushdown after the last RelationCSE#31791
ggevay merged 3 commits intoMaterializeInc:mainfrom
ggevay:proj-pushdown-after-cse

Conversation

@ggevay
Copy link
Contributor

@ggevay ggevay commented Mar 7, 2025

Edit: I've added the lts-backport-v25.1 label, but if this turns out to be too annoying to cherry-pick, then it's ok to skip this, as it's just a performance improvement.

This adds a call to ProjectionPushdown after the last call to RelationCSE. As explained in a code comment:

// `RelationCSE` can create new points of interest for `ProjectionPushdown`: If an MFP
// is cut in half by `RelationCSE`, then we'd like to push projections behind the new
// Get as much as possible. This is because a fork in the plan involves copying the
// data. (But we need `ProjectionPushdown` to skip joins, because it can't deal with
// filled in JoinImplementations.)

Slack discussion here:
https://materializeinc.slack.com/archives/C07SUTL7JMQ/p1741358397379089?thread_ts=1741341927.561309&cid=C07SUTL7JMQ

It's guarded by a feature flag as a precaution, but I've enabled it by default, as it's hard to imagine this hurting plans.

The first commit is the actual functionality, while the second commit just adds the feature flag.

Motivation

  • This PR increases performance.

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@ggevay ggevay force-pushed the proj-pushdown-after-cse branch 2 times, most recently from 015bccd to 0a29afa Compare March 7, 2025 14:39
@ggevay ggevay force-pushed the proj-pushdown-after-cse branch from 0a29afa to b707fb8 Compare March 9, 2025 15:53
@ggevay ggevay requested review from frankmcsherry and mgree March 9, 2025 16:01
@ggevay ggevay added the A-CLUSTER Topics related to the CLUSTER layer label Mar 9, 2025
@ggevay ggevay marked this pull request as ready for review March 9, 2025 16:03
@ggevay ggevay requested review from a team as code owners March 9, 2025 16:03
@ggevay ggevay requested a review from ParkMyCar March 9, 2025 16:03
@ggevay ggevay changed the title WIP Run ProjectionPushdown after the last RelationCSE Run ProjectionPushdown after the last RelationCSE Mar 9, 2025
@ggevay ggevay force-pushed the proj-pushdown-after-cse branch 4 times, most recently from 8aaca69 to 112efc1 Compare March 9, 2025 18:31
Copy link
Contributor

@mgree mgree left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me: code is simple enough and SLTs look good.

The join skipping is an unwelcome bit of complexity (it's almost the same logic), but here we are. Perhaps an issue for another time, but how hard would it be to actually look at JoinImplementations here?

@ggevay ggevay force-pushed the proj-pushdown-after-cse branch from 112efc1 to f077085 Compare March 12, 2025 17:17
@ggevay
Copy link
Contributor Author

ggevay commented Mar 12, 2025

Thanks for the review!

The join skipping is an unwelcome bit of complexity (it's almost the same logic), but here we are. Perhaps an issue for another time, but how hard would it be to actually look at JoinImplementations here?

It wouldn't be impossible:

  • We'd have to attend to all the scalar expressions in the JoinImplementation enum.
  • We'd have to attend to the ArrangeBys below the joins.
  • (I think JoinInputCharacteristics are unchanged. E.g., key fields are demanded, so key length is not changing.)
  • (I think we don't need to touch the Rows that are inside JoinImplementation::IndexedFilter, because those fields are involved in join conditions, so are demanded.)

One reason for which I decided to not go for JoinImplementation is that in the long run we want to move the JoinImplementation transform to just before the MIR lowering, and then manipulating the JoinImplementation enum in other transforms won't be needed anymore.

(Running Nightly now, and then merging.)

@ggevay
Copy link
Contributor Author

ggevay commented Mar 13, 2025

It's good that I've run Nightly, because there is an EXPLAIN output change 😅 Discussing here.

@ggevay ggevay force-pushed the proj-pushdown-after-cse branch from f077085 to a314350 Compare March 13, 2025 13:50
@ggevay ggevay force-pushed the proj-pushdown-after-cse branch from a314350 to a2bda1d Compare March 13, 2025 16:22
@ggevay ggevay added the self-managed-backport-v25.1 Needs to be backported into the v25.1 self-managed release label Mar 13, 2025
@ggevay ggevay merged commit 9bacb1a into MaterializeInc:main Mar 13, 2025
237 of 243 checks passed
@def-
Copy link
Contributor

def- commented Mar 24, 2025

Edit: I've added the lts-backport-v25.1 label, but if this turns out to be too annoying to cherry-pick, then it's ok to skip this, as it's just a performance improvement.

There is a bunch of conflicts indeed. I'm not sure if we should backport these kinds of changes in general, since there is also some risk for them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-CLUSTER Topics related to the CLUSTER layer self-managed-backport-v25.1-done self-managed-backport-v25.1 Needs to be backported into the v25.1 self-managed release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants