-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[perf experiment] A MIR pass removing unneded temporary locals #136788
Conversation
rustbot has assigned @matthewjasper. Use |
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt |
Doesn't GVN already perform these optimizations? |
I don't think so. I run some of those MIR samples with impl<'tcx> crate::MirPass<'tcx> for GVN {
fn is_enabled(&self, sess: &rustc_session::Session) -> bool {
sess.mir_opt_level() >= 2
} I took the "Original" MIR in the example form nightly with opt-level 3: |
Oh, nvm, GVN doesn't propagate non-local operands like this, so it won't replace e.g. Regarding the structure of this pass, I wonder if it should use more sophisticated machinery rather than this new notion of statement pairs. Seems somewhat ad-hoc, whereas we already have tools that use the MIR graph to detect where assignments can be considered live for propagation like this. |
anyways, I guess here's a perf run @bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
[perf experiment] A MIR pass removing unneded temporary locals Experiment with early MIR optimization removing temporary locals # Motivation Currently, a lot of MIR contains unneded assigements to temporaries: ### Rust ```rust fn add_points(lhs:(f32,f32,f32),rhs:(f32,f32,f32))->(f32,f32,f32){ (lhs.0 + rhs.0, lhs.1 + rhs.1, lhs.2 + rhs.2) } ``` ### Orignal MIR ```mir bb0: { StorageLive(_3); StorageLive(_4); // _4 and _5 are not needed! _4 = copy (_1.0: f32); StorageLive(_5); _5 = copy (_2.0: f32); _3 = Add(move _4, move _5); StorageDead(_5); StorageDead(_4); StorageLive(_6); StorageLive(_7); // _7 and _8 are not needed! _7 = copy (_1.1: f32); StorageLive(_8); _8 = copy (_2.1: f32); _6 = Add(move _7, move _8); StorageDead(_8); StorageDead(_7); StorageLive(_9); StorageLive(_10); // _10 and _9 are not needed! _10 = copy (_1.2: f32); StorageLive(_11); _11 = copy (_2.2: f32); _9 = Add(move _10, move _11); StorageDead(_11); StorageDead(_10); _0 = (move _3, move _6, move _9); StorageDead(_9); StorageDead(_6); StorageDead(_3); return; } ``` This pass tries to remove as such many temporaries as possible. This leads to reduced MIR sizes, which should hopefully speed the next passes up. ```mir fn add_points(_1: (f32, f32, f32), _2: (f32, f32, f32)) -> (f32, f32, f32) { debug lhs => _1; debug rhs => _2; let mut _0: (f32, f32, f32); let mut _3: f32; let mut _4: f32; let mut _5: f32; bb0: { StorageLive(_3); _3 = Add(copy (_1.0: f32), copy (_2.0: f32)); StorageLive(_4); _4 = Add(copy (_1.1: f32), copy (_2.1: f32)); StorageLive(_5); _5 = Add(copy (_1.2: f32), copy (_2.2: f32)); _0 = (move _3, move _4, move _5); StorageDead(_5); StorageDead(_4); StorageDead(_3); return; } } ``` **This PR is not yet meant for merging!** The current version is still a bit from being done: it does not optimize locals used in calls, and some parts may need tweaking. Still, I belive it is at least worth timing at this point, which is why I am requesting a perf run.
This comment has been minimized.
This comment has been minimized.
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (e730bff): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
Max RSS (memory usage)Results (primary -3.5%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (primary -1.8%, secondary -2.4%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary -0.0%, secondary 0.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 779.273s -> 776.343s (-0.38%) |
87e5f87
to
8f7383b
Compare
This comment has been minimized.
This comment has been minimized.
Maybe let's let GVN handles this. After #132527, GVN can handle this, I guess. // -Copt-level=0 -Zmir-enable-passes=+PropTrivialLocals
fn add_points(lhs: (f32, f32, f32)) -> (f32, f32, f32) {
(lhs.0, lhs.0, lhs.0)
} This should be fn add_points(_1: (f32, f32, f32)) -> (f32, f32, f32) {
debug lhs => _1;
let mut _0: (f32, f32, f32);
let mut _2: f32;
bb0: {
StorageLive(_2);
_2 = copy (_1.0: f32);
_0 = (move _2, copy (_1.0: f32), copy (_1.0: f32));
StorageDead(_2);
return;
}
} The fn add_points(_1: (f32, f32, f32)) -> (f32, f32, f32) {
debug lhs => _1;
let mut _0: (f32, f32, f32);
let mut _2: f32;
bb0: {
_2 = copy (_1.0: f32);
_0 = (copy _2, copy _2, copy _2);
return;
}
} I think the temporary local can make some passes easier, we can directly reuse the local when needed or known places are the same local. After inlined, the temporary local may be reused: fn add_points(lhs: (i32, i32)) -> i32 {
lhs.0 + bar(&lhs)
}
#[inline(always)]
fn bar(lhs: &(i32, i32)) -> i32 {
lhs.0 + lhs.1
} |
Triage: failing build and some comment to consider |
There are merge commits (commits with multiple parents) in your changes. We have a no merge policy so these commits will need to be removed for this pull request to be merged. You can start a rebase with the following commands:
The following commits are merge commits: |
The job Click to see the possible cause of the failure (guessed by this bot)
|
Experiment with early MIR optimization removing temporary locals
Motivation
Currently, a lot of MIR contains unneded assigements to temporaries:
Rust
Orignal MIR
This pass tries to remove as such many temporaries as possible. This leads to reduced MIR sizes, which should hopefully speed the next passes up.
This PR is not yet meant for merging!
The current version is still a bit from being done: it does not optimize locals used in calls, and some parts may need tweaking.
Still, I belive it is at least worth timing at this point, which is why I am requesting a perf run.