Do not align cloned loops #48090
The regressions seen in certain methods occur because we now align loops in those methods that previously would not have been aligned because of the offset at which the loop falls, e.g. BubbleSort.VerifySort and GenericEqualityComparer[[Double]].IndexOf. The improvement in the method below, which motivated this fix, can be seen here. Lines 19 to 36 in dcda533
Libraries windows x64 superpmi
Detail diffs
Microbenchmarks windows x64 superpmi
Detail diffs
@dotnet/jit-contrib
In my benchmark stability analysis after the "loop alignment" work, I found an interesting case where we were not removing the `align` flag from a cloned loop if it has calls. More details can be seen in #43227 (comment). However, after careful thought, we should never align cloned loops, because they are slower loops that execute only in rare scenarios. Cloned loops are inserted in the flowgraph after the actual loop. With nested loops or in the presence of conditions, the cloned loop's instruction group can end up deep inside the method. Now if we decide to align the loop, we need to add the over-estimation compensation `NOP` or disable VEX prefix encoding until we reach the instruction group containing the cloned loop. (Details about the compensation code / VEX prefix disabling can be seen in the PR description of #44370.) Adding over-estimation compensation can be expensive, especially if it gets inserted in hot code or loops. Hence, reduce the possibility of adding such compensation by not aligning cloned loops.

Also, with `COMPlus_JitDiffableDasm`, it is hard to understand how much padding was added by the `align` instruction. Hence, output the number of bytes we padded.

Contributes to #43227.