Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-10909] Missed optimization for modulo #53299

Open
palimondo mannequin opened this issue Jun 10, 2019 · 11 comments
Open

[SR-10909] Missed optimization for modulo #53299

palimondo mannequin opened this issue Jun 10, 2019 · 11 comments

Comments

@palimondo
Copy link
Mannequin

@palimondo palimondo mannequin commented Jun 10, 2019

Previous ID SR-10909
Radar None
Original Reporter @palimondo
Type Bug
Additional Detail from JIRA
Votes 0
Component/s Compiler
Labels Bug, Performance
Assignee None
Priority Medium

md5: d8c5b88b3f3e6a6164d9118e82dd5949

Issue Description:

In PR #25286 When applying O'Neill's modulo optimization to Lemire's Nearly Divisionless Random Number Generation, Swift does not produce the tight instructions that make it work in C. The original modulo:

let t = (0 &- upperBound) % upperBound

O'Neill's optimization:

var t = (0 &- upperBound)
 if (t >= upperBound) {
  t -= upperBound
  if (t >= upperBound) {
    t %= upperBound
  }
 }

See the assembly in Godbolt's Compiler Explorer:

Probable cause preventing better optimization is the guard against division by zero, but given that the method is already guarded by

_precondition(upperBound != 0, "upperBound cannot be zero." )

it should be possible to eliminate it...

Edit: I suspect I have misinterpreted the situation. The above assembly comparison wasn't fair as I took a too narrow code snippet I suspected to be problematic and took out code that included important hints for the Swift compiler to perform optimization. See comments below.

@palimondo
Copy link
Mannequin Author

@palimondo palimondo mannequin commented Jun 10, 2019

@stephentyrone
Copy link
Member

@stephentyrone stephentyrone commented Jun 10, 2019

My recollection is that _precondition does not imply an assume, but we should also be able to deduce this from the fact that the type is unsigned and earlier we have upperBound > something, IIRC.

@palimondo
Copy link
Mannequin Author

@palimondo palimondo mannequin commented Jun 10, 2019

Yes, the whole modulo operation, where this optimization would come in is inside

if m.low < upperBound {
   
} 

@palimondo
Copy link
Mannequin Author

@palimondo palimondo mannequin commented Jun 10, 2019

Would such optimization be in the realm of Swift, or is this already some LLVM stuff?

@stephentyrone
Copy link
Member

@stephentyrone stephentyrone commented Jun 10, 2019

It's up to Swift to make sure that the necessary information reaches the LLVM layer. It's up to LLVM to take advantage of it to do the optimization. So, both.

@palimondo
Copy link
Mannequin Author

@palimondo palimondo mannequin commented Jun 10, 2019

Could you please ping me on a PR that fixes this? I'd love to learn how that's done.

@belkadan
Copy link
Contributor

@belkadan belkadan commented Jun 10, 2019

Is there a self-contained example that can go in this bug? Neither the precondition nor the `>` check are present in the linked assembly generation.

@palimondo
Copy link
Mannequin Author

@palimondo palimondo mannequin commented Jun 10, 2019

The relevant method is in extension of RandomNumberGenerator in this commit here. I'm not sure how to replicate _precondition outside of stdlib internals. The subtraction on line 102 should probably be overflowing for better performance. For completeness, here it is again with that fix:

@inlinable
  public mutating func next<T: FixedWidthInteger & UnsignedInteger>(
    upperBound: T
  ) -> T {
    _precondition(upperBound != 0, "upperBound cannot be zero.")
    var random: T = next()
    var m = random.multipliedFullWidth(by: upperBound)
    if m.low < upperBound {
      var t = 0 &- upperBound
      if t >= upperBound {
        t &-= upperBound
        if t >= upperBound {
          t %= upperBound
        }
      }
      while m.low < t {
        random = next()
        m = random.multipliedFullWidth(by: upperBound)
      }
    }
    return m.high
  }
}

@belkadan
Copy link
Contributor

@belkadan belkadan commented Jun 11, 2019

Regular precondition is probably close enough. Or does that change the results significantly?

@palimondo
Copy link
Mannequin Author

@palimondo palimondo mannequin commented Jun 11, 2019

I suspect I have misinterpreted the situation. The Swift vs. C assembly comparison wasn't fair as I took a too narrow code snippet I suspected to be problematic and took out code that included important hints for the Swift compiler to perform optimization. Further, the original code was C++!

I re-did the Godbolt Compiler Explorer comparison again:

I believe the generated instructions are largely equivalent.

Looking again at the original article by O'Neill, the modulo optimization provides a slight improvement on Large-Shuffle Benchmark, but on Small-Shuffle Benchmark the version with modulo optimization performs the same as Lemire's original version "Debiased Int Mult (t-opt)" in chart. Our benchmarks in SBS most closely correspond to small shuffle, which would explain why I saw no improvement from modulo optimization on our benchmarks.

@stephentyrone I guess I should come up with a synthetic benchmark to mimic O'Neill's Large-Shuffle and we could re-introduce the modulo optimization together with your rejection sampling. Could you please review my analysis and close this bug as invalid, if you agree?

@zoecarver
Copy link
Collaborator

@zoecarver zoecarver commented May 14, 2020

I think the reason it's not optimized is because of the overflow checks. If you dump the llvm-ir, you can see that clang doesn't do this optimization itself, it just produces IR that LLVM can optimize.

If you change the example a bit and remove the `>=` checks: https://swift.godbolt.org/z/Psgou5 you get the optimization.

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants