Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

double.remainder() performance issues? #55479

Closed
JosefWN opened this issue Apr 16, 2024 · 4 comments
Closed

double.remainder() performance issues? #55479

JosefWN opened this issue Apr 16, 2024 · 4 comments
Assignees
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. P2 A bug or feature request we're likely to work on triaged Issue has been triaged by sub team type-performance Issue relates to performance or code size

Comments

@JosefWN
Copy link

JosefWN commented Apr 16, 2024

Hello everyone! Maybe I'm doing something wrong here, but I'm experiencing a pretty heavy performance hit with double.remainder() on ARM64 (M3) in Dart 3.3.3 on macOS.

As per the documentation, a.remainder(b) is mathematically equivalent to:

double remainder(final double a, final num b) => a - (a ~/ b) * b;

But when I compare the two, the custom implementation is somewhere between one and two orders of magnitude faster:

class RemainderBenchmark extends BenchmarkBase {
  const RemainderBenchmark() : super('Remainder');

  @override
  void run() {
    for (double i = -500; i <= 500; i += 0.75) {
      for (double j = -500; j <= 500; j += 0.75) {
        final _ = i.remainder(j);
      }
    }
  }
}

class RemainderCustomBenchmark extends BenchmarkBase {
  const RemainderCustomBenchmark() : super('RemainderCustom');

  @override
  void run() {
    for (double i = -500; i <= 500; i += 0.75) {
      for (double j = -500; j <= 500; j += 0.75) {
        final _ = remainder(i, j);
      }
    }
  }
}

AOT (dart compile exe):

RemainderCustom(RunTime): 38108.96153846154 us.
Remainder(RunTime): 478124.0 us. <- About 13x slower!

JIT (dart run):

RemainderCustom(RunTime): 9825.791469194313 us.
Remainder(RunTime): 507507.0 us. <- About 52x slower!

Any ideas why this is the case?

@lrhn
Copy link
Member

lrhn commented Apr 16, 2024

Similar results on (..checks...) intel CPU.

An up to 40x slowdown suggests to me that the code is either using a very inefficient assembler instruction, or it's calling into the C runtime for each .remainder call. Even using % is faster, but still much slower than the custom version.

Or maybe remainder on doubles is just amazingly expensive to do correctly, and the approximation won't give the same result in all cases. (And if that's OK, it's a good optimization.)

I do think double.remainder is compiled as a runtime call. The code for it is:

  double remainder(num other) {
    return _remainder(other.toDouble());
  }

  @pragma("vm:external-name", "Double_remainder")
  external double _remainder(double other);

which means calling a function in double.cc. There could be an intrinsified version that replaces this with generated assembler, but I can't find it.

@lrhn lrhn added area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. type-performance Issue relates to performance or code size labels Apr 16, 2024
@JosefWN
Copy link
Author

JosefWN commented Apr 16, 2024

An up to 40x slowdown suggests to me that the code is either using a very inefficient assembler instruction, or it's calling into the C runtime for each .remainder call. Even using % is faster, but still much slower than the custom version.

Yes, I was also thinking it could be some form of call overhead, it seems unlikely that disastrous assembler is produced across multiple platforms? I actually re-implemented % as well, but it is quite performant on ARM64. I didn't manage to beat it...

Or maybe remainder on doubles is just amazingly expensive to do correctly, and the approximation won't give the same result in all cases. (And if that's OK, it's a good optimization.)

I think it's not a mathematical approximation, but an equivalence. I also considered that the two could be different numerically, but they seem to produce identical results over a pretty large number of inputs, i.e. i.remainder(j) == remainder(i, j) holds, in accordance with the docs (although i.remainder(j) produces a more straight-forward exception when j = 0). Of course, there could be non-trivial edge cases I have overlooked.

@a-siva
Copy link
Contributor

a-siva commented Apr 17, 2024

//cc @alexmarkov @aam

@a-siva a-siva added the triaged Issue has been triaged by sub team label Apr 17, 2024
@alexmarkov alexmarkov added the P2 A bug or feature request we're likely to work on label Apr 17, 2024
@alexmarkov alexmarkov self-assigned this Apr 29, 2024
@alexmarkov
Copy link
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. P2 A bug or feature request we're likely to work on triaged Issue has been triaged by sub team type-performance Issue relates to performance or code size
Projects
None yet
Development

No branches or pull requests

4 participants