
Investigate possibilities to improve performance of Object.hash() #50693

Open
2 of 11 tasks
mkustermann opened this issue Dec 12, 2022 · 4 comments
Labels
area-vm, type-performance

Comments

@mkustermann
Member

mkustermann commented Dec 12, 2022

We should investigate whether we can improve the performance of our main corelib API for combining hashcodes, namely Object.hash.

  • Make benchmarks of Object.hash()
  • Avoid the overhead of optional parameters, especially for a small number of arguments; e.g. inlining may avoid the cost and also allow de-virtualizing some obj<X>.hashCode calls at the call site.
    => An Object.hash(a, b) call should be as efficient as finalizeHash(combineHash(a.hashCode, b.hashCode)) (see the sketch after this list)
  • Try to avoid optional parameters in SystemHash.hashX()
  • Consider taking the full 64 bits into account when combining hashes (currently it seems to take only the lower 29 bits into account)
    => Avoiding the bitwise-and operations may also speed it up.
  • Possibly optimize generated code and/or evaluate different algorithm.
  • Make it deterministic: Object identity hashes are already based on random numbers. For objects with custom / deterministic hash codes, it should be just fine to also produce a deterministic hash code. It seems unnecessary to introduce yet another random element in the function that combines hash codes.
    => This may also avoid the overhead of the initial combine(seed, obj1.hashCode).
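For reference, a minimal sketch of the combine/finalize pattern the points above refer to. This is not necessarily the SDK's exact SystemHash code; the constants are the commonly used Jenkins-style ones, and the 0x1fffffff masks are what limits the result to the lower 29 bits mentioned above. The names combineHash, finalizeHash, and hashPair follow the wording in the list and are illustrative.

// A minimal sketch of a Jenkins-style combiner; not the SDK's actual code.
int combineHash(int hash, int value) {
  hash = 0x1fffffff & (hash + value);
  hash = 0x1fffffff & (hash + ((0x0007ffff & hash) << 10));
  return hash ^ (hash >> 6);
}

int finalizeHash(int hash) {
  hash = 0x1fffffff & (hash + ((0x03ffffff & hash) << 3));
  hash = hash ^ (hash >> 11);
  return 0x1fffffff & (hash + ((0x00003fff & hash) << 15));
}

// The goal stated above: Object.hash(a, b) should ideally be as cheap as
// writing the combination out by hand, e.g.
int hashPair(Object a, Object b) =>
    finalizeHash(combineHash(a.hashCode, b.hashCode));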

Ideally we'll end up with a core library API that we're willing to use ourselves in our own code, e.g. replacing existing hand-written hash-combining helpers.

Arguably some implementations want to call combine on a runtime-dependent number of objects and only call finish afterwards. That would make it hard to use Object.hash() as an API (see e.g. fidl).
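For illustration, this is the kind of call site meant here: combining a runtime-dependent number of elements incrementally and finalizing once at the end. The sketch reuses the combineHash/finalizeHash helpers sketched above; hashElements is a hypothetical name.

// Hypothetical: a hash over a runtime-dependent number of elements,
// combined incrementally and finalized exactly once.
int hashElements(List<Object?> elements) {
  var hash = 0;
  for (final element in elements) {
    hash = combineHash(hash, element.hashCode);
  }
  return finalizeHash(hash);
}

A fixed-arity Object.hash(a, b, ...) cannot express this directly, which is the concern raised above.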

/cc @aam

@mkustermann added the area-vm and type-performance labels Dec 12, 2022
@mkustermann
Member Author

/cc @lrhn @rakudrama

@lrhn
Member

lrhn commented Dec 16, 2022

As mentioned elsewhere, the goal should be to be as efficient as

finalizeHash(combineHash(combineHash(0, a.hashCode), b.hashCode))

There should be a combineHash call per actual hash code; skipping the first one risks losing some entropy if the first object's hash code is not perfectly distributed.

Providing a seed, instead of using 0, should not affect performance much, but obviously it needs to come from somewhere. Clearing a register using xor is cheaper than loading from memory, but I don't think that's going to be a measurable part of the computation time of something which calls multiple hashCode getters.

The current implementation is shared between platforms. We could make all the public methods external and allow platforms to specialize any way they want to. There should be no issue with that.

A platform could also choose to use a randomized seed in debug mode, to ensure that users do not end up relying on, say, a particular ordering of a hash map. In production mode, that value could be fixed, or even zero, instead.
I'm not sold on determinism. It comes with a cost too. And I'd rather force nondeterminism into (at least) debug builds, than allow users to start thinking that the current value is somehow prescribed.
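As a rough sketch of that idea (the dart.vm.product environment flag is the usual way to detect a product build; the name hashSeed and the seed range are illustrative):

import 'dart:math';

// Hypothetical: a random seed in non-product (debug/JIT) runs, and a fixed
// zero seed in product builds, so nobody can come to rely on a particular
// hash ordering while developing.
final int hashSeed = const bool.fromEnvironment('dart.vm.product')
    ? 0
    : Random().nextInt(0x1fffffff);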

If we even consider using a different algorithm, then we should definitely make sure that we document that the hash value is not stable between executions. We've so far ensured that by using a random seed for user calls.

The optional seed parameters on SystemHash.hashX don't need to be optional. It's internal-only code, so we can just make them required, and add , 0 to the calls that don't pass an argument.
I'd encourage compilers to try to inline both Object.hash and those methods.
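A hypothetical before/after shape of such a method, reusing the combineHash/finalizeHash helpers sketched earlier (the real SystemHash signatures may differ):

// Before: the seed is optional, so every call pays for default-argument handling.
int hash2WithOptionalSeed(int v1, int v2, [int seed = 0]) =>
    finalizeHash(combineHash(combineHash(seed, v1), v2));

// After: the seed is required; internal callers that used to omit it pass 0 explicitly.
int hash2WithRequiredSeed(int v1, int v2, int seed) =>
    finalizeHash(combineHash(combineHash(seed, v1), v2));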

If it's a problem for the VM that someone might get access to the sentinel value used to detect the actual number of arguments passed, I wouldn't worry overly about it. If Object.hash(o1, o2) does not give the same result as Object.hash(o1, o2, leakedSentinel), even though the library source code looks like it should, just say that the library source code is not normative. The documentation is; the implementation we allow people to see is at most a suggestion.
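For readers unfamiliar with the sentinel trick being referred to, a hypothetical sketch (the names _absent and hashUpTo3 are made up for illustration): a private default value marks "argument not passed", so one method can accept a varying number of arguments, and passing that value explicitly would make a 3-argument call behave like a 2-argument one.

// Hypothetical sentinel-based arity detection, reusing the helpers above.
const Object _absent = Object();

int hashUpTo3(Object? o1, Object? o2, [Object? o3 = _absent]) {
  var hash = combineHash(combineHash(0, o1.hashCode), o2.hashCode);
  if (!identical(o3, _absent)) {
    hash = combineHash(hash, o3.hashCode);
  }
  return finalizeHash(hash);
}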

copybara-service bot pushed a commit that referenced this issue Jan 2, 2023
Issue #50693

Change-Id: Ib587b70bcb57cbd2d16319b7814e2569c7e41213
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/276161
Commit-Queue: Lasse Nielsen <lrn@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
@mkustermann
Member Author

... skipping the first one risks losing some entropy if the first object's hash code is not perfectly distributed.

Aren't we assuming that any obj.hashCode is well distributed?

Users write code such as

class Foo {
  final a;

  Foo(this.a);

  int get hashCode => a.hashCode;
  bool operator ==(Object other) =>
      identical(this, other) || other is Foo && a == other.a;
}

Once they add a field, it changes to

class Foo {
  final a;
  final b;

  Foo(this.a, this.b);

  int get hashCode => Object.hash(a, b);
  bool operator ==(Object other) =>
      identical(this, other) || other is Foo && a == other.a && b == other.b;
}

In fact, Object.hash() requires at least two objects to hash, so it encourages this code.

Given the above, the assumption is that a.hashCode is well distributed.

If the primitives in the core library have well-distributed .hashCode implementations (int, double, String, ...) and anything that combines such hash codes uses our Object.hash() / ... functions, things are always well distributed.

@lrhn
Member

lrhn commented Jan 4, 2023

Aren't we assuming that any obj.hashCode is well distributed?

That has historically not been a sound assumption.
The fact that we've now fixed int.hashCode doesn't give me faith that there are no other bad hash codes out there.

Consider a class like:

class SomethingEntity {
  static int _counter = 0;
  // Each instance has a unique ID.
  final int id = _counter++;
  // More efficient than identityHashCode.
  int get hashCode => id;
  // Still use identity for equality.
  bool operator ==(Object other) => identical(this, other);
}

That seems like a safe and correct hashCode implementation, but it's not well distributed.
It's something reasonable people will do.

(I can see we've changed a bunch of => id; into => id.hashCode;, but we haven't done them all, and third-party code probably hasn't either.)
