Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hashCode for immutable objects should be memoized #48948

Closed
gaaclarke opened this issue May 3, 2022 · 17 comments
Closed

hashCode for immutable objects should be memoized #48948

gaaclarke opened this issue May 3, 2022 · 17 comments
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. type-performance Issue relates to performance or code size

Comments

@gaaclarke
Copy link
Contributor

I was profiling the Flutter Gallery and noticed that Locale.hashCode (as in dart:ui.Locale) was taking a significant amount of time because the Flutter Gallery keeps a LinkedHashMap<Locale, DisplayOption>. Locale is intended to be immutable so there is no reason to keep calculating its hashCode. It would be nice if we could avoid recalculating hashCode in this case.

Reproduction code

import 'dart:collection';

class Foo {
  const Foo(this.x, this.y);

  final int x;
  final int y;

  @override
  int get hashCode {
    print('${identityHashCode(this)}');
    return x.hashCode ^ y.hashCode;
  }
}

void main() {
  Map<Foo, int> values = {};
  values[Foo(1, 100)] = 2;
}

Expected output

hashCode is called once

Actual output

hashCode is called twice

@gaaclarke
Copy link
Contributor Author

It's a perfectly rational decision to say this falls on the shoulders of the developer, just throwing it out there as a suggestion as something we might be able to do to give a boost to many apps.

@gaaclarke
Copy link
Contributor Author

Related, if we memoized hashCode we could calculate them at build time for string literals.

@mkustermann
Copy link
Member

I think there's two things here:

  • The VM's default map implementation is calling the hashCode getter twice when inserting elements (dart2js's implementation calls it only once). We should optimize this! (@dcharkes Since you're familiar with our maps, could you look into this?)
  • Proposing a language change that allows caching the returned hashCode value (i.e. not necessarily call the hashCode getter for each invocation). Maybe an issue in https://github.com/dart-lang/language could be opened for that (/cc @lrhn @eernstg for language related things)

Marking this issue as area-vm for the performance optimization.

@mkustermann mkustermann added area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. type-performance Issue relates to performance or code size labels May 4, 2022
@lrhn
Copy link
Member

lrhn commented May 4, 2022

Generally we can't do language promises about hashCode since it's a user-implementable getter. It can do anything (and occasionally does, e.g. in mocks).

Usually, letting the code cache the value is safer than trying to optimize in the compiler for code reading the same getter twice, since we don't know why it does so (it might be a mutable object that has been mutated, that's completely invisible to the type system).

@dcharkes
Copy link
Contributor

dcharkes commented May 4, 2022

  • The VM's default map implementation is calling the hashCode getter twice when inserting elements (dart2js's implementation calls it only once). We should optimize this!

Fix: https://dart-review.googlesource.com/c/sdk/+/243623

@gaaclarke
Copy link
Contributor Author

Yea, that makes sense that it would be tricky. As for calculating hashCode for literals at compile-time, it's a good idea but I have no evidence that it is a good use of our time. I'll just consider this closed after the duplicate call to hashCode is fixed. That will possibly help Flutter Gallery, thanks everyone.

@dnfield
Copy link
Contributor

dnfield commented May 4, 2022

Having hash code change is almost certainly a bug.

This is why the lints to only override hash code on immutable objets exist.

@gaaclarke
Copy link
Contributor Author

Having hash code change is almost certainly a bug.

Yea, I think what @lrhn was saying was we can't know for certain what people are doing in hashCode, so memoizing it is technically a breaking change. Even if it improved all of our users practically, a problematic use-case can block the change, no matter how unlikely it is. That's the unfortunate position Dart has put itself in.

(it might be a mutable object that has been mutated, that's completely invisible to the type system).

The premise for the change was that it would happen for immutable objects. In the case I was looking at Locale in Flutter it is a const object that can't be changed and that is knowable at compile-time.

@dnfield
Copy link
Contributor

dnfield commented May 4, 2022

So let me put this differently:

We already have the problem @lrhn is talking about when using sets or maps, for example

void main() {
  final fooSet = <MutableFoo>{};
  MutableFoo a = MutableFoo(1);
  MutableFoo b = MutableFoo(1);
  MutableFoo c = MutableFoo(2);

  fooSet.add(a);
  print(fooSet);
  
  a.i = 2;
  fooSet.add(b);
  
  b.i = 2;
  print(fooSet);
  
  fooSet.add(c);
  print(fooSet); // {MutableFoo(2), MutableFoo(2), MutableFoo(2)}
  
}

class MutableFoo {
  MutableFoo(this.i);
  
  int i;
  
  @override
  int get hashCode => i;
  
  @override
  bool operator==(Object other) => other is MutableFoo && other.i == i;
  
  @override
  String toString() => 'MutableFoo($i)';
}

Gets you a set where its three elements all currently have identical hash codes and evaluate as equal.

@gaaclarke
Copy link
Contributor Author

I tried to implement the memoization in the user code and in my case isn't actually possible to implement because we don't allow late fields on const classes, filed an issue: dart-lang/language#2225

@gaaclarke
Copy link
Contributor Author

I tried to implement calculating the hashCode in the constructor and saving it and it isn't possible because Dart doesn't support any procedures in the initializers of const constructors. (Also accessing static const Maps is verboten which would have been an alternative for me).

@lrhn
Copy link
Member

lrhn commented May 5, 2022

The way to store values on const classes is an Expando. That's also the reason we can't assume even immutable objects being unchanging.

To cache the hash code, do:

class MyConst {
  final something;
  final other;
  const MyConst(this.something, this.other);
  
  int get hashCode => _hash[this] ??= Object.hash(something, other);
  bool operator==(Object other) => 
      other is MyConst && something = other.something && this.other = other.other;
  static final _hash = Expando<int>();
}

There are multiple ways this can break. First of all, you don't know the object is deeply immutable. You can do non-const instantiations of the class, like new MyConst(mutableSomething, mutableOther). That object may need to change its hash code if its members change.

Also, even a deeply immutable object can pretend to be mutable using an Expando.

class MyTrickyConst {
   final something;
   MyTrickyConst(this.something);

   static final Expando<Object> _other;
   Object? get other => _other[this];
   set other(Object? value) { other[this] = value; }
   
   int get hashCode => Object.hash(something, other);
   bool operator==(Object other) =>
       other is MyTrickyConst && something = other.something && this.other = other.other;
}

This object may be deeply immutable when created using const, but the language or compiler assuming that means the hash-code doesn't change would be unsound.

That means that event if the compiler could determine with 100% certainty that an object is created by a const operation, it can't know that the hashCode won't change. It has to know the entire object graph, and check that every relevant object's hashCode only uses "safe" functions to combine the hash codes of other objects with unchanging hash codes.
Very non-trivial analysis.

You can optimize your own hashCode. You cannot make assumptions about other classes, because the language features of Dart ensures that nothing can actually be assumed about an object's run-time behavior from just its static type.

copybara-service bot pushed a commit that referenced this issue May 5, 2022
The methods to add to hash maps and hash sets are recursive: if the
index needs to be rehashed then the same method is called again after
rehashing.

This CL nests the actual implementation in a private method that takes
the full hashCode as an extra argument.

No significant code size or run time changes are reported on our
benchmarks. (Our benchmarks do not contain purposefully slow hashCodes.)

Note that hashCode can be called again later if rehashing of the index
is required on adding subsequent elements.

Bug: #48948
Change-Id: Ia3ccff9e592d675b4922ac78c4aa7ee0287ecb50
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/243623
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Daco Harkes <dacoharkes@google.com>
@gaaclarke
Copy link
Contributor Author

Closing this issue since the fix to double call to hashCode has landed and it has been shown that given how Dart works memoizing without direction from the user would not be possible.

copybara-service bot pushed a commit that referenced this issue May 6, 2022
This reverts commit 438c1ed.

Reason for revert: b/231617607 and b/230945329.
Will reland after b/230945329 is addressed.

Original change's description:
> [vm] Only call `.hashCode` once when adding to `Map` and `Set`
>
> The methods to add to hash maps and hash sets are recursive: if the
> index needs to be rehashed then the same method is called again after
> rehashing.
>
> This CL nests the actual implementation in a private method that takes
> the full hashCode as an extra argument.
>
> No significant code size or run time changes are reported on our
> benchmarks. (Our benchmarks do not contain purposefully slow hashCodes.)
>
> Note that hashCode can be called again later if rehashing of the index
> is required on adding subsequent elements.
>
> Bug: #48948
> Change-Id: Ia3ccff9e592d675b4922ac78c4aa7ee0287ecb50
> Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/243623
> Reviewed-by: Martin Kustermann <kustermann@google.com>
> Commit-Queue: Daco Harkes <dacoharkes@google.com>

TBR=kustermann@google.com,dacoharkes@google.com

Change-Id: I214251b65ea89e7f729564a125e226f2e6d580c0
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: #48948
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/243900
Commit-Queue: Daco Harkes <dacoharkes@google.com>
Reviewed-by: Daco Harkes <dacoharkes@google.com>
Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>
@gaaclarke
Copy link
Contributor Author

@dcharkes the CL for only calling .hashCode once seems to have dropped our build benchmark 10%: https://flutter-flutter-perf.skia.org/e/?end=1651961508&queries=sub_result%3Dstock_build_iteration%26sub_result%3Dstock_build_iteration_probability_5pct

Nice work. I'm not sure if anything else in that roll could have contributed also to that drop.

@dcharkes
Copy link
Contributor

@gaaclarke we reverted the day after because it made b/230945329 much more likely to trigger, and will reland as soon as that bug is addressed. So if the build-time stayed down it wasn't this bug (unless no rolls happened that include the revert).

@dcharkes dcharkes reopened this May 11, 2022
@mraleph
Copy link
Member

mraleph commented May 11, 2022

As I have mentioned in the chat the 10% improvement is more likely coming from d856e05

copybara-service bot pushed a commit that referenced this issue Jun 17, 2022
Relanding after https://dart-review.googlesource.com/c/sdk/+/244200.

Original change's description:
> [vm] Only call `.hashCode` once when adding to `Map` and `Set`
>
> The methods to add to hash maps and hash sets are recursive: if the
> index needs to be rehashed then the same method is called again
> after rehashing.
>
> This CL nests the actual implementation in a private method that takes
> the full hashCode as an extra argument.
>
> No significant code size or run time changes are reported on our
> benchmarks. (Our benchmarks do not contain purposefully slow hashCodes.)
>
> Note that hashCode can be called again later if rehashing of the index
> is required on adding subsequent elements.
>
> Bug: #48948
> Change-Id: Ia3ccff9e592d675b4922ac78c4aa7ee0287ecb50
> Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/243623
> Reviewed-by: Martin Kustermann <kustermann@google.com>
> Commit-Queue: Daco Harkes <dacoharkes@google.com>


Change-Id: I033bd7cc29fc812dccb6dccf0c3dca6e22cae2ca
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/248802
Commit-Queue: Tess Strickland <sstrickl@google.com>
Auto-Submit: Daco Harkes <dacoharkes@google.com>
Reviewed-by: Tess Strickland <sstrickl@google.com>
@dcharkes
Copy link
Contributor

This has landed.

This speeds up map copying ~5%-15% in various configurations.

Thanks for reporting @gaaclarke!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. type-performance Issue relates to performance or code size
Projects
None yet
Development

No branches or pull requests

6 participants