VM: _CompactLinkedHashSet and _InternalLinkedHashMap are too big for small collections #26081
Comments
We can definitely make them a bit more space-efficient (or, even better, call hashCode less often) for small sets. We have discussed this in the past but have not felt an urgent need, because dart2js, for example, uses Setlet and Maplet. The resulting implementation might not be as efficient as Setlet, but it would definitely be better than the current one. @rakudrama I am assuming your quoted numbers are from an x64 build, correct?
Yes, these are x64 measurements.
Maybe change the initial capacity to 4 instead of 8.
This is still an issue. My latest experience is debugging a 'huge heap' issue where I have 3.1M objects each with a Map. Most are singleton maps. Each empty or singleton Map is 320 bytes or 6x the size of the owning object. Sometimes it is possible to use a nullable field with the convention ' We could use the Can the default implementations be better?
I tried setting the default capacity to 2 elements, and it seems to improve the dart2js compile MemoryUse benchmarks by about 4% (2%-10%). At this setting, the VM JSON parse benchmarks are slower, presumably because they grow a Map and rehash. There is a TODO to avoid rehashing by reusing the existing hash bits in the index, so perhaps the initial growth could be much faster. I would expect the large size of empty and small maps to be worth improving for mobile apps.
I have just been looking at the dart2js heap and rediscovered the above data. In my large program compile scenario, the heap is reduced by 4% with an initial capacity of 2 elements. The mean length of a There are many singleton maps, so an initial non-empty capacity of 1 would save a little more storage (I tried this but the code does not work - it probably expects the mask for the data index to not be zero-width, which could probably be fixed with careful recoding). There are also a few huge maps (> 1M entries) where the size is only a little over some 2k. |
Accessing 'all instances' programmatically in Observatory, I can confirm 76% of Maps have two or fewer elements, 85% have four or fewer.
FYI @rmacnak-google
Empty maps are fairly common; delaying allocation of the backing store saves time and memory for empty maps. Non-empty maps probe an extra time for the first insert. Most maps also have few associations, so reducing the initial backing store size also saves memory on balance. The best value for dart2js would be 2 associations, but this CL changes it to 4 as a compromise with other benchmarks on Golem.

Runtime as Score geomean 2.620%
MemoryUse geomean -5.233%
dart2js CompileSwarmLatest 0%
dart2js CompileSwarmLastedMemoryUse -10.51%

TEST=ci
Bug: #26081
Change-Id: I80a925f698f3df44fae5e97e1804c8dff2ce0c60
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/176583
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Reviewed-by: Stephen Adams <sra@google.com>
Commit-Queue: Ryan Macnak <rmacnak@google.com>
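
For readers who want to see the "delay allocation of the backing store" idea in user-level code, here is a minimal sketch. It is only an analogy: the CL above changes the VM's internal map implementation, whereas `LazyMap` and `hasBackingStore` are hypothetical names invented for this illustration.

```dart
/// Hypothetical wrapper, not part of the Dart SDK: it allocates the
/// underlying Map only when the first association is stored.
class LazyMap<K, V> {
  Map<K, V>? _backing; // stays null while the map is empty

  /// True once a real Map has been allocated.
  bool get hasBackingStore => _backing != null;

  int get length => _backing?.length ?? 0;

  /// Reads never allocate: an absent backing store means "no entries".
  V? operator [](Object? key) => _backing?[key];

  bool containsKey(Object? key) => _backing?.containsKey(key) ?? false;

  /// The first write pays for the allocation; later writes reuse it.
  void operator []=(K key, V value) {
    (_backing ??= <K, V>{})[key] = value;
  }
}
```

With this shape an instance that is never written to costs one object with a single null field, at the price of an extra null check on every access.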
_CompactLinkedHashSet (default Set implementation) and _InternalLinkedHashMap (default Map) have good asymptotic storage efficiency as the size of the collection grows. They are always better than conveniently available alternatives above 10 elements and are designed for better locality.
For small collections they have poor storage efficiency. In dart2js there are many small sets and maps, so this is noticeable. dart2js uses two classes, Setlet and Maplet, because of the poor storage efficiency. It would be nice if this use of specialized implementations were not necessary.
This table shows the size in bytes for various kinds of Set:
Maps show a similar pattern, just a bit bigger. The empty map {} is 304 bytes.
Can the VM-provided implementations be more efficient in the common case of small collections?
I tried switching from Setlet to Set. One thing surprised me: more calls to get:hashCode. Setlet can add an element to an empty set without calling get:hashCode. Perhaps special handling for small numbers of elements could also have this efficiency.
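
To illustrate how a small-set class can add its first element without calling get:hashCode, here is a minimal sketch. It is not dart2js's actual Setlet; `SmallSet` and the `Counted` helper are hypothetical names made up for this example, and the sketch assumes non-nullable elements.

```dart
/// Hypothetical small set: keeps a single element inline and only
/// allocates a hashed Set when a second distinct element arrives.
class SmallSet<E extends Object> {
  E? _single; // the only element while the set holds exactly one
  Set<E>? _rest; // allocated on demand for two or more elements

  int get length => _rest?.length ?? (_single == null ? 0 : 1);

  bool contains(Object? element) {
    final rest = _rest;
    if (rest != null) return rest.contains(element);
    return _single != null && _single == element;
  }

  bool add(E element) {
    final rest = _rest;
    if (rest != null) return rest.add(element);
    final single = _single;
    if (single == null) {
      _single = element; // empty -> singleton: no hashCode call
      return true;
    }
    if (single == element) return false; // compares with ==, not hashCode
    _rest = <E>{single, element}; // hashing starts only here
    _single = null;
    return true;
  }
}

/// Element type that counts how often its hashCode getter is called.
class Counted {
  static int hashCalls = 0;
  final int id;
  Counted(this.id);

  @override
  bool operator ==(Object other) => other is Counted && other.id == id;

  @override
  int get hashCode {
    hashCalls++;
    return id.hashCode;
  }
}
```

Adding one `Counted` to a fresh `SmallSet` leaves `Counted.hashCalls` at 0, since the singleton path only stores the element and any duplicate check uses `==`. The trade-off is an extra branch on every operation once the set has grown.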