`~lib/array/Array#push` is unnecessarily garbage-y #1798

surma · 2021-04-11T13:13:32Z

In a benchmark I discovered that AssemblyScript is significantly slower than JavaScript when using --runtime incremental when filling Arrays.

Language	Runtime	Average	vs JS
JavaScript		233.68ms	1.0x
AssemblyScript	stub	345.35ms	1.5x
AssemblyScript	minimal	354.60ms	1.5x
AssemblyScript	incremental	18758.50ms	80.3x

A d8 trace revealed the following profile:

[Bottom up (heavy) profile]:
  ticks parent  name
  18670   96.1%  /usr/lib/system/libsystem_platform.dylib
  13530   72.5%    Function: *~lib/rt/itcms/__renew
  13530  100.0%      Function: *~lib/array/ensureSize
  13530  100.0%        Function: *~lib/array/Array#push
  13530  100.0%          Function: *binaryheap_optimized/BinaryHeap#push
  13530  100.0%            Function: *binaryheap_optimized/push
   5119   27.4%    Function: *~lib/rt/itcms/__new
   5119  100.0%      Function: *~lib/rt/itcms/__renew
   5119  100.0%        Function: *~lib/array/ensureSize
   5119  100.0%          Function: *~lib/array/Array#push
   5119  100.0%            Function: *binaryheap_optimized/BinaryHeap#push

which led me to investigate the Array#push() implementation, which callsensureSize(). ensureSize() will create a new buffer with exactly one slot more if the previous buffer was full. This means that once you ran out of capacity once, every subsequent call to push() will create a new buffer copy all the data to the new buffer and create work for the garbage collector as the old buffer is now collectible.

Both Rust and Go amortize that work by doubling the capacity every time the buffer is exhausted. I’m sure C++ does the same but I couldn’t get through the stdlib code 😅

In an experiment, replacing the ~lib/array/Array with a CustomArray<T> that has exponential buffer growth, the performance problems go away:

Language	Variant	Runtime	Average	vs JS
JavaScript			233.68ms	1.0x
AssemblyScript	customarray	stub	329.43ms	1.4x
AssemblyScript	customarray	minimal	329.23ms	1.4x
AssemblyScript	customarray	incremental	335.35ms	1.4x
AssemblyScript	~lib	stub	345.35ms	1.5x
AssemblyScript	~lib	minimal	354.60ms	1.5x
AssemblyScript	~lib	incremental	18758.50ms	80.3x

I would like to PR in a fix, but wanted to check first if there are any concerns about adopting a similar approach for ASC’s Array<T>?

The text was updated successfully, but these errors were encountered:

MaxGraey · 2021-04-11T13:38:35Z

That's well known problem with incremental runtime and constantly grown array (via serial of push for example).
cc @dcodeIO

dcodeIO · 2021-04-11T17:33:15Z

That's indeed odd, yeah. Looking at the code again, ensureSize kinda relies on __renew to pre-alloc to the next FL, but __realloc actually only pre-allocs to the next SL, leading to a ton of unnecessary allocations. Need to check in more detail, but that may well have been wrong for a long time already, even predating the GC rework.

surma · 2021-04-14T21:52:12Z

I don’t know what FL or SL are 🙈

What I don’t think either of you have answered: Are you opposed to adjusting the Array<T> impl. to grow exponentially? One of the V8 folks told me that it doesn’t have to be 2x, just a factor >1 of the previous capacity. It seems to be fairly common, even recommended practice.

dcodeIO · 2021-04-14T22:00:31Z

It's a regression that it currently doesn't do this (anymore, not sure when exactly I messed this up), so absolutely, go for it. If we can tweak the factor a bit, the better. The only place where we may not want arrays to grow like this is when setting Array#length I think, which also uses ensureSize but there it doesn't make sense to grow by a factor.

One nice-to-have aspect there would be to also take block size of TLSF into account, so instead of growing unconditionally when length is exceeded, arrays would only need to grow when the TLSF block cannot be extended anymore. For instance, when a block's payload is just one byte large as would be the case for an array of just one i8 element, it can grow up to a certain limit (by just modifying the block) until the block is actually exhausted instead of reallocating.

FL and SL are the first and second level lists of TLSF, indicating block size. The payload size of a block is somewhere in between where the previous list would end and the next list would start, so that's where we can just modify the block to consider a larger payload while not needing to alloc a new one. Current size of the block is in mmInfo & ~3.

MaxGraey · 2021-04-15T08:53:29Z

Could be just easily fix from:

var newLength = length + 1;

to

var newLength = min(16, length << 1);

here:
https://github.com/AssemblyScript/assemblyscript/blob/master/std/assembly/array.ts#L206

surma · 2021-04-15T09:48:26Z

Is it that simple? 😅 I think that would still cause a reallocation on every push, just with a bigger buffer size so that half of it is unused.

carlopi · 2021-04-16T13:03:55Z

Something like:

function biggerPowerOf2(N)
{
    N = N - 1;
    N |= N >> 1;
    N |= N >> 2;
    N |= N >> 4;
    N |= N >> 8;
    N |= N >> 16;
    N |= N >> 32;  //redundant already?
    return N+1;
}
**** as noted by @MaxGraey: right shifts!

ensures (given an unsigned integer) a power of 2 bigger or equal of the starting value is returned.
Then it's just a matter of calling
usize = biggerPowerOf2(usize); inside ensureSize to force the dimension to be a power of 2.
Not sure what an equivalent bit-twiddling trick could be for choosing arbitrary multipliers, but for starters 2x seems a safe solution.

Note that calling only on entrance (=push codepath) solves the problem only partially, since also the [] path has to be checked, and seems more solid to do inside ensureSize.

MaxGraey · 2021-04-16T13:08:50Z

if you need nextPowOf2 it could be implemented as n == 0 ? 0 : 1 << (32 - clz(n - 1)). Also biggerPowerOf2 should have right shifts instead left shifts.

Problem with growing to next pow of 2 is much more significant non-used memory footprint

carlopi · 2021-04-16T13:20:50Z

Thanks for pointing out the mistake.
I imagine there is plenty of literature to be researched, but given a primitive to find the highest set bit there are plenty of constant time solutions to grow less than 2x.
Eg.

find highest set bit (calling this number N).
either go to N * 2 or to N * 1.414 (by checking with a single comparison against N * 1.414).
Here the multiplier becomes sqrt(2) = 1.414, and with a similar trick smaller multiplier could be used.

MaxGraey · 2021-04-16T13:21:07Z

I guess we could test / bench 3 solutions:

glowing by factor 2
glowing by next power of 2
hybrid approach. Use (2) until 2 ** 16 and use (1) which more economic after this threshold

dcodeIO · 2021-04-16T13:33:52Z

Note that the bit math above (relying on clz) is about what the memory manager is already doing for FLs and SLs, so that's where my suggestion comes in to build something that cooperates with the MM :)

assemblyscript/std/assembly/rt/tlsf.ts

Lines 5 to 9 in 7e20ad2

    
           // === The TLSF (Two-Level Segregate Fit) memory allocator === 
        
           // see: http://www.gii.upv.es/tlsf/ 
        
           // - `ffs(x)` is equivalent to `ctz(x)` with x != 0 
        
           // - `fls(x)` is equivalent to `sizeof(x) * 8 - clz(x) - 1`

MaxGraey · 2021-04-16T14:54:49Z

I figured out that nextPowOf2(n + 1) and n << 1 is the same if n0 (initial value) already have power of two value like 16. So much more interesting use exponential glow + grow by factor of 2. Something like this:

function log2(n: u32): u32 {
  return 31 - clz(n | 1);
}

function computeGrowFactor(len: u32): u32 {
  if (len < (1 << 16)) {
    return len * (1 + log2(len));
  } else {
    return min(len << 1, 1 << 30);
  }
}

In this case it takes only 4 grows (reallocs) between [0, 65536] and 15 grows between [0, 67108864]

MaxGraey · 2021-04-21T12:46:28Z

It seems I found the best empirical expansion (logarithmic):

function computeGrowFactor(len: u32): u32 {
  return len * (1 << (2 - ((log2(len - 8) - 8) >>> 4)));
}

which produce this series - (length value, grow factor):

8 8
64 8
512 8
2048 4
8192 4
32768 4
131072 4
524288 4
2097152 4
8388608 4
33554432 4
67108864 2
134217728 2
268435456 2
536870912 2
1073741824 2
2147483648 2

MaxGraey added enhancement performance labels Apr 11, 2021

MaxGraey mentioned this issue Apr 22, 2021

Adding changes to readAll so that it reads the file in one jedisct1/as-wasi#109

Closed

MaxGraey mentioned this issue Apr 30, 2021

fix: Fix growing factor for capacity in Array #1841

Merged

1 task

MaxGraey closed this as completed in #1841 May 3, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`~lib/array/Array#push` is unnecessarily garbage-y #1798

`~lib/array/Array#push` is unnecessarily garbage-y #1798

surma commented Apr 11, 2021 •

edited

Loading

MaxGraey commented Apr 11, 2021

dcodeIO commented Apr 11, 2021

surma commented Apr 14, 2021

dcodeIO commented Apr 14, 2021 •

edited

Loading

MaxGraey commented Apr 15, 2021 •

edited

Loading

surma commented Apr 15, 2021

carlopi commented Apr 16, 2021 •

edited

Loading

MaxGraey commented Apr 16, 2021 •

edited

Loading

carlopi commented Apr 16, 2021 •

edited

Loading

MaxGraey commented Apr 16, 2021

dcodeIO commented Apr 16, 2021

MaxGraey commented Apr 16, 2021 •

edited

Loading

MaxGraey commented Apr 21, 2021 •

edited

Loading

~lib/array/Array#push is unnecessarily garbage-y #1798

~lib/array/Array#push is unnecessarily garbage-y #1798

Comments

surma commented Apr 11, 2021 • edited Loading

MaxGraey commented Apr 11, 2021

dcodeIO commented Apr 11, 2021

surma commented Apr 14, 2021

dcodeIO commented Apr 14, 2021 • edited Loading

MaxGraey commented Apr 15, 2021 • edited Loading

surma commented Apr 15, 2021

carlopi commented Apr 16, 2021 • edited Loading

MaxGraey commented Apr 16, 2021 • edited Loading

carlopi commented Apr 16, 2021 • edited Loading

MaxGraey commented Apr 16, 2021

dcodeIO commented Apr 16, 2021

MaxGraey commented Apr 16, 2021 • edited Loading

MaxGraey commented Apr 21, 2021 • edited Loading

`~lib/array/Array#push` is unnecessarily garbage-y #1798

`~lib/array/Array#push` is unnecessarily garbage-y #1798

surma commented Apr 11, 2021 •

edited

Loading

dcodeIO commented Apr 14, 2021 •

edited

Loading

MaxGraey commented Apr 15, 2021 •

edited

Loading

carlopi commented Apr 16, 2021 •

edited

Loading

MaxGraey commented Apr 16, 2021 •

edited

Loading

carlopi commented Apr 16, 2021 •

edited

Loading

MaxGraey commented Apr 16, 2021 •

edited

Loading

MaxGraey commented Apr 21, 2021 •

edited

Loading