Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve RBTree with static leaf #543

Open
timohanke opened this issue Mar 12, 2023 · 9 comments
Open

Improve RBTree with static leaf #543

timohanke opened this issue Mar 12, 2023 · 9 comments
Labels
bug Something isn't working

Comments

@timohanke
Copy link

The current implementation seems to be creating a lot of individual leafs on the heap that are structurally all the same (#leaf). There could be one static let leaf_ = #leaf; in the module and nodes could be created as #node (c, leaf_, k, v, leaf_) instead of #node (c, #leaf, k, v, #leaf), saving a lot of memory.

@nomeata
Copy link
Contributor

nomeata commented Mar 12, 2023

Are you sure that this is not something the compiler already optimizes way? I have a hunch that it could, and if it doesn’t, that it should :-)

@timohanke
Copy link
Author

It doesn't seem so. According to my measurements it saves 24 bytes per node of that form (each #leaf costs 12 bytes and it doesn't get optimized away).

@timohanke
Copy link
Author

timohanke commented Mar 12, 2023

This other, more efficient representation that is mentioned in the comments of RBTree.mo would benefit just the same:

// TODO: a faster, more compact and less indirect representation would be:
// type Tree<K, V> = {
//  #red : (Tree<K, V>, K, V, Tree<K, V>);
//  #black : (Tree<K, V>, K, V, Tree<K, V>);
//  #leaf
//};

@ggreif
Copy link
Contributor

ggreif commented Mar 12, 2023

As Joachim says, singleton typed objects (such as #leaf ()) should end up on the static heap and not on the dynamic one. Can you wasm2wat your file and paste here a short function (or snippet) that allocates a #leaf dynamically?

@timohanke
Copy link
Author

I am running

actor {
    type Color = { #R; #B };

    type Tree = {
      #node : (Color, Tree, Nat, Tree);
      #leaf
    };

    let l : Tree = #leaf;
    let a = Prim.rts_heap_size();
    let t1 : Tree = #node (#R, #leaf, 0, #leaf);
    let b = Prim.rts_heap_size();
    let t2 : Tree = #node (#R, l, 0, l);
    let c = Prim.rts_heap_size();

    Debug.print("Tree1 : " # debug_show (b - a));
    Debug.print("Tree2 : " # debug_show (c - b));
   }

with output

[Canister uwugh-uaaaa-aaaaa-aaa5q-cai] Tree1 : 72
[Canister uwugh-uaaaa-aaaaa-aaa5q-cai] Tree2 : 48

@ggreif
Copy link
Contributor

ggreif commented Mar 12, 2023

Bug is confirmed:
For this program

$ cat tree.mo
type Tree = {#leaf; #node : (Tree, Tree) };

func leaf() : Tree = #leaf;

func node(l : Tree, r : Tree) : Tree = #node (l, r);

ignore node(leaf(), leaf());

I get allocations for #leaf :-(

  (func $leaf (type 16) (param $clos i32) (result i32)
    (local $heap_object i32)
    i32.const 3
    call 142
    local.tee $heap_object
    i32.const 15
    i32.store offset=1
    local.get $heap_object
    i32.const 1202717598
    i32.store offset=5
    local.get $heap_object
    i32.const 0
    i32.store offset=9
    local.get $heap_object)

@ggreif ggreif added the bug Something isn't working label Mar 12, 2023
@nomeata
Copy link
Contributor

nomeata commented Mar 12, 2023

Can you open an issue in the motoko repo?
ir_passes/const.ml misses a case for PrimE (TagPrim, es) and then in codegen/compile.ml the module Const and compile_const_exp needs to learn about constant variants.

I don’t recall a reason why we don't have it already, probably just oversight.

Maybe do OptPrim while we are at it. Do you want to do it, or should I do it once I get around to it?

@ggreif
Copy link
Contributor

ggreif commented Mar 12, 2023

I am working on a patch. See dfinity/motoko#3878.

@timohanke
Copy link
Author

A compiler patch will fix the same for the Color as #R and #B will become static, too.

After the patch, what will be the most compact representation for a node in RBTree?

Ignoring the K-V pair, we have (current type)

 type Color = { #R; #B };

type Tree = {
   #node : (Color, Tree, Tree);
   #leaf
};

at 32 bytes per #node.

Proposed type (as per comment in RBTree.mo)

type Tree = {
  #red : (Tree, Tree);
  #black : (Tree, Tree);
  #leaf
};

at 28 bytes per #red/#black.

But the most compact would be

type Tree = (Color, ?Tree, ?Tree);

at 20 bytes.

mergify bot pushed a commit to dfinity/motoko that referenced this issue Mar 12, 2023
Variants with constant payloads should be constant and end up in the static heap. This will speed up all tree-like data structures that have unit-payload leaves (not to speak of allocation wins!). See dfinity/motoko-base#543 for motivation.

For below program
``` Motoko
import P = "mo:⛔️";

type Tree = { #leaf; #node : (Tree, Tree) };

func leaf() : (Tree, (Nat, Nat)) = (#leaf, (5, 42));

P.debugPrint(debug_show leaf());
```
the function `leaf()` now looks like
``` wasm
  (func $leaf (type 5) (param $clos i32)
    i32.const 2097267
    i32.const 2097279
    global.set 18
    global.set 17)
```
looks as expected. It also runs happily
```
$ wasmtime tree.wasm 
(#leaf, (5, 42))
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants