Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flush unused blocks. #32

Closed
ghost opened this issue Dec 23, 2015 · 15 comments
Closed

Flush unused blocks. #32

ghost opened this issue Dec 23, 2015 · 15 comments

Comments

@ghost
Copy link

ghost commented Dec 23, 2015

Seeing many unused blocks, not just the $topmost, produced via asm2wasm.

@kripken
Copy link
Member

kripken commented Dec 24, 2015

Any example? Looking at the test outputs in test/*.wast, nothing obvious pops out.

@ghost
Copy link
Author

ghost commented Dec 24, 2015

From the zlib benchmark:

Unnecessary block:

  (func $establishStackSpace (param $i1 i32) (param $i2 i32)
    (block
      (i32.store align=4
        (i32.const 8)
        (get_local $i1)
      )
      (i32.store align=4
        (i32.const 16)
        (get_local $i2)
      )
    )
  )

Unnecessary block $topmost, and unused block label.

  (func $b0 (param $i1 i32) (result i32)
    (block $topmost
      (call_import $abort
        (i32.const 0)
      )
      (i32.const 0)
    )
  )

This one could be eliminated:

(func $___unlockfile (param $i1 i32)
    (block $topmost
      (br $topmost)
    )
  )

Two unnecessary block $topmost use:

(func $_bitshift64Ashr (param $i1 i32) (param $i2 i32) (param $i3 i32) (result i32)
    (block $topmost
      (if
        (i32.lt_s
          (get_local $i3)
          (i32.const 32)
        )
        (block
          (i32.store align=4
            (i32.const 168)
            (i32.shr_s
              (get_local $i2)
              (get_local $i3)
            )
          )
          (br $topmost
            (i32.or
              (i32.shr_u
                (get_local $i1)
                (get_local $i3)
              )
              (i32.shl
                (i32.and
                  (get_local $i2)
                  (i32.sub
                    (i32.shl
                      (i32.const 1)
                      (get_local $i3)
                    )
                    (i32.const 1)
                  )
                )
                (i32.sub
                  (i32.const 32)
                  (get_local $i3)
                )
              )
            )
          )
        )
      )
      (i32.store align=4
        (i32.const 168)
        (if_else
          (i32.lt_s
            (get_local $i2)
            (i32.const 0)
          )
          (i32.const -1)
          (i32.const 0)
        )
      )
      (i32.shr_s
        (get_local $i2)
        (i32.sub
          (get_local $i3)
          (i32.const 32)
        )
      )
    )
  )

Two unnecessary blocks:

  (func $___muldi3 (param $i1 i32) (param $i2 i32) (param $i3 i32) (param $i4 i32) (result i32)
    (local $i5 i32)
    (local $i6 i32)
    (block $topmost
      (set_local $i5
        (call $___muldsi3
          (get_local $i1)
          (get_local $i3)
        )
      )
      (set_local $i6
        (i32.load align=4
          (i32.const 168)
        )
      )
      (block
        (i32.store align=4
          (i32.const 168)
          (i32.or
            (i32.add
              (i32.add
                (i32.mul
                  (get_local $i2)
                  (get_local $i3)
                )
                (i32.mul
                  (get_local $i4)
                  (get_local $i1)
                )
              )
              (get_local $i6)
            )
            (i32.and
              (get_local $i6)
              (i32.const 0)
            )
          )
        )
        (get_local $i5)
      )
    )
  )

@kripken
Copy link
Member

kripken commented Jan 4, 2016

Thanks for the details. Notes:

  • That first block is there because binaryen functions have just a single child Expression for the body, not a list of expressions. So it has to be a block if there is more than one expression. If the .s format allows a list, we could avoid printing the block, I suppose, as an optimization.
  • We emit a topmost block when there is a return value. In the second example, there is a return value but not a branch, so we could remove the name of the block (but not the block itself because of the previous point). Perhaps we could make a pass that finds blocks with names that are never branched to, and un-names them.
  • In unlockfile, we have a return without a value. I can't think of a trivial way to eliminate that one, we would need to see that control flow goes to the right place anyhow, and get rid of the branch; then we can get rid of the block name, and then the entire block. Might need multiple passes here.
  • Looks like we can also "unfold" blocks in the last examples, a block is already a list so we can just embed a child block's list in it. Seems like a simple pass.

@ghost
Copy link
Author

ghost commented Jan 4, 2016

Note sure about your first point. Functions seem to accept a list of expressions so the top block seems unnecessary (or perhaps this is just a difference between v8 and the current spec)?

Explored some clean up passes (but just hacks in lisp):

  1. Firstly removed unused block labels. Walk the code, maintaining a stack of blocks with their labels and a use-counter. At exit of the block scope, flush the label is unused. Special case a single trailing break out of the block and remove it and the label.
  2. Flatten the blocks within contexts that already accept a list of blocks. Walk the code noting blocks and loops and lift un-labeled blocks into the parent list of expressions.

This seems to make the code a little more readable, and does reduce the compressed size. Also experimenting with when and unless for the common `if-block' patterns, which also improves readability and size, and removes some more explicit blocks.

@kripken
Copy link
Member

kripken commented Jan 4, 2016

Sorry if I wasn't clear. In the first point, I was saying that the binaryen AST is perhaps not identical to the spec. I discussed this in issues on the spec repo, and in the end it wasn't clear that there was a "best" way to represent the AST. For example, the binaryen Function doesn't directly have a list, instead they have a single child, which often is a block (which does have a list). That's why there are some of those blocks.

I'm interested in those experiments. It should be fairly easy to do them in binaryen, at least I've tried to make it easy. Basically you can write a pass in src/passes/, and then run bin/binaryen-shell and tell it to run that pass. I can document this better if that would help.

@kripken
Copy link
Member

kripken commented Jan 5, 2016

I wrote RemoveUnusedNames, which removes the name of toplevel (and other blocks) when never branched to.

@kripken
Copy link
Member

kripken commented Jan 5, 2016

(that's in 15043f7 and 4abf3e0)

@ghost
Copy link
Author

ghost commented Jan 5, 2016

Thank you for making a start. There is a little more to it. Need to keep a stack of the block labels to be able to account for those in scope. Fwiw, here are the CL implementations:

(defun remove-unused-block-labels (ast)
  "Remove named blocks that are not referenced. Destructively modifies
  the 'ast."
  (flet ((walk (name args results locals body)
           (declare (ignore name args results locals))
           (let ((blocks nil))
             (labels ((remove-unused-block-labels* (ast)
                        (cond ((not (consp ast)))
                              ((and (eq (first ast) 'block) (symbolp (second ast)))
                               (let* ((label (second ast))
                                      (binding (cons label 0)))
                                 (push binding blocks)
                                 (dolist (expression (rest (rest ast)))
                                   (remove-unused-block-labels* expression))
                                 ;; Check that that 'blocks stack has been restored.
                                 (assert (eq (first blocks) binding))
                                 (cond ((zerop (cdr binding))
                                        ;; Remove the label from the block.
                                        (setf (rest ast) (rest (rest ast))))
                                       ((= (cdr binding) 1)
                                        (let* ((last (last ast))
                                               (last-expression (first last)))
                                          (when (and (consp last-expression)
                                                     (eq (first last-expression) 'br)
                                                     (eq (second last-expression) label))
                                            ;; Block ending is a br out of the block, so eliminate the label.
                                            (cond ((cddr last-expression)
                                                   ;; A 'br' with an expression. Expect just one value.
                                                   (assert (not (cdddr last-expression)))
                                                   (setf (first last) (third last-expression)))
                                                  (t
                                                   ;; No expression, insert a nop?
                                                   (setf (first last) `(nop))))
                                            (setf (rest ast) (rest (rest ast)))))))
                                 (pop blocks)))
                              ((member (first ast) '(br br_if))
                               (let ((label (if (eq (first ast) 'br_if) (third ast) (second ast))))
                                 (assert (symbolp label))
                                 (let ((binding (assoc label blocks)))
                                   (when binding
                                     ;; Found a reference to a block label within scope.
                                     (incf (cdr binding))))))
                              (t
                               ;; Some other form - walk the elements.
                               (dolist (el ast)
                                 (remove-unused-block-labels* el)))
                              )))
               (remove-unused-block-labels* body)))))
    (map-wasm-functions #'walk ast))
  ast)

(defun flatten-blocks (ast)
  "Flatten unnecessary blocks, blocks with no label and used in a context that
  already permits a list of expressions. Destructively modifies the 'ast."
  (flet ((walk (name args results locals body)
           (declare (ignore name args results locals))
           (labels ((flatten (bexp)
                      ;; Flatten child block into the 'bexp list of block expressions.
                      ;; Expecting a cons to be updated if necessary.
                      (assert (consp bexp))
                      (do ((bexp bexp (rest bexp)))
                          ((endp bexp))
                        (let ((first (first bexp)))
                          (when (consp first)
                            (walk first)
                            (when (and (eq (first first) 'block)
                                       ;; If not a br target then flatten.
                                       (not (symbolp (second first))))
                              ;; Insert it into the current block expression list.
                              (setf (first bexp) (second first))
                              (setf (rest (last first)) (rest bexp))
                              (setf (rest bexp) (rest (rest first))))))))
                    (walk (ast)
                      (when (consp ast)
                        (let ((first (first ast)))
                          (cond ((eq first 'block)
                                 (cond ((symbolp (second ast))
                                          (flatten (rest (rest ast))))
                                       (t
                                        (flatten (rest ast)))))
                                ((member first '(let))
                                 (flatten (rest (rest ast))))
                                ((member first '(when unless))
                                 (walk (second ast))
                                 (flatten (rest (rest ast))))
                                ((member first '(loop))
                                 (assert (symbolp (second ast)))
                                 (assert (symbolp (third ast)))
                                 (flatten (rest (rest (rest ast)))))
                                (t
                                 (dolist (el ast)
                                   (walk el)))))))
                    )
             (flatten body))))
    (map-wasm-functions #'walk ast))
  ast)

kripken added a commit that referenced this issue Jan 5, 2016
kripken added a commit that referenced this issue Jan 5, 2016
@kripken
Copy link
Member

kripken commented Jan 6, 2016

I pushed several more commits that should address most of the points raised here. I'd appreciate it if you could take a look at the current output and see if anything looks easily optimizable.

@ghost
Copy link
Author

ghost commented Jan 6, 2016

Thank you.

  1. There is the case of a trailing br in a block targeting the same block. Where there is no br result expression can we just replace it with nop. For example:

    (func $_zcfree (param $i1 i32) (param $i2 i32)
      (block $topmost
        (call $_free
          (get_local $i2)
        )
        (br $topmost)
      )
    )
    
  2. Opportunity missed here:

  (func $_i64Subtract (param $i1 i32) (param $i2 i32) (param $i3 i32) (param $i4 i32) (result i32)
    (local $i5 i32)
    (set_local $i5
      (i32.sub
        (get_local $i2)
        (get_local $i4)
      )
    )
    (set_local $i5
      (i32.sub
        (i32.sub
          (get_local $i2)
          (get_local $i4)
        )
        (i32.gt_u
          (get_local $i3)
          (get_local $i1)
        )
      )
    )
    (block
      (i32.store align=4
        (i32.const 168)
        (get_local $i5)
      )
      (i32.sub
        (get_local $i1)
        (get_local $i3)
      )
    )
  )
  1. Block in a loop.
      (loop $label$break$L17 $label$continue$L17
        (block
          (block $label$break$L19
          (block $label$break$L19
            (tableswitch $switch$0
  1. Block in a case (just noticed this one, and it's not handled in my code yet either). Also a dead br here and could this be emitted to use the switch break label.
              (case $switch-case$1
                (block
                  (set_local $i16
                    (get_local $i14)
                  )
...
                  (br $label$break$L17)
                  (br $switch$0)
                )

@ghost
Copy link
Author

ghost commented Jan 6, 2016

Updated flatten-blocks to better handle loop and case.

Added the canonicalize-block-targets pass. Here's an example:

(block $a (exp) (block $b (exp (br $b)) (br $b)))
=>
(block $a (exp) (block $b (exp (br $a)) (nop)))

The other passes can then clean this up to:

(block $a (exp) (exp (br $a)) (nop))

Practically only the elimination of the redundant trailing breaks seems useful on the asm2wasm output.

Here's a real example. The (br $do-once$0) is eliminated as it falls through to the target anyway, and this allows the other passes to optimize away its target block.

     (block $topmost
       (block $do-once$0
         (if (i32.eq (i32.load (i32.const 9500)) (i8.const 0))
             (block (set_local $i1 (call_function $_sysconf (i8.const 30)))
               (if_else
                (i32.eq
                 (i32.and (i32.add (get_local $i1) (i8.const -1))
                  (get_local $i1))
                 (i8.const 0))
                (block (i32.store (i32.const 9508) (get_local $i1))
                  (i32.store (i32.const 9504) (get_local $i1))
                  (i32.store (i32.const 9512) (i8.const -1))
                  (i32.store (i32.const 9516) (i8.const -1))
                  (i32.store (i32.const 9520) (i8.const 0))
                  (i32.store (i32.const 9472) (i8.const 0))
                  (i32.store (i32.const 9500)
                   (i32.xor
                    (i32.and (call_function $_time (i8.const 0))
                     (i8.const -16))
                    (i32.const 1431655768)))
                  (br $do-once$0)) ; <<<<<
                (call_function $_abort)))))
       (br $topmost))))

@kripken
Copy link
Member

kripken commented Jan 6, 2016

How did you test - perhaps you did not run the binaryen optimization passes? On your example (1), when I run

bin/binaryen-shell -print-after 1.wast -remove-unused-brs -remove-unused-names -merge-blocks

Then I get

  (func $_zcfree (param $i1 i32) (param $i2 i32)
    (call $_free
      (get_local $i2)
    )
  )

Those three passes are run automatically after asm2wasm, so if you tested using that, it should also have worked.

If it still doesn't work for you, can you provide the full input and command you are running, and not just the bad output?

@ghost
Copy link
Author

ghost commented Jan 6, 2016

Thank you. Yes, I was not running it with all the passes. Looking again, there are still opportunities to flatten blocks within loop and case. Also do we need to insert a nop when removing the br $a in (block $a (exp) (br $a)) so that the block still returns the same values rather than the values from (exp)?

  1. Block in loop not flattened.
      (loop $label$break$L17 $label$continue$L17
        (block
          (block $label$break$L19
            (tableswitch $switch$0
  1. Block in case not flattened
        (case $switch-case$1
          (block (set_local $i7 (i8.const 4)) (br $switch$0)))

There still remain some redundant br uses in block sub-expressions that could be removed, and sometimes removing these makes labels unused allowing a block to be flattened, but perhaps these could be addressed in a separate issue if at all.

@kripken
Copy link
Member

kripken commented Jan 6, 2016

It's not always safe to merge a named block into a parent unnamed block, since it moves the location of the branch target. I don't see enough in your first example to tell if that's the case there or not.

For switch and case, I am not planning to optimize them yet since i think the spec might still change for them. But yes, that should eventually be optimized.

@ghost
Copy link
Author

ghost commented Jan 7, 2016

It looks like this block was not flatten only because they are not optimized. Thank you.

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant