
Improved bytecode fusion algorithm#117

Merged
CharlieTap merged 1 commit into main from monomorph
Mar 9, 2026
Conversation

@CharlieTap
Owner

This PR contains a large architectural rewrite of chasm's runtime and compiler; it roughly doubles execution speed on the industry-standard CoreMark benchmark. I had long considered making chasm more performant a dead end, believing we were fundamentally limited by the kind of interpretation that can be expressed in Java bytecode. Last year I implemented a fusion algorithm that improved speed by roughly 30%, but what I didn't realise is that this algorithm was suboptimal in a number of ways. Long story short: we were fetching operands via a lambda to avoid an explosion of monomorphised instruction handlers. I made that decision at the time because it limited the scope of the change, and I generally assumed not much performance had been left on the table.
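To make the old trade-off concrete, here is a minimal hypothetical sketch (not chasm's actual code) of operand fetching through lambdas: one generic handler covers every operand source, but each operand read goes through a virtual call that the JIT struggles to inline once many different lambdas flow through the same site.

```java
import java.util.function.IntSupplier;

// Hypothetical sketch of the pre-rewrite shape: a single generic
// handler whose operands arrive via lambdas. This avoids one handler
// per operand-source combination, at the cost of an indirect call
// per operand fetch.
final class LambdaOperandAdd {
    final IntSupplier lhs; // could read a constant, a local, or the stack
    final IntSupplier rhs;

    LambdaOperandAdd(IntSupplier lhs, IntSupplier rhs) {
        this.lhs = lhs;
        this.rhs = rhs;
    }

    int execute() {
        return lhs.getAsInt() + rhs.getAsInt();
    }
}

public class Demo {
    public static void main(String[] args) {
        int[] stack = {7, 0};
        // operand 0 comes from a stack slot, operand 1 is a constant
        LambdaOperandAdd add = new LambdaOperandAdd(() -> stack[0], () -> 35);
        System.out.println(add.execute()); // 42
    }
}
```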

In the last 6 months I have been working on a new runtime, written in a low-level language, that has some very impressive performance characteristics. This work is inspired by things I have learnt there, and also by another open source runtime called stitch. Check it out, it's very clever!

TLDR

We now create super-instructions with the operand kinds i and s: i meaning immediate, constants that are folded into the instruction, and s meaning stack, but notably stack with a slot index. Chasm has always stored locals on the stack, and this stack-with-index approach allows us to fuse both locals and normal stack operands using the same handler. It's a very clever trick which keeps code size and instruction-cache pressure under control. At compile time we calculate the entire size needed for the call frame and the indices for every instruction, so there is no runtime overhead.
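The fused shape described above can be sketched as follows. This is an illustrative guess at the idea, not chasm's real handler: the s operand is a precomputed frame-slot index, so the same handler serves a local or an ordinary stack value, because both live in the same frame array.

```java
// Hypothetical sketch of a fused "s + i" super-instruction: the stack
// operand is addressed by a frame-slot index computed at compile time,
// and the immediate is folded into the instruction object itself.
final class AddSlotImm {
    final int slot;      // s operand: index into the call frame
    final int immediate; // i operand: constant folded in at compile time
    final int resultSlot;

    AddSlotImm(int slot, int immediate, int resultSlot) {
        this.slot = slot;
        this.immediate = immediate;
        this.resultSlot = resultSlot;
    }

    void execute(int[] frame) {
        // Same code path whether `slot` holds a local or a stack value.
        frame[resultSlot] = frame[slot] + immediate;
    }
}

public class Demo {
    public static void main(String[] args) {
        // Frame size and all slot indices would be computed at compile
        // time, so no bounds/layout work happens here at runtime.
        int[] frame = new int[4];
        frame[0] = 10; // a local, stored in the frame like any stack value
        new AddSlotImm(0, 32, 1).execute(frame);
        System.out.println(frame[1]); // 42
    }
}
```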

Further to this, we now lower wasm's control flow instructions from a nested syntactic representation to true jumps. This is something I started last year, but on its own it never showed the performance characteristics I needed; it transpires that, coupled with some other improvements, it was meaningful. That being said, there's still some performance on the table, as we are maintaining an instruction stack for now; in the future this can be removed and replaced with a real instruction pointer / program counter model.

Anyways, enjoy twice the performance. Shout out to GPT 5.4, which actually solved some very tricky compiler issues I ran into; it's remarkable what a good harness can do for these agents.

@CharlieTap CharlieTap merged commit 0c37acb into main Mar 9, 2026
9 of 16 checks passed
@CharlieTap CharlieTap deleted the monomorph branch March 9, 2026 20:22