
Improved bytecode fusion algorithm#117

Merged
CharlieTap merged 1 commit into main from monomorph
Mar 9, 2026
Conversation

@CharlieTap
Owner

This PR contains a large architectural rewrite of chasm's runtime and compiler; it roughly doubles execution speed on the industry-standard CoreMark benchmark. I had long considered making chasm more performant a dead end, believing we were fundamentally limited by the kind of interpretation that can be expressed in Java bytecode. Last year I implemented a fusion algorithm that improved speed by roughly 30%, but what I didn't realise is that this algorithm was suboptimal in a number of ways. Long story short: we were fetching operands via a lambda to avoid an explosion of monomorphised instruction handlers. I made that decision at the time because it limited the scope of the change, and I generally assumed not much performance had been left on the table.
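To make the old trade-off concrete, here is a minimal hypothetical sketch (not chasm's actual code) of operand fetching through lambdas: one generic handler covers every operand source, but each operand read goes through a virtual call that the JIT struggles to inline once many different lambdas flow through the same site.

```java
import java.util.function.IntSupplier;

// Hypothetical sketch of the pre-rewrite shape: a single generic
// handler whose operands arrive via lambdas. This avoids one handler
// per operand-source combination, at the cost of an indirect call
// per operand fetch.
final class LambdaOperandAdd {
    final IntSupplier lhs; // could read a constant, a local, or the stack
    final IntSupplier rhs;

    LambdaOperandAdd(IntSupplier lhs, IntSupplier rhs) {
        this.lhs = lhs;
        this.rhs = rhs;
    }

    int execute() {
        return lhs.getAsInt() + rhs.getAsInt();
    }
}

public class Demo {
    public static void main(String[] args) {
        int[] stack = {7, 0};
        // operand 0 comes from a stack slot, operand 1 is a constant
        LambdaOperandAdd add = new LambdaOperandAdd(() -> stack[0], () -> 35);
        System.out.println(add.execute()); // 42
    }
}
```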

In the last 6 months I have been working on a new runtime, written in a low-level language, that has some very impressive performance characteristics. This work is inspired by things I have learnt there, and also by another open source runtime called stitch. Check it out, it's very clever!

TLDR

We now create super-instructions with the operand kinds i and s: i meaning immediate, constants that are folded into the instruction, and s meaning stack, but notably stack with a slot index. Chasm has always stored locals on the stack, and this stack-with-index approach allows us to fuse both locals and normal stack operands using the same handler. It's a very clever trick which keeps code size and instruction-cache pressure under control. At compile time we calculate the entire size needed for the call frame and the indices for every instruction, so there is no runtime overhead.
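The fused shape described above can be sketched as follows. This is an illustrative guess at the idea, not chasm's real handler: the s operand is a precomputed frame-slot index, so the same handler serves a local or an ordinary stack value, because both live in the same frame array.

```java
// Hypothetical sketch of a fused "s + i" super-instruction: the stack
// operand is addressed by a frame-slot index computed at compile time,
// and the immediate is folded into the instruction object itself.
final class AddSlotImm {
    final int slot;      // s operand: index into the call frame
    final int immediate; // i operand: constant folded in at compile time
    final int resultSlot;

    AddSlotImm(int slot, int immediate, int resultSlot) {
        this.slot = slot;
        this.immediate = immediate;
        this.resultSlot = resultSlot;
    }

    void execute(int[] frame) {
        // Same code path whether `slot` holds a local or a stack value.
        frame[resultSlot] = frame[slot] + immediate;
    }
}

public class Demo {
    public static void main(String[] args) {
        // Frame size and all slot indices would be computed at compile
        // time, so no bounds/layout work happens here at runtime.
        int[] frame = new int[4];
        frame[0] = 10; // a local, stored in the frame like any stack value
        new AddSlotImm(0, 32, 1).execute(frame);
        System.out.println(frame[1]); // 42
    }
}
```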

Further to this, we now lower wasm's control flow instructions from a nested syntactic representation to true jumps. This is something I started last year, but on its own it never showed the performance characteristics I needed; it transpires that, coupled with some other improvements, it was meaningful. That being said, there's still some performance on the table, as we are maintaining an instruction stack for now; in the future this can be removed and replaced with a real instruction pointer / program counter model.

Anyways, enjoy twice the performance. Shout out to GPT 5.4, which actually solved some very tricky compiler issues I ran into; it's remarkable what a good harness can do for these agents.

@CharlieTap CharlieTap merged commit 0c37acb into main Mar 9, 2026
9 of 16 checks passed
@CharlieTap CharlieTap deleted the monomorph branch March 9, 2026 20:22