Skip to content

Optimization: u256 limb-based compound arithmetic #17

@koko1123

Description

@koko1123

Summary

Implement explicit 4x u64 limb-based multiplication and division for u256 compound operations to close the 3.31x gap on u256_uniswapv2_amount_out.

Current Performance

Benchmark eth.zig alloy.rs Gap
u256_uniswapv2_amount_out 43 ns 13 ns 3.31x loss
u256_mul (standalone) 2 ns 5 ns 2.50x win
u256_div (standalone) 3 ns 12 ns 4.00x win

We win on standalone mul/div but lose badly on compound operations (mul + div in sequence). The root cause: alloy's ruint stores u256 as [u64; 4] with hand-optimized limb arithmetic, while Zig's native u256 compiles to generic LLVM operations that lose context across compound ops.

Root Cause

The UniswapV2 benchmark does:

const amount_in_with_fee = fastMul(amount_in, 997);
const numerator = fastMul(amount_in_with_fee, reserve_out);
const denominator = fastMul(reserve_in, 1000) +% amount_in_with_fee;
const amount_out = fastDiv(numerator, denominator);

LLVM sees each operation independently and can't optimize the pipeline. Rust's ruint keeps limbs visible across operations, enabling register reuse and carry propagation without LLVM's generic lowering.

Proposed Approach

Add a Limb256 type that stores [4]u64 and implements arithmetic with explicit carry chains:

pub const Limb256 = struct {
    limbs: [4]u64,

    pub fn mul(a: Limb256, b: Limb256) Limb256 {
        // Schoolbook 4x4 limb multiplication with explicit carries.
        // 16 u64 multiplies, each producing u128 intermediate.
        var result: [4]u64 = .{0} ** 4;

        for (0..4) |i| {
            var carry: u64 = 0;
            for (0..4) |j| {
                if (i + j >= 4) break; // Only need low 256 bits
                const prod: u128 = @as(u128, a.limbs[i]) * b.limbs[j];
                const sum: u128 = @as(u128, result[i + j]) + prod + carry;
                result[i + j] = @truncate(sum);
                carry = @intCast(sum >> 64);
            }
        }

        return .{ .limbs = result };
    }

    /// Multiply by a small constant (common in DEX math: 997, 1000, etc.)
    pub fn mulSmall(a: Limb256, b: u64) Limb256 {
        var result: [4]u64 = .{0} ** 4;
        var carry: u64 = 0;
        for (0..4) |i| {
            const prod: u128 = @as(u128, a.limbs[i]) * b + carry;
            result[i] = @truncate(prod);
            carry = @intCast(prod >> 64);
        }
        return .{ .limbs = result };
    }

    pub fn fromU256(val: u256) Limb256 {
        return .{ .limbs = .{
            @truncate(val),
            @truncate(val >> 64),
            @truncate(val >> 128),
            @truncate(val >> 192),
        }};
    }

    pub fn toU256(self: Limb256) u256 {
        return @as(u256, self.limbs[0]) |
            (@as(u256, self.limbs[1]) << 64) |
            (@as(u256, self.limbs[2]) << 128) |
            (@as(u256, self.limbs[3]) << 192);
    }
};

Then provide a uniswapV2AmountOut that stays in limb space the entire time:

pub fn uniswapV2AmountOut(amount_in: u256, reserve_in: u256, reserve_out: u256) u256 {
    const a = Limb256.fromU256(amount_in);
    const r_in = Limb256.fromU256(reserve_in);
    const r_out = Limb256.fromU256(reserve_out);

    const a_fee = a.mulSmall(997);
    const num = a_fee.mul(r_out);
    const den = r_in.mulSmall(1000).add(a_fee);
    return num.div(den).toU256();
}

Key Insight

The mulSmall path is critical -- DEX math frequently multiplies u256 by small constants (997, 1000, 10000). A specialized mulSmall(u64) avoids the full 4x4 limb multiply and does 4 u64*u64 with carry instead.

Target

Close the gap from 3.31x loss to 1.5x loss or better. This directly impacts MEV searcher hot paths.

References

  • ruint source -- Rust's u256 with limb arithmetic
  • src/math/uint256.zig -- current fastMul/fastDiv implementation
  • bench/bench.zig:372-389 -- benchmark code

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requesthelp wantedExtra attention is neededoptimizationPerformance optimization

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions