Skip to content

Perf: optimize NTHOCCURRENCE with COUNTIF + INDEX:INDEX #56

@jimmytacks

Description

@jimmytacks

Context

NTHOCCURRENCE currently uses SUMPRODUCT((INDEX(col, SEQUENCE(i)) = INDEX(col, i)) * 1) inside a MAP over SEQUENCE(n). This works on both ranges and in-memory arrays, but it doesn't hit Excel's native C++-level COUNTIF fast path.

Proposed change

Swap the inner aggregation to:

COUNTIF(INDEX(col, 1):INDEX(col, i), INDEX(col, i))

When col is a real worksheet range, INDEX(a):INDEX(b) yields a range reference, and COUNTIF over a range reference is one of Excel's most optimised aggregations. Expected material speedup on large ranges.

Caveats / testing required

  • Must verify behaviour is identical on in-memory array inputs (e.g. when NTHOCCURRENCE is chained after another LAMBDA that returns an array, not a range). The INDEX:INDEX trick degrades to array semantics there and may lose the fast path — but the result must still be correct.
  • Benchmark against current implementation on:
    • Small arrays (n ≤ 50) — expect no meaningful difference.
    • Medium arrays (n ≈ 500) — expect noticeable speedup on range inputs.
    • Large arrays (n ≈ 5000) — expect significant speedup on range inputs.
  • Confirm existing test cases in the .lambda file still pass.
  • Also drop the redundant * 1 that remains even under current logic if we keep a SUMPRODUCT fallback.

Do not merge without

Measured benchmark numbers (before/after) on both range and array inputs.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions