0.2.1: specialized ropes and core optimizations #18
Merged
Replace the Object[]-based cursor cache with three direct volatile-mutable fields (_cc_chunk, _cc_start, _cc_end) to eliminate Integer boxing overhead. Fix rope-chunk-at to properly adjust the index when descending into subtrees.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
StringRope refactor:
- with-tree macro replaces ~16 copies of the (binding [*t-join* alloc] ...) form; helpers ->StringRope*, coll->str, and coll->tree-root deduplicate the 6-arg constructor and coercion dispatch that were scattered through the PRope method bodies
- Simpler rope-cat / rope-insert / rope-remove / rope-splice method bodies
- Extract rope-tree-walk and wrap-reduce-fn from the duplicated 1-arity and 2-arity rope-reduce implementations; drop dead reduce-chunk-indexed
- New tests: flat-boundary, hashmap-key-compatibility, cursor-cache-stress

kernel/chunk.clj (new):
- Holds the PRopeChunk protocol extensions for the rope kernel's chunk backends (APersistentVector, String, byte[])
- These are internal kernel dispatch, not user-facing interop; extracting them keeps kernel/rope.clj focused on the rope algebra (1237 -> 1155 lines, 266-line chunk.clj)

ByteRope:
- Persistent immutable byte sequence backed by the chunked WBT kernel
- byte[] chunks via PRopeChunk extension (13 methods), including a fused chunk-splice-split that avoids intermediate allocations on the overflow path
- byte-rope-node-create / bytes->root / byte-rope->bytes in kernel/rope.clj
- types/byte_rope.clj: ByteRope deftype, ByteRopeSeq/SeqReverse, TransientByteRope with ByteArrayOutputStream tail buffer, flat-mode optimization (raw byte[] root below 1024 bytes), cursor cache for O(1) amortized sequential nth on tree-mode ropes
- Unsigned semantics: nth / reduce / seq yield longs in [0, 255]
- Equality: byte-rope = byte-rope and byte-rope = byte[]; intentionally not equal to Clojure vectors to avoid signed/unsigned confusion
- Comparable via Arrays.compareUnsigned (lex order matches protobuf / Okio / Netty conventions)
- Utilities: multi-byte reads (get-byte / short / int / long with big-endian default and -le variants), materialization (bytes, hex, write to OutputStream, InputStream adapter), byte-rope-index-of, byte-rope-digest (streams chunks through MessageDigest without materializing the whole rope)
- Public API: byte-rope, byte-rope-concat, byte-rope-bytes, byte-rope-hex, byte-rope-write, byte-rope-input-stream, byte-rope-get-byte / short / int / long (+ -le), byte-rope-index-of, byte-rope-digest
- #byte/rope "hex" tagged literal with EDN round-trip via readers.clj
- 34 unit tests + 8 property tests (152 assertions)
- bench_runner: 13 ByteRope vs byte[] benchmarks (construction, concat, split, splice, insert, remove, nth, reduce, fold, repeated-edits, bytes, digest)
- bench_analyze: ByteRope vs byte[] headline section
- simple_bench: :byte-rope category

Full suite: 688 tests, 470,395 assertions, 0 failures.
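To make the unsigned-semantics bullets concrete, here is a minimal sketch using only plain byte[] values and JDK calls (Java 9+ for `Arrays.compare` / `Arrays.compareUnsigned`). The helper names `unsigned-byte`, `compare-signed`, and `compare-unsigned` are illustrative and are not part of the ByteRope API.

```clojure
(defn unsigned-byte
  "Widen a signed JVM byte to a long in [0, 255]."
  ^long [b]
  (bit-and (long b) 0xFF))

(defn compare-signed
  "Lexicographic comparison that treats bytes as signed (-128..127)."
  [^bytes a ^bytes b]
  (java.util.Arrays/compare a b))

(defn compare-unsigned
  "Lexicographic comparison that treats bytes as unsigned (0..255)."
  [^bytes a ^bytes b]
  (java.util.Arrays/compareUnsigned a b))

(def low  (byte-array [(unchecked-byte 0x7F)]))
(def high (byte-array [(unchecked-byte 0x80)]))

(unsigned-byte (aget ^bytes high 0))   ; 128, whereas the raw byte is -128
(neg? (compare-unsigned low high))     ; true: 0x7F sorts before 0x80
(pos? (compare-signed low high))       ; true: the signed view inverts the order
```

The two comparisons disagree exactly when the high bit is set, which is the signed/unsigned confusion the equality rule above is meant to avoid.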
Documentation:
- cookbook.md restructured with six rope recipes leading: text editor buffer, regex/clojure.string on StringRope, bulk sequence assembly, binary protocol assembly with ByteRope, streaming crypto digest, persistent undo history. Existing collection recipes renumbered; duplicate "#11" at the end removed.
- ropes.md gains a "Chunk Abstraction: One Kernel, Many Backends" section explaining PRopeChunk and pointing at kernel/chunk.clj as the internal dispatch table (vs types/interop.clj for user-facing extension). Also a "Specialized Ropes" section with concrete StringRope and ByteRope examples plus a variant-picker table. The Status and API sections now cover all three variants.
- collections-api.md gains full StringRope and ByteRope sections with constructors, interface tables, and per-variant operations.

Per-variant CSI tuning:
- kernel/rope.clj adds `*target-chunk-size*` and `*min-chunk-size*` dynamic vars. Every internal function that reads CSI captures them into a local at entry, keeping the cost to one var deref per call (not per chunk). The public `+target-chunk-size+` and `+min-chunk-size+` defs remain for external code.
- Each rope variant now declares its own `+target-chunk-size+` and `+min-chunk-size+` constants and binds them via its `with-tree` macro along with `tree/*t-join*`. Generic Rope, StringRope, and ByteRope all carry per-variant CSI without touching the kernel.
- The generic Rope deftype gains a `with-tree` macro so its mutation methods pick up CSI from one place instead of open-coding `(binding [tree/*t-join* alloc] ...)` at 10 sites.

Crossover benchmarking:
- rope_tuning_bench.clj fully rewritten to sweep chunk sizes across all three variants (Rope vs Vector, StringRope vs String, ByteRope vs byte[]). Each (variant, N, target) cell measures construct/nth/reduce/split/splice/concat and reports speedups vs the natural baseline plus a geomean score for ranking. `--variant rope|string-rope|byte-rope` restricts the sweep.
- Ran the full sweep on a 2023 M2. At 100K+ elements every variant showed monotonic improvement in the [256, 1024] range with diminishing returns beyond 1024. Updated all three variant defaults to target=1024, min=512. Relative to the target=256 baseline, generic Rope at 500K changes by: +41% nth, +10% reduce, +38% split, 5x concat, -20% splice (still ~6000x faster than vector). StringRope and ByteRope improve on every operation.

Memory-meter coverage:
- memory_test.clj adds `string-rope-memory` and `byte-rope-memory` deftests comparing each variant to its natural baseline (String and byte[] respectively). The summary report table now has a third section showing the full rope family with per-variant baselines and overhead ratios.

Full suite: 690 tests, 471,304 assertions, 0 failures.
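The per-variant binding pattern described under "Per-variant CSI tuning" can be sketched as follows. This is a simplified stand-in under stated assumptions, not the library's macro: the real `with-tree` presumably reads the allocator and constants from the enclosing deftype, whereas here they are passed explicitly, and `chunk-sizes` is a hypothetical function that exists only to show the capture-at-entry idiom.

```clojure
;; Stand-ins for the kernel's dynamic vars (defaults are the old 256/128).
(def ^:dynamic *t-join* nil)
(def ^:dynamic *target-chunk-size* 256)
(def ^:dynamic *min-chunk-size* 128)

(defmacro with-tree
  "Bind the allocator and the chunk-size parameters in one place."
  [alloc target min-size & body]
  `(binding [*t-join* ~alloc
             *target-chunk-size* ~target
             *min-chunk-size* ~min-size]
     ~@body))

(defn chunk-sizes
  "Hypothetical kernel function: split n elements into target-sized chunks.
  The dynamic var is read once at entry, not once per chunk."
  [n]
  (let [target (long *target-chunk-size*)]
    (loop [remaining (long n), acc []]
      (if (<= remaining target)
        (conj acc remaining)
        (recur (- remaining target) (conj acc target))))))

(chunk-sizes 600)                               ; [256 256 88] with defaults
(with-tree :variant-alloc 1024 512
  (chunk-sizes 2500))                           ; [1024 1024 452]
```

Because the binding is established once per public operation, a variant can change its chunk sizes without the kernel functions knowing which variant called them.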
Mirrors the existing StringRope / ByteRope flat-mode optimization: when a rope holds ≤ +flat-threshold+ (1024) elements, the `root` field holds the raw APersistentVector directly instead of a one-chunk tree wrapper. Reads and writes dispatch on `(flat? root)` to either vector-native operations or the kernel tree path, with transparent promotion to tree form once the size exceeds the threshold.

types/rope.clj:
- Adds +flat-threshold+ (= 1024, matching +target-chunk-size+) plus flat-mode helpers: `flat?`, `flat-size`, `ensure-tree-root`, `make-root`, `->tree-root`.
- Every Rope deftype method now has a `cond` dispatching on nil / flat / tree: count, nth, seq, rseq, reduce, fold, peek, pop, cons, assoc, toArray, contains, and the full PRope protocol (rope-cat / -split / -sub / -insert / -remove / -splice / -chunks / -str).
- Flat paths use vector-native ops (subvec, .nth, .cons, .pop, into, indexOf) which are already O(1) or close to it. The kernel is only invoked when the operation either starts in tree mode or would exceed the flat threshold. Reduce uses clojure.core/reduce so both PersistentVector (IReduceInit) and SubVector (plain reducible) dispatch correctly.
- ->rope / rope / rope-concat-all construct as flat if the input fits; rope-concat promotes lazily when the combined size exceeds the threshold.
- TransientRope.persistent! demotes a small tree result back to a flat vector at finalization time (mirrors StringRope/ByteRope).
- asTransient promotes flat-mode roots to tree on the way in so the transient's internal machinery sees a uniform tree representation.

kernel/rope.clj:
- `invariant-valid?` and `normalize-root` now tolerate a bare APersistentVector as a trivially-valid flat-mode root, so tests that pass `.-root` directly to the kernel keep working without having to distinguish flat vs tree at the test layer.

test/ordered_collections/rope_test.clj:
- `rope-tree-healthy?` recognizes flat roots as trivially healthy.

Memory (clj-memory-meter, N=100K random longs):
  rope:   29.5 bytes/elem (total: 2.8 MB), was 30.3 bytes/elem
  vector: 29.4 bytes/elem (total: 2.8 MB)
  ratio:  1.00x, was 1.03x

At N=1K (flat mode) the overhead is essentially zero: the rope is just a PersistentVector plus the Rope deftype header. Larger N also improved slightly from the per-variant CSI tuning landed earlier.

Full suite: 690 tests, 470,354 assertions, 0 failures.
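The nil / flat / tree dispatch can be illustrated with a toy model. In this sketch the "tree" is only a map holding a vector of full chunks, a deliberately simplified stand-in for the weight-balanced kernel; `root-count`, `root-nth`, and `root-conj` are hypothetical names, and only `flat?` and `make-root` echo helper names from the commit.

```clojure
(def flat-threshold 1024)

(defn flat? [root] (vector? root))

(defn make-root
  "nil when empty, a bare vector at or below the threshold,
  otherwise a toy tree of threshold-sized chunks."
  [coll]
  (let [v (vec coll)]
    (cond
      (empty? v) nil
      (<= (count v) flat-threshold) v
      :else {:chunks (mapv vec (partition-all flat-threshold v))
             :size   (count v)})))

(defn root-count [root]
  (cond (nil? root) 0
        (flat? root) (count root)
        :else (:size root)))

(defn root-nth
  "Bounds check first with the nil-safe count, then dispatch on root shape."
  [root i]
  (when-not (< -1 i (root-count root))
    (throw (IndexOutOfBoundsException. (str i))))
  (if (flat? root)
    (nth root i)
    (nth (nth (:chunks root) (quot i flat-threshold))
         (rem i flat-threshold))))

(defn root-conj
  "Flat roots stay flat until the threshold is crossed, then promote."
  [root x]
  (if (or (nil? root) (flat? root))
    (make-root (conj (or root []) x))
    (let [chunks (:chunks root)
          tail   (peek chunks)]
      {:chunks (if (< (count tail) flat-threshold)
                 (conj (pop chunks) (conj tail x))
                 (conj chunks [x]))
       :size   (inc (:size root))})))
```

A root built from exactly 1024 elements is a plain vector; one more `root-conj` promotes it to the tree shape, and reads keep working across the transition.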
Small cardinalities matter more now that all three rope variants have a flat-mode optimization: ropes ≤ 1024 elements skip the tree wrapper entirely and should measure comparable to (or better than) their natural baselines on reads while still winning on structural edits. N=1000 is safely inside flat mode; N=5000 is the first "in-tree but small" size and exercises the post-promotion path.

- `bench_runner.clj` sizes-full now runs [1000 5000 10000 100000 500000] (was [10000 100000 500000]). Every benchmark spec scales cleanly to smaller N, so the README/headline tables pick up new columns automatically when the full suite runs.
- `simple_bench.clj` sizes-quick, sizes-default, and sizes-full all gain 5000 alongside the pre-existing 1000. The private per-category defaults (rope-sizes, byte-rope-sizes, string-rope-sizes) are updated to match.
- `rope_tuning_bench.clj` default-sizes gains 1000 and 5000 so CSI tuning sweeps can see the flat-vs-tree crossover and the small-tree-mode behaviour.

Spot-checked rope category at 1K/5K/10K via `lein bench-simple`:
- nth at 1K: rope 109µs ≈ vector 111µs (flat mode = direct .nth)
- repeated edits at 1K: rope 1.74ms vs vector 10.12ms (~6x)
- fold at 1K: rope 13µs vs vector 78µs (flat mode skips fork-join)
Every rope variant (generic, string, byte) now documents that ropes at or below the flat threshold are stored as a bare concrete collection (PersistentVector / String / byte[]) directly in the root field, bypassing the tree wrapper entirely.

- `ropes.md`: new "Flat Mode: Zero-Overhead Small Ropes" section explaining the optimization once for all three variants, with a table of which concrete type backs each variant in flat mode and how promotion/demotion work. The existing "Benchmark Summary" picks up a callout noting its numbers are tree-mode only.
- `collections-api.md`: generic Rope section gains a paragraph matching the existing StringRope/ByteRope notes. The `nth` / `r/fold` rows call out the flat-mode dispatch path.
- `algorithms.md`: new "Flat Mode" subsection beside CSI. Also updates the stale CSI constants block (target/min 256/128 → 1024/512) and notes that each variant carries its own per-variant defaults via its `with-tree` macro.
- `CHANGES.md`: new 0.2.1-SNAPSHOT entry summarizing the StringRope refactor, kernel/chunk.clj extraction, ByteRope addition, per-variant CSI tuning, flat-mode optimization, benchmark suite updates, and documentation rewrites.

Full suite: 690 tests, 469,668 assertions, 0 failures.
The existing report showed losses but silently computed-and-discarded
the matching wins. It also had no cross-category view and no way to
see the three rope variants side-by-side. Three new sections, no
removals, same terminal formatting throughout.
etc/lib/bench_analyze.clj:
- `category-summary` aggregates the scorecard by category and returns
wins / parity / losses counts, geomean speedup, best win, and worst
loss per category. The geomean matters more than an arithmetic mean
here because speedups are ratios — geomean is what gets reported.
- `rope-family-summary` picks the largest benchmarked size and, for
each structural operation, looks up the per-variant speedup vs the
natural baseline (vector / String / byte[]). Returns a row per
operation with one cell per variant; cells are nil when a variant
does not have a matching benchmark group (e.g. the generic rope
has no single-splice `rope-insert` bench).
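As a small worked example of why the category summary reports a geomean: speedups are ratios, so a 2x win and a 2x loss should cancel out. The function names below are illustrative and are not taken from bench_analyze.clj.

```clojure
(defn geomean
  "Geometric mean of positive ratios, computed in log space."
  [speedups]
  (Math/exp (/ (reduce + (map #(Math/log (double %)) speedups))
               (count speedups))))

(defn arithmetic-mean [xs]
  (/ (reduce + xs) (double (count xs))))

;; One 2x win and one 2x loss:
(arithmetic-mean [2.0 0.5])  ; 1.25, which overstates the net effect
(geomean [2.0 0.5])          ; approximately 1.0, i.e. parity
```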
etc/lib/bench_render.clj:
- `render-significant-wins` — parallel to the existing losses
renderer. Formats as plain speedup strings ('12.3x') not the
'1.4x slower' framing used for losses.
- `render-category-summary` — seven-column table keyed on the
existing `category-order`. Shows wins / parity / losses counts,
geomean speedup, best win, and the single worst-loss case with
group name.
- `render-rope-family` — four-column cross-variant table labelled
'Each cell is variant vs natural baseline speedup at N=X'. Big
wins render as bold `**N.Nx**`; losses use sub-1 decimal precision.
- `fmt-speedup-cell` handles the full range — from `**1236x**` down
to `0.0018x` — without losing precision at either end.
etc/bench_report.bb:
- New sections inserted between 'Headline Performance' and 'At
Parity' in this order: Performance by Category, Rope Family at
Scale, Significant Wins. Everything else is unchanged.
Verified against bench-results/2026-04-10_09-24-13.edn: all existing
sections render identically; three new sections populate correctly.
The ByteRope column in the Rope Family table shows placeholders on
old result files (pre-ByteRope) and will populate after a fresh
bench run.
After writing a fresh bench-results/<timestamp>.edn, the runner now looks for the most-recent existing EDN in the same directory that predates the new one. If it finds one, it flat-walks both files, matches leaf measurements by (size, group, variant), and prints a compact section with:
- the top regressions (≥10% slower) and
- the top improvements (≥10% faster)

Each row shows old → new timing plus percent delta. At the end it suggests `lein bench-report --file … --baseline …` for the full category breakdown.

Self-contained: parses the EDN inline with `clojure.edn` and does the delta computation in ~50 lines, no dependency on the bb bench-report tool.

When the prior file has a different size set (e.g. before N=1000/5000 were added), unmatched cells are simply skipped and the reported "Compared: N matching cells" count reflects the intersection.

Smoke-tested against 2026-04-09 → 2026-04-10 EDN pair: the inline section matches the subset of regressions the full `lein bench-report --baseline` tool produces, laid out with timing units and percent deltas.
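The matching-and-threshold logic can be sketched roughly like this. It assumes both result files have already been flattened into maps keyed by `[size group variant]` with a mean time as the value, and it reads "≥10% slower" as a time increase of at least 10% over the old value; the real EDN layout and the function names are not taken from the runner.

```clojure
(defn deltas
  "Percent change for every cell present in both runs; unmatched cells are skipped."
  [old-run new-run]
  (for [[cell new-t] new-run
        :let [old-t (get old-run cell)]
        :when old-t]
    {:cell cell
     :old  old-t
     :new  new-t
     :pct  (* 100.0 (/ (- new-t old-t) (double old-t)))}))

(defn regressions [ds]
  (->> ds (filter #(>= (:pct %) 10.0)) (sort-by :pct >)))

(defn improvements [ds]
  (->> ds (filter #(<= (:pct %) -10.0)) (sort-by :pct)))

;; Made-up timings, for illustration only.
(def old-run {[1000 :nth :rope]  100.0
              [1000 :fold :rope] 80.0
              [10000 :nth :rope] 50.0})
(def new-run {[1000 :nth :rope]  120.0   ; +20%: regression
              [1000 :fold :rope] 60.0    ; -25%: improvement
              [5000 :nth :rope]  70.0    ; no prior cell: skipped
              [10000 :nth :rope] 52.0})  ; +4%: below threshold

(count (deltas old-run new-run))               ; 3 matching cells
(map :cell (regressions (deltas old-run new-run)))
(map :cell (improvements (deltas old-run new-run)))
```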
benchmarks.md stripped of hardcoded numbers (were stale from 0.2.0) and restructured around the benchmark infrastructure: versioned EDN artifacts, the parse-analyze-render pipeline, A/B comparison method, and per-category interpretation. All current numbers live in report.txt which is auto-generated via `lein bench-report > doc/report.txt`.
nth: each rope variant now inlines the tree walk directly in the deftype, replacing the generic kernel's protocol-dispatched rope-nth with concrete chunk-type calls (alength/aget for byte[], .length/.charAt for String, .count/.nth for vector). Eliminates per-level PRopeChunk dispatch and the [chunk offset] tuple allocation from rope-chunk-at. Measured 2-2.6x improvement on random nth at N=500K.

reduce: byte-rope and string-rope add monomorphic tree-reduce helpers that walk the tree with inlined per-chunk loops, bypassing per-chunk chunk-reduce-init dispatch. 1.7-3.3x improvement.

Cursor cache removed from StringRope and ByteRope. The volatile-mutable fields had torn-read races under concurrent access (three volatile writes are not atomic as a group) and caused cache thrashing when two threads did sequential access on the same instance. Monomorphic tree walk is fast enough that the cache benefit does not justify the correctness cost.
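A rough sketch of the monomorphic per-chunk loop for the byte[] case is below. It substitutes a flat vector of byte[] chunks for the real tree walk, so it only illustrates the inner loop (direct `alength`/`aget`, unsigned widening, and `reduced` handling); `reduce-byte-chunks` is a hypothetical name.

```clojure
(defn reduce-byte-chunks
  "Reduce f over every byte of every chunk, yielding unsigned longs.
  The inner loop calls alength/aget directly instead of going through
  a protocol method per chunk."
  [f init chunks]
  (reduce
    (fn [acc ^bytes chunk]
      (let [n (alength chunk)]
        (loop [i 0, acc acc]
          (if (< i n)
            (let [acc' (f acc (bit-and (long (aget chunk i)) 0xFF))]
              (if (reduced? acc')
                acc'                      ; still wrapped, so the outer reduce stops too
                (recur (inc i) acc')))
            acc))))
    init
    chunks))

(def chunks [(byte-array [1 2]) (byte-array [(unchecked-byte 0xFF)])])

(reduce-byte-chunks + 0 chunks)   ; 258 = 1 + 2 + 255
```

Returning the still-wrapped `reduced` value from the inner loop lets early termination propagate through the outer `reduce` without extra bookkeeping.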
Add benchmarks for range-map (construction, lookup, carve-out, iteration vs Guava TreeRangeMap), segment-tree (construction, query, update vs sorted-map), priority-queue (construction, push, pop-min vs sorted-set-by), ordered-multiset (construction, multiplicity, iteration vs sorted-map counts), fuzzy-set and fuzzy-map (construction, nearest vs sorted-set/map). All wired into all-benchmark-specs for inclusion in lein bench --full. Memory tests extended to cover string-rope, byte-rope, range-map, segment-tree, and fuzzy-map. Time estimates updated: --full is ~60 min with the expanded suite (was ~30 min).
bench-report gains headline sections for ordered-set, ordered-map, long-specialized, and string-specialized vs their competitors. Asterisks removed from speedup formatting (plain text report). Default --top increased from 12 to 30.

Auto-baseline: bench-report now auto-selects the prior timestamped EDN as baseline when --baseline is not specified, so Regressions and Improvements sections render by default. File discovery filters to timestamped EDN only (excludes non-standard filenames).

Rope tuner: score function now uses structural-editing geomean (splice, split, concat) as the primary ranking metric, with a secondary 'all' column showing the equal-weight geomean. Docstring explains the rationale: splice and split are chunk-size-insensitive at scale, so the old equal-weight geomean was misleadingly driven by concat.
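One way the auto-baseline selection could work is sketched below. It assumes file names follow the `YYYY-MM-DD_HH-MM-SS.edn` shape seen elsewhere in this PR, so lexicographic order is chronological order; `prior-baseline` and the regex are illustrative, not the tool's actual code.

```clojure
(def timestamped-edn #"\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}\.edn")

(defn prior-baseline
  "Latest timestamped file name strictly earlier than `current`, or nil."
  [filenames current]
  (->> filenames
       (filter #(re-matches timestamped-edn %))
       (filter #(neg? (compare % current)))
       sort
       last))

(prior-baseline ["2026-04-09_10-00-00.edn"
                 "scratch.edn"
                 "2026-04-10_09-24-13.edn"
                 "2026-04-12_16-48-22.edn"]
                "2026-04-12_16-48-22.edn")
;; "2026-04-10_09-24-13.edn" ("scratch.edn" is ignored)
```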
README: performance tables updated from 2026-04-12 bench run. Added StringRope, ByteRope, and specialized-collection tables. Collections table gains string-rope and byte-rope constructors. Ropes section describes all three variants with examples. Test count 454 -> 690.

ByteRope framed as persistent structure-sharing memory: O(log n) structural editing, zero-cost immutable snapshots via path-copying, automatic chunk coalescing, GC of unreachable versions. Use cases: binary protocol construction, undo/redo, diffing/patching, streaming.

CHANGES.md: monomorphic hot paths entry with measured improvements, cursor cache removal rationale, updated specialized-type entries.

Cookbook: range-map pricing-tiers recipe, interval-set availability windows recipe.

Collections-api: cursor cache references removed.
Empty StringRope: charAt and nth dereferenced nil root instead of throwing bounds exception. Fixed by using flat-size (nil-safe) for bounds check before dispatching on root type.

StringRope valAt: coerced all keys with (int k), causing ClassCastException on non-integer keys like :x or nil. Added integer? guard to match standard associative lookup semantics.

Empty fold: StringRope and ByteRope fell through to rope-fold on nil root, crashing instead of returning (combinef). Added nil check.

ByteRope InputStream: read(buf, off, 0) returned -1 at EOF instead of 0 per InputStream contract. Zero-length reads now return 0 regardless of position.

Found by Codex review.
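For reference, the `InputStream.read(byte[], int, int)` return-value rules that the last fix restores can be modelled with a small proxy over a plain byte[]. This is a hypothetical stand-in, not the ByteRope adapter; it also includes the argument bounds check that the `InputStream` contract calls for.

```clojure
(defn bytes-input-stream
  "Hypothetical stand-in for a rope-backed stream: reads from a plain byte[]."
  ^java.io.InputStream [^bytes data]
  (let [pos (atom 0)]
    (proxy [java.io.InputStream] []
      (read
        ([]
         (let [p (long @pos)]
           (if (< p (alength data))
             (do (swap! pos inc)
                 (bit-and (long (aget data p)) 0xFF))
             -1)))
        ([buf]
         (.read ^java.io.InputStream this buf 0 (alength ^bytes buf)))
        ([buf off len]
         (let [^bytes buf buf
               off       (long off)
               len       (long len)
               p         (long @pos)
               remaining (- (alength data) p)]
           (cond
             ;; invalid ranges are rejected first
             (or (neg? off) (neg? len) (> len (- (alength buf) off)))
             (throw (IndexOutOfBoundsException.))
             ;; a zero-length read returns 0, even at end of stream
             (zero? len) 0
             ;; only a non-empty request at end of stream returns -1
             (<= remaining 0) -1
             :else
             (let [n (long (min len remaining))]
               (System/arraycopy data p buf off n)
               (swap! pos + n)
               n))))))))
```

The order of the `cond` clauses is the point: checking end-of-stream before the zero-length case would reproduce the bug described above.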
lein bench-charts reads the latest benchmark EDN and generates 7 PNG charts in doc/charts/:
1. set-algebra-scaling: union/intersection/difference vs sorted-set
2. rope-editing-scaling: repeated-edits for all 3 rope variants
3. collection-winners: best headline win per collection type (dot plot)
4. rope-operations-profile: full win/loss profile (dot plot, log-Y)
5. rope-vs-vector-absolute: diverging O(log n) vs O(n) lines
6. string-rope-crossover: per-operation crossover vs String
7. byte-rope-crossover: per-operation crossover vs byte[]

Uses XChart 3.8.8 (dev dependency) for direct PNG output via Java2D.
conj/disjoin on OrderedSet and assoc/without/assoc-new on OrderedMap were passing tree/node-create-weight-balanced (the generic SimpleNode constructor) instead of the collection's alloc field. This silently downgraded LongKeyNode to SimpleNode after a single mutation, losing the unboxed-key benefit that long-ordered-set/map exist to provide. Fixed by threading alloc through all node-add/node-remove call sites. ordered-merge-with also propagated nil alloc/stitch into the result map; fixed to carry alloc/stitch from the source and bind *t-join*. Found by Codex review.
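The downgrade is easy to reproduce in miniature. The sketch below uses an unbalanced path-copying insert and two toy record types as stand-ins for the kernel's node classes; only the names `SimpleNode` and `LongKeyNode` come from the commit message, and `node-add` here is a simplified illustration rather than the kernel function.

```clojure
(defrecord SimpleNode  [k l r])        ; generic node, boxed key
(defrecord LongKeyNode [^long k l r])  ; specialized node, primitive long key

(defn node-add
  "Path-copying insert. Every rebuilt node goes through `alloc`, so the
  allocator decides the concrete node type of the whole copied path."
  [alloc node k]
  (cond
    (nil? node)      (alloc k nil nil)
    (< k (:k node))  (alloc (:k node) (node-add alloc (:l node) k) (:r node))
    (> k (:k node))  (alloc (:k node) (:l node) (node-add alloc (:r node) k))
    :else            node))

(def root (->LongKeyNode 5 nil nil))

;; Threading the collection's own allocator keeps the specialized type:
(type (node-add ->LongKeyNode root 7))   ; LongKeyNode

;; Hard-coding the generic constructor silently rebuilds the path as SimpleNode:
(type (node-add ->SimpleNode root 7))    ; SimpleNode
```

Because the root is always on the copied path, a single insert with the wrong allocator is enough to replace the specialized root.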
getAllocator returned nil instead of tree/node-create-weight-balanced; getStitch returned nil instead of tree/node-stitch. This violated the INodeCollection/IBalancedCollection contract and would crash if with-tree-env were ever used on these types. Found by Codex review.
README Performance section embeds set-algebra-scaling and rope-editing-scaling charts. benchmarks.md gains a Charts section linking all 7 PNGs with descriptions. Reflection warnings in bench_charts.clj fixed via XYStyler/CategoryStyler type hints.
dco-lmeyers
reviewed
Apr 13, 2026
Flat-mode seq/rseq IReduce: 1-arity reduce treated enum=nil as empty, discarding all chunk elements for ropes at or below the flat threshold. Fixed by always starting from the current element and only checking enum for continuation to the next chunk. Affects StringRopeSeq, StringRopeSeqReverse, ByteRopeSeq, ByteRopeSeqReverse.

Empty StringRope 1-arity reduce: nil root fell through to tree path (node-least on nil). Added nil check returning (f).

InputStream.read(buf, off, len): added bounds validation per the InputStream contract. Out-of-range off+len now throws IndexOutOfBoundsException.

StringRope.subSequence: empty flat subsequence stored "" instead of nil, breaking isEmpty on the result. Now converts to nil.

Found by Codex review (round 2).
str->root split strings at raw char offsets, placing a lone high surrogate at the end of one chunk and the low surrogate at the start of the next. Now adjusts chunk boundaries to avoid splitting UTF-16 surrogate pairs.
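A minimal sketch of boundary adjustment, assuming the simplest policy of extending a chunk by one char when the cut would land between a high and a low surrogate. The actual str->root may shrink instead of extend, or differ in other ways; `chunk-string` is a hypothetical helper.

```clojure
(defn chunk-string
  "Split s into chunks of about `target` chars without separating a
  UTF-16 surrogate pair across two chunks."
  [^String s ^long target]
  {:pre [(pos? target)]}
  (let [len (long (.length s))]
    (loop [pos 0, acc []]
      (if (>= pos len)
        acc
        (let [raw-end (long (min len (+ pos target)))
              ;; If the cut falls between a high and a low surrogate,
              ;; move it one char to the right so the pair stays together.
              end     (if (and (< raw-end len)
                               (Character/isHighSurrogate (.charAt s (dec raw-end)))
                               (Character/isLowSurrogate (.charAt s raw-end)))
                        (inc raw-end)
                        raw-end)]
          (recur end (conj acc (subs s pos end))))))))

;; "a", U+1F600 (a surrogate pair), "b": four UTF-16 chars in total.
(chunk-string "a\uD83D\uDE00b" 2)   ; ["a\uD83D\uDE00" "b"]
```

A raw split at offset 2 would have produced a chunk ending in a lone high surrogate, which is not valid text on its own.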
Add targeted tests for all Codex-found issues:
- Empty StringRope charAt/nth bounds exceptions
- Non-integer key valAt returns nil
- Empty fold returns (combinef)
- ByteRope InputStream zero-length and out-of-bounds reads
- Surrogate pair at chunk boundary not split
- LongKeyNode preserved through conj/disj/assoc/dissoc
- LongKeyNode preserved through ordered-merge-with

695 tests, 470K assertions, 0 failures.
All numbers from bench-results/2026-04-12_16-48-22.edn. Set algebra improved to 50-75x vs sorted-set at 500K (was 39-57x in prior run).
…boxing The generic-Rope-specific RopeSeq and RopeSeqReverse lived in the kernel but are only used by the generic Rope deftype — string-rope and byte-rope carry their own monomorphic seq types. Moving them to types/rope.clj makes the kernel honestly chunk-protocol-agnostic and cuts ~220 lines from the kernel file. Also trims now-unused imports (Murmur3, SeqIterator, Util). Also fixes a pre-existing auto-boxing warning in str->root and bytes->root: the loop variable `pos` was inferred primitive long but the recur arg came from clojure.core/min (Object) and unchecked-dec-int (int), forcing auto-boxing per iteration. Threaded as primitive long throughout, using unchecked-add/dec/int consistently.
…rator

Three related constant-factor improvements, each targeting a known loss in doc/report.txt. No contract changes.

1. Primitive rank for long/string keys. Adds node-rank-long and node-rank-string in kernel/tree.clj alongside the existing contains/find/find-val primitive fast paths. OrderedSet.rank-of, OrderedSet.indexOf, and OrderedMap.rank-of dispatch via identity check on cmp. string-ordered-set rank: ~1.9x faster at N=100K.

2. Range-map bulk construction. node-build-sorted in kernel/tree.clj builds a balanced tree from sorted kv pairs in O(n). The range-map constructor now sorts input, validates disjointness, and takes the bulk path when applicable; overlapping input still falls through to the general carving path, preserving "later wins" semantics. N=10K disjoint construction: ~10x faster.

3. Non-allocating Java iterator for OrderedSet/OrderedMap. NodeIterator deftype advances the tree enumerator in place via unsynchronized-mutable, avoiding the seq-cell allocation per .next() that SeqIterator-over-seq incurred. Full traversal at N=100K: ~1.65x faster. Thread-safety contract unchanged: the iterator is per-call fresh (no shared state on the collection), matching the memory model of clojure.lang.SeqIterator.
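The bulk-construction idea in item 2 can be sketched with a midpoint recursion over a sorted vector. This toy version uses plain maps as nodes and assumes half-open `[lo hi)` ranges for the disjointness check; neither detail is taken from kernel/tree.clj, and all function names are illustrative.

```clojure
(defn build-sorted
  "Build a height-balanced tree from an already-sorted vector.
  Each element is visited once, so construction is O(n)."
  ([v] (build-sorted v 0 (count v)))
  ([v lo hi]
   (when (< lo hi)
     (let [mid (quot (+ lo hi) 2)]
       {:kv    (nth v mid)
        :left  (build-sorted v lo mid)
        :right (build-sorted v (inc mid) hi)}))))

(defn in-order [node]
  (when node
    (concat (in-order (:left node)) [(:kv node)] (in-order (:right node)))))

(defn height [node]
  (if node
    (inc (max (height (:left node)) (height (:right node))))
    0))

(defn disjoint-sorted?
  "True when each range ends at or before the start of the next one."
  [ranges]
  (every? (fn [[[_ hi] [lo _]]] (<= hi lo))
          (partition 2 1 ranges)))

;; Seven disjoint ranges [0 5) [10 15) ... [60 65), each mapped to its index.
(def entries (mapv (fn [i] [[(* 10 i) (+ (* 10 i) 5)] i]) (range 7)))

(disjoint-sorted? (map first entries))   ; true, so the bulk path applies
(height (build-sorted entries))          ; 3
```

Input that fails `disjoint-sorted?` would take the general insertion path instead, which is how the "later wins" behavior for overlapping ranges is preserved.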
The Full Scorecard, Regressions, and Improvements sections are useful for interactive A/B review during development but are noise for outside readers of the committed doc/report.txt snapshot. lein bench-report --publish suppresses the three sections for redirect-to-file use. Default lein bench-report is unchanged and still shows everything. doc/benchmarks.md updated to document the flag and recommend `lein bench-report --publish > doc/report.txt` for the snapshot workflow.
The existing bench suite didn't exercise the optimization code paths added in 8d19c26:
- bench-rank-lookup only ran on default ordered-set (NormalComparator), not on long-ordered-set or string-ordered-set, so the primitive node-rank-long / node-rank-string paths never ran.
- build-oc-range-map in bench-range-map-construction uses per-entry assoc, so the new single-argument (range-map coll) bulk-build path via node-build-sorted never ran.
- bench-set-iteration uses reduce (goes through CollReduce), so the new tree/NodeIterator never ran.

Adds four bench cases alongside the existing ones:
- bench-long-rank-lookup (long-ordered-set vs data.avl rank-of)
- bench-string-rank-lookup (string-ordered-set vs data.avl rank-of)
- bench-range-map-bulk-construction (single-arg (range-map coll) vs Guava TreeRangeMap put)
- bench-set-iteration-iterator (Java .iterator() traversal across sorted-set / data.avl / ordered-set)

Also registers headline-benchmark entries so the new groups render in the appropriate scaling tables under lein bench-report:
- Long-Specialized vs data.avl gains a "Rank lookup" row
- String-Specialized vs data.avl gains a "Rank lookup" row
- Range Map vs Guava gains a "Bulk Construction" row
- Ordered Set vs sorted-set / data.avl gain an "Iteration (Iterator)" row

At N=1K (smoke-tested), the new benches show ordered-set rank ~4x faster than data.avl, bulk range-map construction ~2x faster than Guava, and Iterator traversal ~3.6x faster than data.avl's iterator.
…data-avl

Regenerates the headline tables and prose claims against the 2026-04-17 benchmark run. No API or behavior changes.

README.md
- Rope vs PersistentVector and StringRope vs String tables extended to the full N=1K / 5K / 10K / 100K / 500K cardinality range.
- Set algebra tables (vs sorted-set, data.avl, clojure.core/set) refreshed.
- "Other operations" and Specialized collections tables refreshed.
- Intro: "18-60x wins at 500K" (was 28-75x) and "up to 60x faster" (was 50x) to match current set-algebra ceilings.

doc/ropes.md
- Main Benchmark Summary (Rope vs PersistentVector) regenerated.
- StringRope vs String performance table regenerated, shifted to N=10K/100K/500K, added Single Remove row.
- ByteRope vs byte[] performance table regenerated, added Single Remove and Split at Midpoint rows, noted crossover at 10K+.

doc/cookbook.md
- StringRope speedup note: "~38x faster than plain String" at 100K (was ~35x), with "~130x at 500K" added.

doc/vs-clojure-data-avl.md
- Parallel set operations: 7-51x (was 7-42x).
- Parallel fold: 3-5x (was 6-9x); honest post-baseline-shift.
Fresh snapshot against bench-results/2026-04-17_11-03-53.edn (git rev 990b9a5), rendered via lein bench-report --publish so the Full Scorecard, Regressions, and Improvements sections — useful for interactive A/B review but noise for outside readers — are omitted from the committed artifact. 320 lines (was 423). Headline scaling tables, per-category geomean, Rope Family at Scale, Significant Wins, At Parity, Significant Losses remain.
project.clj: 0.2.1-SNAPSHOT → 0.2.1
CHANGES.md: [0.2.1-SNAPSHOT] - unreleased → [0.2.1] - 2026-04-17

CHANGES entries added this cycle:
- New Performance Improvements section: primitive rank for long/string ordered collections, range-map bulk construction path, and non-allocating java.util.Iterator for OrderedSet/OrderedMap.
- New Refactoring section noting RopeSeq/RopeSeqReverse relocation from kernel/rope.clj to types/rope.clj.
- Bug Fixes gains the auto-boxing fix in str->root / bytes->root.
- Benchmarks and Tooling gains the bench-report --publish flag and the four new bench cases exercising the optimizations.
- StringRope headline claim refreshed: ~38x at 100K, ~130x at 500K (was ~35x at 100K).
0.2.1: StringRope, ByteRope, and a kernel/infrastructure pass
Release branch for 0.2.1. 35 commits, 54 files changed (+11.1K / −1.5K). Adds two new persistent collection types, refactors the rope kernel to be chunk-protocol-agnostic, and tightens
the tree kernel with targeted constant-factor improvements. No breaking changes.
Summary
Two new collection types
API expecting text. #string/rope "…" EDN tag, content-based equality with String, String.hashCode-compatible. Constructors: string-rope, string-rope-concat. Up to ~38× faster than
String at 100K characters on random structural edits, growing to ~130× at 500K.
"hex" EDN tag. Framed as persistent memory: structural editing, zero-cost snapshots, structure sharing. Extras: byte-rope-bytes, -hex, -write, -input-stream, -get-byte/-short/-int/-long
(with -le variants), -index-of, streaming -digest through MessageDigest. At 500K: ~110× vs byte[] on splice, ~128× on remove.
Rope kernel: one kernel, three variants
split / splice / reduce / fold / CSI maintenance are written once.
default to 1024/512 after lein bench-rope-tuning sweep (up from 256/128).
tree overhead and transparent promotion on edits. Memory parity with the raw baseline.
access. Monomorphic walk is fast enough that the cache's benefit didn't justify the thread-safety cost.
Performance pass (late in cycle)
node-rank-long, node-rank-string added. string-ordered-set rank ~1.9× faster; long-ordered-set rank at parity (the LongComparator was already HotSpot-inlined).
general carving path, preserving "later wins" semantics. ~10× faster than the previous per-insert path; ~2× faster than Guava TreeRangeMap bulk put.
clojure.lang.SeqIterator's memory model — thread-safety contract unchanged (per-call fresh, not shared). Java iteration is ~2× faster than sorted-set and ~3.6× faster than data.avl.
Bug fixes
downgrading LongKeyNode → SimpleNode after a single conj on a long-ordered-set. Fixed by threading alloc through all call sites.
Bench infrastructure
behavior unchanged.
Documentation
Full details in CHANGES.md.