StringView::Pool — amortized-allocation view factory #3

Merged: paracycle merged 2 commits into main from pool on Mar 18, 2026

@tobi commented Mar 18, 2026

What

StringView::Pool pre-allocates a batch of StringView objects tied to a single backing string. pool.view(offset, length) returns a pre-built view after just two long writes (the new offset and length), with zero Ruby object allocation in steady state.

Why

When parsing a large buffer, you extract dozens of substrings per parse call. Each StringView.new allocates one Ruby object (~235ns). In a hot loop processing thousands of messages, this dominates:

10,000 messages × 20 fields × 235ns = 47ms just in allocation overhead

With a pool, each .view() is ~13ns (18x faster) and allocates nothing.
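The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch built from the per-view timings quoted in this description; the helper name view_overhead_ms is made up for illustration and is not part of the gem.

```ruby
# Per-view costs quoted in this description, in nanoseconds.
NEW_NS  = 235 # StringView.new
POOL_NS = 13  # pool.view on a warmed pool

# Total time spent obtaining views, in milliseconds.
# (Hypothetical helper, not part of StringView.)
def view_overhead_ms(messages, fields_per_message, ns_per_view)
  messages * fields_per_message * ns_per_view / 1_000_000.0
end

without_pool = view_overhead_ms(10_000, 20, NEW_NS)
with_pool    = view_overhead_ms(10_000, 20, POOL_NS)

puts format("StringView.new: %.1f ms", without_pool)
puts format("Pool#view:      %.1f ms", with_pool)
```

Under these numbers the same workload drops from 47 ms to roughly 2.6 ms of view-creation overhead.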

The looped parser pattern

pool = StringView::Pool.new(buffer)

records.each do |record_offset, field_offsets|
  # Assumed layout: the parser supplies these byte positions per record.
  name_len, sep, value_len = field_offsets
  name  = pool.view(record_offset, name_len)        # ~13ns, 0 alloc
  value = pool.view(record_offset + sep, value_len) # ~13ns, 0 alloc
  process(name, value)
  pool.reset!  # rewind cursor; views get reused next iteration
end

Once the pool has grown to cover the largest number of views any single iteration needs (the high-water mark), every subsequent .view() call is zero-allocation. Iterations that never need more than the 32 pre-allocated views do not trigger growth at all.
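To make the stabilization behaviour concrete, here is a minimal pure-Ruby model of the cursor and capacity bookkeeping. SketchPool and SketchView are hypothetical stand-ins written for this illustration; the real pool is a native extension and its internals may differ. Only the 32-slot start and the doubling rule are taken from this description.

```ruby
# Hypothetical stand-in for a pooled view: just a re-pointable window.
class SketchView
  attr_accessor :offset, :length

  def initialize
    @offset = 0
    @length = 0
  end
end

# Hypothetical model of the pool's bookkeeping (not the real implementation).
class SketchPool
  attr_reader :capacity, :size, :slots_allocated

  def initialize(initial_capacity = 32)
    @slots = Array.new(initial_capacity) { SketchView.new }
    @capacity = initial_capacity
    @slots_allocated = initial_capacity
    @size = 0
  end

  # Hand out the next slot, doubling the slot array when it is exhausted.
  def view(offset, length)
    grow if @size == @capacity
    slot = @slots[@size]
    slot.offset = offset
    slot.length = length
    @size += 1
    slot
  end

  # Rewind the cursor; existing slots are kept for reuse.
  def reset!
    @size = 0
  end

  private

  def grow
    @slots.concat(Array.new(@capacity) { SketchView.new })
    @slots_allocated += @capacity
    @capacity *= 2
  end
end

pool = SketchPool.new
history = []
[20, 50, 20, 50, 40].each do |views_needed|
  before = pool.slots_allocated
  views_needed.times { |i| pool.view(i, 1) }
  history << [views_needed, pool.capacity, pool.slots_allocated - before]
  pool.reset!
end

history.each do |needed, cap, fresh|
  puts "needed=#{needed} capacity=#{cap} new_slots=#{fresh}"
end
```

In this model only the first 50-view iteration allocates new slots; every later iteration, including the second 50-view one, reuses what is already there.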

Performance

StringView.new:       ~235ns/view   (4.3M views/s)
Pool.view (grow):     ~170ns/view   (5.8M views/s)  — 1.4x faster
Pool.view (reuse):     ~13ns/view  (77M views/s)     — 18x faster
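The throughput and speedup columns follow directly from the per-view timings. The sketch below recomputes them from the approximate nanosecond figures above, so results may differ from the table in the last digit because of rounding.

```ruby
# Approximate per-view timings from the table above, in nanoseconds.
TIMINGS_NS = {
  "StringView.new"    => 235,
  "Pool.view (grow)"  => 170,
  "Pool.view (reuse)" => 13
}.freeze

baseline = TIMINGS_NS.fetch("StringView.new")

rows = TIMINGS_NS.map do |label, ns|
  views_per_sec = 1_000_000_000.0 / ns # one second divided by cost per view
  speedup = baseline.to_f / ns         # relative to StringView.new
  [label, views_per_sec, speedup]
end

rows.each do |label, views_per_sec, speedup|
  puts format("%-18s %6.1fM views/s  %4.1fx", label, views_per_sec / 1e6, speedup)
end
```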

GC safety

Every view in the pool is a real Ruby object managed by the GC. The pool holds them in a Ruby Array (GC-visible). Each view keeps the backing string alive by marking it with rb_gc_mark_movable, and dcompact callbacks keep that reference correct if compaction moves the string. No tricks, no unsafe pointers, no finalizers.

API

pool = StringView::Pool.new(string)      # create pool, pre-allocate 32 views
view = pool.view(byte_offset, byte_len)  # return next pre-allocated view (~13ns)
pool.reset!                               # rewind cursor, reuse views
pool.size                                 # views handed out since last reset
pool.capacity                             # total pre-allocated slots
pool.backing                              # the frozen backing string

Growth: starts at 32 slots, doubles when exhausted (32 → 64 → 128 → …).
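As a rough illustration of the doubling rule, the helper below lists the capacities a pool would pass through to serve a given number of views without a reset. growth_steps is a hypothetical name used only for this sketch, and it assumes pure doubling from 32 with no upper bound.

```ruby
INITIAL_SLOTS = 32

# Capacities reached while growing to hold `views_needed` views,
# assuming the pool starts at INITIAL_SLOTS and doubles when exhausted.
# (Hypothetical helper, not part of the StringView API.)
def growth_steps(views_needed, initial = INITIAL_SLOTS)
  steps = [initial]
  steps << steps.last * 2 while steps.last < views_needed
  steps
end

p growth_steps(20)          # fits in the initial 32 slots
p growth_steps(100)         # grows twice
p growth_steps(10_000).last # final capacity for 10K views without reset
```

Because capacity doubles, serving n views takes only about log2(n / 32) growth steps, which is why growth cost is amortized across many .view() calls.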

View lifetime after reset!

After pool.reset!, previously returned views are still valid Ruby objects, but their offset/length will be overwritten by the next .view() call that reuses that slot. If you need a view to outlive a reset:

  • Call .to_s to materialize it into a String before resetting
  • Use StringView.new for long-lived views
  • Don't call reset! (let the pool grow; slots are only reused after a reset, so views already handed out keep their range)
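The aliasing hazard is easiest to see in a small model. The classes below are hypothetical pure-Ruby stand-ins, not the real StringView or StringView::Pool; they assume only what is described above, namely that reset! rewinds a cursor and that .view() overwrites the offset and length of the slot it reuses.

```ruby
# Hypothetical stand-in for a pooled view over a shared backing string.
class SketchView
  attr_accessor :offset, :length

  def initialize(backing)
    @backing = backing
    @offset = 0
    @length = 0
  end

  # Materialize the current window into an independent String.
  def to_s
    @backing.byteslice(@offset, @length)
  end
end

# Hypothetical fixed-size pool; growth is omitted to keep the sketch short.
class SketchPool
  def initialize(backing, capacity = 32)
    @backing = backing.frozen? ? backing : backing.dup.freeze
    @views = Array.new(capacity) { SketchView.new(@backing) }
    @cursor = 0
  end

  def view(offset, length)
    raise IndexError, "sketch pool exhausted" if @cursor >= @views.size

    v = @views[@cursor]
    v.offset = offset
    v.length = length
    @cursor += 1
    v
  end

  def reset!
    @cursor = 0
  end
end

pool = SketchPool.new("name=alice;name=bob")

first = pool.view(5, 5)    # window over "alice"
kept  = first.to_s         # materialized copy taken before the reset
pool.reset!
second = pool.view(16, 3)  # reuses slot 0, now a window over "bob"

puts first.equal?(second)  # the same object has been re-pointed
puts first.to_s            # the stale reference now reads the new range
puts kept                  # the materialized copy is unaffected
```

In this model the stale reference silently reads "bob" while the copy still reads "alice", which is the reason for materializing with .to_s before resetting.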

Tests

285 tests, 643 assertions, 0 failures. Includes:

  • Parser loop pattern tests with zero-allocation verification
  • Exponential growth tests
  • GC safety tests
  • Allocation counting: 0 per .view() when pre-warmed vs 1 per StringView.new

tobi added 2 commits March 19, 2026 01:21
Pre-allocates a batch of StringView objects tied to a single backing
string. pool.view(offset, length) returns a pre-built view with just
two long writes — zero Ruby object allocation in steady state.

Designed for the looped parser pattern:

    pool = StringView::Pool.new(buffer)
    records.each do |record|
      key   = pool.view(key_offset, key_len)
      value = pool.view(val_offset, val_len)
      process(key, value)
      pool.reset!   # rewind cursor, reuse views next iteration
    end

Performance (YJIT, Apple M-series):

    StringView.new:     ~235ns/view   (4.3M views/s)
    Pool.view (reuse):   ~13ns/view  (77M views/s)  — 18x faster

GC-safe: every view is a real Ruby object managed by the GC. The pool
holds them in a Ruby Array. Each view holds a strong reference to the
backing string. Compaction-safe via rb_gc_mark_movable + dcompact.

Pool API:
  Pool.new(string)              — create pool, pre-allocate 32 views
  pool.view(byte_off, byte_len) — return next pre-allocated view
  pool.reset!                   — rewind cursor (views get reused)
  pool.size                     — views handed out since last reset
  pool.capacity                 — total pre-allocated slots
  pool.backing                  — the frozen backing string

Growth: starts at 32 slots, doubles when exhausted (32→64→128→...).
After a few iterations the pool stabilizes at the high-water mark.

285 tests, 643 assertions, 0 failures.

Add ~65 new tests covering:

- Construction edge cases: empty string, binary, large, frozen
- View boundary cases: start/end, adjacent, overlapping, same range
- Thorough bounds checking: negative offset/length, overflow
- Exponential growth: detailed sequence, many doublings, validity
- Reset lifecycle: multiple resets, reset-then-grow, slot reuse,
  materialize-before-reset pattern
- Multibyte/UTF-8: CJK, emoji, mixed-width characters
- Full StringView API interop: include?, start_with?, getbyte,
  to_i, to_f, comparison, hash keys, each_byte, slicing, delegates
- Stress: 10K views without reset, rapid reset cycling, alternating
  iteration sizes
- GC safety: during use, compaction, multiple pools same backing
- Real-world parser patterns: CSV, HTTP headers, log lines,
  fixed-width records

350 tests, 10978 assertions, 0 failures.
@paracycle merged commit c52d16a into main on Mar 18, 2026
7 checks passed
@paracycle deleted the pool branch on March 18, 2026 23:23