perf: presize dedup map in mapNodes #562

Closed
jvoisin wants to merge 1 commit into PuerkitoBio:master from jvoisin:presize

jvoisin commented May 27, 2026

mapNodes is the shared engine behind most traversal helpers (Parents, Children, Next/Prev, Siblings, Find*, Contents, ...). It allocates a map[*html.Node]bool used by appendWithoutDuplicates to skip duplicates across per-node results. The map was created with no size hint, so it incurred grow-and-double rehashing as it filled up.

Pass len(nodes) as the size hint. That's a reasonable upper bound for the common case where each input yields 0 or 1 results (Parent, Next, Prev, Children, ...) and stays close enough for the multi-result cases (Parents, NextAll, PrevAll, ...) to still avoid most rehashes.
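
As a rough illustration of the pattern described above, here is a minimal sketch of a map-and-dedup helper that presizes its set with len(nodes). It is not the goquery source: `node`, `mapUnique`, and `parentOf` are hypothetical stand-ins for `*html.Node`, `mapNodes`, and a 0-or-1-result traversal such as Parent.

```go
package main

import "fmt"

// node is a hypothetical stand-in for *html.Node so the sketch has no
// external dependencies.
type node struct {
	name   string
	parent *node
}

// mapUnique applies f to every input node and concatenates the results,
// skipping nodes that were already emitted. The set is presized with
// len(nodes); this is only a hint, so the map still grows if f returns
// more unique results than there are inputs.
func mapUnique(nodes []*node, f func(*node) []*node) []*node {
	seen := make(map[*node]bool, len(nodes))
	var out []*node
	for _, n := range nodes {
		for _, r := range f(n) {
			if r != nil && !seen[r] {
				seen[r] = true
				out = append(out, r)
			}
		}
	}
	return out
}

// parentOf mimics a traversal that yields 0 or 1 results per input.
func parentOf(n *node) []*node {
	if n.parent == nil {
		return nil
	}
	return []*node{n.parent}
}

func main() {
	root := &node{name: "ul"}
	a := &node{name: "li#a", parent: root}
	b := &node{name: "li#b", parent: root}

	// a and b share a parent and root has none, so "ul" is printed once.
	for _, p := range mapUnique([]*node{a, b, root}, parentOf) {
		fmt.Println(p.name)
	}
}
```

For 0-or-1-result traversals the set can never hold more than len(nodes) entries, so the hint covers the whole run; for multi-result traversals it only reduces the number of growth steps.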

Benchmarks (count=10, benchtime=500ms, linux/arm64):

Notable per-bench wins (all p<0.01):
  Children                  -32.35% sec/op
  Prev                      -36.73% sec/op
  Next                      -29.76% sec/op
  NextFiltered              -29.57% sec/op
  PrevFiltered              -27.12% sec/op
  ParentsFilteredUntil*     -25%    sec/op, -21% B/op
  FindWithinSelection       -17.97% sec/op, -21% B/op
  ContentsFiltered          -20.34% sec/op
  NextUntil                 -16.25% sec/op
  ParentsUntil              -15.42% sec/op
  Contents                  -12.09% sec/op
  Find                      -11.52% sec/op

FindSelection / FindNodes feed mapNodes a 448-element input, so the presize bumps B/op by ~5 KiB on those two benchmarks. allocs/op still drops from 91 to 85, because a single sized allocation replaces the grow-and-double sequence the map went through before.

No benchmark regressions were found.
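
The allocation-count effect on a 448-element input can be checked in isolation with `testing.AllocsPerRun`. The following is a sketch under stated assumptions, not the project's benchmark suite: `node`, `fill`, `makeNodes`, and `allocs` are hypothetical helpers, and the exact counts depend on the Go version and map implementation, so none are quoted here.

```go
package main

import (
	"fmt"
	"testing"
)

// node is a hypothetical stand-in for *html.Node.
type node struct{ id int }

// sink keeps the compiler from discarding the work done in fill.
var sink int

// fill inserts every node into a set created with the given size hint
// and returns the number of entries. A hint of 0 behaves like a map
// created with no size hint.
func fill(nodes []*node, hint int) int {
	seen := make(map[*node]bool, hint)
	for _, n := range nodes {
		seen[n] = true
	}
	return len(seen)
}

// makeNodes builds n distinct nodes.
func makeNodes(n int) []*node {
	nodes := make([]*node, n)
	for i := range nodes {
		nodes[i] = &node{id: i}
	}
	return nodes
}

// allocs reports the average number of heap allocations per fill call.
func allocs(nodes []*node, hint int) float64 {
	return testing.AllocsPerRun(100, func() { sink = fill(nodes, hint) })
}

func main() {
	nodes := makeNodes(448) // same input size as FindSelection / FindNodes
	fmt.Println("allocs without hint:", allocs(nodes, 0))
	fmt.Println("allocs with hint:   ", allocs(nodes, len(nodes)))
}
```

The presized run should report fewer allocations, since the map storage is reserved once instead of being reallocated at each growth step.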

jvoisin commented May 27, 2026

Superseded by #563

jvoisin closed this May 27, 2026