Add Hashtable and LongHashingUtils utilities#11409

Open
dougqh wants to merge 21 commits into master from dougqh/util-hashtable


@dougqh dougqh commented May 18, 2026

What Does This Do

Introduces Hashtable, which serves as a lighter-weight alternative to HashMap.

Motivation

Hashtable is parameterized on Entry types, allowing for lower overhead.
The Entry can hold multiple fields that comprise the key.
The Entry can hold mutable fields that comprise the value.
The Entry can include metainfo useful for eviction, etc.

Hashtable includes D1 and D2 for 1-D and 2-D maps respectively, and also includes a Support class that can be used to build higher-dimensional or more complicated map structures.

Hashtable is particularly useful in aggregation workloads with multipart keys where lookups dominate insertions. In those situations, a solution based on Hashtable avoids constantly allocating a composite key object that is immediately thrown away.
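
To make the allocation argument concrete, here is a minimal sketch of a counter table with a two-part key. It is not the Hashtable API from this PR: `TwoKeyCounterTable`, its `Entry` fields, the 31-based hash, and the fixed 16-bucket array are all assumptions chosen for illustration.

```java
/** Hypothetical sketch: a chained table whose entries carry both key parts and a mutable value. */
final class TwoKeyCounterTable {
  static final class Entry {
    final String service;   // first key part
    final String operation; // second key part
    final long keyHash;     // cached so chain walks compare hashes first
    long count;             // mutable value stored in the entry itself
    Entry next;             // bucket chain link

    Entry(String service, String operation, long keyHash, Entry next) {
      this.service = service;
      this.operation = operation;
      this.keyHash = keyHash;
      this.next = next;
    }

    boolean matches(long keyHash, String service, String operation) {
      return this.keyHash == keyHash
          && this.service.equals(service)
          && this.operation.equals(operation);
    }
  }

  // Fixed power-of-two bucket array; resizing is out of scope for this sketch.
  private final Entry[] buckets = new Entry[16];

  static long hash(String service, String operation) {
    return 31L * service.hashCode() + operation.hashCode();
  }

  private int bucketIndex(long keyHash) {
    return (int) (keyHash ^ (keyHash >>> 32)) & (buckets.length - 1);
  }

  /** Looks up by both key parts without building a composite key object. */
  long increment(String service, String operation) {
    long keyHash = hash(service, operation);
    int index = bucketIndex(keyHash);
    for (Entry e = buckets[index]; e != null; e = e.next) {
      if (e.matches(keyHash, service, operation)) {
        return ++e.count;
      }
    }
    // Miss: the only allocation is the entry, spliced in at the head of the chain.
    Entry created = new Entry(service, operation, keyHash, buckets[index]);
    created.count = 1;
    buckets[index] = created;
    return 1;
  }
}
```

On a hit, `increment` allocates nothing. The equivalent `HashMap` lookup would typically need a key object wrapping both strings for every call.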

Additional Notes

Split out of #11382 into its own stand-alone change:

  • datadog.trace.util.Hashtable — generic open-addressed-ish bucket table keyed by a 64-bit hash. Public abstract Entry lets client code subclass it for higher-arity keys (e.g. for multi-field aggregation keys in the metrics aggregator). Support helpers (create, clear, bucketIndex, bucketIterator, mutatingBucketIterator) are package-private but enough for higher layers built on top.
  • datadog.trace.util.LongHashingUtils — chained 64-bit hash combiners with primitive overloads (boolean, short, int, long, Object). Used in place of varargs combiners to avoid Object[] allocation and boxing on the hot path.
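
As a rough illustration of the chained-combiner idea, the sketch below folds one field at a time into a running 64-bit hash through primitive overloads. The class name `ChainedLongHash`, the method name `add`, and the 31-multiplier mixing step are assumptions, not the actual LongHashingUtils implementation.

```java
/** Hypothetical sketch of chained 64-bit hash combining without varargs or boxing. */
final class ChainedLongHash {
  private ChainedLongHash() {}

  // Each overload folds one more field into the running hash.
  static long add(long hash, boolean value) {
    return 31L * hash + (value ? 1231 : 1237);
  }

  static long add(long hash, int value) {
    return 31L * hash + value;
  }

  static long add(long hash, long value) {
    return 31L * hash + (value ^ (value >>> 32));
  }

  static long add(long hash, Object value) {
    return 31L * hash + (value == null ? 0 : value.hashCode());
  }

  /** Example key: (service name, HTTP status, error flag). */
  static long keyHash(String service, int status, boolean error) {
    // No Object[] is created and neither status nor error is boxed.
    return add(add(add(0L, service), status), error);
  }
}
```

A varargs signature such as `hash(Object... parts)` would allocate an array and box `status` and `error` on every call, which is the cost the description says the chained form avoids.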

No callers within internal-api yet. The first usage will land in #11382 (AggregateTable + AggregateEntry), which becomes a smaller, more focused diff once this change lands.

Test plan

🤖 Generated with Claude Code

@dougqh dougqh added the labels type: enhancement (Enhancements and improvements), comp: core (Tracer core), tag: no release notes (Changes to exclude from release notes), tag: ai generated (Largely based on code generated by an AI or LLM), and tag: performance (Performance related changes) May 18, 2026
@dougqh dougqh marked this pull request as ready for review May 18, 2026 20:29
@dougqh dougqh requested a review from a team as a code owner May 18, 2026 20:29

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a534e4f4f4

Comment thread internal-api/src/main/java/datadog/trace/util/Hashtable.java Outdated
@dougqh dougqh requested a review from amarziali May 19, 2026 13:38
Comment thread internal-api/src/jmh/java/datadog/trace/util/HashtableBenchmark.java Outdated
Comment thread internal-api/src/main/java/datadog/trace/util/Hashtable.java Outdated
Comment thread internal-api/src/main/java/datadog/trace/util/Hashtable.java Outdated
Comment thread internal-api/src/main/java/datadog/trace/util/Hashtable.java Outdated
Comment thread internal-api/src/test/java/datadog/trace/util/HashtableTest.java Outdated
Comment thread internal-api/src/test/java/datadog/trace/util/HashtableTest.java Outdated
Comment thread internal-api/src/main/java/datadog/trace/util/LongHashingUtils.java Outdated
Comment thread internal-api/src/main/java/datadog/trace/util/LongHashingUtils.java Outdated
Comment thread internal-api/src/main/java/datadog/trace/util/Hashtable.java Outdated
Comment thread internal-api/src/main/java/datadog/trace/util/Hashtable.java Outdated
Comment thread internal-api/src/jmh/java/datadog/trace/util/HashtableD2Benchmark.java Outdated
Comment thread internal-api/src/jmh/java/datadog/trace/util/HashtableD1Benchmark.java Outdated

  public static final int hash(Object obj0, Object obj1, Object obj2, Object obj3, Object obj4) {
-   return hash(hashCode(obj0), hashCode(obj1), hashCode(obj2), hashCode(obj3));
+   return hash(hashCode(obj0), hashCode(obj1), hashCode(obj2), hashCode(obj3), hashCode(obj4));
Contributor Author

Related bug fix

Comment thread internal-api/src/main/java/datadog/trace/util/Hashtable.java Outdated
dougqh and others added 8 commits May 20, 2026 14:03
Adds Support.bucket(buckets, keyHash) which returns the bucket head
already cast to the caller's concrete entry type. D1.get and D2.get
now drop the raw-Entry intermediate variable and walk the chain via
Entry.next() directly. The unchecked cast lives in one place,
consistent with Entry.next() and Support.forEach.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Holdover from when both lived in a shared HashtableBenchmark; redundant
now that each lives in its own class.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…bleIterator

Three consumer-facing helpers that callers building higher-arity tables on
top of Hashtable.Support kept open-coding:

- MAX_RATIO_NUMERATOR / _DENOMINATOR: the 4/3 multiplier for sizing a
  bucket array from a target working-set under a 75% load factor.
- insertHeadEntry(buckets, bucketIndex, entry): the (setNext + array-store)
  pair for splicing a new entry at the head of a bucket chain.
- MutatingTableIterator + Support.mutatingTableIterator(buckets): walks
  every entry in the table (not filtered by hash) with remove() support,
  for sweeps like eviction and expunge that aren't keyed to a specific
  hash. Sibling of MutatingBucketIterator.

Tests cover the table-wide iterator at head-of-bucket and mid-chain
removal, empty buckets between live entries, exhaustion, and
remove-without-next.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… create()

Replace Support.MAX_RATIO_NUMERATOR / _DENOMINATOR with a single float
MAX_RATIO constant, and add a Support.create(int, float) overload that
takes a scale factor. Callers now write Support.create(n, MAX_RATIO)
instead of stitching together the int arithmetic at the call site.

The scaled size is truncated (not ceiled) before going through sizeFor.
sizeFor already rounds up to the next power of two, so truncation just
absorbs float fuzz that would otherwise push a result like 12 * 4/3 =
16.0000005f past 16 and double the bucket array size for no reason.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
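
The truncate-versus-ceil reasoning in the commit above can be checked with a small sketch. The names `BucketSizing`, `truncated`, and `ceiled` are hypothetical, `sizeFor` here is a stand-in that only rounds up to a power of two, and widening the product to `double` is an assumption made so the float fuzz in `4f / 3f` stays visible. The PR's real arithmetic may differ.

```java
/** Hypothetical sketch: why a scaled size is truncated rather than ceiled before power-of-two rounding. */
final class BucketSizing {
  static final float MAX_RATIO = 4f / 3f; // slightly above 4/3 once rounded to float

  private BucketSizing() {}

  /** Stand-in for sizeFor: rounds up to the next power of two (no upper bound check). */
  static int sizeFor(int requestedSize) {
    int n = Math.max(requestedSize, 1);
    int highest = Integer.highestOneBit(n);
    return highest == n ? n : highest << 1;
  }

  /** Truncation absorbs the tiny excess above 16 for n = 12. */
  static int truncated(int n, float ratio) {
    double scaled = n * (double) ratio; // assumption: evaluated in double
    return sizeFor((int) scaled);
  }

  /** Ceiling turns the same tiny excess into 17, which then rounds up to 32. */
  static int ceiled(int n, float ratio) {
    double scaled = n * (double) ratio;
    return sizeFor((int) Math.ceil(scaled));
  }
}
```

With these definitions, `truncated(12, MAX_RATIO)` yields 16 buckets while `ceiled(12, MAX_RATIO)` yields 32, so ceiling doubles the array for a working set that fits in 16.
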
Five small cleanups from a design re-review pass:

1. Support javadoc: drop the stale "methods are package-private" sentence;
   most of them were made public in earlier commits for higher-arity
   callers. Also drop the "nested BucketIterator" framing (iterators are
   peers of Support inside Hashtable, not nested inside Support).
2. MAX_RATIO javadoc: drop the Math.ceil recommendation; create(int, float)
   deliberately truncates and is the canonical pathway.
3. Document the null-hash treatment on D1.Entry.hash and D2.Entry.hash so
   the behavior difference is explicit: D1 uses Long.MIN_VALUE as a
   sentinel that's collision-free against any int-valued hashCode(); D2
   has no such sentinel and relies on matches() to resolve null/null vs
   hash-0 collisions.
4. Rename Support.MAX_CAPACITY -> MAX_BUCKETS and sizeFor's parameter to
   requestedSize. The cap is on the bucket-array length, not entry count;
   the new name reflects that. Error messages updated to match.
5. Drop the `abstract` modifier on Hashtable in favor of `final` with a
   private constructor. Nothing actually subclasses Hashtable -- the
   abstract was a namespace device that read as "intended for extension."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add Support.insertHeadEntry(buckets, long keyHash, entry) overload that
  derives the bucket index itself. Callers that already have a hash but
  not the index (the common case) now avoid the redundant bucketIndex(...)
  hop.
- D1.insert, D1.insertOrReplace, D2.insert, D2.insertOrReplace: use the
  new overload, drop the (thisBuckets local, bucketIndex compute,
  setNext, store) sequence at each call site.
- D2.buckets: drop the `private` modifier to match D1.buckets. Both are
  package-private so iterator tests in the same package can drive
  Support.bucketIterator against the table's bucket array. Added a short
  comment on both fields documenting the rationale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three follow-ups from the design review:

- Make Hashtable.Entry.next private. All same-package readers
  (BucketIterator) already had a next() accessor; the leftover direct
  field reads now route through it. Closes the "mixed encapsulation"
  gap where some readers used the accessor and same-package ones
  reached for the field.
- BucketIterator and MutatingBucketIterator now document that chain-walk
  work happens in next() (and the constructor for the first match);
  hasNext() is an O(1) field read.
- Add D1.getOrCreate(K, Function) and D2.getOrCreate(K1, K2, BiFunction).
  Both reuse the lookup hash for the insert on miss, avoiding the
  double-hash that "get; if null then insert" callers would otherwise
  pay.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
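
The double-hash saving described for getOrCreate can be sketched as follows. This is not the D1 implementation: `GetOrCreateSketch`, its String-keyed `Entry`, and the `hashComputations` counter are assumptions that exist only to show the lookup hash being reused for the insert.

```java
import java.util.function.Function;

/** Hypothetical sketch: one hash computation serves both the lookup and the insert on miss. */
final class GetOrCreateSketch {
  static class Entry {
    final String key;
    long keyHash;
    Entry next;

    Entry(String key) {
      this.key = key;
    }
  }

  private final Entry[] buckets = new Entry[16];
  int hashComputations; // instrumentation for the sketch only

  private long hash(String key) {
    hashComputations++;
    return key.hashCode();
  }

  private int bucketIndex(long keyHash) {
    return (int) keyHash & (buckets.length - 1);
  }

  Entry getOrCreate(String key, Function<String, Entry> factory) {
    long keyHash = hash(key); // computed once
    int index = bucketIndex(keyHash);
    for (Entry e = buckets[index]; e != null; e = e.next) {
      if (e.keyHash == keyHash && e.key.equals(key)) {
        return e;
      }
    }
    // Miss: reuse keyHash and index instead of hashing again inside a separate insert call.
    Entry created = factory.apply(key);
    created.keyHash = keyHash;
    created.next = buckets[index];
    buckets[index] = created;
    return created;
  }
}
```

A caller written as "get; if null then insert" would hash the key once for the failed lookup and again for the insert, so a miss costs two hash computations instead of one.
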
Addresses PR #11409 review comments:

- #3267164119 / #3267165525: wrap every single-line if/break body in
  braces (7 sites across BucketIterator, MutatingBucketIterator, and the
  full-table Iterator).

- #3275947761 / #3275948108 (sarahchen6): null out the removed/replaced
  entry's next pointer after splicing it out of the chain in
  MutatingBucketIterator.remove / .replace. Applied the same fix to the
  full-table Iterator.remove for consistency.

  Rationale: detaching prevents accidental traversal through a removed
  entry via a stale reference and lets the GC reclaim a chain tail that
  the removed entry was the last referrer to.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
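
The detach-on-remove rationale can be illustrated on a bare singly linked chain. `DetachOnRemove` and `Node` are hypothetical names, and the method below is a simplified stand-in for what the mutating iterators do, not their actual code.

```java
/** Hypothetical sketch: splice a node out of a chain, then clear its next pointer. */
final class DetachOnRemove {
  static final class Node {
    final String name;
    Node next;

    Node(String name, Node next) {
      this.name = name;
      this.next = next;
    }
  }

  private DetachOnRemove() {}

  /** Removes target from the chain starting at head and returns the (possibly new) head. */
  static Node remove(Node head, Node target) {
    Node prev = null;
    for (Node cur = head; cur != null; prev = cur, cur = cur.next) {
      if (cur == target) {
        Node after = cur.next;
        if (prev == null) {
          head = after; // target was the head of the chain
        } else {
          prev.next = after; // bypass target mid-chain
        }
        // Detach: a stale reference to target can no longer walk into the live chain,
        // and target no longer keeps the tail reachable on its own.
        cur.next = null;
        return head;
      }
    }
    return head; // target not found; chain unchanged
  }
}
```
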
@dougqh dougqh force-pushed the dougqh/util-hashtable branch from 66ec7f6 to e2642cd Compare May 20, 2026 18:05
@dougqh dougqh requested review from a team as code owners May 20, 2026 18:05
@dougqh dougqh requested review from P403n1x87, mcculls and ygree and removed request for a team May 20, 2026 18:05
@dougqh dougqh changed the base branch from dougqh/conflating-metrics-background-work to master May 20, 2026 18:06
Comment thread internal-api/src/main/java/datadog/trace/util/LongHashingUtils.java Outdated
…sistency

Addresses PR #11409 review comment #3276167001. The method parallels the
primitive hash(boolean) / hash(int) / hash(long) / ... family, so naming
it hash(Object) -- with null collapsing to Long.MIN_VALUE as a sentinel
distinct from any real hashCode -- matches the rest of the public surface.

Test call sites that pass a literal null now disambiguate against
hash(int[]) / hash(Object[]) / hash(Iterable) via an (Object) cast.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
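
Both points in the commit above, the null sentinel and the `(Object)` cast at call sites, can be seen in a small sketch. `NullHashSketch` and the bodies of its overloads are assumptions. Only the overload shapes and the `Long.MIN_VALUE` sentinel come from the commit message.

```java
import java.util.Arrays;

/** Hypothetical sketch of a hash(Object) overload with a null sentinel. */
final class NullHashSketch {
  private NullHashSketch() {}

  /** Null maps to Long.MIN_VALUE, which no int-valued hashCode() widened to long can equal. */
  static long hash(Object value) {
    return value == null ? Long.MIN_VALUE : value.hashCode();
  }

  static long hash(int[] values) {
    return Arrays.hashCode(values);
  }

  static long hash(Object[] values) {
    return Arrays.hashCode(values);
  }

  static long hash(Iterable<?> values) {
    long result = 1;
    for (Object value : values) {
      result = 31L * result + hash(value); // dispatches on the static type Object
    }
    return result;
  }

  static long hashOfNullObject() {
    // A bare hash(null) does not compile here: int[], Object[], and Iterable are all
    // applicable and none is more specific than the others. The cast selects hash(Object).
    return hash((Object) null);
  }
}
```
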
@dougqh dougqh enabled auto-merge May 20, 2026 18:20
@gh-worker-ownership-write-b05516 gh-worker-ownership-write-b05516 Bot removed the request for review from a team May 20, 2026 19:44
@dougqh dougqh added this pull request to the merge queue May 20, 2026

dd-octo-sts Bot commented May 20, 2026

/merge

gh-worker-devflow-routing-ef8351 Bot commented May 20, 2026

View all feedbacks in Devflow UI.

2026-05-20 20:34:55 UTC ℹ️ Start processing command /merge
Use /merge -c to cancel this operation!


2026-05-20 20:35:00 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in master is approximately 1h (p90).

@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 20, 2026