Add Hashtable and LongHashingUtils utilities#11409
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a534e4f4f4
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
This comment has been minimized.
This comment has been minimized.
|
|
||
| public static final int hash(Object obj0, Object obj1, Object obj2, Object obj3, Object obj4) { | ||
| return hash(hashCode(obj0), hashCode(obj1), hashCode(obj2), hashCode(obj3)); | ||
| return hash(hashCode(obj0), hashCode(obj1), hashCode(obj2), hashCode(obj3), hashCode(obj4)); |
Adds Support.bucket(buckets, keyHash) which returns the bucket head already cast to the caller's concrete entry type. D1.get and D2.get now drop the raw-Entry intermediate variable and walk the chain via Entry.next() directly. The unchecked cast lives in one place, consistent with Entry.next() and Support.forEach. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Holdover from when both lived in a shared HashtableBenchmark; redundant now that each lives in its own class. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…bleIterator Three consumer-facing helpers that callers building higher-arity tables on top of Hashtable.Support kept open-coding: - MAX_RATIO_NUMERATOR / _DENOMINATOR: the 4/3 multiplier for sizing a bucket array from a target working-set under a 75% load factor. - insertHeadEntry(buckets, bucketIndex, entry): the (setNext + array-store) pair for splicing a new entry at the head of a bucket chain. - MutatingTableIterator + Support.mutatingTableIterator(buckets): walks every entry in the table (not filtered by hash) with remove() support, for sweeps like eviction and expunge that aren't keyed to a specific hash. Sibling of MutatingBucketIterator. Tests cover the table-wide iterator at head-of-bucket and mid-chain removal, empty buckets between live entries, exhaustion, and remove-without-next. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… create() Replace Support.MAX_RATIO_NUMERATOR / _DENOMINATOR with a single float MAX_RATIO constant, and add a Support.create(int, float) overload that takes a scale factor. Callers now write Support.create(n, MAX_RATIO) instead of stitching together the int arithmetic at the call site. The scaled size is truncated (not ceiled) before going through sizeFor. sizeFor already rounds up to the next power of two, so truncation just absorbs float fuzz that would otherwise push a result like 12 * 4/3 = 16.0000005f past 16 and double the bucket array size for no reason. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five small cleanups from a design re-review pass: 1. Support javadoc: drop the stale "methods are package-private" sentence; most of them were made public in earlier commits for higher-arity callers. Also drop the "nested BucketIterator" framing (iterators are peers of Support inside Hashtable, not nested inside Support). 2. MAX_RATIO javadoc: drop the Math.ceil recommendation; create(int, float) deliberately truncates and is the canonical pathway. 3. Document the null-hash treatment on D1.Entry.hash and D2.Entry.hash so the behavior difference is explicit: D1 uses Long.MIN_VALUE as a sentinel that's collision-free against any int-valued hashCode(); D2 has no such sentinel and relies on matches() to resolve null/null vs hash-0 collisions. 4. Rename Support.MAX_CAPACITY -> MAX_BUCKETS and sizeFor's parameter to requestedSize. The cap is on the bucket-array length, not entry count; the new name reflects that. Error messages updated to match. 5. Drop the `abstract` modifier on Hashtable in favor of `final` with a private constructor. Nothing actually subclasses Hashtable -- the abstract was a namespace device that read as "intended for extension." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add Support.insertHeadEntry(buckets, long keyHash, entry) overload that derives the bucket index itself. Callers that already have a hash but not the index (the common case) now avoid the redundant bucketIndex(...) hop. - D1.insert, D1.insertOrReplace, D2.insert, D2.insertOrReplace: use the new overload, drop the (thisBuckets local, bucketIndex compute, setNext, store) sequence at each call site. - D2.buckets: drop the `private` modifier to match D1.buckets. Both are package-private so iterator tests in the same package can drive Support.bucketIterator against the table's bucket array. Added a short comment on both fields documenting the rationale. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three follow-ups from the design review: - Make Hashtable.Entry.next private. All same-package readers (BucketIterator) already had a next() accessor; the leftover direct field reads now route through it. Closes the "mixed encapsulation" gap where some readers used the accessor and same-package ones reached for the field. - BucketIterator and MutatingBucketIterator now document that chain-walk work happens in next() (and the constructor for the first match); hasNext() is an O(1) field read. - Add D1.getOrCreate(K, Function) and D2.getOrCreate(K1, K2, BiFunction). Both reuse the lookup hash for the insert on miss, avoiding the double-hash that "get; if null then insert" callers would otherwise pay. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses PR #11409 review comments: - #3267164119 / #3267165525: wrap every single-line if/break body in braces (7 sites across BucketIterator, MutatingBucketIterator, and the full-table Iterator). - #3275947761 / #3275948108 (sarahchen6): null out the removed/replaced entry's next pointer after splicing it out of the chain in MutatingBucketIterator.remove / .replace. Applied the same fix to the full-table Iterator.remove for consistency. Rationale: detaching prevents accidental traversal through a removed entry via a stale reference and lets the GC reclaim a chain tail that the removed entry was the last referrer to. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
66ec7f6 to
e2642cd
Compare
…sistency Addresses PR #11409 review comment #3276167001. The method parallels the primitive hash(boolean) / hash(int) / hash(long) / ... family, so naming it hash(Object) -- with null collapsing to Long.MIN_VALUE as a sentinel distinct from any real hashCode -- matches the rest of the public surface. Test call sites that pass a literal null now disambiguate against hash(int[]) / hash(Object[]) / hash(Iterable) via an (Object) cast. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/merge |
|
View all feedbacks in Devflow UI.
The expected merge time in Use ⏳ Processing |
What Does This Do
Introduces Hashtable which serves as lighter weight alternative to HashMap.
Motivation
Hashtable is parameterized on Entry types allowing for lower overhead.
The Entry can hold multiple fields that comprise the key.
The Entry can hold mutable fields that compromise the value.
The Entry can include metainfo useful for eviction, etc
Hashable includes D1 and D2 for 1-D and 2-D maps respectively, but also includes a Support class that be used to make higher dimensional / more complicated map structures.
Particularly useful in aggregation workloads with multipart keys where lookups dominate insertions. In those situations, a solution based on Hashtable avoids constantly allocating a composite key object that will be immediately thrown away.
Additional Notes
Splits out of #11382 into stand-alone own change:
datadog.trace.util.Hashtable— generic open-addressed-ish bucket table keyed by a 64-bit hash. Public abstractEntrylets client code subclass it for higher-arity keys (e.g. for multi-field aggregation keys in the metrics aggregator). Support helpers (create,clear,bucketIndex,bucketIterator,mutatingBucketIterator) are package-private but enough for higher layers built on top.datadog.trace.util.LongHashingUtils— chained 64-bit hash combiners with primitive overloads (boolean,short,int,long,Object). Used in place of varargs combiners to avoidObject[]allocation and boxing on the hot path.No callers within
internal-apiyet. The first usage will land in #11382 (AggregateTable+AggregateEntry), which now becomes a smaller, more focused diff once this lands first.Test plan
:internal-api:compileJavapasses (verified locally).🤖 Generated with Claude Code