Conversation
|
nice design |
He-Pin
left a comment
There was a problem hiding this comment.
Simple and correct change. Caching hashCode on an immutable ByteString is a standard optimization.
One concern: The current implementation uses lazy val which has synchronization overhead on first access in Scala. For hot paths that create many small ByteStrings, the double-checked locking generated by lazy val could be measurable. Consider using a @volatile var with manual caching if benchmarks show this is an issue:
scala\n@volatile private var _hashCode: Int = 0\noverride def hashCode: Int = {\n if (_hashCode == 0) _hashCode = super.hashCode()\n _hashCode\n}\n
However, this has a problem: hash code of 0 cannot be distinguished from uncomputed. For most real-world ByteStrings this is unlikely to matter, but the lazy val approach is safer. The current implementation is fine as-is.
He-Pin
left a comment
There was a problem hiding this comment.
Deep CR: PR #2836 - cache bytestring hashcode
Architecture Review
Adding override lazy val hashCode: Int = super.hashCode() to the abstract ByteString class is a standard and correct optimization for immutable collections. Since ByteString extends IndexedSeq[Byte] with StrictOptimizedSeqOps, the super.hashCode() delegates to IndexedSeq.hashCode() which computes a content-based hash.
Key observation: ByteString does NOT currently override equals. This means:
equalsis inherited fromIndexedSeqwhich does content-based comparisonhashCodenow caches the result ofIndexedSeq.hashCode()- The equals/hashCode contract is maintained: equal ByteStrings will have equal hashCodes
This is important because if ByteString were used as HashMap keys, this optimization provides significant value.
Binary Compatibility
Adding lazy val hashCode introduces a new public field on the abstract ByteString class. Since this is Pekko 2.0 (major version), this is acceptable. All concrete subclasses (ByteString1C, ByteString1, ByteStrings) inherit this field automatically - no changes needed.
Potential issue: If external code has created subclasses of ByteString (unlikely since it's sealed), they would need to handle the new hashCode field. Since ByteString is sealed, this is safe.
Performance Considerations
lazy val overhead: Scala's lazy val uses double-checked locking with a bitmap field. On the first access:
- Acquire monitor lock
- Compute
super.hashCode()(iterates all bytes) - Store result
- Release lock
On subsequent accesses:
- Check bitmap flag (single volatile read)
- Return cached value
The initial cost is amortized over repeated accesses. For ByteStrings used in HashMaps, Sets, or compared frequently, this is a clear win. For one-off ByteStrings that are never hashed, there is a small memory overhead (the bitmap field per instance).
Memory overhead: Each ByteString instance gains one bitmap byte for the lazy val. Negligible for most use cases.
Test Coverage
The test verifies:
hashCodeis consistent across multiple calls on the same instance- Equal ByteStrings have equal hashCodes
- Different ByteString implementations (ByteString1, ByteStrings) produce consistent hashCodes
Missing tests:
- No test for ByteString1C.hashCode
- No test verifying that hashCode is computed from content (not reference)
- No negative test (different content should produce different hashCode - though this is inherited behavior)
Code Quality
The implementation is minimal and correct. The comment "Cache the hash code since ByteStrings are immutable" is clear.
Java API Coverage
N/A - this is a Scala-only change. Java users calling .hashCode() on ByteString instances will benefit from the caching automatically.
Suggestions
- Consider adding a test for
ByteString1C.hashCodeto ensure all three implementations are covered. - Consider testing that different content produces different hashCodes (probabilistic, but catches obvious bugs).
- The
lazy valapproach is correct but has a small initial synchronization cost. If benchmarks show this is significant in ByteString-heavy workloads, a manual@volatilevar approach could be explored, but the current implementation is the safer default choice.
ByteString hashCode does not change because ByteStrings are immutable.