Description
I was reviewing a PR that dealt with some Protobuf today and saw that their ByteString class (which is pretty similar to Lucene's BytesRef) is abstract with a pair of concrete subclasses -- LiteralByteString and BoundedByteString. The BoundedByteString has length and offset members, while LiteralByteString is just a wrapper around byte[]. As a result, LiteralByteString is a whole 8 bytes smaller. Woohoo!
This got me thinking -- how many BytesRef instances out there have offset == 0 and length == bytes.length?
Within a lot of "hot" Lucene code, I believe the answer is "not many", since we do a very good job of reusing BytesRef instances forever. That said, all of the Term constructors end up producing BytesRefs of known (fixed) length. So the potential benefit is clearly non-zero. (Maybe close to zero?)
I'm thinking of trying to make BytesRef abstract and sealed with a pair of subclasses, similar to the Protobuf approach. Obviously, this means replacing direct field access with getters (and setters), but I think those can be bimorphically inlined.
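A minimal sketch of what that split might look like, assuming Java 17+ for sealed classes. The names `SketchBytesRef`, `WholeArrayBytesRef`, `SlicedBytesRef`, and the `of` factory are all hypothetical, not Lucene's API. The sketch is also immutable, so it sidesteps the setter question entirely:

```java
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Hypothetical sketch only: none of these names exist in Lucene.
abstract sealed class SketchBytesRef permits WholeArrayBytesRef, SlicedBytesRef {
    protected final byte[] bytes;

    SketchBytesRef(byte[] bytes) {
        this.bytes = Objects.requireNonNull(bytes);
    }

    final byte[] bytes() { return bytes; }

    // Getters replace direct field access; with only two implementations,
    // call sites stay at most bimorphic.
    abstract int offset();
    abstract int length();

    final String utf8ToString() {
        return new String(bytes, offset(), length(), StandardCharsets.UTF_8);
    }

    // Picks the smaller variant when the ref covers the whole array.
    static SketchBytesRef of(byte[] bytes, int offset, int length) {
        Objects.checkFromIndexSize(offset, length, bytes.length);
        if (offset == 0 && length == bytes.length) {
            return new WholeArrayBytesRef(bytes);
        }
        return new SlicedBytesRef(bytes, offset, length);
    }
}

// offset == 0 and length == bytes.length, so neither int field is stored.
final class WholeArrayBytesRef extends SketchBytesRef {
    WholeArrayBytesRef(byte[] bytes) { super(bytes); }
    @Override int offset() { return 0; }
    @Override int length() { return bytes.length; }
}

// General case: a window into a (possibly shared) array.
final class SlicedBytesRef extends SketchBytesRef {
    private final int offset;
    private final int length;

    SlicedBytesRef(byte[] bytes, int offset, int length) {
        super(bytes);
        this.offset = offset;
        this.length = length;
    }

    @Override int offset() { return offset; }
    @Override int length() { return length; }
}
```

The awkward part this leaves out is mutation: code that reuses a BytesRef by reassigning offset and length would always need the sliced variant, so any savings would come from the fixed-length cases like Term.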