[Investigate] Specialize BytesRef for literal byte arrays to save 8 bytes #15191

@msfroh

Description

I was reviewing a PR that dealt with some Protobuf today and saw that their ByteString class (which is pretty similar to Lucene's BytesRef) is abstract with a pair of concrete subclasses -- LiteralByteString and BoundedByteString. The BoundedByteString has length and offset members, while LiteralByteString is just a wrapper around byte[]. Without those two int fields, LiteralByteString is a whole 8 bytes smaller. Woohoo!

This got me thinking -- how many BytesRef instances out there have offset == 0 and length == bytes.length?

Within a lot of "hot" Lucene code, I believe the answer is "not many", since we do a very good job of reusing BytesRef instances forever. That said, all of the Term constructors end up producing BytesRefs of known (fixed) length. So the potential benefit is clearly non-zero. (Maybe close to zero?)

I'm thinking of trying to make BytesRef abstract and sealed with a pair of subclasses, similar to the Protobuf approach. Obviously, this means replacing direct field access with getters (and setters), but with only two implementations, I think the JIT can inline those calls bimorphically.
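
To make the idea concrete, here is a rough sketch of what the sealed split might look like. It is not a proposed patch: `BytesRefSketch`, `Ref`, `LiteralRef`, `BoundedRef` and the `of` factory are hypothetical names invented for illustration, and the sketch is immutable, so it ignores the setters that the real, reusable BytesRef would still need. It assumes a JDK with sealed classes (17 or later).

```java
import java.nio.charset.StandardCharsets;
import java.util.Objects;

/** Hypothetical sketch only; none of these types exist in Lucene. */
final class BytesRefSketch {

  /** Sealed base: callers use offset()/length() instead of reading fields. */
  abstract static sealed class Ref permits LiteralRef, BoundedRef {
    final byte[] bytes;

    Ref(byte[] bytes) {
      this.bytes = Objects.requireNonNull(bytes);
    }

    abstract int offset();

    abstract int length();

    /** Picks the smaller subclass when the range covers the whole array. */
    static Ref of(byte[] bytes, int offset, int length) {
      Objects.checkFromIndexSize(offset, length, bytes.length);
      return (offset == 0 && length == bytes.length)
          ? new LiteralRef(bytes)
          : new BoundedRef(bytes, offset, length);
    }

    /** Decodes the referenced range as UTF-8 through the getters. */
    String utf8ToString() {
      return new String(bytes, offset(), length(), StandardCharsets.UTF_8);
    }
  }

  /** Whole-array case: no offset/length fields, so two fewer ints per instance. */
  static final class LiteralRef extends Ref {
    LiteralRef(byte[] bytes) {
      super(bytes);
    }

    @Override
    int offset() {
      return 0;
    }

    @Override
    int length() {
      return bytes.length;
    }
  }

  /** General case: a slice of a possibly larger array. */
  static final class BoundedRef extends Ref {
    private final int offset;
    private final int length;

    BoundedRef(byte[] bytes, int offset, int length) {
      super(bytes);
      this.offset = offset;
      this.length = length;
    }

    @Override
    int offset() {
      return offset;
    }

    @Override
    int length() {
      return length;
    }
  }

  public static void main(String[] args) {
    byte[] data = "lucene".getBytes(StandardCharsets.UTF_8);
    Ref whole = Ref.of(data, 0, data.length); // becomes a LiteralRef
    Ref slice = Ref.of(data, 2, 3); // becomes a BoundedRef over "cen"
    System.out.println(whole.getClass().getSimpleName() + " " + whole.utf8ToString());
    System.out.println(slice.getClass().getSimpleName() + " " + slice.utf8ToString());
  }
}
```

Because the hierarchy is sealed with exactly two final subclasses, every `offset()` / `length()` call site sees at most two receiver types, which is the situation where bimorphic inlining could apply. Whether that holds up in Lucene's hot loops is exactly what would need benchmarking.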
