[Investigate] Specialize BytesRef for literal byte arrays to save 8 bytes

### Description

I was reviewing a PR that dealt with some Protobuf today and saw that their `ByteString` class (which is pretty similar to Lucene's `BytesRef`) is abstract with a pair of concrete subclasses -- `LiteralByteString` and `BoundedByteString`. The `BoundedByteString` has length and offset members, while `LiteralByteString` is just a wrapper around `byte[]`. As a result, `LiteralByteString` is a whole 8 bytes smaller. Woohoo!

This got me thinking -- how many `BytesRef` instances out there have `offset == 0` and `length == bytes.length`? 
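To make the question concrete, here is a minimal sketch of the check one could use to count such instances. `SimpleBytesRef` and `LiteralCheck` are hypothetical stand-ins invented for this example; they only mirror the `bytes` / `offset` / `length` shape of `BytesRef` so the snippet compiles on its own.

```java
// Hypothetical stand-in for BytesRef (not Lucene's class): a view over
// part of a byte array, described by an offset and a length.
final class SimpleBytesRef {
  final byte[] bytes;
  final int offset;
  final int length;

  SimpleBytesRef(byte[] bytes, int offset, int length) {
    this.bytes = bytes;
    this.offset = offset;
    this.length = length;
  }
}

// Hypothetical helper for the investigation.
final class LiteralCheck {
  private LiteralCheck() {}

  // True when the ref spans its entire backing array, i.e. the offset and
  // length fields carry no information beyond the array itself.
  static boolean coversWholeArray(SimpleBytesRef ref) {
    return ref.offset == 0 && ref.length == ref.bytes.length;
  }

  // Counts how many refs in the given array span their whole backing array.
  static int countLiteral(SimpleBytesRef[] refs) {
    int count = 0;
    for (SimpleBytesRef ref : refs) {
      if (coversWholeArray(ref)) {
        count++;
      }
    }
    return count;
  }
}
```

Something like `coversWholeArray` could be dropped into a test or a heap-dump analysis to estimate how often the literal case actually shows up.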

Within a lot of "hot" Lucene code, I believe the answer is "not many", since we do a **very** good job of reusing `BytesRef` instances forever. That said, all of the `Term` constructors end up producing `BytesRef`s of known (fixed) length. So the potential benefit is clearly non-zero. (Maybe close to zero?)

I'm thinking of trying to make `BytesRef` `abstract` and `sealed` with a pair of subclasses, similar to the Protobuf approach. This means replacing direct field access with getters (and setters), but since there would only ever be two implementations, I think the JIT can inline those calls bimorphically.
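A rough sketch of the shape I have in mind, assuming Java 17+ for `sealed`. All names here (`SketchBytesRef`, `LiteralSketchBytesRef`, `BoundedSketchBytesRef`, `of`) are hypothetical and only illustrate the idea; the sketch is read-only and leaves out the setters, which are the harder part.

```java
// Hypothetical sketch, not Lucene's API: an abstract sealed base with exactly
// two implementations, so every call site sees at most two receiver types.
abstract sealed class SketchBytesRef permits LiteralSketchBytesRef, BoundedSketchBytesRef {
  abstract byte[] bytes();
  abstract int offset();
  abstract int length();

  // Picks the smaller representation when the ref spans the whole array.
  static SketchBytesRef of(byte[] bytes, int offset, int length) {
    if (offset < 0 || length < 0 || offset > bytes.length - length) {
      throw new IndexOutOfBoundsException(
          "offset=" + offset + ", length=" + length + ", array length=" + bytes.length);
    }
    if (offset == 0 && length == bytes.length) {
      return new LiteralSketchBytesRef(bytes);
    }
    return new BoundedSketchBytesRef(bytes, offset, length);
  }
}

// Wraps the array only: offset and length are derived, so the two int
// fields (8 bytes in total) are not stored.
final class LiteralSketchBytesRef extends SketchBytesRef {
  private final byte[] bytes;

  LiteralSketchBytesRef(byte[] bytes) {
    this.bytes = bytes;
  }

  @Override byte[] bytes() { return bytes; }
  @Override int offset() { return 0; }
  @Override int length() { return bytes.length; }
}

// Keeps explicit offset and length, like today's BytesRef.
final class BoundedSketchBytesRef extends SketchBytesRef {
  private final byte[] bytes;
  private final int offset;
  private final int length;

  BoundedSketchBytesRef(byte[] bytes, int offset, int length) {
    this.bytes = bytes;
    this.offset = offset;
    this.length = length;
  }

  @Override byte[] bytes() { return bytes; }
  @Override int offset() { return offset; }
  @Override int length() { return length; }
}
```

With this shape, `SketchBytesRef.of(data, 0, data.length)` would return the literal variant and `SketchBytesRef.of(data, 1, 2)` the bounded one, while callers only ever go through the three getters.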
