New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SOLR-15560: Optimize JavaBinCodec encode/decode performance. #255
base: main
Are you sure you want to change the base?
Conversation
I've put an initial before/after comparison for JavaBinCodec decode performance in the JIRA issue: https://issues.apache.org/jira/browse/SOLR-15560 |
So far, initial summary is: 79% improvement for decode performance for this benchmark. |
23a1491
to
68d86e6
Compare
result.stringProvider= getStringProvider(); | ||
return result; | ||
result.stringProvider= getStringProvider(); | ||
return result; | ||
} | ||
|
||
if (sz > MAX_UTF8_SZ) return _readStr(dis, null, sz); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NULL_DEREFERENCE: object null
is dereferenced by call to _readStr(...)
at line 1070.
(at-me in a reply with help
or ignore
)
68d86e6
to
b709c8f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GH won't let me comment inline, but should https://github.com/apache/solr/pull/255/files#diff-d8d1c34f834fa8749d6ce10361f106676c178d412579914bb8db54d6590c82e8L1089 be changed to wrriteByteArray(b.array(), b.arrayOffset() + b.position(), ...)
?
log("benchmark random seed: " + seed); | ||
|
||
// set the seed used by hard to reach places | ||
System.setProperty("randomSeed", Long.toString(new Random(seed).nextLong())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PREDICTABLE_RANDOM: This random generator (java.util.Random) is predictable (details)
(at-me in a reply with help
or ignore
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome to see those benchmark numbers Mark!
arr.reset(); | ||
ByteUtils.UTF8toUTF16(bytes, 0, sz, arr); | ||
return arr.toString(); | ||
return new String(bytes, 0, sz, StandardCharsets.UTF_8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why
int sz = ByteUtils.UTF16toUTF8(s, 0, end, bytes, 0); | ||
writeTag(STR, sz); | ||
daos.write(bytes, 0, sz); | ||
byte[] stringBytes = ((String)s).getBytes(StandardCharsets.UTF_8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no longer re-uses bytes
field. This is better?
@@ -305,129 +350,183 @@ protected Object readObject(DataInputInputStream dis) throws IOException { | |||
return readExternString(dis); | |||
} | |||
|
|||
// if ((tagByte & 0xe0) == 0) { | |||
// if top 3 bits are clear, this is a normal tag | |||
|
|||
switch (tagByte) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen this before and pondered if it would be more efficient to dispatch to an array of lambdas indexed by the tag Byte? But then I suppose the JIT could internally do such tricks -- dunno if it does.
|
||
|
||
/** | ||
* A subclass of {@link FastOutputStream} that avoids all buffering and writes directly to the {@link OutputStream}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why subclass FastOutputStream if we aren't using its buffer? Are expecting a FOS in some cases (maybe we should be less strict?
@@ -46,7 +46,8 @@ | |||
protected final Map<String,Object> _fields; | |||
|
|||
private List<SolrDocument> _childDocuments; | |||
|
|||
private MyEntryWriter entryWriter; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this field is a shame; are you sure it is worth it? If it stays, it definitely needs comments to explain it; also referring to this JIRA issue
solr/benchmark/build.gradle
Outdated
@@ -69,6 +69,8 @@ private static String toPath(Set<File> classpathUnderTest) { | |||
dependencies { | |||
implementation project(':solr:test-framework') | |||
|
|||
implementation 'org.apache.logging.log4j:log4j-layout-template-json' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we care about the logging format for benchmarking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well that's a silly question. Logging has tremendous performance impact. Format has tremendous logging impact.
Regardless, this issue was part of standing up a benchmark system on the benchmark module, and the code from month ago has little bearing. Indicated by initial, initial, and quick exploration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, why log in JSON in particular? Apparently you felt it was important enough that it warranted an additional dependency, which is fine with me, but it's curious to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's included in the server jars anyways now: https://github.com/apache/solr/blob/main/solr/server/build.gradle#L75
Review on this WIP code from 29 days has little value. |
8539ac5
to
ec7670e
Compare
SOLR-12555 WIP WIP
I'm trying to follow along but the most recent commit appears to have auto-formatted a bunch of files. Can you please undo that; maybe even force-push a correction? As a project, we've got spotless integrated but not yet enabled for these modules; only one so far. |
This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the dev@solr.apache.org mailing list. Thank you for your contribution! |
Initially this is just a quick exploration on decode perf, encode perf still to do.
Basic benchmark will encode/decode a NamedList with a few typical entries as well as 30 SolrDocuments in a SolrDocumentList by default.