Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-15560: Optimize JavaBinCodec encode/decode performance. #255

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

markrmiller
Copy link
Member

Initially this is just a quick exploration on decode perf, encode perf still to do.

Basic benchmark will encode/decode a NamedList with a few typical entries as well as 30 SolrDocuments in a SolrDocumentList by default.

@markrmiller
Copy link
Member Author

I've put an initial before/after comparison for JavaBinCodec decode performance in the JIRA issue: https://issues.apache.org/jira/browse/SOLR-15560

@markrmiller
Copy link
Member Author

So far, initial summary is:

79% improvement for decode performance for this benchmark.
327% improvement for encode performance for this benchmark.

@markrmiller markrmiller changed the title SOLR-15560: Look into JIT optimization in JavaBin. SOLR-15560: Optimize JavaBinCodec encode/decode performance. Aug 11, 2021
@markrmiller markrmiller force-pushed the SOLR-15560 branch 3 times, most recently from 23a1491 to 68d86e6 Compare August 11, 2021 18:30
result.stringProvider= getStringProvider();
return result;
result.stringProvider= getStringProvider();
return result;
}

if (sz > MAX_UTF8_SZ) return _readStr(dis, null, sz);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NULL_DEREFERENCE: object null is dereferenced by call to _readStr(...) at line 1070.
(at-me in a reply with help or ignore)

Copy link
Contributor

@madrob madrob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GH won't let me comment inline, but should https://github.com/apache/solr/pull/255/files#diff-d8d1c34f834fa8749d6ce10361f106676c178d412579914bb8db54d6590c82e8L1089 be changed to wrriteByteArray(b.array(), b.arrayOffset() + b.position(), ...)?

log("benchmark random seed: " + seed);

// set the seed used by hard to reach places
System.setProperty("randomSeed", Long.toString(new Random(seed).nextLong()));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PREDICTABLE_RANDOM: This random generator (java.util.Random) is predictable (details)
(at-me in a reply with help or ignore)

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome to see those benchmark numbers Mark!

arr.reset();
ByteUtils.UTF8toUTF16(bytes, 0, sz, arr);
return arr.toString();
return new String(bytes, 0, sz, StandardCharsets.UTF_8);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why

int sz = ByteUtils.UTF16toUTF8(s, 0, end, bytes, 0);
writeTag(STR, sz);
daos.write(bytes, 0, sz);
byte[] stringBytes = ((String)s).getBytes(StandardCharsets.UTF_8);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no longer re-uses bytes field. This is better?

@@ -305,129 +350,183 @@ protected Object readObject(DataInputInputStream dis) throws IOException {
return readExternString(dis);
}

// if ((tagByte & 0xe0) == 0) {
// if top 3 bits are clear, this is a normal tag

switch (tagByte) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen this before and pondered if it would be more efficient to dispatch to an array of lambdas indexed by the tag Byte? But then I suppose the JIT could internally do such tricks -- dunno if it does.



/**
* A subclass of {@link FastOutputStream} that avoids all buffering and writes directly to the {@link OutputStream}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why subclass FastOutputStream if we aren't using its buffer? Are expecting a FOS in some cases (maybe we should be less strict?

@@ -46,7 +46,8 @@
protected final Map<String,Object> _fields;

private List<SolrDocument> _childDocuments;

private MyEntryWriter entryWriter;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this field is a shame; are you sure it is worth it? If it stays, it definitely needs comments to explain it; also referring to this JIRA issue

@@ -69,6 +69,8 @@ private static String toPath(Set<File> classpathUnderTest) {
dependencies {
implementation project(':solr:test-framework')

implementation 'org.apache.logging.log4j:log4j-layout-template-json'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we care about the logging format for benchmarking?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well that's a silly question. Logging has tremendous performance impact. Format has tremendous logging impact.

Regardless, this issue was part of standing up a benchmark system on the benchmark module, and the code from month ago has little bearing. Indicated by initial, initial, and quick exploration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, why log in JSON in particular? Apparently you felt it was important enough that it warranted an additional dependency, which is fine with me, but it's curious to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's included in the server jars anyways now: https://github.com/apache/solr/blob/main/solr/server/build.gradle#L75

@markrmiller
Copy link
Member Author

Review on this WIP code from 29 days has little value.

@markrmiller markrmiller force-pushed the SOLR-15560 branch 2 times, most recently from 8539ac5 to ec7670e Compare September 15, 2021 20:24
@dsmiley
Copy link
Contributor

dsmiley commented Sep 15, 2021

I'm trying to follow along but the most recent commit appears to have auto-formatted a bunch of files. Can you please undo that; maybe even force-push a correction? As a project, we've got spotless integrated but not yet enabled for these modules; only one so far.

Copy link

This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the dev@solr.apache.org mailing list. Thank you for your contribution!

@github-actions github-actions bot added the stale PR not updated in 60 days label Feb 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale PR not updated in 60 days
Projects
None yet
4 participants