
SOLR-16812: Support CBOR format for update/query #1655

Merged
noblepaul merged 16 commits into apache:main from cowpaths:noble/wt_cbor
Jun 5, 2023

Conversation

@noblepaul

No description provided.


@dsmiley left a comment


Cool to see this new format!

import org.apache.solr.handler.loader.CborStream;
import org.apache.solr.response.XMLResponseWriter;

public class TestCborDataFormat extends SolrCloudTestCase {

IMO our request/response formats, to include this new one, should be tested in a general way without a dedicated test, especially not a SolrCloud test. For example, we randomly pick compatible substitutes for the codec. Shouldn't we do the same for the underlying "wt" encoding too?
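The randomization suggested here could look something like the sketch below: each test run picks one of the compatible "wt" formats at random, the way Solr tests already randomize the codec. This is an illustrative stdlib-only sketch; `pickWt`, `RandomWt`, and the format list are made up for this example and are not Solr test-framework API.

```java
import java.util.List;
import java.util.Random;

// Hypothetical sketch of randomizing the "wt" encoding across tests,
// analogous to how compatible codec substitutes are picked at random.
// Names here are illustrative, not actual Solr test APIs.
public class RandomWt {
    private static final List<String> COMPATIBLE_WTS =
            List.of("javabin", "json", "xml", "cbor");

    static String pickWt(Random rnd) {
        return COMPATIBLE_WTS.get(rnd.nextInt(COMPATIBLE_WTS.size()));
    }

    public static void main(String[] args) {
        // Each run exercises a randomly chosen, interchangeable format.
        System.out.println("wt=" + pickWt(new Random()));
    }
}
```

With this in place, every existing request/response test would exercise CBOR some fraction of the time, instead of relying on one dedicated test class.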

@noblepaul noblepaul merged commit faf5d3c into apache:main Jun 5, 2023

@gerlowskija left a comment


The CBOR serialization code itself looks OK, but the test/benchmarking code could really use some improvement IMO.

I'm also a little concerned about what's not in this PR. In particular: there's no documentation anywhere for the new format. And, less crucially, no deprecation plan for javabin (or guidance on when users should use javabin vs. cbor).


byte[] b =
Files.readAllBytes(
new File(ExternalPaths.SOURCE_HOME, "example/films/films.json").toPath());

[Q] Does the films dataset have a sufficient variety of field types to fully validate the cbor serialization/deserialization logic? Is there testing elsewhere that would make sure that cbor serialization doesn't blow up when presented with e.g. a 'binary' or 'pint' field?
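One way to picture the coverage gap raised here: a single test document carrying the field types the films data lacks (binary, integer/float numerics, bool, multivalued) that any wt serializer could be exercised against. The field names and this `DiverseDoc` class are made up for illustration; a real test would build a `SolrInputDocument` against the schema's actual field definitions.

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch only: a document with the value types films.json
// doesn't cover, so serialization round-trips can be checked for each.
public class DiverseDoc {
    static Map<String, Object> sample() {
        return Map.of(
                "id", "doc-1",
                "payload_bin", new byte[] {0x00, (byte) 0xFF, 0x7F}, // 'binary'-style field
                "count_pint", 42,                                    // 'pint'-style field
                "score_pfloat", 3.14f,
                "active_b", true,
                "tags_ss", List.of("a", "b", "c"));                  // multivalued field
    }

    public static void main(String[] args) {
        System.out.println(sample().keySet());
    }
}
```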

byte[] b =
Files.readAllBytes(
new File(ExternalPaths.SOURCE_HOME, "example/films/films.json").toPath());
// every operation is performed twice. We should only take the second number

[-0] Is this class a test meant to validate correctness, or a benchmark for measuring performance? I think both are needed and valuable; it's great you thought of both. But I think it's a mistake to combine the two in a single class. Especially when we have a whole "benchmark" module built for exactly the sort of measurements you're taking. It offers tools (e.g. JMH annotations, data generators) that make it easy to test performance much more robustly than is feasible to do here. (See my comment below).

[-1] It's a good start, but I think there's a handful of problems with how this JUnit method gathers numbers around CBOR queries, that make it misleading for gathering even anecdotal performance data.

I'm sure you've done other benchmarking at this point to prove out the CBOR idea, but so far these are the only numbers the rest of the community has seen. So we need to make sure they're rock solid! In particular, here are a few of my concerns:

  1. Even for tiny methods called in a tight loop, JVMs often take more than n=2 iterations to optimize bytecode. The 'N' here is just too small to assume the JVM is fully "hot" on the ser-de code for each format. Even ignoring JVM optimizations, n=2 is too small to rule out random noise (other processes competing for CPU, JVM GC, etc.) on the host machine.
  2. The order in which you're testing formats likely skews results. e.g. The two JSON queries run on an almost entirely cold JVM, whereas CBOR queries run when the JVM, OS caches, Solr caches, etc. are considerably warmer.
  3. "films.json" is pretty small as a data set and homogeneous, covering only a few field types. I worry that would make the gathered data unrepresentative and noisy.

I think all three of these would be easy to address if the code was moved to the benchmarking module. It has data generators for building larger and more diverse datasets, and JMH has a lot of built-in support that makes it easy to configure "warmup" iterations that get ignored in the collected statistics, etc.
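The measurement discipline described in points 1 and 2 can be sketched in plain Java, assuming nothing about the benchmark module: run warmup iterations that are discarded (so the JIT is hot before measuring), then report the median of many measured iterations rather than "the second number". JMH automates all of this; the `MiniBench` class and its method names below are illustrative only.

```java
import java.util.Arrays;

// Plain-Java sketch of warmup-then-measure with a median summary.
// JMH does this (and much more) properly; this only illustrates the idea.
public class MiniBench {
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    static long measureMedianNanos(Runnable task, int warmup, int measured) {
        for (int i = 0; i < warmup; i++) task.run(); // discarded: lets the JIT optimize
        long[] samples = new long[measured];
        for (int i = 0; i < measured; i++) {
            long t0 = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - t0;     // many samples, not n=2
        }
        return median(samples);                      // robust to GC/noise outliers
    }

    public static void main(String[] args) {
        long med = measureMedianNanos(() -> Math.sqrt(12345.0), 1_000, 101);
        System.out.println("median ns: " + med);
    }
}
```

Note this still doesn't fix point 2 (format ordering) by itself; interleaving or randomizing the order in which formats are measured is also needed, which JMH's per-benchmark forked JVMs give you for free.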

request.setResponseParser(new InputStreamResponseParser(wt));
result = client.request(request, testCollection);
byte[] b = copyStream((InputStream) result.get("stream"));
System.out.println(wt + "_time : " + timer.getTime());

[Q] So, as I read this code - runQuery will capture CBOR serialization in its perf numbers but not deserialization since we're not deserializing the received data on the client side.

Am I reading that right? And if so, is that intentional?

[0] Also, doesn't forbiddenApis complain about the System.out.println usage here? I swear that was one of the things we disallowed, though maybe we're looser about that in our tests...
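If deserialization is meant to be covered, the timed region has to both drain the stream and decode the bytes. A hedged stdlib-only sketch of that shape is below: `copyStream` mirrors what the test already does, while `decode` is a placeholder for whatever parser matches the wt (e.g. a CBOR parser on the client); the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Sketch: include client-side deserialization inside the timed region,
// not just the raw stream copy. "decode" stands in for a real wt parser.
public class TimedReadAndDecode {
    static byte[] copyStream(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static long timeReadAndDecode(InputStream in, Consumer<byte[]> decode) {
        long t0 = System.nanoTime();
        byte[] bytes = copyStream(in); // server serialization + transfer side
        decode.accept(bytes);          // client deserialization, now inside the timer
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) {
        byte[] payload = "hello".getBytes();
        long ns = timeReadAndDecode(new ByteArrayInputStream(payload), b -> {});
        System.out.println("elapsed ns: " + ns);
    }
}
```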

noblepaul added a commit to cowpaths/fullstory-solr that referenced this pull request Jun 12, 2023
hiteshk25 pushed a commit to cowpaths/fullstory-solr that referenced this pull request Jul 11, 2023
4 participants