
Refactor ResponseContext #11828

Merged 4 commits into apache:master on Dec 7, 2021

Conversation

@paul-rogers (Contributor) commented Oct 22, 2021

Description

Fixes a number of annoying issues in preparation for request trailers and the query profile.

  • Converts keys from an enum to classes for smaller code.
  • Wraps stored values in functions for easier capture for other uses.
  • Reworks the "header squeezer" to handle types other than arrays.
  • Uses metadata for visibility and compressibility to replace ad-hoc code.
  • Cleans up JSON serialization for the response context.
  • Other miscellaneous cleanup.

Based on a prototype by Gian.

Preparing for response trailers and query profile

The query profile work needs a way to return profile information at the completion of a query. It turns out Gian had prototyped just such a mechanism. To do that, he modified the response context to make it clear which fields are returned in the trailer, which in the header, and which not at all.

The query profile will use the response context as a "back door" to sneak metrics information out of each QueryRunner.run() call. This PR contains just the refactoring done by those projects without the additional complexity of the actual trailer or profile code.

Key metadata

ResponseContext provides a set of keys that define the values allowed in the context. It seems those keys were originally for a small set of response header values, but they grew over time to include internal-only values. With the addition of the response trailer (coming in a later PR), we'll also have trailer values.

We have code that knows which names should be in the header and picks those out. However, extensions can add keys, which makes a hard-coded approach brittle. This PR adds metadata to each key to identify its visibility. (Based on work originally done by Gian.)
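To make the idea concrete, here is a minimal sketch, using hypothetical names rather than Druid's actual API, of a key that carries visibility metadata so that generic header-building code needs no hard-coded name list:

```java
// Hypothetical sketch (not Druid's actual API): each key carries
// visibility metadata so generic code can decide whether its value
// belongs in the response header.
public class KeyMetadataSketch {
  public enum Visibility { HEADER, TRAILER, NONE }

  public static class Key {
    public final String name;
    public final Visibility visibility;

    public Key(String name, Visibility visibility) {
      this.name = name;
      this.visibility = visibility;
    }
  }

  // No hard-coded list of key names: the metadata decides.
  public static boolean belongsInHeader(Key key) {
    return key.visibility == Visibility.HEADER;
  }
}
```

With this shape, an extension-defined key gets exactly the same header handling as a core key, since the picking code consults only the metadata.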

Expanded header truncation support

The code can drop (or truncate) header fields, but it hard-codes the fields that can be truncated and assumes they are arrays. This PR adds the "droppable" state as metadata in the key, and handles non-array fields when dropping. This ensures that extensions get the same "droppable" behavior as core code.
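A rough sketch of the generalized truncation behavior described above (the method name and the null-means-drop convention are illustrative assumptions, not the actual implementation):

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: array values are shortened element-by-element; a
// non-array value cannot be partially kept, so it is dropped outright
// (returning null here means "drop the entry from the header").
public class HeaderSqueezerSketch {
  public static Object truncate(Object value, int maxElements) {
    if (value instanceof List) {
      List<?> list = (List<?>) value;
      return new ArrayList<>(list.subList(0, Math.min(list.size(), maxElements)));
    }
    return null; // non-array: drop entirely
  }
}
```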

Keys as classes, not enums

Keys are defined as an enum. Enums define a fixed set of values, yet we allow extensions to "extend" the set of keys. In this case, it is easier to define keys as plain objects rather than enum constants.

With this move, we can simplify key definitions. Enum values can't be instances of an arbitrary class, and with the additional responsibilities placed on keys, the enum structure forced quite a bit of redundant code. Changing keys to objects lets us use the usual code reuse techniques to avoid redundancy.
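As a hedged illustration of the code-reuse point (class names here are assumptions): shared behavior can live in an abstract base class, and each key is simply an instance of a subclass, something an enum cannot express without repeating code per constant.

```java
// Illustrative sketch: class-based keys share behavior through a base
// class; extensions create new instances rather than new enum values.
public class KeyAsClassSketch {
  public static abstract class AbstractKey {
    public final String name;
    AbstractKey(String name) { this.name = name; }
    // Shared responsibility, implemented once per key *kind*.
    public abstract Object mergeValues(Object oldValue, Object newValue);
  }

  public static class LongSumKey extends AbstractKey {
    public LongSumKey(String name) { super(name); }
    @Override public Object mergeValues(Object oldValue, Object newValue) {
      return (Long) oldValue + (Long) newValue;
    }
  }

  // A key is just an object; no enum needs to change to add one.
  public static final AbstractKey CPU_NANOS = new LongSumKey("cpuConsumed");
}
```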

Explicit accessors

The code to use the response context usually looks like this:

responseContext.add(ResponseContext.Key.CPU_CONSUMED_NANOS, cpuTimeNs);

That is rather verbose. And, if we wanted to capture the value for use in, say, metrics, we'd have to hunt down all the places that update the response context and add code to update metrics. This PR provides accessors instead:

responseContext.addCpuNanos(cpuTimeNs);

Release notes

This PR changes the ResponseContext and its keys in a breaking way. Any extension that used the prior way to add keys to the response context will need to change its code a bit. However, such changes should be simple.

The prior version of the response context suggested that keys be defined in an enum, then registered. This version suggests that keys be defined as objects, then registered. See the ResponseContext class itself for the details.

Note that this change does not impact any extension unless that extension adds custom response context keys.
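For extension authors, a registration flow along the lines the PR suggests might look like this sketch (the registry shape and method names are assumptions for illustration, not the exact ResponseContext API; see the ResponseContext class for the real details):

```java
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch of object-based key registration: an extension
// builds a key instance, then registers it by name.
public class KeyRegistrySketch {
  public interface Key { String getName(); }

  private final ConcurrentMap<String, Key> registeredKeys = new ConcurrentSkipListMap<>();

  public void registerKey(Key key) {
    if (registeredKeys.putIfAbsent(key.getName(), key) != null) {
      throw new IllegalStateException("Duplicate key: " + key.getName());
    }
  }

  public Key keyOf(String name) {
    return registeredKeys.get(name); // null for unknown keys
  }
}
```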

Summary

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a local Druid cluster.
  • included a release note explanation (listed above).

@paul-rogers (Contributor, Author)

Build is complaining about lack of code coverage in QueryResource.java, where I replaced one get method with another. This seems to be a case where the code changed, and we're enforcing new rules that perhaps weren't in place when this code was first added. How do we handle this case?

@clintropolis (Member) left a comment

had a first pass; I think the changes look good overall, but I'm going to try to scan through again soon since I forgot how complicated the response context stuff was until I started this review and would like to double check 😅

@PublicApi
public abstract class ResponseContext
{
  /**
   * The base interface of a response context key.
   * Should be implemented by every context key.
   */
-  public interface BaseKey
+  public interface Key
Member

I tagged this PR with design review and release notes labels because of this change.

ResponseContext is marked @PublicApi, which I guess means this part of the code is extensible, so any custom extensions depending on BaseKey will need to be modified to now implement Key (which does not have default implementations, so it would need this label even if not renamed).

This is probably fine assuming that it is still possible for extensions to define custom response context information, just calling it out

Contributor (Author)

@clintropolis, you are right. As it turned out, none of the extensions in the Druid code repo itself actually had this issue, which suggested that, while this is an extension point, it may not actually have many (or any) extensions. So, the benefit of the simpler code (especially as we add the response trailer) seemed worth the risk.

I wonder, how do we normally communicate such potentially breaking changes?

Contributor

Hmm, I think this interface should be marked @ExtensionPoint. @PublicApi doesn't seem enough, because @PublicApi is not meant to be subclassed; it only means the API is stable for callers. @paul-rogers can you add the @ExtensionPoint annotation for this interface? Even though it's not marked, I agree that this change should be called out in the release notes.

Contributor (Author)

Added @ExtensionPoint. How do I record this info so it is available when we prepare release notes?

Contributor

The one you added in the PR description seems good enough to me 🙂

Contributor

What do you all think about tagging this with @UnstableApi too? I think it's a good idea, because:

  • This is not a common extension point
  • We may be changing it more in the future as part of additional query trailer work

So, thinking about compatibility in each of those changes will be a hassle without much benefit.

Contributor

It sounds good to me 👍

@jihoonson (Contributor) left a comment

This change LGTM overall. I left some comments about javadoc and test besides trivial comments. @paul-rogers can you please resolve the conflict as well?

Comment on lines 149 to 144
* Keys that are present in both the "X-Druid-Response-Context" header *and* the response context trailer.
*/
HEADER,
Contributor

What sort of keys will be in both the header and the trailer? Also, should it be something like HEADER_AND_TRAILER?

Contributor (Author)

That's a good question for the next PR. Here, it seems that, in picking out this set of changes, I picked the wrong comment. Fixed that.

Contributor

Thanks. I'm still curious about what keys will be in both the header and the trailer 🙂 Will the next PR need to modify this enum, such as adding a new Visibility type?

Contributor (Author)

In Gian's query trailer prototype, uncoveredIntervals, uncoveredIntervalsOverflowed, missingSegments, cpuConsumed, and a new metrics item are trailer entries. In the query profile work, the profile is also in the trailer only. The final set of values will be HEADER, TRAILER, HEADER_AND_TRAILER, and NONE.

Having just written this, it may make more sense to have two Booleans rather than an all-possible-combination enum. What do you think?

Contributor

Like isVisibleInHeader and isVisibleInTrailer? That sounds reasonable to me. You can do it in your follow-up PR if you want.
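For illustration only, the two-boolean shape under discussion might look like this sketch (names follow the comment above but are otherwise hypothetical):

```java
// Illustrative sketch of replacing the all-combinations Visibility
// enum with two independent flags on each key.
public class BooleanVisibilityKeySketch {
  public final String name;
  public final boolean isVisibleInHeader;
  public final boolean isVisibleInTrailer;

  public BooleanVisibilityKeySketch(String name, boolean inHeader, boolean inTrailer) {
    this.name = name;
    this.isVisibleInHeader = inHeader;
    this.isVisibleInTrailer = inTrailer;
  }
}
```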

Contributor (Author)

@jihoonson, I went ahead and converted Visibility to a Boolean, since I had to make another fix as well. For now, only the header flag appears; the next PR will add the trailer flag.

@paul-rogers (Contributor, Author)

Failing in 34294.18 (openjdk8) server module test due to previously-untested code in QueryResource.

Original line:

      if (prevEtag != null && prevEtag.equals(responseContext.get(ResponseContext.Key.ETAG))) {

Revised line:

      if (prevEtag != null && prevEtag.equals(responseContext.getEntityTag())) {

All we're doing is calling a method to get the ETAG value rather than grabbing it directly.

Error:

Diff coverage statistics:
------------------------------------------------------------------------------
|     lines      |    branches    |   functions    |   path
------------------------------------------------------------------------------
| 100% (2/2)     |  25% (1/4)     |  50% (1/2)     | org/apache/druid/server/QueryResource.java
------------------------------------------------------------------------------

ERROR: Insufficient branch coverage of 25% (1/4). Required 50%.

This is odd because QueryResource cannot be tested (AFAIK) in a unit test since we cannot run a Druid server in a unit test. The ETAG, I'm told, has to do with query caching. Presumably, we do test query caching somewhere - integration tests?

It seems this is a case where the original author was not obligated to ensure sufficient tests exist, but those of us who come along later are required to rectify that oversight? In code that, seemingly, is not designed to allow unit testing?

Suggestions?

@jihoonson (Contributor) left a comment

@paul-rogers thanks for addressing comments. You may not have pushed your last change. Please check #11828 (comment).

For the coverage check failure, I think we have 3 options similar to what I suggested in #11822 (comment).

  • Using your hack to work around the check.
  • Ignoring the test coverage check failure. This makes sense only when all tests in the phase 2 pass. I can trigger the phase 2 tests if we decide to go down this path.
  • Adding a new test. I prefer this option most since it doesn't seem very hard to do. I'm thinking of adding a new test in QueryResourceTest that verifies the behavior when the etag matches the previous one. Since this is a unit test, I would say you don't have to test the e2e behavior of the result-level cache here. Instead, you can just verify what is returned when the etag matches. For easy testing, you may want to mock QueryLifecycleFactory and QueryLifecycle, so that you don't have to worry about query processing at all.
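The etag short-circuit being tested here reduces to a tiny dependency-free sketch (class and method names are hypothetical, not QueryResource's actual signatures):

```java
// Hedged sketch of the behavior under test: if the client's
// If-None-Match etag equals the etag in the response context, return
// 304 Not Modified without running the query.
public class EtagCheckSketch {
  public static final int SC_NOT_MODIFIED = 304;
  public static final int SC_OK = 200;

  public static int respond(String prevEtag, String contextEtag) {
    if (prevEtag != null && prevEtag.equals(contextEtag)) {
      return SC_NOT_MODIFIED; // skip query processing entirely
    }
    return SC_OK;
  }
}
```

A unit test then only needs to drive both branches of this comparison, which is what the coverage checker is asking for.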

@jihoonson (Contributor) commented Nov 30, 2021

@paul-rogers nice test 🙂 I restarted the failed test which seems flaky. Besides the test failure, please check my comment in #11828 (comment). Other changes LGTM.

@clintropolis do you have more comments?

@paul-rogers (Contributor, Author)

Squashed commits and rebased on master to resolve merge conflicts.

@gianm (Contributor) left a comment

The refactoring looks good to me. It looks like the intent is not to actually change behavior right now, except for dropping certain keys from the response context header that didn't really make sense there anyway. (Some of them are clearly for internal state tracking; some of them couldn't possibly have been populated properly by the time the HTTP header needs to be generated.)

I had a few comments on individual pieces; please LMK what you think.


ResponseContext.Keys keys = ResponseContext.Keys.instance();
while (jp.currentToken() == JsonToken.FIELD_NAME) {
final ResponseContext.Key key = keys.keyOf(jp.getText());
Contributor

I think we'll want some handling here for the case where a data server generates a key that the Broker hasn't heard of. It may happen when we add new keys. Skipping them seems reasonable.

Contributor (Author)

@gianm, excellent point. This can occur during a rolling upgrade when, say, the historical is newer than the broker.

To handle this, I added code to "freewheel" over the JSON value, whether it be a scalar or structured, and added tests to match.

This seems like a generic operation, might we have something in one of the utilities? A quick search didn't find anything.
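The "freewheel" idea amounts to consuming one whole JSON value, scalar or structured. With Jackson, one would typically call jp.skipChildren() after advancing past the field name; the depth-counting equivalent can be sketched without any Jackson dependency (illustrative only, over a plain token list):

```java
import java.util.List;

// Sketch of skipping one complete JSON value: a scalar is one token; a
// structured value is consumed by tracking open/close nesting depth.
public class SkipValueSketch {
  /** Returns the index just past the value that starts at pos. */
  public static int skipValue(List<String> tokens, int pos) {
    String t = tokens.get(pos);
    if (!t.equals("{") && !t.equals("[")) {
      return pos + 1; // scalar: consume one token
    }
    int depth = 0;
    do {
      String cur = tokens.get(pos++);
      if (cur.equals("{") || cur.equals("[")) depth++;
      else if (cur.equals("}") || cur.equals("]")) depth--;
    } while (depth > 0);
    return pos;
  }
}
```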

Contributor

I don't see anything for this in JacksonUtils, which is probably where it would be if we had it. I guess it's because we usually deserialize using annotations rather than writing Deserializers directly, and the annotation-based deserialization mechanism has built-in logic to skip unknown properties.

@paul-rogers (Contributor, Author)

Tests got to the second stage (finally!), but failed in the Kafka test with an apparently unrelated issue:

org.apache.druid.java.util.common.ISE: Max number of retries[50] exceeded for Task[waiting for autoScaling task numbers from 1 to 2]. Failing.
	at org.apache.druid.testing.utils.ITRetryUtil.retryUntil(ITRetryUtil.java:94) ~[druid-integration-tests-0.23.0-SNAPSHOT.jar:0.23.0-SNAPSHOT]

@jihoonson (Contributor) left a comment

LGTM, but please rename the variable as in my comment.

 * would be called only from class static blocks.
 */
-  private static final ConcurrentMap<String, BaseKey> REGISTERED_KEYS = new ConcurrentSkipListMap<>();
+  private final ConcurrentMap<String, Key> registered_keys = new ConcurrentSkipListMap<>();
Contributor

Suggested change
-  private final ConcurrentMap<String, Key> registered_keys = new ConcurrentSkipListMap<>();
+  private final ConcurrentMap<String, Key> registeredKeys = new ConcurrentSkipListMap<>();

@abhishekagarwal87 (Contributor)

@jihoonson - is this PR good to merge now?

@jihoonson jihoonson merged commit 34a3d45 into apache:master Dec 7, 2021
@jihoonson (Contributor)

Merged. Thanks @paul-rogers!
