-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deserialize dimensions in group by queries to their respective types when reading from their serialized format #16511
Conversation
@@ -216,4 +216,9 @@ default boolean equals(T a, T b) | |||
{ | |||
throw DruidException.defensive("Not implemented. Check groupable() first"); | |||
} | |||
|
|||
default Class<?> complexDimensionType() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe getComplexClass
or the classic getClazz
would be a better name since it probably isn't a different type when used as a dimension (and we have the separate groupable
to determine if it can be used as a dimension or not)
} | ||
|
||
public ObjectStrategyComplexTypeStrategy( | ||
ObjectStrategy<T> objectStrategy, | ||
TypeSignature<?> signature, | ||
@Nullable final Hash.Strategy<T> hashStrategy | ||
@Nullable final Hash.Strategy<T> hashStrategy, | ||
@Nullable final Class<?> complexDimensionType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this just use getClazz
of ObjectStrategy
?
final ObjectMapper jsonMapper, | ||
final ObjectMapper spillMapper, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should jsonMapper and spillMapper be different? I know spillMapper is usually smile, though isn't results from historicals also smile? regardless, probably worth a comment somewhere of why we have different mappers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When writing the code, this seemed natural, since we are using spillMapper for the spilling runner, and nothing else. After going through the group by code again, I think we should still be using the jsonMapper.
The input rows which get further deserialized according to the complex types are sourced from:
- https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByRowProcessor.java#L90 - The input rows are from the processed subqueries.
- https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByMergingQueryRunner.java#L254C54-L254C55 - The input rows are to the mergeRunner, which has already been read from the segments by the createRunner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Retested stuff, serde in (2) is not required.
26ef2a0
to
7d70e53
Compare
Thanks for the review @clintropolis @asdf2014 |
Description
Grouping on complex columns allowed the dimensions to be of complex types. When reading the dimensions from their serialized format, we must deserialize them in their desired complex type. Otherwise, Druid deserializes the dimension into a generic java object equivalent to
objectMapper.readValue(serializedValue, Object.class)
We read the dimensions from their serialized form when reading dimensions over the wire (in native queries)
This wasn't discovered at the time of the original PR because only JSON columns support groupability at the moment, and they are designed to handle the generic Java object that the object mapper deserializes to by default. However, this is required for most other complex types that require groupability.
Interface change
Benchmarking - I have ran the
GroupByBenchmark.queryMultiQueryableIndexWithSerde
benchmark with limited iterations and here are the results before and after the change.Release note
(none)
Key changed/added classes in this PR
MyFoo
OurBar
TheirBaz
This PR has: