
Grouping on complex columns aka unifying GroupBy strategies #16068

Merged
merged 50 commits into from Apr 24, 2024

Conversation

LakshSingla
Contributor

@LakshSingla LakshSingla commented Mar 7, 2024

Description

User Impact

Once the core functionality is in, users can pass complex types as dimensions to the group by queries. For example:

SELECT nested_col1, count(*) FROM foo GROUP BY nested_col1

Making grouping strategies type agnostic

The grouping engine requires a way to address a dimension and store a fixed-width representation of it on a buffer. This varies by type: longs can be stored as-is, while a variable-length string requires a dictionary (built at runtime, or precomputed) so that it can be addressed by a fixed-width dictionary id.
Currently, we have a separate strategy for each type. To enable grouping on complex columns (at the engine level), these strategies need to be type-agnostic, so that they can work with any supported complex type without knowing it. We can also unify the pre-existing strategies so that they are classified by the work (and dimensions) they handle, rather than being specialized per type. The classifications are as follows:

  1. Fixed width types - These include the numeric primitives. In the future, fixed-width complex types such as IPv4 and Geo types could be optimized to use this strategy as well, given a hint from the type strategy that the objects have a fixed-width representation. Such types can be serialized directly, without being mapped to a dictionary.

  2. Variable width types with prebuilt dictionaries - These include string types with prebuilt dictionaries. In the future, once we have a way of extracting array dictionaries from the "array column selectors", array types can use this strategy as well. Such types can be represented using their prebuilt dictionary id.

  3. Variable width types without prebuilt dictionaries - These include any variable-width type without a prebuilt dictionary, including complex types. We need to build a dictionary and a reverse dictionary for the keys, and represent each key using its dictionary id. This also requires a way to estimate the size of the dictionaries and to address a key in the dictionary, which requires some interface changes.
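The third classification can be sketched as a pair of maps: a value-to-id map (what Druid's grouping code calls the reverse dictionary) plus an id-to-value list. The following is an illustrative stand-alone sketch, not Druid's actual classes; the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of strategy (3): a runtime dictionary that maps each
// variable-width key to a fixed-width int id, plus the id -> value list used
// to recover the key later.
class RuntimeDictionary<T>
{
  private final Map<T, Integer> valueToId = new HashMap<>(); // Druid calls this the "reverse dictionary"
  private final List<T> idToValue = new ArrayList<>();       // dictionaryId -> original key

  // Returns the fixed-width id for the key, assigning a new id on first sight.
  int idFor(T key)
  {
    final Integer existing = valueToId.get(key);
    if (existing != null) {
      return existing;
    }
    final int id = idToValue.size();
    valueToId.put(key, id);
    idToValue.add(key);
    return id;
  }

  // Recovers the original variable-width key from its fixed-width id.
  T keyFor(int id)
  {
    return idToValue.get(id);
  }
}
```

The fixed-width dictionary id is what would get written onto the grouping buffer; the id-to-value list converts the key back when emitting results.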

Implementing a hashing strategy for complex types

Group by on complex columns requires that the complex types are addressable from a reverse dictionary.

Currently, all the reverse dictionaries hold pre-known types, and hence the dictionary implementations themselves (hash maps) supply the hashCode and equals required to address an object in the dictionary. Dictionaries can be of two kinds, and we lay out what addressing an object requires for each:

Associative array, i.e. HashMap (Object2IntOpenHashMap)

This is currently used throughout the code.

For any type to be added to a map, it requires a correct implementation of two methods inherited from the Object class:

  1. hashCode(): Required to associate a position in the hash table with the object. Equal objects must have the same hashCode, but distinct objects may share a hashCode too, which leads to the following point.

  2. equals(): Required for disambiguating objects in case of collisions, i.e. when they have the same hashCode.
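To make the contract concrete, here is a stand-alone sketch (the ComplexKey class is hypothetical, not Druid code): a deliberately coarse hashCode() forces collisions, and equals() keeps the colliding keys distinct in the map.

```java
import java.util.Objects;

// Hypothetical complex key demonstrating the contract above: equal keys share
// a hashCode(), and equals() disambiguates keys whose hashCodes collide. The
// hashCode() here deliberately hashes only the bucket so that distinct
// payloads collide and force equals() to do the disambiguation.
final class ComplexKey
{
  final long bucket;
  final String payload;

  ComplexKey(long bucket, String payload)
  {
    this.bucket = bucket;
    this.payload = payload;
  }

  @Override
  public int hashCode()
  {
    return Long.hashCode(bucket); // deliberately ignores payload
  }

  @Override
  public boolean equals(Object o)
  {
    if (!(o instanceof ComplexKey)) {
      return false;
    }
    final ComplexKey other = (ComplexKey) o;
    return bucket == other.bucket && Objects.equals(payload, other.payload);
  }
}
```

Two keys with the same bucket but different payloads land in the same hash slot, yet a HashMap keeps them as separate dictionary entries because equals() distinguishes them.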

Sorted Set (Object2IntRBTreeMap/ Object2IntAvlTreeMap)

This would change the existing dictionary-building strategy, and therefore affect the performance of array types and of string types without a prebuilt dictionary; hence the change was benchmarked.
The benefit would be that we could reuse the type comparators for addressing the objects, including arrays.
However, it degrades the performance of the grouping engine, so this design was dropped (see the benchmarks section).
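A minimal stand-alone sketch of this discarded approach, assuming only a Comparator is available for the type (all names hypothetical):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Sketch of the rejected sorted-map dictionary: keys are addressed via a
// Comparator (reusing the type's existing comparator) instead of
// hashCode()/equals(). Lookups become O(log n) tree walks rather than O(1)
// hashes, which is the source of the slowdown seen in the benchmarks.
class ComparatorDictionary<T>
{
  private final TreeMap<T, Integer> valueToId;
  private final List<T> idToValue = new ArrayList<>();

  ComparatorDictionary(Comparator<? super T> comparator)
  {
    this.valueToId = new TreeMap<>(comparator);
  }

  // Assigns ids in insertion order, like the hash-based variant.
  int idFor(T key)
  {
    return valueToId.computeIfAbsent(key, k -> {
      idToValue.add(k);
      return idToValue.size() - 1;
    });
  }

  T keyFor(int id)
  {
    return idToValue.get(id);
  }
}
```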

Grouping on non-useful types

Should grouping on types like HLLs and Pairs (used for earliest/latest) be allowed? There isn't any practical use case for supporting these types; however, they could fall under the catch-all umbrella, where we serde the type and compare the byte representations. Therefore, the major question is whether to allow it or not.

Currently, grouping on these types is disallowed, because doing so is usually counter-intuitive to what users want to measure. For example, a user likely wants to group on the finalized estimated count, but would instead end up grouping on the sketch itself.

Interface changes

The following interface changes are proposed to implement the above:

TypeStrategy.java

public interface TypeStrategy<T> extends Comparator<Object>, Hash.Strategy<T>
{

  // ...
  // Whether the type is groupable
  default boolean groupable()
  {
    return false;
  }

  // Implement for hashing complex dimensions
  @Override
  default int hashCode(T o)
  {
    throw DruidException.defensive("Not implemented. Check groupable() first");
  }

  // Implement for hashing complex dimensions
  @Override
  default boolean equals(T a, T b)
  {
    throw DruidException.defensive("Not implemented. Check groupable() first");
  }
  // ...
}
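As an illustration of how a complex type could fulfil this contract, here is a stand-alone sketch. SimpleTypeStrategy below is a simplified stand-in defined only for this example (the real TypeStrategy additionally extends Comparator and fastutil's Hash.Strategy), and BytesTypeStrategy is hypothetical.

```java
import java.util.Arrays;

// Simplified stand-in for the proposed interface.
interface SimpleTypeStrategy<T>
{
  default boolean groupable()
  {
    return false;
  }

  int hashCode(T o);

  boolean equals(T a, T b);
}

// Hypothetical strategy for a complex value backed by a byte[] payload: it
// opts into grouping and supplies content-based hashing and equality, since
// byte[]'s own hashCode()/equals() only compare references.
class BytesTypeStrategy implements SimpleTypeStrategy<byte[]>
{
  @Override
  public boolean groupable()
  {
    return true;
  }

  @Override
  public int hashCode(byte[] o)
  {
    return Arrays.hashCode(o);
  }

  @Override
  public boolean equals(byte[] a, byte[] b)
  {
    return Arrays.equals(a, b);
  }
}
```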

Benchmarks

The performance of master and of this patch is equivalent, while the sorted-set variant degrades it; hence that approach is rejected.

  • master i.e. current
Benchmark                                  (groupingDimension)  Mode  Cnt      Score       Error  Units
SqlGroupByBenchmark.querySql    stringArray-Sequential-100_000  avgt    5    946.754 ±  198.785  ms/op
SqlGroupByBenchmark.querySql  stringArray-Sequential-3_000_000  avgt    5  41515.654 ± 4081.663  ms/op
SqlGroupByBenchmark.querySql        stringArray-ZipF-1_000_000  avgt    5    435.283 ±   21.554  ms/op
SqlGroupByBenchmark.querySql     stringArray-Uniform-1_000_000  avgt    5   7669.069 ±  629.368  ms/op
SqlGroupByBenchmark.querySql      longArray-Sequential-100_000  avgt    5    820.420 ±   58.626  ms/op
SqlGroupByBenchmark.querySql    longArray-Sequential-3_000_000  avgt    5  43184.769 ± 6502.256  ms/op
SqlGroupByBenchmark.querySql          longArray-ZipF-1_000_000  avgt    5    319.639 ±   17.426  ms/op
SqlGroupByBenchmark.querySql       longArray-Uniform-1_000_000  avgt    5   6242.893 ±  279.628  ms/op
  • this branch + using associative array i.e. post this patch
Benchmark                                  (groupingDimension)  Mode  Cnt      Score       Error  Units
SqlGroupByBenchmark.querySql    stringArray-Sequential-100_000  avgt    5    852.758 ±   136.371  ms/op
SqlGroupByBenchmark.querySql  stringArray-Sequential-3_000_000  avgt    5  41204.688 ±  8408.370  ms/op
SqlGroupByBenchmark.querySql        stringArray-ZipF-1_000_000  avgt    5    390.198 ±    12.634  ms/op
SqlGroupByBenchmark.querySql     stringArray-Uniform-1_000_000  avgt    5   5509.581 ±   451.345  ms/op
SqlGroupByBenchmark.querySql      longArray-Sequential-100_000  avgt    5    660.656 ±    57.070  ms/op
SqlGroupByBenchmark.querySql    longArray-Sequential-3_000_000  avgt    5  47083.740 ± 20552.695  ms/op
SqlGroupByBenchmark.querySql          longArray-ZipF-1_000_000  avgt    5    300.093 ±    38.598  ms/op
SqlGroupByBenchmark.querySql       longArray-Uniform-1_000_000  avgt    5   5725.369 ±   725.885  ms/op

  • this branch + using sorted sets i.e. the discarded dictionary-building approach

Benchmark                                  (groupingDimension)  Mode  Cnt      Score       Error  Units
SqlGroupByBenchmark.querySql    stringArray-Sequential-100_000  avgt    5   1221.906 ±   64.073  ms/op
SqlGroupByBenchmark.querySql  stringArray-Sequential-3_000_000  avgt    5  51887.836 ± 2145.206  ms/op
SqlGroupByBenchmark.querySql        stringArray-ZipF-1_000_000  avgt    5   1013.708 ±  110.974  ms/op
SqlGroupByBenchmark.querySql     stringArray-Uniform-1_000_000  avgt    5   9127.126 ±  183.487  ms/op
SqlGroupByBenchmark.querySql      longArray-Sequential-100_000  avgt    5    819.836 ±   26.940  ms/op
SqlGroupByBenchmark.querySql    longArray-Sequential-3_000_000  avgt    5  41988.609 ± 3683.729  ms/op
SqlGroupByBenchmark.querySql          longArray-ZipF-1_000_000  avgt    5    695.684 ±   21.028  ms/op
SqlGroupByBenchmark.querySql       longArray-Uniform-1_000_000  avgt    5   7877.897 ± 1755.698  ms/op
SqlGroupByBenchmark.querySql         nested-Sequential-100_000  avgt    5     77.833 ±    7.598  ms/op
SqlGroupByBenchmark.querySql       nested-Sequential-3_000_000  avgt    5    157.890 ±    7.128  ms/op
SqlGroupByBenchmark.querySql             nested-ZipF-1_000_000  avgt    5    103.281 ±    8.962  ms/op

Future work

Once the core functionality is in, we can follow up with the previously alluded-to improvements for faster grouping of specific types:

  • Implement group by on complex columns and arrays for the vectorized query engine. This can be done in a follow-up PR, without requiring an interface change.

  • Once there's a way to extract dictionaries from array selectors, we can reuse those dictionaries, i.e. strategy (2), rather than rebuilding them. This requires a new selector class/interface like ArrayColumnValueSelector, akin to DimensionSelector, which would be a massive change across Druid's codebase.

  • We can hint to the grouping engine that certain complex types (such as the IP types) are fixed-width. This would remove the need for a dictionary; such types could use strategy (1) to serialize directly onto the buffer. This will require adding methods to the public interface, which I want to defer to future patches.

Release note



This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@@ -413,9 +414,12 @@ public static Object convertObjectToType(
case DOUBLE:
return coerceToObjectArrayWithElementCoercionFunction(obj, DimensionHandlerUtils::convertObjectToDouble);
Member

we probably want to add a case here for other array types to just recursively call this method on the element type for nested and non-primitive arrays
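The recursive coercion suggested here could look roughly like the following stand-alone sketch, where coerce() and its leaf-coercion function are hypothetical stand-ins for DimensionHandlerUtils.convertObjectToType and the per-element coercion functions it dispatches to:

```java
import java.util.function.Function;

class ArrayCoercion
{
  // Recursively applies a leaf coercion to every element of a possibly nested
  // Object[], the shape of the suggestion above.
  static Object coerce(Object obj, Function<Object, Object> leafCoercion)
  {
    if (obj instanceof Object[]) {
      final Object[] in = (Object[]) obj;
      final Object[] out = new Object[in.length];
      for (int i = 0; i < in.length; i++) {
        out[i] = coerce(in[i], leafCoercion); // recurse into nested arrays
      }
      return out;
    }
    return obj == null ? null : leafCoercion.apply(obj);
  }
}
```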

Comment on lines 52 to 54
static {
ComplexMetrics.registerSerde(ColumnType.NESTED_DATA.getComplexTypeName(), new NestedDataComplexTypeSerde());
}
Member

this should not be here, it happens in the module, if a test is needing this then just register manually for the test

Contributor Author

oops, definitely didn't intend that there.

@@ -380,7 +380,7 @@ public VerifyNativeQueries(BaseExecuteQuery execStep)
public void verify()
{
for (QueryResults queryResults : execStep.results()) {
verifyQuery(queryResults);
// verifyQuery(queryResults);
Member

this i assume was for testing some stuff and needs to go away

Contributor Author

Yep, I didn't wanna write the query to verify that it was working. That's why I have also commented it out with Intellij style comment (s.t. it'd fail during checkstyle).

Comment on lines +78 to +93
case ARRAY:
switch (capabilities.getElementType().getType()) {
case LONG:
case STRING:
case DOUBLE:
return DictionaryBuildingGroupByColumnSelectorStrategy.forType(capabilities.toColumnType());
case FLOAT:
// Array<Float> not supported in expressions, ingestion
default:
throw new IAE("Cannot create query type helper from invalid type [%s]", capabilities.asTypeString());

}
case COMPLEX:
return DictionaryBuildingGroupByColumnSelectorStrategy.forType(capabilities.toColumnType());
default:
throw new IAE("Cannot create query type helper from invalid type [%s]", capabilities.asTypeString());
Member

there is no reason other array types cannot group in the same way, so I think we can just do this

Suggested change
case ARRAY:
switch (capabilities.getElementType().getType()) {
case LONG:
case STRING:
case DOUBLE:
return DictionaryBuildingGroupByColumnSelectorStrategy.forType(capabilities.toColumnType());
case FLOAT:
// Array<Float> not supported in expressions, ingestion
default:
throw new IAE("Cannot create query type helper from invalid type [%s]", capabilities.asTypeString());
}
case COMPLEX:
return DictionaryBuildingGroupByColumnSelectorStrategy.forType(capabilities.toColumnType());
default:
throw new IAE("Cannot create query type helper from invalid type [%s]", capabilities.asTypeString());
case ARRAY:
case COMPLEX:
default:
return DictionaryBuildingGroupByColumnSelectorStrategy.forType(capabilities.toColumnType());

super(keyBufferPosition);
this.keyBufferPosition = keyBufferPosition;
this.complexType = complexType;
this.complexTypeName = Preconditions.checkNotNull(complexType.getComplexTypeName(), "complex type name expected");
Member

if instead of complexType.getComplexTypeName() we just use complexType.asTypeString() we can re-use this strategy generically for things like nested arrays or arrays of complex types too. Suggest renaming to GenericDictionaryBuildingRowBasedKeySerdeHelper or something.

I suppose we could pack dictionaries for all types into this since they are all basically the same, to have less stuff to track... but it also seems ok to leave the strings and primitive arrays handled separately in case we want to further optimize stuff

Contributor Author

I kept it separate in case we wanna revisit using the hashmaps instead of sortedMaps for them.

// TODO(laksh): Check if calling .getObject() on primitive selectors be problematic??
// Convert the object to the desired type
//noinspection unchecked
return (T) DimensionHandlerUtils.convertObjectToType(columnValueSelector.getObject(), columnType);
Member

i wonder if there is a nicer way we could do this by letting the thing that creates this supply a value supplier method that could be backed by whatever makes sense for the selector type, e.g. something like

  public FixedWidthGroupByColumnSelectorStrategy(
      int keySizeBytes,
      boolean isPrimitive,
      ColumnType columnType,
      Function<ColumnValueSelector<?>, T> getterFn
  )

and then

      case LONG:
        return new FixedWidthGroupByColumnSelectorStrategy<>(
            Byte.BYTES + Long.BYTES,
            true,
            ColumnType.LONG,
            ColumnValueSelector::getLong
        );

and so on. It seems to work afaict and doesn't seem like it should be any worse than going through getObject and dimension handler utils conversion methods
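The getter-function idea can be sketched stand-alone as follows; MiniSelector and FixedWidthStrategy are illustrative stand-ins for ColumnValueSelector and FixedWidthGroupByColumnSelectorStrategy, stripped down so the sketch is self-contained:

```java
import java.util.function.Function;

// Minimal stand-in for ColumnValueSelector.
interface MiniSelector
{
  long getLong();

  Object getObject();
}

// Hypothetical strategy that is handed a getter function at construction, so
// each column type can supply the cheapest read path (e.g. getLong for longs)
// instead of always going through getObject() plus a generic conversion.
class FixedWidthStrategy<T>
{
  private final Function<MiniSelector, T> getterFn;

  FixedWidthStrategy(Function<MiniSelector, T> getterFn)
  {
    this.getterFn = getterFn;
  }

  T read(MiniSelector selector)
  {
    return getterFn.apply(selector);
  }
}
```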

Contributor Author

I think it makes sense, given that we'd only have numeric primitives or (as future work) complex types; in either case DimensionHandlerUtils.convertObjectToType() would be a no-op, and we can remove that call.

*
* @param <DimensionHolderType> Type of the dimension holder
*/
public interface DimensionToIdConverter<DimensionHolderType>
Member

i feel like we should special-handle multi-value dimensions instead of making new interfaces to accommodate them, though I'm still considering exactly how we should do this... maybe just making a dedicated multi-value string grouping strategy that doesn't fit any of the common patterns would be best.

Everything else should be grouping a single value and that is what we should always do going forwards. Things with multiple values like arrays should instead explicitly use UNNEST to aggregate individual elements of multiple values.

In fact, I almost wonder if we should dump all of the multi-value grouping stuff and just rewrite in GroupingEngine.process to wrap the storage adapter in an unnest if we detect any multi-value dimensions...

Contributor Author

In fact, I almost wonder if we should dump all of the multi-value grouping stuff and just rewrite in GroupingEngine.process to wrap the storage adapter in an unnest if we detect any multi-value dimensions...

I think that would work; we'd need something like UNNEST(MV_TO_ARRAY(...)). However, that would perhaps alter the results in cases that equated null to []. Also, we might not be able to use the dictionaries (if present) on top of the multi-value dimensions.

Member

well, rather than explicitly writing that function, the idea would be to automatically wrap the storageAdapter in an UnnestStorageAdapter for each multi-value string being grouped. The cursors made by UnnestStorageAdapter already use dimension selectors in the case of a multi-value string, and so can produce dictionary encoded dimension selectors from the cursor as well, so we should be able to retain grouping directly on the dimension id. If i recall correctly, because we use the dimension selector for the unnest cursor, I think it should behave the same way as the grouping engine with regards to null and [], but I would have to confirm.

This is probably too big of a change for this PR though, so I don't think this refactor needs to be done right now, but is worth pursuing in the future I think because it feels like the right way to handle things, then grouping itself has no concept of multi-value dimensions which sounds pretty amazing to me.

Contributor Author

👍
Currently, I am refactoring the keymapping strategy to
a. KeyMappingGroupByColumnSelectorStrategy - For arrays and complex types
b. KeyMappingMultiValueGroupByColumnSelectorStrategy - This would handle only strings. Upon thinking about it, it should be exactly the same as the StringGroupByColumnSelectorStrategy

*
* @param <DimensionType> Type of the dimension's values
*/
public interface IdToDimensionConverter<DimensionType>
Member

hmm, this seems basically like IdLookup except generic, and the latter method is only used internally to the class where this thing is also used.

I wonder if we can just drop this interface, make IdLookup generic, and just define canCompareIds directly on KeyMappingGroupByColumnSelectorStrategy and let stuff override it

Contributor Author

While refactoring, I realized that IdLookup is a replacement for DimensionToIdConverter. After separating out the strategy, it should be simple to replace DimensionToIdConverter with IdLookup.

Contributor Author

#16068 (comment) After completing this refactoring, I realized that we now only have a single implementation each of the Id->Dimension and Dimension->Id converters. Whether to keep this abstraction or remove it completely now depends on whether we want to keep things extensible for future dimensions that have prebuilt dictionaries.

Also, I think we can't replace DimensionToIdConverter with IdLookup because the former requires tracking memory estimates as well.

Comment on lines 132 to 135
return columnCapabilities != null
&& columnCapabilities.hasBitmapIndexes()
&& (columnCapabilities.areDictionaryValuesSorted()
.and(columnCapabilities.areDictionaryValuesUnique())).isTrue();
Member

this isn't correct, we should be checking isDictionaryEncoded not hasBitmapIndexes

Suggested change
return columnCapabilities != null
&& columnCapabilities.hasBitmapIndexes()
&& (columnCapabilities.areDictionaryValuesSorted()
.and(columnCapabilities.areDictionaryValuesUnique())).isTrue();
return columnCapabilities != null
&& columnCapabilities.isDictionaryEncoded()
.and(columnCapabilities.areDictionaryValuesSorted())
.and(columnCapabilities.areDictionaryValuesUnique())
.isTrue();

Contributor Author

Thanks for catching this. I was also confused about the condition, but then I chose to go ahead with the pre-existing code: https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/column/StringGroupByColumnSelectorStrategy.java#L165.

I'll correct this here.

* @see IdToDimensionConverter decoding logic for converting back dictionary to value
*/
@NotThreadSafe
class KeyMappingGroupByColumnSelectorStrategy<DimensionType, DimensionHolderType>
Member

yea, after looking more at this i'm really starting to feel like we should either split out multi-value string grouping into its own strategy (or do the other thing and dump it completely in favor of the unnest adapter).

Everything except for mvds will spend time doing pointless stuff when there is ever only 1 dictionary id per row, so it doesn't feel worth the complexity/cost to have this odd strategy unified with the rest of the sane ones.

Contributor Author

This makes more sense. It'll introduce some redundancy, but it should make the code a lot cleaner (especially when we handle single values)

@LakshSingla LakshSingla marked this pull request as ready for review March 21, 2024 21:52
@Override
public boolean groupable()
{
return true;
Member

this should check if the element type strategy is groupable

{
if (columnType.equals(ColumnType.STRING)) {
// String types are handled specially because they can have multi-value dimensions
throw DruidException.defensive("Should use special variant which handles multi-value dimensions");
Member

this can handle regular strings probably, since if we know that a string column definitely isn't multi-value, it's probably more efficient not to have to check every value?

Contributor Author

I haven't tried it out, but I suspect most of the work would happen while building the dictionary. On the flip side, it helped me get rid of the PreBuiltGroupBySelectorStrategy, because there's no type at the moment that can exploit it, so this seemed lucrative.

I could add that back in, so adding stuff for the array dictionary becomes easier later. WDYT?

@github-actions github-actions bot added Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Apr 15, 2024
Contributor

@cryptoe cryptoe left a comment

Took a first pass. Will take a second pass soon.

* under the License.
*/

package org.apache.druid.sql.calcite;
Contributor

Why is this class required?

* Implements {@link Arrays#equals} but the element equality uses the element's type strategy
*/
@Override
public boolean equals(@Nullable Object[] a, @Nullable Object[] b)
Contributor

Let's have UTs for this.
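A UT for this could exercise something like the following stand-alone sketch, where the element's type strategy is stood in by a BiPredicate (names hypothetical):

```java
import java.util.function.BiPredicate;

class StrategyArrayEquals
{
  // Sketch of Arrays#equals-style comparison where element equality is
  // delegated to a pluggable check; the BiPredicate stands in for the
  // element's type strategy. Nulls are handled here so strategies need not.
  static boolean arraysEqual(Object[] a, Object[] b, BiPredicate<Object, Object> elementEquals)
  {
    if (a == b) {
      return true;
    }
    if (a == null || b == null || a.length != b.length) {
      return false;
    }
    for (int i = 0; i < a.length; i++) {
      final Object x = a[i];
      final Object y = b[i];
      if (x == null || y == null) {
        if (x != y) {
          return false;
        }
      } else if (!elementEquals.test(x, y)) {
        return false;
      }
    }
    return true;
  }
}
```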

@@ -82,7 +83,7 @@ public int writeKeys(

// Use same ROUGH_OVERHEAD_PER_DICTIONARY_ENTRY as the nonvectorized version; dictionary structure is the same.
stateFootprintIncrease +=
DictionaryBuilding.estimateEntryFootprint((value == null ? 0 : value.length()) * Character.BYTES);
DictionaryBuildingUtils.estimateEntryFootprint((value == null ? 0 : value.length()) * Character.BYTES);
Contributor

Should we push this string logic inside the footprint method, or maybe have a wrapper string method inside the same class?

Contributor

@cryptoe cryptoe left a comment

Minor comments. Changes LGTM!!

@LakshSingla
Contributor Author

I am cleaning up the dead todos, and adding a few tests soon! Should be good to go then.

@cryptoe cryptoe merged commit 6bca406 into apache:master Apr 24, 2024
85 checks passed
@LakshSingla LakshSingla deleted the unified-group-by-strategies-2 branch April 26, 2024 04:46
cryptoe pushed a commit that referenced this pull request Apr 30, 2024
 #16068 modified DimensionHandlerUtils to accept complex types to be dimensions. This had an unintended side effect of allowing complex types to be joined upon (which wasn't guarded explicitly, it doesn't work).
This PR modifies the IndexedTable to reject building the index on the complex types to prevent joining on complex types. The PR adds back the check in the same place, explicitly.
@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024
adarshsanjeev pushed a commit to adarshsanjeev/druid that referenced this pull request May 6, 2024
 apache#16068 modified DimensionHandlerUtils to accept complex types to be dimensions. This had an unintended side effect of allowing complex types to be joined upon (which wasn't guarded explicitly, it doesn't work).
This PR modifies the IndexedTable to reject building the index on the complex types to prevent joining on complex types. The PR adds back the check in the same place, explicitly.
LakshSingla added a commit that referenced this pull request May 7, 2024
#16068 modified DimensionHandlerUtils to accept complex types to be dimensions. This had an unintended side effect of allowing complex types to be joined upon (which wasn't guarded explicitly, it doesn't work).
This PR modifies the IndexedTable to reject building the index on the complex types to prevent joining on complex types. The PR adds back the check in the same place, explicitly.

Co-authored-by: Laksh Singla <lakshsingla@gmail.com>
Labels
Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Area - Querying Area - Segment Format and Ser/De GHA