Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

object[] handling for DimensionHandlers for arrays #12552

Merged
merged 6 commits into from
May 25, 2022

Conversation

cryptoe
Copy link
Contributor

@cryptoe cryptoe commented May 20, 2022

Description

Fixes a bug when running q's like

 SELECT cntarray,
       Count(*)
FROM   (SELECT dim1,
               dim2,
               Array_agg(cnt) AS cntarray
        FROM   (SELECT dim1,
                       dim2,
                       dim3,
                       Count(*) AS cnt
                FROM   foo
                GROUP  BY 1,
                          2,
                          3)
        GROUP  BY 1,
                  2)
GROUP  BY 1  

This generates an error:

org.apache.druid.java.util.common.ISE: Unable to convert type [Ljava.lang.Object; to org.apache.druid.segment.data.ComparableList
        at org.apache.druid.segment.DimensionHandlerUtils.convertToList(DimensionHandlerUtils.java:405) ~[druid-xx]

Because it's an array of numbers it looks like it does the convertToList call, which looks like:

  @Nullable
  public static ComparableList convertToList(Object obj)
  {
    if (obj == null) {
      return null;
    }
    if (obj instanceof List) {
      return new ComparableList((List) obj);
    }
    if (obj instanceof ComparableList) {
      return (ComparableList) obj;
    }
    throw new ISE("Unable to convert type %s to %s", obj.getClass().getName(), ComparableList.class.getName());
  }

I.e. it doesn't know about arrays. Added the array handling as part of this PR.

Also, there is another code path that knows about arrays. but that has a typo.

Objects[] objects = (Objects[]) obj;
    String[] delegate = new String[objects.length];
    for (int i = 0; i < objects.length; i++) {
      delegate[i] = convertObjectToString(objects[i]);
    }
    return ComparableStringArray.of(delegate);

it should be object[] and not objects[].
Added test cases to handle this case along with the change.

Thanks, @imply-cheddar for finding this bug.


Key changed/added classes in this PR
  • processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
  • sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

if (obj instanceof Object[]) {
List<T> delegateList = new ArrayList<T>();
for (Object eachObj : (Object[]) obj) {
delegateList.add(convertFunction.apply(eachObj));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is not symmetrical with the List<> case above. Is there a specific reason why objects need conversion when they are an array and not when they are a List<>? It seems like they should generally be the same code path, so if return new ComparableList((List) obj); is correct for List, why can it not also be correct for the Object[] case? Or, vice-versa, if the converted form is correct, then I think the List<> case above needs update.

Do we know which code path tickles the List<> case above? Is there a test that covers it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is a test case for it.
GroupByQueryTestRunner#testGroupByWithLongArrays()
on further inspection, its returning doubles in the output. That looks incorrect to me.

Which means we might have to transform the list as well. Similar argument can be made for the String Array stuff as well.

WDYT ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be consistent with the behavior of other calls to DimensionHandlerUtils.convertObjectToX, which is used by the grouper to ensure everything is expected and consistent type (both to handle schema differences between segments and also handle jackson changing types like doubles to floats, longs to ints, etc), these methods should be coercing the elements I think, so the List case is probably wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acked!!

default:
return convertToList(obj);
case LONG:
return convertToListWithObjectFunction(obj, CONVERT_OBJECT_TO_LONG_FUNCTION);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: any reason not to use DimensionHandlerUtils::convertObjectToLong instead of static fields?

Copy link
Contributor Author

@cryptoe cryptoe May 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DimensionHandlerUtils::convertObjectToLong seems better thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested changes look less verbose

@abhishekagarwal87 abhishekagarwal87 merged commit 9f9faee into apache:master May 25, 2022
cryptoe added a commit to cryptoe/druid that referenced this pull request May 25, 2022
Description
Fixes a bug when running q's like

 SELECT cntarray,
       Count(*)
FROM   (SELECT dim1,
               dim2,
               Array_agg(cnt) AS cntarray
        FROM   (SELECT dim1,
                       dim2,
                       dim3,
                       Count(*) AS cnt
                FROM   foo
                GROUP  BY 1,
                          2,
                          3)
        GROUP  BY 1,
                  2)
GROUP  BY 1  
This generates an error:

org.apache.druid.java.util.common.ISE: Unable to convert type [Ljava.lang.Object; to org.apache.druid.segment.data.ComparableList
        at org.apache.druid.segment.DimensionHandlerUtils.convertToList(DimensionHandlerUtils.java:405) ~[druid-xx]
Because it's an array of numbers it looks like it does the convertToList call, which looks like:

  @nullable
  public static ComparableList convertToList(Object obj)
  {
    if (obj == null) {
      return null;
    }
    if (obj instanceof List) {
      return new ComparableList((List) obj);
    }
    if (obj instanceof ComparableList) {
      return (ComparableList) obj;
    }
    throw new ISE("Unable to convert type %s to %s", obj.getClass().getName(), ComparableList.class.getName());
  }
I.e. it doesn't know about arrays. Added the array handling as part of this PR.
abhishekagarwal87 pushed a commit that referenced this pull request Jun 2, 2022
Description
Fixes a bug when running q's like

 SELECT cntarray,
       Count(*)
FROM   (SELECT dim1,
               dim2,
               Array_agg(cnt) AS cntarray
        FROM   (SELECT dim1,
                       dim2,
                       dim3,
                       Count(*) AS cnt
                FROM   foo
                GROUP  BY 1,
                          2,
                          3)
        GROUP  BY 1,
                  2)
GROUP  BY 1  
This generates an error:

org.apache.druid.java.util.common.ISE: Unable to convert type [Ljava.lang.Object; to org.apache.druid.segment.data.ComparableList
        at org.apache.druid.segment.DimensionHandlerUtils.convertToList(DimensionHandlerUtils.java:405) ~[druid-xx]
Because it's an array of numbers it looks like it does the convertToList call, which looks like:

  @nullable
  public static ComparableList convertToList(Object obj)
  {
    if (obj == null) {
      return null;
    }
    if (obj instanceof List) {
      return new ComparableList((List) obj);
    }
    if (obj instanceof ComparableList) {
      return (ComparableList) obj;
    }
    throw new ISE("Unable to convert type %s to %s", obj.getClass().getName(), ComparableList.class.getName());
  }
I.e. it doesn't know about arrays. Added the array handling as part of this PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants