Skip to content

Fix for schema mismatch to go down using the non vectorize path till we update the vectorized aggs properly#14924

Merged
soumyava merged 14 commits intoapache:masterfrom
somu-imply:latest_schema_mismatch_fix
Sep 13, 2023
Merged

Fix for schema mismatch to go down using the non vectorize path till we update the vectorized aggs properly#14924
soumyava merged 14 commits intoapache:masterfrom
somu-imply:latest_schema_mismatch_fix

Conversation

@somu-imply
Copy link
Contributor

@somu-imply somu-imply commented Aug 29, 2023

…we update the vectorized aggs properly

If in vectorized:force mode on a column which has segment which is a String column like

{
                "typeSignature": "STRING",
                "type": "STRING",
                "hasMultipleValues": false,
                "hasNulls": false,
                "size": 0,
                "cardinality": 1,
                "minValue": "0",
                "maxValue": "0",
                "errorMessage": null
            }

We run queries with numeric aggregation on that string column such as

"aggregations": [
   {
     "type": "floatLast",
     "name": "a0",
     "fieldName": "<string_col>",
     "timeColumn": "__time"
   }
 ],

the queries fail with an error

Error: RUNTIME_FAILURE (OPERATOR)

Cannot make VectorValueSelector for column with class[org.apache.druid.segment.column.StringUtf8DictionaryEncodedColumn]

org.apache.druid.java.util.common.UOE

This is because vectorized aggs need these cast supports. Till that is done, this PR fixes the issue by going down the non-vectorized route

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Comment on lines +119 to +120
final ColumnCapabilities capabilities = columnInspector.getColumnCapabilities(fieldName);
return capabilities != null && capabilities.isNumeric();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know i advised differently offline earlier, but i now think this can still return true, since the value selector creation is moved inside of the check in factorizeVector. looking closer at factorize and factorizeBuffer those will sort of end up using a 'nil' selector because the non-vectorize column value selector on these numeric aggs will call getDouble or getLong on the value selectors which will be null/zero for string inputs.

Copy link
Contributor Author

@somu-imply somu-imply Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did the same initially, but the get of NumericNilVectorAggregator would only return a null and not a 0 as the non-vectorized parts do. One solution would be to update the NumercNilVectorAggregator to return a serializablePair with the rhs as 0 on case of a null, but I am unsure if that will break anything else. I'll try to go down that path too

Comment on lines +131 to +132
final ColumnCapabilities capabilities = columnInspector.getColumnCapabilities(fieldName);
return capabilities != null && capabilities.isNumeric();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment about returning true

Comment on lines +131 to +132
final ColumnCapabilities capabilities = columnInspector.getColumnCapabilities(fieldName);
return capabilities != null && capabilities.isNumeric();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment about returning true

Comment on lines +153 to +154
final ColumnCapabilities capabilities = columnInspector.getColumnCapabilities(fieldName);
return capabilities != null && capabilities.getType().equals(ValueType.STRING);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one probably is necessary to keep though, the string vector aggregator calls DimensionHandlerUtils.convertObjectToString which can handle any type of value, however it is using a VectorObjectSelector which isn't guaranteed to be implemented for numbers, which typically need to use VectorValueSelector.

@somu-imply
Copy link
Contributor Author

Added the casting from valueSelector to ObjectSelector for non-string columns when used with stringLast/First in vectorized mode. Currently this fails on coverage only, adding tests soon

@Nullable
private final Object returnValue;

public static NumericNilVectorAggregator of(Object returnValue)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side note: I wonder if we should rename NumericNilVectorAggregator if we are going to be using it for non-numbers, maybe just NilVectorAggregator. I guess this doesn't need to be done in this PR, but it does seem kind of funny exposing it for other purposes

return new DoubleFirstVectorAggregator(timeSelector, valueSelector);
}
return NumericNilVectorAggregator.doubleNilVectorAggregator();
return NumericNilVectorAggregator.of(new SerializablePair<>(0L, NullHandling.defaultDoubleValue()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be saved as a static variable instead of making a new one each time since its not going to change over the lifetime of the service, same for all the other impls

@somu-imply
Copy link
Contributor Author

Addressed the review comments

@abhishekagarwal87 abhishekagarwal87 changed the title Fix for schema mismatch to go down using the non vectorize path till … Fix for schema mismatch to go down using the non vectorize path till we update the vectorized aggs properly Sep 13, 2023
@soumyava soumyava merged commit bf99d2c into apache:master Sep 13, 2023
@LakshSingla LakshSingla added this to the 28.0 milestone Oct 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants