expression transform improvements and fixes #13947
clintropolis merged 6 commits into apache:master
changes:
* fixes inconsistent handling of byte[] values between ExprEval.bestEffortOf and ExprEval.ofType, which could cause byte[] values to end up as java toString values instead of base64 encoded strings in ingest time transforms
* improved ExpressionTransform binding to re-use ExprEval.bestEffortOf when evaluating a binding instead of throwing it away
* improved ExpressionTransform array handling, added RowFunction.evalDimension that returns List<String> to back Row.getDimension and remove the automatic coercing of array types that would typically happen to expression transforms unless using Row.getDimension
* added some tests for ExpressionTransform with array inputs
processing/src/test/java/org/apache/druid/query/expression/RegexpExtractExprMacroTest.java
    if (value instanceof byte[]) {
      return new StringExprEval(StringUtils.encodeBase64String((byte[]) value));
    }
This has me wondering: what if an expression actually wants the byte[]? How would it be defined so that if one expression returns a byte[] and the next one wants to use it, the value is passed through without being base64 encoded in between?
This block is for callers that ask for STRING typed values (though they might be multi-value strings as well). COMPLEX types will accept bytes as-is and try to deserialize them into the appropriate object using a TypeStrategy that wraps the ObjectStrategy.
However, to complicate this answer slightly, where the type passed to this method comes from varies depending on the call site. This ofType method is what backs IdentifierExpr, which is what feeds input values into expressions. When backed by a segment, the type will be the type that was stored in the segment, and so on. For places where we don't know the type, such as expression transforms, we fall back to bestEffortOf, which handles byte[] as a STRING type. It lacks the complex type name, so it can't handle byte[] to complex object translation; we turn the value into a string so that at least something downstream can do something with it.
Responding to this made me realize that the expression post-agg bindings could be improved with the partial type information derived from the aggregators' 'result' type, so I have updated them to use it accordingly in the latest commit.
Back to byte[]: we could alternatively consider leaving it as 'unknown' COMPLEX, though that would cause some issues with nested columns, the other main user of bestEffortOf, which uses it to try to derive the type information of these values. Since we don't have a native binary blob type, STRING is most useful here so we can at least preserve the values (and for JSON, byte[] values already come in as base64 strings, so byte[] really only appears in other nested formats, such as Avro, Parquet, Protobuf, and ORC).
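To make the byte[] discussion concrete, here is a minimal, self-contained sketch of why the base64 step matters when a value of unknown type is coerced to a string. It is not Druid's implementation: `bestEffortString` is a hypothetical helper, and `java.util.Base64` stands in for the `StringUtils.encodeBase64String` call shown in the diff above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative only: a stand-in for the idea behind the byte[] branch,
// not the actual ExprEval code.
class ByteArrayCoercionSketch
{
  // Hypothetical helper: coerce an untyped value to a string, keeping
  // binary content recoverable by base64 encoding it.
  static String bestEffortString(Object value)
  {
    if (value == null) {
      return null;
    }
    if (value instanceof byte[]) {
      return Base64.getEncoder().encodeToString((byte[]) value);
    }
    return String.valueOf(value);
  }

  public static void main(String[] args)
  {
    byte[] raw = "druid".getBytes(StandardCharsets.UTF_8);

    // Without special handling, the array's identity string is all that
    // survives (something like "[B@1b6d3586"); the bytes themselves are lost.
    System.out.println(raw.toString());

    // With base64 encoding the content is preserved as a string.
    String encoded = bestEffortString(raw);
    System.out.println(encoded); // prints "ZHJ1aWQ="

    // A downstream consumer can decode it back to the original bytes.
    byte[] roundTrip = Base64.getDecoder().decode(encoded);
    System.out.println(new String(roundTrip, StandardCharsets.UTF_8)); // prints "druid"
  }
}
```

The design point is the one made in the reply: with no binary blob type available, a base64 string is a lossy-free fallback, whereas the default toString output cannot be turned back into the bytes.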
CodeQL found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.
  builder.put("e", new String[] {null, "foo", "bar"});
  builder.put("f", new String[0]);
- bindings = InputBindings.withMap(builder.build());
+ bindings = InputBindings.forMap(builder.build());
Check notice (Code scanning / CodeQL): Deprecated method or constructor invocation
  public void testDoubleEval()
  {
-   Expr.ObjectBinding bindings = InputBindings.withMap(ImmutableMap.of("x", 2.0d));
+   Expr.ObjectBinding bindings = InputBindings.forMap(ImmutableMap.of("x", 2.0d));
Check notice (Code scanning / CodeQL): Deprecated method or constructor invocation
  public void testLongEval()
  {
-   Expr.ObjectBinding bindings = InputBindings.withMap(ImmutableMap.of("x", 9223372036854775807L));
+   Expr.ObjectBinding bindings = InputBindings.forMap(ImmutableMap.of("x", 9223372036854775807L));
Check notice (Code scanning / CodeQL): Deprecated method or constructor invocation
- Expr.ObjectBinding bindings = InputBindings.withMap(bindingsMap);
+ Expr.ObjectBinding bindings = InputBindings.forMap(bindingsMap);

  try {
Check notice (Code scanning / CodeQL): Deprecated method or constructor invocation
      .put("str1", "v1")
      .put("str2", "v2");
- bindings = InputBindings.withMap(builder.build());
+ bindings = InputBindings.forMap(builder.build());
Check notice (Code scanning / CodeQL): Deprecated method or constructor invocation
Hi @clintropolis, the modification that adds evalDimension(Row row) method in RowFunction interface, References:
…15452) PR #13947 introduced a method evalDimension() in the RowFunction interface. No default implementation was added, which causes all existing implementations and custom transforms to fail until they implement their own version of evalDimension. This PR adds a default implementation to the interface, which returns the eval result wrapped in a singleton list.
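The compatibility fix described above can be sketched as follows. This is an illustration under stated assumptions, not the merged code: `SketchRow` and `SketchRowFunction` are hypothetical stand-ins for Druid's Row and RowFunction, and returning an empty list for a null result is a choice made for this sketch only.

```java
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical, simplified stand-in for a row of input values.
interface SketchRow
{
  Object getRaw(String column);
}

// Hypothetical, simplified stand-in for the row function interface.
interface SketchRowFunction
{
  Object eval(SketchRow row);

  // Default method: existing implementations that only define eval()
  // keep working, and get a single-element list of the stringified result.
  default List<String> evalDimension(SketchRow row)
  {
    Object value = eval(row);
    if (value == null) {
      // Assumption for this sketch: a null result maps to an empty list.
      return Collections.emptyList();
    }
    return Collections.singletonList(String.valueOf(value));
  }
}

class DefaultEvalDimensionSketch
{
  public static void main(String[] args)
  {
    Map<String, Object> values = Collections.singletonMap("x", "foo");
    SketchRow row = values::get;

    // A "custom transform" written before evalDimension existed: it only
    // implements eval(), yet evalDimension() is available via the default.
    SketchRowFunction upper = r -> String.valueOf(r.getRaw("x")).toUpperCase(Locale.ROOT);

    System.out.println(upper.evalDimension(row)); // prints "[FOO]"
  }
}
```

Implementations that produce arrays can still override `evalDimension` to return one element per array entry; the default only covers the single-value case.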
Description
changes:
* fixes inconsistent handling of byte[] values between ExprEval.bestEffortOf and ExprEval.ofType, which could cause byte[] values to end up as java toString values instead of base64 encoded strings in ingest time transforms
* improved ExpressionTransform binding to re-use ExprEval.bestEffortOf when evaluating a binding instead of throwing it away
* improved ExpressionTransform array handling in anticipation of "nested columns + arrays = array columns!" #13803, added RowFunction.evalDimension that returns List<String> to back Row.getDimension and remove the automatic coercing of array types that would typically happen to ExpressionTransform now unless using Row.getDimension
* updated ExpressionPostAggregator to use partial type information from decoration
* added some tests for ExpressionTransform with array inputs