Adding new function decode_base64_utf8 and expr macro#14943
Adding new function decode_base64_utf8 and expr macro#14943soumyava merged 4 commits intoapache:masterfrom
Conversation
|
meanwhile review, I am adding more tests. |
|
|
||
| import javax.annotation.Nullable; | ||
|
|
||
| public class DecodeBase64UTFOperatorConversion implements SqlOperatorConversion |
There was a problem hiding this comment.
I think you can just use DirectOperatorConversion which will save you from having to implement toDruidExpression
| if (toDecode.value() == null) { | ||
| return ExprEval.of(null); | ||
| } | ||
| return new StringExpr(new String(StringUtils.decodeBase64String(toDecode.asString()), StandardCharsets.UTF_8)).eval(bindings); |
There was a problem hiding this comment.
You could also use StringUtils.fromUtf8(StringUtils.decodeBase64String(...))
| } | ||
| } | ||
|
|
||
| public static class StringDecodeBase64UTFExprMacro implements ExprMacroTable.ExprMacro |
There was a problem hiding this comment.
It would be useful to implement getOutputType, isLiteral, isNullLiteral, and getLiteralValue similar to ComplexDecodeBase64Expression.
There was a problem hiding this comment.
General Q, can this be implemented as an extension of UnivariateFunction? But then this is an expression and not exactly a function
There was a problem hiding this comment.
BaseScalarUnivariateMacroFunctionExpr is the macro version of right thing to extend here since this is a single arugment function, +1 for re-using it since it would simplify things a bit here
| private static final List<ExprMacro> BUILT_IN = ImmutableList.of( | ||
| new BuiltInExprMacros.ComplexDecodeBase64ExprMacro() | ||
| new BuiltInExprMacros.ComplexDecodeBase64ExprMacro(), | ||
| new BuiltInExprMacros.StringDecodeBase64UTFExprMacro() |
There was a problem hiding this comment.
it doesn't necessarily have to be in this PR, but it would be nice to add an alias for ComplexDecodeBase64ExprMacro (and also the SQL layer), perhaps just by extending the class and overriding the usage of NAME so it has consistent naming with this new function, e.g. decode_base64_complex
There was a problem hiding this comment.
I think this can be done registering an AliasedOperatorConversion similar to how SUBSTR or TRUNC is done
There was a problem hiding this comment.
that would fix the SQL layer, but this is native expression macros, which do not currently have a generic aliasing thing here.
@pranavbhole what are your plans for this aliasing? (I just want to make sure this actually gets done whether or not in this PR, and not lost)
There was a problem hiding this comment.
I will add aliasing in the followup PR.
somu-imply
left a comment
There was a problem hiding this comment.
It'll be good to add
- a test in CalciteQueryTest
- Update website/.spelling with decode_base64_utf8
| assertExpr("decode_base64_utf8('aGVsbG8=')", "hello"); | ||
| assertExpr("decode_base64_utf8('V2hlbiBhbiBvbmlvbiBpcyBjdXQsIGNlcnRhaW4gKGxhY2hyeW1hdG9yKSBjb21wb3VuZHMgYXJlIHJlbGVhc2VkIGNhdXNpbmcgdGhlIG5lcnZlcyBhcm91bmQgdGhlIGV5ZXMgKGxhY3JpbWFsIGdsYW5kcykgdG8gYmVjb21lIGlycml0YXRlZC4=')", "When an onion is cut, certain (lachrymator) compounds are released causing the nerves around the eyes (lacrimal glands) to become irritated."); | ||
| assertExpr("decode_base64_utf8('eyJ0ZXN0IjogMX0=')", "{\"test\": 1}"); | ||
| assertExpr("decode_base64_utf8('')", ""); |
There was a problem hiding this comment.
nit. this can be null in SQL compatible mode
601a87e to
491e4b5
Compare
|
addressed comments. |
040c898 to
ad20520
Compare
|
LGTM. One question, is there a way say I have a column of encoded strings. Atm I understand the function works on string values. Not this PR but can we later try to make it work on a column too ? Will that be useful ? Something like So might want to think about a better error message. But need not be done in context of this PR |
It works on the column as well, provided you have encoded base64 string value stored in column. |
ad20520 to
ed64879
Compare
c592237 to
304d973
Compare
soumyava
left a comment
There was a problem hiding this comment.
LGTM, the aliasing will be done in a followup PR
Description
Adding a new function called decode_base64_utf8 that takes base64 encoded string, decode it and gives back the utf8 encoded string.
This PR has: