fix ExpressionVirtualColumn equivalence key to use Expr.stringify instead of raw input string to stabilize comparison#18334
Conversation
…tead of raw input string to stabilize comparison
| this.name = Preconditions.checkNotNull(name, "name"); | ||
| this.expression = new Expression(Preconditions.checkNotNull(expression, "expression"), outputType); | ||
| this.parsedExpression = parsedExpression; | ||
| this.expression = new Expression( |
There was a problem hiding this comment.
since parsedExpression.get() is already called in line 132, why does Expression need a Supplier<Expr> instead of just Expr? Maybe Parser.lazyParse is not needed?
There was a problem hiding this comment.
I'm not going to remove Parser.lazyParse, this is behind a supplier so we don't spend time parsing expressions unless we actually need to do something with the Expr since parsing the expression can add up quite a bit in terms of cpu cost
There was a problem hiding this comment.
i see, so maybe we should do:
this.expressionAnalysis = Suppliers.memoize(() -> parsedExpression.get().analyzeInputs());
There was a problem hiding this comment.
oops, yea that is a mistake, good catch
| } | ||
| Expression that = (Expression) o; | ||
| return Objects.equals(expressionString, that.expressionString) && Objects.equals(outputType, that.outputType); | ||
| return Objects.equals(parsed.get().stringify(), that.parsed.get().stringify()) |
There was a problem hiding this comment.
I spot checked a few Expr and they seems to have equals implemented, is there any reason to call stringify specifically?
There was a problem hiding this comment.
its probably ok... for whatever reason I am hesitant to trust equals even tho Expr is marked with SubclassesMustOverrideEqualsAndHashCode it has a lot less test coverage than stringify which has a lot of coverage to ensure that expr.stringify() makes a string that can be parsed back into the same expr. I guess the solution to my feelings is to just add more coverage using equals and hashcode instead of not using them.
Maybe I'll try to add some coverage and switch to equals.
There was a problem hiding this comment.
adding some equals/hashcode checks to FunctionTest has hit a handful of failures, so my hunch was maybe correct. This is certainly wack that equals/hashcode are not equal, but i think the verdict is that i'll leave this as using stringify until i can fix
There was a problem hiding this comment.
an update, the failures i've seen so far seem related to typed null constants (like NullLongExpr not being equal to NullDoubleExpr or a StringExpr with a null value), but these all have the same stringified form. Not entirely sure how to best fix it, since they are technically different Expr, but they are equivalent... we might need a method like equals but less strict since a given string expression might have multiple possible Expr forms depending on stuff like for example whether asSingleThreaded is called or not, etc
There was a problem hiding this comment.
i suppose more technically we need an actual equivalence method for Expr to power all of this stuff, then it could handle more general equivalence like a + b and b + a being interchangeable, but that's too much for this PR i think
| public int hashCode() | ||
| { | ||
| return Objects.hash(name, expression); | ||
| return Objects.hash(name, expression.expressionString, expression.outputType); |
There was a problem hiding this comment.
what's the difference between expressionString and expression.parsed.get().stringify()?
There was a problem hiding this comment.
expressionString is the raw expression as input by the user from the json, stringifying the parsed expression homogenizes this so a+b and a + b and "a" + "b" become the same expr after being parsed. (this is the reason the equivalence check was broken, using the raw input strings is not correct because the user could write the same expression in a lot of different ways)
| this.name = Preconditions.checkNotNull(name, "name"); | ||
| this.expression = new Expression(Preconditions.checkNotNull(expression, "expression"), outputType); | ||
| this.parsedExpression = parsedExpression; | ||
| this.expression = new Expression( |
There was a problem hiding this comment.
i see, so maybe we should do:
this.expressionAnalysis = Suppliers.memoize(() -> parsedExpression.get().analyzeInputs());
…rtual-column-equivalence
…rtual-column-equivalence
Description
Fixes a bug with
ExpressionVirtualColumnequivalence checking (VirtualColumn.getEquivalenceKey()) caused by using the raw input expression string instead of like parsing the expression and using stringify to stabilize it to a homogeneous value. This method is used to match virtual columns in projections with query time virtual columns, so unless the projection was created with the exact same expression string as the query time expression, it would be unable to match correctly (really counter to the purpose of all the equivalence key stuff in the first place, which was added for this purpose 😅)After the change,
expressionStringis barely needed, but i have retained it so that json serialization and such are not required to parse the expression and restringify just to serialize, and not change behavior of equals/hashcode of the parentExpressionVirtualColumntype (theEquivalenceKeyis the internalExpressiontype)