UpdateBy gRPC #2635
niloc132
left a comment
👎 as it stands - this is introducing a lot (all? not sure) of QST via gRPC in this one new feature, which means a lot of code that isn't obvious from the pull request is now reachable from any client. Easiest example: an Expression.type.raw is an easy way to smuggle arbitrary java for execution, but no safeguards are introduced to prevent this (the updateby group_by field may hold those raw expression strings, and engine-table's SelectColumn.ExpressionAdapter will invoke SelectColumnFactory.getExpression(..) on them without a second look).
At the very least, if this feature can only be implemented by using QST's Expression types, I would like to see a) the qst protos moved to a new .proto file, and b) a safe-by-default config option preventing the use of the UpdateByGrpcImpl type.
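One possible shape for the safe-by-default option requested above, as a minimal sketch: raw expression strings are rejected unless an operator explicitly opts in. The class `RawExpressionGuard` and the property name `updateby.allowRawExpressions` are hypothetical names invented for illustration; nothing here is taken from the PR, and a real check would need to sit in front of whatever code hands strings to `SelectColumnFactory`.

```java
import java.util.List;

/** Hypothetical guard; class and property names are illustrative, not part of the PR. */
final class RawExpressionGuard {
    /** Hypothetical property name. Boolean.getBoolean returns false when it is unset. */
    static final String ALLOW_RAW_PROPERTY = "updateby.allowRawExpressions";

    private final boolean allowRaw;

    RawExpressionGuard(boolean allowRaw) {
        this.allowRaw = allowRaw;
    }

    /** Reads the opt-in flag from a system property; an unset property means "deny". */
    static RawExpressionGuard fromSystemProperties() {
        return new RawExpressionGuard(Boolean.getBoolean(ALLOW_RAW_PROPERTY));
    }

    /** Throws unless raw expressions were explicitly enabled. */
    void checkRawExpressions(List<String> rawExpressions) {
        if (!allowRaw && !rawExpressions.isEmpty()) {
            throw new IllegalArgumentException(
                    "Raw expressions are disabled; rejected " + rawExpressions.size() + " expression(s)");
        }
    }
}
```

With the flag off, a request carrying only column names passes and a request carrying any raw string fails before it reaches the engine.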
message UpdateByRequest {
message Options {
IIRC not all languages qualify these with namespacing, so it would probably be a good idea to give more descriptive names to avoid collisions.
That's unfortunate :/ but good to know.
Expression expression = 2;
}
message Pair {
Convention elsewhere has been to just encode these as foo=bar/foo strings - ComboAggregateRequest.Aggregate.match_pairs for example.
I see the point that we can be more descriptive by describing it this way, but for a=a we have to write the string twice anyway, and a plain string for each of the two fields is also not accurate, since they can only be java identifiers.
I'll add some documentation - but the idea is that input_column_name can be empty for the a=a case:
public static Pair adapt(io.deephaven.proto.backplane.grpc.Pair pair) {
    final ColumnName output = ColumnName.of(pair.getOutputColumnName());
    return pair.getInputColumnName().isEmpty() ? output : Pair.of(ColumnName.of(pair.getInputColumnName()), output);
}
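For comparison, the two encodings discussed in this thread can be sketched side by side: the `foo=bar`/`foo` string convention, and the structured form in which an empty input name stands for the `a=a` case. `ColumnPair`, `ofStructured`, and `parseMatchPair` are hypothetical names, and using `SourceVersion.isIdentifier` as the column-name check is an assumption; the real column-name rules may be stricter.

```java
import javax.lang.model.SourceVersion;

/** Hypothetical stand-in for an output/input column pair; not the table API's Pair. */
final class ColumnPair {
    private final String output;
    private final String input;

    private ColumnPair(String output, String input) {
        this.output = output;
        this.input = input;
    }

    String output() {
        return output;
    }

    String input() {
        return input;
    }

    /** Structured form: an empty input name means "same as output" (the a=a case). */
    static ColumnPair ofStructured(String outputColumnName, String inputColumnName) {
        String input = inputColumnName.isEmpty() ? outputColumnName : inputColumnName;
        return validated(outputColumnName, input);
    }

    /** String form: "out=in", or just "out" when both names match. */
    static ColumnPair parseMatchPair(String text) {
        int eq = text.indexOf('=');
        if (eq < 0) {
            String name = text.trim();
            return validated(name, name);
        }
        return validated(text.substring(0, eq).trim(), text.substring(eq + 1).trim());
    }

    private static ColumnPair validated(String output, String input) {
        // Assumption: Java identifier syntax approximates the column-name rules.
        if (!SourceVersion.isIdentifier(output) || !SourceVersion.isIdentifier(input)) {
            throw new IllegalArgumentException("Not a valid identifier pair: " + output + "=" + input);
        }
        return new ColumnPair(output, input);
    }
}
```

Both forms end up at the same validated pair, so the choice between them is mostly about how much of the identifier constraint the wire format documents.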
optional int32 chunk_capacity = 2;
optional double maxStaticSparseMemoryOverhead = 3;
Use snake_case for message fields (maximumLoadFactor and targetLoadFactor too).
Spec spec = 1;
repeated Pair pair = 2;
}
oneof type {
It's explicitly modeled this way at the table api level b/c there is room for additional types in the future:
public interface UpdateByClause {
    ...
    interface Visitor<T> {
        T visit(ColumnUpdateClause clause);
    }
}
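To make the "room for additional types" point concrete, here is a simplified, self-contained sketch of the same visitor shape. The `walk` method, the `description` field, and the `ClauseDescriber` class are assumptions added for illustration; only the `Visitor<T>` interface with its single `visit(ColumnUpdateClause)` method comes from the excerpt above.

```java
/** Simplified, hypothetical mirror of the clause/visitor shape quoted above. */
interface UpdateByClause {
    /** Assumed dispatch method; the excerpt elides how visitors are invoked. */
    <T> T walk(Visitor<T> visitor);

    interface Visitor<T> {
        T visit(ColumnUpdateClause clause);
        // A future clause type would add another visit(...) overload here,
        // forcing every visitor (a gRPC adapter included) to handle it.
    }
}

/** Hypothetical single clause type, carrying only a description for the demo. */
final class ColumnUpdateClause implements UpdateByClause {
    private final String description;

    ColumnUpdateClause(String description) {
        this.description = description;
    }

    String description() {
        return description;
    }

    @Override
    public <T> T walk(UpdateByClause.Visitor<T> visitor) {
        return visitor.visit(this);
    }
}

/** Example visitor: turns a clause into a label, much as an adapter would build a message. */
final class ClauseDescriber implements UpdateByClause.Visitor<String> {
    @Override
    public String visit(ColumnUpdateClause clause) {
        return "column-update:" + clause.description();
    }
}
```

A one-method visitor plays the same role as a one-entry `oneof`: it looks redundant today, but adding a second case later becomes a compile-time change rather than a wire-format redesign.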
enum BadDataBehavior {
  // Reset the state for the bucket to {@code null} when invalid data is encountered.
  RESET = 0;
Since 0 is the "This is not set" default, consider making it fail-safe with THROW, so as to not surprise users who fail to set this?
In context, at least right now, this is always being prefaced w/ optional. The idea is that if the client does not provide a value for it, it will inherit the appropriate server defaults (the defaults vary depending on the specific field):
message UpdateByEmaOptions {
  optional BadDataBehavior on_null_value = 1;
  optional BadDataBehavior on_nan_value = 2;
  ...
}
I'll add documentation at this layer to explain that.
That said, I'm not against making THROW = 0, but there may be reasons to want the order to be the same as the Java enum it is mapping? (We can update the Java enum as well...)
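The difference between the two positions in this exchange can be sketched in plain Java, without the generated protobuf classes. With field presence (`optional`), an unset field can fall back to a server default; without it, an unset enum is indistinguishable from the value numbered 0. Only `RESET = 0` appears in the reviewed proto; the number given to `THROW` and the helper names below are assumptions for illustration.

```java
import java.util.Optional;

/** Hypothetical mirror of the proto enum. Only RESET = 0 is confirmed by the diff. */
enum BadDataBehavior {
    RESET(0),
    THROW(1); // assumed number, for illustration only

    final int number;

    BadDataBehavior(int number) {
        this.number = number;
    }

    /** Decodes a wire number; an unset proto3 enum without presence reads as 0. */
    static BadDataBehavior fromWireNumber(int number) {
        for (BadDataBehavior behavior : values()) {
            if (behavior.number == number) {
                return behavior;
            }
        }
        throw new IllegalArgumentException("Unknown BadDataBehavior number: " + number);
    }
}

final class BadDataDefaults {
    private BadDataDefaults() {}

    /** With presence: an absent client value inherits whatever the server default is. */
    static BadDataBehavior resolve(Optional<BadDataBehavior> fromClient, BadDataBehavior serverDefault) {
        return fromClient.orElse(serverDefault);
    }

    /** Without presence: "not set" arrives as 0 and silently becomes the zero value. */
    static BadDataBehavior resolveWithoutPresence(int wireNumber) {
        return BadDataBehavior.fromWireNumber(wireNumber);
    }
}
```

So the zero value only matters if the field ever loses its `optional` qualifier; making the zero value the fail-safe one protects against that case.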
oneof type {
  string column_name = 1;
  int64 long_value = 2;
  string raw = 3;
Why only raw and long_value? If various other numbers/etc can be raw, why can't long?
Also, it seems odd to refer to the "type" as being "column_name" or "raw" - perhaps expression could have a "value" which could be "column reference" or "long literal", "raw X value", etc?
Just found the Expression type hierarchy in existing code, and I'm only more confused - why is RawString not a Value, but ColumnName is? Shouldn't ColumnName be a "Reference" or something rather than a Value?
Column name, long value, and raw are the only parts plumbed through at the table API layer right now, but I agree we should flesh these out more (#830). I can complete the loop a bit here in these regards.
As it is right now,
Expression = Value | RawString
Value = ColumnName | long (literal)
Maybe we should add more hierarchy here, with "Reference" and "Literal" instead of "Value"
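The "Reference" and "Literal" idea floated here could look roughly like the following. This is a hypothetical sketch, not the existing table API types: `ColumnReference`, `LongLiteral`, `ExpressionKind`, and the `walk` method are invented names, and the existing `ColumnName` and `RawString` classes are deliberately not reproduced.

```java
/** Hypothetical restructuring: Expression = Reference | Literal | RawString. */
interface Expression {
    <T> T walk(Visitor<T> visitor);

    interface Visitor<T> {
        T visit(ColumnReference reference);

        T visit(LongLiteral literal);

        T visit(RawString raw);
    }
}

/** A reference to an existing column, rather than a "value". */
final class ColumnReference implements Expression {
    final String name;

    ColumnReference(String name) {
        this.name = name;
    }

    @Override
    public <T> T walk(Expression.Visitor<T> visitor) {
        return visitor.visit(this);
    }
}

/** A literal; other literal types would sit alongside this one. */
final class LongLiteral implements Expression {
    final long value;

    LongLiteral(long value) {
        this.value = value;
    }

    @Override
    public <T> T walk(Expression.Visitor<T> visitor) {
        return visitor.visit(this);
    }
}

/** An unparsed expression string; the one case that needs extra scrutiny. */
final class RawString implements Expression {
    final String text;

    RawString(String text) {
        this.text = text;
    }

    @Override
    public <T> T walk(Expression.Visitor<T> visitor) {
        return visitor.visit(this);
    }
}

/** Example visitor that labels each case, so raw strings are easy to single out. */
final class ExpressionKind implements Expression.Visitor<String> {
    @Override
    public String visit(ColumnReference reference) {
        return "reference:" + reference.name;
    }

    @Override
    public String visit(LongLiteral literal) {
        return "literal:" + literal.value;
    }

    @Override
    public String visit(RawString raw) {
        return "raw";
    }
}
```

Keeping the raw case as its own branch means a server-side policy can reject or validate it in one place while references and literals pass through untouched.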
I've removed this proto.
At first I was a bit confused by your comment on "easy way to smuggle arbitrary java for execution" - is there anything preventing the equivalent via a view or select? Looking at the code now though, I see I'll adapt these checks in. I'm hoping that we can (eventually) migrate from
Looking deeper, I'm having trouble finding out if we disable arbitrary code execution via the gRPC table operations API today. AFAICT, we only do "this looks like a valid operation" validation. I would also argue that this should be the responsibility of the engine, not the gRPC layer. Trying to get clarification from @rcaudy
rcaudy
left a comment
Minor comments. It's a lot of code for not a lot of functionality.
private static io.deephaven.proto.backplane.grpc.Pair adapt(ColumnName pair) {
    return io.deephaven.proto.backplane.grpc.Pair.newBuilder()
            .setOutputColumnName(pair.name())
It surprised me that you were opting to not make input column name explicit.
e6c87d6 to 8c4454a
I don't hate this. I do sort of feel like the layering is getting excessive. I think the evidence of this can be inferred from how long it took you (@devinrsmith) and me to trace through some of the table creation logic for parent resolution.
rcaudy
left a comment
Marker, I've reviewed up to here.
cb8846b to 7fbfe98
I've force-pushed to remove any changes to TableSpec and the gRPC impl via TableSpec. I haven't resolved any concerns wrt the .proto structure yet.
// public TableSpec apply(TableSpec spec, String[] formulas) {
// return spec.selectDistinct(formulas);
// }
// }
probably not meant to be added?
I'll remove it, and it looks like I also need to fix some merge conflicts.

Fixes #2607