-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(udfs): generic support for lists/maps in UDFs #3054
Conversation
We should do the move to our own schema type first IMHO. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If its not ready for review... don't raise a review ;)
While I agree with you in principle I think it depends on how invasive this change is, and looking at the code it's really not that bad. The hack is very limited to a small part of the code and the investment of moving to our own schema type system is very large.
I did this to share code in a way that was easy to reference :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @agavra main thing is the naming and zero-indexing of arrays, and edge cases on the generics that may or may not be an issue. Otherwise looking good. Wouldn't mind another review myself, but if this is holding you up, don't let me stop you.
@@ -129,8 +135,25 @@ private KsqlFunction( | |||
} | |||
|
|||
|
|||
public Schema getReturnType() { | |||
return returnType; | |||
public Schema getReturnType(final List<Schema> arguments) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: having a param called arguments
and a field with the same name doesn't make the code the easiest to read.
What's the difference between the two? Can we express this in the naming somehow to make it easier to grok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have renamed the field to parameters
and what is passed in as arguments
. At some point I might go through the codebase and make this change everywhere because it's a distinction that is confused often.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure of the different between arguments and parameters - both are the same to me. So I'm not sure changing the name makes the code in getReturnType
any more readable. But if we're changing the name, lets change the name of the getter, constructor param and toString too!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A parameter is a variable in a method definition. When a method is called, the arguments are the data you pass into the method's parameters.
https://stackoverflow.com/a/156787/2258040
EDIT: I will make the corresponding changes in another PR since it touches a surprising amount of files.
ksql-common/src/main/java/io/confluent/ksql/function/UdfIndex.java
Outdated
Show resolved
Hide resolved
ksql-common/src/main/java/io/confluent/ksql/function/UdfIndex.java
Outdated
Show resolved
Hide resolved
ksql-common/src/main/java/io/confluent/ksql/function/UdfIndex.java
Outdated
Show resolved
Hide resolved
return "DECIMAL(" | ||
+ DecimalUtil.precision(schema) + ", " | ||
+ DecimalUtil.scale(schema) + ")"; | ||
SchemaUtil.requireValidBytesType(schema); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels wrong - why are we extending the general use formatter to support this? The use of this special schema type should be very self contained within the UDF code...
ksql-engine/src/main/java/io/confluent/ksql/function/udf/list/Sublist.java
Outdated
Show resolved
Hide resolved
ksql-engine/src/main/java/io/confluent/ksql/function/udf/list/AsList.java
Outdated
Show resolved
Hide resolved
ksql-functional-tests/src/test/resources/query-validation-tests/asmap.json
Outdated
Show resolved
Hide resolved
ksql-functional-tests/src/test/resources/query-validation-tests/aslist.json
Outdated
Show resolved
Hide resolved
ksql-functional-tests/src/test/resources/query-validation-tests/aslist.json
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with a couple of suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took the first pass through, still reviewing tests. Left some comments/questions inline
return Sets.union( | ||
constituentGenerics(type.keySchema()), | ||
constituentGenerics(type.valueSchema())); | ||
case STRUCT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case is currently not supported, right? maybe just throw an exception? It made me confused going between this and identifyGenerics
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is implemented because it's called to determine whether or not something has generics, which is used to determine whether or not to call identifyGenerics
. Throwing an exception here would break the normal flow (this method is called for every KsqlFunction)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yeah that makes perfect sense. thanks!
} | ||
} | ||
|
||
return Sets.difference(constituentGenerics(schema), asMap.keySet()).isEmpty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the case when this would be true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
asMap
represents the "resolved" generics. if every one of the constituent generics exists in the mapping of resolved generics, then this would be true (e.g. constituents(Map<K,V>) ~ {K:String, V:Integer}
would be true
because K
and V
are both represented)
if you meant to ask when this would ever be false
, then you're right - I'm not actually sure it'll ever be false
&& GenericsUtil.instanceOf(schema, argument)) { | ||
// check if this specific generic has already been resolved to some other | ||
// type - if not, reserve it for future checks to accept | ||
final Schema old = reservedGenerics.putIfAbsent(schema, argument); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we be tracking the resolved generic schemas in reservedGenerics
, rather than the container schemas?
e.g. if I call accepts
with MAP<A, B>, MAP<STRING, INT>
, reservedGenerics should have A->STRING, B->INT
not MAP<A, B> -> MAP<STRING, INT>
?
This way, if I have a signature like FOO(MAP<A, B>, LIST<A>)
, I make sure that A is always resolved to the same type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes that was the intention - it seems that I only have tests covering the case with generic top levels. Good catch!
ksql-common/src/main/java/io/confluent/ksql/function/UdfIndex.java
Outdated
Show resolved
Hide resolved
|
||
- Any generic in the return value of a method must appear in at least one of the method parameters | ||
- The generic must not adhere to any interface. For example, ``<T extends Number>`` is not valid). | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't support type coercion for generics, but do for specific types, right? e.g. if I have a function like add(T a, T b)
and I try to call it with something that resolves to INT, BIGINT
, that won't work. But if I did add(Long a, Long b)
it would. Might want to call that out in the docs.
ksql-common/src/main/java/io/confluent/ksql/function/GenericsUtil.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -129,8 +135,25 @@ private KsqlFunction( | |||
} | |||
|
|||
|
|||
public Schema getReturnType() { | |||
return returnType; | |||
public Schema getReturnType(final List<Schema> arguments) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure of the different between arguments and parameters - both are the same to me. So I'm not sure changing the name makes the code in getReturnType
any more readable. But if we're changing the name, lets change the name of the getter, constructor param and toString too!
ksql-common/src/main/java/io/confluent/ksql/schema/connect/SqlSchemaFormatter.java
Outdated
Show resolved
Hide resolved
ksql-engine/src/main/java/io/confluent/ksql/function/JavaToConnect.java
Outdated
Show resolved
Hide resolved
ksql-engine/src/main/java/io/confluent/ksql/function/udf/list/Slice.java
Outdated
Show resolved
Hide resolved
ksql-engine/src/main/java/io/confluent/ksql/function/udf/map/AsMap.java
Outdated
Show resolved
Hide resolved
ksql-engine/src/main/java/io/confluent/ksql/function/udf/map/AsMap.java
Outdated
Show resolved
Hide resolved
ksql-engine/src/main/java/io/confluent/ksql/function/udf/list/Slice.java
Outdated
Show resolved
Hide resolved
ksql-functional-tests/src/test/resources/query-validation-tests/asarray.json
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one nit inline, otherwise LGTM
} | ||
|
||
final Map<Schema, Schema> genericMapping = GenericsUtil.resolveGenerics(schema, argument); | ||
for (final Entry<Schema, Schema> entry : genericMapping.entrySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: consider pulling this pattern out into a common function in GenericsUtil, e.g:
public boolean checkGenericsConsistent(final Collection Entry<Schema, Schema> mappings, Optional<Map<Schema, Schema>> oReserved) {
final Map<Schema, Schema> reserved = oReserved.orElseGet(new HashMap<>());
for (final Entry<Schema, Schema> entry : mappings) {
final Schema old = reserved.putIfAbsent(entry.getKey(), entry.getValue());
if (old != null && !old.equals(entry.getValue())) {
return false;
}
}
return true;
}
Then you can use it from instanceOf, resolveGenerics, and here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to have as few methods with side-effects as possible, but the problem is that we need to both return a boolean and update a map... 🤔 I'll see if I can make that clean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember now why I didn't do this originally, for resolveGenerics
I wanted to be able to have a detailed exception message that tell me exactly which one was the problem. If I throw an exception after returning I lose that information. I think the amount of boiler plate to get this to work for all three of the methods is more confusing than just doing it in place.
Description
NOTE: This PR looks large, but the entire non-test, non-description functionality is under 500 lines and contained to a few files (see description below).
This change adds support for a single generic in UDFs so that container types are more accessible in our UDF framework. For example, the following UDF can now be used to create a map from two lists, the second of which has a generic type (see the QTT test for this UDF for an example):
To do this, the following changes were made:
KsqlFunction
will apply some logic to resolve any generics in the return type based on the actual argument types being passed inUdfIndex
will index generic types, and will traverse the trie for any inputGENERIC:<name>
. If we ever move to our own type system (i.e. not connect) this can have first class supportGenericsUtil
andJavaToConnect
have all of the magic, take a look thereSUBLIST
,AS_LIST
andAS_MAP
SchemaUtil
methods were moved toJavaToConnect
(and corresponding tests were moved as well)Missing
Testing done
Reviewer checklist