Calling a UDF that takes varargs with an ARRAY type is not handled correctly #3620

Open
big-andy-coates opened this issue Oct 18, 2019 · 1 comment
Labels
bug, engine, user-defined-functions (Tickets about UDF, UDAF, UDTF)

@big-andy-coates
Contributor

Describe the bug

Any UDF that has a varargs parameter currently passes analysis when it is called with an ARRAY argument in the varargs position.

For example, the FIELD UDF has the signature public int field(final String str, final String... args). It's designed to be called like FIELD('findMe', 'possibility 1', 'possibility 2'), but it can currently also be called with just two params: the first a STRING and the second an ARRAY<STRING>.

Although this passes through the Query Analyser without issue, it then fails at runtime.
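To illustrate the kind of mismatch involved, here is a minimal plain-Java sketch. It assumes, as the suggested fix in this issue implies, that an ARRAY<STRING> value reaches the UDF invoker as a List<String>. The class VarargsMismatchDemo and its field method are hypothetical stand-ins that only share the signature of the real UDF, and the reflective call is not KSQL's actual invoker, so the exception type differs from the error reported here.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

public class VarargsMismatchDemo {

  // Hypothetical stand-in with the same signature as the FIELD UDF.
  // Returns the 1-based position of str in args, or 0 if it is absent.
  public static int field(final String str, final String... args) {
    if (str == null || args == null) {
      return 0;
    }
    for (int i = 0; i < args.length; i++) {
      if (str.equals(args[i])) {
        return i + 1;
      }
    }
    return 0;
  }

  // Reflectively calls field("hello", secondArg) and reports whether the
  // JVM accepted secondArg for the String[] varargs parameter.
  public static boolean canInvokeWith(final Object secondArg) {
    try {
      final Method m =
          VarargsMismatchDemo.class.getMethod("field", String.class, String[].class);
      m.invoke(null, "hello", secondArg);
      return true;
    } catch (IllegalArgumentException e) {
      // Thrown by Method.invoke when an argument has the wrong runtime type.
      return false;
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(final String[] args) {
    // A String[] matches the compiled parameter type.
    System.out.println(canInvokeWith(new String[] {"hello", "world"}));
    // A List<String> does not, even though its elements are all strings.
    System.out.println(canInvokeWith(Arrays.asList("hello", "world")));
  }
}
```

The point is only that a varargs parameter compiles down to an array parameter, so a List has to be converted before the method can be invoked.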

To Reproduce

  • Version: master branch (5.4-SNAPSHOT).
    Here's a QTT test case that highlights the issue:
    {
      "name": "field passed ARRAY<STRING>",
      "statements": [
        "CREATE STREAM TEST (value STRING) WITH (kafka_topic='test_topic', value_format='JSON');",
        "CREATE STREAM OUTPUT AS SELECT FIELD('hello', SPLIT(value, ',')) as pos FROM TEST;"
      ],
      "inputs": [
        {"topic": "test_topic", "key": 1, "value": {"value": "hello,world"}, "timestamp": 0}
      ],
      "outputs": [
        {"topic": "OUTPUT", "key": 1, "value": {"POS": 1}, "timestamp": 0}
      ]
    }

Expected behavior

This should ideally be either supported, or rejected when issuing the statement.

Actual behaviour

It fails at runtime when processing each row, with the error:

[2019-10-18 15:20:56,549] ERROR {"type":1,"deserializationError":null,"recordProcessingError":{"errorMessage":"Error computing expression FIELD('hello', SPLIT(TEST.VALUE, ',')) for column POS with index 0: Failed to invoke udf public int io.confluent.ksql.function.udf.string.Field.field(java.lang.String,java.lang.String[])","record":null,"cause":["Failed to invoke udf public int io.confluent.ksql.function.udf.string.Field.field(java.lang.String,java.lang.String[])","Couldn't coerce array argument \"args[1]\" to type class [Ljava.lang.String;"]},"productionError":null} (processing.CSAS_OUTPUT_0.Project.PROJECT:44)

Additional context

One solution might be to add a variant of FIELD that accepts STRING, ARRAY<STRING> parameters. However, that's only a partial fix: it only fixes this instance. The same issue exists for any UDF that has varargs. So the true fix is to either coerce the List<String> into a String[], or at the very least detect this during analysis and reject it.
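A rough sketch of the coercion option, in plain Java rather than against the actual KSQL invoker code: before invoking a varargs method, a single List supplied in the varargs position is copied into an array of the parameter's component type. The names VarargsCoercionSketch, coerceVarargs, and invokeField are all hypothetical, and field is again only a stand-in with the FIELD signature.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

public class VarargsCoercionSketch {

  // Hypothetical stand-in with the same signature as the FIELD UDF.
  public static int field(final String str, final String... args) {
    if (str == null || args == null) {
      return 0;
    }
    for (int i = 0; i < args.length; i++) {
      if (str.equals(args[i])) {
        return i + 1; // 1-based position
      }
    }
    return 0;
  }

  // If the method is varargs and the call passes exactly one List in the
  // varargs position, copy that List into an array of the component type.
  // Any other call shape is returned unchanged.
  public static Object[] coerceVarargs(final Method method, final Object... callArgs) {
    final Class<?>[] paramTypes = method.getParameterTypes();
    final int last = paramTypes.length - 1;
    if (!method.isVarArgs()
        || callArgs.length != paramTypes.length
        || !(callArgs[last] instanceof List)) {
      return callArgs;
    }
    final List<?> list = (List<?>) callArgs[last];
    final Object array = Array.newInstance(paramTypes[last].getComponentType(), list.size());
    for (int i = 0; i < list.size(); i++) {
      // Array.set throws IllegalArgumentException if an element has the wrong type.
      Array.set(array, i, list.get(i));
    }
    final Object[] coerced = Arrays.copyOf(callArgs, callArgs.length, Object[].class);
    coerced[last] = array;
    return coerced;
  }

  // Invokes the stand-in field method after applying the coercion.
  public static int invokeField(final Object... callArgs) {
    try {
      final Method m =
          VarargsCoercionSketch.class.getMethod("field", String.class, String[].class);
      return (Integer) m.invoke(null, coerceVarargs(m, callArgs));
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(final String[] args) {
    // Mirrors FIELD('hello', SPLIT('hello,world', ',')) from the test case.
    System.out.println(invokeField("hello", Arrays.asList("hello", "world")));
  }
}
```

This sketch does not handle the ambiguous case where the varargs element type is itself a list; a real fix would need to decide how that call should resolve.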

big-andy-coates added the bug, user-defined-functions, and engine labels Oct 18, 2019
@agavra
Contributor

agavra commented Oct 23, 2019

Just noting that my personal preference is to support this behavior (as opposed to rejecting it). This is how varargs work in most languages.
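For reference, Java itself behaves this way: a varargs method can be called with individual arguments or with a single array of the element type. A tiny illustration, where VarargsCallForms and count are made-up names:

```java
public class VarargsCallForms {

  // Returns how many varargs elements the method received.
  public static int count(final String... args) {
    return args.length;
  }

  public static void main(final String[] unused) {
    // Individual arguments are packed into a String[] by the compiler.
    System.out.println(count("a", "b"));
    // An existing String[] is passed through as the varargs array itself.
    System.out.println(count(new String[] {"a", "b"}));
  }
}
```

Both calls hand the method a two-element array, which is the behavior being proposed for ARRAY arguments here.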

big-andy-coates added this to High priority in Bugs Oct 25, 2019