feat: Added udf/udaf support to the testing tool. #3252

hjafarpour · 2019-08-22T17:58:08Z

Description

This PR adds UDF/UDAF support to the testing tool.
I also removed some unused methods!
It also fixes #3094

Testing done

Test with UDF added.

Reviewer checklist

Ensure docs are updated if necessary (e.g. if a user-visible feature is being added or changed).
Ensure relevant issues are linked (description should include text like "Fixes #").

JimGalasyn

LGTM!

agavra

I think the core of this change looks good, but I'm requesting changes because of the usability concern and the test speed concern.

Also, how does this fix #3094?

agavra · 2019-08-22T21:53:07Z

docs/developer-guide/ksql-testing-tool.rst

+--------------
+
+The KSQL testing tool supports UDFs and UDAFs the same as KSQL. However, unlike KSQL where the location of the jar file that contains the UDF/UDAF code is configured by ``ksql.extension.dir``,
+in the KSQL testing tool, you should put the jar file in the same folder as the test statements file.


This doesn't seem like a very good user experience. I should be able to point you to where the JAR is, for example with a --udf-jar path/to/jar flag or something like that, instead of being forced to place it in a specific location for tests (likely duplicating where I actually store it).

agavra · 2019-08-22T21:55:53Z

...ctional-tests/src/main/java/io/confluent/ksql/test/utils/TestRunnerFunctionRegistryUtil.java

+    new UdfLoader(mutableFR,
+        statementFileObject.getParentFile(),
+        KsqlTestingTool.class.getClassLoader(),
+        value -> false, new UdfCompiler(Optional.empty()), Optional.empty(), true


Suggested change

-        value -> false, new UdfCompiler(Optional.empty()), Optional.empty(), true
+        value -> false,
+        new UdfCompiler(Optional.empty()),
+        Optional.empty(),
+        true

agavra · 2019-08-22T21:57:19Z

ksql-functional-tests/src/main/java/io/confluent/ksql/test/tools/KsqlTestingTool.java

@@ -122,14 +117,16 @@ static void runWithTripleFiles(
        true
    );

+    final File statementFileObject = new File(statementFile);
+    final FunctionRegistry functionRegistry
+        = TestRunnerFunctionRegistryUtil.getFunctionRegistry(statementFileObject);


I think this might slow down our tests (in KsqlTestingToolTest) because we are loading the functions every time. If the test does not supply a custom jar, I think we should fall back to TestFunctionRegistry.INSTANCE.get(). Can you post how long it takes to run the tests before and after this change?
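
For illustration only, the fallback described above could look roughly like the sketch below. `RegistrySelector`, `hasCustomJar` and `select` are hypothetical names that do not exist in the code base, and the registry type is left generic so that the snippet compiles on its own. It assumes a "custom jar" means any `.jar` file sitting next to the statements file, which is how this PR currently locates UDF jars.

```java
import java.io.File;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical helper: picks a function registry for one statements file. */
final class RegistrySelector {

  private RegistrySelector() {
  }

  /** Returns true if the directory holding the statements file contains a .jar file. */
  static boolean hasCustomJar(final File statementsFile) {
    final File dir = statementsFile.getAbsoluteFile().getParentFile();
    if (dir == null) {
      return false;
    }
    final File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
    return jars != null && jars.length > 0;
  }

  /**
   * Uses the shared, already-loaded registry unless a custom jar is present,
   * in which case the (slower) custom loader is invoked for this file.
   */
  static <R> R select(
      final File statementsFile,
      final Supplier<R> sharedRegistry,
      final Function<File, R> customLoader
  ) {
    return hasCustomJar(statementsFile)
        ? customLoader.apply(statementsFile)
        : sharedRegistry.get();
  }
}
```

In this PR's terms the two callbacks would presumably be something like `TestFunctionRegistry.INSTANCE::get` and `TestRunnerFunctionRegistryUtil::getFunctionRegistry`, but that wiring is an assumption and is not shown here.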

agavra · 2019-08-22T21:59:19Z

ksql-functional-tests/src/test/resources/test-runner/correct/udf/statements.sql

@@ -0,0 +1,2 @@
+CREATE STREAM TEST (ID bigint, NAME varchar, VALUE double) WITH (kafka_topic='test_topic', value_format='DELIMITED', key='ID');
+CREATE STREAM S1 as SELECT myudf(name) FROM test;


How was test-udf-1.0-SNAPSHOT.jar created? It would be nice to be able to update this jar if necessary.

big-andy-coates

Hey @hjafarpour

I think a much simpler way to support custom UDFs is just to expect them on the class path. This requires no code changes! And it is pretty standard. It also does not require the user to copy the jar anywhere.

I'm not sure if we supply a bash script for executing the testing tool, but if we don't then we should. Such a tool could have a --udf switch, as @almog suggests, which would handle adding the jar or directory to the class path.

While reviewing this I've noted that it seems the test tool doesn't support operating on a directory structure of tests - is that the case? If so, then we need to fix this urgently, otherwise the performance of operating over many files is going to be terrible. Spinning up a JVM and loading UDFs for each file is going to be a killer!

Also, we should not be checking in jar binaries. Luckily, if we're just loading off the class path there is nothing to do, so no need for the custom UDF jar.

Finally, this PR does not fix the issue of test jars being available in our release (#3094) - this is caused by the testing tool's continuing dependency on test jars.

big-andy-coates · 2019-08-23T13:26:40Z

docs/developer-guide/ksql-testing-tool.rst

+--------------
+
+The KSQL testing tool supports UDFs and UDAFs the same as KSQL. However, unlike KSQL where the location of the jar file that contains the UDF/UDAF code is configured by ``ksql.extension.dir``,
+in the KSQL testing tool, you should put the jar file in the same folder as the test statements file.


(replying to Almog's thread: https://github.com/confluentinc/ksql/pull/3252/files#r316900701)

Totally agree - we should not require the jar in the same folder as the statements file.

We should be able to run the test with a directory structure where test files are in sub-directories, and we shouldn't expect people to copy the jar into multiple places.

I would say we just load it from the class path. There is no need to do anything special if we do that.

I'm not sure we need a --udf-jar path/to/jar flag for the application; the same can be achieved by adding the jar to the class path on the command line. Of course, we could have a wrapper script (we may already!) that takes such a parameter and adds it to the class path.

Side note: Can we currently run the tool against a directory of files or are we expecting users to script up reading a directory of files and invoking per-file? Surely the former... otherwise just the cost of spinning up the JVM for each test is going to be prohibitive!
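
As a rough sketch of what such a wrapper would have to do, the snippet below pulls `--udf-jar` values out of an argument list and appends them to a base class path. `UdfClasspath`, `udfJars` and `withUdfJars` are hypothetical names, and the `--udf-jar` flag is only the one proposed in this thread, not an existing option of the tool.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical wrapper logic: turns --udf-jar arguments into class path entries. */
final class UdfClasspath {

  private UdfClasspath() {
  }

  /** Collects the value of every "--udf-jar <path>" pair in the argument list. */
  static List<String> udfJars(final String[] args) {
    final List<String> jars = new ArrayList<>();
    for (int i = 0; i < args.length - 1; i++) {
      if ("--udf-jar".equals(args[i])) {
        jars.add(args[i + 1]);
        i++; // skip the value that was just consumed
      }
    }
    return jars;
  }

  /** Appends the jars to the base class path using the platform's path separator. */
  static String withUdfJars(final String baseClasspath, final List<String> jars) {
    final List<String> entries = new ArrayList<>();
    entries.add(baseClasspath);
    entries.addAll(jars);
    return String.join(File.pathSeparator, entries);
  }
}
```

The resulting string is what a launcher would pass to `java -cp`; the tool itself would then need no UDF-specific loading code, which is the point made above.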

big-andy-coates · 2019-08-23T13:28:40Z

ksql-functional-tests/src/main/java/io/confluent/ksql/test/tools/KsqlTestingTool.java

@@ -122,14 +117,16 @@ static void runWithTripleFiles(
        true
    );

+    final File statementFileObject = new File(statementFile);
+    final FunctionRegistry functionRegistry
+        = TestRunnerFunctionRegistryUtil.getFunctionRegistry(statementFileObject);


(Replying to Almog's thread: https://github.com/confluentinc/ksql/pull/3252/files#r316902086)

So TestFunctionRegistry.INSTANCE will also be loading UDFs once per test file if the testing tool doesn't support running a directory structure of tests.

If we do as I suggest and require the UDF jar on the class path then TestFunctionRegistry.INSTANCE will already have loaded the custom UDFs.

We certainly shouldn't be loading UDFs twice. This is really slow!

So our fix should not be to mess around with the logic here - it should always use TestFunctionRegistry.INSTANCE - and our fix for performance should be to support running the tool recursively against a directory structure of tests. This can either be with single combined test files, or with separate files that are matched using some naming convention.
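
A minimal sketch of the "separate files matched by naming convention" variant follows. It assumes, purely for illustration, that a test case is any directory containing `statements.sql`, `input.json` and `output.json`; only the first file name appears in this PR, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical discovery step: finds test-case directories under a root, recursively. */
final class TestCaseDiscovery {

  // Assumed naming convention; only statements.sql is taken from this PR.
  static final String STATEMENTS = "statements.sql";
  static final String INPUT = "input.json";
  static final String OUTPUT = "output.json";

  private TestCaseDiscovery() {
  }

  /** Returns every directory under root that holds a complete triple of test files. */
  static List<Path> findTestDirs(final Path root) throws IOException {
    try (Stream<Path> paths = Files.walk(root)) {
      return paths
          .filter(p -> p.getFileName() != null
              && p.getFileName().toString().equals(STATEMENTS))
          .map(Path::getParent)
          .filter(dir -> Files.isRegularFile(dir.resolve(INPUT))
              && Files.isRegularFile(dir.resolve(OUTPUT)))
          .sorted()
          .collect(Collectors.toList());
    }
  }
}
```

A runner could then load the function registry once and loop over the returned directories inside a single JVM, instead of paying the JVM start-up and UDF-loading cost per file.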

big-andy-coates · 2019-08-23T13:29:25Z

...ctional-tests/src/main/java/io/confluent/ksql/test/utils/TestRunnerFunctionRegistryUtil.java

+  }
+
+  public static FunctionRegistry getFunctionRegistry(final File statementFileObject) {
+    Objects.requireNonNull(statementFileObject, "statementFileObject");


nit: unnecessary check

big-andy-coates · 2019-08-23T13:35:22Z

ksql-engine/src/main/java/io/confluent/ksql/function/UdfLoader.java

@@ -119,7 +119,7 @@ public void load() {
  }

  // Does not handle customer udfs, i.e the loader is the ParentClassLoader and path is internal
-  public void loadUdfFromClass(final Class<?> ... udfClass) {
+  void loadUdfFromClass(final Class<?> ... udfClass) {


This method is not actually used (except in tests), so you could just delete it.

vpapavas · 2019-08-26

This method is actually necessary to test UDFs whose loading crashes the UdfLoader. Moreover, I think it is useful to be able to specify a subset of UDFs one wants to load. Currently, there is no way to specify which UDFs to load, since all UDFs on the classpath are loaded.

big-andy-coates · 2019-08-23T13:50:25Z

ksql-engine/src/main/java/io/confluent/ksql/function/UdfLoader.java

@@ -79,7 +79,7 @@


  @SuppressWarnings("OptionalUsedAsFieldOrParameterType")
-  UdfLoader(
+  public UdfLoader(


Rather than make this internal constructor public, it would be better to add a new public constructor with just the bits needed. This internal one requires stuff that no client of the class should need to pass - this constructor is for testing only.

I suggest:

    public UdfLoader(
        final MutableFunctionRegistry functionRegistry,
        final Optional<File> customUdfDirectory,
        final Optional<Metrics> metrics
    ) {
      this(
          functionRegistry,
          customUdfDirectory.orElse(new File("")),
          Thread.currentThread().getContextClassLoader(),
          customUdfDirectory
              .map(pluginDir -> (Predicate<String>) new Blacklist(
                  new File(pluginDir, "resource-blacklist.txt")))
              .orElse(s -> false),
          new UdfCompiler(metrics),
          metrics,
          customUdfDirectory.isPresent()
      );
    }

    @SuppressWarnings("OptionalUsedAsFieldOrParameterType")
    @VisibleForTesting
    UdfLoader(
        final MutableFunctionRegistry functionRegistry,
        final File pluginDir,
        final ClassLoader parentClassLoader,
        final Predicate<String> blacklist,
        final UdfCompiler compiler,
        final Optional<Metrics> metrics,
        final boolean loadCustomerUdfs
    ) {
      ...
    }

This uses the presence of customUdfDirectory to determine whether plugins should be loaded.

The internal use of the constructor can also be switched to use this new constructor, to remove code duplication:

    public static UdfLoader newInstance(
        final KsqlConfig config,
        final MutableFunctionRegistry metaStore,
        final String ksqlInstallDir
    ) {
      final boolean loadCustomerUdfs = config.getBoolean(KsqlConfig.KSQL_ENABLE_UDFS);
      final boolean collectMetrics = config.getBoolean(KsqlConfig.KSQL_COLLECT_UDF_METRICS);
      final String extDirName = config.getString(KsqlConfig.KSQL_EXT_DIR);

      final Optional<File> pluginDir = loadCustomerUdfs
          ? Optional.of(KsqlConfig.DEFAULT_EXT_DIR.equals(extDirName)
              ? new File(ksqlInstallDir, extDirName)
              : new File(extDirName))
          : Optional.empty();

      final Optional<Metrics> metrics = collectMetrics
          ? Optional.of(MetricCollectors.getMetrics())
          : Optional.empty();

      if (config.getBoolean(KsqlConfig.KSQL_UDF_SECURITY_MANAGER_ENABLED)) {
        System.setSecurityManager(ExtensionSecurityManager.INSTANCE);
      }

      return new UdfLoader(metaStore, pluginDir, metrics);
    }

For extra brownie points you can drop the loadCustomerUdfs field completely by switching the pluginDir field to Optional<File>...
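
To make the last point concrete, here is a heavily simplified stand-in (not the real UdfLoader) in which the presence of the directory replaces the boolean flag. `PluginDirSketch` and `describeLoad` are hypothetical, and the returned strings only describe what a real `load()` might do.

```java
import java.io.File;
import java.util.Objects;
import java.util.Optional;

/** Simplified stand-in for UdfLoader: Optional<File> replaces the loadCustomerUdfs flag. */
final class PluginDirSketch {

  // An empty Optional means "do not load custom UDFs"; no separate boolean is needed.
  private final Optional<File> pluginDir;

  PluginDirSketch(final Optional<File> pluginDir) {
    this.pluginDir = Objects.requireNonNull(pluginDir, "pluginDir");
  }

  /** Describes what would be loaded; built-in UDFs are assumed to be loaded in both cases. */
  String describeLoad() {
    return pluginDir
        .map(dir -> "built-in UDFs + custom UDFs from " + dir.getPath())
        .orElse("built-in UDFs only");
  }
}
```

Storing an Optional in a field is a style trade-off (hence the existing OptionalUsedAsFieldOrParameterType suppression), but it removes the possibility of the flag and the directory disagreeing.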

big-andy-coates · 2019-08-23T14:05:06Z

ksql-functional-tests/src/test/resources/test-runner/correct/udf/statements.sql

@@ -0,0 +1,2 @@
+CREATE STREAM TEST (ID bigint, NAME varchar, VALUE double) WITH (kafka_topic='test_topic', value_format='DELIMITED', key='ID');
+CREATE STREAM S1 as SELECT myudf(name) FROM test;


We already have test UDFs in the code base, right? We should either be using those or adding a new test-udf module. We shouldn't be checking in such jars as binaries.

big-andy-coates · 2019-09-18T10:35:39Z

Hey @vpapavas

I talked to @apurvam and we thought you might be able to pick up this PR and see it through...?

big-andy-coates · 2019-09-20T08:44:11Z

Linking to #2804, which is the ticket that #3094 was closed as a duplicate of

holgerbrandl · 2020-02-25T12:31:41Z

UDF support is key to using the ksql-test-runner efficiently. Any news on this subject? Or is there a workaround?

cla-assistant · 2023-11-15T20:46:05Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Added udf/udaf support to th etesting tool.

0e99aeb

hjafarpour requested review from JimGalasyn and a team August 22, 2019 17:58

hjafarpour changed the title ~~Added udf/udaf support to th etesting tool.~~ feat: Added udf/udaf support to th etesting tool. Aug 22, 2019

JimGalasyn approved these changes Aug 22, 2019

agavra suggested changes Aug 22, 2019

agavra requested a review from a team August 22, 2019 22:00

big-andy-coates suggested changes Aug 23, 2019

big-andy-coates assigned vpapavas Sep 18, 2019

big-andy-coates mentioned this pull request Sep 20, 2019

ksql-functional-test module depends on test-jars #2804

Closed
