Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UDF tidy ups #1429

Merged
merged 5 commits into from Jun 14, 2018
Merged

UDF tidy ups #1429

merged 5 commits into from Jun 14, 2018

Conversation

dguy
Copy link
Contributor

@dguy dguy commented Jun 13, 2018

Description

Follow up from #1366

Copy link
Contributor

@apurvam apurvam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dguy, these are some nice improvements. My main comment is around the handling of unmatched types.

@@ -21,6 +21,10 @@
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/**
* Used to signal that a method in a class that has the @UdfDescription annotation
* as a function that can be invoked.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean "is a function that can be invoked"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe: "The {@code Udf} annotation on a method tells KSQL that this method should be exposed as a user-defined function in KSQL. The enclosing class must also be annotated with {@code UdfDescription}."

@@ -51,6 +53,22 @@
.put(List.class, index -> typeConversionCode("List", index))
.build();

// Templates used to generate the UDF code
private static final String genericTemplate =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much nicer!

+ "if(args[#INDEX] == null) arg#INDEX = null;\n"
+ "else if (args[#INDEX] instanceof #TYPE) arg#INDEX = (#TYPE)args[#INDEX];\n"
+ "else if (args[#INDEX] instanceof String) arg#INDEX = "
+ "#TYPE.valueOf((String)args[#INDEX]);\n";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this line basically coercing an unmatched type into a string? Wouldn't it be better to just throw an exception?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay - grokable code!

I'm with @apurvam though - implicit type coercion scares me. Why do we need it? Why not just throw?

If we need this, then at the very least let's wrap this in a try/catch so we can throw a more meaningful error. Also, what if #Type.valueOf(String) does not exist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No - there are no unmatched types as the types need to be in typeConverters map.
For types that are not Map or List (as they are handled differently), it is converting it from a String to the type by using the valueOf() method, i.e., Boolean.valueOf(String), Long.valueOf(String)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i added a test to UdfCompilerTest to show that an exception will be raised if the type isn't one that we accept

this.blackList = builder.toString().equals(EMPTY_BLACKLIST)
? ""
: builder.toString();
this.blackList = Files.readLines(inputFile, Charset.forName(StandardCharsets.UTF_8.name()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have a comment describing how this block is parsing the blacklist file. Specifically, how are strings being transformed with replace and join? The regexes are not straightforward to interpret.

Copy link
Contributor

@hjafarpour hjafarpour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Contributor

@big-andy-coates big-andy-coates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @dguy - looking much better mate. Thanks for making the changes.

Some more comments, suggests and a few concerns below...

.filter(line -> !line.isEmpty())
.filter(line -> !line.startsWith("#"))
.map(line -> line.replaceAll("\\.", "\\\\."))
.collect(Collectors.joining("|", "^(?:",")\\.?.*$"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: formatting - needs space after comma, to make it easier to grok.


private String blackList;
private String blackList = ".*";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: moving ".*" into constant BLACKLIST_ALL would help impart meaning more.


/**
* Used to restrict the classes that can be loaded by user supplied UDFs
*/
public class Blacklist implements Predicate<String> {
private static final Logger logger = LoggerFactory.getLogger(Blacklist.class);
private static final String EMPTY_BLACKLIST = "^(?)\\.?.*$";
private static final String EMPTY_BLACKLIST = "^(?:)\\.?.*$";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split into two constants:

private static final String BLACKLIST_PREFIX = "^(?:";
private static final String BLACKLIST_SUFFIX = ")\\.?.*$";

Then your collect line can be:

.collect(Collectors.joining("|",  BLACKLIST_PREFIX, BLACKLIST_SUFFIX));

And your empty check can be:

if (this.blackList.equals(BLACKLIST_PREFIX + BLACKLIST_SUFFIX))

Which I think is easier to understand.

@@ -60,6 +59,6 @@

@Override
public boolean test(final String resourceName) {
return blackList == null || resourceName.matches(blackList);
return resourceName.matches(blackList);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of interest, why are we using a regex for such simple matching? Can't we just dump each line into a Set and use contains(resourceName)???

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is intended to do partial matches, i.e., the regex produced would be something like:
^(?:java\.lang\.Process|java\.lang\.Runtime|javax)\.?.*$
which will match anything starting with java.lang.Process, java.lang.Runtime, javax

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, gotcha.

+ "if(args[#INDEX] == null) arg#INDEX = null;\n"
+ "else if (args[#INDEX] instanceof #TYPE) arg#INDEX = (#TYPE)args[#INDEX];\n"
+ "else if (args[#INDEX] instanceof String) arg#INDEX = "
+ "#TYPE.valueOf((String)args[#INDEX]);\n";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay - grokable code!

I'm with @apurvam though - implicit type coercion scares me. Why do we need it? Why not just throw?

If we need this, then at the very least let's wrap this in a try/catch so we can throw a more meaningful error. Also, what if #Type.valueOf(String) does not exist?

private static String typeConversionCode(final String type, final int index) {
if (type.equals("Map") || type.equals("List")) {
return type + " arg" + index + " = (" + type + ")args[" + index + "];\n";
}
final String argArrayVal = "args[" + index + "]";
final String argVarAssignment = "arg" + index + " = ";

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numericValue can just be in the code template, and then you can simplify this code:

private static final String INTEGER_NUMBER_TEMPLATE =
      "else if (args[#INDEX] instanceof Number) arg#INDEX = "
          + "((Number)args[#INDEX]).intValue();\n";

private static final String NUMBER_TEMPLATE =
      "else if (args[#INDEX] instanceof Number) arg#INDEX = "
      + "((Number)args[#INDEX]).#LC_TYPEValue();\n";

private static String typeConversionCode(final String type, final int index) {
    if (type.equals("Map") || type.equals("List")) {
      return type + " arg" + index + " = (" + type + ")args[" + index + "];\n";
    }

    final StringBuilder builder = new StringBuilder();
    builder.append(genericTemplate);
    if (type.equals("Integer")) {
      builder.append(INTEGER_NUMBER_TEMPLATE);
    } else if (!type.equals("String") && !type.equals("Boolean")) {
      builder.append(GENERIC_NUMBER_TEMPLATE);
    }
    builder.append(THROWS_TEMPLATE);

    return builder.toString()
        .replaceAll("#TYPE", type)
        .replaceAll("#LC_TYPE", type.toLowerCase())
        .replaceAll("#INDEX", String.valueOf(index));
  }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I'm still wondering if there is scope for this function to generate invalid code. Even if we know today that its only going to be called with a known set of type parameter values, I think it would still be sensible to validate type against a supported list and throw if not supported. This is better then generating code that won't compile.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See line 113

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't really see your code as simplification!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it switching the code to only have one set of replaceAlls, i.e. it generates the whole set of code before doing the replacements, and moves the code generation in:

final String numericValue = type.equals("Integer") ? "intValue()" : type.toLowerCase()
         .append(argVarAssignment)	+          + "Value()";

Into the easy to grok constants.

}));
}

private Object instantiateUdfClass(final Method method,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Static

Copy link
Contributor

@big-andy-coates big-andy-coates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM mate

@dguy dguy merged commit f291a99 into confluentinc:5.0.x Jun 14, 2018
@dguy dguy deleted the udf-load-follow-up branch June 14, 2018 12:44
Copy link
Contributor

@apurvam apurvam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants