New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UDF tidy ups #1429
UDF tidy ups #1429
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dguy, these are some nice improvements. My main comment is around the handling of unmatched types.
@@ -21,6 +21,10 @@ | |||
import java.lang.annotation.RetentionPolicy; | |||
import java.lang.annotation.Target; | |||
|
|||
/** | |||
* Used to signal that a method in a class that has the @UdfDescription annotation | |||
* as a function that can be invoked. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean "is a function that can be invoked"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -51,6 +53,22 @@ | |||
.put(List.class, index -> typeConversionCode("List", index)) | |||
.build(); | |||
|
|||
// Templates used to generate the UDF code | |||
private static final String genericTemplate = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much nicer!
+ "if(args[#INDEX] == null) arg#INDEX = null;\n" | ||
+ "else if (args[#INDEX] instanceof #TYPE) arg#INDEX = (#TYPE)args[#INDEX];\n" | ||
+ "else if (args[#INDEX] instanceof String) arg#INDEX = " | ||
+ "#TYPE.valueOf((String)args[#INDEX]);\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this line basically coercing an unmatched type into a string? Wouldn't it be better to just throw an exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay - grokable code!
I'm with @apurvam though - implicit type coercion scares me. Why do we need it? Why not just throw?
If we need this, then at the very least let's wrap this in a try/catch so we can throw a more meaningful error. Also, what if #Type.valueOf(String)
does not exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No - there are no unmatched types as the types need to be in typeConverters
map.
For types that are not Map
or List
(as they are handled differently), it is converting it from a String to the type by using the valueOf()
method, i.e., Boolean.valueOf(String)
, Long.valueOf(String)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i added a test to UdfCompilerTest
to show that an exception will be raised if the type isn't one that we accept
this.blackList = builder.toString().equals(EMPTY_BLACKLIST) | ||
? "" | ||
: builder.toString(); | ||
this.blackList = Files.readLines(inputFile, Charset.forName(StandardCharsets.UTF_8.name())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have a comment describing how this block is parsing the blacklist file. Specifically, how are strings being transformed with replace and join? The regexes are not straightforward to interpret.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @dguy - looking much better mate. Thanks for making the changes.
Some more comments, suggests and a few concerns below...
.filter(line -> !line.isEmpty()) | ||
.filter(line -> !line.startsWith("#")) | ||
.map(line -> line.replaceAll("\\.", "\\\\.")) | ||
.collect(Collectors.joining("|", "^(?:",")\\.?.*$")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: formatting - needs space after comma, to make it easier to grok.
|
||
private String blackList; | ||
private String blackList = ".*"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: moving ".*" into constant BLACKLIST_ALL
would help impart meaning more.
|
||
/** | ||
* Used to restrict the classes that can be loaded by user supplied UDFs | ||
*/ | ||
public class Blacklist implements Predicate<String> { | ||
private static final Logger logger = LoggerFactory.getLogger(Blacklist.class); | ||
private static final String EMPTY_BLACKLIST = "^(?)\\.?.*$"; | ||
private static final String EMPTY_BLACKLIST = "^(?:)\\.?.*$"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split into two constants:
private static final String BLACKLIST_PREFIX = "^(?:";
private static final String BLACKLIST_SUFFIX = ")\\.?.*$";
Then your collect line can be:
.collect(Collectors.joining("|", BLACKLIST_PREFIX, BLACKLIST_SUFFIX));
And your empty check can be:
if (this.blackList.equals(BLACKLIST_PREFIX + BLACKLIST_SUFFIX))
Which I think is easier to understand.
@@ -60,6 +59,6 @@ | |||
|
|||
@Override | |||
public boolean test(final String resourceName) { | |||
return blackList == null || resourceName.matches(blackList); | |||
return resourceName.matches(blackList); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out of interest, why are we using a regex for such simple matching? Can't we just dump each line into a Set
and use contains(resourceName)
???
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is intended to do partial matches, i.e., the regex produced would be something like:
^(?:java\.lang\.Process|java\.lang\.Runtime|javax)\.?.*$
which will match anything starting with java.lang.Process
, java.lang.Runtime
, javax
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, gotcha.
+ "if(args[#INDEX] == null) arg#INDEX = null;\n" | ||
+ "else if (args[#INDEX] instanceof #TYPE) arg#INDEX = (#TYPE)args[#INDEX];\n" | ||
+ "else if (args[#INDEX] instanceof String) arg#INDEX = " | ||
+ "#TYPE.valueOf((String)args[#INDEX]);\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay - grokable code!
I'm with @apurvam though - implicit type coercion scares me. Why do we need it? Why not just throw?
If we need this, then at the very least let's wrap this in a try/catch so we can throw a more meaningful error. Also, what if #Type.valueOf(String)
does not exist?
private static String typeConversionCode(final String type, final int index) { | ||
if (type.equals("Map") || type.equals("List")) { | ||
return type + " arg" + index + " = (" + type + ")args[" + index + "];\n"; | ||
} | ||
final String argArrayVal = "args[" + index + "]"; | ||
final String argVarAssignment = "arg" + index + " = "; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numericValue
can just be in the code template, and then you can simplify this code:
private static final String INTEGER_NUMBER_TEMPLATE =
"else if (args[#INDEX] instanceof Number) arg#INDEX = "
+ "((Number)args[#INDEX]).intValue();\n";
private static final String NUMBER_TEMPLATE =
"else if (args[#INDEX] instanceof Number) arg#INDEX = "
+ "((Number)args[#INDEX]).#LC_TYPEValue();\n";
private static String typeConversionCode(final String type, final int index) {
if (type.equals("Map") || type.equals("List")) {
return type + " arg" + index + " = (" + type + ")args[" + index + "];\n";
}
final StringBuilder builder = new StringBuilder();
builder.append(genericTemplate);
if (type.equals("Integer")) {
builder.append(INTEGER_NUMBER_TEMPLATE);
} else if (!type.equals("String") && !type.equals("Boolean")) {
builder.append(GENERIC_NUMBER_TEMPLATE);
}
builder.append(THROWS_TEMPLATE);
return builder.toString()
.replaceAll("#TYPE", type)
.replaceAll("#LC_TYPE", type.toLowerCase())
.replaceAll("#INDEX", String.valueOf(index));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, I'm still wondering if there is scope for this function to generate invalid code. Even if we know today that its only going to be called with a known set of type
parameter values, I think it would still be sensible to validate type
against a supported list and throw if not supported. This is better then generating code that won't compile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See line 113
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't really see your code as simplification!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it switching the code to only have one set of replaceAll
s, i.e. it generates the whole set of code before doing the replacements, and moves the code generation in:
final String numericValue = type.equals("Integer") ? "intValue()" : type.toLowerCase()
.append(argVarAssignment) + + "Value()";
Into the easy to grok constants.
})); | ||
} | ||
|
||
private Object instantiateUdfClass(final Method method, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Static
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM mate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Description
Follow up from #1366