Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GenericTransformFunction wrapper for simple ScalarFunctions #5440

Merged
merged 12 commits into from
Jun 3, 2020

Conversation

KKcorps
Copy link
Contributor

@KKcorps KKcorps commented May 25, 2020

Today, adding new TransformFunction is quite involved. Supporting simple static java functions requires one to write a lot of boiler plate code. This PR adds a generic Transform Function that can invoke any of the registered functions in FunctionRegistry.

This PR tests this new feature with StringFunctions.

Copy link
Member

@kishoreg kishoreg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! Change the description to say that this is a generic transform function that can call any scalarfunction that return a single value


@Override
public TransformResultMetadata getResultMetadata() {
return STRING_SV_NO_DICTIONARY_METADATA;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can add another switch statement in init to set this based on the return type of the scalarfunction, see functionInvoker.getReturnType

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For return type just a switch won't work
Different return types will need a different function overrides
e.g. transformToStringValuesSV for string, transformToIntValuesSV for int and so on
Or am I missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, we need to override methods for int, long, double, float. Shouldn't be hard, its mostly copy-paste. similar to _stringResultArray, create a resultArray for each type and instantiate them in the init

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it look better now?

@xiangfu0
Copy link
Contributor

This is great!
Could you add tests in: CalciteSqlCompilerTest to check sql parsing logic.


public class GenericTransformFunctionTest extends BaseTransformFunctionTest {

@Test
Copy link
Contributor

@siddharthteotia siddharthteotia May 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this exercises the compilation path. You should add tests in CalciteSqlCompiler test file as well.

Also, we should add tests for end to end query execution. See TransformQueriesTest class or consider adding execution tests to one of the existing query tests

@KKcorps
Copy link
Contributor Author

KKcorps commented May 26, 2020

@fx19880617 @siddharthteotia Should I add tests in CalciteSQL for all the functions?

@kishoreg kishoreg changed the title Add UDFs for String Transformation Add GenericTransformFunction wrapper for simple ScalarFunctions May 26, 2020
@siddharthteotia
Copy link
Contributor

siddharthteotia commented May 29, 2020

@fx19880617 @siddharthteotia Should I add tests in CalciteSQL for all the functions?

@KKcorps , sorry missed this. Yes, the query compilation tests should be in CalciteSqlCompilerTest. Here we can verify that PinotQuery is being built correctly and that gets converted to BrokerRequest correctly. Most other tests in this file do this validation.

The other suggestion was to also add unit tests for exercising end-to-end execution path. Please consider adding these tests to an appropriate file in /incubator-pinot/pinot-core/src/test/java/org/apache/pinot/queries/. May be TransformQueriesTest

pinot-common/pom.xml Outdated Show resolved Hide resolved
TransformFunction function = arguments.get(i);
if (function instanceof LiteralTransformFunction) {
String literal = ((LiteralTransformFunction) function).getLiteral();
Class paramType = _functionInvoker.getParameterTypes()[i];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest directly comparing class instead of String comparison for both performance and readability:

if (paramType == Integer.class) {
  ...
} else if (paramType == Long.class) {
 ...

Copy link
Contributor Author

@KKcorps KKcorps Jun 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would require changing switch case to a lot of else if since switch doesn't accept Class type. IMO, switch case looks neat so made this trade off

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great feature! LGTM except for the unaddressed comments

@siddharthteotia
Copy link
Contributor

Is the plan to use this wrapper solely for invoking scalar functions (like already done in this PR for StringFunctions) or are we expecting follow-ups to integrate it with rest of the transform functions. I think only the former?

@kishoreg
Copy link
Member

kishoreg commented Jun 3, 2020

@sidd only for scalarfunctions.

@kishoreg kishoreg merged commit f7417ff into apache:master Jun 3, 2020
Jackie-Jiang added a commit that referenced this pull request Jun 10, 2020
The new FunctionRegistry change of using reflection to plug in methods (introduced in #5440) is causing failure in query compilation.
The problem is caused by Reflections library not being thread-safe when multiple threads are accessing the same jar file.
We have multiple libraries (jersey, swagger) using reflection to load classes. The FunctionRegistry uses reflection in the static block, which happens when the class is loaded. If the first query arrives (first usage of FunctionRegistry which runs the static block) during the setup of the jersey server, Reflections will throw exception.
Using reflection in the static block can also cause the first query to block on reflection scanning the classes, which can potentially take seconds.
Read more about the thread-safety issue here: ronmamo/reflections#81
Users are reporting the same issue even with the latest Reflections version 0.9.12.
A common solution is to downgrade Reflections to 0.9.9, but we have other dependencies rely on 0.9.11, so downgrading might cause other issues.

The solution here is to introduce a no-op init() method to trigger the static block so that we can control when to scan the files to avoid the thread-safety issue as well as blocking the first query.
Another benefit of using an init() method is that the exception won't be swallowed (the exception in static block will be swallowed and query engine will start getting ClassNotFoundException)

This PR also introduces a convention for the plugin methods using reflection, where the class must includes ".function." in its class path.
This can significantly reduce the time of class scanning (reduced from 4 seconds to 200 ms locally)
@Jackie-Jiang Jackie-Jiang mentioned this pull request Aug 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants