TAJO-1092: Improve the function system to allow other function implementation types.#178
hyunsik wants to merge 2 commits into apache:master
|
An example of a static method in a Java class is at https://github.com/apache/tajo/pull/178/files#diff-f6daa76b2459470a9f3412131c0f726bR34. I designed the function annotation system to point to a Function Collection, which is a class containing multiple static functions. To add a user-defined or built-in function, just add a function as in the example. It is very easy, and it enables Tajo to reuse existing functions. Besides, SQL is based on three-valued logic (http://en.wikipedia.org/wiki/Three-valued_logic), so every value can be nullable. Even a boolean value can take one of three values: TRUE, FALSE, and UNKNOWN (NULL in SQL). In the current function system, each function must deal with NULL values explicitly. Most functions simply return NULL if at least one parameter is NULL. To mitigate this problem and make function invocation more efficient, I designed a new function binder and a new function definition approach that keep hints about how a function handles NULL values. The hints are expressed through the parameter types in a function definition: you declare each parameter as either a Java primitive type or a class primitive type, according to how it handles NULL. For example, a function definition can allow NULL values for both input parameters; in this case, the function must handle NULL values explicitly. In addition, the function binder allows mixed use of primitive types and class primitive types. When a mixed definition is used, the function binder allows only the class primitive type parameters to handle NULL values explicitly. Finally, the function binder is generated on the fly by Java bytecode generation, and it does not add any overhead even though the logic is very complex. I also expect this idea to significantly reduce the overhead of Datum use in the existing function system. |
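To make the NULL-handling hints above concrete, here is a minimal sketch. It is not the code in this patch. It assumes that "class primitive type" means a wrapper class such as `Integer`, it leaves out the Tajo annotations entirely, and it uses plain reflection where the patch generates bytecode on the fly. All class and method names (`FunctionBindingSketch`, `MathFunctions`, `NullAwareInvoker`, `add`, `addOrZero`, `scale`) are hypothetical.

```java
import java.lang.reflect.Method;

// Demo entry point (hypothetical name). Declared first so the file can also
// be launched directly as a single-file program.
final class FunctionBindingSketch {
    public static void main(String[] args) throws ReflectiveOperationException {
        Method add = MathFunctions.class.getDeclaredMethod("add", int.class, int.class);
        Method scale = MathFunctions.class.getDeclaredMethod("scale", int.class, Integer.class);

        System.out.println(NullAwareInvoker.invoke(add, 1, 2));      // 3
        System.out.println(NullAwareInvoker.invoke(add, 1, null));   // null, add is never called
        System.out.println(NullAwareInvoker.invoke(scale, 3, null)); // null, decided by scale itself
    }
}

// Hypothetical function collection: one class holding several static functions.
final class MathFunctions {
    private MathFunctions() {}

    // Primitive parameters: the function never sees NULL.
    static int add(int a, int b) {
        return a + b;
    }

    // Wrapper parameters: NULL may arrive, so the function handles it explicitly.
    static Integer addOrZero(Integer a, Integer b) {
        return (a == null ? 0 : a) + (b == null ? 0 : b);
    }

    // Mixed definition: only 'value' may be NULL.
    static Integer scale(int factor, Integer value) {
        return value == null ? null : factor * value;
    }
}

// Reflection-based stand-in for the binder (assumption: the real binder is
// generated bytecode, not reflection).
final class NullAwareInvoker {
    private NullAwareInvoker() {}

    static Object invoke(Method fn, Object... args) throws ReflectiveOperationException {
        Class<?>[] types = fn.getParameterTypes();
        if (types.length != args.length) {
            throw new IllegalArgumentException("wrong number of arguments");
        }
        for (int i = 0; i < types.length; i++) {
            // A NULL passed to a primitive parameter short-circuits to NULL.
            if (types[i].isPrimitive() && args[i] == null) {
                return null;
            }
        }
        return fn.invoke(null, args);
    }
}
```

In this sketch the parameter types alone carry the hint: the invoker returns NULL without calling the function when a primitive parameter receives NULL, and it passes NULL through to wrapper parameters so the function can decide what to do.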
|
After this patch is committed, I'll add documentation on how to write Tajo user-defined functions using the proposed design. |
…into TAJO-1092 Conflicts: tajo-core/src/main/java/org/apache/tajo/master/TajoMaster.java
|
rebased. |
|
Looks great to me! |
|
The function support will be added in my next patch. Thank you for your review. |
|
This patch provides backward compatibility, so there is no issue. |
|
Thank you for your review. I'll commit it shortly. |
See https://issues.apache.org/jira/browse/TAJO-1092.
In the current function system, each function implementation is a single Java class subclassed from org.apache.tajo.catalog.function.Function.
This approach leaves much room for improvement. It always uses Datum for the input and output values of functions, which creates unnecessary objects. It is also unlikely to exploit information available in the query statement, for example, whether a parameter is a constant or a variable.
In this issue, I propose improving the function system so that it supports other function implementation types. In addition, I propose three function implementation types.
Later, we could expand this feature to allow Pig or Hive functions in Tajo.