-
Notifications
You must be signed in to change notification settings - Fork 111
Conversation
…into python_udf
…into TAJO-1344
…into TAJO-1344
…into TAJO-1344
…into TAJO-1344
…into TAJO-1344 Conflicts: tajo-core/src/test/java/org/apache/tajo/TajoTestingCluster.java
…into TAJO-1344
…into TAJO-1344
Ok. I think that this patch is ready for review. To address @hyunsik's comment, I added a class called On executing Python scripts, I used the approach of using an external UDF controller that is responsible for executing python scripts as commented above. When a submitted query involves one or more python UDFs, several UDF controllers are executed to compute UDFs. Input/output tuples are transmitted via stdio. This approach may have an issue on performance, but I think it is inevitable without using Jython. Currently, the controller is executed for each Python functions. That is, if a query involves 5 Python functions even some of them are same, at least 5 different controllers are executed during query processing. I chose this architecture due to its simplicity. Here are some highlights of changes.
For reviewers, I apologize for a large patch. But many changes are related to just refactoring of the bind() function and renaming some functions. |
Great!! @jihoonson |
Thank you all guys for your efforts. I also check only some design points. |
…into TAJO-1344_3 Conflicts: tajo-core/src/test/java/org/apache/tajo/engine/query/TestSelectQuery.java
…into TAJO-1344_3 Conflicts: tajo-core/src/main/java/org/apache/tajo/worker/Task.java
* @return | ||
* @throws IOException | ||
*/ | ||
public static Map<FunctionSignature, FunctionDesc> loadOptionalFunctions(TajoConf conf, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, 'user-defined function' would be a better word than 'optional function'
I leaved some trivial comments. The patch looks good to me. It's a great job. Here is my additional comments. You try to change the signature EvalNode::bind to take EvalContext instance in order to get an instance of a launched script engine. Even though this refactoring is already a breaking change. EvalNode still needs more information about task. In this chance, it would be great if we refactor EvalNode to take more meta context including TajoConf, task attempt information, and shared resources of workers. Then, we can probably remove OverridableConf parameter from all constructors of EvalNode. If you are concerned with a large patch, we can do this work in another jira. But, it will cause the second breaking change. It is also not good as much as a large patch. You can choose either two breaking changes or a large patch. It's up to you. |
…into TAJO-1344_3
This reverts commit 43e687d.
Thanks @hyunsik. I've changed the function name and removed Regarding on refactoring bind() function, I think that it would be better to work in another jira. For this issue, we first should decide what information are required to be passed to EvalNode. I booked another jira (https://issues.apache.org/jira/browse/TAJO-1566). |
I got your point and your plan. Please keep going. Here is my +1. |
I've tested on my laptop. |
No, Looks good to me. Here is my +1 |
I found some problems of version confliction when using Jython.
So, I used another approach of using pipe.
Most codes are borrowed from Pig.