[FLINK-29000][python] Support python UDF in the SQL Gateway #21725

Merged
merged 3 commits into from
Jan 19, 2023

Conversation

HuangXingBo
Contributor

What is the purpose of the change

This pull request adds support for Python UDFs in the SQL Gateway.

Brief change log

  • Support Python UDFs in the SQL Gateway

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@flinkbot
Collaborator

flinkbot commented Jan 19, 2023

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@HuangXingBo changed the title from "[FLINK-2900][python] Support python UDF in the SQL Gateway" to "[FLINK-29000][python] Support python UDF in the SQL Gateway" on Jan 19, 2023
Member

@fsk119 fsk119 left a comment

Thanks for your great work. I left some suggestions.

@@ -1097,6 +1097,12 @@ under the License.
<version>${project.version}</version>
<scope>test</scope>
</dependency>
<dependency>
Member

Why add the Python dependency here?

Contributor Author

If we don't add the Python jar as a test dependency, HiveServer2EndpointITCase and HiveServer2EndpointStatementITCase will fail.

        dependencies.add(location);
    }
} catch (URISyntaxException | ClassNotFoundException e) {
    throw new SqlExecutionException("Failed to find flink-python jar.", e);
Member

Please throw SqlGatewayException here instead. In SqlGateway#startSqlGateway, we tell users that the failure is a bug if the exception is not a SqlGatewayException.

Contributor Author

Makes sense.
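The distinction the reviewer describes can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual `SqlGateway#startSqlGateway` code: `GatewayException`, `StartupSketch`, `describeFailure`, and the message texts are all hypothetical stand-ins.

```java
/** Hypothetical stand-in for SqlGatewayException; not the real Flink class. */
class GatewayException extends RuntimeException {
    GatewayException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Sketch of a startup wrapper that treats unknown exception types as bugs. */
class StartupSketch {

    /** Runs the startup action and returns the message a user would see. */
    static String describeFailure(Runnable startup) {
        try {
            startup.run();
            return "started";
        } catch (GatewayException e) {
            // Expected failure type: report the cause directly.
            return "Failed to start: " + e.getMessage();
        } catch (RuntimeException e) {
            // Any other type is reported as unexpected, i.e. a likely bug.
            return "Unexpected exception, this is probably a bug: " + e.getMessage();
        }
    }
}
```

Under this kind of scheme, a missing flink-python jar reported through any other exception type would be shown to users as a bug rather than as a startup failure, which is presumably why the reviewer asks for the gateway's own exception type.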

Comment on lines 62 to 77
final List<URL> dependencies = new ArrayList<>();
// add python dependencies by default
try {
    URL location =
            Class.forName(
                            "org.apache.flink.python.PythonFunctionRunner",
                            false,
                            Thread.currentThread().getContextClassLoader())
                    .getProtectionDomain()
                    .getCodeSource()
                    .getLocation();
    if (Paths.get(location.toURI()).toFile().isFile()) {
        dependencies.add(location);
    }
} catch (URISyntaxException | ClassNotFoundException e) {
    throw new SqlExecutionException("Failed to find flink-python jar.", e);
Member

Can we move this code block to DefaultContext#load? I think we will introduce -l/-j command line parameters in the future.

Contributor Author

With the -j approach, I guess you want to make loading the Python jar optional?

Member

I think we can modify the constructor of DefaultContext so that the dependencies are passed in. During loading, we always try to find the Python dependency and add it to the dependencies.
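One way to read this suggestion is sketched below. It is a simplified illustration, not the merged Flink code: `ContextSketch`, `load`, and `discoverJar` are hypothetical names, and returning an empty list when the class is missing (instead of throwing, as the snippet above does) is an assumption made here to keep the lookup optional. The extra checks for a null code source and a non-`file` URL are also additions of this sketch.

```java
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Paths;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical, simplified stand-in for DefaultContext; not the real Flink class. */
class ContextSketch {
    private final List<URL> dependencies;

    /** The dependencies are passed in through the constructor. */
    ContextSketch(List<URL> dependencies) {
        this.dependencies = Collections.unmodifiableList(new ArrayList<>(dependencies));
    }

    List<URL> getDependencies() {
        return dependencies;
    }

    /** Combines user-supplied dependencies with the jar that holds markerClassName, if any. */
    static ContextSketch load(List<URL> userDependencies, String markerClassName) {
        List<URL> all = new ArrayList<>(userDependencies);
        all.addAll(discoverJar(markerClassName));
        return new ContextSketch(all);
    }

    /** Returns the jar file containing the class, or an empty list if none is found. */
    static List<URL> discoverJar(String className) {
        try {
            CodeSource source =
                    Class.forName(
                                    className,
                                    false,
                                    Thread.currentThread().getContextClassLoader())
                            .getProtectionDomain()
                            .getCodeSource();
            // JDK classes may have no code source; only plain files count as jars here.
            if (source != null
                    && source.getLocation() != null
                    && "file".equals(source.getLocation().getProtocol())
                    && Paths.get(source.getLocation().toURI()).toFile().isFile()) {
                return Collections.singletonList(source.getLocation());
            }
        } catch (URISyntaxException | ClassNotFoundException e) {
            // Assumption of this sketch: a missing jar is not fatal.
        }
        return Collections.emptyList();
    }
}
```

In this sketch a caller would write something like `ContextSketch.load(userJars, "org.apache.flink.python.PythonFunctionRunner")`, so jars supplied later through a command line option and the automatically discovered jar end up in the same list.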

Member

@fsk119 fsk119 left a comment

LGTM.

@fsk119 fsk119 merged commit 7b69e93 into apache:master Jan 19, 2023
chucheng92 pushed a commit to chucheng92/flink that referenced this pull request Feb 3, 2023
…1725)

* [FLINK-29000][python] Support python UDF in the SQL Gateway

* fix comments

* fix checkstyle
akkinenivijay pushed a commit to krisnaru/flink that referenced this pull request Feb 11, 2023
…1725)

* [FLINK-29000][python] Support python UDF in the SQL Gateway

* fix comments

* fix checkstyle
3 participants