[FLINK-17125][python] Add a Usage Notes Page to Answer Common Questions Encountered by PyFlink Users #11878
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
Automated Checks
Last check on commit 06e2423 (Thu Apr 23 08:45:25 UTC 2020)
Warnings:
Mention the bot in a comment to re-run the automated checks.
Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
Bot commands
The @flinkbot bot supports the following commands:
Thank you for working on this; providing answers to these types of questions greatly improves usability for our users. I found some English grammar and spelling issues that need to be resolved. I also proposed a different way of supplying the virtual env bash script.
docs/dev/table/python/usage_notes.md
Outdated
@@ -0,0 +1,120 @@
---
title: "Usage Notes"
What do you think about changing this to "Common Questions"?
- title: "Usage Notes"
+ title: "Common Questions"
Yes, makes sense.
docs/dev/table/python/usage_notes.md
Outdated
## How to Add Jars
A PyFlink job often depends on some jar packages, such as connector jar packages, java udf jar packages and so on.
So how to make PyFlink refer to these jar packages?

You can specify the dependencies with the following Python Table APIs or through command line arguments directly when submitting the job。
- ## How to Add Jars
- A PyFlink job often depends on some jar packages, such as connector jar packages, java udf jar packages and so on.
- So how to make PyFlink refer to these jar packages?
- You can specify the dependencies with the following Python Table APIs or through command line arguments directly when submitting the job。
+ ## How to Add Jars
+ A PyFlink job often depends on some jar packages, such as connector jar packages, java UDF jar packages, and so on.
+ So how to make PyFlink refer to these jar packages?
+ You can specify the dependencies with the following Python Table APIs or through command-line arguments directly when submitting the job.
docs/dev/table/python/usage_notes.md
Outdated
## How to Watch UDF Log
There are different solutions for watching UDF log in local mode and cluster mode.
#### Local
you will see all udf log infos in the console directly.
- you will see all udf log infos in the console directly.
+ When running locally, you will see all UDF logs in the console directly.
docs/dev/table/python/usage_notes.md
Outdated
#### Local
you will see all udf log infos in the console directly.
#### Cluster
you can see the udf log infos in the taskmanager log.
- you can see the udf log infos in the taskmanager log.
+ On cluster deployments, you can see the UDF logs in the taskmanager logs.
docs/dev/table/python/usage_notes.md
Outdated
you can see the udf log infos in the taskmanager log.

## How to Prepare a Python Virtual Env Used by PyFlink
You can refer to the following script to prepare a Python virtual env zip which can be used in mac os and most Linux distributions
Instead of this, why don't we include the script below as a downloadable bash script? You can simply copy it into a file setup-pyflink-virtual-env.sh in the same directory, and then the sentence could be the following:
- You can refer to the following script to prepare a Python virtual env zip which can be used in mac os and most Linux distributions
+ You can download a [convenience script](setup-pyflink-virtual-env.sh) to prepare a Python virtual env zip which can be used on Mac OS and most Linux distributions.
Good idea. Thanks.
docs/dev/table/python/usage_notes.md
Outdated
Firstly, you need to prepare a Python Virtual Env Used by PyFlink (You can refer to the previous section).

Then, you should execute the following script to activate your used Python virtual environment
- Firstly, you need to prepare a Python Virtual Env Used by PyFlink (You can refer to the previous section).
- Then, you should execute the following script to activate your used Python virtual environment
+ To run Python UDFs locally, first, prepare a Python Virtual Env used by PyFlink as described in the previous section.
+ Then, execute the following script to activate your environment.
docs/dev/table/python/usage_notes.md
Outdated
{% endhighlight %}

## How to Run Python UDF in Cluster
There are two ways of preparing a Python environment installed PyFlink which can be used in the Cluster
- There are two ways of preparing a Python environment installed PyFlink which can be used in the Cluster
+ There are two ways of preparing a Python environment installed PyFlink, which can be used in the Cluster.
docs/dev/table/python/usage_notes.md
Outdated
#### Use Uploaded Virtual Environment
You can use `pyarch` command line arg or `add_python_archive` api of table_env to upload Python virtual environment and use `pyexec` command line arg or `set_python_executable` api to specify the path of the python interpreter which is used to execute the python udf workers.

For details about the command line args of `pyarch` and `pyexec`, you can refer to <a href="{{ site.baseurl }}/ops/cli.html#usage">command line arguments.</a>
- For details about the command line args of `pyarch` and `pyexec`, you can refer to <a href="{{ site.baseurl }}/ops/cli.html#usage">command line arguments.</a>
+ For details about the command line args of `pyarch` and `pyexec`, you can refer to [the relevant documentation]({{ site.baseurl }}/ops/cli.html#usage).
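For readers following this thread, here is a minimal sketch of how the `pyarch` and `pyexec` arguments discussed above might be combined on the command line. The archive name `venv.zip`, the interpreter path inside it, and the job file `udf_job.py` are hypothetical placeholders, and the use of `-py` for the entry script is an assumption about how the job is submitted. The function only prints the command so it can be inspected before running it.

```shell
#!/usr/bin/env bash
# Sketch only: venv.zip, venv.zip/venv/bin/python and udf_job.py are hypothetical names.

# Print a `flink run` command that ships a zipped virtual environment with the job.
flink_run_with_venv() {
    local archive="$1"      # zip file uploaded via -pyarch
    local interpreter="$2"  # interpreter path inside the extracted archive, passed to -pyexec
    local job="$3"          # Python entry script, passed to -py (assumed submission style)
    printf 'flink run -pyarch %s -pyexec %s -py %s\n' "$archive" "$interpreter" "$job"
}

flink_run_with_venv venv.zip venv.zip/venv/bin/python udf_job.py
```

Printing the command instead of executing it keeps the sketch safe to run on a machine without a Flink installation.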
Thanks a lot for the review, @sjwiesman. Before you started reviewing, I had made relatively large changes to the content of this document; I put these in the fix commit. Then, based on your comments, I made further changes, including adding the script setup-pyflink-virtual-env.sh; I put these in the fix-comment commit.
b327e09 to e5a64af
e5a64af to fd3c481
source venv/bin/activate

# install PyFlink
pip install apache-flink
Should we allow users to specify the version of pyflink?
Yes, makes sense.
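To illustrate the idea of an optional version argument, here is a minimal sketch. It is not the actual setup-pyflink-virtual-env.sh; the helper name `pyflink_requirement` is hypothetical. With a version argument it produces a pinned pip requirement, and without one it falls back to the unpinned package name so that pip resolves the latest release.

```shell
#!/usr/bin/env bash
# Sketch only: pyflink_requirement is a hypothetical helper, not part of the real script.

# Print the pip requirement for PyFlink; an optional first argument pins the version.
pyflink_requirement() {
    local version="${1:-}"
    if [ -n "$version" ]; then
        # Version given: pin it, e.g. 1.10 -> apache-flink==1.10
        printf 'apache-flink==%s\n' "$version"
    else
        # No version given: leave it unpinned so pip picks the latest release
        printf 'apache-flink\n'
    fi
}

# Inside an activated virtual environment this could be used as:
#   pip install "$(pyflink_requirement "$1")"
pyflink_requirement 1.10
pyflink_requirement
```

Keeping the requirement string in a small function makes the two cases easy to check without touching the network.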
under the License.
-->

This page describes the solutions to some frequently encountered problems for PyFlink users.
- This page describes the solutions to some frequently encountered problems for PyFlink users.
+ This page describes the solutions to some common questions for PyFlink users.
{% highlight shell %}
# you will get a Python virtual environment required by PyFlink version 1.10
setup-pyflink-virtual-env.sh 1.10
This way the version is always up to date.
- setup-pyflink-virtual-env.sh 1.10
+ setup-pyflink-virtual-env.sh {{ site.version }}
I tried it, and {{site.version}} yields something like 1.11-SNAPSHOT, which is not what I want. When no version parameter is passed, my script installs the latest version. Perhaps for the master docs it would be more natural not to pass the version parameter. What do you think?
Or I could show both an example with a specified version and an example without a version.
{% highlight shell %}
# you will get a Python virtual environment required by PyFlink version 1.10
setup-pyflink-virtual-env.sh 1.10
# you will get a Python virtual environment required by PyFlink with the latest version
setup-pyflink-virtual-env.sh
{% endhighlight %}
How about this:
{% if site.is_stable %}
$ setup-pyflink-virtual-env.sh {{ site.version }}
{% else %}
$ setup-pyflink-virtual-env.sh
{% endif %}
It looks better. I will update soon. Thanks.
One final comment and then +1 to merge
Thanks a lot for the review, @sjwiesman and @dianfu. I have addressed the comments in the latest commit.
This looks good to me! Thank you for putting this together. I'll go ahead and merge later today.
Hello @sjwiesman, I think there is a small problem with that commit
Hi @HuangXingBo, very sorry about that. I have fixed the links in the Chinese document and moved the setup script to a common downloads folder so that it works properly on both pages.
Thanks a lot for the fix, @sjwiesman. I have cherry-picked these commits to release-1.10 #11844 and changed the content of
Sure
…ns Encountered by PyFlink Users This closes apache#11878
What is the purpose of the change
This pull request adds a Usage Notes doc to answer common questions encountered by PyFlink users.
Brief change log
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (no)
Documentation