[FLINK-17125][python] Add a Usage Notes Page to Answer Common Questions Encountered by PyFlink Users #11878

Closed
wants to merge 5 commits into from

HuangXingBo
Contributor

What is the purpose of the change

This pull request adds a Usage Notes doc to answer common questions encountered by PyFlink users.

Brief change log

  • Add usage_notes.md and usage_notes.zh.md

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@flinkbot
Collaborator

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 06e2423 (Thu Apr 23 08:45:25 UTC 2020)

Warnings:

  • This pull request references an unassigned Jira ticket. According to the code contribution guide, tickets need to be assigned before starting with the implementation work.

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Collaborator

flinkbot commented Apr 23, 2020

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

Contributor

@sjwiesman sjwiesman left a comment

Thank you for working on this; providing answers to these types of questions greatly improves usability for our users. I found some English grammar and spelling issues that need to be resolved. I also proposed a different way of supplying the virtual env bash script.

@@ -0,0 +1,120 @@
---
title: "Usage Notes"
Contributor

What do you think about changing this to "Common Questions"?

Suggested change
title: "Usage Notes"
title: "Common Questions"

Contributor Author

Yes, makes sense.

Comment on lines 29 to 33
## How to Add Jars
A PyFlink job often depends on some jar packages, such as connector jar packages, java udf jar packages and so on.
So how to make PyFlink refer to these jar packages?

You can specify the dependencies with the following Python Table APIs or through command line arguments directly when submitting the job。
Contributor

Suggested change
## How to Add Jars
A PyFlink job often depends on some jar packages, such as connector jar packages, java udf jar packages and so on.
So how to make PyFlink refer to these jar packages?
You can specify the dependencies with the following Python Table APIs or through command line arguments directly when submitting the job
## How to Add Jars
A PyFlink job often depends on some jar packages, such as connector jar packages, java UDF jar packages, and so on.
So how to make PyFlink refer to these jar packages?
You can specify the dependencies with the following Python Table APIs or through command-line arguments directly when submitting the job.
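
As an illustration of the Table API route this section describes, here is a minimal Python sketch. It assumes the `pipeline.jars` configuration option (a semicolon-separated list of jar URLs) is available in the Flink version in use. The helper name `build_pipeline_jars` and the jar paths are hypothetical, and the PyFlink call appears only as a comment because it needs a PyFlink installation.

```python
from urllib.parse import quote


def build_pipeline_jars(jar_paths):
    """Join absolute jar paths into one semicolon-separated list of file:// URLs."""
    urls = []
    for path in jar_paths:
        if not path.startswith("/"):
            raise ValueError(f"expected an absolute path, got {path!r}")
        # "file://" plus an absolute path yields a file:/// URL.
        urls.append("file://" + quote(path))
    return ";".join(urls)


# Hypothetical jar locations, for illustration only.
jars = build_pipeline_jars([
    "/opt/flink/jars/connector.jar",
    "/opt/flink/jars/java-udf.jar",
])
print(jars)

# Assumed usage with an existing PyFlink TableEnvironment named table_env:
# table_env.get_config().get_configuration().set_string("pipeline.jars", jars)
```

Building the value in a small helper keeps the separator and URL scheme in one place, which is where mistakes tend to creep in when the list is assembled by hand.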

## How to Watch UDF Log
There are different solutions for watching UDF log in local mode and cluster mode.
#### Local
you will see all udf log infos in the console directly.
Contributor

Suggested change
you will see all udf log infos in the console directly.
When running locally, you will see all UDF logs in the console directly.

#### Local
you will see all udf log infos in the console directly.
#### Cluster
you can see the udf log infos in the taskmanager log.
Contributor

Suggested change
you can see the udf log infos in the taskmanager log.
On cluster deployments, you can see the UDF logs in the taskmanager logs.
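
To make the logging discussion concrete, here is a minimal sketch of a function body that writes a log message through Python's standard `logging` module. The function name `add_one` is hypothetical, and the step of registering it as a PyFlink UDF is omitted so that the sketch runs without PyFlink installed.

```python
import logging


def add_one(value):
    """Body of a hypothetical scalar UDF that logs each call."""
    # Standard-library logging call; where the message ends up depends on
    # local vs. cluster mode, as described in the section under review.
    logging.info("add_one called with value=%s", value)
    return value + 1


# Local demonstration: send INFO-level messages to the console.
logging.basicConfig(level=logging.INFO)
print(add_one(41))
```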

you can see the udf log infos in the taskmanager log.

## How to Prepare a Python Virtual Env Used by PyFlink
You can refer to the following script to prepare a Python virtual env zip which can be used in mac os and most Linux distributions
Contributor

Instead of this, why don't we include the below script as a downloadable bash script? You can simply copy the below into a file setup-pyflink-virtual-env.sh in the same directory and then the sentence could be the following:

Suggested change
You can refer to the following script to prepare a Python virtual env zip which can be used in mac os and most Linux distributions
You can download a [convenience script](setup-pyflink-virtual-env.sh) to prepare a Python virtual env zip which can be used on Mac OS and most Linux distributions.

Contributor Author

Good idea. Thanks.

Comment on lines 77 to 79
Firstly, you need to prepare a Python Virtual Env Used by PyFlink (You can refer to the previous section).

Then, you should execute the following script to activate your used Python virtual environment
Contributor

Suggested change
Firstly, you need to prepare a Python Virtual Env Used by PyFlink (You can refer to the previous section).
Then, you should execute the following script to activate your used Python virtual environment
To run Python UDFs locally, first, prepare a Python Virtual Env used by PyFlink as described in the previous section.
Then, execute the following script to activate your environment.

{% endhighlight %}

## How to Run Python UDF in Cluster
There are two ways of preparing a Python environment installed PyFlink which can be used in the Cluster
Contributor

Suggested change
There are two ways of preparing a Python environment installed PyFlink which can be used in the Cluster
There are two ways of preparing a Python environment with PyFlink installed that can be used in the cluster.

#### Use Uploaded Virtual Environment
You can use `pyarch` command line arg or `add_python_archive` api of table_env to upload Python virtual environment and use `pyexec` command line arg or `set_python_executable` api to specify the path of the python interpreter which is used to execute the python udf workers.

For details about the command line args of `pyarch` and `pyexec`, you can refer to <a href="{{ site.baseurl }}/ops/cli.html#usage">command line arguments.</a>
Contributor

Suggested change
For details about the command line args of `pyarch` and `pyexec`, you can refer to <a href="{{ site.baseurl }}/ops/cli.html#usage">command line arguments.</a>
For details about the command line args of `pyarch` and `pyexec`, you can refer to [the relevant documentation]({{ site.baseurl }}/ops/cli.html#usage).
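
The two API calls named in this section can be sketched as follows. This is a minimal illustration, not the documented example: the helper name `configure_uploaded_venv` and the paths are hypothetical, and it assumes `set_python_executable` is reached through `table_env.get_config()`, which the text above does not state explicitly.

```python
def configure_uploaded_venv(table_env, archive_path, interpreter_path):
    """Upload a virtual env archive and point the Python UDF workers at its interpreter."""
    # API counterpart of the `pyarch` command-line argument.
    table_env.add_python_archive(archive_path)
    # API counterpart of the `pyexec` command-line argument
    # (assumed to live on the table config object).
    table_env.get_config().set_python_executable(interpreter_path)


# Assumed usage with an existing PyFlink TableEnvironment named table_env
# and a hypothetical archive venv.zip containing a venv/ directory:
# configure_uploaded_venv(table_env, "venv.zip", "venv.zip/venv/bin/python")
```

The helper takes the table environment as a parameter, so it can be exercised with a stand-in object when PyFlink is not installed.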

@HuangXingBo
Contributor Author

Thanks a lot for the review, @sjwiesman. Before your review, I made relatively large changes to the content of this document; those are in the fix commit. Then, based on your comments, I made further changes, including adding the script setup-pyflink-virtual-env.sh; those are in the fix-comment commit.

source venv/bin/activate

# install PyFlink
pip install apache-flink
Contributor

Should we allow users to specify the version of pyflink?

Contributor Author

Yes, makes sense.
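
One way to picture the optional version argument discussed here is the sketch below. It is a hypothetical illustration, not the contents of the actual setup script: the function name `pip_install_command` is invented, and the exact-pin form `==` is an assumption about how a version might be passed to pip.

```python
def pip_install_command(version=None):
    """Build the pip command for PyFlink, optionally pinned to an exact version."""
    package = "apache-flink"
    if version:
        # Exact pin, e.g. apache-flink==1.10.0 (assumed form).
        package = f"{package}=={version}"
    return ["pip", "install", package]


print(" ".join(pip_install_command()))
print(" ".join(pip_install_command("1.10.0")))
```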

under the License.
-->

This page describes the solutions to some frequently encountered problems for PyFlink users.
Contributor

Suggested change
This page describes the solutions to some frequently encountered problems for PyFlink users.
This page describes the solutions to some common questions for PyFlink users.


{% highlight shell %}
# you will get a Python virtual environment required by PyFlink version 1.10
setup-pyflink-virtual-env.sh 1.10
Contributor

This way the version is always up to date.

Suggested change
setup-pyflink-virtual-env.sh 1.10
setup-pyflink-virtual-env.sh {{ site.version }}

Contributor Author

I tried it, and {{site.version}} resolves to something like 1.11-SNAPSHOT, which is not what I want. My script installs the latest version when no version parameter is passed. Perhaps for the master docs it would be more natural not to pass the version parameter. What do you think?

Contributor Author

Or I could provide both an example with a specified version and one without.

Contributor Author

@HuangXingBo HuangXingBo Apr 30, 2020

{% highlight shell %}
# you will get a Python virtual environment required by PyFlink version 1.10
setup-pyflink-virtual-env.sh 1.10

# you will get a Python virtual environment required by PyFlink with the latest version
setup-pyflink-virtual-env.sh
{% endhighlight %}

Contributor

@sjwiesman sjwiesman Apr 30, 2020

How about this:

{% if site.is_stable %}
$ setup-pyflink-virtual-env.sh {{ site.version }}
{% else %}
$ setup-pyflink-virtual-env.sh
{% endif %}
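
The branching in the template above can be mirrored in plain Python to check what each docs build would display. This is only a sketch of the logic; the function name `render_setup_command` is hypothetical, and its parameters stand in for the `site.version` and `site.is_stable` template variables.

```python
def render_setup_command(site_version, is_stable):
    """Return the command line a docs build would show for the setup script."""
    script = "setup-pyflink-virtual-env.sh"
    # Stable docs pin the released version; snapshot docs omit it so that
    # the script falls back to installing the latest release.
    return f"{script} {site_version}" if is_stable else script


print(render_setup_command("1.10", True))
print(render_setup_command("1.11-SNAPSHOT", False))
```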

Contributor Author

That looks better. I will update soon. Thanks.

@sjwiesman
Contributor

One final comment and then +1 to merge

@HuangXingBo
Contributor Author

Thanks a lot for the reviews, @sjwiesman and @dianfu. I have addressed the comments in the latest commit.

@sjwiesman
Contributor

This looks good to me! Thank you for putting this together. I’ll go ahead and merge later today.

@sjwiesman sjwiesman closed this in fe78fde Apr 30, 2020
@HuangXingBo
Contributor Author

Hello @sjwiesman, I think there is a small problem with the commit [hotfix] Fix formatting on python common questions 9740469:

  1. We probably shouldn't remove the zh in the Chinese document; otherwise, it will link to the English document.
  2. In the Chinese document, we have no way to link directly to the setup-pyflink-virtual-env.sh script using [convenience script](setup-pyflink-virtual-env.sh), so I display the script address from the English document in the Chinese document. Do you have a better solution for this?

@sjwiesman
Contributor

Hi @HuangXingBo,

Very sorry about that. I have fixed the links in the Chinese document and moved the setup script to a common downloads folder so that it works properly on both pages.

@HuangXingBo
Contributor Author

Thanks a lot for the fix, @sjwiesman. I have cherry-picked these commits to release-1.10 #11844 and changed the content of Adding Jar Files, which differs between release-1.10 and master. Could you help review? Thanks.

@sjwiesman
Contributor

Sure

sjwiesman pushed a commit to sjwiesman/flink that referenced this pull request Aug 8, 2020
5 participants