Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-13131] [Deployment / YARN] add a custom directory for storing configuration files and jar files for user jobs #9007

Closed
wants to merge 0 commits into from

Conversation

zhangbinzaifendou
Copy link

The user-related jars and configuration files can only be placed under fileSystem.getHomeDirectory(/home/xxx), which is inflexible and increases the configuration so that users can customize the directory.

@flinkbot
Copy link
Collaborator

flinkbot commented Jul 6, 2019

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 37aee5e (Tue Aug 06 15:59:52 UTC 2019)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!
  • This pull request references an unassigned Jira ticket. According to the code contribution guide, tickets need to be assigned before starting with the implementation work.

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

Copy link
Member

@Myasuka Myasuka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhangbinzaifendou Thanks for your contribution.
However, it did not convince me why we need this change. All files are uploaded to /user/user-name/.flink/appId and controlled totally by Flink itself (including create the folder and delete that folder). IMO this should be enough for current usage, there is no obvious benefit for user to configure this folder to another location. Moreover, if configuring to another location, I think there is no guarantee that we could set permission under not-home directory.

Besides, please refer to open-a-pull-request doc to run mvn clean verify before creating the PR. Flink does not allow star import when executing checkstyle plugin.

@zhangbinzaifendou
Copy link
Author

@Myasuka Thank you very much for your review and Suggestions

  1. In hadoop 2.7.2, if dfs.user.home.dir.prefix=/home is set, but the value returned by getHomeDirectory is /user/xxx, the actual online environment has speciality, so I think flink should provide a flexible configuration. So that the user can set a directory, if not set, use the default (/user/xxx).
  2. I have modified the problem of import *
    Thank you!

@Myasuka
Copy link
Member

Myasuka commented Jul 7, 2019

@zhangbinzaifendou Please correct me if I am wrong, if dfs.user.home.dir.prefix has been configured to another value, e.g. "/home". From Hadoop latest code, DistributedFileSystem would return the home directory as "/home/userxxx" via DFSUtilClient. In other words, there is no need for flink user to configure the staging folder since Hadoop could return the wanted home directory even dfs.user.home.dir.prefix changed.

@zhangbinzaifendou
Copy link
Author

@Myasuka Sorry for not being precise.
The apache community version of hadoop is fine, there is a problem in our hadoop version, I also said that our online environment is very special, for our online environment, so we need to add a configuration that can meet our needs very simply. It also adds flexibility to flink.If the user wants to redefine a new directory, flink can't do it now. So the benefits are obvious.
Thank you!

@flinkbot
Copy link
Collaborator

flinkbot commented Jul 7, 2019

CI report for commit fcd38b1: FAILURE Build

@flinkbot
Copy link
Collaborator

flinkbot commented Jul 7, 2019

CI report for commit 310dd42: FAILURE Build

@flinkbot
Copy link
Collaborator

flinkbot commented Jul 7, 2019

CI report for commit 87442d9: FAILURE Build

@Myasuka
Copy link
Member

Myasuka commented Jul 7, 2019

@zhangbinzaifendou as you explained, the motivation to create this PR is due to your special online environment and your Hadoop. However, this is not the reason that you have to modify the Flink source code. I think the real solution is to modify your special Hadoop which caused this problem.

As an open source software, Flink should adapt to the standard or widely-used ecosystems not some special customized ones. There would be no problems if we follow the correct logic of FileSystem.getHomeDirectory() even dfs.user.home.dir.prefix has been configured. And Flink use FileSystem.getHomeDirectory() to store its internal jars and files which is totally transparent to users. Exposing this internal implementation to users by adding such a configuration, especially user might configure this folder to a folder without permission, would never be a good idea.

@zhangbinzaifendou
Copy link
Author

@Myasuka Thanks for your reply.
Our design is based on the needs of the company, the changes are simple, but the impact is huge. This is obviously very flexible if Flink supports user-defined directories. I believe that many open source software must have a lot of unreasonable design from the beginning, and it is gradually matured based on the continuous improvement of many companies' real scenes. It is because we have such a scenario that we find that Flink design is not flexible enough, so we need to increase this feature, so that it will not affect the original use, but also ensure flexibility, is it better?

@wangyang0918
Copy link
Contributor

Hi @zhangbinzaifendou, thanks for your contribution.

I agree with @Myasuka. Your private hadoop version should be updated to support this scenario. Maybe you could build your own flink-shaded-hadoop uber jar.

@zhangbinzaifendou
Copy link
Author

@wangyang0918 Thank you for your answer.
I'm afraid I don't quite agree with you. User scenarios are very complex, shouldn't there be a flexible configuration? Providing flexible configuration is a principle of good architectural design.

@flinkbot
Copy link
Collaborator

flinkbot commented Jul 9, 2019

CI report for commit 54c43e9: FAILURE Build

@flinkbot
Copy link
Collaborator

CI report for commit 54c43e9: FAILURE Build

@flinkbot
Copy link
Collaborator

CI report for commit 716838a: FAILURE Build

@flinkbot
Copy link
Collaborator

flinkbot commented Jul 22, 2019

CI report:

@zhangbinzaifendou zhangbinzaifendou changed the title [FLINK-13131] [Deployment / YARN] add a custom directory for storing configuration files and jar files for user jobs <a href="https://issues.apache.org/jira/browse/FLINK-13131">[FLINK-13131]</a> [Deployment / YARN] add a custom directory for storing configuration files and jar files for user jobs Apr 16, 2021
@zhangbinzaifendou zhangbinzaifendou changed the title <a href="https://issues.apache.org/jira/browse/FLINK-13131">[FLINK-13131]</a> [Deployment / YARN] add a custom directory for storing configuration files and jar files for user jobs [FLINK-13131] [Deployment / YARN] add a custom directory for storing configuration files and jar files for user jobs Apr 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants