Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-2935] [scala-shell] Allow Scala shell to connect Flink cluster on YARN #1500

Closed
wants to merge 3 commits into from

Conversation

chiwanpark
Copy link
Member

Please check a JIRA issue related to this PR. This PR is tested with Hadoop YARN 2.6.2. It works on Scala 2.10 and 2.11 both.

Because there is duplicated method parseHostPortAddress in CliFrontend and RemoteExecutor classes, I moved the method to a new class ClientUtils. This PR contains also some refactoring of FlinkShell class.

Any comments are welcome.

bin/start-scala-shell.sh yarn
~~~

The shell reads the connection information of the deployed Flink cluster from a `.yarn-properties` file in temporary directory. If there is no deployed Flink cluster on YARN, the shell prints error message.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prints an error message.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the .yarn-properties-USER file_, which is created in the configured yarn.properties-file.location directory or the temporary directory_.

@tillrohrmann
Copy link
Contributor

Thanks for your work @chiwanpark. On the whole your changes look good. I had some minor comments.

I also think that this won't work if the YARN cluster was started with HA and the first master has died. The reason is that the .yarn-properties file is read to connect to the JobManager. But this file is never updated in case of a new leader. However, this is also a problem which will occur with a normal Flink job right now.

@tillrohrmann
Copy link
Contributor

I have to correct myself. This should not be an issue if you start a Yarn cluster in HA and then use bin/flink run ..., because a LeaderRetrievalService will be instantiated which will take care of this.

However, it should be a problem here. The FlinkILoop instantiates a ScalaShellRemoteEnvironment which is a sub class of RemoteEnvironment. Since RemoteEnvironment is called with null for the client configuration, it cannot find a possible ZooKeeper instance where the JobManager is registered in the HA case. I think you have to forward the Configuration object created from flink-conf.yaml to the RemoteEnvironment in order to make it work.

@chiwanpark
Copy link
Member Author

@tillrohrmann Thanks for review! I have addressed your comments.

@chiwanpark
Copy link
Member Author

About the type of number of containers for TaskManagers, it should be optional because the user can the Scala Shell without specifying the number when the user wants to connect already deployed YARN cluster. (bin/start-scala-shell.sh yarn)

The number is required in only case of deploying YARN cluster for the shell.

@aljoscha
Copy link
Contributor

@tillrohrmann do you have any comments on @chiwanpark's updates?


File yarnPropertiesFile = new File(propertiesFileLocation + File.separator + CliFrontend.YARN_PROPERTIES_FILE + currentUser);
// file that we write into the conf/ dir containing the jobManager address and the dop.
File yarnPropertiesFile = new File(getYarnPropertiesLocation(yarnCluster.getFlinkConfiguration()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The yarnPropertiesFile path construction is also done in CliFrontend:173. This should also be changed to use the new getYarnPropertiesLocation method. Maybe moving this method to CliFrontend makes sense.

@tillrohrmann
Copy link
Contributor

Changes look good. I had only one minor comment left. After addressing this comment, I think we can merge the PR.

@chiwanpark
Copy link
Member Author

@tillrohrmann Thanks for review. I'll address the comment in today.

@chiwanpark
Copy link
Member Author

I've addressed @tillrohrmann comment. After running CI, we can merge this.

@chiwanpark
Copy link
Member Author

Failing test seems not related to this PR.

… on YARN

  - Remove duplicated parseHostPortAddress method (Move it to ClientUtils class)
  - Refactor FlinkShell class
@chiwanpark
Copy link
Member Author

Hi all, I would like to merge this. If there is no objection and travis passes, I'll merge this to master in few days.

@chiwanpark
Copy link
Member Author

I just found PR #1741 which contains a lot of changes of YARN classes. I'll wait until merging them.

@mxm
Copy link
Contributor

mxm commented Mar 24, 2016

Thanks @chiwanpark! As far as I can see, there are no overlapping classes between the two pull requests.

@chiwanpark
Copy link
Member Author

Oh, thanks for pointing it @mxm. Then I'll merge this.

@asfgit asfgit closed this in 5108f68 Mar 24, 2016
stefanobaghino pushed a commit to radicalbit/flink that referenced this pull request Apr 14, 2016
… on YARN

  - Remove duplicated parseHostPortAddress method (Move it to ClientUtils class)
  - Refactor FlinkShell class

This closes apache#1500.
alkagin pushed a commit to radicalbit/flink that referenced this pull request Apr 19, 2016
… on YARN

  - Remove duplicated parseHostPortAddress method (Move it to ClientUtils class)
  - Refactor FlinkShell class

This closes apache#1500.
alkagin pushed a commit to radicalbit/flink that referenced this pull request Apr 20, 2016
… on YARN

  - Remove duplicated parseHostPortAddress method (Move it to ClientUtils class)
  - Refactor FlinkShell class

This closes apache#1500.
fijolekProjects pushed a commit to fijolekProjects/flink that referenced this pull request May 1, 2016
… on YARN

  - Remove duplicated parseHostPortAddress method (Move it to ClientUtils class)
  - Refactor FlinkShell class

This closes apache#1500.
stefanobaghino pushed a commit to radicalbit/flink that referenced this pull request May 13, 2016
… on YARN

  - Remove duplicated parseHostPortAddress method (Move it to ClientUtils class)
  - Refactor FlinkShell class

This closes apache#1500.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants