[wip] [FLINK-2288] [FLINK-2302] Setup ZooKeeper for distributed coordination #886
- FLINK-2288: Setup ZooKeeper for distributed coordination
  * Add FlinkZooKeeperQuorumPeer to wrap ZooKeeper's quorum peers with utilities to write required config values (default datadir, myid)
  * Add default conf/zoo.cfg config for ZooKeeper
  * Add startup scripts for ZooKeeper quorum
  * Add conf/masters file for HA masters
- FLINK-2302: Allow multiple instances to run on single host
  * Multiple TaskManager and JobManager instances can run on a single host.
server.Y=addressY:peerPort:leaderPort
</pre>

The script `bin/start-zookeeper-quorum.sh` will start a ZooKeeper server on each of the configured hosts. The started processes start ZooKeeper servers via a Flink wrapper, which reads the configuration from `conf/zoo.cfg` and makes sure to set some rqeuired configuration values for convenience. In production setups, it is recommended to manage your own ZooKeeper installation.
Typo: rqeuired -> required
Looks like a good piece of work! Can we actually get into ZooKeeper version conflicts here? For example, the Kafka connector needs a certain Zookeeper version. Can it conflict with our Zookeeper version?
I will address the comments and merge this...
BTW: It is beautiful how easy it is to start multiple TaskManagers and JobManagers on one machine now :-)
We need to find a solution for the webfrontend, though. Starting it on random ports is not a real solution. Also the user that wants to see progress needs to connect manually to the web server of the leader JobManager.
Did we find a solution for the random port problem?
In HA mode, JobManagers start with a random free port. That is fine, because no one connects to them based on a config value, but only based on ZooKeeper entries.
Yes that makes sense. So the user will always have to connect to the web interface of the leading job manager, right? We could only circumvent that by separating the web interface from the job manager.
The web interface is, modulo some objects which are not serializable,
On Thu, Jul 9, 2015 at 4:24 PM, Max notifications@github.com wrote:
Alright, I've opened a JIRA for this: https://issues.apache.org/jira/browse/FLINK-2340
@tillrohrmann, you can base your changes on this branch. After that we can close this PR. I've added TODOs in TaskManager and JobManager, where you need to integrate your leader election/retrieval service.
From the docs:

Example: Start and stop a local HA-cluster with 2 JobManagers

- Configure ZooKeeper quorum in `conf/flink.yaml`
- Configure masters in `conf/masters`
- Configure ZooKeeper server in `conf/zoo.cfg` (currently it's only possible to run a single ZooKeeper server per machine, because there is a single client port per configuration)
- Start ZooKeeper quorum
- Start an HA-cluster
- Stop ZooKeeper quorum and cluster
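
A minimal sketch of the file-based steps above for a single local machine. Only the `server.Y=addressY:peerPort:leaderPort` line format and the script `bin/start-zookeeper-quorum.sh` come from this PR. Everything else is an assumption: that `conf/masters` lists one host per line (so two `localhost` entries would give two JobManagers), that `clientPort=2181` and the peer/leader ports 2888/3888 (ZooKeeper's conventional values) are suitable, and the remaining script names. The `conf/flink.yaml` keys are left out because they are not shown here. The files are written to a temporary directory rather than a real Flink `conf/`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for Flink's conf/ directory (hypothetical; point this at a real conf/ to try it).
CONF_DIR="$(mktemp -d)"

# Assumed format: one JobManager host per line; two entries for two local JobManagers.
printf 'localhost\nlocalhost\n' > "$CONF_DIR/masters"

# A single ZooKeeper server: there is one client port per configuration,
# so only one server can run per machine.
cat > "$CONF_DIR/zoo.cfg" <<'EOF'
clientPort=2181
server.1=localhost:2888:3888
EOF

echo "wrote $CONF_DIR/masters and $CONF_DIR/zoo.cfg"

# With the files in a real conf/ directory, the start/stop sequence would then be
# (script names other than start-zookeeper-quorum.sh are assumptions):
#   bin/start-zookeeper-quorum.sh
#   bin/start-cluster.sh
#   bin/stop-cluster.sh
#   bin/stop-zookeeper-quorum.sh
```

The start and stop commands are kept as comments so the sketch can be run without a Flink installation; it only shows the shape of the two files.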