New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-20357][docs] Split HA documentation up into a general overview and the specific implementations #14254
Conversation
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Automated ChecksLast check on commit 9f8f123 (Fri Nov 27 17:55:04 UTC 2020) ✅no warnings Mention the bot in a comment to re-run the automated checks. Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tillrohrmann for creating this ticket. The separation of HA documentation makes sense to me. I just left some minor comments and please have a look.
docs/deployment/ha/kubernetes_ha.md
Outdated
<pre>high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory</pre> | ||
|
||
- **Storage directory** (required): | ||
JobManager metadata is persisted in the file system [`high-availability.storageDir`]({% link deployment/config.md %}#high-availability-storagedir) and only a pointer to this state is stored in ZooKeeper. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JobManager metadata is persisted in the file system [`high-availability.storageDir`]({% link deployment/config.md %}#high-availability-storagedir) and only a pointer to this state is stored in ZooKeeper. | |
JobManager metadata is persisted in the file system [`high-availability.storageDir`]({% link deployment/config.md %}#high-availability-storagedir) and only a pointer to this state is stored in Kubernetes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch :-)
docs/deployment/ha/kubernetes_ha.md
Outdated
- **Cluster id** (required): | ||
In order to identify the Flink cluster, you have to specify a [`kubernetes.cluster-id`]({% link deployment/config.md %}#kubernetes-cluster-id). | ||
|
||
<pre>kubernetes.cluster-id: Cluster1337</pre> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The kubernetes.cluster-id
should only contain lower case alphanumeric characters, -
or .
. It is a Kubernetes limitation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good information. I guess we should add this information to the KubernetesConfigOptions.CLUSTER_ID
.
docs/deployment/ha/kubernetes_ha.md
Outdated
@@ -2,7 +2,7 @@ | |||
title: "Kubernetes HA Services" | |||
nav-title: Kubernetes HA Services | |||
nav-parent_id: ha | |||
nav-pos: 2 | |||
nav-pos: 6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I am not sure about why we have the nav-pos
6 for kubernetes_ha.md
and 5 for zookeeper_ha.md
. Does 2
and 1
also make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not make a difference since the relative positioning wrt zookeeper_ha.md
is important. I'll update it in order to avoid future confusion, though.
|
||
### How to configure Kubernetes HA Services | ||
|
||
Both session and job/application clusters support using the Kubernetes high availability service. Users just need to add the following Flink config options to [flink-configuration-configmap.yaml]({% link deployment/resource-providers/standalone/kubernetes.md %}#common-cluster-resource-definitions). All other yamls do not need to be updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both session and job/application clusters support using the Kubernetes high availability service. Users just need to add the following Flink config options to [flink-configuration-configmap.yaml]({% link deployment/resource-providers/standalone/kubernetes.md %}#common-cluster-resource-definitions). All other yamls do not need to be updated. | |
Both session and job/application clusters support using the Kubernetes high availability service. Users just need to add the following Flink config options to [flink-configuration-configmap.yaml](#common-cluster-resource-definitions). All other yamls do not need to be updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I'll update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for splitting the HA configuration pages and reworking them!
I had some minor cleanups and questions.
docs/deployment/ha/zookeeper_ha.md
Outdated
|
||
3. **Configure ZooKeeper server** in `conf/zoo.cfg` (currently it's only possible to run a single ZooKeeper server per machine): | ||
For more information on Flink configuration for Kerberos security, please see [here]({% link deployment/config.md %}). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For more information on Flink configuration for Kerberos security, please see [here]({% link deployment/config.md %}). | |
For more information on Flink configuration for Kerberos security, please refer to the [security section of the Flink configuration page]({% link deployment/config.md %}#security). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Will update it.
docs/deployment/ha/zookeeper_ha.md
Outdated
|
||
3. **Configure ZooKeeper server** in `conf/zoo.cfg` (currently it's only possible to run a single ZooKeeper server per machine): | ||
For more information on Flink configuration for Kerberos security, please see [here]({% link deployment/config.md %}). | ||
You can also find [here]({% link deployment/security/security-kerberos.md %}) further details on how Flink internally setups Kerberos-based security. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also find [here]({% link deployment/security/security-kerberos.md %}) further details on how Flink internally setups Kerberos-based security. | |
You can also find further details on [how Flink sets up Kerberos-based security internally]({% link deployment/security/security-kerberos.md %}). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update it.
@@ -23,77 +23,50 @@ specific language governing permissions and limitations | |||
under the License. | |||
--> | |||
|
|||
## Kubernetes Cluster High Availability | |||
Kubernetes high availability service could support both [standalone Flink on Kubernetes]({% link deployment/resource-providers/standalone/kubernetes.md %}) and [native Kubernetes integration]({% link deployment/resource-providers/native_kubernetes.md %}). | |||
Flink's Kubernetes HA services use [Kubernetes](https://kubernetes.io/) for high availability services. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any restrictions on the K8s versions supported? Or on the required K8s features? (I guess the answer is that we only need ConfigMaps?)
I'm asking, so that users can evaluate if it also works with implementations such as https://k3s.io/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need @wangyang0918 to answer this question here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ConfigMap and resource version is supported at the very begging(much lower that 1.9). I believe that very few users are still using the K8s with such version lower than 1.9 since the latest stable version is now 1.19.
BTW, the native K8s integration require Kubernetes 1.9 or above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess to answer whether https://k3s.io/ works, one needs to try it out. I wouldn't block the PR on this, though. Thanks for the answer @wangyang0918.
docs/deployment/ha/zookeeper_ha.md
Outdated
- **high-availability mode** (required): The *high-availability mode* has to be set in `conf/flink-conf.yaml` to *zookeeper* in order to enable high availability mode. | ||
Alternatively this option can be set to FQN of factory class Flink should use to create HighAvailabilityServices instance. | ||
- **high-availability mode** (required): | ||
The `high-availability` option has to be set to *zookeeper*. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The `high-availability` option has to be set to *zookeeper*. | |
The `high-availability` option has to be set to `zookeeper`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
docs/deployment/ha/zookeeper_ha.md
Outdated
you have to manually configure separate cluster-ids for each cluster. | ||
**Important**: | ||
You should not set this value manually when running on YARN, native Kubernetes or on another cluster manager. | ||
In those cases a cluster-id is automatically being generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In those cases a cluster-id is automatically being generated. | |
In those cases a cluster-id is being automatically generated. |
I'm not sure about this one.
"automatically being generated" has 6k google hits, "being automatically generated" 48k ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
being automatically generated sounds better to me.
|
||
|
||
<pre>high-availability.zookeeper.quorum: address1:2181[,...],addressX:2181</pre> | ||
|
||
Each *addressX:port* refers to a ZooKeeper server, which is reachable by Flink at the given address and port. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each *addressX:port* refers to a ZooKeeper server, which is reachable by Flink at the given address and port. | |
Each `addressX:port` refers to a ZooKeeper server, which is reachable by Flink at the given address and port. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea.
9f8f123
to
db5f475
Compare
Thanks for your review @rmetzger and @wangyang0918. I've addressed your comments and resolved the merge conflict. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for refactoring the documentation. 👍 Only minor things popped up during my review.
|
||
### Example: Standalone Cluster with 2 JobManagers | ||
|
||
1. **Configure high availability mode and ZooKeeper quorum** in `conf/flink-conf.yaml`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1. **Configure high availability mode and ZooKeeper quorum** in `conf/flink-conf.yaml`: | |
1. **Configure high availability mode and ZooKeeper quorum** in `${FLINK_HOME}/conf/flink-conf.yaml`: |
What about the proposal of adding a ${*_HOME}
variable to paths to have a clearer pointer to the actual file/script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good point to discuss for the general documentation overhaul. If we decide to do it, then these things need to be updated.
<pre> | ||
high-availability: zookeeper | ||
high-availability.zookeeper.quorum: localhost:2181 | ||
high-availability.zookeeper.path.root: /flink | ||
high-availability.cluster-id: /cluster_one # important: customize per cluster | ||
high-availability.storageDir: hdfs:///flink/recovery</pre> | ||
|
||
2. **Configure masters** in `conf/masters`: | ||
|
||
<pre> | ||
localhost:8081 | ||
localhost:8082</pre> | ||
|
||
3. **Configure ZooKeeper server** in `conf/zoo.cfg` (currently it's only possible to run a single ZooKeeper server per machine): | ||
|
||
<pre>server.0=localhost:2888:3888</pre> | ||
|
||
4. **Start ZooKeeper quorum**: | ||
|
||
<pre> | ||
$ bin/start-zookeeper-quorum.sh | ||
Starting zookeeper daemon on host localhost.</pre> | ||
|
||
5. **Start an HA-cluster**: | ||
|
||
<pre> | ||
$ bin/start-cluster.sh | ||
Starting HA cluster with 2 masters and 1 peers in ZooKeeper quorum. | ||
Starting standalonesession daemon on host localhost. | ||
Starting standalonesession daemon on host localhost. | ||
Starting taskexecutor daemon on host localhost.</pre> | ||
|
||
6. **Stop ZooKeeper quorum and cluster**: | ||
|
||
<pre> | ||
$ bin/stop-cluster.sh | ||
Stopping taskexecutor daemon (pid: 7647) on localhost. | ||
Stopping standalonesession daemon (pid: 7495) on host localhost. | ||
Stopping standalonesession daemon (pid: 7349) on host localhost. | ||
$ bin/stop-zookeeper-quorum.sh | ||
Stopping zookeeper daemon (pid: 7101) on host localhost.</pre> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's also
{% highlight bash %}
# ...
{% endhighlight %}
which might be the more appropriate syntax to create code blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. I won't change it though because I only added it to not lose this information. It should be properly reformatted once the standalone resource provider documentation is written.
… and the specific implementations This commit splits the HA documentation up into a general overview and the specific implementations: * ZooKeeper HA services * Kubernetes HA services Moreover, this commit moves resource-provider specific documentation to the respective resource-provider documentation.
…ster-id Only lowercase alphanumeric characters, "-" or "." are allowed.
db5f475
to
7b14840
Compare
Thanks for the review @XComp. I've addressed most of your comments. |
Thanks for the review @wangyang0918, @XComp and @rmetzger. Merging this PR now. |
… and the specific implementations This commit splits the HA documentation up into a general overview and the specific implementations: * ZooKeeper HA services * Kubernetes HA services Moreover, this commit moves resource-provider specific documentation to the respective resource-provider documentation. This closes #14254.
This commit splits the HA documentation up into a general overview and the specific implementations:
Moreover, this commit moves resource-provider specific documentation to the respective resource-provider
documentation. This is done in order to not lose this information and it should be properly incorporated once the resource-provider documentation is updated.
cc @rmetzger, @XComp, @wangyang0918