From c624fe34119d11cca7ec29c08e482c1ba0498671 Mon Sep 17 00:00:00 2001
From: "Tzu-Li (Gordon) Tai"
Date: Fri, 20 Jan 2017 16:05:45 +0100
Subject: [PATCH] [FLINK-5581] [doc] Improve user accessibility for
 Kerberos-related documentation

---
 docs/dev/connectors/kafka.md                  | 21 +++++
 docs/ops/security-kerberos.md                 | 52 +++---------
 docs/setup/config.md                          | 85 ++++++++++---------
 docs/setup/jobmanager_high_availability.md    | 14 +++
 flink-dist/src/main/resources/flink-conf.yaml | 19 +++--
 5 files changed, 107 insertions(+), 84 deletions(-)

diff --git a/docs/dev/connectors/kafka.md b/docs/dev/connectors/kafka.md
index cc51071883f64..6a58b7a4260e3 100644
--- a/docs/dev/connectors/kafka.md
+++ b/docs/dev/connectors/kafka.md
@@ -353,3 +353,24 @@ The offsets committed to ZK or the broker can also be used to track the read pro
 the committed offset and the most recent offset in each partition is called the *consumer lag*. If the Flink topology is consuming
 the data slower from the topic than new data is added, the lag will increase and the consumer will fall behind.
 For large production deployments we recommend monitoring that metric to avoid increasing latency.
+
+### Enabling Kerberos Authentication (for versions 0.9 and above only)
+
+Flink provides first-class support through the Kafka connector for authenticating to a Kafka installation
+configured for Kerberos. Simply configure Flink in `flink-conf.yaml` to enable Kerberos authentication for Kafka as follows:
+
+1. Configure Kerberos credentials by setting the following:
+   - `security.kerberos.login.use-ticket-cache`: By default, this is `true` and Flink will attempt to use Kerberos credentials in ticket caches managed by `kinit`.
+     Note that when using the Kafka connector in Flink jobs deployed on YARN, Kerberos authorization using ticket caches will not work. This is also the case when deploying via Mesos, as authorization using ticket caches is not supported for Mesos deployments.
+   - `security.kerberos.login.keytab` and `security.kerberos.login.principal`: To use Kerberos keytabs instead, set values for both of these properties.
+
+2. Append `KafkaClient` to `security.kerberos.login.contexts`: This tells Flink to provide the configured Kerberos credentials to the Kafka login context, to be used for Kafka authentication.
+
+Once Kerberos-based Flink security is enabled, you can authenticate to Kafka with either the Flink Kafka Consumer or Producer by simply including the following two settings in the properties configuration that is passed to the internal Kafka client:
+
+- Set `security.protocol` to `SASL_PLAINTEXT` (default `NONE`): The protocol used to communicate with Kafka brokers.
+When using a standalone Flink deployment, you can also use `SASL_SSL`; please see how to configure the Kafka client for SSL [here](https://kafka.apache.org/documentation/#security_configclients).
+- Set `sasl.kerberos.service.name` to `kafka` (default `kafka`): The value for this should match the `sasl.kerberos.service.name` used for the Kafka broker configuration. A mismatch in service name between client and server configuration will cause the authentication to fail.
+
+For more information on Flink configuration for Kerberos security, please see [here]({{ site.baseurl }}/setup/config.html).
+You can also find [here]({{ site.baseurl }}/ops/security-kerberos.html) further details on how Flink internally sets up Kerberos-based security.
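+
+To make the above concrete, here is a minimal sketch of passing these two settings to a consumer
+(the broker address, group id, and topic are placeholders, and `env` is assumed to be an existing
+`StreamExecutionEnvironment`):
+
+{% highlight java %}
+Properties properties = new Properties();
+properties.setProperty("bootstrap.servers", "broker1.example.com:9092"); // placeholder broker address
+properties.setProperty("group.id", "test-group");                        // placeholder consumer group
+// Kerberos-related settings forwarded to the internal Kafka client:
+properties.setProperty("security.protocol", "SASL_PLAINTEXT");
+properties.setProperty("sasl.kerberos.service.name", "kafka");
+
+DataStream<String> stream = env.addSource(
+    new FlinkKafkaConsumer09<>("my-topic", new SimpleStringSchema(), properties));
+{% endhighlight %}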
diff --git a/docs/ops/security-kerberos.md b/docs/ops/security-kerberos.md
index 2afe7601cb885..3e5cad937e8ee 100644
--- a/docs/ops/security-kerberos.md
+++ b/docs/ops/security-kerberos.md
@@ -28,6 +28,7 @@ filesystems, connectors, and state backends.
 ## Objective
 The primary goals of the Flink Kerberos security infrastructure are:
+
 1. to enable secure data access for jobs within a cluster via connectors (e.g. Kafka)
 2. to authenticate to ZooKeeper (if configured to use SASL)
 3. to authenticate to Hadoop components (e.g. HDFS, HBase)
@@ -36,14 +37,14 @@ In a production deployment scenario, streaming jobs are understood to run for lo
 data sources throughout the life of the job. Kerberos keytabs do not expire in that timeframe, unlike a Hadoop delegation token
 or ticket cache entry.
 
-The current implementation supports running Flink clusters (Job Manager/Task Manager/jobs) with either a configured keytab credential
+The current implementation supports running Flink clusters (JobManager / TaskManager / jobs) with either a configured keytab credential
 or with Hadoop delegation tokens. Keep in mind that all jobs share the credential configured for a given cluster. To use a different keytab
 for a certain job, simply launch a separate Flink cluster with a different configuration. Numerous Flink clusters may run side-by-side in a YARN
 or Mesos environment.
 
 ## How Flink Security works
 In concept, a Flink program may use first- or third-party connectors (Kafka, HDFS, Cassandra, Flume, Kinesis, etc.) necessitating arbitrary authentication methods (Kerberos, SSL/TLS, username/password, etc.). While satisfying the security requirements for all connectors is an ongoing effort,
-Flink provides first-class support for Kerberos authentication only. The following services and connectors are tested for Kerberos authentication:
+Flink provides first-class support for Kerberos authentication only. The following services and connectors are supported for Kerberos authentication:
 
 - Kafka (0.9+)
 - HDFS
@@ -55,7 +56,7 @@ Hadoop security without necessitating the use of Kerberos for ZooKeeper, or vice
 Kerberos credentials, which are then explicitly used by each component.
 
 The internal architecture is based on security modules (implementing `org.apache.flink.runtime.security.modules.SecurityModule`) which
-are installed at startup. The next section describes each security module.
+are installed at startup. The following sections describe each security module.
 
 ### Hadoop Security Module
 This module uses the Hadoop `UserGroupInformation` (UGI) class to establish a process-wide *login user* context. The login user is
@@ -75,51 +76,22 @@ dynamic entries provided by this module.
 This module configures certain process-wide ZooKeeper security-related settings, namely the ZooKeeper service name (default: `zookeeper`)
 and the JAAS login context name (default: `Client`).
 
-## Security Configuration
-
-### Flink Configuration
-The user's Kerberos ticket cache (managed with `kinit`) is used automatically, based on the following configuration option:
-
-- `security.kerberos.login.use-ticket-cache`: Indicates whether to read from the user's Kerberos ticket cache (default: `true`).
-
-A Kerberos keytab can be supplied by adding the below configuration elements to the Flink configuration file:
-
-- `security.kerberos.login.keytab`: Absolute path to a Kerberos keytab file that contains the user credentials.
-
-- `security.kerberos.login.principal`: Kerberos principal name associated with the keytab.
-
-These configuration options establish a cluster-wide credential to be used in a Hadoop and/or JAAS context. Whether the credential is used in a Hadoop context is based on the Hadoop configuration (see next section). To be used in a JAAS context, the configuration specifies which JAAS *login contexts* (or *applications*) are enabled with the following configuration option:
-
-- `security.kerberos.login.contexts`: A comma-separated list of login contexts to provide the Kerberos credentials to (for example, `Client` to use the credentials for ZooKeeper authentication).
-
-ZooKeeper-related configuration overrides:
-
-- `zookeeper.sasl.service-name`: The Kerberos service name that the ZooKeeper cluster is configured to use (default: `zookeeper`). Facilitates mutual authentication between the client (Flink) and server.
-
-- `zookeeper.sasl.login-context-name`: The JAAS login context name that the ZooKeeper client uses to request the login context (default: `Client`). Should match
-one of the values specified in `security.kerberos.login.contexts`.
-
-### Hadoop Configuration
-
-The Hadoop configuration is located via the `HADOOP_CONF_DIR` environment variable and by other means (see `org.apache.flink.api.java.hadoop.mapred.utils.HadoopUtils`). The Kerberos credential (configured above) is used automatically if Hadoop security is enabled.
-
-Note that Kerberos credentials found in the ticket cache aren't transferable to other hosts. In this scenario, the Flink CLI acquires Hadoop
-delegation tokens (for HDFS and for HBase).
-
 ## Deployment Modes
 Here is some information specific to each deployment mode.
 
 ### Standalone Mode
 
 Steps to run a secure Flink cluster in standalone/cluster mode:
-1. Add security-related configuration options to the Flink configuration file (on all cluster nodes).
+
+1. Add security-related configuration options to the Flink configuration file (on all cluster nodes) (see [here]({{ site.baseurl }}/setup/config.html#kerberos-based-security) and the example sketch further below).
 2. Ensure that the keytab file exists at the path indicated by `security.kerberos.login.keytab` on all cluster nodes.
 3. Deploy Flink cluster as normal.
 
 ### YARN/Mesos Mode
 
 Steps to run a secure Flink cluster in YARN/Mesos mode:
-1. Add security-related configuration options to the Flink configuration file on the client.
+
+1. Add security-related configuration options to the Flink configuration file on the client (see [here]({{ site.baseurl }}/setup/config.html#kerberos-based-security) and the example sketch further below).
 2. Ensure that the keytab file exists at the path indicated by `security.kerberos.login.keytab` on the client node.
 3. Deploy Flink cluster as normal.
 
@@ -130,15 +102,17 @@ For more information, see
 $ bin/yarn-session.sh -n 2
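+
+As a sketch of the security-related options referenced in step 1 of both deployment modes above
+(the keytab path and principal are placeholders), the relevant `flink-conf.yaml` entries might look like:
+
+{% highlight yaml %}
+security.kerberos.login.use-ticket-cache: false
+security.kerberos.login.keytab: /path/to/kerberos/keytab
+security.kerberos.login.principal: flink-user
+security.kerberos.login.contexts: Client,KafkaClient
+{% endhighlight %}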
diff --git a/docs/setup/jobmanager_high_availability.md b/docs/setup/jobmanager_high_availability.md
+## Configuring for ZooKeeper Security
+
+If ZooKeeper is running in secure mode with Kerberos, you can override the following configurations in `flink-conf.yaml` as necessary:
+
+<pre>
+zookeeper.sasl.service-name: zookeeper     # default is "zookeeper". If the ZooKeeper quorum is configured
+                                           # with a different service name then it can be supplied here.
+zookeeper.sasl.login-context-name: Client  # default is "Client". The value needs to match one of the values
+                                           # configured in "security.kerberos.login.contexts".
+</pre>
+
+For more information on Flink configuration for Kerberos security, please see [here]({{ site.baseurl }}/setup/config.html).
+You can also find [here]({{ site.baseurl }}/ops/security-kerberos.html) further details on how Flink internally sets up Kerberos-based security.
+
 ## Bootstrap ZooKeeper
 
 If you don't have a running ZooKeeper installation, you can use the helper scripts, which ship with Flink.

diff --git a/flink-dist/src/main/resources/flink-conf.yaml b/flink-dist/src/main/resources/flink-conf.yaml
index f759db64bee50..00ea47a93186b 100644
--- a/flink-dist/src/main/resources/flink-conf.yaml
+++ b/flink-dist/src/main/resources/flink-conf.yaml
@@ -89,7 +89,7 @@ jobmanager.web.port: 8081
 #
 # Supported backends: jobmanager, filesystem, rocksdb,
 #
-#state.backend: filesystem
+# state.backend: filesystem
 
 
 # Directory for storing checkpoints in a Flink-supported filesystem
@@ -169,11 +169,16 @@ jobmanager.web.port: 8081
 # 3. make the credentials available to various JAAS login contexts
 # 4. configure the connector to use JAAS/SASL
 
-#security.kerberos.login.keytab: /path/to/kerberos/keytab
-#security.kerberos.login.principal: flink-user
-#security.kerberos.login.use-ticket-cache: true
+# The configurations below define how Kerberos credentials are provided. A keytab will be used instead of
+# a ticket cache if the keytab path and principal are set.
 
-#security.kerberos.login.contexts: Client,KafkaClient
+# security.kerberos.login.use-ticket-cache: true
+# security.kerberos.login.keytab: /path/to/kerberos/keytab
+# security.kerberos.login.principal: flink-user
+
+# The configuration below defines which JAAS login contexts the Kerberos credentials should be provided to.
+
+# security.kerberos.login.contexts: Client,KafkaClient
 
 #==============================================================================
 # ZK Security Configuration (optional configuration)
@@ -182,5 +187,7 @@ jobmanager.web.port: 8081
 # Below configurations are applicable if ZK ensemble is configured for security
 
 # Override below configuration to provide custom ZK service name if configured
-#
 # zookeeper.sasl.service-name: zookeeper
+
+# The configuration below must match one of the values set in "security.kerberos.login.contexts"
+# zookeeper.sasl.login-context-name: Client
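+
+# As a sketch, a consistent Kerberos + ZooKeeper SASL setup would uncomment the entries above with
+# aligned values, e.g. (the keytab path and principal are placeholders):
+#
+#   security.kerberos.login.keytab: /path/to/kerberos/keytab
+#   security.kerberos.login.principal: flink-user
+#   security.kerberos.login.contexts: Client,KafkaClient
+#   zookeeper.sasl.login-context-name: Client    # matches the "Client" login context above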