From 9ce9e5895f205aa6148cc58fefcd8905b6609f46 Mon Sep 17 00:00:00 2001 From: Wei-Chiu Chuang Date: Fri, 15 May 2026 08:57:58 -0700 Subject: [PATCH 1/2] Document HttpFS behind a load balancer with Kerberos. Add a Performance guide for LB hostname SPNEGO principals and keytabs, with links from the Kerberos and HttpFS user pages (docs and versioned). Co-authored-by: Cursor --- .../01-client-interfaces/05-httpfs.md | 4 + .../03-security/02-kerberos.md | 4 + .../04-performance/10-httpfs-load-balancer.md | 99 +++++++++++++++++++ .../01-client-interfaces/05-httpfs.md | 4 + .../03-security/02-kerberos.md | 4 + .../04-performance/10-httpfs-load-balancer.md | 99 +++++++++++++++++++ 6 files changed, 214 insertions(+) create mode 100644 docs/05-administrator-guide/02-configuration/04-performance/10-httpfs-load-balancer.md create mode 100644 versioned_docs/version-2.1.0/05-administrator-guide/02-configuration/04-performance/10-httpfs-load-balancer.md diff --git a/docs/04-user-guide/01-client-interfaces/05-httpfs.md b/docs/04-user-guide/01-client-interfaces/05-httpfs.md index 915c348095..e12c19f2d7 100644 --- a/docs/04-user-guide/01-client-interfaces/05-httpfs.md +++ b/docs/04-user-guide/01-client-interfaces/05-httpfs.md @@ -18,6 +18,10 @@ HttpFS can be used to access data in Ozone using HTTP utilities (such as curl an The **WebHDFS** client FileSystem implementation can be used to access HttpFS using the Ozone filesystem command line tool (`ozone fs`) as well as from Java applications using the Hadoop FileSystem Java API. +:::note +If HttpFS is fronted by a **load balancer** and you use Kerberos (SPNEGO), clients obtain tickets for the load balancer’s hostname. Configure the HttpFS HTTP principal and keytab accordingly; see [HttpFS behind a load balancer (Kerberos)](../../administrator-guide/configuration/performance/httpfs-load-balancer). 
+::: + ## Getting started To try it out, follow the [instructions](../../quick-start/installation/docker) to start the Ozone cluster with Docker Compose. diff --git a/docs/05-administrator-guide/02-configuration/03-security/02-kerberos.md b/docs/05-administrator-guide/02-configuration/03-security/02-kerberos.md index 17a533756f..76a588cef7 100644 --- a/docs/05-administrator-guide/02-configuration/03-security/02-kerberos.md +++ b/docs/05-administrator-guide/02-configuration/03-security/02-kerberos.md @@ -63,6 +63,10 @@ The HttpFS gateway offers an HDFS-compatible REST API (`webhdfs`). It requires K | `httpfs.hadoop.authentication.kerberos.principal` | The Kerberos principal used by HttpFS to connect to the HDFS NameNode (Ozone Manager). e.g., `${user.name}/${httpfs.hostname}@${kerberos.realm}`. | | `httpfs.hadoop.authentication.kerberos.keytab` | The Kerberos keytab file for the principal used to connect to the HDFS NameNode (Ozone Manager). e.g., `${user.home}/httpfs.keytab`. | +:::note +For HttpFS placed behind a **load balancer** with Kerberos (SPNEGO), the HTTP principal must match the **load balancer hostname** that clients use, not only the backend HttpFS host. See [HttpFS behind a load balancer (Kerberos)](../performance/httpfs-load-balancer). +::: + ## Recon Server Recon provides monitoring and management capabilities and can be secured using Kerberos authentication for its web UI and REST endpoints. 
diff --git a/docs/05-administrator-guide/02-configuration/04-performance/10-httpfs-load-balancer.md b/docs/05-administrator-guide/02-configuration/04-performance/10-httpfs-load-balancer.md new file mode 100644 index 0000000000..2eccaa0b21 --- /dev/null +++ b/docs/05-administrator-guide/02-configuration/04-performance/10-httpfs-load-balancer.md @@ -0,0 +1,99 @@ +--- +sidebar_label: HttpFS load balancer (Kerberos) +--- + +# HttpFS Gateway behind a load balancer (Kerberos) + +Clients that use HTTP SPNEGO (negotiate authentication) obtain a Kerberos service ticket for the **hostname they connect to**. If HttpFS is reachable only through a load balancer, that hostname is the load balancer’s DNS name—not the individual HttpFS gateway hosts. Requests then fail with **401 Authentication required** unless each HttpFS instance is configured to use the **same HTTP service principal** as the one clients expect for the load balancer. + +Direct access to a single HttpFS host often works because the default principal matches that host (for example `HTTP/@REALM`). The same WebHDFS calls through the load balancer fail until the client-facing principal and keytab are aligned with the load balancer name. + +:::caution +Running HttpFS behind a load balancer with Kerberos is a valid pattern (similar to HDFS HttpFS), but **your organization should validate** the full stack—clients, TLS, load balancer forwarding, and delegation tokens—in its own environment. Treat this page as operational guidance, not a product certification matrix. +::: + +## Prerequisites + +- A stable DNS name for the load balancer that clients use in URLs: ``. +- Cluster Kerberos realm: `` (for example `EXAMPLE.COM`). +- The OS user that runs the HttpFS process (often `hdfs`) must be able to read the keytab file you deploy. + +## Kerberos principal and keytab + +1. Create (or use) a service principal in the standard HTTP SPNEGO form: + + `HTTP/@` + +2. 
Export a keytab that contains this principal and distribute it to **each** host that runs the Ozone HttpFS Gateway. Use the **same filesystem path** on every host so one configuration snippet applies everywhere. Example path (your environment may differ): + + `/var/lib/hadoop-ozone/keytabs/httpfs.keytab` + +3. When exporting the keytab, avoid creating a new random key version unexpectedly (for example on MIT Kerberos use `kadmin`’s `ktadd` with **`-norandkey`** when adding an existing key to the keytab so the KDC key version stays consistent with what clients and other hosts expect, per your runbook). + +## Configuration (`httpfs-site.xml`) + +Set the **client-facing** HTTP Kerberos principal and keytab to the load balancer principal and the deployed keytab. Prefer the canonical Hadoop property names: + +| Property | Value | +| -------- | ----- | +| `hadoop.http.authentication.kerberos.principal` | `HTTP/@` | +| `hadoop.http.authentication.kerberos.keytab` | Path to the keytab on the host (for example `/var/lib/hadoop-ozone/keytabs/httpfs.keytab`) | + +Example fragment: + +```xml +<property> + <name>hadoop.http.authentication.kerberos.principal</name> + <value>HTTP/@</value> +</property> +<property> + <name>hadoop.http.authentication.kerberos.keytab</name> + <value>/var/lib/hadoop-ozone/keytabs/httpfs.keytab</value> +</property> +``` + +The older names `httpfs.authentication.kerberos.principal` and `httpfs.authentication.kerberos.keytab` are **deprecated** aliases for the same settings; prefer `hadoop.http.authentication.kerberos.*` for new configuration. See the [configuration appendix](../appendix) for reference. + +Apply these settings in `httpfs-site.xml` on **every** HttpFS Gateway host, deploy the keytab at the same path on each host, then restart HttpFS. + +The **internal** HttpFS-to-Ozone Manager authentication (`httpfs.hadoop.authentication.*`) is separate; this page only changes what clients use when they talk to the **HTTP** endpoint in front of the load balancer. 
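The Configuration section of the new page asks for the same two client-facing properties on every HttpFS host. As an illustrative sketch that is not part of this patch, the check below parses a Hadoop-style configuration file and reports whether those two properties are set as expected. The function names, the sample hostname `httpfs-lb.example.com`, and the realm `EXAMPLE.COM` are hypothetical placeholders; only the two property names and the keytab path come from the page.

```python
import xml.etree.ElementTree as ET

PRINCIPAL_KEY = "hadoop.http.authentication.kerberos.principal"
KEYTAB_KEY = "hadoop.http.authentication.kerberos.keytab"

# Hypothetical sample: the hostname and realm are placeholders, not values from the patch.
SAMPLE_HTTPFS_SITE = """
<configuration>
  <property>
    <name>hadoop.http.authentication.kerberos.principal</name>
    <value>HTTP/httpfs-lb.example.com@EXAMPLE.COM</value>
  </property>
  <property>
    <name>hadoop.http.authentication.kerberos.keytab</name>
    <value>/var/lib/hadoop-ozone/keytabs/httpfs.keytab</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return a {name: value} dict built from Hadoop-style <property> elements."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_client_facing_auth(props, lb_principal):
    """List problems with the client-facing SPNEGO settings (empty list: none found)."""
    problems = []
    if props.get(PRINCIPAL_KEY) != lb_principal:
        problems.append(
            f"{PRINCIPAL_KEY} is {props.get(PRINCIPAL_KEY)!r}, expected {lb_principal!r}"
        )
    if not props.get(KEYTAB_KEY):
        problems.append(f"{KEYTAB_KEY} is not set")
    return problems


props = read_properties(SAMPLE_HTTPFS_SITE)
print(check_client_facing_auth(props, "HTTP/httpfs-lb.example.com@EXAMPLE.COM"))
```

Running a check of this kind against each host's `httpfs-site.xml` is one way to catch a backend that still carries a host-specific principal. It only inspects the file; it does not verify that the keytab actually contains the principal or is readable by the HttpFS user.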
+ +## Multiple HttpFS instances behind one load balancer + +If more than one HttpFS gateway sits behind the same VIP, they should behave consistently for HTTP authentication cookies. Configure a **shared** signature secret so `hadoop-auth` cookies validate on any instance. See `hadoop.http.authentication.signature.secret.file` in the [configuration appendix](../appendix). + +## TLS and clients + +- Prefer **HTTPS** between clients and the load balancer when using Kerberos/SPNEGO. Some clients (including `curl`) do not follow authentication handshakes correctly when redirects move from HTTPS to HTTP, which can break SPNEGO. +- Terminating TLS on the load balancer and using HTTP or HTTPS to the HttpFS backends is common; ensure your load balancer forwards headers and protocols in a way compatible with your HttpFS and TLS settings. + +## Common mistakes + +- **Wrong principal**: Principal still references the backend host while clients use the load balancer DNS name. +- **Keytab not readable** by the user running HttpFS (for example permissions or SELinux contexts). +- **Realm mapping**: The load balancer hostname must resolve to the correct realm in **`krb5.conf`** (for example `[domain_realm]`). If the client cannot obtain a service ticket for `HTTP/@`, enable trace logging (below) and verify KDC and DNS SRV records if you use them. +- **Key version mismatch** after re-exporting keytabs without coordinating `kvno` across hosts. + +## Debugging + +- Run a client with `export KRB5_TRACE=/dev/stdout` to trace ticket requests and failures. +- Compare behavior **directly** to an HttpFS backend versus **through** the load balancer to isolate hostname and principal mismatches. +- Check HttpFS and load balancer access logs for repeated 401 responses and failed `Negotiate` exchanges. 
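The hostname and principal mismatch that the Common mistakes and Debugging sections describe can be made concrete with a small Python sketch. It is illustrative only and not part of this patch. It assumes the client builds the service principal directly from the hostname in the URL; real Kerberos clients may first canonicalize the hostname through DNS, depending on their `krb5.conf` settings. The function names, hostnames, port, and realm are hypothetical.

```python
from urllib.parse import urlsplit


def expected_spnego_principal(url, realm):
    """Return the HTTP service principal a SPNEGO client would request for this URL.

    Simplification: uses the URL hostname as-is, with no DNS canonicalization.
    """
    host = urlsplit(url).hostname
    if not host:
        raise ValueError(f"no hostname in URL: {url!r}")
    return f"HTTP/{host}@{realm}"


def diagnose(url, realm, configured_principal):
    """Compare what the client requests with what HttpFS is configured to accept."""
    expected = expected_spnego_principal(url, realm)
    if expected == configured_principal:
        return "match"
    return (
        f"mismatch: client requests {expected}, "
        f"HttpFS is configured with {configured_principal}"
    )


# Hypothetical load balancer URL and backend principal.
lb_url = "https://httpfs-lb.example.com:14000/webhdfs/v1/?op=LISTSTATUS"
print(diagnose(lb_url, "EXAMPLE.COM", "HTTP/httpfs-1.example.com@EXAMPLE.COM"))
print(diagnose(lb_url, "EXAMPLE.COM", "HTTP/httpfs-lb.example.com@EXAMPLE.COM"))
```

The first call models the failing case, where a backend keeps its own host principal and requests through the load balancer return 401. The second models the aligned configuration the page recommends.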
+ +## How SPNEGO sees the load balancer + +```mermaid +sequenceDiagram + participant Client + participant LB as LoadBalancer + participant HttpFS as HttpFS_Gateway + Client->>LB: HTTPS WebHDFS SPNEGO + Note over Client: Service ticket for HTTP_lbHost_REALM + LB->>HttpFS: Forward request + HttpFS->>Client: SPNEGO completes using LB principal in keytab +``` + +## See also + +- [Configuring Kerberos](../security/kerberos#httpfs-gateway) — HttpFS-related Kerberos properties overview +- [HttpFS Gateway](../../../user-guide/client-interfaces/httpfs) — REST API introduction and examples diff --git a/versioned_docs/version-2.1.0/04-user-guide/01-client-interfaces/05-httpfs.md b/versioned_docs/version-2.1.0/04-user-guide/01-client-interfaces/05-httpfs.md index 915c348095..e12c19f2d7 100644 --- a/versioned_docs/version-2.1.0/04-user-guide/01-client-interfaces/05-httpfs.md +++ b/versioned_docs/version-2.1.0/04-user-guide/01-client-interfaces/05-httpfs.md @@ -18,6 +18,10 @@ HttpFS can be used to access data in Ozone using HTTP utilities (such as curl an The **WebHDFS** client FileSystem implementation can be used to access HttpFS using the Ozone filesystem command line tool (`ozone fs`) as well as from Java applications using the Hadoop FileSystem Java API. +:::note +If HttpFS is fronted by a **load balancer** and you use Kerberos (SPNEGO), clients obtain tickets for the load balancer’s hostname. Configure the HttpFS HTTP principal and keytab accordingly; see [HttpFS behind a load balancer (Kerberos)](../../administrator-guide/configuration/performance/httpfs-load-balancer). +::: + ## Getting started To try it out, follow the [instructions](../../quick-start/installation/docker) to start the Ozone cluster with Docker Compose. 
diff --git a/versioned_docs/version-2.1.0/05-administrator-guide/02-configuration/03-security/02-kerberos.md b/versioned_docs/version-2.1.0/05-administrator-guide/02-configuration/03-security/02-kerberos.md index 553b5096f6..38a40ee9f9 100644 --- a/versioned_docs/version-2.1.0/05-administrator-guide/02-configuration/03-security/02-kerberos.md +++ b/versioned_docs/version-2.1.0/05-administrator-guide/02-configuration/03-security/02-kerberos.md @@ -63,6 +63,10 @@ The HttpFS gateway offers an HDFS-compatible REST API (`webhdfs`). It requires K | `httpfs.hadoop.authentication.kerberos.principal` | The Kerberos principal used by HttpFS to connect to the HDFS NameNode (Ozone Manager). e.g., `${user.name}/${httpfs.hostname}@${kerberos.realm}`. | | `httpfs.hadoop.authentication.kerberos.keytab` | The Kerberos keytab file for the principal used to connect to the HDFS NameNode (Ozone Manager). e.g., `${user.home}/httpfs.keytab`. | +:::note +For HttpFS placed behind a **load balancer** with Kerberos (SPNEGO), the HTTP principal must match the **load balancer hostname** that clients use, not only the backend HttpFS host. See [HttpFS behind a load balancer (Kerberos)](../performance/httpfs-load-balancer). +::: + ## Recon Server Recon provides monitoring and management capabilities and can be secured using Kerberos authentication for its web UI and REST endpoints. 
diff --git a/versioned_docs/version-2.1.0/05-administrator-guide/02-configuration/04-performance/10-httpfs-load-balancer.md b/versioned_docs/version-2.1.0/05-administrator-guide/02-configuration/04-performance/10-httpfs-load-balancer.md new file mode 100644 index 0000000000..2eccaa0b21 --- /dev/null +++ b/versioned_docs/version-2.1.0/05-administrator-guide/02-configuration/04-performance/10-httpfs-load-balancer.md @@ -0,0 +1,99 @@ +--- +sidebar_label: HttpFS load balancer (Kerberos) +--- + +# HttpFS Gateway behind a load balancer (Kerberos) + +Clients that use HTTP SPNEGO (negotiate authentication) obtain a Kerberos service ticket for the **hostname they connect to**. If HttpFS is reachable only through a load balancer, that hostname is the load balancer’s DNS name—not the individual HttpFS gateway hosts. Requests then fail with **401 Authentication required** unless each HttpFS instance is configured to use the **same HTTP service principal** as the one clients expect for the load balancer. + +Direct access to a single HttpFS host often works because the default principal matches that host (for example `HTTP/@REALM`). The same WebHDFS calls through the load balancer fail until the client-facing principal and keytab are aligned with the load balancer name. + +:::caution +Running HttpFS behind a load balancer with Kerberos is a valid pattern (similar to HDFS HttpFS), but **your organization should validate** the full stack—clients, TLS, load balancer forwarding, and delegation tokens—in its own environment. Treat this page as operational guidance, not a product certification matrix. +::: + +## Prerequisites + +- A stable DNS name for the load balancer that clients use in URLs: ``. +- Cluster Kerberos realm: `` (for example `EXAMPLE.COM`). +- The OS user that runs the HttpFS process (often `hdfs`) must be able to read the keytab file you deploy. + +## Kerberos principal and keytab + +1. 
Create (or use) a service principal in the standard HTTP SPNEGO form: + + `HTTP/@` + +2. Export a keytab that contains this principal and distribute it to **each** host that runs the Ozone HttpFS Gateway. Use the **same filesystem path** on every host so one configuration snippet applies everywhere. Example path (your environment may differ): + + `/var/lib/hadoop-ozone/keytabs/httpfs.keytab` + +3. When exporting the keytab, avoid creating a new random key version unexpectedly (for example on MIT Kerberos use `kadmin`’s `ktadd` with **`-norandkey`** when adding an existing key to the keytab so the KDC key version stays consistent with what clients and other hosts expect, per your runbook). + +## Configuration (`httpfs-site.xml`) + +Set the **client-facing** HTTP Kerberos principal and keytab to the load balancer principal and the deployed keytab. Prefer the canonical Hadoop property names: + +| Property | Value | +| -------- | ----- | +| `hadoop.http.authentication.kerberos.principal` | `HTTP/@` | +| `hadoop.http.authentication.kerberos.keytab` | Path to the keytab on the host (for example `/var/lib/hadoop-ozone/keytabs/httpfs.keytab`) | + +Example fragment: + +```xml +<property> + <name>hadoop.http.authentication.kerberos.principal</name> + <value>HTTP/@</value> +</property> +<property> + <name>hadoop.http.authentication.kerberos.keytab</name> + <value>/var/lib/hadoop-ozone/keytabs/httpfs.keytab</value> +</property> +``` + +The older names `httpfs.authentication.kerberos.principal` and `httpfs.authentication.kerberos.keytab` are **deprecated** aliases for the same settings; prefer `hadoop.http.authentication.kerberos.*` for new configuration. See the [configuration appendix](../appendix) for reference. + +Apply these settings in `httpfs-site.xml` on **every** HttpFS Gateway host, deploy the keytab at the same path on each host, then restart HttpFS. 
+ +The **internal** HttpFS-to-Ozone Manager authentication (`httpfs.hadoop.authentication.*`) is separate; this page only changes what clients use when they talk to the **HTTP** endpoint in front of the load balancer. + +## Multiple HttpFS instances behind one load balancer + +If more than one HttpFS gateway sits behind the same VIP, they should behave consistently for HTTP authentication cookies. Configure a **shared** signature secret so `hadoop-auth` cookies validate on any instance. See `hadoop.http.authentication.signature.secret.file` in the [configuration appendix](../appendix). + +## TLS and clients + +- Prefer **HTTPS** between clients and the load balancer when using Kerberos/SPNEGO. Some clients (including `curl`) do not follow authentication handshakes correctly when redirects move from HTTPS to HTTP, which can break SPNEGO. +- Terminating TLS on the load balancer and using HTTP or HTTPS to the HttpFS backends is common; ensure your load balancer forwards headers and protocols in a way compatible with your HttpFS and TLS settings. + +## Common mistakes + +- **Wrong principal**: Principal still references the backend host while clients use the load balancer DNS name. +- **Keytab not readable** by the user running HttpFS (for example permissions or SELinux contexts). +- **Realm mapping**: The load balancer hostname must resolve to the correct realm in **`krb5.conf`** (for example `[domain_realm]`). If the client cannot obtain a service ticket for `HTTP/@`, enable trace logging (below) and verify KDC and DNS SRV records if you use them. +- **Key version mismatch** after re-exporting keytabs without coordinating `kvno` across hosts. + +## Debugging + +- Run a client with `export KRB5_TRACE=/dev/stdout` to trace ticket requests and failures. +- Compare behavior **directly** to an HttpFS backend versus **through** the load balancer to isolate hostname and principal mismatches. 
+- Check HttpFS and load balancer access logs for repeated 401 responses and failed `Negotiate` exchanges. + +## How SPNEGO sees the load balancer + +```mermaid +sequenceDiagram + participant Client + participant LB as LoadBalancer + participant HttpFS as HttpFS_Gateway + Client->>LB: HTTPS WebHDFS SPNEGO + Note over Client: Service ticket for HTTP_lbHost_REALM + LB->>HttpFS: Forward request + HttpFS->>Client: SPNEGO completes using LB principal in keytab +``` + +## See also + +- [Configuring Kerberos](../security/kerberos#httpfs-gateway) — HttpFS-related Kerberos properties overview +- [HttpFS Gateway](../../../user-guide/client-interfaces/httpfs) — REST API introduction and examples From 41493ed732eb0273e651141ea709b302566cae41 Mon Sep 17 00:00:00 2001 From: Wei-Chiu Chuang Date: Fri, 15 May 2026 09:09:37 -0700 Subject: [PATCH 2/2] Reword HttpFS LB doc for cspell (avoid "runbook"). Co-authored-by: Cursor --- .../02-configuration/04-performance/10-httpfs-load-balancer.md | 2 +- .../02-configuration/04-performance/10-httpfs-load-balancer.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/05-administrator-guide/02-configuration/04-performance/10-httpfs-load-balancer.md b/docs/05-administrator-guide/02-configuration/04-performance/10-httpfs-load-balancer.md index 2eccaa0b21..bc32356d8e 100644 --- a/docs/05-administrator-guide/02-configuration/04-performance/10-httpfs-load-balancer.md +++ b/docs/05-administrator-guide/02-configuration/04-performance/10-httpfs-load-balancer.md @@ -28,7 +28,7 @@ Running HttpFS behind a load balancer with Kerberos is a valid pattern (similar `/var/lib/hadoop-ozone/keytabs/httpfs.keytab` -3. When exporting the keytab, avoid creating a new random key version unexpectedly (for example on MIT Kerberos use `kadmin`’s `ktadd` with **`-norandkey`** when adding an existing key to the keytab so the KDC key version stays consistent with what clients and other hosts expect, per your runbook). +3. 
When exporting the keytab, avoid creating a new random key version unexpectedly (for example on MIT Kerberos use `kadmin`’s `ktadd` with **`-norandkey`** when adding an existing key to the keytab so the KDC key version stays consistent with what clients and other hosts expect, per your operational procedures). ## Configuration (`httpfs-site.xml`) diff --git a/versioned_docs/version-2.1.0/05-administrator-guide/02-configuration/04-performance/10-httpfs-load-balancer.md b/versioned_docs/version-2.1.0/05-administrator-guide/02-configuration/04-performance/10-httpfs-load-balancer.md index 2eccaa0b21..bc32356d8e 100644 --- a/versioned_docs/version-2.1.0/05-administrator-guide/02-configuration/04-performance/10-httpfs-load-balancer.md +++ b/versioned_docs/version-2.1.0/05-administrator-guide/02-configuration/04-performance/10-httpfs-load-balancer.md @@ -28,7 +28,7 @@ Running HttpFS behind a load balancer with Kerberos is a valid pattern (similar `/var/lib/hadoop-ozone/keytabs/httpfs.keytab` -3. When exporting the keytab, avoid creating a new random key version unexpectedly (for example on MIT Kerberos use `kadmin`’s `ktadd` with **`-norandkey`** when adding an existing key to the keytab so the KDC key version stays consistent with what clients and other hosts expect, per your runbook). +3. When exporting the keytab, avoid creating a new random key version unexpectedly (for example on MIT Kerberos use `kadmin`’s `ktadd` with **`-norandkey`** when adding an existing key to the keytab so the KDC key version stays consistent with what clients and other hosts expect, per your operational procedures). ## Configuration (`httpfs-site.xml`)