diff --git a/site/content/in-dev/unreleased/federation/_index.md b/site/content/in-dev/unreleased/federation/_index.md
new file mode 100644
index 0000000000..e4fbe261a0
--- /dev/null
+++ b/site/content/in-dev/unreleased/federation/_index.md
@@ -0,0 +1,26 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Federation
+type: docs
+weight: 703
+---
+
+Guides for federating Polaris with existing metadata services. Expand this section to select a
+specific integration.
diff --git a/site/content/in-dev/unreleased/federation/hive-metastore-federation.md b/site/content/in-dev/unreleased/federation/hive-metastore-federation.md
new file mode 100644
index 0000000000..0d39a5e4a0
--- /dev/null
+++ b/site/content/in-dev/unreleased/federation/hive-metastore-federation.md
@@ -0,0 +1,125 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Hive Metastore Federation
+type: docs
+weight: 705
+---
+
+Polaris can federate catalog operations to an existing Hive Metastore (HMS). This lets an external
+HMS remain the source of truth for table metadata while Polaris brokers access, policies, and
+multi-engine connectivity.
+
+## Build-time enablement
+
+The Hive factory is packaged as an optional extension and is not baked into default server builds.
+Include it when assembling the runtime or container images by setting the `NonRESTCatalogs` Gradle
+property to include `HIVE` (and any other non-REST backends you need):
+
+```bash
+./gradlew :polaris-server:assemble :polaris-server:quarkusAppPartsBuild --rerun \
+  -DNonRESTCatalogs=HIVE -Dquarkus.container-image.build=true
+```
+
+`runtime/server/build.gradle.kts` wires the extension in only when this flag is present, so binaries
+built without it will reject Hive federation requests.
+
+## Runtime requirements
+
+- **Metastore connectivity:** Expose the HMS Thrift endpoint (`thrift://host:port`) to the Polaris
+  deployment.
+- **Configuration discovery:** Iceberg’s `HiveCatalog` loads Hadoop/Hive client settings from the
+  classpath. Provide `hive-site.xml` (and `core-site.xml` if needed) via
+  `HADOOP_CONF_DIR`/`HIVE_CONF_DIR` or an image layer.
+- **Authentication:** Hive federation only supports `IMPLICIT` authentication, meaning Polaris uses
+  the operating-system or Kerberos identity of the running process (no stored secrets). Ensure the
+  service principal is logged in or holds a valid keytab/TGT before starting Polaris.
+- **Object storage role:** Configure `polaris.service-identity..aws-iam.*` (or the default
+  realm) so the server can assume the AWS role referenced by the catalog. The role's trust policy
+  must let the Polaris service identity assume it, and its permissions must cover the table locations.
+
+### Kerberos setup example
+
+If your Hive Metastore enforces Kerberos, stage the necessary configuration alongside Polaris:
+
+```bash
+export KRB5_CONFIG=/etc/polaris/krb5.conf
+export HADOOP_CONF_DIR=/etc/polaris/hadoop-conf  # contains hive-site.xml with HMS principal
+export HADOOP_OPTS="-Djava.security.auth.login.config=/etc/polaris/jaas.conf"
+kinit -kt /etc/polaris/keytabs/polaris.keytab polaris/service@EXAMPLE.COM
+```
+
+- `hive-site.xml` must define `hive.metastore.sasl.enabled=true`, the metastore principal, and the
+  client principal pattern (for example `hive.metastore.client.kerberos.principal=polaris/_HOST@REALM`).
+- The JAAS entry (referenced by `java.security.auth.login.config`) should use `useKeyTab=true` and
+  point to the same keytab shown above so the Polaris JVM can refresh credentials automatically.
+- Keep the keytab readable solely by the Polaris service user; the implicit authenticator consumes
+  the TGT at startup and for periodic renewal.
+
+## Creating a federated catalog
+
+Use the Management API (or the Python CLI) to create an external catalog whose connection type is
+`HIVE`. The following request registers a catalog that proxies to an HMS running on
+`thrift://hms.example.internal:9083`:
+
+```bash
+curl -X POST https:///management/v1/catalogs \
+  -H "Authorization: Bearer $TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{ "catalog": {
+        "type": "EXTERNAL",
+        "name": "analytics_hms",
+        "storageConfigInfo": {
+          "storageType": "S3",
+          "roleArn": "arn:aws:iam::123456789012:role/polaris-warehouse-access",
+          "region": "us-east-1"
+        },
+        "properties": { "default-base-location": "s3://analytics-bucket/warehouse/" },
+        "connectionConfigInfo": {
+          "connectionType": "HIVE",
+          "uri": "thrift://hms.example.internal:9083",
+          "warehouse": "s3://analytics-bucket/warehouse/",
+          "authenticationParameters": { "authenticationType": "IMPLICIT" }
+        }
+      } }'
+```
+
+Grant catalog roles to principal roles exactly as you would for internal catalogs so engines can
+obtain tokens that authorize against the federated metadata.
+
+`default-base-location` is required; it tells Polaris and Iceberg where to place new metadata files.
+`allowedLocations` is optional; supply it only when you want to restrict writers to a specific set of
+prefixes. If your IAM trust policy requires an `externalId` or explicit `userArn`, include those
+optional fields in `storageConfigInfo`. Polaris persists them and supplies them when assuming the
+role cited by `roleArn` during metadata commits.
+
+## Limitations and operational notes
+
+- **Single identity:** Because only `IMPLICIT` authentication is permitted, Polaris cannot mix
+  multiple Hive identities in a single deployment (`HiveFederatedCatalogFactory` rejects other auth
+  types). Plan a deployment topology that aligns the Polaris process identity with the target HMS.
+- **Generic tables:** The Hive extension exposes Iceberg tables registered in HMS. Generic table
+  federation is not implemented (`HiveFederatedCatalogFactory#createGenericCatalog` throws
+  `UnsupportedOperationException`).
+- **Failover and routing:** Catalog failover and routing across multiple HMS instances are not yet
+  handled; Polaris initializes one `HiveCatalog` per connection and relies on the underlying Iceberg
+  client for retries.
+
+With these constraints satisfied, Polaris can sit in front of an HMS so that Iceberg tables managed
+there gain OAuth-protected, multi-engine access through the Polaris REST APIs.
diff --git a/site/content/in-dev/unreleased/federation/iceberg-rest-federation.md b/site/content/in-dev/unreleased/federation/iceberg-rest-federation.md
new file mode 100644
index 0000000000..8318f45095
--- /dev/null
+++ b/site/content/in-dev/unreleased/federation/iceberg-rest-federation.md
@@ -0,0 +1,71 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Iceberg REST Federation
+type: docs
+weight: 704
+---
+
+Polaris can federate an external Iceberg REST catalog (e.g., another Polaris deployment, AWS Glue, or a custom Iceberg
+REST implementation), enabling a Polaris service to access table and view entities managed by remote Iceberg REST catalogs.
+
+## Runtime requirements
+
+- **REST endpoint:** The remote service must implement the Iceberg REST catalog specification. Configure
+  firewalls so Polaris can reach the base URI you provide in the connection config.
+- **Authentication:** Polaris forwards requests using the credentials defined in
+  `ConnectionConfigInfo.AuthenticationParameters`. OAuth2 client credentials, bearer tokens, and AWS
+  SigV4 are supported; choose the scheme the remote service expects.
+
+## Creating a federated REST catalog
+
+The snippet below registers an external catalog that forwards to a remote Polaris server using OAuth2
+client credentials. `iceberg-remote-catalog-name` is optional; supply it when the remote server multiplexes
+multiple logical catalogs under one URI.
+
+```bash
+polaris catalogs create \
+  --type EXTERNAL \
+  --storage-type s3 \
+  --role-arn "arn:aws:iam::123456789012:role/polaris-warehouse-access" \
+  --default-base-location "s3://analytics-bucket/warehouse/" \
+  --catalog-connection-type iceberg-rest \
+  --iceberg-remote-catalog-name analytics \
+  --catalog-uri "https://remote-polaris.example.com/catalog/v1" \
+  --catalog-authentication-type OAUTH \
+  --catalog-token-uri "https://remote-polaris.example.com/catalog/v1/oauth/tokens" \
+  --catalog-client-id "" \
+  --catalog-client-secret "" \
+  --catalog-client-scopes "PRINCIPAL_ROLE:ALL" \
+  analytics_rest
+```
+
+Refer to the [CLI documentation](../command-line-interface.md#catalogs) for details on alternative authentication types such as BEARER or SIGV4.
+
+Grant catalog roles to principal roles the same way you do for internal catalogs so compute engines
+receive tokens with access to the federated namespace.
+
+## Operational notes
+
+- **Connectivity checks:** Polaris does not lazily probe the remote service; catalog creation fails if
+  the REST endpoint is unreachable or authentication is rejected.
+- **Feature parity:** Federation exposes whatever table/namespace operations the remote service
+  implements. Unsupported features return the remote error directly to callers.
+- **Generic tables:** The REST federation path currently surfaces Iceberg tables only; generic table
+  federation is not implemented.
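For teams that script catalog setup against the Management API instead of the CLI, the REST-federated catalog from the CLI example can also be expressed as a JSON request body. The sketch below is a minimal illustration under stated assumptions, not a reference implementation. `build_rest_catalog_payload` is a hypothetical helper, and the JSON field names (the top-level `catalog` wrapper, `ICEBERG_REST`, `remoteCatalogName`, `tokenUri`, `clientId`, `clientSecret`, `scopes`) are assumptions modeled on the CLI flags and the Hive request. Verify them against the Management API spec shipped with your Polaris version.

```bash
#!/usr/bin/env bash
# Sketch only: prints a Management API request body that registers an EXTERNAL
# catalog federated to a remote Iceberg REST server via OAuth2 client credentials.
# ASSUMPTION: field names mirror the CLI flags; check them against your API spec.

# Hypothetical helper. Positional arguments:
#   1 catalog name        2 remote catalog name   3 remote catalog URI
#   4 OAuth token URI     5 storage role ARN      6 default base location
# Client credentials come from REMOTE_CLIENT_ID / REMOTE_CLIENT_SECRET so the
# secret never appears on the command line or in shell history.
# Values are inserted verbatim (no JSON escaping), so they must not contain quotes.
build_rest_catalog_payload() {
  local name="$1" remote_name="$2" uri="$3" token_uri="$4" role_arn="$5" base="$6"
  # Abort with an error message if either credential variable is unset or empty.
  : "${REMOTE_CLIENT_ID:?REMOTE_CLIENT_ID must be set}"
  : "${REMOTE_CLIENT_SECRET:?REMOTE_CLIENT_SECRET must be set}"
  # Unquoted heredoc delimiter, so the ${...} references below are expanded.
  # The scope is fixed to the value used in the CLI example.
  cat <<EOF
{
  "catalog": {
    "type": "EXTERNAL",
    "name": "${name}",
    "storageConfigInfo": { "storageType": "S3", "roleArn": "${role_arn}" },
    "properties": { "default-base-location": "${base}" },
    "connectionConfigInfo": {
      "connectionType": "ICEBERG_REST",
      "uri": "${uri}",
      "remoteCatalogName": "${remote_name}",
      "authenticationParameters": {
        "authenticationType": "OAUTH",
        "tokenUri": "${token_uri}",
        "clientId": "${REMOTE_CLIENT_ID}",
        "clientSecret": "${REMOTE_CLIENT_SECRET}",
        "scopes": ["PRINCIPAL_ROLE:ALL"]
      }
    }
  }
}
EOF
}

# Example (POLARIS_MANAGEMENT_URL is a hypothetical variable holding your
# Management API base URL; $TOKEN is obtained as in the Hive example):
#   export REMOTE_CLIENT_ID='...' REMOTE_CLIENT_SECRET='...'
#   build_rest_catalog_payload analytics_rest analytics \
#     "https://remote-polaris.example.com/catalog/v1" \
#     "https://remote-polaris.example.com/catalog/v1/oauth/tokens" \
#     "arn:aws:iam::123456789012:role/polaris-warehouse-access" \
#     "s3://analytics-bucket/warehouse/" |
#   curl -X POST "$POLARIS_MANAGEMENT_URL/catalogs" \
#     -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" --data @-
```

Building the body with a heredoc keeps the sketch free of extra tools such as `jq`. The trade-off is that values are not JSON-escaped. That is acceptable for the ARNs, URIs, and names shown here but not for arbitrary input. The payload still carries the client secret in plain text, so avoid logging the request body.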