@adutra adutra commented Jan 22, 2026

Dev ML discussion: https://lists.apache.org/thread/2kqdqb46j7jww36wwg4txv6pl2hqq9w7

This commit promotes the S3 remote signing endpoint from an AWS-specific implementation to a first-class REST catalog API endpoint.

This enables other storage providers (GCS, Azure, etc.) to eventually reuse the same signing endpoint pattern without duplicating the API definition.

OpenAPI Specification changes:

  • Add /v1/{prefix}/namespaces/{namespace}/tables/{table}/sign/{provider} endpoint to the main REST catalog OpenAPI spec
  • Define RemoteSignRequest, RemoteSignResult and RemoteSignResponse schemas
  • Remove the separate s3-signer-open-api.yaml from the AWS module
  • Update the Python client

Core Module changes (iceberg-core):

  • Add RemoteSignRequest and RemoteSignResponse model classes, copied from the iceberg-aws module
  • Add RemoteSignRequestParser and RemoteSignResponseParser for JSON serialization, copied from the iceberg-aws module
  • Add SIGNER_URI and SIGNER_ENDPOINT properties to CatalogProperties for configuring the signing endpoint
  • Add V1_TABLE_REMOTE_SIGN field and remoteSign() method to ResourcePaths
  • Register the new endpoint in Endpoint.java
  • Add abstract RemoteSignerServlet base class for remote signing tests, copied from the iceberg-aws module

AWS Module changes (iceberg-aws):

  • Deprecate S3SignRequest and S3SignResponse for removal
  • Deprecate S3SignRequestParser and S3SignResponseParser for removal
  • Deprecate S3ObjectMapper for removal
  • Refactor S3SignerServlet to extend RemoteSignerServlet
  • Update S3V4RestSignerClient
  • Move relevant tests to iceberg-core

@adutra adutra force-pushed the promote-sign-endpoint branch from ad95a85 to f3fc095 on January 22, 2026 16:00
$ref: '#/components/responses/AuthenticationTimeoutResponse'
503:
$ref: '#/components/responses/ServiceUnavailableResponse'
5XX:
Contributor

@dimas-b dimas-b Jan 22, 2026

Just wondering: is it valid in OpenAPI to use placeholders like 5XX here?

Contributor Author

Yes, the use of 5XX as a status code in OpenAPI specifications is correct and valid:

https://spec.openapis.org/oas/v3.0.3#x4-7-16-2-patterned-fields

Contributor

Thx - TIL

schema:
type: string
enum:
- s3
Contributor

Will this "lock" generated clients to only allow operating on s3 until the spec is changed? The other parts of this spec do not appear to be bound to S3... I wonder if we could relax this enum to be a free-form string with possible values defined in a way that does not require spec changes to adopt on the client and server sides. WDYT?

Contributor Author

Agreed, I hesitated as well. I am OK with a free-form string.

Contributor Author

Changed to free-form.

5XX:
$ref: '#/components/responses/ServerErrorResponse'

/v1/{prefix}/namespaces/{namespace}/tables/{table}/sign/{provider}:
Contributor

`{provider}`: why do we need that? A table would ideally live in one object store. If there are multiple, that's fine too; I believe we give the absolute path of the URI, right?

Contributor Author

I added this, because if/when a catalog server eventually has remote signing available for more than one object storage provider (say, S3 and Azure), it would be good if the server could determine how exactly to sign the request. Without this path parameter, the server would need to apply some heuristics to determine the right object store provider, and hence how to sign the request.

Contributor

> the server would need to apply some heuristics to determine the right object store provider

I didn't get this part. We give the path we want signed from the client to the server as part of the payload of this request, right? Can't we extract the provider from there? (Are you concerned with s3 / s3a / s3n semantics?)

Contributor Author

It's not that easy.

As an example, a request to sign looks like the one below for S3:

PUT /warehouse/db/sales_table/data/date=2024-05/00022-44-55.parquet HTTP/1.1
Host: my-datalake.s3.us-east-1.amazonaws.com
Date: Fri, 24 May 2024 12:45:00 GMT
Content-Length: 134217728
Content-Type: application/octet-stream

A similar request to GCP would look like:

POST /upload/storage/v1/b/my-datalake-bucket/o?uploadType=media&name=warehouse/db/sales/data/file.parquet HTTP/1.1
Host: storage.googleapis.com
Date: Fri, 24 May 2024 12:45:00 GMT
Content-Length: 134217728
Content-Type: application/octet-stream

And for Azure:

PATCH /my-container/warehouse/db/sales/data/file.parquet?action=append&position=0 HTTP/1.1
Host: my-datalake.dfs.core.windows.net
x-ms-date: Fri, 24 May 2024 12:45:00 GMT
x-ms-version: 2023-11-03
Content-Length: 134217728
Content-Type: application/octet-stream

The question is: how do you know the object storage provider so that the server can pick the right signing algorithm? The only (heuristic) way is to inspect the Host header, but that's brittle. It's much simpler if the client tells the server what object storage provider to use.
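
To illustrate the point about letting the client name the provider, here is a minimal sketch of how a server could dispatch on the `{provider}` path parameter instead of inspecting the `Host` header. All names in it (`ProviderSignerRegistry`, `RequestSigner`) are hypothetical and are not part of Iceberg or of this PR; the "signature" produced by the sample signer in the usage note is a placeholder, not a real signing algorithm.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: maps the {provider} path parameter to a signing strategy.
final class ProviderSignerRegistry {

  /** Hypothetical strategy that signs a request for a single storage provider. */
  interface RequestSigner {
    String sign(String method, String uri);
  }

  private final Map<String, RequestSigner> signers = new ConcurrentHashMap<>();

  /** Registers a signer under a free-form provider name such as "s3". */
  void register(String provider, RequestSigner signer) {
    signers.put(provider, signer);
  }

  /** Returns the signer for the provider, or empty if none is registered. */
  Optional<RequestSigner> lookup(String provider) {
    return Optional.ofNullable(signers.get(provider));
  }
}
```

A server would call `registry.register("s3", ...)` once at startup and `registry.lookup(provider)` per request. Because the provider is a free-form string, supporting a new object store only requires registering another signer, with no spec change.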

Contributor

I am talking about the sign request from the IRC client to the IRC server; I believe what you are showing is the IRC server to object store signing? Am I missing something?
For example, the IRC client will do a POST to /v1/{prefix}/namespaces/{namespace}/tables/{table}/sign with the URI as a param:
https://github.com/apache/iceberg/pull/15112/changes#diff-02549ca620d020dc9ead80088cc14e311e12a69651fa8d394cd41a4308debb2eR4725

I think this would be an absolute path, right? s3:////table/data/a.parquet

If remote signing for a specific storage provider is enabled, clients must respect the following configurations when creating a remote signer client:
- `signer.uri`: the base URI of the remote signer endpoint. Optional; if absent, defaults to the catalog's base URI.
- `signer.endpoint`: the path of the remote signer endpoint. Required. Should be concatenated with `signer.uri` to form the complete URI.
Contributor

SHOULD or MUST?

Contributor Author

It's complicated 😄

The signer client impl uses org.apache.iceberg.rest.RESTUtil#resolveEndpoint to perform the concatenation of signer.uri and signer.endpoint.

So, signer.endpoint could also be an absolute URL, in which case, signer.uri would be ignored.

I will try to come up with a better wording.
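
For readers following along, the resolution rule described above can be sketched roughly as follows. This is not the actual `RESTUtil#resolveEndpoint` implementation; `SignerEndpointResolver` and its `resolve` method are hypothetical names, and treating only `http://` and `https://` prefixes as "absolute" is a simplifying assumption.

```java
// Hypothetical sketch of combining signer.uri and signer.endpoint.
final class SignerEndpointResolver {

  /**
   * @param catalogUri the catalog's base URI, used when signerUri is absent
   * @param signerUri the optional signer.uri property, may be null
   * @param signerEndpoint the required signer.endpoint property
   */
  static String resolve(String catalogUri, String signerUri, String signerEndpoint) {
    if (signerEndpoint == null || signerEndpoint.isEmpty()) {
      throw new IllegalArgumentException("signer.endpoint is required");
    }
    // An absolute endpoint wins outright; signer.uri is ignored in that case.
    if (signerEndpoint.startsWith("http://") || signerEndpoint.startsWith("https://")) {
      return signerEndpoint;
    }
    // Otherwise fall back to the catalog URI when signer.uri is absent.
    String base = signerUri != null ? signerUri : catalogUri;
    String trimmedBase = base.endsWith("/") ? base.substring(0, base.length() - 1) : base;
    String trimmedPath = signerEndpoint.startsWith("/") ? signerEndpoint.substring(1) : signerEndpoint;
    return trimmedBase + "/" + trimmedPath;
  }
}
```

With this rule, a relative `signer.endpoint` is appended to `signer.uri` (or to the catalog URI), while an absolute one is used verbatim.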

Contributor Author

Rephrased, lmk what you think!

allOf:
- $ref: '#/components/schemas/Expression'

MultiValuedMap:
Contributor

I believe this is the equivalent of the S3Headers section in the S3 signer spec? Could we call it something like ObjectStoreProviderHeader?

Contributor Author

I went for a more generic name because there is nothing specific to remote signing here. This component could perfectly be used for something else in the spec.

- `s3.secret-access-key`: secret for credentials that provide access to data in S3
- `s3.session-token`: if present, this value should be used for as the session token
- `s3.remote-signing-enabled`: if `true` remote signing should be performed as described in the `s3-signer-open-api.yaml` specification
- `s3.remote-signing-enabled`: if `true` remote signing should be performed as described in the `RemoteSignRequest` schema section of this spec document.
Contributor Author

FYI I chose to keep this property specific to S3. I think that even if the signer endpoint is now generic, enablement should be performed for each specific object storage.

Contributor

You can actually access Google Cloud Storage via its S3 gateway: same signing algorithm, just a few different settings to change the listing version, endpoint, etc.

https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/third_party_stores.md#google-cloud-storage-through-the-s3a-connector

public String baseSignerUri() {
- return properties().getOrDefault(S3_SIGNER_URI, properties().get(CatalogProperties.URI));
+ return properties()
+ .getOrDefault(CatalogProperties.SIGNER_URI, properties().get(CatalogProperties.URI));
Contributor

Isn't this breaking existing behavior? Previously one could provide s3.signer.uri, but now we no longer read that property and rely on signer.uri instead. The same goes for the endpoint.
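
One way such a compatibility concern is commonly addressed is a fallback chain that prefers the new property but still honors the legacy one. The sketch below is only an illustration and is not what this PR implements; `SignerUriFallback` is a hypothetical class, and the catalog URI key `uri` is assumed here to be the value of `CatalogProperties.URI`.

```java
import java.util.Map;

// Hypothetical sketch: prefer the new property, then the legacy one, then the catalog URI.
final class SignerUriFallback {
  static final String SIGNER_URI = "signer.uri";
  static final String LEGACY_S3_SIGNER_URI = "s3.signer.uri";
  static final String CATALOG_URI = "uri"; // assumed value of CatalogProperties.URI

  /** Returns the base signer URI, or null if none of the three properties is set. */
  static String baseSignerUri(Map<String, String> properties) {
    String value = properties.get(SIGNER_URI);
    if (value == null) {
      // Existing deployments that only set the legacy S3-specific key keep working.
      value = properties.get(LEGACY_S3_SIGNER_URI);
    }
    if (value == null) {
      value = properties.get(CATALOG_URI);
    }
    return value;
  }
}
```

The same pattern would apply to the endpoint property.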

* @deprecated since 1.11.0, will be removed in 1.12.0; use {@link CatalogProperties#SIGNER_URI}
* instead.
*/
@Deprecated public static final String S3_SIGNER_URI = CatalogProperties.SIGNER_URI;
Contributor

I don't think we can just change the value here as that would break backwards compatibility

"true",
CatalogProperties.URI,
uri,
CatalogProperties.SIGNER_ENDPOINT,
Contributor

this wasn't needed before but is needed now, which indicates that this is a breaking change for users?


paths:

/v1/aws/s3/sign:
Contributor

I don't think we would want to remove this spec yet. We should probably first deprecate it

}
}

public static class RemoteSignRequestSerializer<T extends RemoteSignRequest>
Contributor

These should all probably be package-private rather than public.

gen.writeEndArray();
}
gen.writeEndObject();
public static void headersToJson(
Contributor

not sure whether we need to make this one and the one below public
