Skip to content

Core: Add session catalog implementation for REST#4830

Merged
rdblue merged 2 commits intoapache:masterfrom
rdblue:rest-add-session-catalog
May 25, 2022
Merged

Core: Add session catalog implementation for REST#4830
rdblue merged 2 commits intoapache:masterfrom
rdblue:rest-add-session-catalog

Conversation

@rdblue
Copy link
Contributor

@rdblue rdblue commented May 20, 2022

This adds RESTSessionCatalog that implements SessionCatalog. This is the last PR to replace #4575.

RESTSessionCatalog implements SessionCatalog and adds support for session-based auth handling using OAuth2.

For the catalog:

  • If the catalog is configured with credentials, those credentials are used to fetch a bearer token using OAuth2 client credentials flow, and that token is used for the Authorization header
  • If the catalog is configured with a bearer token, that token is used in the Authorization header.

Session context is updated to have a map of credentials. For each session context:

  • If the session context is configured with a Bearer token ("token" credential), that token is used as an access token in the Authorization header
  • If the session context is configured with a credential ("credential" credential), those credentials are used to fetch a bearer token using OAuth2 client credentials flow, and that token is used for the Authorization header
  • If the session context has any OAuth2 token type as a key in its credential map, that token is exchanged for an access token using OAuth2 token exchange flow, and the access token is used for the Authorization header. This exchange uses the catalog's token, if present, as the actor_token.

When a table is loaded:

  • If the table config contains a Bearer token, that token is used for the Authorization header
  • If the table config contains credentials, the credentials are used to fetch an access token using OAuth2 client credentials flow, and that token is used for the Authorization header.
  • If the table config contains any other OAuth2 token type, that token is exchanged for an access token using OAuth2 token exchange flow, and the access token is used for the Authorization header. This exchange uses the context token or catalog token as actor_token.

A scheduled thread pool is used to refresh tokens using the token exchange flow with no actor token. The token exchange uses the token that is being refreshed in the Authorization header.

@github-actions github-actions bot added the core label May 20, 2022
@rdblue rdblue force-pushed the rest-add-session-catalog branch from 4e48a2d to b02d046 Compare May 20, 2022 22:24
@rdblue
Copy link
Contributor Author

rdblue commented May 20, 2022

Looks like the file names are misleading git. RESTSessionCatalog is based on RESTCatalog, so it is more useful to diff those. The RESTCatalog implementation in this PR is new and should not be diffed with the previous.


public AuthSession refresh() {
Map<String, String> request = OAuth2Util.tokenExchangeRequest(
token, tokenType, ImmutableList.of(RESTCatalogProperties.CATALOG_SCOPE));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The token exchange request uses the current token in the Authorization header and as the subject_token to refresh.

@danielcweeks
Copy link
Contributor

Overall, the logic and sequencing looks sound. The one thing I might suggest is actually creating a separate package org...iceberg.rest.auth and put much of the auth code there. RESTSessionCatalog class not has a lot of internal logic that's really about OAuth2 which could be pulled out separately (e.g. AuthSession and the static methods supporting flows). Some minor refactoring could help in terms of adding other auth in the future.

import org.apache.iceberg.rest.OAuth2Util;
import org.apache.iceberg.rest.RESTResponse;

public class OAuthTokenResponse implements RESTResponse {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looked like this is deserialized using ObjectMapper.readValue() but I didn't see any setting for handling snake case field names, so just wanted to confirm that works...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a serialization implementation and tests in #4833. Thanks for catching this, @bryanck!

String credential = props.get(RESTCatalogProperties.CREDENTIAL);
if (credential != null && !credential.isEmpty()) {
String scope = props.getOrDefault(RESTCatalogProperties.SCOPE, RESTCatalogProperties.CATALOG_SCOPE);
authResponse = fetchClientCredentials(initClient, initHeaders, credential, scope);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice to have the option for a different URL base for auth requests, to support an authorization server separate from the catalog server

@rdblue rdblue force-pushed the rest-add-session-catalog branch 3 times, most recently from 5df7ed4 to 35b1107 Compare May 22, 2022 19:46
@github-actions github-actions bot added the API label May 22, 2022
@rdblue rdblue force-pushed the rest-add-session-catalog branch 2 times, most recently from 2f50a55 to 0ec9bdc Compare May 22, 2022 20:47
@rdblue rdblue force-pushed the rest-add-session-catalog branch from 0ec9bdc to cbda0a5 Compare May 22, 2022 22:01
@rdblue rdblue force-pushed the rest-add-session-catalog branch 3 times, most recently from bfd0079 to 1c11698 Compare May 23, 2022 22:01
@rdblue rdblue force-pushed the rest-add-session-catalog branch from 1c11698 to 105b70f Compare May 23, 2022 23:55
'-Xep:StrictUnusedVariable:OFF',
'-Xep:TypeParameterShadowing:OFF',
'-Xep:TypeParameterUnusedInFormals:OFF',
'-Xep:DangerousThreadPoolExecutorUsage:OFF',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was needed to add ThreadPools.newScheduledPool. For some reason, errorprone doesn't like it when you create a ScheduledExecutorService.

// convert expiration interval to milliseconds
long expiresInMillis = unit.toMillis(expiresIn);
// how much ahead of time to start the request to allow it to complete
long refreshWindowMillis = Math.min(expiresInMillis / 10, MAX_REFRESH_WINDOW_MILLIS);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic here is to start the refresh with 10% of the retry window left, but no more than some maximum (in case expires-in is long, like 10 hours).

if (authResponse != null) {
this.catalogAuth = newSession(authResponse, startTimeMillis, catalogAuth);
} else if (initToken != null) {
this.catalogAuth = newSession(initToken, expiresInMs(mergedProps), catalogAuth);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a little awkward because if there's a token initially, the init doesn't go through the token exchange flow, so we don't actually know when the token is supposed to expire but still scheduling to refresh. It feels like we should either exchange immediately and use the expires response to schedule the interval or assume that a set token has no expiration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the case where you specify an bearer token directly using "token". In that case, we use a catalog property to determine when to refresh that bearer token. This is the path for a scheduler to get an access token and pass it into a job. It can also pass the expiration interval, or just rely on the default.

Right now, we don't support an initial token exchange. We could go through the same logic as a new session or table load, where we look for different token-exchange types and exchange for an access token. The only trouble there is that I though it would be weird to take the token and exchange it right away. It seems like you should be using client credentials to bootstrap.

Copy link
Contributor

@danielcweeks danielcweeks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@rdblue rdblue merged commit c4c8611 into apache:master May 25, 2022
@rdblue
Copy link
Contributor Author

rdblue commented May 25, 2022

Thanks for the reviews, @kbendick and @danielcweeks!

kbendick pushed a commit to kbendick/iceberg that referenced this pull request May 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants