Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ airbyte-cdk - Adds JwtAuthenticator to low-code #37005

Merged
merged 47 commits into from
Apr 19, 2024

Conversation

pnilan
Copy link
Contributor

@pnilan pnilan commented Apr 11, 2024

What

Notes

  • Most commonly used properties are included in the JWT Headers and JWT Payload objects in the schema.
  • Additional fields can be included within the additional_jwt_headers and additional_jwt_payload objects.
  • I did not include the jti property within the JWT Payload as it requires a unique value for every requests and as a result is not commonly used when accessing public APIs.
  • Developed based on the following resources:

Copy link

vercel bot commented Apr 11, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Apr 18, 2024 11:38pm

@octavia-squidington-iii octavia-squidington-iii added the CDK Connector Development Kit label Apr 11, 2024
Copy link
Contributor

@natikgadzhi natikgadzhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't spent enough time, so just dumping comments that I've had so I don't forget about them.

I think it's a good idea to start the frontend piece as well and see how far we can get!

@@ -3,7 +3,9 @@
#

from airbyte_cdk.sources.declarative.auth.oauth import DeclarativeOauth2Authenticator
from airbyte_cdk.sources.declarative.auth.jwt import JwtAuthenticator

__all__ = [
"DeclarativeOauth2Authenticator",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why SelectiveAuthenticator isn't here.

Copy link
Contributor Author

@pnilan pnilan Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a good answer for you. Oauth is re-exported via __init__ but all other authenticators are accessed directly from their files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not intentional. It shouldn't matter as these classes should only be used through the YAML interface


__all__ = [
"DeclarativeOauth2Authenticator",
"JwtAuthenticator"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, should this also be DeclarativeJWTAuthenticator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the oauth2 auth would be the outlier: ApiKeyAuthenticator, BearerAuthenticator, BasicHttpAuthenticator, etc don't explicitly note declarative.



@dataclass
class JwtAuthenticator(DeclarativeAuthenticator):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what our naming convention is — do we want acronyms to be all caps? I.e. JWTAuthenticator or JwtAuthenticator?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not attached to it but I find JwtAuthenticator a bit easier to read

Copy link
Contributor Author

@pnilan pnilan Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed ApiKeyAuthenticator's lead.

title: Type
description: The media type of the complete JWT.
examples:
- JWT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's set JWT as a default value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Incorporated.

description: The amount of time in seconds a JWT token can be valid after being issued.
examples:
- 1200
jwt_headers:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are any of those properties required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but they are the most common properties.

type: string
description: Algorithm used to sign the JSON web token.
examples:
- "ES256"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be the default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe so... I haven't found any consistency in the algorithm that APIs use:

title: Token Duration
description: The amount of time in seconds a JWT token can be valid after being issued.
examples:
- 1200
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a common default value we can use here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen any consistency, although 1200s (20 min) is the smallest duration I've seen -- maybe a good default.

That being said, does this mean if a sync takes longer than 20 minutes it would fail? Should I include some sort of refresh mechanism? Or, is the authenticator re-instantiated per read?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question! can we refresh the token at runtime? this is how we do it for oauth authenticators

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@girarda Actually I believe we're good -- each time the the request is prepared it will invoke _get_jwt_headers which will "refresh" the expiration time (and therefore refresh the token, as the token is the headers, payload, secret_key all encoded into a single string).

@@ -807,6 +809,24 @@ def create_json_decoder(model: JsonDecoderModel, config: Config, **kwargs: Any)
def create_json_file_schema_loader(model: JsonFileSchemaLoaderModel, config: Config, **kwargs: Any) -> JsonFileSchemaLoader:
return JsonFileSchemaLoader(file_path=model.file_path or "", config=config, parameters=model.parameters or {})

@staticmethod
def create_jwt_authenticator(model: JwtAuthenticatorModel, config: Config, **kwargs: Any) -> JwtAuthenticator:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add unit tests here. be sure to cover to cover cases where optional value are set and omitted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Next step is to add unit tests.

parameters: InitVar[Mapping[str, Any]]
secret_key: Union[InterpolatedString, str]
algorithm: Union[InterpolatedString, str]
token_duration: Union[InterpolatedString, str] = 1200
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be a union[interpolated string, str, int]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this to make token duration only an integer.

algorithm: str = self._algorithm.eval(self.config)
if not algorithm:
raise ValueError("Algorithm is required")
return f"{algorithm}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return f"{algorithm}"
return algorithm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

secret_key: str = self._secret_key.eval(self.config)
if not secret_key:
raise ValueError("secret_key is required")
return f"{secret_key}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return f"{secret_key}"
return secret_key

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

headers = {}
headers.update(self._additional_jwt_headers.eval(self.config))
if self._kid:
headers["kid"] = f"{self._kid.eval(self.config)}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we fail if "kid" or other required params was also defined in the additional jwt headers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Added a check for both jwt_headers and jwt_payload.



@dataclass
class JwtAuthenticator(DeclarativeAuthenticator):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not attached to it but I find JwtAuthenticator a bit easier to read

@pnilan pnilan marked this pull request as ready for review April 17, 2024 21:26
@pnilan pnilan requested a review from a team as a code owner April 17, 2024 21:26
@@ -40,7 +40,7 @@ Fix CDK version mismatch introduced in 0.78.8
Update error messaging/type for missing streams. Note: version mismatch, please use 0.78.9 instead

## 0.78.6
low-code: add backward compatibility for old close slice behavior
low-code: add backward compatibility for old close slice behavior
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

were changes to this file intentional?

Copy link
Contributor Author

@pnilan pnilan Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were unintentional -- I didn't realize there was a Publish CDK action so I had previously manually updated the changelog/pyproject, then reverted the changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok please only revert changes to this file. It'll be updated as part of the publish action

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

"HS384",
"HS512",
"ES256",
"ES256K",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this enum value isn't in jwt.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Updated.

self._sub = InterpolatedString.create(self.sub, parameters=parameters)
self._aud = InterpolatedString.create(self.aud, parameters=parameters)
self._cty = InterpolatedString.create(self.cty, parameters=parameters)
self._token_duration = self.token_duration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this means the value can't be interpolated. Is that ok?

Builds and returns the payload used when signing the JWT.
"""
now = int(datetime.now().timestamp())
exp = now + self._token_duration if isinstance(self._token_duration, int) else now
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self._token_duration can only be a int as far as I can tell

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this was a result of being flagged by mypy --> the issue is I set token_duration as optional in the schema, and therefore it has type Optional[int] even though I have a default value to use. So it'll never be None but because of the auto-generated Optional[int] in the model, I can't avoid it from being flagged by mypy. Any workarounds here you're aware of?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also re "this means the [token_duration] value can't be interpolated. Is that ok?"

I can't think of a scenario where a connector dev would want to use a string to define the token duration. It's in the spec that the iat (issued at), exp (expires at), and nbf (not before) claims be seconds since epoch. Instead of directly defining the exp claim, I have the dev set the token duration in seconds (typically a max value allowed by the API) to dynamically set the exp. This seems to me to be a better approach then expecting a connector dev to define iat and exp direction via jinja expressions (i.e. exp: {{ int(datetime.now().timestamp() + 1200 }}.

@staticmethod
def create_jwt_authenticator(model: JwtAuthenticatorModel, config: Config, **kwargs: Any) -> JwtAuthenticator:
model.jwt_headers = (
JwtHeadersModel(kid=None, typ="JWT", cty=None) if not isinstance(model.jwt_headers, JwtHeadersModel) else model.jwt_headers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are other valid types for model.jwt_headers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made both jwt_headers and jwt_payload optional in the schema, because all their properties are also optional there are scenarios where a dev connector may not need to set kid, typ, cty within the manifest). I covered this scenario in test_model_to_component_factory.py.

Additionally, I woud've preferred just instantiating the model without arguments since all three parameters have defaults, by mypy flagged it, so I added in the arguments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense. Can you change the check to is not None? isinstance implies the type might be something else

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

JwtHeadersModel(kid=None, typ="JWT", cty=None) if not isinstance(model.jwt_headers, JwtHeadersModel) else model.jwt_headers
)
model.jwt_payload = (
JwtPayloadModel(iss=None, sub=None, aud=None) if not isinstance(model.jwt_payload, JwtPayloadModel) else model.jwt_payload
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are other valid types for model.jwt_payload?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment regarding model.jwt_headers.

Copy link
Contributor Author

@pnilan pnilan Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to not None

@octavia-squidington-iii octavia-squidington-iii added the area/documentation Improvements or additions to documentation label Apr 18, 2024
@pnilan pnilan requested a review from girarda April 18, 2024 23:01
@pnilan pnilan merged commit b20cd1b into master Apr 19, 2024
32 checks passed
@pnilan pnilan deleted the pnilan/airbyte-cdk-jwt-auth branch April 19, 2024 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/documentation Improvements or additions to documentation CDK Connector Development Kit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants