Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#1356 implements OAuth2 Two-legged flow #1357

Open
wants to merge 26 commits into
base: devel
Choose a base branch
from

Conversation

willi-mueller
Copy link
Contributor

@willi-mueller willi-mueller commented May 14, 2024

Description

This PR implements OAuth2 Client Credentials Flow with access token refresh for server-to-server authorization.

Related Issues

Additional Context

Example usage:

import dlt

from rest_api import RESTAPIConfig, rest_api_source
from dlt.sources.helpers.rest_client.auth import OAuth2ClientCredentialsFlow
from dlt.sources.helpers.rest_client.paginators import JSONResponseCursorPaginator

class OAuth2Zoom(OAuth2ClientCredentialsFlow):
    def build_access_token_request(self) -> Dict[str, Any]:
        authentication: str = b64encode(
            f"{self.client_id}:{self.client_secret}".encode()
        ).decode()
        return {
            "url": "https://zoom.us/oauth/token",
            "headers": {
                "Authorization": f"Basic {authentication}",
                "Content-Type": "application/x-www-form-urlencoded",
            },
            "data": self.access_token_request_data,
        }


config: RESTAPIConfig = {
    "client": {
        "base_url": "https://api.zoom.us/v2",
        "auth": OAuth2Zoom(
            access_token_request_data={
                "grant_type": "account_credentials",
                "account_id": dlt.secrets["sources.zoom.account_id"]+"foo",
            },
            client_id=dlt.secrets["sources.zoom.client_id"],
            client_secret=dlt.secrets["sources.zoom.client_secret"],
        ),
        "paginator": JSONResponseCursorPaginator(
            cursor_path="response.next_page_token",
            cursor_param="next_page_token",
        ),
    },
    "resources": [
        "users",
        {
            "name": "meetings",
            "endpoint": {
                "path": "users/{user_id}/meetings",
                "params": {
                    "user_id": {
                        "type": "resolve",
                        "resource": "users",
                        "field": "id",
                    }
                },
            },
        },
    ],
}

zoom_source = rest_api_source(config)
pipeline = dlt.pipeline(pipeline_name="zoom_test", destination="duckdb", progress="log")
load_info = pipeline.run(zoom_source)

print(load_info)

Copy link

netlify bot commented May 14, 2024

Deploy Preview for dlt-hub-docs ready!

Name Link
🔨 Latest commit 352b5d7
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/66718bf188732000082d4260
😎 Deploy Preview https://deploy-preview-1357--dlt-hub-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@willi-mueller
Copy link
Contributor Author

willi-mueller commented May 14, 2024

Questions:

  1. Am I using @configspeccorrectly? Do I have to implement the __init__ in this way?
  2. make lint has many errors unrelated to this PR when invoked on my PC. How can I test if it lints properly?
  3. How to implement a test case? I have a running pipeline pasted into this PR but I'm unsure where to implement it in the dlt core. Can you help me?

Thank you!

@burnash burnash self-assigned this May 14, 2024
@burnash burnash added the enhancement New feature or request label May 14, 2024
@willi-mueller
Copy link
Contributor Author

I'm doing more research right now on which OAuth sources would be covered by this implementation. So far, I have tested only with Zoom. I know that LinkedIn and Hivebrite would not work.

@willi-mueller willi-mueller marked this pull request as draft May 15, 2024 08:11
@willi-mueller willi-mueller force-pushed the 1356-oauth2-implicit branch 2 times, most recently from 2cd8d72 to 5f4b8ec Compare May 21, 2024 17:32
self.token_expiry = pendulum.now().add(seconds=expires_in)


class OAuth2Zoom(OAuth2ImplicitFlow):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @willi-mueller I think once this out of the draft state this class would be too specific for this file. We could keep the code nonetheless and I would suggest moving this to the docs as an example of OAuth. (Right now there's nothing in the docs related to handling OAuth). What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree that the version you reviewed was too specific and that any specific implementation, Zoom here, should not be part of the core code but part of the documentation.

I refactored the obtain_token() to a template method. After this change, it works for many OAuth 2.0 flavors. Also, I did a couple of other refactorings and simplifications of the interface.

Example implementation for Zoom:

class OAuth2Zoom(OAuth2ImplicitFlow):
    def build_access_token_request(self) -> Dict[str, Any]:
        authentication: str = b64encode(f"{self.client_id}:{self.client_secret}".encode()).decode()
        return {
            "url": "https://zoom.us/oauth/token",
            "headers": {
                "Authorization": f"Basic {authentication}",
                "Content-Type": "application/x-www-form-urlencoded",
            },
            "data": self.access_token_request_data,
        }


auth = OAuth2Zoom(
          access_token_request_data={
              "grant_type": "account_credentials",
              "account_id": dlt.secrets["sources.zoom.account_id"],
          },
          client_id=dlt.secrets["sources.zoom.client_id"],
          client_secret=dlt.secrets["sources.zoom.client_secret"],
      )

Implementation for LinkedIn:

class OAuth2LinkedIn(OAuth2ImplicitFlow):
    def build_access_token_request(self) -> Dict[str, Any]:
        return {
            "url": "https://www.linkedin.com/oauth/v2/accessToken",
            "headers": {
                "Content-Type": "application/x-www-form-urlencoded",
            },
            "data": {
                **self.access_token_request_data,
                "client_id": dlt.secrets["sources.linkedin.client_id"],
                "client_secret": dlt.secrets["sources.linkedin.client_secret"],
            }
        }


auth = OAuth2LinkedIn(
    access_token_request_data={
        "grant_type": "client_credentials",
    },
    client_id=dlt.secrets["sources.linkedin.client_id"],
    client_secret=dlt.secrets["sources.linkedin.client_secret"],
),

What do you think about this direction? Do you have ideas on how to refine it further?

Thank you for your review and your comments!

@willi-mueller willi-mueller marked this pull request as ready for review May 25, 2024 07:01
@willi-mueller willi-mueller changed the title #1356 implements OAuth2 implicit flow #1356 implements OAuth2 Two-legged flow May 29, 2024
@willi-mueller
Copy link
Contributor Author

for the record: one user in slack reported that he used this OAuth2 implementation successfully for Hivebrite

https://hivebrite.com/documentation/api/admin#oauth-post

https://dlthub-community.slack.com/archives/C04DQA7JJN6/p1716812907076409?thread_ts=1715003601.550219&cid=C04DQA7JJN6

@burnash
Copy link
Collaborator

burnash commented Jun 5, 2024

Hey @willi-mueller, thanks again for the PR. Looks good in general. I'll help you with the tests, let me know if it's ok if I push to your branch.

@willi-mueller
Copy link
Contributor Author

Hey @willi-mueller, thanks again for the PR. Looks good in general. I'll help you with the tests, let me know if it's ok if I push to your branch.

Yes, please go ahead and refactor and add tests, I'll learn from that <3
The branch is yours.

assert response.status_code == 200
assert rest_client.auth.access_token is not None
expiry_0 = rest_client.auth.token_expiry
rest_client.auth.token_expiry = rest_client.auth.token_expiry.subtract(seconds=1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@burnash do you see a better way of testing a token refresh? An alternative might be to spy on the token refresh method and assert it got called. But maybe this integration test is more effective and it completes in milliseconds too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@willi-mueller Yes, I like the idea of checking if token refresh got called. I'd include this in the test as well.

@burnash
Copy link
Collaborator

burnash commented Jun 17, 2024

@willi-mueller the CI is failing tests/sources/helpers/rest_client/test_client.py::TestRESTClient::test_oauth2_client_credentials_flow_wrong_client_id
Could you please take a look?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement OAuth2 Implicit Flow
2 participants