Switch AzureDataLakeStorageV2Hook to use ManagedIdentityCredential for managed identity/workload auth #38497

Merged on May 27, 2024 (6 commits)

Conversation

@TJaniF TJaniF commented Mar 26, 2024

When testing the AzureDataLakeStorageV2Hook with managed identity authentication, @melugoyal got the following error:

[2024-03-26, 04:32:41 UTC] {managed_identity.py:80} INFO - ManagedIdentityCredential will use workload identity
[2024-03-26, 04:32:41 UTC] {adls.py:167} INFO - account_url: <our account url>
[2024-03-26, 04:32:41 UTC] {adls.py:206} INFO - Error while attempting to get file system 'testcontainer': Unsupported credential: <class 'airflow.providers.microsoft.azure.utils.AzureIdentityCredentialAdapter'>
[2024-03-26, 04:32:46 UTC] {taskinstance.py:2731} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 444, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 414, in _execute_callable
    return execute_callable(context=context, **execute_callable_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/include/azure_operators/adls.py", line 408, in execute
    return hook.create_file(file_system_name=self.file_system_name, file_name=self.file_name).upload_data(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/include/azure_operators/adls.py", line 249, in create_file
    file_client = self.get_file_system(file_system_name).create_file(file_name)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/include/azure_operators/adls.py", line 200, in get_file_system
    file_system_client = self.service_client.get_file_system_client(file_system=file_system)
                         ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/functools.py", line 1001, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/include/azure_operators/adls.py", line 132, in service_client
    return self.get_conn()
           ^^^^^^^^^^^^^^^
  File "/usr/local/airflow/include/azure_operators/adls.py", line 169, in get_conn
    return DataLakeServiceClient(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/storage/filedatalake/_data_lake_service_client.py", line 96, in __init__
    self._blob_service_client = BlobServiceClient(blob_account_url, credential, **kwargs)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/storage/blob/_blob_service_client.py", line 139, in __init__
    super(BlobServiceClient, self).__init__(parsed_url, service='blob', credential=credential, **kwargs)
  File "/usr/local/lib/python3.11/site-packages/azure/storage/blob/_shared/base_client.py", line 110, in __init__
    self._config, self._pipeline = self._create_pipeline(self.credential, sdk_moniker=self._sdk_moniker, **kwargs)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/storage/blob/_shared/base_client.py", line 234, in _create_pipeline
    raise TypeError(f"Unsupported credential: {type(credential)}")
TypeError: Unsupported credential: <class 'airflow.providers.microsoft.azure.utils.AzureIdentityCredentialAdapter'>

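It seems that DataLakeServiceClient does not accept AzureIdentityCredentialAdapter as a credential. The potentially relevant Azure SDK line is the credential type check in _create_pipeline (azure/storage/blob/_shared/base_client.py), which raises the TypeError shown at the end of the traceback.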
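
To illustrate the failure mode, here is a minimal sketch of the kind of credential check that appears to reject the adapter. It is a simplified paraphrase, not the actual Azure SDK code: `select_auth_policy`, `TokenCredentialStandIn`, and `AdapterStandIn` are hypothetical names, and the sketch assumes the SDK recognizes token credentials by the presence of a `get_token` method, while the adapter only knows how to sign sessions. The real check also has branches for shared keys and SAS credentials, which are left out here.

```python
class TokenCredentialStandIn:
    """Hypothetical stand-in for an azure-identity style credential."""

    def get_token(self, *scopes):
        # A real credential would return an access token object here.
        return "fake-token"


class AdapterStandIn:
    """Hypothetical stand-in for an adapter that signs sessions but has no get_token()."""

    def signed_session(self, session=None):
        return session


def select_auth_policy(credential):
    """Simplified paraphrase of a storage client's credential type check."""
    if credential is None:
        return "anonymous"
    if hasattr(credential, "get_token"):
        # Token credentials are recognized by duck typing.
        return "bearer-token"
    # Shared key and SAS branches of the real check are omitted in this sketch.
    raise TypeError(f"Unsupported credential: {type(credential)}")


print(select_auth_policy(TokenCredentialStandIn()))

try:
    select_auth_policy(AdapterStandIn())
except TypeError as err:
    print(err)
```

Under these assumptions, passing a credential object that exposes `get_token` directly to the client avoids the error, which matches the approach in the PR title.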

This PR worked for our workload identity auth. :)


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label (Stale PRs per the .github/workflows/stale.yml policy file) May 12, 2024
@TJaniF TJaniF marked this pull request as ready for review May 16, 2024 20:31
TJaniF commented May 16, 2024

@Lee-W Thank you! Made the change you suggested and added a test, sorry it took so long! 🙂

I used the changed hook in a deployment with managed workload identity set up and got a successful task with:

[2024-05-16, 19:02:30 UTC] {managed_identity.py:80} INFO - ManagedIdentityCredential will use workload identity

On the testing side, I hope this works. I don't know much about Azure workload identity, so I hope I am testing the right configuration. 😅

@TJaniF TJaniF requested a review from Lee-W May 16, 2024 20:36
@eladkal eladkal removed the stale label (Stale PRs per the .github/workflows/stale.yml policy file) May 16, 2024
@Lee-W Lee-W merged commit d5f81a4 into apache:main May 27, 2024
41 checks passed
RNHTTR pushed a commit to RNHTTR/airflow that referenced this pull request Jun 1, 2024
fdemiane pushed a commit to fdemiane/airflow that referenced this pull request Jun 6, 2024
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Jul 26, 2024