Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Azure fileshare hook to use azure-storage-file-share instead of azure-storage-file #33904

Merged
merged 5 commits into from
Sep 2, 2023

Conversation

pankajastro
Copy link
Member

@pankajastro pankajastro commented Aug 30, 2023

closes: #33850
azure-storage-file has not been updated since August 2019 and it is now outdated.

AzureFileShareHook is getting just in one operator AzureFileShareToGCSOperator, that too only list_files and get_file_to_stream methods so little risk if we release some breaking changes.
The new azure-storage-file-share provides a more elegant way to call Azure API. Also, it allows to use features like a TokenCredential.

Since the update on the Azure SDK side is huge making it fully backward-compatible is a bit difficult and may not be possible

In the Airflow repo, we consider hook as a public interface/API so we might not like to do a direct breaking change release in that case I'll like to propose adding deprecation warring in the existing method and the param here is a WIP PR #33905 and then in the subsequent release, we can remove the deprecated one and use release this PR and also we will expose only whose method from Azure file share hook which is getting used in Airflow codebase.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues provider:microsoft-azure Azure-related issues labels Aug 30, 2023
@pankajastro
Copy link
Member Author

pankajastro commented Aug 30, 2023

Hi @potiuk / @eladkal I'm trying to update AzureFileShareHook to use the newly updated Azure SDK but interfaces on the Azure SDK side have changed a lot which makes it difficult to deprecate the existing hook API and same time update it to use the new Azure SDK. I'm thinking either

  1. Add warring about Deprecation and release it and then the next release replaces the deprecated one with this PR
  2. Or, do a breaking release

I would like to hear your thoughts on this

@eladkal
Copy link
Contributor

eladkal commented Aug 30, 2023

We can do a major release. I think this case justify it.
Add entry in provider.yaml and make sure to add detailed information about what changed in the top of changelog

@pankajastro pankajastro marked this pull request as ready for review August 31, 2023 00:52
Comment on lines 41 to 53
* Remove protocol param from Azure fileshare connection extras
* Remove deprecated extra__azure_fileshare__ prefix from Azure fileshare connection extras
* Remove share_name, directory_name param from AzureFileShareHook.check_for_directory in favor of AzureFileShareHook share_name and directory_path param
* Remove share_name, directory_name param from AzureFileShareHook.check_for_file method in favor of AzureFileShareHook share_name and directory_path param
* Remove share_name, directory_name param from AzureFileShareHook.list_directories_and_files
* Remove share_name, directory_name param from AzureFileShareHook.list_files in favor of AzureFileShareHook share_name and directory_path param
* AzureFileShareHook method create_share accept kwargs from ShareServiceClient.create_share instead of FileService.create_share
* AzureFileShareHook method delete_share accept kwargs from ShareServiceClient.delete_share instead of FileService.delete_share
* Remove share_name, directory_name param from AzureFileShareHook.create_directory in favor of AzureFileShareHook share_name and directory_path param
* Remove share_name, directory_name, file_name param from AzureFileShareHook.get_file in favor of AzureFileShareHook share_name and file_path
* Remove share_name, directory_name, file_name param from AzureFileShareHook.get_file_to_stream in favor of AzureFileShareHook share_name and file_path
* Remove share_name, directory_name, file_name param from AzureFileShareHook.load_file in favor of AzureFileShareHook share_name and file_path
* Remove AzureFileShareHook.load_string, AzureFileShareHook.load_stream in favor of AzureFileShareHook.load_data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Remove protocol param from Azure fileshare connection extras
* Remove deprecated extra__azure_fileshare__ prefix from Azure fileshare connection extras
* Remove share_name, directory_name param from AzureFileShareHook.check_for_directory in favor of AzureFileShareHook share_name and directory_path param
* Remove share_name, directory_name param from AzureFileShareHook.check_for_file method in favor of AzureFileShareHook share_name and directory_path param
* Remove share_name, directory_name param from AzureFileShareHook.list_directories_and_files
* Remove share_name, directory_name param from AzureFileShareHook.list_files in favor of AzureFileShareHook share_name and directory_path param
* AzureFileShareHook method create_share accept kwargs from ShareServiceClient.create_share instead of FileService.create_share
* AzureFileShareHook method delete_share accept kwargs from ShareServiceClient.delete_share instead of FileService.delete_share
* Remove share_name, directory_name param from AzureFileShareHook.create_directory in favor of AzureFileShareHook share_name and directory_path param
* Remove share_name, directory_name, file_name param from AzureFileShareHook.get_file in favor of AzureFileShareHook share_name and file_path
* Remove share_name, directory_name, file_name param from AzureFileShareHook.get_file_to_stream in favor of AzureFileShareHook share_name and file_path
* Remove share_name, directory_name, file_name param from AzureFileShareHook.load_file in favor of AzureFileShareHook share_name and file_path
* Remove AzureFileShareHook.load_string, AzureFileShareHook.load_stream in favor of AzureFileShareHook.load_data
* Remove protocol param from Azure fileshare connection extras
* Remove deprecated extra__azure_fileshare__ prefix from Azure fileshare connection extras
* Remove share_name, directory_name param from the following in favor of AzureFileShareHook share_name and directory_path param
* AzureFileShareHook.check_for_directory
* AzureFileShareHook.check_for_file
* AzureFileShareHook.list_files
* AzureFileShareHook.create_directory
* AzureFileShareHook.get_file
* AzureFileShareHook.get_file_to_stream
* AzureFileShareHook.load_file
* Remove share_name, directory_name param from AzureFileShareHook.list_directories_and_files
* AzureFileShareHook method create_share accept kwargs from ShareServiceClient.create_share instead of FileService.create_share
* AzureFileShareHook method delete_share accept kwargs from ShareServiceClient.delete_share instead of FileService.delete_share
* Remove AzureFileShareHook.load_string, AzureFileShareHook.load_stream in favor of AzureFileShareHook.load_data

Should we consolidate similar changes into one nested list?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated it a bit, PTAL

Comment on lines 55 to 57
self._connection_string = None
self._account_access_key = None
self._sas_token = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we annotate these values?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, PTAL.

airflow/providers/microsoft/azure/hooks/fileshare.py Outdated Show resolved Hide resolved
if self._conn:
return self._conn
conn = self.get_connection(self.conn_id)
def get_conn(self) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I know why do we want to return None for this method? I thought we'd get a file service client

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

azure-storage-file-share exposes multiple clients, so I have added some property (share_service_client, share_directory_client, share_file_client) to access them instead of returning just one client from this method.

Copy link
Member

@Lee-W Lee-W Sep 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's the case, should we add a message to avoid confusion? calling get_conn and return nothing might be confusing

airflow/providers/microsoft/azure/hooks/fileshare.py Outdated Show resolved Hide resolved
airflow/providers/microsoft/azure/hooks/fileshare.py Outdated Show resolved Hide resolved
airflow/providers/microsoft/azure/hooks/fileshare.py Outdated Show resolved Hide resolved
airflow/providers/microsoft/azure/hooks/fileshare.py Outdated Show resolved Hide resolved
@alexbegg
Copy link
Contributor

I added an issue for this recently. Can you add to the description that this closes #33850

Thanks

@eladkal eladkal merged commit b7f84e9 into apache:main Sep 2, 2023
64 checks passed
@pankajastro pankajastro deleted the az-fs branch September 2, 2023 18:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:google Google (including GCP) related issues provider:microsoft-azure Azure-related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upgrade Azure File Share to v12
5 participants