[Python][FS][Azure] Pickling SubTreeFileSystem(base_path, AzureFileSystem(...)) is lossy #49078

@Tom-Newton


Describe the bug, including details regarding any error messages, version, and platform.

Reproduce:

import pyarrow.fs

azure_fs = pyarrow.fs.AzureFileSystem(account_name="test", sas_token="test")
print(azure_fs.__reduce__())

subtree_fs = pyarrow.fs.SubTreeFileSystem("/tmp", azure_fs)
print(subtree_fs.base_fs.__reduce__())

Returns

(<cyfunction AzureFileSystem._reconstruct at 0x79d22010c940>, ({'account_name': 'test', 'account_key': '', 'blob_storage_authority': '.blob.core.windows.net', 'blob_storage_scheme': 'https', 'client_id': '', 'client_secret': '', 'dfs_storage_authority': '.dfs.core.windows.net', 'dfs_storage_scheme': 'https', 'sas_token': 'test', 'tenant_id': ''},))
(<cyfunction AzureFileSystem._reconstruct at 0x79d22010c940>, ({'account_name': 'test', 'account_key': '', 'blob_storage_authority': '.blob.core.windows.net', 'blob_storage_scheme': 'https', 'client_id': '', 'client_secret': '', 'dfs_storage_authority': '.dfs.core.windows.net', 'dfs_storage_scheme': 'https', 'sas_token': '', 'tenant_id': ''},))

Notice that in the first result the sas_token is 'test', but in the second it is empty.

Cause:

The sas_token and a couple of the other values returned by AzureFileSystem.__reduce__ are read from attributes on self of the Python-side AzureFileSystem object. When a SubTreeFileSystem is constructed, the Python-side AzureFileSystem object is discarded and the SubTreeFileSystem only holds a pointer to the CAzureFileSystem. Therefore it's not possible to reconstruct a Python-side AzureFileSystem that includes the attributes stored on self of the original AzureFileSystem, and pickling subtree_fs.base_fs (or the SubTreeFileSystem itself) silently drops them.
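
To illustrate the mechanism without needing an Azure-enabled pyarrow build, here is a minimal pure-Python sketch of the same pattern. Every name in it (CFileSystem, PyFileSystem, SubTree, _reconstruct, wrap) is a hypothetical stand-in, not pyarrow's real implementation; the only assumption taken from the description above is that the SAS token lives on the Python wrapper rather than on the underlying C++ object.

```python
class CFileSystem:
    """Hypothetical stand-in for the C++ object; it does not expose the SAS token."""

    def __init__(self, account_name):
        self.account_name = account_name


def _reconstruct(kwargs):
    # Hypothetical counterpart of AzureFileSystem._reconstruct
    return PyFileSystem(**kwargs)


class PyFileSystem:
    """Hypothetical stand-in for the Python-side wrapper."""

    def __init__(self, account_name, sas_token=""):
        self.c_fs = CFileSystem(account_name)
        self.sas_token = sas_token  # stored only on the Python wrapper

    @classmethod
    def wrap(cls, c_fs):
        # Build a fresh wrapper around an existing C-level object.
        # Nothing on c_fs carries the token, so it falls back to "".
        obj = cls.__new__(cls)
        obj.c_fs = c_fs
        obj.sas_token = ""
        return obj

    def __reduce__(self):
        state = {"account_name": self.c_fs.account_name,
                 "sas_token": self.sas_token}
        return _reconstruct, (state,)


class SubTree:
    """Hypothetical stand-in for SubTreeFileSystem."""

    def __init__(self, base_path, base_fs):
        self.base_path = base_path
        self._c_base = base_fs.c_fs  # the Python wrapper is discarded here

    @property
    def base_fs(self):
        return PyFileSystem.wrap(self._c_base)


def roundtrip(obj):
    # Rebuild an object from its __reduce__ result, as pickle would.
    func, args = obj.__reduce__()
    return func(*args)


fs = PyFileSystem("test", sas_token="test")
direct = roundtrip(fs)
via_subtree = roundtrip(SubTree("/tmp", fs).base_fs)

print(repr(direct.sas_token))       # 'test'
print(repr(via_subtree.sas_token))  # ''
```

In this sketch the loss disappears if the token is stored on the C-level object (or if the subtree keeps a reference to the original wrapper), which suggests two possible directions for a fix; which one suits pyarrow is left open here.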

Component(s)

Python
