
[Python][Docs] Update outdated examples for hdfs/azure on the Parquet doc page #28745

Closed
asfimport opened this issue Jun 10, 2021 · 2 comments

@asfimport

From #10492

  • The chapter "Writing to Partitioned Datasets" still presents a solution based on `hdfs.connect`, but since that API is documented as deprecated it is no longer a good idea to recommend it.
  • The chapter "Reading a Parquet File from Azure Blob storage" is based on the `azure.storage.blob` package, but on an old version of it: the current `azure-sdk-for-python` no longer has methods like `get_blob_to_stream()`. This part should be updated to the new blob storage APIs, and possibly another section could cover the same concept with Delta Lake (a similar principle, but with some differences).

Reporter: Joris Van den Bossche / @jorisvandenbossche
Assignee: Joris Van den Bossche / @jorisvandenbossche

PRs and other links:

Note: This issue was originally created as ARROW-13034. Please see the migration documentation for further details.

@asfimport

Joris Van den Bossche / @jorisvandenbossche:
And additionally, for the part about reading from the cloud, we should more clearly refer to fsspec based systems as a workaround for now for filesystems that are not supported natively by Arrow.

@asfimport

David Li / @lidavidm:
Issue resolved by pull request #10548
