feat(rust, python): add HDFS support via hdfs-native package #2612

Merged: 13 commits, Jun 21, 2024

Kimahriman
Contributor

Description

Add support for HDFS using hdfs-native, a pure* Rust client for interacting with HDFS. This creates a new hdfs sub-crate, adds it as a feature to the deltalake meta crate, and includes it in the Python wheels by default. There is a Rust integration test that requires Hadoop and Java to be installed; it uses a small Maven program, which I ship under the integration-test feature flag, to run a MiniDFS server.

*Dynamically loads libgssapi_krb5 using libloading for Kerberos support
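
For readers who want to see what this looks like from Python, here is a minimal sketch, assuming a wheel built with the hdfs feature as described above. The helper names `hdfs_table_uri` and `open_hdfs_table`, the host `namenode.example.com:9000`, and the table path are hypothetical and only for illustration; `DeltaTable` and its `storage_options` argument are the existing deltalake Python API. Which configuration keys the HDFS backend accepts is not covered in this PR description, so none are shown.

```python
from typing import Dict, Optional


def hdfs_table_uri(authority: str, path: str) -> str:
    """Build an hdfs:// URI from an authority and a table path.

    `authority` is a NameNode host:port or a name service ID (hypothetical
    values in the usage below). A leading slash on `path` is optional.
    """
    if not authority:
        raise ValueError("authority must be non-empty")
    return f"hdfs://{authority}/{path.lstrip('/')}"


def open_hdfs_table(uri: str, storage_options: Optional[Dict[str, str]] = None):
    """Open a Delta table stored on HDFS.

    Assumes the installed deltalake wheel includes HDFS support and that
    the cluster is reachable from this machine.
    """
    # Imported lazily so the URI helper works without deltalake installed.
    from deltalake import DeltaTable

    return DeltaTable(uri, storage_options=storage_options)


# Usage (requires a running HDFS cluster; host and path are hypothetical):
#   uri = hdfs_table_uri("namenode.example.com:9000", "/warehouse/events")
#   table = open_hdfs_table(uri)
#   print(table.version())
```

The URI helper is kept separate from the table-opening call so that it can be checked without a cluster or the deltalake package available.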

Related Issue(s)

Resolves #2611

Documentation

@github-actions bot added the binding/python and binding/rust labels on Jun 19, 2024
@ion-elgreco
Collaborator

@Kimahriman LGTM! Can you please also add some docs on the integration to the .md files and, if possible, link to docs for the configs that can be set?

@rtyler can you also go over it?

@Kimahriman
Contributor Author

> @Kimahriman LGTM! Can you please also add some docs on the integration to the .md files and, if possible, link to docs for the configs that can be set?

Yeah, I agree there should be some, but I didn't see anything for the other storage backends, so I wasn't sure where to add it. Any recommendations? A new page under integrations/?

@ion-elgreco
Collaborator

> > @Kimahriman LGTM! Can you please also add some docs on the integration to the .md files and, if possible, link to docs for the configs that can be set?
>
> Yeah, I agree there should be some, but I didn't see anything for the other storage backends, so I wasn't sure where to add it. Any recommendations? A new page under integrations/?

Yeah, under integrations makes sense, perhaps a section called object storage with a page for HDFS there.

@avriiil any input on this? A short explanation per object store (S3, ADLS, GCS, and mounted storage) would make sense. Do you want to help with this?

@Kimahriman
Contributor Author

> Yeah, under integrations makes sense, perhaps a section called object storage with a page for HDFS there.

I added a page. I couldn't get the docs to build locally to verify it, though. I just kept getting:

griffe.exceptions.AliasResolutionError: Could not resolve alias deltalake._internal.DeltaError pointing at _internal.DeltaError (in python/deltalake/_internal.abi3.so:None)

@ion-elgreco
Collaborator

> > Yeah, under integrations makes sense, perhaps a section called object storage with a page for HDFS there.
>
> I added a page. I couldn't get the docs to build locally to verify it, though. I just kept getting:
>
> griffe.exceptions.AliasResolutionError: Could not resolve alias deltalake._internal.DeltaError pointing at _internal.DeltaError (in python/deltalake/_internal.abi3.so:None)

Yeah, something has been broken with the docs build for some time now.

Collaborator

@ion-elgreco left a comment

Thanks for adding this!

@ion-elgreco enabled auto-merge (squash) on June 21, 2024 06:05
@ion-elgreco merged commit d17ed97 into delta-io:main on Jun 21, 2024
22 of 23 checks passed
@avriiil
Contributor

avriiil commented Jun 25, 2024 via email

@ion-elgreco
Collaborator

@avriiil yeah some docs for each object store would be great 😃

@avriiil
Contributor

avriiil commented Jun 27, 2024

Sounds good, adding this to my list for next week, @ion-elgreco.

Development

Successfully merging this pull request may close these issues.

Support HDFS via hdfs-native package
3 participants