Integrate hdfs services with hdfs-native #3144

Open
Xuanwo opened this issue Sep 21, 2023 · 18 comments
Labels
good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

@Xuanwo
Member

Xuanwo commented Sep 21, 2023

https://github.com/Kimahriman/hdfs-native is a pure Rust implementation of an HDFS client. It might be worth considering integrating it as a new service in opendal, allowing our users to give it a try.

@shbhmrzd
Contributor

I am looking for the basic HDFS operations (read, write, and delete) in my app.
The current hdfs-sys based approach is costly in terms of image size because of the Hadoop jars that need to be pulled in. If this is a pure Rust implementation, it could be worth investing in.
I can take this up, as we already use OpenDAL in our application, and having support for hdfs-native will reduce the need for an additional layer in my app.
Please let me know if this is up for grabs and I can spend time on it.
Thank you!

@Xuanwo
Member Author

Xuanwo commented Dec 22, 2023

Hi, @shbhmrzd, I'm waiting for Kimahriman/hdfs-native#13.

@shbhmrzd
Contributor

shbhmrzd commented Dec 22, 2023

Sure @Xuanwo
I would like to pick up OpenDAL integration when it is planned :)

@Xuanwo
Member Author

Xuanwo commented Dec 22, 2023

> Sure @Xuanwo I would like to pick up OpenDAL integration when it is planned :)

Thanks! I will ping you here when it's ready.

Xuanwo added the "good first issue" and "help wanted" labels on Jan 2, 2024
@Xuanwo
Member Author

Xuanwo commented Jan 2, 2024

hdfs-native 0.5 has been released, let's do this!

@shbhmrzd
Contributor

shbhmrzd commented Jan 2, 2024

Awesome
Can I take it up?

@Xuanwo
Member Author

Xuanwo commented Jan 3, 2024

> Awesome Can I take it up?

Welcome! Have fun. We can add the layout first and fill in the implementation one piece at a time.

@Xuanwo
Member Author

Xuanwo commented Feb 9, 2024

Thanks to the efforts of @shbhmrzd and @jihuayu, we have the basic feature set now. Maybe it's time for us to establish the behavior tests before adding more features.

@jihuayu
Member

jihuayu commented Feb 9, 2024

@Xuanwo Hi, I want to add the behavior tests for it.
Is there some information about how to add the behavior test?

@Xuanwo
Member Author

Xuanwo commented Feb 9, 2024

> Is there some information about how to add the behavior test?

Our behavior tests live at https://github.com/apache/opendal/tree/main/core/tests/behavior

We can follow the same content from hdfs for our hdfs native tests:

The tests should run automatically in our CI.

@Xuanwo
Member Author

Xuanwo commented Feb 23, 2024

Hi, @jihuayu, do you need some help from me?

@jihuayu
Member

jihuayu commented Feb 24, 2024

@Xuanwo Thank you!
I did encounter some trouble:
When I was preparing to write test cases, I found that the reader and writer for hdfs_native were not implemented, so I decided to implement them first.
Since I'm not very familiar with asynchronous Rust, my code has not been functioning properly. Could you help me take a look at how I should write it?
https://github.com/jihuayu/opendal/blob/f/hdfs-test/core/src/services/hdfs_native/writer.rs#L43
After running it, I found that the write never stops.

@Xuanwo
Member Author

Xuanwo commented Feb 24, 2024

> After running it, I found that the write never stops.

FileWriter::write is an async function, so every call to f.write() creates a new future. You need to store that future and keep polling it until it returns Ready.

For example:

    enum State<R> {
        Idle,
        SendStat(BoxedFuture<Result<RpStat>>),
        SendRead(BoxedFuture<Result<(RpRead, R)>>),
        Read(R),
    }
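
To make the "store the future and poll it until Ready" idea concrete, here is a minimal, std-only sketch of a writer built on the same state-machine pattern. It is not the OpenDAL or hdfs-native code: `FakeFile`, `YieldOnce`, `PollWriter`, and `demo` are hypothetical names invented for illustration, and `FakeFile::write` only stands in for an async write that needs more than one poll to finish.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type BoxedFuture<T> = Pin<Box<dyn Future<Output = T> + Send>>;

/// Hypothetical helper: a future that is Pending on its first poll only.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Hypothetical stand-in for an async file writer (NOT the hdfs-native API).
struct FakeFile {
    written: Vec<u8>,
}

impl FakeFile {
    /// Takes ownership so the returned future is 'static and can be stored.
    async fn write(mut self, buf: Vec<u8>) -> (FakeFile, usize) {
        YieldOnce(false).await; // simulate I/O that is not ready immediately
        self.written.extend_from_slice(&buf);
        (self, buf.len())
    }
}

enum State {
    /// No write in flight; the file is available.
    Idle(Option<FakeFile>),
    /// A write future was created once and is polled until it completes.
    Write(BoxedFuture<(FakeFile, usize)>),
}

struct PollWriter {
    state: State,
}

impl PollWriter {
    fn new(file: FakeFile) -> Self {
        PollWriter { state: State::Idle(Some(file)) }
    }

    fn poll_write(&mut self, cx: &mut Context<'_>, buf: &[u8]) -> Poll<usize> {
        loop {
            match &mut self.state {
                State::Idle(file) => {
                    // Create the future exactly once and store it.
                    let file = file.take().expect("file is present while idle");
                    self.state = State::Write(Box::pin(file.write(buf.to_vec())));
                }
                State::Write(fut) => {
                    // Resume the stored future instead of building a new one.
                    let polled = fut.as_mut().poll(cx);
                    match polled {
                        Poll::Pending => return Poll::Pending,
                        Poll::Ready((file, n)) => {
                            self.state = State::Idle(Some(file));
                            return Poll::Ready(n);
                        }
                    }
                }
            }
        }
    }
}

/// Drives the writer by hand with a no-op waker and reports what happened.
fn demo() -> (bool, Option<usize>, Vec<u8>) {
    struct NoopWake;
    impl Wake for NoopWake {
        fn wake(self: Arc<Self>) {}
    }
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    let mut w = PollWriter::new(FakeFile { written: Vec::new() });
    let first_pending = w.poll_write(&mut cx, b"hello").is_pending();
    let second = match w.poll_write(&mut cx, b"hello") {
        Poll::Ready(n) => Some(n),
        Poll::Pending => None,
    };
    let written = match &w.state {
        State::Idle(Some(f)) => f.written.clone(),
        _ => Vec::new(),
    };
    (first_pending, second, written)
}
```

Calling `demo()` from a `main` function shows the point of the pattern: the first `poll_write` returns Pending, and the second call resumes the stored future rather than starting over. If `poll_write` built a fresh future on every call, each one would be dropped after its first Pending and the write would never complete, which matches the symptom described above.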

@jihuayu
Member

jihuayu commented Feb 24, 2024

@Xuanwo Ohhh! I know! Thank you. I will have a try!

@Xuanwo
Member Author

Xuanwo commented Apr 10, 2024

Hi, @jihuayu, I refactored opendal's whole IO trait. Would you like to give it another try?

@jihuayu
Member

jihuayu commented Apr 11, 2024

@Xuanwo Thank you. I love the new trait.
I've been quite busy lately, and I'll be back in a few months.

@shbhmrzd
Contributor

@Xuanwo Can I take a swing at implementing the read and write?
Thank you!

@Xuanwo
Member Author

Xuanwo commented Apr 18, 2024

> Can I take a swing at implementing the read and write?

Of course, have fun!
