Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

THRIFT-5446: Added code-gen for rust async. #2426

Closed
wants to merge 3 commits into from

Conversation

jorgecarleitao
Copy link
Member

@jorgecarleitao jorgecarleitao commented Aug 1, 2021

This draft PR adds support to generate the async version for Rust. Note that when it comes to async, Rust is effectively colored, and thus we need both versions.

It introduces:

  • a small generalization of the compiler to generate async next to the sync version
  • two traits in the library, TInputStreamProtocol and TOutputStreamProtocol used to consume the stream
  • an implementation of TInputStreamProtocol and TOutputStreamProtocol for compact

I verified that this code generates a valid implementation for parquet.thrift, from which a diff can be seen here.

NOTE: this PR introduces two dependencies to the rust library, async-trait and futures.

The first is a macro to enable support for traits with async (which TInputStreamProtocol is).
The second is a very common set of traits useful for declaring async interfaces without committing to a particular runtime (e.g. tokyo).

@jorgecarleitao jorgecarleitao marked this pull request as draft August 1, 2021 12:27
@allengeorge
Copy link
Contributor

First off, thank you for starting work on this feature!

A couple of high-level thoughts from skimming through the code. I wonder if there's a better way of generalizing the functions than adding an is_sync flag everywhere and repeating the is_sync block in multiple places to add the async prefix and use .await?. There are a couple of places where we now call a function with multiple boolean args, and without keyword args it can get confusing.

Next, I would like to see a rough cut of how TInputStreamProtocol is implemented, just to see how everything fits together.

Finally, I've never used async-trait. Do you know if the Thrift generator requires any features that are not supported by it?

@jorgecarleitao
Copy link
Member Author

Thanks!

A couple of high-level thoughts from skimming through the code. I wonder if there's a better way of generalizing the functions than adding an is_sync flag everywhere and repeating the is_sync block in multiple places to add the async prefix and use .await?. There are a couple of places where we now call a function with multiple boolean args, and without keyword args it can get confusing.

I have no strong opinions here: the approach here just avoids code duplication by branching of with the is_sync. I personally have no problem with these arguments, but happy to change it to whatever makes sense to you.

Finally, I've never used async-trait. Do you know if the Thrift generator requires any features that are not supported by it?

async-trait is just a macro to enable traits with async. We have been using it in Apache Arrow (e.g. here) to declare a trait that is async. integer-encoding, which we depend on, also uses it when compiled with futures_async. The declarations do not need this, only the lib declaring and implementing the trait.

Next, I would like to see a rough cut of how TInputStreamProtocol is implemented, just to see how everything fits together.

I agree. I have been working on it today. I just pushed a draft to this. Still not compiling, but it gives a general idea; it is just mundane code of the equivalent async of the TCompactInputProtocol.

@jorgecarleitao
Copy link
Member Author

Small update: with this version of the code, I was able to generate new code for parquet.thrift from parquet-format-rs (sunchao/parquet-format-rs#19) and use it to write an async version of parquet metadata (jorgecarleitao/parquet2#33) that reads a large metadata file correctly. Thus, this end-to-end test seems to work as intended.

@jorgecarleitao
Copy link
Member Author

@allengeorge , I have no idea what practices are used here wrt to testing, could you guide me in what you think makes sense to test so that I can cover this accordingly?

@jorgecarleitao jorgecarleitao force-pushed the rs-async-read branch 3 times, most recently from b65909c to fd0c24f Compare August 8, 2021 14:26
@jorgecarleitao
Copy link
Member Author

pushed an implementation of the writing and corresponding protocol TOutputStreamProtocol and implementation for Compact.

@jorgecarleitao jorgecarleitao changed the title THRIFT-5446: Added code-gen for rust async read. THRIFT-5446: Added code-gen for rust async. Aug 13, 2021
@allengeorge
Copy link
Contributor

allengeorge commented Sep 10, 2021

@allengeorge , I have no idea what practices are used here wrt to testing, could you guide me in what you think makes sense to test so that I can cover this accordingly?

@jorgecarleitao I'm sorry for letting this linger. The last month has been very tough in terms of getting to work on open-source. I'm gonna try get up to speed as quickly as I can with the async space so that I can properly review this PR.

wrt your question. Actually, the thrift project has quite a few test suites:

  1. There are unit tests available for the Rust crate
  2. There is a "kitchen-sink" test that allows you to build a client/server pair, and allow you to manually run through a battery of tests
  3. There is a "cross-test" application that can be utilitized to test cross-language interoperability

I'm happy to provide any details I can on any of these test suites

@stale
Copy link

stale bot commented Apr 16, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@cameron-martin
Copy link

How come this has been closed? This seems like a great feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants