
Add support for reading remote storage systems #811

Closed
wants to merge 17 commits

Conversation


@yjshen yjshen commented Aug 2, 2021

Which issue does this PR close?

Closes #616

Rationale for this change

Currently, we can only read files from LocalFS since we use std::fs in ParquetExec. It would be nice to add support to read files that reside on storage sources such as HDFS, Amazon S3, etc.

What changes are included in this PR?

Introduce an ObjectStore API as an abstraction over the underlying storage systems, such as the local filesystem, HDFS, S3, etc., and make ObjectStore implementations pluggable through the ObjectStoreRegistry in DataFusion's ExecutionContext.

Are there any user-facing changes?

Users can provide implementations of the ObjectStore trait, register them in the ExecutionContext's registry, and run queries against data residing in remote storage systems as well as on the local filesystem (the only ObjectStore implementation provided by default).
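To illustrate how such a pluggable registry could look, here is a minimal std-only sketch (a hypothetical simplification, not the PR's actual code; the real registry lives inside DataFusion's ExecutionContext and its ObjectStore trait carries listing/reading methods):

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Stand-in for the ObjectStore trait (the PR's version has listing/reading methods).
trait ObjectStore: Send + Sync {
    fn scheme_name(&self) -> &str;
}

#[derive(Debug)]
struct LocalFileSystem;

impl ObjectStore for LocalFileSystem {
    fn scheme_name(&self) -> &str {
        "file"
    }
}

// Sketch of a registry mapping URI schemes ("file", "s3", "hdfs", ...) to stores,
// falling back to the local filesystem for plain paths without a scheme.
struct ObjectStoreRegistry {
    stores: RwLock<HashMap<String, Arc<dyn ObjectStore>>>,
}

impl ObjectStoreRegistry {
    fn new() -> Self {
        let mut stores: HashMap<String, Arc<dyn ObjectStore>> = HashMap::new();
        stores.insert("file".to_string(), Arc::new(LocalFileSystem));
        Self { stores: RwLock::new(stores) }
    }

    fn register_store(&self, scheme: String, store: Arc<dyn ObjectStore>) {
        self.stores.write().unwrap().insert(scheme, store);
    }

    fn store_for_path(&self, path: &str) -> Option<Arc<dyn ObjectStore>> {
        // "s3://bucket/key" -> "s3"; "/tmp/a.parquet" -> "file"
        let scheme = if path.contains("://") {
            path.split("://").next().unwrap()
        } else {
            "file"
        };
        self.stores.read().unwrap().get(scheme).cloned()
    }
}
```

A query against `s3://bucket/...` would then resolve to whatever store the user registered under the `s3` scheme, while plain paths keep hitting the local filesystem.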

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Aug 2, 2021
@houqp houqp added the enhancement New feature or request label Aug 2, 2021

houqp commented Aug 2, 2021

Thank you @yjshen , this is huge! I will help review it tomorrow.


yjshen commented Aug 2, 2021

Thank you @yjshen , this is huge! I will help review it tomorrow.

Thanks, @houqp. This is currently just a draft and not ready for a full review; I put it out early to get your thoughts on whether it's on the right track.

@houqp houqp left a comment

Looks like there is some duplication on file content reading and listing responsibilities between DataSource, SourceDescBuilder and ProtocolHandler traits. Would be good to give more thoughts on how these abstractions would interact with each other.

Review threads (outdated, resolved):
  • datafusion/src/datasource/protocol_registry.rs
  • datafusion/src/datasource/datasource2.rs (×2)
  • datafusion/src/execution/context.rs
@alamb alamb left a comment

This is looking quite cool @yjshen -- thank you! I left some comments of things that might be worth considering.

Also, I wonder if you have thought about making this interface async somehow? I realize the underlying parquet reader isn't async (yet), but I think that is the direction things are heading in Rust I/O land.

Review threads (outdated, resolved):
  • datafusion/src/datasource/protocol_registry.rs (×5)
  • datafusion/src/execution/context.rs (×2)
  • datafusion/src/datasource/local.rs
@rdettai rdettai left a comment

Thanks for this interesting proposal @yjshen!
I agree with @alamb that async should be taken into account, especially for fetching the file list and metadata, which are typically high latency but involve little processing. But here you cannot use async, because the file list and statistics are materialized at ParquetTable creation time, which is too early. This early materialization will also be problematic with buckets that contain thousands of files:

  • getting the metadata from all parquet files will take too long
  • it can be interesting to leave the listing to the last moment, so that in case you implement some partition pruning later on, you can list the files only in the partitions you are interested in.

Overall I would prefer (but this is just my opinion) a higher-level abstraction into which we can also plug catalogs such as Delta or Iceberg.

Review thread (outdated, resolved): datafusion/src/datasource/object_store.rs
@yjshen yjshen marked this pull request as ready for review August 11, 2021 06:49
@yjshen yjshen changed the title Source ext for remote files read Add support for reading remote storage systems Aug 11, 2021
@yjshen yjshen requested review from alamb and houqp August 11, 2021 07:13

yjshen commented Aug 11, 2021

@houqp @alamb I'm done with the original implementation, abstracting the file listing/reading logic into ObjectStore and ObjectReader, and I think it's ready for review again.

/// Object Reader for one file in an object store
pub trait ObjectReader {
    /// Get reader for a part [start, start + length] in the file
    fn get_reader(&self, start: u64, length: usize) -> Box<dyn Read>;

    /// Get length for the file
    fn length(&self) -> u64;
}

/// An ObjectStore abstracts access to an underlying file/object storage.
/// It maps strings (e.g. URLs, filesystem paths, etc) to sources of bytes
pub trait ObjectStore: Sync + Send + Debug {
    /// Returns the object store as [`Any`](std::any::Any)
    /// so that it can be downcast to a specific implementation.
    fn as_any(&self) -> &dyn Any;

    /// Returns all the files with filename extension `ext` in path `prefix`
    fn list_all_files(&self, prefix: &str, ext: &str) -> Result<Vec<String>>;

    /// Get object reader for one file
    fn get_reader(&self, file_path: &str) -> Result<Arc<dyn ObjectReader>>;
}
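For a concrete feel of these traits, a minimal local-filesystem implementation might look like the following (a simplified sketch, not the PR's actual LocalFileSystem; std::io::Result stands in for DataFusion's Result, and listing is non-recursive):

```rust
use std::any::Any;
use std::fmt::Debug;
use std::fs::{self, File};
use std::io::{Read, Result, Seek, SeekFrom};
use std::sync::Arc;

trait ObjectReader {
    fn get_reader(&self, start: u64, length: usize) -> Box<dyn Read>;
    fn length(&self) -> u64;
}

trait ObjectStore: Sync + Send + Debug {
    fn as_any(&self) -> &dyn Any;
    fn list_all_files(&self, prefix: &str, ext: &str) -> Result<Vec<String>>;
    fn get_reader(&self, file_path: &str) -> Result<Arc<dyn ObjectReader>>;
}

struct LocalFileReader {
    path: String,
    len: u64,
}

impl ObjectReader for LocalFileReader {
    // Serve a byte range by seeking, then limiting the reader with `take`.
    fn get_reader(&self, start: u64, length: usize) -> Box<dyn Read> {
        let mut file = File::open(&self.path).expect("file should exist");
        file.seek(SeekFrom::Start(start)).expect("seek within file");
        Box::new(file.take(length as u64))
    }

    fn length(&self) -> u64 {
        self.len
    }
}

#[derive(Debug)]
struct LocalFileSystem;

impl ObjectStore for LocalFileSystem {
    fn as_any(&self) -> &dyn Any {
        self
    }

    // Non-recursive directory listing filtered by extension.
    fn list_all_files(&self, prefix: &str, ext: &str) -> Result<Vec<String>> {
        let mut files = Vec::new();
        for entry in fs::read_dir(prefix)? {
            let path = entry?.path();
            if path.extension().map_or(false, |e| e == ext) {
                files.push(path.to_string_lossy().into_owned());
            }
        }
        Ok(files)
    }

    fn get_reader(&self, file_path: &str) -> Result<Arc<dyn ObjectReader>> {
        let len = fs::metadata(file_path)?.len();
        Ok(Arc::new(LocalFileReader { path: file_path.to_string(), len }))
    }
}
```

The same trait shape lets an S3 or HDFS store plug in behind `get_reader` without the callers knowing which backend serves the bytes.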

Currently, several things remain (I assume these are not blockers for this PR; please correct me if I get something wrong):

  • Async listing (list_all_files) as well as async reading (get_reader).
  • Figure out for ballista how to register ObjectStore in the client and pass the registration on to executors.
  • Make JSON / CSV read from ObjectReader as well.


yjshen commented Aug 11, 2021

Regarding the async part, should I just make list_all_files an async fn, and wait for parquet / CSV reading to become async before proceeding further?

@rdettai rdettai left a comment

Thanks again @yjshen !

My high level feeling is still that we lack an abstraction for the list of files (catalog).

.unwrap()
.object_store_registry
.store_for_path(root_path);
let root_desc = Self::get_source_desc(root_path, object_store.clone(), "parquet");
Contributor:

in my experience, it is too strict to expect parquet files to have the parquet suffix

Member Author:

Agreed; it was restricted to the parquet suffix in the original implementation, so I carried it over here. Perhaps we could make it an argument supplied by the user?

Member Author:

Or just list all files that do not start with _?

Member:

I agree with @rdettai , I think we can also address this as a quick follow up PR since this is also the old behavior.

Review threads (outdated, resolved): datafusion/src/datasource/mod.rs (×2)
@@ -64,7 +66,8 @@ impl CsvFile {
let schema = Arc::new(match options.schema {
Some(s) => s.clone(),
None => {
let filenames = common::build_file_list(&path, options.file_extension)?;
let filenames = LocalFileSystem
Contributor:

Why not also allow csv/json files to be fetched through the object_store_registry? This would make the behavior more consistent, but it can definitely be added later.

Member Author:

I left out csv/json for now for simplicity; since their reading logic is quite different from parquet, I prefer to do these as follow-ups.

Review thread: datafusion/src/datasource/parquet.rs

yjshen commented Aug 19, 2021

@alamb @andygrove @Dandandan @jorgecarleitao @rdettai On making the remote storage system's object listing & data reading API async, a design choice arises. This might be quite important, and I'd love to have your suggestions:

To which level should I propagate async?

This is because once we have async dir listing, we can have async logical plans & async table providers, and then an async DataFrame / context API.

Two available alternatives are:

  1. Limit async to just listing / metadata fetching / file reads, wrap a sync version over these async functions, and keep most of the user-facing API untouched (keeping the PR as lean as possible).
  2. Propagate the async API all the way up and change the user-facing API, including DataFrame & ExecutionContext (which entails huge user-facing API changes).

Currently, this PR takes the first approach: all APIs in ObjectStore / ObjectReader / SourceRootDescriptor are natively async, and each async function is wrapped in a sync one, keeping the rest of the project untouched. Great thanks to @houqp for guiding me through this.

Does approach 1 make sense to you?

If I take approach 1, how should the sync version function be constructed?

This PR wraps the async functions rather than duplicating the logic for each functionality, and therefore relies on futures::executor::block_on to bridge async into sync functions.

However, this approach is flawed: block_on may occupy the only thread in tokio, so the future inside never gets a chance to run and hangs forever when the tokio runtime is not multi-threaded. (I temporarily changed the related tests to use #[tokio::test(flavor = "multi_thread", worker_threads = 2)] to avoid hanging.) Do you have any suggestions on this?

@yjshen yjshen requested review from alamb and houqp August 19, 2021 07:33
@jorgecarleitao commented

Thanks a lot for taking a good look at this and for the proposal.

Propagate the async API all the way up and change the user-facing API, including DataFrame & ExecutionContext

Could you describe which APIs would be affected by this? For example, creating a logical plan would become async because we have to read metadata to build a schema, correct? So, for example, things like df = context.read_parquet(...).await?;, right?

I agree with making the planning async: there is no guarantee that we synchronously have all the information to build the plan in the first place, and imo we should not block because we need to read 50 metadata files from S3.

I agree that this would be a major change. :)


yjshen commented Aug 19, 2021

Could you describe which APIs would be affected by this?

Mainly API change:

  • execution/context: read(register)_parquet / read(register)_csv / read(register)_json. etc.

Other pub function / trait touched:

  • logical_plan: scan(csv/parquet/json)
  • physical_plan: csv / parquet / json

Upstream dependencies need to change:

  • arrow parquet crate: ChunkReader / Length / ParquetReader

@ehenry2 ehenry2 mentioned this pull request Aug 19, 2021

alamb commented Aug 19, 2021

I am starting to check this out carefully

@alamb alamb left a comment

First of all, thank you again @yjshen and everyone else who has helped out on this PR.

This PR is pretty massive, and I think we should begin breaking it down to merge it in -- the longer it stays open, the more potential conflicts it hits, and the longer before others can start playing with / helping out on it.

TL;DR: I would be in favor of making the DataFusion planning API async. async planning is inevitable, in my opinion, if we want DataFusion to operate on remote catalogs that have not been locally cached in memory and must be accessed via async I/O.

From my perspective, there are actually several major changes to this PR:

  1. An API to read data (during async ExecutionPlan::execute) from a remote file system
  2. An API to read the metadata from a remote filesystem (e.g. what files exist, read parquet statistics, etc)
  3. Partial rewrite of NDJson, CSV and parquet readers to use the new ObjectStore API

The first is sufficient to do partial reads from S3 / other filesources if you already know what files exist there.

The second is needed to drive DataFusion entirely from a remote data source without having to read/cache a catalog locally.

To which level should I propagate async?

Since execution (calling ExecutionPlan::execute and then collect on the result) in DataFusion is async, I think adding the async read (change 1 above) is a relatively small change and no async propagation is needed.

However, since planning (everything up to calling ExecutionPlan::execute) in DataFusion is not async today, if we want to support async catalog/metadata access (use case 2 above) then I think we have no choice but to propagate async all the way into planning.

To be clear, given the direction of database systems in general towards distributed systems I am in favor of plumbing async as far up to planning as needed to allow use of DataFusion with non-local catalogs. However, as you have noted, this is a much larger code change.

The alternate compromise, which you have partly implemented in this PR, is to implement both async and non-async versions. This is similar to the approach in the C/C++ filesystem API (props to @nealrichardson for the pointer), which provides both synchronous and asynchronous APIs.

I also posted an announcement to the arrow mailing list about this change for broader visibility.

@@ -269,8 +269,8 @@ mod test {
};
}

-    #[test]
+    #[tokio::test]
     fn distributed_hash_aggregate_plan() -> Result<(), BallistaError> {
Contributor:

In case anyone else is interested, this is what happens if you don't have tokio::test:

failures:

---- planner::test::distributed_hash_aggregate_plan stdout ----
thread 'planner::test::distributed_hash_aggregate_plan' panicked at 'there is no reactor running, must be called from the context of a Tokio 1.x runtime', /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.10.0/src/runtime/blocking/pool.rs:84:33
stack backtrace:
   0: rust_begin_unwind
             at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/panicking.rs:515:5
   1: core::panicking::panic_fmt
             at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/core/src/panicking.rs:92:14
   2: core::option::expect_failed
             at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/core/src/option.rs:1243:5
   3: core::option::Option<T>::expect
             at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/core/src/option.rs:351:21
   4: tokio::runtime::blocking::pool::spawn_blocking
             at /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.10.0/src/runtime/blocking/pool.rs:84:14
   5: tokio::fs::asyncify::{{closure}}
             at /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.10.0/src/fs/mod.rs:119:11
   6: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/core/src/future/mod.rs:80:19
   7: tokio::fs::metadata::metadata::{{closure}}
             at /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.10.0/src/fs/metadata.rs:46:5
   8: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/core/src/future/mod.rs:80:19
   9: datafusion::datasource::object_store::local::list_all_async::{{closure}}
             at /Users/alamb/Software/arrow-datafusion/datafusion/src/datasource/object_store/local.rs:148:8
  10: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/core/src/future/mod.rs:80:19
  11: datafusion::datasource::object_store::local::list_all::{{closure}}
             at /Users/alamb/Software/arrow-datafusion/datafusion/src/datasource/object_store/local.rs:111:15

@@ -56,9 +56,9 @@ paste = "^1.0"
num_cpus = "1.13.0"
chrono = "0.4"
async-trait = "0.1.41"
-futures = "0.3"
+futures = { version = "0.3", features = ["executor"] }
Contributor:

Since we already have tokio (which has a full-on executor) I don't think we also need the futures executor, so I would like to avoid this new dependency.

I tried removing this change locally and it seems to work

Contributor:

I think you can use tokio::runtime::Handle::block_on rather than futures::executor::block_on as a way to play as nicely as possible with the tokio executor: https://docs.rs/tokio/1.10.0/tokio/runtime/struct.Handle.html#method.block_on

So something like

Handle::current()
  .block_on(async { .... });

Member Author:

While using tokio::runtime::Handle::block_on, I'm faced with:

Cannot start a runtime from within a runtime. This happens because a function (like block_on) attempted to block the current thread while the thread is being used to drive asynchronous tasks.

Since block_on is try_entering an already-entered runtime, I changed to futures::executor's block_on to avoid the panic in the first place. But as I noted before, futures::executor::block_on is also flawed here:

However, this approach is flawed: block_on may occupy the only thread in tokio, so the future inside never gets a chance to run and hangs forever when the tokio runtime is not multi-threaded. (I temporarily changed the related tests to use #[tokio::test(flavor = "multi_thread", worker_threads = 2)] to avoid hanging).

Do you have any suggestions on this?

Contributor:

I think the only real suggestion is to plumb async all the way through to planning (aka remove the non async API)

Member:

The alternate compromise, which you have partly implemented in this PR, is to implement both async and non-async versions. This is similar to the approach in the C/C++ filesystem API (props to @nealrichardson for the pointer), which provides both synchronous and asynchronous APIs.

How about this alternative to reduce the scope of this PR? i.e. implement both sync and async, but only use the sync API to migrate existing code to the new IO abstraction, then work on async propagation as a fast follow-up.

Contributor:

The other thing I was thinking about was adding the ObjectStore interfaces in one PR, and then hooking that up into the rest of the system / rewriting the existing data sources (like Parquet, etc.) as separate PRs.

I think @yjshen has done a great job with this PR showing how everything would hook together, but I do feel like this PR is slightly beyond my ability to comprehend given its size and scope.

Member:

I am onboard with further reducing the scope by focusing only on the ObjectStore interface :)

collect_statistics: bool,
) -> Result<SourceRootDescriptor> {
let mut results: Vec<Result<PartitionedFile>> = Vec::new();
futures::executor::block_on(async {
Contributor:

As above I think you can use tokio::runtime::Handle::current().block_on

results.into_iter().collect();
let partition_files = partition_results?;

// build a list of Parquet partitions with statistics and gather all unique schemas
Contributor:

it is strange to me that the collating of partitions doesn't happen in get_source_desc_async -- it seems like get_source_desc would just be doing the adapting of async --> sync code.

Member Author:

Yes, get_source_desc is used to adapt async to sync, to stop propagating async into the API.


yjshen commented Aug 20, 2021

@alamb

The alternate compromise, which you have partly implemented in this PR, is to implement both async and non async versions. This is similar to the approach in the C/C++ filesystem api (props to @nealrichardson for the pointer), which has both having synchronous and asynchronous APIs.

If I understand you correctly, do you mean I should keep the sync and async implementations apart, with two separate code paths, instead of the current wrapper approach (a sync function wrapping the async logic)?

@ehenry2

ehenry2 commented Aug 20, 2021

I have a question on the ThreadSafeRead trait... is there anything prebuilt (or recommended) to wrap, say, a tokio AsyncRead or bytes::Buf to easily implement the get_reader_async function? I see the example for the local filesystem using FileSource2 to wrap the File, but I'm assuming most remote implementations will approach this function with some kind of in-memory buffer or stream. I had some issues figuring this out trying to implement it for S3 (I'm still a bit new to Rust and lifetimes, etc.).


houqp commented Aug 20, 2021

I'm assuming most remote implementations will approach this function with some kind of in-memory buffer or stream.

I think this would need to be handled case by case for different remote store clients. It would be helpful to share exactly what client API signatures you are trying to use within get_reader_async.
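For illustration, one case-by-case approach for a remote store is to fetch the requested byte range (or whole object) into memory first and expose it through std::io::Cursor, which already implements Read over owned bytes (a hypothetical sketch, not code from this PR; InMemoryReader is an invented name):

```rust
use std::io::{Cursor, Read};
use std::sync::Arc;

// Backing a reader with an in-memory buffer: a remote store client can download
// the object bytes into a Vec<u8>, then hand out range-limited readers over it.
struct InMemoryReader {
    bytes: Arc<Vec<u8>>,
}

impl InMemoryReader {
    // Mirrors the shape of ObjectReader::get_reader(start, length).
    fn get_reader(&self, start: u64, length: usize) -> Box<dyn Read> {
        let start = start as usize;
        let end = (start + length).min(self.bytes.len());
        Box::new(Cursor::new(self.bytes[start..end].to_vec()))
    }

    fn length(&self) -> u64 {
        self.bytes.len() as u64
    }
}
```

This trades memory for simplicity; a streaming implementation would instead bridge the client's async body stream into a blocking Read.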

@jorgecarleitao commented

Could it make sense to write a design doc, like the one @houqp wrote some time ago for qualified names? Does this feel like a sufficiently impactful change to design it a bit before committing?


alamb commented Aug 21, 2021

Could it make sense to write a design doc like @houqp wrote some time ago for the qualified names?

I think a design doc is a great idea @jorgecarleitao -- it would let us make sure some of the larger points are clear and that there is consensus (especially around adding async in various places). @yjshen I am happy to start drafting such a document if you think it would help.

I am personally very interested in getting the ideas in this PR into DataFusion -- I think it is an important architectural step forward, and since it will directly help IOx (the project I am working on) I can spend non-trivial amounts of time working on it.


yjshen commented Aug 21, 2021

Thank you @houqp @alamb @jorgecarleitao for your great help!

This PR initially contained several functions related to reading data, including the core object store abstraction, a more general scan partitioning abstraction, and some refactoring of the parquet scan. As I thought more and received more valuable input, the scope became more extensive. Although I tried to keep the PR as lean as possible, leaving out functionality such as JSON/CSV scan, it inevitably grew huge and hard to review.

I agree we could make the current PR a proof of concept, and I'm happy to break it down into several parts to get the work finally merged.

As for the design doc for the object store API, I can write up a draft proposal this weekend. Please help review and revise it when it's available. Thanks again @alamb for offering to help on the doc part :)


yjshen commented Aug 21, 2021

I've drafted a design doc here: https://docs.google.com/document/d/1ZEZqvdohrot0ewtTNeaBtqczOIJ1Q0OnX9PqMMxpOF8/edit#. Please help to review it. Thanks!


alamb commented Aug 22, 2021

Thanks @yjshen -- the plan you lay out sounds great

@alamb alamb left a comment

@yjshen one way to avoid the async planning code, might be to have an async version of a ParquetTable TableProvider. Would the following structure work for delta-rs?

(Create a RemoteParquetTable provider outside of Datafusion) <-- async
(Create Execution Context)
(Register Table Providers)
(Create LogicalPlan)
(Create ExecutionPlan)
(Call ExecutionPlan::execute) <-- async
(Stream results back from stream) <-- async

where RemoteParquetTable is something that knows how to interact with the ObjectStore and fetch the appropriate metadata for statistics and schema.

                              Planning these                                           
                              Table Providers                    During                
 ┌───────────────────────┐    results in same                   execution              
 │      (non async)      │     ExecutionPlan                 reads data via            
 │     ParquetTable      │──────┐                              ObjectStore             
 │                       │      │      ┌───────────────────────┐                       
 └───────────────────────┘      │      │        (async)        │      ┌───────────────┐
                                ├─────▶│       existing        │─────▶│  ObjectStore  │
 ┌───────────────────────┐      │      │      ParquetExec      │      └───────────────┘
 │         async         │      │      └───────────────────────┘                       
 │  RemoteParquetTable   │      │                                                      
 │                       │──────┘                                                      
 │  fetches metadata on  │                                                             
 │RemoteParquetTable::new│                                                             
 └───────────────────────┘                                                             
                                                                                       
     Planning Time                         Execution Time                              
                                                                                       

This idea is not as nice as the unified framework you have here but it might allow DF to get to the more unified design incrementally

Today, in my mind the general flow of querying goes like this, and trying to add async to the creation of LogicalPlan or ExecutionPlan is a large change, as you have pointed out

(Create Execution Context)
(Register Table Providers)
(Create LogicalPlan)
(Create ExecutionPlan)
(Call ExecutionPlan::execute) <-- async
(Stream results back from stream) <-- async


yjshen commented Aug 24, 2021

@alamb I might be wrong on this: would it be possible not to provide a RemoteParquetTable, but instead a RemoteParquetTableBuilder that uses the ObjectStore API for the async listing and builds a ParquetTable asynchronously?

By doing this, we move the async table-building logic out of the planning API and into users' hands, while they construct the ParquetTable TableProvider. They could then register the ParquetTable using context::register_table(self, table_ref, provider). Does this violate idiomatic async in Rust?
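The builder idea can be sketched as follows (all names hypothetical; a hand-rolled single-future block_on stands in for a real executor such as tokio, and the listing result is fabricated so the sketch is self-contained):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Stand-in for ParquetTable: a synchronous provider holding a pre-materialized file list.
struct ParquetTable {
    files: Vec<String>,
}

// Hypothetical builder: performs the high-latency listing asynchronously, then
// hands back an ordinary synchronous provider that can be registered as usual.
struct RemoteParquetTableBuilder {
    root: String,
}

impl RemoteParquetTableBuilder {
    async fn list_files(&self) -> Vec<String> {
        // A real implementation would await ObjectStore::list_all_files here;
        // two fabricated entries keep the sketch self-contained.
        vec![
            format!("{}/part-0.parquet", self.root),
            format!("{}/part-1.parquet", self.root),
        ]
    }

    async fn build(self) -> ParquetTable {
        let files = self.list_files().await;
        ParquetTable { files }
    }
}

// Minimal single-future executor, standing in for a real runtime, so the sketch
// runs without external dependencies.
fn block_on<F: Future>(mut fut: F) -> F::Output {
    fn raw_waker() -> RawWaker {
        fn clone(_: *const ()) -> RawWaker {
            raw_waker()
        }
        fn noop(_: *const ()) {}
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    let waker = unsafe { Waker::from_raw(raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    // Safety: `fut` is not moved after being pinned here.
    let mut fut = unsafe { Pin::new_unchecked(&mut fut) };
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

The user awaits the builder once, outside planning, and then registers the finished ParquetTable; the planning API itself stays synchronous.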

I think of this from the perspective of ballista. Even though I'm not very familiar with the code there, it seems ballista can only serialize/deserialize known, typed TableProviders, so a RemoteParquetTable outside DataFusion might not be workable? I think Rust doesn't support runtime reflection?


yjshen commented Aug 24, 2021

Would the following structure work for delta-rs?

cc @houqp since I'm not familiar with delta-rs.


alamb commented Aug 24, 2021

but provide a RemoteParquetTableBuilder that uses the ObjectStore API on the async listing but build a ParquetTable asynchronously?

That is a really neat idea @yjshen - I hadn't thought of that but it sounds very good

They could then register the ParquetTable using context::register_table(self, table_ref, provider). Does this violate idiomatic async in Rust?

Not in my opinion.

I think Rust doesn't support runtime reflection?

That is correct -- Rust doesn't have built in runtime reflection support -- that type of behavior needs to be added in the application logic


houqp commented Aug 26, 2021

I also think constructing the ParquetTable asynchronously before passing it to register_table is a good idea. This is how delta-rs implements its datafusion integration as well.

With regard to ballista's table provider protobuf ser/de limitation, I think it's something we need to address in the long term; otherwise, it would be impossible to support custom table sources in ballista.


yjshen commented Aug 26, 2021

As a result of previous discussions on this PR as well as in the design doc (updated according to the latest reviews), I have broken this PR down into one dedicated API-adding PR #950 and a PartitionedFile abstraction PR #932, leaving the parquet async integration as follow-up work.


yjshen commented Sep 18, 2021

I'm closing this PR since most of its functionality has landed or will soon. I am excited about the changes taking place.

@yjshen yjshen closed this Sep 18, 2021

alamb commented Sep 19, 2021

Thank you again for all your work in this area @yjshen -- the improvements to DataFusion are amazing!

Labels
ballista datafusion Changes in the datafusion crate enhancement New feature or request

Successfully merging this pull request may close these issues.

Add support for reading distributed datasets
7 participants