ARROW-11935: [C++] Add push generator #9714

pitrou · 2021-03-15T18:16:19Z

A push generator has a producer end which pushes values to a queue, and a consumer end (the generator itself) which yields futures that receive the values pushed by the producer.
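
The description above can be illustrated with a minimal sketch. It is not the code from this PR: `SimplePushGenerator` and `TinyFuture` are hypothetical names, `TinyFuture` is a toy stand-in for a real future type, and the sketch is single-threaded (a thread-safe version would need a lock around the shared state). An empty value on a finished future plays the role of the end-of-stream marker.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <optional>
#include <utility>

// Hypothetical stand-in for a future: it is "finished" once `ready` is true.
// A finished future with an empty `value` marks end-of-stream.
template <typename T>
struct TinyFuture {
  bool ready = false;
  std::optional<T> value;
};

// Hypothetical, single-threaded sketch of a push generator.
template <typename T>
class SimplePushGenerator {
  struct State {
    std::deque<T> queue;                     // pushed but not yet consumed
    std::shared_ptr<TinyFuture<T>> waiting;  // consumer future awaiting a value
    bool finished = false;
  };

 public:
  // Producer end: pushes values into the shared state.
  class Producer {
   public:
    explicit Producer(std::shared_ptr<State> state) : state_(std::move(state)) {}

    void Push(T value) {
      if (state_->finished) return;  // pushes after Close() are dropped
      if (state_->waiting) {
        // A consumer is already waiting: hand the value over directly.
        state_->waiting->value = std::move(value);
        state_->waiting->ready = true;
        state_->waiting.reset();
      } else {
        state_->queue.push_back(std::move(value));
      }
    }

    void Close() {
      if (state_->finished) return;
      state_->finished = true;
      if (state_->waiting) {
        state_->waiting->ready = true;  // finished with no value: end-of-stream
        state_->waiting.reset();
      }
    }

   private:
    std::shared_ptr<State> state_;
  };

  SimplePushGenerator() : state_(std::make_shared<State>()) {}

  Producer producer() { return Producer(state_); }

  // Consumer end. Not reentrant: call again only once the previous future is ready.
  std::shared_ptr<TinyFuture<T>> operator()() {
    auto future = std::make_shared<TinyFuture<T>>();
    if (!state_->queue.empty()) {
      future->value = std::move(state_->queue.front());
      state_->queue.pop_front();
      future->ready = true;
    } else if (state_->finished) {
      future->ready = true;  // closed and drained: end-of-stream
    } else {
      state_->waiting = future;  // completed later by Push() or Close()
    }
    return future;
  }

 private:
  std::shared_ptr<State> state_;
};
```

In this sketch a value pushed before the consumer asks is queued, while a value pushed after the consumer asks completes the pending future directly. Nothing bounds the queue, so a fast producer can accumulate arbitrarily many items.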

pitrou · 2021-03-15T18:16:35Z

@westonpace I would welcome your input on this.

github-actions · 2021-03-15T18:39:12Z

https://issues.apache.org/jira/browse/ARROW-11935

pitrou · 2021-03-15T18:39:21Z

Note the generator could perhaps be made reentrant if there's some use for that.

westonpace

This is a good utility. I added one note about queuing, since caution may be needed if this is used for generators that return large blocks of data.

westonpace · 2021-03-15T19:02:49Z

cpp/src/arrow/util/async_generator.h

+ public:
+  PushGenerator() : state_(std::make_shared<State>()) {}
+
+  void Push(Result<T> result) {


You could maybe check whether the result is not ok and, if so, set finished to true (potentially even clearing out the result queue), and then on future pushes simply return immediately if finished is true. I can see where your question on Zulip came from now. The only disadvantage I can see to this approach is potentially wasted memory from keeping blocks around that are invalid.

pitrou · 2021-03-16

Indeed, I could do that. The underlying question is: should an error always terminate an async generator? It doesn't seem that obvious to me.
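
To make the suggestion in this exchange concrete, here is a small sketch of a queue that marks itself finished on the first error and drops later pushes. It illustrates the reviewer's idea only, not what the PR implements. `ErrorTerminatedQueue`, `Error`, and `IntResult` are hypothetical names, with `IntResult` standing in for a `Result<int>`-like type.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-in for a Result<int>: either a value or an error.
struct Error {
  std::string message;
};
using IntResult = std::variant<int, Error>;

// Hypothetical queue that stops accepting input after the first error.
class ErrorTerminatedQueue {
 public:
  // Returns false if the result was dropped because the queue already finished.
  bool Push(IntResult result) {
    if (finished_) return false;
    if (std::holds_alternative<Error>(result)) {
      finished_ = true;  // the error itself is still queued for the consumer
    }
    results_.push_back(std::move(result));
    return true;
  }

  bool finished() const { return finished_; }
  std::size_t size() const { return results_.size(); }

 private:
  std::deque<IntResult> results_;
  bool finished_ = false;
};
```

This variant keeps the values queued before the error. Clearing them, as the comment also floats, would be a one-line change in `Push`, at the cost of discarding results the consumer might still want.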

westonpace · 2021-03-15T19:07:34Z

Also, regarding reentrancy: I don't think there would be any advantage to making the generator reentrant here, because there is no backpressure or other connection with the producer.

pitrou · 2021-03-22T17:07:36Z

@westonpace I turned the API on its head so that PushGenerator is really a generator. Also I added comments and the ability to early-close the queue. Can you take a look again?

westonpace

Looks good. Just a few thoughts below but nothing that needs to change.

westonpace · 2021-03-22T19:20:41Z

cpp/src/arrow/util/async_generator.h

+        return;
+      }
+      state_->finished = true;
+      if (state_->consumer_fut.has_value()) {


You could potentially clear the result_q here. I could understand either approach. However, if Close is semantically the same as cancel, it would seem you wouldn't want downstream consumers to keep processing the already generated results.

pitrou · 2021-03-22

No, close has nothing to do with cancel. It signals a regular end-of-stream.
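
The distinction drawn here (close as a regular end-of-stream rather than a cancellation) can be sketched as follows. `ClosableQueue`, `Status`, and `PollResult` are hypothetical names for illustration. The point is that values queued before `Close()` are still delivered, and only then does the consumer see the end marker.

```cpp
#include <cassert>
#include <deque>

// Hypothetical poll outcome: a value, nothing available yet, or end-of-stream.
enum class Status { kValue, kPending, kEnd };

struct PollResult {
  Status status;
  int value;  // meaningful only when status == Status::kValue
};

// Hypothetical queue where Close() means "no more values will be pushed",
// not "discard what was already produced".
class ClosableQueue {
 public:
  void Push(int value) {
    if (!closed_) values_.push_back(value);  // pushes after Close() are ignored
  }

  void Close() { closed_ = true; }  // already queued values are kept

  PollResult Poll() {
    if (!values_.empty()) {
      int value = values_.front();
      values_.pop_front();
      return {Status::kValue, value};
    }
    // Drained: report end-of-stream only if the producer closed the queue.
    return {closed_ ? Status::kEnd : Status::kPending, 0};
  }

 private:
  std::deque<int> values_;
  bool closed_ = false;
};
```

A cancel-style close would instead clear `values_` inside `Close()`, so the consumer would see the end marker immediately.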

westonpace · 2021-03-22T19:28:52Z

cpp/src/arrow/util/iterator_test.cc

+  producer.Close();
+  ASSERT_FINISHES_OK_AND_EQ(IterationTraits<TestInt>::End(), fut);
+  ASSERT_FINISHES_OK_AND_EQ(IterationTraits<TestInt>::End(), gen());
+  ASSERT_FINISHES_OK_AND_EQ(IterationTraits<TestInt>::End(), gen());


I feel like this check might be unnecessary? Can't hurt though.

pitrou · 2021-03-22

Perhaps over-cautious :-)

westonpace · 2021-03-22T19:33:06Z

cpp/src/arrow/util/iterator_test.cc

+  }
+  ASSERT_FINISHES_OK_AND_EQ(TestInt{1}, futures[0]);
+  ASSERT_FINISHES_AND_RAISES(Invalid, futures[1]);
+  AssertNotFinished(futures[2]);


Sorry I didn't answer the earlier question (should an error always terminate a generator?). That seems to be what your test here covers. I think that, from the point of view of the general async generator concept, what happens after an error is UB, in the sense that it is left unspecified. The behavior tested here is valid. Terminating early would also be valid. Downstream generators should be written to expect either possibility and should not rely on an error automatically terminating successive calls.

Which is a long-winded way of saying this is valid.

It would also be ok if futures[2] was IterationTraits<TestInt>::End() here.
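
As an illustration of the advice above, a downstream consumer can stop pulling on the first error itself instead of assuming the source will terminate. The sketch below is synchronous and uses hypothetical names (`Error`, `Item`, `Generator`, `Collected`, `CollectUntilError`). An empty `Item` stands in for the end-of-stream marker.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical error type and stream item: std::nullopt means end-of-stream,
// otherwise the item holds either a value or an error.
struct Error {
  std::string message;
};
using Item = std::optional<std::variant<int, Error>>;
using Generator = std::function<Item()>;

struct Collected {
  std::vector<int> values;
  std::optional<Error> error;  // set if the stream produced an error
};

// Pulls from `gen` until end-of-stream or the first error, whichever comes
// first. It never calls `gen` again after an error, so it behaves the same
// whether or not the source keeps producing items afterwards.
Collected CollectUntilError(Generator gen) {
  Collected out;
  while (true) {
    Item item = gen();
    if (!item.has_value()) break;  // regular end-of-stream
    if (const Error* err = std::get_if<Error>(&*item)) {
      out.error = *err;
      break;  // stop here rather than relying on the source to terminate
    }
    out.values.push_back(std::get<int>(*item));
  }
  return out;
}
```

With a source that yields a value, then an error, then more values, this consumer returns the first value and the error and makes exactly two calls to the generator.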

pitrou · 2021-03-23T13:44:11Z

Travis-CI build: https://travis-ci.com/github/pitrou/arrow/builds/220827493

pitrou · 2021-03-23T13:44:46Z

CI failure is unrelated, will merge.

A push generator has a producer end which pushes values to a queue, and a consumer end (the generator itself) which yields futures that receive the values pushed by the producer.

Closes apache#9714 from pitrou/ARROW-11935-push-gen

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>

github-actions bot added the Component: C++ label Mar 15, 2021

westonpace reviewed Mar 15, 2021

pitrou added 3 commits March 22, 2021 16:00

ARROW-11935: [C++] Add push generator

cb3185d

Add docstrings

05bf7a1

Turn the API on its head, allow early close

17ee984

pitrou force-pushed the ARROW-11935-push-gen branch from 8c1a0bd to 17ee984 Compare March 22, 2021 16:03

westonpace approved these changes Mar 22, 2021

pitrou closed this in 1b4d73f Mar 23, 2021

pitrou deleted the ARROW-11935-push-gen branch March 23, 2021 13:45

asfimport mentioned this pull request Mar 23, 2021

[C++] Add push generator #27771

Closed
