[C++] Create reusable Iterator<T> interface #21956

asfimport · 2019-06-04T19:39:56Z

We have various iterator-like classes. I envision a reusable interface like

template <typename T>
class Iterator {
 public:
  virtual ~Iterator() = default;
  virtual Status Next(T* out) = 0;
}

Reporter: Wes McKinney / @wesm

Related issues:

[C++] Refactor Iterator to a type erased handle (is superceded by)

_{Note: This issue was originally created as ARROW-5508. Please see the migration documentation for further details.}

asfimport · 2019-06-05T01:39:39Z

Liya Fan / @liyafan82:
@wesm, thanks for the good point.

What is the standard way to know if there is a next element in the iterator?

asfimport · 2019-06-13T19:57:42Z

Wes McKinney / @wesm:
I added a prototype for this here, see the comment on the Next method

https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/iterator.h

I notice that I missed the virtual destructor.

If this is the preferred Iterator interface, then I would suggest refactoring one or two of our ad hoc iterator classes in the codebase to use arrow::Iterator<T> as a path to resolving this issue. Thoughts?

asfimport · 2019-06-14T14:44:01Z

Ben Kietzman / @bkietz:
Why not just use the Range concept? We already have arrow::util::LazyRange which provides the functionality you want, I think. If you prefer the Iterator interface as you have written it then we should include an adapter so that range for loops are easy

asfimport · 2019-06-14T19:12:40Z

Wes McKinney / @wesm:
Let's wait until StatusOr lands and evaluate options. It might be nice to have Next return StatusOr<T>. Ultimately this is about having an abstract base type like std::unique_ptr<Iterator<T>> to return from functions that deal in streams

asfimport · 2019-06-19T10:49:36Z

Liya Fan / @liyafan82:
I still have not figured out how to check the end of the iterator by the Next method:

If it is through the return value, this could be inefficient - Status is a class, and there can be object copy when returning an Status object.
If it is through the method parameter, I think the parameter type should be T*& or T** ?

asfimport · 2019-06-19T15:00:18Z

Ben Kietzman / @bkietz:
3. A separate out parameter of type bool*

template <typename T>
class Iterator {
 public:
  virtual ~Iterator() = default;
  virtual Status Next(T* value, bool* done) = 0;
}

asfimport · 2019-08-22T22:47:32Z

Wes McKinney / @wesm:
@pitrou @bkietz @fsaintjacques is the Iterator<T> in the codebase now satisfactory?

asfimport · 2019-08-23T01:17:32Z

Francois Saint-Jacques / @fsaintjacques:
My take after implementing MapIterator, FlattenIterator and using it heavily in the dataset code.

T must be of pointer type (or support assignment/comparison of nullptr). Iterator completion is signaled by assigning T\* out to nullptr.
Due to previous point, the iterator may never yield nullptr as a valid value.
The interface forces consuming a value to know if it's empty, i.e. there's no Done()/HasNext(). This can lead to odd consumption.
I question the user of Status as a return code, maybe we should have a specialized FailableIterator<T> : Iterator<Result<T>> for the same effect.

The first and second point could be tackled by returning Option<T> (Result wouldn't work because we can't use Status::OK() as a sentinel-completion value). The third is annoying for streaming iterators (when there's no way to know completion without side effect), since the iterator itself must consume on Done() call and cache the result. I think I prefer putting the onus on the iterator implementor than the user of the interface.

asfimport · 2019-08-26T18:08:32Z

Francois Saint-Jacques / @fsaintjacques:
ping for input: @pitrou @bkietz [~emkornfield@gmail.com]

asfimport · 2019-08-26T18:14:39Z

Antoine Pitrou / @pitrou:
We could also take a page from Python and have a special Status::StopIteration to signal completion.

I also think it's essential to be able to return errors - either via Status or Result<T>.

As for signaling completion without consuming a value, well, the problem is that not all iterators may support that. Is there a use case where this matters?

asfimport · 2019-08-28T08:15:35Z

Micah Kornfield / @emkornfield:
My only comment on this is might pay to sketch out what iteration looks like in each case.

Using StopIteration and Result I think you get something like:

while (RETURN_IF_NOT_STOPPED_ERROR(v, iter.next()) {

doSomethingWith(v)

}

Not sure if the macros to make that nice are even vaible.

"As for signaling completion without consuming a value, well, the problem is that not all iterators may support that. Is there a use case where this matters?"

I think in many cases this can be solved with a "peek" style iterator?

asfimport · 2019-08-28T13:24:05Z

Ben Kietzman / @bkietz:
I think the existing interface is sufficient for our current use cases.

WRT easier iteration, we currently have Iterator<T>::Visit:

ASSERT_OK(iter.Visit([&](T v) {
  doSomethingWith(v);
  return Status::OK();
}));

asfimport · 2019-09-06T21:06:50Z

Wes McKinney / @wesm:
Moved out of 0.15.0 as it seems like there is more thinking to do on this

asfimport · 2019-12-04T19:32:55Z

Antoine Pitrou / @pitrou:
Is there something left to do here? It seems the Iterator<T> interface and its Visit method are sufficient.

asfimport closed this as completed Aug 29, 2022

asfimport added this to the 0.16.0 milestone Jan 11, 2023

asfimport mentioned this issue Jan 11, 2023

[C++] Refactor Iterator to a type erased handle #16880

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[C++] Create reusable Iterator<T> interface #21956

[C++] Create reusable Iterator<T> interface #21956

asfimport commented Jun 4, 2019 •

edited

asfimport commented Jun 5, 2019

asfimport commented Jun 13, 2019

asfimport commented Jun 14, 2019

asfimport commented Jun 14, 2019

asfimport commented Jun 19, 2019

asfimport commented Jun 19, 2019

asfimport commented Aug 22, 2019

asfimport commented Aug 23, 2019

asfimport commented Aug 26, 2019

asfimport commented Aug 26, 2019

asfimport commented Aug 28, 2019

asfimport commented Aug 28, 2019

asfimport commented Sep 6, 2019

asfimport commented Dec 4, 2019

[C++] Create reusable Iterator<T> interface #21956

[C++] Create reusable Iterator<T> interface #21956

Comments

asfimport commented Jun 4, 2019 • edited

Related issues:

asfimport commented Jun 5, 2019

asfimport commented Jun 13, 2019

asfimport commented Jun 14, 2019

asfimport commented Jun 14, 2019

asfimport commented Jun 19, 2019

asfimport commented Jun 19, 2019

asfimport commented Aug 22, 2019

asfimport commented Aug 23, 2019

asfimport commented Aug 26, 2019

asfimport commented Aug 26, 2019

asfimport commented Aug 28, 2019

asfimport commented Aug 28, 2019

asfimport commented Sep 6, 2019

asfimport commented Dec 4, 2019

asfimport commented Jun 4, 2019 •

edited