Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] Create reusable Iterator<T> interface #21956

Closed
asfimport opened this issue Jun 4, 2019 · 14 comments
Closed

[C++] Create reusable Iterator<T> interface #21956

asfimport opened this issue Jun 4, 2019 · 14 comments

Comments

@asfimport
Copy link

asfimport commented Jun 4, 2019

We have various iterator-like classes. I envision a reusable interface like

template <typename T>
class Iterator {
 public:
  virtual ~Iterator() = default;
  virtual Status Next(T* out) = 0;
}

Reporter: Wes McKinney / @wesm

Related issues:

Note: This issue was originally created as ARROW-5508. Please see the migration documentation for further details.

@asfimport
Copy link
Author

Liya Fan / @liyafan82:
@wesm, thanks for the good point.

What is the standard way to know if there is a next element in the iterator?

@asfimport
Copy link
Author

Wes McKinney / @wesm:
I added a prototype for this here, see the comment on the Next method

https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/iterator.h

I notice that I missed the virtual destructor.

If this is the preferred Iterator interface, then I would suggest refactoring one or two of our ad hoc iterator classes in the codebase to use arrow::Iterator<T> as a path to resolving this issue. Thoughts?

@asfimport
Copy link
Author

Ben Kietzman / @bkietz:
Why not just use the Range concept? We already have arrow::util::LazyRange which provides the functionality you want, I think. If you prefer the Iterator interface as you have written it then we should include an adapter so that range for loops are easy

@asfimport
Copy link
Author

Wes McKinney / @wesm:
Let's wait until StatusOr lands and evaluate options. It might be nice to have Next return StatusOr<T>. Ultimately this is about having an abstract base type like std::unique_ptr<Iterator<T>> to return from functions that deal in streams

@asfimport
Copy link
Author

Liya Fan / @liyafan82:
I still have not figured out how to check the end of the iterator by the Next method:

  1. If it is through the return value, this could be inefficient - Status is a class, and there can be object copy when returning an Status object.

  2. If it is through the method parameter, I think the parameter type should be T*& or T** ?

     

@asfimport
Copy link
Author

Ben Kietzman / @bkietz:
3. A separate out parameter of type bool*

template <typename T>
class Iterator {
 public:
  virtual ~Iterator() = default;
  virtual Status Next(T* value, bool* done) = 0;
}

@asfimport
Copy link
Author

Wes McKinney / @wesm:
@pitrou @bkietz @fsaintjacques is the Iterator<T> in the codebase now satisfactory?

@asfimport
Copy link
Author

Francois Saint-Jacques / @fsaintjacques:
My take after implementing MapIterator, FlattenIterator and using it heavily in the dataset code.

  1. T must be of pointer type (or support assignment/comparison of nullptr). Iterator completion is signaled by assigning T\* out to nullptr.

  2. Due to previous point, the iterator may never yield nullptr as a valid value.

  3. The interface forces consuming a value to know if it's empty, i.e. there's no Done()/HasNext(). This can lead to odd consumption.

  4. I question the user of Status as a return code, maybe we should have a specialized FailableIterator<T> : Iterator<Result<T>> for the same effect.

    The first and second point could be tackled by returning Option<T> (Result wouldn't work because we can't use Status::OK() as a sentinel-completion value). The third is annoying for streaming iterators (when there's no way to know completion without side effect), since the iterator itself must consume on Done() call and cache the result. I think I prefer putting the onus on the iterator implementor than the user of the interface.

@asfimport
Copy link
Author

@asfimport
Copy link
Author

Antoine Pitrou / @pitrou:
We could also take a page from Python and have a special Status::StopIteration to signal completion.

I also think it's essential to be able to return errors - either via Status or Result<T>.

As for signaling completion without consuming a value, well, the problem is that not all iterators may support that. Is there a use case where this matters?

@asfimport
Copy link
Author

Micah Kornfield / @emkornfield:
My only comment on this is might pay to sketch out what iteration looks like in each case. 

 

Using StopIteration and Result I think you get something like:

while (RETURN_IF_NOT_STOPPED_ERROR(v,  iter.next()) {

    doSomethingWith(v)

}

Not sure if the macros to make that nice are even vaible.

 

"As for signaling completion without consuming a value, well, the problem is that not all iterators may support that. Is there a use case where this matters?"

I think in many cases this can be solved with a "peek" style iterator?

 

 

@asfimport
Copy link
Author

Ben Kietzman / @bkietz:
I think the existing interface is sufficient for our current use cases.

WRT easier iteration, we currently have Iterator<T>::Visit:

ASSERT_OK(iter.Visit([&](T v) {
  doSomethingWith(v);
  return Status::OK();
}));

@asfimport
Copy link
Author

Wes McKinney / @wesm:
Moved out of 0.15.0 as it seems like there is more thinking to do on this

@asfimport
Copy link
Author

Antoine Pitrou / @pitrou:
Is there something left to do here? It seems the Iterator<T> interface and its Visit method are sufficient.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant