Skip to content

Performance of __next__ (efficiently signaling end of iteration) #3447

@dgrunwald

Description

@dgrunwald

I'm trying to write a custom collection class, and am struggling to write an iterator that is competitive with the iterator of a built-in tuple.

The benchmark:

timeit.timeit(lambda: sum(1 for my_tuple in iter_tuples() for item in my_tuple))

Previously iter_tuples() was yielding built-in tuples of varying lengths (but usually short: 90% of tuples have length 3 or less).
Now I'm trying to use my own cdef class for the tuples, but I can't get this benchmark anywhere near the original performance.

If my collection class uses yield inside its __iter__ method, that results in code 30% slower than the built-in tuple.
If my collection class uses a custom cdef class for the iterator, that results in code 50% slower than the built-in tuple.

@cython.final
@cython.freelist(4)
cdef class NodeTupleIterator:
    cdef NodeTuple _c
    cdef size_t _index

    def __cinit__(self, NodeTuple c not None):
        self._c = c

    def __iter__(self):
        return self

    def __next__(self):
        if self._index == self._tuple._size:
            raise StopIteration
        node = self._c._get(self._index)
        self._index += 1
        return node

    def __length_hint__(self):
        return self._c._size - self._index

Profiling shows that __next__ spends 33% of its time in __Pyx_Raise and another 40% in __Pyx_Add_Traceback.
But none of that is necessary. On the C level, tp_iternext can indicate the end of iteration by simply returning NULL without setting any exception.
Is this somehow possible from Cython? Should Cython optimize a raise StopIteration within __next__ methods that way?

Currently it seems like my only solution is to avoid Cython for the iterator class :(

By the way, I've verified that the benchmark with the builtin tuple actually calls tupleiter_next. There doesn't seem to be any special case that gives builtin tuples an inherent advantage; the problem just seems to be the overhead of Cython generator machinery (for yield) or exception-handling machinery (for __next__).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions