I'm trying to write a custom collection class, and am struggling to write an iterator that is competitive with the iterator of a built-in tuple.
The benchmark:
timeit.timeit(lambda: sum(1 for my_tuple in iter_tuples() for item in my_tuple))
Previously iter_tuples() was yielding built-in tuples of varying lengths (but usually short: 90% of tuples have length 3 or less).
Now I'm trying to use my own cdef class for the tuples, but I can't get this benchmark anywhere near the original performance.
If my collection class uses yield inside its __iter__ method, that results in code 30% slower than the built-in tuple.
If my collection class uses a custom cdef class for the iterator, that results in code 50% slower than the built-in tuple.
@cython.final
@cython.freelist(4)
cdef class NodeTupleIterator:
cdef NodeTuple _c
cdef size_t _index
def __cinit__(self, NodeTuple c not None):
self._c = c
def __iter__(self):
return self
def __next__(self):
if self._index == self._tuple._size:
raise StopIteration
node = self._c._get(self._index)
self._index += 1
return node
def __length_hint__(self):
return self._c._size - self._index
Profiling shows that __next__ spends 33% of its time in __Pyx_Raise and another 40% in __Pyx_Add_Traceback.
But none of that is necessary. On the C level, tp_iternext can indicate the end of iteration by simply returning NULL without setting any exception.
Is this somehow possible from Cython? Should Cython optimize a raise StopIteration within __next__ methods that way?
Currently it seems like my only solution is to avoid Cython for the iterator class :(
By the way, I've verified that the benchmark with the builtin tuple actually calls tupleiter_next. There doesn't seem to be any special case that gives builtin tuples an inherent advantage; the problem just seems to be the overhead of Cython generator machinery (for yield) or exception-handling machinery (for __next__).
I'm trying to write a custom collection class, and am struggling to write an iterator that is competitive with the iterator of a built-in tuple.
The benchmark:
Previously
iter_tuples()was yielding built-in tuples of varying lengths (but usually short: 90% of tuples have length 3 or less).Now I'm trying to use my own
cdef classfor the tuples, but I can't get this benchmark anywhere near the original performance.If my collection class uses
yieldinside its__iter__method, that results in code 30% slower than the built-in tuple.If my collection class uses a custom
cdef classfor the iterator, that results in code 50% slower than the built-in tuple.Profiling shows that
__next__spends 33% of its time in__Pyx_Raiseand another 40% in__Pyx_Add_Traceback.But none of that is necessary. On the C level,
tp_iternextcan indicate the end of iteration by simply returningNULLwithout setting any exception.Is this somehow possible from Cython? Should Cython optimize a
raise StopIterationwithin__next__methods that way?Currently it seems like my only solution is to avoid Cython for the iterator class :(
By the way, I've verified that the benchmark with the builtin tuple actually calls
tupleiter_next. There doesn't seem to be any special case that gives builtin tuples an inherent advantage; the problem just seems to be the overhead of Cython generator machinery (foryield) or exception-handling machinery (for__next__).