Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Advancing iterator in range loop invalidates previously dereferenced values for input iterators in C++ #3055

Open
ozars opened this issue Jul 26, 2019 · 1 comment

Comments

@ozars
Copy link

commented Jul 26, 2019

I have a user-defined class implementing C++ input iterator requirements. It skips an element in the beginning and prints last element twice when I iterate over it with Cython's range based loop. Looking into produced code, I realized iterator value is increased before the loop body gets executed. Incrementing is done right after dereferenced iterator value is copied into a temporary. However, incrementing the iterator invalidates previously dereferenced values for input iterators. In my case, it was trying to parse next node in a stream.

Here is a toy example for demonstration:

# distutils: language = c++

from cython.operator cimport dereference as deref, preincrement as preinc

cdef extern from *:
    """
    //#define PRINT() std::cout << __PRETTY_FUNCTION__ << std::endl
    #define PRINT()
    #include<iostream>
    struct CountDown {
        struct Iterator {
            CountDown* ptr;
            Iterator() = default;
            Iterator(CountDown* ptr) : ptr(ptr) {}
            Iterator& operator++() { PRINT(); ptr->count--; return *this; }
            Iterator& operator++(int) { PRINT(); ptr->count--; return *this; }
            const int* operator*() { return &ptr->count; }
            bool operator!=(const Iterator&) { PRINT(); return ptr->count > 0; }
        };
        int count;
        CountDown() = default;
        CountDown(int count) : count(count) {}
        Iterator begin() { PRINT(); return Iterator(this); }
        Iterator end() { PRINT(); return Iterator(); }
    };
    """
    cdef cppclass CountDown:
        cppclass Iterator:
            Iterator()
            Iterator operator++()
            Iterator operator++(int)
            const int* operator*()
            bint operator!=(Iterator)
        CountDown()
        CountDown(int count)
        Iterator begin()
        Iterator end()

cdef countdown_range():
    cdef CountDown cd = CountDown(5)
    cdef const int* num
    for num in cd:
        print(deref(num))

cdef countdown_expected():
    cdef CountDown cd = CountDown(5)
    cdef CountDown.Iterator it = cd.begin()
    while it != cd.end():
        print(deref(deref(it)))
        it = preinc(it)

print("Actual output:")
countdown_range()
print("Expected output:")
countdown_expected()

Output:

 ~/tmp/cyissue  python3 -c "import example"
Actual output:
4
3
2
1
0
Expected output:
5
4
3
2
1

Here is the related part in the produced code:

  /* "example.pyx":42
 *     cdef CountDown cd = CountDown(5)
 *     cdef const int* num
 *     for num in cd:             # <<<<<<<<<<<<<<
 *         print(deref(num))
 * 
 */
  __pyx_t_1 = __pyx_v_cd.begin();
  for (;;) {
    if (!(__pyx_t_1 != __pyx_v_cd.end())) break;
    __pyx_t_2 = *__pyx_t_1;
    ++__pyx_t_1;
    __pyx_v_num = __pyx_t_2;

This isn't a crucial feature since it can be implemented without range-based loop as in the above example, yet this was quite surprising for me and it took some time to figure this out, so I decided to open this issue.

@ozars ozars changed the title Advancing iterator before dereferencing invalidates input iterators in C++ Advancing iterator in range loop invalidates previously dereferenced values for input iterators in C++ Jul 26, 2019

@ozars

This comment has been minimized.

Copy link
Author

commented Jul 26, 2019

Below part is where this loop translation happens AFAICS:

def generate_execution_code(self, code):
code.mark_pos(self.pos)
old_loop_labels = code.new_loop_labels()
self.iterator.generate_evaluation_code(code)
code.putln("for (;;) {")
self.item.generate_evaluation_code(code)
self.target.generate_assignment_code(self.item, code)
self.body.generate_execution_code(code)
code.mark_pos(self.pos)
code.put_label(code.continue_label)
code.putln("}")
break_label = code.break_label
code.set_loop_labels(old_loop_labels)

This translates for s in seq: loop_body expression to something like:

it = iter(seq)
while True:
   s = next(it) or break loop
   loop_body

whereas correct interpretation for C++ should have been something like:

it = seq.begin()
while True:
   if it is seq.end() break loop
   s = *it
   loop_body
   ++it

So... It looks like this issue is caused by a subtle difference between semantics of python's next and C++'s iterators. In python next does what operator++ and operator* together do in C++. I'm not sure how this can be fixed (splitting NextNode into two pieces for C++? Perhaps introducing a ForNode to split C++ implementation altogether?). Given current implementation works fine for the vast majority of cases, it may not worth the effort though, but it may be useful to note this quirk somewhere in the documentation at least.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
1 participant
You can’t perform that action at this time.