Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluation of iterables in generator expressions uses wrong scope #1159

Closed
robertwb opened this issue Nov 25, 2010 · 12 comments · Fixed by #4254
Closed

Evaluation of iterables in generator expressions uses wrong scope #1159

robertwb opened this issue Nov 25, 2010 · 12 comments · Fixed by #4254

Comments

@robertwb
Copy link
Contributor

robertwb commented Nov 25, 2010

In Python, this works:

it = (1,2,3)
l = list(it for it in it)

In current Cython, this raises an error at runtime as the iterable inside of the generator expression refers to the loop target variable, not the name in the outer scope. (So the iterable happens to be None.)

The name of the iterable (or rather the entire iterable expression) must be evaluated in the outer scope.

Actually, it's even different - CPython simply pre-evaluates the outermost iterable when creating the generator. Example (Py3.2):

>>> x = [range(10)]
>>> g = (i for x in x for i in x)
>>> list(g)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> x = [range(10)]
>>> g = (i for i in range(5) for x in x)
>>> list(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <genexpr>
UnboundLocalError: local variable 'x' referenced before assignment

So, basically, the outermost iterable should be evaluated into a closure variable of the generator, instead of letting the generator evaluate it itself. This:

>>> x = [range(10)]
>>> g = (i for x in x for i in x)

should be transformed into

x = [range(10)]

def _gen(_outer_it):
    for x in _outer_it:
        for i in x:
            yield i

g = _gen(x)

Note that this also applies to list comprehensions with language_level=3.

Migrated from http://trac.cython.org/ticket/600

@robertwb
Copy link
Contributor Author

scoder changed description from

In Python, this works:

it = (1,2,3)
l = list(it for it in it)

In current Cython, this raises an error at runtime as the iterable inside of the generator expression refers to the loop target variable, not the name in the outer scope. (So the iterable happens to be None.)

The name of the iterable must be looked up before creating the entry of the target variable in the inner scope.

to

In Python, this works:

it = (1,2,3)
l = list(it for it in it)

In current Cython, this raises an error at runtime as the iterable inside of the generator expression refers to the loop target variable, not the name in the outer scope. (So the iterable happens to be None.)

The name of the iterable (or rather the entire iterable expression) must be evaluated in the outer scope.
commented

@robertwb
Copy link
Contributor Author

scoder changed cc to vitja
description from

In Python, this works:

it = (1,2,3)
l = list(it for it in it)

In current Cython, this raises an error at runtime as the iterable inside of the generator expression refers to the loop target variable, not the name in the outer scope. (So the iterable happens to be None.)

The name of the iterable (or rather the entire iterable expression) must be evaluated in the outer scope.

to

In Python, this works:

it = (1,2,3)
l = list(it for it in it)

In current Cython, this raises an error at runtime as the iterable inside of the generator expression refers to the loop target variable, not the name in the outer scope. (So the iterable happens to be None.)

The name of the iterable (or rather the entire iterable expression) must be evaluated in the outer scope.
commented

Actually, it's even different - CPython simply pre-evaluates the outermost iterable when creating the generator. Example (Py3.2):

>>> x = [g = (i for x in x for i in x)
>>> list(g)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9](range(10)]
>>>)

>>> x = [g = (i for i in range(5) for x in x)
>>> list(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <genexpr>
UnboundLocalError: local variable 'x' referenced before assignment

So, basically, the outermost iterable should be evaluated into a closure variable of the generator, instead of letting the generator evaluate it itself. This:

x = [range(10)](range(10)]
>>>)
g = (i for x in x for i in x)

should be transformed into

x = [range(10)]

def _gen(_outer_it):
    for x in _outer_it:
        for i in x:
            yield i

g = _gen(x)

@robertwb
Copy link
Contributor Author

scoder commented

Another test that currently works differently in Python than in Cython:

>>> def seff():
...     print("TEST")
...     return range(10)
... 
>>> g = (i for i in seff())
TEST
>>> list(g)
[1, 2, 3, 4, 5, 6, 7, 8, 9](0,)
>>> g = (i for i in (v for v in seff()))
TEST

@robertwb
Copy link
Contributor Author

robertwb commented Oct 2, 2013

scoder changed priority from major to critical
commented

@robertwb
Copy link
Contributor Author

scoder changed milestone from wishlist to 0.20
owner to scoder
priority from critical to major
status from new to assigned
commented

Fixed here:

87a9bc2

@robertwb
Copy link
Contributor Author

scoder changed priority from major to critical
resolution to fixed
status from assigned to closed
commented

@robertwb
Copy link
Contributor Author

scoder changed milestone from 0.20 to wishlist
resolution from fixed to empty
status from closed to reopened
commented

@robertwb robertwb added this to the wishlist milestone Aug 16, 2016
@scoder scoder changed the title lookup of iterables in genexpr uses wrong scope Evaluation of iterables in generator expressions uses wrong scope Sep 7, 2017
@embray
Copy link
Contributor

embray commented Sep 7, 2017

@scoder thanks for the cleanup--I see now how the bug described in this ticket is still a problem too. I'll see if I can do something about it...

@embray
Copy link
Contributor

embray commented Sep 7, 2017

Apparently the original attempt to fix this was also reverted in a3d4461, which is part of my confusion as well.

@scoder
Copy link
Contributor

scoder commented Sep 7, 2017

I'd be happy to have more people look into this, but be "warned", it's not entirely trivial. Cython looks at the iterable in several places in order to optimise loops. Moving the iterable out of the generator too early will prevent these optimisations. Moving it out too late will lead to incorrect declarations inside and/or outside of the generator. This is one of the bugs that have been languishing for a bit of a reason.

@embray
Copy link
Contributor

embray commented Sep 7, 2017

Thanks for the admonition. That will help me understand better what the problem is.

@scoder
Copy link
Contributor

scoder commented Apr 30, 2019

Nominating this as a release blocker for Cython 3.0 to keep it in sight, although it's not really a breaking change.

da-woods added a commit that referenced this issue Aug 8, 2022
Fixes #1159.

This should be a simpler way of dealing with the long-standing generator and comprehension scoping issues. Unlike the previous PR it is fairly non-intrusive (i.e. doesn't wrap everything in `ResultRefNodes`) and therefore the changes should be more reasonable. Instead it:

* Gives `IteratorNode` a scope (pointed at the outer scope rather than the scope of the generator/comprehension)
* Tags each `ExprNode` inside the generator iterator sequence with a number, and uses those tags later to work out what needs to be converted to an argument for the generator. If an `ExprNode` is optimized out then that's fine - one of its children will have been tagged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants