each instance of a comprehension variable should be different #1571

Closed
StefanKarpinski opened this Issue Nov 19, 2012 · 24 comments

Projects

None yet

4 participants

@StefanKarpinski
The Julia Language member

Currently we have this:

julia> map(f->f(), [ ()->i for i=1:10 ])
10-element Int64 Array:
 10
 10
 10
 10
 10
 10
 10
 10
 10
 10

I think this should definitely evaluate to [1:10]. I also think that's what for loops should do, but in the case of comprehensions, I think the case is much more powerful.

I've tagged this with "parallel" because comprehensions are a very natural place for implicit parallelism in the language, but that essentially would require the semantics I'm proposing in order to work correctly. Note that I'm talking about evaluating a normal comprehension using multiple threads, rather than producing a DArray distributed across multiple machines.

@toivoh
The Julia Language member

+1

@StefanKarpinski
The Julia Language member

Note that I'm talking about evaluating a normal comprehension using multiple threads, rather than producing a DArray distributed across multiple machines.

This distinction suggests to me that the @parallel macro should maybe be renamed to @distributed to allow for possible single-process parallelism, e.g. automatic threaded comprehensions and for loops.

@JeffBezanson
The Julia Language member
@staticfloat
The Julia Language member

This is tangentially related to the discussion on Parellel Callbacks in C on the list back in August, specifically Stefan's post regarding having threads with completely independent global state. This pretty much eliminates the (user-facing) differences between multithreading and multiprocessing, and is pretty much the model I reference in my link to "Writing Perfect Multithreaded Programs with ZMQ" further down the discussion.

But this is discussing more the difference between distributed and parallel (or lack thereof), not the instance at hand.

@JeffBezanson JeffBezanson was assigned Nov 28, 2012
@toivoh
The Julia Language member

I think it would make sense to extend this change to all variables that are local to a for loop/comprehension.
I.e., the change should apply to

map(f->f(), [ (j=i; ()->i) for i=1:10 ])

just as well as map(f->f(), [ ()->i for i=1:10 ]).

@StefanKarpinski
The Julia Language member

That seems like the best semantics to me as well. I'm not sure how hard it will be to implement though.

@JeffBezanson
The Julia Language member

If all loop-local variables are fresh on each iteration, then this loop won't work:

for i = 1:n
    if i == 1
        x = 1
    else
        x = x+1  # x is undefined
    end
end

In parallel loops and comprehensions, it's reasonable to disallow dependence on previous iterations, but this seems really strict for ordinary for loops. And might break a lot of code. Of course, you can still do reductions if the reduction variable is initialized outside the loop.

@StefanKarpinski
The Julia Language member

This example seems pretty weird to me – I would never write a loop like that, but maybe other people do. I was thinking that only the loop variable i would be fresh on each iteration. Does that wreak havoc with our scoping rules?

@StefanKarpinski
The Julia Language member

Also, this example currently works but is useless since the scope of x is limited to the for loop – you can't get its value outside of the loop. The only way you can get its value outside of the loop is if it was declared or assigned outside of the loop, in which case this issue doesn't exist since it wouldn't be loop local and therefore wouldn't be fresh on each iteration.

@JeffBezanson
The Julia Language member

I agree my example is weird. It's not totally useless though since I could imagine using the value of x inside the loop.

It's easy to make only i fresh. The current behavior of let is that in let x=0,y=1; ...; end only x and y are "new" variables, and the rest is an ordinary scope block. The idea of this was to give precise control over which variables are new. But I could imagine making new bindings for all local variables in a let, or perhaps in every scope block.

@StefanKarpinski
The Julia Language member

I prefer the precise approach. Otherwise, if I'm understanding correctly, there would be no difference between

let x = 1
  # ...
end

and

let
  x = 1
  # ...
end

Which seems weird. Why even bother with the first one in that case?

@JeffBezanson
The Julia Language member

There is one obscure difference: in let x = y, y is evaluated in the enclosing scope, not in the new scope that contains x. let x=x is meaningful.

@StefanKarpinski
The Julia Language member

Right. I think I'd still prefer the precise approach where the local variables in the let block are normal locals.

@JeffBezanson
The Julia Language member

That might be ok, but we would have

julia> map(f->f(), [ ()->i for i=1:3 ])
3-element Int64 Array:
 1
 2
 3

julia> map(f->f(), [ (m=i; ()->m) for i=1:3 ])
3-element Int64 Array:
 3
 3
 3

Or, if not in comprehensions, then in this case:

i = 1
a = cell(3)
while i <= 3
    let j = i
        m = j
        a[i] = ()->m
    end
    i += 1
end
a[1]()  # => 3
@StefanKarpinski
The Julia Language member

Yeah, it's a tough call. For comprehensions it is pretty clearly more intuitive for m to be "new" in each iteration. What if we had new bindings for every scope block? That would certainly simplify things because there would only be one behavior to learn. I don't think it's unreasonable to require that a variable be scoped outside of a loop in order to be used across loop iterations. In most use cases you need that anyway.

@JeffBezanson
The Julia Language member

Yes, I'm really starting to think every scope block should allocate new bindings for its variables every time it is entered. Jameson agrees too.

@StefanKarpinski
The Julia Language member

I think that's the least gotcha behavior by far and if we can do it without performance issues, we should. Simply avoiding the entire if issue of when there are new bindings and when there aren't is a big win.

@JeffBezanson JeffBezanson added a commit that referenced this issue Aug 1, 2013
@JeffBezanson JeffBezanson get sysimg building with new binding behavior (ref #1571)
the most problematic change was forcing for loop variables to be loop-local,
instead of reusing existing local vars. for example

    local i
    for i = 1:2
    end
    return i

currently works and returns 2.

for now, the behavior is to introduce new variables for each iteration unless
we are overwriting an existing local. comprehensions, however, always
introduce fresh variables and never overwrite existing ones.
905adb2
@JeffBezanson
The Julia Language member

To clarify that commit message, for now I'm keeping the behavior where for i can overwrite an existing i. This is technically a different issue from whether any new variables are per-loop or per-iteration. I think this is a good compromise for minimizing code breakage.

@StefanKarpinski
The Julia Language member

Yes, I like that approach. It seems the least surprising.

@JeffBezanson
The Julia Language member

jb/newvars branch merged.

@StefanKarpinski
The Julia Language member

I'm unreasonably excited about this.

@StefanKarpinski
The Julia Language member

So, I immediately set out trying to break this and mostly everything is cool, but this seems inconsistent:

julia> for i = 1:5
           if i == 1
               x = 1
           else
               x = x+1  # x is undefined
           end
           println(x)
       end
1
2
3
4
5

julia> a = {}
0-element Any Array

julia> for i = 1:5
           if i == 1
               x = 1
           else
               x = x+1  # x is undefined
           end
           println(x)
           push!(a,()->x)
       end
1
ERROR: x not defined
 in anonymous at no file:5

Of course, I never would have thought to even write an evil loop like this except that you used it as an example above, but it seems like these should either both succeed or both fail.

@JeffBezanson
The Julia Language member

There is a small change I can make to fix this. So far I've left it this way (1) to ease the transition, since less code will break, and (2) because fixing it requires adding an enormous number of (newvar x) nodes to catch every possible place where a variable might need to be re-nulled (system image +300K). These are only needed in cases where a variable might be accessed undefined, which is quite rare, but we don't know that until much later. Some work is required to cut down on these.

@StefanKarpinski
The Julia Language member

Honestly, it is probably fine to leave it this way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment