Fix each() to be attached to the operator, not the data #142

schwern · 2010-06-23T14:32:50Z

each() is dangerous because the iterator is attached to the data being operated on. For example...

while( my($k,$v) = each %h ) {
    foo(\%h);
}

sub foo {
    my $h = shift;

    while( my($k, $v) = each %$h ) {
        ...
    }
}

Both calls to each() share the same iterator. They should not. The iterator should be attached to the operator. How do you do this? One possibility is to use Devel::Declare to replace "each %hash" with "safe_each 1234, %hash" where 1234 is a unique identifier for that call to safe each. The iterator is stored by %hash->mo->id and each ident.

The text was updated successfully, but these errors were encountered:

cowens · 2010-06-25T00:07:31Z

This should be trivial, he said with the air of man about to try something really difficult. I see two cases to deal with: /each \s* (?: % | @ )/x and /each \s* \( \s* (?: % | @ )/. Does Devel::Declare grab methods (i.e. $foo->each)?

schwern · 2010-06-25T06:13:04Z

No, I don't think it does. You should be able to just catch the keyword, replace it with "perl5i::2::safe_each $identifier, " where perl5i::2::safe_each has a prototype of $%.

Problem is, I don't think Devel::Declare will pick up a keyword.

cowens · 2010-06-25T12:35:02Z

Ah, because it picks up keywords that haven't been defined yet. Yes, I think I remember something about that as well. Maybe the Pluggable Keywords will allow it. Must spend some time investigating this.

cowens · 2010-06-25T12:50:49Z

Ah, caller may be the answer, override CORE::each and use caller to determine the location of the call to each. Use the location as the index to the iterator. How does that sound? Unfortunately, caller can only return file and line location information not character position, so two eachs on the same line would have the same iterator. Of course, we don't have to limit ourselves to just what Perl can do; we can use XS. Do you know how warn gets the character offset in the messages that have them?

cowens · 2010-06-25T13:04:26Z

Stumbling block: keys and values are documented as reseting the each iterator. How will they know which one to reset? If we document that they don't reset the each iterator, how will can you reset it? You don't know the magic number being used to find the iterator. Maybe it would be better to just use %h->each(ID). This would also have the benefit of letting you still doing things like iterate up to point A in the hash in one loop and from point A to the end in another like you can do today. Backwards compatibility would be maintained by making a call to each without an ID use a default ID. We would also need to change keys and values to take the same ID. I think this puts it well within what I can do, I am going to try to have a working prototype by Monday.

schwern · 2010-06-25T13:54:03Z

I'd say change keys and values to not reset the iterator. Once the iterator is tied to the position and reference, most of the use cases for resetting it goes away. I can think of one edge case. Let's say you have a subroutine which contains...

while( my($k, $v) = each $hashref ) {
    last if $k eq 'foo';
}

and the same hash ref is passed to it twice. The second time through it will have an unreset iterator. Its a shame there can't be an implied reset before the loop, I think that requires making too many assumptions.

I don't like %h->each(ID) because how do you pick an ID that isn't going to clash with someone else? And its ugly. I like your idea of using the file and line number better.

Its also worth considering that the major use case for each goes away once we have nfor and can write nfor my($k, $v) (%hash), so maybe this isn't worthwhile.

I think what each exposes is that iteration needs a way to know when its stopped. That implies a block syntax. %hash->each { ... } Then when the loop terminates, either from running out of items or via last, the iterator can be reset.

cowens · 2010-06-25T14:57:46Z

One of each's strengths is that I can break up the iteration. For instance, I use code like this when I write benchmarks:

my ($first_name, $sub) = each %subs;
my $result = $sub->();
while (my ($name, $sub) = each %subs) {
    my $cur_result = $sub->();
    unless ($result ~~ $cur_result) {
        print "$name doesn't agree with $first_name\n";
    }
}

This relies on being able to use the same iterator for both of the eaches; if they are not, then I run one of the functions twice. Fixing each would break this code.

Perhaps we could return the iterator number in list context:

my ($first_name, $sub, $iter) = %subs->each;
my $result = $sub->();
while (my ($name, $sub) = %subs->each($iter)) {
    my $cur_result = $sub->();
    unless ($result ~~ $cur_result) {
        print "$name doesn't agree with $first_name\n";
    }
}

schwern · 2010-06-26T04:46:18Z

I think we should split the use cases. An iterator associated with the data structure for doing Clever Things and one that's associated with the data + call for Mere Mortals.

cowens · 2010-06-26T13:05:29Z

Here is a proof of concept. There are at least three problems with it: we have to use "0e0" to get around the while loop, the hash implementation is incredibly inefficient and pretty much defeats the purpose of each, and if you call each twice on the same line with the same variable you get the same iterator. We also need to go back and determine exactly what calling delete does to the iterator and what happens when we add keys to hashes or splice onto an array. There may also be problems related to using refaddr as part of the key (I can't remember if variables reuse addresses) and if we can't use refaddr then there can only be one call to each per line.

edit: minor fix to use 0 when in list context

#!/usr/bin/perl

use 5.012;
use warnings;
use Scalar::Util qw/refaddr reftype/;

use subs "each";

sub each(\[@%]) {
    goto &safe_each_hash  if "HASH" eq reftype $_[0];
    goto &safe_each_array;
}

sub safe_each_hash {
    my $h    = shift;
    my $key  = join "/", (caller)[1,2], refaddr $h;
    state %iter;

    unless (exists $iter{$key}) {
        $iter{$key} = [ keys %$h ];
    }

    unless (@{$iter{$key}}) {
        delete $iter{$key};
        return;
    }

    my $cur = shift @{$iter{$key}};

    return wantarray ? ($cur, $h->{$cur}) : $cur;
}

sub safe_each_array {
    my $array = shift;
    my $key   = join "/", (caller)[1,2], refaddr $array;
    state %iter;

    if (exists $iter{$key}) {
        $iter{$key}++;
    } else {
        $iter{$key} = "0e0";
    }

    if ($iter{$key} >= @$array) {
        delete $iter{$key};
        return;
    }

    return wantarray ? ($iter{$key}, 0+$array->[$iter{$key}]) : $iter{$key};
}

my %h = (a => 1, b => 2);
while (my $k = each %h) {
    say "key $k";
    while (my ($k, $v) = each %h) {
        say "\t$k => $v";
    }
}

my @a = qw/a b c/;

while (my $index = each @a) {
    say "index $index";
    while (my ($index, $value) = each @a) {
        say "\t$index $value";
    }
}

schwern · 2010-07-12T23:14:32Z

After using jQuery, thinking about subroutine signatures and seeing the complexity of fixing each() I think this will be much better served by a method with a callback.

%hash->each( def($k, $v) {
    say "$k => $v";
});

Its simple and clear. We just need anonymous functions with signatures which are on the way.

For #142

schwern · 2011-09-27T03:23:11Z

That should do it, I'm going to call this done.

schwern added a commit that referenced this issue Sep 27, 2011

Add a test for HASH->each as it exists now.

046419f

For #142

schwern added a commit that referenced this issue Sep 27, 2011

Whaddya know, %hash->each( func($k,$v) { ... } ); already works.

f16d818

For #142

schwern added a commit that referenced this issue Sep 27, 2011

Fix %hash->each so it always resets the iterator.

c62c82c

For #142

schwern closed this as completed Sep 27, 2011

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix each() to be attached to the operator, not the data #142

Fix each() to be attached to the operator, not the data #142

schwern commented Jun 23, 2010

cowens commented Jun 25, 2010

schwern commented Jun 25, 2010

cowens commented Jun 25, 2010

cowens commented Jun 25, 2010

cowens commented Jun 25, 2010

schwern commented Jun 25, 2010

cowens commented Jun 25, 2010

schwern commented Jun 26, 2010

cowens commented Jun 26, 2010

schwern commented Jul 12, 2010

schwern commented Sep 27, 2011

Fix each() to be attached to the operator, not the data #142

Fix each() to be attached to the operator, not the data #142

Comments

schwern commented Jun 23, 2010

cowens commented Jun 25, 2010

schwern commented Jun 25, 2010

cowens commented Jun 25, 2010

cowens commented Jun 25, 2010

cowens commented Jun 25, 2010

schwern commented Jun 25, 2010

cowens commented Jun 25, 2010

schwern commented Jun 26, 2010

cowens commented Jun 26, 2010

schwern commented Jul 12, 2010

schwern commented Sep 27, 2011