Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AA] inserting into associative array invalidates foreach iteration #17641

Open
dlangBugzillaToGithub opened this issue Feb 21, 2014 · 7 comments
Labels
Druntime:AA Specific to Associative Arrays P2 Severity:major

Comments

@dlangBugzillaToGithub
Copy link

Ivan Kazmenko (@GassaFM) reported this on 2014-02-21T09:47:49Z

Transferred from https://issues.dlang.org/show_bug.cgi?id=12218

CC List

Description

I have this piece of code.  It creates an associative array of integers from 1 to 124, inclusive, where a[i] = i.  It then proceeds by iterating over that array and adding (k+1, v+1) pair to the same array for each (k, v) pair we see.  This breaks in a reproducible manner.  I understand this was perhaps not intended to work.  Even if so, there are some points of concern: memory safety, lack of documentation and the possibility for improvement.

-----
import std.exception;
import std.stdio;

/* @safe */ void main ()
{
	int lo = 1;
	int hi = 125;
	writeln ("start ", lo, " ", hi);
	int [int] a;
	foreach (i; lo..hi)
	{
		a[i] = i;
	}
	foreach (k, v; a)
	{
		writeln ("element ", k, " ", v);
		enforce (k == v);
		a[k + 1] = v + 1;
	}
}
-----

The typical output on Win32 with DMD, GDC and LDC is:
-----
start 1 125
element 31 31
element 62 62
element 93 93
element 124 124
element 273870408 2048
-----
And then, the enforcement fails.

The first number 273870408 on the last line of output can be 1404296, 3616704 or some other number.  The second number seems to always be 2048.  In short, a (key, value) pair of garbage values is added into the array.  What happens is perhaps reallocation of the array when its size reaches around 127 or 128 elements, and the foreach loop does not get to learn the new location of the data.

The points that concern me are memory safety, lack of documentation and the possibility for improvement.

1. @safe?

This is not exactly memory safe: the program ends up using uninitialized memory.  The junk number (like 273870408) appears in the array out of nowhere.  Yet, the compiler permits writing "@safe void main ()" once we comment out the calls to writeln.  Perhaps this piece of uninitialized memory could well appear under the "ptr" field of some dynamic array if the declaration was slightly more complex than "int [int]", and that would lead to memory corruption.

2. Undocumented?

There is no warning on the associative arrays documentation page (http://dlang.org/hash-map.html).  There are a few bug reports and discussions stating that one can not safely *remove* elements while iterating on an associative array.  Yet, I didn't find any report which stated the same problems when *adding* items to an associative array.

Perhaps adding while iterating is not a much used feature: there is no guarantee that the newly added (key, value) pairs will or will not be visited in the foreach loop.  Still, it can be useful in some situations where the insertion of new elements is somehow idempotent: for example, we add no more than one single special key in the course of the foreach loop.

Even if such usage is unintended, a friendly way to handle it would be a warning on the documentation page, and maybe a compile-time error in simply visible cases, like the example above.

3. Possible to improve?

The C++ STL documentation for set and map explicitly states that (an example quote from http://www.sgi.com/tech/stl/set.html):

-----
Set has the important property that inserting a new element into a set does not invalidate iterators that point to existing elements. Erasing an element from a set also does not invalidate any iterators, except, of course, for iterators that actually point to the element that is being erased.
-----

The implementation of hash_set by Microsoft also boasts a similar feature
(http://msdn.microsoft.com/en-us/library/bb398039.aspx):

-----
Moreover, inserting an element invalidates no iterators, and removing an element invalidates only those iterators which point at the removed element.
-----

These quotes suggest that it might be possible to maintain correctness of such foreach loops in D, too, while preserving efficiency.

Ivan Kazmenko.

!!!There are attachements in the bugzilla issue that have not been copied over!!!

@dlangBugzillaToGithub
Copy link
Author

gassa commented on 2014-02-21T10:04:50Z

This report is a reduction of the case that bit me during a programming contest: Google Code Jam 2013, Round 1B, problem C.

The problem statement: http://code.google.com/codejam/contest/2434486/dashboard#s=p2

My faulty solution: code.google.com/codejam/contest/scoreboard/do?cmd=GetSourceCode&contest=2434486&problem=2705486&io_set_id=1&username=Gassa

@dlangBugzillaToGithub
Copy link
Author

gassa commented on 2014-02-28T01:45:01Z

Created attachment 1333
program to find a reduced test case

Well, 124 numbers sound like much, but the test case presented above is the most reduced example I came up with.  I attached the program used to find such a test case.  Basically, it just adds an interval [lo; hi] into an associative array and then iterates on the array.  The second similar array is also created to make the breakage more likely because of data relocation, but it turns out to be unnecessary.

On the bright side, the original broken program processes several megabytes of text, so some reduction did in fact take place.

@dlangBugzillaToGithub
Copy link
Author

gassa commented on 2014-02-28T01:47:43Z

Created attachment 1334
reduced test case printing diagnostic messages

@dlangBugzillaToGithub
Copy link
Author

gassa commented on 2014-02-28T01:48:39Z

Created attachment 1335
reduced test case as a safe function - which it is not!

@dlangBugzillaToGithub
Copy link
Author

yebblies commented on 2014-02-28T06:21:53Z

I can confirm this was not intended to work, and it most likely never will.  It should be possible for the compiler to detect this and give you an error.

@dlangBugzillaToGithub
Copy link
Author

bearophile_hugs commented on 2014-02-28T06:35:25Z

(In reply to comment #5)
> I can confirm this was not intended to work, and it most likely never will.  It
> should be possible for the compiler to detect this and give you an error.

Yes, while fixing it in general is probably not possible, giving a good compile-time error is a good thing.

@dlangBugzillaToGithub
Copy link
Author

pro.mathias.lang (@Geod24) commented on 2020-08-13T01:20:41Z

The program in the original post now works, most likely because the hash algorithm was changed and the bug doesn't show for such low values (it doesn't re-hash).
However, we have encountered this in the wild and it was the cause of long hours of debugging.

Here's a new test code that SIGSEGV reliably on Linux and Mac:
```
import std.stdio;

alias BinBlob = int[32];
BinBlob rv(int i) @safe { BinBlob r = i; return r; }

void main () @safe
{
    int[BinBlob] myMap;
    foreach (int i; 1 .. 10)
        myMap[rv(i)] = i;

    int i = 10;
    BinBlob[] keys_second_pass;
    foreach (key, value; myMap)
    {
        version (BugFree) { /* Not storing the key does not segv */ }
        else               keys_second_pass ~= key;

        writeln(key, ": ", value);
        foreach (int j; 0 .. 100_000)
            myMap[rv(++i)]=i;
    }
}
```

Output looks like this:
```
[5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]: 5
[6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6]: 6
[21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21]: 21
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]: 1
[3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]: 3
[8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8]: 8
[9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9]: 9
[24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24]: 24
[14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14]: 14
Error: program killed by signal 11
```

The debugger backtrace is as you would expect:
```
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x0000000100000f82 safe`_D4safe4mainFNfZ14__foreachbody1MFNfKG32iKiZi(__applyArg0=0x0180000101102420, __applyArg1=0x01800001011024a0) at safe.d:14:5
   11
   12  	    int i = 10;
   13  	    BinBlob[] keys_second_pass;
-> 14  	    foreach (key, value; myMap)
   15  	    {
   16  	        version (BugFree) { /* Not storing the key does not segv */ }
   17  	        else               keys_second_pass ~= key;
Target 0: (safe) stopped.
(lldb) bt
error: need to add support for DW_TAG_base_type 'char' encoded with DW_ATE = 0x10, bit_size = 8
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x0000000100000f82 safe`_D4safe4mainFNfZ14__foreachbody1MFNfKG32iKiZi(__applyArg0=0x0180000101102420, __applyArg1=0x01800001011024a0) at safe.d:14:5
    frame #1: 0x000000010016df4e safe`_aaApply2 + 94
    frame #2: 0x0000000100000f4f safe`_Dmain at safe.d:14:5
    frame #3: 0x00000001001724b0 safe`_D2rt6dmain212_d_run_main2UAAamPUQgZiZ6runAllMFZv + 112
    frame #4: 0x00000001001722a7 safe`_d_run_main2 + 391
    frame #5: 0x000000010017210d safe`_d_run_main + 141
    frame #6: 0x0000000100001255 safe`main(argc=1, argv=0x00007ffeefbff760) at entrypoint.d:35:13
    frame #7: 0x00007fff69976cc9 libdyld.dylib`start + 1
```

@thewilsonator thewilsonator added the Druntime:AA Specific to Associative Arrays label Dec 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Druntime:AA Specific to Associative Arrays P2 Severity:major
Projects
None yet
Development

No branches or pull requests

2 participants