Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 23978 - ICE: dip1021 memory corruption #15302

Merged
merged 1 commit into from Jun 10, 2023
Merged

Conversation

dkorpel
Copy link
Contributor

@dkorpel dkorpel commented Jun 9, 2023

@ibuclaw It's not actually a regression, -preview=dip1021 has been broken from the start. It just happened to manifest in the report's piece of code after the _d_newclassT PR.

I don't want to add a slow test for this, so I added a comment instead of PERMUTE_ARGS to run it 256 times.

@dlang-bot
Copy link
Contributor

dlang-bot commented Jun 9, 2023

Thanks for your pull request and interest in making D better, @dkorpel! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Auto-close Bugzilla Severity Description
23978 regression [REG 2.103.0] ICE: dip1021 memory corruption

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "stable + dmd#15302"

@dkorpel
Copy link
Contributor Author

dkorpel commented Jun 9, 2023

Honestly we should just deprecate dip1021, it's a broken design with a poor implementation. There's not even a migration path needed: all it did was add errors that don't accomplish anything, so it could just be treated as a no-op.

@CyberShadow
Copy link
Member

To prevent this class of bugs in the future, would it make sense to wrap all mem.xrealloc into a templated wrapper which checks that the given type doesn't have indirections? Or have some kind of checks in debug builds that it doesn't have pointers to GC-owned objects?

@dkorpel
Copy link
Contributor Author

dkorpel commented Jun 9, 2023

I think mem.xrealloc should just go away entirely. There are currently only 4 calls to mem.xrealloc left.

@dkorpel
Copy link
Contributor Author

dkorpel commented Jun 9, 2023

Ubuntu 22.04 x64, GDC fails on fail_compilation/fob2.d:

free(): invalid pointer
Aborted (core dumped)

I suspect one of the free() calls in ob.d is the culprit.

@maxhaton
Copy link
Member

maxhaton commented Jun 9, 2023

I think mem.xrealloc should just go away entirely. There are currently only 4 calls to mem.xrealloc left.

Kill it

@dkorpel
Copy link
Contributor Author

dkorpel commented Jun 9, 2023

I just noticed that xrealloc is aware of the GC:

static void* xrealloc(void* p, size_t size) pure nothrow
    {
        if (isGCEnabled)
            return GC.realloc(p, size);

But the documentation of GC.realloc says: "If p was not allocated by the GC, points inside a block, or is null, no action will be taken."

If that's true, how would it have ever begun to allocate an array when combining -preview=dip1021 and -lowmem? I think the documentation is incorrect and it allocated anyway, but the block is not being scanned.

@ibuclaw
Copy link
Member

ibuclaw commented Jun 9, 2023

I just noticed that xrealloc is aware of the GC:

Yes, which is why I'm not sure whether this is correct.

// Clear the new section
memset(newPtr + escapeBy.length, 0, (len - escapeBy.length) * EscapeBy.sizeof);
escapeBy = newPtr[0 .. len];
escapeBy.length = len;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_d_arraysetlength extends the array by using allocate+copy without marking the old data as "free" in the GC.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this no different to reimplementing xrealloc to allocate and copy inline.

--- a/compiler/src/dmd/root/rmem.d
+++ b/compiler/src/dmd/root/rmem.d
@@ -71,7 +71,18 @@ extern (C++) struct Mem
     static void* xrealloc(void* p, size_t size) pure nothrow
     {
         if (isGCEnabled)
-            return GC.realloc(p, size);
+        {
+            if (auto p2 = GC.malloc(size))
+            {
+                const psize = GC.sizeOf(p);
+                if (psize < size)
+                    size = psize;
+                //p2[0 .. size] = p[0 .. size];
+                memcpy(p2, p, size);
+                return p2;
+            }
+            return null;
+        }
 
         if (!size)
         {

Copy link
Contributor

@kinke kinke Jun 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[The druntime function is probably a bit smarter than that and tries to reuse the existing block.]

The memset-to-zero is missing now, so the existing elements would be reused in the loop below, that's probably problematic. Edit: Oh sorry, it previously just memset the newly allocated elements, all good.

@dkorpel
Copy link
Contributor Author

dkorpel commented Jun 9, 2023

Yes, which is why I'm not sure whether this is correct.

It does seem to be related to the EscapeBy[] array though, any idea what's going on?

@ibuclaw
Copy link
Member

ibuclaw commented Jun 9, 2023

Yes, which is why I'm not sure whether this is correct.

It does seem to be related to the EscapeBy[] array though, any idea what's going on?

Not beyond the pointer dump I hacked together and posted an abridged version of (the 14MB file) in the issue.

Running it through valgrind might help locate who is referencing a given memory address. I have my doubts it'd be capable though.

@rainers
Copy link
Member

rainers commented Jun 9, 2023

Maybe the failure is more reproducible if the reset-loop at the end of the function actually clears the arrays in EscapeByResults (it currently only resets the length) and also clears EscapeBy.param.

@ibuclaw
Copy link
Member

ibuclaw commented Jun 9, 2023

I suspect the test is more reproducible if you insert import std; in the middle of struct V. However that's not so suitable for the testsuite. It's still non-deterministic though.

@kinke
Copy link
Contributor

kinke commented Jun 9, 2023

It just happened to manifest in the report's piece of code after the _d_newclassT PR.

The switch to _d_newclassT circumvents the _d_newclass override in

extern (C) Object _d_newclass(const ClassInfo ci) nothrow
{
const initializer = ci.initializer;
auto p = mem.isGCEnabled ? allocClass(ci) : allocmemoryNoFree(initializer.length);
memcpy(p, initializer.ptr, initializer.length);
return cast(Object) p;
}
- when using a host compiler with that new lowering (e.g., LDC hasn't switched yet). So this e.g. means that in default no-GC mode (no -lowmem), all class instances in the compiler are now allocated by the (disabled) GC, not with the bump-pointer scheme anymore.

@dkorpel
Copy link
Contributor Author

dkorpel commented Jun 9, 2023

That's a good point about rmem, it seems like dmd now only half-overrides druntime's allocating functions.

@dkorpel dkorpel changed the title Fix 23978 - ICE: EscapeBy[] is malloced, but contains GC-allocated objects Fix 23978 - ICE: dip1021 memory corruption Jun 9, 2023
@ibuclaw
Copy link
Member

ibuclaw commented Jun 10, 2023

The switch to _d_newclassT circumvents the _d_newclass override

Yep, good call. I completely overlooked that.

Both GC.realloc and GC.free explicitly mark the passed address as "free" to reuse, however we (sort of) have little guarantee that any memory explicitly allocated via Mem has no other live references within the compiler at the moment either of these functions are called.

I'd be more inclined to remove all uses of both these GC hooks, letting GC.collect do the work by itself.

@ibuclaw
Copy link
Member

ibuclaw commented Jun 10, 2023

As mentioned in the bug report, I've finally tracked it down.

The data being Mem.realloc'd contains many Array(T) fields, some of which have self references in their data.ptr field thanks to the smallarray optimization.

Naturally then, the memcpy from old GC data to new retains those self referenced addresses, and the GC marks the old data as "free". Some time later GC.malloc will return a pointer to said "free" data. So now we have two GC references to the same memory. One that is treating the data as an Array(VarDeclaration) in dmd.escape.escapeByStorage, and the other as an AA in the symtab of a dmd.dsymbol.ScopeDsymbol.

Long story short. Don't call Mem.xrealloc on an Array(T), or any data type have contains Array(T) fields!

@kinke
Copy link
Contributor

kinke commented Jun 10, 2023

Array has a disabled postblit, but it's un-movable too due to the potential self-reference. So I guess the same memcpy problem occurs whenever appending to/extending a GC-allocated array containing any Array fields?

@ibuclaw
Copy link
Member

ibuclaw commented Jun 10, 2023

Array has a disabled postblit, but it's un-movable too due to the potential self-reference. So I guess the same memcpy problem occurs whenever appending to/extending a GC-allocated array containing any Array fields?

The short answer is no. Why? Because only GC.realloc(p, size) and GC.free(p) call freeNoSync(p).

The memcpy itself is not harmful, because when the GC scan happens, it will see there are still references to the old data, so will never mark it as free implicitly.

@kinke
Copy link
Contributor

kinke commented Jun 10, 2023

Not sure I follow, I'm talking in general about this unmovable Array bad boy, not specifically about Mem.realloc. Any Array with self-reference (length 1) becomes invalid after a move, with a dangling pointer, and then freeing it when getting destructed:

~this() pure nothrow
{
debug (stomp) memset(data.ptr, 0xFF, data.length);
if (data.ptr != &smallarray[0])
mem.xfree(data.ptr);
}
So GC-array expansions (with implicit moves) are problematic just as moving any aggregate containing some Array field. It's a real bad boy.

@dkorpel dkorpel merged commit 167b050 into dlang:stable Jun 10, 2023
43 of 44 checks passed
@dkorpel dkorpel deleted the stable branch June 10, 2023 18:15
@dkorpel dkorpel restored the stable branch June 10, 2023 18:16
@ibuclaw
Copy link
Member

ibuclaw commented Jun 10, 2023

Not sure I follow, I'm talking in general about this unmovable Array bad boy, not specifically about Mem.realloc. Any Array with self-reference (length 1) becomes invalid after a move, with a dangling pointer, and then freeing it when getting destructed:

~this() pure nothrow
{
debug (stomp) memset(data.ptr, 0xFF, data.length);
if (data.ptr != &smallarray[0])
mem.xfree(data.ptr);
}
So GC-array expansions (with implicit moves) are problematic just as moving any aggregate containing some Array field. It's a real bad boy.

GC array expansions won't explicitly mark the old memory as "free" though. It still leaves that up for the next garbage collection run to clear that up. And the GC won't do that because it still sees live references via the new expanded memory.

@kinke
Copy link
Contributor

kinke commented Jun 10, 2023

Yeah okay, but then later we get the free(): invalid pointer errors when finalizing the EscapeBy array here.

@ibuclaw
Copy link
Member

ibuclaw commented Jun 10, 2023

No, we don't. Because the Mem routines (well, GC) just allocate memory from the GC directly without any type information attached. RAII doesn't apply here, so ~this is never ran.

@kinke
Copy link
Contributor

kinke commented Jun 10, 2023

[Well, we did here for CI with the intermediate commit where Dennis extended the GC array length. And now don't anymore with the dropped cache, and allocating a fresh new array each time now (for -preview=dip1021 and @live functions only, so probably no big deal).]

@ibuclaw
Copy link
Member

ibuclaw commented Jun 10, 2023

[Well, we did here for CI with the intermediate commit where Dennis extended the GC array length. And now don't anymore with the dropped cache, and allocating a fresh new array each time now (for -preview=dip1021 and @live functions only, so probably no big deal).]

Ah, I didn't see that intermediate commit, was it perhaps using .length to grow the array? So there was RTTI attached to the memory block?

@kinke
Copy link
Contributor

kinke commented Jun 11, 2023

Yes, you commented on it ;) - #15302 (comment)

@baryluk
Copy link

baryluk commented Jun 11, 2023

Thank you Iain, Dennis, et al for the debugging and fix. Amazing.

I submitted original bug report to gdc, Iain minimized it further and did extensive debugging.

Honestly we should just deprecate dip1021, it's a broken design with a poor implementation. There's not even a migration path needed: all it did was add errors that don't accomplish anything, so it could just be treated as a no-op.

I didn't actually intended to use dip1021, but I used -fpreview=all (more out of curiosity), which does enable dip1021 by default too. Since then I switched to enable only relevant DIPs there.

@ibuclaw ibuclaw mentioned this pull request Jun 15, 2023
kraj pushed a commit to kraj/gcc that referenced this pull request Jun 26, 2023
…e::lookup

Backports patch from upstream dmd mainline for fixing PR110113.

The data being Mem.xrealloc'd contains many Array(T) fields, some of
which have self references in their data.ptr field thanks to the
smallarray optimization used by Array.

Naturally then, the memcpy from old GC data to new retains those self
referenced addresses, and the GC marks the old data as "free". Some time
later GC.malloc will return a pointer to said "free" data. So now we
have two GC references to the same memory. One that is treating the data
as an Array(VarDeclaration) in dmd.escape.escapeByStorage, and the other
as an AA in the symtab of a dmd.dsymbol.ScopeDsymbol.

Fix this memory corruption by not storing the data in a global variable
for reuse.  If there are no more live references, the GC will free it.

	PR d/110113

gcc/d/ChangeLog:

	* dmd/escape.d (checkMutableArguments): Always allocate new buffer for
	computing escapeBy.

gcc/testsuite/ChangeLog:

	* gdc.test/compilable/test23978.d: New test.

Reviewed-on: dlang/dmd#15302
(cherry picked from commit ae3a4ce)
kraj pushed a commit to kraj/gcc that referenced this pull request Jun 26, 2023
…e::lookup

Backports patch from upstream dmd mainline for fixing PR110113.

The data being Mem.xrealloc'd contains many Array(T) fields, some of
which have self references in their data.ptr field thanks to the
smallarray optimization used by Array.

Naturally then, the memcpy from old GC data to new retains those self
referenced addresses, and the GC marks the old data as "free". Some time
later GC.malloc will return a pointer to said "free" data. So now we
have two GC references to the same memory. One that is treating the data
as an Array(VarDeclaration) in dmd.escape.escapeByStorage, and the other
as an AA in the symtab of a dmd.dsymbol.ScopeDsymbol.

Fix this memory corruption by not storing the data in a global variable
for reuse.  If there are no more live references, the GC will free it.

	PR d/110113

gcc/d/ChangeLog:

	* dmd/escape.d (checkMutableArguments): Always allocate new buffer for
	computing escapeBy.

gcc/testsuite/ChangeLog:

	* gdc.test/compilable/test23978.d: New test.

Reviewed-on: dlang/dmd#15302
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
8 participants