Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nxagentAdjustRandRXinerama uses freed memory #293

Closed
vatral opened this issue Nov 21, 2016 · 17 comments
Closed

nxagentAdjustRandRXinerama uses freed memory #293

vatral opened this issue Nov 21, 2016 · 17 comments
Assignees
Milestone

Comments

@vatral
Copy link
Contributor

vatral commented Nov 21, 2016

While debugging another issue, I stumbled across this one:

  1. nxagentAdjustRandRXinerama (Screen.c:3808) allocates memory by calling RRGetInfo
  2. nxagentAdjustRandRXinerama (Screen.c:3872) frees memory by calling FreeResource
  3. nxagentAdjustRandRXinerama (Screen.c:3986) attempts to use the memory it just freed with RRCrtcSet

It looks like pScrPriv->outputs should be compacted to remove the deleted entries, or the entry should be marked as unused.

Issue found in cb8af80

Valgrind output:

==5904== Invalid write of size 4
==5904==    at 0x7F3204: RROutputChanged (rroutput.c:38)
==5904==    by 0x7ED3FF: RRCrtcNotify (rrcrtc.c:197)
==5904==    by 0x7ED910: RRCrtcSet (rrcrtc.c:670)
==5904==    by 0x4A7B5D: nxagentAdjustRandRXinerama (Screen.c:3986)
==5904==    by 0x4A7D47: nxagentChangeScreenConfig (Screen.c:3711)
==5904==    by 0x493998: nxagentHandleConfigureNotify (Events.c:3472)
==5904==    by 0x4957A7: nxagentDispatchEvents (Events.c:1949)
==5904==    by 0x49F4B6: nxagentBlockHandler (Handlers.c:458)
==5904==    by 0x45B3CC: BlockHandler (dixutils.c:440)
==5904==    by 0x46FEDF: WaitForSomething (WaitFor.c:262)
==5904==    by 0x4326DA: Dispatch (NXdispatch.c:362)
==5904==    by 0x40E61D: main (main.c:433)
==5904==  Address 0xb586c20 is 112 bytes inside a block of size 152 free'd
==5904==    at 0x4C2CD5A: free (vg_replace_malloc.c:530)
==5904==    by 0x7F31A3: RROutputDestroyResource (rroutput.c:414)
==5904==    by 0x448E2F: FreeResource (NXresource.c:320)
==5904==    by 0x4A76D7: nxagentAdjustRandRXinerama (Screen.c:3872)
==5904==    by 0x4A7D47: nxagentChangeScreenConfig (Screen.c:3711)
==5904==    by 0x493998: nxagentHandleConfigureNotify (Events.c:3472)
==5904==    by 0x4957A7: nxagentDispatchEvents (Events.c:1949)
==5904==    by 0x49F4B6: nxagentBlockHandler (Handlers.c:458)
==5904==    by 0x45B3CC: BlockHandler (dixutils.c:440)
==5904==    by 0x46FEDF: WaitForSomething (WaitFor.c:262)
==5904==    by 0x4326DA: Dispatch (NXdispatch.c:362)
==5904==    by 0x40E61D: main (main.c:433)
==5904==  Block was alloc'd at
==5904==    at 0x4C2BBAD: malloc (vg_replace_malloc.c:299)
==5904==    by 0x7F32C3: RROutputCreate (rroutput.c:92)
==5904==    by 0x7F0C80: RRScanOldConfig (rrinfo.c:100)
==5904==    by 0x7F0C80: RRGetInfo (rrinfo.c:206)
==5904==    by 0x4A74DF: nxagentAdjustRandRXinerama (Screen.c:3808)
==5904==    by 0x4A7D47: nxagentChangeScreenConfig (Screen.c:3711)
==5904==    by 0x493998: nxagentHandleConfigureNotify (Events.c:3472)
==5904==    by 0x4957A7: nxagentDispatchEvents (Events.c:1949)
==5904==    by 0x49F4B6: nxagentBlockHandler (Handlers.c:458)
==5904==    by 0x45B3CC: BlockHandler (dixutils.c:440)
==5904==    by 0x46FEDF: WaitForSomething (WaitFor.c:262)
==5904==    by 0x4326DA: Dispatch (NXdispatch.c:362)
==5904==    by 0x40E61D: main (main.c:433)

@uli42
Copy link
Member

uli42 commented Nov 21, 2016

While developing this I had the problem that the original xrandr code was missing a free() call. So I had to work around that somehow. The missing free() has been backported IIRC. So it could be that the workaround is wrong now.

@vatral
Copy link
Contributor Author

vatral commented Nov 21, 2016

It does look like something needs fixing.

FreeResource is a free() call, and the code that calls it doesn't decrement pScrPriv->numOutputs, so that entry is still in the array, and pointing to freed memory. It would seem the logical thing to do is to either compact the array after deletion, or to set the entry to NULL to mark it as unused.

I've not yet looked at what is the right thing to do there, and may be busy with other things tomorrow, so it would be good if somebody else could take a look at it.

@sunweaver
Copy link
Member

[mike@minobo nxagent (3.6.x)]$ git diff
diff --git a/nx-X11/programs/Xserver/hw/nxagent/Screen.c b/nx-X11/programs/Xserver/hw/nxagent/Screen.c
index e645129..9aa7dca 100644
--- a/nx-X11/programs/Xserver/hw/nxagent/Screen.c
+++ b/nx-X11/programs/Xserver/hw/nxagent/Screen.c
@@ -3863,16 +3863,6 @@ int nxagentAdjustRandRXinerama(ScreenPtr pScreen)
       }
     }
 
-    /* delete superfluous non-NX outputs */
-    for (i = pScrPriv->numOutputs - 1; i >= 0; i--) {
-      if (strncmp(pScrPriv->outputs[i]->name, "NX", 2)) {
-        #ifdef DEBUG
-        fprintf(stderr, "nxagentAdjustRandRXinerama: destroying output [%s]\n", pScrPriv->outputs[i]->name);
-        #endif
-        RROutputDestroy(pScrPriv->outputs[i]);
-      }
-    }
-
     /* at this stage only NX outputs are left - we delete the superfluous ones */
     for (i = pScrPriv->numOutputs - 1; i >= number; i--) {
       #ifdef DEBUG

To me it seems we should / could simply drop the above code fraction.

@uli42: @vatral: ?

@sunweaver
Copy link
Member

Oh hang on... wrong section removed, I just realized. However, that above passage does not make sense either, does it? Where should non-NX Outputs come from?

@sunweaver
Copy link
Member

more noise... the above passage was the correct code block to be removed...

@sunweaver sunweaver added the bug label Nov 30, 2016
@sunweaver sunweaver added this to the 3.6.0.0 milestone Nov 30, 2016
@sunweaver
Copy link
Member

@uli42: @vatral: ?

@vatral
Copy link
Contributor Author

vatral commented Dec 2, 2016

@sunweaver Hrmm. This is confusing. Let's see if I understand.

That chunk of code removes unneeded outputs. Note that there's another one right after it, so if you're going to remove something, it's probably both.

The code iterates from the end of the list, because RROutputDestroy actually compacts the pScrPriv->outputs array.

Lower down, the problem happens with "RRCrtcSet(pScrPriv->crtcs[i], ...", because RRCrtcSet ends up calling RROutputChanged on a freed RROutput. Also note this:

for (i = 0; i numOutputs; i++ ) {
    RROutputSetModes(pScrPriv->outputs[i], NULL, 0, 0);
    ...
    RRCrtcSet(pScrPriv->crtcs[i], NULL, 0, 0, RR_Rotate_0, 1, 

So the assumption there seems to be that pScrPriv->outputs and pScrPriv->crtcs have a 1 to 1 relationship, but that seems to be false, because when we delete from "outputs" in RROutputDestroy we are not touching crtcs.

Then there's that a RROutput has both a .crtc member and a .crtcs array. So far I see:

  • A Screen has RROutputs
  • An RROutput has a list of RRCrtcs
  • Destroying a RROutput destroys the list but not the RRCrtcs themselves, so something else owns those objects
  • An RRCrtc has a list of RROutputs inside it

Ok, now I'm confused about what is the hierarchy of this and who owns what. Are pScrPriv->outputs and pScrPriv->crtcs supposed to go in sync? Should deleting a RROutput reach into RRCrtcs and remove any references it holds to the deleted object?

@uli42
Copy link
Member

uli42 commented Dec 2, 2016

Well, concerning the NX outputs this 1:1 relationship is true, IIRC. If an output on the real xserver is not visible inside NX it gets switched off by removing the crtc. You can comment some of the cleanup routines and check what the servers knows using the xrandr binary. Maybe it gets clearer what's going on. I need to look deeper into this but I am lacking time currently, sorry.

@uli42
Copy link
Member

uli42 commented Dec 3, 2016

Maybe the spec helps for better understanding of the output<->crtc relationship: https://cgit.freedesktop.org/xorg/proto/randrproto/tree/randrproto.txt (graphical representation around line 80)

The 1:1 relationship is true in NX as long as noone is adding additional outputs using the RAND extension. Which is something that should be prevented anyway, I'd say.

@opoplawski
Copy link
Contributor

Just stumbled across this as well testing 3.5.99.3

sunweaver added a commit to sunweaver/nx-libs that referenced this issue Feb 3, 2017
sunweaver added a commit to sunweaver/nx-libs that referenced this issue Feb 3, 2017
…in RandR extension internally since backport of RandR 1.5 to nxagent's Xserver code.

 Fixes ArcticaProject#293.
@uli42
Copy link
Member

uli42 commented Feb 3, 2017

it looks like pScrPriv->outputs should be compacted to remove the deleted entries, or the entry should be marked as unused.

After looking through the code I think this remark is correct.

Update:
Some code doing this can be found in rroutput.c/RROutputDestroyResource(void *value, XID pid):

        for (i = 0; i < pScrPriv->numOutputs; i++) {
            if (pScrPriv->outputs[i] == output) {
                memmove(pScrPriv->outputs + i, pScrPriv->outputs + i + 1,
                        (pScrPriv->numOutputs - (i + 1)) * sizeof(RROutputPtr));
                --pScrPriv->numOutputs;
                break;
            }
        }

If everything goes right RROutputDestroyResource() should be called automatically since it is used as the deleteFunc for the Output resource type:

Bool
RROutputInit(void)
{
    RROutputType = CreateNewResourceType(RROutputDestroyResource
#ifndef NXAGENT_SERVER
                                         , "OUTPUT"
#endif
        );

So in theory the postulated compaction of the output array is already done.

@uli42
Copy link
Member

uli42 commented Feb 4, 2017

I have confirmed that RROutputDestroyResource() gets executed when FreeResource() is called for an output. So the output array gets compacted if necessary.

Update: we could have saved all this as RROutputDestroyResource happens to occur in the valgrind output above...

@uli42
Copy link
Member

uli42 commented Feb 4, 2017

Now, this is interesting:

==5904==    by 0x7F31A3: RROutputDestroyResource (rroutput.c:414)
==5904==    by 0x448E2F: FreeResource (NXresource.c:320)
==5904==    by 0x4A76D7: nxagentAdjustRandRXinerama (Screen.c:3872)

If I interpret valgrind's output correctly this tells me that nxagentAdjustRandRXinerama() calls FreeResource() which in turn calls RROutputDestroyResource(). So we can conclude that FreeResource() is used here to free an Output resource. However, the code contains only one FreeResource() call, and that one is freeing a mode, not an Output!

      if (prevmode && prevmode->refcnt == 1) {
        #ifdef DEBUG
        fprintf(stderr, "nxagentAdjustRandRXinerama: destroying prevmode [%s]\n", prevmode->name);
        #endif
	FreeResource(prevmode->mode.id, 0);
      }

So the question is: why do we end up in RROutputDestroyResource() when freeing a mode resource?

sunweaver added a commit to sunweaver/nx-libs that referenced this issue Feb 4, 2017
…NTERFACE. CRTC and OUTPUT setup happens in nxagent's Screen.c file.

 Fixes ArcticaProject#293.
@uli42
Copy link
Member

uli42 commented Feb 6, 2017

Update2: ignore this comment, it's nonsense. See further comments.

Well, while your commit seems to make the behaviour disappear I think I have taken a step towards the real cause: During writing Xinerama stuff I have added nxagentRandRCrtcSet() (in Extensions.c). This simply calls RRCrtcNotify(). I remember that this was the easiest approach to get the desired behaviour though I had been unsure about it. Not without a reason, it seems...

RRCrtcNotify has this comment:

 * Notify the extension that the Crtc has been reconfigured,
 * the driver calls this whenever it has updated the mode

We have not changed the mode here but the number of outputs (by destroying one). So it is probably not the right function to call.

I think nxagentRandRCrtcSet() should be reimplemented to do the right thing itself.

(Proof of concept: changing nxagentRandRCrtcSet to just return TRUE makes the issue disappear. Must check if xinerama is still working, though)

Update: http://xorg-devel.x.narkive.com/KpOwRcMX/patch-xvfb-add-randr-support also calls RRCrtcNotify here. Hmm...

@sunweaver
Copy link
Member

sunweaver commented Feb 8, 2017 via email

@uli42
Copy link
Member

uli42 commented Feb 8, 2017

I think "the right thing" is to do what RRCrtcNotify() does but without touch old stuff. The real problem is that the old output (that has been destroyed just before) is still referenced in the crtc struct and RRNotify() will try to set the "changed" flag on it.

It could also be that calling RROutputDestroy() it not sufficient and some more work is needed after it. I need to investigate but I am lacking time...

@uli42
Copy link
Member

uli42 commented Feb 8, 2017

The real reason is identified: The "delete superfluous non-NX outputs" loop adjusts the number of outputs to the number of crtcs; both should match the number of outputs of the real server. It does that by calling RROutputDestroy() for all the outputs in the outputs array whose index exceeds the required number. But it does not check if any of these outputs is still in use by a crtc. So a crtc can have reference to a destroyed output. And on RRCrtcNotify() that output reference is found and the output is marked as changed by setting the output's "changed" flag. As the output has already been destroyed and its memory has been freed valgrind complains (which is of course correct).

What does the proper fix look like? I am not sure. Maybe first try to identify unused outputs and first destroy them. But what to do if they are not enough? Can that happen at all?

Currently the outputs are deleted from the highest index to the lowest index in the outputs array. Simply reversing the order will probably help for exactly this situation where a seeing here but it feels wrong as a fix.

This memory access happens first for the "default" output.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants