New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drawing sprites makes program crash after fix for context management #608

Closed
abodelot opened this Issue May 22, 2014 · 20 comments

Comments

Projects
None yet
8 participants
@abodelot
Contributor

abodelot commented May 22, 2014

After checking out the last commit in the master branch: 1fe22e2, which altered the context management in Graphics/Shader.cpp, the graphics module appears to be broken and drawing sprites result in a segmentation fault.

When rolling back to the previous commit in the master branch: 749cbb2 (add HTTP support for PUT and DELETE), everything is fine again.

Problem can be reproduced with this program:

#include <SFML/Graphics.hpp>
#include <iostream>

int main()
{
    sf::RenderWindow app(sf::VideoMode(640, 480), "sfml");

    sf::Texture texture;
    texture.loadFromFile("images/tux.png");
    sf::Sprite sprite(texture);

    int i = 0;
    while (app.isOpen())
    {
        sf::Event event;
        while (app.pollEvent(event))
        {
            if (event.type == sf::Event::Closed)
                app.close();
        }
        app.clear();
        std::cout << "drawing " << ++i << std::endl;
        app.draw(sprite);
        app.display();
    }
}

System informations:

  • OS: GNU/Linux i686 (Debian Wheezy)
  • Graphic card: NVIDIA Corporation NV43 [GeForce 6600 GT]
  • Driver: nvidia-glx 304.117-1

So issue might be a Linux/Nvidia thing only.

Valgrind backtrace:

==23552== Memcheck, a memory error detector

drawing 1
drawing 2
==23552== Invalid read of size 4
==23552==    at 0x403ACB3: ??? (in /tmp/glEw06co (deleted))
==23552==    by 0x5AEA916: ??? (in /usr/lib/i386-linux-gnu/libnvidia-glcore.so.304.117)
==23552==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==23552== 
==23552== 
==23552== Process terminating with default action of signal 11 (SIGSEGV)
==23552==  Access not within mapped region at address 0x0
==23552==    at 0x403ACB3: ??? (in /tmp/glEw06co (deleted))
==23552==    by 0x5AEA916: ??? (in /usr/lib/i386-linux-gnu/libnvidia-glcore.so.304.117)
==23552==  If you believe this happened as a result of a stack
==23552==  overflow in your program's main thread (unlikely but
==23552==  possible), you can try to increase the size of the
==23552==  main thread stack using the --main-stacksize= flag.
==23552==  The main thread stack size used in this run was 8388608.
==23552== 
==23552== HEAP SUMMARY:
==23552==     in use at exit: 4,254,049 bytes in 2,217 blocks
==23552==   total heap usage: 7,713 allocs, 5,496 frees, 8,460,236 bytes allocated
==23552== 
==23552== LEAK SUMMARY:
==23552==    definitely lost: 0 bytes in 0 blocks
==23552==    indirectly lost: 0 bytes in 0 blocks
==23552==      possibly lost: 517,200 bytes in 84 blocks
==23552==    still reachable: 3,736,849 bytes in 2,133 blocks
==23552==         suppressed: 0 bytes in 0 blocks
==23552== Rerun with --leak-check=full to see details of leaked memory
==23552== 
==23552== For counts of detected and suppressed errors, rerun with: -v
==23552== Use --track-origins=yes to see where uninitialised values come from
==23552== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 85 from 12)
Erreur de segmentation

I have also tried drawing various shapes, which result in graphic glitches instead of segaults (wrong position, wrong dimensions, or nothing visible).

@LaurentGomila

This comment has been minimized.

Show comment
Hide comment
@LaurentGomila

LaurentGomila May 22, 2014

Member

It can't be this commit, since it changes only the implementation of Shader::isAvailable()which is not called at all in your example (not even internally by SFML).

Member

LaurentGomila commented May 22, 2014

It can't be this commit, since it changes only the implementation of Shader::isAvailable()which is not called at all in your example (not even internally by SFML).

@eXpl0it3r

This comment has been minimized.

Show comment
Hide comment
@eXpl0it3r

eXpl0it3r May 22, 2014

Member

Yeah, no idea how that commit would have any affect on your test application. Besides that a GDB backtrace would be more more useful, then at least it would be possible to see where the crash originate from.

Member

eXpl0it3r commented May 22, 2014

Yeah, no idea how that commit would have any affect on your test application. Besides that a GDB backtrace would be more more useful, then at least it would be possible to see where the crash originate from.

@dabbertorres

This comment has been minimized.

Show comment
Hide comment
@dabbertorres

dabbertorres May 22, 2014

Contributor

I've got an Nvidia/Linux system, I'll see if I can reproduce and get a gdb
back trace.
On May 22, 2014 1:30 PM, "Lukas Dürrenberger" notifications@github.com
wrote:

Yeah, no idea how that commit would have any affect on your test
application. Besides that a GDB backtrace would be more more useful, then
at least it would be possible to see where the crash originate from.


Reply to this email directly or view it on GitHubhttps://github.com//issues/608#issuecomment-43940120
.

Contributor

dabbertorres commented May 22, 2014

I've got an Nvidia/Linux system, I'll see if I can reproduce and get a gdb
back trace.
On May 22, 2014 1:30 PM, "Lukas Dürrenberger" notifications@github.com
wrote:

Yeah, no idea how that commit would have any affect on your test
application. Besides that a GDB backtrace would be more more useful, then
at least it would be possible to see where the crash originate from.


Reply to this email directly or view it on GitHubhttps://github.com//issues/608#issuecomment-43940120
.

@abodelot

This comment has been minimized.

Show comment
Hide comment
@abodelot

abodelot May 22, 2014

Contributor

Drawing a sprite like I did in my previous example does call the function Shader::isAvailable():

  1. Sprite::draw(RenderTarget& target, ...) calls RenderTarget::draw(const Vertex vertices, ...)*
  2. RenderTarget::draw(const Vertex* vertices, ...) calls RenderTarget::resetGLStates()
  3. RenderTarget::resetGLStates() calls Shader::isAvailable()

And since reverting to the last commit before fixes the issue, I'm pretty sure the commit modifying Shader::isAvailable() is the faulty one.

But the segfault does not happens in this function. You can see in the valgrind trace that:

  • the segfault is raised in the nvidia driver (libnvidia-glcore.so), not in sfml-graphics-2.so
  • the segfault occurs at the second drawing, in the second call to RenderTarget::draw(const Vertex vertices, ...)*

I didn't manage to provide a gdb backtrace (maybe because the crash occurs in the nvidia driver?) but I've investigated further and the last function call in SFML before the crash is glDrawArrays(mode, 0, vertexCount) in RenderTarget::draw. (remember, crash happens at the second drawing).

I've tested on another computer which also happens to be Linux/Nvidia system and I've encountered the same problem. It would be nice to have some feedback from other Linux users.

Contributor

abodelot commented May 22, 2014

Drawing a sprite like I did in my previous example does call the function Shader::isAvailable():

  1. Sprite::draw(RenderTarget& target, ...) calls RenderTarget::draw(const Vertex vertices, ...)*
  2. RenderTarget::draw(const Vertex* vertices, ...) calls RenderTarget::resetGLStates()
  3. RenderTarget::resetGLStates() calls Shader::isAvailable()

And since reverting to the last commit before fixes the issue, I'm pretty sure the commit modifying Shader::isAvailable() is the faulty one.

But the segfault does not happens in this function. You can see in the valgrind trace that:

  • the segfault is raised in the nvidia driver (libnvidia-glcore.so), not in sfml-graphics-2.so
  • the segfault occurs at the second drawing, in the second call to RenderTarget::draw(const Vertex vertices, ...)*

I didn't manage to provide a gdb backtrace (maybe because the crash occurs in the nvidia driver?) but I've investigated further and the last function call in SFML before the crash is glDrawArrays(mode, 0, vertexCount) in RenderTarget::draw. (remember, crash happens at the second drawing).

I've tested on another computer which also happens to be Linux/Nvidia system and I've encountered the same problem. It would be nice to have some feedback from other Linux users.

@ColinDuquesnoy

This comment has been minimized.

Show comment
Hide comment
@ColinDuquesnoy

ColinDuquesnoy May 22, 2014

I also get a segfault after drawing 2. Reverting to 749cbb2 does indeed fix the problem.

System informations:

  • OS: Arch Linux x86_x64
  • Graphic card: GeForce GTX 480
  • Driver version (proprietary): 337.19

ColinDuquesnoy commented May 22, 2014

I also get a segfault after drawing 2. Reverting to 749cbb2 does indeed fix the problem.

System informations:

  • OS: Arch Linux x86_x64
  • Graphic card: GeForce GTX 480
  • Driver version (proprietary): 337.19
@mantognini

This comment has been minimized.

Show comment
Hide comment
@mantognini

mantognini May 22, 2014

Member

Intel or Nvidia gc not affected on OS X. Bug in driver specific to Linux? :-/

Member

mantognini commented May 22, 2014

Intel or Nvidia gc not affected on OS X. Bug in driver specific to Linux? :-/

@dabbertorres

This comment has been minimized.

Show comment
Hide comment
@dabbertorres

dabbertorres May 22, 2014

Contributor

I do not get a segfault.

System:
OS: Arch Linux x86_64
Graphics: GeForce GTX 760
Driver: 337.19 (proprietary)

Contributor

dabbertorres commented May 22, 2014

I do not get a segfault.

System:
OS: Arch Linux x86_64
Graphics: GeForce GTX 760
Driver: 337.19 (proprietary)

@binary1248

This comment has been minimized.

Show comment
Hide comment
@binary1248

binary1248 May 23, 2014

Member

Some more undocumented behaviour from a driver... oh the fun.

The difference between the previous revision and this one is that in the first call to Shader::isAvailable() a temporary context is created and destroyed after probing for shader support. This means that just for the first call, the RenderTarget will not be issuing GL commands in the context that it activated. But this really should not be a problem because when the temporary context is destroyed, GlContext makes sure the thread local context is automatically activated, so there is still a valid context when the RenderTarget issues GL commands the rest of that frame.

Why does the crash only happen on the second frame? I really don't know. In the second frame, if Shader::isAvailable() would be called (it isn't because states were already reset) no temporary contexts are even created any more since SFML would use the cached result from the first frame, so for that whole frame, the RenderTarget's context stays activated. This behaviour is almost identical to the behaviour in the previous revision. The only real change I can see is that Shader::isAvailable() doesn't call ensureGlContext() every time it is called.

I just realized that the first time RenderTarget::draw() is called, it assumes that its context stays as the active one throughout the call, but when calling resetGlStates() which in turn calls Shader::isAvailable() a temporary context is created which sets the active context to the thread-local context when it is destroyed. RenderTarget::draw() will then do all its work with the thread-local context active. Because of the way it caches states between draw() calls, it always assumes that the context in which it does its stuff stays the same, but that wasn't the case in the first frame. Obviously in the second frame, draw() won't set the vertex pointers again because it thinks they are still valid from the first frame, but that was in another context. When calling glDrawArrays() it crashes because it probably tries to dereference a null pointer if that is what it's initial state is in a new context.

You can try applying this patch and see if it fixes the crash for you. All it does is move the Shader::isAvailable() call before activate() so that the RenderTarget's context overrides the thread-local one all the time.

Why does this not affect everyone? Well.... depending on how vendors implement their context management/sharing/whatever in their drivers, different things can happen. I assume that in the cases where the crash doesn't occur, the GL states themselves are shared between contexts to some extent, which would mean that in the second call the pointers would still be valid even though they were set in a different context. Which variant is more/less compliant with the GL specification? I have no idea 😉. SFML context management really needs to be cleaned up anyway.

Member

binary1248 commented May 23, 2014

Some more undocumented behaviour from a driver... oh the fun.

The difference between the previous revision and this one is that in the first call to Shader::isAvailable() a temporary context is created and destroyed after probing for shader support. This means that just for the first call, the RenderTarget will not be issuing GL commands in the context that it activated. But this really should not be a problem because when the temporary context is destroyed, GlContext makes sure the thread local context is automatically activated, so there is still a valid context when the RenderTarget issues GL commands the rest of that frame.

Why does the crash only happen on the second frame? I really don't know. In the second frame, if Shader::isAvailable() would be called (it isn't because states were already reset) no temporary contexts are even created any more since SFML would use the cached result from the first frame, so for that whole frame, the RenderTarget's context stays activated. This behaviour is almost identical to the behaviour in the previous revision. The only real change I can see is that Shader::isAvailable() doesn't call ensureGlContext() every time it is called.

I just realized that the first time RenderTarget::draw() is called, it assumes that its context stays as the active one throughout the call, but when calling resetGlStates() which in turn calls Shader::isAvailable() a temporary context is created which sets the active context to the thread-local context when it is destroyed. RenderTarget::draw() will then do all its work with the thread-local context active. Because of the way it caches states between draw() calls, it always assumes that the context in which it does its stuff stays the same, but that wasn't the case in the first frame. Obviously in the second frame, draw() won't set the vertex pointers again because it thinks they are still valid from the first frame, but that was in another context. When calling glDrawArrays() it crashes because it probably tries to dereference a null pointer if that is what it's initial state is in a new context.

You can try applying this patch and see if it fixes the crash for you. All it does is move the Shader::isAvailable() call before activate() so that the RenderTarget's context overrides the thread-local one all the time.

Why does this not affect everyone? Well.... depending on how vendors implement their context management/sharing/whatever in their drivers, different things can happen. I assume that in the cases where the crash doesn't occur, the GL states themselves are shared between contexts to some extent, which would mean that in the second call the pointers would still be valid even though they were set in a different context. Which variant is more/less compliant with the GL specification? I have no idea 😉. SFML context management really needs to be cleaned up anyway.

@abodelot

This comment has been minimized.

Show comment
Hide comment
@abodelot

abodelot May 23, 2014

Contributor

@binary1248: SFML code runs as expected with your patch, no more segfault!

Contributor

abodelot commented May 23, 2014

@binary1248: SFML code runs as expected with your patch, no more segfault!

@LaurentGomila

This comment has been minimized.

Show comment
Hide comment
@LaurentGomila

LaurentGomila May 23, 2014

Member

Aahhh, I knew it couldn't be that easy :p

Member

LaurentGomila commented May 23, 2014

Aahhh, I knew it couldn't be that easy :p

@eXpl0it3r

This comment has been minimized.

Show comment
Hide comment
@eXpl0it3r

eXpl0it3r May 23, 2014

Member

So, is someone going to create a branch for the fix, or do you want to find a different solution? Or should I create one?

Member

eXpl0it3r commented May 23, 2014

So, is someone going to create a branch for the fix, or do you want to find a different solution? Or should I create one?

@MarioLiebisch

This comment has been minimized.

Show comment
Hide comment
@MarioLiebisch

MarioLiebisch May 23, 2014

Member

Just apply/commit the fix since current master is proven to be broken, so things "can't get worse"? Even if it's just temporary.

Member

MarioLiebisch commented May 23, 2014

Just apply/commit the fix since current master is proven to be broken, so things "can't get worse"? Even if it's just temporary.

@LaurentGomila

This comment has been minimized.

Show comment
Hide comment
@LaurentGomila

LaurentGomila May 23, 2014

Member

We need at least to add a proper comment about why the test is done there.

Member

LaurentGomila commented May 23, 2014

We need at least to add a proper comment about why the test is done there.

@mantognini

This comment has been minimized.

Show comment
Hide comment
@mantognini

mantognini May 23, 2014

Member

For what it is worth, the patch has no bad effect on OS X.

Member

mantognini commented May 23, 2014

For what it is worth, the patch has no bad effect on OS X.

@eXpl0it3r

This comment has been minimized.

Show comment
Hide comment
@eXpl0it3r

eXpl0it3r May 23, 2014

Member

@LaurentGomila What would be a good comment then?

Member

eXpl0it3r commented May 23, 2014

@LaurentGomila What would be a good comment then?

@binary1248

This comment has been minimized.

Show comment
Hide comment
@binary1248

binary1248 May 23, 2014

Member

// Check here to make sure a context change does not happen after activate(true)

... or something ...

Member

binary1248 commented May 23, 2014

// Check here to make sure a context change does not happen after activate(true)

... or something ...

@eXpl0it3r

This comment has been minimized.

Show comment
Hide comment
@eXpl0it3r

eXpl0it3r May 23, 2014

Member

I give you 10 min to complain, otherwise it will be merged! 😄

Member

eXpl0it3r commented May 23, 2014

I give you 10 min to complain, otherwise it will be merged! 😄

@eXpl0it3r eXpl0it3r added resolved and removed new labels May 23, 2014

@eXpl0it3r eXpl0it3r added this to the 2.2 milestone May 23, 2014

@eXpl0it3r eXpl0it3r self-assigned this May 23, 2014

@eXpl0it3r

This comment has been minimized.

Show comment
Hide comment
@eXpl0it3r

eXpl0it3r May 23, 2014

Member

Fixed in a3ab6ef

Member

eXpl0it3r commented May 23, 2014

Fixed in a3ab6ef

@eXpl0it3r eXpl0it3r closed this May 23, 2014

@mantognini

This comment has been minimized.

Show comment
Hide comment
@mantognini

mantognini May 23, 2014

Member

Well, apparently OS X was also affected. The example above worked fine but more complex app had rendering issues (no crash)... but the patch fixed that. 👍

Member

mantognini commented May 23, 2014

Well, apparently OS X was also affected. The example above worked fine but more complex app had rendering issues (no crash)... but the patch fixed that. 👍

@binary1248

This comment has been minimized.

Show comment
Hide comment
@binary1248

binary1248 May 23, 2014

Member

The things OS X does to make sure it doesn't get the same reputation as Windows 😀.

Member

binary1248 commented May 23, 2014

The things OS X does to make sure it doesn't get the same reputation as Windows 😀.

jcowgill added a commit to jcowgill/SFML that referenced this issue Sep 22, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment