New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TransientContextLock program hang #1165

Closed
OrderNexus opened this Issue Oct 28, 2016 · 12 comments

Comments

Projects
5 participants
@OrderNexus

OrderNexus commented Oct 28, 2016

First of all, I'm sorry for not being able to provide a minimal code example. I've tried to reduce my code down to the essentials, but I think I just can't reproduce the timing of the issue right. Hopefully the rest of my information will be sufficient.

Machine specifics: Intel Core i5, 4GB RAM, AMD Mobility Radeon HD 5600 graphics card.
Platform specifics: Windows 7 (64-bit), Visual Studio 2015 SP1, SFML 2.4.x & 2.4.x_alt.

This seems to somehow be related to #989, although it's no longer a stack overflow. After upgrading to the latest revisions of both 2.4.x and 2.4.x_alt branches (latest I've tried was Oct-26-2016), the stack overflow was fixed, but the program hangs in the exact same spot. The setup is as follows:

  • The main thread performs the window clearing, drawing, and other usual tasks.
  • Another thread is then used to load certain resources. This thread often invokes .loadFromFile on textures.
  • Once this thread is created and it hits the point where .loadFromFile is called, the window freezes (console still works fine). When the program is paused, the stack is as follows:
test.exe!sf::priv::MutexImpl::lock() Line 52    C++
test.exe!sf::Mutex::lock() Line 57  C++
test.exe!sf::Lock::Lock(sf::Mutex & mutex) Line 39  C++
test.exe!sf::GlResource::TransientContextLock::TransientContextLock() Line 79   C++
test.exe!sf::Texture::bind(const sf::Texture * texture, sf::Texture::CoordinateType coordinateType) Line 619    C++
test.exe!sf::RenderTarget::applyTexture(const sf::Texture * texture) Line 492   C++
>test.exe!sf::RenderTarget::clear(const sf::Color & color) Line 106 C++
test.exe!Window::BeginDraw() Line 33    C++
test.exe!Game::Render() Line 57 C++
test.exe!main(int argc, void * * * argv) Line 10    C++
  • The > highlighted line is as simple .clear call of sf::RenderWindow:

void Window::BeginDraw() { m_window.clear(sf::Color::Black); }

  • The stack of the other thread that invokes .loadFromFile on a texture is as follows:
test.exe!sf::priv::MutexImpl::lock() Line 52    C++
test.exe!sf::Mutex::lock() Line 57  C++
test.exe!sf::priv::GlContext::acquireTransientContext() Line 251    C++
test.exe!sf::GlResource::TransientContextLock::TransientContextLock() Line 88   C++
test.exe!sf::Texture::getValidSize(unsigned int size) Line 722  C++
test.exe!sf::Texture::create(unsigned int width, unsigned int height) Line 120  C++
test.exe!sf::Texture::loadFromImage(const sf::Image & image, const sf::Rect<int> & area) Line 244   C++
test.exe!sf::Texture::loadFromFile(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & filename, const sf::Rect<int> & area) Line 212  C++
>test.exe!TextureManager::Load(sf::Texture * l_resource, const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & l_path) Line 11   C++
  • The texture manager method that "crashes" simply invokes the .loadFromFile like so:
bool Load(sf::Texture* l_resource, const std::string& l_path) {
    return l_resource->loadFromFile(Utils::GetWorkingDirectory() + l_path);
}

As I mentioned before, for some reason the problem didn't persist when reduced to basic code. Sorry if I didn't explain it thoroughly enough, and let me know if there's anything else I can provide.

@binary1248

This comment has been minimized.

Show comment
Hide comment
@binary1248

binary1248 Oct 29, 2016

Member

I can't reproduce this problem. Looking at both the stack traces you provided, this doesn't look like a deadlock either. The first thread is waiting to lock the mutex that was already acquired by the second thread. The second thread is waiting to lock the mutex that is only accessible within GlContext.cpp. If there is a deadlock, then it must have to do with at least a third thread that you are running. We will need this information or someone else will have to try to reproduce this issue independently. At the moment, there is no way to know what needs fixing because to me there is no apparent problem with the code.

Member

binary1248 commented Oct 29, 2016

I can't reproduce this problem. Looking at both the stack traces you provided, this doesn't look like a deadlock either. The first thread is waiting to lock the mutex that was already acquired by the second thread. The second thread is waiting to lock the mutex that is only accessible within GlContext.cpp. If there is a deadlock, then it must have to do with at least a third thread that you are running. We will need this information or someone else will have to try to reproduce this issue independently. At the moment, there is no way to know what needs fixing because to me there is no apparent problem with the code.

@OrderNexus

This comment has been minimized.

Show comment
Hide comment
@OrderNexus

OrderNexus Oct 30, 2016

Hello, binary1248. Thanks for your prompt reply. I have managed to track down the source of this issue. It turns out that I was loading a shader from a file before a render window was being created. Normally this doesn't change anything, but if a thread gets launched after the window creation, and if it attempts to load a texture (haven't tested it with other resource types) from a file, any thread afterwards that attempts to get a lock on the context will end up sleeping forever, waiting for it to be unlocked. Here's the minimal code snippet to reproduce the issue:

void Test() {
    sf::Texture t;
    t.loadFromFile("SomeTexture.png");
    std::cout << "Texture loaded!" << std::endl;
}

int main() {
    sf::Shader shader;
    shader.loadFromFile("default.vert", "default.frag");
    sf::RenderWindow window(sf::VideoMode(640,480), "Window");
    sf::Thread t(&Test);
    t.launch(); // <-- Program runs fine if this line is commented out.
    bool running = true;
    while (running) {
        sf::Event event;
        while (window.pollEvent(event)) {
            if (event.type == sf::Event::Closed) { running = false; }
        }
        window.clear();
        window.display();
    }
    window.close();
    return 0;
}

It was rather stupid of me to load a shader before the window is created. It did throw me for a loop though, considering the fact that everything ran just fine until a thread attempted to load a texture.

EDIT: The documentation also doesn't say anywhere that a window needs to be created.

OrderNexus commented Oct 30, 2016

Hello, binary1248. Thanks for your prompt reply. I have managed to track down the source of this issue. It turns out that I was loading a shader from a file before a render window was being created. Normally this doesn't change anything, but if a thread gets launched after the window creation, and if it attempts to load a texture (haven't tested it with other resource types) from a file, any thread afterwards that attempts to get a lock on the context will end up sleeping forever, waiting for it to be unlocked. Here's the minimal code snippet to reproduce the issue:

void Test() {
    sf::Texture t;
    t.loadFromFile("SomeTexture.png");
    std::cout << "Texture loaded!" << std::endl;
}

int main() {
    sf::Shader shader;
    shader.loadFromFile("default.vert", "default.frag");
    sf::RenderWindow window(sf::VideoMode(640,480), "Window");
    sf::Thread t(&Test);
    t.launch(); // <-- Program runs fine if this line is commented out.
    bool running = true;
    while (running) {
        sf::Event event;
        while (window.pollEvent(event)) {
            if (event.type == sf::Event::Closed) { running = false; }
        }
        window.clear();
        window.display();
    }
    window.close();
    return 0;
}

It was rather stupid of me to load a shader before the window is created. It did throw me for a loop though, considering the fact that everything ran just fine until a thread attempted to load a texture.

EDIT: The documentation also doesn't say anywhere that a window needs to be created.

@binary1248 binary1248 added s:accepted and removed s:undecided labels Nov 2, 2016

@binary1248 binary1248 added this to the 2.5 milestone Nov 2, 2016

@binary1248 binary1248 self-assigned this Nov 2, 2016

@binary1248

This comment has been minimized.

Show comment
Hide comment
@binary1248

binary1248 Nov 2, 2016

Member

Yes... now I see that this is an edge case that happens only when certain methods are called the first time and cached OpenGL values are initialized. Looking at the way the locking is done now, it has become a bit too messy for my taste and harder to maintain. I'll have to clean up the synchronization and this problem should also be solved.

Member

binary1248 commented Nov 2, 2016

Yes... now I see that this is an edge case that happens only when certain methods are called the first time and cached OpenGL values are initialized. Looking at the way the locking is done now, it has become a bit too messy for my taste and harder to maintain. I'll have to clean up the synchronization and this problem should also be solved.

@dwlemmer

This comment has been minimized.

Show comment
Hide comment
@dwlemmer

dwlemmer Nov 9, 2016

I am running into a deadlock even though I am not creating a shader before creating the window. The TransientContextLock in sf::Shader::compile() leaves the mutex in GLContext.cpp locked!

dwlemmer commented Nov 9, 2016

I am running into a deadlock even though I am not creating a shader before creating the window. The TransientContextLock in sf::Shader::compile() leaves the mutex in GLContext.cpp locked!

@OrderNexus

This comment has been minimized.

Show comment
Hide comment
@OrderNexus

OrderNexus Nov 9, 2016

You should try and narrow the specifics of your issue as well by attempting to create a minimal code example to help out the devs. Also, it would be extremely helpful if you gave some of your environment specifics, such as the version of SFML used, your OS, etc.

OrderNexus commented Nov 9, 2016

You should try and narrow the specifics of your issue as well by attempting to create a minimal code example to help out the devs. Also, it would be extremely helpful if you gave some of your environment specifics, such as the version of SFML used, your OS, etc.

@binary1248

This comment has been minimized.

Show comment
Hide comment
@binary1248

binary1248 Nov 9, 2016

Member

For this issue, it has less to do with the concrete OpenGL implementation or OS and more about the order in which the (platform-independant) code is executed. I'm sure people on Linux and OS X will be able to reproduce this issue in some form or the other. As I said, I myself am unhappy with the current implementation due to the aforementioned reasons. You can be sure that I will work on resolving the issue as soon as I have the time to do so. At this point, enough information has been provided for me to work with. It's just a matter of time.

Member

binary1248 commented Nov 9, 2016

For this issue, it has less to do with the concrete OpenGL implementation or OS and more about the order in which the (platform-independant) code is executed. I'm sure people on Linux and OS X will be able to reproduce this issue in some form or the other. As I said, I myself am unhappy with the current implementation due to the aforementioned reasons. You can be sure that I will work on resolving the issue as soon as I have the time to do so. At this point, enough information has been provided for me to work with. It's just a matter of time.

@dwlemmer

This comment has been minimized.

Show comment
Hide comment
@dwlemmer

dwlemmer Nov 10, 2016

I'm sorry to say "me too" without more details. I don't really have time to create the minimal repro. I'm on Win10 64 bit. Going back to 2.4.0 fixes the problem for me.

dwlemmer commented Nov 10, 2016

I'm sorry to say "me too" without more details. I don't really have time to create the minimal repro. I'm on Win10 64 bit. Going back to 2.4.0 fixes the problem for me.

@binary1248 binary1248 modified the milestones: 2.4.2, 2.5 Nov 12, 2016

binary1248 added a commit that referenced this issue Nov 27, 2016

Replaced TransientContextLock implementation with a more elaborate on…
…e which relies on locking a single mutex and thus avoids lock order inversion. Fixes #1165.
@binary1248

This comment has been minimized.

Show comment
Hide comment
@binary1248

binary1248 Nov 27, 2016

Member

Mind to give #1172 a try?

Member

binary1248 commented Nov 27, 2016

Mind to give #1172 a try?

binary1248 added a commit that referenced this issue Nov 27, 2016

Replaced TransientContextLock implementation with a more elaborate on…
…e which relies on locking a single mutex and thus avoids lock order inversion. Fixes #1165.

binary1248 added a commit that referenced this issue Nov 27, 2016

Replaced TransientContextLock implementation with a more elaborate on…
…e which relies on locking a single mutex and thus avoids lock order inversion. Fixes #1165.
@myroidtc

This comment has been minimized.

Show comment
Hide comment
@myroidtc

myroidtc Nov 28, 2016

I was experiencing the same problem and can confirm #1172 fixes the problem. Good work, binary!

EDIT: Also, a side-note: I just noticed that the internal context change has stopped screen flickering in fullscreen when two threads are drawing to things. It drove me crazy and I'm happy it's fixed.

myroidtc commented Nov 28, 2016

I was experiencing the same problem and can confirm #1172 fixes the problem. Good work, binary!

EDIT: Also, a side-note: I just noticed that the internal context change has stopped screen flickering in fullscreen when two threads are drawing to things. It drove me crazy and I'm happy it's fixed.

@eXpl0it3r

This comment has been minimized.

Show comment
Hide comment
@eXpl0it3r

eXpl0it3r Dec 2, 2016

Member

@OrderNexus and @dwlemmer could you give #1172 a try as well?

Member

eXpl0it3r commented Dec 2, 2016

@OrderNexus and @dwlemmer could you give #1172 a try as well?

@OrderNexus

This comment has been minimized.

Show comment
Hide comment
@OrderNexus

OrderNexus Dec 5, 2016

I can also confirm that #1172 fixed this issue! Good work, guys! 👍

OrderNexus commented Dec 5, 2016

I can also confirm that #1172 fixed this issue! Good work, guys! 👍

@OrderNexus OrderNexus closed this Dec 5, 2016

@eXpl0it3r eXpl0it3r added s:superseded and removed s:accepted labels Dec 5, 2016

@eXpl0it3r

This comment has been minimized.

Show comment
Hide comment
@eXpl0it3r

eXpl0it3r Dec 6, 2016

Member

Superseded by #1172

Member

eXpl0it3r commented Dec 6, 2016

Superseded by #1172

binary1248 added a commit that referenced this issue Jan 25, 2017

Replaced TransientContextLock implementation with a more elaborate on…
…e which relies on locking a single mutex and thus avoids lock order inversion. Fixes #1165.

binary1248 added a commit that referenced this issue Jan 27, 2017

Replaced TransientContextLock implementation with a more elaborate on…
…e which relies on locking a single mutex and thus avoids lock order inversion. Fixes #1165.

@binary1248 binary1248 moved this from WIP to Merged in SFML 2.5.0 Feb 3, 2017

iamPHEN added a commit to Bablawn3d5/SFML that referenced this issue Mar 11, 2017

Replaced TransientContextLock implementation with a more elaborate on…
…e which relies on locking a single mutex and thus avoids lock order inversion. Fixes SFML#1165.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment