Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add multithreading option to gl port #294

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

juliusikkala
Copy link
Contributor

This PR adds a multithreaded mode to the GL port. It can be enabled and disabled through the new pdc_threading_mode variable. It is currently enabled by default in this PR branch.

If the multithreaded mode is enabled, OpenGL rendering is delegated to a separate thread. This thread continuously renders the latest state submitted from PDC_doupdate(). In this mode, vertical sync is also enabled, meaning that rendering should be done no faster than the display's refresh rate, regardless of how often PDC_doupdate() gets called. The rendering thread also knows when no new frames have been submitted, and pauses rendering until there is a new frame.

The reason why this definitely needs to be done in a separate thread instead of just skipping rendering sometimes in PDC_doupdate(), is that the latter approach could skip the last frame in a long "animation" sequence before a pause, leaving the wrong image on screen until the next PDC_doupdate() happens sometime much later. The thread doesn't have this issue, as it can redraw at any time after PDC_doupdate() has been called and make sure that the latest available frame is on screen at all times.

In practice, the multithreaded mode completely decouples OpenGL rendering performance from the curses refresh rate: the GPU doesn't have to render every frame submitted from curses and curses API functions aren't affected by rendering performance.

In single-threaded mode, speed gives me around 5700 frames/sec with PDC_FONT_SIZE=17, and 2650 frames/sec with PDC_FONT_SIZE=64. In the new multi-threaded mode, I get around 15500 frames/sec, and as expected, this is independent of font size! The program is completely CPU-bound in multithreaded mode; the main thread of speed is now running at 100% core utilization.

Do you think the threading mode should be an environment variable? Either way, I know that I'd like to keep it optional, since multithreading can make debugging harder, and there may be use cases where PDC_doupdate() should be guaranteed to always render (even if the frame would be skipped by the screen, it could be captured by some tool like RenderDoc).

@GitMensch
Copy link
Collaborator

While it possibly has less effect - shouldn't this applied identical to other ports, at least SDL2?

@juliusikkala
Copy link
Contributor Author

juliusikkala commented Jun 3, 2023

It could actually be even more useful in other ports, as it sidesteps basically all issues with slow rendering. The rendering/windowing APIs they use will likely mandate different approaches to multithreading, and I'm not familiar with their details. I know that OpenGL only allows a GL context being used from one thread at a time, which is why there's the SDL_GL_MakeCurrent(pdc_window, NULL); call to detach the context from the main thread and re-attach it to the rendering thread.

This is actually also why this PR had to rework some of the glyph caching stuff as well; we can't load the glyph textures during draw_glyph() because it's called from the main thread, which must not call any GL functions while the context is owned by the render thread. So all of the texture loading had to migrate over to the render thread as well.

SDL2 has a blanket recommendation to not call the built-in video functions from anywhere else than the main thread (SDL_Renderer, SDL_Surface software rendering etc). This shouldn't affect the OpenGL port since it's not really using those functions, but OpenGL directly instead. But it does mean that trying to apply this same rendering thread approach to the SDL2 port is a bit iffy.

@Bill-Gray
Copy link
Owner

Seems like a good idea to me. Actually, I'd been thinking along these lines, but thought : do we really want to deal with threads in Windows, OS/X, Linux, BSD, and OS/2 (yes, there are people using the SDL2 platform in OS/2)? I'd forgotten that SDL2 includes thread-handling routines.

On non-SDL2 platforms... well, in truth, if you're trying to display, say, 1000 frames/second, the Right Thing To Do is to render 60 frames/second. On X11 or VT, say, you could render 1000 frames/second, making your fans spin wildly, and your brain and eyes would only be capable of seeing 6% of them anyway. For that matter, on this platform, you can tell PDCursesMod to render 1000 fps, and 94% of them will get thrown out. Generally speaking, if your program is generating more than 60 fps, you should stop and ask yourself why.

@juliusikkala
Copy link
Contributor Author

juliusikkala commented Jun 5, 2023

Yeah, it would be preferrable from the library's point of view that the user program wouldn't spam frames that are never going to be shown anyway.

The (simplified) use case I have for this is a program that has an adjustable rate of cumulative state updates, meaning it draws on top of what was previously drawn, one step at a time. At its slowest at like 200 ms per step, it's easy to implement in such a way that the program sets wtimeout() according to the update rate, then repeatedly adds what it wants and calls wgetch() (any key press finishes the "animation"). The fastest used update delays can be as low as 4 ms, meaning 250 fps.

I could use some scheme to start skipping wgetch() calls and using a multiple of the refresh delay in wtimeout() whenever the delay is low, but that's extra complexity on the "user" side which can be avoided by making the library more efficient. Vertical synchronization is also nice-to-have, as it's a bit difficult to get working without unpredictable delays if a separate thread isn't used.

My personal goal for the GL port is to have performance be a non-issue on every supported platform in every situation. Still, I do acknowledge that most programs don't care and performance pitfalls are usually easy to avoid on the user side, so it's probably not worth spending non-trivial effort for other ports. This threaded approach doesn't directly translate to power savings either, unless the program is actually trying exceed 60 fps (or whatever the user's display refresh rate is).

@GitMensch
Copy link
Collaborator

So... should there be:

  • internal threading for rendering
  • this be an optional / default
  • if an option: compile-time and/or runtime

Further more, should there be

  • an internal limit to ensure there is no rendering code executed > N (24/30/60/90/120/240fps)
  • should that be configurable (on/off) at compile/runtime
  • should there be a new function to increate the limit at runtime (you know, if the default is 30 but you want to create a first-person-shooter then increasing to 120 FPS is possibly reasonable good, when you use PDCurses to test the ability of insects you may want to increase it to 240 PFS)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants