OgreTextAreaOverlayElement CPU hog #1156
Comments
How about using the following in GL3PlusHardwareBuffer::lockImpl?
// Discard the buffer
access |= (offset == 0 && length == mSizeInBytes) ? GL_MAP_INVALIDATE_BUFFER_BIT : GL_MAP_INVALIDATE_RANGE_BIT; |
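For readers without the OGRE source at hand, the flag selection suggested above can be isolated as a minimal sketch. The helper name discardAccessFlags and the k-prefixed constants are hypothetical; the constant values mirror the standard OpenGL enums so the snippet compiles without GL headers. This is not OGRE's actual lockImpl, only the decision it would make for a write-only, discarding lock.

```cpp
#include <cstddef>
#include <cstdint>

// Values mirror the OpenGL enums so the sketch builds without GL headers.
constexpr std::uint32_t kMapWriteBit            = 0x0002; // GL_MAP_WRITE_BIT
constexpr std::uint32_t kMapInvalidateRangeBit  = 0x0004; // GL_MAP_INVALIDATE_RANGE_BIT
constexpr std::uint32_t kMapInvalidateBufferBit = 0x0008; // GL_MAP_INVALIDATE_BUFFER_BIT

// Hypothetical helper: picks the access flags for a write-only, discarding lock.
// Locking the whole buffer invalidates the buffer; a partial lock invalidates
// only the mapped range.
std::uint32_t discardAccessFlags(std::size_t offset, std::size_t length,
                                 std::size_t sizeInBytes)
{
    std::uint32_t access = kMapWriteBit;
    access |= (offset == 0 && length == sizeInBytes) ? kMapInvalidateBufferBit
                                                     : kMapInvalidateRangeBit;
    return access;
}
```

The resulting value would be passed as the access argument of glMapBufferRange. Whether the driver then avoids a synchronization is up to the driver.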
Nope. With that it's the same :S Will look around in the GL docs tomorrow and see if it is possible to set up MapBuffer to not do a readback. |
thanks for looking into it. Maybe an OGRE_CHECK_GL_ERROR(glBufferData(mTarget, mSizeInBytes, NULL, getGLUsage(mUsage))); at the above location will do. Edit: also, which GPU are you using? The existing code does not seem to stall on my NVIDIA card. |
GTX 1070 |
This doc seems to be a good one on this topic, but my guess is that the driver is misbehaving and picks the wrong strategy for buffer updates. https://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGLInsights-AsynchronousBufferTransfers.pdf |
Well, I tried with the latest NVIDIA drivers and with the older 398. It's the same shit: sometimes, on the first run only, you get the fast path. After that (2nd launch onwards) it slows down at the buffer lock (mapBufferRange). The VBO itself is very small and updated very often; the driver might use the wrong heuristics, but I cannot do anything about that. The "fix" I posted earlier works as expected (e.g. updating a small VBO takes no time at all). Seems like glBufferData is the only reliable way to update VBOs :/ |
Finally found it: modern drivers are multithreaded, and that does not play nice with mapping. If I disable driver multithreading in the NV control panel, the performance is good with MapBufferRange too. |
thanks for investigating this. can you create a pull request with your shadow-buffer fix? |
Can do it tomorrow. Should we patch all Overlay classes, or only Text? In my case only a text is updated frequently, but this behaviour can be present in other Overlay classes. |
the other elements have their buffers marked |
OK, I will patch the Text first, but I have my doubts about the other Overlay Elements too. As far as I remember, no matter what buffer usage is set, glMapBufferRange will stall with a multi-threaded driver. And glMapBufferRange is used whenever we update the VBO with lock/unlock, and OverlayPanel uses locks too. |
note that we balance stalls against memory duplication here. For frequently updating data, memory duplication is preferable, while for static data a stall is acceptable. |
Position can be frequently updated data. I would use OgrePanelOverlayElement to implement these effects, and set its position / size etc. According to the docs this is clearly a sane use-case. Also, you should not have to care what OGRE does under the hood; it should be fast in all use-cases. Right now this is not true. Also, memory duplication should not be a concern here imo, since overlay elements use a very minimal amount of memory to describe vertex data. I'm pretty fixed on avoiding glMapBufferRange, since the hit from the driver sync is like 7% CPU usage vs 30% CPU usage (vsynced). If you choose to take this hit, that's fine by me, but in that case I will maintain a private fork of the OGRE Overlay Component, because in my use case this hit is unacceptable and can be avoided at no extra cost. |
In that case, please go ahead and change the position buffers to use shadow buffers (and declare them with I just wanted to make sure that there is a valid use-case and we are not using shadow buffers just for the sake of it 😉 |
That's a valid concern. Unfortunately I could not find a more elegant solution for the problem, which is why I opted for the shadow buffer. Looks like the old usage pattern for MapBufferRange will always sync in multi-threaded drivers. The new OGL 4.4 flags which allow persistent mapping could be interesting though. Created a pull request with the changes. |
fixed by #1162 |
System Information
Detailed description
Frequently updated TextAreaOverlayElements (like a simple fps counter) generate an unacceptable amount of CPU usage in the buffer locking methods (updatePositionGeometry, updateColours).
This is probably caused by CPU-GPU synchronization, as TextAreaOverlayElement uses glMapBuffer with the GL3Plus rendersystem.
In this case (and on a single thread probably never) OGRE should not use glMapBuffer; glBufferData is much faster with buffer orphaning. I experimented with the code a bit and could not find a fast path when glMapBuffer was used.
My "final" solution to avoid glMapBuffer was to enable a shadow buffer at buffer creation. This put me on the glBufferData path (writeData()).
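To illustrate why a shadow buffer sidesteps the mapping path, here is a minimal, self-contained sketch. FakeGpuBuffer and ShadowedBuffer are hypothetical stand-ins, not OGRE classes, and no OpenGL is called: lock() hands out a pointer into a CPU-side copy, and unlock() pushes that copy to the "GPU" with a single glBufferData-style whole-buffer write, so the driver is never asked to map anything.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Stand-in for a GPU-side buffer; counts uploads instead of calling OpenGL.
struct FakeGpuBuffer {
    std::vector<unsigned char> storage;
    int uploads = 0; // number of glBufferData-style whole-buffer writes

    explicit FakeGpuBuffer(std::size_t size) : storage(size) {}

    void bufferData(const void* src, std::size_t size) {
        assert(size == storage.size());
        std::memcpy(storage.data(), src, size);
        ++uploads;
    }
};

// Hypothetical buffer with a CPU-side shadow copy.
class ShadowedBuffer {
public:
    explicit ShadowedBuffer(std::size_t size) : mGpu(size), mShadow(size) {}

    // Returns CPU memory only; no GPU synchronization can happen here.
    void* lock() {
        mLocked = true;
        return mShadow.data();
    }

    // Uploads the whole shadow copy in one write.
    void unlock() {
        assert(mLocked);
        mGpu.bufferData(mShadow.data(), mShadow.size());
        mLocked = false;
    }

    const FakeGpuBuffer& gpu() const { return mGpu; }

private:
    FakeGpuBuffer mGpu;
    std::vector<unsigned char> mShadow;
    bool mLocked = false;
};
```

The cost is one extra copy of the data in system memory, which is small for overlay geometry.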
So to "fix" the CPU hog change the buffer creation code to this (changes in bold) :
An alternative solution without a shadow buffer would be to use a std::vector to assemble the geometry and color data at update, and use the writeData() method instead of lock()/unlock().
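A rough sketch of that alternative, under stated assumptions: MockVertexBuffer is a hypothetical stand-in whose writeData() follows the shape of Ogre::HardwareBuffer::writeData(offset, length, pSource, discardWholeBuffer), and buildGlyphQuads is an invented helper that emits two triangles of 2D positions per glyph. The real element also writes texture coordinates and colours, which are left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a hardware vertex buffer; records writes only.
struct MockVertexBuffer {
    std::vector<unsigned char> bytes;
    int writes = 0;
    bool lastDiscard = false;

    explicit MockVertexBuffer(std::size_t size) : bytes(size) {}

    void writeData(std::size_t offset, std::size_t length, const void* src,
                   bool discardWholeBuffer = false) {
        assert(offset + length <= bytes.size());
        std::memcpy(bytes.data() + offset, src, length);
        ++writes;
        lastDiscard = discardWholeBuffer;
    }
};

// Assembles 6 vertices (x, y) per glyph into a CPU-side vector.
std::vector<float> buildGlyphQuads(std::size_t glyphCount, float w, float h) {
    std::vector<float> verts;
    verts.reserve(glyphCount * 12);
    for (std::size_t i = 0; i < glyphCount; ++i) {
        const float left = static_cast<float>(i) * w;
        const float right = left + w;
        const float top = 0.0f;
        const float bottom = -h;
        const float quad[12] = {left,  top, left, bottom, right, top,
                                right, top, left, bottom, right, bottom};
        verts.insert(verts.end(), quad, quad + 12);
    }
    return verts;
}

// Uploads everything in one call, discarding the old buffer contents.
void uploadGeometry(MockVertexBuffer& buffer, const std::vector<float>& verts) {
    if (verts.empty()) return;
    buffer.writeData(0, verts.size() * sizeof(float), verts.data(), true);
}
```

Compared with the shadow buffer, the vector can be reused between updates and the buffer itself needs no extra creation flag, at the price of changing the update code.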