Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OgreTextAreaOverlayElement CPU hog #1156

Closed
1k0 opened this issue Apr 5, 2019 · 16 comments
Closed

OgreTextAreaOverlayElement CPU hog #1156

1k0 opened this issue Apr 5, 2019 · 16 comments

Comments

@1k0
Copy link
Contributor

1k0 commented Apr 5, 2019

System Information

  • Ogre Version: 1.11.5
  • Operating System / Platform: Win10
  • RenderSystem: GL3Plus

Detailed description

Frequently updated TextAreaOverlayElement-s (like a simple fps counter) will generate unacceptable amount of CPU usage in the buffer locking methods (updatePositionGeometry, updateColours).
This happens probably because of CPU-GPU synchronization, as TextAreaOverlayElement will use glMapBuffer, when using GL3Plus rendersystem.

In this case (and on single thread probably never) should OGRE use glMapBuffer, glBufferData is much faster with buffer orphaning. Experimented with the code a bit, and could not find a fast path when glMapBuffer was used.

My "final" solution to avoid using glMapBuffer was using a shadow buffer, at buffer creation. This put me on the glBufferData path (writeData()).

So to "fix" the CPU hog change the buffer creation code to this (changes in bold) :

    HardwareVertexBufferSharedPtr vbuf = 
        HardwareBufferManager::getSingleton().
            createVertexBuffer(
                decl->getVertexSize(POS_TEX_BINDING), 
                allocatedVertexCount,
                **HardwareBuffer::HBU_DYNAMIC_WRITE_ONLY_DISCARDABLE, true**);
    bind->setBinding(POS_TEX_BINDING, vbuf);

    // colours
    vbuf = HardwareBufferManager::getSingleton().
            createVertexBuffer(
                decl->getVertexSize(COLOUR_BINDING), 
                allocatedVertexCount,
                **HardwareBuffer::HBU_DYNAMIC_WRITE_ONLY_DISCARDABLE, true**);
    bind->setBinding(COLOUR_BINDING, vbuf);

An alternative solution without shadow buffer, would be to use a std::vector to assembe geomerty and color data at update, and use the writeData() method instead of lock()/unlock().

@paroj
Copy link
Member

paroj commented Apr 5, 2019

how about using

                // Discard the buffer
                access |= (offset == 0 && length == mSizeInBytes) ? GL_MAP_INVALIDATE_BUFFER_BIT : GL_MAP_INVALIDATE_RANGE_BIT;

in GL3PlusHardwareBuffer::lockImpl?

@1k0
Copy link
Contributor Author

1k0 commented Apr 5, 2019

Nope. With that its the same :S Will look around in GL docs tomorrow and see if it is possibe to setup MapBuffer to not do readback.

@paroj
Copy link
Member

paroj commented Apr 5, 2019

thanks for looking into it. Maybe a

OGRE_CHECK_GL_ERROR(glBufferData(mTarget, mSizeInBytes, NULL, getGLUsage(mUsage)));

at the above location will do.

Edit also which GPU are you using? The existing code seems not to stall on my NVidia card.

@1k0
Copy link
Contributor Author

1k0 commented Apr 5, 2019

GTX 1070
Tried many many combinations. Some didnt stall for the first application run, but stalled when lauched the app again, its very strange.
Tried the orphaning above (the same you suggested in prev comment).
GL_STATIC_DRAW for buffer usage
GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT for mapbuffer access
combination was good. But again for just one time. I will update my drivers tomorrow and look into if nvidia caches any app profile.

@1k0
Copy link
Contributor Author

1k0 commented Apr 5, 2019

This doc seems to be a good one about this topic, but my guess is the driver is misbehaving, and estimates the wrong strategy for buffer update.

https://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGLInsights-AsynchronousBufferTransfers.pdf

@1k0
Copy link
Contributor Author

1k0 commented Apr 5, 2019

Well tried with latest nvdia drivers, tried with older 398. Its the same shit, sometimes on the first run only you will get the fast path. After that (2nd launch++) it will slow down at buffer lock (mapBufferRange). The VBO itself is very small and upgraded very often, the driver might use wrong heuristics but I can not do anything about that.

The "fix" I posted earlier works as expected. (EG updating a small VBO takes no time at all) Seems like glBufferData is the only reliable way to update VBO-s :/

@1k0
Copy link
Contributor Author

1k0 commented Apr 6, 2019

Finally found it, modern drivers are multithreaded, that will not play nice with mapping. If I disable driver multithreading in NV control panel the performance is good with MapBufferRange too.
The only problem is that the default setting for this behaviour is "automatic". As I have seen before NV driver will choose the multithreaded case. (except for first run maybe, that would explain the strange behaviour at first run)

@paroj
Copy link
Member

paroj commented Apr 6, 2019

thanks for investigating this. can you create a pull request with you shadow-buffer fix?

@1k0
Copy link
Contributor Author

1k0 commented Apr 6, 2019

Can do it tomorrow, should we patch all Overlay classes, or only Text? In my case only a text is updated frequently, but this behavoiur can be present in other Overlay classes.

@paroj
Copy link
Member

paroj commented Apr 6, 2019

the other elements have their buffers marked HBU_STATIC, so I would only go with Text for now.
However please add a comment that shadow buffers are used to avoid GPU driver sync.

@1k0
Copy link
Contributor Author

1k0 commented Apr 6, 2019

OK, I will patch the Text first, but I have my doubts about the other Overlay Elements too. As far as I remember, no matter what buffer usage is set glMapBufferRange will stall with multi-threaded driver. And glMapBufferRange will be used, when we update the VBO with lock/unlock and OverlayPanel uses locks too.
Probably will test the other GUI elements for same behaviour, and report back if I found anything nasty.

@paroj
Copy link
Member

paroj commented Apr 6, 2019

note that we balance stalls against memory duplication here. For frequently updating data, memory duplication is preferable, while for static data a stall is acceptable.

@1k0
Copy link
Contributor Author

1k0 commented Apr 7, 2019

Position can be a frequently updated data.
EG :
-an fps crosshair which changes its size on hover on enemy
-a name tag "over" a 3D scene entity (project back to 2D, and set position)
-a damage counter, which scakes at critical, or flies upwards and wanishes (common is MMO-s)

I would use OgrePanelOverlayElement to implement these effects, an set its position / size etc. According to the docs this is clearly a sane use-case. Also you should not care what OGRE does under the hood, it should be fast at all use-cases. Now this is not true.

Also memory duplication should not be a concern here imo, since overlayelements use very minimal amound of memory to describe vertex data.

I'm pretty fixed on avoiding glMapBufferRange, since the hit from driver sync is like 7% CPU usage vs 30% CPU usage (vsynced). If you choose to thake this hit thats fine by me, but in that case I will maintain a private fork of OGRE Overlay Component, because in my use case this hit is unacceptable and can be avoided with no extra cost.

@paroj
Copy link
Member

paroj commented Apr 7, 2019

In that case, please go ahead and change the position buffers to use shadow buffers (and declare them with HBU_DYNAMIC_* for clarity).

I just wanted to make sure that there is a valid use-case and we are not using shadow buffers just for the sake of it 😉

@1k0
Copy link
Contributor Author

1k0 commented Apr 9, 2019

Thats a vaild concern, I could not find a more elegant solution for the problem unfortunately thats why I opted for shadow buffer. Looks like the old usage pattern for MapBufferRange will always sync in multi threaded drivers. New OGL 4.4 flags which allow persistent mapping could be interesting though.

Created pull req with changes.

@paroj
Copy link
Member

paroj commented Apr 10, 2019

fixed by #1162

@paroj paroj closed this as completed Apr 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants