Ogre OpenGL RenderSystem gives poor performance. #440

Open
juj opened this issue Mar 29, 2012 · 10 comments

@juj
Contributor

juj commented Mar 29, 2012

For a long time now, we've observed that the OpenGL render system in Ogre performs worse than the Direct3D one. I finally had a go at reviewing Ogre's OpenGL render system on a scene with about a thousand objects that moderately share materials, meshes and textures with each other (groups of similar objects, like trees). The scene did not have anything animated or dynamic (skinned meshes, particle systems, Hydrax or similar), only static content and SkyX.

The results look as follows: http://dl.dropbox.com/u/40949268/Ogre/OpenGLRenderSystem.png

Closer (bad) observations:

  • Utilizes the GL matrix stack (glMatrixMode/glPushMatrix/glLoadMatrix/glPopMatrix)
  • Utilizes fixed pipeline mixed with programmable pipeline (glEnable/Disable(GL_LIGHT0), glLightModelfv, etc.)
  • Calls gl getters, like glIsEnabled(GL_SCISSOR_TEST), glGetInteger
  • Clears color, glClear(GL_COLOR_BUFFER_BIT), and does so at the beginning of the frame.
  • Redundantly enables and disables GL_SCISSOR_TEST states (but with the full screen as the rectangle) between render calls, even when scissoring is not used.
  • Double-bookkeeps both color and depth buffers for rendering shadow map depth information.
  • Performs redundant glUseProgramObjectARB calls to identical values even between render calls. (glUseProgramObjectARB(0) called multiple times in a row)
  • Calls glPointSize/glPointParameter/glTexEnvi(GL_POINT_SPRITE), but immediately afterwards disables point sprites with a call to glDisable(GL_POINT_SPRITE). The whole application does not use point sprites anywhere in it. 100% of point sprite state changes were redundant (2124 calls)
  • Calls glGetInfoLogARB in the middle of an inner rendering loop, apparently to validate a shader immediately prior to applying it with glUseProgramObjectARB.
  • Redundantly re-sets glTexParameteri whenever a texture is bound, even when values haven't changed.
  • Calls the deprecated (FFP) glTexEnvi(GL_TEXTURE_ENV, xxx) functions, for each render op, to specify full state, even when values haven't changed.
  • Based on "legacy" OpenGL 2. In total, 47.82% of calls were to functions deprecated in OpenGL 3.0.
  • Therefore, does not use VAOs on supported hardware (introduced in OGL 3).
  • Almost 60% of all GL state changes are redundant.
  • 22.42% (20941) of all GL state changes were calls to glTexEnvi. 96.65% of these calls were redundant.
  • Redundantly calls glEnableVertexAttribArray for already enabled vertex arrays.
  • Redundantly calls glClientActiveTextureARB(GL_TEXTURE0) even when GL_TEXTURE0 is specified.
  • Redundantly calls glActiveTextureARB(GL_TEXTURE0); glActiveTextureARB(GL_TEXTURE2); back-to-back.
  • Redundantly uses deprecated glDisableClientState(GL_VERTEX_ARRAY/GL_otherarrays) after finishing a render op, but re-enables the same arrays for the next render op.
  • Redundantly calls glDisableVertexAttribArray(x) after finishing a render op, but re-enables the same array indices for the next render op.
  • Redundantly specifies full fogging parameters (glFogi/glFogf calls) for each render op, even when the parameters do not change. 100% of the fog state change calls are redundant (2091 changes/frame).
  • Uses 32-bit indices for index buffers which only need 16-bit indices.
  • Redundantly specifies GL_POLYGON_OFFSET state for each render op, even when not used.
  • Repeatedly binds the same vertex buffer with a call to glBindBufferARB(GL_ARRAY_BUFFER) before rendering.
  • Specifies glMaterialfv parameters even when programmable shaders are in use and glDisable(GL_COLOR_MATERIAL) is specified. 91.16% of these calls (2527 in total) are redundant.
  • Only 0.77% of all GL calls are actual draw calls.
  • During a single frame of rendering that contained 716 draw calls, a total of 93398 OpenGL calls were made, i.e. ~130.4 GL calls per draw call.
  • Function calls which never change during rendering, but are re-specified for each render op (100% redundant): glFogf, glPolygonMode, glPointParameterf, glColor4f, glClearDepth, glColorMask, glDepthFunc, glPointSize, glShadeModel, glBlendEquation, glPointParameterfv, glFogfv, glFogi, glCullFace, glDepthMask, glBlendFunc, glColorPointer, glLightfv, glLightf. These constitute about 25% of all state changes.
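
Most of the redundancy listed above comes from re-sending values the driver already has. A minimal sketch of the usual remedy, a shadow state cache that forwards a call only when the value changes, could look like the following. This is not Ogre code: `FakeDriver`, `StateCache` and `simulateRenderOps` are hypothetical names, and the driver is a counting stub so that the example builds without a GL context. In a real renderer the stub would be replaced by the actual calls, such as glEnable/glDisable and glUseProgramObjectARB.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for the GL driver: it only counts the calls that
// actually reach it, so the sketch builds and runs without a GL context.
struct FakeDriver {
    int calls = 0;
    void setCapability(std::uint32_t /*cap*/, bool /*on*/) { ++calls; }
    void useProgram(std::uint32_t /*program*/) { ++calls; }
};

// Shadow copy of the state last sent to the driver. Each setter forwards
// the call only when the requested value differs from the shadowed one.
class StateCache {
public:
    explicit StateCache(FakeDriver& driver) : driver_(driver) {}

    void setCapability(std::uint32_t cap, bool on) {
        auto it = caps_.find(cap);
        if (it != caps_.end() && it->second == on) { ++skipped_; return; }
        caps_[cap] = on;
        driver_.setCapability(cap, on);
    }

    void useProgram(std::uint32_t program) {
        if (programKnown_ && program_ == program) { ++skipped_; return; }
        programKnown_ = true;
        program_ = program;
        driver_.useProgram(program);
    }

    int skipped() const { return skipped_; }

private:
    FakeDriver& driver_;
    std::unordered_map<std::uint32_t, bool> caps_;
    std::uint32_t program_ = 0;
    bool programKnown_ = false;  // nothing is assumed about the initial state
    int skipped_ = 0;
};

struct FilterResult { int forwarded; int skipped; };

// Simulates `ops` render ops that each re-specify the same two states,
// mimicking the per-render-op pattern seen in the profile.
FilterResult simulateRenderOps(int ops) {
    assert(ops >= 0);
    FakeDriver driver;
    StateCache cache(driver);
    const std::uint32_t kScissorTest = 0x0C11;  // numeric value of GL_SCISSOR_TEST
    for (int i = 0; i < ops; ++i) {
        cache.setCapability(kScissorTest, false);
        cache.useProgram(0);
    }
    return {driver.calls, cache.skipped()};
}
```

With 716 render ops, as in the profiled frame, `simulateRenderOps(716)` forwards 2 calls and filters out the remaining 1430. The cache is only valid as long as every state change goes through it; any code that calls the driver directly would have to invalidate the shadow copy.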

or in summary:

  • based on legacy and deprecated functions, not on modern OpenGL 3.
  • "object-oriented" nature of Ogre leads to a lot of redundant work. Not a data-driven renderer.
  • renderer is naive, no state sharing or automatic batching. Even many of the state changes that gDebugger profiled as effective could be optimized away by a state sharing mechanism.

As a comparison, I have recently implemented a renderer that works on D3D11+OpenGL3+GLES2 on PC, Android, OSX and iOS. A snapshot of a similar scene with lots of objects sharing materials, meshes and textures: http://dl.dropbox.com/u/40949268/Asteroids/GLStateChanges.png . The renderer tracks shared usage of materials, meshes and textures, and sorts and batches to avoid redundant state sets, which gives a very clean object rendering inner loop (shown in the background).
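
The sort-and-batch idea can be illustrated with a small sketch. It is not the code of the renderer mentioned above; `DrawItem`, `countBinds`, `sortByState` and `makeInterleavedScene` are hypothetical names, and the integer ids merely stand in for shared materials, textures and meshes. The sketch assumes opaque geometry, where submission order can be chosen freely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical draw item: the ids stand for shared resources.
struct DrawItem {
    std::uint32_t material;
    std::uint32_t texture;
    std::uint32_t mesh;
};

// Counts the bind operations a submission order needs when a resource is
// re-bound only if it differs from the one used by the previous item.
int countBinds(const std::vector<DrawItem>& items) {
    int binds = 0;
    const DrawItem* prev = nullptr;
    for (const DrawItem& it : items) {
        if (!prev || prev->material != it.material) ++binds;
        if (!prev || prev->texture != it.texture) ++binds;
        if (!prev || prev->mesh != it.mesh) ++binds;
        prev = &it;
    }
    return binds;
}

// Orders items so that equal materials, then textures, then meshes
// end up adjacent in the submission order.
void sortByState(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return std::tie(a.material, a.texture, a.mesh) <
                         std::tie(b.material, b.texture, b.mesh);
              });
}

// Builds an interleaved scene: item i uses resource "kind" i % kinds,
// so neighbouring items never share any state before sorting.
std::vector<DrawItem> makeInterleavedScene(int count, int kinds) {
    assert(count >= 0 && kinds > 0);
    std::vector<DrawItem> items;
    items.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) {
        const std::uint32_t k = static_cast<std::uint32_t>(i % kinds);
        items.push_back({k, k, k});
    }
    return items;
}
```

For 12 items of 3 kinds, the interleaved order needs 36 binds, while the sorted order needs 9. Which resource goes first in the sort key is a design choice; the most expensive state to switch is usually placed first.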

Going forward, the takeaway is that the Ogre OpenGL render system cannot possibly be expected to match e.g. Unity in performance, without an extensive rewrite on top of OpenGL3, and re-doing the renderer architecture that collects and submits the renderable objects in Ogre.

This will be a major blocker when considering a port of Tundra on top of Android or iOS.

No surprise there, but marking down as an issue here, "lest we forget".

Possible future strategies:

  • rewrite/fix up the Ogre OpenGL rendersystem to a modern data-oriented GL3 renderer.
  • Investigate whether the Ogre GLES2 rendersystem would turn to an efficient GL3 renderer on desktop as well.
  • Move away from using Ogre as the renderer for Tundra.
@jonnenauha
Member

Man, seems like Ogre's OpenGL is pretty poor :P I guess this mostly affects rendering perf on Linux/Mac. Quite a deep insight into what's going on there. Going to another engine is a huge undertaking: it's not just replacing the rendering module, as quite many ECs tie to Ogre features directly or via the rendering module.

Anyway, these shots show what we experienced in the Tundra 2.3.0 Admino release, which still had OpenGL rendering enabled. We never had anything like this with the Ogre 1.7 series, so it was quite the shock when the first 3 test users experienced these rendering glitches and graphics card driver crashes (both happened to me too). Both ATI and NVIDIA gfx cards had the problems. I wonder what major change happened on the GL side in 1.8?

http://dl.dropbox.com/u/3589544/rmp-platform-mesh-artifact.png
http://dl.dropbox.com/u/3589544/rmp-platform-mesh-artifact2.png

@peterclemenko
Contributor

It should be noted that there is an existing WIP OpenGL 3 renderer for Ogre: https://bitbucket.org/masterfalcon/ogre-gl3plus

I'm not sure how far along it is, and it may need some improvements for performance, but it may be a starting position.

@jonnenauha
Member

Yeah, we have noticed that also. I think the plan is to check it out at some point. I'd assume it's solid code, as it's coming from masterfalcon himself.

@juj
Contributor Author

juj commented May 29, 2012

ogre-safe-nocrashes is now extended to provide Ogre internals profiling data to the Tundra profiler. One such trace looks like this: http://dl.dropbox.com/u/40949268/Ogre/OgreRenderingProfile.png .

@peterclemenko
Contributor

I feel obligated to post this, http://www.ogre3d.org/forums/viewtopic.php?f=4&t=70522
The roadmap just released for Ogre 1.9 and 2.0 involves a lot of things that will probably help performance.

@jonnenauha
Member

Thanks, it got posted to the realxtend-dev google mailing list too. We are discussing it there :)

@holocronweaver

Just stumbled upon this page via Google while doing research for my Google Summer of Code project. This summer I plan to mostly finish the OpenGL 3+ rendering system in OGRE, hopefully as a GSoC student if my proposal is accepted, though I plan to implement the most important changes this summer either way. In all likelihood this work will be done in Ogre 2.0, which also has a (probably successful) GSoC project proposal attached to it. I will try to push my work to the main Ogre dev branch as often as possible so you can test out the new system before it is finished.
The GL3+ proposal can be viewed here.

@jonnenauha
Member

Hey, fun to see you stumbled onto our tracker :) Many of us have been following the GL3+ plugin and the DirectX equivalent (the new one) quite closely. We are currently waiting for 1.9 final to come out so we can take it into use. There is already software built on Tundra, so we probably need to stick with the official release versions/tags for Ogre. Ofc we can locally develop and test the main branch where you will work on 2.0, but we are probably not going to ship anything with unfinished work.

@holocronweaver I reckon you have read a bit of the GL3+ source to prepare for your GSoC project. What is your feeling currently about its state? It seems it will be shipped in 1.9, but I've gotten the impression from the forums that it's not quite complete yet. Is it ready for real use, and in particular how well does it work on Windows? I'm personally curious what we should try out with the new rendering plugins once we get 1.9. How about the checklist of optimizations in the first post: have there been efforts to solve some of these during the GL3+ development? Was its original focus to make a better performing GL rendering plugin, or just to take Ogre to 3.x+?

@holocronweaver

Unfortunately GL3+ is not yet production ready by almost any measure. If you take a look at the Ogre Samples using the GL3+ renderer, you will see that many which depend on GL3+ specific features are not yet functional (see near the end of the GL3+ renderer thread for a list of working samples). My plan is to perform core work in 1.9, which will eventually be released as a patch, so official 1.9 should eventually have a production ready GL3+ renderer. However, priority will be given to Ogre 2.0. How much work is done in 1.9 mostly depends on what changes are made to the core rendering system in 2.0 and how feasible it is to port core work from 1.9 to 2.0. If 2.0 turns out to be radically different, I will work on 2.0 and leave GL3+ in 1.9 as is (partially functional).

As for Windows in particular, I cannot say as I have been working on GL3+ exclusively in Linux. However, I will be testing all three major platforms if my GSoC proposal is accepted. In terms of improvements over the old GL rendering system, I believe most performance increases in 1.9 are minor. Note that GL3+ requires the core profile, so you will be forced to use shaders instead of fixed functions, which could possibly result in some forced performance enhancements. =) The focus on 2.0 will be to improve the core rendering and scene graph performance, so I will likely contribute to that performance boost while reworking core components for GL3+.

I should mention that these are my thoughts on how to proceed, and Ogre devs may decide that focus needs to be on 1.9 no matter how difficult the eventual port to 2.0. I will presumably know more specifics next week when the accepted GSoC proposals are announced.

@jonnenauha
Member

Thanks for the answers. If Ogre 2.0 will break lots of things in our codebase anyhow, I guess we could wait for 2.0 before jumping to the new renderers and away from fixed functions. It's just unfortunate that it will probably take a lot of time, but that's how it goes. We will probably jump to 1.9 at some point with the old renderers, even if there are no huge perf gains there.

Good luck on the GSoC project (hopefully it will go through), we'll be monitoring the progress from the bushes :)
