New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce Render Pipeline Bureaucracy #326

Closed
ilexp opened this Issue May 8, 2016 · 7 comments

Comments

1 participant
@ilexp
Member

ilexp commented May 8, 2016

Summary

Investigate where the render pipeline could be improved with regard to reducing virtual method calls, class (vs. struct) usage and overall OOP chit-chat.

Analysis

  • This will require a reliable rendering benchmark with a lot of objects. Edit: Moved the details to issue #549.
  • Each ICmpRenderer has its own IsVisible call. 95% of them just check for the visibility groups and / or the screen overlay flag being set, followed by checking if they're in bound-radius range of the viewport.
    • Would it be an improvement to instead just expose a group filter flags property and do all the group and radius-culling checks in a tight loop inside the render pipeline?
    • Every interface API usage is a virtual call. Reduce them to an absolute minimum.
      • ICmpRenderer.GetCullingData out-returns a struct with all information required to decide whether it will be visible to a device or not. Can be invoked to directly write into some internal RawList array store, which can then be batch-processed easily.
      • Bonus: GetCullingData only needs to be called once per frame per object and its results can be re-used by every camera and rendering pass.
      • ICmpRenderer.Draw does the rendering like it does right now.
  • There are a lot of interface calls involved. See if it is possible to get rid of some of them, or transform per-object calls to one "group" call for N objects.
    • In general, prefer one call with N objects over N calls for each object individually.
  • CalcZOffset takes a lot of time for large vertex arrays and could be avoided entirely.
    • Tilemaps and similar renderers can sometimes submit thousands of vertices. Calculating the depth offset of 4096 vertices takes about 0.1 ms, which isn't critical, but very unfortunate, given its redundant nature.
    • Most of the time, the renderer has a perfectly clear idea of the representative Z offset of a submitted drawbatch. Why not provide the API to specify it?
    • When not specifying it, CalcZOffset could be used as a fallback.
  • Don't sacrifice usability on the game side.
    • ICmpRenderer should still act "per-object" and expose one draw method per object.
    • However, it would be possible to introduce an optional ICmpMultiRenderer interface where one draw method is called with an array of all queried objects of the matching type is provided.
      • If there is a significant performance improvement, this could be a neat middle ground between having a full "grouped Component" and having one Component per object.
      • With the multi-renderer interface in mind, it might make sense to also refactor the IRendererVisibilityStrategy interface, so the query method actually provides guaranteed type-sorted renderer arrays / RawLists in some form.
        • Keep the overhead low. Potentially even keep the "exposed RawList" approach, but enforce specifying "type sections" inside that list.
      • If this shows a notable improvement (yet to be shown), we could also go a similar route for updates and provide a matching ICmpMultiUpdatable interface.
  • etc. / investigate

@ilexp ilexp added this to the v3.0 milestone May 8, 2016

@ilexp

This comment has been minimized.

Member

ilexp commented May 9, 2016

Issue:

  • The entire engine assumes that everything that is renderable implements ICmpRenderer. Introducing a new ICmpMultiRenderer interface will cause a lot of work if it is a completely distinct interface.

Potential Solution:

  • ICmpMultiRenderer could derive from ICmpRenderer.
  • They would essentially share the exact same interface (No special needs by the multi version), the only difference would be that the multi-version would be invoked only once for all Components of the same type.
  • Multi-renderer implementations would keep track of instances of their type themselves.
    • Can be easily done by populating a static or shared list in Activate / Deactivate.
    • This also circumvents the requirement to care for per-Component type sorting in the visibility strategy / drawcall collection code. Which kind of makes sense, since the core has to do some work in order to find all the matching Components, while the Component implementation itself has much easier access to that information.
    • Then again, with no way to provide a list of renderers as parameter for the Draw method, there is quite a bit of potential for bugs and out of sync data.. this might be an issue.
  • Will still have to look out for problems arising from the fact that every ICmpMultiRenderer would also be an ICmpRenderer and thus could be used in the wrong way. Is there some way around this?

Potential Solution:

  • Investigate how exactly ICmpRenderer is used everywhere and determine if a common base interface can be extracted from the single- and multi renderer versions.
  • This would be much cleaner.

Potential Solution:

  • Not have an ICmpMultiRenderer for now and leave it to the users to have a shared renderer component.
  • Actually not the worst solution either. Maybe prefer this for now.

@ilexp ilexp added the Rendering label Jul 3, 2016

ilexp referenced this issue in BraveSirAndrew/duality Jul 9, 2016

Andrew O'Connor
Removed the optimization in DrawBatch.AppendJIT
Rolling average z sort index needs more thought. Going back to
calculating z sort based on all available vertices.
@ilexp

This comment has been minimized.

Member

ilexp commented Sep 18, 2017

Progress

  • Investigated the possibility of streamlining the culling step using a generalized CullingInfo struct.
    • One problem that occurred is the special ScreenOverlay flag that is either set or not, but does not have a "don't care" state, which is required for some renderers.
    • Currently, they can just omit the check. This will no longer be an option. Need to address this before being able to start working on this.
  • Benchmarks Sample results:

Immediate ToDo

  • To prepare for the VisibilityFlag change, update all sample resources to get rid of warning noise from previous changes.
  • Change VisibilityFlag implementation to get rid of a special ScreenOverlay flag.
    • Instead, define DefaultWorld and DefaultOverlay (replacing Group0 and Group1) and define cameras and rendering setups in a way to use them in their passes.
    • Update all sample resources.
  • Rework ICmpRenderer API into a GetCullingInfo(ref info) / Draw(device) combo with no other additions.
    • Define a CullingInfo struct with position, radius, visibility groups.
    • Rework culling code to gather all culling data only once per RenderPointOfView, evaluated per-pass. Can probably even share evaluated results for position / radius visibility.
  • Investigate whether specifying a reference depth values instead of manually doing CalcZOffset pays off in the benchmarks multi-sprite renderer test.
    • If it does, introduce optional DrawDevice API to specify the depth value and use it in the Tilemaps renderer and benchmark.
@ilexp

This comment has been minimized.

Member

ilexp commented Sep 18, 2017

Progress

  • Updated all sample resources to cut down on the noise when doing the visibility flag update later on.

Immediate ToDo

  • Change VisibilityFlag implementation to get rid of a special ScreenOverlay flag.
    • Instead, define DefaultWorld and DefaultOverlay (replacing Group0 and Group1) and define cameras and rendering setups in a way to use them in their passes.
    • Update all sample resources.
  • Rework ICmpRenderer API into a GetCullingInfo(ref info) / Draw(device) combo with no other additions.
    • Define a CullingInfo struct with position, radius, visibility groups.
    • Rework culling code to gather all culling data only once per RenderPointOfView, evaluated per-pass. Can probably even share evaluated results for position / radius visibility.
  • Investigate whether specifying a reference depth values instead of manually doing CalcZOffset pays off in the benchmarks multi-sprite renderer test.
    • If it does, introduce optional DrawDevice API to specify the depth value and use it in the Tilemaps renderer and benchmark.
@ilexp

This comment has been minimized.

Member

ilexp commented Sep 19, 2017

Progress

  • As it turns out, the DefaultWorld / DefaultOverlay visibility flag solution brings along its own problems: With specific predefined groups for world and overlay and all other groups being general-purpose, there is no way to identify all world renderers vs. all overlay renderers.
    • The current ScreenOverlay flag solves this nicely by being able to be combined with any existing group.
    • This is crucial for editor support: The camera view needs to render and perform picking operations on all objects, not just the ones in a specific visibility group. However, it can only do so when knowing which are overlay and which are world renderers.
    • It seems preferable to keep the ScreenOverlay flag as-is for now and disallow renderers that are both overlay and world renderers. The existing ones can be split up into multiple objects.

Immediate ToDo

  • Rework ICmpRenderer API into a GetCullingInfo(ref info) / Draw(device) combo with no other additions.
    • Define a CullingInfo struct with position, radius, visibility groups.
    • Rework culling code to gather all culling data only once per RenderPointOfView, evaluated per-pass. Can probably even share evaluated results for position / radius visibility.
  • Investigate whether specifying a reference depth values instead of manually doing CalcZOffset pays off in the benchmarks multi-sprite renderer test.
    • If it does, introduce optional DrawDevice API to specify the depth value and use it in the Tilemaps renderer and benchmark.
@ilexp

This comment has been minimized.

Member

ilexp commented Sep 19, 2017

Progress

  • Introduced new ICmpRenderer API based on GetCullingInfo(out info).
  • Updated DefaultRendererVisibilityStrategy to only gather culling data once per frame and re-use it when querying device visibility.
  • Postponed re-usage of spatial culling results, as it is possible to use different projection matrices in each rendering step despite sharing the same viewport.
  • Profiled and optimized culling related Rect and SpriteRenderer code.
  • Performed a new benchmark sample run.
  • Changes in performance result from SpriteRenderer optimization, not the restructured culling code. However, the new structure provides better opportunities for future optimization and also feels a lot cleaner. Less "object oriented" and more "data oriented".

Immediate ToDo

  • Investigate a bug that was reported by DeathKiller:

    with the latest 3.0 commit, there is one black frame every time right after I change scene and it doesn't look good... I don't know if it's something wrong with my version or it is a bug, but I applied only "Sep 19" commits (it was ok before)

    • Can't reproduce it using the physics sample to switch scenes at runtime, neither in editor mode, nor in launcher mode.
    • Awaiting feedback / info on how to reproduce.
  • Investigate whether specifying a reference depth values instead of manually doing CalcZOffset pays off in the benchmarks multi-sprite renderer test.
    • If it does, introduce optional DrawDevice API to specify the depth value and use it in the Tilemaps renderer and benchmark.
@ilexp

This comment has been minimized.

Member

ilexp commented Sep 20, 2017

Progress

  • Fixed the bug that was reported by DeathKiller.
    • A scenes visibility strategy was updated after a scene update and after loading.
    • When a scene switch was scheduled and occurred after the update, the switched-to scene would be already ready for rendering, because culling was updated on load.
    • However, when using runtime-constructed scenes, there was no on load, leading to the first rendered frame having missed its culling update.
    • Fixed the issue by moving the on load update into on enter.

Immediate ToDo

  • Investigate whether specifying a reference depth values instead of manually doing CalcZOffset pays off in the benchmarks multi-sprite renderer test.
    • If it does, introduce optional DrawDevice API to specify the depth value and use it in the Tilemaps renderer and benchmark.

@ilexp ilexp self-assigned this Sep 21, 2017

@ilexp

This comment has been minimized.

Member

ilexp commented Sep 22, 2017

Progress

  • Investigated the performance impact of CalcZSortIndex.
    • Even with 10x the number of vertices, the impact in the multi-sprite renderer benchmark was negligible compared to writing / generating the vertex data itself.
    • Introducing another overload for specifying the depth value did however increase API complexity slightly.
    • Decided to not introduce this optimization for now.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment