Moonrider performance investigation #3

Open
diarmidmackenzie opened this issue Dec 17, 2022 · 6 comments

diarmidmackenzie commented Dec 17, 2022

I've been looking at some generic Three.js / A-Frame performance improvements related to updateMatrixWorld().

See: mrdoob/three.js#25142 for background.

@dmarcos suggested I look at the impact these changes would have on Moonrider, as the most popular WebXR experience. I've raised this issue to document that investigation.

Initial Results

I did some quick profiling of Moonrider on Windows/Chrome (an easier environment than a VR headset) and found it was spending > 70% of CPU inside updateMatrixWorld(), suggesting it would be a great candidate for optimization.

image
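As a rough cross-check on profiler numbers like these, a recursive method can be wrapped so that only its outermost calls are timed. This is a minimal sketch, not how the figures above were collected: `instrumentMethod` and `FakeNode` are hypothetical names, and the assumption is that in a real app the target would be `THREE.Object3D.prototype` with the method name `'updateMatrixWorld'`.

```javascript
// Wrap a (possibly recursive) method so that only outermost calls are timed.
// `target` is any object or prototype that owns `methodName`.
function instrumentMethod(target, methodName) {
  const original = target[methodName];
  const stats = { calls: 0, outerCalls: 0, totalMs: 0 };
  let depth = 0; // > 0 while we are inside a nested (recursive) call

  target[methodName] = function (...args) {
    stats.calls++;
    const outermost = depth === 0;
    const start = outermost ? performance.now() : 0;
    depth++;
    try {
      return original.apply(this, args);
    } finally {
      depth--;
      if (outermost) {
        stats.outerCalls++;
        stats.totalMs += performance.now() - start;
      }
    }
  };

  // Put the original method back when measurement is finished.
  stats.restore = () => { target[methodName] = original; };
  return stats;
}

// Stand-in for a scene graph node, used only to demonstrate the wrapper.
class FakeNode {
  constructor(children = []) { this.children = children; }
  updateMatrixWorld() {
    for (const child of this.children) child.updateMatrixWorld();
  }
}

const stats = instrumentMethod(FakeNode.prototype, 'updateMatrixWorld');
const root = new FakeNode([new FakeNode(), new FakeNode([new FakeNode()])]);
root.updateMatrixWorld();
console.log(stats.calls, stats.outerCalls, stats.totalMs);
```

Dividing `totalMs` by the wall-clock time over the same period gives an approximate share of time spent in the method. The wrapper itself adds overhead, so the browser profiler remains the more reliable source.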

A quick console check revealed 2902 objects in the scene graph:

count = 0; document.querySelector('a-scene').object3D.traverse((o) => count++ ); count
2902

Moonrider runs against A-Frame 1.1.0, so I back-ported the changes in mrdoob/three.js#25142 to Three.js r123.1 and used this to build a modified version of A-Frame 1.1.0.

Three.js: https://github.com/diarmidmackenzie/three.js/tree/super-r123-1-performance
A-Frame: https://github.com/diarmidmackenzie/aframe/tree/1.1.0-performance

I then modified Moonrider to run using this version of Three.js (via the modified A-Frame build).

It seems to be running fine with this updated version. However, the performance gains are not as remarkable as hoped.

  • updateMatrixWorld() now uses about 55% of CPU (down from > 70%).
  • Bear in mind that as the CPU usage of updateMatrixWorld() goes down, the frame rate should go up.
  • So this could be read as a ~50% perf gain: the "other stuff" has gone from using 30% of CPU to using 45% of CPU.
  • One would hope that would be accompanied by a ~50% increase in frame rate, but I don't have hard evidence for that yet (I've focussed on CPU data rather than frame rate so far).
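The arithmetic behind the "~50%" reading can be sketched as follows. It rests on two assumptions that are not verified here: the app is CPU-bound, and the "other stuff" costs the same amount of time per frame before and after the change. `impliedGain` is a hypothetical helper name.

```javascript
// Treat the per-frame cost of "other stuff" as 1 unit of time.
// If it is a share `s` of the frame, the whole frame costs 1 / s units.
function impliedGain(otherShareBefore, otherShareAfter) {
  const totalBefore = 1 / otherShareBefore;
  const totalAfter = 1 / otherShareAfter;
  const umwBefore = totalBefore - 1; // time left over for updateMatrixWorld()
  const umwAfter = totalAfter - 1;
  return {
    // How much faster frames complete (and so how much FPS could rise).
    frameRateFactor: totalBefore / totalAfter,
    // Fraction by which absolute updateMatrixWorld() time has dropped.
    umwTimeReduction: 1 - umwAfter / umwBefore,
  };
}

const gain = impliedGain(0.30, 0.45);
console.log(gain.frameRateFactor.toFixed(2));  // "1.50"
console.log(gain.umwTimeReduction.toFixed(2)); // "0.48"
```

So a drop from 70% to 55% of CPU corresponds, under these assumptions, to roughly halving the absolute time spent in updateMatrixWorld(), which is a larger change than the raw percentages suggest.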

Nevertheless, the reduction from 70% to 55% was less than seen with other apps. I'll explain why that is...

Why only small gains?

Of the 2902 objects in the scene graph, a console query shows that we are skipping matrix calculations for just 446 of them (about 15%).

objs = []; document.querySelector('a-scene').object3D.traverse((o) => {if (o.privateMatrixData !== o.matrixData) { objs.push(o) }}); objs.length
446

There are a few reasons for this:

  • A-Frame changes will only affect object3Ds created by geometry, gltf-model & obj-model. Moonrider has various bits of custom code that could also be adjusted to skip matrix calculations for more objects.
  • There seem to be a lot of GLTF objects with ~10 parts each (e.g. for broken blocks). These won't result in much optimization: we will retain a separate matrix for each of the 10 parts, and only skip one matrix at the root of the object model.

However, there does seem to be substantial scope to optimize performance further...

How to optimize further

As things stand, we have a lot of objects in the scene graph with distinct matrix data. Since we can't ignore non-identity matrices without compromising the experience, to optimize further, we'd need to significantly reduce the number of objects in the scene graph.

It looks as though most of the objects in the scene graph are inactive pool objects. My understanding is these pools exist to minimize performance issues when spawning new objects - instead of creating new objects from scratch, they get switched in from a pool.

These pools of objects are the reason why there are 2902 objects in the scene graph, even when there isn't that much going on on screen. They are invisible, so they don't have any impact on rendering performance, but they are still a part of the scene graph, which means they each have to be processed every frame in updateMatrixWorld() even though they aren't visible.
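One way to check how much of the scene graph is hidden is to count objects that are invisible either directly or through a hidden ancestor. This is a minimal sketch: `countHidden` is a hypothetical helper, and it relies only on the `visible` and `children` properties that THREE.Object3D provides, so plain objects are used as stand-ins in the demo.

```javascript
// Count all nodes under `root`, and how many of them are effectively hidden
// (their own `visible` flag is false, or an ancestor's is).
function countHidden(root) {
  let total = 0;
  let hidden = 0;
  function visit(node, ancestorHidden) {
    const isHidden = ancestorHidden || node.visible === false;
    total++;
    if (isHidden) hidden++;
    for (const child of node.children) visit(child, isHidden);
  }
  visit(root, false);
  return { total, hidden };
}

// Demo data: a visible root with one visible child and one hidden
// subtree containing two parts.
const demoScene = {
  visible: true,
  children: [
    { visible: true, children: [] },
    { visible: false, children: [
      { visible: true, children: [] },
      { visible: true, children: [] },
    ] },
  ],
};

const counts = countHidden(demoScene);
console.log(counts); // { total: 5, hidden: 3 }
```

In the browser console, the same function could be pointed at `document.querySelector('a-scene').object3D`.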

It could be tempting to suggest that updateMatrixWorld() should skip over non-visible objects, but there are good reasons to ensure that non-visible objects are correctly positioned, as non-visible objects can be used as colliders, raycasted against etc.

However, these objects, when resting in pools, don't need to have their matrices updated every frame. The simplest way to avoid these updates would be to remove these objects from the scene graph while they are sitting unused in a pool.

I believe this would be a simple change that would substantially improve the performance of Moonrider. The performance overhead of adding an object3D back into the scene graph when activating it from a pool should be minimal: just a couple of matrix multiplications.
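The idea can be sketched as a pool that detaches objects from their parent while they are idle. This is an illustration under stated assumptions, not Moonrider's or A-Frame's actual pool code: `DetachingPool` is a hypothetical class, and `Node3D` is a minimal stand-in for the `add()` / `remove()` behaviour of THREE.Object3D so the example runs on its own.

```javascript
// Minimal stand-in for the parts of THREE.Object3D used here.
class Node3D {
  constructor() { this.parent = null; this.children = []; }
  add(child) {
    if (child.parent) child.parent.remove(child);
    child.parent = this;
    this.children.push(child);
    return this;
  }
  remove(child) {
    const i = this.children.indexOf(child);
    if (i !== -1) { this.children.splice(i, 1); child.parent = null; }
    return this;
  }
}

// Pool whose idle objects live outside the scene graph, so a per-frame
// traversal of `parent` never visits them.
class DetachingPool {
  constructor(parent, createObject, size) {
    this.parent = parent;
    this.free = [];
    for (let i = 0; i < size; i++) this.free.push(createObject());
  }
  // Take an object out of the pool and attach it to the scene graph.
  acquire() {
    const obj = this.free.pop();
    if (!obj) return null; // pool exhausted
    this.parent.add(obj);
    return obj;
  }
  // Detach the object again and return it to the pool.
  release(obj) {
    this.parent.remove(obj);
    this.free.push(obj);
  }
}

const scene = new Node3D();
const pool = new DetachingPool(scene, () => new Node3D(), 3);
const block = pool.acquire();
console.log(scene.children.length); // 1
pool.release(block);
console.log(scene.children.length); // 0
```

Objects are created up front, so the pool still avoids allocation at spawn time; only the attach and detach happen when an object is activated or retired.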

However I have held back from making any changes to moonrider code itself, due to some GitHub issues.

Issues with moonrider repo

I'm hitting an error when cloning the moonrider repo:

Downloading assets/img/envmap.psd (604 KB)
Error downloading object: assets/img/envmap.psd (0837264): Smudge error: Error downloading assets/img/envmap.psd (0837264a7ea743565218b35ee8cd4f5b2244810963a2de0a3c2d75a84dc35553): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

This error means I've not been able to set up any version control locally for the moonrider code. I've tried making a couple of changes along the lines above, but working without version control in an unfamiliar codebase is not an ideal situation...

I can come back and look at application-level optimizations (where I think there's lots of potential), when I have a working version control setup...

For the same reason, I haven't published a live version that uses the latest A-Frame yet.

Next steps & resources

To use the version of A-Frame with performance improvements, update the A-Frame version loaded in index.html to this:

<script src="https://cdn.jsdelivr.net/gh/diarmidmackenzie/aframe@1.1.0-performance/dist/aframe-master.min.js"></script>

Next steps:

  • application-level optimizations (once I'm able to cleanly clone the repo)
  • analyze FPS improvements, from existing & future fixes.

diarmidmackenzie commented Dec 17, 2022

On further review, it turned out the best place to make the further changes was in the A-Frame pool component, so no application changes are needed (apart from pointing to the newer A-Frame).

This fix was much simpler than all the other stuff I have been working on!

We've now got total CPU for updateMatrixWorld() down to < 30%, less than half of what it was when we started.

image

FPS on my Windows PC is definitely up from 55, and is now very close to 60. It's capped at 60 on PC, I believe, and it still misses frames sometimes, so things aren't perfectly smooth yet, but the overall CPU load must be substantially reduced.

@diarmidmackenzie

Managed to do some testing on Quest 2, using the OVR Metrics Tool described here:
https://developer.oculus.com/documentation/unity/ts-ovrmetricstool/#collect-performance-data-with-ovr-metrics-tool

I played Ed Sheeran, Shape of You, hitting more-or-less all the blocks (I believe block fragments are more intensive to render than blocks).

Here's a FPS chart from the original (https://moonrider.xyz)

image

In this version the frame rate was ~flat at 44-45 FPS.

And here from a fixed version (https://diarmidmackenzie.github.io/moonrider)

image

In this version the frame rate did sometimes dip as low as 45 FPS, but was much more variable, and sometimes a lot higher.

So on average, the frame rate was considerably better, but still highly variable.

I also tried to record data to file, which would allow for a more detailed analysis, but haven't yet figured out how to access the data (supposedly it's written to csv files in: /OVRMonitorMetricsService/CapturedMetrics/ but I can't see any such files on my headset).

@diarmidmackenzie

I didn't notice any functional deficiencies in the updated version - seemed to be running absolutely fine.


diarmidmackenzie commented Dec 22, 2022

CPU usage data...

old..
image

new...
image

So we're seeing the expected gains in updateMatrixWorld().

Big difference between running on VR headset & on my PC is that on PC the punchable blocks don't get rendered, so I wasn't seeing the cost of that in my earlier analysis. Looks like the cost of rendering those blocks is now the dominant factor in overall performance terms.

I suspect savings can be made there, but probably needs to be done at the application level, e.g. using instancing for the blocks?


devmaxxing commented Dec 29, 2022

Came across discussion about this in the WebXR discord. Not sure if it's applicable here, but I thought I'd mention that one thing I did for my rhythm game (https://github.com/CadenzaVR/cadenza) is to use InstancedMesh for each unique note type. So I just have one mesh/object for each note type instead of one for each note.

@diarmidmackenzie

Thanks, yes, using InstancedMesh should give some decent performance gains.

Using aframe-instanced-mesh should allow this to be done without too much reworking of the existing code.
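The core of instancing is that many copies of one mesh share a single object, with one 4x4 matrix per copy stored in a flat array. The sketch below illustrates only that data layout, without any library, so that it runs standalone. `packTranslations` is a hypothetical helper; in Three.js itself the equivalent would be a `THREE.InstancedMesh` whose per-instance matrices are set with `setMatrixAt()`.

```javascript
// Pack one 4x4 translation matrix per block into a single Float32Array.
// Layout is column-major with 16 floats per instance, matching the element
// order of THREE.Matrix4 (translation lives at elements 12, 13, 14).
function packTranslations(positions) {
  const data = new Float32Array(positions.length * 16);
  positions.forEach(([x, y, z], i) => {
    const o = i * 16;
    // Identity diagonal.
    data[o + 0] = 1;
    data[o + 5] = 1;
    data[o + 10] = 1;
    data[o + 15] = 1;
    // Translation column.
    data[o + 12] = x;
    data[o + 13] = y;
    data[o + 14] = z;
  });
  return data;
}

// Three blocks of the same type, described by one buffer rather than
// three separate scene graph objects.
const blockPositions = [[0, 1, -2], [0.5, 1, -4], [-0.5, 1.5, -6]];
const instanceData = packTranslations(blockPositions);
console.log(instanceData.length); // 48
```

With this layout, adding more blocks of the same type grows the buffer but not the number of objects that updateMatrixWorld() has to visit.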
