Improve Render Speed #78

Closed
KilledByAPixel opened this issue Jun 18, 2024 · 10 comments

Comments

@KilledByAPixel
Owner

The webgl solution that is used is very fast, but could be faster. I am not sure what is slowing it down, it is probably something simple that could make a huge difference.

In the LittleJS stress test I max out at 50,000 sprites at 60 fps.

But in this pixi.js demo I can put over 3x that at 60 fps!

So where is the bottleneck in the rendering system? The webgl component of littlejs is extremely straightforward and simple so there must be something I am missing.

[Screenshot: LittleJS Stress Test, 2024-06-18]

@thewrath

Hello,

I'm far from being a WebGL expert, but I took some time to reread the WebGL implementation of LittleJS. Some things can be improved to gain performance:

I think the PixiJS stress test is heavily optimized for rendering a single sprite, but since LittleJS embeds its textures in a textureMap sent to the GPU, that is already a good performance boost in my opinion 👍

I hope these elements will be relevant, it was very interesting to take a look at this part of LittleJS 😉

@KilledByAPixel
Owner Author

Thank you for checking into it. I have tried those things before but it didn't seem to make much of a difference but I will investigate further.

I kept the webgl part super simple and easy to understand, so once we figure out what the cause is, it shouldn't be much trouble to rework it.

@KilledByAPixel
Owner Author

I dug into that PixiJS demo and solved the mystery of how it is so fast. I am annoyed at that demo because it's kind of fake and not at all representative of the speed you would get when making a game with PixiJS.

The reason is that demo uses a specialized rendering system that would not be suitable for a normal game. It is called PIXI.ParticleContainer, here is the line where it is set up...

new PIXI.ParticleContainer(200000, [false, true, false, false, false]);

The PIXI.ParticleContainer is a special class where you can disable stuff to allow for super fast rendering. That is what the [false, true, false, false, false] does. All those falses. Pretty much everything is disabled except for the position. Even uvs and tinting of the sprite is disabled! The reason there are different colors is because they are actually different textures!
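To make the flag array easier to read, here is a minimal sketch that names each entry. It assumes the array order is scale, position, rotation, uvs, tint, which is the order PixiJS documents for the ParticleContainer properties; `describeParticleFlags` is a hypothetical helper and not part of PixiJS.

```javascript
// Hypothetical helper: names each flag in a ParticleContainer properties array.
// Assumed order: scale, position, rotation, uvs, tint.
const PARTICLE_FLAG_NAMES = ['scale', 'position', 'rotation', 'uvs', 'tint'];

function describeParticleFlags(flags) {
  const named = {};
  PARTICLE_FLAG_NAMES.forEach((name, i) => {
    named[name] = Boolean(flags[i]);
  });
  return named;
}

// The array used by the demo: only one per-sprite property is enabled.
const demoFlags = describeParticleFlags([false, true, false, false, false]);
console.log(demoFlags);
```

Under that assumed order, only `position` comes out as enabled, which matches the description above.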

So, it's pretty much a bust here. A more general solution would not ever be able to reach these speeds. It makes sense now why it is able to do like 3x the sprites, because it is doing just the bare minimum where each vertex is just a position and each quad shows the entire texture.

I did some testing into this and with their demo I can get 170k sprites at 60fps. That sounds like a lot. However, if instead of ParticleContainer I just use a regular Container, I can only get 50k sprites. That is almost exactly the same number of sprites I get in the LittleJS stress test demo.

With LittleJS each vertex is a position, uv, color, and additive color. That is 4x the size of each vertex in that demo. But all of that is necessary for creating an actual game, so that many different sprites can be combined into a single draw call.

I also did a few more tests with LittleJS...

Moving rotation/scale calculations will not help us. I commented out the calculations and it did not affect the speed.

I had previously used indexed verts but removed that to simplify. With indexed verts each quad would use only 4 verts instead of 6 but with other overhead it may end up being something like a 10% speed boost at most. Just my guess. Maybe not worth the extra effort and complication.
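To make the 4-versus-6 vertex trade-off concrete, here is a minimal sketch of the index buffer that indexed quads would need. `buildQuadIndices` is a hypothetical helper, and the corner order (two triangles sharing the 1-2 diagonal) is an assumption, not necessarily the layout LittleJS used before.

```javascript
// Hypothetical helper: builds indices so each quad uses 4 unique vertices
// but is still drawn as 2 triangles (6 indices) with gl.TRIANGLES.
function buildQuadIndices(quadCount) {
  // 16-bit indices can address at most 65536 vertices, i.e. 16384 quads.
  if (quadCount * 4 > 65536) {
    throw new RangeError('too many quads for 16-bit indices');
  }
  const indices = new Uint16Array(quadCount * 6);
  for (let q = 0; q < quadCount; ++q) {
    const v = q * 4; // first vertex of this quad
    const i = q * 6; // first index of this quad
    indices[i] = v;
    indices[i + 1] = v + 1;
    indices[i + 2] = v + 2;
    indices[i + 3] = v + 2;
    indices[i + 4] = v + 1;
    indices[i + 5] = v + 3;
  }
  return indices;
}

const quadIndices = buildQuadIndices(2);
console.log(Array.from(quadIndices));
```

The vertex data per quad shrinks by a third, while the index pattern never changes, so a buffer like this could be built and uploaded once rather than every frame.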

@thewrath

Yes, it's not very representative of how sprites would be used in a game.

In the end, they're just showing off the rendering capabilities of their particle emitter.

Would it be interesting to build the same example with the LittleJS particle system? If I'm not mistaken, particle uses drawTile and therefore glDraw so performance will be close to the example with sprites?

But is it necessary to benefit from so much performance?

@KilledByAPixel
Owner Author

The thing is, this doesn't show off their particle emitter because you would need to enable at least a few things for particles, or every particle would look exactly the same and not fade off. It is very misleading.

I don't want to follow their lead here. My goal with the LittleJS stress test is to test the upper bound of what is achievable in a realistic game scenario so devs can be well informed when making their game.

The particle system for LittleJS uses the same rendering as everything else, so that wouldn't really help us here.

I'd like to reinvestigate indexed rendering though. I think if it ends up giving a 10% or even 5% boost, that's worth it. It should only take a few hours to switch it over when I have some time.

@codyebberson
Contributor

codyebberson commented Jul 7, 2024

> With LittleJS each vertex is a position, uv, color, and additive color. That is 4x the size of each vertex in that demo. But all that is necessary for creating an actual game so many different sprites can combine together into a single draw call.

On my machine, most CPU time is in glDraw copying data into glVertexData.

I agree with @thewrath that (1) moving rotation calculations into the shader and (2) instanced rendering are the 2 big opportunities.

With the current TRIANGLE_STRIP model, each instance ends up being 6 vertices * 6 elements = 36 elements. 36 elements * 4 bytes = 144 bytes.

With instanced rendering, you should be able to reduce that to:

  1. 4 position elements (x, y, width, height)
  2. 4 texCoord elements (u, v, width, height)
  3. 1 rotation element (angle)
  4. 1 rgba element (packed color)
  5. 1 rgbaAdditive element (packed additive color)

For a total of 11 elements. 11 elements * 4 bytes = 44 bytes.

44 bytes vs 144 bytes is a potential 3-4x speedup.
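The arithmetic above can be sketched as a small layout calculation. Packing each color into a single 32-bit element is an assumption inferred from the 11-element total; `packRGBA` and the constant names are hypothetical.

```javascript
const BYTES_PER_ELEMENT = 4;

// Current model: 6 vertices per sprite, 6 elements per vertex.
const bytesPerSpriteOld = 6 * 6 * BYTES_PER_ELEMENT;

// Instanced model: one record per sprite, element counts from the list above.
const instanceLayout = {
  position: 4,
  texCoord: 4,
  rotation: 1,
  rgba: 1,
  rgbaAdditive: 1,
};
const elementsPerInstance = Object.values(instanceLayout)
  .reduce((sum, count) => sum + count, 0);
const bytesPerSpriteNew = elementsPerInstance * BYTES_PER_ELEMENT;

// Hypothetical helper: packs four 0-255 channels into one unsigned 32-bit
// value, so a color occupies a single 4-byte element.
function packRGBA(r, g, b, a) {
  return (r | (g << 8) | (b << 16) | (a << 24)) >>> 0;
}

const dataRatio = bytesPerSpriteOld / bytesPerSpriteNew;
console.log(bytesPerSpriteOld, bytesPerSpriteNew, dataRatio.toFixed(2));
```

The ratio is the reduction in data copied per sprite; the speedup actually realized depends on how much of the frame time goes into that copy.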

@codyebberson
Contributor

Using instanced rendering yields about a 2x speedup on my machine.

Proof-of-concept PR: #82

Before: https://killedbyapixel.github.io/LittleJS/examples/stress/

After: https://codyebberson.github.io/LittleJS/examples/stress/
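For readers unfamiliar with the technique, here is a minimal sketch of how per-instance attributes are typically configured in WebGL2. It is not the code from the PR. The attribute locations, the function names, and the 44-byte layout (taken from the estimate earlier in this thread, with each color stored as 4 normalized bytes) are assumptions. It expects a WebGL2 context with the instance buffer already bound to `ARRAY_BUFFER`.

```javascript
// Assumed per-instance layout, 44 bytes:
// position (4 floats), texCoord (4 floats), rotation (1 float),
// color (4 bytes), additiveColor (4 bytes).
const INSTANCE_STRIDE = 44;

function setupInstanceAttributes(gl, locations) {
  const attribs = [
    // [location, size, type, normalized, byte offset]
    [locations.position, 4, gl.FLOAT, false, 0],
    [locations.texCoord, 4, gl.FLOAT, false, 16],
    [locations.rotation, 1, gl.FLOAT, false, 32],
    [locations.color, 4, gl.UNSIGNED_BYTE, true, 36],
    [locations.additiveColor, 4, gl.UNSIGNED_BYTE, true, 40],
  ];
  for (const [loc, size, type, normalized, offset] of attribs) {
    gl.enableVertexAttribArray(loc);
    gl.vertexAttribPointer(loc, size, type, normalized, INSTANCE_STRIDE, offset);
    gl.vertexAttribDivisor(loc, 1); // advance once per instance, not per vertex
  }
}

function drawSprites(gl, instanceCount) {
  // One 4-vertex triangle strip per quad, repeated for every instance.
  gl.drawArraysInstanced(gl.TRIANGLE_STRIP, 0, 4, instanceCount);
}
```

In a setup like this the vertex shader would expand each instance into its four corners, for example from a small static corner buffer or from `gl_VertexID`, so the CPU writes one record per sprite instead of six vertices.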

@KilledByAPixel
Owner Author

@codyebberson amazing, I will get to integrating this right away!

@codyebberson
Contributor

After instanced rendering, the new bottleneck is in the stress test itself. The main culprit is all of the memory allocations from Vector2.add(), which allocates a new Vector2. #83

After changing the allocation to mutation, I get 300k sprites at 80+ FPS.

Updated: https://codyebberson.github.io/LittleJS/examples/stress/
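The difference between allocation and mutation can be sketched as follows. `Vec2` is a minimal stand-in written for illustration, not the LittleJS `Vector2` class, and `addInPlace` is a hypothetical method name.

```javascript
// Minimal stand-in for a 2D vector; not the LittleJS Vector2 class.
class Vec2 {
  constructor(x = 0, y = 0) {
    this.x = x;
    this.y = y;
  }

  // Allocating style: returns a new object on every call.
  add(v) {
    return new Vec2(this.x + v.x, this.y + v.y);
  }

  // Mutating style (hypothetical name): updates this vector in place.
  addInPlace(v) {
    this.x += v.x;
    this.y += v.y;
    return this;
  }
}

const pos = new Vec2(1, 2);
const velocity = new Vec2(0.5, -1);

const allocated = pos.add(velocity);      // new Vec2, pos is unchanged
const mutated = pos.addInPlace(velocity); // same object as pos

console.log(allocated !== pos, mutated === pos, pos.x, pos.y);
```

With hundreds of thousands of sprites updated every frame, the allocating form creates that many short-lived objects per frame for the garbage collector to reclaim, while the mutating form creates none.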

@KilledByAPixel
Owner Author

Everything is hooked up now and working super well. This is a much bigger boost than I would have hoped for! Thanks so much @codyebberson and @thewrath! Check out what I did in the gl code, I feel good about how tight it is now.

So, I think with that said, we are ready to close this issue out! 🚀😅👍
