Merge Squid's VBO optimizations, add some more, add canvas support #83

@toad-dev toad-dev commented Apr 7, 2022

  • Merged @SquidDev's VBO backend optimizations. This gave us a huge boost in monitor rendering performance when running in shader mod compatibility mode.
  • Squid also fixed the TBO backend not respecting fog.
  • Added a few more optimizations to the VBO backend myself.
  • Fixed an incompatibility with Canvas and the VBO backend.

At this point we have pretty good shader mod support. All users have to do is leave the monitor renderer option on BEST and the mod will support both Iris and Canvas. The only downside is that the VBO backend sometimes displays a slight stitching artifact. Fixing this would come at a performance penalty so I'm going to leave it as is for now.

SquidDev and others added 5 commits April 2, 2022 10:54
 - Remove the POSITION_COLOR render type. Instead we just render a
   background terminal quad as the pocket computer light - it's a little
   (lot?) more cheaty, but saves having to create a render type.

 - Use the existing position_color_tex shader instead of our copy. I
   looked at using RenderType.text, but had a bunch of problems with GUI
   terminals. It's possible we can fix it, but didn't want to spend too
   much time on it.

 - Remove some methods from FixedWidthFontRenderer, inlining them into
   the call site.

 - Switch back to using GL_QUADS rather than GL_TRIANGLES. I know Lig
   will shout at me for this, but the rest of MC uses QUADS, so I don't
   think best practice really matters here.

 - Fix the TBO backend not rendering monitors with fog.
 
   Unfortunately we can't easily do this to the VBO one without writing
   a custom shader (which defeats the whole point of the VBO backend!),
   as the distance calculation of most render types expects an
   already-transformed position (camera-relative I think!) while we pass
   a world-relative one.
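The fog mismatch described above can be illustrated with a generic linear fog formula. This is only a sketch: `FogSketch`, `fogFactor` and the start/end distances are hypothetical, and the formula is a textbook linear fog rather than Minecraft's actual shader code.

```java
/** Illustrative linear-fog helper; not taken from Minecraft or this PR. */
final class FogSketch {
    /** Euclidean length of a 3-component vector. */
    static double length(double x, double y, double z) {
        return Math.sqrt(x * x + y * y + z * z);
    }

    /**
     * Generic linear fog: 0 means no fog, 1 means fully fogged.
     * The distance is assumed to be measured from the camera.
     */
    static double fogFactor(double distance, double fogStart, double fogEnd) {
        double t = (distance - fogStart) / (fogEnd - fogStart);
        return Math.max(0.0, Math.min(1.0, t));
    }

    /** Fog for a vertex given as an offset from the camera (what the formula expects). */
    static double fogCameraRelative(double[] vertex, double[] camera, double start, double end) {
        double d = length(vertex[0] - camera[0], vertex[1] - camera[1], vertex[2] - camera[2]);
        return fogFactor(d, start, end);
    }

    /** Fog for the same vertex if its world position is mistaken for a camera offset. */
    static double fogWorldRelative(double[] vertex, double start, double end) {
        return fogFactor(length(vertex[0], vertex[1], vertex[2]), start, end);
    }
}
```

With a camera at (100, 64, 100) and a vertex 5 blocks away, the camera-relative path gives no fog for a fog range of 10 to 50, while the world-relative path treats the vertex as far away and fogs it completely.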

 - When rendering to a VBO we push vertices to a ByteBuffer directly,
   rather than going through MC's VertexConsumer system. This removes
   the overhead which comes with VertexConsumer, significantly improving
   performance.

 - Pre-convert palette colours to bytes, storing both the coloured and
   greyscale versions as a byte array. This allows us to remove the
   multiple casts and conversions (double -> float -> (greyscale) ->
   byte), offering noticeable performance improvements (multiple ms per
   frame).

   We're using a byte[] here rather than a record of three bytes as
   notionally it provides better performance when writing to a
   ByteBuffer directly compared to calling .put() four times. [^1]

 - Memoize getRenderBoundingBox. This was taking about 5% of the total
   time on the render thread[^2], so worth doing.

   I don't actually think the allocation is the heavy thing here -
   VisualVM says it's toWorldPos being slow. I'm not sure why - possibly
   just all the block property lookups? [^2]
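The memoization of getRenderBoundingBox could look roughly like the sketch below. `CachedBounds`, `Box`, `resize` and the `computations` counter are hypothetical names for illustration, and the real method depends on Minecraft types that are not reproduced here.

```java
/** Sketch of caching a derived bounding box until its inputs change. */
final class CachedBounds {
    /** Hypothetical stand-in for an axis-aligned bounding box. */
    static final class Box {
        final int minX, minZ, maxX, maxZ;

        Box(int minX, int minZ, int maxX, int maxZ) {
            this.minX = minX;
            this.minZ = minZ;
            this.maxX = maxX;
            this.maxZ = maxZ;
        }
    }

    private final int originX, originZ;
    private int width, depth;
    private Box cached;   // null means "needs recomputing"
    int computations;     // counts how often the expensive path runs

    CachedBounds(int originX, int originZ, int width, int depth) {
        this.originX = originX;
        this.originZ = originZ;
        this.width = width;
        this.depth = depth;
    }

    /** Any change to the inputs must invalidate the cached value. */
    void resize(int width, int depth) {
        this.width = width;
        this.depth = depth;
        this.cached = null;
    }

    /** Returns the cached box, computing it only on the first call after a change. */
    Box getRenderBoundingBox() {
        Box box = cached;
        if (box == null) {
            computations++;
            box = new Box(originX, originZ, originX + width, originZ + depth);
            cached = box;
        }
        return box;
    }
}
```

The important design point is the invalidation: every code path that changes the monitor's size or position has to clear the cached value, otherwise the renderer culls against a stale box.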

Note that none of these changes improve compatibility with Optifine.
Right now there are some serious issues where monitors are writing _over_
blocks in front of them. To fix this, we probably need to remove the
depth blocker and just render characters with a z offset. Will do that
in a separate commit, as I need to evaluate how well that change will
work first.

The main advantage of this commit is the improved performance. In my 
stress test with 120 monitors updating every tick, I'm getting 10-20fps
[^3] (still much worse than TBOs, which manage a solid 60-100).

In practice, we'll actually be much better than this. Our network
bandwidth limit means only 40 monitors change in a single tick, so FPS is
much more reasonable (60+ fps).

[^1]: In general, put(byte[]) is faster than put(byte) multiple times.
Just not clear if this is true when dealing with a small (and loop
unrolled) number of bytes.
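A minimal sketch of the pre-converted palette and the two ways of writing it to a ByteBuffer. `PaletteBytes` and its methods are hypothetical, the fixed alpha byte and the averaging greyscale formula are assumptions, and the real palette code may differ on both.

```java
import java.nio.ByteBuffer;

/** Sketch: convert palette colours to bytes once, then copy them per vertex. */
final class PaletteBytes {
    /** Convert one colour channel in [0, 1] to an unsigned byte value. */
    static byte toByte(double channel) {
        double clamped = Math.max(0.0, Math.min(1.0, channel));
        return (byte) (int) (clamped * 255.0);
    }

    /** Pre-converted RGBA colour; alpha is assumed to be fully opaque. */
    static byte[] colour(double r, double g, double b) {
        return new byte[] { toByte(r), toByte(g), toByte(b), (byte) 255 };
    }

    /** Pre-converted greyscale colour; a plain average is assumed here. */
    static byte[] greyscale(double r, double g, double b) {
        byte lum = toByte((r + g + b) / 3.0);
        return new byte[] { lum, lum, lum, (byte) 255 };
    }

    /** One bulk copy of the whole array. */
    static void putBulk(ByteBuffer buffer, byte[] rgba) {
        buffer.put(rgba);
    }

    /** Four single-byte writes; produces the same bytes as putBulk. */
    static void putSingle(ByteBuffer buffer, byte[] rgba) {
        buffer.put(rgba[0]).put(rgba[1]).put(rgba[2]).put(rgba[3]);
    }
}
```

Both write paths leave identical contents in the buffer, so which one is faster for four bytes is purely a question for a benchmark on the target JVM.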

[^2]: To be clear, this is with 120 monitors and no other block entities
with custom renderers, so not really representative.

[^3]: I wish I could provide a narrower range, but it varies so much
between me restarting the game. Makes it impossible to benchmark
anything!

Somehow this hits a happier path in the JVM. I guess it has trouble
inlining the VertexEmitter.vertex() calls because there are multiple
implementations, so reducing the number of calls and giving it a
chunkier function to JIT down helps? This is all conjecture because
I haven't figured out JitWatch yet :)
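As a rough illustration of the idea (not the actual VertexEmitter from this change), the sketch below makes one interface call per quad instead of one per vertex. `QuadEmitter`, `BufferQuadEmitter` and the vertex layout are hypothetical.

```java
import java.nio.ByteBuffer;

/** Hypothetical emitter with one coarse-grained call per quad. */
interface QuadEmitter {
    void quad(float x1, float y1, float x2, float y2, float z, byte[] rgba);
}

/** Writes quads straight into a ByteBuffer. */
final class BufferQuadEmitter implements QuadEmitter {
    /** Assumed layout: three floats of position followed by four colour bytes. */
    static final int VERTEX_SIZE = 3 * Float.BYTES + 4;

    private final ByteBuffer out;

    BufferQuadEmitter(ByteBuffer out) {
        this.out = out;
    }

    @Override
    public void quad(float x1, float y1, float x2, float y2, float z, byte[] rgba) {
        // All four corners are written inside a single virtual call, so the
        // JIT only has to resolve the interface dispatch once per quad.
        out.putFloat(x1).putFloat(y1).putFloat(z).put(rgba);
        out.putFloat(x1).putFloat(y2).putFloat(z).put(rgba);
        out.putFloat(x2).putFloat(y2).putFloat(z).put(rgba);
        out.putFloat(x2).putFloat(y1).putFloat(z).put(rgba);
    }
}
```

Whether this actually helps inlining is, as above, conjecture; the sketch only shows the shape of the change.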

Anyways, this gives about a 9% improvement in my tests.

This gives about a 3% improvement in VBO rebuild stress tests, for the
cost of a little more memory.

getVertexCount() was showing up heavy in my profiles. Changing it to a
simple upper bound calculation melts that time away. If there's a
max size 0.5 text scale monitor in the scene, the buffer will grow to
~3 MB. For comparison's sake, the images in the "blit" program were
already growing the buffer to ~2.1 MB.
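An upper-bound calculation of this kind might look like the sketch below. `VertexBudget`, the two-quads-per-cell worst case and the `bytesPerVertex` parameter are assumptions for illustration, not the numbers used by the real getVertexCount().

```java
/** Sketch: size a vertex buffer from terminal dimensions alone. */
final class VertexBudget {
    static final int VERTICES_PER_QUAD = 4;

    /** Assumed worst case: one background quad plus one glyph quad per cell. */
    static final int QUADS_PER_CELL = 2;

    /** Upper bound on vertices, independent of the terminal's contents. */
    static int maxVertices(int columns, int rows) {
        return columns * rows * QUADS_PER_CELL * VERTICES_PER_QUAD;
    }

    /** Upper bound on buffer size for a given vertex format size in bytes. */
    static long maxBytes(int columns, int rows, int bytesPerVertex) {
        return (long) maxVertices(columns, rows) * bytesPerVertex;
    }
}
```

The bound never inspects the terminal contents, which is what makes it cheap; the price is a buffer sized for the worst case even when most cells are blank.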
Canvas handles the matrix stack a little differently. We have to
multiply in the modelView matrix ourselves.

At this point we support both Iris and Canvas to an acceptable level.
Users just have to leave the monitor renderer option on BEST and
everything will be handled for them.
@Merith-TK Merith-TK merged commit f10f987 into cc-tweaked:mc-1.18.x/1.18.2 Apr 8, 2022