More Fixez #4093

Merged
merged 5 commits into from
Jan 22, 2018

kd-11
Contributor

@kd-11 kd-11 commented Jan 21, 2018

  • Partial fix for extended clamp range draws: try to preserve z information for draws outside the [0,1] range if the clip extents allow it. Range compression for [0,1] draws is disabled because it interferes with regular draw ops, so this workaround results in z-fighting at the near and far planes when extended range is used.
  • Implement decoding of a shuffle flag in the low bits of the texture remap vector. It seems to affect only some formats. Texture format fixes are also included; RPCS3 now passes almost all of the texture format tests from the autotests.
  • Scheduler fixes. Fixes a performance regression introduced in the last PR. Also tweaks the Ryzen optimizations to work better now that the regression is fixed, for some significant gains. Thanks to @Zangetsu38 for helping tweak and test this one.
  • Removes a workaround in the texture cache, as it no longer seems to be required now that the texture formats are fixed. There are almost no test cases for this, but I expect no regressions.
  • OpenGL performance optimizations. Treats the entire attribute buffer as a fixed heap addressed with index offsets instead of the "sliding window" approach, which incurred overhead setting up the texture buffers. That setup was very expensive on NVIDIA drivers, leading to very poor OpenGL performance on NVIDIA cards.
  • Minor fixes for overlays. Do not assume the swap queue is in a well-defined state, since a flip can be requested externally. Also avoids processing unsupported glyphs in font handlers.
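
To illustrate the "fixed heap addressed with index offsets" idea from the OpenGL item above, here is a minimal CPU-side sketch. It is not RPCS3 code: `attribute_heap`, `upload` and `fetch` are hypothetical names, and the vector stands in for a GPU buffer whose texture view would be set up once over its whole range. Each upload returns an absolute element index that a draw would pass to the shader as a base offset, so no per-draw view setup is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of an attribute heap; names are illustrative only.
struct attribute_heap
{
    std::vector<std::uint32_t> storage; // stands in for the GPU buffer, one element per texel
    std::size_t head = 0;               // next free element

    explicit attribute_heap(std::size_t texels) : storage(texels) {}

    // Copies data into the heap and returns the absolute index of its first element.
    // Wraps to the start when the tail is too small. A real renderer would have to
    // make sure the GPU has finished reading the old contents before overwriting them.
    std::size_t upload(const std::vector<std::uint32_t>& data)
    {
        assert(data.size() <= storage.size());
        if (head + data.size() > storage.size())
            head = 0;

        const std::size_t base = head;
        for (std::size_t i = 0; i < data.size(); ++i)
            storage[base + i] = data[i];

        head += data.size();
        return base;
    }

    // Models the shader-side read: a fetch relative to the per-draw base index.
    std::uint32_t fetch(std::size_t base, std::size_t vertex_index) const
    {
        assert(base + vertex_index < storage.size());
        return storage[base + vertex_index];
    }
};
```

The only per-draw state in this model is one integer, which is the point of the change: the expensive view setup moves to startup.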

- Use edges of depth range to map clamped stuff

Disable range compression on regular draws vs extended range draws
- Some applications require full 0-1 usage without compromises.
-- TODO: This leaves the extended range z values to fight with regular draws in the 0.99-1.0 range
- Implement low bit decode override flags for 2-component textures
- Properly implement a lot of texture remaps according to the autotest results
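
A rough sketch of how the depth handling in the notes above could look, for illustration only. This is not the shader code from this PR: `map_depth` is a hypothetical helper, the 0.01-wide band is taken from the "0.99 - 1.0" TODO note, and the linear mapping inside the band is an assumption. Values inside [0,1] pass through unchanged (no range compression); values outside are squeezed into a thin band at the matching edge of the depth range.

```cpp
#include <cassert>

// Hypothetical helper, not RPCS3 code. clip_min/clip_max are the extended clip extents.
inline float map_depth(float z, float clip_min, float clip_max)
{
    assert(clip_min <= 0.f && clip_max >= 1.f);
    constexpr float band = 0.01f; // assumed width of the edge band

    if (z > 1.f)
    {
        if (z >= clip_max)
            return 1.f;
        // Squeeze (1, clip_max) into (1 - band, 1); ordering among extended values is kept.
        const float t = (z - 1.f) / (clip_max - 1.f);
        return (1.f - band) + band * t;
    }

    if (z < 0.f)
    {
        if (z <= clip_min)
            return 0.f;
        // Squeeze (clip_min, 0) into (0, band).
        const float t = z / clip_min;
        return band * (1.f - t);
    }

    // Regular draws keep the full [0,1] range.
    return z;
}
```

With this mapping an extended value such as 1.5 lands below a regular value of 1.0, which is the z-fighting at the edges that the TODO describes.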

rsx: Do not unnecessarily shuffle WZYX->RGBA unless we have proof
- From looking at format swizzles, this is incorrect
- more threads for RSX
- better thread scheduling for Ryzen 1600
- OpenGL driver optimization for NVIDIA: glTextureBufferRange performance is horrendous on NVIDIA drivers
-- Initialize texture buffer to whole buffer at startup and use absolute offsets to read data instead
-- Over 2x performance in some cases (Resogun, TNT racers)
- gl/vk: Do not flip non-existent display buffers. Fixes spec violation at boot in TNT racers demo
- whitespace fixes for sys_rsx
- vulkan: Do not assume an aux frame context exists in a well-defined state as set in init_buffers(), since the flip request might be external (via the overlays path)
- gl: Do not bother waiting for idle before servicing external flip requests
- gl: Queue overlay cleanup requests to ensure only glthread attempts touching the context
- overlays: Do not compute size metrics for invalid/unsupported glyphs
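
As an illustration of the last item (skipping invalid or unsupported glyphs when computing size metrics), here is a small sketch. The `font` and `glyph_metrics` types are hypothetical and not the overlay code's actual structures; the table is assumed to be indexed by byte value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-glyph record; 'valid' marks glyphs the font actually provides.
struct glyph_metrics
{
    float advance;
    bool valid;
};

// Hypothetical font table indexed by byte value.
struct font
{
    std::vector<glyph_metrics> glyphs;

    // Returns the width of the text, ignoring bytes with no usable glyph.
    float measure(const std::string& text) const
    {
        float width = 0.f;
        for (unsigned char c : text)
        {
            // Out-of-table or unsupported glyphs contribute nothing to the metrics.
            if (static_cast<std::size_t>(c) >= glyphs.size() || !glyphs[c].valid)
                continue;

            width += glyphs[c].advance;
        }
        return width;
    }
};
```

Unsupported bytes, such as the UTF-8 continuation bytes of a non-ASCII character in an ASCII-only table, are simply skipped instead of being looked up.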
@kd-11
Contributor Author

kd-11 commented Jan 22, 2018

Further tweaks to the Ryzen scheduler may happen, but I don't expect any major improvements. Performance on Windows should now match native Linux performance, which is very good for 1600+ owners.

@kd-11 kd-11 merged commit 4f01794 into RPCS3:master Jan 22, 2018