Make helper invocations take part in subgroupQuad operations #1798

Merged
amdrexu merged 1 commit into GPUOpen-Drivers:dev on May 25, 2022

perlfu
Contributor

@perlfu perlfu commented May 5, 2022

Subgroup quad broadcasts should be marked as WQM because
helper invocations take part in subgroup operations if enabled.

This patch is freestanding, but is intended to be paired with
LLVM D124981 to address a potential issue with the Vulkan CTS test:
dEQP-VK.draw.renderpass.shader_invocation.helper_invocation

@perlfu perlfu requested a review from a team as a code owner May 5, 2022 06:25
@amdvlk-admin
Collaborator

Test summary for commit 777c182

CTS tests (Failed: 1/187820)
  • Built with version 1.3.0.0
  • Rhel 8.2, Gfx10
    • Passed: 36645/65225 (56.2%)
    • Failed: 0/65225 (0.0%)
    • Not Supported: 28580/65225 (43.8%)
    • Warnings: 0/65225 (0.0%)
  • Ubuntu 18.04, Gfx9
    • Passed: 31111/57370 (54.2%)
    • Failed: 0/57370 (0.0%)
    • Not Supported: 26259/57370 (45.8%)
    • Warnings: 0/57370 (0.0%)
  • Ubuntu 20.04, Gfx8
    • Passed: 37802/65225 (58.0%)
    • Failed: 1/65225 (0.0%)
      Failures:
      FAILURE: dEQP-VK.synchronization.basic.event.multi_secondary_command_buffer
      Stack trace: Script:
      synchronizationWrapper->queueSubmit(queue, *fence): VK_TIMEOUT at vktSynchronizationBasicEventTests.cpp:337
    • Not Supported: 27422/65225 (42.0%)
    • Warnings: 0/65225 (0.0%)

@jayfoad
Member

jayfoad commented May 5, 2022

You've only done this for CreateSubgroupQuadBroadcast. What about CreateSubgroupQuadSwap*?

More importantly, what about all the other subgroup operations that don't have "quad" in their name? They can also access other lanes in their quad, but they can also access lanes from outside the quad, so what does the spec say about how they are supposed to work? (I.e. are there different rules for subgroup "quad" operations and subgroup "non-quad" operations?)

@perlfu
Contributor Author

perlfu commented May 5, 2022

You've only done this for CreateSubgroupQuadBroadcast. What about CreateSubgroupQuadSwap*?

More importantly, what about all the other subgroup operations that don't have "quad" in their name? They can also access other lanes in their quad, but they can also access lanes from outside the quad, so what does the spec say about how they are supposed to work? (I.e. are there different rules for subgroup "quad" operations and subgroup "non-quad" operations?)

Yes, I suspect it needs to be added to quite a few more operations.
We do already use softwqm for several of the ballots.

@github-actions

github-actions bot commented May 5, 2022

The LLPC code coverage report is available at https://storage.googleapis.com/amdvlk-llpc-github-ci-artifacts-public/coverage_release_clang_shadercache_coverage_assertions_2274033318/index.html.
Configuration: release_clang_shadercache_coverage_assertions.

@github-actions

github-actions bot commented May 5, 2022

The LLPC code coverage report is available at https://storage.googleapis.com/amdvlk-llpc-github-ci-artifacts-public/coverage_release_clang_coverage_2274033318/index.html.
Configuration: release_clang_coverage.

@ruiling
Contributor

ruiling commented May 8, 2022

I think we need to use wqm instead of soft_wqm because we need to enable wqm to make subgroupQuadXXX() work correctly. Based on https://www.khronos.org/registry/vulkan/specs/1.2/html/vkspec.html#shaders-helper-invocations, whether helper invocations need to be active or inactive for other subgroup operations is not quite clear. I guess that inserting soft_wqm for subgroup vote operations was a fix for either CTS or some application that assumes helper invocations participate in subgroup operations.

@perlfu
Contributor Author

perlfu commented May 9, 2022

I think we need to use wqm instead of soft_wqm because we need to enable wqm to make subgroupQuadXXX() work correctly. Based on https://www.khronos.org/registry/vulkan/specs/1.2/html/vkspec.html#shaders-helper-invocations, whether helper invocations need to be active or inactive for other subgroup operations is not quite clear. I guess that inserting soft_wqm for subgroup vote operations was a fix for either CTS or some application that assumes helper invocations participate in subgroup operations.

Ideally, we only want helper invocations around if they are required, as not having them can save energy.
That is why I proposed using softwqm and the backend change.
The point of softwqm is to defer the actual decision to the backend, in effect saying "run in WQM if it would change the result".
Having other explicit WQM operations, image samples, demotes, etc. would change the result.
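
To make the softwqm idea above concrete, here is a minimal C++ sketch. It is a toy model, not the LLVM backend: the `WqmRequest` enum and all three functions are hypothetical names made up for illustration. It assumes a soft request is honoured only when some other operation in the same shader makes a hard WQM request (the comment above lists explicit WQM operations, image samples and demotes as such cases).

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-operation WQM request; not LLVM's real data structures.
enum class WqmRequest { None, Soft, Hard };

// True if the toy "backend" would run the shader in whole quad mode,
// i.e. at least one operation makes a hard request.
bool shaderRunsInWqm(const std::vector<WqmRequest>& ops) {
  for (WqmRequest r : ops)
    if (r == WqmRequest::Hard)
      return true;
  return false;
}

// True if an operation with the given request ends up executing in WQM.
// A soft request only follows what the rest of the shader already needs.
bool opRunsInWqm(WqmRequest request, const std::vector<WqmRequest>& ops) {
  switch (request) {
  case WqmRequest::Hard:
    return true;
  case WqmRequest::Soft:
    return shaderRunsInWqm(ops);
  case WqmRequest::None:
    return false;
  }
  return false;
}

// Usage sketch: call this from main() to exercise the model.
void checkSoftWqmModel() {
  // Only soft requests: nothing forces WQM, so the soft op stays out of it.
  assert(!opRunsInWqm(WqmRequest::Soft, {WqmRequest::Soft, WqmRequest::None}));
  // One hard request elsewhere: the soft op now runs in WQM too.
  assert(opRunsInWqm(WqmRequest::Soft, {WqmRequest::Soft, WqmRequest::Hard}));
}
```

In this model a shader with only soft requests never enables helper lanes, which is exactly the case the rest of this thread debates for quad operations.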

However, I am also willing to be pragmatic about it.
Testing, I found that softwqm only turned up in 29 out of 10362 game pipelines I combined.
In only one case was the softwqm not converted to WQM.
So in practice it probably makes little difference.
We can entirely remove all uses of softwqm and everything works (in the CTS test sense).

@perlfu
Contributor Author

perlfu commented May 9, 2022

Note: I've rewritten the way WQM is handled for subgroup operations. I believe this should be based on the shader stage alone, not on looking for specific operations in the SPIR-V. All the operations listed were specific to the fragment shader stage, and helper invocations are only valid in the fragment shader.

@perlfu perlfu changed the title from "Make helper invocations take part in subgroupQuadBroadcast." to "Make helper invocations take part in subgroupQuad operations" May 9, 2022
@github-actions

github-actions bot commented May 9, 2022

The LLPC code coverage report is available at https://storage.googleapis.com/amdvlk-llpc-github-ci-artifacts-public/coverage_release_clang_coverage_2292479142/index.html.
Configuration: release_clang_coverage.

@github-actions

github-actions bot commented May 9, 2022

The LLPC code coverage report is available at https://storage.googleapis.com/amdvlk-llpc-github-ci-artifacts-public/coverage_release_clang_shadercache_coverage_assertions_2292479142/index.html.
Configuration: release_clang_shadercache_coverage_assertions.

@github-actions

github-actions bot commented May 9, 2022

The LLPC code coverage report is available at https://storage.googleapis.com/amdvlk-llpc-github-ci-artifacts-public/coverage_release_clang_coverage_2292468908/index.html.
Configuration: release_clang_coverage.

@github-actions

github-actions bot commented May 9, 2022

The LLPC code coverage report is available at https://storage.googleapis.com/amdvlk-llpc-github-ci-artifacts-public/coverage_release_clang_shadercache_coverage_assertions_2292468908/index.html.
Configuration: release_clang_shadercache_coverage_assertions.

@amdvlk-admin
Collaborator

Test summary for commit d9bf703

CTS tests (Failed: 0/187823)
  • Built with version 1.3.0.0
  • Rhel 8.2, Gfx10
    • Passed: 36645/65226 (56.2%)
    • Failed: 0/65226 (0.0%)
    • Not Supported: 28581/65226 (43.8%)
    • Warnings: 0/65226 (0.0%)
  • Ubuntu 18.04, Gfx9
    • Passed: 31111/57371 (54.2%)
    • Failed: 0/57371 (0.0%)
    • Not Supported: 26260/57371 (45.8%)
    • Warnings: 0/57371 (0.0%)
  • Ubuntu 20.04, Gfx8
    • Passed: 37804/65226 (58.0%)
    • Failed: 0/65226 (0.0%)
    • Not Supported: 27422/65226 (42.0%)
    • Warnings: 0/65226 (0.0%)

@ruiling
Contributor

ruiling commented May 9, 2022

Ideally, we only want helper invocations around if they are required, as not having them can save energy. That is why I proposed using softwqm and the backend change. The point of softwqm is to defer the actual decision to the backend, in effect saying "run in WQM if it would change the result". Having other explicit WQM operations, image samples, demotes, etc. would change the result.

We need to consider correctness before talking about saving energy. What if a fragment shader has only calls to subgroupQuadBroadcast(), with no image samples and no demotes? The launched wave still needs to have helper invocations enabled to get defined behavior if some invocations are helpers at wave launch time.

I think for quad group operations (https://www.khronos.org/registry/vulkan/specs/1.2/html/vkspec.html#shaders-quad-operations), we should use the wqm intrinsic. For non-quad group operations, I think it is OK to use soft_wqm based on (https://www.khronos.org/registry/vulkan/specs/1.2/html/vkspec.html#shaders-helper-invocations).

The change to always put a soft_wqm on subgroup vote operations in fragment shaders sounds fine to me.
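
The correctness concern about quad operations can be illustrated with a small C++17 simulation. This is a sketch under assumptions, not driver code: `wholeQuadMask` and `quadBroadcast` are hypothetical helpers, lanes are grouped into quads of four consecutive lanes in a wave of up to 64 lanes, and a read from a disabled lane is modelled as `std::nullopt` to stand in for an undefined result.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Whole quad mode: enable every lane of any quad that has at least one
// enabled lane. Bit i of `exec` is 1 when lane i is enabled.
std::uint64_t wholeQuadMask(std::uint64_t exec) {
  std::uint64_t result = 0;
  for (unsigned quad = 0; quad < 16; ++quad) {
    const std::uint64_t quadBits = 0xFull << (quad * 4);
    if (exec & quadBits)
      result |= quadBits;
  }
  return result;
}

// Toy quad broadcast: `lane` reads the value held by lane `index` (0..3)
// of its own quad. Returns std::nullopt when the source lane is disabled,
// which models the undefined result discussed above.
std::optional<int> quadBroadcast(const std::vector<int>& values,
                                 std::uint64_t exec, unsigned lane,
                                 unsigned index) {
  assert(index < 4 && lane < 64 && lane < values.size());
  const unsigned source = (lane & ~3u) | index;
  assert(source < values.size());
  const bool sourceEnabled = ((exec >> source) & 1u) != 0;
  if (!sourceEnabled)
    return std::nullopt;
  return values[source];
}
```

For a quad where only lane 0 is a real invocation (`exec == 0x1`), a broadcast from lane 2 has no defined source in this model. Running the same call with `wholeQuadMask(0x1)` enables the helper lanes and the read succeeds.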

@jayfoad
Member

jayfoad commented May 9, 2022

I think for quad group operations (https://www.khronos.org/registry/vulkan/specs/1.2/html/vkspec.html#shaders-quad-operations), we should use the wqm intrinsic. For non-quad group operations, I think it is OK to use soft_wqm based on (https://www.khronos.org/registry/vulkan/specs/1.2/html/vkspec.html#shaders-helper-invocations).

That makes sense to me. Thank you for explaining. I did not know that the Vulkan spec had special rules for the "quad" operations.

Contributor

@kuhar kuhar left a comment

Just one nit. I don't know the semantics well enough to comment on the correctness of this.

to address a potential issue with Vulkan CTS test:
dEQP-VK.draw.renderpass.shader_invocation.helper_invocation

If this affects a CTS test, a shaderdb test that does not require a GPU would be very welcome: https://github.com/GPUOpen-Drivers/llpc/blob/dev/docs/Contributing.md#write-useful-tests. From what I remember, there weren't many tests that exercise OpKill/OpDemoteToHelper.

lgc/builder/SubgroupBuilder.cpp (outdated review thread, resolved)
Contributor

@s-perron s-perron left a comment

LGTM. Some type of test would be nice.

@nhaehnle
Member

I agree with @ruiling that quad ops must be (hard)wqm for correctness. For other subgroup ops, Carl's approach of enabling softwqm when demote is present is correct.

If we want to simplify things because we don't care about power, then enabling wqm unconditionally in fragment shaders is also correct.

Subgroup quad operations should be marked as WQM because
helper invocations take part in subgroup operations if enabled.

This addresses a potential issue with Vulkan CTS test:
dEQP-VK.draw.renderpass.shader_invocation.helper_invocation

Rework use of WQM intrinsics in subgroup operations to be
based only on shader stage and not use knowledge of operations
used in SPIRV.
@perlfu
Contributor Author

perlfu commented May 11, 2022

  • Use explicit WQM for quad operations
  • Add basic tests
  • Address other review comments
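
One reading of the policy agreed in the review above, sketched in C++. The enums and `selectWqmMarker` are hypothetical and are not the code in this patch. The sketch assumes the choice depends only on the shader stage and on whether the operation is a quad operation: explicit WQM for quad operations, softwqm for other subgroup operations, and nothing outside fragment shaders, where helper invocations do not exist.

```cpp
#include <cassert>

// Hypothetical names for illustration only.
enum class ShaderStage { Vertex, Fragment, Compute };
enum class SubgroupOpKind { Quad, NonQuad };
enum class WqmMarker { None, SoftWqm, Wqm };

// Pick the marker to attach to a subgroup operation.
WqmMarker selectWqmMarker(ShaderStage stage, SubgroupOpKind kind) {
  // Helper invocations only exist in fragment shaders.
  if (stage != ShaderStage::Fragment)
    return WqmMarker::None;
  // Quad operations need helper lanes for a defined result;
  // other subgroup operations leave the decision to the backend.
  return kind == SubgroupOpKind::Quad ? WqmMarker::Wqm : WqmMarker::SoftWqm;
}

// Usage sketch: call this from main() to exercise the policy.
void checkWqmPolicy() {
  assert(selectWqmMarker(ShaderStage::Fragment, SubgroupOpKind::Quad) ==
         WqmMarker::Wqm);
  assert(selectWqmMarker(ShaderStage::Compute, SubgroupOpKind::Quad) ==
         WqmMarker::None);
}
```

Keying the decision on the stage alone means the builder does not need to know which other operations appear in the SPIR-V.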

@github-actions

The LLPC code coverage report is available at https://storage.googleapis.com/amdvlk-llpc-github-ci-artifacts-public/coverage_release_clang_coverage_2305223867/index.html.
Configuration: release_clang_coverage.

@github-actions

The LLPC code coverage report is available at https://storage.googleapis.com/amdvlk-llpc-github-ci-artifacts-public/coverage_release_clang_shadercache_coverage_assertions_2305223867/index.html.
Configuration: release_clang_shadercache_coverage_assertions.

@amdvlk-admin
Collaborator

Test summary for commit 40de282

CTS tests (Failed: 0/187823)
  • Built with version 1.3.0.0
  • Rhel 8.2, Gfx10
    • Passed: 36645/65226 (56.2%)
    • Failed: 0/65226 (0.0%)
    • Not Supported: 28581/65226 (43.8%)
    • Warnings: 0/65226 (0.0%)
  • Ubuntu 18.04, Gfx9
    • Passed: 31111/57371 (54.2%)
    • Failed: 0/57371 (0.0%)
    • Not Supported: 26260/57371 (45.8%)
    • Warnings: 0/57371 (0.0%)
  • Ubuntu 20.04, Gfx8
    • Passed: 37804/65226 (58.0%)
    • Failed: 0/65226 (0.0%)
    • Not Supported: 27422/65226 (42.0%)
    • Warnings: 0/65226 (0.0%)

Contributor

@ruiling ruiling left a comment

Thanks for making this new version; it looks great to me. Removing useHelpInvocation makes it much easier to understand.

@perlfu
Contributor Author

perlfu commented May 16, 2022

Looks like the Jenkins test stalled.
Can we retest this please?

@amdrexu
Member

amdrexu commented May 16, 2022

I re-ran the CI for you.

@amdvlk-admin
Collaborator

Test summary for commit 40de282

CTS tests (Failed: 0/186422)
  • Built with version 1.3.0.0
  • Rhel 8.2, Gfx10
    • Passed: 36645/65226 (56.2%)
    • Failed: 0/65226 (0.0%)
    • Not Supported: 28581/65226 (43.8%)
    • Warnings: 0/65226 (0.0%)
  • Ubuntu 18.04, Gfx9
    • Passed: 30002/55970 (53.6%)
    • Failed: 0/55970 (0.0%)
    • Not Supported: 25968/55970 (46.4%)
    • Warnings: 0/55970 (0.0%)
  • Ubuntu 20.04, Gfx8
    • Passed: 37804/65226 (58.0%)
    • Failed: 0/65226 (0.0%)
    • Not Supported: 27422/65226 (42.0%)
    • Warnings: 0/65226 (0.0%)

@amdvlk-admin
Collaborator

Test summary for commit 40de282

CTS tests (Failed: 1/172777)
  • Built with version 1.3.0.0
  • Rhel 8.2, Gfx10
    • Passed: 36645/65226 (56.2%)
    • Failed: 0/65226 (0.0%)
    • Not Supported: 28581/65226 (43.8%)
    • Warnings: 0/65226 (0.0%)
  • Ubuntu 18.04, Gfx9
    • Passed: 25675/42325 (60.7%)
    • Failed: 1/42325 (0.0%)
      Failures:
      FAILURE: dEQP-VK.robustness.image_robustness.bind.notemplate.rgba32f.unroll.nonvolatile.sampled_image.no_fmt_qual.img.samples_1.2d.vert
      Stack trace: Script:
      Crash
    • Not Supported: 16649/42325 (39.3%)
    • Warnings: 0/42325 (0.0%)
  • Ubuntu 20.04, Gfx8
    • Passed: 37804/65226 (58.0%)
    • Failed: 0/65226 (0.0%)
    • Not Supported: 27422/65226 (42.0%)
    • Warnings: 0/65226 (0.0%)

@perlfu
Contributor Author

perlfu commented May 18, 2022

I think the GFX9 CTS test failure is spurious, as the test involved has no subgroup operations.
Can you re-run the CI again? Thanks!

@JaxLinAMD
Contributor

retest this please

@amdvlk-admin
Collaborator

Test summary for commit 40de282

CTS tests (Failed: 0/186422)
  • Built with version 1.3.0.0
  • Rhel 8.2, Gfx10
    • Passed: 36645/65226 (56.2%)
    • Failed: 0/65226 (0.0%)
    • Not Supported: 28581/65226 (43.8%)
    • Warnings: 0/65226 (0.0%)
  • Ubuntu 18.04, Gfx9
    • Passed: 30002/55970 (53.6%)
    • Failed: 0/55970 (0.0%)
    • Not Supported: 25968/55970 (46.4%)
    • Warnings: 0/55970 (0.0%)
  • Ubuntu 20.04, Gfx8
    • Passed: 37804/65226 (58.0%)
    • Failed: 0/65226 (0.0%)
    • Not Supported: 27422/65226 (42.0%)
    • Warnings: 0/65226 (0.0%)

@amdrexu amdrexu merged commit 65603de into GPUOpen-Drivers:dev May 25, 2022