Refactor divergence mask handling in subgroup tests #1379
Merged
kpet merged 1 commit into KhronosGroup:master from StuartDBrady:refactor-subgroup-divergence-masks on Jan 19, 2022
StuartDBrady requested review from AnastasiaStulova, bashbaug, neildhickey, a user, kpet and jlewis-austin on January 6, 2022 18:19
gwawiork previously approved these changes on Jan 11, 2022
Thanks for these changes. LGTM
This changes compilation of subgroup test kernels so that a separate compilation is no longer performed for each divergence mask value. The divergence mask is now passed as a kernel argument.

This also fixes all subgroup_functions_non_uniform_arithmetic testing and the sub_group_elect and sub_group_any/all_equal subtests of the subgroup_functions_non_uniform_vote test to use the correct order of vector components for GPUs with a subgroup size greater than 64. The conversion of divergence mask bitsets to uint4 vectors has been corrected to match code comments in WorkGroupParams::load_masks() in test_conformance/subgroups/subhelpers.h.

Signed-off-by: Stuart Brady <stuart.brady@arm.com>
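For readers unfamiliar with the conversion mentioned above, here is a minimal sketch of how a 128-bit divergence mask could be split into four 32-bit vector components. It is not the code from this pull request: `Uint4` and `mask_to_uint4` are hypothetical names, `Uint4` stands in for `cl_uint4`, and the component order (bits 0 to 31 in the first component, bits 96 to 127 in the last) is an assumption rather than something confirmed by the description.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for cl_uint4: four 32-bit components in order.
using Uint4 = std::array<std::uint32_t, 4>;

// Split a 128-bit mask into four 32-bit words.
// Assumption: bits 0..31 land in component 0, bits 32..63 in component 1,
// bits 64..95 in component 2 and bits 96..127 in component 3.
Uint4 mask_to_uint4(const std::bitset<128>& mask)
{
    const std::bitset<128> low32(0xffffffffULL);
    Uint4 out{};
    for (std::size_t i = 0; i < out.size(); ++i)
    {
        // Shift the wanted word down to the bottom, then keep only 32 bits
        // so that to_ulong() cannot overflow.
        out[i] = static_cast<std::uint32_t>(
            ((mask >> (32 * i)) & low32).to_ulong());
    }
    return out;
}
```

With a layout like this, a subgroup larger than 64 work-items needs the third and fourth components, which is where a wrong component order would show up. A vector built this way could then be supplied to the kernel as an ordinary argument instead of being baked into the kernel source.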
StuartDBrady force-pushed the refactor-subgroup-divergence-masks branch from 2869bfd to 4e1877d on January 11, 2022 14:51
gwawiork approved these changes on Jan 13, 2022
LGTM after fix as well
yanfeng3721 added a commit to yanfeng3721/OpenCL-CTS that referenced this pull request on Aug 29, 2022
* Use macOS 10 in CI (KhronosGroup#1282) macOS jobs frequently fail. Since macos-11.0 support is considered experimental, move to macos-10, using macos-latest so we automatically move to 11 when stable. See https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners Signed-off-by: Kevin Petit <kevin.petit@arm.com> * Fix double-release of memory objects (KhronosGroup#1277) A recent update to the object wrapper classes (KhronosGroup#1268) changed the behavior of assigning to a wrapper, whereby the wrapped object is now released upon assignment. A couple of tests were manually calling clReleaseMemObject and then assigning `nullptr` to the wrapper, resulting in the wrapper calling clReleaseMemObject on an object that had already been destroyed. * Fix check for image support in test_basic sizeof (KhronosGroup#1269) * add basic test for cl_khr_pci_bus_info (KhronosGroup#1227) * add basic test for cl_khr_pci_bus_info * correctly use TEST_SKIPPED_ITSELF Co-authored-by: Kévin Petit <kpet@free.fr> * fix related usage of TEST_SKIPPED_ITSELF Co-authored-by: Kévin Petit <kpet@free.fr> * Fix double release of object in test_api and test_gl (KhronosGroup#1287) * Fix clang format only * Fix double release of objects * subgroups: Fix setting cl_halfs and progress check. (KhronosGroup#1278) * subgroups: Fix setting cl_halfs and progress check. cl_float testing uses set_value such that a generated cl_ulong of 1 is stored as 1.0F in a logical sense. However, cl_half values aren't intrinsic to C++ and generated cl_ulongs less than 1024 in particular are interpreted bitwise as subnormals. The test fails on compute devices lacking subnormal support. Perform the logical conversion to cl_half. Fix independent forward progress check. 
* subgroups_half: Address review comments * subgroups_half: Formatting fixes required by check-format * subgroups_half: Modified to query and use rounding mode supported by device Co-authored-by: spauls <spauls@qti.qualcomm.com> * Add tests for entrypoint cl_khr_suggested_local_work_size (KhronosGroup#1264) * Add tests for entrypoint cl_khr_suggested_local_work_size Tests added within test_conformance/workgroups. The tests cover several shapes (num dimensions) and sizes of global work size, kernels using local memory (dynamic and static) and present/non-present global work offset. Signed-off-by: Kallia Chronaki <kallia.chronaki@arm.com> * Fix in comparison for error checking Signed-off-by: Kallia Chronaki <kallia.chronaki@arm.com> * 'test_wg_suggested_local_work_size' fixes * Refactoring of 'test_wg_suggested_local_work_size' Modifications to reduce code duplication and minimize build time * remove testing for scalar vloada_half (KhronosGroup#1293) * Temporarily disable the test_kernel_attributes test case (KhronosGroup#1297) * Temporarily disable the test_kernel_attributes test case Per OpenCL spec on CL_KERNEL_ATTRIBUTES, for kernels not created from OpenCL C source and the clCreateProgramWithSource API call the string returned from this query will be empty. But in test_kernel_attributes test, it read from bc binary and expect to get kernel attribute, which is not consistent with OpenCL spec. * Fix clang format issue * Fix double free in c11_atomics tests for SVM allocations (KhronosGroup#1286) * Only Clang format changes * Fix double free object for SVM allocations * Fix double free - review fixes * Fix kernel source for cl_khr_suggested_local_work_size (KhronosGroup#1300) Use ASCII '-' instead of unicode '–' as subtration operator. 
Signed-off-by: Kévin Petit <kpet@free.fr> * Remove unused definitions in CMakeLists.txt (KhronosGroup#1302) Signed-off-by: Kévin Petit <kpet@free.fr> * add tests for cl_khr_integer_dot_product (KhronosGroup#1276) * cl_khr_integer_dot_product_tests * remove emulated codepaths * fix formatting * address code review comments * remove emulated codepaths again * address one more review comment * define NOMINMAX in the CMakefile to fix std::min and std::max on MSVC (KhronosGroup#1308) * Report failures in simple_{read,write}_image_pitch tests (KhronosGroup#1309) * Add cl_khr_integer_dot_product to known extensions in test compiler. (KhronosGroup#1316) * suppress MSVC strdup warning (KhronosGroup#1314) * Add missing include for gRandomSeed (KhronosGroup#1307) * Limit workgroup size for atomics tests (KhronosGroup#1197) * Limit workgroup size for atomics tests This avoids extremely large local buffer size and slow run * Always limit workgroup size * Fix memory model issue in `atomic_flag`. (KhronosGroup#1283) * Fix memory model issue in atomic_flag. In atomic_flag sub-tests that modify local memory, compilers may re-order memory accesses between the local and global address spaces which can lead to incorrect test failures. This commit ensures that both local and global memory operations are fenced to prevent this re-ordering from occurring. Fixes KhronosGroup#134. * Clang format changes. * Added missing global acquire which is necessary for the corresponding global release. Thanks to @jlewis-austin for spotting. * Clang format changes. * Match the condition for applying acquire/release fences. * remove min max macros (KhronosGroup#1310) * remove the MIN and MAX macros and use the std versions instead * fix formatting * fix Arm build * remove additional MIN and MAX macros from compat.h * gles: Fix double frees. (KhronosGroup#1323) * gles: Fix double frees. Remove a few explicit frees in the redirect_buffers test which are already handled by a wrapper. 
* gles: Fix double frees A recent update to the object wrapper classes (KhronosGroup#1268) changed the behavior of assigning to a wrapper, whereby the wrapped object is now released upon assignment. A couple of tests were manually calling clReleaseMemObject and then assigning `nullptr` to the wrapper, resulting in the wrapper calling clReleaseMemObject on an object that had already been destroyed. Co-authored-by: spauls <spauls@qti.qualcomm.com> * api: Enable cl_khr_fp16 when using half types in kernel (KhronosGroup#1327) * Update cl_khr_integer_dot_product tests for v2 (KhronosGroup#1317) * Update cl_khr_integer_dot_product tests for v2 Signed-off-by: Kevin Petit <kevin.petit@arm.com> Signed-off-by: Marco Cattani <marco.cattani@arm.com> Change-Id: I97dbd820f1f32f6b377e47d0bf638f36bb91930a * only query acceleration properties with v2+ Change-Id: I3f13a0cba7f1f686365b10adf81690e089cd3d74 * Report unsupported extended subgroup tests as skipped rather than passed (KhronosGroup#1301) * Report unsupported extended subgroup tests as skipped rather than passed Also don't check the presence of extensions for each sub-test. 
Signed-off-by: Kévin Petit <kpet@free.fr> * address review comments * Extended subgroups - use 128bit masks (KhronosGroup#1215) * Extended subgroups - use 128bit masks * Refactoring to avoid kernels code duplication * unification kernel names as test_ prefix +subgroups function name * use string literals that improve readability * use kernel templates that limit code duplication * WorkGroupParams allows define default kernel - kernel template for multiple functions * WorkGroupParams allows define kernel for specific one subgroup function Co-authored-by: Stuart Brady <stuart.brady@arm.com> * Remove space character from extension name (KhronosGroup#1336) * Add testing of sub_group_broadcast for (u)char and (u)short types (KhronosGroup#1347) Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Remove excessive logging in subgroup tests (KhronosGroup#1343) This also adds some missing data type logging to the subgroup_functions_non_uniform_vote tests. Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Improve error handling in subgroup tests (KhronosGroup#1352) * MPGCOMP-14761 Improve error handling in subgroup tests Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Add missing newline * Clean up logging in cl_khr_subgroup_ballot tests (KhronosGroup#1351) The tests were logging scalar results as vectors padded with zeroes for no apparent benefit. Fix this. Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Fix missing cl_khr_semaphore extensions in compiler tests (KhronosGroup#1357) * Added missing extensions related to cl_khr_semaphore Signed-off-by: Marco Cattani <marco.cattani@arm.com> * Fix stack-use-after-scope crash in conversions (KhronosGroup#1358) The way that program sources were being constructed involved capturing pointers to strings that were allocated on the stack, and then trying to use them outside of that scope. This change uses a stringstream defined in the outer scope to build the program instead. 
* Use maximum subgroup size in sub_group_ballot tests (KhronosGroup#1344) sub_group_ballot_bit_count() and sub_group_ballot_find_msb() mask their input according to a subgroup size, which is assumed to be the maximum subgroup size, and not the actual subgroup size excluding non-existent work-items in the "remainder" subgroup. Fix this as per the the clarification made to the OpenCL C specification in revision 3.0.9 for issue KhronosGroup/OpenCL-Docs#626 by pull request KhronosGroup/OpenCL-Docs#689. Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Fix conversion data loss in test_api min_max_constant_args (KhronosGroup#1355) * Subgroups tests - sub_group_non_uniform_scan_exclusive function fixes (KhronosGroup#1350) * Fix - comparing results will never happen. * No special action needed for one work item in the subgroup * Remove unused inclusion of <cstdio> (KhronosGroup#1362) Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Tidy up code to determine bit mask for ballot scans (KhronosGroup#1363) It seems more intuitive to set only the bits that are required, rather than to set one more bit than is required, only to clear it again. Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Test api min max - fix printing cl_ulong data type (KhronosGroup#1212) * test api - fix code formatting only * Fix printing cl_ulong type to avoid overloading. * Fix printing size_t data type * Fix printing size_t data type - set unsinged * Fix formatting for maxArgs (uint) and numberOfInts (size_t) * Fix build, glext should not be used with GLEW (KhronosGroup#1337) * Fix build, glext should not be used with GLEW * Remove additional define GL_GLEXT_PROTOTYPES * Remove includes which already defined in setup.h * Add cl_khr_command_buffer to list of extensions (KhronosGroup#1365) cl_khr_command_buffer is now public as a provisional khr extension which implementations may report. 
* Refactor logging of subgroup test start/pass messages (KhronosGroup#1361) Note that this also corrects the start messages logged for the sub_group_ballot_bit_count/find_msb/find_lsb tests. Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Remove dead threading code (KhronosGroup#1339) Remove unused code that hasn't been used for the last three years and isn't included in makefiles. Co-authored-by: oramirez <oramirez@qti.qualcomm.com> * test_subgroups - Set safe input values for half type and mul, add operations (KhronosGroup#1346) * Set safe input values for half type and mul, add operations * Set safe values for all data types * Typo fix * Set constant seed for shuffle * Change function name to more specific * set_value takes an integer value, not a bit pattern * Remove invalid negative_get_platform_info testcase (KhronosGroup#1374) * Remove invalid negative_get_platform_info testcase * Implementations are only required to do null checks * Fixes KhronosGroup#1318 * Fix formatting * Fix test_api get_command_queue_info (KhronosGroup#1324) * Fix test_api get_command_queue_info Decouple host and device out-of-order test enabling * Rename property sets more generically * Refactor to use std::vector to accumulate test permutations * Fix memory leaks (KhronosGroup#1378) * Fix memory leaks Fixed memory leaks in: buffers, basic, and vectors * Formatting fixes Co-authored-by: oramirez <oramirez@qti.qualcomm.com> * Refactor divergence mask handling in subgroup tests (KhronosGroup#1379) This changes compilation of subgroup test kernels so that a separate compilation is no longer performed for each divergence mask value. The divergence mask is now passed as a kernel argument. This also fixes all subgroup_functions_non_uniform_arithmetic testing and the sub_group_elect and sub_group_any/all_equal subtests of the subgroup_functions_non_uniform_vote test to use the correct order of vector components for GPUs with a subgroup size greater than 64. 
The conversion of divergence mask bitsets to uint4 vectors has been corrected to match code comments in WorkGroupParams::load_masks() in test_conformance/subgroups/subhelpers.h. Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Improve testing of sub_group_ballot (KhronosGroup#1382) Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Improve testing of kernel arg info in pipe_info test (KhronosGroup#1326) The test now checks that CL_KERNEL_ARG_INFO_NOT_AVAILABLE is returned when calling clGetKernelArgInfo() with offline compilation modes. The correct function name is printed if clGetKernelArgInfo() fails when using online compilation (and not "clSetKernelArgInfo()"). When using online compilation, if the actual arg type is not as expected, the actual arg type is now logged, and the return value is now TEST_FAIL (-1) as per other failures (and not 1). All other test pass/fail values used in the test now use TEST_PASS and TEST_FAIL instead of 0 and -1 literals. An unnecessary cast of pipe_kernel_code has been removed. 
Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Sync submission_details with conformance doc v26 (KhronosGroup#1389) Add "Patches" field Co-authored-by: Kévin Petit <kpet@free.fr> Co-authored-by: James Price <jrprice@google.com> Co-authored-by: BKoscielak <bartosz.koscielak@intel.com> Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com> Co-authored-by: Grzegorz Wawiorko <grzegorz.wawiorko@intel.com> Co-authored-by: Sreelakshmi Haridas Maruthur <sharidas@quicinc.com> Co-authored-by: spauls <spauls@qti.qualcomm.com> Co-authored-by: kalchr01 <83217667+kalchr01@users.noreply.github.com> Co-authored-by: Feng Zou <feng.zou@intel.com> Co-authored-by: Senran (Stephen) Zhang <senran.zhang@intel.com> Co-authored-by: Jeremy Kemp <jeremy@jeremykemp.co.uk> Co-authored-by: Stuart Brady <stuart.brady@arm.com> Co-authored-by: marcat03 <94451804+marcat03@users.noreply.github.com> Co-authored-by: Ewan Crawford <ewan@codeplay.com> Co-authored-by: oramirez <oramirez@qti.qualcomm.com> Co-authored-by: Jim Lewis <j.lewis1@samsung.com>
yanfeng3721 added a commit to yanfeng3721/OpenCL-CTS that referenced this pull request on Oct 18, 2023
* Use macOS 10 in CI (KhronosGroup#1282) macOS jobs frequently fail. Since macos-11.0 support is considered experimental, move to macos-10, using macos-latest so we automatically move to 11 when stable. See https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners Signed-off-by: Kevin Petit <kevin.petit@arm.com> * Fix double-release of memory objects (KhronosGroup#1277) A recent update to the object wrapper classes (KhronosGroup#1268) changed the behavior of assigning to a wrapper, whereby the wrapped object is now released upon assignment. A couple of tests were manually calling clReleaseMemObject and then assigning `nullptr` to the wrapper, resulting in the wrapper calling clReleaseMemObject on an object that had already been destroyed. * Fix check for image support in test_basic sizeof (KhronosGroup#1269) * add basic test for cl_khr_pci_bus_info (KhronosGroup#1227) * add basic test for cl_khr_pci_bus_info * correctly use TEST_SKIPPED_ITSELF Co-authored-by: Kévin Petit <kpet@free.fr> * fix related usage of TEST_SKIPPED_ITSELF Co-authored-by: Kévin Petit <kpet@free.fr> * Fix double release of object in test_api and test_gl (KhronosGroup#1287) * Fix clang format only * Fix double release of objects * subgroups: Fix setting cl_halfs and progress check. (KhronosGroup#1278) * subgroups: Fix setting cl_halfs and progress check. cl_float testing uses set_value such that a generated cl_ulong of 1 is stored as 1.0F in a logical sense. However, cl_half values aren't intrinsic to C++ and generated cl_ulongs less than 1024 in particular are interpreted bitwise as subnormals. The test fails on compute devices lacking subnormal support. Perform the logical conversion to cl_half. Fix independent forward progress check. 
* subgroups_half: Address review comments * subgroups_half: Formatting fixes required by check-format * subgroups_half: Modified to query and use rounding mode supported by device Co-authored-by: spauls <spauls@qti.qualcomm.com> * Add tests for entrypoint cl_khr_suggested_local_work_size (KhronosGroup#1264) * Add tests for entrypoint cl_khr_suggested_local_work_size Tests added within test_conformance/workgroups. The tests cover several shapes (num dimensions) and sizes of global work size, kernels using local memory (dynamic and static) and present/non-present global work offset. Signed-off-by: Kallia Chronaki <kallia.chronaki@arm.com> * Fix in comparison for error checking Signed-off-by: Kallia Chronaki <kallia.chronaki@arm.com> * 'test_wg_suggested_local_work_size' fixes * Refactoring of 'test_wg_suggested_local_work_size' Modifications to reduce code duplication and minimize build time * remove testing for scalar vloada_half (KhronosGroup#1293) * Temporarily disable the test_kernel_attributes test case (KhronosGroup#1297) * Temporarily disable the test_kernel_attributes test case Per OpenCL spec on CL_KERNEL_ATTRIBUTES, for kernels not created from OpenCL C source and the clCreateProgramWithSource API call the string returned from this query will be empty. But in test_kernel_attributes test, it read from bc binary and expect to get kernel attribute, which is not consistent with OpenCL spec. * Fix clang format issue * Fix double free in c11_atomics tests for SVM allocations (KhronosGroup#1286) * Only Clang format changes * Fix double free object for SVM allocations * Fix double free - review fixes * Fix kernel source for cl_khr_suggested_local_work_size (KhronosGroup#1300) Use ASCII '-' instead of unicode '–' as subtration operator. 
Signed-off-by: Kévin Petit <kpet@free.fr> * Remove unused definitions in CMakeLists.txt (KhronosGroup#1302) Signed-off-by: Kévin Petit <kpet@free.fr> * add tests for cl_khr_integer_dot_product (KhronosGroup#1276) * cl_khr_integer_dot_product_tests * remove emulated codepaths * fix formatting * address code review comments * remove emulated codepaths again * address one more review comment * define NOMINMAX in the CMakefile to fix std::min and std::max on MSVC (KhronosGroup#1308) * Report failures in simple_{read,write}_image_pitch tests (KhronosGroup#1309) * Add cl_khr_integer_dot_product to known extensions in test compiler. (KhronosGroup#1316) * suppress MSVC strdup warning (KhronosGroup#1314) * Add missing include for gRandomSeed (KhronosGroup#1307) * Limit workgroup size for atomics tests (KhronosGroup#1197) * Limit workgroup size for atomics tests This avoids extremely large local buffer size and slow run * Always limit workgroup size * Fix memory model issue in `atomic_flag`. (KhronosGroup#1283) * Fix memory model issue in atomic_flag. In atomic_flag sub-tests that modify local memory, compilers may re-order memory accesses between the local and global address spaces which can lead to incorrect test failures. This commit ensures that both local and global memory operations are fenced to prevent this re-ordering from occurring. Fixes KhronosGroup#134. * Clang format changes. * Added missing global acquire which is necessary for the corresponding global release. Thanks to @jlewis-austin for spotting. * Clang format changes. * Match the condition for applying acquire/release fences. * remove min max macros (KhronosGroup#1310) * remove the MIN and MAX macros and use the std versions instead * fix formatting * fix Arm build * remove additional MIN and MAX macros from compat.h * gles: Fix double frees. (KhronosGroup#1323) * gles: Fix double frees. Remove a few explicit frees in the redirect_buffers test which are already handled by a wrapper. 
* gles: Fix double frees A recent update to the object wrapper classes (KhronosGroup#1268) changed the behavior of assigning to a wrapper, whereby the wrapped object is now released upon assignment. A couple of tests were manually calling clReleaseMemObject and then assigning `nullptr` to the wrapper, resulting in the wrapper calling clReleaseMemObject on an object that had already been destroyed. Co-authored-by: spauls <spauls@qti.qualcomm.com> * api: Enable cl_khr_fp16 when using half types in kernel (KhronosGroup#1327) * Update cl_khr_integer_dot_product tests for v2 (KhronosGroup#1317) * Update cl_khr_integer_dot_product tests for v2 Signed-off-by: Kevin Petit <kevin.petit@arm.com> Signed-off-by: Marco Cattani <marco.cattani@arm.com> Change-Id: I97dbd820f1f32f6b377e47d0bf638f36bb91930a * only query acceleration properties with v2+ Change-Id: I3f13a0cba7f1f686365b10adf81690e089cd3d74 * Report unsupported extended subgroup tests as skipped rather than passed (KhronosGroup#1301) * Report unsupported extended subgroup tests as skipped rather than passed Also don't check the presence of extensions for each sub-test. 
Signed-off-by: Kévin Petit <kpet@free.fr> * address review comments * Extended subgroups - use 128bit masks (KhronosGroup#1215) * Extended subgroups - use 128bit masks * Refactoring to avoid kernels code duplication * unification kernel names as test_ prefix +subgroups function name * use string literals that improve readability * use kernel templates that limit code duplication * WorkGroupParams allows define default kernel - kernel template for multiple functions * WorkGroupParams allows define kernel for specific one subgroup function Co-authored-by: Stuart Brady <stuart.brady@arm.com> * Remove space character from extension name (KhronosGroup#1336) * Add testing of sub_group_broadcast for (u)char and (u)short types (KhronosGroup#1347) Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Remove excessive logging in subgroup tests (KhronosGroup#1343) This also adds some missing data type logging to the subgroup_functions_non_uniform_vote tests. Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Improve error handling in subgroup tests (KhronosGroup#1352) * MPGCOMP-14761 Improve error handling in subgroup tests Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Add missing newline * Clean up logging in cl_khr_subgroup_ballot tests (KhronosGroup#1351) The tests were logging scalar results as vectors padded with zeroes for no apparent benefit. Fix this. Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Fix missing cl_khr_semaphore extensions in compiler tests (KhronosGroup#1357) * Added missing extensions related to cl_khr_semaphore Signed-off-by: Marco Cattani <marco.cattani@arm.com> * Fix stack-use-after-scope crash in conversions (KhronosGroup#1358) The way that program sources were being constructed involved capturing pointers to strings that were allocated on the stack, and then trying to use them outside of that scope. This change uses a stringstream defined in the outer scope to build the program instead. 
* Use maximum subgroup size in sub_group_ballot tests (KhronosGroup#1344) sub_group_ballot_bit_count() and sub_group_ballot_find_msb() mask their input according to a subgroup size, which is assumed to be the maximum subgroup size, and not the actual subgroup size excluding non-existent work-items in the "remainder" subgroup. Fix this as per the the clarification made to the OpenCL C specification in revision 3.0.9 for issue KhronosGroup/OpenCL-Docs#626 by pull request KhronosGroup/OpenCL-Docs#689. Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Fix conversion data loss in test_api min_max_constant_args (KhronosGroup#1355) * Subgroups tests - sub_group_non_uniform_scan_exclusive function fixes (KhronosGroup#1350) * Fix - comparing results will never happen. * No special action needed for one work item in the subgroup * Remove unused inclusion of <cstdio> (KhronosGroup#1362) Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Tidy up code to determine bit mask for ballot scans (KhronosGroup#1363) It seems more intuitive to set only the bits that are required, rather than to set one more bit than is required, only to clear it again. Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Test api min max - fix printing cl_ulong data type (KhronosGroup#1212) * test api - fix code formatting only * Fix printing cl_ulong type to avoid overloading. * Fix printing size_t data type * Fix printing size_t data type - set unsinged * Fix formatting for maxArgs (uint) and numberOfInts (size_t) * Fix build, glext should not be used with GLEW (KhronosGroup#1337) * Fix build, glext should not be used with GLEW * Remove additional define GL_GLEXT_PROTOTYPES * Remove includes which already defined in setup.h * Add cl_khr_command_buffer to list of extensions (KhronosGroup#1365) cl_khr_command_buffer is now public as a provisional khr extension which implementations may report. 
* Refactor logging of subgroup test start/pass messages (KhronosGroup#1361) Note that this also corrects the start messages logged for the sub_group_ballot_bit_count/find_msb/find_lsb tests. Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Remove dead threading code (KhronosGroup#1339) Remove unused code that hasn't been used for the last three years and isn't included in makefiles. Co-authored-by: oramirez <oramirez@qti.qualcomm.com> * test_subgroups - Set safe input values for half type and mul, add operations (KhronosGroup#1346) * Set safe input values for half type and mul, add operations * Set safe values for all data types * Typo fix * Set constant seed for shuffle * Change function name to more specific * set_value takes an integer value, not a bit pattern * Remove invalid negative_get_platform_info testcase (KhronosGroup#1374) * Remove invalid negative_get_platform_info testcase * Implementations are only required to do null checks * Fixes KhronosGroup#1318 * Fix formatting * Fix test_api get_command_queue_info (KhronosGroup#1324) * Fix test_api get_command_queue_info Decouple host and device out-of-order test enabling * Rename property sets more generically * Refactor to use std::vector to accumulate test permutations * Fix memory leaks (KhronosGroup#1378) * Fix memory leaks Fixed memory leaks in: buffers, basic, and vectors * Formatting fixes Co-authored-by: oramirez <oramirez@qti.qualcomm.com> * Refactor divergence mask handling in subgroup tests (KhronosGroup#1379) This changes compilation of subgroup test kernels so that a separate compilation is no longer performed for each divergence mask value. The divergence mask is now passed as a kernel argument. This also fixes all subgroup_functions_non_uniform_arithmetic testing and the sub_group_elect and sub_group_any/all_equal subtests of the subgroup_functions_non_uniform_vote test to use the correct order of vector components for GPUs with a subgroup size greater than 64. 
The conversion of divergence mask bitsets to uint4 vectors has been corrected to match code comments in WorkGroupParams::load_masks() in test_conformance/subgroups/subhelpers.h. Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Improve testing of sub_group_ballot (KhronosGroup#1382) Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Improve testing of kernel arg info in pipe_info test (KhronosGroup#1326) The test now checks that CL_KERNEL_ARG_INFO_NOT_AVAILABLE is returned when calling clGetKernelArgInfo() with offline compilation modes. The correct function name is printed if clGetKernelArgInfo() fails when using online compilation (and not "clSetKernelArgInfo()"). When using online compilation, if the actual arg type is not as expected, the actual arg type is now logged, and the return value is now TEST_FAIL (-1) as per other failures (and not 1). All other test pass/fail values used in the test now use TEST_PASS and TEST_FAIL instead of 0 and -1 literals. An unnecessary cast of pipe_kernel_code has been removed. Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Sync submission_details with conformance doc v26 (KhronosGroup#1389) Add "Patches" field * Refactor kernel execution in subgroup tests (KhronosGroup#1391) Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Update format script and drop Travis badge for branch rename (KhronosGroup#1393) `master` is now `main`, so update `check-format.sh` accordingly. Also completely drop the Travis badge as we now use GitHub actions. There is no replacement badge as the current action is pre-submission, not post-submission. * Added simple test for CL_DEVICE_PRINTF_BUFFER_SIZE. (KhronosGroup#1386) * Added simple test for CL_DEVICE_PRINTF_BUFFER_SIZE. * Clang format fix. * Check for non-uniform work-group support (KhronosGroup#1383) Only run sub-group tests with non-uniform work-groups on OpenCL 3.0 and later if it is supported by the device. 
* Fix build error for linux with clang-8 (KhronosGroup#1304) -Wabsolute-value warning reported as error (long double truncated to double) * add a prefix to OpenCL extension names (KhronosGroup#1311) * add a prefix to OpenCL extension names * fix formatting * conversions: Use volatile qualifier to prevent optimizations (KhronosGroup#1399) Use volatile to prevent clang optimizations, fix int2float * Add cluster size handling in subgroup test helpers (KhronosGroup#1394) Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Improve cl_khr_subgroup_shuffle* test coverage (KhronosGroup#1402) Test cases where the index/mask/delta is greater than or equal to the maximum subgroup size. These are cases that return undefined results but are not undefined behavior. The index/mask/delta values now include values less than twice the subgroup size, and 0xffffffff. Testing for sub_group_shuffle_xor() already allowed inputs that were greater or equal to the subgroup size for the last subgroup in a workgroup, but did not properly account for this in the verification function, potentially resulting in out of bounds accesses. Signed-off-by: Stuart Brady <stuart.brady@arm.com> * test_api_min_max.cpp: use size_t for get_global_id() value (KhronosGroup#1410) In some rare cases where get_global_id() is larger than 2G, the 32bit int type would convert the value into a negative integer. * Fix sub_group_ballot_find_msb/lsb tests (KhronosGroup#1411) As per the OpenCL Extension Specification § 38.6 Ballots: If no bits representing predicate values from all work items in the subgroup are set in the bitfield value then the return value is undefined. The case with no bits set is still worth testing, as it does not result in undefined behavior, but only an undefined return value. 
Signed-off-by: Stuart Brady <stuart.brady@arm.com> * refactor work group scan and reduction tests (KhronosGroup#1401) * updated reduce test * switched all reduce tests to new framework * switch over scans to new framework * remove old files * minor fixes * add type name to the kernel name * fix Windows build and warnings * address review comments * Test all cluster sizes for cl_khr_subgroup_clustered_reduce (KhronosGroup#1408) Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Fix incorrect use image channel data type and filtering mode (KhronosGroup#1375) * Fix clang 10 build errors (KhronosGroup#1387) * Fix clang 10 build errors Lossy casts due to inexact float representation of CL_INT_MAX * Fix clang format * Remove implicit-const-int-float-conversion flag * test_basic/enqueue_map: Initialize all the data (KhronosGroup#1417) Signed-off-by: Kevin Petit <kevin.petit@arm.com> Signed-off-by: Kévin Petit <kpet@free.fr> Signed-off-by: Stuart Brady <stuart.brady@arm.com> Signed-off-by: Marco Cattani <marco.cattani@arm.com> Co-authored-by: Kévin Petit <kpet@free.fr> Co-authored-by: James Price <jrprice@google.com> Co-authored-by: BKoscielak <bartosz.koscielak@intel.com> Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com> Co-authored-by: Grzegorz Wawiorko <grzegorz.wawiorko@intel.com> Co-authored-by: Sreelakshmi Haridas Maruthur <sharidas@quicinc.com> Co-authored-by: spauls <spauls@qti.qualcomm.com> Co-authored-by: kalchr01 <83217667+kalchr01@users.noreply.github.com> Co-authored-by: Feng Zou <feng.zou@intel.com> Co-authored-by: Senran (Stephen) Zhang <senran.zhang@intel.com> Co-authored-by: Jeremy Kemp <jeremy@jeremykemp.co.uk> Co-authored-by: Stuart Brady <stuart.brady@arm.com> Co-authored-by: marcat03 <94451804+marcat03@users.noreply.github.com> Co-authored-by: Ewan Crawford <ewan@codeplay.com> Co-authored-by: oramirez <oramirez@qti.qualcomm.com> Co-authored-by: Jim Lewis <j.lewis1@samsung.com> Co-authored-by: Alastair Murray <alastair.murray@codeplay.com> 
Co-authored-by: Jack Frankland <30410009+FranklandJack@users.noreply.github.com> Co-authored-by: Jason Tang <jason.tang@amd.com> Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
yanfeng3721 added a commit to yanfeng3721/OpenCL-CTS that referenced this pull request on Oct 18, 2023
* Use macOS 10 in CI (#1282) macOS jobs frequently fail. Since macos-11.0 support is considered experimental, move to macos-10, using macos-latest so we automatically move to 11 when stable. See https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners Signed-off-by: Kevin Petit <kevin.petit@arm.com> * Fix double-release of memory objects (#1277) A recent update to the object wrapper classes (#1268) changed the behavior of assigning to a wrapper, whereby the wrapped object is now released upon assignment. A couple of tests were manually calling clReleaseMemObject and then assigning `nullptr` to the wrapper, resulting in the wrapper calling clReleaseMemObject on an object that had already been destroyed. * Fix check for image support in test_basic sizeof (#1269) * add basic test for cl_khr_pci_bus_info (#1227) * add basic test for cl_khr_pci_bus_info * correctly use TEST_SKIPPED_ITSELF Co-authored-by: Kévin Petit <kpet@free.fr> * fix related usage of TEST_SKIPPED_ITSELF Co-authored-by: Kévin Petit <kpet@free.fr> * Fix double release of object in test_api and test_gl (#1287) * Fix clang format only * Fix double release of objects * subgroups: Fix setting cl_halfs and progress check. (#1278) * subgroups: Fix setting cl_halfs and progress check. cl_float testing uses set_value such that a generated cl_ulong of 1 is stored as 1.0F in a logical sense. However, cl_half values aren't intrinsic to C++ and generated cl_ulongs less than 1024 in particular are interpreted bitwise as subnormals. The test fails on compute devices lacking subnormal support. Perform the logical conversion to cl_half. Fix independent forward progress check. 
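The subnormal problem described for cl_half above can be illustrated with a small sketch. This is not the CTS code: the helper names `is_half_subnormal` and `small_int_to_half_bits` are hypothetical, and the conversion only handles integers from 0 to 2047, which the 16-bit half format represents exactly.

```cpp
#include <cassert>
#include <cstdint>

// IEEE 754 binary16 layout: 1 sign bit, 5 exponent bits, 10 mantissa bits.
// A value is subnormal when the exponent field is zero and the mantissa is not.
bool is_half_subnormal(uint16_t bits)
{
    const uint16_t exponent = (bits >> 10) & 0x1F;
    const uint16_t mantissa = bits & 0x3FF;
    return exponent == 0 && mantissa != 0;
}

// Hypothetical helper: convert a small integer to half bits by value
// (a "logical" conversion) instead of reusing the integer as a bit pattern.
// Only 0..2047 is supported, so no rounding is ever needed.
uint16_t small_int_to_half_bits(uint32_t n)
{
    assert(n < 2048);
    if (n == 0) return 0;
    uint32_t e = 0;
    while ((n >> (e + 1)) != 0) ++e; // e = floor(log2(n)), at most 10 here
    const uint32_t mantissa = (n - (1u << e)) << (10 - e);
    return static_cast<uint16_t>(((e + 15) << 10) | mantissa);
}
```

Under these assumptions, any integer from 1 to 1023 reused directly as a bit pattern has a zero exponent field and is therefore subnormal, while `small_int_to_half_bits(1)` yields `0x3C00`, the normal encoding of 1.0.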
* subgroups_half: Address review comments * subgroups_half: Formatting fixes required by check-format * subgroups_half: Modified to query and use rounding mode supported by device Co-authored-by: spauls <spauls@qti.qualcomm.com> * Add tests for entrypoint cl_khr_suggested_local_work_size (#1264) * Add tests for entrypoint cl_khr_suggested_local_work_size Tests added within test_conformance/workgroups. The tests cover several shapes (num dimensions) and sizes of global work size, kernels using local memory (dynamic and static) and present/non-present global work offset. Signed-off-by: Kallia Chronaki <kallia.chronaki@arm.com> * Fix in comparison for error checking Signed-off-by: Kallia Chronaki <kallia.chronaki@arm.com> * 'test_wg_suggested_local_work_size' fixes * Refactoring of 'test_wg_suggested_local_work_size' Modifications to reduce code duplication and minimize build time * remove testing for scalar vloada_half (#1293) * Temporarily disable the test_kernel_attributes test case (#1297) * Temporarily disable the test_kernel_attributes test case Per OpenCL spec on CL_KERNEL_ATTRIBUTES, for kernels not created from OpenCL C source and the clCreateProgramWithSource API call the string returned from this query will be empty. But the test_kernel_attributes test reads from a bc binary and expects to get kernel attributes, which is not consistent with the OpenCL spec. * Fix clang format issue * Fix double free in c11_atomics tests for SVM allocations (#1286) * Only Clang format changes * Fix double free object for SVM allocations * Fix double free - review fixes * Fix kernel source for cl_khr_suggested_local_work_size (#1300) Use ASCII '-' instead of Unicode '–' as subtraction operator. 
Signed-off-by: Kévin Petit <kpet@free.fr> * Remove unused definitions in CMakeLists.txt (#1302) Signed-off-by: Kévin Petit <kpet@free.fr> * add tests for cl_khr_integer_dot_product (#1276) * cl_khr_integer_dot_product_tests * remove emulated codepaths * fix formatting * address code review comments * remove emulated codepaths again * address one more review comment * define NOMINMAX in the CMakefile to fix std::min and std::max on MSVC (#1308) * Report failures in simple_{read,write}_image_pitch tests (#1309) * Add cl_khr_integer_dot_product to known extensions in test compiler. (#1316) * suppress MSVC strdup warning (#1314) * Add missing include for gRandomSeed (#1307) * Limit workgroup size for atomics tests (#1197) * Limit workgroup size for atomics tests This avoids extremely large local buffer size and slow run * Always limit workgroup size * Fix memory model issue in `atomic_flag`. (#1283) * Fix memory model issue in atomic_flag. In atomic_flag sub-tests that modify local memory, compilers may re-order memory accesses between the local and global address spaces which can lead to incorrect test failures. This commit ensures that both local and global memory operations are fenced to prevent this re-ordering from occurring. Fixes #134. * Clang format changes. * Added missing global acquire which is necessary for the corresponding global release. Thanks to @jlewis-austin for spotting. * Clang format changes. * Match the condition for applying acquire/release fences. * remove min max macros (#1310) * remove the MIN and MAX macros and use the std versions instead * fix formatting * fix Arm build * remove additional MIN and MAX macros from compat.h * gles: Fix double frees. (#1323) * gles: Fix double frees. Remove a few explicit frees in the redirect_buffers test which are already handled by a wrapper. 
* gles: Fix double frees A recent update to the object wrapper classes (#1268) changed the behavior of assigning to a wrapper, whereby the wrapped object is now released upon assignment. A couple of tests were manually calling clReleaseMemObject and then assigning `nullptr` to the wrapper, resulting in the wrapper calling clReleaseMemObject on an object that had already been destroyed. Co-authored-by: spauls <spauls@qti.qualcomm.com> * api: Enable cl_khr_fp16 when using half types in kernel (#1327) * Update cl_khr_integer_dot_product tests for v2 (#1317) * Update cl_khr_integer_dot_product tests for v2 Signed-off-by: Kevin Petit <kevin.petit@arm.com> Signed-off-by: Marco Cattani <marco.cattani@arm.com> Change-Id: I97dbd820f1f32f6b377e47d0bf638f36bb91930a * only query acceleration properties with v2+ Change-Id: I3f13a0cba7f1f686365b10adf81690e089cd3d74 * Report unsupported extended subgroup tests as skipped rather than passed (#1301) * Report unsupported extended subgroup tests as skipped rather than passed Also don't check the presence of extensions for each sub-test. Signed-off-by: Kévin Petit <kpet@free.fr> * address review comments * Extended subgroups - use 128bit masks (#1215) * Extended subgroups - use 128bit masks * Refactoring to avoid kernels code duplication * unification kernel names as test_ prefix +subgroups function name * use string literals that improve readability * use kernel templates that limit code duplication * WorkGroupParams allows define default kernel - kernel template for multiple functions * WorkGroupParams allows define kernel for specific one subgroup function Co-authored-by: Stuart Brady <stuart.brady@arm.com> * Remove space character from extension name (#1336) * Add testing of sub_group_broadcast for (u)char and (u)short types (#1347) Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Remove excessive logging in subgroup tests (#1343) This also adds some missing data type logging to the subgroup_functions_non_uniform_vote tests. 
Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Improve error handling in subgroup tests (#1352) * MPGCOMP-14761 Improve error handling in subgroup tests Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Add missing newline * Clean up logging in cl_khr_subgroup_ballot tests (#1351) The tests were logging scalar results as vectors padded with zeroes for no apparent benefit. Fix this. Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Fix missing cl_khr_semaphore extensions in compiler tests (#1357) * Added missing extensions related to cl_khr_semaphore Signed-off-by: Marco Cattani <marco.cattani@arm.com> * Fix stack-use-after-scope crash in conversions (#1358) The way that program sources were being constructed involved capturing pointers to strings that were allocated on the stack, and then trying to use them outside of that scope. This change uses a stringstream defined in the outer scope to build the program instead. * Use maximum subgroup size in sub_group_ballot tests (#1344) sub_group_ballot_bit_count() and sub_group_ballot_find_msb() mask their input according to a subgroup size, which is assumed to be the maximum subgroup size, and not the actual subgroup size excluding non-existent work-items in the "remainder" subgroup. Fix this as per the clarification made to the OpenCL C specification in revision 3.0.9 for issue KhronosGroup/OpenCL-Docs#626 by pull request KhronosGroup/OpenCL-Docs#689. Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Fix conversion data loss in test_api min_max_constant_args (#1355) * Subgroups tests - sub_group_non_uniform_scan_exclusive function fixes (#1350) * Fix - comparing results will never happen. 
* No special action needed for one work item in the subgroup * Remove unused inclusion of <cstdio> (#1362) Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Tidy up code to determine bit mask for ballot scans (#1363) It seems more intuitive to set only the bits that are required, rather than to set one more bit than is required, only to clear it again. Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Test api min max - fix printing cl_ulong data type (#1212) * test api - fix code formatting only * Fix printing cl_ulong type to avoid overloading. * Fix printing size_t data type * Fix printing size_t data type - set unsigned * Fix formatting for maxArgs (uint) and numberOfInts (size_t) * Fix build, glext should not be used with GLEW (#1337) * Fix build, glext should not be used with GLEW * Remove additional define GL_GLEXT_PROTOTYPES * Remove includes which already defined in setup.h * Add cl_khr_command_buffer to list of extensions (#1365) cl_khr_command_buffer is now public as a provisional khr extension which implementations may report. * Refactor logging of subgroup test start/pass messages (#1361) Note that this also corrects the start messages logged for the sub_group_ballot_bit_count/find_msb/find_lsb tests. Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Remove dead threading code (#1339) Remove unused code that hasn't been used for the last three years and isn't included in makefiles. 
Co-authored-by: oramirez <oramirez@qti.qualcomm.com> * test_subgroups - Set safe input values for half type and mul, add operations (#1346) * Set safe input values for half type and mul, add operations * Set safe values for all data types * Typo fix * Set constant seed for shuffle * Change function name to more specific * set_value takes an integer value, not a bit pattern * Remove invalid negative_get_platform_info testcase (#1374) * Remove invalid negative_get_platform_info testcase * Implementations are only required to do null checks * Fixes #1318 * Fix formatting * Fix test_api get_command_queue_info (#1324) * Fix test_api get_command_queue_info Decouple host and device out-of-order test enabling * Rename property sets more generically * Refactor to use std::vector to accumulate test permutations * Fix memory leaks (#1378) * Fix memory leaks Fixed memory leaks in: buffers, basic, and vectors * Formatting fixes Co-authored-by: oramirez <oramirez@qti.qualcomm.com> * Refactor divergence mask handling in subgroup tests (#1379) This changes compilation of subgroup test kernels so that a separate compilation is no longer performed for each divergence mask value. The divergence mask is now passed as a kernel argument. This also fixes all subgroup_functions_non_uniform_arithmetic testing and the sub_group_elect and sub_group_any/all_equal subtests of the subgroup_functions_non_uniform_vote test to use the correct order of vector components for GPUs with a subgroup size greater than 64. The conversion of divergence mask bitsets to uint4 vectors has been corrected to match code comments in WorkGroupParams::load_masks() in test_conformance/subgroups/subhelpers.h. 
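As a rough illustration of turning a divergence mask into a value that can be passed as a kernel argument, the sketch below packs a 128-bit `std::bitset` into four 32-bit words. It is not the CTS implementation: the `Uint4` alias stands in for `cl_uint4`, the function name `bitset_to_uint4` is hypothetical, and the component order (bits 0 to 31 in the first component) is an assumption. The authoritative order is whatever the comments in `WorkGroupParams::load_masks()` specify.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>

// Stand-in for cl_uint4 so the sketch does not need the OpenCL headers.
using Uint4 = std::array<uint32_t, 4>;

// Hypothetical helper: pack a 128-bit mask into four 32-bit components.
// Assumed order: bit 0 of the bitset is the least significant bit of
// component 0, bit 32 is the least significant bit of component 1, etc.
Uint4 bitset_to_uint4(const std::bitset<128> &mask)
{
    Uint4 out{};
    for (std::size_t bit = 0; bit < mask.size(); ++bit)
    {
        if (mask.test(bit)) out[bit / 32] |= uint32_t{ 1 } << (bit % 32);
    }
    return out;
}
```

With the mask in this form, a host program could hand it to an already built kernel through `clSetKernelArg` rather than baking it into the kernel source, which is in line with the stated goal of no longer compiling once per mask value.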
Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Improve testing of sub_group_ballot (#1382) Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Improve testing of kernel arg info in pipe_info test (#1326) The test now checks that CL_KERNEL_ARG_INFO_NOT_AVAILABLE is returned when calling clGetKernelArgInfo() with offline compilation modes. The correct function name is printed if clGetKernelArgInfo() fails when using online compilation (and not "clSetKernelArgInfo()"). When using online compilation, if the actual arg type is not as expected, the actual arg type is now logged, and the return value is now TEST_FAIL (-1) as per other failures (and not 1). All other test pass/fail values used in the test now use TEST_PASS and TEST_FAIL instead of 0 and -1 literals. An unnecessary cast of pipe_kernel_code has been removed. Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Sync submission_details with conformance doc v26 (#1389) Add "Patches" field * Refactor kernel execution in subgroup tests (#1391) Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Update format script and drop Travis badge for branch rename (#1393) `master` is now `main`, so update `check-format.sh` accordingly. Also completely drop the Travis badge as we now use GitHub actions. There is no replacement badge as the current action is pre-submission, not post-submission. * Added simple test for CL_DEVICE_PRINTF_BUFFER_SIZE. (#1386) * Added simple test for CL_DEVICE_PRINTF_BUFFER_SIZE. * Clang format fix. * Check for non-uniform work-group support (#1383) Only run sub-group tests with non-uniform work-groups on OpenCL 3.0 and later if it is supported by the device. 
* Fix build error for linux with clang-8 (#1304) -Wabsolute-value warning reported as error (long double truncated to double) * add a prefix to OpenCL extension names (#1311) * add a prefix to OpenCL extension names * fix formatting * conversions: Use volatile qualifier to prevent optimizations (#1399) Use volatile to prevent clang optimizations, fix int2float * Add cluster size handling in subgroup test helpers (#1394) Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Improve cl_khr_subgroup_shuffle* test coverage (#1402) Test cases where the index/mask/delta is greater than or equal to the maximum subgroup size. These are cases that return undefined results but are not undefined behavior. The index/mask/delta values now include values less than twice the subgroup size, and 0xffffffff. Testing for sub_group_shuffle_xor() already allowed inputs that were greater or equal to the subgroup size for the last subgroup in a workgroup, but did not properly account for this in the verification function, potentially resulting in out of bounds accesses. Signed-off-by: Stuart Brady <stuart.brady@arm.com> * test_api_min_max.cpp: use size_t for get_global_id() value (#1410) In some rare cases where get_global_id() is larger than 2G, the 32bit int type would convert the value into a negative integer. * Fix sub_group_ballot_find_msb/lsb tests (#1411) As per the OpenCL Extension Specification § 38.6 Ballots: If no bits representing predicate values from all work items in the subgroup are set in the bitfield value then the return value is undefined. The case with no bits set is still worth testing, as it does not result in undefined behavior, but only an undefined return value. 
Signed-off-by: Stuart Brady <stuart.brady@arm.com> * refactor work group scan and reduction tests (#1401) * updated reduce test * switched all reduce tests to new framework * switch over scans to new framework * remove old files * minor fixes * add type name to the kernel name * fix Windows build and warnings * address review comments * Test all cluster sizes for cl_khr_subgroup_clustered_reduce (#1408) Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Fix incorrect use image channel data type and filtering mode (#1375) * Fix clang 10 build errors (#1387) * Fix clang 10 build errors Lossy casts due to inexact float representation of CL_INT_MAX * Fix clang format * Remove implicit-const-int-float-conversion flag * test_basic/enqueue_map: Initialize all the data (#1417) * imageHelpers: add CL_UNORM_SHORT_{555, 565} in get_max_absolute_error (#1406) * imageHelpers: add CL_UNORM_SHORT_{555, 565} in get_max_absolute_error Working on a device supporting CL_UNORM_SHORT_565 image data type, I noticed that the max absolute error authorized was not the right one for such image data type. Also because of normalization, there is always an absolute error authorized whatever the filtering of the sampler. Ref #1140 * put back if statement on filter_mode * Change memory order and scope for atomics that gate final results being stored. (#1377) * Change memory order and scope for atomics that gate final results being stored. memory_order_acq_rel with memory_scope_device is now used to guarantee that the correct memory consistency is observed before final results are stored. Previously it was possible for kernels to be generated that all used relaxed memory ordering, which could lead to false-positive failures. Fixes #1370 * Disable atomics tests with global, in-program atomics. If the device does not support `memory_order_relaxed` or `memory_scope_device`, disable atomics tests that declare their atomics in-program with global memory. 
There is now an implicit requirement to support `memory_order_relaxed` and `memory_scope_device` for these tests. * Fix misplaced parentheses. * Change memory scope for atomic fetch and load calls in kernel Change the memory scope from memory_scope_work_group to memory_scope_device so the ordering applies across all work items Co-authored-by: Sreelakshmi Haridas <sharidas@quicinc.com> * Update Github Actions CI and add Windows (#1413) - Add one Windows build to Github Actions - Remove Appveyor config - Move a few build steps out of the script - Use Ninja as the generator (makes for more readable logs) - Add build cache (except on Windows where it seems to break) Change-Id: Ida90ee1842af98aff86e5144ab7b9766480378c9 Signed-off-by: Kevin Petit <kevin.petit@arm.com> * api/kernel_arg_info: Check for read_write image support before testing it (#1420) Code taken from api/test_min_image_formats.cpp * images: Stop checking gDeviceType != CL_DEVICE_TYPE_GPU (#1418) * images: Stop checking gDeviceType != CL_DEVICE_TYPE_GPU If the device type also advertises CL_DEVICE_TYPE_DEFAULT (which should be valid), this causes it to be considered a CPU device and the tests enforce different precision and rounding expectations. * Fix clang-format * Drop redundant NORM_OFFSET checks * Enable mipmap extension pragmas (#1349) * Enable mipmap pragmas where appropriate. * clang-format changes. * Add content to README (#1427) Fill in the placeholder readme with some basic information on building and running the project. Information on the conformance submission process and contributing are also included. Should help close a few issues referenced in https://github.com/KhronosGroup/OpenCL-CTS/issues/1096 I don't think this is all the information we want, but is a starting point from which we can progress. 
For example, adding the android build instructions from https://github.com/KhronosGroup/OpenCL-CTS/pull/1021 * Fixes incorrect slice pitch calculation in clCopyImage 1Darray (#1258) The slice pitch/padding calculation assumed that the 'height' variable contained the pixel height of the image, which it doesn't for IMAGE1D_ARRAY. Fixes #1257 * test_compiler_defines_for_extensions: fix overflow (#1430) GCC 11.2.0 warns about a possible string overflow (when num_not_supported_extensions+num_of_supported_extensions == 0) since no space would be allocated for the terminating null byte that string manipulation fns expect to find. This unconditionally adds an extra byte to the allocation to silence the warning and fix building with -Werror. * Fix local memory out of bounds issue in atomic_fence (replaces PR #1285) (#1437) * Fix local memory out of bounds in atomic_fence In the error condition, the atomic_fence kernel can illegally access local memory addresses. In this snippet, localValues is in the local address space and provided as a kernel argument. Its size is effectively get_local_size(0) * sizeof(int). The stores to localValues lead to OoB accesses. size_t myId = get_local_id(0); ... if(hisAtomicValue != hisValue) { // fail atomic_store(&destMemory[myId], myValue-1); hisId = (hisId+get_local_size(0)-1)%get_local_size(0); if(myValue+1 < 1) localValues[myId*1+myValue+1] = hisId; if(myValue+2 < 1) localValues[myId*1+myValue+2] = hisAtomicValue; if(myValue+3 < 1) localValues[myId*1+myValue+3] = hisValue; } * Fix formatting * Fix formatting again * Formatting * Added missing tests for integer_dot_product_input_4x8bit and integer_dot_product_input_4x8bit_packed on feature_macro compiler test. (#1432) * Added integer_dot_product_input_4x8bit and integer_dot_product_input_4x8bit_packed tests to feature_macro_test * clang formatting * Now the test checks whether the array of optional features returned by clGetDeviceInfo contains the standard optional features we are testing. 
* Update test_conformance/compiler/test_feature_macro.cpp Added printing the missing standard feature if it is not found inside the optional features array returned by clGetDeviceInfo. Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com> Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com> * Fix test_half async_work_group_copy arguments (#1298) (#1299) Workitems in the last workgroup call async_work_group_copy with different argument values depending on 'adjust'. According to spec, this results in undefined values. * Initial CTS for external semaphore and memory extensions (#1390) * Initial CTS for external sharing extensions Initial set of tests for below extensions with Vulkan as producer 1. cl_khr_external_memory 2. cl_khr_external_memory_win32 3. cl_khr_external_memory_opaque_fd 4. cl_khr_external_semaphore 5. cl_khr_external_semaphore_win32 6. cl_khr_external_semaphore_opaque_fd * Updates to external sharing CTS Updates to external sharing CTS 1. Fix some build issues to remove unnecessary, non-existent files 2. Add new tests for platform and device queries. 3. Some added checks for VK Support. * Update CTS build script for Vulkan Headers Update CTS build to clone Vulkan Headers repo and pass it to CTS build in preparation for external memory and semaphore tests * Fix Vulkan header path Fix Vulkan header include path. * Add Vulkan loader dependency Vulkan loader is required to build test_vulkan of OpenCL-CTS. Clone and build Vulkan loader as prerequisite to OpenCL-CTS. * Fix Vulkan loader path in test_vulkan Remove arch/os suffix in Vulkan loader path to match vulkan loader repo build. * Fix warnings around getHandle API. Return type of getHandle is defined differently based on win or linux builds. Use appropriate guards when using API at other places. While at it remove duplicate definition of ARRAY_SIZE. * Use ARRAY_SIZE in harness. Use already defined ARRAY_SIZE macro from test_harness. 
* Fix build issues for test_vulkan Fix build issues for test_vulkan 1. Add cl_ext.h in common files 2. Replace cl_mem_properties_khr with cl_mem_properties 3. Replace cl_external_mem_handle_type_khr with cl_external_memory_handle_type_khr 4. Type-cast malloc as required. * Fix code formatting. Fix code formatting to get CTS CI builds clean. * Fix formatting fixes part-2 Another set of formatting fixes. * Fix code formatting part-3 Some more code formatting fixes. * Fix code formatting issues part-4 More code formatting fixes. * Formatting fixes part-5 Some more formatting fixes * Fix formatting part-6 More formatting fixes continued. * Code formatting fixes part-7 Code formatting fixes for image * Code formatting fixes part-8 Fixes for platform and device query tests. * Code formatting fixes part-9 More formatting fixes for vulkan_wrapper * Code formatting fixes part-10 More fixes to wrapper header * Code formatting fixes part-11 Formatting fixes for api_list * Code formatting fixes part-12 Formatting fixes for api_list_map. * Code formatting changes part-13 Code formatting changes for utility. * Code formatting fixes part-15 Formatting fixes for wrapper. * Misc Code formatting fixes Some more misc code formatting fixes. * Fix build breaks due to code formatting Fix build issues that arose from recent code formatting changes. * Fix presubmit script after merge Fix presubmit script after merge conflicts. * Fix Vulkan loader build in presubmit script. Use cmake ninja and appropriate toolchain for Vulkan loader dependency to fix linking issue on arm/aarch64. * Use static array sizes Use static array sizes to fix windows builds. * Some left-out formatting fixes. Fix remaining formatting issues. * Fix harness header path Fix harness header path While at it, remove Misc and test pragma. * Add/Fix license information Add Khronos License info for test_vulkan. Replace Apple license with Khronos as applicable. * Fix headers for Mac OSX builds. 
Use appropriate headers for Mac OSX builds * Fix Mac OSX builds. Use appropriate headers for Mac OSX builds. Also, fix some build issues due to type-casting. * Fix new code formatting issues Fix new code formatting issues with recent MacOS fixes. * Add back missing case statement Add back missing case statement that was accidentally removed. * Disable USE_GAS for Vulkan Loader build. Disable USE_GAS for Vulkan Loader build to fix aarch64 build. * Update Copyright Year. Update Copyright Year to 2022 for external memory sharing tests. * Android specific fixes Android specific fixes to external sharing tests. * Add tests for cl_khr_subgroup_rotate (#1439) Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Fix newline in sample_image_pixel_float_offset log (#1446) * Fix math tests to allow ftz in relaxed mode. (#1371) * Fix math tests to allow ftz in relaxed mode. In recent spec clarification, it is agreed that ftz is a valid optimization in case of cl-fast-math-relaxed and doesn't require cl-denorms-are-zero to be passed explicitly to enforce ftz behavior for implementations that already support this. GitHub Spec Issue OpenCL-Docs#579 GitHub Spec Issue OpenCL-Docs#597 GitHub CTS Issue OpenCL-CTS#1267 * Update cl_khr_extended_async_copies tests to the latest extension version (#1426) * Update cl_khr_extended_async_copies tests to the latest version of the extension Update the 2D and 3D extended async copies tests. Previously they were based on an older provisional version of the extension. Also update the variable names to only use 'stride' to refer to the actual stride values. Previously the tests used 'stride' to refer to the end of one line or plane and the start of the next. This is not the commonly understood meaning. * Address cl_khr_extended_async_copies PR feedback * Remove unnecessary parenthesis in kernel code * Make variables `const` and rearrange so that we can reuse variables, rather than repeating expressions. 
* Add in missing vector size of 3 for 2D tests * Use C++ String literals for kernel code Rather than C strings use C++11 string literals to define the kernel code in the extended async-copy tests. Doing this makes the kernel code more readable. Co-authored-by: Ewan Crawford <ewan@codeplay.com> * Fix function name in error messages (#1450) Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com> * Use clProgramWrapper in math_brute_force (#1451) Simplify code by avoiding manual resource management. This allows removing clReleaseProgram from `MakeKernels` to reduce behavioral differences between `MakeKernels` and `MakeKernel`. Original patch by Marco Antognini. Signed-off-by: Marco Antognini <marco.antognini@arm.com> Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com> * Share BuildKernelInfo struct definition (#1453) Move the main `BuildKernelInfo` definition into `common.h` to reduce code duplication. Some tests (e.g. `i_unary_double.cpp`) use a different struct; rename those structs to `BuildKernelInfo2` for now to avoid ambiguity. Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com> * Tidy up subgroup log messages (#1454) Add missing newlines and improve wording of messages. Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Add missing external memory/sync extensions to list of known khr extensions (#1455) Signed-off-by: Kévin Petit <kpet@free.fr> * Fix misleading indentation and enable -Wmisleading-indentation (#1458) Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Fix indentation of test_waitlists.cpp (#1459) * fix indentation of test_waitlists.cpp Followup of #1458 * run formatter * Tidy up BuildKernelInfo (#1461) Remove the `offset` field from both structures, because it was always set to the global `gMinVectorSizeIndex`. Improve documentation and rename some variables: - `i` becomes `vectorSize`; - `kernel_count` becomes `threadCount`. Original patch by Marco Antognini. 
Signed-off-by: Marco Antognini <marco.antognini@arm.com> Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com> * Remove unused variables in subgroup tests (#1460) Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Fix test_select verification failure reporting (#1462) When verification of the computed result fails, the test would still report as "passed". This is because `s_test_fail` is only written to and never read. Fix the immediate issue by returning a failure value and incrementing `gFailCount` if any error was detected. The error handling can be improved further, but I'm leaving that out of the scope of this fix. Fixes https://github.com/KhronosGroup/OpenCL-CTS/issues/1445 Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com> * [NFC] Fix missing `double_double.lo` initializer (#1466) Fixes a missing-field-initializers warning. The original intent was most likely to initialize both fields (similar to other functions in this file), but a `,` was missed. Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com> * [NFC] Use Unix-style line endings (#1468) Use the same line ending style across all source files. Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com> * [NFC] Fix sign-compare warnings in math_brute_force (#1467) Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com> * Use clCommandQueueWrapper in math_brute_force (#1463) Simplify code by avoiding manual resource management. This commit only modifies tests that use one queue per thread. The other unmodified tests are single-threaded and use the global `gQueue`. Original patch by Marco Antognini. Signed-off-by: Marco Antognini <marco.antognini@arm.com> Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com> * Fix test skipping in math_brute_force (#1475) Commit 9666ca3c ("[NFC] Fix sign-compare warnings in math_brute_force (#1467)", 2022-08-23) inadvertently changed the semantics of the if condition. 
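A minimal sketch of how a cast can flip a condition of this kind, written under stated assumptions rather than taken from math_brute_force: the loop index is assumed to be an unsigned `std::size_t`, the end bound is assumed to be a signed value that may hold -1, and both function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a check where the signed bound ends up converted
// to an unsigned type: -1 becomes the largest representable value, so no
// index is ever considered past the end.
bool past_end_unsigned(std::size_t i, int32_t end)
{
    return i > static_cast<std::size_t>(end);
}

// Hypothetical stand-in for the same check after the index is cast to
// int32_t: both operands are signed, so every non-negative index compares
// greater than -1.
bool past_end_signed(std::size_t i, int32_t end)
{
    return static_cast<int32_t>(i) > end;
}
```

With an end bound of -1, `past_end_unsigned(5, -1)` is false while `past_end_signed(5, -1)` is true, so a loop that skips on this condition would start skipping every index. For a non-negative bound the two functions agree, which is why a change like this can look behavior-preserving at first glance.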
The `i > gEndTestNumber` comparison was relying on `gEndTestNumber` being promoted to unsigned. When casting `i` to `int32_t`, this promotion no longer happens and as a result any tests given on the command line were being skipped. Use an unsigned type for `gStartTestNumber` and `gEndTestNumber` to eliminate the casts and any implicit conversions between signed and unsigned types. Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com> * support format CL_ABGR (#1474) * support format CL_ABGR add code to handle format CL_ABGR * Update imageHelpers.h * fix format * Initial command-buffer extension tests (#1368) * Initial command-buffer tests Introduce some basic testing of the [cl_khr_command_buffer](https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer) extension. This is intended as a starting point from which we can iteratively build up tests for the extension collaboratively. * Move tests into derived classes * Move tests from methods into derived classes implementing a `Run()` interface. * Fix memory leak when command_buffer isn't freed when a test is skipped. * Print correct error code for `CL_DEVICE_COMMAND_BUFFER_CAPABILITIES_KHR` * Pass `nullptr` for queue parameter to command recording entry-points * Define command-buffer type wrapper Other OpenCL object have a wrapper to reference count their use and free the wrapped object. The command-buffer object can't use the generic type wrappers which are templated on the appropriate release/retain function, as the release/retain functions are queried at runtime. Instead, define our own command-buffer wrapper class where a base object is passed on construction which contains function pointers to the release/retain functions that can be used in the wrapper. * Use create_single_kernel_helper_create_program Use `create_single_kernel_helper_create_program` rather than hardcoding `clCreateProgramWithSource` to allow for other types of program input. 
Also fix bug using wrong enum for passing properties on command-buffer creation, should be `CL_COMMAND_BUFFER_FLAGS_KHR` * Add out-of-order command-buffer test Introduce a basic test for checking sync-point use with out-of-order command-buffers. This also includes better checking of required queue properties. * Use clMemWrapper in math_brute_force (#1476) Simplify code by avoiding manual resource management. Original patch by Marco Antognini. Signed-off-by: Marco Antognini <marco.antognini@arm.com> Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com> Signed-off-by: Marco Antognini <marco.antognini@arm.com> Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com> * fix test kernel attributes when api fcts are failing (#1449) test_error returns the err given as the first argument. As the run_test function returns a bool, we end up returning true (meaning pass) when an api function fails. Instead return explicitly false (meaning fail). * Use size_t instead of cl_int (#1414) * Use size_t instead of cl_int Memory is allocated for cl_int, but mapped as size_t. Use size_t instead of cl_int during allocation and mapping for consistency. * Use size_t instead of cl_int Memory is allocated for cl_int, but mapped as size_t. Use size_t instead of cl_int during allocation and mapping for consistency. * Use size_t instead of cl_int Memory is allocated for cl_int, but mapped as size_t. Use size_t instead of cl_int during allocation and mapping for consistency. * Remove test_half changes. Remove test_half changes from other fix that got included in this commit. * Final formatting fix. 
* Update known extensions in compiler define test (#1480) Add [cl_khr_command_buffer_mutable_dispatch](https://github.com/KhronosGroup/OpenCL-Docs/pull/819), [cl_khr_subgroup_rotate](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_subgroup_rotate), and [cl_khr_extended_async_copies](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_extended_async_copies) to the list of known extensions used in `test_compiler_defines_for_extensions` * Minimum 2 non atomic variables per thread for the c11 atomic fence test for embedded profile devices. (#1452) * Minimum 2 Non atomic variables per thread for an embedded profile device - https://github.com/KhronosGroup/OpenCL-CTS/issues/1274 * Formatting * [NFC] Fix whitespace issues in run_conformance.py (#1491) Fix whitespace issues and remove superfluous parens in the run_conformance.py script. This addresses 288 out of the 415 issues reported by pylint. Signed-off-by: Stuart Brady <stuart.brady@arm.com> * [NFCI] Remove unused variables and enable -Wunused-variable (#1483) Remove unused variables throughout the code base and enable the `-Wunused-variable` warning flag globally to prevent new unused variable issues being introduced in the future. This is mostly a non-functional change, with one exception: - In `test_conformance/api/test_kernel_arg_info.cpp`, an error check of the clGetDeviceInfo return value was added. Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com> * [NFC] Fix unused variable warning in Release builds (#1494) The condition inside the assert is dropped in Release builds, so `num_printed` becomes unused. Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com> * Minor cleanups for run_conformance.py (#1492) Use the print function from futures for Python 3 compatibility, remove an unreachable statement, remove unused imports, and add a missing sys.exit call when opening the log file fails. 
Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Use correct size for memory allocation in SVM test (#1496) Memory is allocated for cl_int, but mapped as size_t. Use size_t instead of cl_int during allocation and mapping for consistency. * [NFC] Reformat code in events test (#1497) Signed-off-by: Stuart Brady <stuart.brady@arm.com> * [NFC] Declare format tables as const (#1493) Without const, these variables would be flagged up by `-Wunused-variable`. Drop `struct` from the declarations as that is not needed in C++. Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com> * [NFC] Fix typo (enevt_type -> event_type) (#1498) Signed-off-by: Stuart Brady <stuart.brady@arm.com> * gles: Limit variable definition to the same scope as usage (#1495) Fix unused-variable errors by limiting variable definition to the case that would use it * Tests for cl-ext-image-from-buffer and cl-ext-image-requirements-info (#1438) * Add CTS tests for cl_ext_image_requirements_info Change-Id: I20c1c77ff5ba88eb475801bafba30ef9caf82601 * Add CTS tests for cl_ext_image_from_buffer Change-Id: Ic30429d77a1317d0fea7d9ecc6d603267fa6602f * Fixes for image_from_buffer and image_requirements extension * Use CL_MEM_READ_WRITE flag when creating images that support CL_MEM_KERNEL_READ_AND_WRITE (#1447) * format fixes Change-Id: I04d69720730440cb61e64fed2cb5065b2ff8bf90 Co-authored-by: Oualid Khelifi <oualid.khelifi@arm.com> Co-authored-by: oramirez <oramirez@qti.qualcomm.com> Co-authored-by: Sreelakshmi Haridas Maruthur <sharidas@quicinc.com> * Include release builds in GitHub Actions (#1486) The "Ninja" CMake generator does not support multiple configurations, i.e. it does not support use of the '--config' option when running 'cmake --build'. As such, the default configuration (i.e. Debug) was getting used for all builds. 
Use the CMAKE_BUILD_TYPE variable instead, so that we do release builds, but change one build (ubuntu-20.04 aarch64) to use Debug as its build type, to keep some build coverage for asserts, etc. For Vulkan-Loader and OpenCL-ICD-Loader, we do release builds unconditionally, as we assume there is no need in the CI workflow to actually run the binaries that are built, and therefore no need for any additional debug info. Signed-off-by: Stuart Brady <stuart.brady@arm.com> * Fix more warnings in math_brute_force (#1502) * Fix "‘nadj’ may be used uninitialized in this function [-Werror=maybe-uninitialized]". * Fix "specified bound 4096 equals destination size [-Werror=stringop-truncation]". Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com> * Improve MTdataHolder design and use it in math_brute_force (#1490) Improve the design of the MTdataHolder wrapper: * Make it a class instead of a struct with a private member, to make it clearer that there is no direct access to the MTdata member. * Make the 1-arg constructor `explicit` to avoid unintended conversions. * Forbid copy construction/assignment as MTdataHolder is never initialised from an MTdataHolder object in the codebase. * Define move construction/assignment as per the "rule of five". Use the MTdataHolder class throughout math_brute_force, to simplify code by avoiding manual resource management. Original patch by Marco Antognini. Signed-off-by: Marco Antognini <marco.antognini@arm.com> Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com> Signed-off-by: Marco Antognini <marco.antognini@arm.com> Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com> * Fix memory oob problem in test half (#1489) Allocate memory for argc arguments instead of argc - 1. * [NFC] Enable -Wall for math_brute_force (#1477) math_brute_force compiles cleanly with `-Wall` currently, so avoid regressing from that state. 
Ideally we would enable `-Wall` in the top-level CMakeLists.txt, but other tests do not compile cleanly with `-Wall` yet. Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
* Update extension list of test_compiler (#1507)
* Update extension list of test_compiler: Update extension list of test_compiler with missing external memory and semaphore extensions
* cmake: Add set_gnulike_module_compile_flags (#1510) Factor out a macro to set module-specific compilation flags for GNU-like compilers. This simplifies setting compilation flags per test. Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
* Remove __DATE__ and __TIME__ usage (#1506) These macros make the build non-deterministic.
* [NFC] Fix typo in clang-format directive (#1512) Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
* Creating common functions for image/kernel_read_write read tests (#1141)
* Make InitFloatCoords suitable for all image types. Contributes #616
* Create common functions neutral for image types: Remove 3D specific code from common test_read_image so using it for other image types is simpler in following patches. Contributes #616
* Removing unused code: Tidying commented out or unnecessary code. Contributes #616 Signed-off-by: Ellen Norris-Thompson <ellen.norris-thompson@arm.com>
* Restoring 'lod' variable name. Contributes #616
* Default cases to handle unsupported image types. Contributes #616
* Resolving build issues. Contributes #616
* Fix formatting. Contributes #616
* Using TEST_FAIL as an error code. Contributes #616
* Add static keyword, improve error handling. Contributes #616
* Fix build errors with least disruption. Contributes #616 Signed-off-by: Ellen Norris-Thompson <ellen.norris-thompson@arm.com>
* SVM: Fix memory allocation size.
(#1514) * SVM: Fix memory allocation size. 9ad48998 generally made memory allocation and mapping consistent with a size of size_t. Apply that fix to the final two allocations. * check-format fixes Co-authored-by: spauls <spauls@qti.qualcomm.com> * [NFC] Avoid mixing signed and unsigned in subhelpers run (#1505) Fix a `-Wsign-compare` warning in the `run()` function, which resulted in many repeated warnings when compiling with `-Wall` due to the many template instantiations. Both `clGetKernelSubGroupInfo` queries return a `size_t`, so it is unclear why the results of these queries were being cast to `int`. The `dynsc` uses don't seem to work with negative values, so make the field unsigned. Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com> Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com> * [NFC] clang-format test_atomics (#1516) Add some clang-format off/on comments to keep lists and kernel code readable. Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com> Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com> * [NFC] atomics: Remove set-but-unused "succeed" variables (#1517) The "succeed" variables are never read and they don't seem to serve any purpose that's not already provided by the "fail" variables. In `add_index_bin_test` the "fail" variable is also set but unused, but that may require an actual fix, so leaving that out of this commit. Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com> Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com> * math_brute_force: Fix -Wformat warnings (#1518) * math_brute_force: Fix -Wformat warnings The main sources of warnings were: * Printing of 64-bit types, which is now done using the `PRI*64` macros from <cinttypes> to ensure portability across 32 and 64-bit builds. * Printing of `size_t` types that lacked a `z` length modifier. * Printing of values with a `z` length modifier that weren't a `size_t` type. 
Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
* [NFC] math_brute_force: clang-format after -Wformat changes Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
* Add Python 3 support to run_conformance.py (#1470)
* Add missing type declaration (#1520) Add a missing type declaration to OpenCL C code strings in 2D async copy tests.
* pipes: Fix typos in skip messages (#1523) Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
* atomics: Fix -Wformat warnings (#1519) The main sources of warnings were:
* Printing of `i` which is a `size_t` requiring the `%zu` specifier.
* Printing of `cl_long` which is now done using the `PRId64` macro to ensure portability across 32 and 64-bit builds. Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
* External sharing new updates (#1482)
* Fix enqueue_flags test to use correct barrier type. Currently, enqueue_flags test uses CLK_LOCAL_MEM_FENCE. Use CLK_GLOBAL_MEM_FENCE instead as all threads across work-groups need to wait here.
* Add check for support for Read-Write images: Read-Write images have required OpenCL 2.x. Read-Write image tests are already being skipped for 1.x devices. With OpenCL 3.0, read-write images being optional, the tests should be run or skipped depending on the implementation support. Add a check to decide if Read-Write images are supported or required to be supported depending on OpenCL version and decide if the tests should be run or skipped. Fixes issue #894
* Fix formatting in case of Read-Write image checks. Fix formatting in case of Read-Write image checks. Also, combine two ifs into one in case of kernel_read_write tests
* Fix some more formatting for RW-image checks: Remove unnecessary spaces at various places. Also, fix lengthy lines.
* Fix malloc-size calculation in test imagedim unsigned char size is silently assumed to be 1 in imagedim test of test_basic. Pass sizeof(type) in malloc size calculation. Also, change loop variable from signed to unsigned. Add checks for null pointer for malloced memory. * Initial CTS for external sharing extensions Initial set of tests for below extensions with Vulkan as producer 1. cl_khr_external_memory 2. cl_khr_external_memory_win32 3. cl_khr_external_memory_opaque_fd 4. cl_khr_external_semaphore 5. cl_khr_external_semaphore_win32 6. cl_khr_external_semaphore_opaque_fd * Updates to external sharing CTS Updates to external sharing CTS 1. Fix some build issues to remove unnecessary, non-existent files 2. Add new tests for platform and device queries. 3. Some added checks for VK Support. * Update CTS build script for Vulkan Headers Update CTS build to clone Vulkan Headers repo and pass it to CTS build in preparation for external memory and semaphore tests * Fix Vulkan header path Fix Vulkan header include path. * Add Vulkan loader dependency Vulkan loader is required to build test_vulkan of OpenCL-CTS. Clone and build Vulkan loader as prerequisite to OpenCL-CTS. * Fix Vulkan loader path in test_vulkan Remove arch/os suffix in Vulkan loader path to match vulkan loader repo build. * Fix warnings around getHandle API. Return type of getHandle is defined differently based on win or linux builds. Use appropriate guards when using API at other places. While at it remove duplicate definition of ARRAY_SIZE. * Use ARRAY_SIZE in harness. Use already defined ARRAY_SIZE macro from test_harness. * Fix build issues for test_vulkan Fix build issues for test_vulkan 1. Add cl_ext.h in common files 2. Replace cl_mem_properties_khr with cl_mem_properties 3. Replace cl_external_mem_handle_type_khr with cl_external_memory_handle_type_khr 4. Type-cast malloc as required. * Fix code formatting. Fix code formatting to get CTS CI builds clean. 
* Fix formatting fixes part-2: Another set of formatting fixes.
* Fix code formatting part-3: Some more code formatting fixes.
* Fix code formatting issues part-4: More code formatting fixes.
* Formatting fixes part-5: Some more formatting fixes
* Fix formatting part-6: More formatting fixes continued.
* Code formatting fixes part-7: Code formatting fixes for image
* Code formatting fixes part-8: Fixes for platform and device query tests.
* Code formatting fixes part-9: More formatting fixes for vulkan_wrapper
* Code formatting fixes part-10: More fixes to wrapper header
* Code formatting fixes part-11: Formatting fixes for api_list
* Code formatting fixes part-12: Formatting fixes for api_list_map.
* Code formatting changes part-13: Code formatting changes for utility.
* Code formatting fixes part-15: Formatting fixes for wrapper.
* Misc Code formatting fixes: Some more misc code formatting fixes.
* Fix build breaks due to code formatting: Fix build issues that arose from the recent code formatting changes.
* Fix presubmit script after merge: Fix presubmit script after merge conflicts.
* Fix Vulkan loader build in presubmit script. Use cmake ninja and appropriate toolchain for Vulkan loader dependency to fix linking issue on arm/aarch64.
* Use static array sizes: Use static array sizes to fix windows builds.
* Some left-out formatting fixes. Fix remaining formatting issues.
* Fix harness header path: Fix harness header path. While at it, remove Misc and test pragma.
* Add/Fix license information: Add Khronos License info for test_vulkan. Replace Apple license with Khronos as applicable.
* Fix headers for Mac OSX builds. Use appropriate headers for Mac OSX builds
* Fix Mac OSX builds. Use appropriate headers for Mac OSX builds. Also, fix some build issues due to type-casting.
* Fix new code formatting issues: Fix new code formatting issues with recent MacOS fixes.
* Add back missing case statement: Add back missing case statement that was accidentally removed.
* Disable USE_GAS for Vulkan Loader build. Disable USE_GAS for Vulkan Loader build to fix aarch64 build.
* Fixes to OpenCL external sharing tests: Fix clReleaseSemaphore() API. Fix copyright year. Some other minor fixes.
* Improvements to OpenCL external sharing CTS: Use SPIR-V shaders instead of NV extension path from GLSL to Vulkan shaders. Fixes for lower end GPUs to use limited memory. Update copyright year at some more places.
* Fix new code formatting issues. Fix code formatting issues with recent changes for external sharing tests.
* More formatting fixes. More formatting fixes for recent updates to external sharing tests.
* Final code formatting fixes. Minor formatting fixes to get format checks clean.
* remove implicit conversion to pointer to fix 32-bit compile (#1488)
* remove implicit conversion to pointer to fix 32-bit compile
* fix formatting
* Cap CL_DEVICE_MAX_MEM_ALLOC_SIZE to SIZE_MAX (#1501)
* Fix enqueue_flags test to use correct barrier type. Currently, enqueue_flags test uses CLK_LOCAL_MEM_FENCE. Use CLK_GLOBAL_MEM_FENCE instead as all threads across work-groups need to wait here.
* Add check for support for Read-Write images: Read-Write images have required OpenCL 2.x. Read-Write image tests are already being skipped for 1.x devices. With OpenCL 3.0, read-write images being optional, the tests should be run or skipped depending on the implementation support. Add a check to decide if Read-Write images are supported or required to be supported depending on OpenCL version and decide if the tests should be run or skipped. Fixes issue #894
* Fix formatting in case of Read-Write image checks. Fix formatting in case of Read-Write image checks. Also, combine two ifs into one in case of kernel_read_write tests
* Fix some more formatting for RW-image checks: Remove unnecessary spaces at various places. Also, fix lengthy lines.
* Fix malloc-size calculation in test imagedim: unsigned char size is silently assumed to be 1 in imagedim test of test_basic.
Pass sizeof(type) in malloc size calculation. Also, change loop variable from signed to unsigned. Add checks for null pointer for malloced memory. * Cap CL_DEVICE_MAX_MEM_ALLOC_SIZE to SIZE_MAX Cap CL_DEVICE_MAX_MEM_ALLOC_SIZE to SIZE_MAX when CL_DEVICE_GLOBAL_MEM_SIZE is capped with SIZE_MAX. test_allocation caps the value of GLOBAL_MEM_SIZE to SIZE_MAX if it exceeds the value of SIZE_MAX(value depends on platform bitness), but doesn’t modify MAX_ALLOC_SIZE the same way. Due to this MAX_ALLOC_SIZE becomes greater than GLOBAL_MEM_SIZE and the test fails. Modify MAX_MEM_ALLOC_SIZE as GLOBAL_MEM_SIZE when it exceeds SIZE_MAX OpenCL-CTS #1022 * Factor out GetTernaryKernel (#1511) Use a common function to create the kernel source code for testing 3-argument math builtins. This reduces code duplication. 1-argument and 2-argument math kernel construction will be factored out in future work. Change the kernels to use preprocessor defines for argument types and undef values, to make the CTS code easier to read. 
Co-authored-by: Marco Antognini <marco.antognini@arm.com>
Signed-off-by: Marco Antognini <marco.antognini@arm.com>
Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
Signed-off-by: Kevin Petit <kevin.petit@arm.com>
Signed-off-by: Kévin Petit <kpet@free.fr>
Signed-off-by: Stuart Brady <stuart.brady@arm.com>
Signed-off-by: Marco Cattani <marco.cattani@arm.com>
Signed-off-by: Ellen Norris-Thompson <ellen.norris-thompson@arm.com>
Co-authored-by: Kévin Petit <kpet@free.fr>
Co-authored-by: James Price <jrprice@google.com>
Co-authored-by: BKoscielak <bartosz.koscielak@intel.com>
Co-authored-by: Ben Ashbaugh <ben.ashbaugh@intel.com>
Co-authored-by: Grzegorz Wawiorko <grzegorz.wawiorko@intel.com>
Co-authored-by: Sreelakshmi Haridas Maruthur <sharidas@quicinc.com>
Co-authored-by: spauls <spauls@qti.qualcomm.com>
Co-authored-by: kalchr01 <83217667+kalchr01@users.noreply.github.com>
Co-authored-by: Feng Zou <feng.zou@intel.com>
Co-authored-by: Senran (Stephen) Zhang <senran.zhang@intel.com>
Co-authored-by: Jeremy Kemp <jeremy@jeremykemp.co.uk>
Co-authored-by: Stuart Brady <stuart.brady@arm.com>
Co-authored-by: marcat03 <94451804+marcat03@users.noreply.github.com>
Co-authored-by: Ewan Crawford <ewan@codeplay.com>
Co-authored-by: oramirez <oramirez@qti.qualcomm.com>
Co-authored-by: Jim Lewis <j.lewis1@samsung.com>
Co-authored-by: Alastair Murray <alastair.murray@codeplay.com>
Co-authored-by: Jack Frankland <30410009+FranklandJack@users.noreply.github.com>
Co-authored-by: Jason Tang <jason.tang@amd.com>
Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Co-authored-by: Romaric Jodin <89833130+rjodinchr@users.noreply.github.com>
Co-authored-by: Karol Herbst <karolherbst@gmail.com>
Co-authored-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Co-authored-by: paulfradgley <39525348+paulfradgley@users.noreply.github.com>
Co-authored-by: jansol <jhs@psonet.com>
Co-authored-by: Ahmed <36049290+AhmedAmraniAkdi@users.noreply.github.com>
Co-authored-by: Wenju He <wenju.he@intel.com>
Co-authored-by: Nikhil Joshi <nikhilj@nvidia.com>
Co-authored-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
Co-authored-by: Callum Fare <callum@codeplay.com>
Co-authored-by: Matthias Diener <matthias.diener@gmail.com>
Co-authored-by: stoneforestwhu <stoneforestwhu@gmail.com>
Co-authored-by: niranjanjoshi121 <43807392+niranjanjoshi121@users.noreply.github.com>
Co-authored-by: Kévin Petit <kevin.petit@arm.com>
Co-authored-by: Oualid Khelifi <oualid.khelifi@arm.com>
Co-authored-by: Krzysztof Kosiński <tweenk.pl@gmail.com>
Co-authored-by: ellnor01 <51320439+ellnor01@users.noreply.github.com>
Co-authored-by: victzhan <111778801+victzhan@users.noreply.github.com>
This pull request was closed.
This changes compilation of subgroup test kernels so that a separate compilation is no longer performed for each divergence mask value.
The divergence mask is now passed as a kernel argument.
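To illustrate the idea of supplying the mask at run time, here is a small host-side C++ model; it is not the CTS kernel code. It assumes the mask arrives as four 32-bit words (standing in for a `uint4` kernel argument) with word 0 covering lanes 0 to 31. The names `Mask128`, `lane_is_active` and `count_active` are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for a uint4 kernel argument: four 32-bit words
// covering 128 sub-group lanes. Assumption: word 0 holds lanes 0..31.
using Mask128 = std::array<std::uint32_t, 4>;

// Models the per-work-item check a kernel could perform at run time.
// `lane` plays the role of the sub-group local id.
bool lane_is_active(const Mask128& mask, unsigned lane)
{
    if (lane >= 128) return false; // outside the 128 lanes the mask covers
    return ((mask[lane / 32] >> (lane % 32)) & 1u) != 0;
}

// Counts active lanes for one mask value. The mask is plain data, so the
// same compiled function handles every mask.
unsigned count_active(const Mask128& mask, unsigned subgroup_size)
{
    unsigned active = 0;
    for (unsigned lane = 0; lane < subgroup_size; ++lane)
    {
        if (lane_is_active(mask, lane)) ++active;
    }
    return active;
}
```

Because the mask is ordinary data here, one compiled function serves every mask value, which is the property that makes a per-mask compilation unnecessary.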
This also fixes all subgroup_functions_non_uniform_arithmetic testing and the sub_group_elect and sub_group_any/all_equal subtests of the subgroup_functions_non_uniform_vote test to use the correct order of vector components for GPUs with a subgroup size greater than 64.
The conversion of divergence mask bitsets to uint4 vectors has been corrected to match code comments in WorkGroupParams::load_masks() in test_conformance/subgroups/subhelpers.h.
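A minimal sketch of such a conversion in C++, assuming component 0 of the vector takes bits 0 to 31 of the bitset and component 3 takes bits 96 to 127. The helper name `mask_to_words` is hypothetical and a `std::array` stands in for `cl_uint4`; the authoritative ordering is whatever the comments in `load_masks()` specify.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not the CTS implementation.
// Assumption: words[0] receives bits 0..31, words[3] receives bits 96..127.
std::array<std::uint32_t, 4> mask_to_words(const std::bitset<128>& mask)
{
    const std::bitset<128> low32(0xffffffffULL);
    std::array<std::uint32_t, 4> words{};
    for (std::size_t i = 0; i < words.size(); ++i)
    {
        // Shift the wanted 32-bit slice down to bit 0, then clear the
        // higher bits so that to_ulong() cannot overflow.
        words[i] = static_cast<std::uint32_t>(
            ((mask >> (32 * i)) & low32).to_ulong());
    }
    return words;
}
```

With this ordering, bit 70 of the bitset lands in bit 6 of the third word, so a sub-group wider than 64 lanes reads its upper lanes from the third and fourth components.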
Signed-off-by: Stuart Brady <stuart.brady@arm.com>