Enable previously disabled FA related Operators in UTs #1389

xinyazhang · 2024-04-08T19:16:14Z

These operators were disabled under AOTriton V1; V2 should fix most of them.

Passed with:

PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TESTING_DEVICE_ONLY_FOR="cuda" python test/test_meta.py -k flash_attention -v
PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TESTING_DEVICE_ONLY_FOR="cuda" python test/test_ops.py -k flash_attention -v
PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TESTING_DEVICE_ONLY_FOR="cuda" python test/test_meta.py -k functional_scaled_dot_product_attention_cuda -v
PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TESTING_DEVICE_ONLY_FOR="cuda" python test/test_ops.py -k functional_scaled_dot_product_attention_cuda -v
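The four commands above differ only in the test file and the `-k` filter, so they can be generated rather than typed out. The following is a minimal sketch, not part of this PR: the helper names `build_commands`, `rocm_env`, and `run_all` are hypothetical, and it assumes the script is started from the root of a PyTorch checkout.

```python
import os
import subprocess
import sys

# Test files and -k filters copied from the commands listed above.
TEST_FILES = ["test/test_meta.py", "test/test_ops.py"]
FILTERS = ["flash_attention", "functional_scaled_dot_product_attention_cuda"]


def build_commands(python=sys.executable):
    """Return one argv list per (filter, test file) pair, in the order listed above."""
    return [[python, path, "-k", flt, "-v"] for flt in FILTERS for path in TEST_FILES]


def rocm_env(base=None):
    """Copy an environment and add the two variables the commands above set."""
    env = dict(os.environ if base is None else base)
    env["PYTORCH_TEST_WITH_ROCM"] = "1"
    env["PYTORCH_TESTING_DEVICE_ONLY_FOR"] = "cuda"
    return env


def run_all():
    """Run every command in turn; check=True stops at the first failing run."""
    for argv in build_commands():
        subprocess.run(argv, env=rocm_env(), check=True)
```

Calling `run_all()` from the checkout root runs the same four invocations in the same order.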

* changes to build Centos stream 9 images
* Added scripts for centos and centos stream images
* Added an extra line
* Add ninja installation
* Optimized code
* Fixes
* Add comment
* Optimized code
* Added AMDGPU mapping for ROCm 5.2 and invalid-url for rocm_baseurl

Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>

* Add hip_basic tensorpipe support to PyTorch
* Enabling hip_basic for Tensorpipe for PyTorch
* removing upstream tensorpipe module
* Adding ROCm specific tensorpipe submodule
* tensorpipe submodule updated
* Update the hip invalid device string
* Added ignore for tensorpipe git submodule
* Moved include of tensorpipe_cuda.h to hipify
* Updates based on review comments
* Defining the variable __HIP_PLATFORM_AMD__
* Enabling the UTs

Co-authored-by: Ronak Malik <Ronak.Malik@amd.com>

pruthvistony and others added 30 commits March 12, 2024 11:53

Add the related_commits file

d83f528

Add the UT test_times_file

f2f5b5d

Updated to latest conda for CentOS stream 9

3f71e55

Temporarily skip test_conv3d_64bit_indexing

8661299

- Rocblas API support is requested - SWDEV-383635 & sub task - SWDEV-390218

Sync updates from hipify_torch. (#1168)

3632eec

Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>

Updates to build on Jammy

716b6e2

- Fortran package installation moved after gcc
- Update libtinfo search code in cmake1
- Install libstdc++.so

Fix lstsq related regressions (part of SWDEV-392820)

4042400

[UB22.04] Updates to support latest scipy

21d1194

Build required version of libpng for CentOS7

0f370f9

Update tensorpipe submodule to support ROCm 6.0

95c4b69

Set ROCM_PATH in env for centOS docker container

f689fa1

Updated condition for libstc++ for Jammy

17460f9

Skip ddp apply_optim_in_bwd tests for gloo (#1302)

b1f6459

To resolve https://ontrack-internal.amd.com/browse/SWDEV-403530 and https://ontrack-internal.amd.com/browse/SWDEV-419837. For more context check upstream issue pytorch#111834

Changes to support docker v23

a209b77

Reversed the condition as required

[CS9] Updates to CentOS stream 9 build (#1326)

44f7860

- Add missing common_utils.sh
- Update the install vision part
- Move to amdgpu rhel 9.3 builds
- Update to pick python from conda path
- Add a missing package
- Add ROCM_PATH and magma
- Updated repo radeon path

Update to hipify mapping

3277ca0

Correcting usage of USE_ROCM

9aa41ee

Enable gesvda for ROCM >= 6.1 (#1339)

b3ac140

This also fixes a problem in gesvd driver when UV is not needed.

Increase lifespan of test-times files

eb77bc2

- build_environment is hard-coded to the upstream value from when the branch was created, since the build_environment value in the dev/QA environment can vary

Fixes CI build script (#1350)

6bd1ef8

* Fix the parsing of /etc/os-release

  The old code parses OS_DISTRO as 'PRETTY_Ubuntu' on Ubuntu and thus never links to libtinfo correctly.
* Configurable CMAKE_PREFIX_PATH in CI script.
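The CI script itself is not shown in this conversation, so the exact cause of the 'PRETTY_Ubuntu' value is not known here. As an illustration only, the Python sketch below shows one common way such a mix-up can arise (a loose match on `NAME` hits the `PRETTY_NAME` line first) and how exact key matching avoids it. The sample file contents and both function names are assumptions made for the example.

```python
import shlex

# Hypothetical sample in the key=value format used by /etc/os-release.
SAMPLE_OS_RELEASE = '''PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
ID=ubuntu
'''


def first_line_containing(text, needle):
    """Loose substring match: returns the first line that merely contains needle."""
    for line in text.splitlines():
        if needle in line:
            return line
    return None


def parse_os_release(text):
    """Parse os-release text into a dict, matching each key exactly."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex.split strips the shell-style quoting around the value.
        parts = shlex.split(raw)
        fields[key] = parts[0] if parts else ""
    return fields


if __name__ == "__main__":
    print(first_line_containing(SAMPLE_OS_RELEASE, "NAME"))
    print(parse_os_release(SAMPLE_OS_RELEASE)["NAME"])
```

With the sample above, the loose match returns the `PRETTY_NAME` line, while the dictionary lookup for `NAME` returns the distribution name alone.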

[NO CP] Temporary dumping of test exec log to stderr

acff1c1

- Done at QA's request; this needs to be reverted and does not need to be cherry-picked into later releases.

Add skipIfRocmArch decorator for Navi skips (#1356)

0b08ab6

Converted NAVI check as a function (#1364)

d0d25c3

* Moved NAVI check to the test file
* Revised NAVI check as a function
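The two commits above describe an architecture-based skip decorator and a NAVI check expressed as a function, but their code is not shown here. The sketch below illustrates the general pattern with plain `unittest`; it is not the PyTorch implementation. The names `skip_if_rocm_arch`, `is_navi`, `current_arch`, and the contents of `NAVI_ARCHS` are hypothetical stand-ins.

```python
import functools
import unittest

# Hypothetical list for illustration; not taken from the PyTorch sources.
NAVI_ARCHS = ("gfx1030", "gfx1100", "gfx1101")


def current_arch():
    """Hypothetical stand-in for a query of the GPU architecture name."""
    return "gfx90a"


def is_navi(arch, navi_archs=NAVI_ARCHS):
    """Check written as a function: compare the base name before any ':' suffix."""
    return arch.split(":")[0] in navi_archs


def skip_if_rocm_arch(archs, get_arch=current_arch):
    """Decorator that skips the wrapped test when the detected arch is in archs."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(self, *args, **kwargs):
            arch = get_arch()
            if is_navi(arch, archs):
                raise unittest.SkipTest(f"skipped on {arch}")
            return fn(self, *args, **kwargs)
        return wrapper
    return decorator
```

Evaluating the architecture inside the wrapper, rather than at decoration time, keeps importing the test module cheap and defers the device query until the test actually runs.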

Triton build conditionalized on ROCM_VERSION

7c76d3b

relax tol for flaky nansum_out_dtype_cuda_float32 test

ca9a30c

TestReductionsCUDA.test_nansum_out_dtype_cuda_float32 would fail or pass depending on the random inputs. Observed by ROCm internal QA testing.

Remove ROCmloops specific test

ab06040

Removing DOCKER_BUILDKIT=1 to support internal CI

8036be2

Merge pull request #1367 from ROCm/IFU_CP_03122024

d4ce7aa

IFU cherry-picks into rocm6.2_internal_testing

pruthvistony and others added 6 commits March 12, 2024 15:39

Triton commit update

a6feb59

- Commit from branch pytorch/rocm6.2_internal_testing

Bad import in test_torchinductor and skip torchvision related UT (#1374)

c838806

skip test_inductor_freezing failing UTs (#1375)

a990b07

Skip test_mm_triton_kernel_benchmark (#1376)

5d7b0c3

* Running triton kernel on ROCM only has one GB/s metric reported
* Update test_kernel_benchmark.py

add -fclang-abi-compat=17 to HIP_HIPCC_FLAGS (#1377)

de739ad

C++20 mangling rules were recently added to hip-clang. This flag maintains compatibility since pytorch is at C++17. Otherwise the linker fails.

Enable previously disabled FA related Operators in UTs

2f4f158

pruthvistony force-pushed the rocm6.2_internal_testing branch from 1b2a3a0 to 633a013 Compare April 23, 2024 01:04

xinyazhang requested review from jeffdaily, jithunnair-amd, jataylo and pruthvistony as code owners April 23, 2024 01:04
