{ai}[foss/2022a] PyTorch v2.0.1 #19066

Flamefire · 2023-10-24T07:38:35Z

(created using eb --new-pr)

…2.0.1_add-missing-vsx-vector-shift-functions.patch, PyTorch-2.0.1_avoid-test_quantization-failures.patch, PyTorch-2.0.1_disable-test-sharding.patch, PyTorch-2.0.1_fix-numpy-compat.patch, PyTorch-2.0.1_fix-shift-ops.patch, PyTorch-2.0.1_fix-skip-decorators.patch, PyTorch-2.0.1_fix-test_memory_profiler.patch, PyTorch-2.0.1_fix-test-ops-conf.patch, PyTorch-2.0.1_fix-torch.compile-on-ppc.patch, PyTorch-2.0.1_fix-ub-in-inductor-codegen.patch, PyTorch-2.0.1_fix-vsx-loadu.patch, PyTorch-2.0.1_no-cuda-stubs-rpath.patch, PyTorch-2.0.1_remove-test-requiring-online-access.patch, PyTorch-2.0.1_skip-diff-test-on-ppc.patch, PyTorch-2.0.1_skip-failing-gradtest.patch, PyTorch-2.0.1_skip-test_shuffle_reproducibility.patch, PyTorch-2.0.1_skip-tests-skipped-in-subprocess.patch

Flamefire · 2023-10-25T01:39:29Z

Test report by @Flamefire
SUCCESS
Build succeeded for 2 out of 2 (1 easyconfigs in total)
taurusml3 - Linux RHEL 7.6, POWER, 8335-GTX, 6 x NVIDIA Tesla V100-SXM2-32GB, 440.64.00, Python 2.7.5
See https://gist.github.com/Flamefire/9d9ef6f2149c6f32d63ae0c0bb95364f for a full test report.

boegel · 2023-10-25T07:51:07Z

@boegelbot please test @ jsc-zen2
CORE_CNT=16

boegelbot · 2023-10-25T07:55:08Z

@boegel: Request for testing this PR well received on jsczen2l1.int.jsc-zen2.easybuild-test.cluster

PR test command 'EB_PR=19066 EB_ARGS= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --mem-per-cpu=4000M --job-name test_PR_19066 --ntasks="16" ~/boegelbot/eb_from_pr_upload_jsc-zen2.sh' executed!

exit code: 0
output:

Submitted batch job 3640

Test results coming soon (I hope)...

- notification for comment with ID 1778707471 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

boegelbot · 2023-10-25T13:27:38Z

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
jsczen2g1.int.jsc-zen2.easybuild-test.cluster - Linux Rocky Linux 8.5, x86_64, AMD EPYC 7742 64-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/boegelbot/6c8f0ed02d28e91f34b96ab6824d86f9 for a full test report.

SebastianAchilles · 2023-10-25T14:38:39Z

Test report by @SebastianAchilles
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
zen2-rockylinux-88 - Linux Rocky Linux 8.8, x86_64, AMD EPYC 7452 32-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/SebastianAchilles/8a7ebe899a6aae5ae612763d13dd7dc7 for a full test report.

branfosj · 2023-10-25T15:56:32Z

Test report by @branfosj
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
bear-pg0105u03a - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz (icelake), Python 3.6.8
See https://gist.github.com/branfosj/639f0f4c14e5b35f5d1a30ffe7764d24 for a full test report.

boegel · 2023-10-25T19:32:40Z

Test report by @boegel
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
node3100.skitty.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz (skylake_avx512), Python 3.6.8
See https://gist.github.com/boegel/c427b409621702145c5db62c883e0213 for a full test report.

boegel · 2023-10-26T07:37:24Z

@branfosj Can you dig into the log file and extract more details on the failing inductor/test_torchinductor_opinfo?

My vote goes to ignoring this test for now, so we can merge this PR and follow up in another PR to get that (quirky?) test fixed, since we've seen success on a range of different systems here (incl. POWER!).

branfosj · 2023-10-26T08:00:48Z

> @branfosj Can you dig into the log file and extract more details on the failing inductor/test_torchinductor_opinfo?

FAILED inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCPU::test_comprehensive_index_add_cpu_float16 - RuntimeError: unexpected success index_add, torch.float16, cpu
FAILED inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCPU::test_comprehensive_scatter_add_cpu_float16 - RuntimeError: unexpected success scatter_add, torch.float16, cpu
FAILED inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCPU::test_comprehensive_scatter_reduce_sum_cpu_float16 - RuntimeError: unexpected success scatter_reduce.sum, torch.float16, cpu

With the traceback being short:

Traceback (most recent call last):
  File "/dev/shm/branfosj/build-up-EL8/PyTorch/2.0.1/foss-2022a/pytorch-v2.0.1/test/inductor/test_torchinductor_opinfo.py", line 606, in test_comprehensive
    raise RuntimeError(
RuntimeError: unexpected success scatter_reduce.sum, torch.float16, cpu

Flamefire · 2023-10-26T09:11:44Z

I added a patch that disables that check (and similar ones) by changing the flag that controls whether unexpected successes in this test are reported as failures (the flag could also be set via an env var). This should make the test pass without any potential influence on other tests. See https://github.com/pytorch/pytorch/blob/v2.0.1/test/inductor/test_torchinductor_opinfo.py#L605-L608

I think with that patch added we can consider the report a success and merge this without another test (I verified with --stop=patch that the patch applies).
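For readers unfamiliar with this kind of check, here is a minimal sketch of an "unexpected success" guard that a flag or environment variable can switch off. It is an illustration only: the names `EXAMPLE_FAIL_ON_SUCCESS`, `EXPECTED_FAILURES`, `fail_on_success_enabled` and `check_result` are hypothetical and are not taken from the PyTorch test suite or from the patch in this PR.

```python
import os

# Hypothetical table of (op, dtype, device) combinations that are expected to fail.
# The entries mirror the three failures reported above, for illustration only.
EXPECTED_FAILURES = {
    ("index_add", "float16", "cpu"),
    ("scatter_add", "float16", "cpu"),
    ("scatter_reduce.sum", "float16", "cpu"),
}


def fail_on_success_enabled(env=None):
    """Return True unless the (assumed) env var explicitly disables the check."""
    env = os.environ if env is None else env
    return env.get("EXAMPLE_FAIL_ON_SUCCESS", "1") == "1"


def check_result(op, dtype, device, succeeded, fail_on_success=True):
    """Return True if the test counts as passed.

    A combination listed in EXPECTED_FAILURES that succeeds anyway raises
    RuntimeError, unless fail_on_success is False.
    """
    expected_to_fail = (op, dtype, device) in EXPECTED_FAILURES
    if succeeded and expected_to_fail:
        if fail_on_success:
            raise RuntimeError(f"unexpected success {op}, torch.{dtype}, {device}")
        return True
    # Otherwise the test passes if it succeeded, or if it failed as expected.
    return succeeded or expected_to_fail
```

With the guard enabled, a listed op that happens to work on a given CPU turns into a test failure, which matches the shape of the errors reported above. With the guard disabled, the same run is simply counted as a pass, and results for ops outside the table are unchanged.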

branfosj · 2023-10-26T17:18:41Z

Test report by @branfosj
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
bear-pg0105u03a - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz (icelake), Python 3.6.8
See https://gist.github.com/branfosj/f038e8f0042ddb24153dca4fe2ae31f0 for a full test report.

branfosj · 2023-10-26T17:19:08Z

Going in, thanks @Flamefire!

Flamefire · 2023-10-27T11:08:00Z

Test report by @Flamefire
SUCCESS
Build succeeded for 2 out of 2 (1 easyconfigs in total)
taurusi8018 - Linux CentOS Linux 7.9.2009, x86_64, AMD EPYC 7352 24-Core Processor, 8 x NVIDIA NVIDIA A100-SXM4-40GB, 470.57.02, Python 2.7.5
See https://gist.github.com/Flamefire/d48255954eeb2cad36dec3a26d543612 for a full test report.

VRehnberg · 2023-11-06T11:53:56Z

Test report by @VRehnberg
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
alvis-s1 - Linux Rocky Linux 8.8, x86_64, Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz, Python 3.6.8
See https://gist.github.com/VRehnberg/088e9766b72bdfeec1018b5e4ccd9b19 for a full test report.

VRehnberg · 2023-11-06T15:08:53Z

Test report by @VRehnberg
SUCCESS
Build succeeded for 3 out of 3 (1 easyconfigs in total)
alvis-c1 - Linux Rocky Linux 8.8, x86_64, Intel Xeon Processor (Skylake), Python 3.6.8
See https://gist.github.com/VRehnberg/42502f8695413eceea214c62a822f421 for a full test report.

Flamefire added 2 commits October 24, 2023 09:38

Downgrade sympy (duplicate in 2022a)

dd69d00


Micket added the update label Oct 25, 2023

SebastianAchilles added this to the 4.x milestone Oct 25, 2023

branfosj mentioned this pull request Oct 26, 2023

{ai}[foss/2022b] PyTorch v2.0.1 #19067

Merged

Workaround test_torchinductor_opinfo failure

f7d8e97

Add patch description

0e3ef19

branfosj approved these changes Oct 26, 2023


branfosj modified the milestones: 4.x, next release (4.8.2?) Oct 26, 2023

branfosj merged commit c07c4b1 into easybuilders:develop Oct 26, 2023
5 checks passed

Flamefire deleted the 20231024093827_new_pr_PyTorch201 branch October 26, 2023 18:13
