add patches for PyTorch v2.1.2 with foss/2023a to fix test failures on non-x86 platforms #19573

Merged

Flamefire
Contributor

(created using eb --new-pr)

@boegel boegel changed the title from "Improve PyTorch-2.1.2-foss-2023a for non-x86" to "add patches for PyTorch v2.1.2 to fix test failures on non-x86 platforms" Jan 12, 2024
@boegel boegel added the bug fix label Jan 12, 2024
@boegel boegel added this to the release after 4.9.0 milestone Jan 12, 2024
@boegel
Member

boegel commented Jan 12, 2024

@boegelbot please test @ generoso
CORE_CNT=16

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on login1

PR test command 'EB_PR=19573 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_19573 --ntasks="16" ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12586

Test results coming soon (I hope)...

- notification for comment with ID 1889714999 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
cnx1 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/89e4978e2b391cf92c7ea96ee800d632 for a full test report.

@boegel
Member

boegel commented Jan 13, 2024

These additional patches help to fix ~10 failing tests on aarch64/*, see overview in EESSI/software-layer#444 (comment)

@boegel boegel added the "EESSI" (Related to EESSI project) and "aarch64" (Related to Arm 64-bit (aarch64)) labels Jan 13, 2024
@boegel boegel changed the title from "add patches for PyTorch v2.1.2 to fix test failures on non-x86 platforms" to "add patches for PyTorch v2.1.2 with foss/2023a to fix test failures on non-x86 platforms" Jan 13, 2024
@Flamefire
Contributor Author

Test report by @Flamefire
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
i8015 - Linux Rocky Linux 8.7, x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA A100-SXM4-40GB, 545.23.08, Python 3.6.8
See https://gist.github.com/Flamefire/97a7fc29ec06de27b8e759129b1ede8c for a full test report.

@boegel
Member

boegel commented Jan 13, 2024

@Flamefire There's some German swear words in the errors, but the problems there don't seem related to the changes in this PR at all?

@boegel
Member

boegel commented Jan 13, 2024

Test report by @boegel
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
node3135.skitty.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz (skylake_avx512), Python 3.6.8
See https://gist.github.com/boegel/3f695b3a5a82e67f61d4070891d3ba05 for a full test report.

@boegel
Member

boegel commented Jan 13, 2024

@boegelbot please test @ jsc-zen2
CORE_CNT=16

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on jsczen2l1.int.jsc-zen2.easybuild-test.cluster

PR test command 'EB_PR=19573 EB_ARGS= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --mem-per-cpu=4000M --job-name test_PR_19573 --ntasks="16" ~/boegelbot/eb_from_pr_upload_jsc-zen2.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 4073

Test results coming soon (I hope)...

- notification for comment with ID 1890753405 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@Flamefire
Contributor Author

> @Flamefire There's some German swear words in the errors, but the problems there don't seem related to the changes in this PR at all?

😆 Just Lmod complaining that it found foss/2023a but not OpenBLAS. Will need to check why...

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
jsczen2c1.int.jsc-zen2.easybuild-test.cluster - Linux Rocky Linux 8.5, x86_64, AMD EPYC 7742 64-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/boegelbot/7954b383daf4c40132f6f24442d07ae8 for a full test report.

@boegel
Member

boegel commented Jan 15, 2024

Going in, thanks @Flamefire!

@boegel boegel merged commit c774c8f into easybuilders:develop Jan 15, 2024
9 checks passed
@Flamefire Flamefire deleted the 20240112153558_new_pr_PyTorch212 branch January 15, 2024 09:43
@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
n1505 - Linux RHEL 8.7 (Ootpa), x86_64, Intel(R) Xeon(R) Platinum 8470 (icelake), Python 3.8.13
See https://gist.github.com/Flamefire/01be78875bc12e1980e841776f36048f for a full test report.

@boegel
Member

boegel commented Jan 15, 2024

Test report by @boegel
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
node3171.skitty.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz (skylake_avx512), Python 3.6.8
See https://gist.github.com/boegel/1f010d0b36b16ebcb141df586e03cae9 for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
i8024 - Linux Rocky Linux 8.7, x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA A100-SXM4-40GB, 545.23.08, Python 3.6.8
See https://gist.github.com/Flamefire/822fcc35be309b7b8262175b7c580c8d for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
taurusml16 - Linux RHEL 7.6, POWER, 8335-GTX (power9le), 6 x NVIDIA Tesla V100-SXM2-32GB, 440.64.00, Python 2.7.5
See https://gist.github.com/Flamefire/a082566497d3c48fc03e5df1064dc55a for a full test report.
