{ai}[foss/2023b] PyTorch v2.3.0 #20489
…2.3.0_disable_DataType_dependent_test_if_tensorboard_is_not_available.patch, PyTorch-2.3.0_disable_test_linear_package_if_no_half_types_are_available.patch
Tests that are failing for me are:
@Flamefire
@boegelbot Please test @ jsc-zen3
@akesandgren: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de PR test command '
Test results coming soon (I hope)... - notification for comment with ID 2098172875 processed Message to humans: this is just bookkeeping information for me,
Test report by @akesandgren
Test report by @boegelbot
Test report by @akesandgren
Interesting...
Not unusual for PyTorch ;-)
These fail because SANDCASTLE=1 when run as part of build
And those are the differences between my standalone test run (which was without SANDCASTLE) and the test run during the build.
@Flamefire Do you know why we set SANDCASTLE=1 in the easyblock?
Yes, there are a lot of things like So we might need to patch those failing ones. For
I'm doing a test without SANDCASTLE set and with test_hub disabled. That's one of only two test files I found that do external downloads, the other being one test in test_nn.
And I have manually run the full test suite without SANDCASTLE set on a previous build and saw only 3 failed tests. |
Might be. I used it because it disables a LOT of tests, especially those downloading stuff IIRC. See https://github.com/search?q=repo%3Apytorch%2Fpytorch%20IS_SANDCASTLE&type=code Two such instances seem to skip whole classes of tests at once: https://github.com/pytorch/pytorch/blob/20aa7cc6788ff10dee2d927057b10a81af638a32/test/jit/test_backends.py#L69-L73 and https://github.com/pytorch/pytorch/blob/2e4d0111953e6db7e4ce5cf041e6a78770092495/test/jit/test_torchbind.py#L37-L38
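For readers unfamiliar with the mechanism, class-level skipping of this kind is a standard unittest pattern. The following is only a minimal sketch: `is_sandcastle` is a simplified, hypothetical stand-in for PyTorch's `IS_SANDCASTLE` flag (the real check may look at more than this one variable), and `run_download_tests` and `DownloadTests` are made-up names for illustration.

```python
import os
import unittest


def is_sandcastle(env=os.environ):
    # Simplified stand-in (assumption): only looks at SANDCASTLE=1.
    return env.get("SANDCASTLE") == "1"


def run_download_tests(env):
    """Run a tiny test class and return how many tests were skipped."""

    # Decorating the class skips every test in it at once.
    @unittest.skipIf(is_sandcastle(env), "skipped in sandcastle-like environments")
    class DownloadTests(unittest.TestCase):
        def test_fetch(self):
            self.assertTrue(True)

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DownloadTests)
    result = unittest.TestResult()
    suite.run(result)
    return len(result.skipped)
```

With `{"SANDCASTLE": "1"}` the whole class is reported as skipped; with an empty environment the test runs normally.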
If it is indeed the case that now NOT setting it causes fewer failures then we should. Best to condition it on 2.3+ to not introduce regressions. I'll try to push a change upstream to use something like
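Conditioning the variable on the PyTorch version could look roughly like the sketch below. This is not the easyblock's actual code: `parse_version` and `build_test_env` are hypothetical helpers, and the version parsing assumes purely numeric, dot-separated version strings.

```python
def parse_version(version):
    """Turn '2.3.0' into (2, 3, 0); assumes numeric, dot-separated parts."""
    return tuple(int(part) for part in version.split("."))


def build_test_env(pytorch_version):
    """Return extra environment variables for the test step (hypothetical)."""
    env = {}
    # Keep the old behaviour for versions before 2.3 to avoid regressions;
    # from 2.3 on, leave SANDCASTLE unset.
    if parse_version(pytorch_version) < (2, 3):
        env["SANDCASTLE"] = "1"
    return env
```

Older versions keep `SANDCASTLE=1`, so their test results should not change, while 2.3.0 and later run without it.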
easybuild/easyconfigs/t/tlparse/tlparse-0.3.5-GCCcore-13.2.0.eb
We have another issue: pytest-rerun-failures interferes with our test parsing. We want some output like
But now we get:
@boegelbot Please test @ jsc-zen3
@akesandgren: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de PR test command '
Test results coming soon (I hope)... - notification for comment with ID 2112638268 processed Message to humans: this is just bookkeeping information for me,
optree requires typing-extensions/4.10.0-GCCcore-13.2.0
Why is that? I installed it just fine:
... python -m pip check
completed successfully
Where are you getting typing-extensions? It is not part of Python-3.11.5-GCCcore-13.2.0.eb. optree build fails without typing-extensions.
== installing...
== ... (took 29 secs)
== taking care of extensions...
== restore after iterating...
== postprocessing...
== sanity checking...
== ... (took 3 secs)
== FAILED: Installation ended unsuccessfully (build directory: /build/optree/0.11.0/GCCcore-13.2.0): build failed (first 300 chars): `/app/software/Python/3.11.5-GCCcore-13.2.0/bin/python -m pip check` failed:
optree 0.11.0 requires typing-extensions, which is not installed.
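`pip check` fails when a declared dependency is missing from the environment. One quick way to see whether a given distribution is visible to a particular Python installation is to query its metadata with the standard library. A minimal sketch (the helper name `installed_version` is made up for illustration); it has to be run with the same interpreter that the build uses:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints a version string if the Python installation provides it, else None.
    print(installed_version("typing_extensions"))
```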
Seems like you need to reinstall Python. The current develop version and release 4.9.1 contain it:
easybuild-easyconfigs/easybuild/easyconfigs/p/Python/Python-3.11.5-GCCcore-13.2.0.eb
Lines 56 to 58 in 43ff814
    ('typing_extensions', '4.8.0', {
        'checksums': ['df8e4339e9cb77357558cbdbceca33c303714cf861d1eef15e1070055ae8b7ef'],
    }),
However, it was added between 4.8.2 and 4.9.x by #19777
From the looks of that PR, this change was made because too many other ECs depended on it. And IMO it makes sense to include it in Python by default
thanks, --rebuild --skip added four packages. This will fix many things for me.
== installing extension tomli 2.0.1 (1/4)...
== configuring...
== building...
== testing...
== installing...
== ... (took 11 secs)
== installing extension packaging 23.2 (2/4)...
== configuring...
== building...
== testing...
== installing...
== ... (took 2 secs)
== installing extension typing_extensions 4.8.0 (3/4)...
== configuring...
== building...
== testing...
== installing...
== ... (took 2 secs)
== installing extension setuptools-scm 8.0.4 (4/4)...
Test report by @boegelbot
@Flamefire Any ideas on how to deal with the error output parsing problem?
Not many. I still have an open issue for that: pytorch/pytorch#126523. No luck so far getting machine-readable output from PyTorch directly. I.e. I wanted them to get the We could try to get that option working by patching the test files to make sure Another option would be to revert their changes to the rerun feature using a custom implementation that broke our detection: pytorch/pytorch@3b7d60b. That might be difficult to maintain going forward, but I don't see any alternatives currently.
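To illustrate why machine-readable output would help: if a JUnit-style XML report were available (for example the kind pytest can write with `--junit-xml`), counting results would no longer depend on scraping console output. Whether PyTorch's test runner can be made to produce such a report reliably is exactly the open question above, so the following is only a sketch under that assumption. `summarize_junit` is a hypothetical helper and `SAMPLE` is made-up data, not output from a real PyTorch run.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_junit(xml_text):
    """Count test outcomes in a JUnit-style XML report."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for case in root.iter("testcase"):
        # A testcase without a failure/error/skipped child counts as passed.
        if case.find("failure") is not None:
            counts["failed"] += 1
        elif case.find("error") is not None:
            counts["errors"] += 1
        elif case.find("skipped") is not None:
            counts["skipped"] += 1
        else:
            counts["passed"] += 1
    return dict(counts)


# Made-up report for illustration only.
SAMPLE = """<testsuites><testsuite name="demo">
  <testcase classname="TestA" name="test_ok"/>
  <testcase classname="TestA" name="test_bad"><failure message="boom"/></testcase>
  <testcase classname="TestA" name="test_skip"><skipped message="n/a"/></testcase>
</testsuite></testsuites>"""
```

How reruns appear in such a report depends on the runner and plugins involved, so that part would still need checking against real output.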
(created using `eb --new-pr`)