{2023.06}[foss/2023a] TensorFlow v2.15.1 w/ CUDA 12.1.1 + eb_hooks.py #35
bot: build instance:eessi-bot-surf repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90
New job on instance
bot: build instance:eessi-bot-surf repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90
New job on instance
bot: build instance:eessi-bot-surf repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90
New job on instance
e877c9e to 5befb75
bot: build instance:eessi-bot-surf repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90
New job on instance
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/generic
New job on instance
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/generic accel:nvidia/cc90
New job on instance
The failure is:
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/generic accel:nvidia/cc90
New job on instance
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen3 accel:nvidia/cc80
New job on instance
bot: help
Updates by the bot instance
Updates by the bot instance
Updates by the bot instance
Updates by the bot instance
bot: help
Updates by the bot instance
Updates by the bot instance
Updates by the bot instance
Updates by the bot instance
Instance
Instance
Instance
bot: build instance:eessi-bot-vsc-ugent repo:eessi.io-2023.06-software arch:cascaselake accel:nvidia/cc70
I think it might be better to always open a secondary PR where you do the actual testing, to make sure that none of the builds get deployed from the PR and only the changed scripts do, like I did with #49 and #22. I know I said I was going to write out the policy, but I have not gotten to it yet.
@laraPPr marking the PR as draft as long as the easystack file is in there probably helps, but indeed, we may need to come up with a better approach.
It is going to fail the test step, but let's see how the build step goes.
The Gent bot crashed because of a local problem. I'll update the reframe_config and try again later.
Ah no, it does still seem to be alive, but I made a mistake in the comment, so let's see if this works:
Third time's the charm, I hope.
No job is being created for some reason, and I can't tell why.
Debug building with new bot release...
No job was submitted, possibly because the
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-jsc for:arch=aarch64/nvidia/grace,accel=nvidia/cc90
New job on instance
Try cross-compiling for cc80...
Supplying several values for the
New job on instance
bot: show_config
Instance
Instance
Instance
Instance
Instance
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-vsc-ugent for:arch=x86_64/intel/cascadelake,accel=nvidia/cc70
New job on instance
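The build commands in this thread use two argument styles: separate `arch:` and `accel:` arguments in the earlier comments, and a combined `for:arch=...,accel=...` argument in the later ones. As a rough illustration of how such a comment breaks down into key/value pairs, here is a minimal Python sketch. It is not the bot's actual parser; the function name `parse_bot_command` and the returned dictionary layout are assumptions made for this example only.

```python
def parse_bot_command(comment: str) -> dict:
    """Split a 'bot: <command> key:value ...' comment into its parts.

    Hypothetical helper for illustration; the real bot may parse
    and validate its commands differently.
    """
    prefix = "bot:"
    text = comment.strip()
    if not text.startswith(prefix):
        raise ValueError("not a bot command")

    tokens = text[len(prefix):].split()
    if not tokens:
        raise ValueError("missing command name")

    result = {"command": tokens[0], "args": {}}
    for token in tokens[1:]:
        # Split on the first ':' only, so values may contain '/' or '.'.
        key, sep, value = token.partition(":")
        if not sep or not value:
            raise ValueError(f"malformed argument: {token!r}")

        if key == "for":
            # Combined style: for:arch=<cpu target>,accel=<gpu target>
            target = {}
            for item in value.split(","):
                name, eq, setting = item.partition("=")
                if not eq or not setting:
                    raise ValueError(f"malformed 'for' item: {item!r}")
                target[name] = setting
            result["args"]["for"] = target
        else:
            # Separate style: arch:<cpu target> accel:<gpu target>
            result["args"][key] = value
    return result


if __name__ == "__main__":
    cmd = (
        "bot: build repo:eessi.io-2023.06-software "
        "instance:eessi-bot-vsc-ugent "
        "for:arch=x86_64/intel/cascadelake,accel=nvidia/cc70"
    )
    print(parse_bot_command(cmd))
```

Note that a purely syntactic parse like this accepts any architecture string, so a misspelled target would only be caught by a later check against the targets an instance actually supports.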
@TopRichard can you sync this PR with the main branch? I think it is not picking up the changes from #59.
…into TensorFlow-CUDA
@TopRichard apparently it's a bigger issue, so I'm moving my experimenting to EESSI/software-layer#1147
This PR uses a CUDA-ARM patch to work around the previously seen error:
On x86_64 with cc80:
CPU tests:
Executed 847 out of 847 tests: 847 tests pass.
GPU tests:
Executed 189 out of 189 tests: 189 tests pass.
On aarch64 with cc90:
CPU tests:
Executed 847 out of 847 tests: 847 tests pass.
GPU tests:
Executed 189 out of 189 tests: 188 tests pass and 1 fails locally
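For comparing results like these across targets, the summary lines can be turned into numbers. The following is a small Python sketch that handles only the two line shapes quoted above; the regular expression and the function name `parse_summary` are assumptions for this example, not part of the build tooling.

```python
import re

# Matches only the two summary shapes quoted in this PR:
#   "Executed N out of M tests: P tests pass."
#   "Executed N out of M tests: P tests pass and F fails locally"
SUMMARY_RE = re.compile(
    r"Executed (?P<executed>\d+) out of (?P<total>\d+) tests?: "
    r"(?P<passed>\d+) tests? pass(?:es)?"
    r"(?: and (?P<failed>\d+) fails? locally)?"
)


def parse_summary(line: str) -> dict:
    """Extract executed/total/passed/failed counts from one summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognised summary line: {line!r}")
    return {
        "executed": int(match.group("executed")),
        "total": int(match.group("total")),
        "passed": int(match.group("passed")),
        # The failure clause is absent when everything passes.
        "failed": int(match.group("failed") or 0),
    }


if __name__ == "__main__":
    results = {
        "x86_64 cc80 GPU": "Executed 189 out of 189 tests: 189 tests pass.",
        "aarch64 cc90 GPU": "Executed 189 out of 189 tests: 188 tests pass and 1 fails locally",
    }
    for target, line in results.items():
        print(target, parse_summary(line))
```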