Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix --sanity-check-only and --module-only for UCX plugins #3007

Merged

Conversation

Flamefire
Copy link
Contributor

@Flamefire Flamefire commented Sep 25, 2023

(created using eb --new-pr)

The self.plugins variable was initialized empty and only populated in the configure step. This leads to errors in the sanity check when the configure step is not run.
Convert to a property which is lazy-initialized.

Note: I was thinking about querying the dependencies instead of using get_software_root as this implementation might fail when self.plugins is accessed without the modules loaded. But I think this cannot happen, can it?

@boegel boegel added the bug fix label Sep 26, 2023
@boegel boegel added this to the next release (4.8.2?) milestone Sep 26, 2023
@boegel boegel changed the title fix sanity-check-only and module-only for UCX plugins fix --sanity-check-only and --module-only for UCX plugins Sep 27, 2023
boegel
boegel previously approved these changes Sep 27, 2023
@boegel
Copy link
Member

boegel commented Sep 27, 2023

@boegelbot please test @ generoso
EB_ARGS="UCX-CUDA-1.14.1-GCCcore-12.3.0-CUDA-12.1.1.eb UCX-CUDA-1.10.0-GCCcore-10.3.0-CUDA-11.3.1.eb"

@boegelbot
Copy link

@boegel: Request for testing this PR well received on login1

PR test command 'EB_PR=3007 EB_ARGS="UCX-CUDA-1.14.1-GCCcore-12.3.0-CUDA-12.1.1.eb UCX-CUDA-1.10.0-GCCcore-10.3.0-CUDA-11.3.1.eb" EB_CONTAINER= EB_REPO=easybuild-easyblocks /opt/software/slurm/bin/sbatch --job-name test_PR_3007 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 11787

Test results coming soon (I hope)...

- notification for comment with ID 1737516669 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Copy link

Test report by @boegelbot

Overview of tested easyconfigs (in order)

  • SUCCESS UCX-CUDA-1.14.1-GCCcore-12.3.0-CUDA-12.1.1.eb
  • SUCCESS UCX-CUDA-1.10.0-GCCcore-10.3.0-CUDA-11.3.1.eb

Build succeeded for 2 out of 2 (2 easyconfigs in total)
cns1 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/b229d817b40e9f0e6e838c06f4751352 for a full test report.

@boegel
Copy link
Member

boegel commented Sep 27, 2023

Test report by @boegel

Overview of tested easyconfigs (in order)

Build succeeded for 0 out of 2 (2 easyconfigs in total)
node3157.skitty.os - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz (skylake_avx512), Python 3.6.8
See https://gist.github.com/boegel/e9cda3e08b584888f4a38ec738d9a7c1 for a full test report.

@boegel
Copy link
Member

boegel commented Sep 27, 2023

Hmm, looks OK, but still hitting a problem with --sanity-check-only, see last test report.

The problem here is that until the parent sanity_check_step is called, the modules for dependencies are not loaded yet.
We need a workaround like in #2828 (+ #2865) for that to work.

Another approach could indeed be to not rely on get_software_root, but directly query the dependencies via self.cfg.dependencies().

Allows working with --sanity-check-only where the modules may not be
loaded yet
@Flamefire
Copy link
Contributor Author

Another approach could indeed be to not rely on get_software_root, but directly query the dependencies via self.cfg.dependencies().

Done that, looks simpler although now the code in plugins and configure_step is slightly different.

@Flamefire Flamefire requested a review from boegel October 20, 2023 08:15
@akesandgren
Copy link
Contributor

akesandgren commented Oct 23, 2023

ping @boegel Can you test this again?

@Flamefire
Copy link
Contributor Author

BTW: Given that we use dep['name'] for dep in self.cfg.dependencies() already in several places does it maybe make sense to add a property dependency_names to EasyConfig in framework wrapping that? Might be slower for repeated uses compared to storing in a set but makes usage easier

@boegel
Copy link
Member

boegel commented Oct 25, 2023

@Flamefire Yes, makes sense, we should be using that more often since it helps with keeping easyblocks compatible with --module-only and --sanity-check-only.

Are you up for opening a PR for that?

@Flamefire
Copy link
Contributor Author

Flamefire commented Oct 25, 2023

Are you up for opening a PR for that?

easybuilders/easybuild-framework#4360

easybuild/easyblocks/u/ucx_plugins.py Outdated Show resolved Hide resolved
easybuild/easyblocks/u/ucx_plugins.py Outdated Show resolved Hide resolved
@boegel boegel force-pushed the 20230925111141_new_pr_ucx_plugins branch from 5630f1c to 56dbb5d Compare October 27, 2023 10:16
@boegel
Copy link
Member

boegel commented Oct 27, 2023

Test report by @boegel

Overview of tested easyconfigs (in order)

  • SUCCESS UCX-CUDA-1.10.0-GCCcore-10.3.0-CUDA-11.3.1.eb
  • SUCCESS UCX-CUDA-1.11.0-GCCcore-11.2.0-CUDA-11.4.1.eb
  • SUCCESS UCX-CUDA-1.11.2-GCCcore-11.2.0-CUDA-11.4.1.eb
  • SUCCESS UCX-CUDA-1.11.2-GCCcore-11.2.0-CUDA-11.5.2.eb
  • SUCCESS UCX-CUDA-1.12.1-GCCcore-11.3.0-CUDA-11.7.0.eb
  • SUCCESS UCX-CUDA-1.13.1-GCCcore-12.2.0-CUDA-12.0.0.eb
  • SUCCESS UCX-CUDA-1.14.1-GCCcore-12.3.0-CUDA-12.1.1.eb

Build succeeded for 7 out of 7 (7 easyconfigs in total)
node3100.skitty.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz (skylake_avx512), Python 3.6.8
See https://gist.github.com/boegel/e8bd87c0934af93f979ebdabb301e909 for a full test report.

edit this was using --module-only

@boegel
Copy link
Member

boegel commented Oct 27, 2023

Test report by @boegel

Overview of tested easyconfigs (in order)

  • SUCCESS UCX-CUDA-1.10.0-GCCcore-10.3.0-CUDA-11.3.1.eb
  • SUCCESS UCX-CUDA-1.11.0-GCCcore-11.2.0-CUDA-11.4.1.eb
  • SUCCESS UCX-CUDA-1.11.2-GCCcore-11.2.0-CUDA-11.4.1.eb
  • SUCCESS UCX-CUDA-1.11.2-GCCcore-11.2.0-CUDA-11.5.2.eb
  • SUCCESS UCX-CUDA-1.12.1-GCCcore-11.3.0-CUDA-11.7.0.eb
  • SUCCESS UCX-CUDA-1.13.1-GCCcore-12.2.0-CUDA-12.0.0.eb
  • SUCCESS UCX-CUDA-1.14.1-GCCcore-12.3.0-CUDA-12.1.1.eb

Build succeeded for 7 out of 7 (7 easyconfigs in total)
node3100.skitty.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz (skylake_avx512), Python 3.6.8
See https://gist.github.com/boegel/f6595f684ca9923af14e60f9944d99b6 for a full test report.

edit: this was using --sanity-check-only

@boegel
Copy link
Member

boegel commented Oct 27, 2023

Test report by @boegel

Overview of tested easyconfigs (in order)

  • SUCCESS UCX-CUDA-1.10.0-GCCcore-10.3.0-CUDA-11.3.1.eb
  • SUCCESS UCX-CUDA-1.11.0-GCCcore-11.2.0-CUDA-11.4.1.eb
  • SUCCESS UCX-CUDA-1.11.2-GCCcore-11.2.0-CUDA-11.4.1.eb
  • SUCCESS UCX-CUDA-1.11.2-GCCcore-11.2.0-CUDA-11.5.2.eb
  • SUCCESS UCX-CUDA-1.12.1-GCCcore-11.3.0-CUDA-11.7.0.eb
  • SUCCESS UCX-CUDA-1.13.1-GCCcore-12.2.0-CUDA-12.0.0.eb
  • SUCCESS UCX-CUDA-1.14.1-GCCcore-12.3.0-CUDA-12.1.1.eb

Build succeeded for 7 out of 7 (7 easyconfigs in total)
node3100.skitty.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz (skylake_avx512), Python 3.6.8
See https://gist.github.com/boegel/88d49e8fd074e2f8c0bb5ec092ebb4c5 for a full test report.

@boegel boegel merged commit 02e7e05 into easybuilders:develop Oct 27, 2023
47 checks passed
@Flamefire Flamefire deleted the 20230925111141_new_pr_ucx_plugins branch October 28, 2023 11:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants