update debugging page #497

Open: wants to merge 1 commit into base: main

@laraPPr laraPPr commented Jun 30, 2025

No description provided.

@casparvl casparvl left a comment

Honestly, I think we should restructure this entire page.

The first approach to debugging (since we have EESSI-extend) is just to try and reproduce the issue interactively, with EESSI-extend, without using the EESSI container. I think this reflects what most of us do in practice, since it is the least effort. It would be something like:

  • If your failure is architecture specific, get your hands on a node of that architecture
  • Make sure EESSI is available
  • Load EESSI-extend
  • If your PR is not using the same EasyBuild version as loaded by EESSI-extend: swap EB versions
  • Clone your feature branch and pass its easystack file to EasyBuild (--easystack), or simply write out the arguments from the easystack file yourself (e.g. eb <myapp.eb> --from-commit <commit_sha> if you're building from a commit)
  • If you have altered the eb_hooks.py for your build, pass your altered hooks file to the --hooks argument
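
As a rough sketch, the steps above could look like the script below. It only prints the commands so they can be reviewed before running them on the target node. The variable names, the placeholder easystack file name, the init script path, and using a plain module load to swap EasyBuild versions are my assumptions, not something taken from the current docs; adjust them to your PR.

```shell
#!/usr/bin/env bash
# Sketch: reproduce a failing build interactively with EESSI-extend (no container).
# All values below are placeholders (assumptions); adjust them to your PR.

EESSI_VERSION="${EESSI_VERSION:-2023.06}"    # assumed EESSI version
EASYSTACK="${EASYSTACK:-my-easystack.yml}"   # hypothetical easystack file from your feature branch
HOOKS="${HOOKS:-}"                           # set only if you altered eb_hooks.py
EB_VERSION="${EB_VERSION:-}"                 # set only if your PR uses another EasyBuild version

# Print the commands to run, one per line, so they can be reviewed first.
repro_commands() {
    # Make sure EESSI is available in this shell (assumed init script path).
    echo "source /cvmfs/software.eessi.io/versions/${EESSI_VERSION}/init/bash"
    echo "module load EESSI-extend"
    # Assumption: loading another EasyBuild module swaps out the one
    # that EESSI-extend loaded.
    if [ -n "${EB_VERSION}" ]; then
        echo "module load EasyBuild/${EB_VERSION}"
    fi
    local eb_cmd="eb --easystack ${EASYSTACK}"
    # Only pass --hooks when an altered hooks file was provided.
    if [ -n "${HOOKS}" ]; then
        eb_cmd="${eb_cmd} --hooks ${HOOKS}"
    fi
    echo "${eb_cmd}"
}

repro_commands
```

For a build from a commit, the last command would instead be the eb <myapp.eb> --from-commit <commit_sha> form mentioned above.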

This is much simpler than our current debugging instructions. And honestly, sufficient for 95% of the issues I see.

Only if you cannot reproduce your issue this way should you proceed to mimicking the full container workflow. The only issues I've typically caught with the full container workflow are those related to e.g. fuse-overlay (for example, tests that time out because the overlay filesystem is slower than the tests expect). For this approach, we should either

  • Update the "Starting a shell in the EESSI container" section, because there are now many more arguments that are typically passed to eessi_container.sh during a build.

or

  • Remove the specific instructions related to the container invocation altogether, since they get outdated so fast. In this case, we should just refer people to the build logs (the problem, though, is that external people can't read those).

Just to show the complexity of the current-day container invocation commands:

eessi_container.sh --verbose --access rw --mode run \
    --container docker://ghcr.io/eessi/build-node:debian12 \
    --repository eessi.io-2023.06-software \
    --extra-bind-paths /project/60006/SHARED/jobs/2025.07/pr_39/event_2e6c9c00-6279-11f0-9e4e-62951e5c9cf4/run_000/linux_x86_64_amd_zen2/eessi.io-2023.06-software,/dev \
    --pass-through --contain \
    --save /project/60006/SHARED/jobs/2025.07/pr_39/event_2e6c9c00-6279-11f0-9e4e-62951e5c9cf4/run_000/linux_x86_64_amd_zen2/eessi.io-2023.06-software/previous_tmp/build_step \
    --storage /tmp/bot/EESSI/eessi_job.8V4He6n0qg \
    --nvidia install \
    --host-injections /project/def-users/bot/shared/host-injections

Honestly, keeping those up to date might be more work than having one of us just take a look when someone cannot reproduce with EESSI-extend. I for sure don't see myself making time to update those docs for the new arguments (pass-through, contain, extra-bind-paths, ...)
