Update known issue about lmod hook in host-injection #183

Merged (5 commits) on Jun 25, 2024
Changes from 1 commit
16 changes: 15 additions & 1 deletion docs/known_issues/eessi-2023.06.md
@@ -7,7 +7,7 @@

<p>This is an error that occurs with OpenMPI after updating to OFED 23.10.</p>

-<p>Their is an upstream issue on this problem opened with EasyBuild.
+<p>There is an upstream issue on this problem opened with EasyBuild.
See: https://github.com/easybuilders/easybuild-easyconfigs/issues/20233</p>

<b>Workarounds</b>
@@ -26,3 +26,17 @@ export OMPI_MCA_pml='ucx'
export OMPI_MCA_mtl='^ofi'
```
</div>

### `Bug in EESSI initialization and priority mechanisms: site OpenMPI or UCX not loaded`
Collaborator

@casparvl casparvl Jun 4, 2024

I don't think these are two separate issues, right? I mean, #456 was about site-specific tuning in general, but it only became an 'issue' when people hit the Failed-to-modify error. Shouldn't we just add to the "Failed to modify UD QP to INIT... etc" entry that the OMPI_MCA_* environment variables can be set in host_injections/...?

I was browsing the docs, and I think we should have a general explanation of host_injections under Advanced usage, i.e. that this directory is used for site-specific tuning. Then, in a paragraph there, we can add how it can be used to execute Lmod hooks for site-specific tuning, with an example hook (we might as well use the example of setting these three OpenMPI environment variables). It's a good point; I wanted to write this documentation, but didn't get to it before my leave. Writing it as a todo for this afternoon... We can then refer to that from the "Failed to modify UD QP to INIT" known issue and give the code for the Lmod hook that can be used as a workaround for this particular issue.
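
To make the hook idea concrete, here is a rough sketch of the logic such a load hook would apply, written in shell to match the `export` lines in the diff. It is an illustration only: a real Lmod hook is written in Lua and registered through Lmod's hook mechanism, which is not shown here. The function name `apply_ompi_site_tuning` and the example module name are hypothetical. Only the two variables visible in the hunk above are set; the third variable mentioned in the comment is not visible in this diff, so it is left out.

```shell
#!/bin/sh
# Hypothetical helper mirroring, in shell, what a site-specific load hook
# would do: set OpenMPI MCA variables only when an OpenMPI module is loaded.
# This is NOT an Lmod hook; real hooks are Lua code registered with Lmod.
apply_ompi_site_tuning() {
    module_full_name="$1"   # e.g. "OpenMPI/4.1.5-GCC-12.3.0" (illustrative name)
    case "$module_full_name" in
        OpenMPI/*)
            # Values copied from the workaround shown in the diff.
            export OMPI_MCA_pml='ucx'
            export OMPI_MCA_mtl='^ofi'
            ;;
    esac
}
```

Calling `apply_ompi_site_tuning "OpenMPI/4.1.5-GCC-12.3.0"` exports both variables, while any other module name leaves the environment untouched. A hook has the same advantage over a plain `export` in a login script: the tuning applies only when OpenMPI is actually loaded.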

Collaborator

Docs PR for Lmod hooks made here. I'd simply put an example hook to resolve this issue under the existing `Failed to modify UD QP to INIT on mlx5_0: Operation not permitted` header, and reference the new docs on Lmod hooks for further information.

Collaborator Author

@casparvl Adjusted on the known issues page. Could you review it and, if everything is okay, merge it?

<div style="padding-left: 30px;">

<p>This error may occur when bugs resolving or site-specific tuning is needed for OpenMPI or UCX.</p>

<p>There is an issue on this problem opened with EESSI software layer repository.
See: https://github.com/EESSI/software-layer/issues/456</p>

<b>Workarounds</b>

<p>The workaround is to specify site properties and allow defining lmod hooks in host injections (see https://github.com/EESSI/software-layer/pull/525).
</div>
