Update known issue about lmod hook in host-injection #183

Merged
merged 5 commits into from
Jun 25, 2024

Conversation

xinan1911
Collaborator

No description provided.

@@ -26,3 +26,17 @@ export OMPI_MCA_pml='ucx'
export OMPI_MCA_mtl='^ofi'
```
</div>

### `Bug in EESSI initialization and priority mechanisms: site OpenMPI or UCX not loaded`
Collaborator

@casparvl casparvl Jun 4, 2024

I don't think these are two separate issues, right? I mean, #456 was about site-specific tuning in general, but it only became an 'issue' when people hit the Failed-to-modify error. Shouldn't we just add to the "Failed to modify UD QP to INIT... etc" known issue that the OMPI_MCA_* environment variables can be set in host_injections/...?

I was browsing the docs, and I think we should have a general explanation of host_injections under Advanced usage, i.e. that this directory is used for site-specific tuning. Then, in a paragraph there, we can add how it can be used to execute Lmod hooks for site-specific tuning. We can give an example of a hook there (might as well use the example of setting these three OpenMPI environment variables). It's a good point; I wanted to write this documentation, but didn't get to it before my leave. Writing it down as a todo for this afternoon... We can then refer to that from the "Failed to modify UD QP to INIT" known issue and give the code for the Lmod hook that can be used as a workaround for this particular issue.
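
As a rough illustration of the workaround being discussed, the sketch below applies the two Open MPI settings visible in the diff context above (`OMPI_MCA_pml` and `OMPI_MCA_mtl`) by hand in a shell. It is not the Lmod hook itself, which would set such variables automatically when an OpenMPI module is loaded. The function name `apply_ompi_workaround` is hypothetical, and keeping any value the site has already set is an assumption of this sketch, not something the known-issue text prescribes. Variables outside the visible diff context are left out.

```shell
#!/bin/sh
# Hypothetical helper: apply the two Open MPI MCA settings shown in the diff.
apply_ompi_workaround() {
    # ${VAR:=value} assigns the default only when VAR is unset or empty,
    # so a value already chosen by the site is left untouched (assumption).
    : "${OMPI_MCA_pml:=ucx}"
    : "${OMPI_MCA_mtl:=^ofi}"
    export OMPI_MCA_pml OMPI_MCA_mtl
}

apply_ompi_workaround
echo "OMPI_MCA_pml=$OMPI_MCA_pml OMPI_MCA_mtl=$OMPI_MCA_mtl"
```

Sourcing a file like this before launching an MPI job should have the same effect as the `export` lines in the diff, as long as the variables were not set beforehand.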

Collaborator

Docs PR for Lmod hooks made here. I'd simply put an example hook to resolve this issue under the existing Failed to modify UD QP to INIT on mlx5_0: Operation not permitted header, and reference the new docs on Lmod hooks for further information.

Collaborator Author

@casparvl Adjusted on the known issues pages. Could you review it and, if everything is okay, merge it?

@laraPPr
Collaborator

laraPPr commented Jun 12, 2024

@xinan1911 could you also look at mkdocs.yml? The menu now has v2022/02 and pilot, but there are no known issues for those repositories and no pages for those repos.

@xinan1911
Collaborator Author

> @xinan1911 could you also look at mkdocs.yml? The menu now has v2022/02 and pilot, but there are no known issues for those repositories and no pages for those repos.

Adjusted! Those pages have been removed from mkdocs.yml.

@laraPPr
Collaborator

laraPPr commented Jun 19, 2024

@Xin the CI is failing because you are behind on main

Co-authored-by: Caspar van Leeuwen <33718780+casparvl@users.noreply.github.com>
@xinan1911
Collaborator Author

xinan1911 commented Jun 19, 2024

> @Xin the CI is failing because you are behind on main

@laraPPr
The build error message shows:

INFO    -  Doc file 'test-suite/ReFrame-configuration-file.md' contains a link '#RFM_CONFIG_FILES', but there is no such anchor on this page.

INFO    -  Doc file 'test-suite/ReFrame-configuration-file.md' contains a link '#RFM_PREFIX', but there is no such anchor on this page.

INFO    -  Doc file 'test-suite/installation-configuration.md' contains a link '#reframe-config-file', but there is no such anchor on this page.

INFO    -  Doc file 'test-suite/installation-configuration.md' contains a link '#logging', but there is no such anchor on this page.

INFO    -  Doc file 'test-suite/release-notes.md' contains a link 'installation-configuration.md#reframe-config-file', but the doc 'test-suite/installation-configuration.md' does not contain an anchor '#reframe-config-file'.

INFO    -  Doc file 'test-suite/release-notes.md' contains a link 'installation-configuration.md#logging', but the doc 'test-suite/installation-configuration.md' does not contain an anchor '#logging'.

INFO    -  Doc file 'test-suite/release-notes.md' contains a link 'installation-configuration.md#partitions', but the doc 'test-suite/installation-configuration.md' does not contain an anchor '#partitions'.

INFO    -  Doc file 'test-suite/release-notes.md' contains a link 'usage.md#gromacs', but the doc 'test-suite/usage.md' does not contain an anchor '#gromacs'.

INFO    -  Doc file 'test-suite/release-notes.md' contains a link 'usage.md#tensorflow', but the doc 'test-suite/usage.md' does not contain an anchor '#tensorflow'.

INFO    -  Doc file 'test-suite/usage.md' contains a link '#Configuring-ReFrame', but there is no such anchor on this page.

INFO    -  Doc file 'test-suite/usage.md' contains a link 'installation-configuration.md#logging', but the doc 'test-suite/installation-configuration.md' does not contain an anchor '#logging'

Caspar has already fixed these in his PR https://github.com/EESSI/docs/pull/188/files#diff-679f410211edd9a310548400e25680c354062d32ab8a7df0c963d42ac85d2da8, which hasn't been merged yet. Maybe his PR can be merged before this one.

@laraPPr
Collaborator

laraPPr commented Jun 20, 2024

I was referring to these warnings

WARNING -  A reference to 'available_software/overview.md' is included in the 'nav' configuration, which is not found in the documentation files.
WARNING -  A reference to 'getting_access/eessi_wsl.md' is included in the 'nav' configuration, which is not found in the documentation files.
WARNING -  A reference to 'getting_access/eessi_limactl.md' is included in the 'nav' configuration, which is not found in the documentation files.
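
These `nav` warnings can be checked without a full build by testing whether each listed path exists on disk. A minimal shell sketch, assuming the Markdown sources live under a `docs/` directory (the MkDocs default) and using a hypothetical function name `check_nav_paths`:

```shell
#!/bin/sh
# Hypothetical helper: report nav entries that have no matching file.
# Usage: check_nav_paths DOCS_DIR PATH...
check_nav_paths() {
    docs_dir=$1
    shift
    missing=0
    for rel in "$@"; do
        if [ ! -f "$docs_dir/$rel" ]; then
            echo "missing: $rel"
            missing=1
        fi
    done
    # Non-zero exit status when at least one path is missing.
    return "$missing"
}

# The three paths named in the warnings above.
check_nav_paths docs \
    available_software/overview.md \
    getting_access/eessi_wsl.md \
    getting_access/eessi_limactl.md || echo "nav has stale entries"
```

Running `mkdocs build --strict` is another option, since it turns such warnings into build failures.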

Collaborator

@casparvl casparvl left a comment

Looks good to me!

@casparvl casparvl merged commit 78134c3 into main Jun 25, 2024
2 checks passed
@bedroge bedroge deleted the xinan1911-patch-1 branch June 25, 2024 08:56