Skip to content

Conversation

@ericjbohm
Copy link
Contributor

Add support for 8 NICs as found on Aurora.

@ericjbohm ericjbohm added this to the 8.0.1 milestone Nov 12, 2024
@ericjbohm ericjbohm self-assigned this Nov 12, 2024
Copy link
Contributor

@lvkale lvkale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a straightforward change, but since I have not looked at this code before:
does it handle only those 2 cases (1, 4 and 8)? I see the comment in code line 842 (near the assert) address this, but I am curious. I guess we will find out when the assert fails. What about cloud environments (or similar features coming on supercomputers) where a single node is allocated to multiple jobs, which each job presumably getting a subset of NICs.

@ericjbohm
Copy link
Contributor Author

This looks like a straightforward change, but since I have not looked at this code before: does it handle only those 2 cases (1, 4 and 8)? I see the comment in code line 842 (near the assert) address this, but I am curious. I guess we will find out when the assert fails. What about cloud environments (or similar features coming on supercomputers) where a single node is allocated to multiple jobs, which each job presumably getting a subset of NICs.

The four NIC case is special because the optimal ordering on machines like Frontier is linked to both the GPU and NIC, and a simple in order mapping [0,1,2,3] will be suboptimal. So, really the 8 cpu case is one that could probably be handled by a general approach with a carve out for 4. But there is only one such known configuration.

@ericjbohm ericjbohm requested review from adityapb, lvkale, mayantaylor and ritvikrao and removed request for adityapb, mayantaylor and ritvikrao February 13, 2025 18:05
@ericjbohm ericjbohm merged commit 2733e93 into main Feb 19, 2025
22 checks passed
@ritvikrao ritvikrao deleted the ofi-cxi-aurora branch June 10, 2025 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants