compiler: Make nbytes available mapper aware of visible devices environment variables#2746

Merged
FabioLuporini merged 11 commits into main from deviceid on Oct 6, 2025
Conversation

EdCaunt (Contributor) commented Sep 26, 2025

Still needs tests.

FabioLuporini (Contributor) left a comment:

minor comments, but looks fine

```python
def visible_devices(self):
    device_vars = (
        'CUDA_VISIBLE_DEVICES',
        'ROCR_VISIBLE_DEVICES',
```
Contributor:

ROCR or ROCM?


Contributor:

jeez 😂 ...

OK, look, can you add ROCM_VISIBLE_DEVICES too? They might add it in the future...

Contributor Author:

It's not currently an option, so I don't think that's a good idea. If the user accidentally sets it, they may get unexpected, hard-to-debug behaviour: Devito will act as if visible devices have been set, but the ROCm runtime will not.

Contributor:

btw this could be a util inside arch/archinfo rather than a private method since AFAICT it's totally unrelated to self itself
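For illustration, such a free-standing util might look like the following sketch. This is not the PR's code: the module-level tuple and function names are hypothetical, and the HIP variable is assumed from the env vars exercised elsewhere in this PR.

```python
import os

# Env vars that restrict device visibility, in precedence order.
# 'HIP_VISIBLE_DEVICES' is an assumption based on the tests in this PR.
DEVICE_VARS = ('CUDA_VISIBLE_DEVICES', 'ROCR_VISIBLE_DEVICES',
               'HIP_VISIBLE_DEVICES')


def visible_devices(environ=None):
    """Return the visible device ids as a tuple of ints, or None if no
    (parseable) restriction is in effect."""
    environ = os.environ if environ is None else environ
    for v in DEVICE_VARS:
        try:
            return tuple(int(i) for i in environ[v].split(','))
        except (ValueError, KeyError):
            # KeyError: var not set; ValueError: e.g. UUID entries
            continue
    return None
```

Taking `environ` as a parameter (defaulting to `os.environ`) keeps the util easy to test without mutating the process environment.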

codecov bot commented Sep 26, 2025

Codecov Report

❌ Patch coverage is 72.34043% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.06%. Comparing base (c207ded) to head (35304b9).
⚠️ Report is 12 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| tests/test_gpu_common.py | 59.18% | 20 Missing ⚠️ |
| devito/operator/operator.py | 78.57% | 2 Missing and 1 partial ⚠️ |
| devito/parameters.py | 90.00% | 1 Missing and 1 partial ⚠️ |
| devito/arch/archinfo.py | 90.90% | 1 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #2746      +/-   ##
==========================================
- Coverage   83.06%   83.06%   -0.01%
==========================================
  Files         248      248
  Lines       50356    50438      +82
  Branches     4432     4437       +5
==========================================
+ Hits        41830    41897      +67
- Misses       7768     7782      +14
- Partials      758      759       +1
```
| Flag | Coverage Δ |
|---|---|
| pytest-gpu-aomp-amdgpuX | 68.74% <72.34%> (-0.01%) ⬇️ |
| pytest-gpu-nvc-nvidiaX | 69.29% <72.34%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

```python
op = Operator(eq)

argmap = op.arguments()
# deviceid should see the world from within CUDA_VISIBLE_DEVICES
```
Contributor:

I don't think that is the wanted behavior. DEVITO_DEVICEID is the id of the device, not the index within the visible devices, so this isn't compatible with what the configuration does and might lead to problems for users currently using deviceid.

Contributor Author:

Just checked current main, and this is the existing behaviour of DEVITO_DEVICEID in combination with CUDA_VISIBLE_DEVICES. If you set CUDA_VISIBLE_DEVICES="1,2" and DEVITO_DEVICEID=1, the kernel will run on device 2, due to how devices appear to CUDA programs inside an environment with CUDA_VISIBLE_DEVICES set. Essentially, for anything other than nvidia-smi, the visible devices appear renumbered from zero. See here for why this is the case.

As to whether this is the wanted behaviour, I would argue it is. Consider a scheduler which runs a job with two available GPUs out of a total four. This is presumably achieved under the hood with CUDA_VISIBLE_DEVICES. In that job a Devito script setting deviceid=0 is used. The intuitive behaviour of that script would be to use the first device available to the job. This would be device 0 so far as the job is concerned, albeit not necessarily device 0 on the whole node.
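The renumbering described above can be illustrated with a toy mapping, no CUDA required (the helper name is hypothetical):

```python
def physical_device(logical_id, cuda_visible_devices):
    """Map a logical device id (what a CUDA program sees when
    CUDA_VISIBLE_DEVICES is set) to the physical id that nvidia-smi
    reports. Illustrative sketch of the renumbering only."""
    visible = [int(i) for i in cuda_visible_devices.split(',')]
    return visible[logical_id]

# With CUDA_VISIBLE_DEVICES="1,2", logical ids 0 and 1 correspond to
# physical devices 1 and 2 respectively.
```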

Contributor:

yes, I agree

```python
assert argmap1._physical_deviceid == 1

# Make sure the switchenv doesn't somehow persist
for i in ("CUDA", "ROCR", "HIP"):
```
Contributor:

These are likely to break on most systems. You need to fetch it before switchenv, then check it's reverted to the original one.

Contributor Author:

Good point. I think I should actually split this out into a specific test for the switchenv class. It was mainly in anticipation of a possible silent-failure route.
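A standalone test along the reviewer's lines could record the original values first and assert they are restored afterwards. The context manager below is a minimal stand-in for the PR's env-switching helper, not Devito's actual switchenv:

```python
import os
from contextlib import contextmanager


@contextmanager
def switchenv(**updates):
    """Minimal stand-in for an env-switching helper: apply the given
    environment overrides, then restore the originals on exit."""
    saved = {k: os.environ.get(k) for k in updates}
    os.environ.update(updates)
    try:
        yield
    finally:
        for k, v in saved.items():
            if v is None:
                os.environ.pop(k, None)
            else:
                os.environ[k] = v


# The test pattern suggested: capture before, compare after
original = os.environ.get('CUDA_VISIBLE_DEVICES')
with switchenv(CUDA_VISIBLE_DEVICES='0,1'):
    assert os.environ['CUDA_VISIBLE_DEVICES'] == '0,1'
assert os.environ.get('CUDA_VISIBLE_DEVICES') == original
```

This works whether or not the variable was set beforehand, which is what makes it portable across systems.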

```python
rank = self.comm.Get_rank() if self.comm != MPI.COMM_NULL else 0

logical_deviceid = max(self.get('deviceid', 0), 0) + rank
if self._visible_devices is not None:
```
Contributor:

would be simpler with self._visible_devices.get(logical_deviceid, logical_deviceid) and just have _visible_devices return {}
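The suggested simplification, sketched with the visible-devices mapping as a plain dict argument (the function name is hypothetical):

```python
def physical_deviceid(logical_deviceid, visible_devices):
    """Map a logical id to a physical one. visible_devices maps
    logical -> physical ids and is {} when no restriction applies, so
    .get falls through to the logical id unchanged."""
    return visible_devices.get(logical_deviceid, logical_deviceid)
```

Returning `{}` from the mapping-provider removes the `is not None` branch entirely, which is the point of the reviewer's suggestion.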

```python
for v in device_vars:
    try:
        return tuple(int(i) for i in os.environ[v].split(','))
    except (ValueError, KeyError):
```
Contributor:

Should these be split?

  • ValueError -> there is an id so set it to os.environ[v].split(',').index(i)
  • KeyError -> no env var, no device

Contributor Author:

ValueError would be expected to coincide with device UUIDs set in CUDA_VISIBLE_DEVICES which aren't (and were not previously) parsed into integer IDs. This one should probably raise at least a warning to mention that the UUID is being ignored.

Alternatively, the UUID -> integer ID mapping could potentially be reverse-engineered from nvidia-smi somewhere in the device sniffing, but that may be overkill. Another approach would be to widen support for device UUIDs, but this, again, might be overkill.
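One way to split the two exceptions as discussed, warning when non-integer entries such as UUIDs are encountered (a sketch; the function name and warning text are hypothetical, not the PR's code):

```python
import warnings


def parse_visible_devices(environ, device_vars=('CUDA_VISIBLE_DEVICES',)):
    """Split the two failure modes: KeyError means the variable isn't
    set (try the next one); ValueError means unparseable entries such
    as device UUIDs, which are skipped with a warning."""
    for v in device_vars:
        try:
            raw = environ[v]
        except KeyError:
            continue  # Variable not set; fall through to the next one
        try:
            return tuple(int(i) for i in raw.split(','))
        except ValueError:
            warnings.warn(f"Non-integer entries in {v} (e.g. UUIDs) "
                          "are ignored")
            return None
    return None
```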

```python
# Get the physical device ID (as CUDA_VISIBLE_DEVICES may be set)
rank = self.comm.Get_rank() if self.comm != MPI.COMM_NULL else 0

logical_deviceid = max(self.get('deviceid', 0), 0) + rank
```
Contributor:

Wait, lots of users already pass a deviceid per rank; this is going to turn it into an id that doesn't exist.

Contributor Author:

Ah, hmm, should it be:

```python
logical_deviceid = max(self.get('deviceid', rank), 0)
```

then?

Currently if you just leave it as the default value, it checks available memory on the first device for every rank.

Contributor:

I think it should be self.get('deviceid', max(rank, 0)), keeping user input untouched.

Contributor Author:

I think desired behaviour would be:

  • If user sets deviceid=-1, use the MPI rank
  • If the user sets a nonnegative integer deviceid, use it with no modification
  • If default value obtained, or no deviceid found, use the MPI rank

So:

```python
logical_deviceid = self.get('deviceid', -1)
if logical_deviceid < 0:
    logical_deviceid = rank
```
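Those three rules can be sketched as a small standalone helper (hypothetical name, not the PR's code):

```python
def resolve_logical_deviceid(deviceid, rank):
    """Resolve the logical device id: a negative or missing deviceid
    means 'use the MPI rank'; a nonnegative user-supplied id is kept
    untouched."""
    if deviceid is None or deviceid < 0:
        return rank
    return deviceid
```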

```python
def _physical_deviceid(self):
    if isinstance(self.platform, Device):
        # Get the physical device ID (as CUDA_VISIBLE_DEVICES may be set)
        rank = self.comm.Get_rank() if self.comm != MPI.COMM_NULL else 0
```
Contributor:

Doesn't Get_rank return zero for COMM_NULL?

Contributor Author:

It appears not:

```python
>>> MPI.COMM_NULL.Get_rank()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "src/mpi4py/MPI.src/Comm.pyx", line 110, in mpi4py.MPI.Comm.Get_rank
mpi4py.MPI.Exception: MPI_ERR_COMM: invalid communicator
```


FabioLuporini merged commit a04b3af into main on Oct 6, 2025.
36 checks passed.
FabioLuporini deleted the deviceid branch on October 6, 2025 at 07:55.
3 participants