compiler: Make nbytes available mapper aware of visible devices environment variables #2746
FabioLuporini merged 11 commits into main
Conversation
FabioLuporini
left a comment
minor comments, but looks fine
devito/operator/operator.py
```python
def visible_devices(self):
    device_vars = (
        'CUDA_VISIBLE_DEVICES',
        'ROCR_VISIBLE_DEVICES',
```
jeez 😂 ...
OK, look, can you add ROCM_VISIBLE_DEVICES too? They might add it in the future...
It's not currently an option, so I don't think that's a good idea. If the user accidentally sets it, they may get unexpected, hard-to-debug behaviour, since Devito will act as if devices have been set, but the ROCm runtime will not.
btw this could be a util inside arch/archinfo rather than a private method since AFAICT it's totally unrelated to self itself
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##              main    #2746      +/-  ##
==========================================
- Coverage    83.06%   83.06%   -0.01%
==========================================
  Files          248      248
  Lines        50356    50438      +82
  Branches      4432     4437       +5
==========================================
+ Hits         41830    41897      +67
- Misses        7768     7782      +14
- Partials       758      759       +1
```
```python
op = Operator(eq)

argmap = op.arguments()
# deviceid should see the world from within CUDA_VISIBLE_DEVICES
```
I don't think that is the wanted behaviour. DEVITO_DEVICEID is the id of the device, not the index within the visible devices, so this isn't compatible with what the configuration does and might lead to problems for users currently using deviceid.
Just checked current main, and this is the existing behaviour of DEVITO_DEVICEID in combination with CUDA_VISIBLE_DEVICES. If you set CUDA_VISIBLE_DEVICES="1,2" and DEVITO_DEVICEID=1, the kernel will run on device 2, due to how devices appear to CUDA programs when CUDA_VISIBLE_DEVICES is set. Essentially, for anything other than nvidia-smi, the visible devices appear renumbered from zero. See here for why this is the case.
As to whether this is the wanted behaviour, I would argue it is. Consider a scheduler which runs a job with two available GPUs out of a total of four. This is presumably achieved under the hood with CUDA_VISIBLE_DEVICES. In that job, a Devito script setting deviceid=0 is used. The intuitive behaviour of that script would be to use the first device available to the job. This would be device 0 so far as the job is concerned, albeit not necessarily device 0 on the whole node.
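The renumbering described in this thread can be sketched as a small helper (hypothetical name, not Devito's actual code; integer IDs only, UUID entries not handled):

```python
import os

def physical_deviceid(logical_deviceid):
    """Map a logical device index to the physical ID seen by nvidia-smi.

    With CUDA_VISIBLE_DEVICES="1,2", the CUDA runtime renumbers the visible
    devices from zero, so logical device 1 is physical device 2.
    """
    visible = os.environ.get('CUDA_VISIBLE_DEVICES')
    if visible is None:
        # No restriction in place: logical and physical IDs coincide
        return logical_deviceid
    mapping = [int(i) for i in visible.split(',')]
    return mapping[logical_deviceid]
```

With CUDA_VISIBLE_DEVICES="1,2", `physical_deviceid(1)` yields 2, matching the scheduler scenario above.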
tests/test_gpu_common.py
```python
assert argmap1._physical_deviceid == 1

# Make sure the switchenv doesn't somehow persist
for i in ("CUDA", "ROCR", "HIP"):
```
These are likely to break on most systems. You need to fetch it before switchenv, then check it's reverted to the original one.
Good point. I think I should actually split this out into a specific test for the switchenv class. It was mainly in anticipation of a possible silent failure route
devito/operator/operator.py
```python
rank = self.comm.Get_rank() if self.comm != MPI.COMM_NULL else 0

logical_deviceid = max(self.get('deviceid', 0), 0) + rank
if self._visible_devices is not None:
```
Would be simpler with `self._visible_devices.get(logical_deviceid, logical_deviceid)`, and just have `_visible_devices` return `{}`.
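A sketch of the suggested simplification, assuming a hypothetical `visible_devices`-style helper that returns a `{logical: physical}` dict, or `{}` when no variable is set:

```python
import os

def visible_devices():
    """Return {logical_index: physical_id}, or {} if unset or unparsable."""
    value = os.environ.get('CUDA_VISIBLE_DEVICES')
    if value is None:
        return {}
    try:
        return {n: int(i) for n, i in enumerate(value.split(','))}
    except ValueError:
        # Non-integer entries (e.g. device UUIDs): treat as unset
        return {}

def physical_deviceid(logical_deviceid):
    # The dict-with-default collapses the lookup to a single expression
    return visible_devices().get(logical_deviceid, logical_deviceid)
```

Returning `{}` rather than `None` means the `.get(key, default)` fallback covers both the unset-variable and out-of-range cases without a separate branch.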
devito/operator/operator.py
```python
for v in device_vars:
    try:
        return tuple(int(i) for i in os.environ[v].split(','))
    except (ValueError, KeyError):
```
Should these be split?
- ValueError -> there is an id, so set it to `os.environ[v].split(',').index(i)`
- KeyError -> no env var, no device
ValueError would be expected to coincide with device UUIDs set in CUDA_VISIBLE_DEVICES which aren't (and were not previously) parsed into integer IDs. This one should probably raise at least a warning to mention that the UUID is being ignored.
Alternatively, the UUID -> integer ID mapping could potentially be reverse-engineered from nvidia-smi somewhere in the device sniffing, but that may be overkill. Another approach would be to widen support for device UUIDs, but this, again, might be overkill.
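One way to split the two exceptions as discussed, with a warning when UUID-style entries are skipped (a sketch, not the PR's final code; `parse_visible_devices` is a hypothetical name):

```python
import os
import warnings

DEVICE_VARS = ('CUDA_VISIBLE_DEVICES', 'ROCR_VISIBLE_DEVICES')

def parse_visible_devices():
    """Return a tuple of integer device IDs from the first set variable.

    KeyError (variable not set) moves on to the next candidate; ValueError
    (e.g. GPU UUIDs rather than integers) warns and ignores the variable.
    """
    for v in DEVICE_VARS:
        try:
            return tuple(int(i) for i in os.environ[v].split(','))
        except KeyError:
            # Variable not set; try the next one
            continue
        except ValueError:
            warnings.warn(f"Non-integer entries in {v} (UUIDs?); ignoring it")
            return None
    return None
```

This keeps the integer-only parsing of the original code while making the UUID case visible to the user instead of silently falling through.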
devito/operator/operator.py
```python
# Get the physical device ID (as CUDA_VISIBLE_DEVICES may be set)
rank = self.comm.Get_rank() if self.comm != MPI.COMM_NULL else 0

logical_deviceid = max(self.get('deviceid', 0), 0) + rank
```
Wait, currently lots of users pass a deviceid per rank already; this is gonna make it into an id that doesn't exist.
Ah, hmm, should it be:
```python
logical_deviceid = max(self.get('deviceid', rank), 0)
```
then?
Currently, if you just leave it as the default value, it checks available memory on the first device for every rank.
I think it should be `self.get('deviceid', max(rank, 0))`, keeping user input untouched.
I think desired behaviour would be:
- If user sets deviceid=-1, use the MPI rank
- If user sets a nonnegative integer deviceid, use it with no modification
- If default value obtained, or no deviceid found, use the MPI rank

So:
```python
logical_deviceid = self.get('deviceid', -1)
if logical_deviceid < 0:
    logical_deviceid = rank
```
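Those fallback rules can be exercised in isolation (a plain dict stands in for the argument map; the helper name is hypothetical):

```python
def resolve_logical_deviceid(args, rank):
    """Explicit nonnegative deviceid wins; -1 or absent falls back to the rank."""
    logical_deviceid = args.get('deviceid', -1)
    if logical_deviceid < 0:
        logical_deviceid = rank
    return logical_deviceid
```

This leaves an explicitly set nonnegative deviceid untouched, while both the sentinel -1 and a missing value fall back to the MPI rank.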
devito/operator/operator.py
```python
def _physical_deviceid(self):
    if isinstance(self.platform, Device):
        # Get the physical device ID (as CUDA_VISIBLE_DEVICES may be set)
        rank = self.comm.Get_rank() if self.comm != MPI.COMM_NULL else 0
```
Doesn't Get_rank return zero for COMM_NULL?
It appears not:
```
>>> MPI.COMM_NULL.Get_rank()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "src/mpi4py/MPI.src/Comm.pyx", line 110, in mpi4py.MPI.Comm.Get_rank
mpi4py.MPI.Exception: MPI_ERR_COMM: invalid communicator
```
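The guard in the reviewed line exists precisely because `Get_rank()` raises on the null communicator; the pattern can be illustrated with stand-in classes (stubs for illustration, not real mpi4py objects):

```python
class NullComm:
    """Stand-in for MPI.COMM_NULL: Get_rank() raises, as in real MPI."""
    def Get_rank(self):
        raise RuntimeError("MPI_ERR_COMM: invalid communicator")

class WorldComm:
    """Stand-in for a valid communicator with a fixed rank."""
    def Get_rank(self):
        return 3

COMM_NULL = NullComm()

def safe_rank(comm):
    # Mirrors the reviewed guard: never call Get_rank() on COMM_NULL
    return comm.Get_rank() if comm is not COMM_NULL else 0
```

The comparison must happen before the call, since the exception is raised by `Get_rank()` itself rather than signalled through a sentinel return value.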
Still needs tests.