Summary
When process_batch in src/sampleworks/eval/generate_synthetic_sf.py is run with n_jobs > 1 and a CUDA device, multiple loky worker processes each initialise their own CUDA context, competing for GPU memory. This can cause OOM errors or severe performance degradation.
Background
Raised in a review comment on PR #234.
Even with multiple workers, the jobs likely all land on one or a small number of GPUs, so a more targeted fix may be to assign each worker a specific device rather than simply forcing n_jobs=1.
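A minimal sketch of per-worker device assignment, assuming the number of visible GPUs is known (for example from torch.cuda.device_count()). assign_worker_devices is a hypothetical helper, not existing sampleworks code: it maps job indices to device strings round-robin so that jobs spread across GPUs instead of all defaulting to the same one.

```python
from typing import List


def assign_worker_devices(n_jobs: int, n_gpus: int) -> List[str]:
    """Hypothetical helper: map each job index to a device string round-robin."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be a positive integer")
    if n_gpus < 1:
        # No GPUs visible: fall back to CPU for every job.
        return ["cpu"] * n_jobs
    # Job i goes to GPU (i mod n_gpus), e.g. cuda:0, cuda:1, cuda:0, ...
    return [f"cuda:{i % n_gpus}" for i in range(n_jobs)]
```

process_batch could then pass devices[i] into the i-th delayed(...) call and have the worker build its torch.device from that string. Because loky reuses worker processes, this assigns devices per task rather than per process, so a process that runs tasks mapped to different GPUs may end up holding a context on each of them. Round-robin also ignores uneven free memory across GPUs.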
Suggested approaches
- Detect device.type == "cuda" and, depending on available memory, either warn and cap n_jobs or assign each worker an explicit GPU device index.
- Consider exposing a per-worker device assignment strategy.
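A minimal sketch of the detect-and-cap approach. cap_n_jobs_for_cuda is a hypothetical helper; bytes_per_job is an assumed per-job memory estimate that would have to be measured, and free_bytes_per_gpu could be populated with torch.cuda.mem_get_info(i)[0] for each visible device index i. For CUDA runs it assumes n_jobs is already a positive integer (joblib's negative values such as -1 would need resolving first).

```python
import warnings
from typing import Sequence


def cap_n_jobs_for_cuda(
    n_jobs: int,
    device_type: str,
    free_bytes_per_gpu: Sequence[int],
    bytes_per_job: int,
) -> int:
    """Hypothetical helper: limit n_jobs to what free GPU memory can hold.

    bytes_per_job is an estimated per-job footprint, which should include
    the memory taken by each worker's own CUDA context.
    """
    if device_type != "cuda":
        # CPU runs are unaffected by GPU memory contention.
        return n_jobs
    if n_jobs < 1:
        raise ValueError("n_jobs must be a positive integer for CUDA runs")
    if bytes_per_job <= 0:
        raise ValueError("bytes_per_job must be positive")
    # Whole jobs that fit on each GPU, summed over the visible GPUs.
    capacity = sum(free // bytes_per_job for free in free_bytes_per_gpu)
    # Always allow at least one job so the batch can still run.
    capacity = max(1, capacity)
    if n_jobs > capacity:
        warnings.warn(
            f"n_jobs={n_jobs} exceeds estimated GPU capacity ({capacity}); "
            f"capping n_jobs to {capacity}.",
            RuntimeWarning,
            stacklevel=2,
        )
        return capacity
    return n_jobs
```

Summing capacity across GPUs is only valid if jobs are actually spread across devices; if every worker uses the default device, the capacity of that one device is the relevant limit.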
Requested by
@marcuscollins