TODO: shm-size as a compute_worker .env setting
Use case
PyTorch uses shared memory for inter-process communication when processing data in parallel (e.g. DataLoader workers). On compute workers with many CPUs and plenty of RAM, it would be beneficial to be able to increase the shared memory available to submissions.
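To make the failure mode concrete: worker processes exchange data through the shared-memory mount (/dev/shm), and allocations beyond the container's shm limit fail. This is a minimal stdlib illustration of that shared-memory mechanism, not PyTorch-specific code:

```python
from multiprocessing import shared_memory

# Allocate a 1 MiB segment backed by /dev/shm on Linux. Inside a container,
# the total available space is capped by --shm-size (64 MB by default).
shm = shared_memory.SharedMemory(create=True, size=1024 * 1024)
try:
    shm.buf[:5] = b"hello"  # visible to any process that attaches
    # A second handle attaches by name, as another worker process would
    view = shared_memory.SharedMemory(name=shm.name)
    data = bytes(view.buf[:5])
    view.close()
finally:
    shm.close()
    shm.unlink()  # release the segment
```

When a PyTorch DataLoader with multiple workers does this at tensor scale, a 64 MB /dev/shm is quickly exhausted, which is what motivates raising the limit.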
Implementation
The --shm-size flag sets the size of the container's shared memory (/dev/shm); the engine default is 64 MB. The flag is currently not passed when creating the submission container:
https://github.com/codalab/codabench/blob/1c3d36618bc760963d5b1e37a599e1ef3254b481/compute_worker/compute_worker.py#L613C13-L613C13
engine_cmd = [
    CONTAINER_ENGINE_EXECUTABLE,
    'run',
    # Remove it after run
    '--rm',
    f'--name={self.ingestion_container_name if kind == "ingestion" else self.program_container_name}',
    # Don't allow subprocesses to raise privileges
    '--security-opt=no-new-privileges',
    # Set the volumes
    '-v', f'{self._get_host_path(program_dir)}:/app/program',
    '-v', f'{self._get_host_path(self.output_dir)}:/app/output',
    '-v', f'{self.data_dir}:/app/data:ro',
    # Start in the right directory
    '-w', '/app/program',
    # Don't buffer python output, so we don't lose any
    '-e', 'PYTHONUNBUFFERED=1',
]
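One possible shape for the .env-driven implementation: read the size from the worker's environment and only emit the flag when it is configured, so the engine default still applies otherwise. SHM_SIZE is an assumed variable name here, not an existing Codabench setting:

```python
import os


def shm_size_args(env=None):
    """Extra engine args for shared memory, if configured.

    SHM_SIZE is a hypothetical compute_worker .env variable name; the
    real name would be decided when implementing this issue.
    """
    env = os.environ if env is None else env
    size = env.get("SHM_SIZE")
    # Omit the flag entirely when unset, keeping the engine default (64 MB)
    return [f"--shm-size={size}"] if size else []
```

The call site would then splice the result into the existing list, e.g. `engine_cmd = [CONTAINER_ENGINE_EXECUTABLE, 'run', '--rm', *shm_size_args(), ...]`.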
Manual edit and test
I added '--shm-size', '32g' manually to engine_cmd in the script on some workers and it worked.
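To confirm the flag took effect, the mounted shared-memory size can be checked from inside the container (a quick sketch; paths assume Linux):

```python
import os


def shm_total_bytes(path="/dev/shm"):
    """Total size of the shared-memory mount, in bytes.

    With --shm-size=32g this should report roughly 32 GiB inside
    the submission container.
    """
    st = os.statvfs(path)
    return st.f_frsize * st.f_blocks
```

Equivalently, `df -h /dev/shm` inside the container shows the same figure.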