
Submission container shared memory #1189

@Didayolo

Description


TODO

  • Add shm-size as a compute_worker .env setting

Use case

PyTorch uses shared memory for inter-process communication during parallel data loading. When compute workers have many CPUs and plenty of RAM, it would be beneficial to be able to increase the shared memory size.

Implementation

The --shm-size flag can be used to set the shared memory size; Docker's default is 64 MB. The flag is currently not passed when creating the submission container:

https://github.com/codalab/codabench/blob/1c3d36618bc760963d5b1e37a599e1ef3254b481/compute_worker/compute_worker.py#L613C13-L613C13

        engine_cmd = [
            CONTAINER_ENGINE_EXECUTABLE,
            'run',
            # Remove it after run
            '--rm',
            f'--name={self.ingestion_container_name if kind == "ingestion" else self.program_container_name}',

            # Don't allow subprocesses to raise privileges
            '--security-opt=no-new-privileges',

            # Set the volumes
            '-v', f'{self._get_host_path(program_dir)}:/app/program',
            '-v', f'{self._get_host_path(self.output_dir)}:/app/output',
            '-v', f'{self.data_dir}:/app/data:ro',

            # Start in the right directory
            '-w', '/app/program',

            # Don't buffer python output, so we don't lose any
            '-e', 'PYTHONUNBUFFERED=1',
        ]
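A minimal sketch of how the flag could be added conditionally from a worker setting. The setting name (SUBMISSION_SHM_SIZE) and the helper function are illustrative assumptions, not part of the existing codebase:

```python
import os

def build_engine_cmd(container_engine, container_name, shm_size=None):
    """Build a docker/podman `run` command, appending --shm-size only when set.

    `shm_size` accepts Docker size strings such as "512m" or "32g";
    when it is None or empty, Docker's 64 MB default applies.
    """
    cmd = [
        container_engine,
        'run',
        # Remove the container after run
        '--rm',
        f'--name={container_name}',
        # Don't allow subprocesses to raise privileges
        '--security-opt=no-new-privileges',
    ]
    if shm_size:
        cmd.append(f'--shm-size={shm_size}')
    return cmd

# The value could come from the compute worker's .env file, e.g.:
engine_cmd = build_engine_cmd(
    'docker',
    'submission',
    os.environ.get('SUBMISSION_SHM_SIZE'),  # hypothetical .env setting
)
```

Leaving the flag out entirely when the setting is absent preserves the current default behaviour for existing deployments.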

Manual edit and test

I manually added '--shm-size', '32g' to this list in the script on some workers, and it worked.
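To confirm the setting took effect, the size of the mounted shared-memory filesystem can be checked from inside the container. A Linux-only sketch using os.statvfs (the /dev/shm path is the conventional mount point, assumed here):

```python
import os

def shm_size_bytes(path="/dev/shm"):
    """Return the total size of the shared-memory filesystem in bytes.

    Linux-specific: assumes `path` is a mounted tmpfs such as /dev/shm.
    """
    st = os.statvfs(path)
    # Total size = fragment size * total number of blocks
    return st.f_frsize * st.f_blocks
```

With --shm-size set to 32g, this should report roughly 32 * 1024**3 bytes inside the submission container.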

Metadata


    Labels

    Enhancement (feature suggestions and improvements), Post-it (internal ideas)
