
Submission container shared memory #1189

@Didayolo

Description


TODO

  • Add shm-size as a compute_worker .env setting

Use case

PyTorch uses shared memory for inter-process communication during parallel data loading. When compute workers have many CPUs and plenty of RAM, it would be beneficial to be able to increase the shared memory size.

Implementation

The --shm-size flag can be used to set the shared memory size; Docker's default is 64 MB. The flag is currently not passed when creating the submission container:

https://github.com/codalab/codabench/blob/1c3d36618bc760963d5b1e37a599e1ef3254b481/compute_worker/compute_worker.py#L613C13-L613C13

        engine_cmd = [
            CONTAINER_ENGINE_EXECUTABLE,
            'run',
            # Remove it after run
            '--rm',
            f'--name={self.ingestion_container_name if kind == "ingestion" else self.program_container_name}',

            # Don't allow subprocesses to raise privileges
            '--security-opt=no-new-privileges',

            # Set the volumes
            '-v', f'{self._get_host_path(program_dir)}:/app/program',
            '-v', f'{self._get_host_path(self.output_dir)}:/app/output',
            '-v', f'{self.data_dir}:/app/data:ro',

            # Start in the right directory
            '-w', '/app/program',

            # Don't buffer python output, so we don't lose any
            '-e', 'PYTHONUNBUFFERED=1',
        ]
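A minimal sketch of how the flag could be added conditionally from a worker setting. The setting name (SUBMISSION_SHM_SIZE) and the helper function are illustrative assumptions, not part of the existing codebase:

```python
import os

def build_engine_cmd(container_engine, container_name, shm_size=None):
    """Build a docker/podman `run` command, appending --shm-size only when set.

    `shm_size` accepts Docker size strings such as "512m" or "32g";
    when it is None or empty, Docker's 64 MB default applies.
    """
    cmd = [
        container_engine,
        'run',
        # Remove the container after run
        '--rm',
        f'--name={container_name}',
        # Don't allow subprocesses to raise privileges
        '--security-opt=no-new-privileges',
    ]
    if shm_size:
        cmd.append(f'--shm-size={shm_size}')
    return cmd

# The value could come from the compute worker's .env file, e.g.:
engine_cmd = build_engine_cmd(
    'docker',
    'submission',
    os.environ.get('SUBMISSION_SHM_SIZE'),  # hypothetical .env setting
)
```

Leaving the flag out entirely when the setting is absent preserves the current default behaviour for existing deployments.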

Manual edit and test

I manually added '--shm-size', '32g' to this list in the script on some workers, and it worked.
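To confirm the setting took effect, the size of the mounted shared-memory filesystem can be checked from inside the container. A Linux-only sketch using os.statvfs (the /dev/shm path is the conventional mount point, assumed here):

```python
import os

def shm_size_bytes(path="/dev/shm"):
    """Return the total size of the shared-memory filesystem in bytes.

    Linux-specific: assumes `path` is a mounted tmpfs such as /dev/shm.
    """
    st = os.statvfs(path)
    # Total size = fragment size * total number of blocks
    return st.f_frsize * st.f_blocks
```

With --shm-size set to 32g, this should report roughly 32 * 1024**3 bytes inside the submission container.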

Metadata


    Labels

    Enhancement (feature suggestions and improvements), Post-it (internal ideas)
