Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SmartSim Singularity Integration #204

Merged
merged 31 commits into from
Jun 9, 2022

Conversation

ben-albrecht
Copy link
Contributor

@ben-albrecht ben-albrecht commented Jun 8, 2022

This PR adds the ability for SmartSim to launch workloads in Singularity (Apptainer) as described in #101. It also lays the groundwork for supporting other container runtimes such as docker, shifter, and podman in the future, as well as launching the orchestrator in a container.

Design Variations

During development, it became clear that a few design changes from the original proposal were required. I have documented them below with their rationale:

1. Argument name: bind_paths -> mount

The terms bind path and mount are mostly used interchangeably across different container runtimes. When writing tests, I kept forgetting if it was bind_path or bind_paths, which hinted to me that it was not a great arg name, so I swapped it out for the more succinct and easy to remember mount.

2. create_run_settings(..., container: str) -> create_run_settings(..., container: Container)

We originally thought it would be easier for users to get started with containers by allowing them to pass a string into create_run_settings(container='singularity') instead of having to create a Container object. While writing tests, I realized that this was potentially very confusing for users because 1) the container arg types change between create_run_settings and RunSettings, and 2) if you need to add other metadata such as container args or container mount paths, you have to switch from using create_run_settings to RunSettings in your code, which is very annoying. Because creating Containers is so lightweight, I think it is best to keep the container interface consistent across all functions that accept them.

3. dropped getter/setter methods

Because command generation and validation happens upon execution, users can freely modify Container.args and Container.mount without getter/setter methods to update any other state. Therefore, I dropped these methods from the interface.

Testing

Added 2 test suites for containers: One for WLM testing and one for local testing. The local testing suite runs in GitHub actions. Due to the added time from building Singularity and pulling the container-testing image, the singularity testing only happens on one configuration of the build matrix: python 3.9 + redisai 1.2.5 on linux

@ben-albrecht ben-albrecht requested a review from ashao June 8, 2022 18:00
@codecov-commenter
Copy link

codecov-commenter commented Jun 8, 2022

Codecov Report

Merging #204 (6bee2cf) into develop (4299e7d) will increase coverage by 0.00%.
The diff coverage is 92.18%.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop     #204   +/-   ##
========================================
  Coverage    84.18%   84.18%           
========================================
  Files           58       59    +1     
  Lines         3244     3301   +57     
========================================
+ Hits          2731     2779   +48     
- Misses         513      522    +9     
Impacted Files Coverage Δ
smartsim/experiment.py 80.98% <ø> (ø)
smartsim/settings/containers.py 90.00% <90.00%> (ø)
smartsim/_core/launcher/step/localStep.py 90.00% <100.00%> (+0.71%) ⬆️
smartsim/settings/__init__.py 100.00% <100.00%> (ø)
smartsim/settings/base.py 94.44% <100.00%> (+0.15%) ⬆️
smartsim/settings/settings.py 65.62% <100.00%> (ø)
smartsim/_core/launcher/taskManager.py 79.88% <0.00%> (-2.37%) ⬇️

Copy link
Member

@ashao ashao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice PR @ben-albrecht . Super excited for this functionality to be in play. There's a few cosmetic changes with the only comment of substance being to limit the WLM to slurm for now OR test to make sure it works on PBS.

I'll submit approval for now so that you can merge in as you want after you add the tests. Please do make sure that the CI tests before merging in.

.github/workflows/run_tests.yml Outdated Show resolved Hide resolved
docker/testing/Dockerfile Outdated Show resolved Hide resolved
smartsim/settings/base.py Outdated Show resolved Hide resolved
smartsim/settings/base.py Outdated Show resolved Hide resolved
smartsim/settings/slurmSettings.py Outdated Show resolved Hide resolved
tests/on_wlm/test_containers_wlm.py Show resolved Hide resolved
tests/on_wlm/test_containers_wlm.py Outdated Show resolved Hide resolved
tests/test_containers.py Outdated Show resolved Hide resolved
tests/on_wlm/test_containers_wlm.py Outdated Show resolved Hide resolved
Ben Albrecht and others added 2 commits June 8, 2022 14:35
---
Signed-off-by: Ben Albrecht <ben-albrecht@users.noreply.github.com>
Ben Albrecht and others added 4 commits June 8, 2022 16:46
---
Signed-off-by: Ben Albrecht <ben-albrecht@users.noreply.github.com>
---
Signed-off-by: Ben Albrecht <ben-albrecht@users.noreply.github.com>
---
Signed-off-by: Ben Albrecht <ben-albrecht@users.noreply.github.com>
Copy link
Member

@ashao ashao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ben-albrecht for the changes and for the thoroughness of the tests and implementation. The PR description is also very helpful for us going forward to understand how and why some of the implementation deviated from the original design decisions. They are all fully justified and make sense to me.

@ben-albrecht ben-albrecht merged commit 1e686ea into CrayLabs:develop Jun 9, 2022
@ben-albrecht ben-albrecht deleted the singularity branch June 9, 2022 14:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants