Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow pinning of the co-located database #306

Merged
merged 16 commits into from
Jul 1, 2023

Conversation

ashao
Copy link
Member

@ashao ashao commented Jun 21, 2023

This updates the strategy for adding a co-located deployment for simulation and database. Prior, the co-located database was always pinned to the last N logical processors on a machine. In the case of a Slurm machine, user-specified bind settings were being overwritten leading to a general inflexibility for users trying to maximize performance. Additionally, limit_app_cpus was not working as intended because only the launcher command was being pinned and not the application launched by the launcher command.

The changes here provide two options, limit_db_cpus and db_cpu_list to control the pinning of the co-located database. limit_db_cpus enables the pinning using taskset and db_cpu_list is a user-provided string that specifies which processors the orchestrator should be bound to. If db_cpu_list is not provided, the database is automatically pinned to the first N logical processors.

@ashao ashao requested review from MattToast and mellis13 June 21, 2023 23:13
Copy link
Contributor

@mellis13 mellis13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some initial comments as you work on getting the CI/CD working

smartsim/entity/model.py Outdated Show resolved Hide resolved
smartsim/entity/model.py Outdated Show resolved Hide resolved
smartsim/entity/model.py Outdated Show resolved Hide resolved
smartsim/entity/model.py Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
doc/orchestrator.rst Outdated Show resolved Hide resolved
tests/on_wlm/test_colocated_model.py Outdated Show resolved Hide resolved
tests/test_colo_model_local.py Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Jun 26, 2023

Codecov Report

Merging #306 (d4dbb41) into develop (f743d71) will increase coverage by 0.33%.
The diff coverage is 95.12%.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #306      +/-   ##
===========================================
+ Coverage    86.78%   87.12%   +0.33%     
===========================================
  Files           59       59              
  Lines         3482     3518      +36     
===========================================
+ Hits          3022     3065      +43     
+ Misses         460      453       -7     
Impacted Files Coverage Δ
smartsim/entity/model.py 95.89% <94.28%> (-0.51%) ⬇️
smartsim/_core/launcher/colocated.py 94.44% <100.00%> (+2.22%) ⬆️
smartsim/_core/launcher/step/step.py 100.00% <100.00%> (+2.70%) ⬆️

... and 1 file with indirect coverage changes

Copy link
Member

@MattToast MattToast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gave some initial thoughts in a quick first pass while the cli was getting sorted out; lmk what you think!

conftest.py Outdated Show resolved Hide resolved
conftest.py Show resolved Hide resolved
conftest.py Outdated Show resolved Hide resolved
conftest.py Outdated Show resolved Hide resolved
smartsim/_core/launcher/colocated.py Outdated Show resolved Hide resolved
smartsim/entity/model.py Outdated Show resolved Hide resolved
smartsim/entity/model.py Outdated Show resolved Hide resolved
smartsim/entity/model.py Outdated Show resolved Hide resolved
tests/conftest/setup_test_colo/colocated_model.out Outdated Show resolved Hide resolved
smartsim/entity/model.py Outdated Show resolved Hide resolved
Copy link
Member

@MattToast MattToast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more initial thoughts for you while we sort out the CI

smartsim/_core/launcher/step/step.py Outdated Show resolved Hide resolved
smartsim/entity/model.py Outdated Show resolved Hide resolved
smartsim/entity/model.py Outdated Show resolved Hide resolved
smartsim/_core/launcher/colocated.py Outdated Show resolved Hide resolved
smartsim/entity/model.py Outdated Show resolved Hide resolved
smartsim/_core/launcher/step/step.py Outdated Show resolved Hide resolved
tests/on_wlm/test_colocated_model.py Outdated Show resolved Hide resolved
smartsim/entity/model.py Outdated Show resolved Hide resolved
tests/test_colo_model_local.py Outdated Show resolved Hide resolved
smartsim/entity/model.py Show resolved Hide resolved
Copy link
Member

@MattToast MattToast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (pending CI and Docs)!! Thanks for all the hard-work on this one!

"--loadmodule",
CONFIG.redisai
]
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh perfectly styled! :)

@ashao ashao dismissed mellis13’s stale review July 1, 2023 00:14

Questions were addressed offline and current implementation was carefully checked with Toast

@ashao ashao merged commit 6a57e1e into CrayLabs:develop Jul 1, 2023
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants