Redis starter script for Orchestrator #73
Merged
Co-authored-by: Eric Gustin <eric.gustin@hpe.com>
SmartRedis was prematurely updated to 0.2.0; that update has been reverted. Also added back are redis-py-cluster and redis-py, which the orchestrator is_active and check_cluster_status functions use.
Since the redis_starter.py script provides the IP address given a network interface specified by the user, we no longer need the RedisIP module built at startup. The IP address is now parsed from the output of that script.
Prefixed SmartSim hyperlink targets in order to avoid duplicate target name conflicts with SmartRedis
Would it be possible to add an interface to be used for tests as we do for launchers and accounts? So that a user could do something along the lines of (also: it might already be in the code base and I might have just overlooked it)
The database now supports binding to a network interface through the interface argument on the Orchestrator init. This commit adds SMARTSIM_TEST_INTERFACE to the config so that the tests can switch interfaces between machines.
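As a minimal sketch of how a test configuration might pick up SMARTSIM_TEST_INTERFACE, the snippet below reads the variable from the environment. The helper name get_test_interface and the fallback value "lo" are hypothetical and only for illustration; the actual config code in this PR may differ.

```python
import os


def get_test_interface(default: str = "lo") -> str:
    """Return the network interface the tests should bind the database to.

    SMARTSIM_TEST_INTERFACE is the variable named in the commit message.
    The default of "lo" is an assumption made for this sketch only.
    """
    return os.environ.get("SMARTSIM_TEST_INTERFACE", default)
```

A test could then pass the returned name as the interface argument when it creates an Orchestrator, so the same test suite runs on machines with different network interfaces.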
Edit Experiment API Edit Entity API Edit Orchestrator API
Edit RunSettings Edit Slurm
Some Slurm systems with InfiniBand networks cause problems with the socket library IP lookup due to the listing in /etc/hosts. To get around this, the Slurm launcher is no longer responsible for obtaining hostnames for the database. SmartSim now relies on the redis_starter script and output files for all launchers (this was already the case for all launchers except Slurm).
Labels
area: orchestrator
Issues related to the Orchestrator API, launch, and runtime
type: feature
Issues that include feature request or feature idea
type: refactor
Issues focused on refactoring existing code
Recently it was discovered that redis will often change the IP address of a shard unless it is bound to a specific IP address.
To address this, each Orchestrator has been adapted to use the redis_starter.py script, which binds the redis instance to a specific address. Essentially, this is a Python script that takes in a command and does a lookup on the network interface specified by the user in order to bind the server to a specific address.
Also in this PR is a change back to redis-py and redis-py-cluster for the orchestrator functions is_active and check_cluster_status. This was after a bug was discovered in clang/pybind/cray PrgEnv. As there isn't much we can do about the bug at the moment, we now require both dependencies in the library.
TODO
All tests pass (gpu and cpu) on Horizon and MacOS.
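The lookup-and-bind idea described above can be sketched roughly as follows. This is not the redis_starter.py from this PR: the function names ip_for_interface and bind_command are hypothetical, and the lookup uses the Linux-only SIOCGIFADDR ioctl from the standard library as a stand-in for whatever lookup the real script performs. The only redis-specific assumption is that redis-server accepts a --bind option on its command line.

```python
import fcntl
import socket
import struct
from typing import List, Sequence

SIOCGIFADDR = 0x8915  # Linux ioctl request code: get an interface's IPv4 address


def ip_for_interface(ifname: str) -> str:
    """Return the IPv4 address assigned to the interface `ifname` (Linux only)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # The kernel expects an ifreq struct whose first 16 bytes hold the name.
        request = struct.pack("256s", ifname[:15].encode())
        reply = fcntl.ioctl(sock.fileno(), SIOCGIFADDR, request)
    # Bytes 20..24 of the reply hold the packed IPv4 address.
    return socket.inet_ntoa(reply[20:24])


def bind_command(command: Sequence[str], ip_address: str) -> List[str]:
    """Append a --bind option so the server listens on one fixed address."""
    return [*command, "--bind", ip_address]
```

For example, bind_command(["redis-server", "--port", "6379"], ip_for_interface("ib0")) would produce a command line that pins the server to the address of the ib0 interface, and the resolved address could be printed so the caller can parse it from the script output.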