Add config filter parameter to simulate_factors() #14
Conversation
Add a new optional `only_factor` parameter for the `simulate_factors()` method. When the `only_factor` parameter is specified, only a single simulation is run: the one for the factor at the index given by the value of the `only_factor` argument.

For example, if the snowbird command-line arguments look like this:

```
meta.cmd_line: sb -f host.queue_depth 4,16 -f fm.fim.xfer_rate_MBps 400,800 -f fm.ase.t_wait_rd 4,5,6 -s duration 10 ms --only-factor 4
```

then snowbird will pass `only_factor=4` as the argument to desmod, and `meta.sim.index` and `meta.sim.workspace` will be set to `4` and `workspace/4` respectively. Additionally, `meta.sim.special` will look like this:

```
meta.sim.special:
- [host.queue_depth, 4]
- [fm.fim.xfer_rate_MBps, 800]
- [fm.ase.t_wait_rd, 5]
```

That is, only a single simulation, for the factor indicated by `meta.sim.special`, was run.

One use for this new feature is orchestrating the execution of multiple per-factor simulations on separate machines (e.g., in the cloud) with a single command (with multiple factors).
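The index arithmetic behind the example above can be sketched as follows; `select_factor` and its factor-list format are illustrative stand-ins, not desmod's or snowbird's actual API:

```python
from itertools import product

def select_factor(factors, only_factor):
    # Unroll each factor into (key, value) pairs, then index into the
    # cartesian product: a toy sketch of the behavior described above,
    # not the real implementation.
    unrolled = [[(key, value) for value in values] for key, values in factors]
    combos = list(product(*unrolled))
    if only_factor >= len(combos):
        raise ValueError('No factor at index {}'.format(only_factor))
    return list(combos[only_factor])

factors = [
    ('host.queue_depth', [4, 16]),
    ('fm.fim.xfer_rate_MBps', [400, 800]),
    ('fm.ase.t_wait_rd', [4, 5, 6]),
]
special = select_factor(factors, 4)  # the --only-factor 4 example above
```

With the factors above, index 4 selects `host.queue_depth=4`, `xfer_rate_MBps=800`, `t_wait_rd=5`, matching the `meta.sim.special` example.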
Some new config options have been added:

- `sim.workspace.s3_sync`: When true, syncs any workspaces to S3.
- `sim.workspace.s3_bucket`: The S3 bucket to sync to.
- `sim.workspace.s3_sync_prefix`: A string to act as the root path in the S3 bucket. By default, this is set to the user's username.

Here's an example of running snowbird with these new options:

```
$ sb -s sim.workspace ~/locker/test -s sim.workspace.s3_sync_prefix bgmerrell -n test-rd-rnd -s sim.duration 10ms -s sim.workspace.s3_sync True -f host.queue_depth 4,16
```

This would result in the following being synced to the default bucket in S3:

```
/bgmerrell/test/0/config.yaml
/bgmerrell/test/0/result.yaml
/bgmerrell/test/0/sim.log
/bgmerrell/test/1/config.yaml
/bgmerrell/test/1/result.yaml
/bgmerrell/test/1/sim.log
```

The method that does the sync work is called when the `_Workspace` context manager is exiting. The uploading to S3 is done using the `concurrent.futures` package. This package isn't in the Python 2 standard library, so the requirements.txt file has been updated to install the backport for Python 2 only.
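The concurrent upload described above can be sketched with `concurrent.futures`; `sync_artifacts` and its `upload` callable are illustrative stand-ins for the PR's actual code (with boto3, `upload` might wrap `client.upload_file(path, bucket, key)`):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def sync_artifacts(artifacts, upload, max_workers=8):
    """Upload workspace artifacts concurrently.

    `upload` is any callable taking (local_path, key). This is an
    illustrative sketch, not the PR's actual implementation.
    """
    # Derive S3 object keys from the local relative paths.
    keys = ['/'.join(path.split(os.sep)) for path in artifacts]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() is lazy; wrap in list() so upload exceptions surface here.
        list(pool.map(upload, artifacts, keys))
```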
I like this capability, but there are several wrinkles to smooth out.
desmod/simulation.py
Outdated
```python
        os.chdir(self.orig_dir)

    def sync(self):
        if self.config['sim.workspace.s3_sync']:
```
Within desmod, we want to avoid fatal effects if the user provides an under-specified config. And when the user under-specifies and desmod uses a default, we want the config dict to be updated to reflect the default that desmod used--i.e. desmod's implicit defaults become explicit.
To that end, we should use `self.config.setdefault()` to interrogate the config dict here.
W.r.t. the config key name, I also advocate that it be `sim.workspace.s3_sync.enable`. The ".enable" suffix has a lot of precedent in other config keys in the "sim" namespace. We could probably make a good case for some other naming strategies:

```python
self.config.setdefault('sim.workspace.s3_sync.enable', False)
self.config.setdefault('sim.s3_sync.enable', False)
self.config.setdefault('sim.sync.s3.enable', False)
```

If we are going to have other workspace synchronization methods, then the last option creates a clean namespace.
N.B. that the two other new config items should fit into this scheme, e.g. `sim.sync.s3.bucket` and `sim.sync.s3.prefix`.
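The `setdefault()` pattern being advocated can be shown in a minimal sketch (the config dict contents here are illustrative):

```python
# A user-supplied, under-specified config dict.
config = {'sim.workspace': 'workspace'}

# setdefault() both reads the effective value and makes desmod's
# implicit default explicit in the config dict, so the config that is
# saved with the results reflects what desmod actually used.
enabled = config.setdefault('sim.sync.s3.enable', False)
```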
> self.config.setdefault('sim.sync.s3.enable', False)

Cool. My original approach was to use `sim.workspace.sync.s3.enable`, but I changed my mind because I thought it might match existing patterns better to be able to simply specify `-s s3_sync True` on the snowbird command line. I'm happy to be wrong on this one.

One question: do you think I should include "workspace" in `sim.workspace.sync.s3.enable` now that the syncing takes place in `_Workspace`, or is that an unnecessary coupling?
My bias is to use `sim.sync.s3.` as the base namespace. In addition to being decoupled from `sim.workspace`, it also ends up being less typing and a shallower hierarchy--both positives.
desmod/workspacesync/s3.py
Outdated
```diff
@@ -0,0 +1,59 @@
+from concurrent.futures import ThreadPoolExecutor
+
+import boto3
```
We should defer importing boto3 until we know that S3 synchronization is enabled. That allows boto3 to be an optional dependency as opposed to a required dependency. The same holds for concurrent.futures.
My bias is that most people most of the time will not be using the S3 sync feature and thus keeping the requirements list shorter outweighs the additional burden this would put on users who do want to use S3 sync.
Note that we do deferred import (non import-time import) of the `progressbar` library as well.
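The deferred-import pattern being requested can be sketched like this; `maybe_sync` and its body are hypothetical, not desmod's actual code:

```python
def maybe_sync(config, artifacts):
    """Sync artifacts to S3 only when enabled, deferring optional imports."""
    if not config.setdefault('sim.sync.s3.enable', False):
        return False
    # Deferred (non import-time) imports keep boto3 and concurrent.futures
    # optional dependencies, mirroring the deferred progressbar import.
    import boto3  # noqa: F401
    from concurrent.futures import ThreadPoolExecutor  # noqa: F401
    # ... perform the actual upload here ...
    return True
```

Users who never enable S3 sync never pay the import (or installation) cost of boto3.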
desmod/simulation.py
Outdated
```python
artifacts = []
for root, _, files in os.walk(os.curdir, topdown=False):
    for filename in files:
        artifacts.append(os.path.join(root, filename))
```
It is a little curious that `_Workspace` has responsibility for walking the artifacts versus `S3Sync` taking that on itself. Is the idea that if we have multiple synchronization backends we could reuse the same artifacts list?
I originally had this logic in `workspacesync/s3.py` and moved it into `_Workspace` for the exact reason you asked about. Having the sync backends use the same artifacts input is nice for consistency, code reuse, and computational cycles. That said, I'm solving a problem that doesn't exist yet, so I'm happy to move it back to `s3.py` for now if you think that is better.
<shrug>
I think it's fine as-is.
desmod/workspacesync/s3.py
Outdated
```python
    username = getpass.getuser()
except:
    username = "unknown"
config['sim.workspace.s3_sync_prefix'] = username
```
I'm not sure this is the right policy. While it seems like certain desmod users might have a policy that requires organizing synchronized artifacts by username, I don't think it is universal enough to drive a default in desmod. Instead, I think that the default prefix should simply be "/" and users should have to compose their own prefix that incorporates usernames, timestamps, or other such metadata.
desmod/workspacesync/s3.py
Outdated
```python
import os


class S3Sync(object):
```
It seems like this could have been implemented as a regular function instead of as a class with a single method, i.e. `sync_to_s3(...)` versus `S3Sync(...).sync()`.

I don't see it as imperative that we change this. It is not a user-facing API, and we could thus revisit later if desired.
The main reason I went with this approach was to perhaps have a common set of methods for multiple backend synchronizers, which could register themselves and perhaps one day be called in a loop like so:

```python
for cls in sync_registry:
    cls(config, artifacts).sync()
```

That said, that's a lot of hand waving, and we only have one sync backend now, so I probably shouldn't be solving problems that don't exist yet.
Even without using classes, each synchronization backend module could register its `sync()` function. Lots of ways to skin this. As long as these synchronization backends are all driven by configuration and not by a user-facing API, we retain lots of license to revisit this detail.
I do think there is at least one compelling case for having a class: extract and check the configuration before simulation.
desmod/workspacesync/s3.py
Outdated
```python
    os.path.split(self.workspace)[1],
    (artifact[2:]))
self.client.upload_file(
    artifact, self.config['sim.workspace.s3_sync_bucket'], dest)
```
We should extract the required configuration prior to submitting to the thread pool. The config is user input and thus may have all manner of garbage in it.
A good case for retaining the class-oriented strategy here would be so that we can extract configuration in `__init__()` before simulation, and then have confidence that the user provided valid configuration well before end of simulation.

```python
class _Workspace(...):
    def __init__(self, ...):
        ...
        if config.setdefault('sim.sync.s3.enable', False):
            self.s3_sync = S3Sync(...)
        else:
            self.s3_sync = None

    def __exit__(self, ...):
        ...
        if self.s3_sync:
            self.s3_sync.sync()

class S3Sync(...):
    def __init__(self, config, ...):
        self.bucket = config['sim.sync.s3.bucket']  # user must provide bucket
        self.prefix = config.setdefault('sim.sync.s3.prefix', '/')
        ...
```
desmod/simulation.py
Outdated
```python
if len(configs) == 0:
    raise ValueError('No factor at index {}'.format(only_factor))
assert len(configs) == 1, 'exactly one config expected when ' \
                          '--only_factor is used'
```
The string associated with this assert presumes a `--only_factor` command-line option, but that's not a valid assumption.
desmod/config.py
Outdated
```
:param int only_factor:
    When specified, only a single config will be yielded; the config will
    contain only the single factor at the specified index of the list of
    the cartesian product of the unrolled factors.
```
I remain against plumbing this `only_factor` kwarg into `factorial_config()`. We can isolate this capability to `simulate_factors()`. I know that imposes a somewhat larger runtime cost, but I see that cost as marginal, and the conceptual complexity of spreading this feature across two modules as the more important concern.
desmod/simulation.py
Outdated
```diff
@@ -205,7 +220,7 @@ def simulate(config, top_type, env_type=SimEnvironment, reraise=True,


 def simulate_factors(base_config, factors, top_type,
-                     env_type=SimEnvironment, jobs=None):
+                     env_type=SimEnvironment, jobs=None, only_factor=None):
```
What if instead of having this new `only_factor` argument, we instead had a filter function? I.e., the user would pass in a function that evaluates whether each config is included or not. E.g.:

```python
simulate_factors(..., config_filter=lambda config: config['meta.sim.index'] == 42)
```

This would make this feature more general. I could, for example, envision this being useful for at least two other applications:

- Selecting more than one index: `config_filter=lambda c: c['meta.sim.index'] in [42, 142]`.
- Sampling configs from a very large configuration space: `config_filter=lambda c: random.random() < 0.1`, i.e. pick a random 10% of the factorial configs.

I think it would be more reasonable to push this more general capability into `factorial_config()` than the specific `only_factor` capability. Unfortunately, I don't think it would afford the same performance optimization that plumbing `only_factor` affords.

N.B. if we pushed this filter function capability down to `factorial_config()`, that function would have to take responsibility for assigning `meta.sim.index`. We currently do that assignment in `simulate_factors()`.
* only_factor -> config_filter
* use setdefault() for some workspace sync config keys
* default s3 sync prefix is now ""
* don't import boto3 or concurrent.futures until needed
* sim.workspace.s3_sync -> sim.sync.s3.enable (and similar for other s3 sync-related keys)
desmod/simulation.py
Outdated
```diff
@@ -5,6 +5,12 @@
 from multiprocessing import cpu_count, Process, Queue
 from pprint import pprint
 from threading import Thread
+try:
+    from future_builtins import filter
```
We use `six` for Python 2/3 compatibility. To get `filter`, we can unconditionally:

```python
from six.moves import filter
```
desmod/config.py
Outdated
```diff
@@ -271,7 +271,7 @@ def factorial_config(base_config, factors, special_key=None):
     for keys, values_list in factors:
         unrolled_factors.append([(keys, values) for values in values_list])

-    for keys_values_lists in product(*unrolled_factors):
+    for idx, keys_values_lists in enumerate(product(*unrolled_factors)):
```
Remove `enumerate()` to completely unwind the changes in this function.
desmod/simulation.py
Outdated
```python
    os.chdir(self.config['sim.workspace'])
    self.sync()
finally:
    os.chdir(self.orig_dir)
```
When `__exit__()` is called, we have a strong expectation that the current directory will be `meta.sim.workspace`. In multi-simulation cases, that would be something like `ws/42`. The new code here changes to the `sim.workspace` directory prior to calling `sync()`; i.e. just `ws`, not `ws/42` where this simulation's artifacts reside.

I question whether it is right for `sync()` to operate outside of the individual simulation's workspace.

It seems like this paradigm of changing to the multi-sim session-level directory (`sim.workspace`) can only work correctly if the new `config_filter` is used to limit the multi-simulation session to exactly one simulation. Otherwise, each of many simulations would call `sync()` such that they would all try to sync `ws/0`, `ws/1`, ..., `ws/N`.

To put a fine point on it, what I'm suggesting is that the control flow here should be:

```python
def __exit__(...):
    self.sync()
    os.chdir(self.orig_dir)
```

Sorry I didn't pick up on this in my first review.
desmod/simulation.py
Outdated
```diff
@@ -124,11 +131,12 @@ def schedule(self, delay=0):
 class _Workspace(object):
     """Context manager for workspace directory management."""
     def __init__(self, config):
+        self.config = config
```
I maintain that it is important to fully extract all configuration from `config` here in `_Workspace.__init__()`. Doing so would obviate the need to capture `config` with the `self.config` instance attribute. It would also demand that we instead instantiate an `S3Sync` instance. I.e.:

```python
def __init__(self, config):
    self.s3_sync = S3Sync(config)
    ...

def sync(self):
    if self.s3_sync.is_enabled:
        ...
        self.s3_sync.sync(artifacts)
```

We also want `S3Sync.__init__()` to follow suit and extract all needed config as instance attributes, such that the config dict is not retained as an instance attribute and is thus not referenced in `S3Sync.sync()`.
desmod/workspacesync/s3.py
Outdated
```python
        artifact, self.config['sim.sync.s3.bucket'], dest)

def sync(self):
    """Concurrently upload the artifacts to s3."""
```
A potential problem with `sync()` is that it does not sync--it only uploads new files. I assume `client.upload_file()` will happily overwrite files without error.

What is missing is any concept of removing files that exist in S3 but are not in the list of artifacts to synchronize. We would ostensibly need to interrogate S3 with `list_objects()` to figure out which extra files (objects) need to be deleted.

However, doing that alone would leave extra per-simulation directories unaccounted for. E.g., using the same workspace, if I first run a multi-factor set of 10 simulations and then run again with only 5 simulations, even if we synchronized per-simulation directories 0..4 correctly, S3 would still contain bogus/stale directories 5..9.
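The missing deletion step amounts to a set difference between what the bucket holds and what was just uploaded; a minimal sketch (the `stale_keys` helper is hypothetical):

```python
def stale_keys(remote_keys, artifacts):
    """Return S3 keys to delete so that upload + delete is a true sync.

    remote_keys: object keys currently under the workspace prefix
    (e.g., gathered via list_objects() pagination); artifacts: keys
    that were just uploaded. Illustrative sketch only.
    """
    return sorted(set(remote_keys) - set(artifacts))
```

Note this only handles objects under prefixes that are re-listed; stale per-simulation directories from a previous, larger run still need the whole workspace prefix to be listed, per the comment above.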
tests/test_tracer.py
Outdated
```diff
@@ -37,6 +37,7 @@ def config():
     'sim.vcd.start_time': '',
     'sim.vcd.stop_time': '',
     'sim.workspace': 'workspace',
+    'sim.sync.s3.enable': False,
```
Seems like since `sim.sync.s3.enable` defaults to `False`, we should not have had to add this here. Is it really needed?
The sync code will not live in desmod. We've decided to take a different approach.
A new optional `config_filter` parameter for the `simulate_factors()` method is added. The config filter is a function which (when not None) is applied to each config from a multi-factor simulation. Only the simulations for which the config filter function returns true are run.

This helps with orchestrating the execution of multiple per-factor simulations on separate machines (e.g., in the cloud) with a single command (with multiple factors). For example, snowbird could pass a `config_filter` argument to `simulate_factors()` such that, in the case that `args.only_factor = 1`, only the simulation at factor index 1 would be executed, and the user would see that simulation's artifacts alone (assuming a workspace of `~/locker/test`).
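The filter a caller like snowbird might pass can be sketched as follows; `args.only_factor` stands in for a hypothetical parsed command-line value, and this is illustrative rather than snowbird's actual code:

```python
import argparse

# Hypothetical stand-in for snowbird's parsed command line.
args = argparse.Namespace(only_factor=1)

# Run only the simulation whose factorial index matches --only-factor.
def config_filter(config):
    return config['meta.sim.index'] == args.only_factor
```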